H5MANIPULATOR

Read, Rearrange, and Write HDF5 files with 10x Genomics conventions

Requirements

H5MANIPULATOR requires the HDF5 libraries. On Windows, these are bundled with the rhdf5 package. On Linux/Unix, you will first need to install the HDF5 libraries.

On Debian/Ubuntu, this can usually be accomplished with:

sudo apt-get install libhdf5-dev

or, on RHEL/CentOS/Fedora:

sudo yum install hdf5-devel

Once the HDF5 libraries are available, you can proceed to install rhdf5.

The rhdf5 package is provided through Bioconductor, and can be installed using:

if(!"BiocManager" %in% .packages(all.available = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("rhdf5")

H5MANIPULATOR also requires the data.table, ids, and Matrix packages, which are available on CRAN and should be automatically installed by install_github().
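
Before installing, it can help to check which of these dependencies are already present. The following is a minimal sketch using only base R; the variable names are chosen for illustration and are not part of H5MANIPULATOR.

```r
# Packages named in the Requirements section
required_pkgs <- c("rhdf5", "data.table", "ids", "Matrix")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_installed <- vapply(required_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)

# Names of any packages that still need to be installed
missing_pkgs <- required_pkgs[!is_installed]
print(missing_pkgs)
```

An empty result means all four packages are available.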

Installation

This package can be installed from GitHub using the devtools package.

You may first need to register your GitHub personal access token (PAT), as this is a private repository.

Get an access token from GitHub. Make sure to copy your new personal access token now. You won’t be able to see it again!

Sys.setenv(GITHUB_PAT = "your-access-token-here")
devtools::install_github("PAIN-initiative/H5MANIPULATOR")

.h5 structure

The .h5 files produced here follow 10x Genomics conventions, with the addition of richer metadata and additional results.

Reading .h5 files

Reading as a Seurat object

We can read an .h5 file directly into a Seurat object using read_h5_seurat():

library(H5MANIPULATOR)

so <- read_h5_seurat(h5_file)

This function places the RNA-seq counts in the "RNA" assay.

Reading as a SingleCellExperiment object

Likewise, we can read directly into a SingleCellExperiment object for use with Bioconductor packages using read_h5_sce():

library(H5MANIPULATOR)

sce <- read_h5_sce(h5_file)

Note that this requires a recent version of SingleCellExperiment (>= 1.8.0).

Reading the matrix directly

There is a convenience function to directly read the main cell x gene matrix from the HDF5 file, read_h5_dgCMatrix():

library(H5MANIPULATOR)

mat <- read_h5_dgCMatrix(h5_file)

Note: By default, this matrix will be 1-indexed for convenient use in R. If you would rather retrieve a 0-indexed matrix, set the index1 parameter to FALSE:

mat <- read_h5_dgCMatrix(h5_file,
                         index1 = FALSE)
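
The difference between the two conventions can be illustrated with Matrix::sparseMatrix() on a toy matrix. This is only a sketch of what 0-based versus 1-based row indices mean for a column-compressed sparse matrix. It does not show how read_h5_dgCMatrix() works internally, and all values and variable names are made up.

```r
library(Matrix)

# Toy column-compressed components; all values are made up for illustration
x_vals    <- c(5, 3, 2)        # non-zero values
row_idx_0 <- c(0L, 2L, 1L)     # 0-based row positions
col_ptr   <- c(0L, 2L, 3L)     # column pointers (always 0-based)

# index1 = FALSE tells sparseMatrix() that row_idx_0 counts from 0
mat0 <- sparseMatrix(i = row_idx_0, p = col_ptr, x = x_vals,
                     dims = c(3, 2), index1 = FALSE)

# The same matrix built from 1-based row positions (the sparseMatrix() default)
mat1 <- sparseMatrix(i = row_idx_0 + 1L, p = col_ptr, x = x_vals,
                     dims = c(3, 2), index1 = TRUE)

# Both calls describe the same 3 x 2 matrix
same_matrix <- identical(as.matrix(mat0), as.matrix(mat1))
print(same_matrix)
```

Only the interpretation of the row indices changes between the two calls; the stored values and the resulting matrix are the same.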

Reading cell metadata directly

A convenience function is provided to retrieve all cell/observation-based metadata, read_h5_cell_meta():

cell_meta <- read_h5_cell_meta(h5_file)

Note that for the test dataset, this is only the cell barcodes, as additional metadata are not present.

Reading feature metadata directly

A similar function is also provided for gene/feature-based metadata, read_h5_feature_meta():

feat_meta <- read_h5_feature_meta(h5_file)

Reading and separating out all contents

To read the entirety of an HDF5 file as a list object, use h5dump():

library(H5MANIPULATOR)

h5_list <- h5dump(h5_file)
str(h5_list)

This is a very raw representation of the contents of these HDF5 files. You may want to convert the major components to a sparse matrix (for cell x gene counts), and a data.frame (for metadata):

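# Convert the sparse matrix components stored under "matrix" to a dgCMatrix
h5_list <- h5_list_convert_to_dgCMatrix(h5_list,
                                        target = "matrix")

mat <- h5_list$matrix_dgCMatrix

# Drop the first element of the features list and keep the rest as a data.frame
feature_metadata <- as.data.frame(h5_list$matrix$features[-1])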

Now, mat is a dgCMatrix with genes as rows and barcodes as columns, and feature_metadata is a data.frame with genes as rows and various metadata as columns.
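
With genes as rows and barcodes as columns, common per-cell and per-gene summaries are column and row operations. The sketch below uses a small made-up matrix in place of a real file; the gene names, barcode names, and counts are hypothetical.

```r
library(Matrix)

# Toy genes x barcodes count matrix; names and values are hypothetical
mat <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 2, 3), x = c(4, 1, 7, 2),
                    dims = c(3, 3),
                    dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                    c("bc1", "bc2", "bc3")))

# Total counts per barcode (column sums)
counts_per_cell <- colSums(mat)

# Number of barcodes in which each gene is detected (non-zero entries per row)
cells_per_gene <- rowSums(mat > 0)

print(counts_per_cell)
print(cells_per_gene)
```

Both summaries operate on the sparse representation, so the matrix does not need to be converted to a dense one first.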

For this test dataset, there isn't any cell metadata. However, files that are generated by our pipeline will include a substantial metadata set stored in matrix/observations. This can be retrieved with:

cell_metadata <- cbind(data.frame(barcodes = h5_list$matrix$barcodes),
                       as.data.frame(h5_list$matrix$observations))
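
To show what this produces, here is a minimal sketch on a hand-built list that mimics the matrix/barcodes and matrix/observations layout. The observation fields n_umis and well_id are hypothetical examples, not fields guaranteed to be present in any file.

```r
# Hand-built stand-in for the list returned by h5dump();
# barcodes and observation fields are made up for illustration
h5_list <- list(
  matrix = list(
    barcodes = c("AAACCTGA-1", "AAACGGGT-1"),
    observations = list(
      n_umis  = c(1200L, 950L),
      well_id = c("W1", "W1")
    )
  )
)

# One row per barcode, one column per observation field
cell_metadata <- cbind(data.frame(barcodes = h5_list$matrix$barcodes),
                       as.data.frame(h5_list$matrix$observations))

print(cell_metadata)
```

Each element of observations must have one value per barcode for the cbind() to line up.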