H5MANIPULATOR

Read, Rearrange, and Write HDF5 files with 10x Genomics conventions

Requirements

H5MANIPULATOR requires libraries for HDF5 files. On Windows, these are bundled with the rhdf5 package. On Linux/Unix, you will need to first install hdf5 libraries.

This can usually be accomplished with:

sudo apt-get install libhdf5-dev

or

sudo yum install hdf5-devel

Once hdf5 libraries are available, you can proceed to installation of rhdf5.

The rhdf5 package is provided through Bioconductor, and can be installed using:

if(!"BiocManager" %in% .packages(all.available = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("rhdf5")

H5MANIPULATOR also requires the data.table, ids, and Matrix packages, which are available on CRAN and should be automatically installed by install_github().
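Before installing, it can help to confirm which of these dependencies are already present. The following is a minimal sketch using only base R; the variable names are illustrative and not part of H5MANIPULATOR.

```r
# Packages named in the requirements above.
needed <- c("rhdf5", "data.table", "ids", "Matrix")

# requireNamespace() returns FALSE (rather than erroring) when a package is absent.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```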

Installation

This package can be installed from GitHub using the devtools package.

You may first need to register your GitHub PAT, as this is a private repository.

Get an access token from GitHub:

Navigate to Settings / Developer settings
Click Personal access tokens
Generate a new token (or regenerate an existing one if you didn't copy it to your password manager)
Under Select scopes, give the token the repo scope

As GitHub warns: "Make sure to copy your new personal access token now. You won't be able to see it again!"

Sys.setenv(GITHUB_PAT = "your-access-token-here")
devtools::install_github("PAIN-initiative/H5MANIPULATOR")
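If the install fails with an authentication error, one thing to check is whether the token is actually visible to the current R session. The sketch below assumes nothing beyond base R; `github_pat_is_set()` is a hypothetical helper, not a devtools or H5MANIPULATOR function.

```r
# Hypothetical helper: TRUE when a non-empty GITHUB_PAT is set in this session.
github_pat_is_set <- function() {
  nzchar(Sys.getenv("GITHUB_PAT"))
}

if (!github_pat_is_set()) {
  message("GITHUB_PAT is not set; set it with Sys.setenv() or in ~/.Renviron ",
          "before calling devtools::install_github().")
}
```

Storing the token in ~/.Renviron instead of typing it into a script keeps it out of shared code and version control.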

.h5 structure

The .h5 files handled here follow 10x Genomics conventions, extended with richer metadata and additional results.

Reading .h5 files

Reading as a Seurat object

We can read an .h5 file directly into a Seurat object using read_h5_seurat():

library(H5MANIPULATOR)

so <- read_h5_seurat(h5_file)

This function places the RNA-seq counts in the "RNA" assay.

Reading as a SingleCellExperiment object

Likewise, we can read directly into a SingleCellExperiment object for use with Bioconductor packages using read_h5_sce().

library(H5MANIPULATOR)

sce <- read_h5_sce(h5_file)

Note that this requires a recent version of SingleCellExperiment (>= 1.8.0).
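The version requirement can be checked up front. This is a minimal sketch; `has_min_version()` is a hypothetical helper written for illustration and is not provided by H5MANIPULATOR.

```r
# Hypothetical helper: TRUE only if `pkg` is installed and at least `min_version`.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

if (!has_min_version("SingleCellExperiment", "1.8.0")) {
  message("SingleCellExperiment >= 1.8.0 not found; install or update it with ",
          "BiocManager::install(\"SingleCellExperiment\").")
}
```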

Reading the matrix directly

There is a convenience function to directly read the main cell x gene matrix from the HDF5 file, read_h5_dgCMatrix():

library(H5MANIPULATOR)

mat <- read_h5_dgCMatrix(h5_file)

Note: By default, this matrix will be 1-indexed for convenient use in R. If you would rather retrieve a 0-indexed matrix, set the index1 parameter to FALSE:

mat <- read_h5_dgCMatrix(h5_file,
                         index1 = FALSE)
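For background on the indexing distinction: 10x-style HDF5 files store sparse matrices with zero-based row indices, while R normally counts from one. The sketch below uses only the Matrix package and made-up values to show how the two conventions describe the same matrix. It illustrates the general idea and is not necessarily how read_h5_dgCMatrix() handles index1 internally.

```r
library(Matrix)

# Compressed sparse column pieces for a toy 3 genes x 2 cells matrix.
# Values are invented; indices and indptr are zero-based, as stored on disk.
data    <- c(5, 1, 2)
indices <- c(0L, 2L, 1L)   # zero-based row (gene) positions
indptr  <- c(0L, 2L, 3L)   # column (cell) pointers into data/indices
shape   <- c(3L, 2L)

# Tell sparseMatrix() that the row indices are zero-based ...
mat0 <- sparseMatrix(i = indices, p = indptr, x = data,
                     dims = shape, index1 = FALSE)

# ... or shift them by one and use R's default one-based convention.
mat1 <- sparseMatrix(i = indices + 1L, p = indptr, x = data,
                     dims = shape)

# Both calls describe the same entries.
all(as.matrix(mat0) == as.matrix(mat1))
```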

Reading cell metadata directly

A convenience function is provided to retrieve all cell/observation-based metadata, read_h5_cell_meta():

cell_meta <- read_h5_cell_meta(h5_file)

Note that for the test dataset, this is only the cell barcodes, as additional metadata are not present.

Reading feature metadata directly

A similar function is also provided for gene/feature-based metadata, read_h5_feature_meta():

feat_meta <- read_h5_feature_meta(h5_file)

Reading and separating out all contents

To read the entirety of an HDF5 file as a list object, use h5dump():

library(H5MANIPULATOR)

h5_list <- h5dump(h5_file)
str(h5_list)

This is a very raw representation of the contents of these HDF5 files. You may want to convert the major components to a sparse matrix (for cell x gene counts), and a data.frame (for metadata):

h5_list <- h5_list_convert_to_dgCMatrix(h5_list,
                                        target = "matrix")
                                        
mat <- h5_list$matrix_dgCMatrix

feature_metadata <- as.data.frame(h5_list$matrix$features[-1])

Now, mat will consist of a dgCMatrix with genes as rows and barcodes as columns, and feature_metadata will be a data.frame with genes as rows and various metadata as columns.
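To make the `[-1]` step concrete, here is a small sketch with a mocked-up features list. It assumes, based on the usual 10x layout, that the first element of matrix/features is an `_all_tag_keys` entry that lists tag names instead of holding one value per gene; the values themselves are invented.

```r
# Mock of h5_list$matrix$features; contents are invented for illustration.
features <- list(
  `_all_tag_keys` = "genome",                       # assumed first entry, not per-gene
  feature_type    = rep("Gene Expression", 3),
  genome          = rep("GRCh38", 3),
  id              = c("ENSG0001", "ENSG0002", "ENSG0003"),
  name            = c("GeneA", "GeneB", "GeneC")
)

# Drop the first element, then convert the per-gene vectors to columns.
feature_metadata <- as.data.frame(features[-1])

str(feature_metadata)
```

If the first element in your own files is something else, inspect names(h5_list$matrix$features) before dropping it.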

For this test dataset, there isn't any cell metadata. However, files that are generated by our pipeline will include a substantial metadata set stored in matrix/observations. This can be retrieved with:

cell_metadata <- cbind(data.frame(barcodes = h5_list$matrix$barcodes),
                       as.data.frame(h5_list$matrix$observations))
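Once cell metadata is in a data.frame, it can be used to filter the matrix, provided the metadata rows and matrix columns are in the same barcode order. The following sketch uses a mocked-up list and a toy matrix; the observation names (n_umis, well_id) and all values are hypothetical.

```r
library(Matrix)

# Hypothetical stand-in for the list returned by h5dump().
h5_list <- list(
  matrix = list(
    barcodes = c("AAAC-1", "AAAG-1", "AACT-1"),
    observations = list(
      n_umis  = c(1200L, 350L, 2210L),
      well_id = c("W1", "W1", "W2")
    )
  )
)

cell_metadata <- cbind(
  data.frame(barcodes = h5_list$matrix$barcodes),
  as.data.frame(h5_list$matrix$observations)
)

# Toy genes x cells matrix with barcodes as column names.
mat <- Matrix(c(1, 0, 3,
                0, 2, 0), nrow = 2, byrow = TRUE, sparse = TRUE,
              dimnames = list(c("GeneA", "GeneB"), cell_metadata$barcodes))

# Metadata rows should line up with matrix columns before filtering.
stopifnot(identical(cell_metadata$barcodes, colnames(mat)))

# Keep cells above an arbitrary, illustrative UMI threshold.
keep <- cell_metadata$n_umis >= 1000
mat_filtered  <- mat[, keep, drop = FALSE]
meta_filtered <- cell_metadata[keep, , drop = FALSE]
```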
