Possible solution for converting categorical variable (factor) to proper text categoricals in H5AD #183

Open
mvfki opened this issue Jun 25, 2024 · 1 comment

mvfki commented Jun 25, 2024

This is directly related to #138.

For SeuratDisk Team,

After some exploration, I believe a categorical variable in the "obs" of an H5AD file is stored as zero-based integer codes in a 1D H5D dataset, with an attribute holding an H5 object reference that points to another location in the same H5AD file, 'obs/__categories/variableName', where the factor's levels are saved. In the hdf5r interface, that reference is represented by the "H5R_OBJECT" class. I haven't yet found a clean way to create one, but I can hack it by modifying the source code of the H5D class's create_reference() method:

# `self` is an H5D object
.H5.create_reference <- function(self, ...) {
    space <- self$get_space()
    do.call("[", c(list(space), list(...)))
    ref_type <- hdf5r::h5const$H5R_OBJECT
    ref_obj <- hdf5r::H5R_OBJECT$new(1, self)
    res <- .Call("R_H5Rcreate", ref_obj$ref, self$id, ".", ref_type,
                 space$id, FALSE, PACKAGE = "hdf5r")
    if (res$return_val < 0) {
        stop("Error creating object reference")
    }
    ref_obj$ref <- res$ref
    return(ref_obj)
}

In brief: you create an H5D (call it a) for a factor from a data.frame by writing its zero-based integer codes into it, create another H5D (call it b) in "obs/__categories" for its levels, create a reference object with ref <- .H5.create_reference(b), and then call a$create_attr(attr_name = "categories", robj = ref, space = Scalar(), dtype = GuessDType(ref)). This works for me: the resulting H5AD file loads in Python with the text categorical annotations shown properly. However, the .Call() invocation triggers NOTEs in the R CMD check of my package.
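The split between integer codes and levels described above can be seen without any HDF5 calls. The following is a minimal base R sketch; the variable name `cell_type` and its values are made up for illustration and do not come from any real dataset.

```r
# Toy factor; the values and the name `cell_type` are hypothetical.
cell_type <- factor(c("B", "T", "B", "NK"), levels = c("B", "T", "NK"))

# R factor codes are one-based, so subtract 1 to get the zero-based
# integers that would go into the dataset under "obs".
codes <- as.integer(cell_type) - 1L

# The levels are what would go into "obs/__categories/<variableName>".
categories <- levels(cell_type)

# A reader that follows the reference can recover the labels by indexing
# the categories with the codes (shifted back to one-based for R).
decoded <- categories[codes + 1L]
stopifnot(identical(decoded, as.character(cell_type)))
```

Without the "categories" attribute, a reader only sees `codes`, which is why the annotations show up as plain integers.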

It would be great if you could include this in a future update, or come up with a cleaner way that avoids the check NOTEs!

Best,
Yichen


For users,

I'll start from the tutorial:

library(Seurat)
library(SeuratData)
library(SeuratDisk)
InstallData("pbmc3k")
data("pbmc3k.final")
SaveH5Seurat(pbmc3k.final, filename = "pbmc3k.h5Seurat")
Convert("pbmc3k.h5Seurat", dest = "h5ad")

At this point you should see the file pbmc3k.h5ad created on disk. It can be loaded in Python, but "orig.ident", "seurat_annotations", etc. show up as integer values.

Go back to your R session and do:

# Load utilities you'll need
# The library
library(hdf5r)
# My hack function
H5.create_reference <- function(self, ...) {
    space <- self$get_space()
    do.call("[", c(list(space), list(...)))
    ref_type <- hdf5r::h5const$H5R_OBJECT
    ref_obj <- hdf5r::H5R_OBJECT$new(1, self)
    res <- .Call("R_H5Rcreate", ref_obj$ref, self$id, ".", ref_type,
                 space$id, FALSE, PACKAGE = "hdf5r")
    if (res$return_val < 0) {
        stop("Error creating object reference")
    }
    ref_obj$ref <- res$ref
    return(ref_obj)
}
# Open the H5AD file (which is just an HDF5 file) in "r+" mode for read-and-write access
h5ad <- H5File$new("pbmc3k.h5ad", "r+")
# Fix for `orig.ident`
ref.orig.ident <- H5.create_reference(h5ad[['obs/__categories/orig.ident']])
h5ad[['obs/orig.ident']]$create_attr(
    attr_name = "categories", 
    robj = ref.orig.ident, 
    space = H5S$new(type = "scalar")
)
# You might see that this returns an object of class H5A. Don't worry about it.
# Manually do the same for the other categorical variables...
# Finally, remember to close the H5AD file connection, which was opened with write access
h5ad$close_all()
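
The "do the same for other categorical variables" step could also be looped instead of done by hand. This is only a sketch under a few assumptions: the H5.create_reference() helper defined above is already loaded in the session, every entry under 'obs/__categories' has a dataset of the same name under 'obs', and the file is reopened here because the connection above has been closed. The variable names `cat_names`, `nm`, and `dset` are introduced just for this example.

```r
library(hdf5r)

# Assumes H5.create_reference() from above is already defined in this session.
h5ad <- H5File$new("pbmc3k.h5ad", "r+")

# Each entry under obs/__categories holds the levels of one categorical variable.
cat_names <- names(h5ad[["obs/__categories"]])

for (nm in cat_names) {
    # Assumption: a dataset with the same name exists directly under obs.
    dset <- h5ad[[paste0("obs/", nm)]]
    # Skip variables that already carry the attribute (e.g. orig.ident from above).
    if ("categories" %in% h5attr_names(dset)) next
    ref <- H5.create_reference(h5ad[[paste0("obs/__categories/", nm)]])
    dset$create_attr(
        attr_name = "categories",
        robj = ref,
        space = H5S$new(type = "scalar")
    )
}

h5ad$close_all()
```

Working on a copy of the .h5ad file first is a cheap safeguard, since the file is modified in place.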

Then you can reload the AnnData in Python and see the changes 😉

mvfki commented Jun 25, 2024

Should also be related to #137
