Load examples for scanpy are incorrect and misleading #267

Open
GeoffSCollins opened this issue Jul 30, 2024 · 0 comments

This file gives the following code for loading the downloaded data into a scanpy object:

import scanpy as sc
import pandas as pd
ad = sc.read_text("exprMatrix.tsv.gz")
meta = pd.read_csv("meta.tsv", sep="\t")
ad.var = meta

OR

import scanpy as sc
import pandas as pd
ad = sc.read_mtx("matrix.mtx.gz")
meta = pd.read_csv("meta.tsv", sep="\t")
ad.var = meta

I tried this, but the cell and sample metadata ended up in the var segment and the genes were listed in the obs segment, which is the reverse of the anndata convention (obs holds cells, var holds genes). This problem is described in a handful of places across scanpy and anndata, one of which is here.

After some investigation, I found that the expression matrix (exprMatrix.tsv.gz) I downloaded from cells.ucsc.edu is transposed relative to what scanpy expects: genes are rows and cells are columns. That is what causes this error. Users such as myself should therefore be instructed to transpose the matrix before loading it into scanpy.
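To make the orientation problem concrete, here is a minimal sketch using only pandas. The toy matrix, the cell and gene names, and the `cellId` and `cluster` metadata columns are all made up for illustration; only the genes-as-rows layout reflects what I saw in the download.

```python
import io
import pandas as pd

# Toy stand-in for exprMatrix.tsv.gz: genes as rows, cells as columns (hypothetical values).
expr_tsv = "gene\tcellA\tcellB\tcellC\nGene1\t0\t3\t1\nGene2\t5\t0\t2\n"
# Toy stand-in for meta.tsv: one row per cell (hypothetical columns, deliberately shuffled).
meta_tsv = "cellId\tcluster\ncellB\tT cell\ncellA\tB cell\ncellC\tT cell\n"

genes_by_cells = pd.read_csv(io.StringIO(expr_tsv), sep="\t", index_col=0)
meta = pd.read_csv(io.StringIO(meta_tsv), sep="\t", index_col=0)

# anndata expects observations (cells) as rows and variables (genes) as columns.
cells_by_genes = genes_by_cells.T

# Align the metadata to the matrix by cell ID rather than by row position.
meta_aligned = meta.reindex(cells_by_genes.index)

print(cells_by_genes.shape)
print(meta_aligned)
```

After the transpose, the number of rows in the matrix matches the number of rows in the metadata, which is the condition that has to hold before the metadata can be assigned to obs.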

I would make a PR for this repo, but it looks like I can't create a branch on the repo unless I fork it. So, below are the changes I would suggest for load.rst:

Scanpy
^^^^^^

To create an anndata object in Scanpy if the expression matrix is a .tsv.gz file::

    import scanpy as sc
    import pandas as pd

    # read the expression matrix downloaded from cells.ucsc.edu (genes as rows)
    data = pd.read_csv("exprMatrix.tsv.gz", sep="\t")

    # set the row index to be genes
    data = data.set_index("gene")

    # transpose the matrix so that cells are rows and genes are columns
    transposed_matrix = data.transpose()

    # write the transposed matrix to a file and then load into scanpy
    transposed_matrix.to_csv("transposed_matrix.tsv", sep="\t")

    ad = sc.read_text("transposed_matrix.tsv")

    # read the metadata and put it into the obs segment
    # (assumes the first column of meta.tsv holds the cell IDs,
    # in the same order as the rows of the matrix)
    meta = pd.read_csv("meta.tsv", sep="\t", index_col=0)
    ad.obs = meta

If the expression matrix is an MTX file::

    import scanpy as sc
    import pandas as pd
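    ad = sc.read_mtx("matrix.mtx.gz")
    # if the MTX file also stores genes as rows, transpose it first: ad = ad.T
    # (assumes the first column of meta.tsv holds the cell IDs)
    meta = pd.read_csv("meta.tsv", sep="\t", index_col=0)
    ad.obs = meta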