📝 Edit cellxgene guide #92

Merged
merged 7 commits into from
Sep 11, 2024
71 changes: 42 additions & 29 deletions docs/cellxgene.ipynb
@@ -33,8 +33,7 @@
"If you are interested in building similar data assets in-house:\n",
"\n",
"1. See the [transfer guide](inv:docs#transfer) to zero-copy data to your own LaminDB instance.\n",
"2. See the [scRNA guide](inv:docs#scrna) for how to create a growing versioned queryable scRNA-seq dataset.\n",
"3. See the [Curate](./cellxgene-curate) for validating, curating and registering your own AnnData objects.\n",
"2. See the [scRNA guide](inv:docs#scrna) to create a growing, standardized & versioned scRNA-seq dataset collection.\n",
"\n",
"```{dropdown} Show me a screenshot\n",
"\n",
@@ -78,9 +77,7 @@
"outputs": [],
"source": [
"import lamindb as ln\n",
"import bionty as bt\n",
"\n",
"from tiledbsoma import AxisQuery"
"import bionty as bt"
]
},
{
@@ -353,7 +350,7 @@
"id": "12495f94",
"metadata": {},
"source": [
"Queries by string are prone to typos. Let's query with auto-completed records instead."
"Queries by string are prone to typos. Let's query `User` and `CellType` with auto-completed records instead."
]
},
{
@@ -385,20 +382,14 @@
"id": "d5598f32",
"metadata": {},
"source": [
"### Query and slice artifacts' content"
"### Slice an AnnData-like artifact"
]
},
{
"cell_type": "markdown",
"id": "62f49bf9",
"metadata": {},
"source": [
"Here, we discuss slicing individual `AnnData` arrays. \n",
"\n",
"If you want to slice a large concatenated array store, see the section **Query tiledbsoma array store** below.\n",
"\n",
"In the query above, each artifact stores an array in form of an `.h5ad` file, which corresponds to an `AnnData` object.\n",
"\n",
"Let's look at an artifact and show its metadata using `.describe()`."
]
},
@@ -413,7 +404,7 @@
},
"outputs": [],
"source": [
"artifact = ln.Artifact.filter(description=\"Mature kidney dataset: immune\").first()\n",
"artifact = ln.Artifact.filter(description=\"Mature kidney dataset: immune\", is_latest=True).one()\n",
"artifact.describe()"
]
},
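Note on the `.first()` to `.one()` change above: the added `is_latest=True` filter suggests the intent is to match exactly one version of the artifact and fail loudly otherwise, rather than silently taking whichever record comes first. A minimal sketch of that difference follows. `TinyQuerySet` and its `LookupError` are hypothetical stand-ins written for illustration, not lamindb classes or lamindb's actual exception types.

```python
class TinyQuerySet:
    """Hypothetical stand-in for a query result; not the lamindb class."""

    def __init__(self, records):
        self._records = list(records)

    def first(self):
        # Return the first match, or None when nothing matches.
        return self._records[0] if self._records else None

    def one(self):
        # Insist on exactly one match; anything else is an error.
        if len(self._records) != 1:
            raise LookupError(
                f"expected exactly 1 record, found {len(self._records)}"
            )
        return self._records[0]


# Two versions share a description; filtering to the latest leaves one.
versions = TinyQuerySet(["kidney-immune v1", "kidney-immune v2"])
latest_only = TinyQuerySet(["kidney-immune v2"])

picked_silently = versions.first()   # takes v1 without complaint
picked_strictly = latest_only.one()  # unambiguous, so it succeeds

try:
    versions.one()
    ambiguous_raised = False
except LookupError:
    ambiguous_raised = True          # ambiguity surfaces as an error
```

With the strict accessor, a description that matches several versions shows up as an error at query time instead of as a quietly outdated artifact later in the notebook.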
@@ -436,12 +427,7 @@
"artifact.labels.get(features.tissue).df()\n",
"```\n",
"\n",
"```\n",
"artifact.labels.get(features.collection).one()\n",
"```\n",
"\n",
":::\n",
"\n"
":::"
]
},
{
@@ -450,10 +436,34 @@
"metadata": {},
"source": [
"If you want to query a slice of the array data, you have two options:\n",
"1. Cache & load the entire array into memory via `artifact.load() -> AnnData` (caches the h5ad on disk, so that you only download once)\n",
"2. Stream the array using a (cloud-backed) accessor `artifact.open() -> AnnDataAccessor`\n",
"1. Cache to the disk and return the path to the cached data. Doesn't download anything if files are already in the cache.\n",
"2. Cache & load the entire array into memory via `artifact.load() -> AnnData` (caches the h5ad on disk, so that you only download once)\n",
"3. Stream the array using a (cloud-backed) accessor `artifact.open() -> AnnDataAccessor`\n",
"\n",
"Both options will run much faster if you run them close to the data (AWS S3 on the US West Coast, consider logging into hosted compute there)."
"Both will run much faster in the AWS us-west-2 data center."
]
},
{
"cell_type": "markdown",
"id": "bdd54195",
"metadata": {},
"source": [
"Cache:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49b679ea",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"cache_path = artifact.cache()\n",
"cache_path"
]
},
{
@@ -506,6 +516,7 @@
":::{dropdown} See the artifact-level query\n",
"\n",
"```\n",
"collection = ln.Collection.filter(name=\"cellxgene-census\", version=\"2024-07-01\").one()\n",
"query = collection.artifacts.filter(\n",
" organism=organisms.human,\n",
" cell_types__in=[cell_types.dendritic_cell, cell_types.neutrophil],\n",
@@ -539,8 +550,8 @@
},
"outputs": [],
"source": [
"adata_backed = artifact.open()\n",
"adata_backed"
"with artifact.open() as adata_backed:\n",
" display(adata_backed)"
]
},
{
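Note on the move to `with artifact.open() as adata_backed:` above: wrapping the accessor in a `with` block presumably guarantees that the backing connection is released when the block exits, even if the cell raises. A minimal sketch of that pattern follows. `FakeAccessor` is a hypothetical stand-in for illustration and makes no claim about how `AnnDataAccessor` is implemented.

```python
class FakeAccessor:
    """Hypothetical stand-in for a backed accessor; not the lamindb class."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        # Hand the accessor itself to the `as` target.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the resource, whether or not an error occurred.
        self.close()
        return False  # do not swallow exceptions

    def close(self):
        self.closed = True


with FakeAccessor() as accessor:
    open_inside_block = not accessor.closed  # usable inside the block

closed_after_block = accessor.closed         # released on exit
```

The trade-off is that anything read from the accessor has to be materialized inside the block, since the handle is no longer valid afterwards.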
@@ -635,7 +646,7 @@
},
"outputs": [],
"source": [
"ln.Collection.filter(version=\"2024-07-01\").search(\"immune human kidney\", limit=10)"
"ln.Collection.filter(version=census_version).search(\"human retina\", limit=10)"
]
},
{
@@ -657,7 +668,7 @@
},
"outputs": [],
"source": [
"collection = ln.Collection.get(\"kqiPjpzpK9H9rdtnV67f\")\n",
"collection = ln.Collection.get(\"quQDnLsMLkP3JRsC8gp4\")\n",
"collection"
]
},
@@ -666,7 +677,7 @@
"id": "6b6e4a13",
"metadata": {},
"source": [
"We see it's a Science paper and we could find more information using the [DOI](https://doi.org/10.1126/science.aat5031) or CELLxGENE [collection id](https://cellxgene.cziscience.com/collections/120e86b4-1195-48c5-845b-b98054105eec)."
"We see it's a Science paper and we could find more information using the [DOI](https://doi.org/10.1016/j.xgen.2023.100298) or CELLxGENE [collection id](https://cellxgene.cziscience.com/collections/af893e86-8e9f-41f1-a474-ef05359b1fb7)."
]
},
{
@@ -807,7 +818,7 @@
"id": "e6fade48",
"metadata": {},
"source": [
"### Query tiledbsoma array store"
"### Slice a tiledbsoma-like artifact"
]
},
{
@@ -924,6 +935,8 @@
"metadata": {},
"outputs": [],
"source": [
"from tiledbsoma import AxisQuery\n",
"\n",
"with census.open() as store:\n",
" \n",
" experiment = store[\"census_data\"][human]\n",