Revamp manuscripts documentation

databio · Jun 13, 2024 · a98c3fc
1 parent 20ae12f

Showing 7 changed files with 36 additions and 11 deletions.
diff --git a/docs/geniml/manuscripts/gharavi2021.md b/docs/manuscripts/gharavi2021.md
@@ -4,4 +4,4 @@
 
 This paper was our first publication showing how to build and evaluate region set embeddings using region-set2vec, based on word2vec.
 
-See: [train Region2Vec embeddings](../tutorials/region2vec.md)
+See: [train Region2Vec embeddings](../geniml/tutorials/region2vec.md)
diff --git a/docs/geniml/manuscripts/gharavi2024.md b/docs/manuscripts/gharavi2024.md
@@ -10,5 +10,5 @@ As available genomic interval data increase in scale, we require fast systems to
 
 This paper trained BEDspace models (using StarSpace with BED files). See these tutorials:
 
-- [How to use BEDSpace to jointly embed regions and metadata](../tutorials/bedspace.md)
+- [How to use BEDSpace to jointly embed regions and metadata](../geniml/tutorials/bedspace.md)
 
diff --git a/docs/manuscripts/gu2021.md b/docs/manuscripts/gu2021.md
@@ -0,0 +1,14 @@
+# Bedshift: perturbation of genomic interval sets
+
+Paper: [Manuscript at Genome Biology](https://doi.org/10.1186/s13059-021-02440-w) 
+
+
+## Abstract
+
+Functional genomics experiments, like ChIP-Seq or ATAC-Seq, produce results that are summarized as a region set. There is no way to objectively evaluate the effectiveness of region set similarity metrics. We present Bedshift, a tool for perturbing BED files by randomly shifting, adding, and dropping regions from a reference file. The perturbed files can be used to benchmark similarity metrics, as well as for other applications. We highlight differences in behavior between metrics, such as that the Jaccard score is most sensitive to added or dropped regions, while coverage score is most sensitive to shifted regions.
+
+## Relevant tutorials
+
+Analysis from the paper is described in these tutorials: 
+
+- [Randomizing BED files with BEDshift](../geniml/tutorials/bedshift.md)
diff --git a/docs/geniml/manuscripts/leroy2024.md b/docs/manuscripts/leroy2024.md
@@ -8,3 +8,11 @@ Paper: [Manuscript at bioRxiv](http://dx.doi.org/10.1101/2023.08.01.551452)
 **Motivation** Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically addressed by producing lower-dimensional representations of single cells for downstream clustering tasks. Current approaches produce such individual cell embeddings directly through a one-step learning process. Here, we propose an alternative approach by building embedding models pre-trained on reference data. We argue that this provides a more flexible analysis workflow that also has computational performance advantages through transfer learning.
 
 **Results** We implemented our approach in scEmbed, an unsupervised machine learning framework that learns low-dimensional embeddings of genomic regulatory regions to represent and analyze scATAC-seq data. scEmbed performs well in terms of clustering ability and has the key advantage of learning patterns of region co-occurrence that can be transferred to other, unseen datasets. Moreover, pre-trained models on reference data can be exploited to build fast and accurate cell-type annotation systems without the need for other data modalities. scEmbed is implemented in Python and it is available to download from GitHub. We also make our pre-trained models available on huggingface for public use.
+
+## Relevant tutorials
+
+Analysis from the paper is described in these tutorials: 
+
+- [Train single-cell embeddings](../geniml/tutorials/train-scembed-model.md)
+- [Populate a vector store](../geniml/tutorials/load-qdrant-with-cell-embeddings.md)
+- [Predict cell-types using KNN](../geniml/tutorials/cell-type-annotation-with-knn.md)
diff --git a/docs/geniml/manuscripts/rymuza2024.md b/docs/manuscripts/rymuza2024.md
@@ -17,11 +17,11 @@ This paper published 2 types of method: 1. Methods to *construct* a universe, an
 
 You can construct a universe either on the command line, or using geniml as a library:
 
-- [Create consensus peaks with CLI](../tutorials/create-consensus-peaks.md)
-- [Create consensus peaks with Python](../code/create-consensus-peaks-python.md)
+- [Create consensus peaks with CLI](../geniml/tutorials/create-consensus-peaks.md)
+- [Create consensus peaks with Python](../geniml/code/create-consensus-peaks-python.md)
 
 ### 2. Evaluating a universe
 
 The main methods are implemented in the `assess-universe` model with tutorial:
 
-- [Assess universe fit tutorial](../tutorials/assess-universe.md)
+- [Assess universe fit tutorial](../geniml/tutorials/assess-universe.md)
diff --git a/docs/geniml/manuscripts/zheng2024.md b/docs/manuscripts/zheng2024.md
@@ -9,4 +9,6 @@ Representation learning models have become a mainstay of modern genomics. These
 
 ## Relevant tutorials
 
-To evaluate, refer to this tutorial: https://github.com/databio/region2vec_eval
+Analysis from the paper is described in these tutorials: 
+
+- [How to evaluate embeddings](../geniml/tutorials/evaluation.md)
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -119,11 +119,12 @@ nav:
   - How to cite: 
     - How to cite: citations.md
     - Published manuscripts:
-      - Gharavi et al. 2021: geniml/manuscripts/gharavi2021.md
-      - Rymuza et al. 2024: geniml/manuscripts/rymuza2024.md
-      - Gharavi et al. 2024: geniml/manuscripts/gharavi2024.md
-      - LeRoy et al.  2024: geniml/manuscripts/leroy2024.md
-      - Zheng et al. 2024: geniml/manuscripts/zheng2024.md
+      - Gharavi et al. 2021: manuscripts/gharavi2021.md
+      - Gu et al. 2021: manuscripts/gu2021.md
+      - Rymuza et al. 2024: manuscripts/rymuza2024.md
+      - Gharavi et al. 2024: manuscripts/gharavi2024.md
+      - LeRoy et al.  2024: manuscripts/leroy2024.md
+      - Zheng et al. 2024: manuscripts/zheng2024.md
 
 autodoc:
   jupyter: