
Reproducibility

This repo contains the code to reproduce the results from the paper "Integrating T-cell receptor and transcriptome for large-scale single-cell immune profiling analysis".

Setup

To reproduce the results, clone this repo via:

git clone git@github.com:SchubertLab/mvTCR_reproducibility.git

Additionally, clone the mvTCR code inside the mvTCR_reproducibility folder:

git clone git@github.com:SchubertLab/mvTCR.git

Linux

Please run the following commands for a Linux-based OS:

conda create --name mvTCR python=3.8.8 -y 
conda activate mvTCR 
pip install -r requirements.txt 
conda install nb_conda_kernels -y 

Windows

Please comment out torch in requirements.txt, i.e. write a # before torch. Then run the following commands to install all requirements except PyTorch:

conda create --name mvTCR python=3.8.8 -y
conda activate mvTCR
pip install -r requirements.txt 
conda install nb_conda_kernels -y

Then install PyTorch 1.8.0 for your CUDA version using the matching command listed here: https://pytorch.org/get-started/previous-versions/
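The manual edit of requirements.txt described above can also be scripted. The following is a minimal sketch, not part of this repository: the function name comment_out_requirement is hypothetical, and it assumes that torch appears on its own line in requirements.txt, optionally followed by a version specifier. Packages whose names merely start with "torch" (for example torchvision) are left untouched.

```python
import re


def comment_out_requirement(text: str, package: str = "torch") -> str:
    """Return `text` with every requirement line for `package` commented out.

    Hypothetical helper for illustration; it only matches the exact package
    name, so e.g. "torchvision" is not affected by package="torch".
    """
    # Match the package name only when it is not followed by another
    # character that could belong to a longer package name.
    pattern = re.compile(rf"^{re.escape(package)}(?![A-Za-z0-9_.\-])", re.IGNORECASE)
    lines = []
    for line in text.splitlines():
        if pattern.match(line.strip()):
            lines.append("# " + line)  # already-commented lines do not match
        else:
            lines.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(lines) + ("\n" if text.endswith("\n") else "")


# Usage (assumes requirements.txt is in the current working directory):
#   from pathlib import Path
#   req = Path("requirements.txt")
#   req.write_text(comment_out_requirement(req.read_text()))
```

Running the helper a second time leaves the file unchanged, because lines that already start with # are skipped.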

Preprocessing

The folder preprocessing contains Jupyter notebooks for preprocessing the single-cell data. The links to obtain the data are referenced in the corresponding publications and in our paper.

Experiments

The folder experiments contains the settings to train the mvTCR model on all datasets mentioned in the paper. Hyperparameter optimization (HPO) was typically run for 48 GPU-hours. After HPO, copy the best resulting model, as reported in the console output or the SQL database, to the folder specified in the corresponding evaluation file. Because of different runtimes on the compute clusters, different hardware components, and different CUDA versions, the resulting models might differ from the models used in the publication. We therefore also provide all trained models under https://doi.org/10.5281/zenodo.7215447
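Copying the selected model into place can be done by hand or with a few lines of Python. The sketch below is only an illustration: the function name copy_best_model and the example paths are hypothetical, since the actual source file is the one reported by your HPO run and the actual target folder is the one named in the evaluation files.

```python
import shutil
from pathlib import Path


def copy_best_model(source, target_dir) -> Path:
    """Copy a trained model file into `target_dir` and return the new path.

    Hypothetical helper: `source` is the best model reported by the HPO run,
    `target_dir` is the folder expected by the evaluation notebooks.
    """
    source = Path(source)
    target_dir = Path(target_dir)
    if not source.is_file():
        raise FileNotFoundError(f"Model file not found: {source}")
    # Create the target folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as the modification time.
    return Path(shutil.copy2(source, target_dir / source.name))


# Usage (both paths are placeholders, not paths from this repository):
#   copy_best_model("path/to/best_model_file", "path/to/evaluation_model_folder")
```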

Evaluation

The notebooks in the folder evaluation derive the shared embedding and therefore require the trained models and the processed data. Because these notebooks save the intermediate results used to create the figures, you need to run them before you can recreate the figures.
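Because the evaluation notebooks depend on outputs of the earlier steps, a quick check that the expected inputs exist can save a failed notebook run. This is a minimal sketch with a hypothetical helper, find_missing; the relative paths you pass in are your own, since the exact file locations are defined in the evaluation notebooks themselves.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing(root, required: Iterable[str]) -> List[str]:
    """Return the entries of `required` that do not exist below `root`.

    Hypothetical helper: `required` holds paths relative to `root`,
    for example the folders with the trained models and processed data.
    """
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


# Usage (the listed names are placeholders for your own model and data paths):
#   missing = find_missing(".", ["path/to/trained_models", "path/to/processed_data"])
#   if missing:
#       print("Missing inputs:", ", ".join(missing))
```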

Figures

All figures of the manuscript (except the concept figure 1a) can be recreated by running the notebooks in the figure folder. This requires running the evaluation first on the preprocessed data and the models downloaded from Zenodo. Note that, due to reproducibility issues in the underlying library, the UMAP visualisations might not be recreated exactly. However, they should qualitatively match the results shown in the paper.

Citation

If you use any of the above work, please cite:

@article {Drost2021.06.24.449733,
	author = {Drost, Felix and An, Yang and Dratva, Lisa M and Lindeboom, Rik GH and Haniffa, Muzlifah and Teichmann, Sarah A and Theis, Fabian and Lotfollahi, Mohammad and Schubert, Benjamin},
	title = {Integrating T-cell receptor and transcriptome for large-scale single-cell immune profiling analysis},
	elocation-id = {2021.06.24.449733},
	year = {2022},
	doi = {10.1101/2021.06.24.449733},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2022/10/25/2021.06.24.449733},
	eprint = {https://www.biorxiv.org/content/early/2022/10/25/2021.06.24.449733.full.pdf},
	journal = {bioRxiv}
}
