Integrate Chronos for screen data #112

Open
j-andrews7 opened this issue Feb 2, 2024 · 2 comments
Labels
enhancement Improvement for existing functionality

Comments

@j-andrews7

Description of feature

Chronos would be an enticing addition to this pipeline, given that the ever-expanding DepMap project also uses it.

@j-andrews7 j-andrews7 added the enhancement Improvement for existing functionality label Feb 2, 2024
@LaurenceKuhl
Contributor

Hi @j-andrews7, yes, thank you, it was in the backlog in my mind!
Would you have a small script where you've used it yourself? Since I've never used it, that would help me start working out what the inputs, variables, etc. are.
Thanks!
Laurence

@j-andrews7
Author

Honestly, their GitHub README lays it out more cleanly than I could, but I'll summarize here. In short, it really only needs three pandas DataFrames:

  • readcounts: A matrix of raw readcounts, where the columns are targeting sgRNAs, the rows are pDNA sequencing samples or replicate samples, and the entries are the number of reads of the given sgRNA in the given sample.
  • sequence_map: A table with at least four columns, sequence_ID, cell_line_name, pDNA_batch, and days, mapping sequencing samples to cell lines and pDNA measurements. sequence_ID should match the row names of the raw readcounts. days is the number of days between infection and when the sample was collected, and should be an integer or float. It is ignored for pDNA samples. cell_line_name MUST be "pDNA" for pDNA samples. If, instead of pDNA, you are sequencing your cells at a very early time point to get initial library abundance, treat these as pDNA samples. If you don't have either, Chronos may not be the right algorithm for your experiment. pDNA_batch is needed when your experiment combines samples that have different pDNA references (within the same library).
  • guide_gene_map: A table with at least two columns, sgrna and gene, mapping the sgRNAs to genes.
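If the counting step produces a table with one row per sgRNA and one column per sample (as MAGeCK-style count tables do), the orientation has to be flipped to match the readcounts description above. A minimal sketch, assuming that layout; the column names `sgRNA` and `Gene`, the sample names, and the counts are made up for illustration and are not prescribed by Chronos or this pipeline:

```python
import io

import pandas as pd

# Hypothetical count table: one row per sgRNA, one column per sample.
count_table = io.StringIO(
    "sgRNA\tGene\tpDNA_rep1\tcellA_rep1\tcellA_rep2\n"
    "sg1\tGENE1\t100\t80\t90\n"
    "sg2\tGENE1\t120\t60\t70\n"
    "sg3\tGENE2\t90\t95\t85\n"
)
counts = pd.read_csv(count_table, sep="\t")

# guide_gene_map: at least the two columns `sgrna` and `gene`.
guide_gene_map = counts[["sgRNA", "Gene"]].rename(
    columns={"sgRNA": "sgrna", "Gene": "gene"}
)

# readcounts: rows are sequencing samples, columns are sgRNAs,
# so drop the gene annotation and transpose the sgRNA-by-sample table.
readcounts = counts.drop(columns="Gene").set_index("sgRNA").T
readcounts.index.name = "sequence_ID"
readcounts.columns.name = None

print(readcounts)
print(guide_gene_map)
```

The row labels of `readcounts` (here the sample column names) are what `sequence_ID` in the sequence_map would need to match.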

It is also recommended to provide a list of negative_control_sgrnas.
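Before handing these to Chronos, it may help to check the constraints listed above with plain pandas. The following is only a sketch: `check_chronos_inputs` is a hypothetical helper written for this comment (it is not part of the chronos package), and the toy tables are invented.

```python
import pandas as pd


def check_chronos_inputs(readcounts, sequence_map, guide_gene_map,
                         negative_control_sgrnas=()):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []

    missing = {"sequence_ID", "cell_line_name", "pDNA_batch", "days"} - set(sequence_map.columns)
    if missing:
        problems.append(f"sequence_map is missing columns: {sorted(missing)}")
    missing = {"sgrna", "gene"} - set(guide_gene_map.columns)
    if missing:
        problems.append(f"guide_gene_map is missing columns: {sorted(missing)}")
    if problems:
        return problems

    # sequence_ID should match the row names of readcounts.
    unknown = set(readcounts.index) - set(sequence_map["sequence_ID"])
    if unknown:
        problems.append(f"readcounts rows not in sequence_map: {sorted(unknown)}")

    # Every sgRNA column should be mapped to a gene.
    unmapped = set(readcounts.columns) - set(guide_gene_map["sgrna"])
    if unmapped:
        problems.append(f"sgRNAs not in guide_gene_map: {sorted(unmapped)}")

    # pDNA (or early time point) samples must be labelled exactly "pDNA".
    is_pdna = sequence_map["cell_line_name"] == "pDNA"
    if not is_pdna.any():
        problems.append('no sample has cell_line_name == "pDNA"')

    # days must be numeric for every non-pDNA sample.
    days = pd.to_numeric(sequence_map.loc[~is_pdna, "days"], errors="coerce")
    if days.isna().any():
        problems.append("days is missing or non-numeric for some samples")

    absent = set(negative_control_sgrnas) - set(readcounts.columns)
    if absent:
        problems.append(f"negative controls not in readcounts: {sorted(absent)}")
    return problems


# Toy inputs, invented for illustration.
readcounts = pd.DataFrame(
    {"sg1": [100, 80, 90], "sg2": [120, 60, 70], "sg3": [90, 95, 85]},
    index=["pDNA_rep1", "cellA_rep1", "cellA_rep2"],
)
sequence_map = pd.DataFrame({
    "sequence_ID": ["pDNA_rep1", "cellA_rep1", "cellA_rep2"],
    "cell_line_name": ["pDNA", "cellA", "cellA"],
    "pDNA_batch": ["batch1", "batch1", "batch1"],
    "days": [0, 21, 21],
})
guide_gene_map = pd.DataFrame(
    {"sgrna": ["sg1", "sg2", "sg3"], "gene": ["GENE1", "GENE1", "GENE2"]}
)
negative_control_sgrnas = ["sg3"]

problems = check_chronos_inputs(
    readcounts, sequence_map, guide_gene_map, negative_control_sgrnas
)
print(problems)
```

Returning a list of messages rather than raising on the first failure means a pipeline step could report every issue with a samplesheet in one pass.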

Then running is just:

import chronos
# This removes clonal outgrowths that are seemingly unrelated to the perturbation
chronos.nan_outgrowths(readcounts, sequence_map, guide_gene_map)

model = chronos.Chronos(
    readcounts={'my_library': readcounts},
    sequence_map={'my_library': sequence_map},
    guide_gene_map={'my_library': guide_gene_map},
    negative_control_sgrnas={'my_library': negative_control_sgrnas}
)

model.train()

model.save("my_save_directory")

# Actual outputs people may be interested in.
gene_effect = model.gene_effect
guide_efficacy = model.guide_efficacy

They have a vignette with a more comprehensive example.
