A ratiometric method for detecting somatic selection in cancer

This code takes somatic mutation data from the International Cancer Genome Consortium (ICGC) and, for each gene, calculates the ratio of nonsynonymous to synonymous mutations, normalised for codon composition and other factors. Positively selected genes (cancer driver genes) are expected to have a large ratio, while negatively selected genes (essential genes or non-oncogene addiction genes) are expected to have a small ratio. For a more detailed description, refer to the manuscript.
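As a rough illustration of the idea only (the manuscript's exact normalisation is not reproduced here), the shell sketch below computes a per-gene ratio in which each mutation class is first divided by the number of possible sites of that class, a simple stand-in for codon-composition normalisation. The function name `ns_ratio` and the tab-separated input layout (gene, observed nonsynonymous count, observed synonymous count, nonsynonymous sites, synonymous sites) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the manuscript's method: per-gene ratio of the
# nonsynonymous rate to the synonymous rate, where each rate is an observed
# mutation count divided by the number of possible sites of that class.
#
# Input (stdin, tab-separated, no header):
#   gene  n_obs  s_obs  n_sites  s_sites
# Output (stdout, tab-separated):
#   gene  ratio   ("NA" when the ratio is undefined)
ns_ratio() {
  awk -F'\t' -v OFS='\t' '
    {
      gene = $1
      n_obs = $2 + 0; s_obs = $3 + 0
      n_sites = $4 + 0; s_sites = $5 + 0
      # Undefined without synonymous mutations or without any sites
      if (s_obs == 0 || n_sites == 0 || s_sites == 0) {
        print gene, "NA"
        next
      }
      # (nonsynonymous rate) / (synonymous rate)
      printf "%s\t%.3f\n", gene, (n_obs / n_sites) / (s_obs / s_sites)
    }'
}
```

For example, `printf 'GENE_A\t20\t4\t600\t200\nGENE_B\t2\t10\t600\t200\n' | ns_ratio` prints `GENE_A	1.667` and `GENE_B	0.067`. In this toy input (made-up counts), GENE_A has a ratio above 1 and GENE_B one well below 1, the pattern the text associates with positive and negative selection respectively.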

By Daniel Wells and Benjamin Schuster-Böckler at the Ludwig Institute for Cancer Research, University of Oxford

Installation

First, make sure all dependencies (listed below) are installed and loaded. Then download the code and the ICGC project list:

curl -L https://github.com/daniel-wells/somatic-selection/archive/master.zip -o somatic-selection.zip
unzip somatic-selection.zip
mv somatic-selection-master somatic-selection
# alternatively: git clone --depth 1 https://github.com/daniel-wells/somatic-selection.git
cd somatic-selection
Rscript code/download_ICGC_list.R

Due to licensing constraints and poor programmatic data accessibility, the COSMIC Cancer Gene Census must be downloaded manually before running the method. The download requires registration and can be completed as follows:

sftp "your_email@example.com"@sftp-cancer.sanger.ac.uk
# replace the address above with the email you registered with COSMIC, then enter your password when prompted
get /files/grch38/cosmic/v77/cancer_gene_census.csv cancer_gene_census.csv
quit

"cancer_gene_census.csv" should now be in the same directory as the Makefile.
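Because the census file is fetched by hand, a quick check before running `make` can save a failed run. The sketch below is a hypothetical helper (`check_census` is not part of the repository); it only verifies that the file exists next to the Makefile and is non-empty, and does not validate its columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the manually downloaded COSMIC Cancer Gene
# Census file sits next to the Makefile and is not empty.
# Usage: check_census [project_dir]   (defaults to the current directory)
check_census() {
  local dir="${1:-.}"
  local census="$dir/cancer_gene_census.csv"
  if [ ! -f "$dir/Makefile" ]; then
    echo "No Makefile in $dir; run this from the somatic-selection directory" >&2
    return 1
  fi
  # -s: file exists and has a size greater than zero
  if [ ! -s "$census" ]; then
    echo "Missing or empty: $census" >&2
    return 1
  fi
  echo "Found $census"
}
```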

The file '/mnt/lustre/users/bschuster/TCGA/Coverage/mappability_100bp_windows_exons.bed.gz' contains 100 bp sliding-window mappability values for all exonic positions. It was created from wgEncodeCrgMapabilityAlign50mer.bigWig using bedtools and then converted using bigWigToBedGraph, resulting in a 451 MB file with the following format:

chr1    11769   0.139
chr1    11770   0.139
chr1    11771   0.139

For now, the script simply copies this file from the local path above into the project directory, so users without access to that path will need to supply their own copy.
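To get a feel for the mappability file, the hypothetical helper below counts how many positions fall under a chosen cutoff. It assumes three whitespace-separated columns (chromosome, position, mappability) with no header, as in the example lines above; the 0.5 cutoff in the usage comment is an arbitrary illustration, not a threshold used by the method.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many positions in the mappability file have
# a value below a chosen cutoff. Assumes columns: chromosome, position, value.
# Usage: gzip -dc mappability_100bp_windows_exons.bed.gz | low_mappability 0.5
low_mappability() {
  local cutoff="$1"
  awk -v cutoff="$cutoff" '
    { total++; if ($3 + 0 < cutoff + 0) low++ }
    END {
      if (total == 0) { print "no positions read"; exit 1 }
      printf "%d of %d positions below %s (%.1f%%)\n", low, total, cutoff, 100 * low / total
    }'
}
```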

Then proceed to run the method (~3 hours):

make

Results are written to the 'results' directory, which is created automatically. If anything goes wrong, check the files in the 'logs' directory for errors.
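When a run fails, it can help to narrow down which log to read first. The sketch below is a hypothetical helper (`failed_logs` is not part of the repository) that assumes the logs are plain text; "Execution halted" is what Rscript prints when an R script stops with an error, while the looser "Error" pattern may also flag harmless lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list log files containing typical failure messages.
# Usage: failed_logs [log_dir]   (defaults to ./logs)
failed_logs() {
  local dir="${1:-logs}"
  # -r: search the directory recursively, -l: print matching file names only,
  # -E: extended regular expression
  grep -rlE 'Error|Execution halted' "$dir"
}
```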

Data Flow Diagram

Note that the randomisation control and the expression (RNA-seq) annotations do not run automatically.

Dependencies

GNU Make 3.81
R 3.2.2
- data.table 1.9.6
- jsonlite (converting ICGC project list)
- gdata (Excel spreadsheet conversion)
- ggplot2 2.0.0 (plotting graphs)
- ggrepel 0.4 (labelling graphs)
- Bioconductor: SomaticSignatures (calculating mutation profiles)
- Bioconductor: BSgenome.Hsapiens.UCSC.hg19 (calculating mutation profiles)
bcftools 1.0 (for processing ExAC)

Optional (for creating data flow diagram):

Python 2.6.6 (r266:84292)
graphviz 2.38.0 (20140413.2041) (dot for creating data flow diagram)
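The versions above are the ones the code was developed with. The sketch below compares installed tools against them, under the assumption (not stated by the authors) that newer versions are acceptable, so the listed versions are treated as minimums. `version_ge` and `check_version` are hypothetical helpers; `version_ge` relies on `sort -V`, which GNU coreutils provides.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: succeed if version $1 is at least version $2.
version_ge() {
  # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage: check_version <name> <installed_version> <minimum_version>
# An empty installed version is reported as "not found".
check_version() {
  if [ -z "$2" ]; then
    echo "$1: not found"
  elif version_ge "$2" "$3"; then
    echo "$1 $2: ok (>= $3)"
  else
    echo "$1 $2: older than $3"
  fi
}

# Example (first lines look like "GNU Make 4.3", "R version 4.3.1 ...",
# "bcftools 1.17"):
#   check_version make "$(make --version 2>/dev/null | head -n 1 | awk '{print $3}')" 3.81
#   check_version R "$(R --version 2>/dev/null | head -n 1 | awk '{print $3}')" 3.2.2
#   check_version bcftools "$(bcftools --version 2>/dev/null | head -n 1 | awk '{print $2}')" 1.0
```

R package versions (data.table, ggplot2 and so on) are not covered by this sketch and would need to be checked from within R.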

Hardware Requirements

This code was designed for a machine with at least 64 GB of RAM and 30 GB of available disk space; approximately 9 GB of raw data will be downloaded during the analysis.
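Before starting a multi-hour run, the requirements above can be checked from the shell. The helpers below are a hypothetical sketch: `has_free_disk_gb` uses POSIX `df -Pk`, while `has_total_ram_gb` reads `/proc/meminfo` and therefore works on Linux only. Because `MemTotal` excludes memory reserved by the kernel, it reads slightly below the installed amount, so the RAM check allows an assumed 5% margin.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking hardware requirements.

# Succeed if the filesystem holding $2 (default: .) has at least $1 GB free.
has_free_disk_gb() {
  local need_gb="$1" dir="${2:-.}" avail_kb
  # -P: POSIX one-line-per-filesystem output; -k: sizes in 1024-byte blocks
  avail_kb="$(df -Pk "$dir" | awk 'NR == 2 {print $4}')"
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Succeed if total RAM is at least 95% of $1 GB (Linux only).
has_total_ram_gb() {
  local need_gb="$1" total_kb
  total_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  # Compare total_kb >= 0.95 * need_gb GiB using integer arithmetic
  [ $(( total_kb * 100 )) -ge $(( need_gb * 1024 * 1024 * 95 )) ]
}

# Example:
#   has_free_disk_gb 30 . || echo "less than 30 GB free here"
#   has_total_ram_gb 64   || echo "less than 64 GB of RAM"
```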

Licence

This software is licenced under the MIT licence, see LICENSE.md for further details.
