Bug fixes and improvements
German Demidov authored and German Demidov committed Jan 20, 2022
1 parent 285fdba commit ae2382e
Showing 9 changed files with 2,833 additions and 3,257 deletions.
512 changes: 0 additions & 512 deletions .Rhistory

This file was deleted.

10 changes: 7 additions & 3 deletions README.md
@@ -5,7 +5,7 @@ A tool for large-scale CNV and CNA detection.

Authors: G. Demidov, S. Ossowski.

**Thanks to the contributors!** (will prepare a full list later)
**Thanks to the contributors!**

Any issues should be reported to: *german dot demidov at medizin dot uni-tuebingen dot de*. The presentation (around 60 slides with a short description of ClinCNV and its results) is available [here](https://github.com/imgag/ClinCNV/tree/master/doc/ClinCNV_thesis_presentation.pdf).

@@ -17,13 +17,13 @@ ClinCNV is a part of [MegSAP](https://github.com/imgag/megSAP) pipeline.

ClinCNV detects CNVs in germline and somatic context in NGS data (targeted and whole-genome). We work in cohorts, so it makes sense to try `ClinCNV` if you have more than 10 samples (we recommend 40, since we estimate variances from the data). By "cohort" we mean samples sequenced with the same enrichment kit at approximately the same depth (i.e. 1x WGS and 30x WGS are better analysed in separate runs of ClinCNV). Ideally, your samples were also sequenced at the same sequencing facility.

Currently we work with human genomes `hg19` and `hg38` only. For `hg38` you need to replace `cytobands.txt` with the file `cytobandsHG38.txt`. For the mouse genome or any other diploid organism you have to replace *cytobands.txt* with the corresponding file. ClinCNV can work with small panels (hundreds of regions), but GC-correction cannot be performed accurately for samples sequenced with such panels.
Currently we work with human genomes `hg19` and `hg38` only. **For `hg38` you need to replace `cytobands.txt` with the file `cytobandsHG38.txt`.** For the mouse genome or any other diploid organism you have to replace *cytobands.txt* with the corresponding file. ClinCNV can work with small panels (hundreds of regions), but GC-correction cannot be performed accurately for samples sequenced with such panels.

NOTE: The folder `PCAWG` was used for CNV detection in the Pan-Cancer Analysis of Whole Genomes cohort and is a *research-only* version. It is kept here for historical reasons. Feel free to remove it.

**bioRxiv** for somatic CNA analysis: https://www.biorxiv.org/content/10.1101/837971v1 (calling of copy-number alterations in normal-tumor pairs).

For **germline** CNVs: my PhD thesis is available at [google drive](https://drive.google.com/file/d/1BvVqjCy8ACixej7Ul3j4PINwY-iLBGv_/view?fbclid=IwAR2UyRSSZi8HlziYpeqSmylotGJYwLjRblPA-BqY-HP2e8Pj4XSSPS5vpNY) and is citable as described [here](https://www.tdx.cat/handle/10803/668208) with the permanent link `http://hdl.handle.net/10803/668208`.
For **germline** CNVs: my PhD thesis is citable as described [here](https://www.tdx.cat/handle/10803/668208) with the permanent link `http://hdl.handle.net/10803/668208`. Please cite my thesis; sincere apologies for not publishing the tool, I did what I could.

## Pre-requisites

@@ -40,6 +40,8 @@ install.packages("mclust")
install.packages("R.utils")
install.packages("RColorBrewer")
install.packages("party")
install.packages("dbscan")
install.packages("umap")
```

ClinCNV works faster with the `Rcpp` package installed; however, if you experience any problems with this package, you can run ClinCNV without it.
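
As a quick pre-flight check, the sketch below reports which of the packages named in this hunk are not yet installed. It is only an illustration: `missing_packages` is a hypothetical helper (not part of ClinCNV), and `required_pkgs` covers only the package names visible in this diff, not the full list from the README.

```r
# Packages named in the install snippet visible in this diff
# (the complete list in the README is longer and is not reproduced here).
required_pkgs <- c("mclust", "R.utils", "RColorBrewer", "party", "dbscan", "umap")

# Hypothetical helper: returns the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!(pkgs %in% installed)]
}

missing <- missing_packages(required_pkgs)
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
  # install.packages(missing)  # uncomment to install them from CRAN
} else {
  message("All listed packages are installed.")
}

# Rcpp is optional: ClinCNV runs without it, only slower.
has_rcpp <- "Rcpp" %in% rownames(installed.packages())
```

The `has_rcpp` line mirrors the `Rcpp_global` test that `clinCNV.R` itself performs before loading `Rcpp`.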
Expand Down Expand Up @@ -240,6 +242,8 @@ samtools bedcov $bedFilePath -Q 3 $bamPath > $sampleName".cov"

### How to calculate BAF-files

If you have a VCF file, you can use the [BAF extractor](https://github.com/imgag/ClinCNV/blob/master/helper_scripts/baf_extractor.py) script (thanks to Timofei for refactoring). I cannot guarantee that it will work with your VCF (it needs the VCF to contain the required info); let me know if it does not.

Using *ngs-bits*:

```
18 changes: 15 additions & 3 deletions clinCNV.R
100644 → 100755
@@ -1,10 +1,10 @@
#!/usr/bin/env Rscript
set.seed(100)
options(warn=-1)
clincnvVersion = paste0("ClinCNV version: v1.17.0")
clincnvVersion = paste0("ClinCNV version: v1.17.2")

## CHECK R VERSION
if (!(as.numeric(version$major) >= 3 & as.numeric(version$minor) > 2.0)) {
if (!( (as.numeric(version$major) >= 3 & as.numeric(version$minor) > 2.0) | as.numeric(version$major) >= 4) ) {
print("Your R version is too old. We can not guarantee stable work.")
print(version)
}
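
The version check in this hunk compares `version$major` and `version$minor` separately, which is why it needed a second clause for R 4.x. A possible alternative, shown here only as a sketch, is to compare whole version objects. `r_version_ok` is a hypothetical helper that is not part of ClinCNV, and the threshold (strictly newer than 3.2.0) is my reading of the condition above, so treat it as an assumption.

```r
# Hypothetical helper, not part of ClinCNV: TRUE when `v` is newer than 3.2.0.
r_version_ok <- function(v = getRversion()) {
  # Version objects compare component by component, so "4.0.0" and "3.10.0"
  # are both newer than "3.2.0" without handling major and minor separately.
  as.package_version(v) > as.package_version("3.2.0")
}

if (!r_version_ok()) {
  print("Your R version is too old. We can not guarantee stable work.")
  print(version)
}
```

Comparing version objects avoids the case the commit had to patch, where a higher major version with a low minor version failed the test.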
@@ -18,6 +18,9 @@ library(foreach)
library(doParallel)
library(mclust)
library(R.utils)
library(umap)
library(dbscan)


Rcpp_global = "Rcpp" %in% rownames(installed.packages())
if (Rcpp_global) {library("Rcpp")}
Expand Down Expand Up @@ -460,6 +463,15 @@ left_borders <- lstOfChromBorders[[1]]
right_borders <- lstOfChromBorders[[2]]
ends_of_chroms <- lstOfChromBorders[[3]]


# check if any targets in BED are out of cytobands
for (chrom in unique(bedFile[,1])) {
if (ends_of_chroms[[chrom]] < max(bedFile[bedFile[,1] == chrom,3])) {
print("Coordinates in BED file are outside of the cytobands! Please check if your cytobands file matches your reference genome version!")
quit()
}
}

startX = NA
if (opt$par != "NO" & (framework == "germline" | frameworkOff == "offtargetGermline")) {
modifiedListOfChromosomesWithPAR = addParalogousRegions(left_borders, right_borders, ends_of_chroms)
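
The BED-versus-cytobands check added in this hunk can be tried in isolation on toy data. The sketch below is an illustration under assumptions: the values are made up, `ends_of_chroms` is assumed to be a named list (its real structure is not visible in this diff), and `chroms_out_of_bounds` is a hypothetical helper that collects the offending chromosomes instead of calling `quit()`.

```r
# Toy inputs; the names mirror the diff, the values are invented.
ends_of_chroms <- list(chr1 = 1000, chr2 = 800)
bedFile <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(100, 500, 700),
  end = c(200, 600, 900),  # the chr2 target ends past the chr2 boundary of 800
  stringsAsFactors = FALSE
)

# Hypothetical helper: chromosomes whose largest BED end exceeds the
# chromosome boundary, or that have no boundary at all.
chroms_out_of_bounds <- function(bed, chrom_ends) {
  chroms <- unique(as.character(bed[, 1]))
  bad <- vapply(chroms, function(chrom) {
    chrom_end <- chrom_ends[[chrom]]
    is.null(chrom_end) || chrom_end < max(bed[bed[, 1] == chrom, 3])
  }, logical(1))
  chroms[bad]
}

offending <- chroms_out_of_bounds(bedFile, ends_of_chroms)
if (length(offending) > 0) {
  message("Targets outside cytobands on: ", paste(offending, collapse = ", "))
}
```

Reporting the chromosome names makes it easier to spot a mismatched reference build, for example an `hg38` BED file used with the `hg19` `cytobands.txt`.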
Expand Down Expand Up @@ -627,7 +639,7 @@ if (length(samplesToFilterOut) > 0) {

print(paste("We start to cluster your data (you will find a plot if clustering is possible in your output directory)", opt$out, Sys.time()))
if (is.null(opt$clusterProvided)) {
clusteringList <- returnClustering(as.numeric(opt$minimumNumOfElemsInCluster))
clusteringList <- returnClustering2(as.numeric(opt$minimumNumOfElemsInCluster))
clustering = clusteringList[[1]]
outliersByClusteringCohort = clusteringList[[2]]
} else {
2 changes: 2 additions & 0 deletions doc/install.md
100644 → 100755
@@ -13,6 +13,8 @@ install.packages("mclust")
install.packages("R.utils")
install.packages("RColorBrewer")
install.packages("party")
install.packages("dbscan")
install.packages("umap")
```

ClinCNV works faster with the `Rcpp` package installed; however, if you experience any problems with this package, you can run ClinCNV without it.