From 2716d9861e229c20dd8eaef34e978eaa4f202a48 Mon Sep 17 00:00:00 2001
From: jvfe
Date: Mon, 13 May 2024 09:46:35 -0300
Subject: [PATCH] docs: Add documentation for ParallelEvolCCM

Signed-off-by: jvfe
---
 docs/evolccm.md | 88 +++++++++++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml      |  1 +
 2 files changed, 89 insertions(+)
 create mode 100644 docs/evolccm.md

diff --git a/docs/evolccm.md b/docs/evolccm.md
new file mode 100644
index 0000000..a8d2104
--- /dev/null
+++ b/docs/evolccm.md
@@ -0,0 +1,88 @@
+# ParallelEvolCCM usage
+
+ParallelEvolCCM is a tool for identifying coordinated gains and losses of features.
+The method is described in detail in the following publication:
+
+- [The Community Coevolution Model with Application to the Study of Evolutionary Relationships between Genes Based on Phylogenetic Profiles](https://doi.org/10.1093/sysbio/syac052)
+
+If you use ParallelEvolCCM in your analysis, please cite the above publication.
+
+## ParallelEvolCCM inputs
+
+The ParallelEvolCCM tool requires two inputs:
+
+- A phylogenetic tree in Newick format
+- A presence/absence table in TSV format
+
+The presence/absence TSV must have a `genome_id` column whose genome names match the tip labels in the tree,
+with every other column marking a feature as absent (0) or present (1) in each genome. For example:
+
+```
+genome_id plasmid_AA155 plasmid_AA161
+ED010 0 0
+ED017 0 1
+ED040 0 0
+ED073 0 1
+ED075 1 1
+ED082 0 1
+ED142 0 1
+ED178 0 1
+ED180 0 0
+```
+
+## Using ParallelEvolCCM by itself
+
+ParallelEvolCCM is a command-line tool written in R.
+It is available as the [bin/ParallelEvolCCM.R](https://github.com/beiko-lab/arete/blob/master/bin/ParallelEvolCCM.R) script in the ARETE repository.
+
+To download the tool and make it executable, run:
+
+```bash
+wget https://raw.githubusercontent.com/beiko-lab/arete/master/bin/ParallelEvolCCM.R
+chmod +x ParallelEvolCCM.R
+```
+
+Then, ensure all EvolCCM dependencies are installed.
+You can install them by running the following command in your R console:
+
+```r
+install.packages(c('ape', 'dplyr', 'phytools', 'foreach', 'doParallel', 'gplots', 'remotes'))
+remotes::install_github('beiko-lab/evolCCM')
+```
+
+You can then run the tool like this:
+
+```bash
+./ParallelEvolCCM.R --intree tree.nwk --intable feature_table.tsv.gz --cores -1
+```
+
+- `--intree` specifies the phylogenetic tree in Newick format.
+- `--intable` specifies the feature table in compressed TSV format.
+- `--cores` specifies the number of cores to use. Use `-1` to use all available cores.
+
+To list additional parameters, run `./ParallelEvolCCM.R` with no arguments.
+
+## Using ParallelEvolCCM with ARETE
+
+ParallelEvolCCM is also available through the `evolccm` entry in ARETE,
+which makes it possible to run the tool with Docker or Singularity.
+
+To run ParallelEvolCCM with ARETE, use the following command:
+
+```bash
+nextflow run beiko-lab/ARETE \
+  -entry evolccm \
+  --core_gene_tree core_gene_alignment.tre \
+  --feature_profile feature_profile.tsv.gz \
+  -profile docker
+```
+
+The parameters are:
+
+- `--core_gene_tree` - The reference tree, built from a core genome alignment,
+  such as the one generated by the `phylo` entry in ARETE.
+- `--feature_profile` - A presence/absence TSV matrix of features
+  in genomes, such as the one created by ARETE's `annotation` entry.
+- `-profile` - The profile to use; here, `docker`.
+
+For more information, check the [full ARETE documentation](https://beiko-lab.github.io/arete/).
diff --git a/mkdocs.yml b/mkdocs.yml
index 437a792..2a01a92 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -14,6 +14,7 @@ nav:
   - Dataset Size: resource_profiles.md
   - Parameters: params.md
   - Subsampling: subsampling.md
+  - ParallelEvolCCM: evolccm.md
 repo_url: https://github.com/beiko-lab/arete
 theme:
   name: "readthedocs"
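A minimal sketch for sanity-checking a presence/absence table before passing it to `--intable` or `--feature_profile`. The `check_feature_table` shell function is hypothetical and is not shipped with ARETE or ParallelEvolCCM. It assumes a gzip-compressed, tab-separated file with `genome_id` as the first column (as in the example table) and checks only that layout: the header, a consistent field count, `0`/`1` feature cells, and unique genome IDs. It does not compare the IDs against the tree's tip labels.

```bash
# Hypothetical helper, not part of ARETE or ParallelEvolCCM.
# Assumes a gzip-compressed, tab-separated table whose first column is genome_id.
check_feature_table() {
    gzip -cd "$1" | awk -F '\t' '
    NR == 1 {
        # The header must start with the genome_id column.
        if ($1 != "genome_id") { print "first column must be genome_id, got: " $1; bad = 1 }
        ncol = NF
        next
    }
    {
        # Each genome row must have as many fields as the header.
        if (NF != ncol) { print "line " NR ": expected " ncol " fields, got " NF; bad = 1 }
        # Each feature cell must be 0 (absent) or 1 (present).
        for (i = 2; i <= NF; i++)
            if ($i != "0" && $i != "1") { print "line " NR ", column " i ": not 0/1: " $i; bad = 1 }
        # Genome IDs must be unique.
        if (seen[$1]++) { print "line " NR ": duplicate genome_id " $1; bad = 1 }
    }
    END {
        if (NR < 2) { print "no genome rows found"; bad = 1 }
        # A non-zero exit status makes the function (the pipeline) fail.
        if (bad) exit 1
        print "ok: " (NR - 1) " genomes, " (ncol - 1) " features"
    }'
}
```

For example, `check_feature_table feature_table.tsv.gz && ./ParallelEvolCCM.R --intree tree.nwk --intable feature_table.tsv.gz --cores -1` starts the analysis only if the table passes these checks.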