added makespeciesfasta to readme
cdoorenweerd committed May 19, 2020
1 parent 6621943 commit 076e18a
Showing 1 changed file (README.md) with 6 additions and 2 deletions.
@@ -37,7 +37,9 @@ AACTGTCA and AACTNY-A
Are considered 'identical' haplotypes (or you might say 'compatible' or 'non-unique').
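
As an illustration of what 'compatible' means here, the following is a minimal sketch and not the code used by the scripts. It assumes the two sequences are aligned to equal length, that IUPAC ambiguity codes match any base they can stand for, and that gaps (`-`) and missing data (`?`) match anything; the names `IUPAC`, `base_set` and `compatible` are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumption: gaps and missing data are treated as wildcards.
WILDCARDS = "-?"


def base_set(symbol):
    """Return the set of bases a single alignment symbol may represent."""
    symbol = symbol.upper()
    if symbol in WILDCARDS:
        return set("ACGT")
    return set(IUPAC[symbol])  # raises KeyError on an unknown symbol


def compatible(seq_a, seq_b):
    """True if every aligned position shares at least one possible base."""
    if len(seq_a) != len(seq_b):
        return False  # assumption: input sequences are aligned
    return all(base_set(a) & base_set(b) for a, b in zip(seq_a, seq_b))


print(compatible("AACTGTCA", "AACTNY-A"))  # True
print(compatible("AACTGTCA", "AACTGACA"))  # False
```

Under these assumptions the example pair above compares as compatible, because `N` covers `G`, `Y` covers `T`, and the gap is treated as missing data.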


The scripts compare all sequences in a pairwise fashion, so the computational time increases exponentially with more sequences. However, it should be able to handle ~5,000 sequences in less than 30 minutes on most desktop machines, for larger datasets a cluster is advisable.
### Computational demand

The scripts compare all sequences pairwise, so computation time grows quadratically with the number of sequences. They should still handle ~5,000 sequences in a couple of hours on most desktop machines; for larger datasets a cluster is advisable.
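
To make the cost of an all-against-all comparison concrete: n sequences give n(n-1)/2 unordered pairs, so doubling the dataset roughly quadruples the work. The sketch below only counts pairs; `pair_count` is a hypothetical helper, and the use of `itertools.combinations` is an illustration of pairwise enumeration, not a statement about how the scripts iterate.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (1000, 5000, 10000):
    print(n, pair_count(n))
# 1000 499500
# 5000 12497500
# 10000 49995000

# Enumerating the pairs themselves for a small, made-up set of sequence names.
names = ["seq1", "seq2", "seq3", "seq4"]
pairs = list(combinations(names, 2))
print(len(pairs))  # 6
```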


### Fasta input format
@@ -62,7 +64,7 @@ NACTCTCTACTTTATTTTCGGAATTTGATCTGGAATATTAGGAACATCTTTAAGTATATTAATTCGAGCTGAATTAGGTA

### Script functions

For each script, run python script.py -h for usage instructions.
For each script, run `python script.py -h` for usage instructions.


- `hapcounter.py` counts the total number of sequences and unique haplotypes per species and outputs to csv table.
@@ -73,6 +75,8 @@ For each script, run python script.py -h for usage instructions.

- the Jupyter Notebook `graphs.ipynb` contains scripts to interactively generate ('barcode gap') violin plots from the output of `pdistancer.py` and export the graphs for publication.

- `makespeciesfastas.py` writes a separate fasta for each species to the folder /species_fastas.

- `chao1.py` uses all species' fastas in /species_fastas to run the `SpideR_chao1.R` script, which calculates Chao1 estimates of the total haplotype diversity, and returns a csv. Note that the function assumes a large number of specimens have been sampled and that duplicate haplotypes have not been removed.

- `SpideR_haploaccum.R` R script that plots haplotype accumulation curves, based on the SpideR package (https://cran.r-project.org/web/packages/spider/spider.pdf)