Skip to content

Purify pre-generated bins (e.g. using Maxbin) based on z-score differences in contig coverage using multiple sequencing datasets

License

Notifications You must be signed in to change notification settings

jvollme/bin_polisher

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 

Repository files navigation

bin_polisher.py

Purify pre-generated bins (e.g. using Maxbin) based on z-score differences in contig coverage using multiple sequencing datasets

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show version info and quit

Required input arguments:
  -if INPUT_FASTA_LIST [INPUT_FASTA_LIST ...], --input_fasta INPUT_FASTA_LIST [INPUT_FASTA_LIST ...]
                        Input fasta file(s) of a (single) bin (may be compressed)
  -ic INPUT_COVERAGE_LIST [INPUT_COVERAGE_LIST ...], --input_coverage INPUT_COVERAGE_LIST [INPUT_COVERAGE_LIST ...]
                        seperate abundance files for each dataset, listing the
                        respective abundance/coverage of each contig in a
                        seperate line (Format: <contig-id>\TAB<coverage>), May
                        include contigs not present in bin-fasta (such contigs
                        wil be ignored)

Filtering options:
  -pr {high,low,both,none}, --pre-remove {high,low,both,none}
                        remove extreme values based on upper (99%) or lower
                        (1%) percentile, or both. default = none
  -mi MAX_ITERATIONS, --max_iterations MAX_ITERATIONS
                        maximum number of iterations for recalculating and
                        comparing coverage average, standard deviation and
                        z-score values and removing difference-outliers.
                        Default = 50
  -uzc UPPER_ZSCORE_CUTOFF, --upper_zscore_cutoff UPPER_ZSCORE_CUTOFF
                        Start iteratively removing contigs with z-score
                        differences above cutoff starting at the specified
                        value (cutoff will be decreased by 1 when no z-scores
                        above cutoff are encountered, until the lower zscore
                        cutoff is reached). Default = 4
  -lzc LOWER_ZSCORE_CUTOFF, --lower_zscore_cutoff LOWER_ZSCORE_CUTOFF
                        Stop iteratively reducing z-score difference cutoff if
                        it falls below this value. Default = 2

Output options:
  --intermediate        output results of all intermediate iterations. Default
                        = only output results of last iteration
  --out_bad             create seperate output files for rejected reads also.
                        default = False
  -op OUT_PREFIX, --out_prefix OUT_PREFIX
                        prefix for output file(s). default = "bin_polisher"

example usage: bin_polisher.py -if <bin1.fasta.gz> -ic <bin1_sample1.coverage> [<bin1_sample2.coverage> ...]

This tool was originally created for lab-internal use and not with user friendliness or adaptiveness in mind. It is published here, to enable reproducability of any of our research projects that may have been using this script (e.g. Vollmers et al, 2017).

More flexible versions may be uploaded whenever I have time for it. For any problems or questions about the usage, please create an Issue on this repository.

About

Purify pre-generated bins (e.g. using Maxbin) based on z-score differences in contig coverage using multiple sequencing datasets

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages