Taxonomic Workflows
Hannes Hauswedell edited this page Nov 21, 2016 · 22 revisions
- This article refers to the lambda-next branch and releases >= 1.9.2
If you are using LAMBDA in taxonomic workflows, you might want to make use of some of the following features:
First, you need to include taxonomic information in your index:
- Make sure your subject sequences contain accession numbers; GI numbers are not supported. The following types of accession numbers are automatically detected and extracted from FASTA/FASTQ headers:
* UniProt
* NCBI nucl, NCBI prot, NCBI wgs and NCBI mga
* RefSeq (not yet supported)
- Download a mapping file from the NCBI and make sure it matches your subject sequences (e.g. prot.accession2taxid for protein databases, or one of the nucl_*.accession2taxid files for nucleotide databases).
- Rebuild your index, but add
--acc-tax-map /path/to/file.accession2taxid[.gz]
(you don't have to unzip the file).
* Building the new index will take longer, but it increases the index's size by only a few MB.
* If LAMBDA fails to assign most of your sequences to taxa, it will warn you!
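To give a rough picture of what --acc-tax-map enables, here is a minimal Python sketch of the two steps: pulling an accession out of a FASTA header and looking it up in an accession2taxid table. This is not LAMBDA's implementation: the regular expressions and all function names are hypothetical simplifications, and the sketch assumes the NCBI table layout of a header line followed by tab-separated columns accession, accession.version, taxid and gi.

```python
import gzip
import io
import re
from typing import Dict, Iterable, Optional, Set, TextIO

# Hypothetical, simplified patterns; LAMBDA's real detection rules may differ.
# UniProt-style headers, e.g. ">sp|P69905|HBA_HUMAN ..."
UNIPROT_RE = re.compile(r"^(?:sp|tr)\|([A-Z0-9]+)\|")
# NCBI-style accessions with an optional version, e.g. ">NC_000913.3 ..."
NCBI_RE = re.compile(r"^([A-Z]{1,6}_?[0-9]+)(?:\.[0-9]+)?\b")


def extract_accession(header: str) -> Optional[str]:
    """Return the accession (without version) from a FASTA header, or None."""
    text = header.lstrip(">")
    for pattern in (UNIPROT_RE, NCBI_RE):
        match = pattern.match(text)
        if match:
            return match.group(1)
    return None  # e.g. GI-style headers (">gi|...") are not recognised


def open_map(path: str) -> TextIO:
    """Open a plain or gzip-compressed accession2taxid file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # no need to unzip the file first
    return open(path)


def load_acc_tax_map(handle: Iterable[str], wanted: Set[str]) -> Dict[str, int]:
    """Map accession -> taxid, keeping only the accessions in `wanted`.

    Assumes the NCBI layout: a header line, then tab-separated columns
    accession, accession.version, taxid, gi.
    """
    lines = iter(handle)
    next(lines, None)  # skip the header line
    mapping: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] in wanted:
            mapping[fields[0]] = int(fields[2])
    return mapping


if __name__ == "__main__":
    headers = [">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha",
               ">NC_000913.3 Escherichia coli str. K-12 substr. MG1655",
               ">gi|12345|some old header"]
    accessions = {a for a in map(extract_accession, headers) if a}
    # Toy in-memory table standing in for a real accession2taxid file.
    table = io.StringIO("accession\taccession.version\ttaxid\tgi\n"
                        "P69905\tP69905.2\t9606\t0\n"
                        "NC_000913\tNC_000913.3\t511145\t0\n")
    print(load_acc_tax_map(table, accessions))
```

Only the accessions that actually occur in your subject headers are kept in memory, which is one plausible reason why such a mapping adds little to the index size even though the NCBI files themselves are large.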
When running LAMBDA, you need to tell it to print the taxonomic information:
- For the tabular BLAST output formats, specify, e.g.
--output-columns 'std staxids'
- For the SAM/BAM output formats, specify, e.g.
--sam-bam-tags 'AS NM ZE ZI ZF ZT'
(the last tag, ZT, is the important one for taxonomy)
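If you post-process LAMBDA's output yourself, a small parser along these lines might help. It is a hedged sketch with hypothetical helper names: for tabular output it assumes staxids directly follows the 12 "std" columns and that multiple taxids are separated by ';' as in NCBI BLAST, and for SAM it parses optional fields generically following the SAM specification, so it does not assume which value type LAMBDA uses for ZT. Please check both assumptions against your own output.

```python
from typing import Dict, List, Union

# "std" expands to 12 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore; staxids is assumed to follow them.
STD_COLUMNS = 12

TagValue = Union[int, float, str, List[Union[int, float]]]


def taxids_from_tabular(line: str) -> List[int]:
    """Return the taxids of one row written with --output-columns 'std staxids'.

    Assumes multiple taxids are ';'-separated (as in NCBI BLAST).
    """
    if line.startswith("#"):  # comment lines of commented tabular formats
        return []
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= STD_COLUMNS:
        return []
    return [int(t) for t in fields[STD_COLUMNS].split(";") if t.isdigit()]


def sam_optional_fields(line: str) -> Dict[str, TagValue]:
    """Parse the TAG:TYPE:VALUE fields after the 11 mandatory SAM columns."""
    tags: Dict[str, TagValue] = {}
    for field in line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        elif typ == "B":  # numeric array: the first element is the subtype
            subtype, *items = value.split(",")
            tags[tag] = [float(x) if subtype == "f" else int(x) for x in items]
        else:  # A, Z and H values are kept as strings
            tags[tag] = value
    return tags
```

For a SAM record, the taxonomic information would then be available as sam_optional_fields(line).get("ZT"), in whatever form LAMBDA writes it.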
Note that no taxonomic binning is performed; you only get the taxa corresponding to the subject sequences of your individual matches.
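To make the distinction concrete: each match only carries the taxa of its own subject sequence, so any per-read summary is up to you. The hypothetical helper below (not part of LAMBDA) merely collects the taxids of all matches of each query. It deliberately does not resolve them to a single bin, which would additionally require the taxonomy tree, e.g. for a lowest-common-ancestor assignment.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple


def taxa_per_query(matches: Iterable[Tuple[str, List[int]]]) -> Dict[str, Set[int]]:
    """Collect, per query, the taxids of all subject sequences it matched.

    This is plain bookkeeping, NOT taxonomic binning: a read matching two
    species simply keeps both taxids.
    """
    per_query: DefaultDict[str, Set[int]] = defaultdict(set)
    for query_id, taxids in matches:
        per_query[query_id].update(taxids)
    return dict(per_query)


if __name__ == "__main__":
    # Illustrative (query, taxids) pairs, e.g. as parsed from tabular output.
    hits = [("read1", [9606]), ("read1", [10090]), ("read2", [511145])]
    print(taxa_per_query(hits))
```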
If anything is unclear, don't hesitate to contact me.