Taxonomic Workflows

- This article refers to the lambda-next branch and releases >= 1.9.2

If you are using LAMBDA in taxonomic workflows, you might want to make use of some of the following features:

Printing taxonomic IDs of subject sequences

You only need to do this once:

1. Make sure your subject sequences contain accession numbers; GIs are not supported. The following accession numbers are automatically detected and extracted from fasta/fastq headers:
    * UniProt (more information)
    * NCBI nucl, NCBI prot, NCBI wgs and NCBI mga (more information)
    * ~~Refseq~~ (not yet supported)
2. Download a mapping file from the NCBI (make sure it's the correct one).
3. Rebuild your index, but add --acc-tax-map /path/to/file.accession2taxid[.gz] (you don't have to unzip the file).
    * Building the new index will take longer, but it only increases the index's size by a few MBs.
    * If LAMBDA fails to assign most of your sequences to taxa, it will warn you!
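
To check that you picked the correct mapping file before rebuilding, you can test whether it covers some of your accessions. The following is a minimal sketch and not part of LAMBDA. It assumes the usual NCBI accession2taxid layout (tab-separated columns accession, accession.version, taxid, gi, normally with a header line); the function names `open_maybe_gzip` and `load_acc2taxid` are hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def load_acc2taxid(path, wanted):
    """Return {accession: taxid} for every accession in `wanted` that
    appears in the mapping file, with or without a version suffix.

    Assumed layout (not verified against your file): tab-separated
    columns accession, accession.version, taxid, gi.
    """
    wanted = set(wanted)
    found = {}
    with open_maybe_gzip(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip the header line and any malformed rows.
            if len(fields) < 3 or not fields[2].isdigit():
                continue
            accession, accession_version, taxid = fields[0], fields[1], fields[2]
            if accession in wanted:
                found[accession] = int(taxid)
            if accession_version in wanted:
                found[accession_version] = int(taxid)
    return found
```

Calling `load_acc2taxid("/path/to/file.accession2taxid.gz", my_accessions)` and comparing the result with `my_accessions` shows which accessions have no entry; if most of them are missing, you probably downloaded the wrong mapping file. The file is streamed line by line, so only the requested accessions are kept in memory.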

You then need to tell LAMBDA to print the taxonomic information in its output:

- for the tabular BLAST Output Formats, specify e.g. --output-columns 'std staxids'
- for the SAMTOOLS Output Formats, specify e.g. --sam-bam-tags 'AS NM ZE ZI ZF st' (the last tag is the important one)

Note that this does not perform any taxonomic binning; you only get the taxa corresponding to the subject sequences of your individual matches.
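
If you want to post-process the tabular output yourself, the taxonomic IDs can be read per match. The following is a minimal sketch under stated assumptions: with --output-columns 'std staxids' each match line has the 12 standard columns followed by staxids as the 13th, and multiple IDs in that column are separated by semicolons (the BLAST convention; check your own output). The function names are hypothetical, and the tally is a plain count per taxon, not a binning method.

```python
from collections import Counter


def taxids_per_hit(lines, staxids_col=12):
    """Yield (query, subject, [taxids]) for each match line.

    `staxids_col` is the 0-based index of the staxids column; 12 assumes
    'std staxids', i.e. 12 standard columns followed by staxids.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and the comment lines of the commented format.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= staxids_col:
            continue
        # Assumed separator for multiple taxonomic IDs: ';'
        taxids = [int(t) for t in fields[staxids_col].split(";") if t.strip().isdigit()]
        yield fields[0], fields[1], taxids


def count_hits_per_taxid(lines, staxids_col=12):
    """Count how many matches refer to each taxonomic ID."""
    counts = Counter()
    for _query, _subject, taxids in taxids_per_hit(lines, staxids_col):
        counts.update(taxids)
    return counts
```

For example, `count_hits_per_taxid(open("output.m8"))` returns a `Counter` mapping each taxonomic ID to the number of matches whose subject sequence carries it; `output.m8` is a placeholder file name.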

TODO

If anything is unclear, don't hesitate to contact me.