Taxonomic Workflows

Hannes Hauswedell edited this page Nov 30, 2016 · 22 revisions
- This article refers to the lambda-next branch and releases >= 1.9.2

If you are using LAMBDA in taxonomic workflows, you might want to make use of some of the following features:

Printing taxonomic IDs of subject sequences

During the indexing step

You only need to do this once:

  1. Make sure your subject sequences contain accession numbers; GIs are not supported. The following accession numbers are automatically detected and extracted from FASTA/FASTQ headers:
     * UniProt (more information)
     * NCBI nucl, NCBI prot, NCBI wgs and NCBI mga (more information)
     * RefSeq (not yet supported)
  2. Download a mapping file from the NCBI (make sure it is the one that covers the accession numbers in your database).
  3. Rebuild your index, but add --acc-tax-map /path/to/file.accession2taxid[.gz] (you don't have to unzip the file).
     * Building the new index will take longer, but it only increases the index's size by a few MB.
     * If LAMBDA fails to assign most of your sequences to taxa, it will warn you!
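
Before rebuilding the index, it can help to estimate how many subject sequences the mapping file is likely to cover. The following Python sketch is not part of LAMBDA. It assumes the NCBI accession2taxid layout (tab-separated columns accession, accession.version, taxid, gi, preceded by a header row) and that the accession sits in the first whitespace-delimited word of each FASTA header, possibly between '|' separators. All function names are hypothetical, and LAMBDA's own accession detection may behave differently.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def header_tokens(header):
    """Candidate accession tokens from the first word of a FASTA header.

    Assumption: the accession is in the first whitespace-delimited word,
    optionally embedded between '|' separators (e.g. 'sp|P12345|NAME').
    """
    words = header.lstrip(">").split()
    if not words:
        return []
    return [tok for tok in words[0].split("|") if tok]


def load_candidates(fasta_lines):
    """Map each candidate token to the indices of the headers containing it."""
    candidates = {}
    n_headers = 0
    for line in fasta_lines:
        if line.startswith(">"):
            for tok in header_tokens(line):
                candidates.setdefault(tok, set()).add(n_headers)
            n_headers += 1
    return candidates, n_headers


def mapped_headers(candidates, mapping_lines):
    """Indices of headers with at least one token present in the mapping.

    The mapping is streamed line by line, so it is never held in memory.
    """
    hit = set()
    for line in mapping_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3 or cols[0] == "accession":
            continue  # skip the header row and malformed lines
        for key in cols[:2]:  # accession and accession.version
            if key in candidates:
                hit |= candidates[key]
    return hit


if __name__ == "__main__":
    # Tiny in-memory sample; real input would come from open_maybe_gzip(...).
    fasta = [">WP_000001.1 hypothetical protein\n", "MKV\n",
             ">gi|12345 old-style header\n", "MAA\n"]
    mapping = ["accession\taccession.version\ttaxid\tgi\n",
               "WP_000001\tWP_000001.1\t562\t0\n"]
    cands, total = load_candidates(fasta)
    found = mapped_headers(cands, mapping)
    print(f"{len(found)} of {total} headers have a mapped accession")
```

For real data, the line iterables can be replaced by file handles from open_maybe_gzip, for example the subject FASTA file and the downloaded .accession2taxid[.gz] file. A low fraction of mapped headers suggests the wrong mapping file or headers that still carry GIs.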

When running LAMBDA

You need to tell it to print the taxonomic information:

Note that this does not perform any taxonomic binning; you only get the taxa corresponding to the subject sequences of your individual matches.
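
Since the taxa are reported per match, any summary has to be computed downstream. The Python sketch below tallies how often each taxonomic ID occurs across matches. It is an illustration only and rests on several assumptions that are not confirmed here: the output is tab-separated, lines starting with '#' are comments, the taxonomic IDs are in the last column, and multiple IDs in one field are separated by ';'. The function name count_taxa and its parameters are hypothetical.

```python
from collections import Counter


def count_taxa(lines, column=-1, sep=";"):
    """Count occurrences of each taxonomic ID over all match lines.

    Assumptions: tab-separated fields, '#' comment lines, tax IDs in
    `column` (default: last), several IDs per field joined by `sep`.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        for taxid in fields[column].split(sep):
            taxid = taxid.strip()
            if taxid:  # matches without an assigned taxon are ignored
                counts[taxid] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "# hypothetical tabular output\n",
        "query1\tsubjectA\t95.0\t562\n",
        "query1\tsubjectB\t90.0\t562;9606\n",
    ]
    for taxid, n in count_taxa(sample).most_common():
        print(taxid, n)
```

This is a plain per-match tally, not a lowest common ancestor computation; each match contributes independently, so a query with many matches counts many times.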

LCA computation / taxonomic binning

TODO