BEExact: a metataxonomic database tool for high-resolution inference of bee-associated microbial communities
Original article in mSystems here: https://msystems.asm.org/content/6/2/e00082-21
BEExact v2023.01.30 is here! Many novel bee-associated bacterial species have been formally described since the last release and, excitingly, some of them turned out to be exact matches to predicted species previously designated with 'bxid' placeholder names. For example, Lacticaseibacillus sbxid5696 (old:BX004169) has been replaced by Lacticaseibacillus zhaodongensis (new:BX006585) and Frischella sbxid5676 (old:BX004348) has been replaced by Frischella japonica (new:BX006595). See here for the full list of changes.
Furthermore, additional pre-trained/pre-formatted classifiers are now available for mothur, LotuS2, and KRAKEN2 software.
- The unformatted reference database with full-length 16S rRNA gene sequences + taxonomy:
  - Sequences: BEEx_FL-refs_sequences.fasta
  - Taxonomy: BEEx_FL-refs_taxonomy.txt
- Pre-trained/formatted region-specific training sets for specific classifiers:

| Primers / reference set | IDTAXA | DADA2 [RDP] | QIIME2 [NB] | SINTAX | MOTHUR | LOTUS2 | KRAKEN2 |
|---|---|---|---|---|---|---|---|
| Bakt_341F 5'-CCTACGGGNGGCWGCAG-3', Bakt_805R 5'-GACTACHVGGGTATCTAATCC-3' | V3V4 | V3V4 | V3V4 | V3V4 | V3V4 | V3V4 | V3V4 |
| 515F(Parada) 5'-GTGYCAGCMGCCGCGGTAA-3', 806R(Apprill) 5'-GGACTACNVGGGTWTCTAAT-3' | V4 | V4 | V4 | V4 | V4 | V4 | V4 |
| 515F(Parada) 5'-GTGYCAGCMGCCGCGGTAA-3', 926R(Quince) 5'-CCGYCAATTYMTTTRAGTTT-3' | V4V5 | V4V5 | V4V5 | V4V5 | V4V5 | V4V5 | V4V5 |
| 799F-mod3 5'-CMGGATTAGATACCCKGG-3', 1115R(Kembel) 5'-AGGGTTGCGCTCGTTG-3' | V5V6 | V5V6 | V5V6 | V5V6 | V5V6 | V5V6 | V5V6 |
| Full-length 16S rRNA ref sequences | FL | FL | FL | FL | FL | FL | FL |

BEExact is a comprehensive, non-redundant reference database that has been thoroughly curated for 16S rRNA gene-based sequencing of bee-associated microbial communities.
The database will be updated frequently to incorporate annotations and reference sequences for novel bee host-associated taxa. All suggestions for improvement are welcome; see the contact info below. A wiki tutorial is also in the works and will cover microbiota analysis using exact ASVs as opposed to traditional OTU-based methods. As a quick note, ASVs have several advantages, specifically their precision in characterizing microbial communities and their consistency across studies. See the latest DADA2 pipeline for more details on this.
Also recommended is an excellent article that simplifies the workflow for valid statistical analysis of compositional datasets: Microbiome Datasets Are Compositional: And This Is Not Optional
Across 32 independent studies encompassing 50 bee species, BEExact enabled classification of ~80-90% of ASVs at the species level, whereas the leading existing database classified no more than ~30% at the same level. We noted that microbial communities from eusocial bee species generally exhibited higher classification rates, likely because their microbiota has been more intensively characterized than that of many solitary bee species.
Other variable region-specific training sets can be generated using the BEExact full database sequences.
An example using QIIME2 tools for making a V3-V4 specific training set:
Step 1: Import sequence and taxonomy files as .qza
qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path BEEx_FL-refs_sequences.fasta \
--output-path BEEx-FL-refs_sequences.qza
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path BEEx_FL-refs_taxonomy.txt \
--output-path BEEx-FL-refs_taxonomy.qza
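Classifier training in the later steps expects every reference sequence to have a taxonomy entry, so it can be worth confirming that the two input files share the same IDs. The following is a minimal sketch using standard Unix tools, not part of BEExact or QIIME2. The function name `id_mismatches` and the toy files are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token of each FASTA header and the first tab-separated column of the taxonomy file.

```shell
#!/bin/sh
set -eu

# List IDs that appear in only one of the two files.
# Column 1: found in the FASTA only; column 2 (tab-indented): taxonomy only.
# ASSUMPTION: the ID is the first token of each FASTA header and the first
# tab-separated field of the taxonomy file.
id_mismatches() {
  fasta=$1
  taxonomy=$2
  tmp=$(mktemp -d)
  grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' | LC_ALL=C sort -u > "$tmp/fasta_ids"
  cut -f1 "$taxonomy" | LC_ALL=C sort -u > "$tmp/tax_ids"
  LC_ALL=C comm -3 "$tmp/fasta_ids" "$tmp/tax_ids"
  rm -rf "$tmp"
}

# Toy input files (hypothetical IDs and taxa, not BEExact records).
demo=$(mktemp -d)
printf '>demo_1\nACGT\n>demo_2 extra description\nGGCC\n' > "$demo/seqs.fasta"
printf 'demo_1\tBacteria;PhylumA\ndemo_3\tBacteria;PhylumB\n' > "$demo/tax.txt"

mismatches=$(id_mismatches "$demo/seqs.fasta" "$demo/tax.txt")
printf '%s\n' "$mismatches"
rm -rf "$demo"
```

To check real data, call `id_mismatches` with the sequence and taxonomy files used in Step 1. Empty output means the two ID sets agree.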
Step 2: Trim to specific region of interest (V3-V4 in this case)
qiime feature-classifier extract-reads \
--i-sequences BEEx-FL-refs_sequences.qza \
--p-f-primer ACTCCTACGGGAGGCAGCAG \
--p-r-primer GGACTACHVGGGTWTCTAAT \
--p-min-length 100 \
--p-max-length 400 \
--o-reads BEEx-V3V4-refs_sequences.qza
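When adapting Step 2 to other primer pairs, it helps to see how degenerate (IUPAC) primers relate to the reference sequences. The sketch below expands a primer into a regular expression, reverse-complements the reverse primer, and counts how many sequences in a FASTA file contain both sites in order. It is only a rough check: it requires exact matches, so it is stricter than `extract-reads`, which tolerates some mismatch. All function names and the toy sequences are hypothetical.

```shell
#!/bin/sh
set -eu

# Expand IUPAC degenerate bases into regular-expression character classes.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Reverse-complement a primer, keeping IUPAC codes
# (S, W and N are their own complements, so tr leaves them unchanged).
revcomp() {
  printf '%s\n' "$1" |
    awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }' |
    tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

# Print each FASTA record as one upper-case line (headers dropped).
linearize() {
  awk '/^>/ { if (seq != "") print seq; seq = ""; next }
       { seq = seq toupper($0) }
       END { if (seq != "") print seq }' "$1"
}

# Count sequences containing the forward primer followed by the
# reverse-complemented reverse primer (exact matches only).
count_exact_hits() {
  pattern="$(iupac_to_regex "$2").*$(iupac_to_regex "$(revcomp "$3")")"
  linearize "$1" | grep -Ec "$pattern" || true
}

# Toy FASTA (hypothetical sequences, not BEExact records).
demo=$(mktemp -d)
cat > "$demo/refs.fasta" <<'EOF'
>demo_hit
AAACTCCTACGGGAGGCAGCAGTTTTTTTT
ATTAGATACCCTGGTAGTCCAAA
>demo_miss
ACGTACGTACGT
EOF

hits=$(count_exact_hits "$demo/refs.fasta" ACTCCTACGGGAGGCAGCAG GGACTACHVGGGTWTCTAAT)
echo "references with both primer sites: $hits"
rm -rf "$demo"
```

If a primer pair matches very few full-length references, it suggests checking the primer orientation or sequence before running `extract-reads`.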
Step 3: Train the classifier
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads BEEx-V3V4-refs_sequences.qza \
--i-reference-taxonomy BEEx-FL-refs_taxonomy.qza \
--o-classifier QIIME2_BxV3V4TS.qza
Step 4: Classify reads with the q2-feature-classifier
qiime feature-classifier classify-sklearn \
--i-classifier QIIME2_BxV3V4TS.qza \
--i-reads ASVs_query_sequences.qza \
--p-confidence 0.5 \
--o-classification QIIME2_BxV3V4TS_ASVs_out.qza
Step 5: Visualize files
qiime metadata tabulate \
--m-input-file QIIME2_BxV3V4TS_ASVs_out.qza \
--o-visualization QIIME2_BxV3V4TS_ASVs_out.qzv
For user-friendly conversion, drag and drop "QIIME2_BxV3V4TS_ASVs_out.qzv" to https://view.qiime2.org
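To summarize results outside the viewer, the classification artifact can be exported with `qiime tools export`, which writes a tab-separated `taxonomy.tsv` with the columns Feature ID, Taxon and Confidence. The sketch below computes the share of ASVs assigned at the species level from such a file. The function name `species_rate` and the toy table are hypothetical. It assumes taxon strings are semicolon-separated with species as the seventh rank, which should be checked against the actual BEExact taxonomy format before use.

```shell
#!/bin/sh
set -eu

# Report how many ASVs carry a non-empty species rank.
# ASSUMPTION: taxon strings are semicolon-separated, ordered kingdom..species,
# so the species label is the 7th field. An empty field or a bare "s__"
# prefix is treated as unclassified.
species_rate() {
  awk -F'\t' '
    NR == 1 { next }   # skip the "Feature ID / Taxon / Confidence" header row
    {
      total++
      n = split($2, ranks, ";")
      species = (n >= 7) ? ranks[7] : ""
      gsub(/^[ \t]+|[ \t]+$/, "", species)
      if (species != "" && species != "s__") classified++
    }
    END {
      if (total > 0)
        printf "%d/%d ASVs classified to species (%.1f%%)\n", classified, total, 100 * classified / total
    }
  ' "$1"
}

# Toy table in the exported layout (hypothetical ASVs and taxa).
demo=$(mktemp -d)
{
  printf 'Feature ID\tTaxon\tConfidence\n'
  printf 'asv_1\tBacteria;PhylumA;ClassA;OrderA;FamilyA;GenusA;GenusA speciesA\t0.98\n'
  printf 'asv_2\tBacteria;PhylumA;ClassA;OrderA;FamilyA;GenusA\t0.71\n'
  printf 'asv_3\tBacteria;PhylumB;ClassB;OrderB;FamilyB;GenusB;GenusB speciesB\t0.93\n'
} > "$demo/taxonomy.tsv"

summary=$(species_rate "$demo/taxonomy.tsv")
echo "$summary"
rm -rf "$demo"
```

With real data, the input would be the `taxonomy.tsv` produced by exporting QIIME2_BxV3V4TS_ASVs_out.qza.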
If you find the database helpful, please cite the following:
Daisley B.A. and G. Reid (2021). BEExact: a metataxonomic database tool for high-resolution inference of bee-associated microbial communities. mSystems 6(2):e00082-21
https://doi.org/10.1128/mSystems.00082-21
All feedback is welcome. If you have any questions, please feel free to contact me. Sharing information is also encouraged, especially about novel bee-associated species that have recently been discovered but not yet incorporated into the BEExact database. Teamwork makes the dream work.
Email: [email protected]
Twitter: @bdaisley