After combining two datasets and reclustering, the abundance of some high-abundance OTUs dropped dramatically. #514

peiyaohu · 2023-02-28T09:18:50Z

I have two datasets, A and B. Dataset A contains a high-abundance OTU (id: OTU_54). To compare the abundance of OTU_54 in the two datasets, I pooled the raw sequencing data of A and B (=> A+B), followed the example steps provided on the website to cluster them (with the same parameters as when A and B were analyzed separately), and found that the OTU_54 in the original A dataset had very low abundance in the otutab(A+B) produced by the new clustering.

So I ran BLAST on all.nonchimeras.fasta (the file produced just before clustering at 97% similarity) of A, B and A+B with OTU_54, filtered the BLAST results by identity > 97% and alignment length > 300, and checked the number of matches, and found that A+B lost a lot of OTU_54.

wc -l filt_nonchim*               # filtered blast results. 
   76966 filt_nonchim18.txt   #generated from datasetB
  157240 filt_nonchim19.txt  #generated from datasetA
   12369 filt_nonchim.txt       #generated from A+B
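For reference, the filtering step described above could be sketched as follows. This is only an illustration: it assumes the BLAST results are in the standard 12-column tabular format (`-outfmt 6`, where column 3 is percent identity and column 4 is alignment length), which the post does not state. The function name `count_hits` and the example rows are made up.

```python
import csv
import io


def count_hits(handle, min_identity=97.0, min_length=300):
    """Count hits with identity > min_identity and alignment length > min_length.

    Assumes tab-separated BLAST output with percent identity in column 3
    and alignment length in column 4 (the default `-outfmt 6` layout).
    """
    kept = 0
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        pident = float(row[2])
        length = int(row[3])
        if pident > min_identity and length > min_length:
            kept += 1
    return kept


# Made-up rows (only the first four columns are filled in), not real results.
example = io.StringIO(
    "read1\tOTU_54\t99.5\t380\n"
    "read2\tOTU_54\t96.0\t380\n"   # fails the identity filter
    "read3\tOTU_54\t98.2\t250\n"   # fails the length filter
    "read4\tOTU_54\t97.4\t301\n"
)
n_hits = count_hits(example)
print(n_hits)
```

With a real file, the same function can be called on an open file handle, for example `count_hits(open("filt_input.txt"))`, where the file name is a placeholder.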

How can I address or optimize the analysis process? Thanks!


frederic-mahe · 2023-02-28T14:42:22Z

> and found that the OTU_54 in the original A dataset had very low abundance in the otutab(A+B) produced by the new clustering.

This is a known downside of using a centroid-based, fixed-threshold clustering approach: some clusters shrink or disappear when more data is added.

A given centroid1 can be abundant in a sample A, but close to a more abundant centroid2 present in a sample B. If you cluster A+B, then centroid2 captures some or all of the reads initially captured by centroid1.

> and checked the number of matches, and found that A+B lost a lot of OTU_54.

If I understand correctly, reads from OTU_54 are not lost, but have been redistributed into other OTUs. There is not much that can be done to mitigate that downside.
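The effect can be reproduced with a toy sketch. This is not the actual clustering implementation: it assumes a simplified greedy procedure (sequences processed by decreasing abundance, each one joining the first centroid at or above the threshold), gap-free equal-length sequences, and made-up abundances. The names `identity`, `greedy_cluster` and `mutate` are hypothetical helpers.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions (toy measure: equal lengths, no gaps)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(abundances, threshold=0.97):
    """Simplified greedy centroid clustering.

    Sequences are visited by decreasing abundance; each one joins the first
    existing centroid with identity >= threshold, or else seeds a new cluster.
    Returns a dict mapping centroid sequence -> total reads in its cluster.
    """
    clusters = {}
    for seq, count in sorted(abundances.items(), key=lambda kv: -kv[1]):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += count
                break
        else:
            clusters[seq] = count
    return clusters


def mutate(seq, positions):
    """Return seq with a different base substituted at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


centroid2 = "ACGT" * 25                            # 100 nt, abundant in B only
centroid1 = mutate(centroid2, [10, 20])            # 98% identical to centroid2
variant1 = mutate(centroid2, [10, 20, 30, 40])     # 98% to centroid1, 96% to centroid2

sample_a = {centroid1: 1000, variant1: 50}         # made-up abundances
sample_b = {centroid2: 5000}

clusters_a = greedy_cluster(sample_a)
clusters_ab = greedy_cluster(Counter(sample_a) + Counter(sample_b))

print("A alone, reads in the centroid1 cluster:", clusters_a[centroid1])
print("A+B, centroid1 still a centroid:", centroid1 in clusters_ab)
print("A+B, reads in the centroid2 cluster:", clusters_ab[centroid2])
print("A+B, total reads:", sum(clusters_ab.values()))
```

In this toy case, centroid1 leads a cluster of 1050 reads when A is clustered alone. In A+B, its 1000 reads are absorbed by centroid2, and the 50 reads of variant1 end up in a separate small cluster because variant1 is below the threshold relative to centroid2. The total read count is unchanged: reads move between clusters, they do not vanish.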

peiyaohu · 2023-03-01T03:54:41Z

Thanks so much!
