I have two datasets, A and B, and dataset A contains a high-abundance OTU (id: OTU_54). To compare the abundance of OTU_54 between the two datasets, I pooled the raw sequencing data of A and B (=> A+B) and clustered them following the example steps provided on the website, with the same parameters used when A and B were analyzed separately. In the otutab(A+B) produced by the new clustering, the OTU_54 from the original A dataset has very low abundance.
So I compared all.nonchimeras.fasta (the file produced before clustering at 97% similarity) of A, B and A+B with OTU_54 using BLAST, filtered the results by identity > 97% and alignment length > 300, and counted the matches. I found that A+B lost a lot of OTU_54 matches.
wc -l filt_nonchim* # filtered blast results.
76966 filt_nonchim18.txt #generated from datasetB
157240 filt_nonchim19.txt #generated from datasetA
12369 filt_nonchim.txt #generated from A+B
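For reference, the filtering step described above can be sketched in Python. This is a minimal sketch, not the exact commands used here: it assumes tab-separated BLAST output with the query id in column 1, percent identity in column 3 and alignment length in column 4, and it assumes that query ids may carry an abundance annotation of the form `;size=N`. The function name `count_filtered_hits` is hypothetical. It reports both the number of matching lines (what `wc -l` counts) and the number of reads those lines represent, which can differ if the FASTA file was dereplicated.

```python
import re

# Assumed abundance annotation in the sequence id, e.g. "seq1;size=10"
SIZE_RE = re.compile(r";size=(\d+)")

def count_filtered_hits(lines, min_identity=97.0, min_length=300):
    """Count tabular BLAST hits with identity > min_identity and
    alignment length > min_length.

    Returns (number of hit lines, number of reads they represent).
    A hit without a ';size=N' annotation counts as one read.
    """
    hits = 0
    reads = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        pident = float(fields[2])   # percent identity (assumed column 3)
        aln_len = int(fields[3])    # alignment length (assumed column 4)
        if pident > min_identity and aln_len > min_length:
            hits += 1
            match = SIZE_RE.search(query_id)
            reads += int(match.group(1)) if match else 1
    return hits, reads

# Made-up example rows, for illustration only
example = [
    "seq1;size=10\tOTU_54\t98.5\t310",  # passes both filters
    "seq2;size=5\tOTU_54\t97.0\t400",   # identity is not > 97
    "seq3\tOTU_54\t99.0\t300",          # length is not > 300
    "seq4\tOTU_54\t99.0\t350",          # passes, counts as 1 read
]
hits, reads = count_filtered_hits(example)
print(hits, reads)
```

If the two numbers differ a lot, comparing line counts between A, B and A+B may not reflect read abundance.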
How can I address or optimize the analysis process? Thanks!
> and found that the OTU_54 in the original A dataset had very low abundance in the otutab(A+B) produced by the new clustering.
This is a known downside of centroid-based, fixed-threshold clustering: some clusters shrink or disappear when more data are added.
A given centroid1 can be abundant in sample A but close to a more abundant centroid2 present in sample B. If you cluster A+B, centroid2 captures some or all of the reads initially captured by centroid1.
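The effect can be reproduced with a toy example. The sketch below is a deliberately simplified greedy clustering, not the algorithm of any particular tool: sequences are processed by decreasing abundance, distance is a plain mismatch count between equal-length strings, and each sequence joins the closest existing centroid within the threshold. All names and abundances are made up for illustration.

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def greedy_cluster(abundances, max_diff):
    """Toy abundance-sorted greedy clustering.

    abundances: dict mapping sequence -> read count.
    max_diff:   maximum number of mismatches allowed to join a centroid.
    Returns a dict mapping centroid sequence -> total reads captured.
    """
    clusters = {}
    # Most abundant sequences are processed first and become centroids.
    ordered = sorted(abundances.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq, count in ordered:
        best, best_diff = None, max_diff + 1
        for centroid in clusters:
            d = mismatches(seq, centroid)
            if d < best_diff:  # closest centroid wins
                best, best_diff = centroid, d
        if best is None:
            clusters[seq] = count    # no centroid close enough: new cluster
        else:
            clusters[best] += count  # captured by an existing centroid
    return clusters

# 100-nt toy sequences; 3 mismatches corresponds to 97% identity.
centroid1 = "A" * 100           # abundant in sample A
centroid2 = "CCCC" + "A" * 96   # abundant in sample B, 4 mismatches from centroid1
variant = "CCCA" + "A" * 96     # in sample A: 3 from centroid1, 1 from centroid2

a_only = greedy_cluster({centroid1: 100, variant: 50}, max_diff=3)
pooled = greedy_cluster({centroid1: 100, variant: 50, centroid2: 1000}, max_diff=3)

print(a_only[centroid1])  # reads in centroid1's cluster, sample A alone
print(pooled[centroid1])  # reads in centroid1's cluster after pooling A+B
print(pooled[centroid2])  # centroid2's cluster now holds the variant reads
```

In sample A alone, the variant reads join centroid1. Once the more abundant centroid2 is present, the same reads are closer to centroid2 and are captured by it, so centroid1's cluster shrinks even though no reads were removed.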
> and checked the number of matches, and found that A+B lost a lot of OTU_54.
If I understand correctly, reads from OTU_54 are not lost but were re-distributed into other OTUs. Not much can be done to mitigate that downside.
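One way to check the re-distribution is to tally where the sequences of the original OTU ended up. The sketch below assumes you can build, for each clustering run, a mapping from sequence id to OTU id, plus a mapping from sequence id to read count. The function name `redistribution` and all ids and counts are hypothetical.

```python
from collections import Counter

def redistribution(old_assign, new_assign, sizes, otu_id):
    """Tally reads of sequences that belonged to `otu_id` in the old
    clustering by the OTU they belong to in the new clustering.

    old_assign, new_assign: dict mapping sequence id -> OTU id.
    sizes: dict mapping sequence id -> read count.
    Sequences missing from new_assign are tallied under "(absent)".
    """
    tally = Counter()
    for seq_id, otu in old_assign.items():
        if otu == otu_id:
            tally[new_assign.get(seq_id, "(absent)")] += sizes[seq_id]
    return tally

# Made-up assignments, for illustration only
old = {"s1": "OTU_54", "s2": "OTU_54", "s3": "OTU_7"}   # dataset A alone
new = {"s1": "OTU_54", "s2": "OTU_3", "s3": "OTU_7"}    # pooled A+B
sizes = {"s1": 100, "s2": 50, "s3": 20}

moved = redistribution(old, new, sizes, "OTU_54")
print(moved)
```

If the totals add up to the original abundance of the OTU, the reads were moved to other OTUs rather than lost. Note that OTU ids are only comparable between runs if both runs use the same naming.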