Setting an outgroup for the "Dengue virus DENVx genotypes" dataset #67

Merged 8 commits on Jun 20, 2024

@j23414 j23414 commented Jun 10, 2024

Description of proposed changes

This pull request fine-tunes the Dengue virus Denv* (dengue/denv*) datasets for accurate genotype-level assignment. Like the earlier PR (#58) for the Dengue virus All (dengue/all) dataset, it addresses cross-serotype samples being falsely assigned to genotypes within a specific serotype. For example, querying all samples against the DENV4 dataset produced false-positive "DENV4II" genotype calls.

[Image: Screenshot 2024-06-03 at 1:33:41 PM]
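To make the notion of a cross-serotype false positive concrete, here is a minimal sketch, not the check used in this PR. It assumes each call is available as a `(sample_name, known_serotype, assigned_genotype)` tuple and that an unassigned sample is labelled `""` or `"unassigned"`; the function name, the tuple layout, and the unassigned labels are all hypothetical.

```python
def cross_serotype_false_positives(calls, dataset_serotype,
                                   unassigned=("", "unassigned")):
    """Return calls where a sample from another serotype still received a genotype.

    calls: iterable of (sample_name, known_serotype, assigned_genotype) tuples.
    The tuple layout and the `unassigned` labels are assumptions for illustration.
    """
    return [
        (name, known, genotype)
        for name, known, genotype in calls
        if known != dataset_serotype and genotype not in unassigned
    ]


# Made-up example records, queried against a DENV4 dataset.
example_calls = [
    ("sample_a", "DENV4", "DENV4II"),     # correct serotype, not a false positive
    ("sample_b", "DENV1", "DENV4II"),     # cross-serotype false positive
    ("sample_c", "DENV2", "unassigned"),  # correctly left unassigned
]
false_positives = cross_serotype_false_positives(example_calls, "DENV4")
```

With the outgroups described in this PR in place, the expectation is that this list shrinks toward empty for each serotype dataset.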

To resolve this issue, the following actions were taken, inspired by a suggestion from @rneher in a Slack channel:

  • Adding a reconstructed root for each tree
  • Adding the "dengue/all" reconstructed root as an outgroup for each tree
  • Splitting the clades_genotype.tsv file by serotype
  • Removing further false positives in the DENV2 genotype-level calls by adding the DENV1, DENV3, and DENV4 roots as unassigned outgroups
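The step of splitting clades_genotype.tsv by serotype could look roughly like the sketch below. It is an illustration rather than the code from this PR: it assumes a tab-separated file with a header row, a `clade` column, and clade names that begin with the serotype (for example "DENV4II"). The function name, column names, and example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

SEROTYPES = ("DENV1", "DENV2", "DENV3", "DENV4")


def split_clades_by_serotype(tsv_text, clade_column="clade"):
    """Split clade-definition TSV text into one TSV text per serotype.

    Assumes clade names start with the serotype prefix; rows that match
    no serotype are dropped. Returns {serotype: tsv_text}.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames
    rows = defaultdict(list)
    for row in reader:
        for serotype in SEROTYPES:
            if row[clade_column].startswith(serotype):
                rows[serotype].append(row)
                break

    result = {}
    for serotype in SEROTYPES:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows[serotype])
        result[serotype] = buffer.getvalue()
    return result


# Made-up rows for illustration only.
example_tsv = (
    "clade\tgene\tsite\talt\n"
    "DENV1I\tnuc\t10\tA\n"
    "DENV4II\tnuc\t20\tG\n"
)
per_serotype = split_clades_by_serotype(example_tsv)
```

Each value in `per_serotype` can then be written to its own file, so that a DENV4 tree is only annotated with DENV4 genotype definitions.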

Results

The resulting trees, with minimized cross-serotype false-positive genotype-level assignments, are documented in #69.

Related issue(s)

Checklist

  • Checks pass
  • Reduction in cross-serotype false-positive genotype-level assignments (see docs)

The dataset has been pushed to PR nextstrain/nextclade_data#203 and is available for testing at the links in the PR comment nextstrain/nextclade_data#203 (comment)

@j23414 j23414 changed the title: Nextclade genotype set outgroup → Setting an outgroup for the "Dengue virus DENVx genotypes dataset (Jun 10, 2024)
@j23414 j23414 changed the title: Setting an outgroup for the "Dengue virus DENVx genotypes dataset → Setting an outgroup for the "Dengue virus DENVx genotypes" dataset (Jun 10, 2024)
@j23414 j23414 force-pushed the nextclade-genotype-set-outgroup branch 2 times, most recently from d317d0e to 03277a4 June 14, 2024 17:12
@j23414 j23414 marked this pull request as ready for review June 18, 2024 17:39
@j23414 j23414 requested a review from a team June 18, 2024 17:39

j23414 commented Jun 20, 2024

Merging this so I can continue working on feedback from Slack in a new issue and PR.

@j23414 j23414 force-pushed the nextclade-genotype-set-outgroup branch from 03277a4 to d3c39c7 June 20, 2024 20:14
@j23414 j23414 merged commit 84d6de2 into main Jun 20, 2024
32 checks passed
@j23414 j23414 deleted the nextclade-genotype-set-outgroup branch June 20, 2024 20:20