Improve serotype assignment in Dengue virus DENVx genotypes datasets #70

Open
j23414 opened this issue Jun 25, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Contributor

j23414 commented Jun 25, 2024

Context

As flagged by @rneher in a Slack message, the clade assignments in the Dengue virus DENVx genotypes datasets could be further improved. For example, for DENV1:

  1. DENV2 samples that align are correctly placed onto the outgroup node and marked as unassigned. (good!)
  2. However, DENV1 samples that don't belong to an annotated genotype are also marked as unassigned, which is arguably incorrect. (This could be improved!) An example is shown below:

image

Description

These samples should be assigned to the DENV1 serotype without a specific genotype, rather than being marked as unassigned. To illustrate this group of samples visually, we aim to reduce the number of samples in the magenta region of the table:

[Screenshot 2024-06-25 at 9:50:29 AM]

Possible solution

To ensure accurate serotype assignment while still allowing for true-negative genotype assignments, I'm currently planning the following steps:

  1. In the dengue/all tree, identify the amino acid mutations from the dengue/all reconstructed root to the reconstructed root of each serotype.
  2. In each dengue/denv* tree, locate the amino acid mutations from the serotype reconstructed root to the outgroup dengue/all reconstructed root, and correct the coordinates accordingly.
  3. Add the corrected coordinates of the amino acid mutations to each of the clades_genotype_denv*.tsv files, using the serotype name (e.g., DENV1) as the identifier.
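
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it assumes the trees are Auspice v2 JSON (amino acid changes under `branch_attrs.mutations`, keyed by gene) and that the clades files use the `clade`, `gene`, `site`, `alt` columns read by `augur clades`. The node names, the toy tree, and the per-gene `site_offset` used to "correct the coordinates" are all hypothetical; a real coordinate correction may need an alignment rather than a constant offset.

```python
import csv
import io
import re

# Matches mutation strings such as "K47R" (ancestral state, site, derived state).
MUTATION = re.compile(r"^([A-Z*-])(\d+)([A-Z*-])$")


def path_to_node(node, target_name, path=()):
    """Return the nodes from `node` down to the node named `target_name`, or None."""
    path = path + (node,)
    if node.get("name") == target_name:
        return path
    for child in node.get("children", []):
        found = path_to_node(child, target_name, path)
        if found is not None:
            return found
    return None


def net_aa_mutations(path):
    """Net amino acid changes along a root-to-node path, keyed by (gene, site)."""
    net = {}
    for node in path[1:]:  # skip the starting root's own branch
        mutations = node.get("branch_attrs", {}).get("mutations", {})
        for gene, changes in mutations.items():
            if gene == "nuc":  # nucleotide changes are not amino acid mutations
                continue
            for change in changes:
                match = MUTATION.match(change)
                if match is None:
                    continue
                ancestral, site, derived = match.groups()
                key = (gene, int(site))
                # Keep the first ancestral state seen; update the derived state.
                first_ancestral = net[key][0] if key in net else ancestral
                net[key] = (first_ancestral, derived)
    # Drop sites that reverted to their starting state along the path.
    return {k: v for k, v in net.items() if v[0] != v[1]}


def clade_rows(clade, net, site_offset=None):
    """Build clades-TSV rows, shifting sites by a hypothetical per-gene offset."""
    site_offset = site_offset or {}
    return [
        {"clade": clade, "gene": gene, "site": site + site_offset.get(gene, 0), "alt": derived}
        for (gene, site), (_ancestral, derived) in sorted(net.items())
    ]


def to_tsv(rows):
    """Serialize rows with the assumed clade/gene/site/alt header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["clade", "gene", "site", "alt"],
        delimiter="\t", lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy stand-in for the dengue/all tree; names and mutations are made up.
toy_tree = {
    "name": "ROOT",
    "children": [
        {
            "name": "NODE_1",
            "branch_attrs": {"mutations": {"nuc": ["A10G"], "E": ["K47R"]}},
            "children": [
                {
                    "name": "DENV1_ROOT",
                    "branch_attrs": {"mutations": {"E": ["R47K", "S88T"], "NS1": ["L5F"]}},
                    "children": [],
                }
            ],
        },
        {"name": "DENV2_ROOT", "branch_attrs": {"mutations": {"E": ["S88A"]}}, "children": []},
    ],
}

path = path_to_node(toy_tree, "DENV1_ROOT")
net = net_aa_mutations(path)
rows = clade_rows("DENV1", net, site_offset={"NS1": 2})
tsv_text = to_tsv(rows)
```

In this toy tree the E:47 change is reverted on the way to the serotype root, so it is dropped and only the net changes end up as DENV1 rows. Whether reversions should be collapsed this way is a design choice that would need checking against the real trees.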

After implementing these changes:

  • All DENV1 samples should be assigned to the DENV1 serotype, even if they don't belong to a specific genotype.
  • Samples from other serotypes (e.g., DENV2) should still be correctly marked as unassigned.
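
The expected outcome can be written down as a small check. The sketch below uses a deliberately simplified model in which a label applies when a sample carries all of its defining amino acids, and the most specific matching label wins. This is an assumption made for illustration only and is not how Nextclade or `augur clades` assign clades in practice (those work through the reference tree). The sites, amino acids, and the `genotype_X` name are hypothetical.

```python
def assign(sample_aa, definitions, order):
    """Return the most specific label whose defining (gene, site, alt) entries all match.

    `order` lists labels from most general (serotype) to most specific (genotype).
    """
    label = "unassigned"
    for name in order:
        if all(sample_aa.get((gene, site)) == alt for gene, site, alt in definitions[name]):
            label = name
    return label


# Hypothetical definitions: a serotype-level entry plus one genotype nested inside it.
definitions = {
    "DENV1": [("E", 88, "T")],
    "genotype_X": [("E", 88, "T"), ("NS1", 5, "F")],
}
order = ["DENV1", "genotype_X"]

denv1_in_genotype = {("E", 88): "T", ("NS1", 5): "F"}
denv1_no_genotype = {("E", 88): "T", ("NS1", 5): "L"}
denv2_sample = {("E", 88): "A", ("NS1", 5): "L"}

results = {
    "denv1_in_genotype": assign(denv1_in_genotype, definitions, order),
    "denv1_no_genotype": assign(denv1_no_genotype, definitions, order),
    "denv2_sample": assign(denv2_sample, definitions, order),
}
```

Under this model, the serotype-level entry is what lets a DENV1 sample outside any annotated genotype fall back to `DENV1`, while a DENV2 sample matches nothing and stays `unassigned`.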

Of course, I'm open to other suggestions or guidance here.

j23414 added the enhancement (New feature or request) label Jun 25, 2024