
Consistency checking between profile_dists and gas distance units #8

Open
apetkau opened this issue May 2, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@apetkau
Member

apetkau commented May 2, 2024

Issue

Currently, gas call and gas mcluster rely on distances calculated by profile_dists, and distance thresholds are set with the --threshold parameter. However, profile_dists can report distances in two different units: scaled (a number from 0 to 1) or hamming (a non-negative number). The units of the thresholds must match the units of the distances. For example:

  • If profile_dists uses scaled, thresholds must be between 0 and 1, e.g., --threshold 0.2,0.1
  • If profile_dists uses hamming, thresholds must be non-negative, e.g., --threshold 10,5,0

Solution

One way to help with error checking is to add a --distm parameter to gas that takes either hamming or scaled (the same values passed to profile_dists). gas would then use this parameter to validate the numbers passed to --threshold.
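A minimal sketch of how the proposed --distm check could validate --threshold, assuming gas parses its options with Python's argparse. The helper names parse_thresholds and build_parser are hypothetical and not part of gas; the rules follow the two bullets above.

```python
import argparse
from typing import List


def parse_thresholds(threshold_arg: str, distm: str) -> List[float]:
    """Parse a comma-separated --threshold value and check it against --distm.

    Hypothetical helper: 'hamming' thresholds must be non-negative whole
    numbers, 'scaled' thresholds must lie between 0 and 1 (inclusive).
    """
    if distm not in ("hamming", "scaled"):
        raise ValueError(f"unknown --distm value: {distm!r}")
    thresholds = []
    for token in threshold_arg.split(","):
        value = float(token)  # raises ValueError for non-numeric input
        if distm == "hamming" and (value < 0 or not value.is_integer()):
            raise ValueError(
                f"--distm hamming expects non-negative integer thresholds, got {token!r}"
            )
        if distm == "scaled" and not 0 <= value <= 1:
            raise ValueError(
                f"--distm scaled expects thresholds between 0 and 1, got {token!r}"
            )
        thresholds.append(value)
    return thresholds


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wiring of the proposed options; gas's real CLI may differ.
    parser = argparse.ArgumentParser(prog="gas")
    parser.add_argument("--distm", choices=["hamming", "scaled"], required=True)
    parser.add_argument("--threshold", required=True)
    return parser
```

With this sketch, parse_thresholds("0.2,0.1", "hamming") raises a ValueError, which catches the mismatch described above. It cannot flag thresholds that are valid in both units, such as --threshold 1,0.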

@apetkau apetkau added the enhancement New feature or request label May 2, 2024
@apetkau
Member Author

apetkau commented May 2, 2024

Other consistency checks could be:

  • If --distm hamming, then all distance values from profile_dists (and all passed thresholds) should be integers.
  • If --distm scaled, then all distance values from profile_dists should be between 0 and 1.

I'm not sure how much extra run time it would add to check every calculated distance value, though.
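To give a sense of what checking the calculated distances would involve, here is a rough sketch assuming the distances have already been read into Python numbers. The function first_inconsistent_distance is hypothetical. It makes one pass over the values and stops at the first one that does not fit the unit, so the extra work is one comparison per distance; how that compares with the time spent computing the distances has not been measured.

```python
from typing import Callable, Iterable, Optional


def first_inconsistent_distance(
    distances: Iterable[float], distm: str
) -> Optional[float]:
    """Return the first distance that does not fit the --distm unit, or None.

    Hypothetical helper: a single pass that stops at the first bad value.
    """
    fits: Callable[[float], bool]
    if distm == "hamming":
        # Hamming distances are counts: non-negative whole numbers.
        fits = lambda d: d >= 0 and float(d).is_integer()
    elif distm == "scaled":
        # Scaled distances are fractions between 0 and 1 (inclusive).
        fits = lambda d: 0 <= d <= 1
    else:
        raise ValueError(f"unknown --distm value: {distm!r}")
    for d in distances:
        if not fits(d):
            return d
    return None
```

Because it accepts any iterable, the same check could run on values as they are streamed from the distance file instead of on a fully loaded list.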

@apetkau
Member Author

apetkau commented May 2, 2024

Alternatively, the profile_dists output could include the unit in the distance column name. That is:

query_id ref_id dist_scaled
A A 0
A B 0.5

Then passing --distm scaled to gas would make it read only from the dist_scaled column (and similarly for --distm hamming).
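A sketch of the column-selection idea, assuming the profile_dists output is tab-separated with a header row and that the distance column is named dist_<unit>. Only dist_scaled appears in the example above; the matching dist_hamming name and the read_distances helper are hypothetical. A file produced with the other unit fails immediately because the expected column is missing.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_distances(handle: TextIO, distm: str) -> Iterator[Tuple[str, str, float]]:
    """Return (query_id, ref_id, distance) rows from a tab-separated table.

    Assumes the distance column is named after its unit, e.g. 'dist_scaled';
    the matching 'dist_hamming' name is hypothetical.
    """
    column = f"dist_{distm}"
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []  # reads the header row now
    if column not in fieldnames:
        # Fail early if the file's unit does not match --distm.
        raise ValueError(f"--distm {distm} expects a {column!r} column, found {fieldnames}")
    # Rows are parsed lazily, so large distance files are streamed.
    return ((row["query_id"], row["ref_id"], float(row[column])) for row in reader)


if __name__ == "__main__":
    sample = io.StringIO("query_id\tref_id\tdist_scaled\nA\tA\t0\nA\tB\t0.5\n")
    for query_id, ref_id, dist in read_distances(sample, "scaled"):
        print(query_id, ref_id, dist)
```

Encoding the unit in the header moves the consistency check from every value to a single header lookup.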
