Variant method to get all alleles as a string #2181
Comments
Note that some of the changes made to the test suite for #2172 can be made simpler using this functionality.
Might it be nice to have a general method:
Good idea - I guess the distinction is whether you want the returned value to be a numpy array or a string. There's definitely value in getting a numpy array back too.
@hyanwong - any thoughts on names here? I feel like this is quite a handy feature and we should support "encoding" the variant data in string or numpy format. We definitely shouldn't call it "encode" though, as we already have the "decode" method, which does something quite different.
The reason I wanted this was to be able to compare sites from different encodings of the same tree sequence (which could therefore have alleles in a different order). I "hacked" around this in tskit-dev/tsinfer#652 (comment) by using
Re naming: by the way, you say "encode", but I wonder if most people would think of this as a decoding of the indexing scheme? I can't think of any brilliant alternatives which avoid "encode/decode", but here are some ideas:
p.s. a numpy array would be great: then you wouldn't need to bomb out on non-single-character alleles, I guess?
This is what #2617 is about. In that PR we simply return a numpy array of strings, and after discussion, called this
Fixed in 0.6.0
Fixed in #2617
There are often times when we'd like to get the genotypes for a given site as the actual alleles rather than the indexes into the alleles list. (This is what #2168 was about, I assume.)
We can add this as a method to the Variant class easily enough. We can just raise an error if there are any non-1 length alleles in there.
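To make the idea concrete, here is a minimal sketch in plain Python. It is not the tskit implementation: the function name genotypes_as_string is hypothetical, and it takes the alleles tuple and genotype indexes as plain arguments (assumed to correspond to the alleles and genotypes of a Variant) rather than being a method on the class.

```python
def genotypes_as_string(alleles, genotypes):
    """
    Hypothetical helper: map each genotype index to its allele and join
    the results into one string with one character per sample.

    `alleles` is a sequence of allele strings and `genotypes` is a
    sequence of integer indexes into `alleles`.
    """
    for allele in alleles:
        # Refuse anything that is not exactly one character, so that
        # position i in the output always corresponds to sample i.
        # A None entry (assumed here to mark missing data) is rejected too.
        if allele is None or len(allele) != 1:
            raise ValueError(
                f"Cannot represent allele {allele!r} as a single character"
            )
    return "".join(alleles[g] for g in genotypes)


# Example: two alleles, four samples.
print(genotypes_as_string(("A", "T"), [0, 1, 1, 0]))  # prints ATTA
```

Raising on multi-character alleles keeps the one-character-per-sample invariant. Returning an array of strings instead would avoid that restriction.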
What do we call it?
This is also the replacement for the old as_bytes option, which basically did the same thing. It might look something like