Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add states() method to return actual genotype states as characters #2617

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hyanwong
Copy link
Member

@hyanwong hyanwong commented Oct 30, 2022

Description

Adds variant.genotype_values(). See tskit-dev/tsinfer#739

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@codecov
Copy link

codecov bot commented Oct 30, 2022

Codecov Report

Merging #2617 (3b884b0) into main (cba6922) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2617   +/-   ##
=======================================
  Coverage   93.93%   93.93%           
=======================================
  Files          28       28           
  Lines       27762    27773   +11     
  Branches     1279     1283    +4     
=======================================
+ Hits        26078    26089   +11     
  Misses       1647     1647           
  Partials       37       37           
Flag Coverage Δ
c-tests 92.25% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 71.89% <9.09%> (-0.05%) ⬇️
python-tests 98.98% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/tskit/genotypes.py 99.13% <100.00%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cba6922...3b884b0. Read the comment docs.

@hyanwong
Copy link
Member Author

I'm not sure why this is failing CI: everything is covered AFAICS.

Ready to merge, I reckon, pending approval of the name of the function.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - not sure about the name though.

python/tskit/genotypes.py Outdated Show resolved Hide resolved
@jeromekelleher
Copy link
Member

Also not clear that an Object array is what people would want. Would it be better to have a "missing data symbol" or something like we do for haplotypes?

@hyanwong
Copy link
Member Author

hyanwong commented Nov 1, 2022

Also not clear that an Object array is what people would want. Would it be better to have a "missing data symbol" or something like we do for haplotypes?

ISWYM, but using any particular character will mean that we can't use that for an allele type (which we might conceivably want to do). This is only going to bite for missing data, so I don't think it's too bad TBH.

@jeromekelleher
Copy link
Member

Hmm - how about writing some client code for this with and without missing data? Something like the verify function in tsinfer would be a good one. We definitely don't want to have to have different code paths for dtypes with and without missing data.

We use "N" as the default missing data character elsewhere and its fine. We should be able to use this function to compare easily with e.g. alignments.

Is the np.array(list(ts.alignments()) equal to np.array(var.states() for var in ts.variants()).T with and without missing data?

@benjeffery
Copy link
Member

@hyanwong Would you like this to be in 0.5.4?

@hyanwong
Copy link
Member Author

hyanwong commented Jan 4, 2023

Ah, good point. There's no hurry from my PoV. Is it just a question of my changing the name? I can do that easily enough.

@jeromekelleher
Copy link
Member

I don't think we should hurry this in as we don't want to get stuck with Object arrays unless we have to. They have subtly different semantics to arrays of unicode.

We should have some example client code for this before committing ourselves to a particular representation.

@hyanwong
Copy link
Member Author

hyanwong commented Jan 8, 2023

Sure. I have updated this so it should never return an object array anyway. But no hurry to get this in. It was to help @szhan and others who spent a fair bit of time comparing integer arrays between tree sequence before realising that they weren't actually comparable.

@hyanwong hyanwong changed the title Add genotype_values() method Add states() method to return actual genotype states as characters Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants