Sprint 9 Task List #421
…updates hg19 and hg38 dbs
…nd hg38 data (#431)
* Updates hg38.mapping.yaml and hg19.mapping.yaml to support our new hg19 and hg38 databases
* Updates hg38.clean.yml and hg19.clean.yml to match the Bystro annotator definitions of our new databases

hg38.mapping.yml is now a separate definition, rather than a symlink to hg19.mapping.yml. It is identical to hg19.mapping.yml except for the gnomad sections, which must differ because gnomad v4 was used in the hg38 database.
* Adds SomaScan ADAT and annotation file support; see https://github.com/SomaLogic/Canopy for documentation on the ADAT and annotations file formats.
…alse'/'true' labels (#438)
* Make discordant an actual boolean field by outputting 'false'/'true' labels

This makes it easier to search the field, as well as to import it as a boolean in Pandas/R.
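Because the field now holds the literal labels 'false'/'true', a reader can map them to real booleans without guessing at an encoding. A minimal stdlib sketch, assuming the annotation output is tab-separated with a header row that includes a `discordant` column; the function names are hypothetical and not part of Bystro:

```python
import csv
from typing import Dict, Iterable, Iterator

# Only these two literal labels are accepted; anything else is an error.
_BOOL_LABELS = {"true": True, "false": False}


def parse_bool_label(value: str) -> bool:
    """Convert a 'true'/'false' label to a Python bool."""
    try:
        return _BOOL_LABELS[value]
    except KeyError:
        raise ValueError(f"not a boolean label: {value!r}") from None


def read_discordant(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield rows of a tab-separated file (header row first),
    with the `discordant` column converted to a bool."""
    for row in csv.DictReader(lines, delimiter="\t"):
        parsed: Dict[str, object] = dict(row)
        parsed["discordant"] = parse_bool_label(row["discordant"])
        yield parsed
```

`read_discordant` accepts any iterable of lines, including an open file handle. Unexpected values raise instead of silently becoming `False`, so a column that does not use the two labels fails loudly.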
…ncestry Memory Usage (#449)
* Adds a Dockerfile for Bystro's Python library
* Creates ancestry API and CLI code for calculating ancestry scores
* Removes unneeded dependencies from Cargo.toml to speed up builds
* Improves the Makefile by adding the ability to make production builds and install from a wheel
* Reduces ancestry memory usage by reading in samples in chunks
* Caches ancestry scores to local disk to reduce S3 fetching

To test what is here:

```
docker pull akotlar/bystro-api
docker run -v /path/to/local/data:/data akotlar/bystro-api ancestry score --in /data/trio.trim.vep.vcf.gz --assembly hg19
```

[trio.trim.vep.vcf.gz](https://github.com/bystrogenomics/bystro/files/14730266/trio.trim.vep.vcf.gz)

The API function is a port of `handler_fn` in ancestry/listener.py.
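The idea behind reading samples in chunks can be illustrated with a small sketch. It assumes, for illustration only, that ancestry scoring starts by projecting each sample's genotype dosages onto the gnomad PC loadings, and that a hypothetical `read_samples` callable loads dosages for just the requested sample IDs; neither detail is taken from the Bystro code. Only one chunk of samples' genotypes is held in memory at a time:

```python
from typing import Callable, Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical reader: returns {sample_id: [dosage per variant]} for the requested samples.
SampleReader = Callable[[Sequence[str]], Dict[str, List[float]]]


def chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def project_in_chunks(
    sample_ids: Sequence[str],
    read_samples: SampleReader,
    loadings: List[List[float]],  # one row per variant, one column per PC
    chunk_size: int = 1000,
) -> Dict[str, List[float]]:
    """Project each sample's dosage vector onto the PC loadings,
    materializing at most `chunk_size` samples' genotypes at once."""
    if not loadings:
        raise ValueError("loadings must contain at least one variant")
    n_pcs = len(loadings[0])
    scores: Dict[str, List[float]] = {}
    for chunk in chunks(sample_ids, chunk_size):
        dosages = read_samples(chunk)  # only this chunk is loaded
        for sample_id in chunk:
            vec = dosages[sample_id]
            if len(vec) != len(loadings):
                raise ValueError(
                    f"{sample_id}: expected {len(loadings)} variants, got {len(vec)}"
                )
            # Dot product of the dosage vector with each PC's loading column.
            scores[sample_id] = [
                sum(d * row[pc] for d, row in zip(vec, loadings))
                for pc in range(n_pcs)
            ]
        # `dosages` is rebound on the next iteration, so the previous chunk can be freed.
    return scores
```

Peak memory then scales with `chunk_size` rather than with the total number of samples, while the final scores are the same for any chunk size.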
Updates on 2024-03-29
Alex:
Dennis: (has been sick)
Austin:
* S3 uploads now work, even for very large datasets (e.g. 100 GB+ uploads).
* Semi-automated AMIs have been made and deployed to IBDGC. They need one more pass of refinement before they can be taken down and brought up at will in autoscaling fashion (a Sprint 10 task).
…ve Search Interface
Webapp
Annotation
Ancestry
Retrain ancestry model (hg38) with 76k gnomad loadings - @cristinaetrv - March 13th
Liftover array set - @cristinaetrv - March 13th
Liftover gnomad loadings - @cristinaetrv - March 14th
Add assembly to AncestryData, update bystro-web to submit assembly, and use assembly when choosing model (Send assembly version with ancestry job requests #419) - @akotlar - March 13th
Create ancestry docker container - @akotlar - March 14th
Add support for choosing the best covariate set for ancestry - @akotlar - March 15th
Give Thomas the new ancestry code to test on the healthy aging study - @cristinaetrv - March 20th
Add a version of the ancestry table that shows only sample IDs and top-hit superpopulations, expandable to the full table - @akotlar
Before expanding, only the top-hit superpopulation is shown; after expanding, the format remains as before
Possible format: Sample ID | Top hit Superpop | Prob(Top hit superpop) --> Expand to see: Number of variants retained | Prob of other Superpops (5 columns) | Prob of Population 1, 2, 3, ...
For populations: instead of making each of the 26 populations something you have to open individually, add them as columns ('Prob(CEU)', 'Prob(CDX)', and so on) for each individual
Switch missingness to 'variants retained'
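The collapsed and expanded table views described above can be sketched as two functions over one per-sample record. The `AncestryRow` type and its field names are hypothetical (they are not the fields of `AncestryData`), and the sketch assumes the superpopulation and population probabilities are already available as name-to-probability maps:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AncestryRow:
    """Hypothetical per-sample record; field names are illustrative only."""
    sample_id: str
    variants_retained: int
    superpop_probs: Dict[str, float]  # e.g. AFR, AMR, EAS, EUR, SAS
    pop_probs: Dict[str, float]       # e.g. CEU, CDX, ... (26 populations)


def collapsed_row(r: AncestryRow) -> Tuple[str, str, float]:
    """Collapsed view: Sample ID | Top hit Superpop | Prob(Top hit superpop)."""
    if not r.superpop_probs:
        raise ValueError(f"{r.sample_id}: no superpopulation probabilities")
    top = max(r.superpop_probs, key=r.superpop_probs.__getitem__)
    return r.sample_id, top, r.superpop_probs[top]


def expanded_columns(r: AncestryRow) -> Dict[str, object]:
    """Expanded view: variants retained, then one Prob(<name>) column per
    superpopulation and per population, each group sorted by name."""
    cols: Dict[str, object] = {"Variants retained": r.variants_retained}
    for name in sorted(r.superpop_probs):
        cols[f"Prob({name})"] = r.superpop_probs[name]
    for name in sorted(r.pop_probs):
        cols[f"Prob({name})"] = r.pop_probs[name]
    return cols
```

Sorting the column names keeps the expanded layout identical across samples; `max` breaks ties by taking the first superpopulation in the record's insertion order.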
Proteomics Data Handling / API
Proteomics Statistics
Infrastructure
PRS
POE