Sprint 9 Task List #421

akotlar · 2024-03-08T20:08:41Z

Webapp

Annotation

Test hg19_v8 database by comparing results, row by row, on trio_trim.vcf.gz between hg19_v8 and Dave's instance b10 database - March 15th @akotlar
Create hg38 database - March 11th @akotlar
Test hg38 database by comparing results, row by row, on trio_trim.vcf.gz between hg19_v8 and Dave's instance b10 database - March 18th @akotlar
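A minimal sketch of the row-by-row comparison described above, assuming both annotation runs have been exported as tab-separated text with rows in the same order. The function name `diff_rows` and the toy columns are hypothetical, not part of bystro.

```python
import csv
import io
from itertools import zip_longest


def diff_rows(old_tsv, new_tsv, delimiter="\t"):
    """Yield (row_number, old_row, new_row) for every row that differs.

    A row missing from one file is reported with None on that side.
    """
    old_reader = csv.reader(old_tsv, delimiter=delimiter)
    new_reader = csv.reader(new_tsv, delimiter=delimiter)
    pairs = zip_longest(old_reader, new_reader)
    for row_number, (old_row, new_row) in enumerate(pairs, start=1):
        if old_row != new_row:
            yield row_number, old_row, new_row


# Toy data standing in for two annotation outputs (illustrative only).
old = io.StringIO("chrom\tpos\tref\talt\nchr1\t100\tA\tG\nchr1\t200\tC\tT\n")
new = io.StringIO("chrom\tpos\tref\talt\nchr1\t100\tA\tG\nchr1\t200\tC\tA\n")
mismatches = list(diff_rows(old, new))
```

In practice the two inputs would be opened files, and an empty `mismatches` list would indicate the two databases produced identical output for the test VCF.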

Ancestry

Retrain ancestry model (hg38) with 76k gnomad loadings - @cristinaetrv - March 13th
Liftover array set - @cristinaetrv - March 13th
Liftover gnomad loadings - @cristinaetrv - March 14th
Add assembly to AncestryData, update bystro-web to submit assembly, and use assembly when choosing model (Send assembly version with ancestry job requests #419) - @akotlar - March 13th
Create ancestry docker container - @akotlar - March 14th
Add support for choosing best covariate set for ancestry - @akotlar - March 15th
Give Thomas new ancestry code to test on healthy aging study - March 20th @cristinaetrv
Add Ancestry table version that just has IDs and top hit superpops that can be expanded to larger table @akotlar
Before expanding, top hit for superpops is seen, after expanding format remains as before
Possible format: Sample ID | Top hit Superpop | Prob(Top hit superpop) --> Expand to see: Number of variants retained | Prob of other Superpops (5 columns) | Prob of Population 1, 2, 3 ...
For populations: Instead of adding all 26 populations as something you have to open for each one, add the columns as 'Prob(CEU), Prob(CDX)' and so on for each individual
Switch missingness to 'variants retained'
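A rough sketch of the collapsed/expanded table idea above, assuming each sample's result arrives as a mapping of superpopulation to probability. The function name, field names, and numbers are hypothetical and for illustration only; the per-population columns (Prob(CEU), Prob(CDX), ...) are left out for brevity.

```python
def summarize_sample(sample_id, superpop_probs, variants_retained):
    """Split one sample's result into a collapsed row and an expandable detail row."""
    # The top hit is the superpopulation with the highest probability.
    top_hit = max(superpop_probs, key=superpop_probs.get)
    collapsed = {
        "sample_id": sample_id,
        "top_hit_superpop": top_hit,
        "prob_top_hit_superpop": superpop_probs[top_hit],
    }
    expanded = {
        "variants_retained": variants_retained,
        "superpop_probs": dict(superpop_probs),
    }
    return collapsed, expanded


# Illustrative values, not real ancestry output.
collapsed, expanded = summarize_sample(
    "sample_1",
    {"AFR": 0.01, "AMR": 0.04, "EAS": 0.01, "EUR": 0.92, "SAS": 0.02},
    variants_retained=1500,
)
```

The collapsed row is what would be shown before expanding; the expanded row keeps the full detail so the existing format can still be rendered.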

Proteomics Data Handling / API

Finish download of proteomics data - March 12th @dlin30
Fix frontend upload of proteomic data - March 13th @dlin30
Add support for somascan upload in bystro webapp and api - March 14th @akotlar

Proteomics Statistics

Jupyter notebook demonstrating adjusting for batch effects on Adversarial PPCA on simulation data + a real dataset (@akotlar) - March 14th @austinTalbot7241993

Infrastructure

Make sure EBS scratch disk is very well provisioned, or use instance with 4TB SSD
Update bystro webapp documentation on how to use bystro, write API documentation, write library documentation - @akotlar - March 28th

PRS

Add AD GWAS summary statistics suggested by Thomas for C+T PRS - March 15th @cristinaetrv
Add readme for AD GWAS sum stats - March 18th @cristinaetrv -> Move to sprint 10
Add batch processing for PRS C+T workflow - March 25th -> Move to sprint 10
Finish PRS-CS standard way without Langevin Dynamics - @austinTalbot7241993 - March 28th.
Add support for covariates into PRS - @cristinaetrv - March 25th -> Backlogged for now, move to sprint 11/12

POE

Mike Epstein was positive on Austin's POE method. Address Mike's simulation suggestions - March 28th

…le (#427)

* Dynamically load model based on the assembly passed in the job request
* Cache up to 2 models to improve startup time

Stacked on 159bdcf

The commit for this PR: 1e64548

Also addresses #422
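One way to picture "load by assembly, cache up to 2 models" is with `functools.lru_cache`. This is only a sketch under that assumption, not the code from the PR: `get_model`, `SUPPORTED_ASSEMBLIES`, and the placeholder model dict are hypothetical.

```python
from functools import lru_cache

SUPPORTED_ASSEMBLIES = ("hg19", "hg38")  # assumption: one model per assembly
load_count = {"hg19": 0, "hg38": 0}  # bookkeeping for this demo only


@lru_cache(maxsize=2)
def get_model(assembly):
    """Load the model for an assembly, keeping at most 2 models in memory."""
    if assembly not in SUPPORTED_ASSEMBLIES:
        raise ValueError(f"unsupported assembly: {assembly!r}")
    load_count[assembly] += 1
    # Placeholder for an expensive load (e.g. fetching and deserializing a model).
    return {"assembly": assembly}


first = get_model("hg19")
second = get_model("hg19")  # served from the cache, no second load
other = get_model("hg38")
```

With two assemblies and a cache size of 2, each model is loaded at most once per process.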

* Don't lowercase HGVS clinvarVCF.CLNHGVS, because no one lowercases HGVS

…updates hg19 and hg38 dbs

…nd hg38 data (#431)

* Updates hg38.mapping.yaml and hg19.mapping.yaml to support our new hg19 and hg38 databases
* Updates hg38.clean.yml and hg19.clean.yml to match the bystro annotator definitions of our new databases

hg38.mapping.yml is now a separate definition, rather than a symlink to hg19.mapping.yml. It is identical to hg19.mapping.yml apart from the gnomad sections, which have to differ because gnomad v4 was used in the hg38 database.

…by outputting 'false'/'true' labels

* Adds somascan adat and annotation support, see https://github.com/SomaLogic/Canopy for documentation on adat and annotations file formats.

…alse'/'true' labels (#438)

* Make discordant an actual boolean field, by outputting 'false'/'true' labels

This makes it easier to search the field, as well as to import it as a boolean in Pandas/R.
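To illustrate why plain 'false'/'true' labels are convenient downstream, here is a small standard-library sketch that reads such a column into Python booleans. The column layout and the `read_discordant` helper are hypothetical; only the field name `discordant` and the two labels come from the commit message.

```python
import csv
import io

BOOL_LABELS = {"true": True, "false": False}


def read_discordant(tsv):
    """Return the 'discordant' column as a list of booleans."""
    reader = csv.DictReader(tsv, delimiter="\t")
    # A KeyError here means the column holds something other than 'true'/'false'.
    return [BOOL_LABELS[row["discordant"]] for row in reader]


# Toy annotation output (illustrative only).
sample = io.StringIO("chrom\tpos\tdiscordant\nchr1\t100\tfalse\nchr1\t200\ttrue\n")
flags = read_discordant(sample)
```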

* Add the actual somascan code, missed in #433

…ncestry Memory Usage (#449)

* Adds docker file for Bystro's python library
* Creates ancestry api and cli code for calculating ancestry scores
* Removes unneeded dependencies from Cargo.toml, to speed up builds
* Improves Makefile by introducing the ability to make production builds and install from wheel.
* Reduce ancestry memory usage by reading in sample chunks
* Cache ancestry scores to local disk to reduce S3 fetching

To test what is here:

```
docker pull akotlar/bystro-api
docker run -v /path/to/local/data:/data akotlar/bystro-api ancestry score --in /data/trio.trim.vep.vcf.gz --assembly hg19
```

[trio.trim.vep.vcf.gz](https://github.com/bystrogenomics/bystro/files/14730266/trio.trim.vep.vcf.gz)

The api function is a port of ancestry/listener.py handler_fn.
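The "reading in sample chunks" idea can be sketched generically as below. This is not the ancestry code from the PR, just an illustration of processing samples in fixed-size batches so only one batch is held in memory at a time; `chunked` and the sample IDs are hypothetical.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


# Hypothetical sample IDs, processed two at a time.
sample_ids = ["s1", "s2", "s3", "s4", "s5"]
batches = list(chunked(sample_ids, 2))
```

In a real pipeline each batch would be scored and written out before the next one is read, rather than collected into a list as done here for demonstration.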

akotlar · 2024-03-29T19:20:44Z

Updates on 2024-03-29

Alex:

IBDGC tasks are done
Somascan support in, but not yet threaded through to bystro-api (partially blocked by proteomics submission PR)
In process of bug fixing. Need to switch to a polling strategy for the currently viewed job (we're seeing evidence that our existing socketio implementation is dropping updates)
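A minimal sketch of the polling strategy mentioned above, assuming the client can ask for a job's current status on demand. The status names, `poll_job`, and the injected `fetch_status` callable are all hypothetical; the real webapp client is not shown here.

```python
import time

TERMINAL_STATES = {"completed", "failed"}  # hypothetical status names


def poll_job(fetch_status, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until the job reaches a terminal state.

    Raises TimeoutError if the job is still running after max_attempts polls.
    """
    status = None
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job still {status!r} after {max_attempts} polls")


# Simulated status sequence; sleeping is stubbed out so the demo runs instantly.
statuses = iter(["submitted", "started", "completed"])
result = poll_job(lambda: next(statuses), sleep=lambda seconds: None)
```

Unlike a push channel, a missed update here costs at most one polling interval, because the next poll reads the current state directly.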

Dennis: (has been sick)

Streaming proteomics works, will PR
Proteomics submission in progress

Austin:

Spectral alignment - imputation is in
Harmonizing datasets - in progress - goal is to get an outer join (TMT + somascan), harmonized, so that the data is jointly analyzable
PRS-CS - on back burner until proteomics is in
SSPCA - has experimental results on 2 datasets (neuroscience, we beat L1 regularization; so we can argue our merits from purely predictive results, not just generative results as now)
POE - we're running into a problem: parent-of-origin effect sizes are so tiny relative to the variance of our features that Gaussian mixture models end up being a poor fit. Our estimator will converge almost surely, but that is not practically relevant at our sample sizes. So can we do better than that, especially in taming the bias that is inflating the Type I (false positive) error rate?
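For the dataset harmonization item above, the "outer join" step can be sketched as follows: keep every key seen in either dataset and fill gaps with None. The key choice (a protein identifier), the helper name, and the values are assumptions for illustration; the harmonization of the measurements themselves is not shown.

```python
def outer_join(left, right, left_name, right_name):
    """Outer-join two {key: value} tables, using None where a side is missing."""
    keys = sorted(set(left) | set(right))
    return {
        key: {left_name: left.get(key), right_name: right.get(key)}
        for key in keys
    }


# Illustrative measurements keyed by a protein identifier (made-up values).
tmt = {"P01024": 0.8, "P02768": 1.2}
somascan = {"P01024": 950.0, "P05067": 410.0}
joined = outer_join(tmt, somascan, "tmt", "somascan")
```

Rows with a None on one side mark proteins measured on only one platform, which is where imputation or platform-specific handling would come in.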

akotlar · 2024-04-01T07:37:00Z

S3 Uploads now work, even for very large datasets (e.g. 100GB+ uploads).
bystro webapp documentation is updated, but I have not had a chance to add detailed library documentation

akotlar · 2024-04-01T18:50:00Z

Semi-automated AMIs have been made and deployed to IBDGC. They need 1 more pass of refinement before they can be taken down / brought up at will in autoscaling fashion; this is a sprint 10 task.

…array of structs

…ve Search Interface

akotlar added this to the Sprint 9 milestone Mar 8, 2024

akotlar mentioned this issue Mar 8, 2024

[ancestry] Use assembly when choosing which model to use #422

Open

akotlar changed the title ~~Sprint 9 IBDGC Task List~~ Sprint 9 Task List Mar 8, 2024

akotlar added a commit to akotlar/bystro that referenced this issue Mar 11, 2024

Issue bystrogenomics#421: Add assembly support for ancestry module

1e64548

akotlar added a commit to akotlar/bystro that referenced this issue Mar 12, 2024

Issu bystrogenomics#421: Improve OpenSearch relevance for HGVS queries

474e448

akotlar added a commit that referenced this issue Mar 12, 2024

Issue #421: Improve OpenSearch relevance for HGVS queries (#428)

fdd0f60

* Don't lowercase HGVS clinvarVCF.CLNHGVS, because no one lowercases HGVS

akotlar added a commit to akotlar/bystro that referenced this issue Mar 12, 2024

Issu bystrogenomics#421: Improve OpenSearch relevance for HGVS queries

c80fc8b

akotlar added a commit to akotlar/bystro that referenced this issue Mar 12, 2024

Issu bystrogenomics#421: Improve OpenSearch relevance for HGVS queries

025fa4e

akotlar added a commit to akotlar/bystro that referenced this issue Mar 12, 2024

Issu bystrogenomics#421: Improve OpenSearch relevance for HGVS queries

7039369

akotlar added a commit to akotlar/bystro that referenced this issue Mar 13, 2024

Issue bystrogenomics#421: Update opensearch gnomad mappings to match …

f70b29e

…updates hg19 and hg38 dbs

akotlar added a commit to akotlar/bystro that referenced this issue Mar 13, 2024

Issue bystrogenomics#421: Update opensearch gnomad mappings to match …

567fa56

…updates hg19 and hg38 dbs

akotlar added a commit to akotlar/bystro that referenced this issue Mar 14, 2024

Issue bystrogenomics#421: Ancestry model selection

9faf929

akotlar added a commit to akotlar/bystro that referenced this issue Mar 14, 2024

Issue bystrogenomics#421: Add somascan support

087894e

akotlar added a commit to akotlar/bystro that referenced this issue Mar 14, 2024

Issue bystrogenomics#421: Remove refSeq.clinvar annotation output

6f34ea4

akotlar added a commit to akotlar/bystro that referenced this issue Mar 15, 2024

Issue bystrogenomics#421: Ancestry model selection

e124256

akotlar added a commit to akotlar/bystro that referenced this issue Mar 15, 2024

Issue bystrogenomics#421: Remove refSeq.clinvar annotation output

66d23d4

akotlar added a commit to akotlar/bystro that referenced this issue Mar 18, 2024

Issue bystrogenomics#421: Makae disocrdant and actual boolean field, …

be2ba9b

…by outputting 'false'/'true' labels

akotlar added a commit that referenced this issue Mar 18, 2024

Issue #421: Add somascan support (#433)

77391e7

* Adds somascan adat and annotation support, see https://github.com/SomaLogic/Canopy for documentation on adat and annotations file formats.

akotlar added a commit to akotlar/bystro that referenced this issue Mar 19, 2024

Issue bystrogenomics#421: Add somascan support code

b63afee

akotlar added a commit that referenced this issue Mar 20, 2024

Issue #421: Add somascan code (#441)

28921cc

* Add the actual somascan code, missed in #433

cristinaetrv pushed a commit to cristinaetrv/bystro that referenced this issue Mar 20, 2024

Issue bystrogenomics#421: Add somascan support code

a2cab3e

cristinaetrv added the .task list label Mar 21, 2024

akotlar added a commit to akotlar/bystro that referenced this issue May 21, 2024

Issue bystrogenomics#421: Proxy Opensearch Request

e8eb55f

cristinaetrv closed this as completed May 21, 2024

akotlar added a commit to akotlar/bystro that referenced this issue May 23, 2024

issue bystrogenomics#421: improved querying interface, converting to …

54165d7

…array of structs

akotlar added a commit to akotlar/bystro that referenced this issue May 23, 2024

Issue bystrogenomics#421: Improved Proteomic Join Interface and Impro…

5eca9c4

…ve Search Interface

akotlar added a commit to akotlar/bystro that referenced this issue May 23, 2024

Issue bystrogenomics#421: Improved Proteomic Join Interface and Impro…

76df8cb

…ve Search Interface
