3. Running Software and Suggested Pipeline
PAINTOR -input [input filename] -in [input directory] -out [output directory] -Zhead [Zscore header(s)] -LDname [LD suffix(es)] -annotations [annotation1,annotation2...] <other options>
-input
(required) Filename of the input file containing the list of the fine-mapping loci [default: N/A]
-Zhead
(required) Name(s) of the Z-score column(s) in the header of the locus file (comma separated) [default: N/A]
-LDname
(required) Suffix(es) for LD files. Must be given in the same order as the Z-score headers passed to -Zhead (comma separated) [default: N/A]
-annotations
The names of the annotations to include in the model (comma separated) [default: N/A]
-enumerate
Enumerate all possible causal configurations. Follow the flag with the maximum number of causal SNPs (e.g. -enumerate 3 considers up to 3 causal SNPs at each locus) [default: not specified]
-in
Input directory with all run files [default: ./ ]
-out
Output directory where output will be written [default: ./ ]
-Gname
Output Filename for enrichment estimates [default: Enrichment.Estimate]
-Lname
Output Filename for the final sum of log bayes factors [default: Log.BayesFactor]
-RESname
Suffix for results output files [default: results]
-ANname
Suffix for annotation files [default: annotations]
-MI
Maximum number of iterations for the algorithm to run [default: 10]
-GAMinital
Initialize the enrichment parameters to a pre-specified value (comma separated) [default: 0,...,0]
-variance
Prior variance on the causal effect sizes, scaled by sample size [default: 30]
-num_samples
Number of samples to draw for each locus [default: 1000000]
-set_seed
Integer seed for the random number generator [default: clock time at execution]
-max_causal
Number of causal variants used to pre-compute enrichments [default: 2]
By default, PAINTOR performs approximate inference using importance sampling with 1 million draws per locus (set with the -num_samples flag). The algorithm first infers the enrichment parameters by enumeration under the assumption of 2 causal variants per locus (change this with the -max_causal flag). It then runs one round of importance sampling to compute the posterior probability that each SNP is causal.
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -annotations Coding,DHS
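The sampling defaults described above can also be written out explicitly. The following is a minimal sketch, not part of PAINTOR: `paintor_sampling_cmd` is a hypothetical dry-run helper that only prints a command line, and the seed value 42 is an arbitrary choice made so that repeated runs draw the same samples. The `-flag value` syntax is assumed to match the `-enumerate 3` form shown in the option list.

```shell
# Hypothetical helper (not part of PAINTOR): print an importance-sampling
# command with the sampling-related flags spelled out.
paintor_sampling_cmd() {
  num_samples="${1:-1000000}"  # draws per locus (documented default)
  max_causal="${2:-2}"         # causals used to pre-compute enrichments
  seed="${3:-42}"              # arbitrary fixed seed (assumption, for reproducibility)
  echo "./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1" \
       "-in RunDirectory/ -out OutDirectory/ -annotations Coding,DHS" \
       "-num_samples ${num_samples} -max_causal ${max_causal} -set_seed ${seed}"
}

# Dry run: show the command that would be executed with the defaults.
paintor_sampling_cmd
```

Piping the printed line to `sh` would run it. Fixing the seed matters only for the sampling-based mode, since the default seed is the clock time at execution.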
For moderately sized loci and a reasonable number of causal variants, you can instead run full enumeration, in which every possible model is considered. To do so, add the -enumerate [number of causals] flag to the command:
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -enumerate 3 -annotations Coding,DHS
If loci contain fewer than 500 SNPs, running with full enumeration is reasonable and generally recommended, as it will likely give the most accurate results.
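One way to apply the 500-SNP guideline is to count the SNPs in each locus before choosing a mode. The sketch below rests on assumptions that are not stated in this section: the input file lists one locus name per line, each locus file sits in the run directory under that name, and each locus file has one header line followed by one SNP per line. The function names `locus_sizes` and `suggest_mode` are hypothetical.

```shell
# Print "<locus> <SNP count>" for every locus named in the input list.
# Assumes one header line per locus file, then one SNP per line.
locus_sizes() {
  input_list="$1"   # e.g. input.files, one locus name per line
  run_dir="$2"      # e.g. RunDirectory
  while IFS= read -r locus || [ -n "$locus" ]; do
    [ -n "$locus" ] || continue   # skip blank lines
    n_snps=$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$run_dir/$locus")
    echo "$locus $n_snps"
  done < "$input_list"
}

# Suggest full enumeration only when every locus has fewer than 500 SNPs.
suggest_mode() {
  largest=$(locus_sizes "$1" "$2" | awk '$2 > m { m = $2 } END { print m + 0 }')
  if [ "$largest" -lt 500 ]; then
    echo "enumerate"
  else
    echo "sampling"
  fi
}
```

For example, `suggest_mode input.files RunDirectory` prints `enumerate` when the largest locus is below the threshold and `sampling` otherwise.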
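For multi-population or multi-trait fine-mapping, pass one Z-score header and one LD suffix per population or trait, comma separated and in matching order:
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1,ZSCORE.P2 -LDname LD1,LD2 -in RunDirectory/ -out OutDirectory/ -enumerate 3 -annotations Coding,DHS
Approximate inference is also applicable to multi-population/multi-trait fine-mapping.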
In order to determine which annotations are relevant to the phenotype being considered, we recommend running PAINTOR on each annotation independently.
Example: Pipeline for a pool of 100 annotations for a single population.
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -enumerate 2 -Gname Enrich.Base -Lname BF.Base
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -enumerate 2 -annotations A1 -Gname Enrich.A1 -Lname BF.A1
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -enumerate 2 -annotations A2 -Gname Enrich.A2 -Lname BF.A2
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -enumerate 2 -annotations A3 -Gname Enrich.A3 -Lname BF.A3
.
.
.
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -enumerate 2 -annotations A100 -Gname Enrich.A100 -Lname BF.A100
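Rather than typing 101 commands by hand, the runs above can be generated in a loop. This is a minimal sketch that assumes the annotations really are named A1 through A100 as in the example; `marginal_run_cmds` is a hypothetical dry-run helper that prints the command lines instead of executing them.

```shell
# Print the baseline command plus one command per annotation A1..A<n>.
# Hypothetical dry-run helper: nothing is executed here.
marginal_run_cmds() {
  n_annot="$1"
  common="./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -enumerate 2"
  # Baseline model with no annotations.
  echo "$common -Gname Enrich.Base -Lname BF.Base"
  # One marginal model per annotation, each with its own output names.
  i=1
  while [ "$i" -le "$n_annot" ]; do
    echo "$common -annotations A$i -Gname Enrich.A$i -Lname BF.A$i"
    i=$((i + 1))
  done
}

# Preview the first two commands.
marginal_run_cmds 100 | head -n 2
```

Once the printed commands look right, `marginal_run_cmds 100 | sh` would run them one after another. Giving every run its own -Gname and -Lname keeps the outputs from overwriting each other.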
After obtaining the marginal output for every annotation, prioritize annotations by how much each one improves the model fit over the baseline run without annotations. For the final model, take the top annotations (usually no more than 4 or 5) that are roughly uncorrelated with one another. We recommend using a correlation matrix of the annotations for this step. Then use those annotations in a final model to compute trait-specific posterior probabilities for causality.
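The ranking step can be sketched as follows. It assumes that each file written via -Lname (BF.Base, BF.A1, ...) holds a single number, the sum of log Bayes factors, on its first line, and it measures improvement as the simple difference from the baseline value. Both the file layout and the scoring rule are assumptions made for illustration, and `rank_annotations` is a hypothetical helper name.

```shell
# Rank annotations by the gain in summed log Bayes factor over the baseline.
# Assumes each BF.* file holds one number on its first line.
rank_annotations() {
  out_dir="$1"   # e.g. OutDirectory
  base=$(awk 'NR == 1 { print $1 }' "$out_dir/BF.Base")
  for f in "$out_dir"/BF.A*; do
    [ -e "$f" ] || continue   # no matching files: skip the literal pattern
    name=${f##*/BF.}          # strip directory and "BF." prefix, e.g. "A20"
    awk -v name="$name" -v base="$base" \
      'NR == 1 { printf "%s\t%.4f\n", name, $1 - base }' "$f"
  done | sort -k2,2nr          # largest improvement first
}
```

Running `rank_annotations OutDirectory | head -n 10` lists the ten annotations with the largest gain. The top of this list still needs to be checked for correlated annotations before choosing the final set.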
Note: it is also possible to do model selection using alternative approaches such as stratified LD-score regression. This has the advantage of learning the relevant functional data from the entire genome, as opposed to restricting to just the significant GWAS risk loci.
Final run to obtain posteriors
$> ./PAINTOR -input input.files -Zhead ZSCORE.P1 -LDname LD1 -in RunDirectory/ -out OutDirectory/ -annotations A5,A20,A93 -Gname Enrich.Final -Lname BF.Final