For a list of available instances of PheWeb, navigate here. For a walk-through demo see here. If you have questions or comments, check out our Google Group.

How to Cite PheWeb

If you use the PheWeb code base for your work, please cite our paper:

Gagliano Taliun, S.A., VandeHaar, P. et al. Exploring and visualizing large-scale genetic associations by using PheWeb. Nat Genet 52, 550–552 (2020).

How to Build a PheWeb for your Data

If any of these steps is broken, open an issue on GitHub and hopefully I can help.

1. Install PheWeb

pip3 install pheweb

If that doesn't work, follow the detailed install instructions.

2. Create a directory and `config.py` for your new dataset

mkdir ~/my-new-pheweb && cd ~/my-new-pheweb

This directory will store all the files pheweb makes for your dataset. All pheweb ... commands should be run in this directory.

Make config.py in this directory. In it, either set hg_build_number = 19 or hg_build_number = 38. Other options you can set are listed here.
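As a minimal sketch, a `config.py` containing only the required setting could look like the following. The value 38 is an arbitrary choice for illustration; use 19 if your summary statistics are on hg19/GRCh37.

```python
# config.py (minimal sketch)
# Reference genome build that your summary statistics use: 19 or 38.
hg_build_number = 38
```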

3. Check that your GWAS summary statistics files will work

You need one file for each phenotype. Most common GWAS file formats should work. Here are the requirements:

It needs a header row.
Columns can be delimited by tabs, spaces, or commas.
It needs a column for the reference allele (which must always match the bases on the reference genome that you specified with hg_build_number) and a column for the alternate allele. If you have a MARKER_ID column like 1:234_C/G, that's okay too. If you have allele1 and allele2 columns and either one may be the reference, then you'll need to modify your files.
It can be gzipped if you want.
Variants must be sorted by chromosome and position, with chromosomes in the order [1-22,X,Y,MT].
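The sorting requirement can be checked before loading. The sketch below is a hypothetical helper, not part of PheWeb: `chrom_rank` and `is_sorted` are names made up for illustration, and it assumes that `M` and `MT` both sort last and that a `chr` prefix can be ignored.

```python
# Hypothetical pre-flight check, not part of PheWeb.
# Chromosome order follows the rule above: 1-22, X, Y, MT.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25, "M": 25})  # assumption: M == MT

def chrom_rank(chrom):
    """Map '1', 'chr1', 'X', 'MT', ... to a sortable integer."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return CHROM_ORDER[name.upper()]

def is_sorted(variants):
    """Return True if (chrom, pos) pairs are in chromosome-then-position order."""
    keys = [(chrom_rank(c), int(p)) for c, p in variants]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

For example, `is_sorted([("1", 100), ("1", 200), ("2", 50), ("X", 10)])` is true, while a file that lists chromosome 2 before chromosome 1 fails the check.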

The file must have columns for:

| column description | name | other allowed column names | allowed values |
| --- | --- | --- | --- |
| chromosome | `chrom` | `#chrom`, `chr` | 1-22, `X`, `Y`, `M`, `MT`, `chr1`, etc |
| position | `pos` | `beg`, `begin`, `bp` | integer |
| reference allele | `ref` | `reference` | must match reference genome |
| alternate allele | `alt` | `alternate` | anything |
| p-value | `pval` | `pvalue`, `p`, `p.value` | number in [0,1] |

You may also have columns for:

| column description | name | other allowed column names | allowed values |
| --- | --- | --- | --- |
| minor allele frequency | `maf` | | number in (0,0.5] |
| allele frequency (of alternate allele) | `af` | `a1freq`, `frq` | number in (0,1) |
| AF among cases | `case_af` | `af.cases` | number in (0,1) |
| AF among controls | `control_af` | `af.controls` | number in (0,1) |
| allele count | `ac` | | integer |
| effect size (of alternate allele) | `beta` | | number |
| standard error of effect size | `sebeta` | `se` | number |
| odds ratio (of alternate allele) | `or` | | number |
| R2 | `r2` | | number |
| number of samples | `num_samples` | `ns`, `n` | integer, must be the same for every variant in its phenotype |
| number of controls | `num_controls` | `ns.ctrl`, `n_controls` | integer, must be the same for every variant in its phenotype |
| number of cases | `num_cases` | `ns.case`, `n_cases` | integer, must be the same for every variant in its phenotype |

Column names are case-insensitive. If your file has a different column name, set field_aliases = {"column_name": "field_name"} in config.py. For example, field_aliases = {'P_BOLT_LMM_INF': 'pval', 'NSAMPLES': 'num_samples'}.
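To see which required columns a header would still be missing, the matching rules can be sketched as below. This is an illustration under assumptions, not PheWeb's own implementation: `resolve_columns` is a hypothetical helper, and `DEFAULT_ALIASES` only copies the required-column names from the table above.

```python
# Hypothetical header check, not PheWeb's own code.
# Names and aliases are copied from the required-columns table above.
DEFAULT_ALIASES = {
    "chrom": "chrom", "#chrom": "chrom", "chr": "chrom",
    "pos": "pos", "beg": "pos", "begin": "pos", "bp": "pos",
    "ref": "ref", "reference": "ref",
    "alt": "alt", "alternate": "alt",
    "pval": "pval", "pvalue": "pval", "p": "pval", "p.value": "pval",
}
REQUIRED = {"chrom", "pos", "ref", "alt", "pval"}

def resolve_columns(header, field_aliases=None):
    """Return ({field_name: column_name}, set_of_missing_required_fields)."""
    aliases = dict(DEFAULT_ALIASES)
    # User-supplied aliases map a column name in the file to a field name.
    for column, field in (field_aliases or {}).items():
        aliases[column.lower()] = field
    resolved = {}
    for column in header:
        field = aliases.get(column.lower())  # case-insensitive lookup
        if field is not None:
            resolved[field] = column
    return resolved, REQUIRED - resolved.keys()
```

With the header `["CHR", "BP", "REF", "ALT", "P_BOLT_LMM_INF"]` and `field_aliases = {'P_BOLT_LMM_INF': 'pval'}`, nothing is missing; without the alias, `pval` is reported as missing.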

Any field can be null if it is one of ['', '.', 'NA', 'N/A', 'n/a', 'nan', '-nan', 'NaN', '-NaN', 'null', 'NULL']. If a required field is null, the variant gets dropped.
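The null-handling rule can be illustrated with a small sketch. `clean_variant` is a hypothetical function written for this example, and it assumes values are compared exactly, without stripping whitespace.

```python
# Hypothetical illustration of the null rule, not PheWeb's own code.
NULL_VALUES = {"", ".", "NA", "N/A", "n/a", "nan", "-nan", "NaN", "-NaN", "null", "NULL"}
REQUIRED_FIELDS = ("chrom", "pos", "ref", "alt", "pval")

def clean_variant(row):
    """Replace null markers with None; return None if a required field is null."""
    cleaned = {key: (None if value in NULL_VALUES else value) for key, value in row.items()}
    if any(cleaned.get(field) is None for field in REQUIRED_FIELDS):
        return None  # the variant would be dropped
    return cleaned
```

A row with `maf` set to `NA` is kept with a null `maf`, while a row with `pval` set to `.` is dropped.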

If your p-values are stored as -log10(p) (as in REGENIE output), then set these variables in config.py: pval_is_neglog10 = True and field_aliases = {'LOGP':'pval'}.
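For reference, the relationship between the two scales is p = 10^(-x), where x is the -log10 value in the file. The helper below is only a worked illustration of that arithmetic; `neglog10_to_pval` is a hypothetical name, not a PheWeb function.

```python
def neglog10_to_pval(neglog10_p):
    """Convert a -log10(p) value back to a p-value in [0, 1]."""
    return 10.0 ** (-neglog10_p)

# A -log10(p) of 3 corresponds to p = 0.001; a value of 0 corresponds to p = 1.
example = neglog10_to_pval(3.0)
```

Keeping the values on the -log10 scale avoids losing very small p-values, which would underflow to 0.0 as ordinary double-precision numbers.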

4. Make a list of your phenotypes

Inside of your data directory, you need a file named pheno-list.json that looks like this:

[
 {
  "assoc_files": ["/home/peter/data/ear-length.gz"],
  "phenocode": "ear-length"
 },
 {
  "assoc_files": ["/home/peter/data/a1c.X.gz","/home/peter/data/a1c.autosomal.gz"],
  "phenocode": "A1C"
 }
]

Each phenotype needs assoc_files (a list of paths to association files) and phenocode (a string identifying your phenotype, used in filenames and URLs, that contains only characters from [A-Za-z0-9_~-]).

If you want, you can also include:

phenostring (string): a name for the phenotype. Shown in tables and tooltips and page headers.
category (string): groups together phenotypes in the PheWAS plot. Shown in tables and tooltips.
num_cases, num_controls, and/or num_samples (number): if your input data only has AC or MAC, this will be used to calculate AF or MAF. Shown in tooltips. If your input data has correctly-named columns for these, the command pheweb phenolist read-info-from-association-files will add them into your existing pheno-list.json.
anything else you want, but you'll have to modify templates to use it.
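If you prefer to build the file with a script, the structure above can be produced with the standard `json` module. This is a sketch under assumptions: `make_pheno` is a hypothetical helper, and the `phenostring` and `category` values are invented for illustration.

```python
import json
import re

# Allowed phenocode characters, as stated above.
PHENOCODE_RE = re.compile(r"[A-Za-z0-9_~-]+")

def make_pheno(phenocode, assoc_files, **optional):
    """Build one pheno-list.json entry, rejecting invalid phenocodes."""
    if not PHENOCODE_RE.fullmatch(phenocode):
        raise ValueError(f"invalid phenocode: {phenocode!r}")
    if not assoc_files:
        raise ValueError("assoc_files must list at least one file")
    return {"assoc_files": list(assoc_files), "phenocode": phenocode, **optional}

phenos = [
    # phenostring and category here are made-up example values.
    make_pheno("ear-length", ["/home/peter/data/ear-length.gz"],
               phenostring="Ear length", category="Morphology"),
    make_pheno("A1C", ["/home/peter/data/a1c.X.gz", "/home/peter/data/a1c.autosomal.gz"]),
]

# Write this string to pheno-list.json in your data directory.
pheno_list_json = json.dumps(phenos, indent=1)
```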

You can use a csv by running:

pheweb phenolist import-phenolist "/path/to/pheno-list.csv"

or you can make one from scratch by running:

pheweb phenolist glob --star-is-phenocode "/home/peter/data/*.gz"

You can see other methods here.

5. Load your association files

Run pheweb process.

To distribute jobs across a cluster, follow these instructions.

To include VEP annotations, follow these instructions.

If something breaks and you can't understand the error message, or it's something that PheWeb should support by default, open an issue on GitHub or email me.

6. Serve the website

Run pheweb serve --open.

That command should either open a browser to your new PheWeb, or it should give you a URL that you can open in your browser to access your new PheWeb. If it doesn't, follow the directions for hosting a PheWeb and accessing it from your browser.

More options:

To run pheweb through systemd, see sample file here.
To use Apache2 or Nginx, see instructions here.
To require login via OAuth, see instructions here.
To track page views with Google Analytics, see instructions here.
To reduce storage use, see instructions here.
To customize page contents, see instructions here.

PheWeb can display genetic correlations generated by another tool. To use this feature, set show_correlations = True in config.py and place the output of the rg pipeline as pheno-correlations.txt in the same folder as pheno-list.json.

To hide the button for downloading summary stats, add download_pheno_sumstats = "secret" and SECRET_KEY = "your random string" in config.py. That will make a secret page (printed to the console when you start the server) to share summary stats. To hide the button for downloading top hits and phenotypes, add download_top_hits = "hide" and download_phenotypes = "hide" respectively.
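One way to produce the random string is Python's standard `secrets` module. The sketch below generates a key once and prints the lines to paste into config.py. The option names come from the paragraph above; the 32-byte key length is an arbitrary choice, and pasting a fixed literal (rather than generating the key inside config.py) is an assumption about what keeps the key stable across restarts.

```python
import secrets

# Generate the random string once, then paste it into config.py as a literal.
generated_key = secrets.token_hex(32)  # 64 hexadecimal characters

# Lines to put in config.py, using the option names from the paragraph above.
config_lines = [
    'download_pheno_sumstats = "secret"',
    f'SECRET_KEY = "{generated_key}"',
    'download_top_hits = "hide"',
    'download_phenotypes = "hide"',
]
print("\n".join(config_lines))
```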

To allow dynamically filtering the manhattan plot, run pheweb best-of-pheno and set show_manhattan_filter_button=True in config.py.

Modifying PheWeb

See instructions here. See documentation about the files in generated-by-pheweb/ here.
