RohHunter documentation

RohHunter is a tool for detecting runs of homozygosity (ROHs) in a variant list in VCF format.

RohHunter uses the allele frequency of variants to calculate the probability of observing a ROH by chance.
Allele frequency information can be annotated to the variant list via Ensembl VEP.

Algorithm description

The RohHunter algorithm performs the following steps:

1. Filter variants (markers) by quality to remove false genotype calls:
   - Depth (default: ≥20)
   - Variant Q score (default: ≥30)
2. Determine raw stretches of homozygous markers.
3. Assign to each stretch the probability of observing it by chance:
   - based on allele frequency, e.g. using 1000g and gnomAD
4. Remove regions with a low Q score, i.e. regions that are likely to occur by chance (default: <Q30).
5. Merge adjacent ROHs based on:
   - distance in markers (default: ≤1 or ≤1% of ROH marker count)
   - distance in bases (default: ≤50% of ROH base count)
6. Filter based on:
   - Number of markers (default: ≥20)
   - Size (default: ≥20Kb)
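
The steps listed above can be illustrated with a minimal Python sketch. It is not the RohHunter implementation: the `Marker` class, all function names, and the scoring model are assumptions made for illustration. In particular, the score assumes Hardy-Weinberg equilibrium and independent markers, so that an observed variant with allele frequency `af` is homozygous by chance with probability `af / (2 - af)`. The merging of adjacent ROHs is left out to keep the example short.

```python
import math
from dataclasses import dataclass


@dataclass
class Marker:
    pos: int      # position on a single chromosome
    depth: int    # read depth at the marker
    qual: float   # variant quality score
    af: float     # population allele frequency (0 < af <= 1)
    hom: bool     # True if the genotype call is homozygous


def filter_markers(markers, min_depth=20, min_qual=30.0):
    """Quality filter: keep markers that pass the depth and quality thresholds."""
    return [m for m in markers if m.depth >= min_depth and m.qual >= min_qual]


def raw_stretches(markers):
    """Group consecutive homozygous markers into raw stretches."""
    stretches, current = [], []
    for m in markers:
        if m.hom:
            current.append(m)
        elif current:
            stretches.append(current)
            current = []
    if current:
        stretches.append(current)
    return stretches


def q_score(stretch):
    """Phred-scaled probability of seeing the stretch by chance (assumed model).

    Under Hardy-Weinberg equilibrium, an observed variant is homozygous with
    probability af / (2 - af); markers are treated as independent.
    """
    log_p = sum(math.log10(m.af / (2.0 - m.af)) for m in stretch)
    return -10.0 * log_p


def call_rohs(markers, min_q=30.0, min_markers=20, min_size=20_000):
    """Filter, find stretches, score them, and apply the final filters.

    Merging of adjacent ROHs is intentionally omitted in this sketch.
    """
    rohs = []
    for stretch in raw_stretches(filter_markers(markers)):
        size = stretch[-1].pos - stretch[0].pos + 1
        if (q_score(stretch) >= min_q
                and len(stretch) >= min_markers
                and size >= min_size):
            rohs.append((stretch[0].pos, stretch[-1].pos, len(stretch)))
    return rohs
```

Because the quality filter runs first, a heterozygous call with low depth or low quality is dropped before the stretches are built and therefore does not split a ROH in this sketch.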

Example

The following image visualizes the algorithm and shows how it copes with a genotyping error (at the start of exon 2):

Using external allele frequency sources

Instead of using VEP annotations as the source of allele frequency information, an external database of allele frequencies can be provided via the 'af_source' parameter.
We suggest using gnomAD version 3.1 or higher as the allele frequency database.

Pre-processing of the external database

It is important to normalize the allele frequency database (and the variant list) so that most variants can be annotated with an allele frequency:

Split multi-allelic variants into several rows, e.g. with VcfBreakMulti.
Left-align InDels, e.g. with VcfLeftNormalize.
Sort variants according to position, e.g. with VcfStreamSort.

Finally, the allele frequency database has to be compressed with bgzip and indexed with tabix.
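
Before compressing and indexing, it can be useful to verify that the normalization worked. The following Python sketch is a hypothetical helper, not part of ngs-bits. It reads uncompressed VCF lines and reports multi-allelic rows and sorting problems. Left-alignment is not checked, since that would require the reference genome.

```python
def check_normalized(vcf_lines):
    """Return a list of problems found in uncompressed VCF text lines."""
    problems = []
    seen_chroms = set()          # chromosomes encountered so far
    last_chrom, last_pos = None, -1
    for n, line in enumerate(vcf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue             # skip empty lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if "," in alt:
            problems.append(f"line {n}: multi-allelic ALT '{alt}'")
        if chrom != last_chrom:
            # A chromosome that shows up again later breaks position sorting.
            if chrom in seen_chroms:
                problems.append(f"line {n}: chromosome '{chrom}' is not contiguous")
            seen_chroms.add(chrom)
        elif pos < last_pos:
            problems.append(f"line {n}: position {pos} is out of order")
        last_chrom, last_pos = chrom, pos
    return problems
```

An empty result means that no multi-allelic rows or ordering problems were found in the checked lines.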

Run-time using an external database

Using an external allele frequency database increases the run-time of the tool, since all variants have to be looked up in the database.

Our benchmarks show the following run-time increase when using the gnomAD genome database:

Exome (60K variants): from 4.3s (annotated) to ~100s.
Genome (4.8M variants): from 3.3min (annotated) to ~90min.

Thus, for genomes it is preferable to use annotated variant lists if available.
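
The benchmark figures above can be turned into a slowdown factor and an approximate lookup cost per variant. The short Python calculation below uses only the numbers given in this section; the function name is made up for illustration, and the results are rough because the external-database timings are approximate.

```python
def lookup_overhead(n_variants, annotated_s, external_s):
    """Return (slowdown factor, extra milliseconds per variant)."""
    slowdown = external_s / annotated_s
    extra_ms = (external_s - annotated_s) / n_variants * 1000.0
    return slowdown, extra_ms


# Benchmark figures from this section, converted to seconds.
exome = lookup_overhead(60_000, 4.3, 100.0)
genome = lookup_overhead(4_800_000, 3.3 * 60, 90.0 * 60)

print(f"exome:  {exome[0]:.0f}x slower, {exome[1]:.2f} ms extra per variant")
print(f"genome: {genome[0]:.0f}x slower, {genome[1]:.2f} ms extra per variant")
```

In both cases the slowdown is more than twentyfold, and the extra cost is between one and two milliseconds per variant.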

ROHs and consanguinity

Many large ROHs in a child can be an indicator for consanguinity of the parents.

This plot shows the ROH size sum of ROHs larger than 500kb for WGS (Illumina TruSeq DNA PCR-Free):

This plot shows the ROH size sum of ROHs larger than 500kb for WES (Agilent SureSelect Human All Exon V7):

This plot shows the ROH size sum of ROHs larger than 500kb for patients with different degrees of consanguinity:

The plots show that a ROH size sum larger than 75Mb is a good indicator for consanguinity of the parents.
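
As an illustration, the ROH size sum can be computed from a list of ROH regions as sketched below in Python. The input format (tuples of chromosome, start, and end with 1-based inclusive coordinates) and the function names are assumptions for this example and do not describe the RohHunter output format. The 500kb and 75Mb values are the ones used in this section.

```python
def roh_size_sum_mb(rohs, min_size=500_000):
    """Sum the sizes (in Mb) of all ROHs larger than min_size bases.

    rohs: iterable of (chrom, start, end), 1-based inclusive coordinates.
    """
    total = 0
    for _chrom, start, end in rohs:
        size = end - start + 1
        if size > min_size:
            total += size
    return total / 1e6


def suggests_consanguinity(rohs, threshold_mb=75.0):
    """True if the ROH size sum exceeds the threshold discussed above."""
    return roh_size_sum_mb(rohs) > threshold_mb
```

A result of `True` should be read as an indication only, not as proof of consanguinity.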

Help and ChangeLog

The RohHunter command-line help and changelog can be found here.
