Cerberus is a set of tools designed to characterize and enhance transcriptome annotations. Currently Cerberus can do the following:
- Represent transcript start sites (TSSs) and transcript end sites (TESs) as BED regions rather than single-base-pair ends
- Integrate intron chains (ICs) from multiple transcriptome annotations (GTFs) to create a transcriptome that is the union of them all
- Integrate TSSs and TESs from multiple GTFs, as well as from external BED sources, to create end annotations that are the union of them all
- Number the intron chains, TSSs, and TESs it finds according to their priority in a reference GTF
- Use the enhanced sets of intron chains and 5'/3' ends to annotate an existing GTF transcriptome with transcript triplets, and update the GTF and its corresponding abundance matrices to reflect the new names and identities of the transcripts
- Compute gene triplets for different sets of isoforms for each gene based on the TSSs, ICs, and TESs used among the isoforms of the gene
- Generate plots (see examples below) to visualize gene triplets on the gene structure simplex
- Compute centroids of gene triplet coordinate distributions
- Compute pairwise gene structure simplex distances between pairs of gene triplets
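
The gene triplet, centroid, and distance items above can be illustrated with a minimal sketch. This is not the Cerberus API: the function names, the toy input format, the plain normalization of counts onto the simplex, and the use of Euclidean distance are all assumptions made for illustration, and Cerberus's own definitions may differ (for example, in how the intron chain count is scaled or which distance metric is used).

```python
from math import dist  # available in Python 3.8+

# Hypothetical input: each isoform of a gene as (tss_id, ic_id, tes_id).
isoforms_a = [(1, 1, 1), (1, 2, 1), (2, 2, 1), (2, 3, 2)]
isoforms_b = [(1, 1, 1), (1, 1, 2), (1, 1, 3)]


def gene_triplet(isoforms):
    """Count the unique TSSs, ICs, and TESs used among a gene's isoforms."""
    n_tss = len({tss for tss, _, _ in isoforms})
    n_ic = len({ic for _, ic, _ in isoforms})
    n_tes = len({tes for _, _, tes in isoforms})
    return n_tss, n_ic, n_tes


def simplex_coords(triplet):
    """Scale a triplet so its parts sum to 1 (assumed simplex mapping)."""
    total = sum(triplet)
    return tuple(x / total for x in triplet)


def centroid(points):
    """Component-wise mean of a list of simplex coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def simplex_distance(p, q):
    """Euclidean distance, used here as a stand-in metric."""
    return dist(p, q)


coords_a = simplex_coords(gene_triplet(isoforms_a))
coords_b = simplex_coords(gene_triplet(isoforms_b))
print(gene_triplet(isoforms_a), gene_triplet(isoforms_b))
print(centroid([coords_a, coords_b]))
print(simplex_distance(coords_a, coords_b))
```

In this toy example the first gene varies mostly in its intron chains, while the second varies only in its TESs, so the two points land in different regions of the simplex.
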

Please visit the Cerberus website for documentation.

Note: Cerberus is under active development. Please feel free to open an issue or email me ( freese {at} uci.edu ) if you're interested in using it!