Scripts and config file to make a sequence alignment from a set of GPCR protein sequences.
See snooker_align.vsd for the workflow.
The original alignment script was written for https://dx.doi.org/10.1186/1471-2105-12-332.
Requirements:
- NCBI BLAST
- HMMER
- Clustal Omega (clustalo)
- Perl packages:
  - Bio::SeqIO
  - Text::LevenshteinXS
Steps to get an alignment:
- Create a numbering scheme
- Download the human Swiss-Prot alignment CSV from the GPCRdb website
- Convert the CSV to a CSV containing only the positions of the numbering scheme
- Create a FASTA file from the CSV
- Run BLAST with the seed alignment as query against Swiss-Prot/TrEMBL/Ensembl
  - Make sure all seed sequences have been found
- Retrieve the sequences for the IDs found
- Make sequences unique within each species
- Remove species with fewer than 100 sequences
- Run the per-TM alignment script
- Remove sequences that differ by fewer than 9 amino acids from another sequence of the same species
- Remove species with fewer than 100 sequences
- Build a tree of the sequences
- Generate the entropy file based on the tree
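The per-species filtering steps in the list above (making sequences unique, dropping species with fewer than 100 sequences, dropping sequences that differ by fewer than 9 amino acids) can be illustrated with a small Python sketch. It is not the repository's Perl code. Assumptions: sequences are grouped as `{species: [(seq_id, sequence), ...]}`, "amino acids different" is measured as Levenshtein edit distance (the Perl scripts depend on Text::LevenshteinXS, but their exact criterion is not stated here), and near-duplicates are removed greedily, keeping the first sequence seen. All function names are hypothetical.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def group_by_species(records):
    """records: iterable of (species, seq_id, sequence) tuples (hypothetical input format)."""
    groups = defaultdict(list)
    for species, seq_id, seq in records:
        groups[species].append((seq_id, seq))
    return dict(groups)


def unique_within_species(groups):
    """Keep only the first occurrence of each exact sequence per species."""
    out = {}
    for species, entries in groups.items():
        seen = set()
        out[species] = []
        for seq_id, seq in entries:
            if seq not in seen:
                seen.add(seq)
                out[species].append((seq_id, seq))
    return out


def drop_small_species(groups, min_count=100):
    """Remove species that have fewer than min_count sequences."""
    return {sp: entries for sp, entries in groups.items() if len(entries) >= min_count}


def drop_near_duplicates(groups, min_distance=9):
    """Greedily drop sequences closer than min_distance edits to an already kept
    sequence of the same species (the keep-first order is an assumption)."""
    out = {}
    for species, entries in groups.items():
        kept = []
        for seq_id, seq in entries:
            if all(levenshtein(seq, other) >= min_distance for _, other in kept):
                kept.append((seq_id, seq))
        out[species] = kept
    return out
```

Following the order of the steps, this would be applied as `unique_within_species`, then `drop_small_species(..., 100)`, then (after the per-TM alignment) `drop_near_duplicates(..., 9)` and `drop_small_species(..., 100)` again. The pairwise comparison is quadratic per species, and pure-Python edit distance is far slower than the C-backed Text::LevenshteinXS, so this is only meant to show the logic.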
See runs.md for the commands to perform these steps.
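The README does not say which entropy measure the entropy file contains. As an illustration only, the sketch below computes per-column Shannon entropy (natural log, gaps ignored) of an alignment, plus a per-column average over groups of sequences. It assumes the groups would come from cutting the sequence tree into clusters; how the actual scripts use the tree, and the output file format, are not reproduced here. Function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, ignore_gaps: bool = True) -> float:
    """Shannon entropy (natural log) of the residues in one alignment column."""
    residues = [c for c in column if not (ignore_gaps and c in "-.")]
    if not residues:
        return 0.0
    total = len(residues)
    return sum(-(n / total) * math.log(n / total) for n in Counter(residues).values())


def alignment_entropies(sequences):
    """Entropy of every column of a list of equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(s[i] for s in sequences)) for i in range(length)]


def mean_group_entropies(groups):
    """Per-column entropy averaged over groups of aligned sequences.

    Assumption: each group is a cluster obtained by cutting the sequence tree.
    """
    per_group = [alignment_entropies(g) for g in groups]
    return [sum(values) / len(values) for values in zip(*per_group)]
```

A fully conserved column has entropy 0, and a column split evenly between two residues has entropy ln 2. Averaging within tree-derived groups separates positions that are conserved within subfamilies from those that are conserved across the whole alignment.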