This repository has been archived by the owner on Aug 10, 2022. It is now read-only.

Get a fasta

nyoungb2 edited this page Sep 24, 2013 · 8 revisions

Get a fasta of all or selected elements of CRISPR loci (e.g., spacers or direct repeats).

For most scripts, queries can be restricted to specific taxa and/or subtypes with the '-subtype', '-taxon_name', or '-taxon_id' options (see examples below).

Getting a fasta of all spacers

$ CLdb_array2fasta.pl -d CLdb.sqlite > spacers.fna

Getting a fasta of all spacers for a particular subtype

$ CLdb_array2fasta.pl -d CLdb.sqlite -sub "I-B" > spacers_IB.fna

Getting a fasta of all spacers for 2 subtypes

$ CLdb_array2fasta.pl -d CLdb.sqlite -sub "I-B" "I-C" > spacers_IB_IC.fna
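To write one fasta per subtype instead of a combined file, the single-subtype command can be wrapped in a loop. The following is a minimal sketch, not part of CLdb: the helper names `subtype_outfile` and `spacers_by_subtype` and the `CLDB_ARRAY2FASTA` override variable are hypothetical, and only the `-d` and `-sub` options shown above are used.

```shell
# Hypothetical helper: turn a subtype such as "I-B" into an output
# name such as spacers_IB.fna (the naming used in the examples above).
subtype_outfile() {
  printf 'spacers_%s.fna\n' "$(printf '%s' "$1" | tr -d -- '-')"
}

# Hypothetical wrapper: first argument is the database, the rest are subtypes.
# CLDB_ARRAY2FASTA is an assumed override for the script name or path.
spacers_by_subtype() {
  db="$1"
  shift
  for subtype in "$@"; do
    "${CLDB_ARRAY2FASTA:-CLdb_array2fasta.pl}" -d "$db" -sub "$subtype" \
      > "$(subtype_outfile "$subtype")"
  done
}
```

Example call: `spacers_by_subtype CLdb.sqlite "I-B" "I-C"` would write spacers_IB.fna and spacers_IC.fna.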

Getting a fasta of all unique spacer sequences

$ CLdb_array2fasta.pl -d CLdb.sqlite -g > spacer_groups.fna
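A quick sanity check on any of these outputs is to count the fasta records (header lines starting with '>'). This is a small sketch using standard grep only; `count_fasta_records` is a hypothetical helper name, not a CLdb script.

```shell
# Hypothetical helper: count fasta records by counting header lines.
count_fasta_records() {
  grep -c '^>' "$1"
}
```

For example, comparing `count_fasta_records spacers.fna` with `count_fasta_records spacer_groups.fna` shows how many spacers collapse into unique sequences; the second count should not exceed the first.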

Getting a fasta of all unique spacer sequences oriented by the leader

Orienting spacers by the leader is important for spacer blasting.

$ CLdb_array2fasta.pl -d CLdb.sqlite -g -l > spacer_groups.fna

Getting a fasta of all direct repeats

$ CLdb_array2fasta.pl -d CLdb.sqlite -r > DR.fna

Getting a fasta of all direct repeat consensus sequences

$ CLdb_DRconsensus2fasta.pl -d CLdb.sqlite > DR_consensus.fna

Getting a nucleotide fasta of the entire locus, array, or operon region

$ CLdb_getRegionSequence.pl -d CLdb.sqlite

Getting a fasta of all protospacers (spacer blasting must be done prior)

$ CLdb_proto2fasta.pl -d CLdb.sqlite > protos.fna