Database Setup

1) Loci.txt table (tab-delimited); columns needed:

"*" blank values allowed

"**" Possible values: "present", "absent"

"***" Possible values: "intact", "absent", "broken", "shuffled"

"broken" = some genes missing
"shuffled" = gene order rearranged

This table can easily be made in Excel and then saved as a tab-delimited file (example).
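Once the table is saved, the allowed values listed above can be checked before loading it. The following is only a sketch, not part of CLdb: `check_column_values` is a hypothetical helper, the column number must be supplied by the user, and it assumes the first line is a header row and that blank cells may be skipped.

```shell
# Hypothetical helper (not part of CLdb): report cells in one column of a
# tab-delimited table whose value is not in an allowed list.
# Usage: check_column_values FILE COLUMN_NUMBER "value1 value2 ..."
# ASSUMPTIONS: line 1 is a header row; blank cells are skipped.
check_column_values() {
    awk -F '\t' -v col="$2" -v allowed="$3" '
        BEGIN {
            # Build a lookup table of the allowed values.
            n = split(allowed, a, " ")
            for (i = 1; i <= n; i++) ok[a[i]] = 1
        }
        NR > 1 && $col != "" && !($col in ok) {
            printf "line %d: unexpected value \"%s\"\n", NR, $col
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, if the "intact"/"absent"/"broken"/"shuffled" column happened to be column 5 of your table (check your own file), you could run: check_column_values loci.txt 5 "intact absent broken shuffled"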

2) Array.txt table files (tab-delimited); columns needed:

Just copy and paste from CRISPRFinder (example).

3) GenBank files for each organism of interest

4) Fasta files of each genome (only needed if genome sequence information is not included in the GenBank files).

The directory name for this example: './CLdb/'

The example loci table: 'loci.txt'

$ mkdir CLdb
$ cd CLdb

I will refer to this directory as '$CLdb_home'

$ mkdir genbank
$ mkdir array
$ mkdir fasta
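The same directory layout can also be created in one step. This is just a convenience sketch; `make_cldb_dirs` is a hypothetical function name, not a CLdb command.

```shell
# Hypothetical helper (not part of CLdb): create the CLdb home directory
# together with its genbank, array and fasta subdirectories.
# Usage: make_cldb_dirs [DIRECTORY]   (defaults to ./CLdb)
make_cldb_dirs() {
    cldb_home="${1:-./CLdb}"
    # -p creates missing parent directories and does not fail
    # if a directory already exists.
    mkdir -p "$cldb_home/genbank" "$cldb_home/array" "$cldb_home/fasta"
}
```

Calling make_cldb_dirs with no argument gives the same result as the mkdir commands above, and it is safe to run again on an existing directory.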

If no scaffold names are provided in the loci.txt table, 'CLDB_ONE_CHROMOSOME' is used for the 'Scaffold' field.
Scaffold names in the genome fasta files should match the scaffold names in the loci.txt table.
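A quick way to check the scaffold-name matching is to compare the two sets of names directly. The sketch below is not part of CLdb: `missing_scaffolds` is a hypothetical helper, the column number of the 'Scaffold' field must be supplied by the user, and it assumes that the loci table has a header row and that a fasta sequence name is the text after '>' up to the first whitespace.

```shell
# Hypothetical helper (not part of CLdb): print scaffold names that occur in
# the loci table but have no matching sequence name in a genome fasta file.
# Usage: missing_scaffolds LOCI_TABLE SCAFFOLD_COLUMN_NUMBER FASTA_FILE
# ASSUMPTIONS: line 1 of the table is a header row; a fasta sequence name is
# the text after '>' up to the first whitespace; blank scaffold cells are ignored.
missing_scaffolds() {
    loci_table="$1"
    scaffold_col="$2"
    fasta="$3"
    tmp_dir=$(mktemp -d) || return 1

    # Unique scaffold names from the loci table (skip header, drop blanks).
    tail -n +2 "$loci_table" | cut -f "$scaffold_col" | grep -v '^$' \
        | LC_ALL=C sort -u > "$tmp_dir/loci"

    # Unique sequence names from the fasta headers.
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' \
        | LC_ALL=C sort -u > "$tmp_dir/fasta"

    # Lines only in the first file: names in the table, absent from the fasta.
    LC_ALL=C comm -23 "$tmp_dir/loci" "$tmp_dir/fasta"
    rm -rf "$tmp_dir"
}
```

If the function prints nothing, every scaffold name in the table was found in the fasta file. Run it once per genome fasta file, since each call compares against a single file.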
