This repository has been archived by the owner on Aug 10, 2022. It is now read-only.

Database Setup

nyoungb2 edited this page Sep 24, 2013 · 21 revisions

Files required

1) Loci.txt table (tab-delimited); columns needed:

  • Taxon_ID
  • Taxon_Name
  • Subtype*
  • Locus_Start
  • Locus_End
  • Operon_Start*
  • Operon_End*
  • Array_Start*
  • Array_End*
  • Array_status**
  • Operon_status***
  • Genbank_file
  • Array_File*
  • Fasta_File*
  • Author
  • File_Creation_Date*

"*" blank values allowed

"**" Possible values: "present", "absent"

"***" Possible values: "intact", "absent", "broken", "shuffled"

"broken" = some genes missing; "shuffled" = gene order rearranged

This table can easily be made in Excel and then saved as a tab-delimited file (example).
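As a hedged illustration of the column and status rules above, the bash sketch below (not part of CLdb) writes a loci.txt header row and flags status values outside the allowed sets. The function names `write_loci_header` and `check_loci_status` are hypothetical, and the sketch assumes the first row of loci.txt is a header naming the columns exactly as listed.

```shell
# Sketch only (not part of CLdb): helpers for building and checking loci.txt.
# Assumption: row 1 of loci.txt is a header naming the columns exactly as listed.

# Print the header row: the 16 columns listed above, tab-delimited.
write_loci_header() {
  printf '%s\t' Taxon_ID Taxon_Name Subtype Locus_Start Locus_End \
    Operon_Start Operon_End Array_Start Array_End Array_status \
    Operon_status Genbank_file Array_File Fasta_File Author
  printf '%s\n' File_Creation_Date
}

# Report data rows whose Array_status or Operon_status is not an allowed value.
check_loci_status() {
  awk -F'\t' '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    NF == 0 { next }                            # skip blank lines
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i     # column name -> index
      if (!("Array_status" in col) || !("Operon_status" in col)) {
        print "header lacks Array_status or Operon_status"; exit 1
      }
      next
    }
    {
      a = $(col["Array_status"]); o = $(col["Operon_status"])
      if (a != "present" && a != "absent")
        printf "line %d: bad Array_status \"%s\"\n", NR, a
      if (o != "intact" && o != "absent" && o != "broken" && o != "shuffled")
        printf "line %d: bad Operon_status \"%s\"\n", NR, o
    }' "$1"
}
```

For example, `write_loci_header > loci.txt` gives a starting file to fill in (or to compare against an Excel export), and `check_loci_status loci.txt` prints nothing when every row uses an allowed status.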

2) Array.txt table files (tab-delimited); columns needed:

  • Start position
  • Direct repeat sequence
  • Spacer sequence
  • End position

Just copy and paste from CRISPRFinder (example).
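A similar sketch (hypothetical `check_array_file`, not part of CLdb) can catch copy-and-paste problems before an array file is loaded. It assumes no header row, exactly the four columns above in that order, and sequences written with A/C/G/T/N only; adjust these assumptions if your CRISPRFinder output differs.

```shell
# Sketch only (not part of CLdb). Assumptions: no header row; exactly four
# tab-delimited columns (start, repeat, spacer, end); sequences use A/C/G/T/N.
check_array_file() {
  awk -F'\t' '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    NF == 0 { next }                            # skip blank lines
    NF != 4 {
      printf "line %d: expected 4 columns, found %d\n", NR, NF; next
    }
    $1 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
      printf "line %d: start/end position is not an integer\n", NR
    }
    toupper($2) !~ /^[ACGTN]+$/ || toupper($3) !~ /^[ACGTN]+$/ {
      printf "line %d: repeat/spacer contains non-nucleotide characters\n", NR
    }' "$1"
}
```

For example, `check_array_file array/my_array.txt` (file name hypothetical) prints one line per problem and nothing when the file passes.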

3) Genbank files for each organism of interest

4) Fasta files of each genome (only needed if the genome sequence is not included in the genbank files).


Directory setup

The directory name for this example: './CLdb/'

The example loci table: 'loci.txt'

$ mkdir CLdb
$ cd CLdb

I will refer to this directory as '$CLdb_home'

$ mkdir genbank

  • place/symlink genbank files in this directory

$ mkdir array

  • place/symlink array files in this directory

$ mkdir fasta

  • place/symlink genome fasta files in this directory (optional)
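The setup steps above can also be scripted. This sketch (hypothetical `make_cldb_dirs` and `check_loci_files`, not part of CLdb) creates the three subdirectories and then reports any file named in loci.txt that is not present in its directory. It assumes loci.txt lives in $CLdb_home and that the Genbank_file, Array_File and Fasta_File columns hold bare file names rather than paths.

```shell
# Sketch only (not part of CLdb). Assumptions: loci.txt sits in the directory
# passed as $1 ($CLdb_home), row 1 is a header, and the file columns hold bare
# file names relative to genbank/, array/ and fasta/.

# Create the three subdirectories; -p makes this safe to re-run.
make_cldb_dirs() {
  mkdir -p "$1/genbank" "$1/array" "$1/fasta"
}

# Report every file named in loci.txt that is absent (or a dangling symlink).
check_loci_files() {
  local home=$1
  awk -F'\t' -v home="$home" '
    function field(name) {            # value of a named column, "" if no such column
      return (name in col) ? $(col[name]) : ""
    }
    function check(dir, name, required,   path) {
      if (name == "") {               # blanks allowed except for Genbank_file
        if (required) printf "line %d: Genbank_file is blank\n", NR
        return
      }
      path = home "/" dir "/" name
      if (system("test -e \"" path "\"") != 0)
        printf "line %d: %s not found\n", NR, path
    }
    { sub(/\r$/, "") }                # tolerate CRLF line endings
    NF == 0 { next }                  # skip blank lines
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      check("genbank", field("Genbank_file"), 1)
      check("array",   field("Array_File"),   0)
      check("fasta",   field("Fasta_File"),   0)
    }' "$home/loci.txt"
}
```

For example, run `make_cldb_dirs CLdb`, then `ln -s /path/to/genome.gbk CLdb/genbank/` (path hypothetical) for each file, then `check_loci_files CLdb`. Because `test -e` follows symlinks, a symlink whose target has moved is reported as missing.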

WARNINGS

  • If no scaffold names are provided in the loci.txt table, 'CLDB_ONE_CHROMOSOME' is used for the 'Scaffold' field.

  • Scaffold names in the genome fasta files should match the scaffold names in the loci.txt table.
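To catch the mismatch described in the second warning, the sketch below (hypothetical `missing_scaffolds`, not part of CLdb) prints each scaffold name in loci.txt that has no matching header in a genome fasta file. It assumes loci.txt has a column named `Scaffold`, that the Fasta_File column holds the fasta file's bare name (so only rows for that genome are checked), and that a fasta scaffold name is the first word after `>`.

```shell
# Sketch only (not part of CLdb). Assumptions: loci.txt has "Scaffold" and
# "Fasta_File" columns, and a FASTA scaffold name is the first word after ">".
missing_scaffolds() {
  local loci=$1 fasta=$2
  awk -F'\t' -v fname="$(basename "$fasta")" '
    { sub(/\r$/, "") }                   # tolerate CRLF line endings
    FILENAME == ARGV[1] {                # first file: collect FASTA scaffold names
      if (substr($0, 1, 1) == ">") {
        split(substr($0, 2), w, /[ \t]+/)
        seen[w[1]] = 1
      }
      next
    }
    FNR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # loci.txt header
    ("Scaffold" in col) && ("Fasta_File" in col) && $(col["Fasta_File"]) == fname {
      s = $(col["Scaffold"])
      # blank names are skipped (CLdb falls back to CLDB_ONE_CHROMOSOME)
      if (s != "" && !(s in seen)) print s
    }' "$fasta" "$loci"
}
```

For example, `missing_scaffolds loci.txt fasta/genome.fna` (file name hypothetical) prints nothing when every scaffold listed for that genome appears in its fasta headers.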
