These scripts, written in the R language, compile csv files in to NBN Verification Rule file format
Examples of the file formats can be found in the templates folder.
A file named species.csv
shall list species data in 3 columns, one row per
species. The first row must contain the headings as
TVK,NAME,ID_DIFF
Each subsequent row should contain the taxon-version key, Latin name, and a number from 1 to 5 indicating identification difficulty. The difficulty number can be omitted if not needed.
Proposal Add a fourth column, ADDITIONAL, containing an integer, indicating the need and reason for manual verification. The additional number can be omittted if not needed. Where an additional number is needed but no identification difficulty, a comma must be added to indicate the empty field so the additional number is not mistaken for identification difficulty. Not yet implemented in the scripts.
A file named difficulties.csv
shall list identification difficulty data in
2 columns, one row per identification difficulty. The first row must contain
the headings as
ID,TEXT
Each subsequent row contains the identification difficulty number, which can be from 1 to 5, followed by the text describing the difficulty. Put the text in quotation marks ("") so that commas are not misunderstood.
Proposal A file named additional.csv
shall list reasons for manual
verification in 2 columns, one row per reason. The first row must contain the
headings as
ID,TEXT
Each subsequent row contains a number followed by the text describing the reason. Put the text in quotation marks ("") so that commas are not misunderstood. Not yet implemented in the scripts.
Rules indicating the recording period, for example if a species has been
introduced or become extinct, can be defined in a file named period.csv
.
It shall contain 7 columns with one row per species. The first row must
contain the headings as
TVK,DAY,MONTH,YEAR,DAY2,MONTH2,YEAR2
Each subsequent row contains the taxon-version key of the species followed by a start date, if relevant, in the DAY, MONTH, YEAR columns and an end date, if relevant, in the DAY2, MONTH2, YEAR2 columns. If a date is not relevant, simply omit values (but not the commas). Dates can be vague so that day or month can be omitted. A vague start date, e.g. 2006, is set to the start of the period, i.e. 1st January 2006. A vague end date, e.g. 3,2010, is set to the end of the period, i.e. 31st March 2010.
Rules indicating the season in which a species can be observed can be defined
in a file named flightperiod.csv
. This might be used for migratory species,
only present at certain times of year, or insects only commonly observed in
their adult life stage. The intention is to develop this to allow rules
for multiple periods linked to different life stages.
At present the file has 5 columns with one row per species. The first row must contain the headings as
TVK,DAY,MONTH,DAY2,MONTH2
Each subsequent row must contain the taxon-version key of the species, the start date in the DAY, MONTH columns and the end date in the DAY2, MONTH2 columns. The DAY and DAY2 fields may be omitted (but not the commas) in which case the period starts on the first day of MONTH and ends on the last day of MONTH2
Geographical rules are defined by the ten kilometers squares where a species
is known to be present. Squares can be given for Britain, Ireland and the
Channel Islands using OSGB, OSNI or UTM30 grid references respectively. They
should be contained in a file named tenkm.csv
having two columns. The
first row must contain the headings as
TVK,GRIDREF
Each subsequent row contains the taxon-version key of a species followed by one ten kilometer square in which it is found. There will be many rows for each species.
The scripts are written in R. They can be run manually but the aspiration is to automate them using Github continuous integration.
Ensure your working directory is set to the scripts folder before starting.
At an R prompt, setwd(path/to/scripts)
or, in R Studio, open one of the files
and select from the menus, Session > Set Working Directory > To Source File
Location
schemes.r
holds configuration for each taxonomic group, identifying the path to their csv files, and the path for the output rules.species.r
performs checks onspecies.csv
and other scripts should not be run until this reports no problems. Executespecies_checks("<group>")
, where<group>
is replaced by one of the taxonomic groups listed inschemes.r
.period.r
Executeperiod_rules("<group>")
to build period rules.flightperiod.r
Executeflightperiod_rules("<group>")
to build seasonal period rules.tenkm.r
Executetenkm_rules("<group>")
to build distribution rules.idifficulty.r
Executeidifficulty_rules("<group>")
to build identification difficulty rules.