Updated write_dwc() function to process Camtrap DP to Darwin Core Archives #1069

Open
peterdesmet opened this issue May 21, 2024 · 0 comments

peterdesmet commented May 21, 2024

@fmendezh our team has updated the write_dwc() function that converts Camtrap DP to Darwin Core Archives. We suggest using this new function in the pipeline that processes incoming Camtrap DP datasets.

Changes

  • The function is now part of a new, more lightweight R package "camtrapdp" (rather than "camtraptor", where it lived originally), with only five dependencies. We plan to maintain this package long-term and submit it to CRAN.
  • The function uses the well-known R data transformation package "dplyr" rather than SQL/sqlite.
  • The function produces a meta.xml file.
  • The generated csv files have been renamed to align with what IPT produces for Darwin Core (but as comma-separated .csv, not tab-delimited .txt):
    • dwc_occurrence.csv -> occurrence.csv
    • dwc_audubon.csv -> multimedia.csv
  • The package no longer down-converts an incoming Camtrap DP, so more of the Camtrap DP fields are mapped to Darwin Core (i.e. the DwC result will differ from what the previous function produced).

The function documentation can be found at https://inbo.github.io/camtrapdp/reference/write_dwc.html

Calling the function

To convert a Camtrap DP to a Darwin Core Archive, two functions need to be called (as was the case in "camtraptor"):

devtools::install_github("inbo/camtrapdp")
library(camtrapdp)

# 1. Not done here: download dataset from IPT + unzip

# 2. Read dataset into memory (via datapackage.json file)
x <- read_camtrapdp("https://raw.githubusercontent.com/tdwg/camtrap-dp/main/example/datapackage.json")

# 3. Convert data to DwC-A
my_dir <- "dwc"
write_dwc(x, directory = my_dir)
#> 
#> ── Transforming data to Darwin Core ──
#> 
#> ── Writing files ──
#> 
#> • 'dwc/dwc_occurrence.csv'
#> • 'dwc/dwc_audiovisual.csv'
#> • 'dwc/meta.xml'

Created on 2024-05-21 with reprex v2.1.0
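
As a quick sanity check after the conversion step, the output directory can be inspected with base R. This is only a minimal sketch: `check_dwc_output()` is a hypothetical helper, not part of camtrapdp, and the default file names are an assumption taken from the Changes list above. Adjust `expected` if your version of the package writes different names.

```r
# Hypothetical helper (not part of camtrapdp): report which of the expected
# Darwin Core Archive files exist in a directory.
# The default file names are an assumption based on the Changes list.
check_dwc_output <- function(directory,
                             expected = c("occurrence.csv",
                                          "multimedia.csv",
                                          "meta.xml")) {
  # file.exists() is vectorised, so this returns one logical per file
  present <- file.exists(file.path(directory, expected))
  names(present) <- expected
  present
}

# Usage, after write_dwc(x, directory = "dwc"):
# check_dwc_output("dwc")
```

The helper returns a named logical vector, so a pipeline step could fail early with `stopifnot(all(check_dwc_output("dwc")))`.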

Questions

  1. Is the pipeline in Camtrap DP Proof Of Concept #803 implemented in production? If so, can #803 be closed?
  2. Can you test the pipeline with the new function and report any issues?
  • I have tested it on a large dataset (https://ipt.gbif-uat.org/resource?r=mica-full) without issues
  • The content of some Darwin Core terms has changed, which might have downstream effects (e.g. license now contains a license code like CC0-1.0)
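
To spot changed term content such as the license value mentioned above, it may help to list the distinct values of a column in the generated occurrence file. The following is a sketch using base R only. `distinct_values()` is a hypothetical helper, and the column name `license` and the file path in the usage comment are assumptions to adapt to the actual output.

```r
# Hypothetical helper (not part of camtrapdp): list the distinct values of
# one column in a csv file, in order of first appearance.
distinct_values <- function(csv_path, column) {
  data <- read.csv(csv_path, stringsAsFactors = FALSE)
  if (!column %in% names(data)) {
    stop("Column '", column, "' not found in ", csv_path)
  }
  unique(data[[column]])
}

# Usage (path and column name are assumptions):
# distinct_values("dwc/occurrence.csv", "license")
```

Comparing this output between the old and new function on the same dataset would show which values downstream consumers need to handle.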

Once all (potential) issues are resolved, we will publish a stable (minor) release of the package. Let me know if you have any questions.
