integrating multiple streams of linked data #9

Open
srearl opened this issue Jun 7, 2018 · 2 comments

srearl commented Jun 7, 2018

A data challenge that I wrestle with is integrating multiple streams of linked or related data. An example would be a research effort that involves collecting environmental samples and then running a series of analyses on those samples. The analyses could be field measurements, manual lab procedures, instrument runs, or many others, in any combination. The outcome of each step or analysis must be related to the outcomes of other steps or workflows. I use custom web applications and databases to address this, but that approach is complicated and a lot of work. It would be great if there were a platform or tool for such workflows that was generalizable enough to cover a wide array of situations and use cases.

jhp7e commented Jun 11, 2018

Some of the tidyr functions in R (notably gather, spread, separate and unite, combined with base R merge) could be helpful with this. The big challenge will be that the links that glue things together for merging are often fuzzier than one might like. I've got one dataset where some data are reported to the year, month and day, but other related data are only reported to the nearest year and month. Coding of stations can also be inconsistent. An interesting problem!
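
A minimal sketch of the fuzzy-link problem described above, written in Python with only the standard library rather than in R. It assumes two hypothetical tables, one reported to the day and one reported to the month, with inconsistently coded stations. All station codes, dates, and values are made up for illustration, and the normalization rule is only one possible convention.

```python
from datetime import date

# Hypothetical day-resolution samples: (station code, sampling date, value)
daily = [
    ("ST-01", date(2018, 6, 3), 7.1),
    ("st01 ", date(2018, 6, 17), 7.4),
    ("ST-02", date(2018, 7, 1), 6.8),
]

# Hypothetical month-resolution records: (station code, year, month, value)
monthly = [
    ("ST01", 2018, 6, 120.0),
    ("ST02", 2018, 7, 95.0),
]


def normalize_station(code):
    """Collapse case, whitespace, and hyphens so 'st01 ' and 'ST-01' match."""
    return code.strip().upper().replace("-", "")


# Index the coarser table by (station, year, month).
monthly_index = {
    (normalize_station(s), y, m): v for s, y, m, v in monthly
}

# Join each daily record to the monthly record that covers its date.
# Records with no monthly match get None instead of being dropped.
joined = []
for station, day, value in daily:
    key = (normalize_station(station), day.year, day.month)
    joined.append((key[0], day.isoformat(), value, monthly_index.get(key)))
```

The join truncates the finer dates to the coarser resolution instead of guessing a day for the monthly data, so each monthly value is repeated for every daily record in that month. Keeping unmatched rows as `None` makes inconsistent station codes visible rather than silently losing data.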

kcawley commented Jun 11, 2018

NEON uses "named locations", date/time, sample ID, and sample class to link together this type of data on the OS side. For the IS side we use a "measurement stream", which is a combination of sensor (e.g., all air temperature sensors are given a unique ID and also an ID for the part number that they all share), sensor stream (i.e., temperature or pressure, etc.), and named location. NEON's isn't necessarily the best framework, but whatever is used, a consistent ontology and database to track terms and IDs is essential.
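
To make the idea of a tracked, consistent identifier concrete, here is a small Python sketch of a composite key built from sensor, sensor stream, and named location. It is not NEON's actual schema: the class name, field names, registry, and example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementStream:
    """Hypothetical composite key; frozen so instances are hashable."""
    sensor_id: str       # unique ID of the individual sensor
    part_number: str     # ID shared by all sensors of the same model
    stream: str          # e.g. "temperature" or "pressure"
    named_location: str  # controlled-vocabulary location name


# Hypothetical in-memory registry standing in for a database table.
registry = {}


def register(stream):
    """Return a stable integer ID for a stream, creating one if it is new."""
    return registry.setdefault(stream, len(registry) + 1)


# The same combination of terms always resolves to the same ID.
first = register(MeasurementStream("S1", "P9", "temperature", "TOWER_A"))
again = register(MeasurementStream("S1", "P9", "temperature", "TOWER_A"))
other = register(MeasurementStream("S1", "P9", "pressure", "TOWER_A"))
```

Because the key is an immutable value built only from controlled terms, every dataset that records those terms can be linked back to the same ID without string matching.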
