Scoping the Psych-DS Validator - MKS update/expire/migrate #18

Open
bleonar5 opened this issue Sep 13, 2023 · 1 comment
Labels
Scoping Discussion regarding the scope, design, and features of the psych-DS validator

Comments

@bleonar5
Contributor

bleonar5 commented Sep 13, 2023

Scoping the Psych-DS Validator

Context

This is a draft of the requirements/architecture document for the Psych-DS validator.

The Psych-DS team is preparing to begin development on our suite of validation tools, which will include a web app, a Node package, a Python package, and an R package. We want the tool to be open source from the ground up, so we encourage community members and collaborators to contribute feedback, suggestions, and discussion in the form of GitHub issues.

  • Target audience
    • The target audience for a validator tool for the Psych-DS (Dataset) standard would likely be:
      • Researchers in psychology and cognitive neuroscience who collect and analyze behavioral data. Having a tool to validate their datasets against the Psych-DS schema standardizes the structure and ensures compatibility with other tools in the ecosystem.
      • Developers of software libraries and applications for behavioral data analysis like MATLAB, Python pandas, R, etc. They can integrate the validator to check if datasets adhere to the Psych-DS standards before ingesting them into their tools.
      • Cloud platforms and repositories for sharing behavioral research data. A validator helps ensure datasets uploaded to these repositories are standardized and analysis-ready.
      • Publishers of behavioral research papers. A Psych-DS validation tool could be integrated into the submission pipeline to verify accompanying datasets meet the standard. This encourages reproducibility and rigor.
      • Educators teaching behavioral data analysis methods and best practices to students. Having students validate their own datasets with the tool teaches standardization.
      • Data scientists/analysts responsible for wrangling and cleaning heterogeneous behavioral data into a consistent format for downstream use.
    • Overall, any researcher, software developer, platform, or institution involved with curating, sharing, or analyzing behavioral research data can benefit from having a simple way to validate against a common standard. The validator makes it easier to ensure quality and interoperability of these valuable scientific datasets.
  • Scope
    • Core Scope:
      • Metadata validation - Confirm that metadata objects are present at the top level of the dataset and as sidecars to raw data files in subdirectories. Validate them as JSON-LD with the Schema.org Dataset type.
      • File structure validation - Confirm that the data directory is organized according to the Psych-DS specification, containing only permitted subdirectories, appropriate file types, and the expected metadata files.
      • Data file validation - Confirm that variables defined within the metadata have corresponding columns within the data files themselves, and that files are in allowed formats.
    • Extended Scope:
      • Dataset conversion - Automatically convert a Psych-DS-compliant dataset to a BIDS-compliant dataset
      • Automatic download - Create functionality within platforms/tools like PsychoPy, jsPsych, and Lookit to automatically download response data in valid Psych-DS format
      • Integration with CEDAR wizard - Create a template within the CEDAR wizard that lets users easily create valid metadata JSON files
    • Optional Scope:
      • Integrate CEDAR wizard directly into web validator
      • Integrate dynamic tutorials into web validator
      • Create pipeline using web validator to upload validated datasets to OSF repository with Psych-DS tag
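
As a rough illustration of the metadata and data file checks in the core scope, here is a minimal Python sketch. It is not the planned implementation: the function names `check_metadata` and `check_columns` are hypothetical, and using the Schema.org `variableMeasured` property to declare variables is an assumption made for this example. The file structure check is left out because it depends on rules defined in the specification.

```python
import csv
import io
import json


def check_metadata(metadata_text):
    """Return a list of problems found in a JSON-LD metadata document.

    Hypothetical helper: only checks for @context and the Dataset type.
    """
    try:
        metadata = json.loads(metadata_text)
    except json.JSONDecodeError as err:
        return [f"metadata is not valid JSON: {err}"]
    if not isinstance(metadata, dict):
        return ["metadata must be a JSON object"]
    problems = []
    if "@context" not in metadata:
        problems.append("missing @context")
    if metadata.get("@type") != "Dataset":
        problems.append("@type must be 'Dataset'")
    return problems


def check_columns(metadata, csv_text):
    """Return variables declared in metadata that have no matching CSV column.

    Assumes variables are listed under `variableMeasured`, either as plain
    strings or as objects with a `name` key.
    """
    declared = []
    for entry in metadata.get("variableMeasured", []):
        name = entry.get("name") if isinstance(entry, dict) else entry
        if name:
            declared.append(name)
    # Only the header row is needed to compare column names.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in declared if name not in header]
```

Both helpers return lists of problems instead of raising, so a CLI or web front end could collect every issue in a dataset and report them together.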

Contributions

Collaboration on Psych-DS is expected to follow our existing Code of Conduct.

To contribute to the conversation, feel free to add comments to this issue or to any of the currently open issues mentioned here. You can also create your own issue by using the "Scoping" template after clicking "add new issue". If you have any questions about the process, feel free to add them as a comment on this issue, and we'll get back to you.

If you contribute in some way other than interacting with this issue, please also leave a comment below so we can add your name to this list of contributors! (Capturing both PRs and non-code contributions to this project is a key goal!!)

  • Melissa Kline Struhl
  • Brian Leonard
  • Martin Seehuus
  • Russ Poldrack
  • David Moreau
  • Josh de Leeuw
  • Eduard Klapwijk

Codebase

  • The validator tools themselves will live in this repository; additional repositories may be created at https://github.com/psych-ds/ for modularity, different validator tools, etc.
  • The Psych-DS “core” repo (https://github.com/psych-ds/psych-DS) contains project orientation and the initial LinkML schema that Brian is currently working on
  • Once migration is complete, (1) the LinkML schema plus (2) the Node CLI tool will constitute the reference implementation of Psych-DS

Documentation

  • Currently, the ‘gold standard’ record of Psych-DS is the big Google Doc that has been the center of our work for the past several years. This will not be a good or maintainable solution in the long term!
  • Following BIDS’s model, we plan to import the text of the specification itself into a Read the Docs instance, doing this piece by piece in tandem with the LinkML schema implementation.
  • Once migration is complete, the Read the Docs site will serve as the canonical documentation/reference for the specification.
  • In addition to the spec itself, this Read the Docs site should contain links to all validator software along with tools/resources for using & getting started with Psych-DS.
  • We should follow a defined process for de-accessioning/migrating material that’s currently in the big Google Doc into the new documentation and/or schema files. (See Psych-DS Specification Google Doc (longer term plans) psych-DS#29)

Issues

When significant work is being done outside of the GitHub repos, we should maintain a GitHub issue that indicates that this is the case, to avoid losing track of that work. See e.g. https://github.com/psych-ds/psych-DS/issues/30

For this repo, we'll be using the https://github.com/psych-ds/psych-DS/labels/Scoping label to indicate issues where community discussion at this stage should take place, with additional labels for further categorization. These labels, for the time being, are limited to:

Here is a complete list of smaller issues relevant to scoping the validator:

Psych-DS Validator Requirements

0. Available resources

What needs to launch with the beta versions of the CLI + web browser tools?

  • CLI tool
  • Website
  • LinkML documentation "catalogue"
  • Tutorial/step-by-step guide including CEDAR wizard (video??)
  • See BIDS docs for inspiration on tutorials/beginner guides
  • Communication plan for the launch - listserv messages, at least initial thoughts about example datasets/user testing sprints
  • More canonical datasets and more communication around uploading validated datasets to some centralized repository

1. User Requirements

  • User personas
    • Non-coder Researchers
      • This researcher in the behavioral sciences would be interested in producing datasets that conform to Psych-DS criteria, but is mostly used to accessible GUI tools like RStudio, PsychoPy, Qualtrics, Excel, etc. They would require a validator tool that is either simple to use through a publicly hosted web app, or installable as a package through a GUI that they are already familiar with, such as RStudio. They would likely not be interested in managing complex custom options, and would want to trust the system to validate their dataset comprehensively using default settings. They would be less interested in documentation about how the tool works, and more interested in how it is used. By keeping these non-coder-friendly apps simple and transparent and giving them feature parity with the CLI tools, we hope to enable researchers to access the benefits of both Psych-DS itself and further tools that build upon it (e.g. automatic survey scoring, repository data submission).
    • Coder Researchers
      • This researcher is more comfortable with scripting, tweaking code, and using command line tools. They would still want easy, low-fuss interfaces, but they would also want GUI-less CLI options across a number of frameworks, so they can integrate Psych-DS validation into their automated data pipelines. They would be interested in a suite of custom options with which to tweak the validation function, as well as extensive documentation of how the tool operates (rather than how it is used).
    • Managers of Research Support Software
      • These individuals and organizations would have a vested interest in the success of research support tools like PsychoPy, jsPsych, Pavlovia, ExperimentRunner, etc., and would be interested in a tool that is clearly defined, well made, and modular enough to be integrated into their own tools/platforms.
  • Non-functional requirements (performance, security, etc.)
    • All users will have a natural interest in maintaining the anonymity and security of their datasets and participants. They will require all tools to be agnostic to the actual contents of the datasets and to work without uploading any files. The tools should all be transparent about this, to reduce any concerns that might come up.
    • Because the validation function is simple, these tools should all run more or less instantaneously.
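
One way a check could stay agnostic to dataset contents is to parse only the header row of each data file. The sketch below assumes CSV data files and uses a hypothetical helper name, `read_header`; it is meant to illustrate the idea, not to describe how the validator will work.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a local CSV file.

    Only the first row is parsed, so participant responses in later rows
    are never interpreted, and nothing is sent over the network.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])
```

Because the work done per file does not grow with the number of rows, a check built this way should also stay fast on large datasets.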

2. UI requirements

UI discussed in full within this issue

3. Validation Process

4. Functional Requirements

5. Non-Functional Requirements

6. Testing Requirements

7. Deployment Requirements

@bleonar5 bleonar5 added the Scoping Discussion regarding the scope, design, and features of the psych-DS validator label Sep 13, 2023
@mekline mekline transferred this issue from psych-ds/psych-DS Dec 4, 2023
@mekline
Contributor

mekline commented Dec 11, 2023

@bleonar5 make sure new tickets are in the MVP ticket

@mekline mekline changed the title from "Scoping the Psych-DS Validator" to "Scoping the Psych-DS Validator - MKS update/expire/migrate" Jul 16, 2024