Sample grouping class #233

Draft: wants to merge 4 commits into base: main

Conversation

sierra-moxon
Member

Guidelines

Soft Schema Freeze

The nmdc-schema and berkeley-schema-fy24 schemas are under a soft freeze, which means no changes should be made that have downstream implications. To ensure this, all PRs created during the freeze will be closely reviewed with every component of the NMDC system in mind.

Reviewers

To ensure no changes are made unexpectedly, PR creators will request reviews from all Berkeley Schema Roll Out task coordinators.

We expect task coordinators to review PRs and provide feedback/approval within 1 week of when they are identified as reviewers.

PRs will NOT be merged until all task coordinators (or one of their delegates) have approved.

Requests to expedite a review, questions, and discussion can be raised at any meeting.

Delays in review & merging should be addressed in meetings or with NMDC leadership.

PR Information

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation
  • Schema change: Structure and content
    • created, updated, or deleted a class, slot, or enum
    • changed whether a slot is multivalued
    • changed the way a slot is assigned to a class
    • changed the permissible_values of an enum
    • etc.
  • Schema change: Cleanup and preparation
    • updated the description of a class, slot, or enum
    • updated the mappings of a class, slot, or enum to an ontology
    • added an enum for future use (it is not in the range of any slot)
    • etc.

Description

PRs should be small and concise.

Aim to create small, focused pull requests that fulfill a single purpose. Smaller pull requests are easier and faster to review and merge, leave less room to introduce bugs, and provide a clearer history of changes.

  • Replace this text with a description of what this PR branch contains. Please keep in mind that all reviewers will be reading this description. Example: "In this branch, I..."

Related Issues

All PRs should relate to or fix an issue(s). Please identify the issue(s) below.

  • Related Issue(s): #
  • Fixes: #

Did you add/update any tests?

  • Yes
  • No (Add a justification below)
  • I need help with writing tests

Could this schema change make it so any valid data becomes invalid?

This is a question about what the schema allows. It is not a question about what happens to exist in the NMDC database right now.

Example: If, in this PR branch, you renamed a slot from foo to foo_bar, the answer to this question would be "yes," even if nothing in the NMDC database currently uses the foo slot.

More examples: slot or class name changes, changes to a slot's multivalued state, changes to a slot's range (e.g. string to integer), changes to slot assignments to classes, changes to an enum's permissible_values

  • Yes (A migrator is required)
  • No
  • I need help determining this

If you answered "Yes", does this PR branch include that migrator?

  • Yes
  • No, this PR is incomplete and I need help writing the migrator

Does this PR have any downstream implications?

Examples: any change here that requires a change to workflows, workflow automation, the Mongo-to-Postgres ingest process, Jupyter notebooks, the Runtime, etc.

  • Yes (Explain below)
  • No


github-actions bot commented Jul 24, 2024

PR Preview Action v1.4.7
🚀 Deployed preview to https://microbiomedata.github.io/berkeley-schema-fy24/pr-preview/pr-233/
on branch gh-pages at 2024-07-24 21:58 UTC

id:
  required: true
  description: An NMDC assigned unique identifier for a biosample or processed sample.
  structured_pattern:
Member Author

TODO: figure out why LinkML breaks validation if the parent class specifies a structured_pattern.
TODO: this should be abstract.

required: true
description: An NMDC assigned unique identifier for a biosample or processed sample.
structured_pattern:
  syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$"

@eecavanna eecavanna Jul 24, 2024

When the minter (part of the Runtime) sees a typecode portion of a structured pattern that looks like "(bsm|procsm)", it uses the first substring ("bsm", in this case) when generating an ID for that class. That behavior was implemented around July 17, 2024.

Different people and parts of the system currently use the pattern for two different purposes:

  • team members use it to say "here's a pattern that all IDs must match."
  • the minter uses it to say "here's the typecode I will use in the ID I generate."

There is an ongoing Discussion about expressing the typecode for the minter via a separate slot.
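
To make the two uses concrete, here is a minimal Python sketch. It is not the Runtime's actual minter code. The regexes in `ASSUMED_SETTINGS` are hypothetical stand-ins for the schema's real `id_nmdc_prefix`, `id_shoulder`, and `id_blade` settings, and the helper names `expand_pattern` and `first_typecode` are invented for illustration.

```python
import re

# Hypothetical expansions of the settings named in the structured pattern.
# The real values are defined in the schema and may differ.
ASSUMED_SETTINGS = {
    "id_nmdc_prefix": "^(nmdc)",
    "id_shoulder": "([0-9][a-z]{0,6}[0-9])",
    "id_blade": "([A-Za-z0-9]{1,})",
}

# The structured pattern quoted in the review comment above.
STRUCTURED_PATTERN = "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$"


def expand_pattern(syntax: str, settings: dict) -> str:
    """Use 1 (validation): replace each {name} placeholder with its regex."""
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


def first_typecode(syntax: str) -> str:
    """Use 2 (minting): return the first alternative in the typecode group."""
    match = re.search(r":\(([^)]+)\)-", syntax)
    if match is None:
        raise ValueError("no typecode group found in pattern")
    return match.group(1).split("|")[0]


# Validation accepts either typecode; minting would only ever produce the first.
id_regex = re.compile(expand_pattern(STRUCTURED_PATTERN, ASSUMED_SETTINGS))
is_bsm_valid = id_regex.match("nmdc:bsm-11-abc123") is not None
is_procsm_valid = id_regex.match("nmdc:procsm-11-abc123") is not None
minted_typecode = first_typecode(STRUCTURED_PATTERN)
```

Under these assumptions, an ID with the `procsm` typecode passes validation, but a minter that takes the first alternative would never generate one for this class. That mismatch is the reason for wanting a separate slot for the typecode.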
