JSON-LD Validation Scoping #33

Open
bleonar5 opened this issue Nov 29, 2023 · 1 comment
Labels
SpecificationIssue Issue concerning the psych-DS specification and schema model

bleonar5 commented Nov 29, 2023

We need to lay out clear criteria for what we will consider a "valid" JSON-LD object, and possibly also adjust the Psych-DS specification as laid out in the Google Doc.

Here are some of the official requirements of JSON-LD, as found here:

  • A JSON-LD document MUST be able to express a linked data graph* (elaborated below) []
  • A JSON-LD document MUST be a valid JSON document [X]
  • All JSON constructs MUST have semantic meaning in a JSON-LD document [X]
  • JSON arrays MUST NOT be interpreted as defining an object ordering. [X]
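Of the MUSTs above, only "valid JSON" can be checked mechanically without a JSON-LD processor. A minimal sketch of that check using only Python's standard library; the `check_json_syntax` name and the message format are hypothetical, not existing validator code. One detail worth noting: Python's `json` module accepts `NaN` and `Infinity` by default even though standard JSON does not, so the sketch rejects them explicitly.

```python
import json


def _reject_constant(name: str) -> None:
    # json.loads calls this for NaN, Infinity and -Infinity, which are not valid JSON.
    raise ValueError(f"Non-standard JSON constant: {name}")


def check_json_syntax(text: str) -> list[str]:
    """Return error messages; an empty list means the text parses as standard JSON."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except json.JSONDecodeError as err:
        # JSONDecodeError carries the message plus 1-based line/column of the failure.
        return [f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    except ValueError as err:
        return [f"Invalid JSON: {err}"]
    return []


print(check_json_syntax('{"name": "My dataset"}'))  # []
print(check_json_syntax('{"name": "My dataset",}'))
```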

(There are other bullets in their list, but they are all SHOULDs and MAYs, and we are most interested in the MUSTs.)

Guidelines for a linked data graph (from the same document as above):

  • Subjects, objects, and edges all SHOULD be identified with IRIs

Part of my issue with the combination of requirements above is that they seem to bottom out at "being valid JSON", because:

  1. Even though a JSON-LD document must be expressible as a linked data graph, the requirements for a linked data graph are all non-normative (all SHOULDs).
  2. The requirement that JSON constructs must have meaning refers to something intentional rather than technical. From what I can tell, it's not saying that all JSON constructs must be linked to some informative IRI; it's saying that the user must not use values that don't mean anything.
  3. The requirement that arrays must be interpreted as unordered is a matter of interpretation, not of machine validation.

In the world of JSON-LD there is an abundance of SHOULDs and barely any MUSTs. What we have to decide is whether to codify our own set of MUSTs for Psych-DS-specific JSON-LD, or to keep valid JSON format as the only MUST and implement the various SHOULDs as warnings.
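One way to encode the second option (valid JSON as the only hard requirement, everything else advisory) is to tag each check with its RFC 2119 level and derive the issue severity from it. A minimal sketch; the `Level` and `Issue` types and the `MISSING_TYPE` code are hypothetical names for illustration, not existing Psych-DS validator code.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """RFC 2119 requirement level of the rule a check enforces."""
    MUST = "MUST"
    SHOULD = "SHOULD"


@dataclass
class Issue:
    code: str
    severity: str  # "error" or "warning"
    message: str


def to_issue(code: str, level: Level, message: str) -> Issue:
    # MUST violations become errors; SHOULD violations become warnings.
    severity = "error" if level is Level.MUST else "warning"
    return Issue(code, severity, message)


def is_valid(issues: list[Issue]) -> bool:
    # A file passes as long as it has no errors; warnings are advisory only.
    return not any(issue.severity == "error" for issue in issues)


issues = [to_issue("MISSING_TYPE", Level.SHOULD, "Top-level object has no @type.")]
print(is_valid(issues))  # True
```

If we later decide to codify Psych-DS-specific MUSTs, only the `Level` attached to each check changes; the reporting path stays the same.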

For instance, we allow users to include non-schema.org keys (or rather, string keys that don't link to any IRI) within their metadata, which strict JSON-LD rules permit but recommend against. Here are some questions:

  • Do we want to require that schema.org context MUST be included, and that the required terms of our spec such as "name" and "variableMeasured" MUST expand to their full schema.org IRIs?
  • Do we want to allow for expanded, contextless JSON-LDs as valid metadata files?
  • If we do choose to implement the full gamut of JSON-LD SHOULDs, are we prepared to present those recommendations to the user, at risk of overwhelming them?
  • Do we want to allow for namespaces other than schema.org in the context?
  • Do we want the validator to check that JSON-LD IRIs actually point to real web pages? [This has implications for our eventual python version, for which offline functionality is a desideratum]
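The first two questions can be made concrete: a required term counts as present either as a compact key under a schema.org context or as a full schema.org IRI in a contextless document. A minimal sketch under those assumptions; it inspects only a single top-level node object (not the array-based expanded form a processor emits), accepts both http and https spellings of schema.org, and `REQUIRED_TERMS` is an illustrative subset using the two terms named above, not the spec's full list.

```python
SCHEMA_ORG_CONTEXTS = {"http://schema.org", "http://schema.org/", "https://schema.org", "https://schema.org/"}
SCHEMA_ORG_PREFIXES = ("http://schema.org/", "https://schema.org/")
REQUIRED_TERMS = ("name", "variableMeasured")  # illustrative subset of required terms


def uses_schema_org_context(doc: dict) -> bool:
    ctx = doc.get("@context")
    # @context may be a single string, an object, or an array of either.
    entries = ctx if isinstance(ctx, list) else [ctx]
    for entry in entries:
        if isinstance(entry, str) and entry in SCHEMA_ORG_CONTEXTS:
            return True
        if isinstance(entry, dict) and entry.get("@vocab") in SCHEMA_ORG_PREFIXES:
            return True
    return False


def missing_required_terms(doc: dict) -> list[str]:
    """Return required terms found neither as compact keys nor as full IRIs."""
    has_context = uses_schema_org_context(doc)
    missing = []
    for term in REQUIRED_TERMS:
        found_compact = has_context and term in doc
        found_expanded = any(prefix + term in doc for prefix in SCHEMA_ORG_PREFIXES)
        if not (found_compact or found_expanded):
            missing.append(term)
    return missing


print(missing_required_terms({"@context": "https://schema.org", "name": "Study 1"}))  # ['variableMeasured']
```

Note that without a schema.org context, a bare `"name"` key does not count, which mirrors the question of whether the context itself MUST be required.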

There are other questions, but this set covers the gist of it. I'm including some miscellaneous references below, such as the Best Practices and the official "JSON-LD grammar":

Here are some "best practices" put forth by W3C:

Best Practice 1: Publish data using developer friendly JSON
Best Practice 2: Use a top-level object
Best Practice 3: Use native values
Best Practice 4: Assume arrays are unordered
Best Practice 5: Use well-known identifiers when describing data
Best Practice 6: Provide one or more types for JSON objects
Best Practice 7: Identify objects with a unique identifier
Best Practice 8: Things not strings
Best Practice 9: Nest referenced inline objects
Best Practice 10: When describing an inverse relationship, use a referenced property
Best Practice 11: External references SHOULD use typed term
Best Practice 12: Ordering of array elements
Best Practice 13: Provide a representation of the entity related by URL
Best Practice 14: Cache JSON-LD Contexts
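If we go the "SHOULDs as warnings" route, a few of these best practices are straightforward to check. A minimal sketch covering Best Practices 2, 6 and 7, checked on the top-level object only; it assumes the literal keywords `@type` and `@id` are used (context-defined aliases are not resolved), and the function name and message wording are hypothetical.

```python
def best_practice_warnings(doc: object) -> list[str]:
    """Return advisory warnings for a parsed JSON-LD document."""
    # Best Practice 2: use a top-level object.
    if not isinstance(doc, dict):
        return ["Best Practice 2: the top-level value should be a JSON object."]
    warnings = []
    # Best Practice 6: provide one or more types for JSON objects.
    if "@type" not in doc:
        warnings.append("Best Practice 6: the top-level object has no @type.")
    # Best Practice 7: identify objects with a unique identifier.
    if "@id" not in doc:
        warnings.append("Best Practice 7: the top-level object has no @id.")
    return warnings


print(best_practice_warnings({"@type": "Dataset"}))  # ['Best Practice 7: the top-level object has no @id.']
```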

JSON-LD Grammar

(An interesting point from the grammar above: keys in a JSON-LD document that don't map to an IRI MUST be ignored when processed. We may want to remind users that adding unlinked keys to their metadata does not technically make it richer, since those keys will be ignored by any standard JSON-LD processing on the web.)
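To show users what "ignored" means in practice, a validator could report which keys would be dropped. The sketch below is a deliberately simplified stand-in for the JSON-LD expansion algorithm, not jsonld.js: it looks only at top-level keys and assumes a flat, local context mapping terms (and prefixes) to IRIs, with an optional `@vocab`. Real expansion also handles remote contexts, nested objects, and value processing; `expand_keys` is a hypothetical helper.

```python
def expand_keys(doc: dict, context: dict) -> tuple[dict, list]:
    """Toy top-level key expansion against a flat, local context.

    Returns (expanded_object, dropped_keys). Not the full JSON-LD algorithm.
    """
    expanded, dropped = {}, []
    vocab = context.get("@vocab")
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed by expansion, not copied
        if key.startswith("@"):
            expanded[key] = value  # other keywords pass through unchanged
        elif key in context:
            expanded[context[key]] = value  # term defined in the context
        elif ":" in key:
            prefix, suffix = key.split(":", 1)
            if prefix in context and not suffix.startswith("//"):
                expanded[context[prefix] + suffix] = value  # compact IRI, e.g. "schema:name"
            else:
                expanded[key] = value  # treated as an absolute IRI
        elif vocab is not None:
            expanded[vocab + key] = value  # undefined term, expanded against @vocab
        else:
            dropped.append(key)  # no mapping at all: a processor ignores this key
    return expanded, dropped


context = {"name": "http://schema.org/name", "schema": "http://schema.org/"}
doc = {"@context": context, "name": "Reaction times", "schema:description": "Study 1", "myNotes": "draft"}
expanded, dropped = expand_keys(doc, context)
print(dropped)  # ['myNotes']
```

The `@vocab` branch matters: when a context declares a default vocabulary, undefined terms are expanded against it instead of being dropped, so whether a key is "ignored" depends on the context in use.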

Additional MUSTs that we can glean from the grammar:

Note:
[Screenshot 2023-11-29 at 1:29:26 PM]
This refers to the eventual deprecation of non-IRI keys in JSON-LD.

bleonar5 added the SpecificationIssue label (Issue concerning the psych-DS specification and schema model) on Nov 29, 2023
bleonar5 self-assigned this on Nov 29, 2023

bleonar5 (author) commented:

After doing a deeper dive into the jsonld.js package, I can see that it does produce error messages that correspond directly to a lot of the MUSTs from the JSON-LD Grammar. These mostly seem to revolve around restricted usages for the various "@" keywords.

This is great, because it means we can offload a lot of this fine-grained syntactic validation of JSON-LD objects to the official package itself, funneling its error messages into the validation "issues" that our app presents to the user. One nice thing about these error cases is that they only really arise when you begin to use some of JSON-LD's more complex features, so there's less worry about these checks being prohibitive for beginners.
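A minimal sketch of the funneling step, assuming the processor's error has already been reduced to its JSON-LD 1.1 API error code string (jsonld.js attaches error details to the errors it throws; the exact property path should be confirmed against the package). The four codes below come from the JSON-LD 1.1 API error list, while the friendly wording, `FRIENDLY_MESSAGES`, and `issue_from_processor_error` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    code: str
    severity: str
    message: str


# Keys are JSON-LD 1.1 API error codes; the user-facing wording is illustrative.
FRIENDLY_MESSAGES = {
    "invalid @id value": 'An "@id" value must be a string (an IRI or blank node identifier).',
    "colliding keywords": "Two keys in the same object expand to the same JSON-LD keyword.",
    "keyword redefinition": "The @context attempts to redefine a JSON-LD keyword.",
    "loading remote context failed": "A remote @context could not be loaded.",
}


def issue_from_processor_error(code: str, detail: str = "") -> Issue:
    """Translate a JSON-LD processor error code into a user-facing validation issue."""
    message = FRIENDLY_MESSAGES.get(code, f"JSON-LD processing error: {code}")
    if detail:
        message = f"{message} ({detail})"
    # Processor errors correspond to grammar MUSTs, so they are reported as errors.
    return Issue(code=code, severity="error", message=message)


print(issue_from_processor_error("invalid @id value", "@id was 42").message)
```

Unknown codes fall back to a generic message, so new processor errors still reach the user even before we write friendlier wording for them.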

There's another category of JSON-LD MUSTs that result in ignored content rather than an error message. For instance, in the JSON-LD playground, using a key that resolves to a string instead of an IRI results in that key being dropped. We have to decide whether such violations ought to be errors or warnings.
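Since this is still an open decision, one option is to make it a policy switch rather than hard-coding it. A minimal sketch, assuming the list of dropped keys has already been computed (for example, by comparing the input's keys with the processor's expanded output); `DROPPED_KEY`, `dropped_key_issues`, and the `strict` flag are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    code: str
    severity: str
    message: str


def dropped_key_issues(dropped_keys: list[str], strict: bool = False) -> list[Issue]:
    """Report keys a JSON-LD processor would silently drop.

    strict=False reports them as warnings; strict=True promotes them to errors.
    """
    severity = "error" if strict else "warning"
    return [
        Issue(
            code="DROPPED_KEY",
            severity=severity,
            message=f'Key "{key}" does not expand to an IRI and will be ignored by JSON-LD processors.',
        )
        for key in dropped_keys
    ]


print([i.severity for i in dropped_key_issues(["myNotes"])])  # ['warning']
```

Defaulting to warnings matches the lenient approach of treating valid JSON as the only MUST, while a strict mode could serve users who want fully linked metadata.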
