New tooling for comparing the keyword dictionary and data model schemas #337

braingram · 2024-10-17T20:40:38Z

https://jira.stsci.edu/browse/JP-3783

As it currently stands this PR:

adds a new (private) submodule stdatamodels.jwst._kwtool
with a rudimentary CLI python -m stdatamodels.jwst._kwtool.cli <keyword dictionary path>
that generates a report html file (example attached to https://jira.stsci.edu/browse/JP-3783)

Link to added docs: https://stdatamodels--337.org.readthedocs.build/en/337/jwst/kwtool/index.html

It would be helpful to have feedback on:

What comparisons are useful?
What format(s) are most helpful?

Some known issues are:

"enum" comparison will need to account for the datamodel schemas allowing now-unused values. These are currently reported as differences, but being able to specify an allowed difference (while still checking the other values in the enum) would be useful to de-clutter the report.
"path" differences are reported for keywords that are re-used in multi-model models. For example, EXP_TYPE currently reports a "path" difference with "meta.exposure.type" in "kwd" and multiple paths in "dmd", including "meta.exposure.type" but also "trace.items.meta.exposure.type" (as one example). This is coming from

stdatamodels/src/stdatamodels/jwst/datamodels/schemas/spectracesingle.schema.yaml

Line 8 in 503320f

- $ref: keyword_exptype.schema

referenced here

stdatamodels/src/stdatamodels/jwst/datamodels/schemas/spectrace.schema.yaml

Line 16 in 503320f

$ref: spectracesingle.schema

(nested in meta.items). At first glance I would say this is a bug, as I don't see how having every item in the list overwrite the PRIMARY EXP_TYPE is useful. However, this might be a "bug" in the comparison, where we don't care about this difference and we'll want some way to ignore it or not flag it as an issue in the first place. @tapastro does this usage look like a schema bug? If so, I can open a separate issue.
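To make the "allowed difference" idea for enums concrete, here is a minimal sketch. It is not the code in this PR: the function name `compare_enums`, the `allowed_extra` parameter, and the returned dictionary layout are all hypothetical, chosen only for illustration.

```python
def compare_enums(kwd_values, dmd_values, allowed_extra=frozenset()):
    """Compare two enum value lists (hypothetical helper, not the PR's API).

    kwd_values: enum values from the keyword dictionary
    dmd_values: enum values from the datamodel schema
    allowed_extra: values the datamodel schema may still list even though
        the keyword dictionary no longer does (for reading old files)

    Returns None when the enums agree (after allowing the extras),
    otherwise a dict listing the values unique to each side.
    """
    kwd_set = set(kwd_values)
    dmd_set = set(dmd_values)
    # Values missing from the datamodel schema are always a difference.
    kwd_only = kwd_set - dmd_set
    # Extra datamodel values are a difference unless explicitly allowed.
    dmd_only = (dmd_set - kwd_set) - set(allowed_extra)
    if not kwd_only and not dmd_only:
        return None
    # Sort so the report has a stable ordering between runs.
    return {"kwd_only": sorted(kwd_only), "dmd_only": sorted(dmd_only)}
```

With this shape, an outdated value such as a hypothetical `"OLD"` can be listed in `allowed_extra` without hiding any other mismatch in the same enum.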

Tasks

update or add relevant tests
update relevant docstrings and / or docs/ page
Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
run jwst regression tests with this branch installed ("git+https://github.com/<fork>/stdatamodels@<branch>")

news fragment change types...

changes/<PR#>.feature.rst: new feature
changes/<PR#>.bugfix.rst: fixes an issue
changes/<PR#>.doc.rst: documentation change
changes/<PR#>.removal.rst: deprecation or removal of public API
changes/<PR#>.misc.rst: infrastructure or miscellaneous change

codecov · 2024-10-17T20:48:32Z

Codecov Report

Attention: Patch coverage is 97.57174% with 11 lines in your changes missing coverage. Please review.

Project coverage is 68.93%. Comparing base (fd6be8d) to head (616a09b).
Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
src/stdatamodels/jwst/_kwtool/compare.py	97.10%	4 Missing ⚠️
src/stdatamodels/jwst/_kwtool/kwd.py	93.61%	3 Missing ⚠️
src/stdatamodels/jwst/_kwtool/__main__.py	0.00%	2 Missing ⚠️
...atamodels/jwst/_kwtool/_tests/test_against_mast.py	97.36%	1 Missing ⚠️
src/stdatamodels/jwst/_kwtool/_tests/test_dmd.py	93.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #337      +/-   ##
==========================================
+ Coverage   66.55%   68.93%   +2.38%     
==========================================
  Files         102      114      +12     
  Lines        5456     5910     +454     
==========================================
+ Hits         3631     4074     +443     
- Misses       1825     1836      +11


chamblee-st · 2024-10-23T13:10:44Z

docs/source/jwst/kwtool/keyword_dictionary.rst

+
+The "keyword dictionary" is a collection
+of json files that describe FITS keywords used in JWST files.
+Although the format is similar to jsonschema is is not compatible


"is is" should be "it is"

Thanks! I applied that fix in b94d57d

braingram · 2024-10-29T20:55:06Z

Based on discussion with @tapastro of the currently generated report, there are a few remaining issues to sort out with this PR prior to approval.

ignore "path" differences for datamodel schema keyword definitions nested under "items" arrays (like in "HDU: EXTRACT1D KEYWORD: EXTR_X"). These don't provide a mapping that would be useful for triggering reprocessing so even if they are listed in the archive they don't allow a one-to-one mapping of value-to-file. The tool will now ignore path differences if "items" is in the path. Fixed in 66092eb
allow some enum differences (so datamodels can read old files that use outdated values) 09ef375
don't require datamodel keyword definitions to define T, F enum for bools. Fixed in b99a215
(I added this one) sort printed sets to ensure consistent ordering 04dcbbc
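The "items" rule and the sorted output from the list above can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked commits: the function name `path_difference` and the `"kwd"`/`"dmd"` result keys are hypothetical, and the sketch assumes paths are dotted strings and that only datamodel paths need filtering.

```python
def path_difference(kwd_paths, dmd_paths):
    """Compare keyword paths, ignoring datamodel paths nested under "items".

    Hypothetical helper for illustration only. Paths are dotted strings
    such as "meta.exposure.type".

    Returns None when the remaining paths agree, otherwise a dict with
    the paths from each side as sorted lists.
    """
    kwd = set(kwd_paths)
    # Drop any datamodel path that passes through an "items" array, since
    # those do not give a one-to-one mapping of value to file.
    dmd = {p for p in dmd_paths if "items" not in p.split(".")}
    if kwd == dmd:
        return None
    # Sets have no guaranteed order, so convert to sorted lists to keep
    # the generated report stable from run to run.
    return {"kwd": sorted(kwd), "dmd": sorted(dmd)}
```

For the EXP_TYPE case described earlier, comparing `{"meta.exposure.type"}` against a datamodel set that also contains `"trace.items.meta.exposure.type"` would no longer report a difference.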


braingram · 2024-10-30T20:14:58Z

@tapastro I think this is ready for final review. See the comment above #337 (comment) for a checklist and linked commits for the items we discussed.

I also added a test that fetches the latest "published" keyword dictionary from MAST (using an "unofficial" service since there is no official one) and parses it, generating a report, as a test of the new tool. I think this is worth expanding by adding a CI job that:

checks out stdatamodels main, generates a report
checks out the PR branch, generates a report
runs diff on the reports (and uploads both as artifacts)

This way we will know when PRs impact keyword dictionary differences (relative to the published version, not the latest dev version, which is not public). However, it makes sense to me to split this work into a separate PR, as I'd like to get this one in to also start working on PRs to address the many differences.
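The "runs diff on the report" step of the proposed CI job could be sketched with the standard library alone. This is only an illustration of the idea, not part of this PR: the function name `diff_reports` and the `main/report.html` and `pr/report.html` labels are hypothetical, and the sketch assumes both reports have already been generated and read into strings.

```python
import difflib


def diff_reports(main_report: str, pr_report: str) -> list[str]:
    """Return a unified diff between two report texts (hypothetical helper).

    An empty list means the PR did not change the reported differences.
    """
    return list(
        difflib.unified_diff(
            main_report.splitlines(),
            pr_report.splitlines(),
            fromfile="main/report.html",  # label only, not a real path
            tofile="pr/report.html",  # label only, not a real path
            lineterm="",
        )
    )
```

A CI job could fail, or simply post the diff, whenever the returned list is non-empty. This relies on the report having a stable ordering, which the sorted-sets change above is meant to provide.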

tapastro

🎉


stscijgbot-jp mentioned this pull request Oct 17, 2024

schema editor incorrectly parses datamodel and keyword dictionary schemas spacetelescope/jwst#8903

Closed

braingram force-pushed the kwtool branch from c9373da to 2f270f9 Compare October 21, 2024 22:16

braingram marked this pull request as ready for review October 22, 2024 17:53

braingram requested a review from a team as a code owner October 22, 2024 17:53

braingram requested a review from tapastro October 22, 2024 17:53

This was referenced Oct 22, 2024

Address keywords missing from datamodels #339

Open

Address keywords in datamodels but missing from the keyword dictionary #346

Open

chamblee-st reviewed Oct 23, 2024


braingram force-pushed the kwtool branch from d7213c2 to b94d57d Compare October 23, 2024 14:38

braingram added 19 commits October 30, 2024 13:04

WIP: kwtool

0a42396

wip cli

fc064dc

report diffs

02e956d

handle type differences between kwd and dm

43ed45b

only report path diffs for archive desination keywords, don't check P…

41df397

…_XX keywords

update comments

42534be

use argparse for cli

87be073

cleanup style

7be135d

add tests

f6cd4cd

fix style

2362fe1

add enum test

515af70

start docs

e91449f

flush out docs

5a6c7bf

add changelog fragment

f53015c

fix list in keyword dictionary docs

a8d3cf1

skip reference files for keyword comparison

4d6e2c0

fix docs typo

acf5b99

add ignore of ReferenceFileModel to docs

bf867e8

drop outdated table from docs

5d6e913

braingram added 7 commits October 30, 2024 13:04

ignore paths with items

02764b3

don't require T F enum for bools

6080762

remove stray file

36e4b16

convert sets to sorted lists for report

04dcbbc

test keyword dictionary from mast

6ab180c

fix enum boolean patching

687fa81

add timeout to mast call

2b8e9b4

braingram force-pushed the kwtool branch from 3932b7c to 2b8e9b4 Compare October 30, 2024 17:04

braingram added 2 commits October 30, 2024 16:08

allow expected differences

09ef375

add note about service

5f4e0eb

braingram mentioned this pull request Oct 30, 2024

remove schema_editor spacetelescope/jwst#8909

Merged

10 tasks

tapastro approved these changes Oct 31, 2024


typos

616a09b

braingram enabled auto-merge (squash) October 31, 2024 17:54

braingram merged commit 75df5f4 into spacetelescope:main Oct 31, 2024
21 checks passed

braingram deleted the kwtool branch October 31, 2024 18:35
