Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3773: Store logs from calibration pipeline in datamodel #8857

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

tapastro
Copy link
Contributor

@tapastro tapastro commented Oct 4, 2024

Resolves JP-3773

Closes #8855

This PR adds the log records during processing to a datamodel attribute model.cal_logs. This attribute is a dictionary, with keys equal to steps and/or pipelines applied to the data, and values associated with those keys equal to the log output.

Tasks

  • request a review from someone specific, to avoid making the maintainers review every PR
  • add a build milestone, i.e. Build 11.3 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly
news fragment change types...
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change
  • changes/<PR#>.docs.rst
  • changes/<PR#>.stpipe.rst
  • changes/<PR#>.datamodels.rst
  • changes/<PR#>.scripts.rst
  • changes/<PR#>.fits_generator.rst
  • changes/<PR#>.set_telescope_pointing.rst
  • changes/<PR#>.pipeline.rst

stage 1

  • changes/<PR#>.group_scale.rst
  • changes/<PR#>.dq_init.rst
  • changes/<PR#>.emicorr.rst
  • changes/<PR#>.saturation.rst
  • changes/<PR#>.ipc.rst
  • changes/<PR#>.firstframe.rst
  • changes/<PR#>.lastframe.rst
  • changes/<PR#>.reset.rst
  • changes/<PR#>.superbias.rst
  • changes/<PR#>.refpix.rst
  • changes/<PR#>.linearity.rst
  • changes/<PR#>.rscd.rst
  • changes/<PR#>.persistence.rst
  • changes/<PR#>.dark_current.rst
  • changes/<PR#>.charge_migration.rst
  • changes/<PR#>.jump.rst
  • changes/<PR#>.clean_flicker_noise.rst
  • changes/<PR#>.ramp_fitting.rst
  • changes/<PR#>.gain_scale.rst

stage 2

  • changes/<PR#>.assign_wcs.rst
  • changes/<PR#>.badpix_selfcal.rst
  • changes/<PR#>.msaflagopen.rst
  • changes/<PR#>.nsclean.rst
  • changes/<PR#>.imprint.rst
  • changes/<PR#>.background.rst
  • changes/<PR#>.extract_2d.rst
  • changes/<PR#>.master_background.rst
  • changes/<PR#>.wavecorr.rst
  • changes/<PR#>.srctype.rst
  • changes/<PR#>.straylight.rst
  • changes/<PR#>.wfss_contam.rst
  • changes/<PR#>.flatfield.rst
  • changes/<PR#>.fringe.rst
  • changes/<PR#>.pathloss.rst
  • changes/<PR#>.barshadow.rst
  • changes/<PR#>.photom.rst
  • changes/<PR#>.pixel_replace.rst
  • changes/<PR#>.resample_spec.rst
  • changes/<PR#>.residual_fringe.rst
  • changes/<PR#>.cube_build.rst
  • changes/<PR#>.extract_1d.rst
  • changes/<PR#>.resample.rst

stage 3

  • changes/<PR#>.assign_mtwcs.rst
  • changes/<PR#>.mrs_imatch.rst
  • changes/<PR#>.tweakreg.rst
  • changes/<PR#>.skymatch.rst
  • changes/<PR#>.exp_to_source.rst
  • changes/<PR#>.outlier_detection.rst
  • changes/<PR#>.tso_photometry.rst
  • changes/<PR#>.stack_refs.rst
  • changes/<PR#>.align_refs.rst
  • changes/<PR#>.klip.rst
  • changes/<PR#>.spectral_leak.rst
  • changes/<PR#>.source_catalog.rst
  • changes/<PR#>.combine_1d.rst
  • changes/<PR#>.ami.rst

other

  • changes/<PR#>.wfs_combine.rst
  • changes/<PR#>.white_light.rst
  • changes/<PR#>.cube_skymatch.rst
  • changes/<PR#>.engdb_tools.rst
  • changes/<PR#>.guider_cds.rst

@tapastro tapastro requested a review from a team as a code owner October 4, 2024 18:27
@github-actions github-actions bot added the stpipe label Oct 4, 2024
Copy link

codecov bot commented Oct 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.81%. Comparing base (fb3c2cd) to head (a6f56c9).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8857      +/-   ##
==========================================
+ Coverage   61.79%   61.81%   +0.01%     
==========================================
  Files         377      377              
  Lines       38834    38849      +15     
==========================================
+ Hits        23999    24015      +16     
+ Misses      14835    14834       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tapastro
Copy link
Contributor Author

tapastro commented Oct 7, 2024

@tapastro
Copy link
Contributor Author

tapastro commented Oct 8, 2024

These logs appear to include all log messages regardless of level, e.g.

2024-10-07T13:18:38.735 - stpipe.Spec2Pipeline.flat_field - DEBUG - Input is IFUImageModel of exposure type MIR_MRS
...
2024-10-07T13:18:39.185 - stpipe.Spec2Pipeline.flat_field - DEBUG - Flat field correction for non-NIRSpec modes.
2024-10-07T13:18:39.185 - stpipe.Spec2Pipeline.flat_field - DEBUG - ref substrt1=1, subsize1=1032, substrt2=1, subsize2=1024
...
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG - IFU 1-D extraction parameters:
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   x_center = 20.0
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   y_center = 21.0
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   width = 41.0
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   height = 43.0
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   theta = 0.0 degrees
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   subtract_background = False
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   sigma clip value for background = 3.0
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   method = subpixel
2024-10-07T13:18:45.840 - stpipe.Spec2Pipeline.extract_1d - DEBUG -   subpixels = 10

This may be caused by the log setting that appears at the top of most jwst code files, but perhaps occurs in step self.log instances as well?

@braingram
Copy link
Collaborator

I think this PR could benefit from some unit tests for the new behavior and documentation on what is contained in cal_logs.

Looking at this PR (as it currently stands), cal_logs is a dict containing step/pipeline class_alias keys and values that are single strings ('\n' joined) for the logs for that particular step/pipeline. Additionally, the pipeline runs remove the logs for the steps (since they're redundant for a full pipeline run). Since cal_logs isn't in the schema (and generally doesn't map well to the FITS structure) it will only be in the ASDF portion of the DataModel files.

Some questions I have are:

  • Is a single string or a list of strings more helpful? I vote for a list of strings.
  • Are there situations where keeping the step cal logs in a pipeline run would be helpful? I can't think of one so I think removal of the step-logs (like is done in this PR) makes sense for pipeline runs.

One issue I ran into was, if I run detector1 with the regtest file dev/nircam/image/jw01538046001_03105_00001_nrcalong_uncal.fits and look at what step logs are removed I only see ramp_fit and gain_scale. For some reason the call to DataModel.update in ramp_fit is not carrying over cal_logs. These logs also get lost for SlitModel on a call to dm.open (see related spacetelescope/stdatamodels#296). I would say these aren't issues with this PR but more with the foundation on which this PR relies (the same for the inclusion of the DEBUG messages).

I'm trying to think of what a "minimally viable" version of this feature needs. I think what is in this PR is sufficient (with the known issues above) but tests and docs seem worthwhile.

One final question, does it make sense to have this be "opt-in" for now perhaps adding a "save_cal_logs" to the spec and having it default to "False"? That way:

  • users can enable this feature if they want the extra logs (and know how to access them)
  • users that don't want them won't end up with larger files
  • ops can control the inclusion of cal_logs with a parameter (instead of requiring a code change)
  • at a later point (maybe even within this build) we could switch the default to True (and still allow users or ops to turn it off for smaller files, etc).

Comment on lines +102 to +107
if hasattr(result, 'cal_logs'):
tmpdict = result.cal_logs.instance
else:
tmpdict = dict()
tmpdict[self.class_alias] = '\n'.join(self._log_records)
result.cal_logs = tmpdict
Copy link
Collaborator

@braingram braingram Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if hasattr(result, 'cal_logs'):
tmpdict = result.cal_logs.instance
else:
tmpdict = dict()
tmpdict[self.class_alias] = '\n'.join(self._log_records)
result.cal_logs = tmpdict
if not hasattr(result, 'cal_logs'):
result.cal_logs = {}
setattr(result.cal_logs, self.class_alias, self._log_records)

Some code golf and switching the cal_log to a list of strings.

Comment on lines +141 to +150
if hasattr(result, 'cal_logs'):
tmpdict = result.cal_logs.instance
else:
tmpdict = dict()
tmpdict[self.class_alias] = '\n'.join(self._log_records)

for _, step in self.step_defs.items():
if step.class_alias in tmpdict.keys():
del tmpdict[step.class_alias]
result.cal_logs = tmpdict
Copy link
Collaborator

@braingram braingram Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if hasattr(result, 'cal_logs'):
tmpdict = result.cal_logs.instance
else:
tmpdict = dict()
tmpdict[self.class_alias] = '\n'.join(self._log_records)
for _, step in self.step_defs.items():
if step.class_alias in tmpdict.keys():
del tmpdict[step.class_alias]
result.cal_logs = tmpdict
if not hasattr(result, 'cal_logs'):
result.cal_logs = {}
for _, step in self.step_defs.items():
if hasattr(result.cal_logs, step.class_alias):
delattr(result.cal_logs, step.class_alias)
setattr(result.cal_logs, self.class_alias, self._log_records)

Also switching cal_logs to a list of strings and removing tmpdict.

@tapastro tapastro changed the title Store logs from calibration pipeline in datamodel JP-3773: Store logs from calibration pipeline in datamodel Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store log output in datamodel extension
2 participants