Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3683: fix abs_deriv handling of off-edge and nan values #311

Merged
merged 3 commits into from
Nov 5, 2024

Conversation

braingram
Copy link
Collaborator

@braingram braingram commented Oct 18, 2024

Resolves JP-3683

As part of outlier detection an "absolute derivative" is computed. From the outlier detection docs:

The derivative of the blotted image gets created using the blotted median image to compute the absolute value of the difference between each pixel and its four surrounding neighbors with the largest value being the recorded derivative.
So in this case the "absolute derivative" is the maximum absolute difference between a pixel and it's neighbors.

This PR addresses 2 issues with _abs_deriv (used in outlier detection) where it:

  • treated off-edge pixels as 0
  • included nan values in the max used for combining differences between adjacent pixels (effectively dilating nan values)

With the previous code:

>> arr = np.arange(25, dtype='f4').reshape((5, 5))
>> arr[2, 2] = np.nan
>> _abs_deriv(arr)
array([[ 5.,  5.,  5.,  5.,  5.],
       [ 5.,  5., nan,  5.,  9.],
       [10., nan, nan, nan, 14.],
       [15.,  5., nan,  5., 19.],
       [20., 21., 22., 23., 24.]])

Note that now instead of one nan at (2, 2) there are 5 nan pixels and that edge values are larger than expected. For example (1, 4) is 9 when the largest difference with all defined pixels is 5.

With this PR the result is:

array([[ 5.,  5.,  5.,  5.,  5.],
       [ 5.,  5.,  5.,  5.,  5.],
       [ 5.,  5., nan,  5.,  5.],
       [ 5.,  5.,  5.,  5.,  5.],
       [ 5.,  5.,  5.,  5.,  5.]], dtype=float32)

Tests are added for both issues.

It is expected this will change regression tests for both jwst and romancal as abs_deriv is used to compute the outlier detection threshold. With this PR more outliers will be detected (since the threshold will be lower due to edge pixels having lower values and less nans in the blot images).

Romancal regtests running: https://github.com/spacetelescope/RegressionTests/actions/runs/11682867663
JWST regtests running: https://github.com/spacetelescope/RegressionTests/actions/runs/11682842450

Tasks

  • update or add relevant tests
  • update relevant docstrings and / or docs/ page
  • Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • run regression tests with this branch installed ("git+https://github.com/<fork>/stcal@<branch>")
news fragment change types...
  • changes/<PR#>.apichange.rst: change to public API
  • changes/<PR#>.bugfix.rst: fixes an issue
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change

Copy link

codecov bot commented Oct 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.38%. Comparing base (60bd3b8) to head (d420770).
Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #311      +/-   ##
==========================================
+ Coverage   86.21%   86.38%   +0.16%     
==========================================
  Files          47       49       +2     
  Lines        8812     8899      +87     
==========================================
+ Hits         7597     7687      +90     
+ Misses       1215     1212       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@braingram braingram marked this pull request as ready for review October 18, 2024 17:50
@braingram braingram requested a review from a team as a code owner October 18, 2024 17:50
@braingram braingram changed the title fix abs_deriv JP-3683: fix abs_deriv handling of off-edge and nan values Oct 18, 2024
@tapastro
Copy link
Collaborator

Looks like the RT run didn't recognize the provided branch name (in addition to the gcleary issue)

Copy link
Collaborator

@kmacdonald-stsci kmacdonald-stsci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend avoid using single letter variable names and using more descriptive variable names.

src/stcal/outlier_detection/utils.py Outdated Show resolved Hide resolved
src/stcal/outlier_detection/utils.py Outdated Show resolved Hide resolved
src/stcal/outlier_detection/utils.py Outdated Show resolved Hide resolved
src/stcal/outlier_detection/utils.py Outdated Show resolved Hide resolved
@braingram
Copy link
Collaborator Author

Looks like the RT run didn't recognize the provided branch name (in addition to the gcleary issue)

Oof. Yeah I ran into "issues" :) The link should be updated to the correct one now:
https://github.com/spacetelescope/RegressionTests/actions/runs/11408938058
It shows 20 failures. These will be easier to sort through (and okify) once the artifactory issue is sorted (and these are re-run).

@braingram
Copy link
Collaborator Author

braingram commented Oct 25, 2024

A new set of regression tests now that artifactory is back up:
romancal: https://github.com/spacetelescope/RegressionTests/actions/runs/11518437432
jwst: https://github.com/spacetelescope/RegressionTests/actions/runs/11518439378

EDIT: maybe I spoke too soon as jwst main from last night is showing artifactory errors. We'll see.

@braingram
Copy link
Collaborator Author

Romancal regtest shows 1 unrelated error and 2 failures:

  • test_mos_pipeline
  • test_mos_skycell_pipeline

That are expected since those outputs are different since this PR changes the number of detected outliers.

@braingram
Copy link
Collaborator Author

braingram commented Oct 25, 2024

JWST regtests show several expected differences due to this PR changing the number of outlier detected. Most differences are small (<1%) However some are several % with test_nirspec_mos_fs_spec3 being one of the larger (>5%). Looking at the first slit the difference is largely due to the "nan dilation" on main preventing a large number of pixels from being flagged as outliers.

On main the first slit has 7447 outliers whereas with this PR there are 8171. For outlier detection the science data contains 59% finite values whereas the error (computed as the blotted median following spacetelescope/jwst#8828) contains 27% finite values. Looking at a portion of the slit with some finite pixels for data (streched to be able to see some structure in the finite pixels)
sci_stretched
and for err:
err
the computed "blot_deriv" (the output of "_abs_deriv") in this area is (top this PR, bottom main)
blot_deriv_0
Since any pixel in the "blot_deriv" marked as nan will never be flagged as an outlier the last plot shows the substantial number of pixels that on main are not being considered for outlier detection. With this PR these are considered and many for this example are now being flagged.

@melanieclarke
Copy link
Contributor

On main the first slit has 7447 outliers whereas with this PR there are 8171. For outlier detection the science data contains 59% finite values whereas the error (computed as the blotted median following spacetelescope/jwst#8828) contains 27% finite values. Looking at a portion of the slit with some finite pixels for data (streched to be able to see some structure in the finite pixels)

@braingram - I'd like to look into this. Which slit was this? And which product is in the screenshots?

@braingram
Copy link
Collaborator Author

@braingram - I'd like to look into this. Which slit was this? And which product is in the screenshots?

Thanks!

The 7447 vs 8171 outliers are for one of the crf files compared for test_nirspec_mos_fs_spec3
https://bytesalad.stsci.edu/ui/native/jwst-pipeline-results/2024-10-25_GITHUB_CI_Linux-X64-py3.12-850/2994_test_nirspec_mos_fs_spec3/jw02674-o004_b000000048_nirspec_f290lp-g395m_crf.fits
vs
https://bytesalad.stsci.edu/ui/native/jwst-pipeline/dev/truth/test_nirspec_mos_fs_spec3/jw02674-o004_b000000048_nirspec_f290lp-g395m_crf.fits

It looks like I left off the mos part for generating the plots. The plots are of the first slit processed for test_nirspec_fs_spec3 (taken from jw01309022001_04102_00001_nrs2_cal.fits).
On main 24 pixels are flagged as outliers, with this PR 33 are flagged.
Here's a plot of the 'sci' and 'err' (the blotted median error) for that slit showing the large number of nan pixels (likely due to the masking during the median calculation, similar to the masking done for the 'sci' median used to generate the 'blot' array).
Screenshot 2024-10-28 at 11 00 58 AM

@melanieclarke
Copy link
Contributor

Hmm, I can't reproduce those images, working from jwst on main and stcal on your branch. The below is what I see for the NRS2 blot file from S200A1. Top is SCI, bottom is ERR.

The large area marked NaN at the right is expected: this is a consequence of the default parameters weight_type = 'ivm' and maskpt = 0.70. NIRSpec is aware of this issue and considering different default parameter combinations -- see JP-3347 for discussion.

It is not expected that there would be a significant difference between the number of NaN values in the SCI and ERR arrays. In quick spot checks for the regression test outputs for MOS, MOS/FS, and FS, I haven't found any blot or median products that show a NaN discrepancy -- they all look like the below image.

jw01309022001_04102_00001_nrs1_s200a1_blot

@braingram
Copy link
Collaborator Author

Thanks for checking. The 'sci' I was referring to was the 'sci_data' passed into "flag_resampled_crs" (which comes from the "data" of the input file)

def flag_resampled_crs(

This is different from the blotted median (which I believe you plotted in the upper subplot in your comment). All 3 ('sci', 'err', 'blot') contribute to outlier detection in a way that a 'nan' pixel in any of the arrays will prevent it from being an outlier.
As this PR is only changing the blot_deriv (used on the 'blot', blotted median 'sci' data) I think my comments about the err being largely nan were a distraction.

@melanieclarke
Copy link
Contributor

The 'sci' I was referring to was the 'sci_data' passed into "flag_resampled_crs"

Ah! Okay, I understand now. In that case, yes, this is expected. Blotted median sci and err will match up. The input data will not - the extra NaNs in the blotted product are due to the extra masking in that product.

Either way, I agree, that's a side issue from the fixes for the abs_deriv function. The changes here to fix the extra NaN dilation in the blot deriv look good.

Copy link
Contributor

@melanieclarke melanieclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through the differences for the regression tests for NIRSpec MOS and FS, and it looks like where the changes are significant with this branch, they are improvements. In particular, discrepant edge values in the test_nirspec_mos_fs_spec3 input are better flagged with these changes.

Thanks for the fix!

@braingram
Copy link
Collaborator Author

I looked through the differences for the regression tests for NIRSpec MOS and FS, and it looks like where the changes are significant with this branch, they are improvements. In particular, discrepant edge values in the test_nirspec_mos_fs_spec3 input are better flagged with these changes.

Thanks for the fix!

Thanks for the review! Let me check with the romancal folks to see how they would like this PR scheduled with their release (since it impacts a few regression tests there as well).

@braingram
Copy link
Collaborator Author

FYI merging of this should be delayed until:

  • a review from a roman representative (@schlafly and @mairanteodoro are requested reviewers)
  • ongoing work in romancal related to L1/L2 refacotring is finished

Copy link
Collaborator

@schlafly schlafly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Collaborator

@mairanteodoro mairanteodoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks, @braingram!

@braingram braingram merged commit af730f4 into spacetelescope:main Nov 5, 2024
25 of 26 checks passed
@braingram braingram deleted the abs_deriv branch November 5, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants