JP-3683: fix abs_deriv handling of off-edge and nan values #311

braingram · 2024-10-18T17:41:38Z

As part of outlier detection an "absolute derivative" is computed. From the outlier detection docs:

The derivative of the blotted image gets created using the blotted median image to compute the absolute value of the difference between each pixel and its four surrounding neighbors with the largest value being the recorded derivative.
So in this case the "absolute derivative" is the maximum absolute difference between a pixel and it's neighbors.

This PR addresses 2 issues with _abs_deriv (used in outlier detection) where it:

treated off-edge pixels as 0
included nan values in the max used for combining differences between adjacent pixels (effectively dilating nan values)

With the previous code:

>> arr = np.arange(25, dtype='f4').reshape((5, 5))
>> arr[2, 2] = np.nan
>> _abs_deriv(arr)
array([[ 5.,  5.,  5.,  5.,  5.],
       [ 5.,  5., nan,  5.,  9.],
       [10., nan, nan, nan, 14.],
       [15.,  5., nan,  5., 19.],
       [20., 21., 22., 23., 24.]])

Note that now instead of one nan at (2, 2) there are 5 nan pixels and that edge values are larger than expected. For example (1, 4) is 9 when the largest difference with all defined pixels is 5.

With this PR the result is:

array([[ 5.,  5.,  5.,  5.,  5.],
       [ 5.,  5.,  5.,  5.,  5.],
       [ 5.,  5., nan,  5.,  5.],
       [ 5.,  5.,  5.,  5.,  5.],
       [ 5.,  5.,  5.,  5.,  5.]], dtype=float32)

Tests are added for both issues.

It is expected this will change regression tests for both jwst and romancal as abs_deriv is used to compute the outlier detection threshold. With this PR more outliers will be detected (since the threshold will be lower due to edge pixels having lower values and less nans in the blot images).

Romancal regtests running: https://github.com/spacetelescope/RegressionTests/actions/runs/11682867663
JWST regtests running: https://github.com/spacetelescope/RegressionTests/actions/runs/11682842450

Tasks

update or add relevant tests
update relevant docstrings and / or docs/ page
Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
- write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
- run regression tests with this branch installed ("git+https://github.com/<fork>/stcal@<branch>")
  - jwst regression test
  - romancal regression test

news fragment change types...

changes/<PR#>.apichange.rst: change to public API
changes/<PR#>.bugfix.rst: fixes an issue
changes/<PR#>.general.rst: infrastructure or miscellaneous change

codecov · 2024-10-18T17:46:32Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.38%. Comparing base (60bd3b8) to head (d420770).
Report is 12 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #311      +/-   ##
==========================================
+ Coverage   86.21%   86.38%   +0.16%     
==========================================
  Files          47       49       +2     
  Lines        8812     8899      +87     
==========================================
+ Hits         7597     7687      +90     
+ Misses       1215     1212       -3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

tapastro · 2024-10-18T20:20:13Z

Looks like the RT run didn't recognize the provided branch name (in addition to the gcleary issue)

kmacdonald-stsci

I recommend avoid using single letter variable names and using more descriptive variable names.

src/stcal/outlier_detection/utils.py

braingram · 2024-10-18T20:28:47Z

Looks like the RT run didn't recognize the provided branch name (in addition to the gcleary issue)

Oof. Yeah I ran into "issues" :) The link should be updated to the correct one now:
https://github.com/spacetelescope/RegressionTests/actions/runs/11408938058
It shows 20 failures. These will be easier to sort through (and okify) once the artifactory issue is sorted (and these are re-run).

braingram · 2024-10-25T10:43:20Z

A new set of regression tests now that artifactory is back up:
romancal: https://github.com/spacetelescope/RegressionTests/actions/runs/11518437432
jwst: https://github.com/spacetelescope/RegressionTests/actions/runs/11518439378

EDIT: maybe I spoke too soon as jwst main from last night is showing artifactory errors. We'll see.

braingram · 2024-10-25T13:52:11Z

Romancal regtest shows 1 unrelated error and 2 failures:

test_mos_pipeline
test_mos_skycell_pipeline

That are expected since those outputs are different since this PR changes the number of detected outliers.

braingram · 2024-10-25T20:36:33Z

JWST regtests show several expected differences due to this PR changing the number of outlier detected. Most differences are small (<1%) However some are several % with test_nirspec_mos_fs_spec3 being one of the larger (>5%). Looking at the first slit the difference is largely due to the "nan dilation" on main preventing a large number of pixels from being flagged as outliers.

On main the first slit has 7447 outliers whereas with this PR there are 8171. For outlier detection the science data contains 59% finite values whereas the error (computed as the blotted median following spacetelescope/jwst#8828) contains 27% finite values. Looking at a portion of the slit with some finite pixels for data (streched to be able to see some structure in the finite pixels)

and for err:

the computed "blot_deriv" (the output of "_abs_deriv") in this area is (top this PR, bottom main)

Since any pixel in the "blot_deriv" marked as nan will never be flagged as an outlier the last plot shows the substantial number of pixels that on main are not being considered for outlier detection. With this PR these are considered and many for this example are now being flagged.

melanieclarke · 2024-10-28T14:02:04Z

On main the first slit has 7447 outliers whereas with this PR there are 8171. For outlier detection the science data contains 59% finite values whereas the error (computed as the blotted median following spacetelescope/jwst#8828) contains 27% finite values. Looking at a portion of the slit with some finite pixels for data (streched to be able to see some structure in the finite pixels)

@braingram - I'd like to look into this. Which slit was this? And which product is in the screenshots?

braingram · 2024-10-28T15:09:06Z

@braingram - I'd like to look into this. Which slit was this? And which product is in the screenshots?

Thanks!

The 7447 vs 8171 outliers are for one of the crf files compared for test_nirspec_mos_fs_spec3
https://bytesalad.stsci.edu/ui/native/jwst-pipeline-results/2024-10-25_GITHUB_CI_Linux-X64-py3.12-850/2994_test_nirspec_mos_fs_spec3/jw02674-o004_b000000048_nirspec_f290lp-g395m_crf.fits
vs
https://bytesalad.stsci.edu/ui/native/jwst-pipeline/dev/truth/test_nirspec_mos_fs_spec3/jw02674-o004_b000000048_nirspec_f290lp-g395m_crf.fits

It looks like I left off the mos part for generating the plots. The plots are of the first slit processed for test_nirspec_fs_spec3 (taken from jw01309022001_04102_00001_nrs2_cal.fits).
On main 24 pixels are flagged as outliers, with this PR 33 are flagged.
Here's a plot of the 'sci' and 'err' (the blotted median error) for that slit showing the large number of nan pixels (likely due to the masking during the median calculation, similar to the masking done for the 'sci' median used to generate the 'blot' array).

melanieclarke · 2024-10-28T15:36:06Z

Hmm, I can't reproduce those images, working from jwst on main and stcal on your branch. The below is what I see for the NRS2 blot file from S200A1. Top is SCI, bottom is ERR.

The large area marked NaN at the right is expected: this is a consequence of the default parameters weight_type = 'ivm' and maskpt = 0.70. NIRSpec is aware of this issue and considering different default parameter combinations -- see JP-3347 for discussion.

It is not expected that there would be a significant difference between the number of NaN values in the SCI and ERR arrays. In quick spot checks for the regression test outputs for MOS, MOS/FS, and FS, I haven't found any blot or median products that show a NaN discrepancy -- they all look like the below image.

jw01309022001_04102_00001_nrs1_s200a1_blot

braingram · 2024-10-28T15:47:24Z

Thanks for checking. The 'sci' I was referring to was the 'sci_data' passed into "flag_resampled_crs" (which comes from the "data" of the input file)

stcal/src/stcal/outlier_detection/utils.py

Line 157 in dfe1d6d

def flag_resampled_crs(

This is different from the blotted median (which I believe you plotted in the upper subplot in your comment). All 3 ('sci', 'err', 'blot') contribute to outlier detection in a way that a 'nan' pixel in any of the arrays will prevent it from being an outlier.
As this PR is only changing the blot_deriv (used on the 'blot', blotted median 'sci' data) I think my comments about the err being largely nan were a distraction.

melanieclarke · 2024-10-28T15:57:33Z

The 'sci' I was referring to was the 'sci_data' passed into "flag_resampled_crs"

Ah! Okay, I understand now. In that case, yes, this is expected. Blotted median sci and err will match up. The input data will not - the extra NaNs in the blotted product are due to the extra masking in that product.

Either way, I agree, that's a side issue from the fixes for the abs_deriv function. The changes here to fix the extra NaN dilation in the blot deriv look good.

melanieclarke

I looked through the differences for the regression tests for NIRSpec MOS and FS, and it looks like where the changes are significant with this branch, they are improvements. In particular, discrepant edge values in the test_nirspec_mos_fs_spec3 input are better flagged with these changes.

Thanks for the fix!

braingram · 2024-10-28T17:20:27Z

I looked through the differences for the regression tests for NIRSpec MOS and FS, and it looks like where the changes are significant with this branch, they are improvements. In particular, discrepant edge values in the test_nirspec_mos_fs_spec3 input are better flagged with these changes.

Thanks for the fix!

Thanks for the review! Let me check with the romancal folks to see how they would like this PR scheduled with their release (since it impacts a few regression tests there as well).

braingram · 2024-10-28T18:24:07Z

FYI merging of this should be delayed until:

a review from a roman representative (@schlafly and @mairanteodoro are requested reviewers)
ongoing work in romancal related to L1/L2 refacotring is finished

schlafly

LGTM, thanks!

mairanteodoro

Looks good! Thanks, @braingram!

github-actions bot added the testing label Oct 18, 2024

braingram marked this pull request as ready for review October 18, 2024 17:50

braingram requested a review from a team as a code owner October 18, 2024 17:50

braingram changed the title ~~fix abs_deriv~~ JP-3683: fix abs_deriv handling of off-edge and nan values Oct 18, 2024

braingram requested review from schlafly and tapastro October 18, 2024 17:52

stscijgbot-jp mentioned this pull request Oct 18, 2024

absolute derivitive calculation in outlier detection has edge effects spacetelescope/jwst#8636

Closed

kmacdonald-stsci approved these changes Oct 18, 2024

View reviewed changes

src/stcal/outlier_detection/utils.py Outdated Show resolved Hide resolved

src/stcal/outlier_detection/utils.py Outdated Show resolved Hide resolved

src/stcal/outlier_detection/utils.py Outdated Show resolved Hide resolved

src/stcal/outlier_detection/utils.py Outdated Show resolved Hide resolved

melanieclarke approved these changes Oct 28, 2024

View reviewed changes

braingram requested a review from mairanteodoro October 28, 2024 18:23

schlafly approved these changes Nov 1, 2024

View reviewed changes

braingram force-pushed the abs_deriv branch from 70f4045 to 350348e Compare November 1, 2024 15:39

braingram added 3 commits November 4, 2024 09:23

fix abs_deriv

bbedbda

add changelog

d239353

use descriptive variable names

d420770

braingram force-pushed the abs_deriv branch from 350348e to d420770 Compare November 4, 2024 14:23

mairanteodoro approved these changes Nov 4, 2024

View reviewed changes

braingram merged commit af730f4 into spacetelescope:main Nov 5, 2024
25 of 26 checks passed

braingram deleted the abs_deriv branch November 5, 2024 13:05

braingram mentioned this pull request Nov 5, 2024

RCAL-937: update s_region after running TweakRegStep. spacetelescope/romancal#1484

Merged

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JP-3683: fix abs_deriv handling of off-edge and nan values #311

JP-3683: fix abs_deriv handling of off-edge and nan values #311

braingram commented Oct 18, 2024 •

edited

Loading

codecov bot commented Oct 18, 2024 •

edited

Loading

tapastro commented Oct 18, 2024

kmacdonald-stsci left a comment

braingram commented Oct 18, 2024

braingram commented Oct 25, 2024 •

edited

Loading

braingram commented Oct 25, 2024

braingram commented Oct 25, 2024 •

edited

Loading

melanieclarke commented Oct 28, 2024

braingram commented Oct 28, 2024

melanieclarke commented Oct 28, 2024

braingram commented Oct 28, 2024

melanieclarke commented Oct 28, 2024

melanieclarke left a comment

braingram commented Oct 28, 2024

braingram commented Oct 28, 2024

schlafly left a comment

mairanteodoro left a comment

JP-3683: fix abs_deriv handling of off-edge and nan values #311

JP-3683: fix abs_deriv handling of off-edge and nan values #311

Conversation

braingram commented Oct 18, 2024 • edited Loading

Tasks

codecov bot commented Oct 18, 2024 • edited Loading

Codecov Report

tapastro commented Oct 18, 2024

kmacdonald-stsci left a comment

Choose a reason for hiding this comment

braingram commented Oct 18, 2024

braingram commented Oct 25, 2024 • edited Loading

braingram commented Oct 25, 2024

braingram commented Oct 25, 2024 • edited Loading

melanieclarke commented Oct 28, 2024

braingram commented Oct 28, 2024

melanieclarke commented Oct 28, 2024

braingram commented Oct 28, 2024

melanieclarke commented Oct 28, 2024

melanieclarke left a comment

Choose a reason for hiding this comment

braingram commented Oct 28, 2024

braingram commented Oct 28, 2024

schlafly left a comment

Choose a reason for hiding this comment

mairanteodoro left a comment

Choose a reason for hiding this comment

braingram commented Oct 18, 2024 •

edited

Loading

codecov bot commented Oct 18, 2024 •

edited

Loading

braingram commented Oct 25, 2024 •

edited

Loading

braingram commented Oct 25, 2024 •

edited

Loading