Update AQM RT's to v16 and update input-data (Merged with Bring AQM changes from production/AQM.v7 into develop branch #2279) #2287
base: develop
…model into enable_aqm
I need to reduce the resources of this setup before it is moved to ready. |
…model into enable_aqm
I've just tested the two atmaq tests on hera to make sure the merge didn't impact them, and they completed successfully. Now I'm running the full suite on hera to create an up-to-date test_changes.list for this PR. |
I am going to get a review completed for the AQM subcomponent. Please feel free to start testing. There is new input data already staged on hera
|
Also, please remove any old aqm baselines from this new bl_date, as the names of the tests have changed and we no longer need the old ones. |
@FernandoAndrade-NOAA @jkbk2004 @zach1221 please move the AQMv7 input data onto the other RDHPCS platforms if it is not already there, and feel free to start testing. |
@zach1221 @FernandoAndrade-NOAA The new AQMv7 input files are ready on orion/hercules/gaea/derecho. I will check if we can use lfs5 on jet. |
Anyone else having issues getting the new regional_atmaq_v16_debug case to pass baseline creation? It failed for me on hercules and derecho, but it doesn't appear to be wallclock. Idk @BrianCurtis-NOAA if you're able to take a look. Here's my error log |
Gaea is running into issues with the
|
I was able to run to completion on Hera, Acorn, and WCOSS2. Hmm. I'll chat with the AQM devs to see if that function call can be avoided or the issue fixed. Could you try the debug test once more if you haven't done it twice yet? If it still fails, what are the lib differences between the failed machines and hera? |
Same result for me on follow-up attempts. Looking through the libraries on hercules vs hera to see if anything stands out. Will keep you posted. |
Same error on Gaea from my rerun unfortunately. |
How is the Hera run coming along? Did baselines generate OK? |
Yes, sorry about that, I forgot to leave a note. Generation was fine; the queue is just slow today. There are still about 94 tasks left. |
So we have Hera/Acorn/WCOSS2 as known working for the debug test, and Derecho, Hercules, and Gaea as failing. Since the error is on the shape of an array, I am curious whether a corruption occurred while rsync-ing to the RDHPCS platforms. Could you try to re-rsync on a fast-ish platform (one that fails) and see if the debug test runs to completion? |
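One way to test the corruption theory without a full re-sync is to compare per-file checksums of the staged input data between a passing and a failing platform. The following is a minimal sketch, not part of the UFS tooling: the function names `checksum_manifest` and `diff_manifests` are hypothetical, and it assumes the input-data directory can be walked with ordinary file permissions on each machine.

```python
import hashlib
from pathlib import Path


def checksum_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large input files are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def diff_manifests(reference, candidate):
    """Return (missing, extra, mismatched) relative paths, each sorted."""
    missing = sorted(set(reference) - set(candidate))
    extra = sorted(set(candidate) - set(reference))
    mismatched = sorted(
        name
        for name in set(reference) & set(candidate)
        if reference[name] != candidate[name]
    )
    return missing, extra, mismatched
```

The manifest from each platform could be written to a file (for example with `json.dump`), copied across, and compared with `diff_manifests`. Three empty lists would suggest the transfer is intact and the failure lies elsewhere. Running rsync again with its `--checksum` option is an alternative that forces a content comparison instead of relying on size and modification time.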
I'll try resyncing with Derecho. |
@BrianCurtis-NOAA rsyncing again on gaea now.... @zach1221 @FernandoAndrade-NOAA FYI |
Just leaving a note that the error is still occurring on the Gaea rerun. |
@BrianCurtis-NOAA should we hold this PR for a bit and move to #2279? |
Were we able to see if Jet was affected by the issue? |
I have not tried Jet; from what I understood, we were going to wait for results from Gaea to see if it was just a sync issue. Would you like me to start up a run on Jet, or should we move on? |
Given that Derecho and Hercules also failed, I'd suggest we move on to the next PR. |
@BrianCurtis-NOAA I expect #2283 will go smoothly, as it only changes WAM cases. We can revisit the AQM PRs either tomorrow or Monday. |
OK, move on. Please look into why Hera/Acorn/WCOSS2 pass while the others do not. Hopefully you can identify potential problem spots by comparing the different compiler/library versions between, say, Hera and Gaea. |
@BrianCurtis-NOAA I think we need a fix at https://github.com/BrianCurtis-NOAA/AQM/blob/b79f95f7de95b431feb74400aebb3a57e992c759/src/model/src/PT3D_STKS_DEFN.F#L232.
It runs OK on gaea once the line is updated. @zach1221 @FernandoAndrade-NOAA It might be worth checking on derecho. I don't expect any impact from the line update, but we need to confirm. |
Commit Queue Requirements:
Description:
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
/scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/AQMv7
Library Changes/Upgrades:
Testing Log: