Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use env for repro tests, improve repro names #3430

Merged
merged 1 commit into from
Nov 12, 2024

Conversation

charleskawczynski
Copy link
Member

This PR:

  • Removes the reproducibility_test flag in the yaml config, and we instead use an environment variable. I don't think experiments should explicitly need to worry about this, and there's no way for users to leverage this feature locally anyway, so an env flag seems more appropriate.
  • Improves some names to better reflect what some reproducibility test functions are doing.

@Sbozzolo
Copy link
Member

What's the issue with the 4 GPU job?

@charleskawczynski
Copy link
Member Author

What's the issue with the 4 GPU job?

Not sure yet, still needs fixes..

@charleskawczynski
Copy link
Member Author

I incremented the reference counter because I think it's far easier to "reset" our comparable references than to support comparing data exported from HDF5 formats (which we now use to support reproducibility tests for GPU jobs) with NC files (which we previously used).

Fixes for gpu repro tests, auto-compare all state variables

Improve error message, increment ref counter

Fix zero_dict calls

Fix dict init

Fixes to zero_dict

Improve debug info
@charleskawczynski
Copy link
Member Author

The part that I was getting stuck on was that the "print new MSE tables" job was not finding any mse json files, but that's because they never got moved over (because we hadn't set the reproducibility_test flag). I think all is well, let's try this out.

@charleskawczynski
Copy link
Member Author

If the restart job gets tripped up again, then I'll manually merge.

@charleskawczynski charleskawczynski added this pull request to the merge queue Nov 12, 2024
Merged via the queue into main with commit 3ab7072 Nov 12, 2024
16 checks passed
@charleskawczynski charleskawczynski deleted the ck/rm_reproducibility_test_flag branch November 12, 2024 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants