BUG: Hashing/caching in parallel #3732
Comments
Another backtrace example:
Can you share the differing bits of generated code?
This looks suspiciously similar to a parallel cache coherency issue I ran into recently (noted in these minutes). Are you running your code over multiple communicators, or just Also it would be good to see the differing code as Connor says: Take a look in
On the Firedrake side there is just COMM_WORLD. We have some communicators that we create/destroy in C++, these are never passed to Firedrake, |
Mismatched files are hopefully attached. I think this is the call stack for these files:
At the interpolation call [11], one rank is trying to interpolate a scalar-valued function and the other rank a vector-valued function. I do have vtu output calls for both a vector-valued function and a scalar-valued function; both backtraces are on the scalar-valued write call.
One rank, the "vector" one, is on line 529 of |
I've managed to strip out all our particle gubbins to leave just Firedrake calls which still fail, see below. It's still quite involved; I will continue to prune what I can. Edit: Instructions, to get this to fail I
I haven't seen it fail if I comment out the solve call.
Not the issue here, but you can use PETSc.Sys.Print to ensure only one print statement is emitted in parallel.
Oh nice, thanks.
I had hoped that this FInAT PR and this Firedrake PR would have sorted the issue. Unfortunately not! It seems that the error is here on line 462. A |
I think for this case we usually compare UFL elements instead of FInAT ones. We just need to be careful, because sometimes we don't consider vector-ness when we make the FInAT element. I think we should probably use the UFL element here, and perhaps make __eq__ and/or __hash__ raise an error for FInAT elements to catch any more of these.
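To illustrate the failure mode described above, here is a minimal plain-Python sketch. It is not Firedrake, UFL, or FInAT code: `ToyElement`, `KernelCache`, `lossy_key`, `full_key`, and `LoweredElement` are hypothetical names invented for this example. It assumes only that a cache key which drops vector-ness makes a scalar and a vector element look identical, and it shows how raising from `__eq__`/`__hash__` would surface any accidental use of the lowered object as a key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyElement:
    """Hypothetical stand-in for an element description."""
    family: str
    degree: int
    shape: tuple = ()  # () for scalar-valued, e.g. (2,) for vector-valued


def lossy_key(element):
    # Drops the shape, so scalar and vector elements collide.
    return (element.family, element.degree)


def full_key(element):
    # Keeps the shape, so scalar and vector elements stay distinct.
    return (element.family, element.degree, element.shape)


class KernelCache:
    """Hypothetical cache mapping an element key to generated code."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._store = {}

    def get(self, element):
        key = self._key_fn(element)
        if key not in self._store:
            # Placeholder for an expensive code-generation step.
            self._store[key] = f"kernel({element.family}{element.degree}, shape={element.shape})"
        return self._store[key]


class LoweredElement:
    """Hypothetical lowered element that refuses to act as a cache key."""

    def __init__(self, source):
        self.source = source

    def __eq__(self, other):
        raise TypeError("compare the source element, not the lowered one")

    def __hash__(self):
        raise TypeError("hash the source element, not the lowered one")


scalar = ToyElement("CG", 1)
vector = ToyElement("CG", 1, (2,))

lossy = KernelCache(lossy_key)
full = KernelCache(full_key)

# With the lossy key, the vector lookup returns the scalar kernel.
collides = lossy.get(scalar) == lossy.get(vector)
# With the full key, each element gets its own kernel.
distinct = full.get(scalar) != full.get(vector)
```

In this sketch the order of lookups decides which kernel the lossy cache hands back, which is one way an order-dependent (and therefore intermittent) mismatch could arise. Attempting `{LoweredElement(scalar): 1}` raises `TypeError` immediately, so a misuse fails loudly rather than silently returning the wrong entry.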
@will-saunders-ukaea can you try Then just need to ensure that the remaining tests pass.
Seems to be fixed, closing |
Describe the bug
Running with multiple ranks intermittently causes errors to be thrown like:
I have a Firedrake install from 17/07/2024 which does not exhibit the issue and an install from 12/08/2024 which does.
Steps to Reproduce
Steps to reproduce the behavior:
WIP MFE: as the error is intermittent, getting a backtrace that isolates the behaviour, so that an MFE can be built around it, is a work in progress.
I have had the most success reproducing the error by:
Working on MFE.
Expected behaviour
Runs without error.
Error message
Environment:
firedrake-status