Load balance at restart is memory-intensive #1013

Open
ykempf opened this issue Aug 20, 2024 · 1 comment

ykempf commented Aug 20, 2024

Our restart-reading scheme is inefficient, at least in terms of the memory high-water mark (HWM).

We read the block counts and try to spread them evenly, but the load balance then reshuffles things based on the LB_WEIGHT that is read in the second stage. This leads to a massive rearrangement of MPI domains and a significant peak in HWM. I assume this grew organically, but it would seem more logical to read in the LB_WEIGHT first, balance according to that, and only then read in the rest of the data. That should reduce the initial memory peak seen in current investigations.
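To illustrate the difference between the two orderings, here is a minimal Python sketch, not the actual restart code. It assumes cells are assigned to ranks in contiguous chunks using a simple prefix-sum split, which is a stand-in for the real load balancer. The function names (`split_contiguous`, `owners`, `cells_moved`) and the sample numbers are hypothetical. It counts how many cells would change owner when a split based on block counts is followed by a rebalance based on LB_WEIGHT.

```python
from bisect import bisect_left
from itertools import accumulate


def split_contiguous(loads, n_ranks):
    """Cut `loads` into n_ranks contiguous chunks of roughly equal total load.

    Returns n_ranks + 1 boundaries; rank r owns cells bounds[r]:bounds[r + 1].
    """
    prefix = list(accumulate(loads))
    total = prefix[-1]
    bounds = [0]
    for r in range(1, n_ranks):
        target = total * r / n_ranks
        # Cut just after the first cell whose running sum reaches the target.
        cut = bisect_left(prefix, target) + 1
        bounds.append(max(cut, bounds[-1]))
    bounds.append(len(loads))
    return bounds


def owners(bounds):
    """Expand boundaries into a per-cell list of owning ranks."""
    return [r for r, (a, b) in enumerate(zip(bounds, bounds[1:])) for _ in range(b - a)]


def cells_moved(bounds_a, bounds_b):
    """Count the cells whose owning rank differs between two partitions."""
    return sum(x != y for x, y in zip(owners(bounds_a), owners(bounds_b)))


# Hypothetical per-cell data: equal block counts, but uneven LB_WEIGHT.
block_counts = [4, 4, 4, 4, 4, 4, 4, 4]
lb_weights = [1, 1, 1, 1, 1, 1, 3, 3]

by_blocks = split_contiguous(block_counts, 2)   # current first stage
by_weights = split_contiguous(lb_weights, 2)    # what the load balance wants

print("split by block count:", by_blocks)
print("split by LB_WEIGHT:  ", by_weights)
print("cells that would migrate:", cells_moved(by_blocks, by_weights))
```

In this toy setup, every migrated cell is data that was read onto one rank and then has to be shipped to another. Splitting on LB_WEIGHT before reading the bulk data would avoid that extra transfer.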

It's not impossible that I will file a patch for this soonish, but I have a certain manuscript waiting for me... If anyone picks this up, I'll be grateful. :)


ykempf commented Aug 20, 2024

Now, as @markusbattarbee pointed out, this would be a tricky mesh of small reads instead of the current sequential approach, so it may not be worth bothering with right now. If it can't reshuffle at restart, it probably won't fit well in memory at runtime either.
