
Memory growth #839

Open
inducer opened this issue Feb 21, 2023 · 11 comments
@inducer
Contributor

inducer commented Feb 21, 2023

I understand that there is some type of memory growth occurring.

From the 2023-02-17 dev meeting notes, I gather that

  • that memory growth only occurs when using the memory pool
  • "Using jemalloc fixes the issue": is that before or after turning off the pool?

Possibly related: #212.

cc @matthiasdiener

@inducer
Contributor Author

inducer commented Feb 21, 2023

Is the growth reflected in the memory pool's statistics? I.e., do those increase timestep-over-timestep?

If there is growth, can you identify which bins in the memory pool are affected? Can you identify which allocations? Python makes it straightforward to attach stack traces to allocations.
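On the Python side, one stdlib way to attach stack traces to allocations is tracemalloc (a sketch; it only sees Python-level allocations, not the pool's device-side bins, and the bytearrays here are just stand-ins for suspect allocations):

```python
import tracemalloc

tracemalloc.start(10)  # record up to 10 stack frames per allocation

# ... run a few timesteps; here, a stand-in for suspect allocations:
data = [bytearray(1 << 20) for _ in range(4)]

snapshot = tracemalloc.take_snapshot()
# group allocations by allocating stack trace, largest first
for stat in snapshot.statistics("traceback")[:3]:
    print(f"{stat.size / 1024:.0f} KiB in {stat.count} blocks, allocated at:")
    for line in stat.traceback.format():
        print(line)
```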

Do we know if this growth is of "array" memory or "other" memory?

How do your findings change if you call free_held between steps?

What is the simplest driver that exhibits the growth? I gather from @lukeolson that, maybe, examples/wave-lazy.py may be affected. Could you please confirm? Is grudge/examples/wave/wave-min-mpi.py affected as well? Is, say, vortex-mpi.py affected? Grudge's Euler?

@lukeolson
Contributor

Also, it looks like set_trace is exposed, so you could get some additional information from that:
https://github.com/inducer/pyopencl/blob/main/src/mempool.hpp#L164

including bin size data

@matthiasdiener
Member

matthiasdiener commented Feb 21, 2023

  • that memory growth only occurs when using the memory pool

The growth happens both with and without the pool. Here is an example with drivers_y2-prediction/smoke_test_ks (lazy-eval), 1 rank, Lassen CPU (y-axes are in "MByte"):

with SVM mempool:

[memory usage graph]

with non-pool SVM:
[memory usage graph]

  • both seem to "level off" after ~140 steps, but memory will likely keep growing in longer runs. See e.g. this graph for a different Lassen run (SVM pool):

    [memory usage graph]

  • These results are qualitatively reproducible between runs, but quantitatively differ widely (even when rerunning the same exact configuration).

  • Using CL buffers vs. SVM allocations seems to show the same behavior.

@inducer
Contributor Author

inducer commented Feb 21, 2023

  • Please confirm that the relevant growth is only of memory allocated via OpenCL.
  • I gather that you are using some (unspecified?) system/process-level metric of memory usage. What do things look like at the level of the OpenCL API? If you keep a running tally of memory allocated via OpenCL, does that grow as well or stay constant?
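Such a running tally could be kept with a small wrapper around whatever allocation routine is in use (a sketch; `CountingAllocator` is a hypothetical name, and `bytearray` stands in for the real OpenCL allocation call):

```python
class CountingAllocator:
    """Wrap a base allocation function and track live (unfreed) bytes."""

    def __init__(self, base_alloc):
        self._base_alloc = base_alloc
        self.live_bytes = 0

    def allocate(self, nbytes):
        self.live_bytes += nbytes
        return self._base_alloc(nbytes)

    def free(self, buf, nbytes):
        # a real version would also release buf via the underlying API
        self.live_bytes -= nbytes


# usage with a dummy backend standing in for the OpenCL allocator:
alloc = CountingAllocator(bytearray)
a = alloc.allocate(1024)
b = alloc.allocate(2048)
alloc.free(a, 1024)
print(alloc.live_bytes)  # 2048: only b is still live
```

Logging `live_bytes` once per timestep would show directly whether OpenCL-level allocations grow or stay constant.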

Btw, please keep vertical space in mind when writing issue text. Write claims, and hide supporting evidence under a <details>. I've done that for your comment above.

@matthiasdiener
Member

Tracing the memory pool allocations with set_trace (and using #840) with the same config as before (1 rank, smoke_test_ks, CPU) revealed some interesting information:

  • the (SVM) memory pool keeps growing throughout execution, although for some reason the active byte count drops by 75% at step 38
  • throughout the whole execution, at least some memory pool requests required new allocations, i.e. [pool] allocation of size 1511472 required new memory

[memory usage graph]

How do your findings change if you call free_held between steps?

Looking at the graph above, it seems like freeing the held memory may not help?


I gather that you are using some (unspecified?) system/process-level metric of memory usage.

The memory usage I initially added here is the RSS high water mark measured with illinois-ceesd/logpyle#79 (= max_rss).
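For reference, that high-water mark is what the OS reports via getrusage (a sketch using the stdlib resource module; whether logpyle's max_rss uses exactly this call is an assumption here, and note the platform-dependent units):

```python
import resource
import sys

usage = resource.getrusage(resource.RUSAGE_SELF)
# ru_maxrss units differ by platform: KiB on Linux, bytes on macOS
scale = 1 if sys.platform == "darwin" else 1024
print(f"RSS high-water mark: {usage.ru_maxrss * scale / 2**20:.1f} MiB")
```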

@inducer
Contributor Author

inducer commented Feb 22, 2023

Thanks. This tally of pool-held memory means (to me) that the issue is very likely "above" the pool, i.e. in Python. I.e., replacing the memory allocation scheme used by the pool should not help, or at least not much.

My read of this is that some member of a group of objects that cyclically refer to each other holds a reference to our arrays. This follows because Python's refcounting frees objects without cyclic referents effectively instantaneously, i.e. as soon as a reference to them is no longer being held.

To validate the latter conclusion, you could try calling gc.collect() every $N$ time steps to see if that helps free those objects. (Of course, that won't do much if there is some cyclic behavior in what references are held.)

Assuming the above conclusion is correct, the way to address this would be to find the objects referring to the arrays and make it so they no longer hold those references.

@matthiasdiener
Member

matthiasdiener commented Feb 22, 2023

What is the simplest driver that exhibits the growth? I gather from @lukeolson that, maybe, examples/wave-lazy.py may be affected. Could you please confirm? Is grudge/examples/wave/wave-min-mpi.py affected as well? Is, say, vortex-mpi.py affected? Grudge's Euler?

I've seen the growth in all drivers I tried, including the simplest ones:

  • Mirgecom's wave, wave-mpi
  • Grudge's euler/vortex, wave/wave-op-mpi

The growth only happens in lazy mode, not eager. Neither the specific memory pool used (SVM, CL buffer) nor the lazy actx class seems to matter.

Graph for mirgecom's wave:

[memory usage graph]

@matthiasdiener
Member

matthiasdiener commented Feb 22, 2023

To validate the latter conclusion, you could try calling gc.collect() every N time steps to see if that helps free those objects. (Of course, that won't do much if there is some cyclic behavior in what references are held.)

It does seem that running gc.collect mitigates this issue for us. The following results are for smoke_test_ks, but they are similar for the simpler test cases.

GC collect every 10 steps (no measurable performance overhead):
[memory usage graph]

GC collect every 1 step (~25% performance overhead):
[memory usage graph]
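Schematically, the workaround amounts to this (a sketch; the step function and the interval are placeholders, not the actual driver code):

```python
import gc

GC_INTERVAL = 10  # collecting every step gave ~25% overhead; every 10 was free


def run_timesteps(nsteps, step_fn):
    for istep in range(nsteps):
        step_fn(istep)
        if istep % GC_INTERVAL == GC_INTERVAL - 1:
            # break the reference cycles that plain refcounting cannot free
            gc.collect()


run_timesteps(100, lambda istep: None)
```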

@inducer
Contributor Author

inducer commented Feb 22, 2023

It's important that gc.collect is not a solution, but a workaround. It's quite expensive (and should be unnecessary), and it only masks the problem.

@MTCam
Member

MTCam commented Feb 23, 2023

It's important that gc.collect is not a solution, but a workaround. It's quite expensive (and should be unnecessary), and it only masks the problem.
👍

I like your idea of running it every $N$ steps, though. This workaround can likely keep us running comfortably in the interim. afaict, after injecting this fix into the prediction driver, the code infrastructure is now capable of production-scale prediction-like runs, and at the very least in good shape for February trials (leaps and bounds over last year). Gigantic cool.

@matthiasdiener
Member

matthiasdiener commented Feb 28, 2023

A few more updates for mirgecom's wave (w/ lazy eval):

  • When running without any gc invocations or gc config changes, gc.garbage is empty (which is expected, I think).
  • When running with gc.set_debug(gc.DEBUG_SAVEALL), gc.garbage contains ~62000 objects after the first time step. Each subsequent time step adds ~1000 objects. Is my assumption correct that those objects are the ones we suspect of having circular references (and holding references to arrays)? I was adapting this code https://code.activestate.com/recipes/523004-find-cyclical-references/ to check whether there are array references among the objects with circular references, but this appears to be extremely time-consuming.
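A cheaper scan than the full cyclical-references recipe might be a single type filter over gc.garbage after collecting under DEBUG_SAVEALL (a sketch; Node and the list payload stand in for the cyclically-referring objects and the arrays they hold):

```python
import gc

gc.set_debug(gc.DEBUG_SAVEALL)  # collected objects are kept in gc.garbage


class Node:  # stand-in for a cyclically-referring object
    def __init__(self):
        self.payload = [0.0] * 256  # stand-in for a held array


a, b = Node(), Node()
a.other, b.other = b, a  # build a reference cycle
del a, b                 # now unreachable, but only the gc can free the cycle

gc.collect()
holders = [o for o in gc.garbage if isinstance(o, Node)]
payloads = [o for o in gc.garbage if isinstance(o, list) and len(o) == 256]
print(len(holders), "holders,", len(payloads), "payloads in gc.garbage")

gc.set_debug(0)
gc.garbage.clear()
```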

Edit:
