
OOM kill on a small system with workqueue consumers at high speed resulting in fast memory growth - #5673

Closed
roeschter opened this issue Jul 18, 2024 · 20 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@roeschter

roeschter commented Jul 18, 2024

Observed behavior

Consuming from a workqueue with multiple consumers (subject segmented) at high speed (150 MB/s) results in rapid memory growth and eventually memory exhaustion. The server dies with a range of symptoms: mostly it is silently OOM-killed, sometimes it throws readloop processing time warnings first.

Testing on a small system with 4 GB and 2 CPUs - this is deliberate, to simulate an edge node.

This does not seem to be a memory leak per se: repeated tests with a below-threshold message volume do not trigger the effect.

Expected behavior

The server should self-limit memory growth or throughput.

Server and client version

Client: jnats-2.19.1.jar on Java 17
Server: 2.10.17 and 2.11-preview-rc3 (self-compiled)
Memory growth is not platform specific; it is seen on Windows as well, but crashes have only been observed on smaller systems with <8 GB.

Host environment

Linux - Old Kernel
Linux lvps5-35-240-105.dedicated.hosteurope.de 3.10.0-1160.80.1.vz7.191.4 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 x86_64 x86_64 GNU/Linux

Steps to reproduce

playground.jar.zip
nats00_hosteurope.config.txt
source.zip

Full eclipse project available in private repo: https://github.com/roeschter/natsPlayground

Run on a system with 4GB (or less) - CPU count >=2
Java 17

/var/lib/java17/bin/java -cp playground.jar:jnats-2.19.1.jar com.synadia.test.config03_performance.C03JetstreamWorQueueSlowDeletes --user user --password dja673b48fh3kqp9 --messages 200000 --messageSize 10000 --fetchsize 10 --subjects 10 --consumemode fetch

--messages 200000 should be sufficient on a 4 GB system; increase to 1000000 if it does not crash.
--messageSize 10000: increasing the message size may speed up the crash. The effect is not observable with small messages (throughput too low, I presume).

Memory should increase rapidly when the consumers start; the rate should be 10000 msg/s or more.

@roeschter roeschter added the defect Suspected defect such as a bug or regression label Jul 18, 2024
@roeschter
Author

roeschter commented Jul 19, 2024

Will experiment with
export GOGC=nnn

@Jarema
Member

Jarema commented Jul 19, 2024

To control memory and prevent such OOMs under high load on insufficient resources, you should use GOMEMLIMIT. Keep in mind that if the limit is set far below what the workload needs, it will throttle throughput considerably while putting more load on the CPU (as the GC will need to work much harder).

@roeschter
Author

roeschter commented Jul 19, 2024

GOMEMLIMIT solves the immediate problem (I tried from 256MiB to 1024MiB), but the memory usage reported by the OS still vastly exceeds the limit I set.

Also: I would expect the nats-server to self-limit to a reasonable memory footprint.
"Reasonable" would be 50% of system memory when JS is enabled (because we need space for buffers and caches) and 70% for pure NATS Core.

My test system had 2.5 GB of 4 GB available for NATS when the test starts.

@roeschter
Author

roeschter commented Jul 19, 2024

(screenshot of memory usage attached)
E.g. In this case I set
export GOMEMLIMIT=256MiB
export GOGC=20

Resident memory still exceeds 2 GB (consistent with the overall system memory report).
Performance is not much impacted by the limits above.

Note that this only happens when reading from the stream; the footprint is small when writing.

@roeschter
Author

roeschter commented Jul 19, 2024

I have now created a stream state where nats-server exceeds all limits during startup while recovering the stream, and gets killed.
export GOMEMLIMIT=256MiB
export GOGC=10
Stream size on disk ~3.5GB
Resident memory grows to 2.4 GB very fast - then the server is OOM-killed.

@neilalexander
Member

Can you get a memory profile when the memory usage is inflated, either through nats server request profile allocs or https://docs.nats.io/running-a-nats-service/nats_admin/profiling?

@neilalexander
Member

Also: I would expect the nats-server to self-limit to a reasonable memory footprint.
"Reasonable" would be 50% of system memory when JS is enabled (because we need space for buffers and caches) and 70% for pure NATS Core.

Short of using GOMEMLIMIT, which is actually a runtime hint to the GC and not an actual hard limit, this is simply not practical to do. Runtime stats can't tell us about the actual RSS size, nor can they tell us about buffers/caches from the kernel, and we also don't have custom allocators in Go to actually enforce those limits. Even if we did, failing to allocate structures would just leave the system in an undefined state, which is worse than crashing.
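
A minimal standalone sketch (assuming Go 1.19+; not nats-server code) of why that is: GOMEMLIMIT is the same soft target you get from debug.SetMemoryLimit, and the runtime's own statistics describe the Go heap, not the resident set size that the kernel's OOM killer acts on.

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
    )

    func main() {
        // Same effect as starting the process with GOMEMLIMIT=256MiB:
        // a target the GC steers towards, not a hard cap on allocations.
        debug.SetMemoryLimit(256 << 20)

        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        fmt.Printf("heap in use: %d MiB, returned to OS: %d MiB\n",
            m.HeapInuse>>20, m.HeapReleased>>20)
        // The process RSS has to be read from the OS (e.g. /proc/self/status
        // on Linux) and can sit well above these figures, since it also
        // reflects memory the runtime has not yet returned to the kernel.
    }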

We would need to figure out from profiles what is using the memory and whether that can be optimised for this case.

@roeschter
Author

Here are the memory profiles:
03_allocs_after_stream_delete_from_client.zip

Observations:

  1. After consuming messages, the allocated memory is not freed even after the client shuts down AND the stream has been deleted. (03_allocs_after)
  2. What eventually frees the memory is a new client connection (without stream deletion) and resumed activity.
  3. Then the whole game starts over.

@neilalexander
Member

Out of interest, how did you capture these profiles? They're in text rather than binary format, which means the pprof visualiser can't parse them. Please make sure to use --level=0 if using the CLI or ?debug=0 if using the pprof HTTP endpoint.

@roeschter
Author

roeschter commented Jul 19, 2024

Human readable output via:

@roeschter
Author

roeschter commented Jul 19, 2024

Testing under Windows now, as the memory growth happens there as well (just no OOM kill).

Binary Allocs:
allocs.zip

The more I look into the behavior, the less deterministic it is. Under Windows GOMEMLIMIT is respected: the server shrinks back to the set size. BUT under Linux it didn't. In any case, memory temporarily grows to about the amount consumed by the consumers during 10 seconds.

It seems the buffer management relies on https://pkg.go.dev/sync#Pool
It seems this only cleans up after multiple GC cycles. But also, why is the memory not being re-used effectively? Who is not returning it fast enough?

@neilalexander
Member

The sync.Pools are indeed only purged during a GC run; that is by design. Otherwise they largely rely on the number of gets roughly matching the number of puts in order to stay balanced in their resource consumption.
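
A minimal standalone sketch of that purge behaviour (not nats-server code; assuming Go 1.13+ pool semantics, where a pooled item survives roughly two GC cycles before being released):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        var pool sync.Pool // no New func: Get returns nil once the pool is empty

        buf := make([]byte, 1<<20)
        pool.Put(&buf)
        fmt.Println(pool.Get() != nil) // true: the buffer comes straight back

        pool.Put(&buf)
        runtime.GC() // first cycle: the item moves to the pool's victim cache
        runtime.GC() // second cycle: the victim cache is dropped as well
        fmt.Println(pool.Get() != nil) // usually false: the pool has been purged
    }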

In this case, looking at the profile, it appears as though lots of blocks are being loaded in from the filestore and held in the block cache. They should be held there for 10 seconds from the last access, after which the buffer will be returned to the pool. That would roughly explain why you see the correlation between the data accessed and the memory growth in that window.

Perhaps exposing the cache expiry as a configurable option, i.e. being able to set it to less than 10 seconds, would help reduce memory usage at the expense of potentially more I/O operations. As an experiment, you could potentially test with a custom build that changes defaultCacheBufferExpiration in filestore.go to a lower value like time.Second * 2.
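
For anyone trying that experiment, a sketch of the one-line change to filestore.go (abbreviated; only the constant is shown, and the 10-second default is the behaviour described above):

    const (
        // Cached message blocks expire this long after their last access;
        // lowering the value trades memory held in the block cache for
        // extra disk reads.
        defaultCacheBufferExpiration = 2 * time.Second
    )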

@roeschter
Author

roeschter commented Jul 19, 2024

Correct! If the behavior of the block cache were documented (I could guess it exists, but I incorrectly assumed it to be of fixed size per connection) and its size or timeout were configurable, I would not have been surprised.

Last post on observed behavior:

  • Both Linux and Windows respect GOMEMLIMIT; that is, they shrink back to the configured size.
  • For the block cache to grow it is necessary to have multiple consumers on the stream, consuming with different subject filters. In the case of a workqueue it is also required that they leave messages behind (e.g. one consumer out of ten currently down).

It does not happen:

  • When a single consumer reads on a limits stream
  • When a workqueue gets fully emptied by its consumers

In summary:

  1. The block cache growth happens in certain circumstances only (hard to guess from the outside)
  2. It does not happen in "trivial" tests (the good cases)
  3. The current unconfigurable timeout makes the memory growth uncontrollable. I can make GOMEMLIMIT small, forcing the server to slow down, but that's it.
  4. It happens during stream recovery as well, resulting in nats-server crashing during startup.

@roeschter
Author

I will test with DefaultCacheBufferExpiration in filestore.go set to a lower value like time.Second * 2.

@roeschter
Author

Reducing the buffer timeout to 1s limits the memory consumption to around 600 MB in my tests, at the cost of reduced throughput, which is fine (still about 50 MB/s from a workqueue).

Tested with DefaultCacheBufferExpiration = time.Second * 1
Windows and Linux based on 2.11 preview rc3
In combination with GOMEMLIMIT=512MiB

@roeschter
Author

Closing the investigation, but I would like the cache buffer expiration to be configurable, as a small enhancement request.
Should I try to patch this myself and make it a pull request?
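
A purely hypothetical sketch of the shape such an option could take; the names below (fileStoreOpts, CacheExpire, cacheExpiration) are illustrative only and are not the actual nats-server code or configuration surface:

    package main

    import (
        "fmt"
        "time"
    )

    // Hypothetical only: a configurable expiry with a fallback to the
    // ~10s default behaviour described earlier in this thread.
    type fileStoreOpts struct {
        CacheExpire time.Duration // zero means "use the built-in default"
    }

    func cacheExpiration(o fileStoreOpts) time.Duration {
        if o.CacheExpire > 0 {
            return o.CacheExpire
        }
        return 10 * time.Second
    }

    func main() {
        fmt.Println(cacheExpiration(fileStoreOpts{CacheExpire: 2 * time.Second})) // 2s
    }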

@derekcollison
Member

If you could file an issue, that would be great.

@kohlisid

kohlisid commented Aug 2, 2024

What is the stream maxBytes that you were using? @roeschter

@roeschter
Author

I was not using maxBytes (unlimited). The effect is purely about the speed of consuming.
As I was testing on a 4 GB device with 2.5 GB free, a 2 GB effective stream size was enough to trigger the OOM.

@roeschter
Author

Closing this in favor of a new CR - #5740
