
OOM killed when using JetStream WorkQueue [v2.10.18] #5739

Open
kohlisid opened this issue Aug 2, 2024 · 20 comments
Labels
defect Suspected defect such as a bug or regression

Comments


kohlisid commented Aug 2, 2024

Observed behavior

Running 2 streams on a JetStream cluster with 3 nodes.

The server config is as follows:


Process Details:

         Version: 2.10.18
      Git Commit: 57d23ac
      Go Version: go1.22.5
      Start Time: 2024-08-02 01:00:45.487109016 +0000 UTC
          Uptime: 34m59s

Connection Details:

   Auth Required: true
    TLS Required: false
            Host: 0.0.0.0:4222
     Client URLs: 10.244.0.91:4222
                  10.244.0.92:4222
                  10.244.0.90:4222

JetStream:

              Domain: 
   Storage Directory: /data/jetstream/store/jetstream
          Max Memory: 768 MiB
            Max File: 1.0 TiB
     Active Accounts: 1
       Memory In Use: 0 B
         File In Use: 738 B
        API Requests: 7,948
          API Errors: 0

Limits:

        Max Conn: 65536
        Max Subs: 0
     Max Payload: 64 MiB
     TLS Timeout: 2s
  Write Deadline: 10s

Statistics:

       CPU Cores: 32 218.00%
          Memory: 162 MiB
     Connections: 8
   Subscriptions: 540
            Msgs: 682,490 in 730,334 out
           Bytes: 1.0 TiB in 1.2 TiB out
  Slow Consumers: 0

Cluster:

            Name: default
            Host: 0.0.0.0:6222
            URLs: isbsvc-default-js-0.isbsvc-default-js-svc.default.svc:6222
                  isbsvc-default-js-1.isbsvc-default-js-svc.default.svc:6222
                  isbsvc-default-js-2.isbsvc-default-js-svc.default.svc:6222

Stream configs are as follows ->

~ # js s info default-simple-pipeline-out-0
Information for Stream default-simple-pipeline-out-0 created 2024-08-02 01:02:04

             Subjects: default-simple-pipeline-out-0
             Replicas: 3
              Storage: File

Options:

            Retention: WorkQueue
     Acknowledgements: true
       Discard Policy: New
     Duplicate Window: 1m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: 10,000
  Maximum Per Subject: unlimited
        Maximum Bytes: 100 MiB
          Maximum Age: 3d0h0m0s
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: default
               Leader: isbsvc-default-js-2
              Replica: isbsvc-default-js-0, current, seen 0.01s ago
              Replica: isbsvc-default-js-1, outdated, seen 0.00s ago, 1 operation behind

State:

             Messages: 7
                Bytes: 44 MiB
             FirstSeq: 9,422 @ 2024-08-02T01:23:27 UTC
              LastSeq: 9,428 @ 2024-08-02T01:23:27 UTC
     Active Consumers: 1
   Number of Subjects: 1
Information for Stream default-simple-pipeline-cat-0 created 2024-08-02 01:02:03

             Subjects: default-simple-pipeline-cat-0
             Replicas: 3
              Storage: File

Options:

            Retention: WorkQueue
     Acknowledgements: true
       Discard Policy: New
     Duplicate Window: 1m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: 10,000
  Maximum Per Subject: unlimited
        Maximum Bytes: 100 MiB
          Maximum Age: 3d0h0m0s
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: default
               Leader: isbsvc-default-js-0
              Replica: isbsvc-default-js-1, current, seen 0.00s ago, 2 operations behind
              Replica: isbsvc-default-js-2, current, seen 0.00s ago

State:

             Messages: 5
                Bytes: 32 MiB
             FirstSeq: 3,897 @ 2024-08-02T01:10:42 UTC
              LastSeq: 4,220 @ 2024-08-02T01:11:26 UTC
     Deleted Messages: 319
     Active Consumers: 1
   Number of Subjects: 1

I am running a test where I publish messages with a payload size of 5 MB to the streams, but I saw that the servers got terminated (OOMKilled).

    State:        Running
    Started:      Fri, 02 Aug 2024 01:05:07 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 02 Aug 2024 01:00:45 +0000
      Finished:     Fri, 02 Aug 2024 01:05:06 +0000
    Ready:          True
    Restart Count:  1
    Limits:
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi


    State:          Running
    Started:      Fri, 02 Aug 2024 01:04:24 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 02 Aug 2024 01:00:45 +0000
      Finished:     Fri, 02 Aug 2024 01:04:23 +0000
    Ready:          True
    Restart Count:  1

Expected behavior

I am running a test with a payload size of 5 MB at 50 msgs/sec.

[55] 2024/08/02 01:00:45.460331 [INF] Starting nats-server
[55] 2024/08/02 01:00:45.460447 [INF]   Version:  2.10.18

[55] 2024/08/02 01:00:45.462256 [INF] ---------------- JETSTREAM ----------------
[55] 2024/08/02 01:00:45.462259 [INF]   Max Memory:      768.00 MB
[55] 2024/08/02 01:00:45.462261 [INF]   Max Storage:     1.00 TB
[55] 2024/08/02 01:00:45.462262 [INF]   Store Directory: "/data/jetstream/store/jetstream"
[55] 2024/08/02 01:00:45.462263 [INF] -------------------------------------------

There are 2 streams with a replication factor of 3.

I see this OOMKilled error on 2 of the servers. I want to ask why I would be seeing this error when I'm using filestore as the storage mode. Will the streams be pulled into memory at runtime?

Is this high memory requirement expected? If yes, then what would be a good way to determine the max resource allocation for the JetStream servers?
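
For reference, a minimal sketch of the kind of load generator described in this test (5 MB payloads at roughly 50 msgs/sec, using synchronous JetStream publishes); the connection URL and subject are placeholders, not the actual test code:

// Load-generator sketch: publishes 5 MB messages at ~50 msgs/sec to a
// WorkQueue stream subject using synchronous JetStream publishes.
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://isbsvc-default-js-svc.default.svc:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	payload := make([]byte, 5*1024*1024)            // 5 MB message body
	ticker := time.NewTicker(20 * time.Millisecond) // ~50 msgs/sec
	defer ticker.Stop()

	for range ticker.C {
		// Synchronous publish: blocks until the stream leader returns a PubAck.
		if _, err := js.Publish("default-simple-pipeline-out-0", payload); err != nil {
			log.Printf("publish error: %v", err)
		}
	}
}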

Server and client version

Version: 2.10.18

Host environment

No response

Steps to reproduce

No response

kohlisid added the defect (Suspected defect such as a bug or regression) label Aug 2, 2024

kohlisid commented Aug 2, 2024

Perhaps related to #5673.
But we are setting stream limits of maxBytes and maxMessages.

neilalexander (Member) commented:

How much memory are you giving each NATS node?


kohlisid commented Aug 2, 2024

@neilalexander

[55] 2024/08/02 01:00:45.462256 [INF] ---------------- JETSTREAM ----------------
[55] 2024/08/02 01:00:45.462259 [INF]   Max Memory:      768.00 MB
[55] 2024/08/02 01:00:45.462261 [INF]   Max Storage:     1.00 TB

neilalexander (Member) commented:

I asked specifically because the reporting sometimes doesn't match cgroup restrictions. Do you OOM as quickly if you set the GOMEMLIMIT environment variable to some floor below the available memory, e.g. GOMEMLIMIT=512MiB?

If you can capture some memory profiles when memory usage is up but before the OOM, that would be useful: either nats server request profile allocs from the system account, or https://docs.nats.io/running-a-nats-service/nats_admin/profiling.
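
For anyone following along, a minimal sketch of pulling an allocation profile over HTTP, assuming the server's profiling port is enabled (prof_port in the server config) and reachable, e.g. via kubectl port-forward; the host, port and file name are placeholders:

// Fetches /debug/pprof/allocs from a nats-server with profiling enabled and
// saves it for inspection with `go tool pprof allocs.prof`.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://localhost:65432/debug/pprof/allocs")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("allocs.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
}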

ripienaar (Contributor) commented:

The Max Memory shown does not limit how much memory NATS uses. It limits how much memory storage a stream may use and has no impact on NATS overall memory usage for its normal running needs.

How much memory are you giving the process from the OS's perspective?


kohlisid commented Aug 2, 2024

I'm running the cluster on K8s, with each JS server as a pod given a memory limit of:
memory: 1Gi
@ripienaar

ripienaar (Contributor) commented:

Huge payload messages will consume more RAM; in my experience I wouldn't run NATS with less than 3 GB of memory when using JetStream. Your big messages will make matters worse.


kohlisid commented Aug 2, 2024

But on the flip side, how would we estimate how much memory to allocate in such a scenario? Is there some guideline?
If there is a surge in message rate, we would need to ensure that it doesn't OOM.
@ripienaar

ripienaar (Contributor) commented:

It varies a lot by use case, message rates, etc., and changes from version to version; usually the needs go down. A workload that uses about 1 GB of memory for me today used to use 6 GB some time ago.

You should set up monitoring and use nats bench to simulate various scenarios. There's some guidance on the website about memory usage, but it's definitely wrong, so doing your own tests would be best.


kohlisid commented Aug 2, 2024

@ripienaar I agree with trying to simulate the scenarios, but under high load this memory was spiking up a lot.
I cannot run my nodes without memory limit guardrails, hence I was trying to mimic the scenario here with limited resources to see when it would OOM. I assumed, even for a worst-case scenario where the whole stream is pulled into the cache (even though I'm using filestore storage), that a maxBytes of 100 MB for 2 streams with replication=3 would mean 600 MB.
But this ceiling is getting hit more often than not.

For high workloads, giving unbounded memory isn't possible. I want to understand where and why the server would be using the excess memory, and thus how to decide the upper bounds in that case.


kohlisid commented Aug 2, 2024

debug/pprof/allocs
mem_pprof.zip
@neilalexander

derekcollison (Member) commented:

A good practice when limiting memory via cgroups and containers is to set the env GOMEMLIMIT to ~75% of the actual limit. Sometimes the Go runtime only sees the host's memory value and does not feel pressure to GC and clean things up, but the container and the Linux kernel will OOM-kill it.
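
As a side note, GOMEMLIMIT is read by the Go runtime itself, so no nats-server change is needed; a tiny illustrative program to sanity-check what value the runtime resolved, e.g. when run with GOMEMLIMIT=768MiB (~75% of a 1Gi container limit):

// Prints the soft memory limit the Go runtime picked up; SetMemoryLimit(-1)
// reports the current limit without changing it.
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	fmt.Printf("soft memory limit: %d bytes\n", debug.SetMemoryLimit(-1))
}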


kohlisid commented Aug 2, 2024

That could help in terms of the Go GC, but in terms of NATS, what would be a good upper limit in the first place? How should we calculate that?
Even when I consider the worst case where all streams can reside in memory, I saw the OOM as stated above.
For filestore mode, we would not expect that to be the case.
@derekcollison

derekcollison (Member) commented:

For any program written in Go that is to be run in a container, it's best practice these days to set GOMEMLIMIT.

We do plan on introducing an upper bound on how much buffer memory all filestore-backed streams in a server can use. That will probably come in 2.12.


kohlisid commented Aug 3, 2024

I agree on GOMEMLIMIT and will definitely go ahead with changing that.
But as a practice at this point, can we estimate the upper-bound memory for the container, based on TPS, message size, etc.? Or are there some experimental settings that can help measure that?
@derekcollison

derekcollison (Member) commented:

Memory usage is quite dynamic: for core NATS it depends on connections and the number of subscriptions, and for JetStream on the number of streams and consumers, access patterns, etc.

The best way is to model the upper bounds of what you expect on a larger system and monitor RSS peaks.

wallyqs changed the title from "OOM killed on 2.10.18 using Jetstream workqueue" to "OOM killed when using JetStream WorkQueue [v2.10.18]" on Sep 6, 2024
kohlisid (Author) commented:

In a recent exploration with the Limits retention policy, I saw memory spiking up considerably again.

│                                                                                                      Stream Report                                                                                                      │
├──────────────────────────────────────────────────────────────────────────────┬─────────┬───────────┬───────────┬──────────┬─────────┬──────┬─────────┬──────────────────────────────────────────────────────────────────┤
│ Stream                                                                       │ Storage │ Placement │ Consumers │ Messages │ Bytes   │ Lost │ Deleted │ Replicas                                                         │
├──────────────────────────────────────────────────────────────────────────────┼─────────┼───────────┼───────────┼──────────┼─────────┼──────┼─────────┼──────────────────────────────────────────────────────────────────┤
│ KV_xxxx-simple-pipeline-out_SINK_OT          │ File    │           │ 0         │ 0        │ 0 B     │ 0    │ 0       │ isbsvc-default-js-1*, isbsvc-default-js-3, isbsvc-default-js-4!  │
│ KV_xxxx-simple-pipeline_SIDE_INPUTS          │ File    │           │ 0         │ 0        │ 0 B     │ 0    │ 0       │ isbsvc-default-js-1, isbsvc-default-js-2*, isbsvc-default-js-3   │
│ KV_xxxx-simple-pipeline-out_SINK_PROCESSORS  │ File    │           │ 0         │ 8        │ 1.1 KiB │ 0    │ 2002    │ isbsvc-default-js-1*, isbsvc-default-js-2, isbsvc-default-js-3   │
│ KV_xxxx-simple-pipeline-in-out_OT            │ File    │           │ 10        │ 10       │ 1.3 KiB │ 0    │ 9       │ isbsvc-default-js-0, isbsvc-default-js-1, isbsvc-default-js-3*   │
│ KV_xxxx-simple-pipeline-in-out_PROCESSORS    │ File    │           │ 7         │ 10       │ 1.3 KiB │ 0    │ 0       │ isbsvc-default-js-2, isbsvc-default-js-3*, isbsvc-default-js-4!  │
│ KV_xxxx-simple-pipeline-in_SOURCE_OT         │ File    │           │ 24        │ 10       │ 1.3 KiB │ 0    │ 1       │ isbsvc-default-js-0!, isbsvc-default-js-1!, isbsvc-default-js-4! │
│ KV_xxxx-simple-pipeline-in_SOURCE_PROCESSORS │ File    │           │ 10        │ 10       │ 1.3 KiB │ 0    │ 0       │ isbsvc-default-js-0!, isbsvc-default-js-2, isbsvc-default-js-3*  │
│ xxxx-simple-pipeline-out-1                   │ File    │           │ 1         │ 100,000  │ 6.2 GiB │ 0    │ 0       │ isbsvc-default-js-0!, isbsvc-default-js-2!, isbsvc-default-js-4! │
│ xxxx-simple-pipeline-out-0                   │ File    │           │ 1         │ 100,000  │ 6.2 GiB │ 0    │ 0       │ isbsvc-default-js-0!, isbsvc-default-js-1*, isbsvc-default-js-4  │
╰──────────────────────────────────────────────────────────────────────────────┴─────────┴───────────┴───────────┴──────────┴─────────┴──────┴─────────┴──────────────────────────────────────────────────────────────────╯

In this case I have two streams of 6.2 GiB each running in filestore mode, but the memory usage on the servers was considerably high: more than what both streams would take if held completely in memory, even though we are using filestore.

~ # njs server ls
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                           Server Overview                                                           │
├─────────────────────┬─────────┬──────┬─────────┬─────┬───────┬───────┬────────┬─────┬─────────┬───────┬───────┬──────┬────────┬─────┤
│ Name                │ Cluster │ Host │ Version │ JS  │ Conns │ Subs  │ Routes │ GWs │ Mem     │ CPU % │ Cores │ Slow │ Uptime │ RTT │
├─────────────────────┼─────────┼──────┼─────────┼─────┼───────┼───────┼────────┼─────┼─────────┼───────┼───────┼──────┼────────┼─────┤
│ isbsvc-default-js-2 │ default │ 0    │ 2.10.18 │ yes │ 13    │ 663   │ 16     │ 0   │ 447 MiB │ 3     │ 32    │ 7    │ 41m35s │ 1ms │
│ isbsvc-default-js-3 │ default │ 0    │ 2.10.18 │ yes │ 9     │ 663   │ 16     │ 0   │ 32 MiB  │ 2     │ 32    │ 0    │ 41m33s │ 1ms │
│ isbsvc-default-js-0 │ default │ 0    │ 2.10.18 │ yes │ 0     │ 663   │ 16     │ 0   │ 16 GiB  │ 18    │ 32    │ 0    │ 4m20s  │ 1ms │
│ isbsvc-default-js-4 │ default │ 0    │ 2.10.18 │ yes │ 0     │ 663   │ 16     │ 0   │ 16 GiB  │ 100   │ 32    │ 0    │ 2m49s  │ 2ms │
│ isbsvc-default-js-1 │ default │ 0    │ 2.10.18 │ yes │ 12    │ 663   │ 16     │ 0   │ 434 MiB │ 5     │ 32    │ 9    │ 41m31s │ 2ms │
├─────────────────────┼─────────┼──────┼─────────┼─────┼───────┼───────┼────────┼─────┼─────────┼───────┼───────┼──────┼────────┼─────┤
│                     │ 1       │ 5    │         │ 5   │ 34    │ 3,315 │        │     │ 33 GiB  │       │       │ 16   │        │     │
╰─────────────────────┴─────────┴──────┴─────────┴─────┴───────┴───────┴────────┴─────┴─────────┴───────┴───────┴──────┴────────┴─────╯

Here are the memory profiles for the same
mem_prof.zip

@derekcollison @neilalexander

neilalexander (Member) commented:

This looks like a build-up of Raft append entries. How are you publishing messages? Are you using core NATS publishes or JetStream publishes?

kohlisid (Author) commented:

@neilalexander
We use the JetStream publish:

// PublishMsg publishes a Msg to JetStream.
PublishMsg(m *Msg, opts ...PubOpt) (*PubAck, error)

https://github.com/nats-io/nats.go/blob/main/js.go#L45
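
To illustrate the distinction being asked about (the subject and URL below are placeholders): a core NATS publish returns as soon as the message is written to the socket and gets no acknowledgement, whereas a JetStream publish blocks until the stream leader returns a PubAck, which naturally throttles the publisher:

// Core NATS publish vs JetStream publish, side by side.
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	msg := &nats.Msg{Subject: "default-simple-pipeline-out-0", Data: []byte("payload")}

	// Core NATS: fire-and-forget, no PubAck, no publisher backpressure.
	if err := nc.PublishMsg(msg); err != nil {
		log.Fatal(err)
	}

	// JetStream: waits for the PubAck from the stream before returning.
	if _, err := js.PublishMsg(msg); err != nil {
		log.Fatal(err)
	}
}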

kohlisid (Author) commented:

I see a buildup during writeMsgRecord as well.
mem_prof_2.zip
@neilalexander @derekcollison
