Memory usage spikes on appliance with 100+ providers #22900
Comments
I wonder if this is due to the filtering being done in memory rather than in the db. Typically there are not that many providers, so this slides through. But once we get enough providers, forcing the filtering out of the db and into ram will start to show up. Question: do we know what the server was doing when it hit an out of memory?
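For illustration, a minimal sketch of the distinction being described, assuming ActiveRecord (ExtManagementSystem is the ManageIQ provider model; the :enabled attribute here is purely an example predicate, not the actual code path in question):

```ruby
# In-Ruby filtering: every provider row is loaded and instantiated as an
# ActiveRecord object before the block runs, so worker memory grows with
# the number of providers on the appliance.
enabled_providers = ExtManagementSystem.all.to_a.select(&:enabled?)

# In-database filtering: the predicate runs in PostgreSQL and only the
# matching rows are instantiated in the Ruby process.
enabled_providers = ExtManagementSystem.where(:enabled => true)
```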
@kbrock I just shared an internal box folder that contains a journalctl save that includes the "crash". The first OOM messages I see are related to the DB connection.
This is extremely odd; memory used by the buffer cache shouldn't cause an OOM, since it is supposed to be freeable at any time. If there was an OOM while memory was still available, that is really weird. One thing that comes to mind: what does the filesystem usage look like here? If you're using mmap'd memory pages backed by files and you run out of space on the filesystem, it will actually trigger a memory fault rather than an ENOSPC.
I might have confused things a bit here - the
One other thing that comes to mind: I've heard of some issues on Linux (specifically with ZFS on Linux, since ZFS makes heavy use of the in-memory ARC cache) where cached memory can't be freed "fast enough" to prevent an OOM. This assumes there was still available memory when the OOM hit, which we don't know.
This issue has been automatically marked as stale because it has not been updated for at least 3 months. If you can still reproduce this issue on the current release or on
Issue Description
Memory usage periodically spikes until out of memory. Eventually something crashes and a bunch of workers are killed. Things start back up and the process repeats.
Memory used (%) since the last reboot, from 2024-01-31 to 2024-02-15:
Environment
4.2.0
(IM)20230913103031_b15dc60
Logs
I saved off journalctl entries for the following events. The log files are too large to upload to GitHub (also, they may contain sensitive information); I can provide access to them as needed. A sketch of the capture command follows the list.

Memory usage spikes ~10% on 2024-02-08 11:10
journalctl save: journalctl_20240208105500-20240208130500.txt

Memory usage drop on 2024-02-14 00:10
journalctl save: journalctl_20240214000000-20240214003000.txt

Memory usage spikes ~10% on 2024-02-15 12:55
journalctl save: journalctl_20240214125000-20240214130500.txt
top_output save: top_output_20240215.log
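For reference, a sketch of how a time-boxed journal window like the ones above can be captured (not the exact command used; the timestamps are taken from the first event, and only standard journalctl flags are involved):

```ruby
#!/usr/bin/env ruby
# Save a journalctl window to a file named after the time range,
# matching the journalctl_<start>-<end>.txt naming above.
since  = "2024-02-08 10:55:00"
until_ = "2024-02-08 13:05:00"
out    = "journalctl_#{since.delete('- :')}-#{until_.delete('- :')}.txt"

system("journalctl", "--since", since, "--until", until_, "--no-pager",
       :out => out) || abort("journalctl failed")
```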
From the top output, it looks like used stays flat while buff/cache increases.
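That is consistent with what /proc/meminfo reports: MemFree can drop while MemAvailable stays high, because the kernel counts reclaimable page cache as available. A minimal sketch (assuming a Linux appliance with /proc/meminfo) for checking this at spike time:

```ruby
# Read the kernel's view of free vs. reclaimable memory from /proc/meminfo.
meminfo = File.readlines("/proc/meminfo").each_with_object({}) do |line, h|
  key, value = line.split(":")
  h[key] = value.to_i # values are reported in kB
end

puts "MemFree:        #{meminfo['MemFree']} kB"
puts "Buffers+Cached: #{meminfo['Buffers'] + meminfo['Cached']} kB (normally reclaimable)"
puts "MemAvailable:   #{meminfo['MemAvailable']} kB (kernel estimate incl. reclaimable cache)"
```

If MemAvailable was still large at the moment of the OOM, that would point away from ordinary cache pressure and toward something like the mmap/SIGBUS or ARC scenarios discussed above.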