Why not use mimalloc? #142
Comments
It does appear to be common consensus that switching allocators is a good idea performance-wise on musl. I cannot confirm this yet, but it seems promising from articles. The article you posted does a lot more heavy lifting than I would have expected or hoped for personally, though. The common solution I have seen is swapping in a replacement allocator crate. There are roughly three big competitors afaiu, so ideally we need to do some testing; help/input is welcome. If the best way forward is mimalloc and adding stuff to the image, then I am very open to this.
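For reference, swapping the allocator in Rust is a one-line change via the `#[global_allocator]` attribute. A minimal sketch using std's `System` allocator as a stand-in (the `mimalloc` or `tikv-jemallocator` crates plug into the same attribute once added to `Cargo.toml`):

```rust
use std::alloc::System;

// Route all heap allocations through an explicit global allocator.
// With the mimalloc crate this static would instead be:
//   static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
#[global_allocator]
static GLOBAL: System = System;

// Every Vec/Box/String allocation below now goes through GLOBAL.
fn allocate_demo(n: usize) -> usize {
    let v: Vec<u64> = (0..n as u64).collect();
    v.len()
}

fn main() {
    println!("allocated {} elements", allocate_demo(1_000));
}
```

No call sites change; the attribute reroutes the entire binary's allocations, which is what makes this a cheap image-level tweak.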
references #142 and does it in one example. Signed-off-by: clux <[email protected]>
Do you have any test results to share? I would be interested in something similar.
Saw this linked from https://users.rust-lang.org/t/static-linking-for-rust-without-glibc-scratch-image/112279 and thought I'd share my own experience with musl and allocators (though I used cross-rs to build instead).

Jemalloc (via jemallocator or otherwise) has good performance but has a huge downside if you care about platforms where the page size varies, such as recent Aarch64 (ARM64) systems. For example, the Raspberry Pi 5 uses 16 KB pages instead of 4 KB, and Apple also uses bigger base pages on their M1/M2/etc CPUs. This results in Jemalloc segfaulting if it wasn't compiled on the same system: Jemalloc bakes the page size into the binary at build time and cannot work with larger pages than that (though apparently it can work with smaller pages than what it was compiled for). As there are ARM systems that use even 64 KB pages, I cannot recommend Jemalloc.

Mimalloc doesn't have this problem, and it had almost as good performance in my tests (I found it had slightly more fixed overhead for short-running programs, but comparable performance after that). Of course, for performance your mileage may vary depending on your exact allocation pattern.
I actually wrote a MUSL demo project with a custom memory allocator in response to the discussion in the Rust forum. This was largely meant to show how to cross-compile Rust with Bazel. Anyways, the overall observation w.r.t. Jemalloc on Arm seems to hold true: I definitely see the issue of bloated memory on Apple Silicon. However, on X86 I made the opposite observation, that MiMalloc was less favorable than Jemalloc.

That said, replacing the default MUSL allocator is by far the best low-hanging performance tweak you can make for everything async & concurrent, regardless of the programming language. The difference is night and day, so I am not sure how much of a benchmark will be needed beyond a basic throughput & latency measurement. Therefore, I suggest adding both allocators to give people the choice to pick the best one depending on their target and project.
rust-lang/rust-analyzer#1441 might be an interesting read |
Thanks @geoHeil, appreciate it. Here is a benchmark that compares MUSL vs. libc and MUSL + MiMalloc vs. libc: https://www.linkedin.com/pulse/testing-alternative-c-memory-allocators-pt-2-musl-mystery-gomes

In a nutshell, MUSL with its default allocator is at least 10x slower than the default allocator in libc. When you swap out the MUSL default allocator for MiMalloc, you get that 10x back, and in some cases it performs even better than the libc allocator. I only want to add that when you use Jemalloc instead of MiMalloc, it's the same story, except that MiMalloc eats a bit more memory than Jemalloc. For server / cloud systems the difference may add up for high-memory-usage services, so you want to measure the memory footprint before settling on either one. For low-memory-usage services, you can pick either at random and let it run for a long time. For embedded, MiMalloc clearly wins, no doubt.

You can run any combination of benchmarks, but it's always the same story: add Jemalloc or MiMalloc and you get at least a 10x boost for your MUSL binary across all metrics; latency, throughput, you name it. It really is that simple.
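A basic throughput measurement along those lines can be as simple as hammering the allocator from several threads. A std-only sketch (thread count and iteration counts are arbitrary choices, not from the linked benchmark):

```rust
use std::thread;
use std::time::Instant;

// Allocate and immediately drop many small Vecs on one thread --
// the small-object churn pattern where musl's default allocator
// is reported to fall behind under contention.
fn churn(iters: usize) -> usize {
    let mut total = 0;
    for i in 0..iters {
        let v: Vec<u8> = vec![0u8; 64 + (i % 256)];
        total += v.len();
    }
    total
}

// Run the churn loop on several threads and time the whole run.
fn bench(threads: usize, iters: usize) -> u128 {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || churn(iters)))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed().as_micros()
}

fn main() {
    let micros = bench(4, 100_000);
    println!("4 threads x 100k alloc/free in {} us", micros);
}
```

Building the same binary for `x86_64-unknown-linux-musl` with and without a swapped allocator, and comparing wall times, gives a rough version of the throughput comparison described above.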
Hey there,
I want to build an image with musl for smaller image sizes, which is needed for my use case.
I am not a professional in musl, but I found that musl has some performance drawbacks in multi-core environments when reading this article. The article suggests mimalloc as an alternative allocator, which could be just added to the image and avoids the performance traps.
The author of the article also built an image, but it doesn't work with openssl, which is needed for my project, and your image seems to work.
Have you thought about adding mimalloc to your image, as it looks like a free performance boost? Or are there other drawbacks when using it?