-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Actually compile with proper optimizations in a release build. #6336
base: master
Are you sure you want to change the base?
Conversation
There seem to be some stray warnings left that my local GCC did not trigger, I'll need to fix them before this is merged / reviewed |
Oh, nice find. Have you measured any performance change in more gamelike conditions than that stress test? |
I want to do icarus at some point, but I didn't find the time yet. I'll probably get to it around the weekend, once I squashed the remaining warnings. |
bf062bd
to
11b2932
Compare
11b2932
to
e5a2658
Compare
As it turns out, we did not actually compile the release builds with the maximum safe optimization level.
First of all, on Linux / Mac builds, we did not compile with -O3, but only with -O2. That's wasted performance, as O3 is still safe (unlike -Ofast, which may do inaccurate math).
Second of all, none of the toolchains has had Link-Time Optimizations enabled. This is especially bad in the FSO case, as we have some functions in cpp's (and thus uninlineable unless LTO is available) which are called very often (such as the vector math functions).
Enabling LTO throws up a few warnings: Some ODR violations, two uninitialized structs, and a few "potential negative size allocations". These latter ones are technically already caught with an Assertion, but as these are not compiled into release builds, the compiler sees a potentially dangerous path and complains. Until we get assumptions for the compiler (which is not soon, given that these are compiler extensions added to GCC13+ and standardized in C++23), we have to just make sure that the code at runtime ensures their correctness, even if that may be redundant.
Furthermore, an update to CMake 3.9 is required to actually get portable LTO support detection and enabling.
This also yields actually measureable performance increases.
In a particle stress test I was profiling (which prompted this PR), a performance boost of 10% is detectable (lower single graph / orange line is with LTO and -O3):