Massive performance regression between nightly-2022-08-12 and nightly-2022-08-13 #102952
That commit doesn't compile
@the8472 I am sorry, I forgot to mention that you need to initialize the Git submodules before running the benchmarks. I will update my post above to include this information.
I don't know if this is related, but today I removed 4 unnecessary instructions from the `wasmi` instruction set and saw another significant performance regression.

The diff between the two versions can be seen in the PR linked below. So it indeed seems that the culprit of this issue is that Rust + LLVM easily fails to properly optimize this function with the threaded-code style branching technique when the stars are misaligned.

Reproduce

The PR that I created today can be found here: wasmi-labs/wasmi#518

```rust
// wasmi/src/engine/executor.rs
pub trait Execute: Sized {
    fn execute_dummy(self) -> Result<CallOutcome, TrapCode>;
}

impl<'ctx, 'engine, 'func> Execute for Executor<'ctx, 'engine, 'func, ()> {
    fn execute_dummy(self) -> Result<CallOutcome, TrapCode> {
        self.execute()
    }
}
```

And also add

```toml
[profile.release]
lto = "fat"
codegen-units = 1
```

to the `Cargo.toml`. Install the …
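A minimal sketch of one way to compare the assembly that the two nightlies generate for the dummy entry point, using rustc's `--emit=asm`; the package selection and output paths below are assumptions and depend on the workspace layout:

```sh
# Build the wasmi crate with each toolchain and keep the emitted assembly.
# The .s files land under target/release/deps/ with a hash in the file name,
# so clean between builds to avoid matching stale files.
cargo clean
cargo +nightly-2022-08-12 rustc -p wasmi --release -- --emit=asm
cp target/release/deps/wasmi-*.s /tmp/wasmi-nightly-2022-08-12.s

cargo clean
cargo +nightly-2022-08-13 rustc -p wasmi --release -- --emit=asm
cp target/release/deps/wasmi-*.s /tmp/wasmi-nightly-2022-08-13.s

# Compare the generated code for the dispatch function between the two outputs.
diff /tmp/wasmi-nightly-2022-08-12.s /tmp/wasmi-nightly-2022-08-13.s | less
```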
WG-prioritization assigning priority (Zulip discussion). Discussed by T-compiler (notes); a smaller reproducible example would probably help.

As mentioned in the opening comment, a bisection seems to point at the LLVM 15 upgrade (#99464), specifically between nightly-2022-08-12 and nightly-2022-08-13.

@rustbot label -I-prioritize +P-high E-needs-mcve -I-compiler-nominated
@apiraino Thanks a lot for the update and for bringing this issue up at the T-compiler meeting! I am working on an MCVE. However, as stated above, I am unsure whether an MCVE is actually that useful, since a fundamental problem is that the code in question is extremely sensitive to changes, as brought up in this comment where I demonstrated a significant performance regression by removing 4 variants from an enum. I will update this thread once I am done with (or have given up on) the MCVE.

Summary of the underlying Problems

Please take what follows with a big grain of salt since I am no compiler or LLVM expert. 😇
One Potential Solution

What follows now might be somewhat controversial ... Given the summary above, it seems to me that Rust lacks abstractions that allow us to take precise and explicit control over the control flow constructs that are required to meet the performance criteria of our system-level software. Having proper guaranteed tail call abstractions in Rust is one possible solution and would completely eliminate this issue, since it would provide us with a sufficient amount of control over those control flow constructs.

I think the general misunderstanding about guaranteed tail calls in Rust is that people think they are merely more elegant than other solutions. In reality they give us more precise and explicit control over control flow than is currently possible in Rust, which results in better code and more reliable optimizations while staying in safe Rust.
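To make this concrete, here is a minimal sketch with a made-up instruction set (not wasmi's actual code): each instruction handler ends in a call that is in tail position. With guaranteed tail calls (for example via the reserved `become` keyword) those calls would be guaranteed to compile to jumps with constant stack usage; today they are ordinary calls that LLVM may or may not turn into jumps.

```rust
/// A tiny hypothetical instruction set; not wasmi's actual `Instruction` enum.
#[derive(Copy, Clone)]
enum Inst {
    Add(i64),
    Sub(i64),
    Halt,
}

struct Vm<'a> {
    code: &'a [Inst],
    acc: i64,
}

// One handler per instruction; each handler ends with a call in tail position.
// Guaranteed tail calls would turn these into jumps with no stack growth;
// without them, deep programs can overflow the stack if LLVM does not
// optimize the calls away.
fn op_add(vm: &mut Vm, pc: usize, x: i64) -> i64 {
    vm.acc += x;
    dispatch(vm, pc + 1)
}

fn op_sub(vm: &mut Vm, pc: usize, x: i64) -> i64 {
    vm.acc -= x;
    dispatch(vm, pc + 1)
}

fn dispatch(vm: &mut Vm, pc: usize) -> i64 {
    match vm.code[pc] {
        Inst::Add(x) => op_add(vm, pc, x),
        Inst::Sub(x) => op_sub(vm, pc, x),
        Inst::Halt => vm.acc,
    }
}

fn main() {
    let code = [Inst::Add(40), Inst::Sub(2), Inst::Add(4), Inst::Halt];
    let mut vm = Vm { code: &code, acc: 0 };
    println!("result = {}", dispatch(&mut vm, 0)); // prints `result = 42`
}
```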
WIP MCVE: https://github.com/Robbepop/rustc-regression-102952/blob/main/src/lib.rs

This is not yet a real MCVE, but we can use it as a starting point, and it also shows the inferior performance on the newer nightly. Benchmark using …

The outputs I receive are: …

This demonstrates a moderate 18% slowdown. Since …
One thing I just noticed when comparing the two: the PR in question simply removes 4 variants of an enum. It might very well be that this regression is not even connected to the initial regression I reported in this issue and demonstrates yet another performance pitfall that we should maybe have a separate issue for. I would be thankful if someone could answer this.
It does now. Also feel free to make a ticket if something is missing.
Hi @pacak, thanks for implementing the feature!
Right, that's mostly for other people who might be following the steps :)
Any news on this? I have the strong feeling that I encountered this performance regression bug again today in this PR: … The PR does not change the number of instructions but merely changes the implementation of a handful of the over one hundred instructions that are part of the big dispatch `match`. I used …

edit: I was able to fix the performance regression. The idea was that I thought it might be important for the optimizer that all … ^ I will keep this as a reminder to myself for future regressions.
@Robbepop Regarding the "one potential solution" header you mentioned a while back: Rust has for some time reserved the word `become` for exactly this reason; it has long been imagined that it will be used for explicit tail calls. So I believe the thing you are talking about is not really controversial. It is mostly awaiting someone assembling a draft RFC, thinking through all the implications, presenting it to the community (and especially T-lang), and implementing it. Mind, the same person need not accomplish all of these steps, by far, not even "thinking through all the implications" alone. Indeed, if one were to make this happen, they would probably be best off starting by talking to the people working on MIR opts. If explicit tail calls are something that can be done before the code hits LLVM, that simplifies a lot of things (though maybe it makes other things more complex; that sort of thing happens).
@workingjubilee Thank you for your response! I think I see where you are heading with your reply ... Since then there have been many proposals to add tail calls to Rust, among others: …

A small summary of the issues: rust-lang/rfcs#1888 (comment)

Following the endless discussions in all those threads, I never felt that this feature in particular received a lot of support from the core Rust team. The technical reasons for this usually were open questions about borrows and drops, which have been resolved by rust-lang/rfcs#2691 (comment) but never received a proper follow-up response, as well as a major open question about the calling ABI that needs to be adjusted for tail calls, from what I understood. This gave me personally the feeling that support is missing from exactly the group of people from which a potential external contributor urgently needs it. Writing yet another pre-RFC, or a third base implementation for … I am very open to ideas.
Discussed in the T-compiler P-high review. At this point I am interpreting this issue as a request for some form of proper tail call (PTC) support. In particular, I interpret the issue author's comments as saying that they are willing to adapt their code in order to get the tail-call-elimination guarantees they want. I too want some kind of PTC (I make no secret of my Scheme background), but given the many issues that the project has, I also want to make sure we prioritize them properly. In this case, this issue strikes me as a P-medium feature request, not a P-high regression. Therefore I am downgrading it to P-medium.

@rustbot label: -P-high +P-medium
FYI there is some ongoing work on implementing tail call support; see #112788 for the tracking issue and rust-lang/rfcs#3407 for recent activity on the RFC.
@pnkfelix Indeed, this issue can be closed once Rust tail calls have been merged, as I expect that to fix the underlying issue, given that it provides the control and stack growth guarantees stated in the current MVP design.
Usually I develop on the `stable` channel and wanted to see how my project performs on the upcoming `beta` or `nightly` channels. I saw performance regressions between 30-80% across the board on native and Wasm targets. Those regressions were confirmed by our benchmarking CI, as can be seen by the link.

Was it the LLVM 15 Update?

I conducted a bisect and found that the change happened between `nightly-2022-08-12` and `nightly-2022-08-13`. After some research I saw that Rust updated from LLVM 14 to 15 in exactly this time period: #99464. Other merged commits in this time period were not as suspicious to me.
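For reference, such a bisection can be automated. The sketch below assumes the `cargo-bisect-rustc` tool (the report does not say which tooling was actually used) and a hypothetical `./check_perf.sh` script that runs the benchmarks and exits with a non-zero status when it detects the slowdown:

```sh
# cargo-bisect-rustc downloads the nightlies (and per-PR CI artifacts) in the
# given range and runs the script against each one; a non-zero exit status
# from the script marks that toolchain as "regressed".
cargo install cargo-bisect-rustc
cargo bisect-rustc \
    --start=2022-08-12 \
    --end=2022-08-13 \
    --script=./check_perf.sh
```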
Past Regressions

Unfortunately, this is also not the first time we have seen such massive regressions ...
It is extremely hard to craft a minimal code snippet out of `wasmi` since it is a very heavily optimized bunch of code with lots of interdependencies. Unfortunately, the `wasmi` project is incredibly performance critical to us. Even a 10-15% performance regression is a disaster for us, let alone the 30-80% we just saw ...

Hint for Minimal Code Example
I have one major suspicion: due to missing guaranteed tail calls in Rust, we heavily rely on a non-guaranteed optimization for our `loop-switch` based interpreter hot path that pulls the dispatch jumps into the match arms, which results in code very similar to what threaded-code interpreter code would produce. The code that depends on this particular optimization can be found here.

This suspicion is underlined by the fact that especially non call-intense workloads show the most regressions in the linked benchmarks. This implies to me that the regressions have something to do with instruction dispatch.
Potential Future Solutions

One option: Rust adds these `loop-switch` optimizations to its set of benchmarks so that future LLVM updates won't invalidate them. I am not sure how viable this approach is for the Rust compiler developers, though. Also, this only works if we find all the fragile parts that cause these regressions.
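One possible shape for such a benchmark, sketched here as an ordinary criterion micro-benchmark of the loop-`match` pattern shown above; the file name, harness setup (`[[bench]]` with `harness = false` plus criterion as a dev-dependency), and the exact form the compiler's own benchmark suite would require are all assumptions:

```rust
// benches/dispatch.rs -- hypothetical micro-benchmark of a `loop`-`match`
// dispatch loop, of the kind that could be tracked across toolchains to
// catch codegen regressions of this pattern early.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

enum Inst {
    Add(i64),
    Sub(i64),
    Halt,
}

// Same dispatch shape as the sketch above: a loop around a match.
fn run(code: &[Inst]) -> i64 {
    let (mut acc, mut pc) = (0i64, 0usize);
    loop {
        match code[pc] {
            Inst::Add(x) => { acc += x; pc += 1; }
            Inst::Sub(x) => { acc -= x; pc += 1; }
            Inst::Halt => return acc,
        }
    }
}

fn bench_dispatch(c: &mut Criterion) {
    // A long instruction sequence so that dispatch overhead dominates.
    let mut code: Vec<Inst> = (0..10_000i64)
        .map(|i| if i % 2 == 0 { Inst::Add(i) } else { Inst::Sub(1) })
        .collect();
    code.push(Inst::Halt);
    c.bench_function("loop_match_dispatch", |b| {
        b.iter(|| run(black_box(code.as_slice())))
    });
}

criterion_group!(benches, bench_dispatch);
criterion_main!(benches);
```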
Reproduce

The current `stable` Rust channel is the following: …

In order to reproduce these benchmarks, do the following:
```sh
git clone git@github.com:paritytech/wasmi.git
cd wasmi
git checkout 21e12da67a765c8c8b8a62595d2c9d21e1fa2ef6
rustup toolchain install nightly-2022-08-12
rustup toolchain install nightly-2022-08-13
git submodule update --init --recursive
cargo +stable bench --bench benches execute -- --save-baseline stable
cargo +nightly-2022-08-12 bench --bench benches execute -- --baseline stable
cargo +nightly-2022-08-13 bench --bench benches execute -- --baseline stable
```