fn decode_b: Performance improvements #1320

Merged
merged 2 commits into from
Jul 16, 2024

rinon (Collaborator) commented on Jul 16, 2024

Inlines `backup2x8`, which is already inlined in the C implementation. Doing the same in Rust improves performance slightly.
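
As a rough illustration of what requesting inlining can look like in Rust, here is a minimal sketch using the `#[inline(always)]` attribute. The helper `backup_rows`, its signature, and its body are hypothetical and are not taken from this PR; the PR text does not say whether the change uses an attribute or folds the body into the caller by hand.

```rust
/// Hypothetical helper, loosely modeled on a small "backup" routine.
/// Copies up to two 8-byte rows out of `src`, starting at `offset`.
///
/// `#[inline(always)]` is a strong hint to the compiler to inline this
/// function at every call site, which removes call overhead in hot paths.
#[inline(always)]
fn backup_rows(dst: &mut [[u8; 8]; 2], src: &[u8], offset: usize) {
    // Pair each destination row with the next 8-byte chunk of the source.
    for (row, chunk) in dst.iter_mut().zip(src[offset..].chunks_exact(8)) {
        row.copy_from_slice(chunk);
    }
}
```

Whether forced inlining helps depends on the call site, so a change like this is best confirmed with a benchmark, as was done here.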

The `f.a[t.a]` block context reference is constant throughout `decode_b`, but the function appears to be
too complex for the optimizer to avoid recomputing this reference. Hoisting it into a local variable
improves performance measurably (~1% on a Ryzen 7700X for 8-bit Chimera).
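
The pattern described above can be sketched as follows. The types `FrameContext`, `TileContext`, and `BlockContext`, their fields, and both functions are simplified, hypothetical stand-ins; the real decoder state is much larger and differs in detail.

```rust
// Hypothetical, heavily simplified stand-ins for the decoder state.
struct BlockContext {
    intra: [u8; 32],
    skip: [u8; 32],
}

struct FrameContext {
    a: Vec<BlockContext>,
}

struct TileContext {
    /// Index of this tile's block context within `FrameContext::a`.
    a: usize,
}

// Before: each access re-indexes `f.a[t.a]`. Indexing a `Vec` includes a
// bounds check, and in a very large function the optimizer may not manage
// to hoist the repeated lookup on its own.
fn sum_flags_reindexed(f: &FrameContext, t: &TileContext, n: usize) -> u32 {
    let mut sum = 0;
    for i in 0..n {
        sum += u32::from(f.a[t.a].intra[i]);
        sum += u32::from(f.a[t.a].skip[i]);
    }
    sum
}

// After: resolve the reference once and reuse the local.
fn sum_flags_local(f: &FrameContext, t: &TileContext, n: usize) -> u32 {
    let ta = &f.a[t.a];
    let mut sum = 0;
    for i in 0..n {
        sum += u32::from(ta.intra[i]);
        sum += u32::from(ta.skip[i]);
    }
    sum
}
```

Both functions return the same result; the second only makes the invariance of the reference explicit. In a function this small the optimizer would likely produce the same code for both, so the sketch shows the shape of the change rather than its measured effect.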

`backup2x8` is inlined in C, and inlining it in Rust as well gives a small
performance improvement (~0.5% on a Ryzen 7700X for 8-bit Chimera).

kkysen (Collaborator) left a comment:

Oh, very nice! This is quite a simple change for a decent perf improvement. Can we do the `ta` thing in any other hot functions? Or is `decode_b` especially bad since it's so massive?

rinon merged commit 31aa266 into main on Jul 16, 2024 (27 checks passed)
rinon deleted the sjc/performance branch on Jul 16, 2024, 21:29