This brings a performance improvement of 20-30%.
Where possible, the compiler is guided into optimising away the bounds checks without resorting to unsafe code; no unsafe code was used at all.
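A minimal sketch of common safe-Rust patterns that let the optimiser drop bounds checks. The function names and the XOR/table workloads are hypothetical illustrations, not code from this PR. Whether a given check is actually removed depends on the compiler version and optimisation level, so it is worth confirming in the generated assembly (for example on Compiler Explorer).

```rust
/// Indexed loop: re-slicing both inputs to a common length `n` first
/// lets the optimiser see that `i < n` keeps both accesses in bounds.
pub fn xor_into_indexed(dst: &mut [u16], src: &[u16]) {
    let n = dst.len().min(src.len());
    let (dst, src) = (&mut dst[..n], &src[..n]);
    for i in 0..n {
        dst[i] ^= src[i];
    }
}

/// Iterator form: `zip` stops at the shorter slice, so no index is checked.
pub fn xor_into_zip(dst: &mut [u16], src: &[u16]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= *s;
    }
}

/// Butterfly-style step: one `split_at_mut` check up front (it panics if
/// `half > block.len()`), then XOR the lower part into the upper part.
pub fn butterfly_xor(block: &mut [u16], half: usize) {
    let (lo, hi) = block.split_at_mut(half);
    for (h, l) in hi.iter_mut().zip(lo.iter()) {
        *h ^= *l;
    }
}

/// Table lookup: indexing a fixed-size `[u16; 256]` with a value masked to
/// 8 bits can never be out of range, which the optimiser can typically prove.
pub fn lookup_low_byte(data: &mut [u16], table: &[u16; 256]) {
    for x in data.iter_mut() {
        *x = table[usize::from(*x & 0xff)];
    }
}
```

All four functions are ordinary safe Rust: if the optimiser fails to prove an access in range, the check simply stays in place, and nothing becomes undefined behaviour.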
This PR does not touch the AVX code, because in my testing I did not see a noticeable improvement for that case.
Numbers before:
Numbers now:
The only thing preventing this implementation from being as fast as kagome's C++ implementation is the few lines annotated with:
I haven't yet managed to get the compiler to elide the bounds checks in those cases. Another way of achieving this would be to add a bit of unsafe code.
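For comparison, the unsafe route would typically use `slice::get_unchecked`, which skips the check and makes the caller responsible for keeping indices in range. The hypothetical sketch below (not code from this PR) uses indices that come from runtime data, a case where the compiler generally cannot prove the accesses are in bounds.

```rust
/// Checked version: every `data[i]` carries a bounds check, because the
/// values in `idx` are only known at runtime.
pub fn gather_sum_checked(data: &[u32], idx: &[usize]) -> u64 {
    idx.iter().map(|&i| u64::from(data[i])).sum()
}

/// Unchecked version of the same computation.
///
/// # Safety
/// Every element of `idx` must be `< data.len()`.
pub unsafe fn gather_sum_unchecked(data: &[u32], idx: &[usize]) -> u64 {
    let mut sum = 0u64;
    for &i in idx {
        // Only active in debug builds; release builds skip it entirely.
        debug_assert!(i < data.len(), "index out of bounds");
        // SAFETY: the caller guarantees every `i` in `idx` is `< data.len()`.
        sum += u64::from(unsafe { *data.get_unchecked(i) });
    }
    sum
}
```

The trade-off is that an out-of-range index in a release build becomes undefined behaviour instead of a panic, which is why keeping the implementation free of unsafe code is attractive.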