Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] LowerBITREVERSE - use AND+CMPEQ+MOVMSK trick to lower scalar types #92236

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

RKSimon
Copy link
Collaborator

@RKSimon RKSimon commented May 15, 2024

By splitting each byte of a scalar type into 8 copies, in reverse order, we can then test each one to see if the bit is set and then use PMOVMSKB/VPTESTMB to pack them back together to get the bit reversal.

So far I've only managed to make this worth it for i8/i16 with SSSE3 (PSHUFB), i32 with AVX2 (AVX1 would be worth it if we can force all 256-bit operations to be split), and i64 with AVX512BW (which can use VPTESTMB).

Fixes #79794

Copy link

github-actions bot commented May 15, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@RKSimon RKSimon force-pushed the bitreverse-movmsk branch 2 times, most recently from b79b209 to 95ec69b Compare June 24, 2024 21:55
…2 scalar types

By splitting each byte of a scalar type into 8 copies, in reverse order, we can then test each one to see if the bit is set and then use PMOVMSKB to pack them back together to get the bit reversal.

So far I've only managed to make this worth it for i8/i16 with SSSE3 (PSHUFB), i32 with AVX2 (AVX1 would be worth it if we can force all 256-bit operations to be split), and i64 with AVX512BW (which can use VPTESTMB).

Fixes llvm#79794
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[x86] bitReverse of a byte or half-word can probably be optimized
1 participant