Make sve_128 portable and support true march=native builds #504
This PR makes two changes to improve support for SVE.
The first change makes the sve_128 build portable to any system implementing SVE by no longer fixing the vector length at compile time. The sve_128 code is essentially NEON augmented with native gathers, and it uses predicates to force a 128-bit data path regardless of the host vector length. Testing on Graviton 4 shows no measurable performance difference between the portable sve_128 build and the compile-time enforced one, and testing on Graviton 3 shows that the portable build works as intended.
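As a rough illustration of the predicate approach described above (not the code from this PR), the sketch below uses a fixed-pattern SVE predicate so that only the first four 32-bit lanes are active, whatever the hardware vector length is. The helper names `add4_f32` and `gather4_f32` are hypothetical, and a scalar fallback is included only so the snippet compiles on targets without SVE.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: add four floats using only the low 128 bits
   of an SVE register. */
void add4_f32(const float *a, const float *b, float *out)
{
#if defined(__ARM_FEATURE_SVE)
    /* SV_VL4 activates exactly the first four 32-bit lanes (128 bits),
       independent of the host vector length. */
    svbool_t pg = svptrue_pat_b32(SV_VL4);
    svfloat32_t va = svld1_f32(pg, a);
    svfloat32_t vb = svld1_f32(pg, b);
    svst1_f32(pg, out, svadd_f32_x(pg, va, vb));
#else
    /* Scalar fallback for non-SVE builds. */
    for (size_t i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}

/* Hypothetical helper: gather four floats from a table by index,
   using a native SVE gather under the same 128-bit predicate. */
void gather4_f32(const float *table, const int32_t *idx, float *out)
{
#if defined(__ARM_FEATURE_SVE)
    svbool_t pg = svptrue_pat_b32(SV_VL4);
    svint32_t vi = svld1_s32(pg, idx);
    /* Inactive lanes perform no memory access. */
    svfloat32_t vr = svld1_gather_s32index_f32(pg, table, vi);
    svst1_f32(pg, out, vr);
#else
    /* Scalar fallback for non-SVE builds. */
    for (size_t i = 0; i < 4; i++) {
        out[i] = table[idx[i]];
    }
#endif
}
```

Because the predicate is built at run time from a fixed pattern rather than from a compile-time vector length, the same binary should behave identically on 128-bit and wider SVE implementations.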
The second change makes the "native" build type actually use `-march=native` and `-mcpu=native`, allowing native builds to pick up SVE if the host CPU supports it. Currently, native builds rely on compiler defaults, which target baseline Armv8. Note that this needs Clang 18 or newer to pick up SVE automatically, which is newer than the default compiler on current AWS Linux images for Graviton 3 and 4.
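One way to check what a native build actually enabled is to inspect the ACLE feature macros the compiler predefines. The sketch below is only an illustration, with hypothetical helper names `compiled_isa` and `compiled_sve_bits`. It assumes the compiler follows ACLE, where `__ARM_FEATURE_SVE` is defined when SVE code generation is enabled and `__ARM_FEATURE_SVE_BITS` is nonzero only when a fixed vector length was requested at compile time.

```c
/* Hypothetical helper: report which SIMD ISA this translation unit
   was compiled for, based on ACLE predefined macros. */
const char *compiled_isa(void)
{
#if defined(__ARM_FEATURE_SVE)
    return "sve";
#elif defined(__ARM_NEON)
    return "neon";
#else
    return "other";
#endif
}

/* Hypothetical helper: report the fixed SVE vector length in bits,
   or 0 if the build is vector-length agnostic or SVE is not enabled. */
int compiled_sve_bits(void)
{
#if defined(__ARM_FEATURE_SVE_BITS)
    return __ARM_FEATURE_SVE_BITS;
#else
    return 0;
#endif
}
```

Printing both values from a small `main` built with the same flags as the project gives a quick indication of whether `-march=native` picked up SVE with the installed compiler, and whether the build is still pinned to a fixed vector length.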