[RISCV] Pack build_vectors into largest available element type (#97351)
Our worst-case build_vector lowering is a serial chain of vslide1down.vx operations, which creates a serial dependency chain through a relatively high-latency operation. We can instead pack elements together into ELEN-sized chunks and move each chunk from the integer register file to the vector side in a single operation. This reduces the length of the serial chain on the vector side, and costs at most three scalar instructions per element. This is a win for all cores when the sum of the latencies of the scalar instructions is less than that of the vslide1down.vx being replaced, and is particularly profitable for out-of-order cores, which can overlap the scalar computation.

This patch is restricted to configurations with zba and zbb. Without both, the zero extend might require two instructions, which would bring the total to four scalar instructions per element. zba and zbb are both present in the rva22u64 baseline, which looks to be quite common for hardware in practice; we could extend this to systems without bitmanip with a bit of extra effort.
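The packing idea can be illustrated in plain C++. This is only a sketch of the arithmetic, not the lowering code from the patch: the helper name `packElements` is hypothetical, and it assumes 8-bit elements and ELEN = 64. Each element is zero extended, shifted to its lane position, and OR-ed into the chunk, which is presumably where the "at most three scalar instructions per element" figure comes from.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LLVM): packs 8-bit elements into 64-bit
// chunks, assuming ELEN = 64. Element 0 lands in the lowest bits of chunk 0,
// so a chunk reinterpreted as eight little-endian i8 lanes keeps element order.
std::vector<std::uint64_t> packElements(const std::vector<std::uint8_t>& elems) {
    constexpr unsigned kElemBits = 8;   // assumed element width
    constexpr unsigned kElen = 64;      // assumed ELEN
    constexpr std::size_t kPerChunk = kElen / kElemBits;

    // Round up so a partial final chunk is still emitted (upper bits stay zero).
    std::vector<std::uint64_t> chunks((elems.size() + kPerChunk - 1) / kPerChunk, 0);

    for (std::size_t i = 0; i < elems.size(); ++i) {
        std::uint64_t widened = elems[i];                       // zero extend
        unsigned shift =
            static_cast<unsigned>(i % kPerChunk) * kElemBits;   // lane offset
        chunks[i / kPerChunk] |= widened << shift;              // shift and OR
    }
    return chunks;
}
```

With these assumptions, a 16-element i8 build_vector becomes two 64-bit values, so the vector side would need two insertions of a 64-bit element rather than a chain of sixteen 8-bit ones.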