Status update: lifting the unaligned GPU matmul codegen boats #13227
nicolasvasilache started this conversation in Codegen
Replies: 1 comment
> Exciting results! I can create new benchmarks with this flag enabled once #13133 is merged.
I wanted to share a quick update on unaligned matmul codegen for tensorcore-based GPUs before disappearing for 2 weeks.
Below are the performance gains that become available (once #13133 lands) by turning on the
`--iree-codegen-llvmgpu-enable-transform-dialect-matmul-tensorcore-strategy`
flag: a 5-40x improvement over the current IREE unaligned cases. This can be reproduced today by patching in #13191 (which extracts the key change required from #13133) and running
`make unaligned_matmuls`
with this iree-samples commit. This runs a few combinations of align1/align2/align4/align_more around the 3456_1024_2048 size, f32 only for now.
Feel free to try other sizes.
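As a rough sketch of what the benchmark sweep covers, here is one possible reading of the align1/align2/align4 naming. Assumption (mine, not stated in the post): "alignN" means the varied dimension is divisible by N but not by 2N, i.e. its largest power-of-two divisor is exactly N. The helper names and the +/-8 search window are illustrative only.

```python
# Illustrative only: enumerate problem sizes near the 3456_1024_2048 base
# whose power-of-two alignment matches each "alignN" bucket.
BASE_M, BASE_N, BASE_K = 3456, 1024, 2048

def alignment(n: int) -> int:
    """Largest power of two dividing n (n & -n isolates the lowest set bit)."""
    return n & -n

def variants_around(base: int, target_align: int) -> list[int]:
    """Sizes near `base` whose power-of-two alignment is exactly `target_align`."""
    return [s for s in range(base - 8, base + 9) if alignment(s) == target_align]

if __name__ == "__main__":
    for a in (1, 2, 4):
        print(f"align{a} K sizes near {BASE_K}:", variants_around(BASE_K, a))
```

Under this reading, align1 sizes are odd, align2 sizes are 2 mod 4, and so on, which is what makes them "unaligned" with respect to vector-width and tensor-core tile constraints.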
Now, we are still 2-4x off where we want to be, and there is still work to do around some of the low-level aspects of the
`128x128x16x3xwmma`
configuration.
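A quick back-of-envelope on that configuration, under my own assumption (not stated in the post) that `128x128x16x3xwmma` denotes a 128x128 block tile, a K-tile of 16, and a 3-stage software pipeline using wmma tensor-core instructions on f32 (4-byte) elements:

```python
# Assumed decoding of 128x128x16x3xwmma; all names here are illustrative.
BLOCK_M, BLOCK_N, BLOCK_K, STAGES, ELEM_BYTES = 128, 128, 16, 3, 4

# Shared memory to stage the A (MxK) and B (KxN) tiles for each pipeline stage.
a_tile_bytes = BLOCK_M * BLOCK_K * ELEM_BYTES
b_tile_bytes = BLOCK_K * BLOCK_N * ELEM_BYTES
total_smem = (a_tile_bytes + b_tile_bytes) * STAGES

print(f"A tile {a_tile_bytes} B, B tile {b_tile_bytes} B, "
      f"total shared memory {total_smem} B")
```

If this decoding is right, the three pipelined stages use exactly 48 KiB of shared memory, which is the default static shared-memory limit per block on many NVIDIA GPUs, so the configuration sits right at that boundary.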
If people feel bold, they could try to turn the flag on by default to get the first 5-40x perf gains.
I'll pick this up again in 2 weeks.
@silvasean @mariecwhite @mattwalsh @stellaraccident @ftynse