[LinalgToXeGPU] Lower linalg.matmul_transpose_b into xegpu.dpas #347

Conversation
Signed-off-by: dchigarev <[email protected]>
@@ -669,6 +669,9 @@ static SmallVector<Value> createDescriptorTiles(PatternRewriter &rewriter,
     Value newRowOffs = rewriter.create<arith::ConstantIndexOp>(loc, i);
     for (int j = 0; j < loadShape[1]; j += descTile[1] * arrayLength) {
       Value newColOffs = rewriter.create<arith::ConstantIndexOp>(loc, j);
+      if (transpose) {
+        std::swap(newRowOffs, newColOffs);
This changes the iteration dimension for B chunks.
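For illustration, here is a minimal standalone C++ sketch of the effect of that `std::swap`. The `tileOffsets` helper and its shape/tile parameters are made up for this example (they are not the actual `createDescriptorTiles` interface); it only shows how swapping the row/column offsets flips which dimension the generated tiles walk:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Records the (row, col) start offset of each descriptor tile. When
// `transpose` is set, the row/col offsets are swapped (as in the patch),
// so the tiles advance along the other dimension of the buffer.
std::vector<std::pair<int, int>> tileOffsets(int rows, int cols, int tileR,
                                             int tileC, bool transpose) {
  std::vector<std::pair<int, int>> offs;
  for (int i = 0; i < rows; i += tileR) {
    for (int j = 0; j < cols; j += tileC) {
      int rowOff = i, colOff = j;
      if (transpose)
        std::swap(rowOff, colOff); // same trick as in the diff above
      offs.emplace_back(rowOff, colOff);
    }
  }
  return offs;
}
```

With `transpose = false` a 32x16 buffer tiled 16x8 yields offsets (0,0), (0,8), (16,0), (16,8); with `transpose = true` the same loop yields (0,0), (8,0), (0,16), (8,16), i.e. the chunks are traversed column-wise.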
This seems to be almost generic enough to support transposed A and the batched transposed versions. Could you try making the interfaces amenable to all the matmul variations? The implementation can stay as is.
          "Transpose result is used by more than one matmul operation");
    }
  } else if (isa<memref::DeallocOp>(trUser) ||
             isa<memref::AllocaOp>(trUser)) {
What would alloca mean?
Nothing, it was added here by mistake; I'll remove it in the next commit.
Sure, I did not suggest adding support for other flavors right away, only making the interfaces friendly. Supporting other matmuls is a separate issue.
There's also the bool transpose param in a couple of places that could be generalized, but that's a minor thing.
This bool parameter is used in functions that also take a matmul operand, so it can be used with either operand (and with any matmul op). Is this generic enough, or did you have something else in mind?

```cpp
bool transposeA = true;
bool transposeB = false;
auto tilesA = createTiles(matA, ..., /*transpose=*/transposeA);
auto tilesB = createTiles(matB, ..., /*transpose=*/transposeB);
```
Yup. Sounds good!
Closes #340

Support lowering of `linalg.matmul_transpose_b` (as well as `linalg.transpose %b` + `linalg.matmul %a %b`) into `xegpu.dpas`.

To perform a transposed multiplication we have to load B chunks with the `xegpu.load_nd ... <transpose = array<i64: 1, 0>>` attribute and also change the iteration dimension (from rows to cols).
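The semantics being lowered can be stated as a plain-C++ reference (a sketch of what `linalg.matmul_transpose_b` computes, not the code this pass generates): C = A × Bᵀ, i.e. B is indexed with its dimensions swapped, which is why B chunks can be loaded with the transpose attribute and walked along the other dimension.

```cpp
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Reference semantics of linalg.matmul_transpose_b:
// C[i][j] = sum_k A[i][k] * B[j][k]   (B is indexed transposed).
// A is MxK, B is NxK, the result C is MxN.
Matrix matmulTransposeB(const Matrix &A, const Matrix &B) {
  std::size_t M = A.size(), N = B.size(), K = A[0].size();
  Matrix C(M, std::vector<float>(N, 0.0f));
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t k = 0; k < K; ++k)
        C[i][j] += A[i][k] * B[j][k];
  return C;
}
```

Note that both operands are read along their K (innermost) dimension, so after the transposed load of B the inner dpas loop looks the same as in the plain matmul lowering.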