[Operator] Add tile op #148
2ef6a9a to d7c8f8a
code.writeline(
    f"in{i}_strides = broadcasted_stride(in{i}.shape, in{i}.stride(), shape)"
)
code.writeline(f"if 'in{i}_shape' in kwargs:")
This modification relaxes the requirement that an operand's shape be broadcastable when the input shape is passed in explicitly.
It may conflict with our further changes to the codegen functionality, but we are considering adding a more powerful analysis to handle these cases. Thanks~
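For context, here is a minimal sketch of the usual broadcastability check that tile cannot satisfy. `broadcast_strides_sketch` is a hypothetical helper written for illustration only; it is not the project's `broadcasted_stride`, whose actual implementation may differ.

```python
def broadcast_strides_sketch(in_shape, in_strides, out_shape):
    """Return strides for viewing an input as `out_shape` under standard
    right-aligned broadcasting (hypothetical helper, illustration only)."""
    ndiff = len(out_shape) - len(in_shape)
    if ndiff < 0:
        raise ValueError("input has more dimensions than the output")
    # Leading dimensions that the input lacks are broadcast: stride 0.
    result = [0] * ndiff
    for size, stride, out in zip(in_shape, in_strides, out_shape[ndiff:]):
        if size == out:
            result.append(stride)   # dimension matches, keep its stride
        elif size == 1:
            result.append(0)        # size-1 dimension is repeated
        else:
            raise ValueError(f"shape {tuple(in_shape)} is not broadcastable "
                             f"to {tuple(out_shape)}")
    return result


def is_broadcastable(in_shape, out_shape):
    """True if `in_shape` can be broadcast to `out_shape`."""
    try:
        broadcast_strides_sketch(in_shape, [1] * len(in_shape), out_shape)
        return True
    except ValueError:
        return False
```

Under this rule a `(2, 3)` input cannot be viewed as a `(4, 6)` output, which is exactly the case tile produces, so a strict check would reject it.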
)
code.writeline(
    f"in{i}_shape = [(num_tasks + 1) for _ in range(len(shape))]"
)
Why use `in{i}_shape = [(num_tasks + 1) for _ in range(len(shape))]` as the input shape in this case? Why not just use `shape`?
In the tile op, `in_shape[i]` is used in the `tl.load` call.
Specifically, when rank == 2, `in0 = tl.load(in0_ptr + i0 * in0_stride0 + i1 * in0_stride1, mask=mask)` is changed into `in0 = tl.load(in0_ptr + (i0 % in0.shape[0]) * in0_stride0 + (i1 % in0.shape[1]) * in0_stride1, mask=mask)`.
Other ops don't need the `% in0.shape[0]`. To make sure this change does not affect them, the shape defaults to `(num_tasks + 1)` in every dimension.
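The effect of the modulo can be checked outside Triton. The sketch below emulates the rank-2 load with NumPy; `tile_by_modulo` is a hypothetical name, the input is assumed contiguous, and `num_tasks` is assumed to be the number of output elements. Under that assumption every index is smaller than `num_tasks + 1`, so the default shape turns the modulo into a no-op.

```python
import numpy as np


def tile_by_modulo(src, out_shape, in_shape=None):
    """Emulate the modulo-indexed rank-2 load with NumPy (illustration only)."""
    rows, cols = out_shape
    num_tasks = rows * cols  # assumed: one task per output element
    if in_shape is None:
        # Default used for ops other than tile: larger than any index,
        # so `i % (num_tasks + 1) == i` and the load is unchanged.
        in_shape = [num_tasks + 1, num_tasks + 1]
    flat = np.ascontiguousarray(src).ravel()
    stride0, stride1 = src.shape[1], 1  # element strides of a contiguous input
    i0 = np.arange(rows)[:, None]
    i1 = np.arange(cols)[None, :]
    offsets = (i0 % in_shape[0]) * stride0 + (i1 % in_shape[1]) * stride1
    return flat[offsets]


# Tile: passing the real input shape wraps the indices around the input.
small = np.arange(6).reshape(2, 3)
tiled = tile_by_modulo(small, (4, 6), in_shape=small.shape)

# Other ops: with the default shape the result is a plain element-wise load.
full = np.arange(24).reshape(4, 6)
same = tile_by_modulo(full, (4, 6))
```

Here `tiled` matches `np.tile(small, (2, 2))` and `same` matches `full`, which is the behaviour the comment above describes.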
6b01543 to 3cca5e5
Perf result in NV-A100