
How does CUTLASS keep the compiler from being too smart and preserve the designed pipeline order? #1056

Answered by hwu36
cloudhan asked this question in Q&A

Then how can we ensure that the actual copies of tCrA_copy_view and tCrB_copy_view are not moved downward?

You cannot ensure that. nvcc's heuristics are supposed to take care of it and generate the "best" binary they can. As with any heuristic, they do not always produce optimal code.

Wouldn't a hypothetical explicit "optimization barrier" be a better way?

Not really. First, it is hard to decide where to draw the line. load_global_a_and_b_to_register issues many load instructions. We want them far enough away from the shared memory store, but we also want them spread out fairly evenly. So the best schedule may need to interleave the loads with all the other instructions. Second, it is ver…
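To make the ordering concrete, here is a minimal host-side C++ sketch (not CUTLASS code) of the register-staged pipeline this thread is about. Only the name load_global_a_and_b_to_register comes from the thread; the tile sizes, pipelined_gemm, store_register_to_shared and compute_tile are hypothetical. In source order, the global loads for K-tile kt+1 are issued before the math on tile kt, and the stores to "shared memory" come after it. On a GPU this source order is only a starting point: as the answer says, nvcc is still free to move or spread the individual loads.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical problem and tile sizes, for illustration only.
constexpr int M = 4, N = 4, K = 8, TK = 2;  // TK = depth of one K-tile

using Mat = std::vector<float>;

// A is M x K, B is K x N, both row-major. Emulates the stages
// global -> register fragment -> shared buffer -> compute.
Mat pipelined_gemm(const Mat& A, const Mat& B) {
  assert(K % TK == 0);
  Mat C(M * N, 0.0f);
  std::array<float, M * TK> regA{}, smemA{};  // stand-ins for registers / shared memory
  std::array<float, TK * N> regB{}, smemB{};

  // Copy K-tile kt of A and B from "global" memory into the register fragments.
  auto load_global_a_and_b_to_register = [&](int kt) {
    for (int i = 0; i < M; ++i)
      for (int kk = 0; kk < TK; ++kk)
        regA[i * TK + kk] = A[i * K + kt * TK + kk];
    for (int kk = 0; kk < TK; ++kk)
      for (int j = 0; j < N; ++j)
        regB[kk * N + j] = B[(kt * TK + kk) * N + j];
  };
  // Hypothetical: publish the register fragments to the "shared" buffers.
  auto store_register_to_shared = [&] { smemA = regA; smemB = regB; };
  // Hypothetical: accumulate the tile currently held in the "shared" buffers.
  auto compute_tile = [&] {
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < N; ++j)
        for (int kk = 0; kk < TK; ++kk)
          C[i * N + j] += smemA[i * TK + kk] * smemB[kk * N + j];
  };

  const int num_k_tiles = K / TK;
  load_global_a_and_b_to_register(0);  // prologue: stage tile 0
  store_register_to_shared();
  for (int kt = 0; kt < num_k_tiles; ++kt) {
    if (kt + 1 < num_k_tiles)
      load_global_a_and_b_to_register(kt + 1);  // issue early so latency overlaps compute
    compute_tile();                             // consume tile kt from the shared buffers
    // A GPU kernel with one shared buffer would need __syncthreads() here.
    if (kt + 1 < num_k_tiles)
      store_register_to_shared();               // store late, after the math on tile kt
  }
  return C;
}
```

On a CPU the reordering question does not arise; the sketch only shows where each stage sits in the main loop. A real kernel would also need a barrier after the shared-memory store (or a second shared buffer) so that no thread reads a tile before it has been fully written.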

Replies: 2 comments 3 replies

3 replies
@cloudhan

@hwu36

Answer selected by cloudhan
@cloudhan
0 replies
Category
Q&A
Labels
None yet
2 participants