Replies: 1 comment
Hi @rickyyx, Good question!
This is correct.
Unfortunately, not (yet). What is your use case? Does the pipeline example using memcpy_async from the CUDA blog speed up your kernel?
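For reference, here is a minimal sketch of the (non-TMA) pipelined pattern being discussed, in case it helps to try it on your kernel. It follows the general shape of the `cuda::pipeline` staging examples rather than reproducing the blog code. The kernel name `scale_pipelined`, the host helper `run_scale_pipelined`, the constants `kStages` and `kThreads`, and the "double every element" compute step are all hypothetical placeholders, and the sketch assumes the element count is a multiple of `kThreads`. As far as I know, the copy is only hardware-accelerated on compute capability 8.0 and newer.

```cuda
#include <cooperative_groups.h>
#include <cuda/pipeline>
#include <vector>

namespace cg = cooperative_groups;

constexpr int kStages  = 2;    // hypothetical: two tiles in flight (double buffering)
constexpr int kThreads = 128;  // hypothetical block size; one int per thread per tile

// Hypothetical kernel: each block streams tiles of `in` into shared memory with
// memcpy_async and doubles each element while later tiles are still being copied.
// Assumes n is a multiple of kThreads.
__global__ void scale_pipelined(int* out, const int* in, size_t n) {
    auto block = cg::this_thread_block();
    __shared__ int smem[kStages][kThreads];
    __shared__ cuda::pipeline_shared_state<cuda::thread_scope::thread_scope_block,
                                           kStages> state;
    auto pipe = cuda::make_pipeline(block, &state);  // all threads produce and consume

    const size_t num_tiles = n / kThreads;
    // This block handles tiles blockIdx.x, blockIdx.x + gridDim.x, ...
    size_t fetch = blockIdx.x;
    for (size_t compute = blockIdx.x; compute < num_tiles; compute += gridDim.x) {
        // Issue copies until up to kStages tiles are in flight.
        for (; fetch < num_tiles && fetch < compute + kStages * gridDim.x;
             fetch += gridDim.x) {
            const size_t stage = (fetch / gridDim.x) % kStages;
            pipe.producer_acquire();
            cuda::memcpy_async(block, smem[stage], in + fetch * kThreads,
                               sizeof(int) * kThreads, pipe);
            pipe.producer_commit();
        }
        pipe.consumer_wait();  // wait for the oldest committed tile to arrive
        const size_t stage = (compute / gridDim.x) % kStages;
        out[compute * kThreads + threadIdx.x] = 2 * smem[stage][threadIdx.x];
        pipe.consumer_release();  // hand the stage back for the next copy
    }
}

// Hypothetical host helper: runs the kernel on 0..n-1 and checks the result.
bool run_scale_pipelined(size_t n) {
    std::vector<int> h_in(n), h_out(n, 0);
    for (size_t i = 0; i < n; ++i) h_in[i] = static_cast<int>(i);

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    scale_pipelined<<<4, kThreads>>>(d_out, d_in, n);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;

    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    for (size_t i = 0; i < n && ok; ++i) ok = (h_out[i] == 2 * h_in[i]);
    return ok;
}
```

Calling `run_scale_pipelined(1 << 20)` from `main` and comparing the kernel time against a plain synchronous-load version should show whether the overlap helps for your access pattern. With so little compute per element, I would not expect much of a gain from this toy version.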
-
Hey, I was trying to speed up my kernel, which involves a memcpy from global memory plus some computation. Looking at the official documentation for async memcpy, I see there are multiple ways of doing it:
pipelining with memcpy_async (doc)
using the TMA-related memcpy_async (doc)
The pipelined memcpy_async seems to be natively supported: the memcpy_async from cuda/pipeline takes a pipeline instance. However, there seems to be no similar API in the TMA group of APIs.
Is it possible to speed up the TMA async memcpy with a pipeline using the current interfaces and APIs?
Thanks