Does decoupled look-back require blocks to be launched in order when the SMs run out of resources? #1062
-
Hi, I have a question about the decoupled look-back algorithm in the onesweep radix sort: https://github.com/NVIDIA/cub/blob/main/cub/agent/agent_radix_sort_onesweep.cuh. Does the look-back algorithm require blocks to be launched in order when there are not enough SM resources for all of them?

For example, an A100 has 108 SMs. Suppose we launch 217 blocks (108x2+1) of 1024 threads each. The 108 SMs can only hold 216 resident blocks at once, so the one remaining block is launched only after one of the other 216 blocks has finished. For look-back to work in this case, the block with blockid=216 must be launched after blockid 0~215, right? That means we can only look back, never look forward. With look-forward, blocks 0~215 would wait for block 216 to change its status, but block 216 would not launch until one of blocks 0~215 had finished.

Does the driver implementation guarantee that blocks are launched in order when the SMs are out of resources? Is this supported only by a limited set of verified drivers? Do we need to add a CUDA version check to make sure look-back works with the current driver when calling the radix sort API?
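To make the dependency concrete: the CUDA programming model does not promise that blocks start in `blockIdx.x` order, so one common way to avoid relying on launch order is to let each block claim its logical tile index from a global atomic counter when it actually starts running. The sketch below is a hypothetical, simplified chained look-back (each tile waits only on its immediate predecessor), not CUB's decoupled look-back and not code taken from CUB. All names (`chained_prefix`, `ticket_counter`, `ready`, and so on) are assumptions made for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sketch, not CUB code. Each block takes a "ticket" when it
// starts running and uses that ticket (not blockIdx.x) as its tile index.
// A tile only waits on a lower ticket, and every lower ticket belongs to a
// block that has already started, so the wait cannot depend on a block
// that is still waiting to be scheduled.
__global__ void chained_prefix(const int* tile_aggregate,
                               volatile int* inclusive_prefix,
                               volatile int* ready,
                               unsigned int* ticket_counter)
{
    __shared__ unsigned int tile;
    if (threadIdx.x == 0) {
        tile = atomicAdd(ticket_counter, 1u);  // claim logical tile index
    }
    __syncthreads();

    if (threadIdx.x == 0) {
        int exclusive = 0;
        if (tile > 0) {
            // Look back: spin until the predecessor has published its prefix.
            while (ready[tile - 1] == 0) { }
            __threadfence();
            exclusive = inclusive_prefix[tile - 1];
        }
        inclusive_prefix[tile] = exclusive + tile_aggregate[tile];
        __threadfence();   // make the prefix visible before raising the flag
        ready[tile] = 1;
    }
}

int main()
{
    const int num_tiles = 217;   // the 108x2+1 case from the question
    const int threads   = 1024;

    std::vector<int> h_aggregate(num_tiles, 1);  // assumed per-tile totals

    int *d_aggregate, *d_prefix, *d_ready;
    unsigned int* d_ticket;
    cudaMalloc(&d_aggregate, num_tiles * sizeof(int));
    cudaMalloc(&d_prefix,    num_tiles * sizeof(int));
    cudaMalloc(&d_ready,     num_tiles * sizeof(int));
    cudaMalloc(&d_ticket,    sizeof(unsigned int));

    cudaMemcpy(d_aggregate, h_aggregate.data(), num_tiles * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_prefix, 0, num_tiles * sizeof(int));
    cudaMemset(d_ready,  0, num_tiles * sizeof(int));
    cudaMemset(d_ticket, 0, sizeof(unsigned int));

    chained_prefix<<<num_tiles, threads>>>(d_aggregate, d_prefix, d_ready,
                                           d_ticket);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int last = 0;
    cudaMemcpy(&last, d_prefix + (num_tiles - 1), sizeof(int),
               cudaMemcpyDeviceToHost);
    std::printf("inclusive prefix of last tile: %d\n", last);

    cudaFree(d_aggregate); cudaFree(d_prefix);
    cudaFree(d_ready);     cudaFree(d_ticket);
    return 0;
}
```

With this pattern the tile that runs last is simply whichever block was scheduled last, regardless of its `blockIdx.x`, so the scheme relies only on already-running blocks making progress rather than on any particular launch order.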
Replies: 0 comments 2 replies
-
The question seems to be a duplicate of the following issue.