Questions about CUDA support #475

Open
MaxwellF1 opened this issue Sep 18, 2024 · 1 comment
Comments

MaxwellF1 commented Sep 18, 2024

Hi, great work! I have some questions about the CUDA support. I want to use TiledArray for tensor contraction on GPU platforms. Does the current implementation perform the whole tensor contraction on the GPU? In the source code I only saw calls to the cuTT transpose and some other auxiliary kernels, but I did not find any calls to cuBLAS in the implementation of the "*" operator, even though cuBLAS seems to be explicitly listed as a library dependency?


evaleev (Member) commented Sep 18, 2024

@MaxwellF1 calls to {cu,roc}BLAS do not occur directly; instead we use the awesome blaspp API, which provides the proper abstractions for using BLAS on host and device. Calls to device-specific blaspp functions can be found in https://github.com/ValeevGroup/tiledarray/blob/master/src/TiledArray/device/btas.h (note the extra "queue", aka stream, argument). Some operations are implemented directly (search for thrust, used to implement reductions, etc.).
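For illustration, here is a rough, self-contained sketch (not TiledArray's internal code) of a device GEMM going through the blaspp API described above; the trailing `blas::Queue` is the extra "queue"/stream argument. The use of CUDA Unified Memory and the exact `blas::Queue` constructor arguments are assumptions that may vary with your CUDA and blaspp versions.

```cpp
// Sketch: device GEMM via blaspp, which dispatches to cuBLAS/rocBLAS underneath.
#include <blas.hh>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  const int64_t n = 512;
  double *A, *B, *C;
  // Unified Memory: accessible from both host and device (assumption for brevity).
  cudaMallocManaged(&A, n * n * sizeof(double));
  cudaMallocManaged(&B, n * n * sizeof(double));
  cudaMallocManaged(&C, n * n * sizeof(double));
  for (int64_t i = 0; i < n * n; ++i) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

  // A blas::Queue wraps the device stream ("queue") that the device overloads
  // of blaspp functions take as their extra, trailing argument.
  // Note: some blaspp versions also accept a batch-size constructor argument.
  blas::Queue queue(0);

  // C = alpha * A * B + beta * C, executed on the device.
  blas::gemm(blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans,
             n, n, n, 1.0, A, n, B, n, 0.0, C, n, queue);
  queue.sync();

  std::printf("C[0] = %f\n", C[0]);  // expect 2*n = 1024
  cudaFree(A); cudaFree(B); cudaFree(C);
  return 0;
}
```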

Currently, to dispatch to CUDA/ROCm/HIP-capable devices you need to construct DistArrays that live in memory spaces accessible to them. The recommended space is Unified Memory (which is automatically paged in/out of the device by the device driver); this way you can deal with arrays that do not fit into GPU memory. Example use can be found here: https://github.com/ValeevGroup/tiledarray/blob/master/examples/device/ta_dense_device.cpp
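For orientation, a minimal sketch of a contraction written with TiledArray's expression layer, using a host-resident `TA::TArrayD`. Per the comment above, the device path uses the same "*" expression; the difference is constructing a DistArray whose tiles live in Unified Memory, for which the exact tile/array types are defined in the linked ta_dense_device.cpp.

```cpp
// Sketch: a dense contraction via TiledArray's expression layer (host arrays).
#include <tiledarray.h>

int main(int argc, char** argv) {
  TA::World& world = TA::initialize(argc, argv);

  // Two tiles per dimension over a 200x200 index range.
  TA::TiledRange trange{{0, 100, 200}, {0, 100, 200}};

  TA::TArrayD A(world, trange);
  TA::TArrayD B(world, trange);
  TA::TArrayD C;
  A.fill(1.0);
  B.fill(2.0);

  // The "*" operator: tensor contraction over the shared index k.
  C("i,j") = A("i,k") * B("k,j");
  world.gop.fence();

  TA::finalize();
  return 0;
}
```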
