Table comparing syntax for different compute APIs

| Term | CUDA | HIP | OpenCL |
|---|---|---|---|
| Device | `int deviceId` | `int deviceId` | `cl_device_id` |
| Queue | `cudaStream_t` | `hipStream_t` | `cl_command_queue` |
| Event | `cudaEvent_t` | `hipEvent_t` | `cl_event` |
| Memory | `void *` | `void *` | `cl_mem` |
| | | | |
| | grid | grid | NDRange |
| | block | block | work-group |
| | thread | thread | work-item |
| | warp | warp | sub-group |
| | | | |
| Thread-index | `threadIdx.x` | `threadIdx.x` | `get_local_id(0)` |
| Block-index | `blockIdx.x` | `blockIdx.x` | `get_group_id(0)` |
| Block-dim | `blockDim.x` | `blockDim.x` | `get_local_size(0)` |
| Grid-dim | `gridDim.x` | `gridDim.x` | `get_num_groups(0)` |
| | | | |
| Device Kernel | `__global__` | `__global__` | `__kernel` |
| Device Function | `__device__` | `__device__` | Implied in device compilation |
| Host Function | `__host__` (default) | `__host__` (default) | Implied in host compilation |
| Host + Device Function | `__host__ __device__` | `__host__ __device__` | No equivalent |
| Kernel Launch | `<<< >>>` | `hipLaunchKernel` / `hipLaunchKernelGGL` / `<<< >>>` | `clEnqueueNDRangeKernel` |
| | | | |
| Global Memory | `__device__` (file-scope variable) | `__device__` (file-scope variable) | `__global` |
| Group Memory | `__shared__` | `__shared__` | `__local` |
| Constant | `__constant__` | `__constant__` | `__constant` |
| | | | |
| | `__syncthreads` | `__syncthreads` | `barrier(CLK_LOCAL_MEM_FENCE)` |
| Atomic Builtins | `atomicAdd` | `atomicAdd` | `atomic_add` |
| Precise Math | `cos(f)` | `cos(f)` | `cos(f)` |
| Fast Math | `__cosf(f)` | `__cosf(f)` | `native_cos(f)` |
| Vector | `float4` | `float4` | `float4` |

Notes

The indexing rows (starting with Thread-index) show the terminology for a 1D grid. Some APIs use the reverse order of xyz / 012 indexing for 3D grids.
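
As a rough illustration of how the four 1D indexing terms combine, the following host-side C++ sketch emulates a launch and computes each thread's (work-item's) global element index. It is not device code: `Index1D`, `global_index`, and `emulate_launch` are hypothetical names introduced here, and the fields merely stand in for the built-ins named in the comments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins for the built-in 1D index terms.
struct Index1D {
    std::size_t threadIdx;  // CUDA/HIP threadIdx.x, OpenCL get_local_id(0)
    std::size_t blockIdx;   // CUDA/HIP blockIdx.x,  OpenCL get_group_id(0)
    std::size_t blockDim;   // CUDA/HIP blockDim.x,  OpenCL get_local_size(0)
    std::size_t gridDim;    // CUDA/HIP gridDim.x,   OpenCL get_num_groups(0)
};

// Global element index of one thread / work-item in a 1D launch.
// In OpenCL this matches get_global_id(0) when no global offset is used.
inline std::size_t global_index(const Index1D& i) {
    return i.blockIdx * i.blockDim + i.threadIdx;
}

// Emulate a launch of gridDim blocks with blockDim threads each and
// return the global indices in the order the loops visit them.
std::vector<std::size_t> emulate_launch(std::size_t gridDim, std::size_t blockDim) {
    std::vector<std::size_t> visited;
    visited.reserve(gridDim * blockDim);
    for (std::size_t b = 0; b < gridDim; ++b) {
        for (std::size_t t = 0; t < blockDim; ++t) {
            visited.push_back(global_index(Index1D{t, b, blockDim, gridDim}));
        }
    }
    return visited;
}
```

Under these assumptions, `emulate_launch(2, 3)` visits the indices 0 through 5 exactly once each, which is the property a real 1D kernel relies on when it maps one thread to one array element. On a device the blocks and threads run concurrently rather than in loop order.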