generate and share OpenCL local memory #1406
Replies: 2 comments
-
Look at test/correctness/gpu_dynamic_shared.cpp for the part where
-Z-
On Mon, Jul 25, 2016 at 1:49 PM, ronghongbo wrote:
-
Thanks, Zalman. That is useful info. Yes, both the source and the generated code contain shared memory and a barrier. It is a good starting point.
Hongbo
-
Hello,
Here is a skeleton of the OpenCL code that I would like to generate from Halide. The requirement is to allocate a local memory buffer that all threads in the same work group fill together, each thread filling one element. Finally, the work group size must be specified (by __attribute__).
__kernel
__attribute__((reqd_work_group_size(SIZE1, SIZE2, SIZE3)))
void f(__global float *A)
{
    __local float A1[...];
    for (...)
    {
        // copy one element of A to A1
        barrier(CLK_LOCAL_MEM_FENCE); // wait for all elements of the block to be copied
        // do work on A1
        // copy the result back to A
    }
}
Any idea how to do so?
Thanks,
Hongbo
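For checking generated code against the skeleton above, the access pattern can be emulated on the CPU. The following is only a minimal sketch under stated assumptions, not Halide output and not a Halide schedule: the function name reverse_tiles_emulated, the one-dimensional work group, and the per-tile reversal standing in for "do work on A1" are all hypothetical choices made for illustration, and the outer for (...) loop of the skeleton is reduced to a single iteration. The reversal is chosen because each work-item reads an element written by a different work-item, which is the case where the barrier is actually required.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of one kernel launch over A.
// group_size plays the role of reqd_work_group_size(group_size, 1, 1).
// The per-tile reversal is a hypothetical stand-in for "do work on A1".
void reverse_tiles_emulated(std::vector<float> &A, std::size_t group_size) {
    assert(group_size > 0 && A.size() % group_size == 0);

    // Stands in for the __local buffer shared by one work group.
    std::vector<float> A1(group_size);

    for (std::size_t group = 0; group < A.size() / group_size; ++group) {
        const std::size_t base = group * group_size;

        // Phase 1: every work-item copies exactly one element of A into A1.
        for (std::size_t lid = 0; lid < group_size; ++lid) {
            A1[lid] = A[base + lid];
        }

        // Finishing the loop above before starting the next one models
        // barrier(CLK_LOCAL_MEM_FENCE): all of A1 is filled before any read.

        // Phase 2: every work-item reads an element that another work-item
        // wrote, then copies its result back to A.
        for (std::size_t lid = 0; lid < group_size; ++lid) {
            A[base + lid] = A1[group_size - 1 - lid];
        }
    }
}
```

Calling reverse_tiles_emulated on the eight values 0 through 7 with a group size of 4 leaves 3, 2, 1, 0, 7, 6, 5, 4 in A. Such a reference result can be compared element by element with the output of a GPU kernel that stages the same tile through local memory.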