[spec] Caching
This page discusses the caching of plan objects and of generated code in the OpenCL backend.
The main characteristics of a plan object are:
- the partitioning of the iteration space, i.e., the number of partitions / partition size.
- an execution scheme for the staging of data in local memory, i.e., the ind and loc maps.
- an execution scheme for the par_loop, i.e., the blockmap, and block and thread colouring.
These depend on the following characteristics of ParLoopCall objects:
1. Iteration space size: affects the number of partitions.
2. Global reduction arguments' datatype size: affects the partition size because of the storage required for the on-device part of the reduction.
3. Global read and direct Dat arguments' datatype size: affects the partition size because of the extra kernel stub arguments that are passed through local memory.
4. Indirect Dat arguments' datatype size and dim: affect the partition size because of the local memory space required for staging.
5. Indirect Dat arguments' mapping values and indices: affect the content of the ind and loc maps.
6. Indirect reduction Dat arguments' mapping values: affect the colouring scheme.
7. The order of appearance of the first occurrence of a (Dat, Map) pair for indirect Dat arguments: affects the content of the ind map.
8. The order of indirect Dat arguments' indices for a given (Dat, Map) pair: affects the content of the loc maps.
In order to cache plan objects in the OpenCL backend, the indirect Dat arguments of a ParLoopCall are sorted by (Dat, Map) (7. and 8.), and the ParLoopCall provides a canonical representation of its arguments with respect to plan caching (method _plan_key). That representation includes:
- The size of the iteration space (1.).
- The partition size (2., 3., and 4.).
- For (5.), for each mapping, a tuple of the md5 digest of the map values and of the list of indices appearing in the args.
- For (6.), for each Dat that is indirectly reduced, the list of tuples of the md5 digest of the map values and of the indices through which the Dat is reduced.
This canonical representation is laid out as follows:
(iteration_space_size, partition_size, [(map.md5digest, [idx])], [[(map.md5digest, [idx])]])
- Open question: do we actually need to digest the entire mapping values? Can we digest only the elements that are actually indexed?
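The plan key described above can be sketched as follows. This is only an illustration under stated assumptions, not the backend's _plan_key method: the Map and Arg tuples, the md5_digest helper and the plan_key function are hypothetical stand-ins, the partition size is taken as given rather than derived from (2., 3., and 4.), and tuples replace lists so that the key can be used directly as a dictionary key.

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-ins for the backend's Map and argument objects.
Map = namedtuple("Map", ["name", "values"])   # values: flat tuple of ints
Arg = namedtuple("Arg", ["dat", "map", "idx", "is_reduction"])


def md5_digest(values):
    """Hex md5 digest of a sequence of integer map values."""
    return hashlib.md5(",".join(map(str, values)).encode("ascii")).hexdigest()


def _group(args):
    """Return a sorted tuple of (map digest, sorted indices), one per map."""
    by_map = {}
    for a in args:
        entry = by_map.setdefault(a.map.name, (md5_digest(a.map.values), []))
        entry[1].append(a.idx)
    return tuple(sorted((d, tuple(sorted(i))) for d, i in by_map.values()))


def plan_key(iteration_space_size, partition_size, indirect_args):
    """Build (size, partition_size, per-map part, per-reduced-Dat part)."""
    # Collect the arguments through which each Dat is indirectly reduced.
    reduced = {}
    for a in indirect_args:
        if a.is_reduction:
            reduced.setdefault(a.dat, []).append(a)
    return (iteration_space_size,
            partition_size,
            _group(indirect_args),
            tuple(_group(v) for _, v in sorted(reduced.items())))


# Usage: the key does not depend on the order in which arguments are given.
edge2node = Map("edge2node", (0, 1, 1, 2, 2, 3))
args = [Arg("x", edge2node, 0, False),
        Arg("x", edge2node, 1, False),
        Arg("res", edge2node, 0, True)]
plan_cache = {plan_key(3, 2, args): "plan object placeholder"}
assert plan_key(3, 2, list(reversed(args))) in plan_cache
```

Because every grouping is sorted before it enters the key, two calls that differ only in argument order map to the same cache entry, which is the purpose of sorting by (Dat, Map) in the text above.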
The generated code for the execution of a par_loop depends on the following characteristics of the ParLoopCall:
1. The user kernel, as it is directly inlined in the generated code.
2. The name of the user kernel function: affects the call statement inside the kernel stub.
3. The type of par_loop: direct or indirect.
4. The Const objects of the user program: they are arguments of the kernel stub and of the user kernel.
5. The dimension of Const objects (scalar or not): scalar Consts are passed by value to the user kernel, while non-scalar ones are passed as pointers.
6. Argument type (Dat, Global, Mat): each type has specific code generation logic.
7. Argument data type: affects the type declarations for arguments, local variables, etc.
8. Staged Dat argument dimension: affects the staging code.
9. Global reduction argument dimension: affects the work-group-wide reduction code.
10. Staged Dat argument access mode: R and RW need staging-in code; RW and W need staging-out code; INC needs a coloured execution scheme.
11. The unique Dat-Map pairs: affect the number of ind maps.
12. The indirect Dat arguments: affect which ind and loc maps are used.
13. Vec map argument dimension: affects the code populating the local vec map array.
14. The order of the arguments: affects the user kernel call statement.
15. The iteration space extents.
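As a small illustration of how the access mode of a staged Dat drives code generation, the rule stated in the list above can be written as a lookup. The function name staging_requirements and the use of short strings for access modes are assumptions made for this sketch; it encodes only what the list says and nothing about how the backend actually represents access modes.

```python
def staging_requirements(access):
    """Return (stage_in, stage_out, coloured) for a staged Dat access mode.

    Hypothetical helper: access is one of the strings "R", "W", "RW", "INC".
    """
    stage_in = access in ("R", "RW")     # data must be copied into local memory
    stage_out = access in ("RW", "W")    # data must be copied back out
    coloured = access == "INC"           # increments need a coloured scheme
    return stage_in, stage_out, coloured


for mode in ("R", "W", "RW", "INC"):
    print(mode, staging_requirements(mode))
```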
In order to cache generated code in the OpenCL backend, the canonical representation of a ParLoopCall with respect to generated code caching comprises:
- An md5 digest of the user_kernel code (before instrumentation) and of the user kernel name (1. and 2.).
- The iteration space's extents when present (15.), or an empty tuple otherwise.
- For each Const: a tuple (name, is_scalar) (4. and 5.).
- For each argument:
  - a tuple (type, dimension, access) (3., 6., 7., 8., 9., and 10.)
  - an indirect description value: (-1) for global and direct dats, the position of the Map in the order of appearance of Maps in the arguments (3., 11., 12., and 14.), or the negative value of the map dimension for vector arguments (13.).
This canonical representation is laid out as follows:
(user_kernel_code_and_name_digest, extents, [(argtype, argdim, argacc, indvalue)], [(const_name, const_is_scalar)])
- Note: the key is redundant with respect to (3.).
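A minimal sketch of the generated-code key, following the layout above. The KernelArg and Const tuples and the codegen_key function are hypothetical and exist only for this example; in particular, treating a Const as scalar when its dimension is 1, and folding the argument type and data type into a single string, are assumptions.

```python
import hashlib
from collections import namedtuple

# Hypothetical descriptions of arguments and constants.
# map is None for global and direct Dat arguments.
KernelArg = namedtuple("KernelArg",
                       ["type", "dim", "access", "map", "is_vec", "map_dim"])
Const = namedtuple("Const", ["name", "dim"])


def codegen_key(kernel_code, kernel_name, args, consts, extents=()):
    """Build (digest, extents, [(type, dim, access, indvalue)], [(name, is_scalar)])."""
    digest = hashlib.md5((kernel_code + kernel_name).encode("utf-8")).hexdigest()

    map_position = {}   # map name -> position in order of first appearance
    arg_part = []
    for a in args:
        if a.map is None:
            ind = -1                      # global or direct Dat
        elif a.is_vec:
            ind = -a.map_dim              # vector argument
        else:
            ind = map_position.setdefault(a.map, len(map_position))
        arg_part.append((a.type, a.dim, a.access, ind))

    const_part = tuple((c.name, c.dim == 1) for c in consts)
    return (digest, tuple(extents), tuple(arg_part), const_part)


# Usage: same kernel and argument structure, different maps -> same key.
code = "void k(double *x, double *y) { *y += *x; }"
loop_a = [KernelArg("Dat:double", 1, "R", "edge2node", False, 2),
          KernelArg("Dat:double", 1, "INC", None, False, 0)]
loop_b = [KernelArg("Dat:double", 1, "R", "cell2node", False, 3),
          KernelArg("Dat:double", 1, "INC", None, False, 0)]
consts = [Const("alpha", 1), Const("coeffs", 4)]
assert codegen_key(code, "k", loop_a, consts) == codegen_key(code, "k", loop_b, consts)
```

The key records only the position of each Map among the arguments, not its values, so generated code can be shared between par_loops that differ only in the data they operate on.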