Releases · ddemidov/vexcl · GitHub

09 Nov 12:01

ddemidov

1.4.3 Latest


C++ OpenCL wrappers are now included via CL/opencl.hpp (recommended by Khronos) or CL/cl2.hpp (deprecated).
Minor fixes


27 Apr 05:33

ddemidov

1.4.2

Two years' worth of minor fixes and improvements.
Added source_generator::num_groups() returning the number of
workgroups on the compute device.
push_compile_options and push_program_header now behave cumulatively.
Added profiler::reset().
Added vector::at().
Support mixed precision in vex::copy().


04 May 14:45

ddemidov

1.4.1

A bug fix release.

Improvements for cmake scripts.
Bug fixes.


19 Apr 18:33

ddemidov

1.4.0

Modernize cmake build system.
Provide VexCL::OpenCL, VexCL::Compute, VexCL::CUDA, VexCL::JIT
imported targets, so that users may just write
```
add_executable(myprogram myprogram.cpp)
target_link_libraries(myprogram VexCL::OpenCL)
```
to build a program using the corresponding VexCL backend.
Also stop polluting global cmake namespace with things like
add_definitions(), include_directories(), etc.
See http://vexcl.readthedocs.io/en/latest/cmake.html.
Make vex::backend::kernel::config() return a reference to the kernel, so
that it is possible to configure and launch the kernel in a single line:
K.config(nblocks, nthreads)(queue, prm1, prm2, prm3);
Implement vector<T>::reinterpret<U>() method. It returns a new vector that
reinterprets the same data (no copies are made) as the new type.
Implemented new backend: JIT. The backend generates and compiles at runtime
C++ kernels with OpenMP support. The generated code will not be more efficient
than hand-written OpenMP code, but it is easy to debug with a host-side
debugger. The backend may also be used to develop and test new code
when other backends are not available.
Allow VEX_CONSTANTS to be cast to their values in host code, so that a
constant defined with VEX_CONSTANT(name, expr) can be used in host code
as name. Constants are still usable in vector expressions as name().
Allow passing generated kernel args for each GPU (#202).
Kernel args packed into std::vector will be unpacked and passed
to the generated kernels on respective devices.
Reimplemented vex::SpMat as vex::sparse::ell, vex::sparse::crs,
vex::sparse::matrix (automatically chooses one of the two formats based on
the current compute device), and vex::sparse::distributed<format> (this one
may span several compute devices). The new matrix-vector products are now
normal vector expressions, while the old vex::SpMat could only be used in
additive expressions. The old implementation is still available.
vex::sparse::ell is now converted from host-side CRS format on compute
device, which makes the conversion faster.
Bug fixes and minor improvements.
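
To make the CRS and ELL formats mentioned above more concrete, here is a minimal host-side sketch of a CRS to ELL conversion. It is an illustration only: the Crs and Ell structs, the crs_to_ell function, the column-major layout, and the -1 padding marker are all assumptions made for this example, not VexCL types or the layout VexCL uses internally (VexCL performs the conversion on the compute device).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side CRS matrix (hypothetical helper type, not a VexCL class).
struct Crs {
    std::size_t rows;
    std::vector<std::size_t> ptr; // rows + 1 offsets into col/val
    std::vector<int>         col; // column index of each nonzero
    std::vector<double>      val; // value of each nonzero
};

// Plain ELL matrix: every row is padded to `width` entries.
// Entry k of row i lives at index k * rows + i (column-major);
// padding slots carry column index -1 and value 0 (an assumed convention).
struct Ell {
    std::size_t rows, width;
    std::vector<int>    col;
    std::vector<double> val;
};

Ell crs_to_ell(const Crs &A) {
    assert(A.ptr.size() == A.rows + 1);

    // The ELL width is the length of the longest row.
    std::size_t width = 0;
    for (std::size_t i = 0; i < A.rows; ++i)
        width = std::max(width, A.ptr[i + 1] - A.ptr[i]);

    // Start with every slot marked as padding.
    Ell E{A.rows, width,
          std::vector<int>(A.rows * width, -1),
          std::vector<double>(A.rows * width, 0.0)};

    // Copy the k-th nonzero of row i into slot (i, k).
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t j = A.ptr[i], k = 0; j < A.ptr[i + 1]; ++j, ++k) {
            E.col[k * A.rows + i] = A.col[j];
            E.val[k * A.rows + i] = A.val[j];
        }
    }
    return E;
}
```

Each row of the sketch is independent of the others, which is the property that lets a conversion of this kind run in parallel on a compute device.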


06 Apr 06:48

ddemidov

1.3.3

Added vex::tensordot() operation. Given two tensors (arrays of dimension
greater than or equal to one), A and B, and a list of axes pairs (where each
pair represents corresponding axes from the two tensors), it sums the products
of A's and B's elements over the given axes. Inspired by Python's
numpy.tensordot operation.
Expose constant memory space in OpenCL backend.
Provide shortcut filters vex::Filter::{CPU,GPU,Accelerator} for OpenCL backend.
Added Boost.Compute backend. Core functionality of the Boost.Compute library is used as a replacement for the Khronos C++ API, which seems to be growing more and more outdated. The Boost.Compute backend is still based on OpenCL, so there are two OpenCL backends now. Define VEXCL_BACKEND_COMPUTE to use this backend and make sure the Boost.Compute headers are in the include path.
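
The tensordot operation described in this release can be illustrated with a small host-side sketch restricted to two-dimensional tensors and a single axes pair. The Matrix struct and tensordot2d function are hypothetical names introduced for this example; this is not the vex::tensordot() interface, only a plain C++ model of what the contraction computes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix (hypothetical helper type, not part of VexCL).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Contract axis `ax_a` of A with axis `ax_b` of B (each is 0 or 1).
// The result is indexed by the free axis of A, then the free axis of B.
Matrix tensordot2d(const Matrix &A, const Matrix &B, int ax_a, int ax_b) {
    assert((ax_a == 0 || ax_a == 1) && (ax_b == 0 || ax_b == 1));

    const std::size_t len_a  = ax_a == 0 ? A.rows : A.cols;
    const std::size_t len_b  = ax_b == 0 ? B.rows : B.cols;
    assert(len_a == len_b); // contracted axes must have equal length

    const std::size_t free_a = ax_a == 0 ? A.cols : A.rows;
    const std::size_t free_b = ax_b == 0 ? B.cols : B.rows;

    Matrix C{free_a, free_b, std::vector<double>(free_a * free_b, 0.0)};
    for (std::size_t i = 0; i < free_a; ++i) {
        for (std::size_t j = 0; j < free_b; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < len_a; ++k) {
                // Pick the element whose contracted index is k.
                const double a = ax_a == 0 ? A.at(k, i) : A.at(i, k);
                const double b = ax_b == 0 ? B.at(k, j) : B.at(j, k);
                sum += a * b;
            }
            C.data[i * free_b + j] = sum;
        }
    }
    return C;
}
```

With the axes pair (1, 0) the sketch reduces to an ordinary matrix product, and with (0, 0) it computes the product of A transposed with B.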


04 Sep 06:22

ddemidov

1.3.2

Improved thread safety
Implemented any_of and all_of primitives
Minor bugfixes and improvements


14 May 17:53

ddemidov

1.3.1

Adopted scan_by_key algorithm from HSA-Libraries/Bolt.
Minor improvements and bug fixes.
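
For readers unfamiliar with scan by key, the following host-side sketch shows what an inclusive scan by key computes: a running sum that restarts whenever the key changes. The function name and signature are assumptions made for this example; it is a sequential reference model, not the parallel algorithm adopted from Bolt and not VexCL's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference model of an inclusive scan by key (hypothetical
// helper, not a VexCL function). Consecutive equal keys form a segment,
// and the running sum restarts at the beginning of every segment.
template <class Key, class Val>
std::vector<Val> inclusive_scan_by_key(const std::vector<Key> &keys,
                                       const std::vector<Val> &vals) {
    assert(keys.size() == vals.size());

    std::vector<Val> out(vals.size());
    for (std::size_t i = 0; i < vals.size(); ++i) {
        const bool new_segment = (i == 0) || !(keys[i] == keys[i - 1]);
        out[i] = new_segment ? vals[i] : out[i - 1] + vals[i];
    }
    return out;
}
```

For keys {1, 1, 1, 2, 2, 3} and values {1, 2, 3, 4, 5, 6}, the sketch yields {1, 3, 6, 4, 9, 6}.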


14 Apr 11:55

ddemidov

1.3.0

API breaking change: vex::purge_kernel_caches() family of functions is
renamed to vex::purge_caches() as the online cache now may hold objects of
arbitrary type. The overloads that used to take
vex::backend::kernel_cache_key now take const vex::backend::command_queue&.
The online cache is now purged whenever vex::Context is destroyed. This
allows for clean release of OpenCL/CUDA contexts.
Code for random number generators has been unified between OpenCL and CUDA
backends.
Fast Fourier Transform is now supported both for OpenCL and CUDA backends.
The vex::backend::kernel constructor now takes an optional parameter with
command line options.
Performance of CLOGS algorithms has been improved.
VEX_BUILTIN_FUNCTION macro has been made public.
Minor bug fixes and improvements.


02 Apr 10:19

ddemidov

1.2.0

API breaking change: the definition of VEX_FUNCTION family of macros has changed. The previous versions are available as VEX_FUNCTION_V1.
Wrapper code for the clogs library was added by @bmerry
(the author of clogs).
vector/multivector iterators are now standard-conforming iterators.
Other minor improvements and bug fixes.


24 Dec 12:35

ddemidov

1.1.2

reduce_by_key() may take several tied keys (see e09d249).
It is possible to reduce OpenCL vector types (cl_float2, cl_double4, etc).
VEXCL_SHOW_KERNELS may be an environment variable as well as a preprocessor macro. This allows controlling kernel source output without recompiling the program.
Added compute capability filter for the CUDA backend (vex::Filter::CC(major, minor)).
Fixed compilation errors and warnings generated by Visual Studio.
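
The idea of reducing by several tied keys can be sketched on the host as follows: consecutive elements belong to the same segment only when all of their keys match. The Reduced struct and reduce_by_two_keys function are hypothetical names for this illustration and do not reflect the reduce_by_key() signature in VexCL.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Output of the sketch: one (key1, key2, sum) triple per segment.
// Hypothetical helper type, not part of VexCL.
struct Reduced {
    std::vector<int>    key1, key2;
    std::vector<double> sum;
};

// Sequential reference model of a reduction by two tied keys. A new
// segment starts whenever the pair (k1[i], k2[i]) differs from the
// previous pair; values inside a segment are summed.
Reduced reduce_by_two_keys(const std::vector<int> &k1,
                           const std::vector<int> &k2,
                           const std::vector<double> &v) {
    assert(k1.size() == k2.size() && k1.size() == v.size());

    Reduced r;
    for (std::size_t i = 0; i < v.size(); ++i) {
        const bool same_segment =
            i > 0 && std::tie(k1[i], k2[i]) == std::tie(k1[i - 1], k2[i - 1]);
        if (same_segment) {
            r.sum.back() += v[i];
        } else {
            r.key1.push_back(k1[i]);
            r.key2.push_back(k2[i]);
            r.sum.push_back(v[i]);
        }
    }
    return r;
}
```

Like a single-key reduction by key, the sketch only merges adjacent elements, so the input is expected to be sorted (or at least grouped) by the key pair beforehand.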
