Releases: ddemidov/vexcl
1.4.3
1.4.2
- Two years worth of minor fixes and improvements.
- Added source_generator::num_groups() returning the number of workgroups on the compute device.
- Make push_compile_options, push_program_header behave in a cumulative way.
- Added profiler::reset().
- Added vector::at().
- Support mixed precision in vex::copy().
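The mixed-precision copy mentioned above can be pictured with a host-only sketch. It assumes that "mixed precision" means element-wise conversion between source and destination value types (the note does not spell this out). The helper convert_copy is hypothetical and is not part of VexCL.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side analogue of a mixed-precision copy (not VexCL code):
// every source element is converted to the destination value type.
template <typename Src, typename Dst>
void convert_copy(const std::vector<Src> &src, std::vector<Dst> &dst) {
    assert(src.size() == dst.size()); // both ranges must have the same length
    std::transform(src.begin(), src.end(), dst.begin(),
                   [](Src v) { return static_cast<Dst>(v); });
}
```

Calling convert_copy(a, b) with a std::vector<double> a and a std::vector<float> b of equal size narrows each value to single precision.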
1.4.1
1.4.0
- Modernize cmake build system. Provide VexCL::OpenCL, VexCL::Compute, VexCL::CUDA, VexCL::JIT imported targets, so that users may just write
      add_executable(myprogram myprogram.cpp)
      target_link_libraries(myprogram VexCL::OpenCL)
  to build a program using the corresponding VexCL backend. Also stop polluting global cmake namespace with things like add_definitions(), include_directories(), etc. See http://vexcl.readthedocs.io/en/latest/cmake.html.
- Make vex::backend::kernel::config() return a reference to the kernel, so that it is possible to configure and launch the kernel in a single line: K.config(nblocks, nthreads)(queue, prm1, prm2, prm3);
- Implement vector<T>::reinterpret<U>() method. It returns a new vector that reinterprets the same data (no copies are made) as the new type.
- Implemented new backend: JIT. The backend generates and compiles at runtime C++ kernels with OpenMP support. The code will not be more efficient than hand-written OpenMP code, but it makes the generated code easy to debug with a host-side debugger. The backend may also be used to develop and test new code when other backends are not available.
- Allow VEX_CONSTANTS to be cast to their values in host code, so that a constant defined with VEX_CONSTANT(name, expr) can be used in host code as name. Constants are still usable in vector expressions as name().
- Allow passing generated kernel args for each GPU (#202). Kernel args packed into std::vector will be unpacked and passed to the generated kernels on respective devices.
- Reimplemented vex::SpMat as vex::sparse::ell, vex::sparse::crs, vex::sparse::matrix (automatically chooses one of the two formats based on the current compute device), and vex::sparse::distributed<format> (this one may span several compute devices). The new matrix-vector products are now normal vector expressions, while the old vex::SpMat could only be used in additive expressions. The old implementation is still available. vex::sparse::ell is now converted from host-side CRS format on the compute device, which makes the conversion faster.
- Bug fixes and minor improvements.
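The single-line configure-and-launch idiom from the config() entry works because config() returns a reference to the object it was called on. A minimal sketch of that pattern follows. MockKernel and its members are hypothetical stand-ins that only record what happened; they are not VexCL's kernel class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend kernel object (not VexCL's class).
struct MockKernel {
    std::size_t blocks = 0, threads = 0;
    std::vector<std::string> log; // one entry per "launch"

    // Returning *this by reference is what makes the chained call possible.
    MockKernel &config(std::size_t nblocks, std::size_t nthreads) {
        assert(nblocks > 0 && nthreads > 0); // reject an empty launch grid
        blocks = nblocks;
        threads = nthreads;
        return *this;
    }

    // "Launch": records the active configuration and the argument count.
    template <class... Args>
    void operator()(const Args &...) {
        log.push_back(std::to_string(blocks) + "x" + std::to_string(threads) +
                      " args=" + std::to_string(sizeof...(Args)));
    }
};
```

With this mock, K.config(4, 256)(1, 2.0, 3.0f); configures and launches in one statement, mirroring the shape of the call shown in the release note.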
1.3.3
- Added vex::tensordot() operation. Given two tensors (arrays of dimension greater than or equal to one), A and B, and a list of axes pairs (where each pair represents corresponding axes from two tensors), sums the products of A's and B's elements over the given axes. Inspired by Python's numpy.tensordot operation.
- Expose constant memory space in OpenCL backend.
- Provide shortcut filters vex::Filter::{CPU,GPU,Accelerator} for OpenCL backend.
- Added Boost.Compute backend. Core functionality of the Boost.Compute library is used as a replacement for the Khronos C++ API, which seems to become more and more outdated. The Boost.Compute backend is still based on OpenCL, so there are two OpenCL backends now. Define VEXCL_BACKEND_COMPUTE to use this backend and make sure Boost.Compute headers are in the include path.
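To make the tensordot description concrete, here is a host-only reference for the simplest case: two 2D tensors and the single axes pair (1, 0), for which the contraction is an ordinary matrix product. The function tensordot_2d and the row-major layout are assumptions made for illustration; this is not VexCL code and says nothing about how vex::tensordot() is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference (not VexCL code): contracts axis 1 of A (n x k)
// with axis 0 of B (k x m). Both tensors are stored row-major.
// C[i][j] = sum over p of A[i][p] * B[p][j]
std::vector<double> tensordot_2d(const std::vector<double> &A,
                                 const std::vector<double> &B,
                                 std::size_t n, std::size_t k, std::size_t m) {
    assert(A.size() == n * k && B.size() == k * m);
    std::vector<double> C(n * m, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t p = 0; p < k; ++p)     // p runs over the contracted axes
            for (std::size_t j = 0; j < m; ++j)
                C[i * m + j] += A[i * k + p] * B[p * m + j];
    return C;
}
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] the result is [[19, 22], [43, 50]].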
1.3.2
1.3.1
- Adopted scan_by_key algorithm from HSA-Libraries/Bolt.
- Minor improvements and bug fixes.
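For readers unfamiliar with scan by key, a sequential host-side reference of the inclusive variant is sketched below: a running sum that restarts whenever the key changes. The function name host_inclusive_scan_by_key is hypothetical, and the sketch only shows what the operation computes, not the parallel algorithm adopted from Bolt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference for an inclusive scan by key (not VexCL code):
// the running sum restarts at the start of each run of equal keys.
std::vector<int> host_inclusive_scan_by_key(const std::vector<int> &keys,
                                            const std::vector<int> &vals) {
    assert(keys.size() == vals.size());
    std::vector<int> out(vals.size());
    for (std::size_t i = 0; i < vals.size(); ++i) {
        bool same_run = (i > 0 && keys[i] == keys[i - 1]);
        out[i] = same_run ? out[i - 1] + vals[i] : vals[i];
    }
    return out;
}
```

With keys {0, 0, 0, 1, 1, 2} and values {1, 2, 3, 4, 5, 6} the result is {1, 3, 6, 4, 9, 6}.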
1.3.0
- API breaking change: vex::purge_kernel_caches() family of functions is renamed to vex::purge_caches() as the online cache now may hold objects of arbitrary type. The overloads that used to take vex::backend::kernel_cache_key now take const vex::backend::command_queue&.
- The online cache is now purged whenever vex::Context is destroyed. This allows for clean release of OpenCL/CUDA contexts.
- Code for random number generators has been unified between OpenCL and CUDA backends.
- Fast Fourier Transform is now supported both for OpenCL and CUDA backends.
- vex::backend::kernel constructor now takes optional parameter with command line options.
- Performance of CLOGS algorithms has been improved.
- VEX_BUILTIN_FUNCTION macro has been made public.
- Minor bug fixes and improvements.
1.2.0
- API breaking change: the definition of VEX_FUNCTION family of macros has changed. The previous versions are available as VEX_FUNCTION_V1.
- Wrapping code for clogs library is added by @bmerry (the author of clogs).
- vector/multivector iterators are now standard-conforming iterators.
- Other minor improvements and bug fixes.
1.1.2
- reduce_by_key() may take several tied keys (see e09d249).
- It is possible to reduce OpenCL vector types (cl_float2, cl_double4, etc).
- VEXCL_SHOW_KERNELS may be an environment variable as well as a preprocessor macro. This makes it possible to control kernel source output without recompiling the program.
- Added compute capability filter for the CUDA backend (vex::Filter::CC(major, minor)).
- Fixed compilation errors and warnings generated by Visual Studio.
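A sequential host-side sketch of reducing by two tied keys may help with the first item above. It assumes that a run of values is delimited by a change in either key, which is one reading of "tied keys" and is not confirmed by the note. The names Reduced and host_reduce_by_key are hypothetical, and the code does not use VexCL.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Result of the reduction: one (key1, key2) pair and one sum per run.
struct Reduced {
    std::vector<std::pair<int, int>> keys;
    std::vector<double> sums;
};

// Sequential reference (not VexCL code): sums consecutive values whose
// (k1, k2) pair is equal; a change in either key starts a new run.
Reduced host_reduce_by_key(const std::vector<int> &k1,
                           const std::vector<int> &k2,
                           const std::vector<double> &v) {
    assert(k1.size() == k2.size() && k1.size() == v.size());
    Reduced r;
    for (std::size_t i = 0; i < v.size(); ++i) {
        std::pair<int, int> key(k1[i], k2[i]);
        if (r.keys.empty() || r.keys.back() != key) {
            r.keys.push_back(key);  // new run begins
            r.sums.push_back(v[i]);
        } else {
            r.sums.back() += v[i];  // same run: accumulate
        }
    }
    return r;
}
```

For k1 = {1, 1, 1, 2, 2}, k2 = {0, 0, 1, 1, 1} and values {1, 2, 3, 4, 5}, the runs are (1,0), (1,1), (2,1) with sums 3, 3 and 9.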