All notable changes to this project will be documented in this file.
- Ability to build Kokkos and RAJA versions against existing packages.
- Thrust managed memory.
- HIP managed memory.
- New implementation using SYCL2020 USM (sycl2020-acc) and renamed original
sycl2020
tosycl2020-acc
. - New implementation in Fortran
- New implementation in Futhark
- Data initialisation and read-back timing for all models, including Java, Scala, Julia, and Rust
- Add support for the latest Aparapi (3.0.0) and TornadoVM (0.15.x) for Java
- JuliaStream.jl published to registry (pending #113)
- Fix std-data/std-indices compatibility with oneDPL, NVHPC, and AdaptiveCpp (a.k.a. hipSYCL).
- RAJA CUDA CMake build issues resolved.
- Kokkos build updates (CXX version upgraded to C++17).
- Fix CUDA memory limit check.
- Fix CUDA CMake options for
-DMEM
and-DCMAKE_CUDA_FLAGS
. - Use long double for
check_solution
in case of large problem size. - OneAPI DPCPP compiler is deprecated in favour of ICPX, so added new build option to SYCL 2020 version.
- Updates to the HIP kernels and API usage.
- Number of thread-blocks in CUDA dot kernel implementation changed to 1024.
- Fix compatibility of
sycl2020
(nowsycl2020-acc
) with AdaptiveCpp. - Bumped Julia compat to 1.9
- Bumped Scala to 3.3.1
- Bumped Rust to 1.74.0-nightly (13e6f24b9 2023-09-23)
- Upgrade CI to Ubuntu 22.04
- New implementation using the C++ parallel STL (C++17).
- New implementation using C++20 range factories and
for_each_n
. - Compiler options for OpenMP and OpenACC GNU offloading to NVIDIA and AMD.
- Compiler options for Arm Clang added to OpenMP and Kokkos.
- Kokkos 3 build system (No code changes made).
- SYCL build rules for ComputeCpp, DPCPP and HipSYCL.
- Support for CUDA Managed Memory and Page Fault memory.
- Added nstream kernel from PRK with associate command line option.
- CMake build system added for all models.
- SYCL device check for FP64 support.
- New implementations: TBB, Thrust, Julia, Scala, Java.
- Compiler options for Fujitsu added to OpenMP.
- Default branch renamed from
master
tomain
. - Array size is now a signed integer, which follows best practice.
- Driver now delays allocating large checking vectors until after computation has finished.
- Use cl::sycl::id parameters instead of cl::sycl::item.
- Update local copy of OpenCL C++ header file.
- Ensure correct SYCL queue constructor with explicit async_handler.
- Use built in SYCL runtime device discovery.
- Cray compiler OpenMP flags updated.
- Clang compiler OpenMP flags corrected for NVIDIA target.
- Reorder OpenCL objects in class so destructors are called in safe order.
- Ensure all OpenCL kernels are present in destructor.
- Unified run function in driver code to reduce code duplication, output should be uneffected.
- Normalise sum result by expected value to help false negative errors.
- HC version deprecated and moved to a legacy directory.
- Update RAJA to v0.13.0 (w/ code changes as this is a source incompatible update).
- Update SYCL version to SYCL 2020.
- Pre-building of kernels in SYCL version to ensure compatibility with SYCL 1.2.1. Pre-building kernels is also not required, and shows no overhead as the first iteration is not timed.
- OpenACC Cray compiler flags.
- Build support for Kokkos 2.x (No code changes made).
- All Makefiles; build system will now use CMake exclusively.
- OpenACC flags to build for Power 9, Volta, Skylake and KNL.
- Kokkos list CLI argument shows some information about which device will be used.
- OpenMP GNU compiler now uses native target flag.
- Support CSV output for Triad only running mode.
- NEC and PGI compiler option for OpenMP version.
- Option to calculate memory bandwidth in base 2 (MiB/s) rather than base 10 (MB/s).
- Update SYCL implementation to SYCL 1.2.1 interface.
- Output formatting of Kokkos implementation.
- Capitalisation of Kokkos filenames.
- Updated HIP implementation to new interface.
- Use parallel loop instead of kernels for OpenACC.
- OpenMP build for XL compiler uses
-qarch=auto
.
- Superfluous OpenMP 4.5 map(to:) clauses on kernel target regions.
- Kokkos namespace not used by default so the API is easier to spot.
- Manual specification of Kokkos layout (DEVICE) as the Kokkos library sets this by default.
- Kokkos now compiles and links separately to fix complication with Kokkos 2.05.00.
- Kokkos can now instantiate single and double precision.
- OpenMP 4.5 map and reduction clause order to ensure reduction result copied back.
- Potential race condition in SYCL code between unloading OpenCL library and device list deconstructor.
- Add runtime option to run just the Triad kernel.
- Add runtime option for CSV output of results.
- ROCm HC implementation added for AMD GPUs.
- Renamed project to BabelStream (from GPU-STREAM).
- Update SYCL Makefile to use ComputeCpp path variables.
- SYCL exceptions are now fatal, and are propagated to a runtime exception.
- Build instructions for RAJA and Kokkos libraries.
- Use RAJA and Kokkos internal iterator types instead of int.
- Ensure RAJA pointers do not alias.
- Align memory to 2MB pages in RAJA and OpenMP.
- Updated Intel compiler flags for OpenMP, Kokkos and RAJA to ensure streaming stores.
- CUDA Makefile now uses variables to set compiler and flags.
- Use static shared memory for dot kernel in CUDA and HIP.
- Fix initialisation of b array bug in Kokkos implementation.
- Dot kernel HIP implementation.
- Build system overhauled from CMake to a series of Makefiles.
- Android build instructions.
- New Dot kernel added to the 4 standard kernels.
- All model implementations now initialise and allocate their own arrays rather than copying from a master copy. This allows for better performance on NUMA architectures.
- Version string definition moved from header to main file.
- Combined OpenMP 3 and 4.5 implementations.
- OpenMP 4.5 target implementation uses alloc instead of to.
- Made SYCL indexing consistent.
- Update SYCL CMake build to use ComputeCpp CE 0.1.1.
- OpenMP deconstructor now only frees GPU memory only on GPU build.
- SYCL template specializations for float and double.
- New HIP version added.
- Results for v2.0 added.
- Output of OpenCL kernel build log on failure.
- Use globally defined scalar value.
- Change scalar value to stop overflow.
- Restructure results directory.
- Change SYCL default work-group size.
- CMake defaults to Release build.
- CUDA device name output string corrected.
- Out of tree builds.
- Implementations in OpenMP 4.5 OpenACC, RAJA, Kokkos and SYCL.
- Copyright headers to source files.
- Runtime option variables are printed out.
- Device selection added to OpenCL and CUDA.
- Major refactor to include multiple programming models. The change now uses C++ for driver code, with different models plugged in as classes which implement the STREAM kernels.
- Starting values in the arrays to reduce floating point errors with high iteration counts.
- Default array size now 2^25.
- Default to 100 iterations instead of 10.
- CUDA thread block size set via define rather than hardcoded value.
- Require CUDA 7 for C++11 support.
- OpenCL C++ header updated.
- Various CMake build fixes.
- Require at least 2 iterations.
- Warning message for single precision iterations.
- HIP implementation and results.
- Titan X and Fury X results.
- Output of array sizes and other information at runtime.
- Ability to set CUDA block sizes on command line.
- Android build instructions.
- Update OpenCL C++ header.
- Requires CUDA 6.5 or above.
- OpenCL uses Kernel Functor APIs instead of make_kernel API.
- Unsigned integer warnings.
Initial public release of OpenCL and CUDA GPU-STREAM.