CLBlast 1.5.0

@CNugteren CNugteren released this 04 Dec 21:10
· 191 commits to master since this release

CLBlast version 1.5.0. Changes since previous release (version 1.4.1):

  • Added support for shuffle instructions for NVIDIA GPUs (thanks to 'tyler-utah')
  • Added an option to compile the Netlib API with a static OpenCL device and context (-DNETLIB_PERSISTENT_OPENCL=ON)
  • Added a FAQ page to the documentation
  • The tuners now check for invalid local thread sizes beforehand and skip them entirely
  • Made the tuning API (OverrideParameters) more flexible, disregarding superfluous parameters
  • Fixed an issue with conjugate transpose not being executed in certain cases for, among others, XOMATCOPY
  • Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel
  • Fixed an issue with the preprocessor and the new GEMMK == 1 kernel
  • Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel
  • Fixed an issue for certain parameters for AXPY's 'XaxpyFaster' kernel
  • Various minor fixes and enhancements
  • Added non-BLAS routines:
    • SCONVGEMM/DCONVGEMM/HCONVGEMM (convolution as im2col followed by batched GEMM)
    • SCOL2IM/DCOL2IM/CCOL2IM/ZCOL2IM/HCOL2IM (col2im transform as used in machine learning)
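
To illustrate what "convolution as im2col followed by GEMM" and "col2im" mean, here is a minimal reference sketch in plain Python. It does not call CLBlast and does not reflect CLBlast's actual memory layout or routine signatures; the function names (`im2col`, `gemm`, `conv_as_gemm`, `col2im`), the single-channel input, stride 1, no padding, and the cross-correlation convention (no kernel flip) are all assumptions made for illustration. A batched version would repeat the same GEMM once per image in the batch.

```python
def im2col(image, kh, kw):
    """Unfold every kh x kw patch of a 2D image into one column.

    Assumes stride 1 and no padding (illustrative choice).
    Result has kh*kw rows and out_h*out_w columns.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = [[0.0] * (out_h * out_w) for _ in range(kh * kw)]
    for ky in range(kh):
        for kx in range(kw):
            row = ky * kw + kx  # one row per kernel element
            for oy in range(out_h):
                for ox in range(out_w):
                    cols[row][oy * out_w + ox] = image[oy + ky][ox + kx]
    return cols


def gemm(a, b):
    """Plain matrix product C = A * B on nested lists."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def conv_as_gemm(image, kernels):
    """Apply several equally sized kernels to one image via im2col + one GEMM."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    cols = im2col(image, kh, kw)
    # Flatten each kernel into one row of the left-hand matrix.
    weights = [[v for row in kernel for v in row] for kernel in kernels]
    return gemm(weights, cols)  # num_kernels x (out_h * out_w)


def col2im(cols, h, w, kh, kw):
    """Scatter-add the columns back into an h x w image.

    This is the adjoint of im2col, not its inverse: pixels covered by
    several patches accumulate one contribution per patch.
    """
    out_h, out_w = h - kh + 1, w - kw + 1
    image = [[0.0] * w for _ in range(h)]
    for ky in range(kh):
        for kx in range(kw):
            row = ky * kw + kx
            for oy in range(out_h):
                for ox in range(out_w):
                    image[oy + ky][ox + kx] += cols[row][oy * out_w + ox]
    return image
```

For a 3x3 image and two 2x2 kernels, `conv_as_gemm` returns one row of four output values per kernel. Calling `col2im(im2col(image, 2, 2), 3, 3, 2, 2)` does not reproduce the image; it scales each pixel by the number of patches that overlap it, which is the accumulation behaviour needed when back-propagating gradients through a convolution.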