
Releases: IntelPython/dpctl


14 Jul 13:51

This release features an updated documentation web page, adds cumulative reductions,
and complies with revision 2023.12 of the Python Array API specification.


  • Added a pybind11 type caster for sycl::half, mapping it to and from Python float, to the "dpctl4pybind11.hpp" header: gh-1655
  • Added support for DLPack data interchange per the Python Array API 2023.12 specification: gh-1667
  • Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602
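For readers new to cumulative reductions, the pure-Python sketch below shows what an inclusive scan computes for a 1-D input. It is a reference illustration only, not dpctl code. It assumes inclusive behaviour (the first output equals the first input), and the `*_ref` helper names are hypothetical stand-ins for tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp.

```python
import math
import operator
from itertools import accumulate


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == b:
        # Also covers a == b == +/-inf, where a - b would be nan.
        return a + math.log(2.0)
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


# Hypothetical 1-D reference versions of the three cumulative reductions.
def cumulative_sum_ref(xs):
    return list(accumulate(xs, operator.add))


def cumulative_prod_ref(xs):
    return list(accumulate(xs, operator.mul))


def cumulative_logsumexp_ref(xs):
    # Running log(sum(exp(x))) without ever forming exp(x) directly.
    return list(accumulate(xs, logaddexp))


print(cumulative_sum_ref([1, 2, 3, 4]))   # [1, 3, 6, 10]
print(cumulative_prod_ref([1, 2, 3, 4]))  # [1, 2, 6, 24]
```

Folding with logaddexp avoids overflow. For example, exp(1000.0) does not fit in a double, but cumulative_logsumexp_ref([1000.0, 1000.0]) stays finite, with a last element of 1000 + log(2).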


  • Expanded documentation for dpctl: gh-1619
  • Expanded utils.intel_device_info functionality: gh-1656
  • Improved performance of elementwise operations: gh-1651
  • Improved efficiency by avoiding unnecessary copies of sycl::queue: gh-1645
  • dpctl now uses pybind11 2.12.0: gh-1640
  • Improved performance of tensor.reshape with order="F" when a copy is needed or requested: gh-1677
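The order="F" case refers to column-major (Fortran) element ordering. Elements are read from the source and written to the result with the first index varying fastest. The pure-Python sketch below illustrates that semantics for a 2-D nested list. The helper names are hypothetical, and the sketch does not reflect how dpctl implements reshape.

```python
def ravel_f(m):
    """Flatten a 2-D nested list in column-major (F) order."""
    rows, cols = len(m), len(m[0])
    return [m[i][j] for j in range(cols) for i in range(rows)]


def reshape_f(m, new_rows, new_cols):
    """Reshape a 2-D nested list using column-major (F) order on both sides."""
    flat = ravel_f(m)
    if len(flat) != new_rows * new_cols:
        raise ValueError("cannot reshape: sizes differ")
    # In F order, element (i, j) of the result sits at flat position i + j * new_rows.
    return [[flat[i + j * new_rows] for j in range(new_cols)] for i in range(new_rows)]


print(reshape_f([[1, 2, 3], [4, 5, 6]], 3, 2))  # [[1, 5], [4, 3], [2, 6]]
```

Traversing a row-major (C-contiguous) source in F order is not sequential in memory. That is one reason such a reshape may require a copy.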


  • Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
  • Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
  • Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
  • Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
  • Supported use of index arrays of different integral types in indexing operations: gh-47
  • Fixed source code to compile for NVIDIA(TM) GPUs with DPC++ 2024.1: gh-1630
  • Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
  • Fixed support for out keyword in tensor.matmul: gh-1610
  • Fixed bug in basic slicing of empty arrays: gh-1680
  • Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
  • Fixed bug in tensor.repeat on zero-size input arrays: gh-1682
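The comparison fix in gh-1650 concerns mixed signed and unsigned integer types. The general pitfall is that converting a negative signed value to unsigned wraps it to a huge positive number. The pure-Python sketch below illustrates this by emulating 64-bit C++ conversion rules. The helper names are hypothetical, and the sketch shows the class of bug rather than the actual dpctl patch.

```python
UINT64_MOD = 2 ** 64


def as_uint64(x: int) -> int:
    """Reinterpret an integer as a 64-bit unsigned value (negatives wrap around)."""
    return x % UINT64_MOD


def naive_less(a_int64: int, b_uint64: int) -> bool:
    # Mimics C++'s usual arithmetic conversions: int64_t is converted to uint64_t.
    return as_uint64(a_int64) < b_uint64


def safe_less(a_int64: int, b_uint64: int) -> bool:
    # Any negative signed value is smaller than every unsigned value.
    if a_int64 < 0:
        return True
    return a_int64 < b_uint64


print(naive_less(-1, 1))  # False: -1 became 2**64 - 1
print(safe_less(-1, 1))   # True
```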

New Contributors

Full Changelog:


11 Apr 01:25

This release includes bug fixes and provides a change needed by the numba_dpex project to support dispatching kernels
that consume instances of the sycl::local_accessor template type.


  • Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
  • Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
  • Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
  • Changed the implementation of DPCTLQueue_SubmitRange and DPCTLQueue_SubmitNDRange in the DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex, and changed the DPCTLKernelArgType enum to correspond to disjoint C++ types: #1609, #1611, #1612


  • Fixed a crash on the Windows platform during execution of the getter of the dpctl.SyclPlatform.default_context property: #1604
  • Fixed a kernel submission error on NVIDIA CUDA GPUs during dpctl.tensor.matmul operation: #1605
  • Fixed corruption of context cache table entries: #1607
  • Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
  • Fixed the output of python -m dpctl --library so that it specifies the correct library name: #1615


28 Mar 02:59

This release is virtually identical to 0.15.1 as far as features are concerned.

This release is meant to be built with DPC++ 2024.1.0, which no longer supports older integrated Gen9 Intel GPUs, such as those that shipped with 10th-generation and older Intel Core processors.


10 Feb 21:51


This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.


  • Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, and dpctl.tensor.argmax per the Python Array API specification: #1399
  • Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
  • Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
  • Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
  • Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
  • Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
  • Added dpctl.tensor.clip function: #1444, #1505
  • Added custom reduction functions dpctl.tensor.logsumexp (a reduction using the binary function dpctl.tensor.logaddexp) and dpctl.tensor.reduce_hypot (a reduction using the binary function dpctl.tensor.hypot): #1446
  • Added an inspection API to query the capabilities of the Python Array API specification implementation: #1469
  • Added support for compiling for the NVIDIA(R) SYCL target using the Codeplay oneAPI plug-in: #1411, #1124
  • Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
  • Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530
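Two of the semantics mentioned above can be sketched in pure Python. First, reduce_hypot folds the binary hypot function over the input. Second, the Python Array API requires argmin and argmax to return the index of the first occurrence when the extreme value repeats. The function names below are hypothetical 1-D reference versions, not dpctl's implementation.

```python
import math
from functools import reduce


def reduce_hypot_ref(xs):
    # hypot(hypot(x0, x1), x2) == sqrt(x0**2 + x1**2 + x2**2). Each step uses
    # math.hypot, which avoids overflow from squaring very large elements.
    return reduce(math.hypot, xs, 0.0)


def argmax_ref(xs):
    if not xs:
        raise ValueError("argmax of an empty sequence")
    # Strict ">" keeps the earliest index when the maximum value repeats.
    best = 0
    for i in range(1, len(xs)):
        if xs[i] > xs[best]:
            best = i
    return best


print(reduce_hypot_ref([3.0, 4.0]))  # 5.0
print(argmax_ref([1, 7, 3, 7]))      # 1
```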


  • Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
  • Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
  • dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
  • C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516



29 Sep 16:06


Release 0.15.0 marks a milestone: the dpctl.tensor.usm_ndarray object now implements all special Python operators except __matmul__ and __rmatmul__.

dpctl.tensor raises its pass rate on the array-API conformance test suite to 81.8% (passed: 916, failed: 84, skipped: 119).



  • Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
  • Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
  • Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
  • Added dpctl.tensor.round function.
  • Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
  • Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert.
  • Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
  • Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
  • Added dpctl.tensor.minimum and dpctl.tensor.maximum binary elementwise functions.
  • Added support for equality checking and hashing of dpctl.SyclPlatform.
  • Implemented the types property for all unary and binary elementwise functions #1361.
  • Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
  • Added dpctl.tensor.matrix_transpose function.
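The Python Array API specifies two conventions relevant here: remainder returns a result with the same sign as the divisor, and round breaks ties toward the nearest even integer. Python's built-in % and round follow the same conventions, so they make a convenient scalar reference when checking results from dpctl.tensor.remainder and dpctl.tensor.round. The sketch also contrasts them with C-style math.fmod.

```python
import math

# Array API remainder: the result takes the sign of the divisor (like Python's %).
print(-7 % 3)            # 2
print(7 % -3)            # -2
# C-style fmod truncates instead, so the result takes the sign of the dividend.
print(math.fmod(-7, 3))  # -1.0

# Array API round: ties go to the nearest even integer (like Python's round).
print(round(2.5), round(3.5), round(-2.5))  # 2 4 -2
```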


  • Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
  • Removed the obsolete dpctl.tensor.numpy_usm_shared class and associated tests, which were being skipped #1310.
  • Transitioned dpctl codebase to Cython 3.
  • Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
  • Improved performance of summation function dpctl.tensor.sum.
  • Improved in-place arithmetic operations for addition, subtraction and multiplication.
  • Updated the codebase to address SYCL 2020 deprecation warnings from the intel/llvm compiler.
  • Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
  • Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
  • Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
  • Improved performance of dpctl.tensor.roll function.



19 Jul 12:34

This release builds on the 0.14.3 and 0.14.4 releases, addresses some performance gaps, and implements several new elementwise functions.


  • Added dpctl.tensor.log2 and dpctl.tensor.log10: #1267
  • Added dpctl.tensor.negative, dpctl.tensor.positive, dpctl.tensor.square #1268
  • Added dpctl.tensor.logical_not, dpctl.tensor.logical_and, dpctl.tensor.logical_or, dpctl.tensor.logical_xor #1270


  • Changed dpctl.tensor.astype behavior for newdtype=None: #1261
  • Changed the default value of the dtype keyword argument of the dpctl.tensor.usm_ndarray constructor to None: #1265
  • Added support for out arguments that overlap with inputs in unary elementwise functions: #1281
  • Copying from one array to another is now a no-op if both arrays view the same memory: #1284
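Both of the last two changes depend on reasoning about memory. An out array may alias an input (#1281), and a copy can be skipped when source and destination are the same view (#1284). The sketch below shows one conservative way to reason about this for strided views. The View record and all helper functions are hypothetical and do not reflect dpctl's internal checks.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class View:
    """Hypothetical strided view: base address, shape, byte strides, dtype."""
    ptr: int
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]  # in bytes; may be negative
    itemsize: int
    dtype: str


def byte_extent(v: View) -> Tuple[int, int]:
    """Half-open byte range [lo, hi) that the view can touch."""
    if any(n == 0 for n in v.shape):
        return (v.ptr, v.ptr)  # an empty view touches no memory
    lo = hi = v.ptr
    for n, s in zip(v.shape, v.strides):
        if s >= 0:
            hi += (n - 1) * s
        else:
            lo += (n - 1) * s
    return (lo, hi + v.itemsize)


def may_overlap(a: View, b: View) -> bool:
    """Conservative test: True if the byte ranges intersect (false positives possible)."""
    alo, ahi = byte_extent(a)
    blo, bhi = byte_extent(b)
    if alo == ahi or blo == bhi:
        return False
    return alo < bhi and blo < ahi


def copy_is_noop(src: View, dst: View) -> bool:
    """Copying is a no-op when both views describe exactly the same elements."""
    return src == dst


a = View(1000, (4,), (8,), 8, "float64")
rev = View(1024, (4,), (-8,), 8, "float64")  # same bytes, reversed order
print(may_overlap(a, rev), copy_is_noop(a, rev))  # True False
```

The reversed view shows why both checks are needed. It overlaps the original, so an elementwise kernel writing into it must be careful. However, it is not the same view, so a copy between the two cannot be skipped.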


19 Jul 12:32

This is a hot-fix for the 0.14.3 release.


  • Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239


  • Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244


  • Fixed handling of 0d arrays in dpctl.tensor.sum: #1238


19 Jul 12:31


  • Added support for axis=None in dpctl.tensor.concat #1125
  • Added caching for dpctl.SyclDevice.filter_string property #1127
  • Added dpctl.tensor.isdtype from array API #1133
  • Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
  • Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
  • Added dpctl.tensor.where from array API #1147
  • Include libtensor headers in dpctl installation layout #1185
  • Added new properties of dpctl.tensor.usm_ndarray object #1199
  • Added a list of unary and binary elementwise functions from array API:
    • #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
    • #1205: dpctl.tensor.sqrt
    • #1209: implements out keyword argument
    • #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
    • #1214: dpctl.tensor.not_equal
    • #1216: dpctl.tensor.exp, dpctl.tensor.sin
    • #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
    • #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
    • #1221: dpctl.tensor.floor_divide
    • #1235: dpctl.tensor.less
    • #1237: in-place support for addition, multiplication and subtraction
  • Added dpctl.tensor.all and dpctl.tensor.any #1204
  • Added dpctl.tensor.sum #1210
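Per the Python Array API, concat with axis=None flattens every input before joining them, so inputs of different shapes can be combined into one 1-D result. The pure-Python sketch below uses nested lists as stand-ins for arrays and assumes row-major flattening. concat_none_ref and flatten are hypothetical names, not dpctl functions.

```python
def flatten(x):
    """Row-major flatten of an arbitrarily nested list (a scalar becomes [x])."""
    if not isinstance(x, list):
        return [x]
    out = []
    for item in x:
        out.extend(flatten(item))
    return out


def concat_none_ref(arrays):
    """Sketch of concat(arrays, axis=None): flatten each input, then join."""
    out = []
    for a in arrays:
        out.extend(flatten(a))
    return out


print(concat_none_ref([[[1, 2], [3, 4]], [5, 6, 7]]))  # [1, 2, 3, 4, 5, 6, 7]
```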


  • Updated examples of native Python extensions built using dpctl #1108
  • Used security flags to compile and link native extensions of dpctl #1109
  • Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
  • Consolidated multiple host_tasks that manage the lifetime of USM temporaries to improve test suite stability #1111
  • MAINT: Improved cmake target dependency tracking #1112
  • MAINT: Improved docstrings for existing dpctl.tensor functions #1123
  • Changed default value of mode keyword in dpctl.tensor.take and dpctl.tensor.put from clip to wrap #1132
  • Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
  • Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
  • Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
  • Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
  • Require DPC++ RT 2023.1 to build and run dpctl #1195
  • Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
  • Transition to conda-forge ecosystem #1213


  • Fix to add empty values check for #1105, #1106
  • Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
  • MAINT: Fixed build break with newer GCC and SYCLOS #1118
  • Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136


28 Mar 13:54


  • Added dpctl.SyclDevice.partition_max_sub_devices property #1005
  • Added dpctl.program.SyclKernel.max_sub_group_size property #1028
  • Implemented printing of usm_ndarray #1013, #1043, #1060
  • Implemented support for advanced indexing for dpctl.tensor.usm_ndarray #1095, #1097, #1099, #1101
  • Implemented support for platform listing in dpctl.__main__ script #1014
  • Improved performance of dpctl.tensor.asnumpy #1026
  • Added UsmNDArray_Make* C-API for constructing dpctl.tensor.usm_ndarray from native allocations #1050, #1067
  • Added support for dpctl.SyclDevice.native_vector_width_* device descriptors #1075
  • Added dpctl::tensor::usm_ndarray::get_shape_vector and dpctl::tensor::usm_ndarray::get_strides_vector methods #1090


  • Removed dpctl.select_host_device, dpctl.has_host_device, dpctl.SyclDevice.is_host, and dpctl.SyclDevice.has_aspect_host since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028

  • usm_ndarray is made writable by default #1012, and the writable flag is now checked by __setitem__.

  • Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016

  • Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040

  • Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066

  • The dpctl.tensor.Device class supports print_device_info method #1029, equality comparison, and hashing #1048

  • Updated version of pybind11 used to 2.10.2 #1031

  • Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054

  • Changed return type of DPCTLUSM_GetPointerType function in SyclInterface library #1061, #1065

  • Updated supported version of DLPack to 0.8 #1073

  • Implemented queue cache per context/device pair and deployed it in dpctl.memory, dpctl.tensor.from_dlpack and dpctl.tensor array creation functions #1076, #1079

  • Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093


  • Fixed error gh-998 in forming Python exception, #999.
  • Fixed a small memory leak, #1000
  • Improved dtype support in dpctl.tensor.full, PR #1002
  • Added missing header file #1008 fixing gh-1007
  • Fixed a typo in device-specific dtype mapping #1015
  • Fixed default device integer type to align with NumPy's behavior on Windows #1017
  • Fixed unexpected overflow in dpctl.tensor.linspace when one of the parameters is the largest floating point value #1034
  • Constructors dpctl.tensor.empty, dpctl.tensor.zeros, and the usm_ndarray constructor itself no longer allow creating arrays with data types not supported by the targeted device #1042
  • Fixed parameter validation in dpctl.SyclQueue constructor #1052
  • Fixed usm_type of the resulting array in dpctl.tensor.tril and dpctl.tensor.triu functions #1062
  • Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
  • Fixed issue with empty argument of dpctl.tensor.meshgrid function #1080
  • Fixed linking problem on Windows enabling dpctl to be functional on Windows for devices not supporting some data types #1083
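The dpctl.tensor.linspace fix (#1034) concerns endpoints at the largest finite double. The pure-Python sketch below shows how the problem arises. The naive step (stop - start) / (num - 1) overflows to inf, while interpolating as start * (1 - t) + stop * t keeps every intermediate finite. Both functions are hypothetical illustrations, and the interpolation form is not necessarily the formula dpctl adopted.

```python
import math
import sys

BIG = sys.float_info.max  # largest finite double


def linspace_naive(start, stop, num):
    step = (stop - start) / (num - 1)  # stop - start overflows to inf here
    return [start + i * step for i in range(num)]


def linspace_lerp(start, stop, num):
    if num < 2:
        raise ValueError("num must be at least 2 in this sketch")
    out = []
    for i in range(num):
        t = i / (num - 1)
        out.append(start * (1.0 - t) + stop * t)  # each product stays finite
    return out


print(linspace_naive(-BIG, BIG, 3))  # [nan, inf, inf]
print(linspace_lerp(-BIG, BIG, 3))   # endpoints -BIG and BIG, midpoint 0.0
```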

Full Changelog: 0.14.0...0.14.2


19 Nov 05:10

[0.14.0] - 11/18/2022


  • Implemented dpctl.tensor.linspace function from array-API #875.
  • Implemented dpctl.tensor.eye function from array-API #896.
  • Implemented dpctl.tensor.tril and dpctl.tensor.triu functions from array-API #910.
  • Added data type objects, along with the finfo, iinfo, can_cast, and result_type functions, to the dpctl.tensor namespace #913.
  • Implemented dpctl.tensor.meshgrid creation function from array-API #920.
  • Implemented convenience class to represent output of dpctl.tensor.usm_ndarray.flags property #921.
  • Added new device attributes and kernel's device-specific attributes #894.
  • Added dpctl.utils.onetrace_enabled context manager for targeted trace collection #903.
  • Added support for stream keyword in __dlpack__ method, enabling support for sending usm_ndarray using mpi4py #906.
  • dpctl.tensor.asarray can now transfer data between incompatible devices, #951.
  • Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
  • Added C-API to dpctl.program.SyclKernel and dpctl.program.SyclProgram. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
  • Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
  • Added experimental support for sharing data allocated on sub-devices via DLPack #984.
  • Added dpctl.SyclDevice.sub_group_sizes property to retrieve the sub-group sizes supported by the device #985.
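The array-API semantics behind dpctl.tensor.tril, dpctl.tensor.triu, and dpctl.tensor.eye all depend on the k-th diagonal. Element (i, j) lies on that diagonal when j - i == k. tril keeps elements with j - i <= k, triu keeps those with j - i >= k, and eye places ones where j - i == k. The nested-list functions below are hypothetical pure-Python references, not dpctl code.

```python
def tril_ref(m, k=0):
    """Keep elements on and below the k-th diagonal, zero the rest."""
    return [[x if j - i <= k else 0 for j, x in enumerate(row)]
            for i, row in enumerate(m)]


def triu_ref(m, k=0):
    """Keep elements on and above the k-th diagonal, zero the rest."""
    return [[x if j - i >= k else 0 for j, x in enumerate(row)]
            for i, row in enumerate(m)]


def eye_ref(n_rows, n_cols=None, k=0):
    """Ones on the k-th diagonal, zeros elsewhere."""
    n_cols = n_rows if n_cols is None else n_cols
    return [[1 if j - i == k else 0 for j in range(n_cols)] for i in range(n_rows)]


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(tril_ref(m))         # [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
print(triu_ref(m, k=1))    # [[0, 2, 3], [0, 0, 6], [0, 0, 0]]
print(eye_ref(2, 3, k=1))  # [[0, 1, 0], [0, 0, 1]]
```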


  • Improved queue compatibility testing in dpctl.tensor's implementation module #900.
  • Added automatic measurement of array-API conformance test suite in CI #901.
  • Improved performance of array metadata transfer from host to device #912.
  • Used os.add_dll_directory on Windows to ensure that DPCTLSyclInterface library can be found #918.
  • Refactored dpctl.tensor's implementation module #941 to streamline adding new functionality. Streamlined dpctl::tensor::usm_ndarray class implementation.
  • Added debugging messages for cases when DPCTLDynamicLib::getSymbol encounters errors #956.
  • Updated code base according to changes in DPC++ compiler #952, #957, #958.
  • Changed dpctl to use pybind11 2.10.1 #967.
  • Extended dpctl.tensor.full to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.


  • Improved SyclDevice constructor error message #893.
  • Fixed issue gh-890 about dpctl.tensor.reshape function #915.
  • Fixed unexpected UnboundLocalError exception in #922.
  • Fixed bugs in dpctl.tensor.arange in #945.
  • Fixed issue with type inferencing in dpctl.tensor.asarray in #949.
  • Added missing docstrings for dpctl.SyclDevice properties #964.