Releases: IntelPython/dpctl
0.17.0
This release features an updated documentation web page (https://intelpython.github.io/dpctl/latest/index.html), adds cumulative reductions,
and complies with revision 2023.12 of the Python Array API specification.
Added
- Added pybind11 caster for sycl::half to map to/from Python float to "dpctl4pybind11.hpp" header: gh-1655
- Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
- Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602
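As a rough illustration of what the three cumulative reductions compute, here is a minimal pure-Python sketch for 1-D inputs. It does not call dpctl; the helper names below are hypothetical stand-ins, and the real functions operate on usm_ndarray objects and accept further arguments not shown here.

```python
import math
import operator
from itertools import accumulate


def cumulative_sum(values):
    """Running sum: out[i] = values[0] + ... + values[i]."""
    return list(accumulate(values))


def cumulative_prod(values):
    """Running product: out[i] = values[0] * ... * values[i]."""
    return list(accumulate(values, operator.mul))


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = (a, b) if a >= b else (b, a)
    if math.isinf(hi):
        # Covers both operands at -inf and any operand at +inf.
        return hi
    return hi + math.log1p(math.exp(lo - hi))


def cumulative_logsumexp(values):
    """Running log-sum-exp: out[i] = log(sum(exp(values[:i + 1])))."""
    return list(accumulate(values, logaddexp))


print(cumulative_sum([1, 2, 3, 4]))
print(cumulative_prod([1, 2, 3, 4]))
print(cumulative_logsumexp([0.0, 0.0, 0.0]))
```

Each output element depends on all earlier inputs, which is what separates a cumulative reduction from an ordinary reduction that returns a single value.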
Changed
- Expanded documentation for dpctl: gh-1619
- Expanded utils.intel_device_info functionality: gh-1656
- Improved performance of elementwise operations: gh-1651
- Efficiency improvement by avoiding unnecessary copying of sycl::queue: gh-1645
- dpctl uses pybind11 2.12.0: gh-1640
- Improved performance of tensor.reshape operation with order="F" when copying is needed, or requested: gh-1677
Fixed
- Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
- Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
- Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
- Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
- Support use of index arrays of different integral types in indexing operations: gh-47
- Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
- Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
- Fixed support for out keyword in tensor.matmul: gh-1610
- Fixed bug in basic slicing of empty arrays: gh-1680
- Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
- Fixed bug in tensor.repeat on zero-size input arrays: gh-1682
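To show why comparisons between signed and unsigned integers need care, the sketch below contrasts a naive comparison with a sign-aware one. It is plain Python that emulates 64-bit conversion with a bit mask; the function names are hypothetical, and this is only one common way such comparisons go wrong, not necessarily the defect addressed in gh-1650.

```python
UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def as_uint64(value):
    """Emulate C-style conversion of a signed 64-bit integer to unsigned."""
    return value & UINT64_MASK


def naive_less(signed_value, unsigned_value):
    """Convert the signed operand to unsigned first, then compare."""
    return as_uint64(signed_value) < unsigned_value


def safe_less(signed_value, unsigned_value):
    """A negative signed value is smaller than every unsigned value."""
    return signed_value < 0 or signed_value < unsigned_value


# -1 converts to 2**64 - 1, so the naive comparison reports -1 < 1 as False.
print(naive_less(-1, 1))
print(safe_less(-1, 1))
```

Checking the sign before comparing magnitudes avoids the wraparound that the blind conversion introduces.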
New Contributors
- @bdmoore1 made their first contribution in #1659
- @ekomarova made their first contribution in #1666
Full Changelog: https://github.com/IntelPython/dpctl/blob/master/CHANGELOG.md
v0.16.1
This release includes bug fixes and provides a change needed by the numba_dpex project to support dispatching kernels consuming instances of the sycl::local_accessor template type.
Changed
- Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
- Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
- Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
- Changed implementation of DPCTLQueue_SubmitRange, DPCTLQueue_SubmitNDRange in DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex; changed the enum DPCTLKernelArgType to correspond to C++ disjoint types: #1609, #1611, #1612
Fixed
- Fixed a crash on Windows platform during execution of getter of dpctl.SyclPlatform.default_context property: #1604
- Fixed kernel submission error on NVidia CUDA GPUs during dpctl.tensor.matmul operation: #1605
- Fixed corruption of context cache table entries: #1607
- Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
- Fixed output of python -m dpctl --library to fix specified library name: #1615
v0.16.0
This release is virtually identical to 0.15.1 as far as features are concerned.
This release is meant to be built with DPC++ 2024.1.0, which no longer supports older integrated Gen9 Intel GPUs, such as those that came with Intel Core 10th generation and older.
v0.15.1
Summary
This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.
Added
- Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, dpctl.tensor.argmax, and dpctl.tensor.prod per Python Array API specifications: #1399
- Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
- Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
- Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
- Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
- Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
- Added dpctl.tensor.clip function: #1444, #1505
- Added custom reduction functions dpt.logsumexp (reduction using binary function dpctl.tensor.logaddexp), dpt.reduce_hypot (reduction using binary function dpctl.tensor.hypot): #1446
- Added inspection API to query capabilities of Python Array API specification implementation: #1469
- Support for compilation for NVIDIA(R) sycl target with use of CodePlay oneAPI plug-in: #1411, #1124
- Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
- Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530
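The two custom reductions in the list above are described as reductions built from a binary function. A minimal pure-Python sketch of that idea follows, folding a binary function over a 1-D sequence. It does not use dpctl, and the helper names are hypothetical.

```python
import math
from functools import reduce


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = (a, b) if a >= b else (b, a)
    if math.isinf(hi):
        return hi
    return hi + math.log1p(math.exp(lo - hi))


def logsumexp(values):
    """Fold logaddexp over the values; -inf is its identity element."""
    return reduce(logaddexp, values, -math.inf)


def reduce_hypot(values):
    """Fold hypot over the values; 0.0 is its identity element."""
    return reduce(math.hypot, values, 0.0)


print(reduce_hypot([3.0, 4.0]))   # square root of the sum of squares
print(logsumexp([0.0, 0.0, 0.0]))  # log of the sum of exponentials
```

Folding hypot pairwise yields the Euclidean norm of the sequence, while folding logaddexp yields the logarithm of the summed exponentials without forming large intermediate exponentials.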
Changed
- Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
- Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
- dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
- C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516
Fixed
v0.15.0
Summary
The 0.15.0 release represents a milestone in which the dpctl.tensor.usm_ndarray object now implements all special Python operators, except __matmul__ and __rmatmul__.
The dpctl.tensor submodule increases its array-API conformance test suite pass rate to 81.8% (passed: 916, failed: 84, skipped: 119).
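The arithmetic behind the quoted pass rate can be checked with a short sketch. The reading used here (skipped tests counted in the denominator, result truncated to one decimal) is an inference from the three counts, not something the release notes state, and the function name is hypothetical.

```python
import math


def pass_rate_percent(passed, failed, skipped):
    """Pass rate over all collected tests, truncated to one decimal place."""
    total = passed + failed + skipped
    rate = 100.0 * passed / total
    return math.floor(rate * 10) / 10


print(pass_rate_percent(916, 84, 119))
```

Excluding skipped tests from the denominator would instead give 916 / 1000, that is 91.6%, so the quoted figure appears to count skipped tests as not passed.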
Details
Added
- Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
- Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
- Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
- Added dpctl.tensor.round function.
- Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
- Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert.
- Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
- Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
- Added dpctl.tensor.minimum and dpctl.tensor.maximum binary elementwise functions.
- Supported equality checking and hashing for dpctl.SyclPlatform.
- Implemented types property for all unary and binary elementwise functions #1361
- Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
- Added dpctl.tensor.matrix_transpose function.
Changed
- Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
- Removed dpctl.tensor.numpy_usm_shared obsolete class and associated tests which were being skipped #1310
- Transitioned dpctl codebase to Cython 3.
- Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
- Improved performance of summation function dpctl.tensor.sum.
- Improved in-place arithmetic operations for addition, subtraction and multiplication.
- Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
- Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
- Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
- Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
- Improved performance of dpctl.tensor.roll function.
Fixed
v0.14.5
This release builds on 0.14.3 and 0.14.4 releases and addresses some performance gaps as well as implements several new elementwise functions.
Added
- Added dpctl.tensor.log2 and dpctl.tensor.log10: #1267
- Added dpctl.tensor.negative, dpctl.tensor.positive, dpctl.tensor.square #1268
- Added dpctl.tensor.logical_not, dpctl.tensor.logical_and, dpctl.tensor.logical_or, dpctl.tensor.logical_xor #1270
Changed
- dpctl.tensor.astype behavior for newdtype=None changes #1261
- dpctl.tensor.usm_ndarray constructor default value of dtype keyword argument changed to None: #1265
- Support for out arguments that overlap with inputs for unary elementwise functions #1281
- Copying from one array to another is a no-op if both arrays view into the same memory #1284
v0.14.4
This is a hot-fix for the 0.14.3 release.
Added
- Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239
Changed
- Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244
Fixed
- Fixed handling of 0d arrays in dpctl.tensor.sum: #1238
v0.14.3
Added
- Added support of axis=None in dpctl.tensor.concat #1125
- Added caching for dpctl.SyclDevice.filter_string property #1127
- Added dpctl.tensor.isdtype from array API #1133
- Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
- Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
- Added dpctl.tensor.where from array API #1147
- Include libtensor headers in dpctl installation layout #1185
- Added new properties of dpctl.tensor.usm_ndarray object #1199
- Added a list of unary and binary elementwise functions from array API:
  - #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
  - #1205: dpctl.tensor.sqrt
  - #1209: implements out keyword argument
  - #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
  - #1214: dpctl.tensor.not_equal
  - #1216: dpctl.tensor.exp, dpctl.tensor.sin
  - #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
  - #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
  - #1221: dpctl.tensor.floor_divide
  - #1235: dpctl.tensor.less
  - #1237: in-place support for addition, multiplication and subtraction
- Added dpctl.tensor.all and dpctl.tensor.any #1204
- Added dpctl.tensor.sum #1210
Changed
- Updated examples of native Python extensions built using dpctl #1108
- Used security flags to compile and link native extensions of dpctl #1109
- Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
- Consolidated multiple USM temporaries life-time management host_tasks to improve test suite stability #1111
- MAINT: Improved cmake target dependency tracking #1112
- MAINT: Improved docstrings for existing dpctl.tensor functions #1123
- Changed default value of mode keyword in dpctl.tensor.take and dpctl.tensor.put from clip to wrap #1132
- Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
- Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
- Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
- Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
- Require DPC++ RT 2023.1 to build and run dpctl #1195
- Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
- Transition to conda-forge ecosystem #1213
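The change of the default mode from clip to wrap affects how out-of-range indices are resolved. The pure-Python sketch below illustrates the two mode names under the assumption that they carry their NumPy-style meaning; dpctl's own definition of wrap may differ in detail, so its documentation should be consulted. The helper names are hypothetical and nothing here calls dpctl.

```python
def resolve_index(index, size, mode="wrap"):
    """Map a possibly out-of-range index into [0, size)."""
    if mode == "wrap":
        # Assumed NumPy-style wrap: indices wrap around modulo the size.
        return index % size
    if mode == "clip":
        # Assumed NumPy-style clip: indices saturate at the two ends.
        return min(max(index, 0), size - 1)
    raise ValueError(f"unsupported mode: {mode!r}")


def take(values, indices, mode="wrap"):
    """Gather elements of a 1-D sequence at the resolved indices."""
    return [values[resolve_index(i, len(values), mode)] for i in indices]


data = [10, 20, 30, 40]
print(take(data, [-1, 5], mode="wrap"))
print(take(data, [-1, 5], mode="clip"))
```

Under these assumptions the same index list selects different elements in the two modes, so code that relied on the previous default may need to pass the mode explicitly.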
Fixed
- Fix to add empty values check for dpctl.tensor.place #1105, #1106
- Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
- MAINT: Fixed build break with newer GCC and SYCLOS #1118
- Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136
v0.14.2
Added
- Added dpctl.SyclDevice.partition_max_sub_devices property #1005
- Added dpctl.program.SyclKernel.max_sub_group_size property #1028
- Implemented printing of usm_ndarray #1013, #1043, #1060
- Implemented support for advanced indexing for dpctl.tensor.usm_ndarray #1095, #1097, #1099, #1101
- Implemented support for platform listing in dpctl.__main__ script #1014
- Improved performance of dpctl.tensor.asnumpy #1026
- Added UsmNDArray_Make* C-API for constructing dpctl.tensor.usm_ndarray from native allocations #1050, #1067
- Added support for dpctl.SyclDevice.native_vector_width_* device descriptors #1075
- Added dpctl::tensor::usm_ndarray::get_shape_vector and dpctl::tensor::usm_ndarray::get_strides_vector methods #1090
Changed
- Removed dpctl.select_host_device, dpctl.has_host_device, dpctl.SyclDevice.is_host, and dpctl.SyclDevice.has_aspect_host since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028
- usm_ndarray is made writable by default #1012, and writable flag is now checked by __setitem__.
- Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016
- Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040
- Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066
- The dpctl.tensor.Device class supports print_device_info method #1029, equality comparison, and hashing #1048
- Updated version of pybind11 used to 2.10.2 #1031
- Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054
- Changed return type of DPCTLUSM_GetPointerType function in SyclInterface library #1061, #1065
- Updated supported version of DLPack to 0.8 #1073
- Implemented queue cache per context/device pair and deployed it in dpctl.memory, dpctl.tensor.from_dlpack and dpctl.tensor array creation functions #1076, #1079
- Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093
Fixed
- Fixed error gh-998 in forming Python exception, #999.
- A small memory leak fixed, #1000
- Improved dtype support in dpctl.tensor.full, PR #1002
- Added missing header file #1008 fixing gh-1007
- Fixed a typo in device-specific dtype mapping #1015
- Fixed default device integer type to align with NumPy's behavior on Windows #1017
- Fixed unexpected overflow in dpctl.tensor.linspace when one of the parameters is the largest floating point value #1034
- Constructors dpctl.tensor.empty, dpctl.tensor.zeros, and the usm_ndarray constructor itself no longer allow creating arrays with data-types not supported by the targeted device #1042
- Fixed parameter validation in dpctl.SyclQueue constructor #1052
- Fixed usm_type of the resulting array in dpctl.tensor.tril and dpctl.tensor.triu functions #1062
- Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
- Fixed issue with empty argument of dpctl.tensor.meshgrid function #1080
- Fixed linking problem on Windows enabling dpctl to be functional on Windows for devices not supporting some data types #1083
Full Changelog: 0.14.0...0.14.2
v0.14.0
[0.14.0] - 11/18/2022
Added
- Implemented dpctl.tensor.linspace function from array-API #875.
- Implemented dpctl.tensor.eye function from array-API #896.
- Implemented dpctl.tensor.tril and dpctl.tensor.triu functions from array-API #910.
- Added data type objects to dpctl.tensor namespace, finfo, iinfo, can_cast, and result_type functions #913.
- Implemented dpctl.tensor.meshgrid creation function from array-API #920.
- Implemented convenience class to represent output of dpctl.tensor.usm_ndarray.flags property #921.
- Added new device attributes and kernel's device-specific attributes #894.
- Added dpctl.utils.onetrace_enabled context manager for targeted trace collection #903.
- Added support for stream keyword in __dlpack__ method, enabling support for sending usm_ndarray using mpi4py #906.
- dpctl.tensor.asarray can now transition data between incompatible devices, #951.
- Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
- Added C-API to dpctl.program.SyclKernel and dpctl.program.SyclProgram. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
- Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
- Added experimental support for sharing data allocated on sub-devices via dlpack #984.
- Added dpctl.SyclDevice.sub_group_sizes property to retrieve supported sizes of sub-group by the device #985.
Changed
- Improved queue compatibility testing in dpctl.tensor's implementation module #900.
- Added automatic measurement of array-API conformance test suite in CI #901.
- Improved performance of array metadata transfer from host to device #912.
- Used os.add_dll_directory on Windows to ensure that DPCTLSyclInterface library can be found #918.
- Refactored dpctl.tensor's implementation module #941 to streamline adding new functionality. Streamlined dpctl::tensor::usm_ndarray class implementation.
- Added debugging messaging in case when DPCTLDynamicLib::getSymbol encounters errors #956.
- Updated code base according to changes in DPC++ compiler #952, #957, #958.
- Changed dpctl to use pybind11 2.10.1 #967.
- Extended dpctl.tensor.full to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.
Fixed
- Improved SyclDevice constructor error message #893.
- Fixed issue gh-890 about dpctl.tensor.reshape function #915.
- Fixed unexpected UnboundLocalError exception in #922.
- Fixed bugs in dpctl.tensor.arange in #945.
- Fixed issue with type inferencing in dpctl.tensor.asarray in #949.
- Added missing docstrings for dpctl.SyclDevice properties #964.