Releases · IntelPython/dpctl

14 Jul 13:51

0.17.0

a5c40d9

0.17.0 Latest

This release features an updated documentation web page (https://intelpython.github.io/dpctl/latest/index.html), adds cumulative reductions,
and complies with revision 2023.12 of the Python Array API specification.

Added

Added a pybind11 caster for sycl::half, mapping to/from Python float, to the "dpctl4pybind11.hpp" header: gh-1655
Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602
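
As a rough illustration of what the cumulative reductions from gh-1602 compute, here is a pure-Python sketch over 1-D lists. It does not use dpctl; the helper functions below are hypothetical stand-ins written with the standard library only, and say nothing about how the library implements these operations.

```python
import math
import operator
from itertools import accumulate


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for finite inputs."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def cumulative_sum(xs):
    # Running sum: element i holds xs[0] + ... + xs[i].
    return list(accumulate(xs, operator.add))


def cumulative_prod(xs):
    # Running product: element i holds xs[0] * ... * xs[i].
    return list(accumulate(xs, operator.mul))


def cumulative_logsumexp(xs):
    # Running log-sum-exp, built by folding logaddexp over the prefix.
    return list(accumulate(xs, logaddexp))


data = [1.0, 2.0, 3.0]
print(cumulative_sum(data))   # [1.0, 3.0, 6.0]
print(cumulative_prod(data))  # [1.0, 2.0, 6.0]
print(cumulative_logsumexp([0.0, 0.0]))
```

The last element of each result equals the corresponding full reduction (sum, prod, logsumexp) of the input.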

Changed

Expanded documentation for dpctl: gh-1619
Expanded utils.intel_device_info functionality: gh-1656
Improved performance of elementwise operations: gh-1651
Improved efficiency by avoiding unnecessary copying of sycl::queue: gh-1645
dpctl uses pybind11 2.12.0: gh-1640
Improved performance of tensor.reshape operation with order="F" when copying is needed, or requested: gh-1677
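
For readers unfamiliar with order="F", the sketch below shows how the two orders place the same flat data into a 2-D shape. It is a pure-Python illustration with a hypothetical helper (reshape_2d), not dpctl code: with "F" the first index varies fastest, so row-major source data generally has to be rearranged (copied) rather than reinterpreted.

```python
def reshape_2d(flat, rows, cols, order="C"):
    """Reshape a flat list into rows x cols using C (row-major) or F (column-major) order."""
    if len(flat) != rows * cols:
        raise ValueError("size mismatch")
    if order == "C":
        # Last index varies fastest.
        return [[flat[r * cols + c] for c in range(cols)] for r in range(rows)]
    if order == "F":
        # First index varies fastest.
        return [[flat[c * rows + r] for c in range(cols)] for r in range(rows)]
    raise ValueError("order must be 'C' or 'F'")


flat = list(range(6))
print(reshape_2d(flat, 2, 3, order="C"))  # [[0, 1, 2], [3, 4, 5]]
print(reshape_2d(flat, 2, 3, order="F"))  # [[0, 2, 4], [1, 3, 5]]
```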

Fixed

Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
Support use of index arrays of different integral types in indexing operations: gh-47
Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
Fixed support for out keyword in tensor.matmul: gh-1610
Fixed bug in basic slicing of empty arrays: gh-1680
Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
Fixed bug in tensor.repeat on zero-size input arrays: gh-1682
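
The entry for gh-1650 does not describe the root cause. A classic pitfall with mixed signed and unsigned operands, shown here purely as an assumption about the kind of bug involved, is converting the signed value to the unsigned type before comparing. The function names are hypothetical and the sketch assumes fixed-width two's complement integers.

```python
def naive_less(signed_val, unsigned_val, bits=8):
    """Buggy approach: reinterpret the signed operand as unsigned, then compare."""
    mask = (1 << bits) - 1
    return (signed_val & mask) < unsigned_val


def safe_less(signed_val, unsigned_val):
    """A negative signed value is smaller than every unsigned value."""
    if signed_val < 0:
        return True
    return signed_val < unsigned_val


# int8 value -1 compared with uint8 value 1
print(naive_less(-1, 1))  # False, because -1 is reinterpreted as 255
print(safe_less(-1, 1))   # True, the mathematically correct answer
```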

New Contributors

@bdmoore1 made their first contribution in #1659
@ekomarova made their first contribution in #1666

Full Changelog: https://github.com/IntelPython/dpctl/blob/master/CHANGELOG.md

Contributors

ekomarova and bdmoore1

11 Apr 01:25

oleksandr-pavlyk

0.16.1

1f13ce8

v0.16.1

This release includes bug fixes and provides a change needed by the numba_dpex project to support dispatching kernels
that consume instances of the sycl::local_accessor template type.

Changed

Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
Changed implementation of DPCTLQueue_SubmitRange and DPCTLQueue_SubmitNDRange in the DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex; changed the enum DPCTLKernelArgType to correspond to disjoint C++ types: #1609, #1611, #1612

Fixed

Fixed a crash on the Windows platform during execution of the getter of the dpctl.SyclPlatform.default_context property: #1604
Fixed kernel submission error on NVidia CUDA GPUs during dpctl.tensor.matmul operation: #1605
Fixed corruption of context cache table entries: #1607
Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
Fixed output of python -m dpctl --library to correct the library name it specifies: #1615

28 Mar 02:59

oleksandr-pavlyk

0.16.0

6efb2c9

v0.16.0

This release is virtually identical to 0.15.1 as far as features are concerned.

This release is meant to be built with DPC++ 2024.1.0, which no longer supports older integrated Gen9 Intel GPUs, such as those that came with Intel Core 10th generation and older.

10 Feb 21:51

oleksandr-pavlyk

0.15.1

94fc707

v0.15.1

Summary

This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.

Added

Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, dpctl.tensor.argmax, and dpctl.tensor.prod per Python Array API specifications: #1399
Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
Added dpctl.tensor.clip function: #1444, #1505
Added custom reduction functions dpctl.tensor.logsumexp (reduction using binary function dpctl.tensor.logaddexp) and dpctl.tensor.reduce_hypot (reduction using binary function dpctl.tensor.hypot): #1446
Added inspection API to query capabilities of Python Array API specification implementation: #1469
Added support for compiling for the NVIDIA(R) SYCL target using the Codeplay oneAPI plug-in: #1411, #1124
Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530
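
The custom reductions from #1446 are described as folds of a binary function over the input. A minimal pure-Python sketch of that idea follows; it uses only the standard library, the helper names are hypothetical, and it is not how dpctl implements the kernels.

```python
import math
from functools import reduce


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for finite inputs."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def logsumexp(xs):
    # Fold logaddexp over the sequence: log(sum(exp(x) for x in xs)).
    return reduce(logaddexp, xs)


def reduce_hypot(xs):
    # Fold hypot over the sequence: sqrt(sum(x * x for x in xs)).
    return reduce(math.hypot, xs)


print(reduce_hypot([3.0, 4.0]))  # 5.0
print(logsumexp([0.0, 0.0, 0.0]), math.log(3))
```

Folding pairwise keeps intermediate values in a safe range, which is the usual motivation for preferring these reductions over computing exp or squares directly.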

Changed

Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516

Fixed

Fixed issues with dpctl.tensor.repeat support for axis keyword: #1427, #1433
Fixed bug in usm_ndarray.__setitem__ reported in gh-1503: #1504
Other bug fixes: #1485, #1477, #1512

29 Sep 16:06

oleksandr-pavlyk

0.15.0

5bd924e

v0.15.0

Summary

The 0.15.0 release represents a milestone: the dpctl.tensor.usm_ndarray object now implements all special Python operators except __matmul__ and __rmatmul__.

The dpctl.tensor submodule increases its array-API conformance test suite pass rate to 81.8% (passed: 916, failed: 84, skipped: 119).
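
The quoted pass rate is consistent with counting skipped tests in the denominator. That reading is an assumption, since the release note does not state the formula; the arithmetic can be checked as follows.

```python
passed, failed, skipped = 916, 84, 119

# Assumption: the rate is passed / (passed + failed + skipped).
total = passed + failed + skipped
rate = 100 * passed / total

print(f"{total} tests, pass rate {rate:.2f}%")  # 1119 tests, pass rate 81.86%
```

Under this reading, 81.8% is the value truncated to one decimal place.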

Details

Added

Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
Added dpctl.tensor.round function.
Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert
Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
Added dpctl.tensor.minimum and dpctl.tensor.maximum binary elementwise functions.
Supported equality checking and hashing for dpctl.SyclPlatform.
Implemented types property for all unary and binary elementwise functions #1361
Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
Added dpctl.tensor.matrix_transpose function.
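
Several of the functions above differ only in how they treat negative or halfway values. The sketch below uses the Python standard library, not dpctl, to show the behavior the Array API specification describes: round resolves ties to the nearest even value, and remainder takes the sign of the divisor, like Python's % operator. The helper name is hypothetical.

```python
import math


def rounding_table(values):
    """Return (x, floor, ceil, trunc, round) for each x."""
    return [(x, math.floor(x), math.ceil(x), math.trunc(x), round(x)) for x in values]


for row in rounding_table([-1.5, 0.5, 2.5]):
    print(row)
# (-1.5, -2, -1, -1, -2)
# (0.5, 0, 1, 0, 0)
# (2.5, 2, 3, 2, 2)

# Remainder with the sign of the divisor versus the sign of the dividend.
print(-7 % 3)            # 2
print(math.fmod(-7, 3))  # -1.0
```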

Changed

Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
Removed dpctl.tensor.numpy_usm_shared obsolete class and associated tests which were being skipped #1310
Transitioned dpctl codebase to Cython 3.
Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
Improved performance of summation function dpctl.tensor.sum.
Improved in-place arithmetic operations for addition, subtraction and multiplication.
Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
Improved performance of dpctl.tensor.roll function.
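
For readers unsure what the arithmetic, in-place, and reflexive operator categories from #1324 refer to, the toy class below shows the three special-method families for addition. Vec is a hypothetical container used only for illustration; it is unrelated to the usm_ndarray implementation.

```python
class Vec:
    """Hypothetical toy container illustrating operator categories."""

    def __init__(self, data):
        self.data = list(data)

    def _operand(self, other):
        # Broadcast a scalar to the length of this vector.
        return other.data if isinstance(other, Vec) else [other] * len(self.data)

    def __add__(self, other):   # arithmetic: vec + x, returns a new object
        return Vec(a + b for a, b in zip(self.data, self._operand(other)))

    def __radd__(self, other):  # reflexive: x + vec, used when x cannot handle Vec
        return self.__add__(other)

    def __iadd__(self, other):  # in-place: vec += x, updates and returns self
        self.data = [a + b for a, b in zip(self.data, self._operand(other))]
        return self


v = Vec([1, 2])
print((v + 1).data)  # [2, 3]
print((1 + v).data)  # [2, 3]
alias = v
v += 1
print(v is alias, v.data)  # True [2, 3]
```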

Fixed

Fixed issues identified by Coverity security scans.
Fixed issues #1279, #1350, #1344, #1327, #1241, #1250, #1293.

19 Jul 12:34

oleksandr-pavlyk

0.14.5

f52182d

v0.14.5

This release builds on the 0.14.3 and 0.14.4 releases, addresses some performance gaps, and implements several new elementwise functions.

Added

Added dpctl.tensor.log2 and dpctl.tensor.log10: #1267
Added dpctl.tensor.negative, dpctl.tensor.positive, dpctl.tensor.square #1268
Added dpctl.tensor.logical_not, dpctl.tensor.logical_and, dpctl.tensor.logical_or, dpctl.tensor.logical_xor #1270

Changed

Changed dpctl.tensor.astype behavior for newdtype=None #1261
Changed default value of the dtype keyword argument of the dpctl.tensor.usm_ndarray constructor to None: #1265
Added support for out arguments that overlap with inputs for unary elementwise functions #1281
Copying from one array to another is now a no-op if both arrays view into the same memory #1284
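
To see why overlapping out arguments need care, consider the sketch below. It models arrays as windows into a shared Python list and materializes the result in a temporary when input and output overlap without coinciding. This is one common strategy, offered as an assumption; the release note does not say which approach dpctl takes, and View, overlaps, and unary_into are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class View:
    """Hypothetical window of `size` elements into a shared list."""
    buf: list
    start: int
    size: int


def overlaps(a, b):
    # Two windows overlap if they share storage and their index ranges intersect.
    return a.buf is b.buf and a.start < b.start + b.size and b.start < a.start + a.size


def unary_into(func, src, out):
    if src.size != out.size:
        raise ValueError("size mismatch")
    if overlaps(src, out) and src.start != out.start:
        # Partial overlap: compute everything first, then write.
        tmp = [func(src.buf[src.start + i]) for i in range(src.size)]
        out.buf[out.start:out.start + out.size] = tmp
    else:
        # Disjoint or identical windows: element-wise writes are safe.
        for i in range(src.size):
            out.buf[out.start + i] = func(src.buf[src.start + i])


storage = [1, 2, 3, 4]
unary_into(lambda v: v * 10, View(storage, 0, 3), View(storage, 1, 3))
print(storage)  # [1, 10, 20, 30]
```

Without the temporary, the second write would read a value already overwritten by the first.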

19 Jul 12:32

oleksandr-pavlyk

0.14.4

3794cbc

v0.14.4

This is a hot-fix for the 0.14.3 release.

Added

Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239

Changed

Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244

Fixed

Fixed handling of 0d arrays in dpctl.tensor.sum: #1238

19 Jul 12:31

oleksandr-pavlyk

0.14.3

81553f8

v0.14.3

Added

Added support of axis=None in dpctl.tensor.concat #1125
Added caching for dpctl.SyclDevice.filter_string property #1127
Added dpctl.tensor.isdtype from array API #1133
Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
Added dpctl.tensor.where from array API #1147
Include libtensor headers in dpctl installation layout #1185
Added new properties of dpctl.tensor.usm_ndarray object #1199
Added a list of unary and binary elementwise functions from array API:
- #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
- #1205: dpctl.tensor.sqrt
- #1209: implements out keyword argument
- #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
- #1214: dpctl.tensor.not_equal
- #1216: dpctl.tensor.exp, dpctl.tensor.sin
- #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
- #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
- #1221: dpctl.tensor.floor_divide
- #1235: dpctl.tensor.less
- #1237: in-place support for addition, multiplication and subtraction
Added dpctl.tensor.all and dpctl.tensor.any #1204
Added dpctl.tensor.sum #1210
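
As an illustration of the axis=None option added to concat in #1125: in the Array API specification, axis=None means every input is flattened before joining. The pure-Python sketch below mimics that on nested lists; it does not use dpctl, and flatten and concat are hypothetical helpers that only handle axis=0 and axis=None.

```python
def flatten(nested):
    """Flatten arbitrarily nested lists into a single flat list."""
    out = []
    for item in nested:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out


def concat(arrays, axis=0):
    if axis is None:
        # Flatten each input, then join end to end.
        return [v for arr in arrays for v in flatten(arr)]
    if axis == 0:
        # Join along the first dimension.
        return [row for arr in arrays for row in arr]
    raise NotImplementedError("sketch supports axis=0 and axis=None only")


a = [[1, 2], [3, 4]]
print(concat([a, [[5, 6]]], axis=0))   # [[1, 2], [3, 4], [5, 6]]
print(concat([a, [5, 6]], axis=None))  # [1, 2, 3, 4, 5, 6]
```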

Changed

Updated examples of native Python extensions built using dpctl #1108
Used security flags to compile and link native extensions of dpctl #1109
Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
Consolidated multiple USM temporaries life-time management host_tasks to improve test suite stability #1111
MAINT: Improved cmake target dependency tracking #1112
MAINT: Improved docstrings for existing dpctl.tensor functions #1123
Changed default value of mode keyword in dpctl.tensor.take and dpctl.tensor.put from "clip" to "wrap" #1132
Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
Require DPC++ RT 2023.1 to build and run dpctl #1195
Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
Transition to conda-forge ecosystem #1213
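
The change of the default mode in #1132 affects how out-of-range indices are resolved. The sketch below assumes the semantics given in the dpctl.tensor.take documentation, where "wrap" clamps indices to -n <= i < n and then wraps negative ones, and "clip" clamps to 0 <= i < n. That reading is an assumption not stated in this release note, and the helpers are hypothetical pure-Python stand-ins.

```python
def resolve_index(i, n, mode="wrap"):
    """Map a possibly out-of-range index onto 0..n-1 (assumed semantics)."""
    if mode == "wrap":
        i = max(-n, min(i, n - 1))    # clamp to [-n, n)
        return i + n if i < 0 else i  # negative indices count from the end
    if mode == "clip":
        return max(0, min(i, n - 1))  # clamp to [0, n)
    raise ValueError("mode must be 'wrap' or 'clip'")


def take(values, indices, mode="wrap"):
    n = len(values)
    return [values[resolve_index(i, n, mode)] for i in indices]


data = [10, 20, 30]
print(take(data, [-1, 5, -7], mode="wrap"))  # [30, 30, 10]
print(take(data, [-1, 5, -7], mode="clip"))  # [10, 30, 10]
```

Under these assumed semantics the two modes differ only for negative indices, which "wrap" treats as offsets from the end.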

Fixed

Fix to add empty values check for dpctl.tensor.place #1105, #1106
Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
MAINT: Fixed build break with newer GCC and SYCLOS #1118
Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136

28 Mar 13:54

oleksandr-pavlyk

0.14.2

6dc5479

v0.14.2

Added

Added dpctl.SyclDevice.partition_max_sub_devices property #1005
Added dpctl.program.SyclKernel.max_sub_group_size property #1028
Implemented printing of usm_ndarray #1013, #1043, #1060
Implemented support for advanced indexing for dpctl.tensor.usm_ndarray #1095, #1097, #1099, #1101
Implemented support for platform listing in dpctl.__main__ script #1014
Improved performance of dpctl.tensor.asnumpy #1026
Added UsmNDArray_Make* C-API for constructing dpctl.tensor.usm_ndarray from native allocations #1050, #1067
Added support for dpctl.SyclDevice.native_vector_width_* device descriptors #1075
Added dpctl::tensor::usm_ndarray::get_shape_vector and dpctl::tensor::usm_ndarray::get_strides_vector methods #1090

Changed

Removed dpctl.select_host_device, dpctl.has_host_device, dpctl.SyclDevice.is_host, and dpctl.SyclDevice.has_aspect_host since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028
usm_ndarray is made writable by default #1012, and the writable flag is now checked by __setitem__.
Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016
Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040
Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066
The dpctl.tensor.Device class supports print_device_info method #1029, equality comparison, and hashing #1048
Updated version of pybind11 used to 2.10.2 #1031
Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054
Changed return type of DPCTLUSM_GetPointerType function in SyclInterface library #1061, #1065
Updated supported version of DLPack to 0.8 #1073
Implemented queue cache per context/device pair and deployed it in dpctl.memory, dpctl.tensor.from_dlpack and dpctl.tensor array creation functions #1076, #1079
Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093
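
The queue cache per context/device pair mentioned above (#1076, #1079) can be pictured as a dictionary keyed by the pair. The sketch below is a generic illustration under that assumption; QueueCache and its factory argument are hypothetical and do not reflect dpctl internals. Keys must be hashable, which is where equality comparison and hashing of device objects become relevant.

```python
class QueueCache:
    """Hypothetical cache returning one queue object per (context, device) pair."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds a queue for a pair
        self._cache = {}

    def get(self, context, device):
        key = (context, device)
        if key not in self._cache:
            # Build the queue once; later lookups reuse it.
            self._cache[key] = self._factory(context, device)
        return self._cache[key]


cache = QueueCache(lambda ctx, dev: object())
q1 = cache.get("ctx0", "gpu:0")
q2 = cache.get("ctx0", "gpu:0")
q3 = cache.get("ctx0", "cpu:0")
print(q1 is q2, q1 is q3)  # True False
```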

Fixed

Fixed error gh-998 in forming Python exception, #999.
A small memory leak fixed, #1000
Improved dtype support in dpctl.tensor.full, PR #1002
Added missing header file #1008 fixing gh-1007
Fixed a typo in device-specific dtype mapping #1015
Fixed default device integer type to align with NumPy's behavior on Windows #1017
Fixed unexpected overflow in dpctl.tensor.linspace when one of the parameters is the largest floating point value #1034
Constructors dpctl.tensor.empty, dpctl.tensor.zeros, and the usm_ndarray constructor itself no longer allow creating arrays with data types not supported by the targeted device #1042
Fixed parameter validation in dpctl.SyclQueue constructor #1052
Fixed usm_type of the resulting array in dpctl.tensor.tril and dpctl.tensor.triu functions #1062
Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
Fixed issue with empty argument of dpctl.tensor.meshgrid function #1080
Fixed linking problem on Windows enabling dpctl to be functional on Windows for devices not supporting some data types #1083

Full Changelog: 0.14.0...0.14.2

19 Nov 05:10

oleksandr-pavlyk

0.14.0

21a6931

v0.14.0

[0.14.0] - 11/18/2022

Added

Implemented dpctl.tensor.linspace function from array-API #875.
Implemented dpctl.tensor.eye function from array-API #896.
Implemented dpctl.tensor.tril and dpctl.tensor.triu functions from array-API #910.
Added data type objects to dpctl.tensor namespace, finfo, iinfo, can_cast, and result_type functions #913.
Implemented dpctl.tensor.meshgrid creation function from array-API #920.
Implemented convenience class to represent output of dpctl.tensor.usm_ndarray.flags property #921.
Added new device attributes and kernel's device-specific attributes #894.
Added dpctl.utils.onetrace_enabled context manager for targeted trace collection #903.
Added support for stream keyword in __dlpack__ method, enabling support for sending usm_ndarray using mpi4py #906.
dpctl.tensor.asarray can now transition data between incompatible devices, #951.
Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
Added C-API to dpctl.program.SyclKernel and dpctl.program.SyclProgram. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
Added experimental support for sharing data allocated on sub-devices via dlpack #984.
Added dpctl.SyclDevice.sub_group_sizes property to retrieve supported sizes of sub-group by the device #985.
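
As a reminder of what tril and triu (#910) return, here is a pure-Python sketch on nested lists. It follows the Array API definition, in which k selects the diagonal offset, and uses hypothetical helpers rather than dpctl itself.

```python
def tril(matrix, k=0):
    """Keep elements on or below the k-th diagonal; zero the rest."""
    return [[v if j <= i + k else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def triu(matrix, k=0):
    """Keep elements on or above the k-th diagonal; zero the rest."""
    return [[v if j >= i + k else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(tril(m))       # [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
print(triu(m, k=1))  # [[0, 2, 3], [0, 0, 6], [0, 0, 0]]
```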

Changed

Improved queue compatibility testing in dpctl.tensor's implementation module #900.
Added automatic measurement of array-API conformance test suite in CI #901.
Improved performance of array metadata transfer from host to device #912.
Used os.add_dll_directory on Windows to ensure that DPCTLSyclInterface library can be found #918.
Refactored dpctl.tensor's implementation module #941 to streamline adding new functionality. Streamlined dpctl::tensor::usm_ndarray class implementation.
Added debugging messaging in case when DPCTLDynamicLib::getSymbol encounters errors #956.
Updated code base according to changes in DPC++ compiler #952, #957, #958.
Changed dpctl to use pybind11 2.10.1 #967.
Extended dpctl.tensor.full to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.
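
The os.add_dll_directory call mentioned in #918 exists only on Windows (Python 3.8 and later), so code that uses it is typically guarded. The sketch below shows one such guard; register_dll_directory is a hypothetical helper, and the directory that dpctl actually registers is not stated in this release note.

```python
import os
import sys


def register_dll_directory(path):
    """Add `path` to the Windows DLL search path; do nothing elsewhere.

    Returns the handle from os.add_dll_directory on Windows, otherwise None.
    """
    if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
        # The function requires an absolute path to an existing directory.
        return os.add_dll_directory(os.path.abspath(path))
    return None


handle = register_dll_directory(os.getcwd())
print("registered" if handle is not None else "not on Windows, nothing to do")
```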

Fixed

Improved SyclDevice constructor error message #893.
Fixed issue gh-890 about dpctl.tensor.reshape function #915.
Fixed unexpected UnboundLocalError exception in #922.
Fixed bugs in dpctl.tensor.arange in #945.
Fixed issue with type inferencing in dpctl.tensor.asarray in #949.
Added missing docstrings for dpctl.SyclDevice properties #964.
