Releases: spcl/dace
v1.0.0rc1
We are happy to announce the first release candidate of DaCe version 1.0!
This version uses the SDFG intermediate representation as published in the original Stateful Dataflow Multigraphs paper, which has been stable for quite some time.
On a fundamental level, this release is no different from a minor version release (this version could have been DaCe 0.17). However, with this release we would like to emphasize stability rather than new features.
If you are using DaCe and have a critical or blocking issue that makes it unstable, please create an issue and refer to it in the release discussion, so that we can add it to our release plan. Thank you for using DaCe!
Release Notes
New features:
- Add GUIDs to SDFG elements and SDFG diff support (by @phschaad)
- Added can_be_applied_to() to Transformation API (by @philip-paul-mueller)
- Support SymPy 1.13 (by @BenWeber42)
- New WCRToAugAssign transformation (by @alexnick83)
- (Experimental) Control flow (loop, conditional, named) regions (by @phschaad and @luca-patrignani). Stay tuned for more updates in the next development releases!
Bugfixes:
- Inter-state edge assignment race condition test in validation (by @tbennun)
- Improve memlet label and string initialization (by @tbennun, @philip-paul-mueller)
- Minor updates to documentation and internal APIs (by @tbennun, @phschaad, @philip-paul-mueller, @BenWeber42)
- Minor fixes to the following transformations and passes: RedundantArray, TransientReuse, DetectLoop, ConstantPropagation, PruneConnectors (by @philip-paul-mueller, @tbennun, @luigifusco)
- Minor frontend improvements (by @FlorianDeconinck, @BenWeber42)
- Minor improvements to the code generator (by @iBug, @philip-paul-mueller)
See Full Changelog: v0.16.1...v1.0.0rc1
New Contributors
- @iBug made their first contribution in #1630
- @luigifusco made their first contribution in #1635
v0.16.1
What's Changed
The main purpose of this release is to require NumPy < 2 for DaCe, since NumPy 2.0.0 contains breaking changes which aren't compatible with DaCe currently.
NumPy 2.0.0 was recently released: https://numpy.org/news/#numpy-200-released
The release comes with documented breaking changes. Unfortunately, DaCe is currently not compatible with these changes. This also affects the recent 0.16 release of DaCe. Hence, we adjust our dependency requirements to use NumPy < 2 as a temporary work-around in this PR:
Fix numpy version to < 2.0 by @phschaad in #1601
Long term, we are tracking adding support for NumPy 2 in DaCe in this issue: #1602
Fix constant propagation failing due to invalid topological sort by @phschaad in #1589
This changeset has also landed in DaCe's development branch earlier. It fixes an issue where the ConstantPropagation pass can fail for certain graph structures.
Full Changelog: v0.16...v0.16.1
v0.16
What's Changed
CI/CD pipeline for NOAA & NASA weather and climate model by @FlorianDeconinck & @BenWeber42 in #1460, #1478 & #1575
Our collaborators NOAA & NASA have successfully used DaCe as an optimization framework and back-end for some of the components of their climate and weather model. Particularly, the FV3 dycore and GFS physics parametrization have been ported to a combination of GT4Py Python DSL and DaCe. DaCe is used within their stack as a stencil backend and as a full-program optimizer integrating stencils and glue-code together.
With this CI/CD pipeline, we run various checks for those components on every change to DaCe. This is an important step for DaCe to ensure stability for real-world applications that utilize DaCe. We are very grateful for this contribution and the collaboration with NOAA & NASA.
Changed default of serialize_all_fields to False by @BenWeber42 in #1564
This feature was already implemented in the previous 0.15.1 release in #1452, but not enabled by default. In this release, we are changing the default so that only fields with non-default values are serialized. This generally leads to a reduction in file size for SDFGs.
Since each DaCe version stores the default values of each field, it is still possible to recover these missing values. Default values should rarely change across different DaCe versions. Nevertheless, we want to caution users & developers when using SDFG files with different DaCe versions.
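The trade-off described above can be illustrated with a minimal sketch. It is not DaCe's serializer: the DEFAULTS table, the field names, and both helper functions are hypothetical and only show why omitted fields stay recoverable as long as the defaults do not change.

```python
# Hypothetical defaults table; DaCe keeps its own defaults per property.
DEFAULTS = {"transient": False, "alignment": 0, "storage": "Default"}


def serialize(fields, serialize_all_fields=False):
    """Return the fields to store, dropping default-valued ones unless all are requested."""
    if serialize_all_fields:
        return dict(fields)
    return {k: v for k, v in fields.items() if v != DEFAULTS[k]}


def deserialize(saved):
    """Rebuild the full field set by filling omitted fields from DEFAULTS."""
    return {**DEFAULTS, **saved}


fields = {"transient": True, "alignment": 0, "storage": "Default"}
compact = serialize(fields)
print(compact)  # {'transient': True}
restored = deserialize(compact)
```

If a default in the table changed between the version that wrote the file and the version that reads it, deserialize would silently produce a different value, which is the cross-version caveat mentioned above.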
Analysis passes for access range analysis by @tbennun in #1484
Adds two analysis passes to help with analyzing data access sets: access ranges and Reference sources. To enable constructing sets of memlets, this PR also reintroduces data descriptor names to memlet hashes.
Reference-to-View pass and comprehensive reference test suite by @tbennun in #1485
Implements a reference-to-view pass (converting references to views if they are only set to one particular subset). Also improves the simplify pipeline in the presence of Reference data descriptors and adds multiple tests that use references.
Ndarray strides by @alexnick83 in #1506
The PR adds support for custom strides to dace.ndarray. Furthermore, the stride unit is number of elements, in contrast to NumPy/CuPy, where it is number of bytes. Custom strides are not supported for numpy.ndarray and cupy.ndarray.
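The difference between the two stride units can be shown in plain Python. This is only an illustrative sketch for C-contiguous layouts; the two helper functions are hypothetical and are not part of DaCe or NumPy.

```python
def element_strides(shape):
    """Strides of a C-contiguous array, counted in elements."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def byte_strides(shape, itemsize):
    """The same strides counted in bytes, the unit NumPy and CuPy report."""
    return tuple(s * itemsize for s in element_strides(shape))


shape = (4, 3)
print(element_strides(shape))  # (3, 1)
print(byte_strides(shape, 8))  # (24, 8) for 8-byte float64 elements
```

Converting between the two units is a multiplication or division by the element size, so a byte stride taken from a NumPy array has to be divided by its itemsize before it is used as an element stride.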
Structure Support to NestedSDFGs and Python Frontend by @alexnick83 in #1366
Adds basic support for nested data (Structures) to the Python frontend. It also resolves issues with the use of Structures in nested SDFG scopes (mostly code generation).
Generalize StructArrays to ContainerArrays and refactor View class structure by @tbennun in #1504
This PR enables the use of an array data descriptor that contains a nested data descriptor (e.g., ContainerArray of Arrays). Its contents can then be viewed normally with View or StructureView.
With this, concepts such as jagged arrays are natively supported in DaCe (see test for example).
Also adds support for using ctypes pointers and arrays as arguments to SDFGs.
This PR also refactors the notion of views to a View interface, and provides views to arrays, structures, and container arrays. It also adds a syntactic-sugar/helper API to define a view of an existing data descriptor.
Add support for distributed compilation in DaceProgram by @kotsaloscv in #1551 & #1555
Adds configurable support for distributed compilation (MPI) to the Python front-end (via mpi4py). Distributed compilation can be enabled with the distributed_compilation parameter in the dace.program decorator.
Fixes and other improvements:
- Remove unused deps by @jack-mcivor in #1459
- Small fix for debuginfo that can be None by @kotsaloscv in #1469
- Make dynamic map range docs more explicit by @tbennun in #1474
- Added nan to the DaCe math namespace by @philip-paul-mueller in #1437
- Fix for floordiv on GPU target by @edopao in #1471
- Add merge_group to CI for merge queues by @tbennun in #1482
- Fix SymPy dependency (again) by @tbennun in #1483
- Fix for CUDA codegen by @edopao in #1442
- Complete coverage for reference-to-view pass by @tbennun in #1488
- CMakeLists.txt Improvements for CUDA by @kylosus in #1337
- Faster Call for CompiledSDFG by @philip-paul-mueller in #1467
- Evaluate dtype_to_typeclass at use time by @tbennun in #1494
- Fix redefinition of interstate edge type in code generator by @tbennun in #1490
- CuPy fixes and special cases for HIP by @tbennun in #1492
- CI Update by @tim0s in #1502
- FPGA CI Update by @tim0s in #1508
- Bump jinja2 from 3.1.2 to 3.1.3 by @dependabot in #1503
- Jupyter fix by @phschaad in #1489
- Modernize HIP CMake commands, fix corner cases by @tbennun in #1518
- Remove the long-deprecated symbol.get/set methods by @tbennun in #1523
- Support output indirection in numpy frontend by @tbennun in #1509
- Fix for const references by @alexnick83 in #1522
- DeadDataFlowElimination will add type hint when removing a connector by @luca-patrignani in #1499
- Fixed an issue in the Memlet duplication verification. by @philip-paul-mueller in #1526
- Refactor SDFG List to CFG List by @phschaad in #1511
- Dependency Edge Hotfix by @Berke-Ates in #1513
- Remove Property.from_string and Property.to_string by @luca-patrignani in #1529
- Fixed the {in,out}_edges() function of the DiGraph class. by @philip-paul-mueller in #1527
- Fixes for structures nested in (nested) struct-arrays by @alexnick83 in #1534
- Updated and fixed the MapExpansion transformation. by @philip-paul-mueller in #1532
- Updated and fixed the MapDimShuffle transformation. by @philip-paul-mueller in #1531
- Use State Fissioning to Generalize Transformations by @lukastruemper in #1462
- Fixed edge consolidation by @philip-paul-mueller in #1546
- Fix Profiler + Minor improvements by @JanKleine in #1548
- Add dtype for numpy.uintp which is compatible with C uintptr_t by @kotsaloscv in #1544
- Fix bug in map_fusion transformation by @edopao in #1553
- Updated the add_state_{after, before}() function. by @philip-paul-mueller in #1556
- Bump idna from 3.4 to 3.7 by @dependabot in #1557
- Fix infinite loops in memlet path when a scope cycle is added by @tbennun in #1559
- Adds support for ArrayView to the Python Frontend by @alexnick83 in #1565
- It is now possible to suppress output in view() by @philip-paul-mueller in #1566
- Bump jinja2 from 3.1.3 to 3.1.4 by @dependabot in #1569
- Correction in the docstring of the SDFG class's init method by @alexnick83 in #1571
- Fix Subscript literal evaluation for List by @FlorianDeconinck in #1570
- SDFG.save() now performs tilde expansion. by @philip-paul-mueller in #1578
- Control Flow Block Constraints by @phschaad in #1476
- Updated SDFV and Corresponding HTML Template by @phschaad in #1580
- Changed Xilinx C++11 flag to C++14 by @BenWeber42 in #1585
- Made dace::math::pow forward to std::pow more generic by @Berke-Ates @philip-paul-mueller @phschaad @BenWeber42 in #1580
New Contributors
- @jack-mcivor made their first contribution in #1459
- @kylosus made their first contribution in #1337
- @luca-patrignani made their first contribution in #1499
Full Changelog: v0.15.1...v0.16
v0.15.1
What's Changed
Highlights
- Option for utilizing GPU global memory by @alexnick83 in #1405
- Add tensor storage format abstraction by @JanKleine in #1392
- Hierarchical Control Flow / Control Flow Regions by @phschaad in #1404
- GPU code generation: User-specified block/thread/warp location by @tbennun in #1358
- Implement loop-based Fortran intrinsics by @mcopik in #1394
- Change strides move assignment outside if by @Sajohn-CH in #1402
- Numpy fill accepts also variables by @philip-paul-mueller in #1420
- Implement writeset underapproximation by @matteonu in #1425
- Loop Regions by @phschaad in #1407
- Compress the SDFG generated when failing/invalid for larger codebase by @FlorianDeconinck in #1456
- Do not serialize non-default fields by default by @tbennun in #1452
Fixes and other improvements:
- replace |& which is not widely supported by @tim0s in #1399
- RTL codegen "line" error by @carljohnsen in #1403
- Bump urllib3 from 2.0.6 to 2.0.7 by @dependabot in #1400
- Bugfixes and extended testing for Fortran SUM by @mcopik in #1390
- Remove erroneous file creation in test by @JanKleine in #1411
- Fix for VS Code debug console: view opens sdfg in VS Code and not in browser by @kotsaloscv in #1419
- Bump werkzeug from 2.3.5 to 3.0.1 by @dependabot in #1409
- AugAssignToWCR: Support for more cases and increased test coverage by @lukastruemper in #1359
- Implement Subsetlist and covers_precise by @matteonu in #1412
- OTFMapFusion: Bugfix for tasklets with None connectors by @lukastruemper in #1415
- Better mangling of the state struct in the code generator by @philip-paul-mueller in #1413
- Trivial map elimination init by @Sajohn-CH in #1353
- Fixed Improper Method Call: Replaced mktemp by @fazledyn-or in #1428
- Symbol specialization in auto_optimizer() never took effect. by @philip-paul-mueller in #1410
- Issue a warning when to_sdfg() ignores the auto_optimize flag (Issue #1380). by @philip-paul-mueller in #1395
- Fix schedule tree conversion for use of arrays in conditions by @tbennun in #1440
- Fixes for TaskletFusion, AugAssignToWCR and MapExpansion by @lukastruemper in #1432
- AugAssignToWCR: Minor fix for node not found error by @lukastruemper in #1447
- OTFMapFusion: Minor bug fixes by @lukastruemper in #1448
- Fix three issues related to deepcopying elements by @tbennun in #1446
- Fix CUDA high-dimensional test by @tbennun in #1441
- SDFG.arg_names was not a member but a class variable. by @philip-paul-mueller in #1457
- PruneConnectors: Fission into separate states before pruning by @lukastruemper in #1451
- In-out connector's global source when connector becomes out-only at outer SDFG scopes. by @alexnick83 in #1463
- Fix two regressions in v0.15 by @tbennun in #1465
- Fix codegen with data access on inter-state edge by @edopao in #1434
New Contributors
- @kotsaloscv made their first contribution in #1419
- @matteonu made their first contribution in #1412
- @philip-paul-mueller made their first contribution in #1413
- @fazledyn-or made their first contribution in #1428
Full Changelog: v0.15...v0.15.1rc1
v0.15
What's Changed
Work-Depth / Average Parallelism Analysis by @hodelcl in #1363 and #1327
A new analysis engine allows SDFGs to be statically analyzed for work and depth / average parallelism. The analysis accepts a series of assumptions about symbolic program parameters, which can help simplify and improve the results. The following example shows how to use it:
from dace.sdfg.work_depth_analysis import work_depth
# A dictionary mapping each SDFG element to a tuple (work, depth)
work_depth_map = {}
# Assumptions about symbolic parameters
assumptions = ['N>5', 'M<200', 'K>N']
work_depth.analyze_sdfg(mysdfg, work_depth_map, work_depth.get_tasklet_work_depth, assumptions)
# A dictionary mapping each SDFG element to its average parallelism
average_parallelism_map = {}
work_depth.analyze_sdfg(mysdfg, average_parallelism_map, work_depth.get_tasklet_avg_par, assumptions)
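To make the three quantities concrete, here is a small sketch on a hand-written task graph. It does not use DaCe or its analysis engine: the graph, the costs, and the finish helper are hypothetical. It assumes the usual definitions, where work is the total cost, depth is the cost of the longest dependency chain, and average parallelism is work divided by depth.

```python
from functools import lru_cache

# Hypothetical task graph: node -> (cost, predecessors)
graph = {
    "a": (1, []),
    "b": (2, ["a"]),
    "c": (3, ["a"]),
    "d": (1, ["b", "c"]),
}

# Work: total cost of all nodes
work = sum(cost for cost, _ in graph.values())


@lru_cache(maxsize=None)
def finish(node):
    """Cost of the longest dependency chain that ends at this node."""
    cost, preds = graph[node]
    return cost + max((finish(p) for p in preds), default=0)


# Depth: the longest chain in the whole graph (the critical path)
depth = max(finish(n) for n in graph)
average_parallelism = work / depth
print(work, depth, average_parallelism)
```

Here the critical path is a, c, d, so adding more than two processing units would not shorten the execution of this graph.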
Symbol parameter reduction in generated code (#1338, #1344)
To improve our integration with external codes, we limit the symbolic parameters generated by DaCe to only the used symbols. Take the following code for example:
@dace
def addone(a: dace.float64[N]):
    for i in dace.map[0:10]:
        a[i] += 1
Since the internal code does not actually need N to process the array, it will not appear in the generated code. Before this release the signature of the generated code would be:
DACE_EXPORTED void __program_addone(addone_t *__state, double * __restrict__ a, int N);
After this release it is:
DACE_EXPORTED void __program_addone(addone_t *__state, double * __restrict__ a);
Note that this is a major, breaking change: users who manually interact with the generated .so files must adapt their calls to the new signatures.
Externally-allocated memory (workspace) support (#1294)
A new allocation lifetime, dace.AllocationLifetime.External, has been introduced into DaCe. Now you can use your DaCe code with external memory allocators (such as PyTorch) and ask DaCe for: (a) how much transient memory it will need; and (b) to use a specific pre-allocated pointer. Example:
@dace
def some_workspace(a: dace.float64[N]):
    workspace = dace.ndarray([N], dace.float64, lifetime=dace.AllocationLifetime.External)
    workspace[:] = a
    workspace += 1
    a[:] = workspace

csdfg = some_workspace.to_sdfg().compile()
sizes = csdfg.get_workspace_sizes()  # Returns {dace.StorageType.CPU_Heap: N*8}
wsp = ...  # Allocate externally
csdfg.set_workspace(dace.StorageType.CPU_Heap, wsp)
The same interface is available in the generated code:
size_t __dace_get_external_memory_size_CPU_Heap(programname_t *__state, int N);
void __dace_set_external_memory_CPU_Heap(programname_t *__state, char *ptr, int N);
// or GPU_Global...
Schedule Trees (EXPERIMENTAL, #1145)
An experimental feature that allows you to analyze your SDFGs in a schedule-oriented format. It takes in SDFGs (even after applying transformations) and outputs a tree of elements that can be printed out in a Python-like syntax. For example:
@dace.program
def matmul(A: dace.float32[10, 10], B: dace.float32[10, 10], C: dace.float32[10, 10]):
    for i in range(10):
        for j in dace.map[0:10]:
            atile = dace.define_local([10], dace.float32)
            atile[:] = A[i]
            for k in range(10):
                with dace.tasklet:
                    # ...

sdfg = matmul.to_sdfg()
from dace.sdfg.analysis.schedule_tree.sdfg_to_tree import as_schedule_tree
stree = as_schedule_tree(sdfg)
print(stree.as_string())
will print:
for i = 0; (i < 10); i = i + 1:
  map j in [0:10]:
    atile = copy A[i, 0:10]
    for k = 0; (k < 10); k = (k + 1):
      C[i, j] = tasklet(atile[k], B(10) [k, j], C[i, j])
There are some new transformation classes and passes in dace.sdfg.analysis.schedule_tree.passes, for example, to remove empty control flow scopes:
class RemoveEmptyScopes(tn.ScheduleNodeTransformer):
    def visit_scope(self, node: tn.ScheduleTreeScope):
        if len(node.children) == 0:
            return None
        return self.generic_visit(node)
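The visitor pattern behind this pass can be tried without DaCe on a toy tree. The sketch below is an assumption-laden stand-in: ToyNode, ToyTransformer, and ToyRemoveEmptyScopes are hypothetical classes that only mimic the shape of the transformer above (returning None from a visit method drops the node), not the real schedule tree API.

```python
class ToyNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children  # None marks a leaf; a list marks a scope


class ToyTransformer:
    def visit(self, node):
        # Leaves are returned unchanged; scopes are dispatched to visit_scope
        if node.children is None:
            return node
        return self.visit_scope(node)

    def generic_visit(self, node):
        # Visit every child and drop the ones whose visit returned None
        visited = (self.visit(child) for child in node.children)
        node.children = [c for c in visited if c is not None]
        return node

    def visit_scope(self, node):
        return self.generic_visit(node)


class ToyRemoveEmptyScopes(ToyTransformer):
    def visit_scope(self, node):
        if len(node.children) == 0:
            return None
        return self.generic_visit(node)


tree = ToyNode("root", [ToyNode("empty_loop", []), ToyNode("map", [ToyNode("tasklet")])])
tree = ToyRemoveEmptyScopes().visit(tree)
print([child.name for child in tree.children])  # ['map']
```

As in the pass above, the emptiness check happens before the children are visited, so a scope that only becomes empty after its children are removed is kept by a single traversal.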
We hope you find new ways to analyze and optimize DaCe programs with this feature!
Other Major Changes
- Support for tensor linear algebra (transpose, dot products) by @alexnick83 in #1309
- (Experimental) support for nested data containers and structures by @alexnick83 in #1324
- (Experimental) basic support for mpi4py syntax by @alexnick83 and @Com1t in #1070 and #1288
- (Experimental) Added support for a subset of F77 and F90 language features by @acalotoiu and @mcopik #1275, #1293, #1349 and #1367
Minor Changes
- Support for Python 3.12 by @alexnick83 in #1386
- Support attributes in symbolic expressions by @tbennun in #1369
- GPU User Experience Improvements by @tbennun in #1283
- State Fusion Extension with happens before dependency edge by @acalotoiu in #1268
- Add CPU_Persistent map schedule (OpenMP parallel regions) by @tbennun in #1330
Fixes and Smaller Changes:
- Fix transient bug in test with array_equal of empty arrays by @tbennun in #1374
- Fixes GPUTransform bug when data are already in GPU memory by @alexnick83 in #1291
- Fixed erroneous parsing of data slices when the data are defined inside a nested scope by @alexnick83 in #1287
- Disable OpenMP sections by default by @tbennun in #1282
- Make SDFG.name a proper property by @phschaad in #1289
- Refactor and fix performance regression with GPU runtime checks by @tbennun in #1292
- Fixed RW dependency violation when accessing data attributes by @alexnick83 in #1296
- Externally-managed memory lifetime by @tbennun in #1294
- External interaction fixes by @tbennun in #1301
- Improvements to RefineNestedAccess by @alexnick83 and @Sajohn-CH in #1310
- Fixed erroneous parsing of while-loop conditions by @alexnick83 in #1313
- Improvements to MapFusion when the Map bodies contain NestedSDFGs by @alexnick83 in #1312
- Fixed erroneous code generation of indirected accesses by @alexnick83 in #1302
- RefineNestedAccess take indices into account when checking for missing free symbols by @Sajohn-CH in #1317
- Fixed SubgraphFusion erroneously removing/merging intermediate data nodes by @alexnick83 in #1307
- Fixed SDFG DFS traversal missing InterstateEdges by @alexnick83 in #1320
- Frontend now uses the AST nodes' context to infer read/write accesses by @alexnick83 in #1297
- Added capability for non-strict shape validation by @alexnick83 in #1321
- Fixes for persistent schedule and GPUPersistentFusion transformation by @tbennun in #1322
- Relax test for inter-state edges in default schedules by @tbennun in #1326
- Improvements to inference of an SDFGState's read and write sets by @Sajohn-CH in #1325 and #1329
- Fixed ArrayElimination pass trying to eliminate data that were already removed in #1314
- Bump certifi from 2023.5.7 to 2023.7.22 by @dependabot in #1332
- Fix some underlying issues with tensor core sample by @computablee in #1336
- Updated hlslib to support Xilinx Vitis >=2022.2 by @carljohnsen in #1340
- Docs: mention FPGA backend tested with Intel Quartus PRO by @TizianoDeMatteis in #1335
- Improved validation of NestedSDFG connectors by @alexnick83 in #1333
- Remove unused global data descriptor shapes from arguments by @tbennun in #1338
- Fixed Scalar data validation in NestedSDFGs by @alexnick83 in #1341
- Fix for None set properties by @tbennun in #1345
- Add Object to defined types in code generation and some documentation by @tbennun in #1343
- Fix symbolic parsing for ternary operators by @tbennun in #1346
- Fortran fix memlet indices by @Sajohn-CH in #1342
- Have memory type as argument for fpga auto interleave by @TizianoDeMatteis in #1352
- Eliminate extraneous branch-end gotos in code generation by @tbennun in #1355
- TaskletFusion: Fix additional edges in case of none-connectors by @lukastruemper in #1360
- Fix dynamic memlet propagation condition by @tbennun in #1364
- Configurable GPU thread/block index types, minor fixes to integer code generation and GPU runtimes by @tbennun in #1357
New Contributors
- @computablee made their first contribution in #1290
- @Com1t made their first contribution in #1288
- @mcopik made their first contribution in #1349
Full Changelog: v0.14.4...v0.15
DaCe 0.14.4
Minor release; adds support for Python 3.11.
DaCe 0.14.3
What's Changed
Scope Schedules
The schedule type of a scope (e.g., a Map) is now also determined by the surrounding storage. If the surrounding storage is ambiguous, DaCe will fail with a descriptive exception. This means that codes such as the one below:
@dace.program
def add(a: dace.float32[10, 10] @ dace.StorageType.GPU_Global,
        b: dace.float32[10, 10] @ dace.StorageType.GPU_Global):
    return a + b @ b
will now automatically run the + and @ operators on the GPU.
DaCe Profiler
Easier interface for profiling applications: dace.profile and dace.instrument can now be used within Python with a simple API:
with dace.profile(repetitions=100) as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print all execution times of the last called program (other_program)
print(profiler.times[-1])
Where instrumentation is applied can be controlled with filters in the form of strings and wildcards, or with a function:
with dace.instrument(dace.InstrumentationType.GPU_Events,
                     filter='*add??') as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print instrumentation report for last call
print(profiler.reports[-1])
With dace.builtin_hooks.instrument_data, the same technique can be applied to instrument data containers.
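To get a feel for what a wildcard filter such as '*add??' selects, the pattern can be tried with Python's standard fnmatch module. This is a sketch under assumptions: it presumes shell-style wildcard semantics (* for any run of characters, ? for exactly one), and the element names are made up. Which names DaCe matches the filter against, and with which matcher, is not stated here.

```python
from fnmatch import fnmatchcase

# Hypothetical element names
names = ["add_1", "vecadd42", "add", "madd_b", "multiply"]

# '*add??' requires "add" followed by exactly two more characters at the end
selected = [n for n in names if fnmatchcase(n, "*add??")]
print(selected)  # ['add_1', 'vecadd42', 'madd_b']
```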
Improved Data Instrumentation
Data container instrumentation can further now be used conditionally, allowing saving and restoring of data container contents only if certain conditions are met. In addition to this, data instrumentation now saves the SDFG's symbol values at the time of dumping data, allowing an entire SDFG's state / context to be restored from data reports.
Restricted SSA for Scalars and Symbols
Two new passes (ScalarFission and StrictSymbolSSA) allow fissioning of scalar data containers (or arrays of size 1) and symbols into separate containers and symbols respectively, based on the scope or reach of writes to them. This is a form of restricted SSA, which performs SSA wherever possible without introducing Phi-nodes. This change is made possible by a set of new analysis passes that provide the scope or reach of each write to scalars or symbols.
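The idea of giving every write its own name can be sketched for the simplest case, straight-line code, where no Phi-nodes are ever needed. This is not how the two passes are implemented: the statement encoding and the fission_scalars helper are hypothetical, and the sketch ignores control flow entirely.

```python
def fission_scalars(statements):
    """Rename each write to a fresh name; reads refer to the latest write."""
    version = {}
    result = []
    for target, operands in statements:
        # Reads of names never written in this snippet are left untouched
        reads = [f"{o}_{version[o]}" if o in version else o for o in operands]
        version[target] = version.get(target, -1) + 1
        result.append((f"{target}_{version[target]}", reads))
    return result


# Each statement is (written name, [read names]); "t" is reused for two values
program = [("t", ["a"]), ("b", ["t"]), ("t", ["c"]), ("d", ["t"])]
renamed = fission_scalars(program)
for target, reads in renamed:
    print(target, "<-", reads)
```

After renaming, the two uses of t no longer share a container, so the first pair of statements and the second pair become independent of each other.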
Extending Cutout Capabilities
SDFG Cutouts can now be taken from more than one state.
Additionally, taking cutouts that only access a subset of a data container (e.g., A[2:5] from a data container A of size N) results in the cutout receiving an "Alibi Node" to represent only that subset of the data (A_cutout[0:3] -> A[2:5], where A_cutout is of size 4). This allows cutouts to be significantly smaller and have a smaller memory footprint, simplifying debugging and localized optimization.
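The index arithmetic implied by this example can be written down in a few lines. The sketch assumes that the ranges are inclusive on both ends, which is what makes A[2:5] cover four elements here; make_alibi is a hypothetical helper, not a DaCe function.

```python
def make_alibi(subset_start, subset_end):
    """Return the size of the reduced container and a cutout-to-original index map."""
    size = subset_end - subset_start + 1  # inclusive range

    def to_original(i):
        if not 0 <= i < size:
            raise IndexError(i)
        return subset_start + i

    return size, to_original


size, to_original = make_alibi(2, 5)
print(size)  # 4
print([to_original(i) for i in range(size)])  # [2, 3, 4, 5]
```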
Finally, cutouts now contain an exact description of their input and output configuration. The input configuration is anything that may influence a cutout's behavior and may contain data before the cutout is executed in the context of the original SDFG. Similarly, the output configuration is anything that a cutout writes to, that may be read externally or may influence the behavior of the remaining SDFG. This allows isolating all side effects of changes to a particular cutout, allowing transformations to be tested and verified in isolation and simplifying debugging.
Bug Fixes, Compatibility Improvements, and Other Changes
- SymPy 1.12 Compatibility by @alexnick83 in #1256
- GPU Grid-Strided Tiling by @C-TC in #1249
- Fix MapInterchange for Maps with dynamic inputs by @alexnick83 in #1244
- Assortment of fixes for dynamic Maps on GPU (dynamic thread blocks) by @alexnick83 in #1246
- Tuning Compatibility Fixes by @lukastruemper in #1234
- Inline preprocessor command by @tbennun in #1242
- unsqueeze_memlet fixes by @alexnick83 in #1203
- Fix-intermediate-nodes by @alexnick83 in #1212
- Fix for LoopToMap when applied on multi-nested loops by @alexnick83 in #1207
- Fix-nested-sdfg-deepcopy by @alexnick83 in #1221
- Fix integer division in Python frontend by @tbennun in #1196
- Fix augmented assignment on scalar in condition by @tbennun in #1225
- Fix internal subscript access if already existed by @tbennun in #1228
- Fix atomic operation detection for exactly-overlapping ranges by @tbennun in #1230
- Fix-gpu-transform-copy-out by @alexnick83 in #1231
- Fix-interstate-free-symbols by @alexnick83 in #1238
- Fix nested access with nested symbol dependency by @alexnick83 in #1239
- Fix import in the transformations tutorial. by @lamyiowce in #1210
- LoopToMap detects shared transients by @alexnick83 in #1200
- Faster CI and reachability checks for codecov.io by @tbennun in #1213
- Map-fission-single-data-multi-connectors by @alexnick83 in #1216
- Add library path to HIP CMake by @tbennun in #1219
- BatchedMatMul: MKL gemm_batch support by @lukastruemper in #1181
Full Changelog: v0.14.2...v0.14.3
Please let us know if there are any regressions with this new release.
DaCe 0.14.2
What's Changed
- GPU instrumentation support with LIKWID by @lukastruemper
- New GPU expansion for the Reduce Library Node by @hodelcl
- CSRMM and CSRMV Library Nodes by @alexnick83, @lukastruemper, and @C-TC
- New transformations (Temporal Vectorization, HBM Transform) and other FPGA improvements by @carljohnsen, @jnice-81, @sarahtr, and @TizianoDeMatteis
- AMD GPU-related fixes and rocBLAS GEMM by @tbennun
Full Changelog: v0.14.1...v0.14.2
DaCe 0.14.1
This release of DaCe offers mostly stability fixes for the Python frontend, transformations, and callbacks.
Full Changelog: v0.14...v0.14.1
DaCe 0.14
What's Changed
This release brings forth a major change to how SDFGs are simplified in DaCe, using the Simplify pass pipeline. This both improves the performance of DaCe's transformations and introduces new types of simplification, such as dead dataflow elimination.
Please let us know if there are any regressions with this new release.
Features
- Breaking change: The experimental dace.constant type hint has now achieved stable status and was renamed to dace.compiletime
- Major change: Only modified configuration entries are now stored in ~/.dace.conf. The SDFG build folders still include the full configuration file. Old .dace.conf files are detected and migrated automatically.
- Detailed, multi-platform performance counters are now available via native LIKWID instrumentation (by @lukastruemper in #1063). To use, set .instrument to dace.InstrumentationType.LIKWID_Counters
- GPU Memory Pools are now supported through CUDA's mallocAsync API. To enable, set desc.pool = True on any GPU data descriptor.
- Map schedule and array storage types can now be annotated directly in Python code (by @orausch in #1088). For example:
import dace
from dace.dtypes import StorageType, ScheduleType

N = dace.symbol('N')

@dace
def add_on_gpu(a: dace.float64[N] @ StorageType.GPU_Global,
               b: dace.float64[N] @ StorageType.GPU_Global):
    # This map will become a GPU kernel
    for i in dace.map[0:N] @ ScheduleType.GPU_Device:
        b[i] = a[i] + 1.0
- Customizing GPU block dimension and OpenMP threading properties per map is now supported
- Optional arrays (i.e., arrays that can be None) can now be annotated in the code. The simplification pipeline also infers non-optional arrays from their use and can optimize code by eliminating branches. For example:
@dace
def optional(maybe: Optional[dace.float64[20]], always: dace.float64[20]):
    always += 1  # "always" is always used, so it will not be optional
    if maybe is None:  # This condition will stay in the code
        return 1
    if always is None:  # This condition will be eliminated in simplify
        return 2
    return 3
Minor changes
- Miscellaneous fixes to transformations and passes
- Fixes for string literal ("string") use in the Python frontend
- einsum is now a library node
- If CMake is already installed, it is now detected and will not be installed through pip
- Add kernel detection flag by @TizianoDeMatteis in #1061
- Better support for __array_interface__ objects by @gronerl in #1071
- Replacements look up base classes by @tbennun in #1080
Full Changelog: v0.13.3...v0.14