v1.17.0 RC2
Pre-release
1.17.0 RC2 (May 29, 2024)
Features:
UCP
- Improved the accuracy of rendezvous protocol performance estimation
- Enabled short protocol for non-host memory types on empty messages
- Improved the accuracy of performance estimation for empty messages by removing non-relevant overheads
- Added RMA_ZCOPY_MAX_SEG_SIZE configuration parameter to allow modifying segment size for RMA-ZCOPY protocols
- Added support for separate intra/inter-node rendezvous thresholds
- Added support for minimal fragment size in rendezvous protocol
- Added support for resetting request during send operation
- Added UCX_PROTO_OVERHEAD configuration variable to allow setting protocol overheads
- Improved performance for combined Active Message/RMA scenarios by separating them to different lanes
- Added support for device staging buffers in pipeline protocols
- Enabled on-demand paging for NVIDIA Grace platforms by default
RDMA CORE (IB, ROCE, etc.)
- Introduced the UCX_REVERSE_SL environment variable to configure reverse SL for DC transport. By default, the reverse SL takes the value of UCX_IB_SL.
- Added support for GID auto-detection in Floating LID based routing
- Added support for multithreading KSM registration of unaligned buffers
- Added IB_SEND_OVERHEAD and MM_[SEND|RECV]_OVERHEAD configuration variables
GPU (CUDA, ROCM)
- Added support for oneAPI Level-Zero library for Intel GPUs
UCS
- Added support for rcache dynamic region alignment
- Added dynamic bitmap data structure
- Added support for advanced key-value parsing for UCX configuration
- Added piecewise linear function data structure
- Added support for allocating dynamic arrays on stack
Tools
- Added support for device memory allocation in UCX perftest
- Added a script for squashing commits after PR approval
- Added support for DPU cross-gvmi daemon in UCX perftest
Java
- Added support for EP local socket address API in JUCX
Build
- Added address sanitizer support
- Added a helper shell script to run static checks
AZP
- Replaced Valgrind tests with address sanitizer tool
- Added Ubuntu 22.04 docker image testing
Configuration
- Added support for filtering configuration sections by platform type
- Added configuration file with section for Grace Hopper
Bugfixes:
UCP
- Fixed crash due to incorrect lane selection when active message is disabled
- Fixed RMA lane selection issue due to wrong bandwidth calculation
- Fixed rendezvous protocol information in protocol details table
- Fixed endpoint reconfiguration issue due to wrong bandwidth calculation
- Fixed Active Message handlers issue due to out of order registration
- Fixed registration of memh events for imported memory key
- Fixed sockaddr unreachable destination error handling
- Fixed uninitialized memory issue in new protocols infrastructure
- Fixed race condition when using strong fence by flushing all endpoints
- Fixed incorrect RMA message size on immediate completion with no datatype
- Fixed incorrect performance estimation due to fp8 pack/unpack issue
- Fixed remote access error when rcache memory is not registered with atomic access
- Fixed assertion failure when rcache fails during memh allocation
- Fixed atomic device selection issue
- Fixed worker interface deactivation while still in use by endpoints
RDMA CORE (IB, ROCE, etc.)
- Disabled device memory if atomics are not available
- Fixed indirect keys creation for MT registered memory
- Fixed KSM start address value when creating export key
- Fixed DCI pool index to support maximum of 16 pools
- Fixed atomic rkey issue when using imported memory
- Fixed crash due to unsupported SRQ capability
GPU (CUDA, ROCM)
- Removed unused environment variable RCACHE_ADDR_ALIGN from ROCm transport
- Fixed usage of CUDA device 0 when no context is active
- Removed error handling support from CUDA IPC transport
- Fixed allocation of unaligned CUDA memory
Shared Memory
- Fixed occasional crash when shm_unlink fails during interface initialization
UCS
- Fixed system device distance calculation for devices on different PCIe root
- Fixed support for large size arrays in ucs_array
- Fixed synchronization issue in rcache
Tests
- Fixed test failures when GPU is present but disabled
- Fixed Active Message hanging issue in ucp_client_server
- Fixed potential crash due to redundant munmap call in ucp mmap tests
- Fixed a crash when running CUDA gtest under Valgrind
- Fixed UD endpoint timeout issue under Valgrind
Java
- Fixed failures in Java tests by waiting for send requests to complete
- Fixed JVM segfault in Java tests when gdrcopy driver is not loaded
- Fixed Go build and Go test failures
Packaging
- Disabled Go bindings in Debian package