
v1.17.0 RC2

Pre-release
@shasson5 released this 03 Jun 08:10
9cec0d4

1.17.0 RC2 (May 29, 2024)

Features:

UCP

  • Improved the accuracy of rendezvous protocol performance estimation
  • Enabled the short protocol for empty messages with non-host memory types
  • Improved the accuracy of performance estimation for empty messages by removing non-relevant overheads
  • Added RMA_ZCOPY_MAX_SEG_SIZE configuration parameter to allow modifying the segment size for RMA-ZCOPY protocols
  • Added support for separate intra-node and inter-node rendezvous thresholds
  • Added support for a minimum fragment size in the rendezvous protocol
  • Added support for resetting a request during a send operation
  • Added UCX_PROTO_OVERHEAD configuration variable to allow setting protocol overheads
  • Improved performance for combined Active Message/RMA scenarios by separating them onto different lanes
  • Added support for device staging buffers in pipeline protocols
  • Enabled on-demand paging by default on NVIDIA Grace platforms

RDMA CORE (IB, ROCE, etc.)

  • Introduced the UCX_REVERSE_SL environment variable to configure the reverse SL for the DC transport. By default, it uses the value of UCX_IB_SL.
  • Added support for GID auto-detection in Floating LID based routing
  • Added support for multithreaded KSM registration of unaligned buffers
  • Added IB_SEND_OVERHEAD and MM_[SEND|RECV]_OVERHEAD configuration variables

GPU (CUDA, ROCM)

  • Added support for the oneAPI Level Zero library for Intel GPUs

UCS

  • Added support for rcache dynamic region alignment
  • Added a dynamic bitmap data structure
  • Added support for advanced key-value parsing for UCX configuration
  • Added a piecewise linear function data structure
  • Added support for allocating dynamic arrays on stack
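The UCS entries above name a piecewise linear function data structure without describing it. As a rough illustration of the general idea only, here is a minimal C sketch: a function built from sorted segments, each of the form `c + m * x` and valid from its `start` point up to the next segment's start. The type and function names (`pwl_segment_t`, `pwl_func_t`, `pwl_apply`) and the segment layout are hypothetical and are not the UCS API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment: y = c + m * x, valid for x in [start, next start). */
typedef struct {
    double start; /* first x value covered by this segment */
    double c;     /* constant term, e.g. a fixed overhead */
    double m;     /* slope, e.g. a per-byte cost */
} pwl_segment_t;

/* Hypothetical piecewise linear function; segments sorted by ascending start. */
typedef struct {
    const pwl_segment_t *segs;
    size_t               count;
} pwl_func_t;

/* Evaluate f(x) using the last segment whose start is <= x. */
static double pwl_apply(const pwl_func_t *f, double x)
{
    size_t i;

    assert(f->count > 0);
    assert(x >= f->segs[0].start); /* x must lie inside the covered range */

    /* Advance while the next segment still starts at or before x. */
    for (i = 1; (i < f->count) && (f->segs[i].start <= x); ++i) {
    }

    return f->segs[i - 1].c + (f->segs[i - 1].m * x);
}
```

For example, with segments `{0.0, 1.0, 0.5}` and `{100.0, 26.0, 0.25}`, the function evaluates to 6.0 at x = 10 and 76.0 at x = 200; the second segment's constant is chosen so that both pieces give 51.0 at x = 100, keeping the function continuous. A linear scan is used for brevity; a binary search over `start` would suit functions with many segments.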

Tools

  • Added support for device memory allocation in UCX perftest
  • Added a script for squashing commits after PR approval
  • Added support for DPU cross-gvmi daemon in UCX perftest

Java

  • Added support for EP local socket address API in JUCX

Build

  • Added address sanitizer support
  • Added a helper shell script to run static checks

AZP

  • Replaced Valgrind tests with address sanitizer testing
  • Added Ubuntu 22.04 docker image testing

Configuration

  • Added support for filtering configuration sections by platform type
  • Added a configuration file with a section for Grace Hopper

Bugfixes:

UCP

  • Fixed crash due to incorrect lane selection when Active Messages are disabled
  • Fixed RMA lane selection issue due to wrong bandwidth calculation
  • Fixed rendezvous protocol information in protocol details table
  • Fixed endpoint reconfiguration issue due to wrong bandwidth calculation
  • Fixed Active Message handlers issue due to out-of-order registration
  • Fixed registration of memh events for imported memory key
  • Fixed sockaddr unreachable destination error handling
  • Fixed uninitialized memory issue in new protocols infrastructure
  • Fixed race condition when using strong fence by flushing all endpoints
  • Fixed incorrect RMA message size on immediate completion with no datatype
  • Fixed incorrect performance estimation due to fp8 pack/unpack issue
  • Fixed remote access error when rcache memory is not registered with atomic access
  • Fixed assertion failure when rcache fails during memh allocation
  • Fixed atomic device selection issue
  • Fixed worker interface deactivation while still in use by endpoints

RDMA CORE (IB, ROCE, etc.)

  • Disabled device memory if atomics are not available
  • Fixed indirect key creation for MT-registered memory
  • Fixed KSM start address value when creating export key
  • Fixed DCI pool index to support maximum of 16 pools
  • Fixed atomic rkey issue when using imported memory
  • Fixed crash due to unsupported SRQ capability

GPU (CUDA, ROCM)

  • Removed unused environment variable RCACHE_ADDR_ALIGN from ROCm transport
  • Fixed usage of CUDA device 0 when no context is active
  • Removed error handling support from CUDA IPC transport
  • Fixed allocation of unaligned CUDA memory

Shared Memory

  • Fixed occasional crash when shm_unlink fails during interface initialization

UCS

  • Fixed system device distance calculation for devices on different PCIe root complexes
  • Fixed support for large arrays in ucs_array
  • Fixed synchronization issue in rcache

Tests

  • Fixed test failures when GPU is present but disabled
  • Fixed Active Message hanging issue in ucp_client_server
  • Fixed potential crash due to redundant munmap call in ucp mmap tests
  • Fixed a crash when running CUDA gtest under Valgrind
  • Fixed UD endpoint timeout issue under Valgrind

Java

  • Fixed failures in Java tests by waiting for send requests to complete
  • Fixed JVM segfault in Java tests when gdrcopy driver is not loaded
  • Fixed Go build and Go test failures

Packaging

  • Disabled Go bindings in Debian package