UCC v1.2.0
This release brings new collective algorithms, bug fixes, and documentation across the UCC components listed below. The summary is based on the commit messages:
New Features and Enhancements
CL/HIER
- Fixed alltoall issue when a node has a single process (#658)
- Implemented pipelined rab allreduce (#608)
- Added bcast 2step algorithm (#620)
- Fixed rab allreduce pipeline (#759)
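
For readers unfamiliar with the rab allreduce entries above, the following is a minimal sketch, not the CL/HIER implementation. It assumes "rab" stands for reduce, allreduce, bcast (a node-local reduce onto a leader, an allreduce among the node leaders, then a node-local broadcast) and that pipelining means processing the buffer in fragments. The function name `rab_allreduce` and the parameter `frag_size` are hypothetical and chosen for illustration only.

```python
def rab_allreduce(nodes, frag_size):
    """Simulate a fragmented reduce/allreduce/bcast sum.

    nodes[n][l] is the input vector of local rank l on node n.
    Returns the reduced vector that every rank would hold at the end.
    """
    length = len(nodes[0][0])
    result = []
    for start in range(0, length, frag_size):
        end = min(start + frag_size, length)
        # Step 1 (reduce): each node leader sums this fragment over its local ranks.
        node_partial = [
            [sum(vec[i] for vec in local) for i in range(start, end)]
            for local in nodes
        ]
        # Step 2 (allreduce): the leaders combine their partial fragments.
        frag_total = [sum(col) for col in zip(*node_partial)]
        # Step 3 (bcast): each leader would send frag_total to its local ranks;
        # the simulation simply records the fragment once.
        result.extend(frag_total)
    return result
```

The loop here is sequential and only shows how the work is decomposed. In a real pipeline, the steps of consecutive fragments would overlap, which is where the performance benefit comes from.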
TL/CUDA
- Added support for CUDA 12
- Fixed cache unmap issue (#642)
- Implemented reduce scatter linear (#669)
- Added algorithm selection based on topology (#688)
- Fixed linear algorithms (#751)
- Fixed pipelining in linear reduce scatter (#770)
TL/UCP
- Added special service worker (#560)
- Added scatterv (#663)
- Added gatherv (#664)
- Fixed running with npolls 0 (#695)
- Added knomial allgather (#729)
- Fixed bug in triggered collectives (#757)
- Added bruck alltoall (#756)
- Added SLOAV alltoallv (#687)
- Optimized large-message broadcast (#738)
- Reordered ranks in ring allgather for better locality (#69)
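
As background for the bruck alltoall entry above, here is a small single-process simulation of the textbook Bruck alltoall. It is a sketch of the general technique and makes no claim about how TL/UCP implements it. The function name `bruck_alltoall` and the list-of-lists data layout are assumptions made for illustration.

```python
def bruck_alltoall(send):
    """Simulate Bruck alltoall over p ranks.

    send[r][d] is the block that rank r sends to rank d.
    Returns recv, where recv[r][s] is the block rank r received from rank s.
    """
    p = len(send)
    # Phase 1: local rotation, so that slot i on rank r holds the block
    # destined for rank (r + i) % p, i.e. the block that must travel distance i.
    buf = [[send[r][(r + i) % p] for i in range(p)] for r in range(p)]

    # Phase 2: about log2(p) exchange rounds. In the round with distance k,
    # rank r forwards every slot whose index has bit k set to rank (r + k) % p.
    k = 1
    while k < p:
        new = [row[:] for row in buf]
        for r in range(p):
            dst = (r + k) % p
            for i in range(p):
                if i & k:
                    new[dst][i] = buf[r][i]
        buf = new
        k <<= 1

    # Phase 3: inverse rotation. Slot i on rank r now holds the block
    # that originated on rank (r - i) % p.
    recv = [[None] * p for _ in range(p)]
    for r in range(p):
        for i in range(p):
            recv[r][(r - i) % p] = buf[r][i]
    return recv
```

The point of the scheme is that each rank takes part in roughly log2(p) exchange rounds instead of p - 1 pairwise exchanges, at the cost of forwarding some blocks more than once, which tends to favor small messages.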
TL/SHARP
- Fixed memory type check in allreduce (#662)
- Added support for SHARP v3 datatypes (#661)
- Fixed assert check (#686)
- Implemented SHARP OOB fixes (#746)
- Fixed local rank when NODE SBGP not enabled (#760)
- Prevented SHARP team creation when team max ppn > 1 (#761)
CORE
- Fixed memory type score update (#650)
- Fixed ucc parser build (#666)
- Implemented ucc_pipeline_params (#675)
- Changed log level of config_modify (#667)
- Fixed timeout handling for triggered post (#679)
DOCS
- Added User Guide (#720)