# C API Design Document (RFC)


## Introduction

The oneCCL communication library's current API is defined in the [oneAPI
specification][ccl-spec]. However, the APIs used by similar collective
communication libraries differ from the one used by oneCCL; for example, see
[NCCL][nccl-spec] from Nvidia, [RCCL][rccl-spec] from AMD, and [HCCL][hccl-spec]
from Habana. This RFC asks for feedback on aligning the oneCCL API more closely
with those of other vendor libraries, since this facilitates integration with
frameworks and upstreaming to open-source projects.

One difference between oneCCL and other vendors' communication libraries is
that the other libraries provide a C API, while oneCCL has a C++ API. This is
because oneCCL was designed to integrate with SYCL, which is based on C++. One
of the goals of oneCCL is to support hardware from different vendors, such as
the Intel Data Center GPU Max Series, the Intel Core and Intel Xeon families,
Intel Gaudi, and Nvidia and AMD GPUs, among others.

[ccl-spec]: https://uxlfoundation.github.io/oneAPI-spec/spec/elements/oneCCL/source/index.html
[hccl-spec]: https://docs.habana.ai/en/latest/API_Reference_Guides/HCCL_APIs/C_API.html
[nccl-spec]: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api.html
[rccl-spec]: https://rocm.docs.amd.com/projects/rccl/en/latest/api-reference/api-library.html#api-library

## Proposal

The proposal is to define a C-like API that aligns with current APIs in other
communication libraries, while introducing a few changes, as described next:

1. Most APIs are C-based, as in other communication libraries. C++ data
   structures such as `ccl::stream` and `ccl::comm` are hidden behind handles
   returned to the user.

2. The API is extended with two C++ API functions to support `sycl::queue`:

   - `onecclResult_t onecclCreateStream(sycl::queue, &oneccl_stream)`
   - `onecclResult_t onecclReleaseStream(oneccl_stream)`

   Once the `sycl::queue` is registered, it is hidden behind the oneCCL stream
   handle (see the sketch after this list).

3. Add functions to allow users to explicitly control the lifetime of objects,
   instead of relying on C++ destructors:

   - `onecclResult_t onecclCommFinalize(comm)`
   - `onecclResult_t onecclCommDestroy(comm)`

4. Drop support for out-of-order SYCL queues. The current oneCCL library
   supports out-of-order SYCL queues, but this feature is not used by the
   users of the library. In general, collective operations are submitted to an
   in-order queue. When out-of-order behavior is required, commands are
   submitted to a different in-order queue, and the two queues are
   synchronized.

5. Drop support for SYCL buffers. Only [Unified Shared Memory][usm-example] is
supported.

[usm-example]: https://www.intel.com/content/www/us/en/developer/articles/code-sample/dpcpp-usm-code-sample.html
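
To make items 2 and 3 concrete, below is a minimal, hedged sketch of how a
`sycl::queue` could be registered behind a stream handle and how the objects
could be torn down explicitly. The header name and the handle types
(`onecclStream_t`, `onecclComm_t`) are placeholders, not part of this
proposal; only the function names come from the list above.

```cpp
// Sketch only: header name and handle types are assumed, not specified
// by this RFC. Only the oneccl* function names come from the proposal.
#include <sycl/sycl.hpp>
#include "oneccl.h"  // hypothetical header

void stream_lifetime_example(onecclComm_t comm) {
  // Proposal item 4: collectives are submitted to an in-order queue.
  sycl::queue q{sycl::gpu_selector_v, sycl::property::queue::in_order{}};

  // Item 2: register the SYCL queue; afterwards it is hidden behind
  // the oneCCL stream handle.
  onecclStream_t stream;
  onecclCreateStream(q, &stream);

  // ... submit collective operations on `stream` ...

  // Item 3: explicit teardown instead of relying on C++ destructors.
  onecclReleaseStream(stream);
  onecclCommFinalize(comm);
  onecclCommDestroy(comm);
}
```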

### APIs

The tables below contain the NCCL API, the corresponding new proposed oneCCL
API, and the current oneCCL API.

#### APIs related to communicator creation

| NCCL | oneCCL (proposed C) | oneCCL (current, C++) |
|-------------------|------------------------------|-------------------------|
|`cudaError_t cudaSetDevice(device)` (1)|`onecclResult_t onecclSetDevice(device)`| N/A |
|`ncclResult_t ncclGetUniqueId(id)`|`onecclResult_t onecclGetUniqueId(id)`|`ccl::create_main_kvs(); ccl::create_kvs(main_addr);`|
|`ncclResult_t ncclCommInitRank(comm, size, id, rank)`|`onecclResult_t onecclCommInitRank(comm, size, id, rank)`|`comm ccl::create_communicator(size, rank, device, context, kvs); comms ccl::create_communicators(size, rank, device, context, kvs)`|
|`ncclResult_t ncclCommInitRankConfig(comm, size, id, rank, attr)`|`onecclResult_t onecclCommInitRankConfig(comm, size, id, rank, attr)`|`comm ccl::create_communicator(size, rank, device, context, kvs, attr)`|
|`ncclResult_t ncclCommInitAll(comms, ndev, dev_list)`|`onecclResult_t onecclCommInitAll(comms, ndev, dev_list)`| Not currently available. Working on adding support.|
|`ncclCommSplit` | Not implemented | Not implemented |
|`ncclResult_t ncclCommFinalize(comm)`|`onecclResult_t onecclCommFinalize(comm)`| N/A |
|`ncclResult_t ncclCommDestroy(comm)`|`onecclResult_t onecclCommDestroy(comm)`| Destructor |

Notice that `cudaSetDevice(device)` (1) is a CUDA call, not an NCCL call. If an
equivalent call is available in SYCL (or the calling language), the proposed
`onecclSetDevice(device)` will not be needed.
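
As an illustration of the table above, here is a hedged sketch of how one rank
could bootstrap a communicator with the proposed API, mirroring the NCCL flow.
The id/handle types and the out-of-band exchange of the unique id are
assumptions not specified in this RFC.

```cpp
// Sketch only: onecclUniqueId_t and onecclComm_t are placeholder type
// names; the unique-id exchange mechanism is out of scope here.
#include "oneccl.h"  // hypothetical header

onecclComm_t init_comm(int size, int rank) {
  onecclUniqueId_t id;
  if (rank == 0) {
    onecclGetUniqueId(&id);
  }
  // The id would be distributed to the other ranks out of band
  // (e.g. via MPI or a key-value store) -- not shown.

  onecclComm_t comm;
  onecclCommInitRank(&comm, size, id, rank);
  return comm;
}
```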

#### APIs related to collective communication operations

| NCCL | oneCCL (proposed C) | oneCCL (current, C++) |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclAllGather(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclAllgather(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::allgather (2) (sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)`|
|`ncclResult_t ncclAllReduce(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclAllreduce(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::allreduce(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)`|
|`ncclResult_t ncclBroadcast(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclBroadcast(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::broadcast (3) (sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)`|
|`ncclResult_t ncclReduce(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclReduce(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::reduce(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)`|
|`ncclResult_t ncclReduceScatter(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclReduceScatter(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::reduce_scatter(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)`|
| N/A |`onecclAlltoall`, `onecclAlltoallv` (could be deprecated)|`communicator::alltoall`, `communicator::alltoallv`|
| N/A |`onecclBarrier` (could be deprecated and replaced with an Allreduce of 1 byte)|`ccl::event communicator::barrier`|

- (2) oneCCL currently also provides an Allgatherv operation, but it will be
  deprecated in a future release.
- (3) The current Broadcast API is slightly different, but the next oneCCL
  release will align it with the one shown here.
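
Below is a hedged sketch of an allreduce issued through the proposed API,
using USM device allocations on an in-order queue as required by items 4 and 5
of the proposal. The enumerator names (`onecclFloat32`, `onecclSum`) and the
assumption that waiting on the queue synchronizes the collective are
illustrative, not defined by this RFC.

```cpp
// Sketch only: enum names and synchronization behavior are assumed.
// Parameter order follows the table above.
#include <sycl/sycl.hpp>
#include "oneccl.h"  // hypothetical header

void allreduce_example(onecclComm_t comm, onecclStream_t stream,
                       sycl::queue& q, size_t count) {
  // Item 5: only USM allocations are supported, no SYCL buffers.
  float* sendbuf = sycl::malloc_device<float>(count, q);
  float* recvbuf = sycl::malloc_device<float>(count, q);

  // ... initialize sendbuf on the device (omitted) ...

  onecclAllreduce(sendbuf, recvbuf, count, onecclFloat32, onecclSum,
                  comm, stream);

  // Assuming `stream` wraps the in-order queue `q`, waiting on the
  // queue also waits for the collective to complete.
  q.wait();

  sycl::free(sendbuf, q);
  sycl::free(recvbuf, q);
}
```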

#### Group APIs

| NCCL | oneCCL (proposed C) | oneCCL (current, C++) |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclGroupStart()`|`onecclResult_t onecclGroupStart()`| N/A |
|`ncclResult_t ncclGroupEnd()` |`onecclResult_t onecclGroupEnd()` | N/A |

#### Point-to-Point APIs

| NCCL | oneCCL (proposed C) | oneCCL (current, C++) |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclSend(sendbuf, count, datatype, peer, comm, stream)`|`onecclResult_t onecclSend(sendbuf, count, datatype, peer, comm, oneccl_stream)`|`ccl::event communicator::send(sendbuf, count,datatype, peer, comm, oneccl_stream)`|
|`ncclResult_t ncclRecv(…)`|`onecclResult_t onecclRecv(…)`|`communicator::recv`|
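
As in NCCL, the group calls would typically bracket several point-to-point
operations so they are issued together. A hedged sketch combining the two
tables above (datatype enumerator name assumed):

```cpp
// Sketch only: a ring-style exchange where the send and recv are
// grouped so they can progress concurrently, as in NCCL.
#include "oneccl.h"  // hypothetical header

void exchange(onecclComm_t comm, onecclStream_t stream,
              const float* sendbuf, float* recvbuf, size_t count,
              int send_peer, int recv_peer) {
  onecclGroupStart();
  onecclSend(sendbuf, count, onecclFloat32, send_peer, comm, stream);
  onecclRecv(recvbuf, count, onecclFloat32, recv_peer, comm, stream);
  onecclGroupEnd();
}
```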

#### Other APIs

| NCCL | oneCCL (proposed C) | oneCCL (current, C++) |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclCommCount(comm, size)`|`onecclResult_t onecclCommCount(comm, size)`|`size communicator::size()`|
|`ncclResult_t ncclCommCuDevice(comm, device)`|`onecclResult_t onecclCommGetDevice(comm, device)`|`device communicator::get_device()`|
|`ncclResult_t ncclCommUserRank(comm, rank)`|`onecclResult_t onecclCommUserRank(comm, rank)`|`rank communicator::rank()`|
|`ncclResult_t ncclGetVersion(version)`|`onecclResult_t onecclGetVersion(version)`|`version ccl::get_library_version()`|
|`ncclCommAbort` | Not implemented | N/A |
|`ncclCommGetAsyncError`| Not implemented | N/A |
|`ncclGetLastError` | Not implemented | N/A |
|`ncclGetErrorString`| Not implemented | N/A |
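
For completeness, a hedged sketch of the query functions from the last table;
the output-parameter types are assumptions.

```cpp
// Sketch only: output-parameter types are assumed to be plain ints.
#include <cstdio>
#include "oneccl.h"  // hypothetical header

void print_comm_info(onecclComm_t comm) {
  int size = 0, rank = 0, version = 0;
  onecclCommCount(comm, &size);      // communicator size
  onecclCommUserRank(comm, &rank);   // caller's rank
  onecclGetVersion(&version);        // library version
  std::printf("rank %d of %d, oneCCL version %d\n", rank, size, version);
}
```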
