Apply suggestions from code review
Co-authored-by: srawat <[email protected]>
neon60 and SwRaw committed Oct 15, 2024
1 parent b233a04 commit 0e7c61b
Showing 4 changed files with 76 additions and 69 deletions.
36 changes: 18 additions & 18 deletions docs/how-to/hip_runtime_api.rst
@@ -1,35 +1,35 @@
.. meta::
:description: This chapter describes the HIP runtime API and shows
how to use it.
:description: HIP runtime API usage
:keywords: AMD, ROCm, HIP, CUDA, HIP runtime API How to,

.. _hip_runtime_api_how-to:

********************************************************************************
HIP Runtime API
HIP runtime API
********************************************************************************

The HIP runtime API provides C and C++ functionality to manage GPUs, like event,
stream and memory management. On AMD platforms the HIP runtime uses the
:doc:`Common Language Runtime (CLR) <hip:understand/amd_clr>`, while on NVIDIA
platforms it is only a thin layer over the CUDA runtime or Driver API.
The HIP runtime API provides C and C++ functionality to manage events, streams,
and memory on GPUs. On the AMD ROCm platform, the HIP runtime uses the :doc:`Common
Language Runtime (CLR) <hip:understand/amd_clr>`, while on the NVIDIA CUDA platform,
it is only a thin layer over the CUDA runtime or driver API.

- **CLR** contains source code for AMD's compute language runtimes: ``HIP`` and
``OpenCL™``. CLR includes the implementation of the ``HIP`` on the AMD
platform `hipamd <https://github.com/ROCm/clr/tree/develop/hipamd>`_ and the
Radeon Open Compute Common Language Runtime (rocclr). rocclr is a virtual
device interface, that enables the HIP runtime to interact with different
backends such as :doc:`ROCr <rocr-runtime:index>` on Linux or PAL on Windows. CLR also include the
implementation of `OpenCL runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_.
``OpenCL™``. CLR includes the ``HIP`` implementation on the AMD
platform: `hipamd <https://github.com/ROCm/clr/tree/develop/hipamd>`_ and the
Radeon Open Compute Common Language Runtime (``rocclr``). ``rocclr`` is a
virtual device interface that enables the HIP runtime to interact with
different backends such as :doc:`ROCr <rocr-runtime:index>` on Linux or PAL on
Windows. CLR also includes the `OpenCL runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_
implementation.
- The **CUDA runtime** is built on top of the CUDA driver API, which is a C API
with lower-level access to NVIDIA GPUs. For further information about the CUDA
driver and runtime API and its relation to HIP check the :doc:`CUDA driver API porting guide<hip:how-to/hip_porting_driver_api>`.
with lower-level access to NVIDIA GPUs. For details about the CUDA driver and
runtime API with reference to HIP, see :doc:`CUDA driver API porting guide <hip:how-to/hip_porting_driver_api>`.

The relation between the different runtimes and their backends is presented in
the following figure.
The backends of the HIP runtime API on AMD and NVIDIA platforms are summarized
in the following figure:

.. figure:: ../data/how-to/hip_runtime_api/runtimes.svg

.. note::

The CUDA specific headers can be found in the `hipother repository <https://github.com/ROCm/hipother>`_.
For CUDA-specific headers, see the `hipother repository <https://github.com/ROCm/hipother>`_.
91 changes: 45 additions & 46 deletions docs/how-to/hip_runtime_api/memory_management/coherence_control.rst
Expand Up @@ -111,11 +111,12 @@ following table:
- ``hipMemAdviseSetCoarseGrain``
- Coarse-grained

:sup:`1` The :cpp:func:`hipHostMalloc` memory allocation coherence mode can be
affected by the ``HIP_HOST_COHERENT`` environment variable, if the
``hipHostMallocCoherent``, ``hipHostMallocNonCoherent``, and ``hipHostMallocMapped``
are unset. If neither these flags nor the
``HIP_HOST_COHERENT`` environment variable is set, or set as 0, the host memory allocation is coarse-grained.
:sup:`1` The :cpp:func:`hipHostMalloc` memory allocation coherence mode can be
affected by the ``HIP_HOST_COHERENT`` environment variable if the
``hipHostMallocCoherent``, ``hipHostMallocNonCoherent``, and
``hipHostMallocMapped`` flags are all unset. If none of these flags is set and
``HIP_HOST_COHERENT`` is either unset or set to 0, the host memory allocation is
coarse-grained.
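
As a minimal sketch of the explicit flags named in the footnote, the following
program requests one fine-grained and one coarse-grained pinned host allocation.
Because a coherence flag is passed in each call, the ``HIP_HOST_COHERENT``
environment variable is not expected to influence either allocation. The
``HIP_CHECK`` macro and the buffer size are illustrative assumptions and are not
part of the HIP API.

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    // Illustrative error-checking helper (an assumption, not provided by HIP).
    #define HIP_CHECK(expr)                                                   \
        do {                                                                  \
            hipError_t status_ = (expr);                                      \
            if (status_ != hipSuccess) {                                      \
                std::fprintf(stderr, "HIP error: %s at %s:%d\n",              \
                             hipGetErrorString(status_), __FILE__, __LINE__); \
                std::exit(EXIT_FAILURE);                                      \
            }                                                                 \
        } while (0)

    int main()
    {
        constexpr std::size_t bytes = 1024 * sizeof(float);
        float* fine_grained = nullptr;
        float* coarse_grained = nullptr;

        // Explicitly request fine-grained (coherent) pinned host memory.
        HIP_CHECK(hipHostMalloc(reinterpret_cast<void**>(&fine_grained), bytes,
                                hipHostMallocCoherent));

        // Explicitly request coarse-grained (non-coherent) pinned host memory.
        HIP_CHECK(hipHostMalloc(reinterpret_cast<void**>(&coarse_grained), bytes,
                                hipHostMallocNonCoherent));

        HIP_CHECK(hipHostFree(fine_grained));
        HIP_CHECK(hipHostFree(coarse_grained));
        return 0;
    }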

.. note::

Expand All @@ -127,10 +128,10 @@ are unset. If neither these flags nor the
Visibility of synchronization functions
================================================================================

The fine-grained coherence memory is visible at synchronization points, however
at coarse-grained coherence, it depends on the used synchronization function.
The synchronization functions effect and visibility on different coherence
memory types collected in the following table.
Fine-grained memory is visible at synchronization points. The visibility of
coarse-grained memory, however, depends on the synchronization function used.
The effect of each synchronization function and the resulting visibility of
fine- and coarse-grained memory are listed in the following table:

.. list-table:: HIP synchronize functions effect and visibility

Expand All @@ -139,43 +140,41 @@ memory types collected in the following table.
- :cpp:func:`hipDeviceSynchronize`
- :cpp:func:`hipEventSynchronize`
- :cpp:func:`hipStreamWaitEvent`
* - Synchronization Effect
- host waits for all commands in the specified stream to complete
- host waits for all commands in all streams on the specified device to complete
- host waits for the specified event to complete
- stream waits for the specified event to complete
* - Synchronization effect
- Host waits for all commands in the specified stream to complete
- Host waits for all commands in all streams on the specified device to complete
- Host waits for the specified event to complete
- Stream waits for the specified event to complete
* - Fence
- system-scope release
- system-scope release
- system-scope release
- none
- System-scope release
- System-scope release
- System-scope release
- None
* - Fine-grained host memory visibility
- yes
- yes
- yes
- yes
- Yes
- Yes
- Yes
- Yes
* - Coarse-grained host memory visibility
- yes
- yes
- depends on the used event.
- no

Developers can control the release scope for :cpp:func:`hipEvents`:

* By default, the GPU performs a device-scope acquire and release operation
with each recorded event. This will make host and device memory visible to
other commands executing on the same device.

A stronger system-level fence can be specified when the event is created with
:cpp:func:`hipEventCreateWithFlags`:

* ``hipEventReleaseToSystem``: Perform a system-scope release operation
when the event is recorded. This will make **both fine-grained and
coarse-grained host memory visible to other agents in the system**, but may
involve heavyweight operations such as cache flushing. Fine-grained memory
will typically use lighter-weight in-kernel synchronization mechanisms such as
an atomic operation and thus does not need to use.
``hipEventReleaseToSystem``.
* ``hipEventDisableTiming``: Events created with this flag will not
record profiling data and provide the best performance if used for
synchronization.
- Yes
- Yes
- Depends on the event used
- No

You can control the release scope for hipEvents. By default, the GPU performs a
device-scope acquire and release operation with each recorded event. This makes
the host and device memory visible to other commands executing on the same
device.

You can specify a stronger system-level fence by creating the event with
:cpp:func:`hipEventCreateWithFlags`:

* ``hipEventReleaseToSystem``: Performs a system-scope release operation when
the event is recorded. This makes both fine-grained and coarse-grained host
memory visible to other agents in the system, which might also involve
heavyweight operations such as cache flushing. Fine-grained memory typically
uses lighter-weight in-kernel synchronization mechanisms such as an atomic
operation and thus doesn't need to use ``hipEventReleaseToSystem``.

* ``hipEventDisableTiming``: Events created with this flag don't record
profiling data, which significantly improves synchronization performance.
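
The flags above can be combined when an event is created. The following is a
minimal sketch, assuming a single stream and omitting the actual GPU work. The
``HIP_CHECK`` macro is an illustrative helper and not part of the HIP API.

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Illustrative error-checking helper (an assumption, not provided by HIP).
    #define HIP_CHECK(expr)                                                   \
        do {                                                                  \
            hipError_t status_ = (expr);                                      \
            if (status_ != hipSuccess) {                                      \
                std::fprintf(stderr, "HIP error: %s at %s:%d\n",              \
                             hipGetErrorString(status_), __FILE__, __LINE__); \
                std::exit(EXIT_FAILURE);                                      \
            }                                                                 \
        } while (0)

    int main()
    {
        hipStream_t stream;
        HIP_CHECK(hipStreamCreate(&stream));

        // Request a system-scope release when the event is recorded and skip
        // the collection of timing data.
        hipEvent_t event;
        HIP_CHECK(hipEventCreateWithFlags(
            &event, hipEventReleaseToSystem | hipEventDisableTiming));

        // ... enqueue kernels or copies on `stream` that write host memory ...

        // Record the event after the enqueued work, then block the host on it.
        HIP_CHECK(hipEventRecord(event, stream));
        HIP_CHECK(hipEventSynchronize(event));

        HIP_CHECK(hipEventDestroy(event));
        HIP_CHECK(hipStreamDestroy(stream));
        return 0;
    }

Because the event is created with ``hipEventDisableTiming``, it can't be used
with :cpp:func:`hipEventElapsedTime`.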
3 changes: 0 additions & 3 deletions docs/how-to/hip_runtime_api/memory_management/host_memory.rst
Expand Up @@ -183,9 +183,6 @@ The disadvantage of pinned memory is the reduced availability of RAM for other p
HIP_CHECK(hipFree(device_output));
}
The pinned memory allocation is effected with different flags, which details
described at :ref:`memory_allocation_flags`.

.. _memory_allocation_flags:

Memory allocation flags for pinned memory
Expand Down
15 changes: 13 additions & 2 deletions docs/understand/compilers.rst
Expand Up @@ -24,6 +24,17 @@ details, see the :doc:`llvm project<llvm-project:index>`.
HIP compilation workflow
================================================================================

HIP provides a flexible compilation workflow that supports both offline
compilation and runtime (just-in-time, JIT) compilation. Each approach has its
advantages depending on the use case, target architecture, and performance needs.

Offline compilation is ideal for production environments, where performance is
critical and the target GPU architecture is known in advance.

Runtime compilation is useful in development environments or when distributing
software that must run on a wide range of hardware without knowing the specific
GPU beforehand. It provides flexibility at the cost of some performance overhead.

Offline compilation
--------------------------------------------------------------------------------

Expand All @@ -50,8 +61,8 @@ tutorial<compiling_on_the_command_line>` .
Runtime compilation
--------------------------------------------------------------------------------

HIP allows you to compile kernels at runtime using the ``hiprtc*`` API. Kernels are
stored as a text string, which is passed to HIPRTC alongside options to
HIP allows you to compile kernels at runtime using the ``hiprtc*`` API. Kernels
are stored as a text string, which is passed to HIPRTC alongside options to
guide the compilation.
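
A minimal sketch of this flow is shown below. The kernel source, its name
``scale``, and the program name ``scale.hip`` are hypothetical, and error
handling is reduced to early returns. The example assumes that device 0 is the
compilation target and passes its architecture name as the only option.

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <hip/hiprtc.h>
    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Kernel kept as a text string (hypothetical example kernel).
    static const char* kernel_source = R"(
    extern "C" __global__ void scale(float* data, float factor, unsigned int n)
    {
        unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) { data[i] *= factor; }
    }
    )";

    int main()
    {
        // Query the architecture of device 0 to build the compile option.
        hipDeviceProp_t props;
        if (hipGetDeviceProperties(&props, 0) != hipSuccess) { return 1; }
        const std::string arch =
            std::string("--gpu-architecture=") + props.gcnArchName;
        const char* options[] = {arch.c_str()};

        hiprtcProgram program;
        if (hiprtcCreateProgram(&program, kernel_source, "scale.hip", 0, nullptr,
                                nullptr) != HIPRTC_SUCCESS) { return 1; }

        const hiprtcResult result = hiprtcCompileProgram(program, 1, options);

        // Print the compiler log, if there is one.
        std::size_t log_size = 0;
        hiprtcGetProgramLogSize(program, &log_size);
        if (log_size > 1) {
            std::string log(log_size, '\0');
            hiprtcGetProgramLog(program, &log[0]);
            std::fprintf(stderr, "%s\n", log.c_str());
        }
        if (result != HIPRTC_SUCCESS) {
            hiprtcDestroyProgram(&program);
            return 1;
        }

        // Retrieve the compiled code object.
        std::size_t code_size = 0;
        hiprtcGetCodeSize(program, &code_size);
        std::vector<char> code(code_size);
        hiprtcGetCode(program, code.data());
        hiprtcDestroyProgram(&program);

        // Load the code object and look up the kernel by name.
        hipModule_t module;
        hipFunction_t kernel;
        if (hipModuleLoadData(&module, code.data()) != hipSuccess) { return 1; }
        if (hipModuleGetFunction(&kernel, module, "scale") != hipSuccess) { return 1; }

        std::printf("Compiled %zu bytes of device code\n", code_size);
        hipModuleUnload(module);
        return 0;
    }

The resulting ``kernel`` handle can then be launched with
``hipModuleLaunchKernel``, which is omitted here to keep the sketch short.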

For more details, see
Expand Down
