Apply suggestions from code review

Co-authored-by: srawat <[email protected]>
ROCm · Oct 15, 2024 · 0e7c61b
1 parent b233a04
Showing 4 changed files with 76 additions and 69 deletions.
diff --git a/docs/how-to/hip_runtime_api.rst b/docs/how-to/hip_runtime_api.rst
@@ -1,35 +1,35 @@
 .. meta::
-  :description: This chapter describes the HIP runtime API and shows
-                how to use it.
+  :description: HIP runtime API usage
   :keywords: AMD, ROCm, HIP, CUDA, HIP runtime API How to,
 
 .. _hip_runtime_api_how-to:
 
 ********************************************************************************
-HIP Runtime API
+HIP runtime API
 ********************************************************************************
 
-The HIP runtime API provides C and C++ functionality to manage GPUs, like event,
-stream and memory management. On AMD platforms the HIP runtime uses the
-:doc:`Common Language Runtime (CLR) <hip:understand/amd_clr>`, while on NVIDIA
-platforms it is only a thin layer over the CUDA runtime or Driver API.
+The HIP runtime API provides C and C++ functionality to manage events, streams,
+and memory on GPUs. On the AMD ROCm platform, the HIP runtime uses the :doc:`Common
+Language Runtime (CLR) <hip:understand/amd_clr>`, while on the NVIDIA CUDA
+platform, it is only a thin layer over the CUDA runtime or driver API.
 
 - **CLR** contains source code for AMD's compute language runtimes: ``HIP`` and
-  ``OpenCL™``. CLR includes the implementation of the ``HIP`` on the AMD
-  platform `hipamd <https://github.com/ROCm/clr/tree/develop/hipamd>`_ and the
-  Radeon Open Compute Common Language Runtime (rocclr). rocclr is a virtual
-  device interface, that enables the HIP runtime to interact with different
-  backends such as :doc:`ROCr <rocr-runtime:index>` on Linux or PAL on Windows. CLR also include the
-  implementation of `OpenCL runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_.
+  ``OpenCL™``. CLR includes `hipamd <https://github.com/ROCm/clr/tree/develop/hipamd>`_,
+  the ``HIP`` implementation on the AMD platform, and the Radeon Open Compute
+  Common Language Runtime (``rocclr``). ``rocclr`` is a virtual device
+  interface that enables the HIP runtime to interact with different backends
+  such as :doc:`ROCr <rocr-runtime:index>` on Linux or PAL on Windows. CLR also
+  includes the `OpenCL runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_
+  implementation.
 - The **CUDA runtime** is built on top of the CUDA driver API, which is a C API
-  with lower-level access to NVIDIA GPUs. For further information about the CUDA
-  driver and runtime API and its relation to HIP check the :doc:`CUDA driver API porting guide<hip:how-to/hip_porting_driver_api>`.
+  with lower-level access to NVIDIA GPUs. For details about the CUDA driver and
+  runtime APIs and how they relate to HIP, see the :doc:`CUDA driver API porting guide <hip:how-to/hip_porting_driver_api>`.
 
-The relation between the different runtimes and their backends is presented in
-the following figure.
+The following figure summarizes the backends of the HIP runtime API on the AMD
+and NVIDIA platforms:
 
 .. figure:: ../data/how-to/hip_runtime_api/runtimes.svg
 
 .. note::
 
-  The CUDA specific headers can be found in the `hipother repository <https://github.com/ROCm/hipother>`_.
+  For CUDA-specific headers, see the `hipother repository <https://github.com/ROCm/hipother>`_.
diff --git a/docs/how-to/hip_runtime_api/memory_management/coherence_control.rst b/docs/how-to/hip_runtime_api/memory_management/coherence_control.rst
@@ -111,11 +111,12 @@ following table:
       - ``hipMemAdviseSetCoarseGrain``
       - Coarse-grained
 
-:sup:`1` The :cpp:func:`hipHostMalloc` memory allocation coherence mode can be
-affected by the ``HIP_HOST_COHERENT`` environment variable, if the 
-``hipHostMallocCoherent``, ``hipHostMallocNonCoherent``, and ``hipHostMallocMapped``
-are unset. If neither these flags nor the
-``HIP_HOST_COHERENT`` environment variable is set, or set as 0, the host memory allocation is coarse-grained.
+:sup:`1` The :cpp:func:`hipHostMalloc` memory allocation coherence mode can be
+affected by the ``HIP_HOST_COHERENT`` environment variable if the
+``hipHostMallocCoherent``, ``hipHostMallocNonCoherent``, and
+``hipHostMallocMapped`` flags are unset. If none of these flags is set and the
+``HIP_HOST_COHERENT`` environment variable is unset or set to 0, the host
+memory allocation is coarse-grained.
 
 .. note::
 
@@ -127,10 +128,10 @@ are unset. If neither these flags nor the
 Visibility of synchronization functions
 ================================================================================
 
-The fine-grained coherence memory is visible at synchronization points, however 
-at coarse-grained coherence, it depends on the used synchronization function.
-The synchronization functions effect and visibility on different coherence 
-memory types collected in the following table.
+Fine-grained memory is visible at synchronization points; however, the
+visibility of coarse-grained memory depends on the synchronization function
+used. The following table lists the effect of each synchronization function
+and its impact on the visibility of fine- and coarse-grained memory:
 
 .. list-table:: HIP synchronize functions effect and visibility
 
@@ -139,43 +140,41 @@ memory types collected in the following table.
       - :cpp:func:`hipDeviceSynchronize`
       - :cpp:func:`hipEventSynchronize`
       - :cpp:func:`hipStreamWaitEvent`
-    * - Synchronization Effect
-      - host waits for all commands in the specified stream to complete
-      - host waits for all commands in all streams on the specified device to complete
-      - host waits for the specified event to complete
-      - stream waits for the specified event to complete
+    * - Synchronization effect
+      - Host waits for all commands in the specified stream to complete
+      - Host waits for all commands in all streams on the specified device to complete
+      - Host waits for the specified event to complete
+      - Stream waits for the specified event to complete
     * - Fence
-      - system-scope release
-      - system-scope release
-      - system-scope release
-      - none
+      - System-scope release
+      - System-scope release
+      - System-scope release
+      - None
     * - Fine-grained host memory visibility
-      - yes
-      - yes
-      - yes
-      - yes
+      - Yes
+      - Yes
+      - Yes
+      - Yes
     * - Coarse-grained host memory visibility
-      - yes
-      - yes
-      - depends on the used event.
-      - no
-
-Developers can control the release scope for :cpp:func:`hipEvents`:
-
-* By default, the GPU performs a device-scope acquire and release operation
-  with each recorded event.  This will make host and device memory visible to
-  other commands executing on the same device.
-
-A stronger system-level fence can be specified when the event is created with 
-:cpp:func:`hipEventCreateWithFlags`:
-
-* ``hipEventReleaseToSystem``: Perform a system-scope release operation
-  when the event is recorded. This will make **both fine-grained and
-  coarse-grained host memory visible to other agents in the system**, but may
-  involve heavyweight operations such as cache flushing. Fine-grained memory
-  will typically use lighter-weight in-kernel synchronization mechanisms such as
-  an atomic operation and thus does not need to use.
-  ``hipEventReleaseToSystem``.
-* ``hipEventDisableTiming``: Events created with this flag will not
-  record profiling data and provide the best performance if used for
-  synchronization.
+      - Yes
+      - Yes
+      - Depends on the event used
+      - No
+
+You can control the release scope for HIP events. By default, the GPU performs
+a device-scope acquire and release operation with each recorded event. This
+makes host and device memory visible to other commands executing on the same
+device.
+
+You can specify a stronger system-level fence, or other event behavior, by
+creating the event with :cpp:func:`hipEventCreateWithFlags` and these flags:
+
+* ``hipEventReleaseToSystem``: Performs a system-scope release operation when
+  the event is recorded. This makes both fine-grained and coarse-grained host
+  memory visible to other agents in the system, but might involve heavyweight
+  operations such as cache flushing. Fine-grained memory typically uses
+  lighter-weight in-kernel synchronization mechanisms, such as atomic
+  operations, and thus doesn't need ``hipEventReleaseToSystem``.
+
+* ``hipEventDisableTiming``: Events created with this flag don't record
+  profiling data, which gives the best performance when they are used only for synchronization.
diff --git a/docs/how-to/hip_runtime_api/memory_management/host_memory.rst b/docs/how-to/hip_runtime_api/memory_management/host_memory.rst
@@ -183,9 +183,6 @@ The disadvantage of pinned memory is the reduced availability of RAM for other p
       HIP_CHECK(hipFree(device_output));
   }
 
-The pinned memory allocation is effected with different flags, which details
-described at :ref:`memory_allocation_flags`.
-
 .. _memory_allocation_flags:
 
 Memory allocation flags for pinned memory

diff --git a/docs/understand/compilers.rst b/docs/understand/compilers.rst
@@ -24,6 +24,17 @@ details, see the :doc:`llvm project<llvm-project:index>`.
 HIP compilation workflow
 ================================================================================
 
+HIP provides a flexible compilation workflow that supports both offline
+compilation and runtime (just-in-time, JIT) compilation. Each approach has its
+advantages depending on the use case, target architecture, and performance needs.
+
+Offline compilation is ideal for production environments where performance is
+critical and the target GPU architecture is known in advance.
+
+Runtime compilation is useful in development environments or when distributing
+software that must run on a wide range of hardware without knowing the specific
+GPU beforehand. It provides flexibility at the cost of some performance overhead.
+
 Offline compilation
 --------------------------------------------------------------------------------
 
@@ -50,8 +61,8 @@ tutorial<compiling_on_the_command_line>` .
 Runtime compilation
 --------------------------------------------------------------------------------
 
-HIP allows you to compile kernels at runtime using the ``hiprtc*`` API. Kernels are
-stored as a text string, which is passed to HIPRTC alongside options to
+HIP allows you to compile kernels at runtime using the ``hiprtc*`` API. Kernel
+source code is stored as a text string and passed to HIPRTC along with options to
 guide the compilation.
 
 For more details, see