Commit d5e490a: Internal review
neon60 committed Jul 2, 2024 (1 parent: 06b9a65)
Showing 4 changed files with 188 additions and 257 deletions.

162 changes: 47 additions & 115 deletions docs/how-to/stream_ordered_allocator.rst

Stream Ordered Memory Allocator
*******************************************************************************

The Stream Ordered Memory Allocator (SOMA) is part of the HIP runtime API. It provides an asynchronous memory allocation mechanism with stream-ordering semantics. With SOMA, you can allocate and free memory in stream order, ensuring that all asynchronous accesses occur between the stream executions of allocation and deallocation. Compliance with stream order prevents use-before-allocation and use-after-free errors, which would otherwise lead to undefined behavior.

Advantages of SOMA:

- Efficient reuse: SOMA enables efficient memory reuse across streams, reducing unnecessary allocation overhead.
- Fine-grained control: You can set attributes and control caching behavior for memory pools.
- Inter-process sharing: Secure sharing of allocations between processes is possible.
- Optimizations: The driver can optimize based on its awareness of SOMA and other stream management APIs.

Disadvantages of SOMA:

- Temporal constraints: Developers must adhere strictly to stream order to avoid errors.
- Complexity: Properly managing memory in stream order can be intricate.
- Learning curve: Understanding and utilizing SOMA effectively may require additional effort.

Stream ordered memory allocator usage
=====================================

Users can allocate memory using ``hipMallocAsync()`` with stream-ordered semantics: all asynchronous accesses to the allocation must occur between the stream executions of the allocation and the free. If memory is accessed outside of this promised stream order, the behavior is undefined (for example, use-before-allocation or use-after-free errors). The allocator may reallocate memory as long as it guarantees that compliant memory accesses will not overlap temporally. ``hipFreeAsync()`` returns memory to the pool with stream-ordered semantics.

The following example explains how to use stream ordered memory allocation.

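A minimal sketch of such a program follows. It is illustrative, not the repository's original listing: error handling is omitted, and the ``increment`` kernel and buffer sizes are assumptions for demonstration.

```cpp
#include <hip/hip_runtime.h>
#include <iostream>
#include <vector>

// Illustrative kernel: increments every element once.
__global__ void increment(int* data, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    constexpr size_t numElements = 1 << 20;
    constexpr size_t sizeBytes   = numElements * sizeof(int);

    hipStream_t stream;
    hipStreamCreate(&stream);

    // The allocation is stream ordered: it is valid for work enqueued
    // after this point in the same stream.
    int* dData = nullptr;
    hipMallocAsync(reinterpret_cast<void**>(&dData), sizeBytes, stream);

    std::vector<int> hData(numElements, 41);
    hipMemcpyAsync(dData, hData.data(), sizeBytes, hipMemcpyHostToDevice, stream);

    increment<<<(numElements + 255) / 256, 256, 0, stream>>>(dData, numElements);

    hipMemcpyAsync(hData.data(), dData, sizeBytes, hipMemcpyDeviceToHost, stream);

    // The free is also stream ordered: every access above was enqueued
    // between the allocation and this free, so the ordering contract holds.
    hipFreeAsync(dData, stream);

    hipStreamSynchronize(stream);
    hipStreamDestroy(stream);

    std::cout << "hData[0] = " << hData[0] << std::endl;
    return 0;
}
```

Note that no access to ``dData`` is enqueued after the ``hipFreeAsync()`` call; doing so would violate stream order and cause undefined behavior.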
Memory pools
============

Memory pools provide a way to manage memory with stream-ordered behavior, ensuring proper synchronization and avoiding memory access errors. Division of a single memory system into separate pools allows querying each partition's access path properties. Memory pools are used for host memory, device memory, and unified memory.

Set pools
---------

The ``hipMallocAsync()`` function uses the current memory pool, while also providing the opportunity to create and use different pools with the ``hipMemPoolCreate()`` and ``hipMallocFromPoolAsync()`` functions respectively.

Unlike CUDA, where stream-ordered memory allocation can be implicit, in AMD HIP, it's always explicit. This means that you need to manage memory allocation for each stream, ensuring precise control over memory usage and synchronization.

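A sketch of creating an explicit pool and allocating from it is shown below (error handling omitted; the pool properties and sizes are illustrative assumptions):

```cpp
#include <hip/hip_runtime.h>

int main() {
    // Describe a device-local memory pool on device 0.
    hipMemPoolProps props = {};
    props.allocType     = hipMemAllocationTypePinned;
    props.handleTypes   = hipMemHandleTypeNone;   // no inter-process sharing
    props.location.type = hipMemLocationTypeDevice;
    props.location.id   = 0;

    hipMemPool_t memPool;
    hipMemPoolCreate(&memPool, &props);

    hipStream_t stream;
    hipStreamCreate(&stream);

    // Allocate from the explicit pool rather than the device's default pool.
    void* ptr = nullptr;
    hipMallocFromPoolAsync(&ptr, 1 << 20, memPool, stream);

    // ... enqueue kernels or copies that use ptr on this stream ...

    hipFreeAsync(ptr, stream);
    hipStreamSynchronize(stream);

    hipStreamDestroy(stream);
    hipMemPoolDestroy(memPool);
    return 0;
}
```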
Trim pools
----------

The memory allocator allows you to allocate and free memory in stream order. To control memory usage, the release threshold attribute can be set by ``hipMemPoolAttrReleaseThreshold``. This threshold specifies the amount of reserved memory in bytes that a pool should hold onto before attempting to release memory back to the operating system.

.. code-block:: cpp

   uint64_t threshold = UINT64_MAX;
   hipMemPoolSetAttribute(memPool, hipMemPoolAttrReleaseThreshold, &threshold);

When more than the specified threshold bytes of memory are held by the memory pool, the allocator will try to release memory back to the operating system during the next call to stream, event, or context synchronization.

For better performance, it can be good practice to adjust the memory pool size with ``hipMemPoolTrimTo()``. It reclaims memory from a memory pool that is larger than necessary, optimizing memory usage for your application.

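A sketch of trimming the default pool after a burst of temporary allocations (assuming device 0; the sizes are illustrative and error handling is omitted):

```cpp
#include <hip/hip_runtime.h>

int main() {
    hipStream_t stream;
    hipStreamCreate(&stream);

    hipMemPool_t memPool;
    hipDeviceGetDefaultMemPool(&memPool, 0);

    // A temporary working buffer that is no longer needed afterwards.
    void* ptr = nullptr;
    hipMallocAsync(&ptr, 256u << 20, stream); // 256 MiB
    hipFreeAsync(ptr, stream);
    hipStreamSynchronize(stream);

    // The pool may still hold the freed memory in reserve; keep at most
    // 64 MiB and return the rest to the operating system.
    hipMemPoolTrimTo(memPool, 64u << 20);

    hipStreamDestroy(stream);
    return 0;
}
```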
Resource usage statistics
-------------------------

Resource usage statistics can help with optimization. The following pool attributes can be used to query memory usage:

- ``hipMemPoolAttrReservedMemCurrent`` returns the current total physical GPU memory consumed by the pool.
- ``hipMemPoolAttrUsedMemCurrent`` returns the total size of all memory currently allocated from the pool.
- ``hipMemPoolAttrReservedMemHigh`` returns the maximum total physical GPU memory consumed by the pool since the last reset.
- ``hipMemPoolAttrUsedMemHigh`` returns the maximum total size of memory allocated from the pool since the last reset.

You can reset the high watermark attributes to their current values using ``hipMemPoolSetAttribute()``.

.. code-block:: cpp

   uint64_t value = 0;
   hipMemPoolSetAttribute(memPool, hipMemPoolAttrReservedMemHigh, &value);
   hipMemPoolSetAttribute(memPool, hipMemPoolAttrUsedMemHigh, &value);

Memory reuse policies
---------------------

The allocator may reallocate memory as long as it guarantees that compliant memory accesses won't overlap temporally. Turning the following memory pool reuse policy attribute flags on and off can optimize memory use:

- ``hipMemPoolReuseFollowEventDependencies`` checks event dependencies before allocating additional GPU memory.
- ``hipMemPoolReuseAllowOpportunistic`` checks freed allocations to determine if the stream order semantic indicated by the free operation has been met.
- ``hipMemPoolReuseAllowInternalDependencies`` manages reuse based on internal dependencies in the runtime. If the driver fails to allocate and map additional physical memory, it searches for memory that relies on another stream's pending progress and reuses it.

Device accessibility for multi-GPU support
------------------------------------------

Allocations are initially accessible only from the device where they reside.
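To make a pool's allocations accessible from another device, access can be granted explicitly. The following is a sketch assuming two devices with peer access support (error handling omitted):

```cpp
#include <hip/hip_runtime.h>

int main() {
    // Pool that lives on device 0.
    hipMemPool_t memPool;
    hipDeviceGetDefaultMemPool(&memPool, 0);

    // Grant device 1 read/write access to allocations made from this pool.
    hipMemAccessDesc accessDesc = {};
    accessDesc.location.type = hipMemLocationTypeDevice;
    accessDesc.location.id   = 1;
    accessDesc.flags         = hipMemAccessFlagsProtReadWrite;

    hipMemPoolSetAccess(memPool, &accessDesc, 1);
    return 0;
}
```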

Inter-process memory handling
=============================

Inter-process capable (IPC) memory pools facilitate efficient and secure sharing of GPU memory between processes.

There are two ways to share memory between processes: pointer sharing or shareable handles. Both have an allocator (export) and a consumer (import) interface.

Device pointer
--------------

The ``hipMemPoolExportPointer()`` function exports data that allows a memory pool pointer to be shared directly between processes. It is useful for sharing a memory allocation with another process.

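A sketch of the exporting side follows. The pool must be created with an IPC-capable handle type; how the export data reaches the other process (for example, a pipe or socket) is an application choice and is left out here, as is error handling.

```cpp
#include <hip/hip_runtime.h>

int main() {
    // IPC-capable pool: allocations can be exported to other processes.
    hipMemPoolProps props = {};
    props.allocType     = hipMemAllocationTypePinned;
    props.handleTypes   = hipMemHandleTypePosixFileDescriptor;
    props.location.type = hipMemLocationTypeDevice;
    props.location.id   = 0;

    hipMemPool_t memPool;
    hipMemPoolCreate(&memPool, &props);

    hipStream_t stream;
    hipStreamCreate(&stream);

    void* ptr = nullptr;
    hipMallocFromPoolAsync(&ptr, 1 << 20, memPool, stream);
    hipStreamSynchronize(stream);

    // Opaque description of the allocation, to be sent to the consumer
    // process alongside the pool's shareable handle.
    hipMemPoolPtrExportData exportData;
    hipMemPoolExportPointer(&exportData, ptr);

    // ... send exportData to the consumer process via your IPC mechanism ...
    return 0;
}
```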
The ``hipMemPoolImportPointer()`` function imports a memory pool pointer directly from another process.

Here is the example code to read the exported pool from the previous example.

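A sketch of the consuming side, assuming the exporter's pool has already been imported as ``importedPool`` and the export data has been received over IPC (both are application-provided assumptions here; error handling omitted):

```cpp
#include <hip/hip_runtime.h>

// Assumed to be provided by the application's IPC layer.
extern hipMemPool_t importedPool;          // pool imported from the exporter
extern hipMemPoolPtrExportData exportData; // received from the exporter

int main() {
    // Map the exporter's allocation into this process.
    void* ptr = nullptr;
    hipMemPoolImportPointer(&ptr, importedPool, &exportData);

    // ... use ptr on this device ...

    // Freeing releases this process's reference to the allocation.
    hipFree(ptr);
    return 0;
}
```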
Shareable handle
----------------

The ``hipMemPoolExportToShareableHandle()`` function exports a memory pool to a shareable handle, such as a POSIX file descriptor, that can be passed to another process. The exported handle contains information about the memory pool, including its size, location, and other relevant details.

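A sketch of exporting a pool to a POSIX file descriptor. The pool must have been created with ``hipMemHandleTypePosixFileDescriptor``; transferring the descriptor to another process (typically over a Unix domain socket) is not shown, and error handling is omitted.

```cpp
#include <hip/hip_runtime.h>

int main() {
    hipMemPoolProps props = {};
    props.allocType     = hipMemAllocationTypePinned;
    props.handleTypes   = hipMemHandleTypePosixFileDescriptor;
    props.location.type = hipMemLocationTypeDevice;
    props.location.id   = 0;

    hipMemPool_t memPool;
    hipMemPoolCreate(&memPool, &props);

    // Export the pool as a file descriptor another process can import.
    int fd = -1;
    hipMemPoolExportToShareableHandle(&fd, memPool,
                                      hipMemHandleTypePosixFileDescriptor, 0);

    // ... pass fd to the consumer process, e.g. over a Unix domain socket ...
    return 0;
}
```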
The ``hipMemPoolImportFromShareableHandle()`` function imports a memory pool from a shareable handle, such as a file descriptor obtained from another process. It restores a memory pool that was previously exported with ``hipMemPoolExportToShareableHandle()``. After importing, you receive a valid memory pool handle that refers to the same memory; individual allocations can then be mapped with ``hipMemPoolImportPointer()``. This is useful for inter-process communication or sharing memory across different contexts.

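A sketch of the importing side, assuming ``fd`` was received from the exporting process (an application-provided assumption; error handling omitted):

```cpp
#include <hip/hip_runtime.h>
#include <cstdint>

// File descriptor received from the exporting process over IPC.
extern int fd;

int main() {
    // Rebuild a pool handle that refers to the exporter's pool.
    hipMemPool_t importedPool;
    hipMemPoolImportFromShareableHandle(
        &importedPool,
        reinterpret_cast<void*>(static_cast<intptr_t>(fd)),
        hipMemHandleTypePosixFileDescriptor,
        0);

    // Allocations exported by the other process can now be mapped
    // into this process with hipMemPoolImportPointer().
    return 0;
}
```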
2 changes: 1 addition & 1 deletion docs/index.md
* [HIP Porting: Driver API Guide](./how-to/hip_porting_driver_api)
* {doc}`./how-to/hip_rtc`
* {doc}`./how-to/performance_guidelines`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* [Unified Memory](./how-to/unified_memory)
* {doc}`./how-to/stream_ordered_allocator`
* {doc}`./how-to/faq`

:::
Expand Down
3 changes: 1 addition & 2 deletions docs/sphinx/_toc.yml.in
subtrees:
- file: how-to/hip_porting_driver_api
- file: how-to/hip_rtc
- file: how-to/performance_guidelines
- file: how-to/debugging
- file: how-to/logging
- file: how-to/unified_memory
- file: how-to/stream_ordered_allocator
- file: how-to/faq

- caption: Reference