Commit d5e490a: Internal review
neon60 committed Jul 2, 2024 (1 parent: 06b9a65)
Showing 4 changed files with 188 additions and 257 deletions.

162 changes: 47 additions & 115 deletions docs/how-to/stream_ordered_allocator.rst

Stream Ordered Memory Allocator
*******************************************************************************

The Stream Ordered Memory Allocator (SOMA) is part of the HIP runtime API. It provides an asynchronous memory allocation mechanism with stream-ordering semantics. With SOMA, you can allocate and free memory in stream order, ensuring that all asynchronous accesses occur between the stream executions of allocation and deallocation. Compliance with stream order prevents use-before-allocation and use-after-free errors, which would otherwise lead to undefined behavior.

Advantages of SOMA:

- Efficient reuse: SOMA enables efficient memory reuse across streams, reducing unnecessary allocation overhead.
- Fine-grained control: You can set attributes and control caching behavior for memory pools.
- Inter-process sharing: Secure sharing of allocations between processes is possible.
- Optimizations: The driver can optimize based on its awareness of SOMA and other stream management APIs.

Disadvantages of SOMA:

- Temporal constraints: Developers must adhere strictly to stream order to avoid errors.
- Complexity: Properly managing memory in stream order can be intricate.
- Learning curve: Understanding and utilizing SOMA effectively may require additional effort.

Stream ordered memory allocator usage
=====================================

Users can allocate memory using ``hipMallocAsync()`` with stream-ordered semantics: all asynchronous accesses to the allocation must occur between the stream executions of the allocation and the free. If memory is accessed outside of this promised stream order, the behavior is undefined (for example, use-before-allocation or use-after-free errors). The allocator may reallocate memory as long as it guarantees that compliant memory accesses will not overlap temporally. ``hipFreeAsync()`` returns memory to the pool with stream-ordered semantics.

The following example explains how to use stream ordered memory allocation.

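A minimal sketch of such a program follows. It is illustrative, not the repository's original listing: error handling is omitted, and the ``increment`` kernel and buffer sizes are assumptions for demonstration.

```cpp
#include <hip/hip_runtime.h>
#include <iostream>
#include <vector>

// Illustrative kernel: increments every element once.
__global__ void increment(int* data, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    constexpr size_t numElements = 1 << 20;
    constexpr size_t sizeBytes   = numElements * sizeof(int);

    hipStream_t stream;
    hipStreamCreate(&stream);

    // The allocation is stream ordered: it is valid for work enqueued
    // after this point in the same stream.
    int* dData = nullptr;
    hipMallocAsync(reinterpret_cast<void**>(&dData), sizeBytes, stream);

    std::vector<int> hData(numElements, 41);
    hipMemcpyAsync(dData, hData.data(), sizeBytes, hipMemcpyHostToDevice, stream);

    increment<<<(numElements + 255) / 256, 256, 0, stream>>>(dData, numElements);

    hipMemcpyAsync(hData.data(), dData, sizeBytes, hipMemcpyDeviceToHost, stream);

    // The free is also stream ordered: every access above was enqueued
    // between the allocation and this free, so the ordering contract holds.
    hipFreeAsync(dData, stream);

    hipStreamSynchronize(stream);
    hipStreamDestroy(stream);

    std::cout << "hData[0] = " << hData[0] << std::endl;
    return 0;
}
```

Note that no access to ``dData`` is enqueued after the ``hipFreeAsync()`` call; doing so would violate stream order and cause undefined behavior.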
Memory pools
============

Memory pools provide a way to manage memory with stream-ordered behavior, ensuring proper synchronization and avoiding memory access errors. Division of a single memory system into separate pools allows querying each partition's access path properties. Memory pools are used for host memory, device memory, and unified memory.

Set pools
---------

The ``hipMallocAsync()`` function uses the current memory pool, while also providing the opportunity to create and use different pools with the ``hipMemPoolCreate()`` and ``hipMallocFromPoolAsync()`` functions respectively.

Unlike CUDA, where stream-ordered memory allocation can be implicit, in AMD HIP, it's always explicit. This means that you need to manage memory allocation for each stream, ensuring precise control over memory usage and synchronization.

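A sketch of creating an explicit pool and allocating from it is shown below (error handling omitted; the pool properties and sizes are illustrative assumptions):

```cpp
#include <hip/hip_runtime.h>

int main() {
    // Describe a device-local memory pool on device 0.
    hipMemPoolProps props = {};
    props.allocType     = hipMemAllocationTypePinned;
    props.handleTypes   = hipMemHandleTypeNone;   // no inter-process sharing
    props.location.type = hipMemLocationTypeDevice;
    props.location.id   = 0;

    hipMemPool_t memPool;
    hipMemPoolCreate(&memPool, &props);

    hipStream_t stream;
    hipStreamCreate(&stream);

    // Allocate from the explicit pool rather than the device's default pool.
    void* ptr = nullptr;
    hipMallocFromPoolAsync(&ptr, 1 << 20, memPool, stream);

    // ... enqueue kernels or copies that use ptr on this stream ...

    hipFreeAsync(ptr, stream);
    hipStreamSynchronize(stream);

    hipStreamDestroy(stream);
    hipMemPoolDestroy(memPool);
    return 0;
}
```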
Trim pools
----------

The memory allocator allows you to allocate and free memory in stream order. To control memory usage, the release threshold attribute can be set by ``hipMemPoolAttrReleaseThreshold``. This threshold specifies the amount of reserved memory in bytes that a pool should hold onto before attempting to release memory back to the operating system.

.. code-block:: cpp

   uint64_t threshold = UINT64_MAX;
   hipMemPoolSetAttribute(memPool, hipMemPoolAttrReleaseThreshold, &threshold);

When more than the specified threshold bytes of memory are held by the memory pool, the allocator will try to release memory back to the operating system during the next call to stream, event, or context synchronization.

For better performance, it can be good practice to adjust the memory pool size with ``hipMemPoolTrimTo()``. It reclaims memory from a memory pool that is larger than necessary, optimizing memory usage for your application.

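A sketch of trimming the default pool after a burst of temporary allocations (assuming device 0; the sizes are illustrative and error handling is omitted):

```cpp
#include <hip/hip_runtime.h>

int main() {
    hipStream_t stream;
    hipStreamCreate(&stream);

    hipMemPool_t memPool;
    hipDeviceGetDefaultMemPool(&memPool, 0);

    // A temporary working buffer that is no longer needed afterwards.
    void* ptr = nullptr;
    hipMallocAsync(&ptr, 256u << 20, stream); // 256 MiB
    hipFreeAsync(ptr, stream);
    hipStreamSynchronize(stream);

    // The pool may still hold the freed memory in reserve; keep at most
    // 64 MiB and return the rest to the operating system.
    hipMemPoolTrimTo(memPool, 64u << 20);

    hipStreamDestroy(stream);
    return 0;
}
```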
Resource usage statistics
-------------------------

Resource usage statistics can help with optimization. The following pool attributes can be used to query memory usage:

- ``hipMemPoolAttrReservedMemCurrent`` returns the current total physical GPU memory consumed by the pool.
- ``hipMemPoolAttrUsedMemCurrent`` returns the total size of all memory currently allocated from the pool.
- ``hipMemPoolAttrReservedMemHigh`` returns the maximum total physical GPU memory consumed by the pool since the last reset.
- ``hipMemPoolAttrUsedMemHigh`` returns the maximum total size of memory allocated from the pool since the last reset.

You can reset the high watermark attributes to their current values using ``hipMemPoolSetAttribute()``.

.. code-block:: cpp

   uint64_t value = 0;
   hipMemPoolSetAttribute(memPool, hipMemPoolAttrReservedMemHigh, &value);
   hipMemPoolSetAttribute(memPool, hipMemPoolAttrUsedMemHigh, &value);

Memory reuse policies
---------------------

The allocator may reallocate memory as long as it guarantees that compliant memory accesses won't overlap temporally. Turning the following memory pool reuse policy attribute flags on and off can optimize memory use:

- ``hipMemPoolReuseFollowEventDependencies`` checks event dependencies before allocating additional GPU memory.
- ``hipMemPoolReuseAllowOpportunistic`` checks freed allocations to determine if the stream order semantic indicated by the free operation has been met.
- ``hipMemPoolReuseAllowInternalDependencies`` manages reuse based on internal dependencies in the runtime. If the driver fails to allocate and map additional physical memory, it searches for memory that relies on another stream's pending progress and reuses it.

Device accessibility for multi-GPU support
------------------------------------------

Allocations are initially accessible only from the device where they reside.
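To make a pool's allocations accessible from another device, access can be granted explicitly. The following is a sketch assuming two devices with peer access support (error handling omitted):

```cpp
#include <hip/hip_runtime.h>

int main() {
    // Pool that lives on device 0.
    hipMemPool_t memPool;
    hipDeviceGetDefaultMemPool(&memPool, 0);

    // Grant device 1 read/write access to allocations made from this pool.
    hipMemAccessDesc accessDesc = {};
    accessDesc.location.type = hipMemLocationTypeDevice;
    accessDesc.location.id   = 1;
    accessDesc.flags         = hipMemAccessFlagsProtReadWrite;

    hipMemPoolSetAccess(memPool, &accessDesc, 1);
    return 0;
}
```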

Inter-process memory handling
=============================

Inter-process capable (IPC) memory pools facilitate efficient and secure sharing of GPU memory between processes.

There are two ways to share memory between processes: pointer sharing or shareable handles. Both have an allocator (export) and a consumer (import) interface.

Device pointer
--------------

The ``hipMemPoolExportPointer()`` function exports data that allows a memory pool pointer to be shared directly between processes. It is useful for sharing a memory allocation with another process.

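A sketch of the exporting side follows. The pool must be created with an IPC-capable handle type; how the export data reaches the other process (for example, a pipe or socket) is an application choice and is left out here, as is error handling.

```cpp
#include <hip/hip_runtime.h>

int main() {
    // IPC-capable pool: allocations can be exported to other processes.
    hipMemPoolProps props = {};
    props.allocType     = hipMemAllocationTypePinned;
    props.handleTypes   = hipMemHandleTypePosixFileDescriptor;
    props.location.type = hipMemLocationTypeDevice;
    props.location.id   = 0;

    hipMemPool_t memPool;
    hipMemPoolCreate(&memPool, &props);

    hipStream_t stream;
    hipStreamCreate(&stream);

    void* ptr = nullptr;
    hipMallocFromPoolAsync(&ptr, 1 << 20, memPool, stream);
    hipStreamSynchronize(stream);

    // Opaque description of the allocation, to be sent to the consumer
    // process alongside the pool's shareable handle.
    hipMemPoolPtrExportData exportData;
    hipMemPoolExportPointer(&exportData, ptr);

    // ... send exportData to the consumer process via your IPC mechanism ...
    return 0;
}
```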
The ``hipMemPoolImportPointer()`` function imports a memory pool pointer directly from another process.

Here is the example code to read the exported pool from the previous example.

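A sketch of the consuming side, assuming the exporter's pool has already been imported as ``importedPool`` and the export data has been received over IPC (both are application-provided assumptions here; error handling omitted):

```cpp
#include <hip/hip_runtime.h>

// Assumed to be provided by the application's IPC layer.
extern hipMemPool_t importedPool;          // pool imported from the exporter
extern hipMemPoolPtrExportData exportData; // received from the exporter

int main() {
    // Map the exporter's allocation into this process.
    void* ptr = nullptr;
    hipMemPoolImportPointer(&ptr, importedPool, &exportData);

    // ... use ptr on this device ...

    // Freeing releases this process's reference to the allocation.
    hipFree(ptr);
    return 0;
}
```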
Shareable handle
----------------

The ``hipMemPoolExportToShareableHandle()`` function exports a memory pool to a shareable handle, such as a POSIX file descriptor, that can be passed to another process. The exported handle contains information about the memory pool, including its size, location, and other relevant details.

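A sketch of exporting a pool to a POSIX file descriptor. The pool must have been created with ``hipMemHandleTypePosixFileDescriptor``; transferring the descriptor to another process (typically over a Unix domain socket) is not shown, and error handling is omitted.

```cpp
#include <hip/hip_runtime.h>

int main() {
    hipMemPoolProps props = {};
    props.allocType     = hipMemAllocationTypePinned;
    props.handleTypes   = hipMemHandleTypePosixFileDescriptor;
    props.location.type = hipMemLocationTypeDevice;
    props.location.id   = 0;

    hipMemPool_t memPool;
    hipMemPoolCreate(&memPool, &props);

    // Export the pool as a file descriptor another process can import.
    int fd = -1;
    hipMemPoolExportToShareableHandle(&fd, memPool,
                                      hipMemHandleTypePosixFileDescriptor, 0);

    // ... pass fd to the consumer process, e.g. over a Unix domain socket ...
    return 0;
}
```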
The ``hipMemPoolImportFromShareableHandle()`` function imports a memory pool from a shareable handle, such as a file descriptor obtained from another process. It restores a memory pool that was previously exported with ``hipMemPoolExportToShareableHandle()``. After importing, you receive a valid memory pool handle that refers to the same memory; individual allocations can then be mapped with ``hipMemPoolImportPointer()``. This is useful for inter-process communication or sharing memory across different contexts.

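A sketch of the importing side, assuming ``fd`` was received from the exporting process (an application-provided assumption; error handling omitted):

```cpp
#include <hip/hip_runtime.h>
#include <cstdint>

// File descriptor received from the exporting process over IPC.
extern int fd;

int main() {
    // Rebuild a pool handle that refers to the exporter's pool.
    hipMemPool_t importedPool;
    hipMemPoolImportFromShareableHandle(
        &importedPool,
        reinterpret_cast<void*>(static_cast<intptr_t>(fd)),
        hipMemHandleTypePosixFileDescriptor,
        0);

    // Allocations exported by the other process can now be mapped
    // into this process with hipMemPoolImportPointer().
    return 0;
}
```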
2 changes: 1 addition & 1 deletion docs/index.md
* [HIP Porting: Driver API Guide](./how-to/hip_porting_driver_api)
* {doc}`./how-to/hip_rtc`
* {doc}`./how-to/performance_guidelines`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* [Unified Memory](./how-to/unified_memory)
* {doc}`./how-to/stream_ordered_allocator`
* {doc}`./how-to/faq`

:::
Expand Down
3 changes: 1 addition & 2 deletions docs/sphinx/_toc.yml.in
subtrees:
- file: how-to/hip_porting_driver_api
- file: how-to/hip_rtc
- file: how-to/performance_guidelines
- file: how-to/debugging
- file: how-to/logging
- file: how-to/unified_memory
- file: how-to/stream_ordered_allocator
- file: how-to/faq

- caption: Reference