Commit cb7cb3d: WIP
neon60 committed Sep 19, 2024 (parent: d99ef25)
Showing 1 changed file with 42 additions and 4 deletions: docs/understand/programming_interface.rst
@@ -96,10 +96,10 @@
Memory management is an important part of the HIP runtime API when creating
high-performance applications. Both allocating and copying memory can result
in bottlenecks, which can significantly impact performance.

For traditional device memory management, HIP uses the C-style functions
:cpp:func:`hipMalloc` for allocating and :cpp:func:`hipFree` for freeing memory.
Advanced features such as managed memory, virtual memory, and the stream
ordered memory allocator are described in the following sections.
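A minimal sketch of this C-style workflow (error checking elided for brevity; the buffer size and the kernel launch are placeholders):

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <vector>

    int main() {
        constexpr size_t N = 1 << 20;
        std::vector<float> host(N, 1.0f);

        // Allocate device memory.
        float* device = nullptr;
        hipMalloc(&device, N * sizeof(float));

        // Copy input to the device, run work on it, copy results back.
        hipMemcpy(device, host.data(), N * sizeof(float), hipMemcpyHostToDevice);
        // ... launch kernels operating on `device` here ...
        hipMemcpy(host.data(), device, N * sizeof(float), hipMemcpyDeviceToHost);

        // Free device memory.
        hipFree(device);
    }

In production code, the return value of every runtime call should be checked against ``hipSuccess``.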

Device memory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -157,6 +157,29 @@
either CPUs or GPUs. The unified memory model is shown in the following figure.
.. figure:: ../data/unified_memory/um.svg

Advantages of unified memory:

- Reduces the complexity of programming heterogeneous systems: there is no
  need to manually allocate, transfer, and track memory between the CPU and
  GPUs.
- Provides a unified address space between the CPU and GPU, so a data
  structure can use a single pointer on both.
- Allows portions of the data to reside in the CPU's memory and
only transfers relevant chunks to the GPU when required, leading to better
memory utilization.
- Enables dynamic memory allocation at runtime.
- Simplifies memory management in multi-GPU systems, allowing data to be
  shared across multiple GPUs without explicit data synchronization and
  transfer logic.

Disadvantages of unified memory:

- May introduce additional latency when data has to be migrated between
  devices or when the GPU accesses pages resident in host memory.
- ...
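The single-pointer model can be sketched with :cpp:func:`hipMallocManaged` (a minimal example; grid dimensions and the increment kernel are illustrative):

.. code-block:: cpp

    #include <hip/hip_runtime.h>

    __global__ void increment(int* data, size_t n) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;
    }

    int main() {
        constexpr size_t N = 256;
        int* data = nullptr;

        // One allocation, addressable from both the CPU and the GPU.
        hipMallocManaged(&data, N * sizeof(int));
        for (size_t i = 0; i < N; ++i) data[i] = 0;  // host writes directly

        increment<<<1, N>>>(data, N);
        hipDeviceSynchronize();  // ensure the GPU is done before the CPU reads

        // The host reads the results through the same pointer.
        // No hipMemcpy is needed in either direction.
        hipFree(data);
    }

Note that the explicit :cpp:func:`hipMemcpy` calls of the traditional model disappear; the runtime migrates pages between host and device as needed.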

Stream ordered memory allocator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -170,6 +193,21 @@
undefined behavior.

.. TODO: Add image here

Advantages of SOMA:

- Enables efficient memory reuse across streams, which reduces unnecessary
allocation overhead.
- Allows setting attributes and controlling caching behavior for memory pools.
- Enables secure sharing of allocations between processes.
- Allows the driver to optimize based on its awareness of SOMA and other
  stream management APIs.

Disadvantages of SOMA:

- Requires strict adherence to stream order to avoid errors.
- Involves memory management in stream order, which can be intricate.
- Requires additional effort to understand and utilize SOMA effectively.
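Stream-ordered allocation can be sketched with :cpp:func:`hipMallocAsync` and :cpp:func:`hipFreeAsync` (a minimal example; the buffer size is a placeholder and error checking is elided):

.. code-block:: cpp

    #include <hip/hip_runtime.h>

    int main() {
        hipStream_t stream;
        hipStreamCreate(&stream);

        // The allocation and the free are ordered with the stream's other
        // work: the buffer is usable by operations enqueued after
        // hipMallocAsync, and the pool may reuse it once hipFreeAsync
        // completes in stream order.
        float* buf = nullptr;
        hipMallocAsync(&buf, 1024 * sizeof(float), stream);
        // ... enqueue kernels on `stream` that use `buf` here ...
        hipFreeAsync(buf, stream);

        hipStreamSynchronize(stream);
        hipStreamDestroy(stream);
    }

Accessing ``buf`` from a different stream without synchronization, or after the :cpp:func:`hipFreeAsync` has completed, is exactly the kind of stream-order violation that leads to the undefined behavior mentioned above.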

Virtual memory management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

