Add virtual memory guide

ROCm · Aug 14, 2024 · da5031c · da5031c
1 parent 91f60b6
commit da5031c
Show file tree

Hide file tree

Showing 6 changed files with 124 additions and 10 deletions.
diff --git a/.wordlist.txt b/.wordlist.txt
@@ -113,6 +113,7 @@ tradeoffs
 templated
 typedefs
 UMM
+unmap
 variadic
 WinGDB
 zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
diff --git a/docs/how-to/virtual_memory.rst b/docs/how-to/virtual_memory.rst
@@ -0,0 +1,92 @@
+.. meta::
+  :description: This chapter describes introduces Virtual Memory (VM) and shows
+                how to use it in AMD HIP.
+  :keywords: AMD, ROCm, HIP, CUDA, virtual memory, virtual, memory, UM, APU
+
+.. _virtual_memory:
+
+*****************************
+HIP Virtual Memory Management
+*****************************
+
+Memory management is important when creating high-performance applications in the HIP ecosystem. Both allocating and copying memory can result in bottlenecks, which can significantly impact performance.
+
+Global memory allocation in HIP uses the C language style allocation function. This works fine for simple cases but can cause problems if your memory needs change. If you need to increase the size of your memory, you must allocate a second larger buffer and copy the data to it before you can free the original buffer. This increases overall memory usage and causes unnecessary ``memcpy`` calls. Another solution is to allocate a larger buffer than you initially need. However, this isn't an efficient way to handle resources and doesn't solve the issue of reallocation when the extra buffer runs out.
+
+Virtual memory management solves these memory management problems. It helps to reduce memory usage and unnecessary ``memcpy`` calls.
+
+.. _memory_allocation_virtual_memory:
+Memory Allocation
+=================
+
+Standard memory allocation uses the ``hipMalloc`` function to allocate a block of memory on the device. However, when using virtual memory, this process is separated into multiple steps using the ``hipMemCreate``, ``hipMemAddressReserve``, ``hipMemMap``, and ``hipMemSetAccess`` functions. This guide explains what these functions do and how you can use them for virtual memory management.
+
+Allocate Physical Memory
+------------------------
+
+The first step is to allocate the physical memory itself with the ``hipMemCreate`` function. This function accepts the size of the buffer, an ``unsigned long long`` variable for the flags, and a ``hipMemAllocationProp`` variable. ``hipMemAllocationProp`` contains the properties of the memory to be allocated, such as where the memory is physically located and what kind of shareable handles are available. If the allocation is successful, the function returns a value of ``hipSuccess``, with ``hipMemGenericAllocationHandle_t`` representing a valid physical memory allocation. The allocated memory size must be aligned with the granularity appropriate for the properties of the allocation. You can use the ``hipMemGetAllocationGranularity`` function to determine the correct granularity.
+
+.. code-block:: cpp
+
+    size_t granularity = 0;
+    hipMemGenericAllocationHandle_t allocHandle;
+    hipMemAllocationProp prop = {};
+    prop.type = HIP_MEM_ALLOCATION_TYPE_PINNED;
+    prop.location.type = HIP_MEM_LOCATION_TYPE_DEVICE;
+    prop.location.id = currentDev;
+    hipMemGetAllocationGranularity(&granularity, &prop, HIP_MEM_ALLOC_GRANULARITY_MINIMUM);
+    padded_size = ROUND_UP(size, granularity);
+    hipMemCreate(&allocHandle, padded_size, &prop, 0);
+
+Reserve Virtual Address Range
+-----------------------------
+
+After you have acquired an allocation of physical memory, you must map it before you can use it. To do so, you need a virtual address to map it to.  Mapping means the physical memory allocation is available from the virtual address range it is mapped to. To reserve a virtual memory range, use the ``hipMemAddressReserve`` function. The size of the virtual memory must match the amount of physical memory previously allocated. You can then map the physical memory allocation to the newly-acquired virtual memory address range using the ``hipMemMap`` function.
+
+.. code-block:: cpp
+
+    hipMemAddressReserve(&ptr, padded_size, 0, 0, 0);
+    hipMemMap(ptr, padded_size, 0, allocHandle, 0);
+
+Set Memory Access
+-----------------
+
+Finally, use the ``hipMemSetAccess`` function to enable memory access. It accepts the pointer to the virtual memory, the size, and a ``hipMemAccessDesc`` descriptor as parameters. In a multi-GPU environment, you can map the device memory of one GPU to another. This feature also works with the traditional memory management system, but isn't as scalable as with virtual memory. When memory is allocated with ``hipMalloc``, ``hipDeviceEnablePeerAccess`` is used to enable peer access. This function enables access between two devices, but it means that every call to ``hipMalloc`` takes more time to perform the checks and the mapping between the devices. When using virtual memory management, peer access is enabled by ``hipMemSetAccess``, which provides a finer level of control over what is shared. This has no performance impact on memory allocation and gives you more control over what memory buffers are shared with which devices.
+
+.. code-block:: cpp
+
+    hipMemAccessDesc accessDesc = {};
+    accessDesc.location.type = HIP_MEM_LOCATION_TYPE_DEVICE;
+    accessDesc.location.id = currentDev;
+    accessDesc.flags = HIP_MEM_ACCESS_FLAGS_PROT_READWRITE;
+    hipMemSetAccess(ptr, padded_size, &accessDesc, 1);
+
+At this point the memory is allocated, mapped, and ready for use. You can read and write to it, just like you would a C style memory allocation.
+
+Free Virtual Memory
+-------------------
+
+To free the memory allocated in this manner, use the corresponding free functions. To unmap the memory, use ``hipMemUnmap``. To release the virtual address range, use ``hipMemAddressFree``.  Finally, to release the physical memory, use ``hipMemRelease``. A side effect of these functions is the lack of synchronization when memory is released. If you call ``hipFree`` when you have multiple streams running in parallel, it synchronizes the device. This causes worse resource usage and performance.
+
+.. code-block:: cpp
+
+    hipMemUnmap(ptr, size);
+    hipMemRelease(allocHandle);
+    hipMemAddressFree(ptr, size);
+
+.. _usage_virtual_memory:
+Memory usage
+============
+
+Dynamically increase allocation size
+------------------------------------
+
+The ``hipMemAddressReserve`` function allows you to increase the amount of pre-allocated memory. This function accepts a parameter representing the requested starting address of the virtual memory. This allows you to have a continuous virtual address space without worrying about the underlying physical allocation.
+
+.. code-block:: cpp
+
+    hipMemAddressReserve(&new_ptr, (new_size - padded_size), 0, ptr + padded_size, 0);
+    hipMemMap(new_ptr, (new_size - padded_size), 0, newAllocHandle, 0);
+    hipMemSetAccess(new_ptr, (new_size - padded_size), &accessDesc, 1);
+
+The code sample above assumes that ``hipMemAddressReserve`` was able to reserve the memory address at the specified location. However, this isn't guaranteed to be true, so you should validate that ``new_ptr`` points to a specific virtual address before using it.
diff --git a/docs/index.md b/docs/index.md
@@ -45,6 +45,7 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
 * [Debugging with HIP](./how-to/debugging)
 * {doc}`./how-to/logging`
 * [Unified memory](./how-to/unified_memory)
+* [Virtual memory](./how-to/virtual_memory)
 * [Cooperative groups](./how-to/cooperative_groups)
 * {doc}`./how-to/faq`
 
@@ -59,6 +60,7 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
 * [Comparing syntax for different APIs](./reference/terms)
 * [HSA runtime API for ROCm](./reference/virtual_rocr)
 * [HIP managed memory allocation API](./reference/unified_memory_reference)
+* [HIP virtual memory management API](./reference/virtual_memory_reference)
 * [HIP Cooperative groups API](./reference/cooperative_groups)
 * [List of deprecated APIs](./reference/deprecated_api_list)
 

diff --git a/docs/reference/virtual_memory_reference.rst b/docs/reference/virtual_memory_reference.rst
@@ -0,0 +1,13 @@
+.. meta::
+  :description: This topic describes introduces Virtual Memory (VM) and shows
+                how to use it in AMD HIP.
+  :keywords: AMD, ROCm, HIP, CUDA, virtual memory, virtual, memory, VM
+
+.. _virtual_memory_reference:
+
+*******************************************************************************
+HIP Virtual Memory Management API
+*******************************************************************************
+
+.. doxygengroup:: Virtual
+   :content-only:
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
@@ -31,6 +31,8 @@ subtrees:
   - file: how-to/cooperative_groups
   - file: how-to/unified_memory
     title: Unified memory
+  - file: how-to/virtual_memory
+    title: Virtual memory
   - file: how-to/faq
 
 - caption: Reference
@@ -48,6 +50,8 @@ subtrees:
   - file: reference/virtual_rocr
   - file: reference/unified_memory_reference
     title: HIP managed memory allocation API
+  - file: reference/virtual_memory_reference
+    title: HIP virtual memory management API
   - file: reference/deprecated_api_list
     title: List of deprecated APIs
 

diff --git a/include/hip/hip_runtime_api.h b/include/hip/hip_runtime_api.h
@@ -3327,7 +3327,8 @@ hipError_t hipMemRangeGetAttributes(void** data,
  *
  * @returns #hipSuccess, #hipErrorInvalidValue
  *
- * @note  This API is implemented on Linux and is under development on Microsoft Windows.
+ * @warning This API is under development. Currently it is a no-operation (NOP)
+ *          function on AMD GPUs and returns #hipSuccess.
  */
 hipError_t hipStreamAttachMemAsync(hipStream_t stream,
                                    void* dev_ptr,
@@ -4258,10 +4259,10 @@ hipError_t hipGetProcAddress(const char* symbol, void** pfn, int  hipVersion, ui
  * the host side. The symbol can be in __constant or device space.
  * Note that the symbol name needs to be encased in the HIP_SYMBOL macro.
  * This also applies to hipMemcpyFromSymbol, hipGetSymbolAddress, and hipGetSymbolSize.
- * For detailed usage, see the 
+ * For detailed usage, see the
  * <a href="https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_porting_guide.html#memcpytosymbol">memcpyToSymbol example</a>
  * in the HIP Porting Guide.
- * 
+ *
  *
  *  @param[out]  symbol  pointer to the device symbole
  *  @param[in]   src  pointer to the source address
@@ -5056,7 +5057,7 @@ hipError_t hipMemcpyPeerAsync(void* dst, int dstDeviceId, const void* src, int s
  *  existing driver codes.
  *
  *  These APIs are only for equivalent driver APIs on the NVIDIA platform.
- * 
+ *
  */
 
 /**
@@ -8319,8 +8320,11 @@ hipError_t hipDrvGraphExecMemsetNodeSetParams(hipGraphExec_t hGraphExec, hipGrap
  *  @{
  *  This section describes the virtual memory management functions of HIP runtime API.
  *
- *  @note  Please note, the virtual memory management functions of HIP runtime API are implemented
- *  on Linux, under development on Windows.
+ *  @note  Please note, the virtual memory management functions of HIP runtime
+ *         API are implemented on Linux, under development on Windows. The
+ *         following Virtual Memory Management APIs are not (yet)
+ *         supported in HIP:
+ *          - hipMemMapArrayAsync
  */
 
 /**
@@ -8461,10 +8465,8 @@ hipError_t hipMemMap(void* ptr, size_t size, size_t offset, hipMemGenericAllocat
  * @param [in] count - number of hipArrayMapInfo in mapInfoList.
  * @param [in] stream - stream identifier for the stream to use for map or unmap operations.
  * @returns #hipSuccess, #hipErrorInvalidValue, #hipErrorNotSupported
- * @warning This API is marked as Beta. While this feature is complete, it can
- *          change and might have outstanding issues.
- *
- * @note  This API is implemented on Linux and is under development on Microsoft Windows.
+ * @warning This API is under development. Currently it is not supported on AMD
+ *          GPUs and returns #hipErrorNotSupported.
  */
 hipError_t hipMemMapArrayAsync(hipArrayMapInfo* mapInfoList, unsigned int  count, hipStream_t stream);