ROCm · matyas-streamhpc · Oct 3, 2024 · Oct 15, 2024 · Nov 5, 2024 · Nov 11, 2024
@@ -0,0 +1,134 @@
+.. meta::
+   :description: Error Handling
+   :keywords: AMD, ROCm, HIP, error handling, error
+
+********************************************************************************
+Error handling
+********************************************************************************
+
+HIP provides functionality to detect, report, and manage errors that occur
+during the execution of HIP runtime functions or when launching kernels. Every
+HIP runtime function, apart from launching kernels, has :cpp:type:`hipError_t`
+as return type. :cpp:func:`hipGetLastError()` and :cpp:func:`hipPeekAtLastError()`
+can be used for catching errors from kernel launches, as kernel launches don't
+return an error directly. HIP maintains an internal state, that includes the
+last error code. :cpp:func:`hipGetLastError` returns and resets that error to
+hipSuccess, while :cpp:func:`hipPeekAtLastError` just returns the error without
+changing it. To get a human readable version of the errors,
+:cpp:func:`hipGetErrorString()` and :cpp:func:`hipGetErrorName()` can be used.
+
+.. note::
+
+    :cpp:func:`hipGetLastError` returns the returned error code of the last HIP
+    runtime API call even if it's hipSuccess, while ``cudaGetLastError`` returns
+    the error returned by any of the preceding CUDA APIs in the same host thread.
+    :cpp:func:`hipGetLastError` behavior will be matched with
+    ``cudaGetLastError`` in ROCm release 7.0. 
+
+Best practices of HIP error handling:
+
+1. Check errors after each API call - Avoid error propagation.
+2. Use macros for error checking - Check :ref:`hip_check_macros`.
+3. Handle errors gracefully - Free resources and provide meaningful error
+   messages to the user.
+
+For more details on the error handling functions, see :ref:`error handling 
+functions reference page <error_handling_reference>`.
+
+.. _hip_check_macros:
+
+HIP check macros
+================================================================================
+
+HIP uses check macros to simplify error checking and reduce code duplication. 
+The ``HIP_CHECK`` macros are mainly used to detect and report errors. It can 
+also exit from application with ``exit(1);`` function call after the error
+print. The ``HIP_CHECK`` macro example:
+
+.. code-block:: cpp
+
+  #define HIP_CHECK(expression)                  \
+  {                                              \
+      const hipError_t status = expression;      \
+      if(status != hipSuccess){                  \
+          std::cerr << "HIP error "              \
+                    << status << ": "            \
+                    << hipGetErrorString(status) \
+                    << " at " << __FILE__ << ":" \
+                    << __LINE__ << std::endl;    \
+      }                                          \
+  }
+
+Complete example
+================================================================================
+
+A complete example to demonstrate the error handling with a simple addition of 
+two values kernel:
+
+.. code-block:: cpp
+
+  #include <hip/hip_runtime.h>
+  #include <vector>
+  #include <iostream>
+
+  #define HIP_CHECK(expression)                  \
+  {                                              \
+      const hipError_t status = expression;      \
+      if(status != hipSuccess){                  \
+          std::cerr << "HIP error "              \
+                    << status << ": "            \
+                    << hipGetErrorString(status) \
+                    << " at " << __FILE__ << ":" \
+                    << __LINE__ << std::endl;    \
+      }                                          \
+  }
+
+  // Addition of two values.
+  __global__ void add(int *a, int *b, int *c, size_t size) {
+      const size_t index = threadIdx.x + blockDim.x * blockIdx.x;
+      if(index < size) {
+          c[index] += a[index] + b[index];
+      }
+  }
+
+  int main() {
+      constexpr int numOfBlocks = 256;
+      constexpr int threadsPerBlock = 256;
+      constexpr size_t arraySize = 1U << 16;
+
+      std::vector<int> a(arraySize), b(arraySize), c(arraySize);
+      int *d_a, *d_b, *d_c;
+
+      // Setup input values.
+      std::fill(a.begin(), a.end(), 1);
+      std::fill(b.begin(), b.end(), 2);
+
+      // Allocate device copies of a, b and c.
+      HIP_CHECK(hipMalloc(&d_a, arraySize * sizeof(*d_a)));
+      HIP_CHECK(hipMalloc(&d_b, arraySize * sizeof(*d_b)));
+      HIP_CHECK(hipMalloc(&d_c, arraySize * sizeof(*d_c)));
+
+      // Copy input values to device.
+      HIP_CHECK(hipMemcpy(d_a, &a, arraySize * sizeof(*d_a), hipMemcpyHostToDevice));
+      HIP_CHECK(hipMemcpy(d_b, &b, arraySize * sizeof(*d_b), hipMemcpyHostToDevice));
+
+      // Launch add() kernel on GPU.
+      hipLaunchKernelGGL(add, dim3(numOfBlocks), dim3(threadsPerBlock), 0, 0, d_a, d_b, d_c, arraySize);
+      // Check the kernel launch
+      HIP_CHECK(hipGetLastError());
+      // Check for kernel execution error
+      HIP_CHECK(hipDeviceSynchronize());
+
+      // Copy the result back to the host.
+      HIP_CHECK(hipMemcpy(&c, d_c, arraySize * sizeof(*d_c), hipMemcpyDeviceToHost));
+
+      // Cleanup allocated memory.
+      HIP_CHECK(hipFree(d_a));
+      HIP_CHECK(hipFree(d_b));
+      HIP_CHECK(hipFree(d_c));
+
+      // Print the result.
+      std::cout << a[0] << " + " << b[0] << " = " << c[0] << std::endl;
+
+      return 0;
+  }
@@ -0,0 +1,105 @@
+.. meta::
+   :description: Initialization.
+   :keywords: AMD, ROCm, HIP, initialization
+
+********************************************************************************
+Initialization
+********************************************************************************
+
+The initialization involves setting up the environment and resources needed for
+using GPUs. The following steps are covered with the initialization:
+
+- Setting up the HIP runtime
+
+  This includes reading the environment variables set during init, setting up
+  the active or visible devices, loading necessary libraries, setting up
+  internal buffers for memory copies or cooperative launches, initialize the
+  compiler as well as HSA runtime and checks any errors due to lack of resources
+  or no active devices.
+
+- Querying and setting GPUs
+
+  Identifying and querying the available GPU devices on the system.
+
+- Setting up contexts
+
+  Creating contexts for each GPU device, which are essential for managing
+  resources and executing kernels. For further details, check the :ref:`context
+  section <context_driver_api>`.
+
+Initialize the HIP runtime
+================================================================================
+
+The HIP runtime is initialized automatically when the first HIP API call is
+made. However, you can explicitly initialize it using :cpp:func:`hipInit`,
+to be able to control the timing of the initialization. The manual
+initialization can be useful to ensure that the GPU is initialized and
+ready, or to isolate GPU initialization time from other parts of
+your program.
+
+.. note::
+
+  You can use :cpp:func:`hipDeviceReset()` to delete all streams created, memory
+  allocated, kernels running and events created by the current process. Any new
+  HIP API call initializes the HIP runtime again.
+
+Querying and setting GPUs
+================================================================================
+
+If multiple GPUs are available in the system, you can query and select the
+desired GPU(s) to use based on device properties, such as size of global memory,
+size shared memory per block, support of cooperative launch and support of
+managed memory.
+
+Querying GPUs
+--------------------------------------------------------------------------------
+
+The properties of a GPU can be queried using :cpp:func:`hipGetDeviceProperties`,
+which returns a struct of :cpp:struct:`hipDeviceProp_t`. The properties in the struct can be
+used to identify a device or give an overview of hardware characteristics, that
+might make one GPU better suited for the task than others. 
+
+The :cpp:func:`hipGetDeviceCount` function returns the number of available GPUs,
+which can be used to loop over the available GPUs.
+
+Example code of querying GPUs:
+
+.. code-block:: cpp
+
+  #include <hip/hip_runtime.h>
+  #include <iostream>
+
+  int main() {
+
+      int deviceCount;
+      if (hipGetDeviceCount(&deviceCount) == hipSuccess){
+          for (int i = 0; i < deviceCount; ++i){
+              hipDeviceProp_t prop;
+              if ( hipGetDeviceProperties(&prop, i) == hipSuccess)
+                  std::cout << "Device" << i << prop.name << std::endl;
+          }
+      }
+
+      return 0;
+  }
+
+Setting the GPU
+--------------------------------------------------------------------------------
+
+:cpp:func:`hipSetDevice` function select the GPU to be used for subsequent HIP
+operations. This function performs several key tasks:
+
+- Context Binding
+
+  Binds the current thread to the context of the specified GPU device. This 
+  ensures that all subsequent operations are executed on the selected device.
+
+- Resource Allocation
+
+  Prepares the device for resource allocation, such as memory allocation and
+  stream creation.
+
+- Check device availability
+
+  Checks for errors in device selection and returns error if the specified 
+  device is not available or not capable of executing HIP operations.
@@ -30,7 +30,9 @@ The HIP documentation is organized into the following categories:
 :::{grid-item-card} How to
 
 * {doc}`./how-to/hip_runtime_api`
+  * {doc}`./how-to/hip_runtime_api/initialization`
   * {doc}`./how-to/hip_runtime_api/memory_management`
+  * {doc}`./how-to/hip_runtime_api/error_handling`  
   * {doc}`./how-to/hip_runtime_api/cooperative_groups`
   * {doc}`./how-to/hip_runtime_api/hipgraph`
 * [HIP porting guide](./how-to/hip_porting_guide)

@@ -34,6 +34,7 @@ subtrees:
   - file: how-to/hip_runtime_api
     subtrees:
     - entries:
+      - file: how-to/hip_runtime_api/initialization
       - file: how-to/hip_runtime_api/memory_management
         subtrees:
         - entries:
@@ -46,6 +47,7 @@ subtrees:
           - file: how-to/hip_runtime_api/memory_management/unified_memory
           - file: how-to/hip_runtime_api/memory_management/virtual_memory
           - file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
+      - file: how-to/hip_runtime_api/error_handling
       - file: how-to/hip_runtime_api/cooperative_groups
       - file: how-to/hip_runtime_api/hipgraph
   - file: how-to/hip_porting_guide