diff --git a/docs/how-to/debugging.rst b/docs/how-to/debugging.rst
index c90f7ec7d8..9fd6caa5d6 100644
--- a/docs/how-to/debugging.rst
+++ b/docs/how-to/debugging.rst
@@ -2,6 +2,8 @@
   :description: How to debug using HIP.
   :keywords: AMD, ROCm, HIP, debugging, ltrace, ROCgdb, WinGDB
 
+.. _debugging_with_hip:
+
 *************************************************************************
 Debugging with HIP
 *************************************************************************
@@ -272,102 +274,7 @@ HIP environment variable summary
 
 Here are some of the more commonly used environment variables:
 
-..
-
-.. # COMMENT: The following lines define a break for use in the table below.
-.. |break| raw:: html
-
-   <br />
-
-..
-
-.. list-table::
-
-   * - **Environment variable**
-     - **Default value**
-     - **Usage**
-
-   * - AMD_LOG_LEVEL
-       |break| Enable HIP log on different Level
-     - 0
-     - 0: Disable log.
-       |break| 1: Enable log on error level
-       |break| 2: Enable log on warning and below levels
-       |break| 0x3: Enable log on information and below levels
-       |break| 0x4: Decode and display AQL packets
-
-   * - AMD_LOG_MASK
-       |break| Enable HIP log on different Level
-     - 0x7FFFFFFF
-     - 0x1: Log API calls
-       |break| 0x02: Kernel and Copy Commands and Barriers
-       |break| 0x4: Synchronization and waiting for commands to finish
-       |break| 0x8: Enable log on information and below levels
-       |break| 0x20: Queue commands and queue contents
-       |break| 0x40: Signal creation, allocation, pool
-       |break| 0x80: Locks and thread-safety code
-       |break| 0x100: Copy debug
-       |break| 0x200: Detailed copy debug
-       |break| 0x400: Resource allocation, performance-impacting events
-       |break| 0x800: Initialization and shutdown
-       |break| 0x1000: Misc debug, not yet classified
-       |break| 0x2000: Show raw bytes of AQL packet
-       |break| 0x4000: Show code creation debug
-       |break| 0x8000: More detailed command info, including barrier commands
-       |break| 0x10000: Log message location
-       |break| 0xFFFFFFFF: Log always even mask flag is zero
-
-   * - HIP_LAUNCH_BLOCKING
-       |break| Used for serialization on kernel execution.
-     - 0
-     - 0: Disable. Kernel executes normally.
-       |break| 1: Enable. Serializes kernel enqueue, behaves the same as AMD_SERIALIZE_KERNEL.
-
-   * - HIP_VISIBLE_DEVICES (or CUDA_VISIBLE_DEVICES)
-       |break| Only devices whose index is present in the sequence are visible to HIP
-     -
-     - 0,1,2: Depending on the number of devices on the system
-
-   * - GPU_DUMP_CODE_OBJECT
-       |break| Dump code object
-     - 0
-     - 0: Disable
-       |break| 1: Enable
-
-   * - AMD_SERIALIZE_KERNEL
-       |break| Serialize kernel enqueue
-     - 0
-     - 1: Wait for completion before enqueue
-       |break| 2: Wait for completion after enqueue
-       |break| 3: Both
-
-   * - AMD_SERIALIZE_COPY
-       |break| Serialize copies
-     - 0
-     - 1: Wait for completion before enqueue
-       |break| 2: Wait for completion after enqueue
-       |break| 3: Both
-
-   * - HIP_HOST_COHERENT
-       |break| Coherent memory in hipHostMalloc
-     - 0
-     - 0: memory is not coherent between host and GPU
-       |break| 1: memory is coherent with host
-
-   * - AMD_DIRECT_DISPATCH
-       |break| Enable direct kernel dispatch (Currently for Linux; under development for Windows)
-     - 1
-     - 0: Disable
-       |break| 1: Enable
-
-   * - GPU_MAX_HW_QUEUES
-       |break| The maximum number of hardware queues allocated per device
-     - 4
-     - The variable controls how many independent hardware queues HIP runtime can create per process,
-       per device. If an application allocates more HIP streams than this number, then HIP runtime reuses
-       the same hardware queues for the new streams in a round-robin manner. Note that this maximum
-       number does not apply to hardware queues that are created for CU-masked HIP streams, or
-       cooperative queues for HIP Cooperative Groups (single queue per device).
+.. include:: ../how-to/debugging_env.rst
 
 General debugging tips
 ======================================================
diff --git a/docs/how-to/debugging_env.rst b/docs/how-to/debugging_env.rst
new file mode 100644
index 0000000000..deb2510a1f
--- /dev/null
+++ b/docs/how-to/debugging_env.rst
@@ -0,0 +1,88 @@
+.. list-table::
+   :header-rows: 1
+
+   * - **Environment variable**
+     - **Default value**
+     - **Usage**
+
+   * - | ``AMD_LOG_LEVEL``
+       | Sets the level of HIP logging
+     - 0
+     - | 0: Disable logging
+       | 1: Enable logging at error level
+       | 2: Enable logging at warning level and below
+       | 3: Enable logging at information level and below
+       | 4: Decode and display AQL packets
+
+   * - | ``AMD_LOG_MASK``
+       | Selects the categories of HIP log messages
+     - 0x7FFFFFFF
+     - | 0x1: Log API calls
+       | 0x2: Kernel and copy commands and barriers
+       | 0x4: Synchronization and waiting for commands to finish
+       | 0x8: Enable log on information and below levels
+       | 0x20: Queue commands and queue contents
+       | 0x40: Signal creation, allocation, pool
+       | 0x80: Locks and thread-safety code
+       | 0x100: Copy debug
+       | 0x200: Detailed copy debug
+       | 0x400: Resource allocation, performance-impacting events
+       | 0x800: Initialization and shutdown
+       | 0x1000: Miscellaneous debug, not yet classified
+       | 0x2000: Show raw bytes of AQL packet
+       | 0x4000: Show code creation debug
+       | 0x8000: More detailed command info, including barrier commands
+       | 0x10000: Log message location
+       | 0xFFFFFFFF: Log always, even if the mask flag is zero
+
+   * - | ``HIP_LAUNCH_BLOCKING``
+       | Serializes kernel execution
+     - 0
+     - | 0: Disable. Kernel executes normally.
+       | 1: Enable. Serializes kernel enqueue; behaves the same as ``AMD_SERIALIZE_KERNEL``.
+
+   * - | ``HIP_VISIBLE_DEVICES`` (or ``CUDA_VISIBLE_DEVICES``)
+       | Only devices whose index is present in the sequence are visible to HIP.
+     -
+     - Comma-separated device indices, for example ``0,1,2``; valid indices depend on the number of devices in the system
+
+   * - | ``GPU_DUMP_CODE_OBJECT``
+       | Dumps code objects
+     - 0
+     - | 0: Disable
+       | 1: Enable
+
+   * - | ``AMD_SERIALIZE_KERNEL``
+       | Serializes kernel enqueue
+     - 0
+     - | 1: Wait for completion before enqueue
+       | 2: Wait for completion after enqueue
+       | 3: Both
+
+   * - | ``AMD_SERIALIZE_COPY``
+       | Serializes copies
+     - 0
+     - | 1: Wait for completion before enqueue
+       | 2: Wait for completion after enqueue
+       | 3: Both
+
+   * - | ``HIP_HOST_COHERENT``
+       | Coherence of memory allocated with ``hipHostMalloc``
+     - 0
+     - | 0: Memory is not coherent between host and GPU
+       | 1: Memory is coherent with host
+
+   * - | ``AMD_DIRECT_DISPATCH``
+       | Enables direct kernel dispatch (currently Linux only; under development for Windows)
+     - 1
+     - | 0: Disable
+       | 1: Enable
+
+   * - | ``GPU_MAX_HW_QUEUES``
+       | Maximum number of hardware queues allocated per device
+     - 4
+     - Controls how many independent hardware queues the HIP runtime can create per process,
+       per device. If an application allocates more HIP streams than this number, the HIP runtime reuses
+       the same hardware queues for the new streams in a round-robin manner. This maximum
+       does not apply to hardware queues created for CU-masked HIP streams, or to
+       cooperative queues for HIP Cooperative Groups (single queue per device).
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index 094f29758c..fc27ede88f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -48,6 +48,7 @@ The CUDA enabled NVIDIA GPUs are supported by HIP. For more information, see [GP
 
 * {doc}`/doxygen/html/index`
 * [C++ language extensions](./reference/kernel_language)
+* [HIP environment variables](./reference/env_variables)
 * [Comparing Syntax for different APIs](./reference/terms)
 * [HSA Runtime API for ROCm](./reference/virtual_rocr)
 * [List of deprecated APIs](./reference/deprecated_api_list)
diff --git a/docs/reference/env_variables.rst b/docs/reference/env_variables.rst
new file mode 100644
index 0000000000..7149d251bd
--- /dev/null
+++ b/docs/reference/env_variables.rst
@@ -0,0 +1,154 @@
+.. meta::
+  :description: HIP environment variables reference
+  :keywords: AMD, HIP, environment variables, environment, reference
+
+*************************************************************
+HIP environment variables
+*************************************************************
+
+This section describes the most important HIP environment variables.
+The complete list of environment variables is available on the
+:doc:`ROCm environment variables page`.
+
+GPU isolation
+=============
+
+The following table lists the GPU isolation environment variables in HIP.
+For details on how to use these variables, see the :doc:`GPU isolation page `.
+
+.. list-table::
+   :header-rows: 1
+
+   * - **Environment variable**
+     - **Example value**
+
+   * - | ``ROCR_VISIBLE_DEVICES``
+       | A list of device indices or UUIDs that are exposed to applications.
+     - ``0,GPU-DEADBEEFDEADBEEF``
+
+   * - | ``GPU_DEVICE_ORDINAL``
+       | Device indices exposed to OpenCL and HIP applications.
+     - ``0,2``
+
+   * - | ``HIP_VISIBLE_DEVICES`` or ``CUDA_VISIBLE_DEVICES``
+       | Device indices exposed to HIP applications.
+     - ``0,2``
+
+Profiling environment variables
+===============================
+
+The following table lists the profiling environment variables in HIP. For
+details on how to use these variables, see the :doc:`Setting the number of CUs page `.
+
+.. list-table::
+   :header-rows: 1
+
+   * - **Environment variable**
+     - **Example value**
+
+   * - | ``HSA_CU_MASK``
+       | Sets the CU mask at a lower level of queue creation in the driver.
+       | This mask is also applied to queues being profiled.
+     -
+
+   * - | ``ROC_GLOBAL_CU_MASK``
+       | Sets the CU mask on queues created by the HIP or OpenCL runtimes.
+       | This mask is also applied to queues being profiled.
+     -
+
+   * - | ``ROCR_VISIBLE_DEVICES``
+       | A list of device indices or UUIDs that are exposed to applications.
+     - ``0,GPU-DEADBEEFDEADBEEF``
+
+Debug environment variables
+===========================
+
+The following table lists the debugging environment variables in HIP. For
+details on how to use them, see :ref:`debugging_with_hip`.
+
+.. include:: ../how-to/debugging_env.rst
+
+Memory management related environment variables
+===============================================
+
+The following table lists the memory management environment variables in
+HIP.
+
+.. list-table::
+   :widths: 70,15,15
+   :header-rows: 1
+
+   * - Environment variable
+     - Variable type
+     - Default value
+
+   * - | ``HIP_HIDDEN_FREE_MEM``
+       | Amount of memory, in MB, reserved from free memory reporting; 0 disables it
+     - ``uint``
+     - 0
+
+   * - | ``HIP_HOST_COHERENT``
+       | Coherence of memory allocated with ``hipHostMalloc``
+     - ``uint``
+     - 0
+
+   * - | ``HIP_INITIAL_DM_SIZE``
+       | Initial heap size, in bytes, for device-side ``malloc``. The default value corresponds to 8 MiB.
+     - ``size_t``
+     - 8388608
+
+   * - | ``HIP_MEM_POOL_SUPPORT``
+       | Enables memory pool support in HIP
+     - ``bool``
+     - ``false``
+
+   * - | ``HIP_MEM_POOL_USE_VM``
+       | Enables memory pool support in HIP
+     - ``bool``
+     - | ``true`` on Windows,
+       | ``false`` on other operating systems
+
+   * - | ``HIP_VMEM_MANAGE_SUPPORT``
+       | Enables virtual memory management support
+     - ``bool``
+     - ``true``
+
+   * - | ``GPU_MAX_HEAP_SIZE``
+       | Maximum size of the GPU heap, as a percentage of board memory
+     - ``uint``
+     - 100
+
+   * - | ``GPU_MAX_REMOTE_MEM_SIZE``
+       | Maximum size, in KiB, that allows device memory substitution with system memory
+     - ``uint``
+     - 2
+
+   * - | ``GPU_NUM_MEM_DEPENDENCY``
+       | Number of memory objects for dependency tracking
+     - ``size_t``
+     - 256
+
+   * - | ``GPU_STREAMOPS_CP_WAIT``
+       | Forces the stream wait memory operation to wait on the command processor (CP)
+     - ``bool``
+     - ``false``
+
+   * - | ``HSA_LOCAL_MEMORY_ENABLE``
+       | Enables HSA device local memory usage
+     - ``bool``
+     - ``true``
+
+   * - | ``PAL_ALWAYS_RESIDENT``
+       | Forces memory resources to become resident at allocation time
+     - ``bool``
+     - ``false``
+
+   * - | ``PAL_PREPINNED_MEMORY_SIZE``
+       | Size of prepinned memory, in KB
+     - ``size_t``
+     - 64
+
+   * - | ``REMOTE_ALLOC``
+       | Uses remote memory for the global heap allocation
+     - ``bool``
+     - ``false``
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index 17af3731fc..7acbe69519 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -35,6 +35,7 @@ subtrees:
     - file: doxygen/html/index
     - file: reference/kernel_language
       title: C++ language extensions
+    - file: reference/env_variables
     - file: reference/terms
       title: Comparing Syntax for different APIs
     - file: reference/virtual_rocr
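The HIP runtime reads the debug variables in the tables above from the process environment. You can therefore set them only for the application under test. The following Python code is a minimal sketch of such an environment builder, under these assumptions:

* The helper name `hip_debug_env`, its default values, and the placeholder program `./my_hip_app` are hypothetical.
* Only the variable names and value ranges come from the tables.
* The log mask is written as a decimal string, so the sketch does not assume anything about how the runtime parses hexadecimal input.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# AMD_LOG_MASK bit values copied from the table (subset only).
LOG_API_CALLS = 0x1             # 0x1: Log API calls
LOG_KERNEL_COPY_BARRIERS = 0x2  # 0x2: Kernel and copy commands and barriers
LOG_SYNCHRONIZATION = 0x4       # 0x4: Synchronization and waiting for commands to finish


def hip_debug_env(
    devices: Iterable[int],
    log_level: int = 3,
    log_mask: int = LOG_API_CALLS | LOG_SYNCHRONIZATION,
    serialize_kernel: int = 3,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of `base` (os.environ by default) with HIP debug variables set.

    Hypothetical helper: the defaults are illustrative, not recommended values.
    """
    if not 0 <= log_level <= 4:
        raise ValueError("AMD_LOG_LEVEL: the table documents levels 0 to 4")
    if serialize_kernel not in (0, 1, 2, 3):
        raise ValueError("AMD_SERIALIZE_KERNEL: the table documents 0 (default), 1, 2 and 3")

    # Copy the base environment so the caller's mapping is never modified.
    env = dict(os.environ if base is None else base)
    # Only the listed device indices are visible to HIP, for example "0,2".
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    # 3: log at information level and below.
    env["AMD_LOG_LEVEL"] = str(log_level)
    # Decimal string, for example 0x1 | 0x4 becomes "5".
    env["AMD_LOG_MASK"] = str(log_mask)
    # 3: wait for completion both before and after each kernel enqueue.
    env["AMD_SERIALIZE_KERNEL"] = str(serialize_kernel)
    return env
```

For example, `subprocess.run(["./my_hip_app"], env=hip_debug_env([0, 2]), check=True)` would start the placeholder program with these settings:

* Only devices 0 and 2 are visible.
* API-call and synchronization logging are enabled.

The helper returns a copied dictionary instead of modifying `os.environ`, so the settings apply only to that child process.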