Initial commit of HIP environment variables
neon60 committed Jun 6, 2024
1 parent 7d9aabc commit 736c7e4
Showing 5 changed files with 247 additions and 96 deletions.
99 changes: 3 additions & 96 deletions docs/how-to/debugging.rst
@@ -2,6 +2,8 @@
:description: How to debug using HIP.
:keywords: AMD, ROCm, HIP, debugging, ltrace, ROCgdb, WinGDB

.. _debugging_with_hip:

*************************************************************************
Debugging with HIP
*************************************************************************
@@ -272,102 +274,7 @@ HIP environment variable summary

Here are some of the more commonly used environment variables:

.. <!-- spellcheck-disable -->
.. # COMMENT: The following lines define a break for use in the table below.
.. |break| raw:: html

<br />

.. <!-- spellcheck-enable -->
.. list-table::

* - **Environment variable**
- **Default value**
- **Usage**

* - AMD_LOG_LEVEL
|break| Enable HIP log on different Level
- 0
- 0: Disable log.
|break| 1: Enable log on error level
|break| 2: Enable log on warning and below levels
|break| 0x3: Enable log on information and below levels
|break| 0x4: Decode and display AQL packets

* - AMD_LOG_MASK
|break| Enable HIP log on different Level
- 0x7FFFFFFF
- 0x1: Log API calls
|break| 0x02: Kernel and Copy Commands and Barriers
|break| 0x4: Synchronization and waiting for commands to finish
|break| 0x8: Enable log on information and below levels
|break| 0x20: Queue commands and queue contents
|break| 0x40: Signal creation, allocation, pool
|break| 0x80: Locks and thread-safety code
|break| 0x100: Copy debug
|break| 0x200: Detailed copy debug
|break| 0x400: Resource allocation, performance-impacting events
|break| 0x800: Initialization and shutdown
|break| 0x1000: Misc debug, not yet classified
|break| 0x2000: Show raw bytes of AQL packet
|break| 0x4000: Show code creation debug
|break| 0x8000: More detailed command info, including barrier commands
|break| 0x10000: Log message location
|break| 0xFFFFFFFF: Log always even mask flag is zero

* - HIP_LAUNCH_BLOCKING
|break| Used for serialization on kernel execution.
- 0
- 0: Disable. Kernel executes normally.
|break| 1: Enable. Serializes kernel enqueue, behaves the same as AMD_SERIALIZE_KERNEL.

* - HIP_VISIBLE_DEVICES (or CUDA_VISIBLE_DEVICES)
|break| Only devices whose index is present in the sequence are visible to HIP
-
- 0,1,2: Depending on the number of devices on the system

* - GPU_DUMP_CODE_OBJECT
|break| Dump code object
- 0
- 0: Disable
|break| 1: Enable

* - AMD_SERIALIZE_KERNEL
|break| Serialize kernel enqueue
- 0
- 1: Wait for completion before enqueue
|break| 2: Wait for completion after enqueue
|break| 3: Both

* - AMD_SERIALIZE_COPY
|break| Serialize copies
- 0
- 1: Wait for completion before enqueue
|break| 2: Wait for completion after enqueue
|break| 3: Both

* - HIP_HOST_COHERENT
|break| Coherent memory in hipHostMalloc
- 0
- 0: memory is not coherent between host and GPU
|break| 1: memory is coherent with host

* - AMD_DIRECT_DISPATCH
|break| Enable direct kernel dispatch (Currently for Linux; under development for Windows)
- 1
- 0: Disable
|break| 1: Enable

* - GPU_MAX_HW_QUEUES
|break| The maximum number of hardware queues allocated per device
- 4
- The variable controls how many independent hardware queues HIP runtime can create per process,
per device. If an application allocates more HIP streams than this number, then HIP runtime reuses
the same hardware queues for the new streams in a round-robin manner. Note that this maximum
number does not apply to hardware queues that are created for CU-masked HIP streams, or
cooperative queues for HIP Cooperative Groups (single queue per device).
.. include:: ../how-to/debugging_env.rst
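
The ``AMD_LOG_MASK`` values in the table are powers of two, which suggests they are meant to be combined with a bitwise OR. The following is a minimal sketch under that assumption. The helper name ``build_log_env`` is hypothetical, and writing the values as decimal strings is an assumption; only the flag values themselves come from the table.

```python
import os

# Flag values copied from the AMD_LOG_MASK row of the table.
LOG_API = 0x1   # log API calls
LOG_CMD = 0x2   # kernel and copy commands and barriers
LOG_WAIT = 0x4  # synchronization and waiting for commands to finish


def build_log_env(level, *mask_bits, base_env=None):
    """Return a copy of the environment with AMD_LOG_LEVEL and AMD_LOG_MASK set.

    Hypothetical helper: the mask is the bitwise OR of the given flags.
    """
    env = dict(os.environ if base_env is None else base_env)
    mask = 0
    for bit in mask_bits:
        mask |= bit
    env["AMD_LOG_LEVEL"] = str(level)
    env["AMD_LOG_MASK"] = str(mask)  # decimal string (assumption)
    return env


# Information-level logging, restricted to API calls and synchronization.
env = build_log_env(3, LOG_API, LOG_WAIT, base_env={})
print(env)
```

The resulting dictionary can be passed as the ``env`` argument of ``subprocess.run`` when launching a HIP application.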

General debugging tips
======================================================
88 changes: 88 additions & 0 deletions docs/how-to/debugging_env.rst
@@ -0,0 +1,88 @@
.. list-table::
:header-rows: 1

* - **Environment variable**
- **Default value**
- **Usage**

* - | ``AMD_LOG_LEVEL``
| Enable HIP logging at different levels
- 0
- | 0: Disable log
| 1: Enable log on error level
| 2: Enable log on warning and below levels
| 3: Enable log on information and below levels
| 4: Decode and display AQL packets
* - | ``AMD_LOG_MASK``
| Select which categories of HIP log messages are emitted
- 0x7FFFFFFF
- | 0x1: Log API calls
| 0x2: Kernel and copy commands and barriers
| 0x4: Synchronization and waiting for commands to finish
| 0x8: Enable log on information and below levels
| 0x20: Queue commands and queue contents
| 0x40: Signal creation, allocation, pool
| 0x80: Locks and thread-safety code
| 0x100: Copy debug
| 0x200: Detailed copy debug
| 0x400: Resource allocation, performance-impacting events
| 0x800: Initialization and shutdown
| 0x1000: Misc debug, not yet classified
| 0x2000: Show raw bytes of AQL packet
| 0x4000: Show code creation debug
| 0x8000: More detailed command info, including barrier commands
| 0x10000: Log message location
| 0xFFFFFFFF: Log always, even if the mask flag is zero
* - | ``HIP_LAUNCH_BLOCKING``
| Serialize kernel execution
- 0
- | 0: Disable. Kernels execute normally.
| 1: Enable. Serializes kernel enqueue; behaves the same as ``AMD_SERIALIZE_KERNEL``.
* - | ``HIP_VISIBLE_DEVICES`` (or ``CUDA_VISIBLE_DEVICES``)
| Only devices whose index is present in the sequence are visible to HIP
-
- 0,1,2: Depending on the number of devices on the system

* - | ``GPU_DUMP_CODE_OBJECT``
| Dump code object
- 0
- | 0: Disable
| 1: Enable
* - | ``AMD_SERIALIZE_KERNEL``
| Serialize kernel enqueue
- 0
- | 1: Wait for completion before enqueue
| 2: Wait for completion after enqueue
| 3: Both
* - | ``AMD_SERIALIZE_COPY``
| Serialize copies
- 0
- | 1: Wait for completion before enqueue
| 2: Wait for completion after enqueue
| 3: Both
* - | ``HIP_HOST_COHERENT``
| Coherent memory in ``hipHostMalloc``
- 0
- | 0: Memory is not coherent between host and GPU
| 1: Memory is coherent with host
* - | ``AMD_DIRECT_DISPATCH``
| Enable direct kernel dispatch (currently Linux only; Windows support is under development)
- 1
- | 0: Disable
| 1: Enable
* - | ``GPU_MAX_HW_QUEUES``
| The maximum number of hardware queues allocated per device
- 4
- Controls how many independent hardware queues the HIP runtime can create per process,
per device. If an application allocates more HIP streams than this number, the HIP runtime reuses
the same hardware queues for the new streams in a round-robin manner. This maximum
does not apply to hardware queues created for CU-masked HIP streams, or to
cooperative queues for HIP Cooperative Groups (single queue per device).
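
The round-robin reuse described for ``GPU_MAX_HW_QUEUES`` can be pictured with a small model. This is only an illustration that assumes a plain modulo mapping from stream creation order to queue slot; the function name ``queue_for_stream`` is hypothetical, and the real runtime may assign queues differently. CU-masked streams and cooperative queues are outside this model.

```python
GPU_MAX_HW_QUEUES = 4  # default value from the table


def queue_for_stream(stream_index, max_hw_queues=GPU_MAX_HW_QUEUES):
    """Illustrative model: map the n-th created stream to a hardware queue slot."""
    return stream_index % max_hw_queues


# Six streams share four hardware queues; streams 4 and 5 reuse slots 0 and 1.
assignments = [queue_for_stream(i) for i in range(6)]
print(assignments)
```

In this model, streams that land on the same slot share one hardware queue, so their work may be serialized against each other.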
1 change: 1 addition & 0 deletions docs/index.md
@@ -48,6 +48,7 @@ The CUDA enabled NVIDIA GPUs are supported by HIP. For more information, see [GP

* {doc}`/doxygen/html/index`
* [C++ language extensions](./reference/kernel_language)
* [HIP environment variables](./reference/env_variables)
* [Comparing Syntax for different APIs](./reference/terms)
* [HSA Runtime API for ROCm](./reference/virtual_rocr)
* [List of deprecated APIs](./reference/deprecated_api_list)
154 changes: 154 additions & 0 deletions docs/reference/env_variables.rst
@@ -0,0 +1,154 @@
.. meta::
:description: HIP environment variables reference
:keywords: AMD, HIP, environment variables, environment, reference

*************************************************************
HIP environment variables
*************************************************************

This section lists the most important HIP environment variables.
For the full collection of environment variables, see the
:doc:`ROCm environment variables page <rocm:reference/env-variables>`.

GPU isolation
=============

The following table lists the GPU isolation environment variables in HIP.
For details on how to use them, see the :doc:`GPU isolation page <rocm:conceptual/gpu-isolation>`.

.. list-table::
:header-rows: 1

* - **Environment variable**
- **Example value**

* - | ``ROCR_VISIBLE_DEVICES``
| A list of device indices or UUIDs that will be exposed to applications.
- ``0,GPU-DEADBEEFDEADBEEF``

* - | ``GPU_DEVICE_ORDINAL``
| Device indices exposed to OpenCL and HIP applications.
- ``0,2``

* - | ``HIP_VISIBLE_DEVICES`` or ``CUDA_VISIBLE_DEVICES``
| Device indices exposed to HIP applications.
- ``0,2``
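
As a minimal sketch of how the comma-separated example values might be produced, the following Python snippet builds an environment that restricts a child process to selected devices. The helper name ``visible_devices_env`` and the application path in the comment are hypothetical; only the variable name and the ``0,2`` format come from the table.

```python
import os


def visible_devices_env(indices, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set.

    Hypothetical helper: joins device indices into a comma-separated list.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return env


# Expose only devices 0 and 2, matching the example value in the table.
env = visible_devices_env([0, 2], base_env={})
print(env["HIP_VISIBLE_DEVICES"])

# A HIP application could then be launched with this environment, for example:
# subprocess.run(["./my_hip_app"], env=visible_devices_env([0, 2]))
```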

Profiling environment variables
===============================

The following table lists the profiling environment variables in HIP. For
details on how to use them, see the :doc:`Setting the number of CUs page <rocm:conceptual/settings-cu>`.

.. list-table::
:header-rows: 1

* - **Environment variable**
- **Example value**

* - | ``HSA_CU_MASK``
| Sets the mask at a lower level of queue creation in the driver.
| This mask is also set for queues being profiled.
-

* - | ``ROC_GLOBAL_CU_MASK``
| Sets the mask on queues created by the HIP or OpenCL runtimes.
| This mask is also set for queues being profiled.
-

* - | ``ROCR_VISIBLE_DEVICES``
| A list of device indices or UUIDs that will be exposed to applications.
- ``0,GPU-DEADBEEFDEADBEEF``

Debug environment variables
===========================

The following table lists the debugging environment variables in HIP. For
details on how to use them, see :ref:`debugging_with_hip`.

.. include:: ../how-to/debugging_env.rst

Memory management related environment variables
===============================================

The following table lists the memory management related environment variables
in HIP.

.. list-table::
:widths: 70,15,15
:header-rows: 1

* - Environment variable
- Variable type
- Default value

* - | ``HIP_HIDDEN_FREE_MEM``
| Reserve free memory reporting, in MB (0 disables)
- ``uint``
- 0

* - | ``HIP_HOST_COHERENT``
| Coherent memory in ``hipHostMalloc``
- ``uint``
- 0

* - | ``HIP_INITIAL_DM_SIZE``
| Set initial heap size for device malloc. The default value corresponds to 8 MiB
- ``size_t``
- 8388608

* - | ``HIP_MEM_POOL_SUPPORT``
| Enables memory pool support in HIP
- ``bool``
- ``false``

* - | ``HIP_MEM_POOL_USE_VM``
| Enables memory pool support in HIP
- ``bool``
- | ``true`` on Windows,
| ``false`` on other operating systems
* - | ``HIP_VMEM_MANAGE_SUPPORT``
| Virtual Memory Management Support
- ``bool``
- ``true``

* - | ``GPU_MAX_HEAP_SIZE``
| Set the maximum size of the GPU heap as a percentage of board memory
- ``uint``
- 100

* - | ``GPU_MAX_REMOTE_MEM_SIZE``
| Maximum size, in KiB, that allows device memory substitution with system memory
- ``uint``
- 2

* - | ``GPU_NUM_MEM_DEPENDENCY``
| Number of memory objects for dependency tracking
- ``size_t``
- 256

* - | ``GPU_STREAMOPS_CP_WAIT``
| Force the stream wait memory operation to wait on CP.
- ``bool``
- ``false``

* - | ``HSA_LOCAL_MEMORY_ENABLE``
| Enable HSA device local memory usage
- ``bool``
- ``true``

* - | ``PAL_ALWAYS_RESIDENT``
| Force memory resources to become resident at allocation time
- ``bool``
- ``false``

* - | ``PAL_PREPINNED_MEMORY_SIZE``
| Size in KBytes of prepinned memory
- ``size_t``
- 64

* - | ``REMOTE_ALLOC``
| Use remote memory for the global heap allocation
- ``bool``
- ``false``
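
Several of these variables take raw sizes. As a small worked example, the default ``HIP_INITIAL_DM_SIZE`` of 8388608 is 8 MiB expressed in bytes. The sketch below assumes the variable is given in bytes, as that default suggests; the helper name ``mib_to_bytes`` and the 16 MiB value are hypothetical choices for illustration.

```python
def mib_to_bytes(mib):
    """Convert a size in MiB to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024


# The documented default corresponds to 8 MiB.
default_bytes = mib_to_bytes(8)
print(default_bytes)

# Environment entry requesting a 16 MiB initial device malloc heap (illustrative value).
env = {"HIP_INITIAL_DM_SIZE": str(mib_to_bytes(16))}
print(env)
```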
1 change: 1 addition & 0 deletions docs/sphinx/_toc.yml.in
@@ -35,6 +35,7 @@ subtrees:
- file: doxygen/html/index
- file: reference/kernel_language
title: C++ language extensions
- file: reference/env_variables
- file: reference/terms
title: Comparing Syntax for different APIs
- file: reference/virtual_rocr