Skip to content

Commit

Permalink
Reorganize how-to
Browse files Browse the repository at this point in the history
  • Loading branch information
neon60 committed Sep 6, 2024
1 parent 160eb07 commit 06969fa
Show file tree
Hide file tree
Showing 12 changed files with 92 additions and 127 deletions.
File renamed without changes.
58 changes: 58 additions & 0 deletions docs/how-to/hip_runtime_api.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
.. meta::
:description: HIP runtime API reference page
:keywords: AMD, ROCm, HIP, CUDA, HIP runtime API, HIP runtime

.. _runtime_api_reference:

*******************************************************************************
HIP runtime API
*******************************************************************************

This is the full HIP Runtime API how-to page.

Modules
============================================================

The API is organized into modules based on functionality.

* :ref:`global_enum_defines_reference`
* :ref:`driver_reference`
* :ref:`device_management_reference`
* :ref:`execution_control_reference`
* :ref:`error_handling_reference`
* :ref:`stream_management_reference`
* :ref:`stream_memory_operations_reference`
* :ref:`event_management_reference`
* :ref:`memory_management_reference`

* :ref:`external_resource_interoperability_reference`
* :ref:`stream_memory_operations_reference`
* :ref:`unified_memory_reference`
* :ref:`virtual_memory_reference`

* :ref:`peer_to_peer_device_memory_access_reference`
* :ref:`context_management_reference`
* :ref:`module_management_reference`
* :ref:`occupancy_reference`
* :ref:`profiler_control_reference`
* :ref:`launch_api_reference`
* :ref:`texture_management_reference`
* :ref:`runtime_compilation_reference`
* :ref:`callback_activity_apis_reference`
* :ref:`graph_management_reference`
* :ref:`opengl_interop_reference`
* :ref:`surface_object_reference`

Data structures
============================================================

* :doc:`/doxygen/html/annotated`
* :doc:`/doxygen/html/classes`
* :doc:`/doxygen/html/inherits`
* :doc:`/doxygen/html/functions_data_fields`

Files
============================================================

* :doc:`/doxygen/html/files`
* :doc:`/doxygen/html/globals_globals`
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# HIP programming manual
# Memory management

## Host Memory

Expand Down Expand Up @@ -119,94 +119,3 @@ Note, CPU access to the semaphore's memory requires volatile keyword to disable
For more details, please check the documentation `HIP-API.pdf`.
Please note, HIP stream does not guarantee concurrency on AMD hardware for the case of multiple (at least 6) long-running streams executing concurrently, using `hipStreamSynchronize(nullptr)` for synchronization.
## Direct Dispatch
HIP runtime has Direct Dispatch enabled by default in ROCM 4.4 on Linux.
With this feature we move away from our conventional producer-consumer model where the runtime creates a worker thread(consumer) for each HIP Stream, and the host thread(producer) enqueues commands to a command queue(per stream).
For Direct Dispatch, HIP runtime would directly enqueue a packet to the AQL queue (user mode queue on GPU) on the Dispatch API call from the application. That has shown to reduce the latency to launch the first wave on the idle GPU and total time of tiny dispatches synchronized with the host.
In addition, eliminating the threads in runtime has reduced the variance in the dispatch numbers as the thread scheduling delays and atomics/locks synchronization latencies are reduced.
This feature can be disabled via setting the following environment variable,
AMD_DIRECT_DISPATCH=0
Note, Direct Dispatch is implemented on Linux. It is currently not supported on Windows.
## HIP Runtime Compilation
HIP now supports runtime compilation (HIP RTC), the usage of which will provide the possibility of optimizations and performance improvement compared with other APIs via regular offline static compilation.
HIP RTC APIs accept HIP source files in character string format as input parameters and create handles of programs by compiling the HIP source files without spawning separate processes.
For more details on HIP RTC APIs, refer to [HIP Runtime API Reference](../doxygen/html/index).
For Linux developers, the link [here](https://github.com/ROCm/hip-tests/blob/develop/samples/2_Cookbook/23_cmake_hiprtc/saxpy.cpp) shows an example how to program HIP application using runtime compilation mechanism, and a detailed [HIP RTC programming guide](./hip_rtc) is also available.
## HIP Graph
HIP graphs are supported. For more details, refer to the [HIP API Guide](../doxygen/html/group___graph) or the [understand section for HIP graphs](../understand/hipgraph).
## Device-Side Malloc
HIP-Clang now supports device-side malloc and free.
This implementation does not require the use of `hipDeviceSetLimit(hipLimitMallocHeapSize,value)` nor respects any setting. The heap is fully dynamic and can grow until the available free memory on the device is consumed.
## Use of Per-thread default stream
The per-thread default stream is supported in HIP. It is an implicit stream local to both the thread and the current device. This means that the command issued to the per-thread default stream by the thread does not implicitly synchronize with other streams (like explicitly created streams), or default per-thread stream on other threads.
The per-thread default stream is a blocking stream and will synchronize with the default null stream if both are used in a program.
The per-thread default stream can be enabled via adding a compilation option,
`-fgpu-default-stream=per-thread`.
And users can explicitly use `hipStreamPerThread` as per-thread default stream handle as input in API commands. There are test codes as examples in the [link](https://github.com/ROCm/hip-tests/tree/develop/catch/unit/streamperthread).
## Use of Long Double Type
In HIP-Clang, long double type is 80-bit extended precision format for x86_64, which is not supported by AMDGPU. HIP-Clang treats long double type as IEEE double type for AMDGPU. Using long double type in HIP source code will not cause issue as long as data of long double type is not transferred between host and device. However, long double type should not be used as kernel argument type.
## Use of `_Float16` Type
If a host function is to be used between clang (or hipcc) and gcc for x86_64, i.e. its definition is compiled by one compiler but the caller is compiled by a different compiler, `_Float16` or aggregates containing `_Float16` should not be used as function argument or return type. This is due to lack of stable ABI for `_Float16` on x86_64. Passing `_Float16` or aggregates containing `_Float16` between clang and gcc could cause undefined behavior.
## FMA and contractions
By default HIP-Clang assumes `-ffp-contract=fast-honor-pragmas`.
Users can use `#pragma clang fp contract(on|off|fast)` to control `fp` contraction of a block of code.
For x86_64, FMA is off by default since the generic x86_64 target does not
support FMA by default. To turn on FMA on x86_64, either use `-mfma` or `-march=native`
on CPU's supporting FMA.
When contractions are enabled and the CPU has not enabled FMA instructions, the
GPU can produce different numerical results than the CPU for expressions that
can be contracted. Tolerance should be used for floating point comparisons.
## Math functions with special rounding modes
Note: Currently, HIP only supports basic math functions with rounding modern (round to nearest). HIP does not support basic math functions with rounding modes `ru` (round up), `rd` (round down), and `rz` (round towards zero).
## Creating Static Libraries
HIP-Clang supports generating two types of static libraries. The first type of static library does not export device functions, and only exports and launches host functions within the same library. The advantage of this type is the ability to link with a non-hipcc compiler such as gcc. The second type exports device functions to be linked by other code objects. However, this requires using hipcc as the linker.
In addition, the first type of library contains host objects with device code embedded as fat binaries. It is generated using the flag --emit-static-lib. The second type of library contains relocatable device objects and is generated using `ar`.
Here is an example to create and use static libraries:
* Type 1 using `--emit-static-lib`:
```cpp
hipcc hipOptLibrary.cpp --emit-static-lib -fPIC -o libHipOptLibrary.a
gcc test.cpp -L. -lhipOptLibrary -L/path/to/hip/lib -lamdhip64 -o test.out
```

* Type 2 using system `ar`:

```cpp
hipcc hipDevice.cpp -c -fgpu-rdc -o hipDevice.o
ar rcsD libHipDevice.a hipDevice.o
hipcc libHipDevice.a test.cpp -fgpu-rdc -o test.out
```

For more information, please see [HIP samples host functions](https://github.com/ROCm/hip-tests/tree/develop/samples/2_Cookbook/15_static_library/host_functions) and [device_functions](https://github.com/ROCm/hip-tests/tree/rocm-5.5.x/samples/2_Cookbook/15_static_library/device_functions).
File renamed without changes.
File renamed without changes.
File renamed without changes.
26 changes: 12 additions & 14 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,29 +22,28 @@ The HIP documentation is organized into the following categories:

* {doc}`./understand/programming_model`
* {doc}`./understand/hip_runtime_api`
* {doc}`./understand/hardware_implementation`
* {doc}`./understand/hipgraph`
* {doc}`./understand/amd_clr`
* {doc}`./understand/hardware_implementation`
* {doc}`./understand/hardware_capabilities`
* [Texture fetching](./understand/texture_fetching)

:::

:::{grid-item-card} How to

* [Programming manual](./how-to/programming_manual)
* [HIP runtime API](./how-to/hip_runtime_api)
* [Memory management](./how-to/hip_runtime_api/memory_management)
* [Unified memory](./how-to/hip_runtime_api/memory_management/unified_memory)
* [Virtual memory](./how-to/hip_runtime_api/memory_management/virtual_memory)
* {doc}`./how-to/hip_runtime_api/memory_management/stream_ordered_allocator`
* [HIP graphs](./how-to/hip_runtime_api/hipgraph)
* [Cooperative groups](./how-to/hip_runtime_api/cooperative_groups)
* [HIP porting guide](./how-to/hip_porting_guide)
* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
* {doc}`./how-to/hip_rtc`
* {doc}`./how-to/performance_guidelines`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* [Unified memory](./how-to/unified_memory)
* [Virtual memory](./how-to/virtual_memory)
* {doc}`./how-to/stream_ordered_allocator`
* [Cooperative groups](./how-to/cooperative_groups)
* [HIP graphs](./how-to/hipgraph)
* {doc}`./how-to/faq`

:::

Expand All @@ -55,11 +54,10 @@ The HIP documentation is organized into the following categories:
* {doc}`doxygen/html/annotated_data_structures`

Check failure on line 54 in docs/index.md

View workflow job for this annotation

GitHub Actions / Documentation / Markdown

Unordered list indentation

docs/index.md:54:1 MD007/ul-indent Unordered list indentation [Expected: 2; Actual: 4] https://github.com/DavidAnson/markdownlint/blob/v0.28.2/doc/md007.md
* {doc}`doxygen/html/files_files`

Check failure on line 55 in docs/index.md

View workflow job for this annotation

GitHub Actions / Documentation / Markdown

Unordered list indentation

docs/index.md:55:1 MD007/ul-indent Unordered list indentation [Expected: 2; Actual: 4] https://github.com/DavidAnson/markdownlint/blob/v0.28.2/doc/md007.md
* [HSA runtime API for ROCm](./reference/virtual_rocr)
* [HIP math API](./reference/math_api)
* [C++ language extensions](./reference/cpp_language_extensions)
* [C++ language support](./reference/cpp_language_support)
* [Driver API](./reference/driver_api_reference)
* [HIP math API](./reference/math_api)
* [C++ language extensions](./reference/cpp_language_extensions)
* [C++ language support](./reference/cpp_language_support)

Check failure on line 59 in docs/index.md

View workflow job for this annotation

GitHub Actions / Documentation / Markdown

Unordered list indentation

docs/index.md:59:1 MD007/ul-indent Unordered list indentation [Expected: 2; Actual: 4] https://github.com/DavidAnson/markdownlint/blob/v0.28.2/doc/md007.md
* [HIP math API](./reference/math_api)

Check failure on line 60 in docs/index.md

View workflow job for this annotation

GitHub Actions / Documentation / Markdown

Unordered list indentation

docs/index.md:60:1 MD007/ul-indent Unordered list indentation [Expected: 2; Actual: 4] https://github.com/DavidAnson/markdownlint/blob/v0.28.2/doc/md007.md
* [HIP environment variables](./reference/env_variables)
* [Comparing syntax for different APIs](./reference/terms)
* [List of deprecated APIs](./reference/deprecated_api_list)
Expand Down
33 changes: 19 additions & 14 deletions docs/sphinx/_toc.yml.in
Original file line number Diff line number Diff line change
Expand Up @@ -34,22 +34,25 @@ subtrees:

- caption: How to
entries:
- file: how-to/programming_manual
- file: how-to/hip_runtime_api
subtrees:
- entries:
- file: how-to/hip_runtime_api/memory_management
subtrees:
- entries:
- file: how-to/hip_runtime_api/memory_management/unified_memory
- file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
- file: how-to/hip_runtime_api/memory_management/virtual_memory
- file: how-to/hip_runtime_api/hipgraph
- file: how-to/hip_runtime_api/cooperative_groups
- file: how-to/hip_porting_guide
- file: how-to/hip_porting_driver_api
subtrees:
- entries:
- file: how-to/hip_porting_driver_api
- file: how-to/hip_rtc
- file: how-to/performance_guidelines
- file: how-to/debugging
- file: how-to/logging
- file: how-to/cooperative_groups
- file: how-to/unified_memory
title: Unified memory
- file: how-to/virtual_memory
title: Virtual memory
- file: how-to/stream_ordered_allocator
- file: how-to/hipgraph
title: HIP graphs
- file: how-to/faq

- caption: Reference
entries:
Expand Down Expand Up @@ -91,10 +94,11 @@ subtrees:
- file: reference/virtual_rocr
- file: reference/cpp_language_extensions
title: C++ language extensions
- file: reference/cpp_language_support
title: C++ language support
subtrees:
- entries:
- file: reference/cpp_language_support
- file: reference/math_api
- file: reference/driver_api_reference
- file: reference/math_api
- file: reference/env_variables
- file: reference/terms
title: Comparing syntax for different APIs
Expand All @@ -116,3 +120,4 @@ subtrees:
- caption: About
entries:
- file: license.md
- file: faq.md
5 changes: 0 additions & 5 deletions docs/understand/hip_runtime_api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -273,11 +273,6 @@ The following figure visualizes the concept of graphs, compared to using streams
hipDeviceSynchronize, or using graphs, where the edges denote the
dependencies.

Node types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doxygenenum:: hipGraphNodeType

Memory management nodes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down
4 changes: 2 additions & 2 deletions docs/what_is_hip.rst
Original file line number Diff line number Diff line change
Expand Up @@ -19,13 +19,13 @@ to build and run application kernels.
:align: center

* HIP is a thin API with little or no performance impact over coding directly
in NVIDIA CUDA or AMD :doc:`ROCm <rocm:what_is_rocm>`.
in NVIDIA CUDA or AMD :doc:`ROCm <rocm:what-is-rocm>`.
* HIP enables coding in a single-source C++ programming language including
features such as templates, C++11 lambdas, classes, namespaces, and more.
* Developers can specialize for the platform (CUDA or ROCm) to tune for
performance or handle tricky cases.

:doc:`ROCm <rocm:what_is_rocm>` offers compilers (``clang``, ``hipcc``), code
:doc:`ROCm <rocm:what-is-rocm>` offers compilers (``clang``, ``hipcc``), code
profilers (``rocprof``, ``omnitrace``), debugging tools (``rocgdb``), libraries
and HIP with the runtime API and kernel language, to create heterogeneous applications
running on both CPUs and GPUs. ROCm provides marshalling libraries like
Expand Down

0 comments on commit 06969fa

Please sign in to comment.