Skip to content

Commit

Permalink
WIP
Browse files Browse the repository at this point in the history
  • Loading branch information
neon60 committed May 25, 2024
1 parent a541785 commit 79a077f
Show file tree
Hide file tree
Showing 5 changed files with 19 additions and 19 deletions.
3 changes: 1 addition & 2 deletions .markdownlint-cli2.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,5 +12,4 @@ config:
MD041: false
MD051: false
ignores:
- CHANGELOG.md
- "{,docs/}{RELEASE,release}.md"
- docs/doxygen/mainpage.md
1 change: 1 addition & 0 deletions .wordlist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ embeded
extern
fatbinary
GPGPU
hardcoded
hipcc
Interoperation
latencies
Expand Down
26 changes: 13 additions & 13 deletions docs/how-to/hip_rtc.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ R"(
)"};
```
Now to compile this kernel, it needs to be associated with hiprtcProgram type, which is done by declaring `hiprtcProgram prog;` and associating the string of kernel with this program:
Now to compile this kernel, it needs to be associated with `hiprtcProgram` type, which is done by declaring `hiprtcProgram prog;` and associating the string of kernel with this program:
```cpp
hiprtcCreateProgram(&prog, // HIPRTC program
Expand All @@ -43,7 +43,7 @@ hiprtcCreateProgram(&prog, // HIPRTC program
hiprtcCreateProgram API also allows you to add headers which can be included in your rtc program.
For online compilation, the compiler pre-defines HIP device API functions, HIP specific types and macros for device compilation, but does not include standard C/C++ headers by default. Users can only include header files provided to hiprtcCreateProgram.

After associating the kernel string with hiprtcProgram, you can now compile this program using:
After associating the kernel string with `hiprtcProgram`, you can now compile this program using:

```cpp
hiprtcCompileProgram(prog, // hiprtcProgram
Expand Down Expand Up @@ -76,7 +76,7 @@ vector<char> kernel_binary(codeSize);
hiprtcGetCode(prog, kernel_binary.data());
```
After loading the binary, hiprtcProgram can be destroyed.
After loading the binary, `hiprtcProgram` can be destroyed.
`hiprtcDestroyProgram(&prog);`
The binary present in `kernel_binary` can now be loaded via `hipModuleLoadData` API.
Expand Down Expand Up @@ -235,7 +235,7 @@ HIPRTC provides a few HIPRTC specific flags

### Bitcode

In the usual scenario, the kernel associated with hiprtcProgram is compiled into the binary which can be loaded and run. However, if -fpu-rdc option is provided in the compile options, HIPRTC calls comgr and generates only the LLVM bitcode. It doesn't convert this bitcode to ISA and generate the final binary.
In the usual scenario, the kernel associated with `hiprtcProgram` is compiled into the binary which can be loaded and run. However, if `-fpu-rdc` option is provided in the compile options, HIPRTC calls comgr and generates only the LLVM bitcode. It doesn't convert this bitcode to ISA and generate the final binary.

```cpp
std::string sarg = std::string("-fgpu-rdc");
Expand Down Expand Up @@ -327,13 +327,13 @@ hipModuleLoadData(&module, binary);
hiprtcLinkDestroy(rtc_link_state);
```

* The correct sequence of calls is : hiprtcLinkCreate, hiprtcLinkAddData or hiprtcLinkAddFile, hiprtcLinkComplete, hiprtcModuleLoadData, hiprtcLinkDestroy.
* The correct sequence of calls is : `hiprtcLinkCreate`, `hiprtcLinkAddData` or `hiprtcLinkAddFile`, `hiprtcLinkComplete`, `hiprtcModuleLoadData`, `hiprtcLinkDestroy`.

### Input Types

HIPRTC provides hiprtcJITInputType enumeration type which defines the input types accepted by the Linker APIs. Here are the enum values of hiprtcJITInputType. However only the input types HIPRTC_JIT_INPUT_LLVM_BITCODE, HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE and HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE are supported currently.
HIPRTC provides `hiprtcJITInputType` enumeration type which defines the input types accepted by the Linker APIs. Here are the enum values of `hiprtcJITInputType`. However only the input types `HIPRTC_JIT_INPUT_LLVM_BITCODE`, `HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE` and `HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE` are supported currently.

HIPRTC_JIT_INPUT_LLVM_BITCODE can be used to load both LLVM bitcode or LLVM IR assembly code. However, HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE and HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE are only for bundled bitcode and archive of bundled bitcode.
`HIPRTC_JIT_INPUT_LLVM_BITCODE` can be used to load both LLVM bitcode or LLVM IR assembly code. However, `HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE` and `HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE` are only for bundled bitcode and archive of bundled bitcode.

```cpp
HIPRTC_JIT_INPUT_CUBIN = 0,
Expand Down Expand Up @@ -376,9 +376,9 @@ hiprtcLinkCreate(2, jit_options.data(), (void**)lopts, &linkstate);
## Error Handling
HIPRTC defines the hiprtcResult enumeration type and a function hiprtcGetErrorString for API call error handling. hiprtcResult enum defines the API result codes. HIPRTC APIs return hiprtcResult to indicate the call result. hiprtcGetErrorString function returns a string describing the given hiprtcResult code, e.g., HIPRTC_SUCCESS to "HIPRTC_SUCCESS". For unrecognized enumeration values, it returns "Invalid HIPRTC error code".
HIPRTC defines the `hiprtcResult` enumeration type and a function hiprtcGetErrorString for API call error handling. `hiprtcResult` enum defines the API result codes. HIPRTC APIs return `hiprtcResult` to indicate the call result. hiprtcGetErrorString function returns a string describing the given `hiprtcResult` code, e.g., HIPRTC_SUCCESS to "HIPRTC_SUCCESS". For unrecognized enumeration values, it returns "Invalid HIPRTC error code".
hiprtcResult enum supported values and the hiprtcGetErrorString usage are mentioned below.
`hiprtcResult` enum supported values and the hiprtcGetErrorString usage are mentioned below.
```cpp
HIPRTC_SUCCESS = 0,
Expand Down Expand Up @@ -456,7 +456,7 @@ variable_name_vec.push_back("&N1::N2::V2");
for (auto&& x : variable_name_vec) hiprtcAddNameExpression(prog, x.c_str());
```

After which, the program is compiled using hiprtcCompileProgram and the generated binary is loaded using hipModuleLoadData. And the mangled names can be fetched using hirtcGetLoweredName.
After which, the program is compiled using `hiprtcCompileProgram` and the generated binary is loaded using `hipModuleLoadData`. And the mangled names can be fetched using `hirtcGetLoweredName`.

```cpp
for (decltype(variable_name_vec.size()) i = 0; i != variable_name_vec.size(); ++i) {
Expand Down Expand Up @@ -499,13 +499,13 @@ HIPRTC follows the below versioning.
* HIPRTC follows the same versioning as HIP runtime library.
* The `so` name field for the shared library is set to MAJOR version. For example, for HIP 5.3 the `so` name is set to 5 (hiprtc.so.5).
* Windows
* HIPRTC dll is named as hiprtcXXYY.dll where XX is MAJOR version and YY is MINOR version. For example, for HIP 5.3 the name is hiprtc0503.dll.
* HIPRTC dll is named as `hiprtcXXYY.dll` where XX is MAJOR version and YY is MINOR version. For example, for HIP 5.3 the name is hiprtc0503.dll.

## HIP header support

* Added HIPRTC support for all the hip common header files such as library_types.h, hip_math_constants.h, hip_complex.h, math_functions.h, surface_types.h etc. from 6.1. HIPRTC users need not include any HIP macros or constants explicitly in their header files. All of these should get included via HIPRTC builtins when the app links to HIPRTC library.

## Deprecation notice

* Currently HIPRTC APIs are separated from HIP APIs and HIPRTC is available as a separate library libhiprtc.so/libhiprtc.dll. But on Linux, HIPRTC symbols are also present in libhipamd64.so in order to support the existing applications. Gradually, these symbols will be removed from HIP library and applications using HIPRTC will be required to explicitly link to HIPRTC library. However, on Windows hiprtc.dll must be used as the hipamd64.dll doesn't contain the HIPRTC symbols.
* Data types such as uint32_t, uint64_t, int32_t, int64_t defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with `__hip__`, e.g. `__hip_uint32_t`. Applications previously using std::uint32_t or similar types can use `__hip_` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to `__hip_internal` namespace as implementation details.
* Currently HIPRTC APIs are separated from HIP APIs and HIPRTC is available as a separate library `libhiprtc.so`/`libhiprtc.dll`. But on Linux, HIPRTC symbols are also present in `libhipamd64.so` in order to support the existing applications. Gradually, these symbols will be removed from HIP library and applications using HIPRTC will be required to explicitly link to HIPRTC library. However, on Windows hiprtc.dll must be used as the `hipamd64.dll` doesn't contain the HIPRTC symbols.
* Data types such as `uint32_t`, `uint64_t`, `int32_t`, `int64_t` defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with `__hip__`, e.g. `__hip_uint32_t`. Applications previously using `std::uint32_t` or similar types can use `__hip_` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to `__hip_internal` namespace as implementation details.
2 changes: 1 addition & 1 deletion docs/how-to/programming_manual.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ A stronger system-level fence can be specified when the event is created with `h

### Summary and Recommendations

* Coherent host memory is the default and is the easiest to use since the memory is visible to the CPU at typical synchronization points. This memory allows in-kernel synchronization commands such as threadfence_system to work transparently.
* Coherent host memory is the default and is the easiest to use since the memory is visible to the CPU at typical synchronization points. This memory allows in-kernel synchronization commands such as `threadfence_system` to work transparently.
* HIP/ROCm also supports the ability to cache host memory in the GPU using the "Non-Coherent" host memory allocations. This can provide performance benefit, but care must be taken to use the correct synchronization.

### Managed memory allocation
Expand Down
6 changes: 3 additions & 3 deletions docs/reference/kernel_language.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1258,10 +1258,10 @@ Following is the list of supported integer intrinsics. Note that intrinsics are
| Return the number of consecutive high-order zero bits in 64 bit integer.
* - | ``unsigned int __ffs(int x)``
| Find the position of least signigicant bit set to 1 in a 32 bit integer.
| Find the position of least significant bit set to 1 in a 32 bit integer.
* - | ``unsigned int __ffsll(long long int x)``
| Find the position of least signigicant bit set to 1 in a 64 bit signed integer.
| Find the position of least significant bit set to 1 in a 64 bit signed integer.
* - | ``unsigned int __fns32(unsigned long long mask, unsigned int base, int offset)``
| Find the position of the n-th set to 1 bit in a 32-bit integer.
Expand Down Expand Up @@ -2147,7 +2147,7 @@ implementation of malloc and free that can be called from device functions.
`__launch_bounds__`
============================================================

GPU multiprocessors have a fixed pool of resources (primarily registers and shared memory) which are shared by the actively running warps. Using more resources can increase IPC of the kernel but reduces the resources available for other warps and limits the number of warps that can be simulaneously running. Thus GPUs have a complex relationship between resource usage and performance.
GPU multiprocessors have a fixed pool of resources (primarily registers and shared memory) which are shared by the actively running warps. Using more resources can increase IPC of the kernel but reduces the resources available for other warps and limits the number of warps that can be simultaneously running. Thus GPUs have a complex relationship between resource usage and performance.

__launch_bounds__ allows the application to provide usage hints that influence the resources (primarily registers) used by the generated code. It is a function attribute that must be attached to a __global__ function:

Expand Down

0 comments on commit 79a077f

Please sign in to comment.