WIP

ROCm · May 25, 2024 · 79a077f · 79a077f
1 parent a541785
commit 79a077f
Show file tree

Hide file tree

Showing 5 changed files with 19 additions and 19 deletions.
diff --git a/.markdownlint-cli2.yaml b/.markdownlint-cli2.yaml
@@ -12,5 +12,4 @@ config:
   MD041: false
   MD051: false
 ignores:
-  - CHANGELOG.md
-  - "{,docs/}{RELEASE,release}.md"
+  - docs/doxygen/mainpage.md
diff --git a/.wordlist.txt b/.wordlist.txt
@@ -11,6 +11,7 @@ embeded
 extern
 fatbinary
 GPGPU
+hardcoded
 hipcc
 Interoperation
 latencies

diff --git a/docs/how-to/hip_rtc.md b/docs/how-to/hip_rtc.md
@@ -29,7 +29,7 @@ R"(
 )"};
 ```
 
-Now to compile this kernel, it needs to be associated with hiprtcProgram type, which is done by declaring `hiprtcProgram prog;` and associating the string of kernel with this program:
+Now to compile this kernel, it needs to be associated with `hiprtcProgram` type, which is done by declaring `hiprtcProgram prog;` and associating the string of kernel with this program:
 
 ```cpp
 hiprtcCreateProgram(&prog,                 // HIPRTC program
@@ -43,7 +43,7 @@ hiprtcCreateProgram(&prog,                 // HIPRTC program
 hiprtcCreateProgram API also allows you to add headers which can be included in your rtc program.
 For online compilation, the compiler pre-defines HIP device API functions, HIP specific types and macros for device compilation, but does not include standard C/C++ headers by default. Users can only include header files provided to hiprtcCreateProgram.
 
-After associating the kernel string with hiprtcProgram, you can now compile this program using:
+After associating the kernel string with `hiprtcProgram`, you can now compile this program using:
 
 ```cpp
 hiprtcCompileProgram(prog,     // hiprtcProgram
@@ -76,7 +76,7 @@ vector<char> kernel_binary(codeSize);
 hiprtcGetCode(prog, kernel_binary.data());
 ```
 
-After loading the binary, hiprtcProgram can be destroyed.
+After loading the binary, `hiprtcProgram` can be destroyed.
 `hiprtcDestroyProgram(&prog);`
 
 The binary present in `kernel_binary` can now be loaded via `hipModuleLoadData` API.
@@ -235,7 +235,7 @@ HIPRTC provides a few HIPRTC specific flags
 
 ### Bitcode
 
-In the usual scenario, the kernel associated with hiprtcProgram is compiled into the binary which can be loaded and run. However, if -fpu-rdc option is provided in the compile options, HIPRTC calls comgr and generates only the LLVM bitcode.  It doesn't convert this bitcode to ISA and generate the final binary.
+In the usual scenario, the kernel associated with `hiprtcProgram` is compiled into the binary which can be loaded and run. However, if `-fpu-rdc` option is provided in the compile options, HIPRTC calls comgr and generates only the LLVM bitcode.  It doesn't convert this bitcode to ISA and generate the final binary.
 
 ```cpp
 std::string sarg = std::string("-fgpu-rdc");
@@ -327,13 +327,13 @@ hipModuleLoadData(&module, binary);
 hiprtcLinkDestroy(rtc_link_state);
 ```
 
-* The correct sequence of calls is : hiprtcLinkCreate, hiprtcLinkAddData or hiprtcLinkAddFile, hiprtcLinkComplete, hiprtcModuleLoadData, hiprtcLinkDestroy.
+* The correct sequence of calls is : `hiprtcLinkCreate`, `hiprtcLinkAddData` or `hiprtcLinkAddFile`, `hiprtcLinkComplete`, `hiprtcModuleLoadData`, `hiprtcLinkDestroy`.
 
 ### Input Types
 
-HIPRTC provides hiprtcJITInputType enumeration type which defines the input types accepted by the Linker APIs. Here are the enum values of hiprtcJITInputType. However only the input types  HIPRTC_JIT_INPUT_LLVM_BITCODE, HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE and HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE are supported currently.
+HIPRTC provides `hiprtcJITInputType` enumeration type which defines the input types accepted by the Linker APIs. Here are the enum values of `hiprtcJITInputType`. However only the input types `HIPRTC_JIT_INPUT_LLVM_BITCODE`, `HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE` and `HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE` are supported currently.
 
-HIPRTC_JIT_INPUT_LLVM_BITCODE can be used to load both LLVM bitcode or LLVM IR assembly code. However, HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE and HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE are only for bundled bitcode and archive of bundled bitcode.
+`HIPRTC_JIT_INPUT_LLVM_BITCODE` can be used to load both LLVM bitcode or LLVM IR assembly code. However, `HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE` and `HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE` are only for bundled bitcode and archive of bundled bitcode.
 
 ```cpp
 HIPRTC_JIT_INPUT_CUBIN = 0,
@@ -376,9 +376,9 @@ hiprtcLinkCreate(2, jit_options.data(), (void**)lopts, &linkstate);
 
 ## Error Handling
 
-HIPRTC defines the hiprtcResult enumeration type and a function hiprtcGetErrorString for API call error handling.  hiprtcResult enum defines the API result codes.  HIPRTC APIs return hiprtcResult to indicate the call result. hiprtcGetErrorString function returns a string describing the given hiprtcResult code, e.g., HIPRTC_SUCCESS to "HIPRTC_SUCCESS". For unrecognized enumeration values, it returns "Invalid HIPRTC error code".
+HIPRTC defines the `hiprtcResult` enumeration type and a function hiprtcGetErrorString for API call error handling.  `hiprtcResult` enum defines the API result codes.  HIPRTC APIs return `hiprtcResult` to indicate the call result. hiprtcGetErrorString function returns a string describing the given `hiprtcResult` code, e.g., HIPRTC_SUCCESS to "HIPRTC_SUCCESS". For unrecognized enumeration values, it returns "Invalid HIPRTC error code".
 
-hiprtcResult enum supported values and the hiprtcGetErrorString usage are mentioned below.
+`hiprtcResult` enum supported values and the hiprtcGetErrorString usage are mentioned below.
 
 ```cpp
 HIPRTC_SUCCESS = 0,
@@ -456,7 +456,7 @@ variable_name_vec.push_back("&N1::N2::V2");
 for (auto&& x : variable_name_vec) hiprtcAddNameExpression(prog, x.c_str());
 ```
 
-After which, the program is compiled using hiprtcCompileProgram and the generated binary is loaded using hipModuleLoadData.  And the mangled names can be fetched using hirtcGetLoweredName.
+After which, the program is compiled using `hiprtcCompileProgram` and the generated binary is loaded using `hipModuleLoadData`.  And the mangled names can be fetched using `hirtcGetLoweredName`.
 
 ```cpp
 for (decltype(variable_name_vec.size()) i = 0; i != variable_name_vec.size(); ++i) {
@@ -499,13 +499,13 @@ HIPRTC follows the below versioning.
   * HIPRTC follows the same versioning as HIP runtime library.
   * The `so` name field for the shared library is set to MAJOR version. For example, for HIP 5.3 the `so` name is set to 5 (hiprtc.so.5).
 * Windows
-  * HIPRTC dll is named as hiprtcXXYY.dll where XX is MAJOR version and YY is MINOR version. For example, for HIP 5.3 the name is hiprtc0503.dll.
+  * HIPRTC dll is named as `hiprtcXXYY.dll` where XX is MAJOR version and YY is MINOR version. For example, for HIP 5.3 the name is hiprtc0503.dll.
 
 ## HIP header support
 
 * Added HIPRTC support for all the hip common header files such as library_types.h, hip_math_constants.h, hip_complex.h, math_functions.h, surface_types.h etc. from 6.1. HIPRTC users need not include any HIP macros or constants explicitly in their header files. All of these should get included via HIPRTC builtins when the app links to HIPRTC library.
 
 ## Deprecation notice
 
-* Currently HIPRTC APIs are separated from HIP APIs and HIPRTC is available as a separate library libhiprtc.so/libhiprtc.dll. But on Linux, HIPRTC symbols are also present in libhipamd64.so in order to support the existing applications. Gradually, these symbols will be removed from HIP library and applications using HIPRTC will be required to explicitly link to HIPRTC library. However, on Windows hiprtc.dll must be used as the hipamd64.dll doesn't contain the HIPRTC symbols.
-* Data types such as uint32_t, uint64_t, int32_t, int64_t defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with `__hip__`, e.g. `__hip_uint32_t`. Applications previously using std::uint32_t or similar types can use `__hip_` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to `__hip_internal` namespace as implementation details.
+* Currently HIPRTC APIs are separated from HIP APIs and HIPRTC is available as a separate library `libhiprtc.so`/`libhiprtc.dll`. But on Linux, HIPRTC symbols are also present in `libhipamd64.so` in order to support the existing applications. Gradually, these symbols will be removed from HIP library and applications using HIPRTC will be required to explicitly link to HIPRTC library. However, on Windows hiprtc.dll must be used as the `hipamd64.dll` doesn't contain the HIPRTC symbols.
+* Data types such as `uint32_t`, `uint64_t`, `int32_t`, `int64_t` defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with `__hip__`, e.g. `__hip_uint32_t`. Applications previously using `std::uint32_t` or similar types can use `__hip_` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to `__hip_internal` namespace as implementation details.
diff --git a/docs/how-to/programming_manual.md b/docs/how-to/programming_manual.md
@@ -76,7 +76,7 @@ A stronger system-level fence can be specified when the event is created with `h
 
 ### Summary and Recommendations
 
-* Coherent host memory is the default and is the easiest to use since the memory is visible to the CPU at typical synchronization points.  This memory allows in-kernel synchronization commands such as threadfence_system to work transparently.
+* Coherent host memory is the default and is the easiest to use since the memory is visible to the CPU at typical synchronization points.  This memory allows in-kernel synchronization commands such as `threadfence_system` to work transparently.
 * HIP/ROCm also supports the ability to cache host memory in the GPU using the "Non-Coherent" host memory allocations. This can provide performance benefit, but care must be taken to use the correct synchronization.
 
 ### Managed memory allocation

diff --git a/docs/reference/kernel_language.rst b/docs/reference/kernel_language.rst
@@ -1258,10 +1258,10 @@ Following is the list of supported integer intrinsics. Note that intrinsics are
         | Return the number of consecutive high-order zero bits in 64 bit integer.
 
     * - | ``unsigned int __ffs(int x)``
-        | Find the position of least signigicant bit set to 1 in a 32 bit integer.
+        | Find the position of least significant bit set to 1 in a 32 bit integer.
 
     * - | ``unsigned int __ffsll(long long int x)``
-        | Find the position of least signigicant bit set to 1 in a 64 bit signed integer.
+        | Find the position of least significant bit set to 1 in a 64 bit signed integer.
 
     * - | ``unsigned int __fns32(unsigned long long mask, unsigned int base, int offset)``
         | Find the position of the n-th set to 1 bit in a 32-bit integer.
@@ -2147,7 +2147,7 @@ implementation of malloc and free that can be called from device functions.
 `__launch_bounds__`
 ============================================================
 
-GPU multiprocessors have a fixed pool of resources (primarily registers and shared memory) which are shared by the actively running warps. Using more resources can increase IPC of the kernel but reduces the resources available for other warps and limits the number of warps that can be simulaneously running. Thus GPUs have a complex relationship between resource usage and performance.
+GPU multiprocessors have a fixed pool of resources (primarily registers and shared memory) which are shared by the actively running warps. Using more resources can increase IPC of the kernel but reduces the resources available for other warps and limits the number of warps that can be simultaneously running. Thus GPUs have a complex relationship between resource usage and performance.
 
 __launch_bounds__ allows the application to provide usage hints that influence the resources (primarily registers) used by the generated code.  It is a function attribute that must be attached to a __global__ function: