WIP

ROCm · May 24, 2024 · 9dc1f20 · 9dc1f20
1 parent f31fabb
commit 9dc1f20
Show file tree

Hide file tree

Showing 4 changed files with 22 additions and 20 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-## What is this repository for? ###
+## What is this repository for?
 
 **HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code.**
 

diff --git a/docs/how-to/faq.md b/docs/how-to/faq.md
@@ -287,7 +287,7 @@ See {doc}`/how-to/logging` for more information.
 ## What are the maximum limits of kernel launch parameters?
 
 Product of block.x, block.y, and block.z should be less than 1024.
-Please note, HIP does not support kernel launch with total work items defined in dimension with size gridDim x blockDim >= 2^32, so gridDim.x * blockDim.x, gridDim.y * blockDim.y and gridDim.z * blockDim.z are always less than 2^32.
+Please note, HIP does not support kernel launch with total work items defined in dimension with size  `gridDim x blockDim >= 2^32 `, so  `gridDim.x * blockDim.x, gridDim.y * blockDim.y and gridDim.z * blockDim.z ` are always less than 2^32.
 
 ## Are __shfl_*_sync functions supported on HIP platform?
 
@@ -316,8 +316,8 @@ If you have compiled the application yourself, make sure you have given the corr
 
 If you have a precompiled application/library (like rocblas, tensorflow etc) which gives you such error, there are one of two possibilities.
 
-* The application/library does not ship code object bundles for *all* of your device(s): in this case you need to recompile the application/library yourself with correct `--offload-arch`.
-* The application/library does not ship code object bundles for *some* of your device(s), for example you have a system with an APU + GPU and the library does not ship code objects for your APU. For this you can set the environment variable `HIP_VISIBLE_DEVICES` or `CUDA_VISIBLE_DEVICES` on NVdia platform, to only enable GPUs for which code object is available. This will limit the GPUs visible to your application and allow it to run.
+* The application/library does not ship code object bundles for **all** of your device(s): in this case you need to recompile the application/library yourself with correct `--offload-arch`.
+* The application/library does not ship code object bundles for **some** of your device(s), for example you have a system with an APU + GPU and the library does not ship code objects for your APU. For this you can set the environment variable `HIP_VISIBLE_DEVICES` or `CUDA_VISIBLE_DEVICES` on NVdia platform, to only enable GPUs for which code object is available. This will limit the GPUs visible to your application and allow it to run.
 
 Note: In previous releases, the error code is hipErrorNoBinaryForGpu with message "Unable to find code object for all current devices".
 The error code handling behavior is changed. HIP runtime shows the error code hipErrorSharedObjectInitFailed with message "Error: shared object initialization failed" on unsupported GPU.
@@ -365,7 +365,9 @@ Note: HIP supports LUID only on Windows OS.
 
 HIP version definition has been updated since ROCm 4.2 release as the following:
 
-HIP_VERSION=HIP_VERSION_MAJOR * 10000000 + HIP_VERSION_MINOR * 100000 + HIP_VERSION_PATCH)
+```cpp
+HIP_VERSION=HIP_VERSION_MAJOR * 10000000 + HIP_VERSION_MINOR * 100000 + HIP_VERSION_PATCH
+```
 
 HIP version can be queried from HIP API call,
 hipRuntimeGetVersion(&runtimeVersion);

diff --git a/docs/how-to/hip_porting_guide.md b/docs/how-to/hip_porting_guide.md
@@ -370,7 +370,7 @@ Code should not assume a warp size of 32 or 64.  See [Warp Cross-Lane Functions]
 
 ### Kernel launch with group size > 256
 
-Kernel code should use ```__attribute__((amdgpu_flat_work_group_size(<min>,<max>)))```.
+Kernel code should use  `__attribute__((amdgpu_flat_work_group_size(<min>,<max>))) `.
 
 For example:
 

diff --git a/docs/how-to/hip_rtc.md b/docs/how-to/hip_rtc.md
@@ -1,6 +1,6 @@
 # Programming for HIP Runtime Compiler (RTC)
 
-HIP lets you compile kernels at runtime with the ```hiprtc*``` APIs.
+HIP lets you compile kernels at runtime with the  `hiprtc* ` APIs.
 Kernels can be stored as a text string and can be passed to HIPRTC APIs alongside options to guide the compilation.
 
 NOTE:
@@ -12,7 +12,7 @@ NOTE:
 ## Example
 
 To use HIPRTC functionality, HIPRTC header needs to be included first.
-```#include <hip/hiprtc.h>```
+ `#include <hip/hiprtc.h> `
 
 Kernels can be stored in a string:
 
@@ -29,7 +29,7 @@ R"(
 )"};
 ```
 
-Now to compile this kernel, it needs to be associated with hiprtcProgram type, which is done by declaring ```hiprtcProgram prog;``` and associating the string of kernel with this program:
+Now to compile this kernel, it needs to be associated with hiprtcProgram type, which is done by declaring  `hiprtcProgram prog; ` and associating the string of kernel with this program:
 
 ```cpp
 hiprtcCreateProgram(&prog,                 // HIPRTC program
@@ -51,7 +51,7 @@ hiprtcCompileProgram(prog,     // hiprtcProgram
                     options);  // Clang Options [Supported Clang Options](clang_options.md)
 ```
 
-hiprtcCompileProgram returns a status value which can be converted to string via ```hiprtcGetErrorString```. If compilation is successful, hiprtcCompileProgram will return ```HIPRTC_SUCCESS```.
+hiprtcCompileProgram returns a status value which can be converted to string via  `hiprtcGetErrorString `. If compilation is successful, hiprtcCompileProgram will return  `HIPRTC_SUCCESS `.
 
 If the compilation fails, you can look up the logs via:
 
@@ -77,9 +77,9 @@ hiprtcGetCode(prog, kernel_binary.data());
 ```
 
 After loading the binary, hiprtcProgram can be destroyed.
-```hiprtcDestroyProgram(&prog);```
+ `hiprtcDestroyProgram(&prog); `
 
-The binary present in ```kernel_binary``` can now be loaded via ```hipModuleLoadData``` API.
+The binary present in  `kernel_binary ` can now be loaded via  `hipModuleLoadData ` API.
 
 ```cpp
 hipModule_t module;
@@ -228,10 +228,10 @@ int main() {
 
 HIPRTC provides a few HIPRTC specific flags
 
-* ```--gpu-architecture``` : This flag can guide the code object generation for a specific gpu arch. Example: ```--gpu-architecture=gfx906:sramecc+:xnack-```, its equivalent to ```--offload-arch```.
+*  `--gpu-architecture ` : This flag can guide the code object generation for a specific gpu arch. Example:  `--gpu-architecture=gfx906:sramecc+:xnack- `, its equivalent to  `--offload-arch `.
   * This option is compulsory if compilation is done on a system without AMD GPUs supported by HIP runtime.
   * Otherwise, HIPRTC will load the hip runtime and gather the current device and its architecture info and use it as option.
-* ```-fgpu-rdc``` : This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs.
+*  `-fgpu-rdc ` : This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs.
 
 ### Bitcode
 
@@ -408,15 +408,15 @@ std::cout << "hiprtcCompileProgram fails with error " << hiprtcGetErrorString(re
 
 HIPRTC provides the following API for querying the version.
 
-```hiprtcVersion(int* major, int* minor)``` -  This sets the output parameters major and minor with the HIP Runtime compilation major version and minor version number respectively.
+`hiprtcVersion(int* major, int* minor)` -  This sets the output parameters major and minor with the HIP Runtime compilation major version and minor version number respectively.
 
 Currently, it returns hardcoded value.  This should be implemented to return HIP runtime major and minor version in the future releases.
 
 ## Lowered Names (Mangled Names)
 
-HIPRTC mangles the ```__global__``` function names and names of ```__device__``` and ```__constant__``` variables. If the generated binary is being loaded using the HIP Runtime API, the kernel function or ```__device__/__constant__``` variable must be looked up by name, but this is very hard when the name has been mangled. To overcome this, HIPRTC provides API functions that map  ```__global__``` function or ```__device__/__constant__``` variable names in the source to the mangled names present in the generated binary.
+HIPRTC mangles the  `__global__ ` function names and names of  `__device__ ` and  `__constant__ ` variables. If the generated binary is being loaded using the HIP Runtime API, the kernel function or  `__device__/__constant__ ` variable must be looked up by name, but this is very hard when the name has been mangled. To overcome this, HIPRTC provides API functions that map   `__global__ ` function or  `__device__/__constant__ ` variable names in the source to the mangled names present in the generated binary.
 
-The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this functionality. First, a 'name expression' string denoting the address for the ```__global__``` function or ```__device__/__constant__``` variable is provided to hiprtcAddNameExpression. Then, the program is compiled with hiprtcCompileProgram. During compilation, HIPRTC will parse the name expression string as a C++ constant expression at the end of the user program. Finally, the function hiprtcGetLoweredName is called with the original name expression and it returns a pointer to the lowered name. The lowered name can be used to refer to the kernel or variable in the HIP Runtime API.
+The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this functionality. First, a 'name expression' string denoting the address for the  `__global__ ` function or  `__device__/__constant__ ` variable is provided to hiprtcAddNameExpression. Then, the program is compiled with hiprtcCompileProgram. During compilation, HIPRTC will parse the name expression string as a C++ constant expression at the end of the user program. Finally, the function hiprtcGetLoweredName is called with the original name expression and it returns a pointer to the lowered name. The lowered name can be used to refer to the kernel or variable in the HIP Runtime API.
 
 ### Note
 
@@ -426,7 +426,7 @@ The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this funct
 
 ### Example
 
-kernel containing various definitions ```__global__``` functions/function templates and ```__device__/__constant__``` variables can be stored in a string.
+kernel containing various definitions  `__global__ ` functions/function templates and  `__device__/__constant__ ` variables can be stored in a string.
 
 ```cpp
 static constexpr const char gpu_program[] {
@@ -444,7 +444,7 @@ __global__ void f3(int *result) { *result = sizeof(T); }
 )"};
 ```
 
-hiprtcAddNameExpression is called with various name expressions referring to the address of ```__global__``` functions and ```__device__/__constant__``` variables.
+hiprtcAddNameExpression is called with various name expressions referring to the address of  `__global__ ` functions and  `__device__/__constant__ ` variables.
 
 ```cpp
 kernel_name_vec.push_back("&f1");
@@ -508,4 +508,4 @@ HIPRTC follows the below versioning.
 ## Deprecation notice
 
 * Currently HIPRTC APIs are separated from HIP APIs and HIPRTC is available as a separate library libhiprtc.so/libhiprtc.dll. But on Linux, HIPRTC symbols are also present in libhipamd64.so in order to support the existing applications. Gradually, these symbols will be removed from HIP library and applications using HIPRTC will be required to explicitly link to HIPRTC library. However, on Windows hiprtc.dll must be used as the hipamd64.dll doesn't contain the HIPRTC symbols.
-* Data types such as uint32_t, uint64_t, int32_t, int64_t defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with ```__hip__```, e.g. ```__hip_uint32_t```. Applications previously using std::uint32_t or similar types can use ```__hip_``` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to ```__hip_internal``` namespace as implementation details.
+* Data types such as uint32_t, uint64_t, int32_t, int64_t defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with  `__hip__ `, e.g.  `__hip_uint32_t `. Applications previously using std::uint32_t or similar types can use  `__hip_ ` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to  `__hip_internal ` namespace as implementation details.