Commit

WIP
neon60 committed May 25, 2024
1 parent 0abb1a9 commit 2213922
Showing 3 changed files with 37 additions and 28 deletions.
9 changes: 9 additions & 0 deletions .wordlist.txt
@@ -2,11 +2,16 @@ ALU
ALUs
APU
AQL
Bitcode
bitcode
bitcodes
builtins
Builtins
clr
cuCtx
cuDNN
dll
DirectX
EIGEN
enqueue
enqueues
@@ -15,14 +20,17 @@ extern
fatbinary
GPGPU
hardcoded
HIP's
hipcc
hipother
HIPRTC
hcBLAS
icc
Interoperation
interoperate
IPC
latencies
libstdc
LUID
Malloc
malloc
@@ -34,5 +42,6 @@ PTX
rocTX
RTC
SIMT
structs
SYCL
typedefs
44 changes: 22 additions & 22 deletions docs/how-to/faq.md
@@ -11,7 +11,7 @@ HIP provides the following:
* Kernel launching (`hipLaunchKernel`/`hipLaunchKernelGGL` is the preferred way of launching kernels; `hipLaunchKernelGGL` is a standard C/C++ macro that replaces the CUDA triple-chevron (`<<< >>>`) syntax; see the sketch after this list).
* HIP Module API to control when and how code is loaded.
* CUDA-style kernel coordinate functions (`threadIdx`, `blockIdx`, `blockDim`, `gridDim`)
* Cross-lane instructions including shfl, ballot, any, all
* Cross-lane instructions including `shfl`, `ballot`, `any`, `all`
* Most device-side math built-ins
* Error reporting (`hipGetLastError()`, `hipGetErrorString()`)
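
For illustration, here is a minimal sketch that combines several of the features above (macro-based kernel launch, CUDA-style coordinate built-ins, and error reporting). The kernel and buffer names are placeholders, not part of any HIP API.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // CUDA-style coordinates
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 256;
    float* d = nullptr;
    hipMalloc(&d, n * sizeof(float));

    // Macro-based launch, equivalent to scale<<<dim3(1), dim3(n)>>>(d, 2.0f, n)
    hipLaunchKernelGGL(scale, dim3(1), dim3(n), 0, 0, d, 2.0f, n);

    // Error reporting as listed above
    hipError_t err = hipGetLastError();
    if (err != hipSuccess) printf("launch failed: %s\n", hipGetErrorString(err));

    hipDeviceSynchronize();
    hipFree(d);
    return 0;
}
```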

@@ -27,7 +27,7 @@ At a high-level, the following features are not supported:
* Dynamic parallelism (CUDA 5.0)
* Graphics interoperability with OpenGL or Direct3D
* CUDA IPC Functions (Under Development)
* CUDA array, mipmappedArray and pitched memory
* CUDA array, `mipmappedArray` and pitched memory
* Queue priority controls

See the [API Support Table](https://github.com/ROCm/HIPIFY/blob/amd-staging/docs/tables/CUDA_Runtime_API_functions_supported_by_HIP.md) for more detailed information.
@@ -37,8 +37,8 @@ See the [API Support Table](https://github.com/ROCm/HIPIFY/blob/amd-staging/docs
* C++-style device-side dynamic memory allocations (free, new, delete) (CUDA 4.0)
* Virtual functions, indirect functions and try/catch (CUDA 4.0)
* `__prof_trigger`
* PTX assembly (CUDA 4.0). HIP-Clang supports inline GCN assembly.
* Several kernel features are under development. See the {doc}`/reference/kernel_language` for more information.

## Is HIP a drop-in replacement for CUDA?

@@ -103,10 +103,10 @@ HIP offers several benefits over OpenCL:
Both HIP and CUDA are dialects of C++, and thus porting between them is relatively straightforward.
Both dialects support templates, classes, lambdas, and other C++ constructs.
As one example, the hipify-perl tool was originally a Perl script that used simple text conversions from CUDA to HIP.
HIP and CUDA provide similar math library calls as well. In summary, the HIP philosophy was to make the HIP language close enough to CUDA that the porting effort is relatively simple.
This reduces the potential for error, and also makes it easy to automate the translation. HIP's goal is to quickly get the ported program running on both platforms with little manual intervention, so that the programmer can focus on performance optimizations.

Several tools have attempted to convert CUDA into OpenCL, such as CU2CL. OpenCL is a C99-based kernel language (rather than C++) and also does not support single-source compilation.
As a result, the OpenCL syntax is different from CUDA, and the porting tools have to perform some heroic transformations to bridge this gap.
The tools also struggle with more complex CUDA applications, in particular, those that use templates, classes, or other C++ features inside the kernel.

@@ -133,13 +133,13 @@ HIP-Clang is a Clang/LLVM based compiler to compile HIP programs which can run o

## Why use HIP rather than supporting CUDA directly?

While HIP is a strong subset of CUDA, it is a subset. The HIP layer allows that subset to be clearly defined and documented.
Developers who code to the HIP API can be assured their code will remain portable across NVIDIA and AMD platforms.
In addition, HIP defines portable mechanisms to query architectural features and supports a larger 64-bit wavesize which expands the return type for cross-lane functions like ballot and shuffle from 32-bit ints to 64-bit ints.
In addition, HIP defines portable mechanisms to query architectural features and supports a larger 64-bit `WaveSize` which expands the return type for cross-lane functions like ballot and shuffle from 32-bit integers to 64-bit integers.
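
As a rough sketch (not taken from the FAQ itself), the wider return type looks like this in device code:

```cpp
#include <hip/hip_runtime.h>

__global__ void vote(const int* in, unsigned long long* out) {
    // On HIP, __ballot returns a 64-bit mask so that every lane of a
    // 64-wide wavefront is represented.
    unsigned long long mask = __ballot(in[threadIdx.x] > 0);
    if (threadIdx.x == 0) *out = mask;
}
```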

## Can I develop HIP code on an NVIDIA CUDA platform?

Yes. HIP's CUDA path only exposes the APIs and functionality that work on both NVCC and AMDGPU back-ends.
"Extra" APIs, parameters, and features which exist in CUDA but not in HIP-Clang will typically result in compile-time or run-time errors.
Developers need to use the HIP API for most accelerator code and bracket any CUDA-specific code with preprocessor conditionals.
Developers concerned about portability should, of course, run on both platforms, and should expect to tune for performance.
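
A minimal sketch of bracketing CUDA-specific code with preprocessor conditionals (assuming the `__HIP_PLATFORM_NVIDIA__` macro; older HIP releases spelled it `__HIP_PLATFORM_NVCC__`):

```cpp
#include <hip/hip_runtime.h>

void configure_cache() {
#if defined(__HIP_PLATFORM_NVIDIA__)
    // CUDA-only call, visible only to the NVIDIA build.
    cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
#else
    // Portable HIP call used on the AMD path.
    hipDeviceSetCacheConfig(hipFuncCachePreferShared);
#endif
}
```
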
@@ -183,8 +183,8 @@ HIP is a source-portable language that can be compiled to run on either AMD or N

## On HIP-Clang, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang?

Yes. HIP generates the object code which conforms to the GCC ABI, and also links with libstdc++. This means you can compile host code with the compiler of your choice and link the generated object code
with GPU code compiled with HIP. Larger projects often contain a mixture of accelerator code (initially written in CUDA with nvcc) and host code (compiled with gcc, icc, or clang). These projects
with GPU code compiled with HIP. Larger projects often contain a mixture of accelerator code (initially written in CUDA with NVCC) and host code (compiled with gcc, icc, or clang). These projects
can convert the accelerator code to HIP, compile that code with hipcc, and link with object code from their preferred compiler.

## Can HIP API support C style application? What is the difference between C and C++?
@@ -242,9 +242,9 @@ dim3 grid = {1,1,1}; // initialized as in C++
## Can I install both CUDA SDK and HIP-Clang on the same machine?
Yes. You can use HIP_PLATFORM to choose which path hipcc targets. This configuration can be useful when using HIP to develop an application which is portable to both AMD and NVIDIA.
## HIP detected my platform (HIP-Clang vs nvcc) incorrectly - what should I do?
## HIP detected my platform (HIP-Clang vs NVCC) incorrectly - what should I do?
HIP will set the platform to AMD and use HIP-Clang as the compiler if it sees that the AMD graphics driver is installed and has detected an AMD GPU.
Sometimes this isn't what you want - you can force HIP to recognize the platform by setting the following:
@@ -270,12 +270,12 @@ HIP_COMPILER=cuda
HIP_RUNTIME=nvcc
```

One symptom of this problem is the message "error: 'unknown error'(11) at `square.hipref.cpp:56`". This can occur if you have a CUDA installation on an AMD platform, and HIP incorrectly detects the platform as nvcc. HIP may be able to compile the application using the nvcc tool-chain but will generate this error at runtime since the platform does not have a CUDA device.
One symptom of this problem is the message "error: 'unknown error'(11) at `square.hipref.cpp:56`". This can occur if you have a CUDA installation on an AMD platform, and HIP incorrectly detects the platform as NVCC. HIP may be able to compile the application using the NVCC tool-chain but will generate this error at runtime since the platform does not have a CUDA device.

## On CUDA, can I mix CUDA code with HIP code?

Yes. Most HIP data structures (`hipStream_t`, `hipEvent_t`) are typedefs to CUDA equivalents and can be intermixed. Both CUDA and HIP use integer device ids.
One notable exception is that `hipError_t` is a new type, and cannot be used where a `cudaError_t` is expected. In these cases, refactor the code to remove the expectation. Alternatively, hip_runtime_api.h defines functions which convert between the error code spaces:

`hipErrorToCudaError`
`hipCUDAErrorTohipError`
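
A minimal sketch of how these converters might be wrapped (only meaningful on the NVIDIA path, where both error types exist; the helper names are placeholders):

```cpp
#include <hip/hip_runtime.h>

#if defined(__HIP_PLATFORM_NVIDIA__)
cudaError_t to_cuda_error(hipError_t e) { return hipErrorToCudaError(e); }
hipError_t  to_hip_error(cudaError_t e) { return hipCUDAErrorTohipError(e); }
#endif
```
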
@@ -292,17 +292,17 @@ See {doc}`/how-to/logging` for more information.
The product of block.x, block.y, and block.z must not exceed 1024.
Please note that HIP does not support kernel launches where the total work items in a dimension reach `gridDim x blockDim >= 2^32`, so `gridDim.x * blockDim.x`, `gridDim.y * blockDim.y` and `gridDim.z * blockDim.z` must each be less than 2^32.
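
A small helper, written only to illustrate the limits above (the 1024-thread cap and the per-dimension 2^32 bound), might look like:

```cpp
#include <hip/hip_runtime.h>
#include <cstdint>

bool launch_config_ok(dim3 grid, dim3 block) {
    const uint64_t threads_per_block =
        uint64_t(block.x) * block.y * block.z;       // capped at 1024
    const uint64_t limit = uint64_t(1) << 32;        // per-dimension bound
    return threads_per_block <= 1024 &&
           uint64_t(grid.x) * block.x < limit &&
           uint64_t(grid.y) * block.y < limit &&
           uint64_t(grid.z) * block.z < limit;
}
```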

## Are __shfl_*_sync functions supported on the HIP platform?
## Are ``__shfl_*_sync`` functions supported on the HIP platform?

__shfl_*_sync is not supported on HIP, but on the nvcc path with CUDA 9.0 and above, all shuffle calls are redirected to their sync versions.
``__shfl_*_sync`` is not supported on HIP, but on the NVCC path with CUDA 9.0 and above, all shuffle calls are redirected to their sync versions.
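
A minimal sketch using the non-`_sync` variant that HIP does provide:

```cpp
#include <hip/hip_runtime.h>

__global__ void broadcast_lane0(float* data) {
    float v = data[threadIdx.x];
    // Every active lane reads lane 0's value; no _sync suffix or mask argument.
    data[threadIdx.x] = __shfl(v, 0);
}
```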

## How to create a guard for code that is specific to the host or the GPU?

The compiler defines the `__HIP_DEVICE_COMPILE__` macro only when compiling the code for the GPU. It could be used to guard code that is specific to the host or the GPU.
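
A minimal sketch of such a guard in a `__host__ __device__` function:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__host__ __device__ void report(int value) {
#if defined(__HIP_DEVICE_COMPILE__)
    printf("device value: %d\n", value);   // device compilation pass
#else
    printf("host value: %d\n", value);     // host compilation pass
#endif
}
```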

## Why is `_OPENMP` undefined when compiling with `-fopenmp`?

When compiling an OpenMP source file with `hipcc -fopenmp`, the compiler may generate an error if there is a reference to the `_OPENMP` macro. This is due to a limitation in hipcc that treats any source file type (for example `.cpp`) as a HIP translation unit, leading to some conflicts with the OpenMP language switch. If the OpenMP source file doesn't contain any HIP language constructs, you can work around this issue by adding the `-x c++` switch to force the compiler to treat the file as regular C++. Another approach is to guard the OpenMP code with `#ifdef _OPENMP` so that the code block is disabled when compiling for the GPU. The `__HIP_DEVICE_COMPILE__` macro defined by the HIP compiler when compiling GPU code can also be used for guarding code paths specific to the host or the GPU.
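
A minimal sketch of the `#ifdef _OPENMP` guard described above (host-only loop, hypothetical function name):

```cpp
void scale_host(float* x, int n, float f) {
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (int i = 0; i < n; ++i)
        x[i] *= f;
}
```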

## Does the HIP-Clang compiler support extern shared declarations?

@@ -317,7 +317,7 @@ This error message is seen due to the fact that you do not have valid code objec

If you have compiled the application yourself, make sure you have given the correct device name(s) and features via `--offload-arch`. If you are not passing `--offload-arch`, make sure that `hipcc` is using the correct offload architecture by verifying the hipcc output generated by setting the environment variable `HIPCC_VERBOSE=1`.

If you have a precompiled application/library (like rocblas, tensorflow etc) which gives you such an error, there are two possibilities.
If you have a precompiled application/library (like rocblas, TensorFlow etc) which gives you such an error, there are two possibilities.

* The application/library does not ship code object bundles for __all__ of your device(s): in this case you need to recompile the application/library yourself with the correct `--offload-arch`.
* The application/library does not ship code object bundles for __some__ of your device(s): for example, you have a system with an APU + GPU and the library does not ship code objects for your APU. For this you can set the environment variable `HIP_VISIBLE_DEVICES`, or `CUDA_VISIBLE_DEVICES` on the NVIDIA platform, to enable only the GPUs for which a code object is available. This will limit the GPUs visible to your application and allow it to run.
12 changes: 6 additions & 6 deletions docs/how-to/hip_rtc.md
@@ -6,7 +6,7 @@ Kernels can be stored as a text string and can be passed to HIPRTC APIs alongsid
NOTE:

* This library can be used on systems without HIP or the AMD GPU driver installed at all (offline compilation). Therefore, it does not depend on any HIP runtime library.
* But it does depend on COMGr. You may try to statically link COMGr into HIPRTC to avoid any ambiguity.
* But it does depend on comgr. You may try to statically link comgr into HIPRTC to avoid any ambiguity.
* Developers can decide to bundle this library with their application.

## Example
@@ -231,7 +231,7 @@ HIPRTC provides a few HIPRTC specific flags
* `--gpu-architecture` : This flag can guide code object generation for a specific GPU architecture. Example: `--gpu-architecture=gfx906:sramecc+:xnack-`; it is equivalent to `--offload-arch`.
* This option is compulsory if compilation is done on a system without AMD GPUs supported by the HIP runtime.
* Otherwise, HIPRTC will load the HIP runtime, gather the current device and its architecture info, and use it as the option.
* `-fgpu-rdc` : This flag when provided during the `hiprtcCompileProgram` generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and `hiprtcGetBitcodeSize` APIs.
* `-fgpu-rdc` : This flag when provided during the `hiprtcCompileProgram` generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using `hiprtcGetBitcode` and `hiprtcGetBitcodeSize` APIs.
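
A rough sketch of the `-fgpu-rdc` bitcode path (error checking omitted; the `gfx906` target is just the example architecture used above):

```cpp
#include <hip/hiprtc.h>
#include <vector>

std::vector<char> compile_to_bitcode(const char* src) {
    hiprtcProgram prog;
    hiprtcCreateProgram(&prog, src, "kernel.cu", 0, nullptr, nullptr);

    const char* opts[] = {"-fgpu-rdc", "--gpu-architecture=gfx906"};
    hiprtcCompileProgram(prog, 2, opts);

    size_t size = 0;
    hiprtcGetBitcodeSize(prog, &size);       // query bitcode size
    std::vector<char> bitcode(size);
    hiprtcGetBitcode(prog, bitcode.data());  // fetch the bitcode itself

    hiprtcDestroyProgram(&prog);
    return bitcode;
}
```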

### Bitcode

@@ -351,11 +351,11 @@ HIPRTC_JIT_NUM_INPUT_TYPES = (HIPRTC_JIT_NUM_LEGACY_INPUT_TYPES + 3)

### Backward Compatibility of LLVM Bitcode/IR

For HIP applications utilizing HIPRTC to compile LLVM bitcode/IR, compatibility is assured only when the ROCm or HIP SDK version used for generating the LLVM bitcode/IR matches the version used during the runtime compilation. When an application requires the ingestion of bitcode/IR not derived from the currently installed AMD compiler, it must run with HIPRTC and COMgr dynamic libraries that are compatible with the version of the bitcode/IR.
For HIP applications utilizing HIPRTC to compile LLVM bitcode/IR, compatibility is assured only when the ROCm or HIP SDK version used for generating the LLVM bitcode/IR matches the version used during the runtime compilation. When an application requires the ingestion of bitcode/IR not derived from the currently installed AMD compiler, it must run with HIPRTC and comgr dynamic libraries that are compatible with the version of the bitcode/IR.

COMgr, a shared library, incorporates the LLVM/Clang compiler that HIPRTC relies on. To identify the bitcode/IR version that COMgr is compatible with, one can execute "clang -v" using the clang binary from the same ROCm or HIP SDK package. For instance, if compiling bitcode/IR version 14, the HIPRTC and COMgr libraries released by AMD around mid 2022 would be the best choice, assuming the LLVM/Clang version included in the package is also version 14.
comgr, a shared library, incorporates the LLVM/Clang compiler that HIPRTC relies on. To identify the bitcode/IR version that comgr is compatible with, one can execute "clang -v" using the clang binary from the same ROCm or HIP SDK package. For instance, if compiling bitcode/IR version 14, the HIPRTC and comgr libraries released by AMD around mid 2022 would be the best choice, assuming the LLVM/Clang version included in the package is also version 14.

To ensure smooth operation and compatibility, an application may choose to ship the specific versions of HIPRTC and COMgr dynamic libraries, or it may opt to clearly specify the version requirements and dependencies. This approach guarantees that the application can correctly compile the specified version of bitcode/IR.
To ensure smooth operation and compatibility, an application may choose to ship the specific versions of HIPRTC and comgr dynamic libraries, or it may opt to clearly specify the version requirements and dependencies. This approach guarantees that the application can correctly compile the specified version of bitcode/IR.

### Link Options

@@ -489,7 +489,7 @@ hipModuleGetFunction(&kernel, module, name);
hipModuleLaunchKernel(kernel, 1, 1, 1, 1, 1, 1, 0, nullptr, nullptr, config);
```

Please have a look at hiprtcGetLoweredName.cpp for the detailed example.
Please have a look at `hiprtcGetLoweredName.cpp` for the detailed example.
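
For orientation, a rough sketch of the lowered-name lookup that precedes the module calls above (`my_kernel<float>` is a placeholder name expression):

```cpp
#include <hip/hiprtc.h>

void lowered_name_example(hiprtcProgram prog) {
    const char* expr = "my_kernel<float>";
    hiprtcAddNameExpression(prog, expr);     // register before compiling
    hiprtcCompileProgram(prog, 0, nullptr);  // compile options omitted
    const char* lowered = nullptr;
    hiprtcGetLoweredName(prog, expr, &lowered);  // mangled name for hipModuleGetFunction
}
```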

## Versioning
