discrepancy between clang and nvcc regarding std::is_invocable_v #69956
It looks like the same issue as what we've recently seen with concepts and template deduction guidelines. In order to determine callability, we need to plumb through the caller context, and we currently don't (or we assume global/host context, as appears to be the case in the reproducer above). AFAICT, NVCC gets it wrong in the opposite direction and claims that everything is callable from everywhere: [...] It may end up being the least bad choice here: the user may instantiate a type based on std::is_invocable_v, the same type may be applicable to both the GPU and the host, and we can't give two different answers within the same compilation for the same type we're creating with [...] I guess it boils down to what we want [...] Ideally, within your reproducer we'd want [...]
FWIW, we have experience with using strict availability attributes to influence how things SFINAE away (which is what's happening here under the hood), and it has been the cause of a lot of issues. I am not well versed in CUDA, but from the outside I would strongly recommend that [...]
That suggests that we may need to stick with the "true if it may be callable somewhere" interpretation, which is what NVCC appears to do. On the positive side, it should be consistent for both host and GPU sub-compilations. We may need to grow some sort of target-specific extension [...]
I think something like this is quite desirable as we work towards running e.g. PSTL algorithms on GPUs (see for example #66968). However, it would probably operate at a different level than something like [...]
The reduced test case is here: https://godbolt.org/z/xfYs16M35 clang tries to substitute [...] for that, it tries to substitute [...] for that it tries to substitute [...] which ends up as a call [...] adding [...] In a sense, clang's behavior is reasonable. When we do substitutions for [...] However, we are now facing the issue that [...] One workaround for this issue is to add a wrapper header for [...] Another solution is to disable the availability check during template substitution, but this could cause consistency issues and regressions.
Currently std::is_invocable does not work for CUDA/HIP, since its implementation requires checking whether a function is invocable in the context of a synthesized host function. In general, to make <type_traits> work with CUDA/HIP, the template functions need to be defined as `__host__ __device__` so that they are available in both host and device contexts. Fixes: llvm#69956 Fixes: SWDEV-428314
Added option -foffload-implicit-host-device-templates, which is off by default. When the option is on, template functions and specializations without host/device attributes have implicit host device attributes. They can be overridden by device template functions with the same signature. They are emitted on the device side only if they are used on the device side. This feature is added as an extension. `__has_extension(cuda_implicit_host_device_templates)` can be used to check whether it is enabled. This is to facilitate using standard C++ headers for device. Fixes: llvm#69956 Fixes: SWDEV-428314
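A minimal sketch of how code might detect the extension described above. Only `__has_extension(cuda_implicit_host_device_templates)` comes from the commit message; the macro `IMPLICIT_HD_TEMPLATES` and the functions `twice` and `implicit_hd_templates_enabled` are hypothetical names used for illustration. The fallback definition of `__has_extension` is there so the snippet also builds with compilers that lack the builtin.

```cpp
// Fallback for compilers that do not provide __has_extension.
#ifndef __has_extension
#define __has_extension(x) 0
#endif

// IMPLICIT_HD_TEMPLATES is a hypothetical helper macro, not part of clang.
#if __has_extension(cuda_implicit_host_device_templates)
#define IMPLICIT_HD_TEMPLATES 1
#else
#define IMPLICIT_HD_TEMPLATES 0
#endif

// A template function with no host/device attributes. Per the commit
// message, with the option on such a template would be treated as
// implicitly host device; in a plain C++ build it is an ordinary template.
template <typename T>
constexpr T twice(T x) { return x + x; }

// Hypothetical query so code can branch on whether the extension is active.
constexpr bool implicit_hd_templates_enabled() {
  return IMPLICIT_HD_TEMPLATES != 0;
}
```

In a plain host-only C++ build the check is expected to evaluate to 0, so code relying on implicit host device templates should keep an explicit fallback path.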
@llvm/issue-subscribers-clang-codegen Author: Yaxun (Sam) Liu (yxsamliu)
https://godbolt.org/z/3cP5v4cbd
Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v, but clang does. This makes std::is_invocable_v return false for a host function in a kernel with clang but not with nvcc. I opened this issue to discuss whether this should be treated as a clang bug. @Artem-B
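The shape of the discrepancy can be sketched roughly as follows. This is not the reproducer from the godbolt link, whose contents are not shown here; `HOST_FN`, `host_only`, `callable_in_host_context`, and `kernel` are hypothetical names, and the macro guard is an assumption that lets the snippet build as plain C++ as well as CUDA/HIP.

```cpp
#include <type_traits>

// HOST_FN is a hypothetical macro: it expands to the real attribute only
// when the file is compiled as CUDA or HIP, and to nothing otherwise.
#if defined(__CUDACC__) || defined(__CUDA__) || defined(__HIP__)
#define HOST_FN __host__
#define OFFLOAD_BUILD 1
#else
#define HOST_FN
#define OFFLOAD_BUILD 0
#endif

// A function that is only available on the host.
HOST_FN inline int host_only(int x) { return x + 1; }

// Evaluated in ordinary host context, the trait is true.
inline constexpr bool callable_in_host_context =
    std::is_invocable_v<decltype(&host_only), int>;
static_assert(callable_in_host_context,
              "host_only(int) is invocable from host code");

#if OFFLOAD_BUILD
// Inside a kernel, the same query is where the report says clang and nvcc
// disagreed. No expected value is asserted here on purpose.
__global__ void kernel(bool *out) {
  *out = std::is_invocable_v<decltype(&host_only), int>;
}
#endif
```

The kernel part is only compiled in a CUDA/HIP build; the host-side `static_assert` holds in any conforming C++17 compiler.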