When using Clang/LLVM on the release/14.x branch, omp_get_thread_num returns 0 for every thread when it is called from an external function declared with omp declare target. The same code behaves as expected on release/13.x.
Using git bisect, we have found that the bug was introduced in 423d34f.
Minimal Example
head.cpp
#include "head.h"
#include <omp.h>
#include <stdio.h>

#pragma omp declare target
int external_dev_fn(int a)
{
    int res = a + 2;
    printf("I am thread %d\n", omp_get_thread_num());
    return res;
}
#pragma omp end declare target
driver.cpp
#include "head.h"
#include <stdio.h>
#include <omp.h>

int main() {
    int N = 10;
    int rs = 0;
    #pragma omp target data map(tofrom:N,rs)
    {
        #pragma omp target parallel for reduction(+:rs)
        for (int i = 0; i < N; i++)
        {
            printf("I am thread %d\n", omp_get_thread_num());
            rs += external_dev_fn(i);
        }
    }
    printf("End result is %d\n", rs);
}
We compile the above example for an NVIDIA V100 GPU with the following:
The result produced by release/14.x is as follows:
I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
End result is 65
Whereas the correct result produced by release/13.x is:
I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
End result is 65
The issue is that, within the function external_dev_fn, we expect omp_get_thread_num to return the same value for each thread as it does when that thread calls it in main. If we include head.cpp within driver.cpp, the program works as expected.
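The expectation above can be turned into a self-checking test. The following is a minimal sketch, not the reporter's code: `callee_thread_num` and `count_thread_id_mismatches` are hypothetical names, and the host stub for `omp_get_thread_num` is an assumption added so the file also builds without OpenMP. Because everything here sits in one file, it only shows the shape of the check; to exercise the reported bug, `callee_thread_num` would have to be moved into its own translation unit, as `external_dev_fn` is in head.cpp.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: fallback stub so the sketch also builds without OpenMP.
static int omp_get_thread_num() { return 0; }
#endif

// Hypothetical helper: reports the thread number seen inside a
// declare-target function. To reproduce the report, this definition
// would live in a separate .cpp file.
#pragma omp declare target
int callee_thread_num()
{
    return omp_get_thread_num();
}
#pragma omp end declare target

// Hypothetical helper: counts loop iterations in which the callee sees
// a different thread number than the enclosing parallel region.
int count_thread_id_mismatches(int n)
{
    int mismatches = 0;
    #pragma omp target parallel for map(tofrom: mismatches) reduction(+: mismatches)
    for (int i = 0; i < n; i++)
    {
        int caller = omp_get_thread_num();
        if (callee_thread_num() != caller)
            mismatches += 1;
    }
    return mismatches;
}
```

A return value of 0 means caller and callee agree on every iteration. Based on the output shown above, a non-zero count would be expected on the affected 14.x builds once the callee is compiled separately.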
Environment Details
Clang/LLVM were built with GCC 8.3.1
CUDA 10.1 is used
We have also verified this behavior on a different machine where Clang/LLVM were built with GCC 9.4.0 and CUDA 11.4.
Does this work on the main branch (currently LLVM 15.0.0)? This is most likely the same issue as #54208 and also related to #55943. Both of these problems are rooted in the fact that the OpenMP runtime library is eagerly linked and internalized for each TU, which results in some needed definitions being optimized out and not correctly shared between files. This is also solved more correctly in LLVM 15 using LTO with -foffload-lto. I'm not sure if we have a chance for a backport given that 14.0.x is finished now.
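Since the explanation above ties the problem to how the runtime is linked per translation unit, the workaround mentioned in the report (keeping the device function in the same translation unit as the target region) can be sketched as follows. This is an illustration under assumptions, not a verified fix for 14.x: `run_reduction` is a hypothetical wrapper around the original loop, and the host stub for `omp_get_thread_num` is added only so the file builds without OpenMP.

```cpp
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: fallback stub so the sketch also builds without OpenMP.
static int omp_get_thread_num() { return 0; }
#endif

// Same function as in head.cpp, but defined in the translation unit
// that contains the target region.
#pragma omp declare target
int external_dev_fn(int a)
{
    printf("I am thread %d\n", omp_get_thread_num());
    return a + 2;
}
#pragma omp end declare target

// Hypothetical wrapper around the loop from driver.cpp; returns the
// reduced sum so the result can be checked by the caller.
int run_reduction(int n)
{
    int rs = 0;
    #pragma omp target parallel for map(tofrom: rs) reduction(+: rs)
    for (int i = 0; i < n; i++)
    {
        rs += external_dev_fn(i);
    }
    return rs;
}
```

Calling `run_reduction(10)` returns 65, matching the "End result" lines in the report. Whether the printed thread numbers are correct still has to be checked on the affected toolchain.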