Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMP][Offload] Incorrect behavior with external device code #56387

Closed
ZwFink opened this issue Jul 5, 2022 · 3 comments
Closed

[OpenMP][Offload] Incorrect behavior with external device code #56387

ZwFink opened this issue Jul 5, 2022 · 3 comments
Assignees
Labels
openmp question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@ZwFink
Copy link

ZwFink commented Jul 5, 2022

Description

When using Clang/LLVM on the release/14.x branch, external functions declared with omp declare target all return the same thread number 0 from omp_get_thread_num. The behavior works as expected in release/13.x.

Using git bisect, we have found that the bug was introduced in 423d34f.

Minimal Example

head.cpp

#include "head.h"
#include <omp.h>
#include <stdio.h>


#pragma omp declare target
int external_dev_fn(int a)
{
  int res = a+2;
  printf("I am thread %d\n", omp_get_thread_num());
  return res;
}
#pragma omp end declare target

head.h

#ifndef HEAD_HH_INCLUDED
#define HEAD_HH_INCLUDED
#include <omp.h>

#pragma omp declare target
int external_dev_fn(int a);
#pragma omp end declare target

#endif // HEAD_HH_INCLUDED

driver.cpp

#include "head.h"
#include <stdio.h>
#include <omp.h>

int main(){
  int N = 10;
  int rs = 0;
#pragma omp target data map(tofrom:N,rs)
  {

#pragma omp target parallel for reduction(+:rs)
  for(int i = 0; i < N; i++)
    {
      printf("I am thread %d\n", omp_get_thread_num());
      rs += external_dev_fn(i);
    }
  }

  printf("End result is %d\n", rs);
}

We compile the above example for an NVIDIA V100 GPU with the following:

clang++ -Xopenmp-target -march=sm_70 -fopenmp -fopenmp-targets=nvptx64 ./driver.cpp -O1 ./head.cpp

The result produced by release/14.x is as follows:

I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
End result is 65

Whereas the correct result produced by release/13.x is:

I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
End result is 65

The issue is that within the function external_dev_function, we expect that the return value for each thread that calls omp_get_thread_num matches the return value for each thread that calls omp_get_thread_num in the main function. If we include main.cpp within driver.cpp, the program works as expected.

Environment Details

  • Clang/LLVM were built with GCC 8.3.1
  • CUDA 10.1 is used
  • We have also verified this behavior on a different machine where Clang/LLVM were built with GCC 9.4.0 and CUDA 11.4.
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 5, 2022

@llvm/issue-subscribers-openmp

@shiltian shiltian self-assigned this Jul 5, 2022
@jhuber6
Copy link
Contributor

jhuber6 commented Jul 5, 2022

Does this work on the main branch (currently LLVM 15.0.0)? This is most likely the same issue as #54208 and also related to #55943. Both of these problems are rooted in the fact that the OpenMP runtime library is eagerly linked and internalized for each TU, which results in some needed definitions being optimized out and not correctly shared between files. This is also solved more correctly in LLVM 15 using LTO with -foffload-lto. I'm not sure if we have a chance for a backport given that 14.0.x is finished now.

@ZwFink
Copy link
Author

ZwFink commented Jul 5, 2022

Aha, building with the main branch and including -foffload-lto fixes the issue. Thanks!

@ZwFink ZwFink closed this as completed Jul 5, 2022
@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Jul 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openmp question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!
Projects
None yet
Development

No branches or pull requests

5 participants