-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flang][OpenMP] Fix to variables not inheriting data sharing attributes #74964
Conversation
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-fir-hlfir Author: None (harishch4) ChangesWhen a default(none) clause exists and a threadprivate variable is used inside the construct, the variable does not inherit threadprivate behavior and throws the below error. > error: The DEFAULT(NONE) clause requires that 'a' must be listed in a data-sharing attribute clause Added a condition to skip the error if it is a threadprivate variable. Fixes: #49545 Full diff: https://github.com/llvm/llvm-project/pull/74964.diff 2 Files Affected:
diff --git a/flang/lib/Semantics/resolve-directives.cpp b/flang/lib/Semantics/resolve-directives.cpp
index da6c865ad56a3b..d3f8f1d3ad7c91 100644
--- a/flang/lib/Semantics/resolve-directives.cpp
+++ b/flang/lib/Semantics/resolve-directives.cpp
@@ -1910,7 +1910,8 @@ void OmpAttributeVisitor::Post(const parser::Name &name) {
if (Symbol * found{currScope().FindSymbol(name.source)}) {
if (symbol != found) {
name.symbol = found; // adjust the symbol within region
- } else if (GetContext().defaultDSA == Symbol::Flag::OmpNone) {
+ } else if (GetContext().defaultDSA == Symbol::Flag::OmpNone &&
+ !symbol->test(Symbol::Flag::OmpThreadprivate)) {
context_.Say(name.source,
"The DEFAULT(NONE) clause requires that '%s' must be listed in "
"a data-sharing attribute clause"_err_en_US,
diff --git a/flang/test/Lower/OpenMP/threadprivate-default-clause.f90 b/flang/test/Lower/OpenMP/threadprivate-default-clause.f90
new file mode 100644
index 00000000000000..be07aeca2d5c32
--- /dev/null
+++ b/flang/test/Lower/OpenMP/threadprivate-default-clause.f90
@@ -0,0 +1,60 @@
+! Simple test for lowering of OpenMP Threadprivate Directive with HLFIR.
+
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+!CHECK-LABEL: func.func @_QPsub1() {
+!CHECK: %[[A:.*]] = fir.address_of(@_QFsub1Ea) : !fir.ref<i32>
+!CHECK: %[[A_DECL:.*]]:2 = hlfir.declare %[[A]] {uniq_name = "_QFsub1Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: %[[A_TP0:.*]] = omp.threadprivate %[[A_DECL]]#1 : !fir.ref<i32> -> !fir.ref<i32>
+!CHECK: %[[A_CVT:.*]]:2 = hlfir.declare %[[A_TP0]] {uniq_name = "_QFsub1Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: omp.parallel {
+!CHECK: %[[A_TP:.*]] = omp.threadprivate %[[A_DECL]]#1 : !fir.ref<i32> -> !fir.ref<i32>
+!CHECK: %[[A_TP_DECL:.*]]:2 = hlfir.declare %[[A_TP]] {uniq_name = "_QFsub1Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: %[[TID:.*]] = fir.call @omp_get_thread_num() fastmath<contract> : () -> i32
+!CHECK: hlfir.assign %[[TID]] to %[[A_TP_DECL]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.terminator
+!CHECK: }
+
+subroutine sub1()
+ use omp_lib
+ integer, save :: a
+ !$omp threadprivate(a)
+ !$omp parallel default(none)
+ a = omp_get_thread_num()
+ !$omp end parallel
+end subroutine
+
+!CHECK-LABEL: func.func @_QPsub2() {
+!CHECK: %[[BLK:.*]] = fir.address_of(@blk_) : !fir.ref<!fir.array<4xi8>>
+!CHECK: %[[BLK_CVT:.*]] = fir.convert %[[BLK]] : (!fir.ref<!fir.array<4xi8>>) -> !fir.ref<!fir.array<?xi8>>
+!CHECK: %c0 = arith.constant 0 : index
+!CHECK: %[[A_ADDR:.*]] = fir.coordinate_of %[[BLK_CVT]], %c0 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+!CHECK: %[[A_CVT:.*]] = fir.convert %[[A_ADDR]] : (!fir.ref<i8>) -> !fir.ref<i32>
+!CHECK: %[[A_DECL:.*]]:2 = hlfir.declare %[[A_CVT]] {uniq_name = "_QFsub2Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: %[[A_TP0:.*]] = omp.threadprivate %[[BLK]] : !fir.ref<!fir.array<4xi8>> -> !fir.ref<!fir.array<4xi8>>
+!CHECK: %[[A_TP0_CVT:.*]] = fir.convert %[[A_TP0]] : (!fir.ref<!fir.array<4xi8>>) -> !fir.ref<!fir.array<?xi8>>
+!CHECK: %c0_0 = arith.constant 0 : index
+!CHECK: %[[A_TP0_ADDR:.*]] = fir.coordinate_of %[[A_TP0_CVT]], %c0_0 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+!CHECK: %[[A_TP0_ADDR_CVT:.*]] = fir.convert %[[A_TP0_ADDR]] : (!fir.ref<i8>) -> !fir.ref<i32>
+!CHECK: %[[A_TP0_DECL:.*]]:2 = hlfir.declare %[[A_TP0_ADDR_CVT]] {uniq_name = "_QFsub2Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: omp.parallel {
+!CHECK: %[[BLK_TP:.*]] = omp.threadprivate %[[BLK]] : !fir.ref<!fir.array<4xi8>> -> !fir.ref<!fir.array<4xi8>>
+!CHECK: %[[BLK_TP_CVT:.*]] = fir.convert %[[BLK_TP]] : (!fir.ref<!fir.array<4xi8>>) -> !fir.ref<!fir.array<?xi8>>
+!CHECK: %c0_1 = arith.constant 0 : index
+!CHECK: %[[A_TP_ADDR:.*]] = fir.coordinate_of %[[BLK_TP_CVT]], %c0_1 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+!CHECK: %[[A_TP_ADDR_CVT:.*]] = fir.convert %[[A_TP_ADDR]] : (!fir.ref<i8>) -> !fir.ref<i32>
+!CHECK: %[[A_TP_DECL:.*]]:2 = hlfir.declare %[[A_TP_ADDR_CVT]] {uniq_name = "_QFsub2Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: %[[TID:.*]] = fir.call @omp_get_thread_num() fastmath<contract> : () -> i32
+!CHECK: hlfir.assign %[[TID]] to %[[A_TP_DECL]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.terminator
+!CHECK: }
+
+subroutine sub2()
+ use omp_lib
+ integer :: a
+ common/blk/a
+ !$omp threadprivate(/blk/)
+ !$omp parallel default(none)
+ a = omp_get_thread_num()
+ !$omp end parallel
+end subroutine
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch that looks at fixing this long-standing issue. Have a comment inline.
@@ -1910,7 +1910,8 @@ void OmpAttributeVisitor::Post(const parser::Name &name) { | |||
if (Symbol * found{currScope().FindSymbol(name.source)}) { | |||
if (symbol != found) { | |||
name.symbol = found; // adjust the symbol within region | |||
} else if (GetContext().defaultDSA == Symbol::Flag::OmpNone) { | |||
} else if (GetContext().defaultDSA == Symbol::Flag::OmpNone && | |||
!symbol->test(Symbol::Flag::OmpThreadprivate)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would including Threadprivate also into IsObjectWithDSA
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Kiran, Apologies for the late reply, I was on vacation.
The symbol is already being added to objectWithDSA map in OmpAttributeVisitor::Pre(const parser::OpenMPThreadprivate)->ResolveOmpObjectList()->ResolveOmpObject()->AddToContextObjectWithDSA(), yet in OmpAttributeVisitor::Post(const parser::Name) the same symbol when checked in IsObjectWithDSA() is getting false.
Not only is this unused, it's really confusing having getAPValueResult() and getResultAsAPValue() as sibling APIs
-Werror is now a global default as of commit c52b467 ("Reapply "[libc] build with -Werror (llvm#73966)" (llvm#74506)")
I noticed this while working on llvm#78658
…fle (llvm#78636)" This reverts commit 4d11f04. This breaks some programs as mentioned in llvm#78636
As explained in [1], this linker is functionally equivalent to the classic one (`ld64`) for build system purposes -- in particular to enable the use of order files to link `clang`. For this reason, in addition to fixing the detection rename `LLVM_LINKER_IS_LD64` to `LLVM_LINKER_IS_APPLE` to make the result of such detection more clear -- this should not cause any issue to downstream users, from a quick search in SourceGraph [2], only Swift uses the value of this variable (which I will take care of updating in due time). [1]: https://developer.apple.com/documentation/xcode-release-notes/xcode-15-release-notes#Linking [2]: https://sourcegraph.com/search?q=context:global+LLVM_LINKER_IS_LD64+lang:cmake+fork:no+-file:AddLLVM.cmake+-file:clang/tools/driver/CMakeLists.txt&patternType=standard&sm=1&groupBy=repo rdar://120740222
Adding weights to utility nodes in BP so that we can give more importance to certain utilities. This is useful when we optimize several objectives jointly.
Use a fixed width integer type and assert that DwarRegNum fits the 16 bits. This is a follow up to review comments on llvm#78600.
…bort (llvm#78561) In the hardening modes that can be used in production (`fast` and `extensive`), make a failed assertion invoke a trap instruction rather than calling verbose abort. In the debug mode, still keep calling verbose abort to provide a better user experience and to allow us to keep our existing testing infrastructure for verifying assertion messages. Since the debug mode by definition enables all assertions, we can be sure that we still check all the assertion messages in the library when running the test suite in the debug mode. The main motivation to use trapping in production is to achieve better code generation and reduce the binary size penalty. This way, the assertion handler can compile to a single instruction, whereas the existing mechanism with verbose abort results in generating a function call that in general cannot be optimized away (made worse by the fact that it's a variadic function, imposing an additional penalty). See the [RFC](https://discourse.llvm.org/t/rfc-hardening-in-libc/73925) for more details. Note that this mechanism can now be completely [overridden at CMake configuration time](llvm#77883). This patch also significantly refactors `check_assertion.h` and expands its test coverage. The main changes: - when overriding `verbose_abort`, don't do matching inside the function -- just print the error message to `stderr`. This removes the need to set a global matcher and allows to do matching in the parent process after the child finishes; - remove unused logic for matching source locations and for using wildcards; - make matchers simple functors; - introduce `DeathTestResult` that keeps data about the test run, primarily to make it easier to test. In addition to the refactoring, `check_assertion.h` can now recognize when a process exits due to a trap.
This mirrors how the ELF linker works. I wasn't able to find anywhere where this is currently tested. Followup to llvm#78640, which triggered a regression.
…lvm#74629) `-fvisibility-from-dllstorageclass` allows for overriding the visibility of globals from their DLL storage class. The visibility to apply can be customised for the different classes of globals via a set of dependent options that specify the mapping values: - `-fvisibility-dllexport=<value>` - `-fvisibility-nodllstorageclass=<value>` - `-fvisibility-externs-dllimport=<value>` - `-fvisibility-externs-nodllstorageclass=<value>` Currently, one of the existing LLVM visibilities, `hidden`, `protected`, `default`, can be used as a mapping value. This change adds a new mapping value: `keep`, which specifies that the visibility should not be overridden for that class of globals. The behaviour of `-fvisibility-from-dllstorageclass` is otherwise unchanged and existing uses of this set of options will be unaffected. The background to this change is that currently the PS4 and PS5 compilers effectively ignore visibility - dllimport/export is the supported method for export control in C/C++ source code. Now, we would like to support visibility attributes and options in our frontend, in addition to dllimport/export. To support this, we will override the visibility of globals with explicit dllimport/export annotations but use the `keep` setting for globals which do not have an explicit dllimport/export. There are also some minor improvements to the existing options: - Make the `LANGOPS` `BENIGN` as they don't involve the AST. - Correct/clarify the help text for the options.
We just need to check if the global is large or not. In the kernel code model, globals are in the negative 2GB of the address space, so globals can be a sign extended 32-bit immediate. In other code models, small globals are in the low 2GB of the address space, so sign extending them is equivalent to zero extending them.
We want to know the upper 33 bits of the And Input are zero. SExt only guarantees they are the same. We originally checked for SExt or ZExt when we were using isImpliedByDomCondition because a ZExt may have been changed to SExt before we visited the And. We are no longer using isImpliedByDomCondition so we can only look for zext with the nneg flag. While here, switch to PatternMatch to simplify the code. Fixes llvm#78783
When the MachODebugMapParser encounters an object file that cannot be found on disk, it currently leaves the parser in an incoherent state, resulting in spurious warnings that can in turn slow down dsymutil. This fixes llvm#78411. rdar://117515153
…lvm#72717)" This reverts commit 5954b9d due to broken Windows build
It implements transformation to optimize accesses to shared memory. Reference: https://reviews.llvm.org/D127457 _This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by newColIdx = col % vecSize + perm[row](col / vecSize, row) where perm is a permutation function indexed by row and vecSize is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory._
Prior to this change, we wouldn't build headers that aren't referenced by other parts of the libc which would result in a build error during installation. To address this, we make the header target a dependency of the libc archive. Additionally, we also redo the install targets, moving the install targets closer to build targets and simplifying the hierarchy and generally matching what we do for other runtimes.
…lvm#77761) This brings `createBodyOfOp` to its final intended form. First, input privatization is performed, then the recursive lowering takes place, and finally the output privatization (lastprivate) is done. This enables fixing a known issue with infinite loops inside of an OpenMP region, and the fix is included in this patch. Fixes llvm#74348. Recursive lowering [5/5] --------- Co-authored-by: Kiran Chandramohan <[email protected]>
Add the -fspv-target-env option to the clang-dxc compatibility driver to specify the SPIR-V target environment, which is propagated to the target Triple.
…after llvm#75826 (NFC) llvm-project/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp:1064:18: error: unused variable 'I' [-Werror,-Wunused-variable] Instruction *I = cast<Instruction>(Pair.first); ^ llvm-project/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp:1066:11: error: unused variable 'BaseValue' [-Werror,-Wunused-variable] auto *BaseValue = State.getBaseValue(); ^ 2 errors generated.
Some are missing setting of HSA_XNACK=1 environment variable, used to enable unified memory support on amdgpu's when it's not been set at kernel boot time. Some others needed to be marked as supporting unified_shared_memory in the lit test harness. Extend lit test harness to enable unified_shared_memory requirement for AMD GPUs. Reland: llvm#77851
The 'vector_length' clause is the first of the 'int-expr' clauses that I've implemented. Currently this is just being parsed as an assignment-expr, since it needs to be usable in a list. Sema implementation will enforce the integral-nature of it.
Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs. Use llvm_anyvector_ty for both multi vector returns and operands, therefore the return and operands can be specified in the intrinsic call, e.g. @llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
…mize std::make_exception_ptr (llvm#65534) This patch implements __cxa_init_primary_exception, an extension to the Itanium C++ ABI. This extension is already present in both libsupc++ and libcxxrt. This patch also starts making use of this function in std::make_exception_ptr: instead of going through a full throw/catch cycle, we are now able to initialize an exception directly, thus making std::make_exception_ptr around 30x faster.
…nitializer (llvm#72037) Produces now valid fixes for a member variables initialized with macros. Correctly uses expansion location instead of location inside macro to get init code. Close llvm#70189
…m#77451) Ensure that we generate correct symbol kinds and declaration fragments for unions in C and Objective-C parsing modes. rdar://120544091
These are largely copy-pasted from the corresponding function descriptions. Updated _rdtsc definition because it was just plain wrong.
…DPValues too (llvm#78726) This patch abstracts visitEntryValueDbgValue to deal with the substance of variable locations (Value, Var, Expr, DebugLoc) rather than how they're stored. That allows us to call it from handleDebugValue, which is similarly abstracted. This allows the entry-value behaviour (see the test) to be supported with non-instruction debug-info too!.
…of Displacement MachineOperand This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
Allows cases where movss/movsd etc. are loading constant (ConstantDataSequential) sub-vectors, ensuring we pad with the correct number of zero upper elements by making repeated printConstant calls to print zeroes in a matching int/fp format.
…vm#78584) When generating declaration fragments for types that use typedefs to pointer types ensure that we keep the user-defined typedef form instead of desugaring the typedef. rdar://102137655
Prepare a configuration switch and default to R_ARM_ABS32
…m-project into threadprivate-default-none
Closed this due to the wrong push. Apologies. |
When a default(none) clause exists and a threadprivate variable is used inside the construct, the variable does not inherit threadprivate behavior and throws the below error.
Added a condition to skip the error if it is a threadprivate variable.
Fixes: #49545