Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang][OpenMP] Fix to variables not inheriting data sharing attributes #74964

Closed
wants to merge 3,765 commits into from

Conversation

harishch4
Copy link
Contributor

When a default(none) clause exists and a threadprivate variable is used inside the construct, the variable does not inherit threadprivate behavior and throws the below error.

error: The DEFAULT(NONE) clause requires that 'a' must be listed in a data-sharing attribute clause

Added a condition to skip the error if it is a threadprivate variable.

Fixes: #49545

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 10, 2023

@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-flang-semantics

@llvm/pr-subscribers-flang-fir-hlfir

Author: None (harishch4)

Changes

When a default(none) clause exists and a threadprivate variable is used inside the construct, the variable does not inherit threadprivate behavior and throws the below error.

> error: The DEFAULT(NONE) clause requires that 'a' must be listed in a data-sharing attribute clause

Added a condition to skip the error if it is a threadprivate variable.

Fixes: #49545


Full diff: https://github.com/llvm/llvm-project/pull/74964.diff

2 Files Affected:

  • (modified) flang/lib/Semantics/resolve-directives.cpp (+2-1)
  • (added) flang/test/Lower/OpenMP/threadprivate-default-clause.f90 (+60)
diff --git a/flang/lib/Semantics/resolve-directives.cpp b/flang/lib/Semantics/resolve-directives.cpp
index da6c865ad56a3b..d3f8f1d3ad7c91 100644
--- a/flang/lib/Semantics/resolve-directives.cpp
+++ b/flang/lib/Semantics/resolve-directives.cpp
@@ -1910,7 +1910,8 @@ void OmpAttributeVisitor::Post(const parser::Name &name) {
       if (Symbol * found{currScope().FindSymbol(name.source)}) {
         if (symbol != found) {
           name.symbol = found; // adjust the symbol within region
-        } else if (GetContext().defaultDSA == Symbol::Flag::OmpNone) {
+        } else if (GetContext().defaultDSA == Symbol::Flag::OmpNone &&
+            !symbol->test(Symbol::Flag::OmpThreadprivate)) {
           context_.Say(name.source,
               "The DEFAULT(NONE) clause requires that '%s' must be listed in "
               "a data-sharing attribute clause"_err_en_US,
diff --git a/flang/test/Lower/OpenMP/threadprivate-default-clause.f90 b/flang/test/Lower/OpenMP/threadprivate-default-clause.f90
new file mode 100644
index 00000000000000..be07aeca2d5c32
--- /dev/null
+++ b/flang/test/Lower/OpenMP/threadprivate-default-clause.f90
@@ -0,0 +1,60 @@
+! Simple test for lowering of OpenMP Threadprivate Directive with HLFIR.
+
+!RUN: %flang_fc1  -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+!CHECK-LABEL: func.func @_QPsub1() {
+!CHECK:     %[[A:.*]] = fir.address_of(@_QFsub1Ea) : !fir.ref<i32>
+!CHECK:     %[[A_DECL:.*]]:2 = hlfir.declare %[[A]]  {uniq_name = "_QFsub1Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:     %[[A_TP0:.*]] = omp.threadprivate %[[A_DECL]]#1 : !fir.ref<i32> -> !fir.ref<i32>
+!CHECK:     %[[A_CVT:.*]]:2 = hlfir.declare %[[A_TP0]] {uniq_name = "_QFsub1Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:     omp.parallel {
+!CHECK:       %[[A_TP:.*]] = omp.threadprivate %[[A_DECL]]#1 : !fir.ref<i32> -> !fir.ref<i32>
+!CHECK:       %[[A_TP_DECL:.*]]:2 = hlfir.declare %[[A_TP]] {uniq_name = "_QFsub1Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:       %[[TID:.*]] = fir.call @omp_get_thread_num() fastmath<contract> : () -> i32
+!CHECK:       hlfir.assign %[[TID]] to %[[A_TP_DECL]]#0 : i32, !fir.ref<i32>
+!CHECK:       omp.terminator
+!CHECK:     }
+
+subroutine sub1()
+   use omp_lib
+   integer, save :: a
+   !$omp threadprivate(a)
+   !$omp parallel default(none)
+   a = omp_get_thread_num()
+   !$omp end parallel
+end subroutine
+
+!CHECK-LABEL: func.func @_QPsub2() {
+!CHECK:    %[[BLK:.*]] = fir.address_of(@blk_) : !fir.ref<!fir.array<4xi8>>
+!CHECK:    %[[BLK_CVT:.*]] = fir.convert %[[BLK]] : (!fir.ref<!fir.array<4xi8>>) -> !fir.ref<!fir.array<?xi8>>
+!CHECK:    %c0 = arith.constant 0 : index
+!CHECK:    %[[A_ADDR:.*]] = fir.coordinate_of %[[BLK_CVT]], %c0 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+!CHECK:    %[[A_CVT:.*]] = fir.convert %[[A_ADDR]] : (!fir.ref<i8>) -> !fir.ref<i32>
+!CHECK:    %[[A_DECL:.*]]:2 = hlfir.declare %[[A_CVT]] {uniq_name = "_QFsub2Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:    %[[A_TP0:.*]] = omp.threadprivate %[[BLK]] : !fir.ref<!fir.array<4xi8>> -> !fir.ref<!fir.array<4xi8>>
+!CHECK:    %[[A_TP0_CVT:.*]] = fir.convert %[[A_TP0]] : (!fir.ref<!fir.array<4xi8>>) -> !fir.ref<!fir.array<?xi8>>
+!CHECK:    %c0_0 = arith.constant 0 : index
+!CHECK:    %[[A_TP0_ADDR:.*]] = fir.coordinate_of %[[A_TP0_CVT]], %c0_0 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+!CHECK:    %[[A_TP0_ADDR_CVT:.*]] = fir.convert %[[A_TP0_ADDR]] : (!fir.ref<i8>) -> !fir.ref<i32>
+!CHECK:    %[[A_TP0_DECL:.*]]:2 = hlfir.declare %[[A_TP0_ADDR_CVT]] {uniq_name = "_QFsub2Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:    omp.parallel {
+!CHECK:       %[[BLK_TP:.*]] = omp.threadprivate %[[BLK]] : !fir.ref<!fir.array<4xi8>> -> !fir.ref<!fir.array<4xi8>>
+!CHECK:       %[[BLK_TP_CVT:.*]] = fir.convert %[[BLK_TP]] : (!fir.ref<!fir.array<4xi8>>) -> !fir.ref<!fir.array<?xi8>>
+!CHECK:       %c0_1 = arith.constant 0 : index
+!CHECK:       %[[A_TP_ADDR:.*]] = fir.coordinate_of %[[BLK_TP_CVT]], %c0_1 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+!CHECK:       %[[A_TP_ADDR_CVT:.*]] = fir.convert %[[A_TP_ADDR]] : (!fir.ref<i8>) -> !fir.ref<i32>
+!CHECK:       %[[A_TP_DECL:.*]]:2 = hlfir.declare %[[A_TP_ADDR_CVT]] {uniq_name = "_QFsub2Ea"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:       %[[TID:.*]] = fir.call @omp_get_thread_num() fastmath<contract> : () -> i32
+!CHECK:       hlfir.assign %[[TID]] to %[[A_TP_DECL]]#0 : i32, !fir.ref<i32>
+!CHECK:       omp.terminator
+!CHECK:     }
+
+subroutine sub2()
+   use omp_lib
+   integer :: a
+   common/blk/a
+   !$omp threadprivate(/blk/)
+   !$omp parallel default(none)
+   a = omp_get_thread_num()
+   !$omp end parallel
+end subroutine

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch that looks at fixing this long-standing issue. Have a comment inline.

@@ -1910,7 +1910,8 @@ void OmpAttributeVisitor::Post(const parser::Name &name) {
if (Symbol * found{currScope().FindSymbol(name.source)}) {
if (symbol != found) {
name.symbol = found; // adjust the symbol within region
} else if (GetContext().defaultDSA == Symbol::Flag::OmpNone) {
} else if (GetContext().defaultDSA == Symbol::Flag::OmpNone &&
!symbol->test(Symbol::Flag::OmpThreadprivate)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would including Threadprivate also into IsObjectWithDSA work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Kiran, Apologies for the late reply, I was on vacation.

The symbol is already being added to objectWithDSA map in OmpAttributeVisitor::Pre(const parser::OpenMPThreadprivate)->ResolveOmpObjectList()->ResolveOmpObject()->AddToContextObjectWithDSA(), yet in OmpAttributeVisitor::Post(const parser::Name) the same symbol when checked in IsObjectWithDSA() is getting false.

AaronBallman and others added 19 commits January 19, 2024 15:15
Not only is this unused, it's really confusing having
getAPValueResult() and getResultAsAPValue() as sibling APIs
-Werror is now a global default as of
commit c52b467 ("Reapply "[libc] build with -Werror (llvm#73966)"
(llvm#74506)")
…fle (llvm#78636)"

This reverts commit 4d11f04.

This breaks some programs as mentioned in llvm#78636
As explained in [1], this linker is functionally equivalent to the
classic one (`ld64`) for build system purposes -- in particular to 
enable the use of order files to link `clang`. For this reason, in 
addition to fixing the detection rename `LLVM_LINKER_IS_LD64` to 
`LLVM_LINKER_IS_APPLE` to make the result of such detection more 
clear -- this should not cause any issue to downstream users, from 
a quick search in SourceGraph [2], only Swift uses the value of
this variable (which I will take care of updating in due time).

[1]: https://developer.apple.com/documentation/xcode-release-notes/xcode-15-release-notes#Linking
[2]: https://sourcegraph.com/search?q=context:global+LLVM_LINKER_IS_LD64+lang:cmake+fork:no+-file:AddLLVM.cmake+-file:clang/tools/driver/CMakeLists.txt&patternType=standard&sm=1&groupBy=repo
rdar://120740222
Adding weights to utility nodes in BP so that we can give more
importance to
certain utilities. This is useful when we optimize several objectives
jointly.
Use a fixed width integer type and assert that DwarRegNum fits the 16
bits.

This is a follow up to review comments on llvm#78600.
…bort (llvm#78561)

In the hardening modes that can be used in production (`fast` and
`extensive`), make a failed assertion invoke a trap instruction rather
than calling verbose abort. In the debug mode, still keep calling
verbose abort to provide a better user experience and to allow us to
keep our existing testing infrastructure for verifying assertion
messages. Since the debug mode by definition enables all assertions, we
can be sure that we still check all the assertion messages in the
library when running the test suite in the debug mode.

The main motivation to use trapping in production is to achieve better
code generation and reduce the binary size penalty. This way, the
assertion handler can compile to a single instruction, whereas the
existing mechanism with verbose abort results in generating a function
call that in general cannot be optimized away (made worse by the fact
that it's a variadic function, imposing an additional penalty). See the
[RFC](https://discourse.llvm.org/t/rfc-hardening-in-libc/73925) for more
details. Note that this mechanism can now be completely [overridden at
CMake configuration
time](llvm#77883).

This patch also significantly refactors `check_assertion.h` and expands
its test coverage. The main changes:
- when overriding `verbose_abort`, don't do matching inside the function
-- just print the error message to `stderr`. This removes the need to
set a global matcher and allows to do matching in the parent process
after the child finishes;
- remove unused logic for matching source locations and for using
wildcards;
- make matchers simple functors;
- introduce `DeathTestResult` that keeps data about the test run,
primarily to make it easier to test.

In addition to the refactoring, `check_assertion.h` can now recognize
when a process exits due to a trap.
This mirrors how the ELF linker works. I wasn't able to find anywhere
where this is currently tested.

Followup to llvm#78640, which triggered a regression.
…lvm#74629)

`-fvisibility-from-dllstorageclass` allows for overriding the visibility
of globals from their DLL storage class. The visibility to apply can be
customised for the different classes of globals via a set of dependent
options that specify the mapping values:
- `-fvisibility-dllexport=<value>`
- `-fvisibility-nodllstorageclass=<value>`
- `-fvisibility-externs-dllimport=<value>`
- `-fvisibility-externs-nodllstorageclass=<value>` 

Currently, one of the existing LLVM visibilities, `hidden`, `protected`,
`default`, can be used as a mapping value. This change adds a new
mapping value: `keep`, which specifies that the visibility should not be
overridden for that class of globals. The behaviour of
`-fvisibility-from-dllstorageclass` is otherwise unchanged and existing
uses of this set of options will be unaffected.

The background to this change is that currently the PS4 and PS5
compilers effectively ignore visibility - dllimport/export is the
supported method for export control in C/C++ source code. Now, we would
like to support visibility attributes and options in our frontend, in
addition to dllimport/export. To support this, we will override the
visibility of globals with explicit dllimport/export annotations but use
the `keep` setting for globals which do not have an explicit
dllimport/export.

There are also some minor improvements to the existing options:
- Make the `LANGOPS` `BENIGN` as they don't involve the AST.
- Correct/clarify the help text for the options.
We just need to check if the global is large or not.

In the kernel code model, globals are in the negative 2GB of the address
space, so globals can be a sign extended 32-bit immediate.

In other code models, small globals are in the low 2GB of the address
space, so sign extending them is equivalent to zero extending them.
)

This patch adds in support for symbolizing PGO information contained
within the SHT_LLVM_BB_ADDR_MAP section in llvm-objdump. The outputs are
simply the raw values contained within the section.
We want to know the upper 33 bits of the And Input are zero. SExt
only guarantees they are the same.

We originally checked for SExt or ZExt when we were using
isImpliedByDomCondition because a ZExt may have been changed to SExt
before we visited the And.

We are no longer using isImpliedByDomCondition so we can only look
for zext with the nneg flag.

While here, switch to PatternMatch to simplify the code.

Fixes llvm#78783
When the MachODebugMapParser encounters an object file that cannot be
found on disk, it currently leaves the parser in an incoherent state,
resulting in spurious warnings that can in turn slow down dsymutil.

This fixes llvm#78411.

rdar://117515153
It implements transformation to optimize accesses to shared memory.

Reference: https://reviews.llvm.org/D127457

_This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by newColIdx = col % vecSize + perm[row](col / vecSize, row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized
accesses
common to loading matrix multiplication operands to/from shared memory._
Prior to this change, we wouldn't build headers that aren't referenced
by other parts of the libc which would result in a build error during
installation. To address this, we make the header target a dependency of
the libc archive. Additionally, we also redo the install targets, moving
the install targets closer to build targets and simplifying the
hierarchy and generally matching what we do for other runtimes.
kparzysz and others added 23 commits January 22, 2024 08:41
…lvm#77761)

This brings `createBodyOfOp` to its final intended form. First, input
privatization is performed, then the recursive lowering takes place, and
finally the output privatization (lastprivate) is done.

This enables fixing a known issue with infinite loops inside of an
OpenMP region, and the fix is included in this patch.

Fixes llvm#74348.

Recursive lowering [5/5]

---------

Co-authored-by: Kiran Chandramohan <[email protected]>
Add the -fspv-target-env option to the clang-dxc compatibility driver to
specify the SPIR-V target environment, which is propagated to the target
Triple.
…after llvm#75826 (NFC)

llvm-project/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp:1064:18: error: unused variable 'I' [-Werror,-Wunused-variable]
    Instruction *I = cast<Instruction>(Pair.first);
                 ^
llvm-project/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp:1066:11: error: unused variable 'BaseValue' [-Werror,-Wunused-variable]
    auto *BaseValue = State.getBaseValue();
          ^
2 errors generated.
Some are missing setting of HSA_XNACK=1 environment variable, used to
enable unified memory support on amdgpu's when it's not been set at
kernel boot time. Some others needed to be marked as supporting
unified_shared_memory in the lit test harness.

Extend lit test harness to enable unified_shared_memory requirement for
AMD GPUs.

Reland: llvm#77851
The 'vector_length' clause is the first of the 'int-expr' clauses that I've
implemented.  Currently this is just being parsed as an assignment-expr,
  since it needs to be usable in a list. Sema implementation will
  enforce the integral-nature of it.
Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs.

Use llvm_anyvector_ty for both multi vector returns and operands,
therefore the return and operands can be specified in the intrinsic
call, e.g.

@llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
…mize std::make_exception_ptr (llvm#65534)

This patch implements __cxa_init_primary_exception, an extension to the 
Itanium C++ ABI. This extension is already present in both libsupc++ and 
libcxxrt. This patch also starts making use of this function in 
std::make_exception_ptr: instead of going through a full throw/catch 
cycle, we are now able to initialize an exception directly, thus making 
std::make_exception_ptr around 30x faster.
…nitializer (llvm#72037)

Produces now valid fixes for a member variables initialized with macros.
Correctly uses expansion location instead of location inside macro to
get init code.

Close llvm#70189
…m#77451)

Ensure that we generate correct symbol kinds and declaration fragments
for unions in C and Objective-C parsing modes.

rdar://120544091
These are largely copy-pasted from the corresponding function
descriptions. Updated _rdtsc definition because it was just plain wrong.
…DPValues too (llvm#78726)

This patch abstracts visitEntryValueDbgValue to deal with the substance
of variable locations (Value, Var, Expr, DebugLoc) rather than how
they're stored. That allows us to call it from handleDebugValue, which
is similarly abstracted. This allows the entry-value behaviour (see the
test) to be supported with non-instruction debug-info too!.
…of Displacement MachineOperand

This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
Allows cases where movss/movsd etc. are loading constant (ConstantDataSequential) sub-vectors, ensuring we pad with the correct number of zero upper elements by making repeated printConstant calls to print zeroes in a matching int/fp format.
…vm#78584)

When generating declaration fragments for types that use typedefs to
pointer types ensure that we keep the user-defined typedef form instead
of desugaring the typedef.

rdar://102137655
Prepare a configuration switch and default to R_ARM_ABS32
@harishch4 harishch4 requested review from Endilll and a team as code owners January 22, 2024 17:19
@harishch4 harishch4 closed this Jan 22, 2024
@harishch4 harishch4 deleted the threadprivate-default-none branch January 22, 2024 17:22
@Endilll Endilll removed their request for review January 22, 2024 17:27
@ldionne ldionne removed the request for review from a team January 22, 2024 17:33
@harishch4
Copy link
Contributor Author

Closed this due to the wrong push. Apologies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang:semantics flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flang] [OpenMP] Variables in common blocks may not be inheriting data sharing attributes