-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang] Introduce custom loop nest generation for loops in workshare construct #101445
base: users/ivanradanov/flang-workshare-emit
Are you sure you want to change the base?
[flang] Introduce custom loop nest generation for loops in workshare construct #101445
Conversation
I am debating introducing a new operation workshare_loop_container which exists only to "contain" a omp.loop_nest between lowering an elemental to lowering the omp.workshare it is contained in. so we would have this state:
Which may have come from a different lowering/codegen and we are not sure what the semantics of that code would be. This new operation can later be reused for the |
c2cbd77
to
4da93bb
Compare
8068d60
to
14878e8
Compare
decd0c5
to
cff85ab
Compare
e3460a0
to
3b8dbba
Compare
cff85ab
to
6571aed
Compare
3b8dbba
to
38322dc
Compare
35fb583
to
a14789a
Compare
38322dc
to
9046df2
Compare
a14789a
to
f1b9e86
Compare
9046df2
to
10b7a39
Compare
f1b9e86
to
21c5c42
Compare
10b7a39
to
d17c552
Compare
21c5c42
to
a9f4b77
Compare
@llvm/pr-subscribers-flang-fir-hlfir Author: Ivan R. Ivanov (ivanradanov) Changes3/4 1/4 #101443 This alternative loop nest generation is used in 4/4 for the workshare lowering. Full diff: https://github.com/llvm/llvm-project/pull/101445.diff 7 Files Affected:
diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 6b41025eea0780..f073f494b3fb21 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -357,8 +357,8 @@ hlfir::ElementalOp genElementalOp(
/// Structure to describe a loop nest.
struct LoopNest {
- fir::DoLoopOp outerLoop;
- fir::DoLoopOp innerLoop;
+ mlir::Operation *outerOp = nullptr;
+ mlir::Block *body = nullptr;
llvm::SmallVector<mlir::Value> oneBasedIndices;
};
@@ -366,11 +366,13 @@ struct LoopNest {
/// \p isUnordered specifies whether the loops in the loop nest
/// are unordered.
LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
- mlir::ValueRange extents, bool isUnordered = false);
+ mlir::ValueRange extents, bool isUnordered = false,
+ bool emitWorkshareLoop = false);
inline LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
- mlir::Value shape, bool isUnordered = false) {
+ mlir::Value shape, bool isUnordered = false,
+ bool emitWorkshareLoop = false) {
return genLoopNest(loc, builder, getIndexExtents(loc, builder, shape),
- isUnordered);
+ isUnordered, emitWorkshareLoop);
}
/// Inline the body of an hlfir.elemental at the current insertion point
diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp
index fd873f55dd844e..0689d6e033dd9c 100644
--- a/flang/lib/Lower/ConvertCall.cpp
+++ b/flang/lib/Lower/ConvertCall.cpp
@@ -2128,7 +2128,7 @@ class ElementalCallBuilder {
hlfir::genLoopNest(loc, builder, shape, !mustBeOrdered);
mlir::ValueRange oneBasedIndices = loopNest.oneBasedIndices;
auto insPt = builder.saveInsertionPoint();
- builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+ builder.setInsertionPointToStart(loopNest.body);
callContext.stmtCtx.pushScope();
for (auto &preparedActual : loweredActuals)
if (preparedActual)
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index c3c1f363033c27..72a90dd0d6f29d 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -375,7 +375,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
// know this won't miss any opportuinties for clever elemental inlining
hlfir::LoopNest nest = hlfir::genLoopNest(
loc, builder, shapeShift.getExtents(), /*isUnordered=*/true);
- builder.setInsertionPointToStart(nest.innerLoop.getBody());
+ builder.setInsertionPointToStart(nest.body);
mlir::Type refTy = fir::ReferenceType::get(seqTy.getEleTy());
auto lhsEleAddr = builder.create<fir::ArrayCoorOp>(
loc, refTy, lhs, shapeShift, /*slice=*/mlir::Value{},
@@ -389,7 +389,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
builder, loc, redId, refTy, lhsEle, rhsEle);
builder.create<fir::StoreOp>(loc, scalarReduction, lhsEleAddr);
- builder.setInsertionPointAfter(nest.outerLoop);
+ builder.setInsertionPointAfter(nest.outerOp);
builder.create<mlir::omp::YieldOp>(loc, lhsAddr);
}
diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
index 8d0ae2f195178c..333331378841ed 100644
--- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp
+++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
@@ -20,6 +20,7 @@
#include "mlir/IR/IRMapping.h"
#include "mlir/Support/LLVM.h"
#include "llvm/ADT/TypeSwitch.h"
+#include <mlir/Dialect/OpenMP/OpenMPDialect.h>
#include <optional>
// Return explicit extents. If the base is a fir.box, this won't read it to
@@ -855,26 +856,51 @@ mlir::Value hlfir::inlineElementalOp(
hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc,
fir::FirOpBuilder &builder,
- mlir::ValueRange extents, bool isUnordered) {
+ mlir::ValueRange extents, bool isUnordered,
+ bool emitWorkshareLoop) {
+ emitWorkshareLoop = emitWorkshareLoop && isUnordered;
hlfir::LoopNest loopNest;
assert(!extents.empty() && "must have at least one extent");
- auto insPt = builder.saveInsertionPoint();
+ mlir::OpBuilder::InsertionGuard guard(builder);
loopNest.oneBasedIndices.assign(extents.size(), mlir::Value{});
// Build loop nest from column to row.
auto one = builder.create<mlir::arith::ConstantIndexOp>(loc, 1);
mlir::Type indexType = builder.getIndexType();
- unsigned dim = extents.size() - 1;
- for (auto extent : llvm::reverse(extents)) {
- auto ub = builder.createConvert(loc, indexType, extent);
- loopNest.innerLoop =
- builder.create<fir::DoLoopOp>(loc, one, ub, one, isUnordered);
- builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
- // Reverse the indices so they are in column-major order.
- loopNest.oneBasedIndices[dim--] = loopNest.innerLoop.getInductionVar();
- if (!loopNest.outerLoop)
- loopNest.outerLoop = loopNest.innerLoop;
+ if (emitWorkshareLoop) {
+ auto wslw = builder.create<mlir::omp::WorkshareLoopWrapperOp>(loc);
+ loopNest.outerOp = wslw;
+ builder.createBlock(&wslw.getRegion());
+ mlir::omp::LoopNestOperands lnops;
+ lnops.loopInclusive = builder.getUnitAttr();
+ for (auto extent : llvm::reverse(extents)) {
+ lnops.loopLowerBounds.push_back(one);
+ lnops.loopUpperBounds.push_back(extent);
+ lnops.loopSteps.push_back(one);
+ }
+ auto lnOp = builder.create<mlir::omp::LoopNestOp>(loc, lnops);
+ builder.create<mlir::omp::TerminatorOp>(loc);
+ mlir::Block *block = builder.createBlock(&lnOp.getRegion());
+ for (auto extent : llvm::reverse(extents))
+ block->addArgument(extent.getType(), extent.getLoc());
+ loopNest.body = block;
+ builder.create<mlir::omp::YieldOp>(loc);
+ for (unsigned dim = 0; dim < extents.size(); dim++)
+ loopNest.oneBasedIndices[extents.size() - dim - 1] =
+ lnOp.getRegion().front().getArgument(dim);
+ } else {
+ unsigned dim = extents.size() - 1;
+ for (auto extent : llvm::reverse(extents)) {
+ auto ub = builder.createConvert(loc, indexType, extent);
+ auto doLoop =
+ builder.create<fir::DoLoopOp>(loc, one, ub, one, isUnordered);
+ loopNest.body = doLoop.getBody();
+ builder.setInsertionPointToStart(loopNest.body);
+ // Reverse the indices so they are in column-major order.
+ loopNest.oneBasedIndices[dim--] = doLoop.getInductionVar();
+ if (!loopNest.outerOp)
+ loopNest.outerOp = doLoop;
+ }
}
- builder.restoreInsertionPoint(insPt);
return loopNest;
}
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
index a70a6b388c4b1a..b608677c526310 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
@@ -31,6 +31,7 @@
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/DialectConversion.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
#include "llvm/ADT/TypeSwitch.h"
namespace hlfir {
@@ -793,7 +794,7 @@ struct ElementalOpConversion
hlfir::LoopNest loopNest =
hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
auto insPt = builder.saveInsertionPoint();
- builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+ builder.setInsertionPointToStart(loopNest.body);
auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
loopNest.oneBasedIndices);
hlfir::Entity elementValue(yield.getElementValue());
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp b/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp
index 85dd517cb57914..645abf65d10a32 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp
@@ -464,7 +464,7 @@ void OrderedAssignmentRewriter::pre(hlfir::RegionAssignOp regionAssignOp) {
// if the LHS is not).
mlir::Value shape = hlfir::genShape(loc, builder, lhsEntity);
elementalLoopNest = hlfir::genLoopNest(loc, builder, shape);
- builder.setInsertionPointToStart(elementalLoopNest->innerLoop.getBody());
+ builder.setInsertionPointToStart(elementalLoopNest->body);
lhsEntity = hlfir::getElementAt(loc, builder, lhsEntity,
elementalLoopNest->oneBasedIndices);
rhsEntity = hlfir::getElementAt(loc, builder, rhsEntity,
@@ -484,7 +484,7 @@ void OrderedAssignmentRewriter::pre(hlfir::RegionAssignOp regionAssignOp) {
for (auto &cleanupConversion : argConversionCleanups)
cleanupConversion();
if (elementalLoopNest)
- builder.setInsertionPointAfter(elementalLoopNest->outerLoop);
+ builder.setInsertionPointAfter(elementalLoopNest->outerOp);
} else {
// TODO: preserve allocatable assignment aspects for forall once
// they are conveyed in hlfir.region_assign.
@@ -493,7 +493,7 @@ void OrderedAssignmentRewriter::pre(hlfir::RegionAssignOp regionAssignOp) {
generateCleanupIfAny(loweredLhs.elementalCleanup);
if (loweredLhs.vectorSubscriptLoopNest)
builder.setInsertionPointAfter(
- loweredLhs.vectorSubscriptLoopNest->outerLoop);
+ loweredLhs.vectorSubscriptLoopNest->outerOp);
generateCleanupIfAny(oldRhsYield);
generateCleanupIfAny(loweredLhs.nonElementalCleanup);
}
@@ -518,8 +518,8 @@ void OrderedAssignmentRewriter::pre(hlfir::WhereOp whereOp) {
hlfir::Entity savedMask{maybeSaved->first};
mlir::Value shape = hlfir::genShape(loc, builder, savedMask);
whereLoopNest = hlfir::genLoopNest(loc, builder, shape);
- constructStack.push_back(whereLoopNest->outerLoop.getOperation());
- builder.setInsertionPointToStart(whereLoopNest->innerLoop.getBody());
+ constructStack.push_back(whereLoopNest->outerOp);
+ builder.setInsertionPointToStart(whereLoopNest->body);
mlir::Value cdt = hlfir::getElementAt(loc, builder, savedMask,
whereLoopNest->oneBasedIndices);
generateMaskIfOp(cdt);
@@ -527,7 +527,7 @@ void OrderedAssignmentRewriter::pre(hlfir::WhereOp whereOp) {
// If this is the same run as the one that saved the value, the clean-up
// was left-over to be done now.
auto insertionPoint = builder.saveInsertionPoint();
- builder.setInsertionPointAfter(whereLoopNest->outerLoop);
+ builder.setInsertionPointAfter(whereLoopNest->outerOp);
generateCleanupIfAny(maybeSaved->second);
builder.restoreInsertionPoint(insertionPoint);
}
@@ -539,8 +539,8 @@ void OrderedAssignmentRewriter::pre(hlfir::WhereOp whereOp) {
mask.generateNoneElementalPart(builder, mapper);
mlir::Value shape = mask.generateShape(builder, mapper);
whereLoopNest = hlfir::genLoopNest(loc, builder, shape);
- constructStack.push_back(whereLoopNest->outerLoop.getOperation());
- builder.setInsertionPointToStart(whereLoopNest->innerLoop.getBody());
+ constructStack.push_back(whereLoopNest->outerOp);
+ builder.setInsertionPointToStart(whereLoopNest->body);
mlir::Value cdt = generateMaskedEntity(mask);
generateMaskIfOp(cdt);
return;
@@ -754,7 +754,7 @@ OrderedAssignmentRewriter::generateYieldedLHS(
loweredLhs.vectorSubscriptLoopNest = hlfir::genLoopNest(
loc, builder, loweredLhs.vectorSubscriptShape.value());
builder.setInsertionPointToStart(
- loweredLhs.vectorSubscriptLoopNest->innerLoop.getBody());
+ loweredLhs.vectorSubscriptLoopNest->body);
}
loweredLhs.lhs = temp->second.fetch(loc, builder);
return loweredLhs;
@@ -772,7 +772,7 @@ OrderedAssignmentRewriter::generateYieldedLHS(
hlfir::genLoopNest(loc, builder, *loweredLhs.vectorSubscriptShape,
!elementalAddrLhs.isOrdered());
builder.setInsertionPointToStart(
- loweredLhs.vectorSubscriptLoopNest->innerLoop.getBody());
+ loweredLhs.vectorSubscriptLoopNest->body);
mapper.map(elementalAddrLhs.getIndices(),
loweredLhs.vectorSubscriptLoopNest->oneBasedIndices);
for (auto &op : elementalAddrLhs.getBody().front().without_terminator())
@@ -798,11 +798,11 @@ OrderedAssignmentRewriter::generateMaskedEntity(MaskedArrayExpr &maskedExpr) {
if (!maskedExpr.noneElementalPartWasGenerated) {
// Generate none elemental part before the where loops (but inside the
// current forall loops if any).
- builder.setInsertionPoint(whereLoopNest->outerLoop);
+ builder.setInsertionPoint(whereLoopNest->outerOp);
maskedExpr.generateNoneElementalPart(builder, mapper);
}
// Generate the none elemental part cleanup after the where loops.
- builder.setInsertionPointAfter(whereLoopNest->outerLoop);
+ builder.setInsertionPointAfter(whereLoopNest->outerOp);
maskedExpr.generateNoneElementalCleanupIfAny(builder, mapper);
// Generate the value of the current element for the masked expression
// at the current insertion point (inside the where loops, and any fir.if
@@ -1242,7 +1242,7 @@ void OrderedAssignmentRewriter::saveLeftHandSide(
LhsValueAndCleanUp loweredLhs = generateYieldedLHS(loc, region);
fir::factory::TemporaryStorage *temp = nullptr;
if (loweredLhs.vectorSubscriptLoopNest)
- constructStack.push_back(loweredLhs.vectorSubscriptLoopNest->outerLoop);
+ constructStack.push_back(loweredLhs.vectorSubscriptLoopNest->outerOp);
if (loweredLhs.vectorSubscriptLoopNest && !rhsIsArray(regionAssignOp)) {
// Vector subscripted entity for which the shape must also be saved on top
// of the element addresses (e.g. the shape may change in each forall
@@ -1265,7 +1265,7 @@ void OrderedAssignmentRewriter::saveLeftHandSide(
// subscripted LHS.
auto &vectorTmp = temp->cast<fir::factory::AnyVectorSubscriptStack>();
auto insertionPoint = builder.saveInsertionPoint();
- builder.setInsertionPoint(loweredLhs.vectorSubscriptLoopNest->outerLoop);
+ builder.setInsertionPoint(loweredLhs.vectorSubscriptLoopNest->outerOp);
vectorTmp.pushShape(loc, builder, shape);
builder.restoreInsertionPoint(insertionPoint);
} else {
@@ -1291,7 +1291,7 @@ void OrderedAssignmentRewriter::saveLeftHandSide(
if (loweredLhs.vectorSubscriptLoopNest) {
constructStack.pop_back();
builder.setInsertionPointAfter(
- loweredLhs.vectorSubscriptLoopNest->outerLoop);
+ loweredLhs.vectorSubscriptLoopNest->outerOp);
}
}
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
index c5b809514c54c6..c4aed6b79df923 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
@@ -483,7 +483,7 @@ llvm::LogicalResult ElementalAssignBufferization::matchAndRewrite(
// hlfir.elemental region inside the inner loop
hlfir::LoopNest loopNest =
hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
- builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+ builder.setInsertionPointToStart(loopNest.body);
auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
loopNest.oneBasedIndices);
hlfir::Entity elementValue{yield.getElementValue()};
@@ -554,7 +554,7 @@ llvm::LogicalResult BroadcastAssignBufferization::matchAndRewrite(
hlfir::getIndexExtents(loc, builder, shape);
hlfir::LoopNest loopNest =
hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
- builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+ builder.setInsertionPointToStart(loopNest.body);
auto arrayElement =
hlfir::getElementAt(loc, builder, lhs, loopNest.oneBasedIndices);
builder.create<hlfir::AssignOp>(loc, rhs, arrayElement);
@@ -649,7 +649,7 @@ llvm::LogicalResult VariableAssignBufferization::matchAndRewrite(
hlfir::getIndexExtents(loc, builder, shape);
hlfir::LoopNest loopNest =
hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
- builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+ builder.setInsertionPointToStart(loopNest.body);
auto rhsArrayElement =
hlfir::getElementAt(loc, builder, rhs, loopNest.oneBasedIndices);
rhsArrayElement = hlfir::loadTrivialScalar(loc, builder, rhsArrayElement);
|
d17c552
to
621b017
Compare
b37194f
to
082b892
Compare
621b017
to
5e01e41
Compare
082b892
to
7cce1db
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
5e01e41
to
70daa01
Compare
7cce1db
to
51efcce
Compare
70daa01
to
fb06794
Compare
51efcce
to
c3ec4f1
Compare
fb06794
to
de32599
Compare
c3ec4f1
to
d5fbe9c
Compare
I've been thinking about it and I believe we could actually use The only difference is that perhaps we could enforce tighter restrictions to the |
@skatrak I don't think it is currently a problem, however, if in the future someone adds some optimization or transformation that assumes that it is nested in a parallel it may break wsloops which happen to be nested in a workshare instead. (As you said it is possible to check whether it is supposed to be a workshare or parallel wsloop but an op changing its semantics or transformations dpeending on what it's nested in feels more error prone). So that is why I opted for this approach. |
Yes, the
My understanding is that subroutine workshare(A, B, C)
integer, parameter :: N = 10
integer, intent(in) :: A(N), B(N)
integer, intent(out :: C(N)
integer :: tmp
!$omp parallel workshare
C = A + B
tmp = N
C = C + A
!$omp end parallel workshare
end subroutine workshare func.func @workshare(%A : ..., %B : ..., %C : ...) {
%N = arith.constant 10 : i32
%tmp = fir.alloca i32
omp.parallel {
omp.wsloop {
omp.loop_nest (%i) : i32 = (%...) to (%...) inclusive step (%...) {
// C(%i) = A(%i) + B(%i)
omp.yield
}
omp.terminator
}
omp.single {
fir.store %N to %tmp
omp.terminator
}
omp.wsloop {
omp.loop_nest (%i) : i32 = (%...) to (%...) inclusive step (%...) {
// C(%i) = C(%i) + A(%i)
omp.yield
}
omp.terminator
}
omp.terminator
}
} Maybe support for this operation could be just based on changes to how the MLIR representation is built in the first place, what do you think? Otherwise, something more similar to your proposal for |
This is partly what this implementation aims to do. In fact, after the pass that lowers the omp.workshare operation we are left with IR very close to the one you showed in your example. (please take a look at some of the tests in #101446) The approach taken here is similar to the omp.workdistribute implementation, in that the purpose of the omp.workshare and omp.workshare.loop_wrapper ops are to preserve the high-level optimizations available when using HLFIR, after we are done with the LowerWorkshare pass, both omp.workdistribute and omp.workdistribute.loop_wrapper disappear. The sole purpose of the omp.workdistribute.loop_wrapper op is to be able to more explicitly mark loops that need to "parallelized" by the workshare construct and preserve that information through the pipeline. Its lifetime is from the frontend (Fortran->{HLFIR,FIR}) up to the the LowerWorkshare pass which runs after we are done with HLFIR optimizations (after HLFIR->FIR lowering), same for omp.workshare. The problem with trying to convert fir.do_loop's to wsloop is that it is harder to keep track of where they came from - did they come from an array intrinsic which needs to be parallelized or was it just a do loop which the programmer wrote in the workshare which must not be parallelized. |
I see that this approach reduces significantly the amount of OpenMP-specific handling that needs to be done inside of the creation of Fortran loops, so I don't have an issue with adding the
I guess what I still don't understand is the need for |
No you are right, sorry for the back and forth, as you said, since a wsloop can only be nested in a omp.parallel it is immediately obvious that it binds to the omp.parallel threads so that makes sense. My only concern was that at some point some transformation (perhaps in the future, because I don't think anything transforms I thought this was a fair assumption for an optimization/transformation to make because if for example only one of the threads executes a wsloop it would not produce the intended result. So the intention was to guard against a potential error like that. Let me know if I am wrong here since I am sure people here have more experience than me on this. I can see that if no transformation can make that assumption, then it is perfectly safe to use Thank you for the in depth review. |
Thank you for bringing attention to this potential issue I hadn't considered, I think it's a productive discussion. I'm not aware of any transformations like that existing at the moment either, but it looks like they would certainly break if they ran while the |
Ideally, the Another factor is that there may not be PFT->loop lowerings for many constructs that need to be divided into units of work. so we may need to first generate HLFIR and alter the lowerings from HLFIR to FIR to get the Are there any concerns with adding |
I see, though this doesn't necessarily mean that approach wouldn't work. It means we'd only have to delay the point at which transformations for
Can you share a case where this would happen? I agree that we wouldn't want to produce some IR that doesn't keep consistent semantics for a given operation across the pipeline. In that case, adding another operation might indeed be the right solution.
My main concern is from the dialect design perspective. It would be confusing to have two separate "worksharing loop" operations with one being used on its own and the other one in conjunction with the |
I was under the impression that direct PFT to FIR lowering is deprecated, so things like array notation (e.g. z = x + y where x,y,z are arrays) always go through hlfir.elemental and then to fir loops. Not sure if the PFT->FIR lowering for that exists, but if PFT->FIR is deprecated then we should probably use the HLFIR lowering for this.
I think the operations describe very different things. The similarity in naming is an unfortunate consequence of the |
Sorry for the delay getting back to this! I don't have any remaining blocking concerns with your proposal, but I think it would be good if others shared their opinions before proceeding with this approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me.
de32599
to
2ff1ac1
Compare
d5fbe9c
to
8afc111
Compare
Emit loop nests in a custom wrapper Only emit unordered loops as omp loops Fix uninitialized memory bug in genLoopNest
2ff1ac1
to
e23cf32
Compare
8afc111
to
d8cfd38
Compare
3/4
1/4 #101443
2/4 #101444
3/4 #101445
4/4 #101446
This alternative loop nest generation is used in 4/4 for the workshare lowering.