[flang] Lower omp.workshare to other omp constructs #101446
base: users/ivanradanov/flang-workshare-elemental-lowering
Conversation
Thank you for your work so far. This is a great start.
What is the plan for transforming do loops generated by lowering (e.g. those that do not become hlfir.elemental operations and are not generated by hlfir bufferization)?
@@ -344,6 +345,7 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
+  pm.addPass(flangomp::createLowerWorkshare());
The other OpenMP passes are added in createOpenMPFIRPassPipeline, which is only called when -fopenmp is used. It would be convenient if this new pass could stay with the other OpenMP passes.
Currently those passes are run immediately after lowering. There are comments which say they have to be run immediately after lowering, but at a glance it isn't obvious why they couldn't be run here, after HLFIR. @agozillon what do you think?
Sorry, I've just seen this ping! The comment is primarily there to state that the passes should be run immediately after lowering from the parse tree to IR (HLFIR/FIR/OMP), as they make a lot of changes to convert things into a more final form for the OMP dialect with respect to target. It was previously a lot more important, as we had a form of outlining that ripped target regions out of their functions into separate functions. That's no longer there, but we do still have some passes that modify the IR at this stage into a more finalized form for target: in particular OMPMapInfoFinalization, which will generate some new maps for descriptor types, OMPMarkDeclareTarget, which will implicitly mark functions declare target, and another that removes functions unnecessary for the device. There is also a pass (or will be) for do concurrent which I believe outlines loops into target regions as well.
But TL;DR, there is a lot going on in those passes that would be preferable to keep happening immediately after lowering from the parse tree, so later passes can depend on the information being in the "correct" format. Whether that "immediate" location moves to after this HLFIR lowering or remains where it currently is, I am unsure!
@skatrak @jsjodin may also have some feedback/input to this.
I opted to keep the rest of the OpenMP passes as they are and have added a bool argument to control whether to run the lower-workshare pass (a rough sketch of that gating is below).
Thank you a lot @tblah for taking a look and for the helpful comments.
I am looking at this for the standard. I intend to go through the various constructs that need to be separated into units of work and provide an alternative lowering for them so that they will get parallelized when we lower the workdistribute operation. To accurately keep track of the constructs that need to be parallelized for workdistribute, I am debating adding a new loop_nest wrapper for that, as discussed here.
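A minimal sketch of what that gating could look like, assuming the flag is plumbed through as a plain boolean parameter (the parameter name `enableOpenMP` and the exact signature are illustrative, not taken from the patch):

```cpp
// Sketch only: mirrors the pipeline hunk quoted above, with a hypothetical
// enableOpenMP flag guarding the new pass. Assumes the usual flang/MLIR pass
// headers are available; the real signature in the patch may differ.
inline void createHLFIRToFIRPassPipeline(mlir::PassManager &pm,
                                         bool enableOpenMP) {
  pm.addPass(hlfir::createLowerHLFIRIntrinsics());
  pm.addPass(hlfir::createBufferizeHLFIR());
  pm.addPass(hlfir::createConvertHLFIRtoFIR());
  // Lower omp.workshare to other OpenMP constructs only when -fopenmp is on.
  if (enableOpenMP)
    pm.addPass(flangomp::createLowerWorkshare());
}
```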
@kiranchandramohan @tblah I have iterated on this a bit and opted to have an omp loop-nest wrapper op which signals to the workshare lowering which specific loops need to be parallelized (i.e. converted to wsloop { loop_nest }). This will allow us to emit this in the frontend if it is needed and be more precise about the exact loops that need to be parallelized. So the LowerWorksharePass that I have implemented here is tasked with parallelizing the loops nested in workshare_loop_wrapper and both the Fortran->mlir frontend and the hlfir lowering passes would be responsible for emitting the workshare_loop_wrapper ops where appropriate. For that I have started with some of the obvious lowerings in the hlfir bufferizations, but perhaps that can be done gradually and not everything needs to be covered by this PR. Let me know what you think.
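In source-level terms (an illustrative C++/OpenMP analogue only; WORKSHARE itself is Fortran-only and this is not code from the patch), the intent is that an array operation inside a workshare region becomes a worksharing loop, so that inside the enclosing parallel region its iterations are split across threads rather than all executed by one thread. That is what rewriting workshare_loop_wrapper into wsloop { loop_nest } expresses at the IR level.

```cpp
// Illustrative analogue only. The Fortran
//   !$omp parallel
//   !$omp workshare
//     a = b + c
//   !$omp end workshare
//   !$omp end parallel
// behaves roughly like the worksharing loop below: the elemental loop over
// the array assignment is divided among the threads of the parallel region.
#include <vector>

void add_arrays(std::vector<double> &a, const std::vector<double> &b,
                const std::vector<double> &c) {
  const int n = static_cast<int>(a.size());
  #pragma omp parallel
  {
    // Each thread executes a share of the iterations; this corresponds to
    // wsloop { loop_nest } after the lowering described above.
    #pragma omp for
    for (int i = 0; i < n; ++i)
      a[i] = b[i] + c[i];
  }
}
```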
Thank you for all of the updates!
So the LowerWorksharePass that I have implemented here is tasked with parallelizing the loops nested in workshare_loop_wrapper and both the Fortran->mlir frontend and the hlfir lowering passes would be responsible for emitting the workshare_loop_wrapper ops where appropriate. For that I have started with some of the obvious lowerings in the hlfir bufferizations, but perhaps that can be done gradually and not everything needs to be covered by this PR. Let me know what you think.
Doing it gradually sounds good to me. When you take this out of draft, please document in the commit message exactly what is and is not supported at this point.
Thanks for the updates and good job noticing the bug with unstructured control flow.
My concern with the TODO message is that some code that previously compiled (using the lowering of WORKSHARE as SINGLE) will now hit this TODO. This is okay with me so long as it is fixed soon (before LLVM 20). Otherwise, could these cases continue to be lowered as SINGLE for now?
I have updated it to lower to omp.single and emit a warning in the unstructured control flow (CFG) cases.
Is this PR in an acceptable state to be merged? I went through the other three and it seems that the four PRs are all converging and are either ready for merge or very close.
Change to workshare loop wrapper op
Move single op declaration
Schedule pass properly
Correctly handle nested loop nests to be parallelized by workshare
Leave comments for shouldUseWorkshareLowering
Use copyprivate to scatter val from omp.single
TODO still need to implement copy function
TODO transitive check for usage outside of omp.single not implemented yet
Transitively check for users outside of single op
TODO need to implement copy func
TODO need to hoist allocas outside of single regions
Add tests
Hoist allocas
More tests
Emit body for copy func
Test the tmp storing logic
Clean up trivially dead ops
Only handle single-block regions for now
Fix tests for custom assembly for loop wrapper
Only run the lower workshare pass if openmp is enabled
Implement some missing functionality
Fix tests
Fix test
Iterate backwards to find all trivially dead ops
Add explanation comment for createCopyFun
Update test
I have rebased this on the latest main and also marked the follow-up #104748 as ready for review. That follow-up PR contains code and tests which are needed to fully check this implementation as well. I think this stack is currently in a good state to be merged. 1/4 #101443, 2/4 #101444, and 3/4 #101445 are already approved and good to go, but 2/4 #101444 must be merged together with this PR because otherwise it will result in compilation failures. Thus, it would be great if this PR could be reviewed as well, and we can proceed with merging if it looks good. (The build failures are only on Windows, come from the main branch, and are not introduced by this PR.)
All the PRs LGTM, this works for my test cases. |
4/4
There are two points which need some discussion in this PR:
1. We need to make a value computed in an omp.single accessible to all threads of the omp.parallel region. This is achieved by allocating temporary memory outside the omp.parallel, storing to it in the omp.single, and then reloading it from all threads (a source-level analogue is sketched after this list). However, from reading the standard I don't think we are guaranteed that the workshare is nested directly in the omp.parallel, so there could be an omp.parallel { func.call @contains_workshare }, and then we would not be able to access the omp.parallel. So I think adding support in the runtime to be able to yield a value from an omp.single could be the fix for this.
2. For the temporary allocations above, not all types are supported by fir.alloca, so I need to use llvm.alloca and an unrealized conversion cast to be able to allocate a temporary for a fir.ref type. This too can be fixed by introducing yielding from omp.single.
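As a source-level picture of the broadcast behaviour being reconstructed in point 1 (illustrative C++/OpenMP only, not code from this patch): the copyprivate clause on single, which the commit messages above mention using, is what lets the one thread that executes the single region hand the computed value to every other thread's private copy.

```cpp
// Illustration of single + copyprivate: one thread computes the value, and
// copyprivate broadcasts it to the private copies of all other threads
// before they leave the construct.
#include <cstdio>
#include <omp.h>

int main() {
  #pragma omp parallel
  {
    int val;  // private to each thread of the parallel region
    #pragma omp single copyprivate(val)
    {
      val = 42;  // executed by exactly one thread
    }
    // Every thread now observes val == 42.
    std::printf("thread %d sees %d\n", omp_get_thread_num(), val);
  }
  return 0;
}
```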
1/4 #101443
2/4 #101444
3/4 #101445
4/4 #101446
WIP #104748 adds HLFIR lowerings that make use of this pipeline.