[Transform] Hoist thread-local allocator within the nested parallel loops #283

ciyongch · 2024-08-26T07:55:00Z

This PR tracks #120.

This is another implementation of hoisting the thread-local allocator, based on the new memref-merge design, so it depends on #44.

Menooker · 2024-09-13T03:00:06Z

lib/gc/Transforms/MergeAllocTickBased.cpp

@@ -30,6 +30,69 @@ using namespace special_ticks;
 /// and default memory space.
 static bool isMemRefTypeOk(MemRefType type) { return type.hasStaticShape(); }

+static inline int64_t getSizeInBytes(MemRefType &memType) {


Shall we use subclass overriding instead of directly changing the to-be-upstreamed code? That would show that our tick-based interfaces are extensible, and it would decouple the downstream logic from the upstream part.

ciyongch (2024-09-13): My consideration is that this is not a big enhancement on top of the existing general allocator hoisting logic within this "framework". If we are going to unify all the allocator behavior in a separate extension instead of mixing it in, I think we can go that way.

Menooker (2024-09-14): It will be hard to extract the downstream logic once the upstream PR is merged and we want to rebase. :) Just a suggestion; it is up to you. :)

Menooker · 2024-09-13T03:07:57Z

lib/gc/Transforms/MergeAllocTickBased.cpp

+              isa<arith::ConstantIndexOp>(ub.getDefiningOp()));
+    });
+
+    isStatic &= llvm::all_of(lowerBounds, [](Value &lb) {


Shall we also check the step? I am also not sure whether &= will short-circuit the evaluation of llvm::all_of(...) if isStatic is false.

ciyongch (2024-09-13): Yes, I can add the check for the step and return early when the expression evaluates to false.
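On the short-circuit question: in C++ the compound assignment `&=` always evaluates its right-hand operand, so a later `all_of` call still runs even when the flag is already false. The sketch below illustrates this with plain `std::all_of` and a call counter. All names in it (`predicateCalls`, `isNonNegative`, `checkWithCompoundAnd`, `checkWithEarlyReturn`) are hypothetical and are not taken from the PR.

```cpp
#include <algorithm>
#include <vector>

// Counts how often the predicate runs, so the two styles can be compared.
int predicateCalls = 0;

bool isNonNegative(int v) {
  ++predicateCalls;
  return v >= 0;
}

bool allNonNegative(const std::vector<int> &values) {
  return std::all_of(values.begin(), values.end(), isNonNegative);
}

// `&=` is not a short-circuit operator: the second all_of is evaluated
// even if `ok` is already false after the first check.
bool checkWithCompoundAnd(const std::vector<int> &a,
                          const std::vector<int> &b) {
  bool ok = true;
  ok &= allNonNegative(a);
  ok &= allNonNegative(b);
  return ok;
}

// Early return skips the second scan once the first check fails.
bool checkWithEarlyReturn(const std::vector<int> &a,
                          const std::vector<int> &b) {
  if (!allNonNegative(a))
    return false;
  return allNonNegative(b);
}
```

With `a = {-1}` and `b = {1, 2, 3}`, the `&=` version invokes the predicate four times, while the early-return version invokes it once. Both return false, so the difference is only in wasted work, not in the result.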

Menooker · 2024-09-13T03:14:27Z

test/mlir/test/gc/Transforms/buffer-merge.mlir

+        %alloc_0 = memref.alloc() : memref<8xf32>
+        %1 = scf.for %k = %lb to %ub step %step
+          iter_args(%iterBuf = %arg0) -> (memref<2xf32>) {


Just to confirm: what are we testing here by setting iter_args to %arg0? Is it to check whether the scheduler can skip complex lifetimes?

ciyongch (2024-09-13): Neither. The case simply demonstrates mixed usage of scf.forall and scf.for; its only testing purpose concerns the allocators in the case. By the way, iter_args still supports memref in addition to tensor.

Menooker (2024-09-14): We will not generate loops with memref in iter_args after bufferization. This feature is for dynamic allocations, which will be identified as complex access.

ciyongch (2024-09-14): Thanks for the explanation. Complex access is not the main testing purpose of this case. The case was originally borrowed from: https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Bufferization/Transforms/buffer-loop-hoisting.mlir#L163-L181.

Menooker · 2024-09-13T03:21:17Z

lib/gc/Transforms/MergeAllocTickBased.cpp

+  while (parent) {
+    if (auto forallOp = dyn_cast<scf::ForallOp>(parent)) {
+      if (isForallLoopBoundStatic(forallOp)) {
+        SmallVector<Value> upperBounds = forallOp.getUpperBound(builder);


Suggest using getStaticUpperBound instead.

ciyongch (2024-09-13): It seems getStaticUpperBound() always returns std::numeric_limits<int64_t>::min(), so I kept the current implementation.

Menooker (2024-09-14): OK. That may be because we are using constant ops instead of attributes for constant bounds. Another possible way is to check whether both the mixed value and the Value are constants.

ciyongch (2024-09-14): This doesn't make much difference; both ways should be fine?

zhczhong · 2024-09-20T06:06:42Z

lib/gc/Transforms/MergeAllocTickBased.cpp

+       llvm::zip(forallOp.getMixedLowerBound(), forallOp.getMixedUpperBound(),
+                 forallOp.getMixedStep())) {
+    std::optional<int64_t> ubConst = getConstantIntValue(ub);
+    return ubConst.has_value() && isConstantIntValue(lb, 0) &&


Why not support cases where lb != 0 or step != 1? We may have a loop like

scf.forall (%arg7) = (0) to (512) step (32)

ciyongch (2024-09-20): This is not a hard limitation; the code has been updated to support this case.
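For a loop with constant bounds, a nonzero lower bound or a non-unit step only changes the iteration count, which is the ceiling of (ub - lb) / step. The following is a minimal sketch of that arithmetic on plain integers; `staticTripCount` is a hypothetical helper written for illustration, not the PR's code or an MLIR API.

```cpp
#include <cstdint>
#include <optional>

// Returns the number of iterations of a loop running from lb (inclusive)
// to ub (exclusive) with the given step, or std::nullopt when the step is
// not positive and the count is therefore not meaningful.
std::optional<int64_t> staticTripCount(int64_t lb, int64_t ub, int64_t step) {
  if (step <= 0)
    return std::nullopt;
  if (ub <= lb)
    return 0; // empty iteration space
  // Ceiling division of the span by the step.
  return (ub - lb + step - 1) / step;
}
```

For the loop quoted above, `staticTripCount(0, 512, 32)` gives 16, so the loop has 16 iterations rather than 512.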

zhczhong · 2024-09-20T06:10:15Z

lib/gc/Transforms/MergeAllocTickBased.cpp

+
+  // Get the total number of threads from the outermost to the current level of
+  // the parallel loop that the allocation located in.
+  int64_t numThreads = 1;


The calculation of numThreads could directly use the upstream utility constantTripCount.

ciyongch (2024-09-20): Thanks for the tip; changed to use the utility function.
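As a rough illustration of the quantity discussed here, the sketch below multiplies the constant trip counts of the enclosing parallel loops, from the outermost to the level containing the allocation. It assumes that the thread count is the product of the per-level trip counts; the `LoopBounds` struct and `totalThreads` function are hypothetical names operating on plain integers, and the sketch does not call the upstream utility.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Constant bounds of one parallel loop level (hypothetical type).
struct LoopBounds {
  int64_t lb;
  int64_t ub;
  int64_t step;
};

// Multiplies the trip counts of all loop levels, outermost first.
// Returns std::nullopt if any level has a non-positive step.
std::optional<int64_t> totalThreads(const std::vector<LoopBounds> &nest) {
  int64_t numThreads = 1;
  for (const LoopBounds &loop : nest) {
    if (loop.step <= 0)
      return std::nullopt;
    int64_t span = loop.ub - loop.lb;
    // Ceiling division; an empty range contributes zero iterations.
    int64_t trips = span <= 0 ? 0 : (span + loop.step - 1) / loop.step;
    numThreads *= trips;
  }
  return numThreads;
}
```

For example, an outer level `{0, 4, 1}` nested around an inner level `{0, 512, 32}` yields 4 * 16 = 64.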

Menooker and others added 30 commits May 8, 2024 16:36

[mlir][Memref] Add memref-merge optimization

45a02a4

[tests] Add example MLIR-unittest in lit

1615878

format

d1225c0

Merge branch 'yijie/unittest' into yijie/mem-merge

d107580

add test

eaf2667

update doc

f868629

doc

504c785

handle i1

ef3d150

Merge remote-tracking branch 'origin/main' into yijie/mem-merge

21fe5fa

trigger

5db4be4

fix

84a17ac

remove cprt

87a7fb6

Merge branch 'main' of https://github.com/intel/graph-compiler into yijie/mem-merge

42d612b

update

115fd66

fix lint

26adb18

Merge remote-tracking branch 'origin/main' into yijie/mem-merge

f93f1d2

rename

36354ea

Merge branch 'main' of https://github.com/intel/graph-compiler into yijie/mem-merge

82ab370

fix

1d3b887

make checker happy

e285e99

fix tidy

5990627

make you happy

eecc19f

Merge branch 'main' of https://github.com/intel/graph-compiler into yijie/mem-merge

b3541f0

Merge branch 'main' of https://github.com/intel/graph-compiler into yijie/mem-merge

57ddeab

fix

e5a2f83

Merge branch 'main' of https://github.com/intel/graph-compiler into yijie/mem-merge

66309fc

port memref-hoist

5953b13

update test cases

ddf69b0

update mlir test

8401ca0

refactor and add new cases

43f27a7

ciyongch added the ready to review label Aug 26, 2024

fix tidy

75a88aa

This was referenced Aug 27, 2024

Corrupt stack detected at runtime #279

Closed

[Transform] Only use gc runtime allocator for stack-like alloca ops #287

Merged

lmontigny added this to the 0.1 CPU milestone Sep 2, 2024

ciyongch added 3 commits September 9, 2024 09:21

Merge branch 'main' into ciyong/memref_hoist_v2

a81249d

Merge branch 'main' into ciyong/memref_hoist_v2

8f5325d

restore unchanged code

5c8fdf4

ciyongch requested a review from ZhennanQin September 12, 2024 09:41

Merge remote-tracking branch 'origin/main' into ciyong/memref_hoist_v2

d2c7cdc

Menooker reviewed Sep 13, 2024

Menooker reviewed Sep 13, 2024

Menooker reviewed Sep 13, 2024

ciyongch added 2 commits September 13, 2024 18:02

address comments

fdede5b

Merge remote-tracking branch 'origin/main' into ciyong/memref_hoist_v2

4708126

Menooker approved these changes Sep 18, 2024

Merge branch 'main' into ciyong/memref_hoist_v2

cc25380

ciyongch requested review from zhczhong and Yun-Fly September 20, 2024 00:32

Yun-Fly reviewed Sep 20, 2024

address comment

514664e

Yun-Fly approved these changes Sep 20, 2024

Yun-Fly reviewed Sep 20, 2024

zhczhong reviewed Sep 20, 2024

ciyongch added 2 commits September 20, 2024 20:06

address comment

ac8452f

fix lint

99a2c34
