GPU: Apply GPU attributes to promoted expressions (chapel-lang#24981)
This PR extends Chapel's GPU variable attribute support to include
promoted expressions. Thus, the following now works as expected:

```Chapel
on here.gpus[0] {
  @gpu.blockSize(32)
  @assertOnGpu
  A = A + 1;
}
```

This was a relatively challenging task, since promoted expressions
create several functions with existing formals in the middle of
resolution. To achieve this, the PR:

* Adds a 'gpu info' field to the `Promotion` information struct, to be
used when building promotion wrappers. This field is polled for any GPU
attributes that should be inserted into the bodies of promoted
functions.
* Adjusts the promotion process to insert formals into the promotion
wrappers that are needed to capture outer variables. E.g., if a promoted
function is marked with `blockSize(expr)`, the free variables in `expr`
need to be added as formals to all the promotion wrappers.
* An implementation detail of this is that the PR exposes the 'consider
for outer' method from building loop functions, to avoid creating
formals for global variables, modules, etc.
* To make sure that all created iterators have the same signature, the
formal insertion happens before creating the additional leader and
follower iterators.
* Adds logic to call resolution to handle the newly inserted formals,
by threading through and modifying 'actualFormals', which tracks how
many actuals were passed and what formals they map to. This also
involves modifying the call to the wrapper function to add the captured
variables.
* Adjusts the `SymbolMap` to allow transitively replacing a symbol. That
is, if a substitution maps a variable A to B, and B to C, then
map.get(A) now returns C. This helps the case of adding outer variables,
which first involves redirecting the 'outer variable' to point to the
newly-inserted formal, then redirecting it to point at a copy of the
formal (when creating the leader/follower iterators). This seems a very
benign change to me and I'm quite surprised no one else has run into
this before.
* Adds a mechanism for allowing duplicate GPU attribute calls. The
problem with expressions like `A + 1 + 1` is that it's easiest to simply
insert a copy of `setBlockSize` for each underlying `forall` loop /
promoted expression. However, this ends up creating several copies of
`setBlockSize`, all from the same source. This seems benign, since they
ought to evaluate to the same thing, and since the GPU transformations
only pick one copy anyway (so, no duplicated side effects). However,
setting the block size more than once in a GPU-eligible loop is
otherwise reported as an error. To work around this, calls to the block
size primitive can now optionally include a second argument, a 'unique
identifier'. If two block size calls are found to "conflict", but have
the same unique identifier, an error is not emitted. Thus, `blockSize`
copies inserted into nested promoted expressions by the GPU attribute do
not cause problems.
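
The transitive lookup from the `SymbolMap` bullet above can be sketched outside the compiler. This is a minimal model and not the compiler's code: `ToyMap` and the use of strings as symbols are assumptions made for illustration. The helper in this commit only detects a symbol that maps to itself, whereas the sketch records visited symbols so that longer cycles also terminate.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for the compiler's SymbolMap: strings play the role
// of symbols, and a missing key means "no substitution".
using ToyMap = std::unordered_map<std::string, std::string>;

// Follow substitutions until a symbol with no further mapping is reached.
// Returns the empty string when `sym` is not mapped at all.
std::string lookupTransitively(const ToyMap& map, const std::string& sym) {
  auto it = map.find(sym);
  if (it == map.end()) return "";

  std::string current = it->second;
  // Symbols already visited along the chain, including the starting point.
  std::unordered_set<std::string> seen{sym, current};
  for (auto next = map.find(current); next != map.end(); next = map.find(current)) {
    // Stop at the first repeated symbol, so cycles of any length terminate.
    if (!seen.insert(next->second).second) break;
    current = next->second;
  }
  return current;
}
```

With a map containing A to B and B to C, `lookupTransitively(map, "A")` yields `"C"`, and a symbol with no entry yields the empty string.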

Reviewed by @e-kayrakli and @ShreyasKhandekar -- thanks!

# Testing
- [x] paratest
- [x] `test/gpu/native`
DanilaFe authored May 15, 2024
2 parents 592045f + 7a838f7 commit 523ed8b
Showing 21 changed files with 331 additions and 239 deletions.
2 changes: 1 addition & 1 deletion compiler/AST/LoopExpr.cpp
@@ -578,7 +578,7 @@ bool isOuterVarLoop(Symbol* sym, Expr* enclosingExpr) {
   }
 }

-static bool considerForOuter(Symbol* sym) {
+bool considerForOuter(Symbol* sym) {
   if (isTypeSymbol(sym->defPoint->parentSymbol)) {
     // Fields are considered 'outer'
     return true;
35 changes: 26 additions & 9 deletions compiler/AST/baseAST.cpp
@@ -552,24 +552,41 @@ void registerModule(ModuleSymbol* mod) {
   }
 }

+static Symbol* lookupTransitively(SymbolMap* map, Symbol* sym) {
+  Symbol* x = map->get(sym);
+  if (!x) return x;
+
+  // If the symbol is re-mapped again (e.g., x was y and y was z),
+  // we need to keep looking until we find the final symbol.
+  while (Symbol* y = map->get(x)) {
+    // Detect naive cycles. Note that this will not find multi-step cycles,
+    // but they shouldn't come up. If they do, might need to switch this
+    // to a tortoise-and-hare algorithm (Floyd's?)
+    if (y == x) break;
+    x = y;
+  }
+
+  return x;
+}
+
 #define SUB_SYMBOL(x)                             \
   do {                                            \
     if (x)                                        \
-      if (Symbol* y = map->get(x))                \
+      if (Symbol* y = lookupTransitively(map, x)) \
         x = y;                                    \
   } while (0)

-#define SUB_TYPE(x)                        \
-  do {                                     \
-    if (x)                                 \
-      if (Symbol* y = map->get(x->symbol)) \
-        x = y->type;                       \
+#define SUB_TYPE(x)                                       \
+  do {                                                    \
+    if (x)                                                \
+      if (Symbol* y = lookupTransitively(map, x->symbol)) \
+        x = y->type;                                      \
   } while (0)

 void update_symbols(BaseAST* ast, SymbolMap* map) {
   if (SymExpr* sym_expr = toSymExpr(ast)) {
     if (sym_expr->symbol()) {
-      if (Symbol* y = map->get(sym_expr->symbol())) {
+      if (Symbol* y = lookupTransitively(map, sym_expr->symbol())) {
         bool skip = false;

         // Do not replace symbols for type constructor calls
@@ -616,10 +633,10 @@ void update_symbols(BaseAST* ast, SymbolMap* map) {

   } else if (ForallStmt* forall = toForallStmt(ast)) {
     if (forall->fContinueLabel) {
-      if (LabelSymbol* y = toLabelSymbol(map->get(forall->fContinueLabel)))
+      if (LabelSymbol* y = toLabelSymbol(lookupTransitively(map, forall->fContinueLabel)))
         forall->fContinueLabel = y;
     } else if (forall->fErrorHandlerLabel) {
-      if (LabelSymbol* y = toLabelSymbol(map->get(forall->fErrorHandlerLabel)))
+      if (LabelSymbol* y = toLabelSymbol(lookupTransitively(map, forall->fErrorHandlerLabel)))
         forall->fErrorHandlerLabel = y;
     }
   } else if (VarSymbol* ps = toVarSymbol(ast)) {
2 changes: 2 additions & 0 deletions compiler/include/LoopExpr.h
@@ -69,6 +69,8 @@ class LoopExpr final : public Expr {
   Expr* getFirstExpr() override;
 };

+bool considerForOuter(Symbol* sym);
+
 void lowerLoopExprs(BaseAST* ast);

 #endif
12 changes: 12 additions & 0 deletions compiler/optimizations/gpuTransforms.cpp
@@ -1543,6 +1543,18 @@ void GpuKernel::findGpuPrimitives() {
   for_vector(CallExpr, callExpr, callExprsInBody) {
     if (callExpr->isPrimitive(PRIM_GPU_SET_BLOCKSIZE)) {
       if (blockSizeCall_ != nullptr) {
+        // Check if the blockSize calls are clones of each other by comparing
+        // their unique identifier actuals. blockSize calls created for
+        // attributes get a unique number as a second actual, and for clones,
+        // that number should match.
+        if (blockSizeCall_->numActuals() == 2 &&
+            callExpr->numActuals() == 2) {
+          auto sym1 = toSymExpr(blockSizeCall_->get(2))->symbol();
+          auto sym2 = toSymExpr(callExpr->get(2))->symbol();
+
+          if (sym1 == sym2) continue;
+        }
+
         USR_FATAL(callExpr, "Can only set GPU block size once per GPU-eligible loop.");
       }
       blockSizeCall_ = callExpr;
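
The duplicate check in this hunk can be modeled in isolation. The sketch below is a toy under stated assumptions, not compiler code: `BlockSizeCall`, `hasUniqueId`, and `findBlockSizeCall` are hypothetical names, and a thrown exception stands in for `USR_FATAL`. As in the hunk, the first call is kept, a later call whose identifier matches is skipped as a clone, and any other second call is an error.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical model of a block-size primitive call. `hasUniqueId` mirrors
// "the call has a second actual"; `uniqueId` is that actual's value.
struct BlockSizeCall {
  int blockSize;
  bool hasUniqueId;
  int uniqueId;
};

// Returns the call that sets the block size, or nullptr if there is none.
// Throws when two calls conflict and are not clones of the same attribute.
const BlockSizeCall* findBlockSizeCall(const std::vector<BlockSizeCall>& calls) {
  const BlockSizeCall* chosen = nullptr;
  for (const BlockSizeCall& call : calls) {
    if (chosen != nullptr) {
      // Clones of one attribute carry the same identifier: ignore them.
      if (chosen->hasUniqueId && call.hasUniqueId &&
          chosen->uniqueId == call.uniqueId) {
        continue;
      }
      throw std::runtime_error(
          "Can only set GPU block size once per GPU-eligible loop.");
    }
    chosen = &call;
  }
  return chosen;
}
```

Two calls without identifiers still conflict in this model, which matches the hunk's requirement that both calls have two actuals before they are compared.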
12 changes: 11 additions & 1 deletion compiler/passes/convert-uast.cpp
@@ -4147,6 +4147,14 @@ bool LoopAttributeInfo::insertGpuEligibilityAssertion(BlockStmt* body) {
 }

 bool LoopAttributeInfo::insertBlockSizeCall(Converter& converter, BlockStmt* body) {
+  // In cases like compound promotion (A + 1 + 1), we might end up inserting
+  // the GPU blockSize attribute several times, even though there's only
+  // one place in the code where the attribute was created. To work around this,
+  // add a unique identifier integer to each blockSize call. If blockSizes
+  // are included twice, but they have a unique identifier that matches,
+  // we can safely ignore the second one.
+  static int counter = 0;
+
   if (blockSizeAttr) {
     if (blockSizeAttr->numActuals() != 1) {
       USR_FATAL(blockSizeAttr->id(),
@@ -4155,7 +4163,9 @@ bool LoopAttributeInfo::insertBlockSizeCall(Converter& converter, BlockStmt* bod
     }

     Expr* blockSize = converter.convertAST(blockSizeAttr->actual(0));
-    body->insertAtTail(new CallExpr(PRIM_GPU_SET_BLOCKSIZE, blockSize));
+    body->insertAtTail(new CallExpr(PRIM_GPU_SET_BLOCKSIZE,
+                                    blockSize,
+                                    new_IntSymbol(counter++)));
     return true;
   }
   return false;
34 changes: 2 additions & 32 deletions compiler/resolution/functionResolution.cpp
@@ -11701,39 +11701,9 @@ static void applyGpuAttributesToIterableExprs() {

     int numUsers = primCall->numActuals();

-    // Currently, we can't apply attributes from attribute blocks to promoted
-    // expressions directly. So, for the time being, warn about it if we
-    // see any promotions.
-    int numPromotions = 0;
-
-    std::vector<CallExpr*> calls;
-    collectCallExprs(block, calls);
-    for (auto call : calls) {
-      if (call->resolvedFunction() && call->resolvedFunction()->hasFlag(FLAG_PROMOTION_WRAPPER)) {
-        numPromotions++;
-      }
-    }
-
-    // Check if any of the attributes were 'assertOnGpu', in order to give
-    // a helpful note about 'GpuDiagnostics'.
-    bool hasGpuAssertions = false;
-    for_alist(node, primitivesBlock->body) {
-      if (auto call = toCallExpr(node)) {
-        if (call->isPrimitive(PRIM_ASSERT_ON_GPU)) {
-          hasGpuAssertions = true;
-          break;
-        }
-      }
-    }
-
-    if (numPromotions > 0) {
-      USR_WARN(block, "GPU attributes on variable declarations are not currently applied to promoted expressions in the variables' initializers");
-      if (hasGpuAssertions) {
-        USR_PRINT(block, "consider using the 'GpuDiagnostics' module to ensure that promoted expressions ran on GPU at runtime");
-      }
-    } else if (numUsers == 0 && numPromotions == 0) {
+    if (numUsers == 0) {
       USR_FATAL(block, "Found GPU attributes on a variable declaration, but no subexpression to apply them to");
-      USR_PRINT(block, "GPU attributes on variable declarations are applied to loop expressions in the variable's initializer");
+      USR_PRINT(block, "GPU attributes on variable declarations are applied to loop expressions and promoted function calls in the variable's initializer");
       USR_STOP();
     }
