Dyno: eagerly build iterator groups (in the scope where they're defined) #26039

DanilaFe · 2024-10-04T00:04:54Z

This PR changes the way that iterators are resolved to stop considering the scope in which the iterator is iterated over, and only consider the scope in which it was defined / created. This came out of a discussion with @benharsh , @bradcray and @mppf about the expected semantics of iterators. I wanted to get these changes (which are largely architectural in nature) out of the way before proceeding with other iterator-related tasks such as promotion.

In production, each time an iterator is used, the compiler performs a search for matching overloads. E.g,, if a serial iterator is defined in one place, then a standalone overload is defined in the scope of a loop, the standalone loop will be preferred.

iter myIter() {
  yield 1;
}

{
  iter myIter(param tag) where tag == iterKind.standalone {
    yield "hello";
  }

  [i in myIter()] {
    writeln(i); // prints "Hello"
  }
}

However:

iter myIter() {
  yield 1;
}

{
  iter myIter(param tag) where tag == iterKind.standalone {
    yield "hello";
  }
}
[i in myIter()] {
  writeln(i); // prints 1
}

There are all sorts of other fun cases in which the type of the iterator can change depending on what is being zippered with, since the follower is re-resolved using the new leader's type:

iter myIter() do yield 1;
iter myIter(param tag) where tag == iterKind.leader do yield (1..1,);
iter myIter(param tag, followThis) where tag == iterKind.follower do
    yield if followThis.type.size == 1 then 1 else "hello";

iter otherIter() do yield 1;
iter otherIter(param tag) where tag == iterKind.leader do yield (1..1, 1..1);
iter otherIter(param tag, followThis) where tag == iterKind.follower do yield 1;

forall i in myIter() do
    writeln(i); // prints 1

forall (_, i) in zip(otherIter(), myIter()) do
    writeln(i); // prints "hello"

This PR changes this process in Dyno, making a switch to determining the types of the leader, follower, and standalone iterators associated with a particular function using the scope in which the function is declared, as opposed to the scope in which it's being used. This is somewhat complicated by the existence of point-of-instantiation scopes; generic iterators do include their isntantiation information, so that other overloads are resolved with the same instantiation scope in mind. However, this is determined at the point where the iterator is created, as opposed to the point where the iterator is used in a loop.

As of this PR, both version of the bracket loop programs print i, and the program in which the leader type affects resolution fails to resolve (because only once follower type is allowed per iterator).

Changes in This PR

Bundling PoI scope with iterators

The main trick with this PR was to store enough information in an IteratorType so that we can reconstruct the context in which it was created. This way, given an iterator type, we have all the information we need to find its various overloads. The most complicated situation in this case is when the iterator was constructed from a call to a generic iter procedure; in this case, the body of the procedure -- and presumably the other overloads for the iterator -- can use functions from the context of the call, via the point-of-instantiation mechanism. This allows programs like the following to resolve:

iter myIter(arg) {
  yield computeReturn(arg);
}
iter myIter(arg, param tag) where tag == iterKind.standalone {
  yield computeReturn(arg);
}

{
  proc computeReturn(arg: int) {
    return arg + 1;
  }

  forall i1 in myIter(13) {}
  for j1 in myIter(13) {}
}

{
  proc computeReturn(arg: int) {
    return arg == 1;
  }

  forall i2 in myIter(13) {}
  for j2 in myIter(13) {}
}

In the first block, myIter is instantiated in a scope in which computeReturn returns int, which means myIter -- and all of its versions, including the parallel one -- should return int as well. In the second block, another computeReturn produces bool, so myIter invocations there should create iterators that yield bool. Iterator values can be returned and passed around to different contexts (changing PoI), which to me suggests that the iterator type itself should include the point-of-instantiation information.

This PR takes that approach, storing a PoI scope with the iterator.

Circumventing `CallInfo` etc. for invoking `these` overloads

I noticed that we use "CallInfo" and friends as an odd intermediate representation. The resolver knows that it's handling a special case of resolving these using a tag and optionally a followThis argument; the resolution queries need to specially handle calls to these when they are made, so that they can provide compiler-backed implementations for iterator types. However, to get from the resolver to the resolution queries, prior to this PR, we went through resolveGeneratedCall, which required the construction of actual types etc.. Then, resolution queries had to unpack the CallInfo, inspect the formals, and call in to the special case.

This PR circumvents this by introducing a new IteratorKind enum, which encodes the type of these overload we're looking for (serial, standalone, leader, etc.). A new query in resolution-queries is then defined that handles the special case right away, before constructing a CallInfo. Thus, special overloads (like calls to these on a loop expression) are handled without the intermediate packing and unpacking of actual types into a CallInfo object. This considerably simplifies the resolution process in my opinion.

Issuing Errors for Incompatible Iterator Types

When a call to an iterator is made, the resolver immediately (and eagerly) searches for all other overloads (leader, follower, standalone, etc.). This is possible since the types of these overloads are determined when they are created. By doing this eager resolution, we can also report incompatible yield types. Now -- as discussed with @bradcray et al. -- the serial, standalone, and parallel iterators must all have the same type.

Testing

dyno tests, including new ones of the programs discussed in this PR OP

… groups Signed-off-by: Danila Fedorin <[email protected]>

Signed-off-by: Danila Fedorin <[email protected]>

… same type Signed-off-by: Danila Fedorin <[email protected]>

Signed-off-by: Danila Fedorin <[email protected]>

Their context is entirely determined by where they are created. Signed-off-by: Danila Fedorin <[email protected]>

Signed-off-by: Danila Fedorin <[email protected]>

The _created_ PoI scope after instantiation is what is fed to return type inference, as opposed to the "original" PoI scope in which the function was resolved. Signed-off-by: Danila Fedorin <[email protected]>

… overloads Signed-off-by: Danila Fedorin <[email protected]>

Signed-off-by: Danila Fedorin <[email protected]>

mppf · 2024-10-07T13:33:47Z

Iterator values can be returned and passed around to different contexts (changing PoI), which to me suggests that the iterator type itself should include the point-of-instantiation information.

In the event that the iterators are using the same functions from POI even though the PoiScope is different, will we end up resolving just one function? (I think it will work this way (at least if it works that way for other Function resolution) because the iterator logic will use the regular Function resolution code.

DanilaFe · 2024-10-07T18:30:58Z

In the event that the iterators are using the same functions from POI even though the PoiScope is different, will we end up resolving just one function?

Yes, with the caveat that to determine used functions from PoI we need to resolve the function body. So, the body resolution process happens again, but the results of running it are re-used.

mppf · 2024-10-08T18:02:28Z

frontend/include/chpl/resolution/resolution-queries.h

@@ -493,6 +500,22 @@ const uast::Decl* findFieldByName(Context* context,
                                  const types::CompositeType* ct,
                                  UniqueString name);

+const MostSpecificCandidate&


these new functions need docs comments

mppf · 2024-10-08T18:03:19Z

frontend/include/chpl/resolution/resolution-queries.h

@@ -408,7 +408,7 @@ CallResolutionResult resolveCallInMethod(ResolutionContext* rc,
  as the point-of-instantiation scopes that were used when resolving them.
 */
 CallResolutionResult resolveGeneratedCall(Context* context,
-                                          const uast::AstNode* astForErr,
+                                          const uast::AstNode* astForErrAndPoi,


IMO if it's used for multiple things it's better to have a short name like astContext and then describe what it's for in the docs comment.

mppf · 2024-10-08T18:04:03Z

frontend/include/chpl/resolution/resolution-types.h

+        /* rejectedPossibleIteratorCandidates */ false,
+        types::QualifiedType(),
+        PoiInfo());
+  }


I would think the zero-args constructor should do the same thing?

mppf · 2024-10-08T18:07:55Z

frontend/include/chpl/types/LoopExprIteratorType.h

@@ -73,6 +74,7 @@ class LoopExprIteratorType final : public IteratorType {
  bool contentsMatchInner(const Type* other) const override {
    auto rhs = (LoopExprIteratorType*) other;
    return yieldType_ == rhs->yieldType_ &&
+           poiScope_ == rhs->poiScope_ &&


IMO it's better to have a function for IteratorType to call that handles the fields there. An example from uAST is simpleBlockLikeContentsMatchInner.

mppf · 2024-10-08T18:08:09Z

frontend/include/chpl/types/PromotionIteratorType.h

      scalarFn_(scalarFn),
      promotedFormals_(std::move(promotedFormals)) {}

  bool contentsMatchInner(const Type* other) const override {
    auto rhs = (PromotionIteratorType*) other;
    return this->yieldType_ == rhs->yieldType_ &&
+           this->poiScope_ == rhs->poiScope_ &&


mppf · 2024-10-08T18:14:20Z