Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutability when yielding by default / const ref / out / default intent #22950

Open
bradcray opened this issue Aug 11, 2023 · 12 comments
Open

Mutability when yielding by default / const ref / out / default intent #22950

bradcray opened this issue Aug 11, 2023 · 12 comments

Comments

@bradcray
Copy link
Member

[related to @vasslitvinov's #21888 as well as https://github.com/Cray/chapel-private/issues/5235]

While talking to @mppf yesterday, we got a bit tangled up in a question about when a loop iterator variable should / should not be mutable. For example, today, given the following loops:

forall i in 1..10 do ...
forall a in myIntArray do ...

i is const / immutable within the loop body (i.e., you couldn't write i += 1;) whereas a is mutable, referring to the elements of the array. This is the behavior I (strongly) believe we want, but the question that tripped us up a bit was "What causes it?"

In my mind, the behavior has been that "any iterator whose yield intent has const nature will prevent the corresponding loop index variable from being modified." But now, I am less convinced that this is the case since, from a few experiments, it looks as though it might simply be "do we yield by default/out intent or not?"

Specifically, if we apply a const ref intent to the following iterator:

iter foo(n) const ref {
  var i: int;
  while i < n {
    yield i;
    writeln("after yield, i is: ", i);
    i+= 1;
  }
}

for i in foo(3) {
  writeln("In loop body, i is: ", i);
  i += 1;
}

we're able to modify the loop index variable within the loop body, and doing so modifies the iterator variable, which doesn't match my intuition (i.e., the const seems to have no effect and gives the exact same behavior as yielding by ref, where I'd call the behavior expected):

In loop body, i is: 0
after yield, i is: 1
In loop body, i is: 2
after yield, i is: 3

Meanwhile, default intent prevents modifications within the loop body as expected:

code.chpl:12: error: cannot assign to const variable

And, interestingly, so does out intent (which may or may not be OK, depending on whether we want to distinguish between out and const out).

Relating this to #21888, I believe our current thinking there is that the default yield iterator should be const, which I interpret as meaning "the implementation can choose between const ref or [const] out", where I'd expect the impact in both cases to be that the loop index variable is immutable within the loop by default. I believe this supports the common case, reduces chances for surprise, and has the benefit of being symmetric with (most) default argument intents being const.

Based on that understanding, I think the two action items here are:

  • To verify that we agree that a const ref yield intent should prevent modification of the loop index variable back at the callsite, implying we have a bug in our implementation today
  • To see whether we think an out intent on an iterator should prevent modification of the loop index variable as it seems to today, or whether a distinct const out intent should imply that.
    • If out implies immutability, the implication would seem to be that there is no way to create an iterator that copies values out resulting in a mutable loop index variable. That may be acceptable, but may require some education, since I don't believe it's obvious that that's what out implies from the name.
    • If we think out shouldn't imply immutability of the loop index variable, that suggests to me that we should introduce a const out intent and have const imply either const out or const ref on iterators. This opens up the question of whether we'd also want to support a const out intent on procedures and how it would behave differently from the out intents we have today.
@bradcray
Copy link
Member Author

Thinking a bit more about my last conversation with Michael about out vs. const out, I think his argument may have been that there's no reason for both because out implies const out (?) (in the sense that we no longer permit modifying the thing returned by a procedure without tucking it into a variable first now that we've closed the array loophole in #22816).

So then thinking more about whether it's important to have a yield intent that says "copy out but allow the index variable to be modified", I'm not sure it's a necessity. In part because I don't think it's a common pattern, and in part because an iterator author could get the same effect by creating a new temporary variable containing the value to be yielded, yielding that, and giving the iterator ref intent.

Alternatively, we could someday support a syntax like for var i in 1..10 do ... which might be considered equivalent to for ii in 1..10 { var i = ii; ... } for users who want to be able to modify their loop index variable, yet make it clearer that it's intended to be a new, modifiable variable that is a copy of the yielded value (which feels equivalent to doing var x = myOutReturningProc(); where it's not the thing being returned by the proc that's mutable, but the new variable x we've created).

@vasslitvinov
Copy link
Member

[This comment starts with the rationale and ends with a proposal.]

I would like to differentiate [const] references from values.

If a piece of code receives a reference, its const-ness is inherent in the reference. For example, if it is a reference to a const array, it is unsound to treat it as a non-const reference. If an iterator yields a const-ref to a non-const array, it is the iterator's decision to prevent the loop from modifying the array.

Cf. if a piece of code receives a value, for example 56, there is generally nothing inherent in that value that tells whether it should be const or not.

I can see the exception to the above being non-POD records and owned classes. However, before we discuss these, let us look from another angle.

When we have a formal with the default intent, we disallow modifying it within the subroutine when the concrete intent is by value. Even when the value is 56 and there is nothing inherently const or non-const in the value, as far as this conversation goes. We disallow modifications not because of any qualities of values, but because we want to prevent certain classes of programmer errors.

What happens today with our loop index variables for value-yielding iterators stems from the same desire to prevent some programming errors. It is not because the value-yielding iterators object modifying their yielded values within the loop.

Going forward, we want to preserve const-ness of loop index variables for iterators that yield by the default intent and inherit the const-ness from iterators that yield by [const] ref. Yes, this implies we have a bug today.

The interesting case is the out yield intent. Today it is synonimous with the default intent so the corresponding index variables are const. When it becomes a yield intent in its own right, the easiest thing to do is to continue making the loop index variable const and allowing the loop to modify it upon for var i in myIterator() ...

W.r.t. the for var form, we want to give the iterator the option to opt out from allowing it. My reasoning is that the loop can always make its own for var by doing

for constI in myIterator() {
  var i = constI;
  // the rest of the loop works as if it were
  //   for var i in myIterator() ...
  ...
}

except for records with non-trivial or disabled copy initializers. In this latter case, a value-yielding iterator may indeed wish to disallow modification of the value it yields. We should come up with a good example when that happens, I expect that such example exists.

Toward that end, I see two choices for the out yield intent

  • allow for var by default; yield by const out to opt out
  • disallow for var by default; yield by var out(??) to opt in

I suggest that the common case is for the iterator to not care whether its yielded values are modified in the loop. This makes me favor the first option, "allow for var by default".

@mppf
Copy link
Member

mppf commented Aug 23, 2023

  • To verify that we agree that a const ref yield intent should prevent modification of the loop index variable back at the callsite, implying we have a bug in our implementation today

I absolutely think this is a bug.

  • To see whether we think an out intent on an iterator should prevent modification of the loop index variable as it seems to today, or whether a distinct const out intent should imply that.

I do not think out intent should inherently prevent modification; however I think we should generalize the lvalue error to a case like for i in 1..10 { i = 30; } since these are more likely to represent programmer error. I don't think we need const out or for var at this time.

  • If we think out shouldn't imply immutability of the loop index variable, that suggests to me that we should introduce a const out intent and have const imply either const out or const ref on iterators. This opens up the question of whether we'd also want to support a const out intent on procedures and how it would behave differently from the out intents we have today.

The const out return/yield intent does not make sense to me. If a subroutine or iterator is returning by value, why should it be able to dictate whether or not it is legal to modify that value? Certainly I could capture the value into a variable, e.g. with var x = f();, and then modify x, right? So, if the iterator/subroutine actually cared about this, it would not be able to enforce it? IMO iterators/subroutines shouldn't care about this. Once you return a value, it goes out into the world on it's own, and the iterators/subroutines just have to let it go.

IMO, the question of whether or not the result of f() is modifyable is about the call site, not the called function. We have the lvalue error because we are trying to avoid confusion at the call site. I can see a similar argument that with for i in 1..10 { i = 4; }, since the modification is discarded at the end of the loop iteration, it's probably representing a programmer error.

Either way, I don't think we need for var. Even if we want to give an lvalue-like error for modifying the local copy, I think that copying the loop index variable into another variable is sufficient for these cases, just as it is sufficient for the lvalue errors on a call.

Summing up:

  • I think that something yielding/returning by out intent should have no say over whether or not the result is modified - that is up to how the calling code is written
  • I am assuming that we will keep the current lvalue error for calls like acceptsByRefAndMutates(returnsSomethingByValue()) since they avoid certain cases of confusion
  • I lean towards extending the lvalue error to similar loop cases, e.g. for i in 1..10 { i = 30; } for the same reason — avoiding confusion at the call site.

@bradcray
Copy link
Member Author

@mppf: Just to make sure I'm understanding your vision w.r.t. this bullet:

I lean towards extending the lvalue error to similar loop cases, e.g. for i in 1..10 { i = 30; } for the same reason — avoiding confusion at the call site.

I think you're saying that the default yield intent for an iterator is const, which is either out or const ref, and that we disallow modification of the index variable in either case, is that right? If so, I can definitely get behind that.

The reason I'm not confident that I'm interpreting that right, though is:

I think that something yielding/returning by out intent should have no say over whether or not the result is modified - that is up to how the calling code is written

Specifically, if I write for i in 1..10 { i = 30; } and it generates an lvalue error, if it's not the out intent that's saying I can't modify i, then what is? And how would I opt into it if I were writing my own iterator on a type?

Certainly I could capture the value into a variable, e.g. with var x = f();, and then modify x, right?

Definitely. In my const out vision of things, I'd say that f() returns a value that can't be modified directly, but can be copied into something else that can be (what I view the declaration above as doing). A default return/yield intent would correspond to a const intent for most types, which would turn into a const out or const ref based on the compiler's choice.

Presumably a corollary would be that for a procedure with an explicit out intent, f() = 10; or takesArgByRef(f()) would be legal, which feels somewhat concerning in the first case, maybe slightly less so for the second.

I think the reason I keep coming back to the question of whether const out vs. out should be two distinct intents relates to:

  • out seeming like an obvious dual to in
  • our support for both in and const in, which have different implications
  • the lvalue error we generate having a const-like feel to it

If I were the only Chapel dev, I find myself thinking that I'd probably not support an out return/yield intent and require it to be spelled const out when explicitly typed out to emphasize this, preserve the symmetry with in and leave open the option of supporting an out that is modifiable down the road. (though this is a position I'm relatively eager to be talked out of... it just feels clearest/safest/most future proof/most orthogonal to in to me)

But then, where I start to balk is "does that mean argument intents should also support const out, and what would/could it really mean in that context?

(It's this line of thinking / attempt to compare in and out that leads me to say "That may be acceptable, but may require some education, since I don't believe it's obvious that that's what out implies from the name." in the OP).

Either way, I don't think we need for var. Even if we want to give an lvalue-like error for modifying the local copy, I think that copying the loop index variable into another variable is sufficient for these cases

I agree that it's sufficient, and wouldn't change anything here now. But since we already support for param i in ... ("make sure the loop index is a param") and want to support for i: int(16) in ..., it seems reasonably orthogonal to also permit for var i in ... or for var i: int(16) in... as a shorthand for having to explicitly copy the index variable and come up with another name for it (like the awkward i vs. ii in my example above).

But this part of the discussion is definitely not crucial, nor on any critical path in my mind, and we can (and probably should) drop it from this discussion.

@vasslitvinov
Copy link
Member

@bradcray - Michael and I agree that the source of the thing passed by an out return/yield/argument intent has nothing to do with the immutability of that thing on the receiving end.

@mppf
Copy link
Member

mppf commented Aug 24, 2023

I think you're saying that the default yield intent for an iterator is const, which is either out or const ref, and that we disallow modification of the index variable in either case, is that right? If so, I can definitely get behind that.

The reason I'm not confident that I'm interpreting that right, though is ...

Yes, I think there is some subtlety. I think we are agreeing that for i in 1..10 { i = 30 } should not compile. However, we have a different vision for the reason it should not compile.

Specifically, if I write for i in 1..10 { i = 30; } and it generates an lvalue error, if it's not the out intent that's saying I can't modify i, then what is? And how would I opt into it if I were writing my own iterator on a type?

Let's consider the function case for a minute, which is perhaps more straightforward.

var global = 1;
proc returnsRef() ref { return global; }
proc returnsConstRef() const ref { return global; }
proc returnsValue() { return global; }

proc acceptsRef(ref arg) { }

acceptsRef(returnsRef()); // OK, we can pass the mutable reference
acceptsRef(returnsConstRef()); // const-ness error. Can't mutate a const ref.
acceptsRef(returnsValue()); // lvalue error - mutating the temp is probably not what you meant to do

The point here is that acceptsRef(returnsConstRef()) and acceptsRef(returnsValue()) are errors for completely different reasons. The first has to be an error for the type system to be sound (at least, assuming we don't add a copy in that case, which is not a good idea for other reasons). The second could work if we wanted it to, but we issue an lvalue error because it's likely to represent a programmer error (since the result of returnsValue is probably mutated and then dropped on the floor). This "likely to represent programmer error" rationale applies only to the call site and IMO it should not be possible for the author of returnsValue to impact it (say with your const out vs out proposal) because they don't have sufficient information to know that the call site does or does not represent a programmer error when the value is dropped on the floor.

Now, extending it to iterators, consider the case of an iterator yielding by value. If the iterator is yielding by value, I think of for i in iterYieldingByValue() { i = 42; } as being an lvalue error. We could make it work if we wanted to -- the type system would be sound, etc. However, it's more likely to represent a programmer error than to be useful, so we make it an lvalue error. And, similarly to the call case, I don't know of any situation in which the iterator author would be able to know that such patterns do or don't represent a programmer error. So, I don't think the iterator author should be able to control whether this loop produces an error. (Of course, the lvalue error would go away if the iterator yields by ref, but that has completely different implications for the iterator, and yielding mutable ref vs value is absolutely something that the iterator author can make decisions about without knowing what the calling code is doing).

Coming back to your question more directly:

Specifically, if I write for i in 1..10 { i = 30; } and it generates an lvalue error, if it's not the out intent that's saying I can't modify i, then what is?

The out intent is indicating a value is returned. That value is then modified and then dropped on the floor. This is likely to be a programmer error. In contrast, if there was ref return intent, our assumption would be that i refers to something that continues to exist after the loop (e.g. an array element), and so it is not likely to represent a programmer error.

And how would I opt into it if I were writing my own iterator on a type?

I don't think the iterator should be able to opt in or out. I think it should be the same for all functions/iterators returning by value and that's just how the language is.

Now, let's go a bit further in exploring this lvalue error idea for loops.

I've already said that I think this should be an lvalue error:

for x in yieldsByValue() {
  x = 42;
}

But, what if we had this:

for x in yieldsByValue() {
  x = 42;
  writeln(x);
}

Now x is not just dropped on the floor; it is used in a subsequent statement that works with the modification. So, perhaps it shouldn't have an lvalue error. But, I would still make it an lvalue error, because the programmer might have expected that by modifying x, they are impacting something elsewhere (say, an element in a data structure). In comparison to a common pattern like a loop iterating over array elements and modifying them -- here x is dropped on the floor and the modifications are lost by the end of the loop.

By making it an lvalue error, we are requiring a user in this position to write

for x in yieldsByValue() {
  var copy = x;
  copy = 42;
  writeln(copy);
}

I think that's good because it makes it very clear that the assignment is not going to impact the data structure being iterated over.

@bradcray
Copy link
Member Author

OK, thanks for the explanation, though now that I (think I) understand your response, I would've answered my question slightly differently:

And how would I opt into it if I were writing my own iterator on a type?

I don't think the iterator should be able to opt in or out. I think it should be the same for all functions/iterators returning by value and that's just how the language is.

This isn't the answer I was expecting given the rest of your response, which makes me think you didn't understand my question / I didn't ask it clearly enough / I was misinterpreting the statement that led to it. Specifically, I think you thought I was asking how I could create an iterator that would permit me to modify the loop index variable without modifying the original expression back in the iterator. In reality, I was trying to make sure I understood how I would create a range-like iterator vs. an array-like iterator. It looks to me like my guess from the previous response was correct, which I'll restate as:

To create an iterator that can't modify its index variables, you would apply the explicit out yield intent, const ref yield intent (modulo the bug), or use the default yield intent (which will map to one of those two). Whereas if you wanted an iterator that did permit modification of its index variable, you would use a ref yield intent.

I think with that, we could consider this issue closed if we have the appropriate follow-up items accounted for:

@mppf
Copy link
Member

mppf commented Aug 25, 2023

Did you take an action item regarding default yield/return intent and the relationship to 'cvref' as part of the broader 'cvref' effort

I've made a note in my task https://github.com/Cray/chapel-private/issues/5235 to describe the const return and yield intent as const [cvref]. However, I'm not certain if we have sufficient agreement about the default yield intent being const [cvref] to document that in the spec. (I think we were leaning that way, but maybe waiting for implementation, but the implementation does not seem so important for the spec change because the const yield intent will mean "up to the compiler" between const ref and returning by value. Is there some other aspect of implementation / experimentation that we are waiting for?) If it's clear to you that we have enough agreement to proceed, I can fold it into my task. If not, we should create an issue.

@bradcray
Copy link
Member Author

I haven't talked directly about it to anyone about it other than you and Vass, but think the three of us agree?

I think we verified that the current implementation—whatever it's doing—always disallows direct modifications to the index variable of an iterator that yields by default intent, so arguably we've already got this behavior, where we just might not be using the same out vs. const ref heuristic that we might ultimately want. Does that seem right, or am I missing a case?

When you talk about implementing it, are you talking about just implementing a heuristic that's maybe more in line with what we'll ultimately want? (e.g., maybe today we always use out, and we'd convert some of those cases to const ref?) Or is there more to it than that?

Do you have other anticipatory thoughts about things you think the implementation will or won't reveal or surprise us with? (I don't).

@mppf
Copy link
Member

mppf commented Aug 25, 2023

I haven't talked directly about it to anyone about it other than you and Vass, but think the three of us agree?

Yes

When you talk about implementing it, are you talking about just implementing a heuristic that's maybe more in line with what we'll ultimately want?

Yes, that's all

Do you have other anticipatory thoughts about things you think the implementation will or won't reveal or surprise us with?

No & additionally I'm not worried about making a spec for the Fall release without the implementation investigation. If when we go to implement const [cvref] behavior differing for different types for iterator yields and we see big problems, we can back away from the spec change.

Looking back at notes, we had a subteam with conclusions in #21888 (comment) . I was bringing up my uncertainty because these conclusions make room for the change to default yield intent we are discussing here; but it was not discussed there to do it as a spec-only change for the Fall release.

@vasslitvinov - do you think it's OK to proceed with a spec change for the fall release, to say that iterators yield by const [cvref] (where that will be defined in the spec similarly to for formal const intent per my cvref spec changes)?

@vasslitvinov
Copy link
Member

@vasslitvinov - do you think it's OK to proceed with a spec change for the fall release, to say that iterators yield by const [cvref] (where that will be defined in the spec similarly to for formal const intent per my cvref spec changes)?

Yes, this sounds good.

@lydia-duncan
Copy link
Member

I believe we came to a conclusion in the discussion of this, but I'm not remembering what it was. Does someone else remember, and can this issue be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants