Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add/improve errors for .localAccess() calls to non-local elements #25754

Merged
merged 20 commits into from
Sep 10, 2024

Conversation

bradcray
Copy link
Member

@bradcray bradcray commented Aug 14, 2024

This PR adds and improves errors for .localAccess() calls that access remote elements.

The original motivation for the PR was to fix the case noted in issue #25747 in which .localAccess() on a default rectangular array would always pass, whether that array was stored locally or on a remote locale (captured in test/arrays/locality/localAccess/localAccess.chpl). It also improves the error message for .localAccess() violations on block arrays by expressing the error as a locality error rather than a simple out-of-bounds error, as before (also reported in #25747, captured in test/arrays/locality/localAccess/localAccess-block*.chpl). These new tests make sure that these errors are generated for read and write cases as well as for POD vs. non-POD arrays, since our modules have distinct dsiAccess overloads for these cases.

The check itself is implemented as an additional/optional check in checkAccess() which is called for all array accesses when checking is on. It now takes an optional ensureLocal param that is false by default, but set to true for all .localAccess() code paths to enable the bounds checking. The logic itself generates different error messages depending on whether the array is completely local to a single (remote) locale, distributed but the current locale owns nothing, or distributed but the index is outside the current locale's bounds.

The implementation approach I originally pursued to determine whether an access was in-bounds was to use the array's localSubdomains() call to check that the localAccess's indices were in-bounds. However, this caused problems for the Stencil distribution since its localSubdomain(s) (correctly) only refer to the indices that the locale truly owns, but not ones that are part of its "fluff" (cached ghost cells). Yet, we also want that fluff to be accessible using localAccess() since that's how we get performance when the compiler can't optimize them automatically, requiring a slightly more sophisticated approach described in the next paragraph. New test test/distributions/stencil/localAccess.chpl locks in a very simple case of this Stencil distribution pattern, with some accesses passing due to true ownership, some due to fluff ownership, and one generating an error due to a remote access.

The Stencil distribution challenge led to the approach taken here, which is to introduce a new internal iterator called chpl__localStoredSubdomains whose purpose is to yield a locale's subdomains, including indices that it doesn't truly own, like the fluff of the stencil distribution. This is implemented using an optional call on the array, doiLocalStoredSubdomains(), and if not supported, we fall back to using the localSubdomains() call instead. I then implemented doiLocalStoredSubdomains() on Stencil distributions.

Along the way, I found that dsiLocalSubdomains() calls weren't implemented for Block, Cyclic, and Stencil, so added those as well.

Then, while testing this, I found that our current tests that lock in support for using oversubscribed block/cyclic/stencil distributions (that is, ones in which a single locale appears in targetLocales multiple times) were getting away with using .localAccess() to access elements that weren't actually local, so I updated them to only update local array elements. The reason for this is that the dsiAccess() routine on these distributions simply falls back to a normal dsiAccess() in the event of oversubscription—meaning that we're also losing some performance there in addition to not actually doing a local access... I also found that the stencil variant of this test was never set up to run with multiple locales, so updated it to do so.

Note that much of the code in this effort may not be optimal performance-wise (e.g., iterating over local subdomains when the common case is that a locale only has a single subdomain), but that this seemed acceptable—at least for the time being—since this code is only called when checks are on.

Two bad files test/expressions/if-expr/*.bad had to be updated due to shifting IDs in the AST… :(

Resolves #25747.

I considered doing this in the original draft since the value is
always known and the different callsites need different things,
but shied away from it worrying about code size increases.
However, I'm getting problems with module initialization orders
for tests like:

arrays/ferguson/array-initialization-patterns/array-init-replicated.chpl

---
Signed-off-by: Brad Chamberlain <[email protected]>
(not to say they wouldn't be interesting to test on a single as well
to make sure there's no error, but getting the .good files right
doesn't seem worth the effort given that they'll vary from case to
case.

---
Signed-off-by: Brad Chamberlain <[email protected]>
@bradcray

This comment was marked as resolved.

@e-kayrakli

This comment was marked as resolved.

@e-kayrakli

This comment was marked as resolved.

@bradcray

This comment was marked as resolved.

e-kayrakli added a commit that referenced this pull request Aug 28, 2024
Resolves an issue @bradcray bumped into his PR:
#25754 (comment).
As such, merging this should unblock
#25754.

Consider:

```chpl
forall i in MyArr.domain {
  ... MyArr[i] ...
}
```

This innocent loop is suitable for ALA (and AA in more complicated
scenarios) if MyArr is distributed. In such cases, distributed arrays'
iterators will ensure that the array is accesssed locally. That's not
the case for DefaultRectangular arrays, though. I personally find this
confusing and [want to change
it](#25351). However, until
we do that, we should disable the optimization to comply with the
current semantics/implementation of the parallel iterators.

We could consider making the core analysis account for arrays/dists that
support ALA only through some special dynamic checks. I think that's
definitely doable. But that's not easy and its real impact is
questionable. Thus, this PR disables ALA and AA for DefaultRectangulars.

Note that this is simply done by flipping the default value of the
`config param defaultRectangularSupportsAutoLocalAccess`. I am leaving
that flag in, even though opting into the optimization will result in
potentially erroneous behavior. This flag is not documented and can
still be used for some easy experimentation as the said erroneous
behavior is relatively uncommon.

[Trivial, not reviewed, but direction OK'ed by @bradcray]

Test:
- [x] linux64
- [x] gasnet
@e-kayrakli

This comment was marked as resolved.

@bradcray

This comment was marked as resolved.

@bradcray

This comment was marked as resolved.

… locales

It seems these have never worked, and that the tests written to see
whether they did were getting away with things?

Also, add a test showing why my implementation so far is insufficient
to make locality checks for Stencils work.

---
Signed-off-by: Brad Chamberlain <[email protected]>
…ncilDist

This adds a new iterator similar to localSubdomains() yet which
returns all stored local indices, whether part of the locale's
subdomain or not, to permit fluff to be considered "in bounds" for
local accesses.

---
Signed-off-by: Brad Chamberlain <[email protected]>
@bradcray

This comment was marked as resolved.

---
Signed-off-by: Brad Chamberlain <[email protected]>
Also, add new test of multidim localAccess bounds violation to lock that
case in.

---
Signed-off-by: Brad Chamberlain <[email protected]>
---
Signed-off-by: Brad Chamberlain <[email protected]>
@bradcray bradcray marked this pull request as ready for review September 9, 2024 23:43
Copy link
Contributor

@e-kayrakli e-kayrakli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR! I have two points in the inline comments:

  • Your preference of the dsiLocalSubdomains implementation feels less performant and maintainable than a simple lookup
  • It'd be good to add a test for illegal localAccess in an oversubscribed distribution

iter BlockDom.dsiLocalSubdomains(loc: locale) {
for locid in dist.chpl__locToLocIdxs(loc) {
var inds = chpl__computeBlock(locid, dist.targetLocDom, dist.boundingBox, dist.boundingBox.dims());
yield whole[(...inds)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised by this implementation. Shouldn't it be just yield this.locDoms[locid].myBlock;?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good question. I took this approach due to the else clause in the dsiLocalSubdomain() routine just above, and to make it work with oversubscription; but in retrospect, I think you're right that I over-engineered it. Specifically, I believe I was thinking that myBlock would only work in a non-oversubscribed environment, but with the loop over locids, I think you're probably right. I'll check that out to make sure that it works and doesn't introduce communication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, and I can update the tests to do the localAcesss calls within local blocks as a way to both verify this and lock it in going forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then in my final review, I realized that the locale argument was never used / always defaulted to here, so removed that logic. So again, I think you're right.

Ah wait, it is necessary (if regrettable from a complexity standpoint) to do here since this routine takes a locale argument, so may be answering local subdomain queries about a remote locale's ownership, and we don't want to communicate with that remote locale to answer the question.

We could simplify / streamline the (common) locale == here case by either putting a conditional into this loop to check for it, or by creating two overloads of the routine, one which takes no argument (defaulting to here) and the other which doesn't—then updating ChapelArray.chpl's code to similarly split the implementation into two versions, each of which calls the corresponding one here. I don't feel strongly inclined to do either of those things in this PR because I worry about keeping the two implementations in sync and would need to add more testing to distinguish between the remote and local cases. It also feels a little like a change to the public interface (though it's one that I think would be acceptable since it wouldn't break existing code); or we could keep the same public interface and put a conditional into the outer call, but that doesn't seem great either.

That said, the StencilDom.doiLocalStoredSubdomains case should not be as expensive as I was fearing in the comment you commented on now that we don't accept a locale argument in that iterator, so that comment does need to be updated (and I can add a locality check to that test to lock that in as well).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I see. I agree that the solution you outlined would be good, but shouldn't be a part of this PR.

iter CyclicDom.dsiLocalSubdomains(loc: locale) {
for locid in dist.chpl__locToLocIdxs(loc) {
var inds = chpl__computeCyclic(idxType, locid, dist.targetLocDom.dims(), dist.startIdx);
yield whole[(...inds)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

iter StencilDom.dsiLocalSubdomains(loc: locale) {
for locid in dist.chpl__locToLocIdxs(loc) {
var inds = chpl__computeBlock(locid, dist.targetLocDom, dist.boundingBox, dist.boundingBox.dims());
yield whole[(...inds)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// Could calculate the indices directly, as in the normal
// subdomains case above, but that's more complicated for
// StencilDist, and this isn't on performance-critical paths
// anyway, so I'm just looking it up.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment confuses me. In other instances, you are doing some math followed by a domain slice. Looking it up feels like the more performant and maintainable implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, there's one other piece of history here that I was forgetting, but that this comment reminds me of. I originally wrote these iterators to take a locale argument, and didn't want to do communication to answer the question. But then in my final review, I realized that the locale argument was never used / always defaulted to here, so removed that logic. So again, I think you're right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now updated the comment to indicate that this is guaranteed to be local since we're only ever making this query for here (whereas the traditional localSubdomains() queries can be called with any locale argument to determine what subdomains are local to a given locale).

@@ -1,3 +1,3 @@
1 0 0 0 0 0 0 0 0 10
1 0 0 0 0 0 0 0 0 10
1 0 0 0 0 0 7 0 0 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following the motivation for the changes in duplicateTargetLocales tests. I don't have a reason to oppose them either.

However, if I am following the diff correctly, there is no test for the error message of illegal localAccess in an oversubscribed distribution. If that's the case, could you add something simple for that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're correct. We were permitting illegal localAccesses in oversubscribed cases (and still do with checks off), and you're also correct that adding a test of that would be good. Perhaps I can just add it to the end of these tests.

This also makes me realize: I'll need to skipif these tests with --fast and --no-checks compilations, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, just stick a --checks in their compopts, which feels a bit easier and somewhat better as the nature of this test will now become also about testing the --checks behavior. I don't have any strong opinions here, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, thanks for the reminder. I don't do this nearly often enough…

Are you confident that compopts will follow the --fast thrown by nightly testing? (that is, that I don't need to use a lastcompopts to get it in the right place?). I suspect you use this often enough that you are…

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've just verified that compopts is sufficient…

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

The main piece of feedback I'm acting on here is to update the comment
in StencilDom.doiLocalStoredSubdomains(), which had fallen out of date
when I removed the 'locale' argument to the iterator in a
late-breaking commit.  Here, I've updated it to note that the current
call should be communication-free since we are always answering the
question for 'here'.  I've also updated the stencil localAccess.chpl
test to wrap the localAccess calls in a `local` block to lock in that
no communication is required to implement the calls.

Then I took that same principle to the localAccess-block*.chpl tests
to lock in that the block localAccess calls do not require
communication.

Note that I used an unusual formatting scheme here because I didn't
want to have to update the line numbers of the error messages... :(

---
Signed-off-by: Brad Chamberlain <[email protected]>
Engin pointed out, rightly, that in updating these tests to make the
localaccess calls legal, I aso pointed out that we had a pre-existing
bug specific to the case of oversubsctibed locales that we didn't have
any tests catching.  So here, I've added the illegal accesses back at
the end of the test to lock in the remote accesses.

The reason that these cases are different is that our dsiLocalAccess()
calls on Block and Cyclic use different (arguably incorrect) logic
when used in an oversubscribed manner.

---
Signed-off-by: Brad Chamberlain <[email protected]>
While reviewing the block and cyclic versions of this test, it wasn't
clear to me why this stencil version of the test didn't require changes
when the others did.  It turns out that the answer is that it wasn't
ever enabled for multi-locale runs...  Fixing that here, and updating
it similar to the block and stencil versions.

---
Signed-off-by: Brad Chamberlain <[email protected]>
This adds a --checks flag to all these tests I've been working on for
localAccess() errors to ensure that they don't fail in --fast testing.
This may be the first time in years I've thought of this before having
the nightly tests point out my mistake (?).

---
Signed-off-by: Brad Chamberlain <[email protected]>
@bradcray
Copy link
Member Author

@e-kayrakli : OK, I think this is ready for another look. Apologies for all the talking to myself in comments and backpedaling—I am sometimes too easily convinced I'm wrong. To recap, since your original review:

  • I haven't changed the implementations of the dsiLocalSubdomains() calls because the current implementations are communication-free when called with any locale, which is a property I think we want to preserve. With more effort, we could use different implementations for when the argument locale is ==here vs. != here, but I don't think that additional complexity (and change) is worth the effort, at least as part of this PR.
  • I have updated my tests to do the localAccess() calls within local blocks to ensure that the current calls are not doing communication, and to lock this in going forward. Note that I have not tested making a dsiLocalSubdomains call from a remote locale to make sure communication is not required (but could, if desired)
  • I updated the out-of-date comment in the new doi* iterator on Stencil now that it doesn't take an arbitrary locale argument anymore; the current approach is communication-free as a result, so not at all a compromise
  • I added the missing checks you noted that lock in that we correctly get locality errors now in the oversubscribed case for a non-local localAccess() call where we were not before this PR; I did this as updates to the duplicateTargetLocales tests; I also added new comm-none .good files for these cases since they don't have the OOB issue
  • I also remembered during this process that (earlier) I'd discovered that the stencil version of duplicateTargetLocales.chpl wasn't actually running using multiple locales, but then forgot to act on that at all. So here, I've updated it to handle multiple locales, and in the same ways that I updated the block/cyclic cases.
  • I added the --checks flags we discussed to compopts files to try and head off failures in --fast testing

@bradcray bradcray merged commit 15e0d6e into chapel-lang:main Sep 10, 2024
7 checks passed
@bradcray bradcray deleted the localaccess-error-improvements branch September 10, 2024 22:04
bradcray added a commit to bradcray/chapel that referenced this pull request Sep 12, 2024
While chapel-lang#25754 did a nice job of improving the error messages for local
accesses, its implementation was sufficiently heavyweight that it
resulted in new timeouts in testing for tests that made a lot of use
of localAccesses.  On one hand, I didn't anticipate how much overhead
the implementation would take; on the other hand, I incorrectly
thought it would only occur for codes that used localAccess() (with
checks on), forgetting that our auto-local-access (ALA) optimization
would also cause it to fire for codes that didn't.  As a particularly
bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl
went from 35 to 62 seconds on his laptop despite not using any
explicit localAccess calls.

We could just turn off the new checks by default for this release, but
it felt simpler/saner to me to just revert the PR for now.

I think the way to improve this overhead going forward is probably
to change the logic from the current approach of:

* gather all local subdomains a locale owns
  - see if the index we're accessing is within any of them

to:

* ask the distribution itself whether the index is local
* only gather all local subdomains if we get a locality error
  and need to print out what the local locale owns

Even in that case, we could improve on the current logic by doing more
specialization for the (common) "locale only owns a single sublocale"
case than I did in the currenet approach.  However, I think it's fine
to spend more time gathering the subdomains when we're about to print
an error and exit the program anyway.

---
Signed-off-by: Brad Chamberlain <[email protected]>
bradcray added a commit that referenced this pull request Sep 12, 2024
[trivial, not reviewed]

While #25754 did a nice job of improving the error messages for local
accesses, its implementation was sufficiently heavyweight that it
resulted in new timeouts in testing for tests that made a lot of use of
localAccesses, primarily noticed in configurations that were already
slow (valgrindexe, memleaks, baseline).

On one hand, I didn't fully appreciate how much overhead the new
implementation added; but more importantly, I incorrectly thought it
would only come into play rarely—for codes that explicitly used
localAccess() (with checks on), which I think of as being an uncommon
occurrence. What I forgot is that our auto-local-access (ALA)
optimization would also cause it to fire for many codes that don't
contain explicit localAccess calls. As a particularly bad example, Engin
found that test/studies/hpcc/PTRANS/old/PTRANS.chpl went from 35 to 62
seconds on his laptop despite not containing any localAccess calls.

We could just turn off the new checks by default for this release, but
it felt simpler/saner to me to just revert the PR for now given that
it's code-freeze day and that this wasn't a release-critical change.

I think the way to improve this overhead going forward is probably to
change the logic from the current approach of:

* gather all local subdomains a locale owns
  - see if the index we're accessing is within any of them
  - print the subdomains out in the error message if it's not

to:

* ask the distribution itself whether the index is stored locally
- this is trivial for most distributions; slightly less-so for the
Stencil distribution due to fluff
* only gather all local subdomains if we get a locality error and need
to print out which indices the local locale owns (or maybe just have the
distribution print that out itself?)

Even in that case, we could improve on the current logic by doing more
specialization for the (common) "locale only owns a single sublocale"
case than I did in the current approach. However, I think it's arguably
fine to spend more time gathering the subdomains when we're about to
print an error and exit the program anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement/improve error messages for misuse of localAccess
2 participants