Add/improve errors for `.localAccess()` calls to non-local elements #25754

bradcray · 2024-08-14T02:47:04Z

This PR adds and improves errors for .localAccess() calls that access remote elements.

The original motivation for the PR was to fix the case noted in issue #25747 in which .localAccess() on a default rectangular array would always pass, whether that array was stored locally or on a remote locale (captured in test/arrays/locality/localAccess/localAccess.chpl). It also improves the error message for .localAccess() violations on block arrays by expressing the error as a locality error rather than a simple out-of-bounds error, as before (also reported in #25747, captured in test/arrays/locality/localAccess/localAccess-block*.chpl). These new tests make sure that these errors are generated for read and write cases as well as for POD vs. non-POD arrays, since our modules have distinct dsiAccess overloads for these cases.

The check itself is implemented as an additional/optional check in checkAccess() which is called for all array accesses when checking is on. It now takes an optional ensureLocal param that is false by default, but set to true for all .localAccess() code paths to enable the bounds checking. The logic itself generates different error messages depending on whether the array is completely local to a single (remote) locale, distributed but the current locale owns nothing, or distributed but the index is outside the current locale's bounds.

The implementation approach I originally pursued to determine whether an access was in-bounds was to use the array's localSubdomains() call to check that the localAccess's indices were in-bounds. However, this caused problems for the Stencil distribution since its localSubdomain(s) (correctly) only refer to the indices that the locale truly owns, but not ones that are part of its "fluff" (cached ghost cells). Yet, we also want that fluff to be accessible using localAccess() since that's how we get performance when the compiler can't optimize them automatically, requiring a slightly more sophisticated approach described in the next paragraph. New test test/distributions/stencil/localAccess.chpl locks in a very simple case of this Stencil distribution pattern, with some accesses passing due to true ownership, some due to fluff ownership, and one generating an error due to a remote access.

The Stencil distribution challenge led to the approach taken here, which is to introduce a new internal iterator called chpl__localStoredSubdomains whose purpose is to yield a locale's subdomains, including indices that it doesn't truly own, like the fluff of the stencil distribution. This is implemented using an optional call on the array, doiLocalStoredSubdomains(), and if not supported, we fall back to using the localSubdomains() call instead. I then implemented doiLocalStoredSubdomains() on Stencil distributions.

Along the way, I found that dsiLocalSubdomains() calls weren't implemented for Block, Cyclic, and Stencil, so added those as well.

Then, while testing this, I found that our current tests that lock in support for using oversubscribed block/cyclic/stencil distributions (that is, ones in which a single locale appears in targetLocales multiple times) were getting away with using .localAccess() to access elements that weren't actually local, so I updated them to only update local array elements. The reason for this is that the dsiAccess() routine on these distributions simply falls back to a normal dsiAccess() in the event of oversubscription—meaning that we're also losing some performance there in addition to not actually doing a local access... I also found that the stencil variant of this test was never set up to run with multiple locales, so updated it to do so.

Note that much of the code in this effort may not be optimal performance-wise (e.g., iterating over local subdomains when the common case is that a locale only has a single subdomain), but that this seemed acceptable—at least for the time being—since this code is only called when checks are on.

Two bad files test/expressions/if-expr/*.bad had to be updated due to shifting IDs in the AST… :(

Resolves #25747.

--- Signed-off-by: Brad Chamberlain <[email protected]>

I considered doing this in the original draft since the value is always known and the different callsites need different things, but shied away from it worrying about code size increases. However, I'm getting problems with module initialization orders for tests like: arrays/ferguson/array-initialization-patterns/array-init-replicated.chpl --- Signed-off-by: Brad Chamberlain <[email protected]>

--- Signed-off-by: Brad Chamberlain <[email protected]>

(not to say they wouldn't be interesting to test on a single as well to make sure there's no error, but getting the .good files right doesn't seem worth the effort given that they'll vary from case to case. --- Signed-off-by: Brad Chamberlain <[email protected]>

@bradcray

Resolves an issue @bradcray bumped into his PR: #25754 (comment). As such, merging this should unblock #25754. Consider: ```chpl forall i in MyArr.domain { ... MyArr[i] ... } ``` This innocent loop is suitable for ALA (and AA in more complicated scenarios) if MyArr is distributed. In such cases, distributed arrays' iterators will ensure that the array is accesssed locally. That's not the case for DefaultRectangular arrays, though. I personally find this confusing and [want to change it](#25351). However, until we do that, we should disable the optimization to comply with the current semantics/implementation of the parallel iterators. We could consider making the core analysis account for arrays/dists that support ALA only through some special dynamic checks. I think that's definitely doable. But that's not easy and its real impact is questionable. Thus, this PR disables ALA and AA for DefaultRectangulars. Note that this is simply done by flipping the default value of the `config param defaultRectangularSupportsAutoLocalAccess`. I am leaving that flag in, even though opting into the optimization will result in potentially erroneous behavior. This flag is not documented and can still be used for some easy experimentation as the said erroneous behavior is relatively uncommon. [Trivial, not reviewed, but direction OK'ed by @bradcray] Test: - [x] linux64 - [x] gasnet

…alaccess-error-improvements

… locales It seems these have never worked, and that the tests written to see whether they did were getting away with things? Also, add a test showing why my implementation so far is insufficient to make locality checks for Stencils work. --- Signed-off-by: Brad Chamberlain <[email protected]>

…ncilDist This adds a new iterator similar to localSubdomains() yet which returns all stored local indices, whether part of the locale's subdomain or not, to permit fluff to be considered "in bounds" for local accesses. --- Signed-off-by: Brad Chamberlain <[email protected]>

…alaccess-error-improvements

--- Signed-off-by: Brad Chamberlain <[email protected]>

…ases --- Signed-off-by: Brad Chamberlain <[email protected]>

Also, add new test of multidim localAccess bounds violation to lock that case in. --- Signed-off-by: Brad Chamberlain <[email protected]>

--- Signed-off-by: Brad Chamberlain <[email protected]>

e-kayrakli

Thanks for this PR! I have two points in the inline comments:

Your preference of the dsiLocalSubdomains implementation feels less performant and maintainable than a simple lookup
It'd be good to add a test for illegal localAccess in an oversubscribed distribution

e-kayrakli · 2024-09-10T17:18:25Z

modules/dists/BlockDist.chpl

+iter BlockDom.dsiLocalSubdomains(loc: locale) {
+  for locid in dist.chpl__locToLocIdxs(loc) {
+    var inds = chpl__computeBlock(locid, dist.targetLocDom, dist.boundingBox, dist.boundingBox.dims());
+    yield whole[(...inds)];


I am surprised by this implementation. Shouldn't it be just yield this.locDoms[locid].myBlock;?

Hmm, good question. I took this approach due to the else clause in the dsiLocalSubdomain() routine just above, and to make it work with oversubscription; but in retrospect, I think you're right that I over-engineered it. Specifically, I believe I was thinking that myBlock would only work in a non-oversubscribed environment, but with the loop over locids, I think you're probably right. I'll check that out to make sure that it works and doesn't introduce communication.

Ooh, and I can update the tests to do the localAcesss calls within local blocks as a way to both verify this and lock it in going forward.

But then in my final review, I realized that the locale argument was never used / always defaulted to here, so removed that logic. So again, I think you're right.

Ah wait, it is necessary (if regrettable from a complexity standpoint) to do here since this routine takes a locale argument, so may be answering local subdomain queries about a remote locale's ownership, and we don't want to communicate with that remote locale to answer the question.

We could simplify / streamline the (common) locale == here case by either putting a conditional into this loop to check for it, or by creating two overloads of the routine, one which takes no argument (defaulting to here) and the other which doesn't—then updating ChapelArray.chpl's code to similarly split the implementation into two versions, each of which calls the corresponding one here. I don't feel strongly inclined to do either of those things in this PR because I worry about keeping the two implementations in sync and would need to add more testing to distinguish between the remote and local cases. It also feels a little like a change to the public interface (though it's one that I think would be acceptable since it wouldn't break existing code); or we could keep the same public interface and put a conditional into the outer call, but that doesn't seem great either.

That said, the StencilDom.doiLocalStoredSubdomains case should not be as expensive as I was fearing in the comment you commented on now that we don't accept a locale argument in that iterator, so that comment does need to be updated (and I can add a locality check to that test to lock that in as well).

👍 I see. I agree that the solution you outlined would be good, but shouldn't be a part of this PR.

e-kayrakli · 2024-09-10T17:19:02Z

modules/dists/CyclicDist.chpl

+iter CyclicDom.dsiLocalSubdomains(loc: locale) {
+  for locid in dist.chpl__locToLocIdxs(loc) {
+    var inds = chpl__computeCyclic(idxType, locid, dist.targetLocDom.dims(), dist.startIdx);
+    yield whole[(...inds)];


e-kayrakli · 2024-09-10T17:19:10Z

modules/dists/StencilDist.chpl

+iter StencilDom.dsiLocalSubdomains(loc: locale) {
+  for locid in dist.chpl__locToLocIdxs(loc) {
+    var inds = chpl__computeBlock(locid, dist.targetLocDom, dist.boundingBox, dist.boundingBox.dims());
+    yield whole[(...inds)];


e-kayrakli · 2024-09-10T17:20:38Z

modules/dists/StencilDist.chpl

+    // Could calculate the indices directly, as in the normal
+    // subdomains case above, but that's more complicated for
+    // StencilDist, and this isn't on performance-critical paths
+    // anyway, so I'm just looking it up.


This comment confuses me. In other instances, you are doing some math followed by a domain slice. Looking it up feels like the more performant and maintainable implementation.

Ah, there's one other piece of history here that I was forgetting, but that this comment reminds me of. I originally wrote these iterators to take a locale argument, and didn't want to do communication to answer the question. But then in my final review, I realized that the locale argument was never used / always defaulted to here, so removed that logic. So again, I think you're right.

I've now updated the comment to indicate that this is guaranteed to be local since we're only ever making this query for here (whereas the traditional localSubdomains() queries can be called with any locale argument to determine what subdomains are local to a given locale).

e-kayrakli · 2024-09-10T17:29:08Z

test/distributions/block/duplicateTargetLocales.allow.good

@@ -1,3 +1,3 @@
 1 0 0 0 0 0 0 0 0 10
-1 0 0 0 0 0 0 0 0 10
+1 0 0 0 0 0 7 0 0 0


I am not following the motivation for the changes in duplicateTargetLocales tests. I don't have a reason to oppose them either.

However, if I am following the diff correctly, there is no test for the error message of illegal localAccess in an oversubscribed distribution. If that's the case, could you add something simple for that?

Yeah, you're correct. We were permitting illegal localAccesses in oversubscribed cases (and still do with checks off), and you're also correct that adding a test of that would be good. Perhaps I can just add it to the end of these tests.

This also makes me realize: I'll need to skipif these tests with --fast and --no-checks compilations, right?

Or, just stick a --checks in their compopts, which feels a bit easier and somewhat better as the nature of this test will now become also about testing the --checks behavior. I don't have any strong opinions here, though.

Oh yeah, thanks for the reminder. I don't do this nearly often enough…

Are you confident that compopts will follow the --fast thrown by nightly testing? (that is, that I don't need to use a lastcompopts to get it in the right place?). I suspect you use this often enough that you are…

I think I've just verified that compopts is sufficient…

The main piece of feedback I'm acting on here is to update the comment in StencilDom.doiLocalStoredSubdomains(), which had fallen out of date when I removed the 'locale' argument to the iterator in a late-breaking commit. Here, I've updated it to note that the current call should be communication-free since we are always answering the question for 'here'. I've also updated the stencil localAccess.chpl test to wrap the localAccess calls in a `local` block to lock in that no communication is required to implement the calls. Then I took that same principle to the localAccess-block*.chpl tests to lock in that the block localAccess calls do not require communication. Note that I used an unusual formatting scheme here because I didn't want to have to update the line numbers of the error messages... :( --- Signed-off-by: Brad Chamberlain <[email protected]>

Engin pointed out, rightly, that in updating these tests to make the localaccess calls legal, I aso pointed out that we had a pre-existing bug specific to the case of oversubsctibed locales that we didn't have any tests catching. So here, I've added the illegal accesses back at the end of the test to lock in the remote accesses. The reason that these cases are different is that our dsiLocalAccess() calls on Block and Cyclic use different (arguably incorrect) logic when used in an oversubscribed manner. --- Signed-off-by: Brad Chamberlain <[email protected]>

While reviewing the block and cyclic versions of this test, it wasn't clear to me why this stencil version of the test didn't require changes when the others did. It turns out that the answer is that it wasn't ever enabled for multi-locale runs... Fixing that here, and updating it similar to the block and stencil versions. --- Signed-off-by: Brad Chamberlain <[email protected]>

This adds a --checks flag to all these tests I've been working on for localAccess() errors to ensure that they don't fail in --fast testing. This may be the first time in years I've thought of this before having the nightly tests point out my mistake (?). --- Signed-off-by: Brad Chamberlain <[email protected]>

…for comm=none --- Signed-off-by: Brad Chamberlain <[email protected]>

bradcray · 2024-09-10T20:48:43Z

@e-kayrakli : OK, I think this is ready for another look. Apologies for all the talking to myself in comments and backpedaling—I am sometimes too easily convinced I'm wrong. To recap, since your original review:

I haven't changed the implementations of the dsiLocalSubdomains() calls because the current implementations are communication-free when called with any locale, which is a property I think we want to preserve. With more effort, we could use different implementations for when the argument locale is ==here vs. != here, but I don't think that additional complexity (and change) is worth the effort, at least as part of this PR.
I have updated my tests to do the localAccess() calls within local blocks to ensure that the current calls are not doing communication, and to lock this in going forward. Note that I have not tested making a dsiLocalSubdomains call from a remote locale to make sure communication is not required (but could, if desired)
I updated the out-of-date comment in the new doi* iterator on Stencil now that it doesn't take an arbitrary locale argument anymore; the current approach is communication-free as a result, so not at all a compromise
I added the missing checks you noted that lock in that we correctly get locality errors now in the oversubscribed case for a non-local localAccess() call where we were not before this PR; I did this as updates to the duplicateTargetLocales tests; I also added new comm-none .good files for these cases since they don't have the OOB issue
I also remembered during this process that (earlier) I'd discovered that the stencil version of duplicateTargetLocales.chpl wasn't actually running using multiple locales, but then forgot to act on that at all. So here, I've updated it to handle multiple locales, and in the same ways that I updated the block/cyclic cases.
I added the --checks flags we discussed to compopts files to try and head off failures in --fast testing

While chapel-lang#25754 did a nice job of improving the error messages for local accesses, its implementation was sufficiently heavyweight that it resulted in new timeouts in testing for tests that made a lot of use of localAccesses. On one hand, I didn't anticipate how much overhead the implementation would take; on the other hand, I incorrectly thought it would only occur for codes that used localAccess() (with checks on), forgetting that our auto-local-access (ALA) optimization would also cause it to fire for codes that didn't. As a particularly bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl went from 35 to 62 seconds on his laptop despite not using any explicit localAccess calls. We could just turn off the new checks by default for this release, but it felt simpler/saner to me to just revert the PR for now. I think the way to improve this overhead going forward is probably to change the logic from the current approach of: * gather all local subdomains a locale owns - see if the index we're accessing is within any of them to: * ask the distribution itself whether the index is local * only gather all local subdomains if we get a locality error and need to print out what the local locale owns Even in that case, we could improve on the current logic by doing more specialization for the (common) "locale only owns a single sublocale" case than I did in the currenet approach. However, I think it's fine to spend more time gathering the subdomains when we're about to print an error and exit the program anyway. --- Signed-off-by: Brad Chamberlain <[email protected]>

[trivial, not reviewed] While #25754 did a nice job of improving the error messages for local accesses, its implementation was sufficiently heavyweight that it resulted in new timeouts in testing for tests that made a lot of use of localAccesses, primarily noticed in configurations that were already slow (valgrindexe, memleaks, baseline). On one hand, I didn't fully appreciate how much overhead the new implementation added; but more importantly, I incorrectly thought it would only come into play rarely—for codes that explicitly used localAccess() (with checks on), which I think of as being an uncommon occurrence. What I forgot is that our auto-local-access (ALA) optimization would also cause it to fire for many codes that don't contain explicit localAccess calls. As a particularly bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl went from 35 to 62 seconds on his laptop despite not containing any localAccess calls. We could just turn off the new checks by default for this release, but it felt simpler/saner to me to just revert the PR for now given that it's code-freeze day and that this wasn't a release-critical change. I think the way to improve this overhead going forward is probably to change the logic from the current approach of: * gather all local subdomains a locale owns - see if the index we're accessing is within any of them - print the subdomains out in the error message if it's not to: * ask the distribution itself whether the index is stored locally - this is trivial for most distributions; slightly less-so for the Stencil distribution due to fluff * only gather all local subdomains if we get a locality error and need to print out which indices the local locale owns (or maybe just have the distribution print that out itself?) Even in that case, we could improve on the current logic by doing more specialization for the (common) "locale only owns a single sublocale" case than I did in the current approach. However, I think it's arguably fine to spend more time gathering the subdomains when we're about to print an error and exit the program anyway.

bradcray added 2 commits August 13, 2024 18:38

Add .localAccess() errors for remote single-locale arrays

22618dd

--- Signed-off-by: Brad Chamberlain <[email protected]>

Refactor to support better errors for block-distributed arrays as well

4128a23

--- Signed-off-by: Brad Chamberlain <[email protected]>

bradcray mentioned this pull request Aug 14, 2024

Implement/improve error messages for misuse of localAccess #25747

Open

bradcray added 3 commits August 15, 2024 17:37

Extend locality checks to handle arrays with multiple blocks per locale

b71e027

--- Signed-off-by: Brad Chamberlain <[email protected]>

This comment was marked as resolved.

Sign in to view

e-kayrakli mentioned this pull request Aug 28, 2024

Disable ALA and AA for Default Rectangular arrays #25840

Merged

2 tasks

This comment was marked as resolved.

Sign in to view

Merge branch 'main' of https://github.com/chapel-lang/chapel into loc…

1175602

…alaccess-error-improvements

This comment was marked as resolved.

Sign in to view

bradcray added 2 commits September 6, 2024 13:30

This comment was marked as resolved.

Sign in to view

bradcray added 7 commits September 6, 2024 19:28

Merge branch 'main' of https://github.com/chapel-lang/chapel into loc…

802c72a

…alaccess-error-improvements

Update a sensitive-to-uid changes .bad entries

9a30dd7

--- Signed-off-by: Brad Chamberlain <[email protected]>

Remove trailing spaces

12f63a0

--- Signed-off-by: Brad Chamberlain <[email protected]>

Clean up skipifs and generalize for --no-local testing

5df5a4a

--- Signed-off-by: Brad Chamberlain <[email protected]>

Update test to work for comm=none as well; add more working + error c…

266fb3f

…ases --- Signed-off-by: Brad Chamberlain <[email protected]>

Clean up localAccess indexing expressions for 1D and nD arrays

435c1bf

Also, add new test of multidim localAccess bounds violation to lock that case in. --- Signed-off-by: Brad Chamberlain <[email protected]>

Clean up / simplify code and diff

f7270bd

--- Signed-off-by: Brad Chamberlain <[email protected]>

bradcray marked this pull request as ready for review September 9, 2024 23:43

e-kayrakli reviewed Sep 10, 2024

View reviewed changes

bradcray added 2 commits September 10, 2024 12:14

bradcray added 3 commits September 10, 2024 12:47

With new error-generating localAccess cases added, specialize .goods …

2f9db17

…for comm=none --- Signed-off-by: Brad Chamberlain <[email protected]>

e-kayrakli approved these changes Sep 10, 2024

View reviewed changes

bradcray merged commit 15e0d6e into chapel-lang:main Sep 10, 2024
7 checks passed

bradcray deleted the localaccess-error-improvements branch September 10, 2024 22:04

bradcray mentioned this pull request Sep 12, 2024

Revert #25754 (improvements to localAccess() calls) #25936

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add/improve errors for `.localAccess()` calls to non-local elements #25754

Add/improve errors for `.localAccess()` calls to non-local elements #25754

bradcray commented Aug 14, 2024 •

edited

Loading

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

e-kayrakli left a comment

e-kayrakli Sep 10, 2024

bradcray Sep 10, 2024

bradcray Sep 10, 2024

bradcray Sep 10, 2024

e-kayrakli Sep 10, 2024

e-kayrakli Sep 10, 2024

e-kayrakli Sep 10, 2024

e-kayrakli Sep 10, 2024

bradcray Sep 10, 2024

bradcray Sep 10, 2024

e-kayrakli Sep 10, 2024

bradcray Sep 10, 2024

e-kayrakli Sep 10, 2024

bradcray Sep 10, 2024

bradcray Sep 10, 2024

e-kayrakli Sep 10, 2024

bradcray commented Sep 10, 2024

Add/improve errors for .localAccess() calls to non-local elements #25754

Add/improve errors for .localAccess() calls to non-local elements #25754

Conversation

bradcray commented Aug 14, 2024 • edited Loading

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

e-kayrakli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bradcray commented Sep 10, 2024

Add/improve errors for `.localAccess()` calls to non-local elements #25754

Add/improve errors for `.localAccess()` calls to non-local elements #25754

bradcray commented Aug 14, 2024 •

edited

Loading