-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add/improve errors for .localAccess()
calls to non-local elements
#25754
Add/improve errors for .localAccess()
calls to non-local elements
#25754
Conversation
--- Signed-off-by: Brad Chamberlain <[email protected]>
--- Signed-off-by: Brad Chamberlain <[email protected]>
I considered doing this in the original draft since the value is always known and the different callsites need different things, but shied away from it worrying about code size increases. However, I'm getting problems with module initialization orders for tests like: arrays/ferguson/array-initialization-patterns/array-init-replicated.chpl --- Signed-off-by: Brad Chamberlain <[email protected]>
--- Signed-off-by: Brad Chamberlain <[email protected]>
(not to say they wouldn't be interesting to test on a single as well to make sure there's no error, but getting the .good files right doesn't seem worth the effort given that they'll vary from case to case. --- Signed-off-by: Brad Chamberlain <[email protected]>
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Resolves an issue @bradcray bumped into his PR: #25754 (comment). As such, merging this should unblock #25754. Consider: ```chpl forall i in MyArr.domain { ... MyArr[i] ... } ``` This innocent loop is suitable for ALA (and AA in more complicated scenarios) if MyArr is distributed. In such cases, distributed arrays' iterators will ensure that the array is accesssed locally. That's not the case for DefaultRectangular arrays, though. I personally find this confusing and [want to change it](#25351). However, until we do that, we should disable the optimization to comply with the current semantics/implementation of the parallel iterators. We could consider making the core analysis account for arrays/dists that support ALA only through some special dynamic checks. I think that's definitely doable. But that's not easy and its real impact is questionable. Thus, this PR disables ALA and AA for DefaultRectangulars. Note that this is simply done by flipping the default value of the `config param defaultRectangularSupportsAutoLocalAccess`. I am leaving that flag in, even though opting into the optimization will result in potentially erroneous behavior. This flag is not documented and can still be used for some easy experimentation as the said erroneous behavior is relatively uncommon. [Trivial, not reviewed, but direction OK'ed by @bradcray] Test: - [x] linux64 - [x] gasnet
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
…alaccess-error-improvements
This comment was marked as resolved.
This comment was marked as resolved.
… locales It seems these have never worked, and that the tests written to see whether they did were getting away with things? Also, add a test showing why my implementation so far is insufficient to make locality checks for Stencils work. --- Signed-off-by: Brad Chamberlain <[email protected]>
…ncilDist This adds a new iterator similar to localSubdomains() yet which returns all stored local indices, whether part of the locale's subdomain or not, to permit fluff to be considered "in bounds" for local accesses. --- Signed-off-by: Brad Chamberlain <[email protected]>
This comment was marked as resolved.
This comment was marked as resolved.
…alaccess-error-improvements
--- Signed-off-by: Brad Chamberlain <[email protected]>
--- Signed-off-by: Brad Chamberlain <[email protected]>
--- Signed-off-by: Brad Chamberlain <[email protected]>
…ases --- Signed-off-by: Brad Chamberlain <[email protected]>
Also, add new test of multidim localAccess bounds violation to lock that case in. --- Signed-off-by: Brad Chamberlain <[email protected]>
--- Signed-off-by: Brad Chamberlain <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR! I have two points in the inline comments:
- Your preference of the
dsiLocalSubdomains
implementation feels less performant and maintainable than a simple lookup - It'd be good to add a test for illegal localAccess in an oversubscribed distribution
iter BlockDom.dsiLocalSubdomains(loc: locale) { | ||
for locid in dist.chpl__locToLocIdxs(loc) { | ||
var inds = chpl__computeBlock(locid, dist.targetLocDom, dist.boundingBox, dist.boundingBox.dims()); | ||
yield whole[(...inds)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised by this implementation. Shouldn't it be just yield this.locDoms[locid].myBlock;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good question. I took this approach due to the else
clause in the dsiLocalSubdomain()
routine just above, and to make it work with oversubscription; but in retrospect, I think you're right that I over-engineered it. Specifically, I believe I was thinking that myBlock
would only work in a non-oversubscribed environment, but with the loop over locid
s, I think you're probably right. I'll check that out to make sure that it works and doesn't introduce communication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, and I can update the tests to do the localAcesss calls within local
blocks as a way to both verify this and lock it in going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then in my final review, I realized that the locale argument was never used / always defaulted to here, so removed that logic. So again, I think you're right.
Ah wait, it is necessary (if regrettable from a complexity standpoint) to do here since this routine takes a locale argument, so may be answering local subdomain queries about a remote locale's ownership, and we don't want to communicate with that remote locale to answer the question.
We could simplify / streamline the (common) locale == here
case by either putting a conditional into this loop to check for it, or by creating two overloads of the routine, one which takes no argument (defaulting to here
) and the other which doesn't—then updating ChapelArray.chpl's code to similarly split the implementation into two versions, each of which calls the corresponding one here. I don't feel strongly inclined to do either of those things in this PR because I worry about keeping the two implementations in sync and would need to add more testing to distinguish between the remote and local cases. It also feels a little like a change to the public interface (though it's one that I think would be acceptable since it wouldn't break existing code); or we could keep the same public interface and put a conditional into the outer call, but that doesn't seem great either.
That said, the StencilDom.doiLocalStoredSubdomains
case should not be as expensive as I was fearing in the comment you commented on now that we don't accept a locale argument in that iterator, so that comment does need to be updated (and I can add a locality check to that test to lock that in as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I see. I agree that the solution you outlined would be good, but shouldn't be a part of this PR.
iter CyclicDom.dsiLocalSubdomains(loc: locale) { | ||
for locid in dist.chpl__locToLocIdxs(loc) { | ||
var inds = chpl__computeCyclic(idxType, locid, dist.targetLocDom.dims(), dist.startIdx); | ||
yield whole[(...inds)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
iter StencilDom.dsiLocalSubdomains(loc: locale) { | ||
for locid in dist.chpl__locToLocIdxs(loc) { | ||
var inds = chpl__computeBlock(locid, dist.targetLocDom, dist.boundingBox, dist.boundingBox.dims()); | ||
yield whole[(...inds)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
modules/dists/StencilDist.chpl
Outdated
// Could calculate the indices directly, as in the normal | ||
// subdomains case above, but that's more complicated for | ||
// StencilDist, and this isn't on performance-critical paths | ||
// anyway, so I'm just looking it up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment confuses me. In other instances, you are doing some math followed by a domain slice. Looking it up feels like the more performant and maintainable implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there's one other piece of history here that I was forgetting, but that this comment reminds me of. I originally wrote these iterators to take a locale
argument, and didn't want to do communication to answer the question. But then in my final review, I realized that the locale argument was never used / always defaulted to here
, so removed that logic. So again, I think you're right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now updated the comment to indicate that this is guaranteed to be local since we're only ever making this query for here
(whereas the traditional localSubdomains() queries can be called with any locale argument to determine what subdomains are local to a given locale).
@@ -1,3 +1,3 @@ | |||
1 0 0 0 0 0 0 0 0 10 | |||
1 0 0 0 0 0 0 0 0 10 | |||
1 0 0 0 0 0 7 0 0 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not following the motivation for the changes in duplicateTargetLocales
tests. I don't have a reason to oppose them either.
However, if I am following the diff correctly, there is no test for the error message of illegal localAccess
in an oversubscribed distribution. If that's the case, could you add something simple for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're correct. We were permitting illegal localAccesses in oversubscribed cases (and still do with checks off), and you're also correct that adding a test of that would be good. Perhaps I can just add it to the end of these tests.
This also makes me realize: I'll need to skipif these tests with --fast
and --no-checks
compilations, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, just stick a --checks
in their compopts, which feels a bit easier and somewhat better as the nature of this test will now become also about testing the --checks
behavior. I don't have any strong opinions here, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, thanks for the reminder. I don't do this nearly often enough…
Are you confident that compopts will follow the --fast
thrown by nightly testing? (that is, that I don't need to use a lastcompopts to get it in the right place?). I suspect you use this often enough that you are…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've just verified that compopts is sufficient…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
The main piece of feedback I'm acting on here is to update the comment in StencilDom.doiLocalStoredSubdomains(), which had fallen out of date when I removed the 'locale' argument to the iterator in a late-breaking commit. Here, I've updated it to note that the current call should be communication-free since we are always answering the question for 'here'. I've also updated the stencil localAccess.chpl test to wrap the localAccess calls in a `local` block to lock in that no communication is required to implement the calls. Then I took that same principle to the localAccess-block*.chpl tests to lock in that the block localAccess calls do not require communication. Note that I used an unusual formatting scheme here because I didn't want to have to update the line numbers of the error messages... :( --- Signed-off-by: Brad Chamberlain <[email protected]>
Engin pointed out, rightly, that in updating these tests to make the localaccess calls legal, I aso pointed out that we had a pre-existing bug specific to the case of oversubsctibed locales that we didn't have any tests catching. So here, I've added the illegal accesses back at the end of the test to lock in the remote accesses. The reason that these cases are different is that our dsiLocalAccess() calls on Block and Cyclic use different (arguably incorrect) logic when used in an oversubscribed manner. --- Signed-off-by: Brad Chamberlain <[email protected]>
While reviewing the block and cyclic versions of this test, it wasn't clear to me why this stencil version of the test didn't require changes when the others did. It turns out that the answer is that it wasn't ever enabled for multi-locale runs... Fixing that here, and updating it similar to the block and stencil versions. --- Signed-off-by: Brad Chamberlain <[email protected]>
This adds a --checks flag to all these tests I've been working on for localAccess() errors to ensure that they don't fail in --fast testing. This may be the first time in years I've thought of this before having the nightly tests point out my mistake (?). --- Signed-off-by: Brad Chamberlain <[email protected]>
…for comm=none --- Signed-off-by: Brad Chamberlain <[email protected]>
@e-kayrakli : OK, I think this is ready for another look. Apologies for all the talking to myself in comments and backpedaling—I am sometimes too easily convinced I'm wrong. To recap, since your original review:
|
While chapel-lang#25754 did a nice job of improving the error messages for local accesses, its implementation was sufficiently heavyweight that it resulted in new timeouts in testing for tests that made a lot of use of localAccesses. On one hand, I didn't anticipate how much overhead the implementation would take; on the other hand, I incorrectly thought it would only occur for codes that used localAccess() (with checks on), forgetting that our auto-local-access (ALA) optimization would also cause it to fire for codes that didn't. As a particularly bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl went from 35 to 62 seconds on his laptop despite not using any explicit localAccess calls. We could just turn off the new checks by default for this release, but it felt simpler/saner to me to just revert the PR for now. I think the way to improve this overhead going forward is probably to change the logic from the current approach of: * gather all local subdomains a locale owns - see if the index we're accessing is within any of them to: * ask the distribution itself whether the index is local * only gather all local subdomains if we get a locality error and need to print out what the local locale owns Even in that case, we could improve on the current logic by doing more specialization for the (common) "locale only owns a single sublocale" case than I did in the currenet approach. However, I think it's fine to spend more time gathering the subdomains when we're about to print an error and exit the program anyway. --- Signed-off-by: Brad Chamberlain <[email protected]>
[trivial, not reviewed] While #25754 did a nice job of improving the error messages for local accesses, its implementation was sufficiently heavyweight that it resulted in new timeouts in testing for tests that made a lot of use of localAccesses, primarily noticed in configurations that were already slow (valgrindexe, memleaks, baseline). On one hand, I didn't fully appreciate how much overhead the new implementation added; but more importantly, I incorrectly thought it would only come into play rarely—for codes that explicitly used localAccess() (with checks on), which I think of as being an uncommon occurrence. What I forgot is that our auto-local-access (ALA) optimization would also cause it to fire for many codes that don't contain explicit localAccess calls. As a particularly bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl went from 35 to 62 seconds on his laptop despite not containing any localAccess calls. We could just turn off the new checks by default for this release, but it felt simpler/saner to me to just revert the PR for now given that it's code-freeze day and that this wasn't a release-critical change. I think the way to improve this overhead going forward is probably to change the logic from the current approach of: * gather all local subdomains a locale owns - see if the index we're accessing is within any of them - print the subdomains out in the error message if it's not to: * ask the distribution itself whether the index is stored locally - this is trivial for most distributions; slightly less-so for the Stencil distribution due to fluff * only gather all local subdomains if we get a locality error and need to print out which indices the local locale owns (or maybe just have the distribution print that out itself?) Even in that case, we could improve on the current logic by doing more specialization for the (common) "locale only owns a single sublocale" case than I did in the current approach. However, I think it's arguably fine to spend more time gathering the subdomains when we're about to print an error and exit the program anyway.
This PR adds and improves errors for
.localAccess()
calls that access remote elements.The original motivation for the PR was to fix the case noted in issue #25747 in which
.localAccess()
on a default rectangular array would always pass, whether that array was stored locally or on a remote locale (captured intest/arrays/locality/localAccess/localAccess.chpl
). It also improves the error message for.localAccess()
violations on block arrays by expressing the error as a locality error rather than a simple out-of-bounds error, as before (also reported in #25747, captured intest/arrays/locality/localAccess/localAccess-block*.chpl
). These new tests make sure that these errors are generated for read and write cases as well as for POD vs. non-POD arrays, since our modules have distinct dsiAccess overloads for these cases.The check itself is implemented as an additional/optional check in
checkAccess()
which is called for all array accesses when checking is on. It now takes an optionalensureLocal
param that is false by default, but set to true for all.localAccess()
code paths to enable the bounds checking. The logic itself generates different error messages depending on whether the array is completely local to a single (remote) locale, distributed but the current locale owns nothing, or distributed but the index is outside the current locale's bounds.The implementation approach I originally pursued to determine whether an access was in-bounds was to use the array's
localSubdomains()
call to check that the localAccess's indices were in-bounds. However, this caused problems for the Stencil distribution since its localSubdomain(s) (correctly) only refer to the indices that the locale truly owns, but not ones that are part of its "fluff" (cached ghost cells). Yet, we also want that fluff to be accessible using localAccess() since that's how we get performance when the compiler can't optimize them automatically, requiring a slightly more sophisticated approach described in the next paragraph. New testtest/distributions/stencil/localAccess.chpl
locks in a very simple case of this Stencil distribution pattern, with some accesses passing due to true ownership, some due to fluff ownership, and one generating an error due to a remote access.The Stencil distribution challenge led to the approach taken here, which is to introduce a new internal iterator called
chpl__localStoredSubdomains
whose purpose is to yield a locale's subdomains, including indices that it doesn't truly own, like the fluff of the stencil distribution. This is implemented using an optional call on the array,doiLocalStoredSubdomains()
, and if not supported, we fall back to using thelocalSubdomains()
call instead. I then implementeddoiLocalStoredSubdomains()
on Stencil distributions.Along the way, I found that
dsiLocalSubdomains()
calls weren't implemented for Block, Cyclic, and Stencil, so added those as well.Then, while testing this, I found that our current tests that lock in support for using oversubscribed block/cyclic/stencil distributions (that is, ones in which a single locale appears in targetLocales multiple times) were getting away with using
.localAccess()
to access elements that weren't actually local, so I updated them to only update local array elements. The reason for this is that thedsiAccess()
routine on these distributions simply falls back to a normaldsiAccess()
in the event of oversubscription—meaning that we're also losing some performance there in addition to not actually doing a local access... I also found that the stencil variant of this test was never set up to run with multiple locales, so updated it to do so.Note that much of the code in this effort may not be optimal performance-wise (e.g., iterating over local subdomains when the common case is that a locale only has a single subdomain), but that this seemed acceptable—at least for the time being—since this code is only called when checks are on.
Two bad files
test/expressions/if-expr/*.bad
had to be updated due to shifting IDs in the AST… :(Resolves #25747.