Revert #25754 (improvements to localAccess() calls) #25936

bradcray · 2024-09-12T00:13:38Z

While #25754 did a nice job of improving the error messages for local accesses, its implementation was sufficiently heavyweight that it resulted in new timeouts in testing for tests that made a lot of use of localAccesses, primarily noticed in configurations that were already slow (valgrindexe, memleaks, baseline).

On one hand, I didn't fully appreciate how much overhead the new implementation added; but more importantly, I incorrectly thought it would only come into play rarely—for codes that explicitly used localAccess() (with checks on), which I think of as being an uncommon occurrence. What I forgot is that our auto-local-access (ALA) optimization would also cause it to fire for many codes that don't contain explicit localAccess calls. As a particularly bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl went from 35 to 62 seconds on his laptop despite not containing any localAccess calls.

We could just turn off the new checks by default for this release, but it felt simpler/saner to me to just revert the PR for now given that it's code-freeze day and that this wasn't a release-critical change.

I think the way to improve this overhead going forward is probably to change the logic from the current approach of:

gather all local subdomains a locale owns
- see if the index we're accessing is within any of them
- print the subdomains out in the error message if it's not

to:

ask the distribution itself whether the index is stored locally
- this is trivial for most distributions; slightly less-so for the Stencil distribution due to fluff
only gather all local subdomains if we get a locality error and need to print out which indices the local locale owns (or maybe just have the distribution print that out itself?)

Even in that case, we could improve on the current logic by doing more specialization for the (common) "locale only owns a single sublocale" case than I did in the current approach. However, I think it's arguably fine to spend more time gathering the subdomains when we're about to print an error and exit the program anyway.

While chapel-lang#25754 did a nice job of improving the error messages for local accesses, its implementation was sufficiently heavyweight that it resulted in new timeouts in testing for tests that made a lot of use of localAccesses. On one hand, I didn't anticipate how much overhead the implementation would take; on the other hand, I incorrectly thought it would only occur for codes that used localAccess() (with checks on), forgetting that our auto-local-access (ALA) optimization would also cause it to fire for codes that didn't. As a particularly bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl went from 35 to 62 seconds on his laptop despite not using any explicit localAccess calls. We could just turn off the new checks by default for this release, but it felt simpler/saner to me to just revert the PR for now. I think the way to improve this overhead going forward is probably to change the logic from the current approach of: * gather all local subdomains a locale owns - see if the index we're accessing is within any of them to: * ask the distribution itself whether the index is local * only gather all local subdomains if we get a locality error and need to print out what the local locale owns Even in that case, we could improve on the current logic by doing more specialization for the (common) "locale only owns a single sublocale" case than I did in the currenet approach. However, I think it's fine to spend more time gathering the subdomains when we're about to print an error and exit the program anyway. --- Signed-off-by: Brad Chamberlain <[email protected]>

--- Signed-off-by: Brad Chamberlain <[email protected]>

bradcray added 2 commits September 11, 2024 17:06

Missed that this chunk failed when reverting my patch

c79e001

--- Signed-off-by: Brad Chamberlain <[email protected]>

bradcray changed the title ~~Revert #25754~~ Revert #25754 (improvements to localAccess() calls) Sep 12, 2024

bradcray merged commit 3f36e34 into chapel-lang:main Sep 12, 2024
7 checks passed

bradcray deleted the revert-localaccess-improvements-for-now branch September 12, 2024 02:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert #25754 (improvements to localAccess() calls) #25936

Revert #25754 (improvements to localAccess() calls) #25936

bradcray commented Sep 12, 2024 •

edited

Loading

Revert #25754 (improvements to localAccess() calls) #25936

Revert #25754 (improvements to localAccess() calls) #25936

Conversation

bradcray commented Sep 12, 2024 • edited Loading

bradcray commented Sep 12, 2024 •

edited

Loading