Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert #25754 (improvements to localAccess() calls) #25936

Merged

Conversation

bradcray
Copy link
Member

@bradcray bradcray commented Sep 12, 2024

While #25754 did a nice job of improving the error messages for local accesses, its implementation was sufficiently heavyweight that it resulted in new timeouts in testing for tests that made a lot of use of localAccesses, primarily noticed in configurations that were already slow (valgrindexe, memleaks, baseline).

On one hand, I didn't fully appreciate how much overhead the new implementation added; but more importantly, I incorrectly thought it would only come into play rarely—for codes that explicitly used localAccess() (with checks on), which I think of as being an uncommon occurrence. What I forgot is that our auto-local-access (ALA) optimization would also cause it to fire for many codes that don't contain explicit localAccess calls. As a particularly bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl went from 35 to 62 seconds on his laptop despite not containing any localAccess calls.

We could just turn off the new checks by default for this release, but it felt simpler/saner to me to just revert the PR for now given that it's code-freeze day and that this wasn't a release-critical change.

I think the way to improve this overhead going forward is probably to change the logic from the current approach of:

  • gather all local subdomains a locale owns
    • see if the index we're accessing is within any of them
    • print the subdomains out in the error message if it's not

to:

  • ask the distribution itself whether the index is stored locally
    • this is trivial for most distributions; slightly less-so for the Stencil distribution due to fluff
  • only gather all local subdomains if we get a locality error and need to print out which indices the local locale owns (or maybe just have the distribution print that out itself?)

Even in that case, we could improve on the current logic by doing more specialization for the (common) "locale only owns a single sublocale" case than I did in the current approach. However, I think it's arguably fine to spend more time gathering the subdomains when we're about to print an error and exit the program anyway.

While chapel-lang#25754 did a nice job of improving the error messages for local
accesses, its implementation was sufficiently heavyweight that it
resulted in new timeouts in testing for tests that made a lot of use
of localAccesses.  On one hand, I didn't anticipate how much overhead
the implementation would take; on the other hand, I incorrectly
thought it would only occur for codes that used localAccess() (with
checks on), forgetting that our auto-local-access (ALA) optimization
would also cause it to fire for codes that didn't.  As a particularly
bad example, Engin found that test/studies/hpcc/PTRANS/old/PTRANS.chpl
went from 35 to 62 seconds on his laptop despite not using any
explicit localAccess calls.

We could just turn off the new checks by default for this release, but
it felt simpler/saner to me to just revert the PR for now.

I think the way to improve this overhead going forward is probably
to change the logic from the current approach of:

* gather all local subdomains a locale owns
  - see if the index we're accessing is within any of them

to:

* ask the distribution itself whether the index is local
* only gather all local subdomains if we get a locality error
  and need to print out what the local locale owns

Even in that case, we could improve on the current logic by doing more
specialization for the (common) "locale only owns a single sublocale"
case than I did in the currenet approach.  However, I think it's fine
to spend more time gathering the subdomains when we're about to print
an error and exit the program anyway.

---
Signed-off-by: Brad Chamberlain <[email protected]>
@bradcray bradcray changed the title Revert #25754 Revert #25754 (improvements to localAccess() calls) Sep 12, 2024
@bradcray bradcray merged commit 3f36e34 into chapel-lang:main Sep 12, 2024
7 checks passed
@bradcray bradcray deleted the revert-localaccess-improvements-for-now branch September 12, 2024 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant