Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teams API topic branch #886

Merged
merged 99 commits into from
Nov 19, 2019
Merged

Conversation

davidozog
Copy link
Member

@davidozog davidozog commented Sep 20, 2019

This is a work-in-progress branch that implements the OpenSHMEM v1.5 teams API. This pull request is intended to merge the wip/teams branch from my fork into the topic/teams branch into the SOS main repository.

Here is a running checklist of open questions and TODO items:

  • Add more psync arrays for back-to-back collectives
  • The unit tests are more-or-less consistent with the teams proposal examples, but may need modification (most require multiple PEs).
  • Write unit test(s) with more complete coverage of the teams API and various types.
  • Support more than 64 teams.
  • Are some types missing from shmem_internal_op.h?
  • Should I add pWrk space for each team at initialization? I deferred this since SOS doesn't yet make use of pWrk arrays.
  • I'm not sure whether shmem_internal_1st_nonzero_bit supports big-endiannass.
  • The C11/C++ "extras" bindings might need some extra help.
  • SHMEM_TEAM_SHARED only works with a uniform stride across the local process IDs. Users can otherwise set DISABLE_TEAM_SHARED, but this could definitely be cleaner.
  • Whoops - are private contexts lost during team destroy? (test it)

davidozog and others added 30 commits April 12, 2019 17:44
James Dinan added 4 commits November 16, 2019 08:40
* README update
* teams_fini reordering
* shmem_internal_free moved
* added shmem_internal_bit_fetch
* rename/comment check_for_linear_stride
* move teams_init errors to appropriate place
* move teams_init pool initializations to appropriate place
* use shmem_internal_params.TEAMS_MAX
* use asserts where needed
* fix 'myteam' leak in parent-only PEs
* fix choose_psync bug for parent-only PEs
* more informative error in split_strided
* All of parent must call split_strided in split_2d
* cleanup unit tests
* other minor changes...

Signed-off-by: David M. Ozog <[email protected]>
src/shmem_team.h Outdated Show resolved Hide resolved
@jdinan
Copy link
Member

jdinan commented Nov 18, 2019

@davidozog Just a few minor comments for myself while reviewing your most recent patches. I will work through these items (file issues, push minor changes, etc.) and expect this to be ready to merge later today. With respect to the tests from the spec, I will move these to a separate PR to allow more time for us to clean them up and propose fixes upstream in the spec.

James Dinan added 8 commits November 18, 2019 13:06
Catch and correct for SHMEM_TEAMS_MAX values that are too small.

Signed-off-by: James Dinan <[email protected]>
This is a special case; the default context isn't inserted into the
contexts list on the world team, so it is not freed when the teams are
destroyed.  This context must be freed separately by the transport
layer.

Signed-off-by: James Dinan <[email protected]>
These tests are failing and it looks like there may be bugs in the unit
tests.  Setting these tests aside for debugging.

Signed-off-by: James Dinan <[email protected]>
Signed-off-by: James Dinan <[email protected]>
Signed-off-by: James Dinan <[email protected]>
@jdinan
Copy link
Member

jdinan commented Nov 18, 2019

@davidozog I took the liberty of pushing directly to your fork to speed up time to merge. Hope you don't mind! Please review the edits and, if all looks good on your end, go ahead and merge! Excellent work on a huge new feature! 💯

I was able to sort out some, but not all of the issues with the spec tests. After merging, please git revert 90886c8 and post a new PR so we can work through the remaining issues with those tests.

Copy link
Member

@jdinan jdinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇 :shipit:

@jdinan
Copy link
Member

jdinan commented Nov 18, 2019

Sorry, one last patch. Was looking into the other build warnings and realized we updated from int nreduce on the legacy reductions to size_t nreduce on the teams reductions. Fixing those build warnings fixed our support for the larger type.

@davidozog davidozog merged commit 788ad10 into Sandia-OpenSHMEM:topic/teams Nov 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants