
add context parallel group init to mp init #1174

Merged
amylittleyang merged 1 commit into main from add_context_parallel_init on Apr 4, 2024

Conversation

amylittleyang (Contributor) commented Apr 3, 2024

What does this PR do?

Follow-up from D55538929.
Update model_parallel/initialize.py to also initialize context parallel groups.
The original change was made in this PR: https://github.com/fairinternal/llm_inference/pull/333/files
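For readers unfamiliar with the file, a minimal sketch of the kind of initialization being described (this is an illustration, not the actual diff; the names _CONTEXT_PARALLEL_GROUP and initialize_context_parallel, and the contiguous rank layout, are assumptions):

from typing import Optional
import torch

# Hypothetical module-level handle, mirroring _MODEL_PARALLEL_GROUP and friends.
_CONTEXT_PARALLEL_GROUP = None

def initialize_context_parallel(context_parallel_size: int, backend: Optional[str] = None) -> None:
    """Create one process group per set of context-parallel ranks (sketch only)."""
    global _CONTEXT_PARALLEL_GROUP
    world_size = torch.distributed.get_world_size()
    rank = torch.distributed.get_rank()
    # new_group() is collective: every rank must call it for every group.
    for start in range(0, world_size, context_parallel_size):
        ranks = list(range(start, start + context_parallel_size))
        group = torch.distributed.new_group(ranks, backend=backend)
        if rank in ranks:
            _CONTEXT_PARALLEL_GROUP = group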

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

facebook-github-bot added the CLA Signed label on Apr 3, 2024
amylittleyang marked this pull request as ready for review April 3, 2024 01:12
  pipeline_length: int = 1,
  *,
  model_parallel_backend: Optional[str] = None,
  pipeline_backend: Optional[str] = None,
- ddp_backend: Optional[str] = None
+ ddp_backend: Optional[str] = None,
+ cp_backend: Optional[str] = None,

Can we put cp_backend between model_parallel_backend and pipeline_backend to keep the parameter ordering consistent?
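For concreteness, the suggested ordering would look roughly like this (a sketch; the enclosing function name and its first parameter are assumptions, the rest is taken from the snippet above):

from typing import Optional

def initialize_model_parallel(
    model_parallel_size_: int,
    pipeline_length: int = 1,
    *,
    model_parallel_backend: Optional[str] = None,
    cp_backend: Optional[str] = None,   # moved up, between model_parallel_backend and pipeline_backend
    pipeline_backend: Optional[str] = None,
    ddp_backend: Optional[str] = None,
) -> None:
    ...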

Comment on lines 128 to 130
global _CP_PREV_RANK
global _CP_NEXT_RANK
global _CP_ZERO_RANK

Wondering why we add these? All of these can be derived from the CP group ranks (in fact, PP does that for similar functionality).

We can always use @functools.lru_cache if performance is a concern.
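A sketch of what this suggestion could look like, deriving the values from the CP group ranks on demand instead of keeping extra globals (the name _CONTEXT_PARALLEL_GROUP_RANKS is an assumption, analogous to the existing _PIPELINE_PARALLEL_RANKS):

import functools
from typing import Tuple
import torch

@functools.lru_cache(maxsize=None)
def _context_parallel_neighbors() -> Tuple[int, int, int]:
    """Return (prev, next, zero) global ranks within the CP group, derived on demand."""
    ranks = _CONTEXT_PARALLEL_GROUP_RANKS   # hypothetical global, e.g. [0, 1, 2, 3]
    idx = ranks.index(torch.distributed.get_rank())
    prev_rank = ranks[(idx - 1) % len(ranks)]
    next_rank = ranks[(idx + 1) % len(ranks)]
    return prev_rank, next_rank, ranks[0]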

amylittleyang (Contributor, Author)

Sounds good, let me move this part into fbcode. It was probably added for convenience in the original PR, where all comms-related changes were put in this file.

Comment on lines 149 to 151
def model_parallel_is_initialized() -> bool:
"""Check if model and data parallel groups are initialized."""
if _MODEL_PARALLEL_GROUP is None or _DATA_PARALLEL_GROUP is None or _PIPELINE_PARALLEL_GROUP is None:
return False
return True

Defined twice, here and below?

Also, should we add the context parallel group to the condition?
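For example, the check could read roughly like this once the CP group is included (a sketch; _CONTEXT_PARALLEL_GROUP is the assumed name of the new module-level handle):

def model_parallel_is_initialized() -> bool:
    """Check if model, data, pipeline, and context parallel groups are initialized."""
    return not (
        _MODEL_PARALLEL_GROUP is None
        or _DATA_PARALLEL_GROUP is None
        or _PIPELINE_PARALLEL_GROUP is None
        or _CONTEXT_PARALLEL_GROUP is None   # assumed name of the new CP group handle
    )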

amylittleyang (Contributor, Author)

Good catch!

return True


def get_context_parallel_group():

Add return type annotations to all newly added functions?
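For instance, the new accessor could be annotated like this (a sketch; the assertion message mirrors the style of the existing accessors and is an assumption):

from torch.distributed import ProcessGroup

def get_context_parallel_group() -> ProcessGroup:
    """Return the context parallel group the caller rank belongs to."""
    assert _CONTEXT_PARALLEL_GROUP is not None, "context parallel group is not initialized"
    return _CONTEXT_PARALLEL_GROUP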

Comment on lines 187 to 188
global _PIPELINE_PARALLEL_RANKS
_PIPELINE_PARALLEL_RANKS = None

Why remove this?

if torch.distributed.is_available() and torch.distributed.is_initialized():
return torch.distributed.get_world_size(group=get_context_parallel_group())
else:
return 0

Is 0 the right default value? Usually world size returns 1 by default?

amylittleyang (Contributor, Author)

We should return torch.distributed.get_world_size(...) directly here; get_context_parallel_group() already checks that the CP process group is initialized.
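In other words, the simplification being agreed on would look roughly like this (a sketch, not the merged code):

import torch

def get_context_parallel_world_size() -> int:
    """Return world size for the context parallel group."""
    # get_context_parallel_group() already asserts that the CP group is initialized,
    # so there is no need for a separate is_initialized() check or a 0/1 fallback here.
    return torch.distributed.get_world_size(group=get_context_parallel_group())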

amylittleyang force-pushed the add_context_parallel_init branch 2 times, most recently from 9fb8ed9 to 78471c8 on April 3, 2024 20:33
Comment on lines +169 to +173
def get_context_parallel_rank() -> int:
"""Return my rank for the context parallel group."""
return torch.distributed.get_rank(group=get_context_parallel_group())

Looks like lines 161-165 are the same function; maybe remove this one?

amylittleyang merged commit 0af41ae into main Apr 4, 2024
1 of 19 checks passed