
Simplify launcher world size parsing #3398

Merged: mvpatel2000 merged 2 commits into mosaicml:dev from mvpatel2000/simplify-envvar on Jun 17, 2024


mvpatel2000 (Contributor)

What does this PR do?

Simplifies how the launcher parses the world size. composer -n now correctly lets you run on fewer GPUs than the machine has when iterating on a single node.
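A minimal sketch of the single-node precedence this description implies. It assumes (this is not taken from composer/cli/launcher.py) that an explicit -n wins over an inherited WORLD_SIZE, which in turn wins over the detected GPU count. The function resolve_nproc and its parameters are hypothetical.

```python
from typing import Mapping, Optional


def resolve_nproc(
    cli_nproc: Optional[int],
    env: Mapping[str, str],
    num_local_gpus: int,
) -> int:
    """Hypothetical: choose how many local processes to launch on one node.

    Assumed precedence: explicit -n flag > WORLD_SIZE env var > local GPU count.
    """
    if cli_nproc is not None:
        # An explicit -n always wins, so a user can run on fewer GPUs than the
        # node has even when the environment already exports WORLD_SIZE.
        if cli_nproc < 1:
            raise ValueError(f"-n must be >= 1, got {cli_nproc}")
        return cli_nproc
    if "WORLD_SIZE" in env:
        # Fall back to a world size exported by the environment (e.g. a scheduler).
        return int(env["WORLD_SIZE"])
    # Otherwise use one process per visible GPU, or a single process on CPU.
    return max(num_local_gpus, 1)


env = {"WORLD_SIZE": "8"}
print(resolve_nproc(2, env, 8))     # 2: -n overrides the exported WORLD_SIZE
print(resolve_nproc(None, env, 8))  # 8: falls back to WORLD_SIZE
print(resolve_nproc(None, {}, 0))   # 1: no GPUs, single CPU process
```

Putting the flag first keeps the behavior predictable: whatever the surrounding environment exports, the number typed on the command line is what gets launched.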

mvpatel2000 requested a review from a team as a code owner on June 12, 2024 16:11.

eracah (Contributor) left a comment:


LGTM.

composer/cli/launcher.py (review thread resolved)
eracah (Contributor) commented Jun 12, 2024:

Any way we can do a test for this?

eracah self-requested a review on June 12, 2024 16:24.
mvpatel2000 (Contributor, Author):

> Any way we can do a test for this?

Unfortunately not easily, since pytest runs in a single process and this would require a separate launch. But I'll include a manual test.

snarayan21 (Contributor) left a comment:


Waiting for the DAIS hold to lift and tests to pass; LGTM otherwise.

mvpatel2000 (Contributor, Author):

Can someone put a request-changes hold on this to block it during the code freeze? @dakinggg?

dakinggg (Contributor) left a comment:


Holding temporarily

bigning (Contributor) commented Jun 12, 2024:

I'm curious: why set the WORLD_SIZE env var for a single node in the first place? It breaks the CPU tests (python -m pytest) in an interactive session when the machine has multiple GPUs; the error is that RANK is missing from the env vars. This PR doesn't fix that, because WORLD_SIZE here is not 1. I had to unset WORLD_SIZE.
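To illustrate the failure described above, here is a hypothetical check of the kind distributed setup code performs: if WORLD_SIZE says there is more than one process but RANK is absent, initialization cannot proceed. read_dist_env is an illustrative name, not composer's actual code; dropping WORLD_SIZE mirrors the manual workaround.

```python
from typing import Mapping, Tuple


def read_dist_env(env: Mapping[str, str]) -> Tuple[int, int]:
    """Hypothetical: return (world_size, rank) from environment variables.

    A WORLD_SIZE > 1 without a matching RANK is treated as a misconfigured
    distributed launch, mirroring the error seen when running pytest directly.
    """
    world_size = int(env.get("WORLD_SIZE", "1"))
    if world_size > 1 and "RANK" not in env:
        raise RuntimeError(
            f"WORLD_SIZE={world_size} but RANK is not set; "
            "launch through a distributed launcher or unset WORLD_SIZE"
        )
    rank = int(env.get("RANK", "0"))
    return world_size, rank


# Interactive shell on a multi-GPU node where WORLD_SIZE was exported globally.
interactive_env = {"WORLD_SIZE": "8"}
try:
    read_dist_env(interactive_env)
except RuntimeError as err:
    print(err)

# The manual workaround: drop WORLD_SIZE before running pytest directly.
cleaned_env = {k: v for k, v in interactive_env.items() if k != "WORLD_SIZE"}
print(read_dist_env(cleaned_env))  # (1, 0)
```

In a shell, the equivalent workaround is simply running unset WORLD_SIZE before invoking python -m pytest.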

mvpatel2000 (Contributor, Author):

> I'm curious, why set env_var WORLD_SIZE for single node in the first place?

That was a bug in MCloud; it should set all env vars even on a single node.

> It breaks the CPU test python -m pytest

Will investigate.

mvpatel2000 (Contributor, Author):

> It breaks the CPU test python -m pytest
>
> will investigate

Discussed offline: if you don't use the launcher, the code will still read the WORLD_SIZE env var, which we cannot fix. This breakage is unrelated to this PR; you must use composer -n 1 pytest...
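A minimal sketch of why going through the launcher avoids the problem, assuming a single-node launcher that builds a complete, consistent set of distributed variables for each child process and overrides anything inherited from the shell. build_child_envs is hypothetical, and the exact set of variables composer's launcher exports may differ.

```python
from typing import Dict, List, Mapping


def build_child_envs(base_env: Mapping[str, str], nproc: int) -> List[Dict[str, str]]:
    """Hypothetical single-node launcher step: one env dict per child process.

    Each child gets consistent distributed variables that override anything
    inherited from the parent shell, such as a stale WORLD_SIZE.
    """
    envs = []
    for local_rank in range(nproc):
        env = dict(base_env)  # copy so children do not share one dict
        env.update(
            {
                "WORLD_SIZE": str(nproc),
                "RANK": str(local_rank),  # single node: global rank == local rank
                "LOCAL_WORLD_SIZE": str(nproc),
                "LOCAL_RANK": str(local_rank),
            }
        )
        envs.append(env)
    return envs


# Under this sketch, `composer -n 1 pytest...` corresponds to nproc=1.
children = build_child_envs({"WORLD_SIZE": "8", "PATH": "/usr/bin"}, nproc=1)
print(children[0]["WORLD_SIZE"], children[0]["RANK"])  # 1 0
```

Because the launcher always sets RANK alongside WORLD_SIZE, the child never sees the half-configured environment that triggers the missing-RANK error.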

dakinggg self-requested a review on June 17, 2024 17:02.
dakinggg dismissed their stale review on June 17, 2024 17:02:

DAIS over.

mvpatel2000 merged commit 6023fe5 into mosaicml:dev on Jun 17, 2024 (17 checks passed).
mvpatel2000 deleted the mvpatel2000/simplify-envvar branch on June 17, 2024 17:57.
mvpatel2000 added a commit to mvpatel2000/composer that referenced this pull request Jul 21, 2024
mvpatel2000 added a commit that referenced this pull request Jul 21, 2024

5 participants