Fix MinGPT Example - Data Class Default Field Before Required Field - Allow for Non-Distributed Training Depending on Env (#707)

Summary:
Pull Request resolved: #707

`buck2 run :mingpt_example` fails because the `GPTConfig` dataclass declares a field with a default value (`model_type`) before its required fields. The script also fails on a CPU machine because the hard-coded "ddp" training strategy requires a distributed process group. With a single process the script fails with:
```
File "/usr/local/fbcode/platform010/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/fbcode/platform010/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torchtnt/examples/mingpt/main.py", line 190, in <module>
    main(get_args())
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torchtnt/examples/mingpt/main.py", line 141, in main
    my_unit = MinGPTUnit(
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torchtnt/framework/auto_unit.py", line 119, in __call__
    x = super().__call__(*args, **kwargs)
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torchtnt/examples/mingpt/main.py", line 75, in __init__
    super().__init__(
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torchtnt/framework/auto_unit.py", line 480, in __init__
    self.module: torch.nn.Module = prepare_module(
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torchtnt/utils/prepare_module.py", line 294, in prepare_module
    module = prepare_ddp(module, device, strategy)
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torchtnt/utils/prepare_module.py", line 178, in prepare_ddp
    module = DDP(module, device_ids=device_ids, **params_dict)
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torch/nn/parallel/distributed.py", line 731, in __init__
    self.process_group = _get_default_group()
  File "/data/users/ethenderson/fbsource/buck-out/v2/gen/fbcode/6481d33c0dd0a120/torchtnt/examples/mingpt/__mingpt_example__/mingpt_example#link-tree/torch/distributed/distributed_c10d.py", line 1001, in _get_default_group
    raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
```
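
For context, here is a minimal sketch (not part of this commit) of the dataclass rule behind the first failure: fields without defaults must be declared before fields with defaults, otherwise the class definition itself raises a `TypeError`. The class names below are illustrative; the field names match `GPTConfig`.

```python
from dataclasses import dataclass

# Declaring a defaulted field before required fields fails at class
# definition time (on Python 3.10) with:
#   TypeError: non-default argument 'n_layer' follows default argument
#
# @dataclass
# class BrokenConfig:
#     model_type: str = "gpt2"
#     n_layer: int

# Required fields first, defaulted fields last, as in the fixed GPTConfig:
@dataclass
class FixedConfig:
    n_layer: int
    n_head: int
    n_embd: int
    model_type: str = "gpt2"

print(FixedConfig(n_layer=12, n_head=12, n_embd=768))
```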

Reviewed By: JKSenthil

Differential Revision: D53872532

fbshipit-source-id: 2a793707cc3fa430bf362fb75ce184acf55f550f
Ethan Henderson authored and facebook-github-bot committed Feb 16, 2024
1 parent 8a8f24a commit 96ecbe4
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion examples/mingpt/main.py
@@ -143,7 +143,7 @@ def main(args: Namespace) -> None:
         opt_cfg=OptimizerConfig(learning_rate=args.lr, weight_decay=args.weight_decay),
         module=module,
         device=device,
-        strategy="ddp",
+        strategy="ddp" if torch.distributed.is_initialized() else None,
         log_every_n_steps=args.log_every_n_steps,
         gradient_accumulation_steps=4,
         detect_anomaly=True,
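
The change above gates DDP on an existing process group. A hedged sketch of the same check, using only `torch.distributed.is_initialized()` (the helper function below is illustrative, not part of the example code):

```python
from typing import Optional

import torch.distributed as dist


def pick_strategy() -> Optional[str]:
    # DistributedDataParallel requires an initialized default process group;
    # without one its constructor raises:
    #   ValueError: Default process group has not been initialized, ...
    # Returning None lets the example fall back to plain single-process training.
    return "ddp" if dist.is_initialized() else None


# Single-process CPU run: init_process_group() was never called, so this
# returns None and no DDP wrapping happens.
print(pick_strategy())

# Under a multi-process launcher (e.g. torchrun) that initializes the
# process group first, the same check returns "ddp".
```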
2 changes: 1 addition & 1 deletion examples/mingpt/model.py
@@ -20,10 +20,10 @@

 @dataclass
 class GPTConfig:
-    model_type: str = "gpt2"
     n_layer: int
     n_head: int
     n_embd: int
+    model_type: str = "gpt2"
     # openai's values for gpt2
     vocab_size: int = 50257
     block_size: int = 1024
