Training fails because of accelerate configure settings. #1749

Closed
4 tasks
AdamLouly opened this issue Mar 7, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@AdamLouly
Contributor

System Info

nightly optimum
nightly transformers
accelerate

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

args.accelerator_config is never initialized, so constructing the trainer fails.

To reproduce, make sure accelerate is installed (even if the script does not use it), then run:

python -m torch.distributed.launch --nproc_per_node=8 --use-env run_glue.py \
    --model_name_or_path microsoft/deberta-large \
    --task_name MRPC \
    --max_seq_length 128 \
    --learning_rate 3e-6 \
    --do_train \
    --output_dir /dev/shm \
    --overwrite_output_dir \
    --max_steps 200 \
    --logging_steps 20 \
    --per_device_train_batch_size 32 \
    --fp16
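
The reproduction depends on accelerate being importable in the environment. A quick way to confirm that before launching is sketched below using only the standard library; the helper name is_installed is made up for this illustration and is not part of optimum, transformers, or accelerate.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found without importing it."""
    # find_spec returns None when no importable module of that name exists.
    return importlib.util.find_spec(package_name) is not None


# Report whether the precondition for this reproduction holds.
print("accelerate installed:", is_installed("accelerate"))
```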

Expected behavior

Error:

[rank4]: Traceback (most recent call last):
[rank4]:   File "/workspace/optimum/examples/onnxruntime/training/text-classification/run_glue.py", line 649, in <module>
[rank4]:     main()
[rank4]:   File "/workspace/optimum/examples/onnxruntime/training/text-classification/run_glue.py", line 540, in main
[rank4]:     trainer = ORTTrainer(
[rank4]:   File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum-1.18.0.dev0-py3.8.egg/optimum/onnxruntime/trainer.py", line 227, in __init__
[rank4]:     super().__init__(
[rank4]:   File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.39.0.dev0-py3.8.egg/transformers/trainer.py", line 369, in __init__
[rank4]:     self.create_accelerator_and_postprocess()
[rank4]:   File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.39.0.dev0-py3.8.egg/transformers/trainer.py", line 4094, in create_accelerator_and_postprocess
[rank4]:     **self.args.accelerator_config.to_dict(),
[rank4]: AttributeError: 'NoneType' object has no attribute 'to_dict'
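
The last frame of the traceback shows the failure mode: accelerator_config is None when .to_dict() is called on it. The sketch below reproduces that pattern in isolation, without transformers or optimum. StubAcceleratorConfig, StubTrainingArgs, and both helper functions are hypothetical stand-ins written for this illustration. The guarded variant is only one possible way to avoid the error and is not necessarily what #1750 does.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StubAcceleratorConfig:
    """Hypothetical stand-in for the real accelerator config object."""
    split_batches: bool = False

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class StubTrainingArgs:
    """Hypothetical stand-in for training arguments whose config was never set."""
    accelerator_config: Optional[StubAcceleratorConfig] = None


def accelerator_kwargs(args: StubTrainingArgs) -> dict:
    # Same shape as the failing expression in the traceback: no None check.
    return {**args.accelerator_config.to_dict()}


def accelerator_kwargs_guarded(args: StubTrainingArgs) -> dict:
    # Illustrative guard only: fall back to a default config when none was set.
    config = args.accelerator_config or StubAcceleratorConfig()
    return {**config.to_dict()}


args = StubTrainingArgs()

try:
    accelerator_kwargs(args)
except AttributeError as exc:
    # Same class of error as reported in the issue.
    error_message = str(exc)
    print("unguarded:", error_message)

guarded = accelerator_kwargs_guarded(args)
print("guarded:", guarded)
```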

AdamLouly added the bug label Mar 7, 2024
@AdamLouly
Contributor Author

This should fix it:
#1750
