How to use multiple GPUs? Can you show an example? #12

Open
ucalyptus2 opened this issue Oct 11, 2023 · 2 comments

@wangjk666 Greetings. Could you please show an example of a multi-GPU setup on a single node? I have a single machine with 8 GPUs.


ucalyptus2 commented Oct 11, 2023

 'TRAIN': {'CHECKPOINT_EPOCH_RESET': False,
           'CHECKPOINT_LOAD_PATH': '/ingenuity_NAS/21sd45_nas/21sd45_mount/checkpoints/Xception_FFDF_epoch_00075.pyth',
           'CHECKPOINT_PERIOD': 5,
           'CHECKPOINT_SAVE_PATH': '/ingenuity_NAS/21sd45_nas/21sd45_mount/',
           'ENABLE': True,
           'EVAL_PERIOD': 1,
           'MAX_EPOCH': 200}}
[10/11 19:32:47][INFO] build_helper.py:  81: MODEL_NAME: Xception
Traceback (most recent call last):
  File "/home/21sd45/PyDeepFakeDet/run.py", line 26, in <module>
    main()
  File "/home/21sd45/PyDeepFakeDet/run.py", line 20, in main
    launch_func(cfg=cfg, func=train)
  File "/home/21sd45/PyDeepFakeDet/tools/utils.py", line 94, in launch_func
    torch.multiprocessing.spawn(
  File "/home/21sd45/miniconda3/envs/new/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/21sd45/miniconda3/envs/new/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/21sd45/miniconda3/envs/new/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/home/21sd45/miniconda3/envs/new/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/21sd45/PyDeepFakeDet/tools/train.py", line 151, in train
    model = build_model(cfg)
  File "/home/21sd45/PyDeepFakeDet/PyDeepFakeDet/utils/build_helper.py", line 89, in build_model
    device_ids=[int(os.environ['LOCAL_RANK'])]
  File "/home/21sd45/miniconda3/envs/new/lib/python3.9/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'LOCAL_RANK'
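
A possible reading of the traceback above, offered as a sketch rather than a confirmed fix: `torch.multiprocessing.spawn` calls the worker as `fn(i, *args)` and passes the process index as the first argument, but it does not set the `LOCAL_RANK` environment variable. That variable is normally exported by launchers such as `torchrun`, so `os.environ['LOCAL_RANK']` in `build_helper.py` raises `KeyError` when the job is started through `spawn`. The helper below is hypothetical (the name `resolve_local_rank` and its `spawn_index` parameter are not part of PyDeepFakeDet); it only illustrates falling back to the spawn index when the variable is missing.

```python
import os


def resolve_local_rank(spawn_index=None):
    """Return the local GPU index for this worker process.

    Hypothetical helper, not part of PyDeepFakeDet. It prefers the
    LOCAL_RANK environment variable (set by launchers such as torchrun)
    and falls back to the index that torch.multiprocessing.spawn passes
    to the worker function as its first argument.
    """
    value = os.environ.get("LOCAL_RANK")
    if value is not None:
        return int(value)
    if spawn_index is not None:
        return int(spawn_index)
    raise RuntimeError(
        "LOCAL_RANK is not set and no spawn index was provided; "
        "launch with torchrun or pass the process index explicitly."
    )
```

Under this assumption, the worker could call `resolve_local_rank(i)` with the index it receives from `spawn` and use the result for `device_ids`, or the script could be launched with `torchrun` so that `LOCAL_RANK` is already present. Whether the repository expects one launcher or the other is not stated in this thread.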


@zhangchaosd
