Multiple GPU #520

Open
MinaGabriel opened this issue Jun 3, 2020 · 4 comments
Comments

@MinaGabriel

I am trying to run training on two GPUs:

StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5

I keep getting the following error. I assume it occurs because the weights are on the CPU while the input is on the GPU. Is that correct?


Traceback (most recent call last):
  File "/home/lambda/PyTorch-Yolov3/train.py", line 115, in <module>
    loss, outputs = model(imgs, targets)
  File "/home/lambda/anaconda3/envs/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lambda/PyTorch-Yolov3/models.py", line 252, in forward
    x = module(x)
  File "/home/lambda/anaconda3/envs/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lambda/anaconda3/envs/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/lambda/anaconda3/envs/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lambda/anaconda3/envs/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 349, in forward
    return self._conv_forward(input, self.weight)
  File "/home/lambda/anaconda3/envs/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 345, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 1 (while checking arguments for cudnn_convolution)
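
One way to test the assumption above is to compare the device of every model parameter with the device of the input batch. Note that the message names device 0 and device 1, which suggests both tensors are on GPUs, just different ones. The helper below is a minimal sketch, not part of this repository: `find_device_mismatches` is a hypothetical name, and the `Conv2d` layer in the demo is only a stand-in for the real model.

```python
def find_device_mismatches(named_tensors, expected_device):
    """Return (name, device) pairs whose device differs from expected_device.

    named_tensors: iterable of (name, tensor) pairs, for example the result
    of model.named_parameters(). Devices are compared as strings such as
    "cpu" or "cuda:1".
    """
    expected = str(expected_device)
    return [
        (name, str(tensor.device))
        for name, tensor in named_tensors
        if str(tensor.device) != expected
    ]


if __name__ == "__main__":
    import torch

    # Stand-in layer and batch (assumption: the real model and images
    # would be passed here instead).
    layer = torch.nn.Conv2d(3, 16, kernel_size=3)
    imgs = torch.randn(2, 3, 32, 32)

    # An empty list means every parameter lives on the same device as imgs.
    print(find_device_mismatches(layer.named_parameters(), imgs.device))
```

Calling it right before the failing `model(imgs, targets)` line with `model.named_parameters()` and `imgs.device` would list any parameter that sits on a different device from the input.
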
@ScottHoang

Are you using the PyTorch distributed package? If so, did you set the default CUDA device for each process according to its local rank? If you did not, this error occurs.
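
For reference, setting the default CUDA device per local rank could look roughly like the sketch below. It assumes the script is started with `torchrun`, which sets the `LOCAL_RANK` environment variable for each process. The helper names (`local_rank_from_env`, `setup_device`) and the `Conv2d` stand-in model are hypothetical and not taken from this repository's `train.py`.

```python
import os


def local_rank_from_env(env=os.environ, default=0):
    """Read this process's GPU index from LOCAL_RANK (set by torchrun)."""
    return int(env.get("LOCAL_RANK", default))


def setup_device(local_rank):
    """Make cuda:<local_rank> the default CUDA device for this process."""
    import torch  # imported lazily so the helper above works without torch

    torch.cuda.set_device(local_rank)
    return torch.device("cuda", local_rank)


def main():
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    local_rank = local_rank_from_env()
    # Uses the env:// rendezvous variables that torchrun provides.
    dist.init_process_group(backend="nccl")
    device = setup_device(local_rank)

    # Stand-in model (assumption); weights and inputs go to the same device.
    model = torch.nn.Conv2d(3, 16, kernel_size=3).to(device)
    model = DDP(model, device_ids=[local_rank])

    imgs = torch.randn(2, 3, 32, 32, device=device)
    outputs = model(imgs)
    print(f"rank {dist.get_rank()} ran on {outputs.device}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With two GPUs this would be launched as `torchrun --nproc_per_node=2 script.py`, so that each process works only with its own device.
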

@jxhno1

jxhno1 commented Jun 17, 2020

Could you share some pipeline advice for multi-GPU training? Thanks a lot! @voodoopotato

@genqiaolynn

Did you succeed in getting multi-GPU training to work? Thanks!

@Flova
Collaborator

Flova commented Aug 2, 2021

I will not add multi-GPU training in the near future. If anybody wants to open a PR, feel free.
