I'm training a basic CIFAR10 classifier (two Dense layers) on multiple GPUs with the torch backend (see the code below). The code works fine when the net is written in torch. When it is written in Keras, it raises the following error at line 95:
RuntimeError: Exception encountered when calling Dense.call().
Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
Arguments received by Dense.call():
• inputs=torch.Tensor(shape=torch.Size([192, 3, 32, 32]), dtype=float32)
• training=None
Instead of nn.DataParallel, maybe consider torch.nn.parallel.DistributedDataParallel, which provides more robust multi-GPU training support?
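For reference, a minimal sketch of what a DistributedDataParallel setup could look like for this kind of script, with one process per GPU. The nn.Sequential stand-in model, the master address/port, and the dummy batch are placeholders rather than code from this issue:

import os
import torch
import torch.nn as nn
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # One process per GPU; every process joins the same default process group.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Hypothetical stand-in for the net in this issue (two fully connected layers).
    net = nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, 512),
        nn.ReLU(),
        nn.Linear(512, 10),
    ).cuda(rank)
    net = DDP(net, device_ids=[rank])

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

    # Dummy CIFAR10-shaped batch, just to show one forward/backward step per process.
    inputs = torch.randn(64, 3, 32, 32, device=f"cuda:{rank}")
    targets = torch.randint(0, 10, (64,), device=f"cuda:{rank}")
    loss = criterion(net(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size,), nprocs=world_size)

Unlike nn.DataParallel, each process here owns a single device, so the inputs, the model replica, and its weights all live on the same GPU.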
The code is below:
import os
os.environ[ "KERAS_BACKEND" ] = "torch"
os.environ[ "PYTORCH_CUDA_ALLOC_CONF" ] = "expandable_segments:True"
import time
import datetime
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from model import pyramidnet
import keras
num_epochs = 100
batch_size = 768
num_workers = torch.cuda.device_count()
print( 'Running on {} GPUs'.format( num_workers ) )
lr = 0.01
def main():
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

def train( net, criterion, optimizer, train_loader, device ):
    net.train()  # switch the net to training mode

if __name__ == '__main__':
    main()
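Since the snippet cuts off before the model is built, here is a rough sketch of how the Keras version of the net might be set up under the same nn.DataParallel-style wrapping that the torch version presumably used. The layer widths, the DataParallel call, and the dummy forward pass are assumptions based on the description above, not the exact script that raised the traceback:

import os
os.environ["KERAS_BACKEND"] = "torch"
import torch
import torch.nn as nn
import keras

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hypothetical two-Dense-layer CIFAR10 net written in Keras; the widths are placeholders.
net = keras.Sequential([
    keras.Input(shape=(3, 32, 32)),
    keras.layers.Flatten(),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(10),
])

# With the torch backend, Keras models are torch.nn.Module instances, so wrapping them
# like a plain torch net is at least syntactically possible (assumed here, not a confirmed fix).
net = nn.DataParallel(net).to(device)

# A batch of 768 scattered across 4 GPUs gives 192 samples per replica, which matches the
# torch.Size([192, 3, 32, 32]) shape in the traceback; the issue reports the device-mismatch
# error being raised inside Dense.call() during a forward pass like this one.
out = net(torch.randn(768, 3, 32, 32).to(device))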