RuntimeError is encountered when training cifar10_rnn_gate_rl_38 #7

VictoriaYyz · 2019-04-11T01:29:32Z

I get a RuntimeError when training cifar10_rnn_gate_rl_38:

04-11-19 09:10:start training cifar10_rnn_gate_rl_38
04-11-19 09:10:=> loading checkpoint ./save_checkpoints/cifar10_rnn_gate_38/model_best.pth.tar
04-11-19 09:10:=> loaded checkpoint ./save_checkpoints/cifar10_rnn_gate_38/model_best.pth.tar (iter: 59000)
Files already downloaded and verified
Files already downloaded and verified
start: 0
04-11-19 09:10:Iter [0] learning rate = 0.0001
Traceback (most recent call last):
File "train_rl.py", line 492, in <module>
main()
File "train_rl.py", line 121, in main
run_training(args)
File "train_rl.py", line 235, in run_training
R = r + args.gamma * R
File "/seu_share/home/zhanjun/anaconda3/envs/pytorch0.2/lib/python3.6/site-packages/torch/tensor.py", line 293, in __add__
return self.add(other)
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:217

I didn't change any of the default configuration. Please help. Thanks.

xinw1012 · 2019-04-11T08:33:05Z

Hi Victoria, what's your PyTorch version? It's likely some APIs have changed in the newer version of PyTorch. I'm working on updating the code to the new version and hope to release it soon. Thanks!

VictoriaYyz · 2019-04-12T03:57:12Z

I use PyTorch 0.2 and Python 3.6. My CUDA version is 9.0.
I googled the error, and it seems that the following code causes the problem:
R = - pred_loss.data
R = r + args.gamma * R
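
For anyone debugging this: the message "sizes do not match" means `r` and `R` have different sizes at the moment of the addition. The thread does not show the actual shapes, so the following is only a diagnostic sketch with made-up values. The helper name `discounted_step` and the tensors are hypothetical and are not part of `train_rl.py`. It wraps the same recursion `R = r + gamma * R` in an explicit size check so that a mismatch is reported with both sizes.

```python
import torch


def discounted_step(r, R, gamma):
    """One step of R = r + gamma * R with an explicit size check.

    Hypothetical helper for debugging; not taken from train_rl.py.
    """
    if r.size() != R.size():
        raise ValueError(
            "reward size {} does not match return size {}".format(
                tuple(r.size()), tuple(R.size())
            )
        )
    return r + gamma * R


# Made-up per-sample loss and reward for a batch of 4.
pred_loss = torch.Tensor([0.5, 1.0, 0.25, 2.0])
R = -pred_loss
r = torch.Tensor([0.1, 0.1, 0.1, 0.1])

R = discounted_step(r, R, gamma=0.99)
print(R.size())
```

Printing `r.size()` and `R.size()` right before line 235 of `train_rl.py` should show which of the two has the unexpected size.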

13597862 · 2022-03-23T07:30:20Z

Excuse me, I encountered the bug too. Can you run the code normally using the command "python3 train_rl.py train cifar10_rnn_gate_rl_110 --resume resnet-110-rnn-sp-cifar10.pth.tar -d cifar10 --gate-type rnn"?

The error output is as follows:
Traceback (most recent call last):
File "train_rl.py", line 492, in <module>
main()
File "train_rl.py", line 121, in main
run_training(args)
File "train_rl.py", line 217, in run_training
output, masks, probs = model(input_var)
File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wym/skipnet-master/cifar/models.py", line 1243, in forward
mask, gprob = self.control(gate_feature)
File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wym/skipnet-master/cifar/models.py", line 1136, in forward
action = bi_prob.multinomial()
TypeError: multinomial() missing 1 required positional arguments: "num_samples"
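
The last line of this traceback says that `multinomial()` was called without `num_samples`, which the installed PyTorch requires. A minimal sketch of the call with the argument supplied is below. The probabilities are made up, and only the name `bi_prob` comes from the traceback. Whether the rest of the gate code in `models.py` needs further changes for a newer PyTorch is not shown in this thread, so treat this as an illustration of the call signature only.

```python
import torch

# Made-up two-way gate probabilities for a batch of 3 samples.
bi_prob = torch.Tensor([[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]])

# Pass num_samples explicitly: draw one action index per row.
action = bi_prob.multinomial(num_samples=1)

# Log-probability of each sampled action, as typically used in a
# policy-gradient loss (an assumption about how the action is consumed).
log_prob = torch.log(bi_prob.gather(1, action))

print(action.size(), log_prob.size())
```

Each row of `action` holds a sampled index (0 or 1 here), so it can be used directly as an index into `bi_prob`.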

akinsanyaayomide · 2022-11-14T12:44:58Z

> I use PyTorch 0.2 and Python 3.6. My CUDA version is 9.0.
> I googled the error, and it seems that the following code causes the problem:
> R = - pred_loss.data
> R = r + args.gamma * R

Hello @VictoriaYyz, have you been able to resolve this issue?
