
Issue about backpropagation #26

Open
jiawei357 opened this issue Jan 25, 2016 · 28 comments

@jiawei357

Hi, is there any specific reason that you did the backpropagation one layer at a time?

@fzliu
Owner

fzliu commented Jan 27, 2016

Since we can't write custom GPU layers in Caffe using Python, the only way to compute losses and gradients at certain layers is to grab the activations and compute them using numpy. If you'd like faster backprop, you can try the gram-layer branch, which does the full forward & backward pass on the GPU, but requires an extra Caffe layer written in C++.
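
For context, a minimal sketch of what backprop one layer at a time can look like in pycaffe. Here layer_grads is a hypothetical mapping from a layer name to a function that returns the numpy-computed loss gradient for that layer's activations, and "data" is assumed to be the input blob; this is an illustration of the approach, not the repo's exact code:

    def backprop_layer_by_layer(net, layers, layer_grads):
        """Write numpy-computed loss gradients into each blob's diff, then let
        Caffe backpropagate each segment of the network on its own."""
        net.forward(end=layers[-1])

        rev = list(reversed(layers))
        for i, layer in enumerate(rev):
            # a view into the blob's diff: in-place updates modify the blob itself
            grad = net.blobs[layer].diff[0]
            grad += layer_grads[layer](net.blobs[layer].data[0])

            # backprop this segment only: from the current layer down to the
            # next loss layer, or all the way down if this is the last one
            if i == len(rev) - 1:
                net.backward(start=layer)
            else:
                net.backward(start=layer, end=rev[i + 1])

        # gradient with respect to the input image
        return net.blobs["data"].diff[0]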

@hermitman

I have a question that may just be due to my lack of programming knowledge.

In lines 191 and 198, the grad variable is updated with the computed gradient. However, grad seems to have no influence on the net.backward() call, as it is not used to update the network. Finally, in line 205, grad is reset to the diff of the next layer, which discards the previous grad computation.

I am confused about this part of the code. Could you help me understand it? I am not very fluent with Python, and that might be why I am lost here.

Thanks,

@jiawei357
Author

He uses it as a kind of pointer, so he can update the gradient in place, and the updated gradient is then used for backpropagation into the next layer.


@hermitman

@jiawei357

Hi, thanks for the response. I thought about this explanation, which suggests that grad is a pointer to net.blobs[layer].diff[0]. However, I found two things that I do not understand:

  1. I used id([variable]) to check the memory address of grad after assigning
    grad = net.blobs[layer].diff[0],

    versus the address of net.blobs[layer].diff[0] itself,

    and the two ids are not the same. (Is this a problem related to CPU/GPU addresses?)

  2. Is it OK to insert an additional gradient at a layer by adding the gradient computed from the local loss to the backpropagated gradient?

Thanks for the answer. I think the results are fine; I am just curious about the code.
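
A side note on point 1: each time net.blobs[layer].diff[0] is evaluated, numpy hands back a new view object over the same underlying buffer, so the id() values differ even though writes through either view land in the same memory; the differing ids do not by themselves mean the data lives somewhere else. A small plain-numpy sketch of the same behavior, with no Caffe involved:

    import numpy as np

    diff = np.zeros((3, 4), dtype=np.float32)   # stand-in for a blob's diff array

    v1 = diff[0]   # a view: a new Python object wrapping the same buffer
    v2 = diff[0]   # another view object over that same buffer

    print(id(v1) == id(v2))          # False: two distinct wrapper objects...
    print(np.shares_memory(v1, v2))  # True: ...that share the same memory

    v1 += 1.0        # an in-place update through one view...
    print(diff[0])   # ...shows up in the original array: [1. 1. 1. 1.]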

@jiawei357
Author

Well, for the first thing, I'm not sure why that happened.
What I did in this project is make a new custom loss layer (Euclidean for the content loss, and the Gram-matrix one for the style loss). In the custom layer I added the additional gradient during the backpropagation step, and Caffe took care of the rest of the BP (from the conv layers down to the data layer).
So I think you can do it either way (BP layer by layer, or using a custom Python layer).
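
For reference, a custom Python loss layer along those lines looks roughly like the Euclidean-loss example that ships with pycaffe. This is only a sketch, not the exact layer described above; the class name and normalization are illustrative:

    import numpy as np
    import caffe

    class ContentLossLayer(caffe.Layer):
        """Euclidean-style content loss as a Python layer.
        bottom[0]: generated activations, bottom[1]: precomputed target activations."""

        def setup(self, bottom, top):
            if len(bottom) != 2:
                raise Exception("Need two bottoms: generated and target activations.")

        def reshape(self, bottom, top):
            self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
            top[0].reshape(1)  # the loss is a scalar

        def forward(self, bottom, top):
            self.diff[...] = bottom[0].data - bottom[1].data
            top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.0

        def backward(self, top, propagate_down, bottom):
            # the gradient flows only into the generated activations; the
            # precomputed target input gets zero gradient
            if propagate_down[0]:
                bottom[0].diff[...] = self.diff / bottom[0].num
            if propagate_down[1]:
                bottom[1].diff[...] = 0.0

In the prototxt, such a layer is declared with type: "Python" and a python_param block pointing at the module and class.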


@jiawei357
Author

Not sure if I stated that clearly enough.


@hermitman

@jiawei357

Thanks for the clarification. So in your implementation, you have multiple loss computations at different conv layers, which are used as the content/style layers?

Could I take a look at your network prototxt? I think that should answer the question xD

@jiawei357
Author

I'm currently on spring break, so I can't send you my prototxt. What I did is have multiple input layers: one for the white-noise image, and others for the precomputed style Gram matrices or content activations, plus a custom layer that takes the outputs of all of those input layers and the conv layers.


@hermitman

@jiawei357 And your loss layer will do backpropagation from the end of the network to the input?

@jiawei357
Author

In a custom loss layer you only have to define the gradient for the bottom layers.


@hermitman

What is the bottom layer? The input?

@hermitman

I mean the input layers that connect to your loss layer. So, if you have one style layer and one content layer connected to the custom loss, then your loss will backprop to each of them, respectively.

@fzliu
Owner

fzliu commented Mar 4, 2016

Hey there - didn't get a chance to read through the whole thread, but this might be of interest to you: https://github.com/fzliu/style-transfer/tree/gram-layer.

@jiawei357
Author

The bottom layers also include the conv layers of your network; that's where we want to set the gradient. The gradient for the input layers can be set to zero.
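
Put differently: once the custom loss layers have written their gradients into their bottom blobs' diffs, a single backward pass carries them through the conv stack down to the input. A rough sketch of one optimization step, assuming the input blob is called "data", the prototxt sets force_backward: true so the diff is propagated all the way to the input, and img_current is a placeholder for the optimizer's current image estimate:

    # one evaluation of the losses and of the gradient w.r.t. the input image
    net.blobs["data"].data[...] = img_current   # current estimate, e.g. white noise
    net.forward()                                # custom layers compute the losses
    net.backward()                               # Caffe backprops through the conv stack
    grad_wrt_image = net.blobs["data"].diff[0].copy()   # handed to the optimizer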


@hermitman

@fzliu I just had a question about how the "grad" in the master branch's style_optfn gets used. From the code, I do not see any reference to the computed "grad".

@hermitman

@jiawei357

Hmm, I am still confused here = =!

So a gradient that is computed at your custom loss layer will travel through all the conv layers and finally reach the input image?

@hermitman

Hey, all:

I think I got the idea from reading the code in the gram layer branch. Thanks @jiawei357 @fzliu

@hermitman

@fzliu One last thing: where is the protobuf that has the gramianParameter defined? I couldn't locate it, and my Caffe gives me an error for not having it.

@fzliu
Owner

fzliu commented Mar 4, 2016

You'll need a custom version of Caffe which contains the necessary layer definition: https://github.com/dpaiton/caffe/tree/gramian

@hermitman

Got it, thanks!

@hermitman

@fzliu I ran the code on both branches, and the results are not the same. The master branch produces reasonable results, while the gram-layer branch gives a really strange result:
[attached image: johannesburg-starry_night-vgg19-content-1e4-256]

@hermitman

@fzliu After some digging, I think the problem is that the network is not using the style loss at all. I can reproduce the above error when I turn off the grad updates from the style layers. Any ideas?

@hermitman

@jiawei357 are you using the same gramian layer implementation from https://github.com/dpaiton/caffe/tree/gramian?

@hermitman

@fzliu More observations:

I validated the output of the gramian layer in the modified Caffe.

I think the output of the gramian layer is not correct. If I compare the output of the gramian layer against the result of simply computing the matrix product of the convolution layer's output, the numbers do not match.

e.g.

In the network, there is a connection: conv1_1 -> conv1_1/gramian.

The output of conv1_1/gramian should be the inner products of conv1_1's output. However, the result does not match the manual computation for conv1_1 using scipy's sgemm.

Am I the only one having problems with the Gramian layer?

Thanks,
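
A check along these lines can be done directly in pycaffe. The blob names below follow the connection mentioned above; note that some Gram-layer implementations normalize by the number of spatial positions, so a constant scale factor alone would not indicate a real mismatch:

    import numpy as np

    act = net.blobs["conv1_1"].data[0]   # conv1_1 activations, shape (C, H, W)
    C = act.shape[0]
    F = act.reshape(C, -1)               # one row per feature map, flattened spatially

    gram_manual = F.dot(F.T)             # inner products between feature maps
    gram_layer = net.blobs["conv1_1/gramian"].data[0]
    # depending on how the layer lays out its output, a reshape to (C, C) may be needed
    gram_layer = gram_layer.reshape(C, C)

    print(np.allclose(gram_manual, gram_layer, rtol=1e-3))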

@fzliu
Owner

fzliu commented Mar 4, 2016

Try this one instead: https://github.com/fzliu/caffe/tree/gram-layer. I don't quite remember how it's different, but I recall making some minor changes to the original gram layer implementation. I'll look into merging it into dpaiton's branch soon.

@hermitman

@fzliu It works this time, thanks. I took a look at the layer implementation but could not find an obvious difference; I think the main issue might be how the pointers or data dimensions are handled.

I do have another question that I want to ask:

How does Python decide when to assign by reference versus making a copy? I found several places in the code where you use .copy() and others where you use plain assignment. When we copy a Caffe blob, do we need to use assignment or .copy()?

I found several instances of these operations:

  1. {master branch} -> style.py:184, you use grad as a "pointer" to update the network's blob directly.
  2. {gram-layer} -> style.py:405, you copy data from one blob to another.
  3. {master branch} -> style.py:139, you explicitly use a shallow copy to create a copy of the blob and manipulate the copy's values.

In 1 and 2, the assignment clearly behaves differently: in 1 it is a reference, while in 2 it is a copy.

I thought I understood Python assignment and copying well, but I find it hard to tell these situations apart... = =! Please teach me.

Thanks,
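
A short illustration of the difference, with a plain numpy array standing in for a blob's diff: plain assignment just binds another name, indexing/slicing produces a view over the same memory, and .copy() allocates independent storage.

    import numpy as np

    blob_diff = np.zeros((2, 3), dtype=np.float32)   # stand-in for net.blobs[...].diff

    alias = blob_diff            # assignment: another name for the same array
    view = blob_diff[0]          # indexing/slicing: a view sharing the same memory
    snapshot = blob_diff.copy()  # .copy(): an independent array with its own storage

    blob_diff[0, 0] = 7.0
    print(alias[0, 0], view[0], snapshot[0, 0])   # 7.0 7.0 0.0

    grad = blob_diff[0]   # the "pointer" pattern from case 1:
    grad += 1.0           # in-place ops write straight into blob_diff,
    grad = np.ones(3)     # but rebinding the name no longer touches blob_diff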

@boother

boother commented Oct 18, 2016

Hi, everybody! Could somebody share the prototxt mentioned above?
Thanks!

@lgyhero

lgyhero commented Aug 14, 2017

@jiawei357 Could you please tell me your e-mail address? I've also defined a custom layer using PyCaffe, but I ran into some trouble when overriding the 'backward()' function. I hope to get some advice from you. Thanks!
