
Issue about backpropagation #26

Open
jiawei357 opened this issue Jan 25, 2016 · 28 comments

@jiawei357

Hi, is there any specific reason that you did the backpropagation one layer at a time?

@fzliu
Owner

fzliu commented Jan 27, 2016

Since we can't write custom GPU layers in Caffe using Python, the only way to compute losses and gradients at certain layers is to grab the activations and compute them using numpy. If you'd like faster backprop, you can try the gram-layer branch, which does the full forward & backward pass on the GPU, but requires an extra Caffe layer written in C++.
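
For context, a minimal sketch of what backprop one layer at a time can look like in pycaffe. Here layer_grads is a hypothetical mapping from a layer name to a function that returns the numpy-computed loss gradient for that layer's activations, and "data" is assumed to be the input blob; this is an illustration of the approach, not the repo's exact code:

    def backprop_layer_by_layer(net, layers, layer_grads):
        """Write numpy-computed loss gradients into each blob's diff, then let
        Caffe backpropagate each segment of the network on its own."""
        net.forward(end=layers[-1])

        rev = list(reversed(layers))
        for i, layer in enumerate(rev):
            # a view into the blob's diff: in-place updates modify the blob itself
            grad = net.blobs[layer].diff[0]
            grad += layer_grads[layer](net.blobs[layer].data[0])

            # backprop this segment only: from the current layer down to the
            # next loss layer, or all the way down if this is the last one
            if i == len(rev) - 1:
                net.backward(start=layer)
            else:
                net.backward(start=layer, end=rev[i + 1])

        # gradient with respect to the input image
        return net.blobs["data"].diff[0]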

@hermitman

I have a question that may just be due to my lack of programming knowledge.

In lines 191 and 198, the grad variable is updated with the computed gradient. However, grad seems to have no influence on the net.backward() call, as it is not used to update the network. Finally, in line 205, grad is reset to the diff of the next layer, which discards the previous grad computation.

I am confused about this part of the code. Could you help me understand it? I am not very fluent with Python, and that might be why I am lost here.

Thanks,

@jiawei357
Author

He uses it as a kind of pointer, so he can update the gradient in place, and the updated gradient is then used for backpropagation into the next layer.


@hermitman

@jiawei357

Hi, thanks for the response. I thought about this explanation, which suggests that grad is a pointer to net.blobs[layer].diff[0]. However, I found two things that I do not understand:

  1. I used id([variable]) to check the memory address of grad after assigning
    grad = net.blobs[layer].diff[0],

    versus the address of net.blobs[layer].diff[0] itself,

    and the two ids are not the same. (Is this a problem related to CPU/GPU addresses?)

  2. Is it OK to insert an additional gradient at a layer by adding the gradient computed from the local loss to the backpropagated gradient?

Thanks for the answer. I think the results are fine; I am just curious about the code.
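
A side note on point 1: each time net.blobs[layer].diff[0] is evaluated, numpy hands back a new view object over the same underlying buffer, so the id() values differ even though writes through either view land in the same memory; the differing ids do not by themselves mean the data lives somewhere else. A small plain-numpy sketch of the same behavior, with no Caffe involved:

    import numpy as np

    diff = np.zeros((3, 4), dtype=np.float32)   # stand-in for a blob's diff array

    v1 = diff[0]   # a view: a new Python object wrapping the same buffer
    v2 = diff[0]   # another view object over that same buffer

    print(id(v1) == id(v2))          # False: two distinct wrapper objects...
    print(np.shares_memory(v1, v2))  # True: ...that share the same memory

    v1 += 1.0        # an in-place update through one view...
    print(diff[0])   # ...shows up in the original array: [1. 1. 1. 1.]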

@jiawei357
Author

Well, for the first thing, I'm not sure why that happened.
What I did in this project is make a new custom loss layer (Euclidean for the content loss, and the Gram-matrix one for the style loss). In the custom layer I added the additional gradient during the backpropagation step, and Caffe took care of the rest of the BP (from the conv layers down to the data layer).
So I think you can do it either way (BP layer by layer, or using a custom Python layer).
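
For reference, a custom Python loss layer along those lines looks roughly like the Euclidean-loss example that ships with pycaffe. This is only a sketch, not the exact layer described above; the class name and normalization are illustrative:

    import numpy as np
    import caffe

    class ContentLossLayer(caffe.Layer):
        """Euclidean-style content loss as a Python layer.
        bottom[0]: generated activations, bottom[1]: precomputed target activations."""

        def setup(self, bottom, top):
            if len(bottom) != 2:
                raise Exception("Need two bottoms: generated and target activations.")

        def reshape(self, bottom, top):
            self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
            top[0].reshape(1)  # the loss is a scalar

        def forward(self, bottom, top):
            self.diff[...] = bottom[0].data - bottom[1].data
            top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.0

        def backward(self, top, propagate_down, bottom):
            # the gradient flows only into the generated activations; the
            # precomputed target input gets zero gradient
            if propagate_down[0]:
                bottom[0].diff[...] = self.diff / bottom[0].num
            if propagate_down[1]:
                bottom[1].diff[...] = 0.0

In the prototxt, such a layer is declared with type: "Python" and a python_param block pointing at the module and class.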


@jiawei357
Author

Not sure if I stated that clearly enough.


@hermitman

@jiawei357

Thanks for the clarification. So in your implementation, you have multiple loss computations at different conv layers, which are used as the content/style layers?

Could I take a look at your network prototxt? I think that should answer the question xD

@jiawei357
Author

I'm currently on spring break, so I can't send you my prototxt. What I did is have multiple input layers: one for the white-noise image, and others for the precomputed style Gram matrices or content activations, plus a custom layer that takes the outputs of all of those input layers and the conv layers.


@hermitman

@jiawei357 And your loss layer will do backpropagation from the end of the network to the input?

@jiawei357
Author

In a custom loss layer you only have to define the gradient for the bottom layers.


@hermitman

What is the bottom layer? The input?

@hermitman

I mean the input layers that connect to your loss layer. So, if you have one style layer and one content layer connected to the custom loss, then your loss will backprop to each of them, respectively.

@fzliu
Owner

fzliu commented Mar 4, 2016

Hey there - didn't get a chance to read through the whole thread, but this might be of interest to you: https://github.com/fzliu/style-transfer/tree/gram-layer.

@jiawei357
Author

The bottom layers also include the conv layers of your network; that's where we want to set the gradient. The gradient for the input layers can be set to zero.
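
Put differently: once the custom loss layers have written their gradients into their bottom blobs' diffs, a single backward pass carries them through the conv stack down to the input. A rough sketch of one optimization step, assuming the input blob is called "data", the prototxt sets force_backward: true so the diff is propagated all the way to the input, and img_current is a placeholder for the optimizer's current image estimate:

    # one evaluation of the losses and of the gradient w.r.t. the input image
    net.blobs["data"].data[...] = img_current   # current estimate, e.g. white noise
    net.forward()                                # custom layers compute the losses
    net.backward()                               # Caffe backprops through the conv stack
    grad_wrt_image = net.blobs["data"].diff[0].copy()   # handed to the optimizer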


@hermitman

@fzliu I just had a question about how the "grad" in the master branch's style_optfn gets used. From the code, I do not see any reference to the computed "grad".

@hermitman

@jiawei357

Hmm, I am still confused here = =!

So a gradient that is computed at your custom loss layer will travel through all the conv layers and finally reach the input image?

@hermitman

Hey, all:

I think I got the idea from reading the code in the gram layer branch. Thanks @jiawei357 @fzliu

@hermitman

@fzliu One last thing: where is the protobuf that has the gramianParameter defined? I couldn't locate it, and my Caffe gives me an error for not having it.

@fzliu
Owner

fzliu commented Mar 4, 2016

You'll need a custom version of Caffe which contains the necessary layer definition: https://github.com/dpaiton/caffe/tree/gramian

@hermitman

Got it, thanks!

@hermitman

@fzliu I ran the code on both branches, and the results are not the same. The master branch produces reasonable results, while the gram-layer branch gives a really strange result:
[attached image: johannesburg-starry_night-vgg19-content-1e4-256]

@hermitman

@fzliu After some digging, I think the problem is that the network is not using the style loss at all. I can reproduce the above error when I turn off the grad updates from the style layers. Any ideas?

@hermitman

@jiawei357 are you using the same gramian layer implementation from https://github.com/dpaiton/caffe/tree/gramian?

@hermitman

@fzliu More observations:

I validated the output of the gramian layer in the modified Caffe.

I think the output of the gramian layer is not correct. If I compare the output of the gramian layer against the result of simply computing the matrix product of the convolution layer's output, the numbers do not match.

e.g.

In the network, there is a connection: conv1_1 -> conv1_1/gramian.

The output of conv1_1/gramian should be the inner products of conv1_1's output. However, the result does not match the manual computation for conv1_1 using scipy's sgemm.

Am I the only one having problems with the Gramian layer?

Thanks,
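
A check along these lines can be done directly in pycaffe. The blob names below follow the connection mentioned above; note that some Gram-layer implementations normalize by the number of spatial positions, so a constant scale factor alone would not indicate a real mismatch:

    import numpy as np

    act = net.blobs["conv1_1"].data[0]   # conv1_1 activations, shape (C, H, W)
    C = act.shape[0]
    F = act.reshape(C, -1)               # one row per feature map, flattened spatially

    gram_manual = F.dot(F.T)             # inner products between feature maps
    gram_layer = net.blobs["conv1_1/gramian"].data[0]
    # depending on how the layer lays out its output, a reshape to (C, C) may be needed
    gram_layer = gram_layer.reshape(C, C)

    print(np.allclose(gram_manual, gram_layer, rtol=1e-3))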

@fzliu
Owner

fzliu commented Mar 4, 2016

Try this one instead: https://github.com/fzliu/caffe/tree/gram-layer. I don't quite remember how it's different, but I recall making some minor changes to the original gram layer implementation. I'll look into merging it into dpaiton's branch soon.

@hermitman

@fzliu It works this time, thanks. I took a look at the layer implementation but could not find an obvious difference; I think the main issue might be how the pointers or data dimensions are handled.

I do have another question that I want to ask:

How does Python decide when to assign by reference versus making a copy? I found several places in the code where you use .copy() and others where you use plain assignment. When we copy a Caffe blob, do we need to use assignment or .copy()?

I found several instances of these operations:

  1. {master branch} -> style.py:184, you use grad as a "pointer" to update the network's blob directly.
  2. {gram-layer} -> style.py:405, you copy data from one blob to another.
  3. {master branch} -> style.py:139, you explicitly use a shallow copy to create a copy of the blob and manipulate the copy's values.

In 1 and 2, the assignment clearly behaves differently: in 1 it is a reference, while in 2 it is a copy.

I thought I understood Python assignment and copying well, but I find it hard to tell these situations apart... = =! Please teach me.

Thanks,
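
A short illustration of the difference, with a plain numpy array standing in for a blob's diff: plain assignment just binds another name, indexing/slicing produces a view over the same memory, and .copy() allocates independent storage.

    import numpy as np

    blob_diff = np.zeros((2, 3), dtype=np.float32)   # stand-in for net.blobs[...].diff

    alias = blob_diff            # assignment: another name for the same array
    view = blob_diff[0]          # indexing/slicing: a view sharing the same memory
    snapshot = blob_diff.copy()  # .copy(): an independent array with its own storage

    blob_diff[0, 0] = 7.0
    print(alias[0, 0], view[0], snapshot[0, 0])   # 7.0 7.0 0.0

    grad = blob_diff[0]   # the "pointer" pattern from case 1:
    grad += 1.0           # in-place ops write straight into blob_diff,
    grad = np.ones(3)     # but rebinding the name no longer touches blob_diff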

@boother

boother commented Oct 18, 2016

Hi, everybody! Could somebody share the prototxt mentioned above?
Thanks!

@lgyhero

lgyhero commented Aug 14, 2017

@jiawei357 Could you please tell me your e-mail address? I've also defined a custom layer using PyCaffe, but I ran into some trouble when overriding the 'backward()' function. I hope to get some advice from you. Thanks!
