
activation function of guidance map #10

Open
gybuaa opened this issue Dec 2, 2020 · 20 comments

gybuaa commented Dec 2, 2020

The original official code used 'sigmoid' as the activation function of the second 1×1 conv, but this code uses 'tanh'. Does it perform better than the official version? Thanks for your great work!

creotiv (Owner) commented Dec 2, 2020

So tanh was there because I thought that grid_sample needs input in [-1, 1] (the docs say so), but as I understand now, that was bad intuition. Right now I use a sigmoid function for the guide, and it seems to have started working at last.
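To make the [-1, 1] requirement concrete, here is a minimal slicing sketch (the shapes and sizes are hypothetical, not this repo's exact code): grid_sample expects sampling coordinates in [-1, 1], so a sigmoid guide living in [0, 1] has to be rescaled before it is used as the z coordinate.

```python
import torch
import torch.nn.functional as F

B, C, D, Hg, Wg = 1, 12, 8, 16, 16     # hypothetical bilateral-grid shape
H, W = 64, 64                          # full-res output size

bilateral_grid = torch.randn(B, C, D, Hg, Wg)
guide = torch.sigmoid(torch.randn(B, H, W))      # guidance map in [0, 1]

# Normalized (x, y) sampling coordinates in [-1, 1].
hg, wg = torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
hg = hg.expand(B, -1, -1)
wg = wg.expand(B, -1, -1)
z = guide * 2 - 1                      # rescale sigmoid output to [-1, 1]

# grid_sample expects the last dim ordered (x, y, z).
grid = torch.stack([wg, hg, z], dim=3).unsqueeze(1)   # [B, 1, H, W, 3]
coeffs = F.grid_sample(bilateral_grid, grid, align_corners=True)
print(coeffs.shape)                    # [B, 12, 1, H, W]
```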

gybuaa (Author) commented Dec 3, 2020

> So tanh was there because I thought that grid_sample needs input in [-1, 1] (the docs say so), but as I understand now, that was bad intuition. Right now I use a sigmoid function for the guide, and it seems to have started working at last.

Thanks for your reply! I have seen it in the newest version. So have you trained it on FiveK's 5,000 images? I generated 5,000 .jpg images and am about to make some modifications to the architecture of this net. I think there could be a more direct way to get the full-res [B, 12, H, W] affine coefficients. I found that this model tends to make images brighter, losing the original true and delicate colors. Do you plan to share your training results in the future? Hoping for further communication with you!

creotiv (Owner) commented Dec 3, 2020 via email

gybuaa (Author) commented Dec 9, 2020

Hi, have you added any data augmentation? I am training on a 32 GB Tesla V100. I find that this model is quite small, yet training is very difficult: PSNR on both the train and test sets grows too slowly. I don't understand why training is so hard, since the model has few trainable parameters (just several conv layers and FC layers).

creotiv (Owner) commented Dec 9, 2020 via email

gybuaa (Author) commented Dec 10, 2020

I trained it on the paper's 5,000-image FiveK dataset, using 500 of the images as the validation set. The paper's final results reach about 30 PSNR after 2-3 days of training; so far I only get 15 PSNR after 24 hours. The results just look brighter than the input images, and some don't work at all. Have you reproduced the results of this paper? And how long did you train?
Thanks a lot for your reply and patience!

creotiv (Owner) commented Dec 10, 2020 via email

creotiv (Owner) commented Dec 10, 2020 via email

gybuaa (Author) commented Dec 10, 2020

Yes, I use the latest code from master. I use the default parameters as in the paper: 1e-4 learning rate, Adam optimizer, and default momentum settings. Since I use a 32 GB GPU, I raised the batch size to 32 and 64.

creotiv (Owner) commented Dec 10, 2020 via email

gybuaa (Author) commented Dec 10, 2020

I am not sure what you mean by 'luma' and spatial bins. The low-res coefficient output of the bilateral grid has shape [B, 12*8, H, W]; is the '8' the spatial-bin grid's channel count, and should I change it to 16? And what does 'luma' refer to? I will try it again! Thanks :)
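For reference, a minimal sketch of the shape bookkeeping, assuming the paper's defaults of 8 luma bins and a 16×16 spatial grid (these numbers are assumptions, not necessarily this repo's): the channel dimension packs 12 affine parameters per luma bin, while the spatial bins are the feature map's height and width.

```python
import torch

B, luma_bins, spatial = 2, 8, 16       # assumed defaults from the paper
coeffs = torch.randn(B, 12 * luma_bins, spatial, spatial)

# 12 affine parameters (a 3x4 color transform) per grid cell; the 8 counts
# luma (guidance-intensity) bins, and the 16x16 are the spatial bins.
grid = coeffs.view(B, 12, luma_bins, spatial, spatial)
print(grid.shape)                      # [2, 12, 8, 16, 16]
```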

creotiv (Owner) commented Dec 11, 2020 via email

gybuaa (Author) commented Dec 22, 2020

Hi. Did you find that PyTorch's grid_sample function is too slow? I measured that it costs 0.2-0.4 seconds per image, which means there's still some distance to real-time video processing...
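One caveat when timing this: CUDA kernels launch asynchronously, so naive wall-clock timing around a single call can report almost anything. A timing sketch with warm-up and explicit synchronization (toy sizes, hypothetical resolution) might look like:

```python
import time
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
grid3d = torch.randn(1, 12, 8, 16, 16, device=device)
coords = torch.rand(1, 1, 540, 960, 3, device=device) * 2 - 1

for _ in range(3):                     # warm-up runs
    F.grid_sample(grid3d, coords, align_corners=True)
if device == "cuda":
    torch.cuda.synchronize()           # drain queued kernels before timing
t0 = time.perf_counter()
out = F.grid_sample(grid3d, coords, align_corners=True)
if device == "cuda":
    torch.cuda.synchronize()
print(out.shape, f"{time.perf_counter() - t0:.4f}s")
```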

creotiv (Owner) commented Dec 23, 2020 via email

gybuaa (Author) commented Dec 28, 2020

Hi, considering that trilinear interpolation needs (x, y, z) coordinates scaled to [-1, 1], but the sampled 'z', generated by the 1×1 point-wise conv with a 'sigmoid' activation, is always positive: does that mean the first 4 channels of our luma bins are never used? Do you think that is appropriate?
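A small toy experiment seems to confirm this concern (toy tensors and sizes, not this repo's code): with a grid whose value along depth equals the bin index, a z in [0, 1] fed straight to grid_sample only ever reaches the upper half of the 8 luma bins.

```python
import torch
import torch.nn.functional as F

# Toy grid whose value along depth equals the bin index, so the sampled
# value reveals which of the 8 luma bins are actually reachable.
depth = torch.arange(8.0).view(1, 1, 8, 1, 1).expand(1, 1, 8, 4, 4).contiguous()

z = torch.rand(1, 1, 4, 4)             # sigmoid-like guide, values in [0, 1)
xy = torch.zeros(1, 1, 4, 4, 2)        # sample the spatial center
grid = torch.cat([xy, z.unsqueeze(-1)], dim=-1)   # last dim = (x, y, z)

out = F.grid_sample(depth, grid, align_corners=True)
print(out.min().item())                # never below 3.5: bins 0..3 unused
```

With align_corners=True, z in [0, 1] maps to depth index (z+1)/2·(8-1), i.e. [3.5, 7], so the lower four bins are indeed never hit.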

QiuJueqin commented

> Hi, considering that trilinear interpolation needs (x, y, z) coordinates scaled to [-1, 1], but the sampled 'z', generated by the 1×1 point-wise conv with a 'sigmoid' activation, is always positive: does that mean the first 4 channels of our luma bins are never used? Do you think that is appropriate?

Same thought. Using nn.Tanh as the activation would make more sense.

creotiv (Owner) commented Feb 11, 2022

I've added bilateral_slice from the original repo, compiled for JIT, but it still has some problems with optimization for some reason.
So I think grid_sample was working correctly.

QiuJueqin commented

After some comparison with my customized trilinear interpolation, which consists of multiple 2D bilinear interpolations, I'm now pretty sure that the second argument to F.grid_sample (grid) should be something like

torch.cat([wg, hg, guidemap], dim=3).unsqueeze(1)

instead of

torch.cat([hg, wg, guidemap], dim=3).unsqueeze(1)

Furthermore, elements of grid along all axes should be in the [-1, 1] range, not [0, 1], which means the activation in the guidance net should be torch.tanh instead of torch.sigmoid.

The result of my customized slicing operator is very similar to that of F.grid_sample with the inputs formatted as above; the absolute error is smaller than 1e-5:

all close with atol=1E-6:  False
all close with atol=1E-5:  True
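The axis-order claim can be checked with a tiny 2D probe (toy tensor, not the repo's code): in F.grid_sample the last grid dimension is ordered (x, y[, z]), where x walks the width (last) input axis, which is why wg must come before hg.

```python
import torch
import torch.nn.functional as F

inp = torch.arange(6.0).view(1, 1, 2, 3)   # H=2, W=3: [[0,1,2],[3,4,5]]
# Sample at x=+1, y=-1: if x walks the width axis, this hits value 2.0
# (top-right); if the order were (y, x), it would hit 3.0 (bottom-left).
grid = torch.tensor([[[[1.0, -1.0]]]])     # shape [1, 1, 1, 2]
val = F.grid_sample(inp, grid, align_corners=True)
print(val.item())                          # 2.0 -> order is (x, y)
```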

Varato commented May 25, 2022

> I trained it on the paper's 5,000-image FiveK dataset, using 500 of the images as the validation set. The paper's final results reach about 30 PSNR after 2-3 days of training; so far I only get 15 PSNR after 24 hours. The results just look brighter than the input images, and some don't work at all. Have you reproduced the results of this paper? And how long did you train?

Hi, I've been trying the model recently. I think the fundamental difficulty of training this model is that the guide prediction net and the bilateral grid (the low-res affine coefficients) update simultaneously. The bilateral grid is like a dictionary and the guide is like the keys used to look it up; when updated together, they have to constantly adapt to each other. I can only get a PSNR of about 18 on FiveK and don't know how to improve it.

creotiv (Owner) commented May 25, 2022 via email
