Problem submission - Learning gradient descent with synthetic objectives #10

Open: wants to merge 23 commits into master
Commits (23, all by cjratcliff):
- 462e86d  Create learning-gradient-descent-with-synthetic-objectives.md (Nov 9, 2016)
- e44d308, cb5ed68, 09bcf59, 686edb6, d4691d1, 7a97aff, b9b8050, a93633f, ba0b3bb, c61cb89, 247d891, 5807bee, 410a5b4, fafad48, a71ff34, b240a26, b329475, e2b9caf, 976f149, 2a8d5a2, a8ab79b  Update learning-gradient-descent-with-synthetic-objectives.md (Nov 9-12, 2016)
- a14ba86  Removed some white space. (Nov 12, 2016)
32 changes: 32 additions & 0 deletions projects/learning-gradient-descent-with-synthetic-objectives.md
@@ -0,0 +1,32 @@
Title: Learning Gradient Descent with Synthetic Objective Functions
Tagline: Develop techniques for training gradient descent optimizers for neural networks
Date: November 2016
Category: Fundamental Research
Mailing list: https://groups.google.com/forum/#!forum/aion-learning-gradient-descent-with-synthetic-objectives
Contact: Chris Ratcliff - [email protected]


## Problem description
Current optimization algorithms for neural networks, such as SGD, RMSProp and Adam, are hand-crafted and generally quite simple. This can be partly explained by the high-dimensional, non-convex nature of neural networks' objective functions, for which human intuition, largely confined to three spatial dimensions, is poorly suited. A learning algorithm may therefore be able to design a superior optimizer.

Recently, Andrychowicz et al. approached the problem by training an LSTM that takes the current gradient and its hidden state as input and outputs the proposed update to the parameters of the network being trained. They trained on one optimization problem at a time (such as an MLP on the MNIST dataset) but found that the learned optimizer failed to generalize properly, even to networks with the same architecture but a different activation function. Using 'synthetic' objective functions (explicitly specified functions, in the same way a quadratic is) allows an arbitrary number of functions to be generated at negligible cost, improving generalization by making the training set effectively infinite.
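
To make the setup concrete, here is a minimal sketch of such a learned update rule, with the LSTM replaced by a tiny hand-rolled recurrent cell so the example stays self-contained. The cell, its sizes and its (untrained, random) weights are illustrative assumptions, not the architecture from the paper or from this project:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HIDDEN = 5, 8

# Tiny recurrent cell standing in for the LSTM. As in the paper, it is
# applied coordinate-wise: each parameter has its own hidden state, but
# all coordinates share the same weights.
Wg = rng.normal(size=HIDDEN) * 0.1            # input weights for the gradient
Wh = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1  # recurrent weights
Wo = rng.normal(size=(HIDDEN, 1)) * 0.1       # readout to a scalar update

def optimizer_cell(grad, h):
    """Consume the gradient and hidden state; emit a proposed update."""
    h = np.tanh(grad[:, None] * Wg + h @ Wh)
    return (h @ Wo)[:, 0], h

f = lambda x: np.sum(x ** 2)                  # toy objective being minimized
grad_f = lambda x: 2 * x

theta = rng.normal(size=DIM)                  # parameters of the trained net
h = np.zeros((DIM, HIDDEN))                   # one hidden state per coordinate
for _ in range(100):
    update, h = optimizer_cell(grad_f(theta), h)
    theta = theta + update                    # the cell chooses sign and size
```

In the full approach the cell's weights would themselves be trained, by backpropagating through the unrolled trajectory or, as discussed under Project status, by reinforcement learning, so that the proposed updates drive the objective down.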


## Why this problem matters
Optimization is at the heart of deep learning, with the choice of algorithm affecting both accuracy and training time. As neural networks become deeper, they are also likely to become harder to train, necessitating more sophisticated optimizers.


## How to measure success
A graph of training loss against the number of iterations for a network trained under different algorithms is commonly used to judge, by simple visual inspection, which algorithm is better. One may also plot loss against wall-clock time rather than iterations. This is a stricter metric for a learned optimizer, since one step of an LSTM is far more expensive to compute than the single scalar multiplication per parameter used by standard SGD.
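
A hedged sketch of what such an evaluation harness could look like; the one-step optimizer interface and the SGD baseline below are assumptions for illustration, not part of the project:

```python
import time
import numpy as np
import matplotlib.pyplot as plt

def record_curves(optimizers, f, grad_f, theta0, n_steps=200):
    """Run each optimizer from the same starting point, logging loss
    against both iteration count and elapsed wall-clock time."""
    curves = {}
    for name, step_fn in optimizers.items():
        theta, losses, times = theta0.copy(), [], []
        t0 = time.perf_counter()
        for _ in range(n_steps):
            theta = step_fn(theta, grad_f(theta))
            losses.append(f(theta))
            times.append(time.perf_counter() - t0)
        curves[name] = (losses, times)
    return curves

f = lambda x: np.sum(x ** 2)                 # stand-in objective
grad_f = lambda x: 2 * x
sgd = lambda theta, g: theta - 0.1 * g       # plain SGD baseline
curves = record_curves({"SGD": sgd}, f, grad_f, np.ones(10))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for name, (losses, times) in curves.items():
    ax1.plot(losses, label=name)             # loss vs. iterations
    ax2.plot(times, losses, label=name)      # loss vs. wall-clock time
ax1.set_xlabel("iteration")
ax1.set_ylabel("training loss")
ax1.legend()
ax2.set_xlabel("seconds")
plt.show()
```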


## Project status
A formula for generating synthetic objective functions has been created. These functions are differentiable, and their dimensionality and degree of non-linearity can be controlled with hyperparameters.
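
The project's actual formula is not reproduced here; the following is a hedged sketch of one construction with the stated properties, i.e. differentiable everywhere, with dimensionality and degree of non-linearity controlled by hyperparameters:

```python
import numpy as np

def make_synthetic_objective(dim, n_terms, scale=1.0, seed=None):
    """Build a random, differentiable objective and its exact gradient.

    `dim` sets the dimensionality; `n_terms` and `scale` set the degree
    of non-linearity. This is one plausible construction, not the
    project's formula.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_terms, dim))
    b = rng.normal(size=n_terms)

    def f(x):
        # Smooth non-linear terms on top of a quadratic bowl, so the
        # function is bounded below and differentiable everywhere.
        return scale * np.sum(np.tanh(W @ x + b) ** 2) + 0.5 * np.dot(x, x)

    def grad_f(x):
        t = np.tanh(W @ x + b)
        # d/dx of tanh(Wx + b)^2 summed over terms, plus the bowl's x.
        return scale * (2 * t * (1 - t ** 2)) @ W + x

    return f, grad_f

# Every fresh seed yields a new function, so the supply of training
# objectives is effectively unlimited.
f, grad_f = make_synthetic_objective(dim=10, n_terms=32, seed=1)
```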

A proof-of-concept optimizer trained with supervised learning has shown that the approach does generalize well, but it currently performs no better than SGD. An alternative approach using reinforcement learning is theoretically superior, as it does not have to approximate the task's objective, but it has not produced good results so far.


## References
- [Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T. and de Freitas, N. Learning to Learn by Gradient Descent by Gradient Descent.](https://arxiv.org/pdf/1606.04474v1.pdf)
- [Kingma, D. and Ba, J. Adam: A Method for Stochastic Optimization.](https://arxiv.org/abs/1412.6980)
- [Koushik, J. and Hayashi, H. Improving Stochastic Gradient Descent with Feedback.](https://arxiv.org/pdf/1611.01505.pdf)