best way to skip gradient for activation functions #4176

bionicles · 2020-08-29T18:09:05Z

bionicles
Aug 29, 2020

to implement hebbian descent as a tinkerer I want to skip gradients for the activation functions

refs on the idea:
https://arxiv.org/abs/1905.10585 Hebbian-Descent
https://arxiv.org/abs/1905.12937 A Hippocampus Model for Online One-Shot Storage of Pattern Sequences

TLDR: skip activation function gradients and center the network

How do you make a wrapper to skip the gradients for activation functions? I'm a bit of a noob on this but I assume I need to pass the tangents along?

This works, but only if you don't use it inside flax.nn.module (otherwise you get assert self.master.trace_type is StagingJaxprTrace assertion error) -- and I'm not sure if it wipes earlier gradients

@jax.custom_jvp
def tanh_skip_grad(x):
    return jnp.tanh(x)

@tanh_skip_grad.defjvp
def tanh_skip_grad_jvp(primals, tangents):
    (x,) = primals
    (x_dot, ) = tangents
    return tanh_skip_grad(x), x_dot

So I went to abstract it into a function transform

def skip_grad(fun):
    @jax.custom_jvp
    def f_skip_grad(*args, **kwargs):
        return fun(*args, **kwargs)
    @f_skip_grad.defjvp
    def f_skip_grad_jvp(primals, tangents):
        return f_skip_grad(*primals), tangents[0]
    return f_skip_grad

This gives an AssertionError about --> 291 assert self.master.trace_type is StagingJaxprTrace both inside and outside of flax.nn.module definitions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

best way to skip gradient for activation functions #4176

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

best way to skip gradient for activation functions #4176

bionicles Aug 29, 2020

Replies: 0 comments

bionicles
Aug 29, 2020