Brilliant work on the Julia / Knet implementation! I've looked towards Julia with interest and curiosity given the many advantages it offers. The more DL examples on that side of the fence the better! =]
Regarding the first batch problem, you are entirely correct. However, the SHA-RNN codebase is optimized for final perplexity on enwik8 and similar documents, and hence rarely has to deal with "first batches". Teaching the model to handle them effectively generally means worse performance on long-form documents.
If you are interested in tailoring your model to handle such "first batches", you could indeed do what was in the codebase and zero out the hidden state. Better than that, however, would be to store an initial hidden state that is updated via gradients during model training. This doesn't make sense for the model I wrote, as there are only a few dozen examples of "first batches" per epoch.
The extreme version of this would be to consume part of the input, select between K initial hidden states, each tailored for a different category of input, and then run from there.
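To make the learned initial state idea concrete, here is a minimal sketch in plain Python. It is not code from the SHA-RNN repository: the scalar "RNN" step, the frozen weights `w` and `u`, and the function name `train_initial_state` are all assumptions chosen so the gradient can be written out by hand. Only the initial state `h0` is trained, and only on "first batch" examples.

```python
import math

def train_initial_state(first_inputs, targets, h0=0.0, lr=0.1, epochs=200,
                        w=0.5, u=1.0):
    """Fit only the initial hidden state h0 by gradient descent.

    Toy scalar model (an assumption, not SHA-RNN): h1 = tanh(w*h0 + u*x),
    with squared error against the target. The weights w and u stay frozen
    so that h0 is the only trainable quantity.
    """
    for _ in range(epochs):
        for x, y in zip(first_inputs, targets):
            h1 = math.tanh(w * h0 + u * x)
            # Chain rule: d/dh0 (h1 - y)^2 = 2*(h1 - y) * (1 - h1^2) * w
            grad = 2.0 * (h1 - y) * (1.0 - h1 * h1) * w
            h0 -= lr * grad
    return h0

# Hypothetical "first batch" data generated from a known initial state.
true_h0 = 0.8
xs = [0.1, -0.2, 0.3]
ys = [math.tanh(0.5 * true_h0 + 1.0 * x) for x in xs]
learned_h0 = train_initial_state(xs, ys)
```

In a real framework the same idea would usually mean registering the initial state as a trainable parameter and using it wherever the code currently passes an empty or zeroed hidden state.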
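A rough sketch of that selection step, again in plain Python and under heavy assumptions: the hand-written byte features, the three categories, and the names `prefix_features` and `select_initial_state` are hypothetical stand-ins. In practice both the selector and the K states would presumably be learned rather than fixed.

```python
def prefix_features(prefix: bytes):
    """Fractions of ASCII letters, digits, and markup-like bytes in the prefix."""
    n = max(len(prefix), 1)
    letters = sum(65 <= b <= 90 or 97 <= b <= 122 for b in prefix) / n
    digits = sum(48 <= b <= 57 for b in prefix) / n
    markup = sum(b in b"<>[]{}|=" for b in prefix) / n
    return [letters, digits, markup]

def select_initial_state(prefix: bytes, keys, states):
    """Return (index, state) for the key that best matches the prefix features."""
    feats = prefix_features(prefix)
    scores = [sum(k * f for k, f in zip(key, feats)) for key in keys]
    best = max(range(len(keys)), key=lambda i: scores[i])
    return best, states[best]

# K = 3 hypothetical categories: prose, numeric, markup.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
states = [[0.1, 0.2], [-0.3, 0.0], [0.5, -0.5]]  # placeholder initial states

idx, h0 = select_initial_state(b"hello world", keys, states)
```

The model would then start from `h0` instead of an all-zero state and continue over the rest of the input as usual.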
I've successfully reimplemented your work in Julia with the Knet DL framework here: SHA-RNN.jl.
During training I've faced some problems with the first batch of the dataset. Since no previous hidden state or attention memory exists at that point, the model finds it hard to predict the right output. And during training the model sees this case only once per epoch.
Looking for a way to deal with this issue, I saw this part in your main file:
This seems like a proper solution to the problem, but you've commented it out.
Why did you disable this part? Did this approach not help?
Thanks!