Sequence models aren't learning #48
How long did you train?
I trained reverse_words for the default 100 iterations. The character log-likelihood fluctuates between 1.7 and 2.5, mostly around 2.0, but there is no discernible downward trend. When I do beam search using the smaller model, I get an error:
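For context, the quantity being tracked here, the average per-character negative log-likelihood, can be sketched as follows. This is a toy illustration, not blocks code; the function name and input format are made up for the example:

```python
import math

def avg_char_nll(correct_char_probs):
    """Average negative log-likelihood per character, the quantity
    reported during training. Lower is better; a model that is
    uniform over a V-character alphabet scores ln(V)."""
    return -sum(math.log(p) for p in correct_char_probs) / len(correct_char_probs)

# A model assigning probability 0.5 to every correct character scores ln(2):
print(avg_char_nll([0.5, 0.5, 0.5]))  # ≈ 0.693
```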
The error in beam search has just been fixed. The default number of iterations is 10000, not 100. I will try to run it on my machine and will answer you soon.

On 2 October 2015 at 06:25, scfrank [email protected] wrote:
Thanks for looking into this! I can confirm beam search works now. The larger model is now at 2000 iterations and the average character log-likelihood does seem to be decreasing (from 2.35 at iteration 10 to 1.44 at iteration 2000). So maybe this is not actually a bug but rather mistaken expectations on my part, sorry! I was expecting a very quick drop within the first tens of iterations followed by a levelling off, whereas the decrease seems to be much more gradual.

It would be nice to have an indication of expected behaviour in the README, though I suppose the default number of iterations is a hint. How long would one have to run the MT example before seeing something semi-reasonable? The config setting is 1000000 iterations; is this an "optimal performance for WMT" setting or a "bare minimum necessary" kind of setting?

FWIW, beam search for reverse_words at this checkpoint (2000) doesn't do very well yet:
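For readers unfamiliar with the decoding step under discussion, generic beam search over a next-token distribution looks roughly like this. This is a sketch, not the blocks implementation; `step_fn`, the sentinel tokens, and the data layout are all assumptions for illustration:

```python
import math

def beam_search(step_fn, start, eos, beam_width=3, max_len=10):
    """Keep the `beam_width` highest log-probability hypotheses at each
    step. step_fn(prefix) returns a dict mapping next token -> probability."""
    beams = [([start], 0.0)]          # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, p in step_fn(tuple(seq)).items():
                candidates.append((seq + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:        # hypothesis is complete
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)            # fall back to unfinished hypotheses
    return max(finished, key=lambda c: c[1])[0]

# Toy model: after "<s>", "a" is more likely than "b"; both then emit "</s>".
probs = {
    ("<s>",): {"a": 0.6, "b": 0.4},
    ("<s>", "a"): {"</s>": 1.0},
    ("<s>", "b"): {"</s>": 1.0},
}
print(beam_search(lambda prefix: probs[prefix], "<s>", "</s>", beam_width=2))
```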
When I run the reverse_words and machine_translation examples, the cost does not decrease, and (in the MT example) the generated samples are still gibberish after 80 epochs. The sqrt example works correctly, which is why I suspect the problem is specific to the sequence models.
I'm using very recent (yesterday's) git checkouts of blocks-examples, blocks, fuel, and theano (installed with pip). SciPy, NumPy, etc. come from standard pip installs. This is Python 2.7 on CPU; I've replicated the behaviour on two different machines.
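Fluctuation on the scale described above can hide a slow trend. A simple smoothing pass over the logged per-iteration cost (a generic sketch, not part of blocks) often makes a gradual decrease visible:

```python
def moving_average(values, window=50):
    """Smooth a noisy per-iteration cost series so a slow downward
    trend shows through the fluctuation. Each output element averages
    the up-to-`window` values ending at that position."""
    return [
        sum(values[max(0, i - window + 1): i + 1]) / (i - max(0, i - window + 1) + 1)
        for i in range(len(values))
    ]

print(moving_average([1, 2, 3], window=2))  # [1.0, 1.5, 2.5]
```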
theano.version '0.7.0.dev-49b554843f47f1b2bc83bb1cbf64dbcbfc70484a'
Is this a known issue?