Sequence models aren't learning #48

scfrank opened this issue Oct 1, 2015 · 4 comments


scfrank commented Oct 1, 2015

When I run the reverse_words and machine_translation examples, the cost does not decrease, and (in the MT example) the generated samples are still gibberish after 80 epochs. The sqrt example works correctly, which is why I suspect the problem is specific to the sequence models.

I'm using very recent (yesterday's) git checkouts of blocks-examples, blocks, fuel and theano (installed with pip). SciPy, NumPy, etc. are standard pip installs. I'm using Python 2.7 on CPUs, and I've replicated this behaviour on two different machines.
theano.__version__ = '0.7.0.dev-49b554843f47f1b2bc83bb1cbf64dbcbfc70484a'

Is this a known issue?


rizar commented Oct 1, 2015

How long did you train reverse_words? Did you try to use beam_search mode?


scfrank commented Oct 2, 2015

I trained reverse_words for the default 100 iterations. The character log-likelihood fluctuates between 1.7 and 2.5, mostly around 2.0, but there's no discernible downward trend.
In the first instance I only ran on one of the billion-word files; I'm now running on the full dataset (using the default Fuel wrapper) and so far (iteration 20 or so) I'm seeing the same behaviour: log-likelihood fluctuations but no stable decrease. (Clearly each batch will have a different cost, but I expect a general downward trend, especially at the beginning.)
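(For reference, here is how I'm judging the trend: I smooth the noisy per-batch costs before comparing them. This is just a minimal sketch; `batch_costs` is a stand-in for the values printed by the training loop, not anything from blocks itself.)

```python
# Minimal sketch: smooth noisy per-batch costs with an exponential moving
# average so a gradual downward trend (if any) becomes visible.
def smooth(batch_costs, alpha=0.05):
    """Return an exponentially smoothed copy of the cost curve."""
    smoothed = []
    running = batch_costs[0]
    for cost in batch_costs:
        running = alpha * cost + (1 - alpha) * running
        smoothed.append(running)
    return smoothed

if __name__ == "__main__":
    import random
    # Synthetic example: a slowly decreasing cost buried in batch-to-batch
    # noise looks flat when read raw, but the smoothed curve shows the trend.
    raw = [2.0 - 0.0005 * i + random.uniform(-0.3, 0.3) for i in range(2000)]
    print(smooth(raw)[::500])  # prints a clearly decreasing sequence
```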

When I do beam search using the smaller model, I get an error:

$python -m reverse_words beam_search rev_words
[model is loaded]
Enter a sentence
hi
Enter the beam size
3
Encoder input: [42, 7, 8, 43]
Target:  [42, 8, 7, 43]
Traceback (most recent call last):
  File "/home/sfrank1/.local/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/sfrank1/.local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/datastore/home/sfrank1/smt/nmt/blocks-examples/reverse_words/__main__.py", line 42, in <module>
    main(**vars(args))
  File "reverse_words/__init__.py", line 314, in main
    batch_size, axis=1))
  File "reverse_words/__init__.py", line 274, in generate
    ComputationGraph(generated[1]))
ValueError: too many values to unpack
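(For reference, the ValueError itself is just the usual single-target unpacking pattern failing because the right-hand side yielded more than one item; a tiny standalone illustration, not the actual blocks code:)

```python
# Hypothetical illustration of the error pattern only, not the blocks code:
# unpacking into a single target raises ValueError when the right-hand side
# contains more than one element.
variables = ["outputs_0", "outputs_1"]  # pretend the filter matched two variables

try:
    samples, = variables  # expects exactly one element
except ValueError as exc:
    print(exc)  # -> too many values to unpack
```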


rizar commented Oct 2, 2015

The error in beam search has just been fixed.

The default number of iterations is 10000, not 100, but I guess this is what you ran.

I will try to run it on my machine and will answer you soon.



scfrank commented Oct 2, 2015

Thanks for looking into this! I can confirm beam search works now. The larger model is now at 2000 iterations and the average character log-likelihood does seem to be decreasing (from 2.35 at iteration 10 to 1.44 at iteration 2000). So maybe this is not actually a bug but rather mistaken expectations on my part, sorry! I was expecting a very quick drop within the first tens of iterations and then a levelling off, whereas the decrease is much more gradual.

It would be nice to have some indication of expected behaviour in the README, though I suppose the default number of iterations is a hint. How long would one have to run the MT example before seeing something semi-reasonable? The config setting is 1000000 iterations; is this an "optimal performance for WMT" setting or a "bare minimum necessary" kind of setting?

FWIW, beam search for reverse_words at this checkpoint (2000) doesn't do very well yet:

Enter a sentence
the sun is shining
Enter the beam size
2
Encoder input: [42, 19, 7, 4, 41, 18, 20, 13, 41, 8, 18, 41, 18, 7, 8, 13, 8, 13, 6, 43]
Target:  [42, 4, 7, 19, 41, 13, 20, 18, 41, 18, 8, 41, 6, 13, 8, 13, 8, 7, 18, 43]

(66.2302337189)<S>eht ssisisis si sehtis si sehtis si sehtis si sehtis si seh
(66.0378270722)<S>eht ssisisis si sehtis si sehtis si sehtis si sehtis si sis
