Sequence models aren't learning #48
How long did you train?
I trained reverse_words for the default 100 iterations. The character log-likelihood fluctuates between 1.7 and 2.5, mostly around 2.0, but there is no discernible downward trend. When I do beam search using the smaller model, I get an error:
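For context, the quantity being tracked here, the average per-character negative log-likelihood, can be sketched as follows. This is a toy illustration, not blocks code; the function name and input format are made up for the example:

```python
import math

def avg_char_nll(correct_char_probs):
    """Average negative log-likelihood per character, the quantity
    reported during training. Lower is better; a model that is
    uniform over a V-character alphabet scores ln(V)."""
    return -sum(math.log(p) for p in correct_char_probs) / len(correct_char_probs)

# A model assigning probability 0.5 to every correct character scores ln(2):
print(avg_char_nll([0.5, 0.5, 0.5]))  # ≈ 0.693
```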
The error in beam search has just been fixed. The default number of iterations is 10000, not 100. I will try to run it on my machine and will answer you soon.

On 2 October 2015 at 06:25, scfrank [email protected] wrote:
Thanks for looking into this! I can confirm beam search works now. The larger model is now at 2000 iterations and the average character log-likelihood does seem to be decreasing (from 2.35 at iteration 10 to 1.44 at iteration 2000). So maybe this is not actually a bug but rather mistaken expectations on my part, sorry! I was expecting a very quick drop within the first tens of iterations followed by a levelling off, whereas the decrease seems to be much more gradual.

It would be nice to have an indication of expected behaviour in the README, though I suppose the default number of iterations is a hint. How long would one have to run the MT example before seeing something semi-reasonable? The config setting is 1000000 iterations; is this an "optimal performance for WMT" setting or a "bare minimum necessary" kind of setting?

FWIW, beam search for reverse_words at this checkpoint (2000) doesn't do very well yet:
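For readers unfamiliar with the decoding step under discussion, generic beam search over a next-token distribution looks roughly like this. This is a sketch, not the blocks implementation; `step_fn`, the sentinel tokens, and the data layout are all assumptions for illustration:

```python
import math

def beam_search(step_fn, start, eos, beam_width=3, max_len=10):
    """Keep the `beam_width` highest log-probability hypotheses at each
    step. step_fn(prefix) returns a dict mapping next token -> probability."""
    beams = [([start], 0.0)]          # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, p in step_fn(tuple(seq)).items():
                candidates.append((seq + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:        # hypothesis is complete
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)            # fall back to unfinished hypotheses
    return max(finished, key=lambda c: c[1])[0]

# Toy model: after "<s>", "a" is more likely than "b"; both then emit "</s>".
probs = {
    ("<s>",): {"a": 0.6, "b": 0.4},
    ("<s>", "a"): {"</s>": 1.0},
    ("<s>", "b"): {"</s>": 1.0},
}
print(beam_search(lambda prefix: probs[prefix], "<s>", "</s>", beam_width=2))
```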
When I run the reverse_words and machine_translation examples, the cost does not decrease, and (in the MT example) the generated samples are still gibberish after 80 epochs. The sqrt example works correctly, which is why I suspect the problem is specific to the sequence models.
I'm using very recent (yesterday's) git checkouts of blocks-examples, blocks, fuel, and theano (installed with pip). SciPy, NumPy, etc. come from standard pip installs. This is Python 2.7 on CPU; I've replicated the behaviour on two different machines.
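Fluctuation on the scale described above can hide a slow trend. A simple smoothing pass over the logged per-iteration cost (a generic sketch, not part of blocks) often makes a gradual decrease visible:

```python
def moving_average(values, window=50):
    """Smooth a noisy per-iteration cost series so a slow downward
    trend shows through the fluctuation. Each output element averages
    the up-to-`window` values ending at that position."""
    return [
        sum(values[max(0, i - window + 1): i + 1]) / (i - max(0, i - window + 1) + 1)
        for i in range(len(values))
    ]

print(moving_average([1, 2, 3], window=2))  # [1.0, 1.5, 2.5]
```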
theano.version '0.7.0.dev-49b554843f47f1b2bc83bb1cbf64dbcbfc70484a'
Is this a known issue?