This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Use a pre-trained bigram P for LF-MMI training #222

Merged
csukuangfj merged 4 commits into k2-fsa:master from pretrained-P on Jul 2, 2021

Conversation

csukuangfj
Collaborator

Will post the result (the WER) when it is available, probably tomorrow.


Here is the information about the size of P when the number of phones is 86.

|            | current P | pre-trained P | pre-trained P after epsilon removal |
|------------|-----------|---------------|--------------------------------------|
| num_states | 88        | 74            | 74                                   |
| num_arcs   | 7568      | 3634          | 7209                                 |

If we are going to use word pieces with vocab_size 5000, I hope that P does not grow quadratically in size.
Will show the size of P for word pieces soon.
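
For reference, here is a minimal sketch (assuming k2's Python API; the file path and the way P was saved are assumptions) of how the numbers in the table above can be reproduced, i.e. the number of states and arcs of P before and after epsilon removal:

```python
# A minimal sketch, assuming k2's Python API; the path to P.pt is hypothetical
# and P is assumed to have been saved with torch.save(P.as_dict(), ...).
import torch
import k2

P = k2.Fsa.from_dict(torch.load('data/lang_nosp/P.pt'))
print('before epsilon removal:', P.shape[0], 'states,', P.num_arcs, 'arcs')

P_no_eps = k2.connect(k2.remove_epsilon(P))
print('after  epsilon removal:', P_no_eps.shape[0], 'states,', P_no_eps.num_arcs, 'arcs')
```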

@csukuangfj
Collaborator Author

csukuangfj commented Jun 30, 2021

Here are the WERs (100 hours of data, 10 epochs):

[Screenshots: WER tables for test-clean and test-other]

Results from #212 (comment)


Results with this pull request are a little worse than those of #212.


Results from #218 (comment)

#218 is the latest run that I have. The experimental setups of #218 and this pull request are similar and comparable. Compared with #218, the pre-trained P has a lower WER on both test-clean (5.74 vs 5.83) and test-other (15.00 vs 15.64).

@danpovey
Contributor

That's interesting!
To make the graphs smaller we can consider using count cutoffs (min-counts) to take away low-count n-grams.
I'm kind of confused why there is so much difference; I would have expected the two to be very similar, because how we train the graphs is very similar to ML. But it could be due to the ARPA having smoothing, or to differences regarding silence.
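
To make the min-count idea concrete, here is a small sketch (not the code used in this PR): when collecting bigram counts for P, bigrams seen fewer than min_count times are simply dropped before the FSA is built, which shrinks the number of arcs.

```python
# A hedged sketch of count cutoffs (min-counts), not the code used in this PR:
# bigrams seen fewer than `min_count` times are dropped before building P.
from collections import Counter

def prune_bigrams(sentences, min_count=2):
    """sentences: list of token-id lists. Returns the surviving bigram counts."""
    counts = Counter()
    for sent in sentences:
        for a, b in zip(sent[:-1], sent[1:]):
            counts[(a, b)] += 1
    return {bg: c for bg, c in counts.items() if c >= min_count}

# toy usage
data = [[1, 2, 3, 2, 3], [1, 2, 4]]
print(prune_bigrams(data, min_count=2))  # {(1, 2): 2, (2, 3): 2}
```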

@@ -0,0 +1,377 @@
#!/usr/bin/env python3

# Copyright 2016 Johns Hopkins University (Author: Daniel Povey)

@danpovey
Contributor

danpovey commented Jun 30, 2021 via email

@csukuangfj
Collaborator Author

I'm OK to merge this as-is.

I have removed the code supporting training P on the fly in this pull request.
Shall I add an option to let the user choose which kind of P to use? (This would make the code more complicated.)

@danpovey
Contributor

danpovey commented Jun 30, 2021 via email

@@ -88,9 +88,11 @@ def load_checkpoint(
src_key = '{}.{}'.format('module', key)
dst_state_dict[key] = src_state_dict.pop(src_key)
assert len(src_state_dict) == 0
-model.load_state_dict(dst_state_dict)
+model.load_state_dict(dst_state_dict, strict=False)
Collaborator Author

@danpovey
Adding strict=False should prevent PyTorch from complaining about the extra key P_scores in the checkpoints.
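
For reference, a tiny standalone example (not the repository's code) of what strict=False does here: the stale key is reported in unexpected_keys instead of raising a RuntimeError.

```python
# A small standalone illustration (not snowfall's code) of strict=False:
# an extra key such as P_scores is reported in unexpected_keys instead of
# aborting the load with a RuntimeError.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
state_dict = model.state_dict()
state_dict['P_scores'] = torch.zeros(3)  # stale key from an old checkpoint

result = model.load_state_dict(state_dict, strict=False)
print(result.unexpected_keys)  # ['P_scores']
print(result.missing_keys)     # []
```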

@csukuangfj
Collaborator Author

Will merge.

@danpovey
Contributor

danpovey commented Jul 1, 2021

You might want to check the strict=False option. IIRC last time I tried it the torch code was broken and was not correctly respecting that option, and I had to make changes to torch itself locally.

@csukuangfj
Collaborator Author

@danpovey

> You might want to check the strict=False option. IIRC last time I tried it the torch code was broken and was not correctly respecting that option, and I had to make changes to torch itself locally.

I suspect that you forgot to add it to the function average_checkpoint() and added it only to load_checkpoint().

I just verified that it works perfectly with strict=False to load an old model checkpoint that has P_scores.
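
For completeness, a hedged sketch of checkpoint averaging followed by a strict=False load (this is not snowfall's actual average_checkpoint; the 'state_dict' key and the file names are assumptions):

```python
# A hedged sketch, not snowfall's average_checkpoint: the 'state_dict' key and
# the file paths are assumptions. It averages parameters across checkpoints and
# loads the result with strict=False so stale keys like P_scores are ignored.
import torch

def average_state_dicts(paths):
    avg = None
    for path in paths:
        state = torch.load(path, map_location='cpu')['state_dict']
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# usage (file names are hypothetical):
# model.load_state_dict(average_state_dicts(['epoch-8.pt', 'epoch-9.pt']),
#                       strict=False)
```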

@csukuangfj csukuangfj merged commit 25051ea into k2-fsa:master Jul 2, 2021
@csukuangfj csukuangfj deleted the pretrained-P branch July 2, 2021 02:59
@pzelasko
Collaborator

pzelasko commented Jul 2, 2021

After this PR, the “pretrained” P is always going to be used, right?

Also, did you try a 3- or 4-gram (Kaldi style)? I guess it should help further.

@csukuangfj
Collaborator Author

> After this PR, the “pretrained” P is always going to be used, right?

Yes, that's right. Supporting both pre-trained P and on-the-fly trained P makes the code complicated.
Pre-trained P gives a slightly better WER according to the above experiments.


> Also, did you try a 3- or 4-gram (Kaldi style)? I guess it should help further.

Thanks. I will try that.

@danpovey
Contributor

danpovey commented Jul 3, 2021

To use a 3- or 4-gram LM, we would definitely need to do some kind of pruning, or the LM will be way too large.
Ruizhe Huang @huangruizhe is working on a self-contained Python script for Kaldi that can do that. Let's try this after he finishes it.
