Add CTC training #3

Merged: 26 commits into k2-fsa:master on Jul 31, 2021

Conversation

@csukuangfj (Collaborator) commented Jul 15, 2021

There are various code formatting and style issues in snowfall, since it was written by different people with different preferred styles.

This pull request tries to ensure that the code style in icefall is as consistent as possible; all of this happens automagically with the help of the following tools:

@pzelasko (Collaborator):

+1 for black, not sure about mypy — it will be very strict about typing and we might end up spending a lot of extra effort to adhere to its strict checks.

@csukuangfj (Collaborator, Author):

it will be very strict about typing and we might end up spending a lot of extra effort to adhere to its strict checks.

ok, in that case, I can remove mypy.

@csukuangfj (Collaborator, Author):

I am adding the data preparation part for the LibriSpeech recipe and trying to put everything in Python.

@pzelasko (Collaborator):

Are you considering porting prepare_lang.sh to Python? I have wanted to do it for some time now... it will be very useful for adding new recipes.

@danpovey (Collaborator):

If we do port it to Python, I don't think we need an entire Kaldi-compatible lang directory; we can just keep the things we need for k2. We might have several different possible formats, e.g. a lexicon-based version and a BPE-based version with different information. But I think writing things to files in a directory is a good idea, as it makes them easy to inspect.

mkdir -p data/LibriSpeech
# TODO

if [ ! -f data/LibriSpeech/train-other-500/.completed ]; then

Collaborator:

I don't think that this check is technically needed; i.e., the download script would figure out the completion status by itself.

OTOH, I am concerned about fixing the location of the dataset in data/LibriSpeech -- people tend to have corpora downloaded in some standard locations on their own setups, and I think this should remain customizable.

@csukuangfj (Collaborator, Author):

the download script would figure out the completion status by itself

If the zipped files are removed after extraction, the script will download them again. That's why I added this check: to avoid invoking the download script.

people tend to have corpora downloaded in some standard locations on their own setups, and I think this should remain customizable

Fixing the location of the dataset makes the code simpler. Can we let users create a symlink to their original dataset path, i.e.,

ln -s /path/to/LibriSpeech data/

as mentioned in the script

# If you have pre-downloaded it to /path/to/LibriSpeech,
# you can create a symlink to avoid downloading it again:
#
# ln -sfv /path/to/LibriSpeech data/
#

Collaborator:

Ah, I see, I missed the bit about the symbolic link before. I am OK with that.

If the zipped files are removed after extraction, the script will download them again. That's why I added this check: to avoid invoking the download script.

I see... maybe this is the right way then. I'm not sure if there is a straightforward way to address this issue inside of lhotse download.

Collaborator:

... I take that back -- I think we can change Lhotse so that the "completed detector" is executed before downloading the files (move this line a few lines up)

I can make this change later for all the recipes. WDYT @csukuangfj ?

@csukuangfj (Collaborator, Author):

I can make this change later for all the recipes.

That would be great. In that case, I won't need to add that check here.
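
For illustration, the guard being discussed follows a simple check-before-download pattern; a hedged Python sketch (the function name, the download_fn callback and the marker location are placeholders, not Lhotse's actual API):

from pathlib import Path

def download_if_needed(part_dir: Path, download_fn) -> None:
    # Sketch only: a previous successful extraction leaves a ".completed"
    # marker; if it exists, skip the download even though the original
    # .tar.gz archives may have been deleted. Otherwise download and
    # create the marker.
    marker = part_dir / ".completed"
    if marker.is_file():
        return
    download_fn()
    marker.touch()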

# ln -s /path/to/musan data/
#
if [ ! -e data/musan ]; then
wget https://www.openslr.org/resources/17/musan.tar.gz

@csukuangfj (Collaborator, Author):

Are you considering porting prepare_lang.sh to Python?

Yes, but I am going to implement only the subset of prepare_lang.sh that is needed by snowfall, i.e., no lexiconp.txt, no unk_fst, no position-dependent phones, no extra_questions.txt, no silprob, no word boundaries, and no grammar_opts. Only the parts that are currently used in snowfall will be ported to Python.

If that's OK, I will go ahead. Otherwise, I will use the current prepare_lang.sh.

output_dir = Path("data/manifests")
num_jobs = min(15, os.cpu_count())

librispeech_manifests = prepare_librispeech(

Collaborator:

I think this and the download script can be completely replaced with:

$ lhotse download librispeech --full $CORPUS_DIR
$ lhotse prepare librispeech -j $NUM_JOBS $CORPUS_DIR $MANIFEST_DIR



@contextmanager
def get_executor():

Collaborator:

This executor bit seems like a good candidate to move to the library-level?

@csukuangfj (Collaborator, Author):

Will move it to a new file local/utils.py
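
For reference, a minimal sketch of what such a context-managed executor could look like (simplified; the actual helper may also support cluster backends, and the worker count below just mirrors the num_jobs line quoted above):

import os
from contextlib import contextmanager
from concurrent.futures import ProcessPoolExecutor

@contextmanager
def get_executor(max_workers=None):
    # Yield a process pool for parallel data-preparation jobs and make
    # sure it is shut down when the "with" block exits.
    if max_workers is None:
        max_workers = min(15, os.cpu_count())
    executor = ProcessPoolExecutor(max_workers=max_workers)
    try:
        yield executor
    finally:
        executor.shutdown(wait=True)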

@pzelasko (Collaborator):

Are you considering porting prepare_lang.sh to Python?

Yes, but I am going to implement only the subset of prepare_lang.sh that is needed by snowfall, i.e., no lexiconp.txt, no unk_fst, no position-dependent phones, no extra_questions.txt, no silprob, no word boundaries, and no grammar_opts. Only the parts that are currently used in snowfall will be ported to Python.

If that's OK, I will go ahead. Otherwise, I will use the current prepare_lang.sh.

That sounds good to me!

@csukuangfj (Collaborator, Author) commented Jul 20, 2021

I've ported a subset of prepare_lang.sh to local/prepare_lang.py

Here are some test results.

Input lexicon.txt

!SIL SIL
<SPOKEN_NOISE> SPN
<UNK> SPN
f f
a a
foo f o o
bar b a r
bark b a r k
food f o o d
food2 f o o d
fo f o

The following are outputs:

lexicon_disambig.txt

!SIL SIL
<SPOKEN_NOISE> SPN #1
<UNK> SPN #2
f f #1
a a
foo f o o #1
bar b a r #1
bark b a r k
food f o o d #1
food2 f o o d #2
fo f o #1

phones.txt

<eps> 0
SIL 1
SPN 2
a 3
b 4
d 5
f 6
k 7
o 8
r 9
#0 10
#1 11
#2 12

words.txt

<eps> 0
!SIL 1
<SPOKEN_NOISE> 2
<UNK> 3
a 4
bar 5
bark 6
f 7
fo 8
foo 9
food 10
food2 11
#0 12
<s> 13
</s> 14

L.fst

L_disambig.fst
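
For readers unfamiliar with the disambiguation step shown above: a pronunciation gets a #k suffix when it occurs more than once in the lexicon or is a prefix of another pronunciation. A hedged sketch of that standard rule, which reproduces the lexicon_disambig.txt output above (names are illustrative, not necessarily those used in local/prepare_lang.py):

from collections import defaultdict

def add_disambig_symbols(lexicon):
    # lexicon: list of (word, phones) pairs, where phones is a tuple of
    # strings. Pronunciations that are repeated or that are prefixes of
    # other pronunciations get #1, #2, ... appended.
    count = defaultdict(int)      # how often each pronunciation occurs
    for _, phones in lexicon:
        count[phones] += 1

    prefixes = set()              # all proper prefixes of all pronunciations
    for _, phones in lexicon:
        for i in range(1, len(phones)):
            prefixes.add(phones[:i])

    last_used = defaultdict(int)  # pronunciation -> last #k assigned
    result = []
    for word, phones in lexicon:
        if count[phones] > 1 or phones in prefixes:
            k = last_used[phones] + 1
            last_used[phones] = k
            result.append((word, phones + (f"#{k}",)))
        else:
            result.append((word, phones))
    return result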

@danpovey (Collaborator):

Great!!
BTW, I anticipate possibly having different versions of the lexicon for BPE and phones, either separately (separate systems) or at the same time (jointly trained systems). The BPE version might use a more compact representation, e.g. as a ragged array, at the Python level, although that's separate from the on-disk representation, where we may choose something maximally human readable.
I'm not trying to dictate anything here about the formats -- I would probably overthink things -- I'm just saying this so you can keep it in mind and make your own decisions.
It might be that the Lexicon object, if there is such a thing, would have phone and BPE versions with different characteristics and perhaps some methods in common but not all. Again, I don't want to dictate any of this, just giving ideas.

@csukuangfj (Collaborator, Author):

I just found that prepare_lang.sh adds base phones to phones.txt. That is, when there are AA0, AA1, AA2 in the lexicon, the script adds a new phone AA to phones.txt.

I cannot think of any benefit of adding AA. It never appears in the training data, since it does not exist in the lexicon. Moreover, it increases the number of neural output units and never gets trained.

@danpovey Should the Python version prepare_lang.py follow prepare_lang.sh and add AA, or just ignore it?

@danpovey (Collaborator):

You can ignore it. The only time it's really needed is for the optional silence.
Note: technically the silence can appear inside words -- at least, we have no check against this -- which is why SIL_B and so on still exist.

states_needs_self_loops = set()
for arc in arcs:
src, dst, ilable, olable, score = arc
if olable != 0:

Collaborator:

lable -> label

@csukuangfj (Collaborator, Author):

Thanks. Fixed.
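
For context, a hedged sketch of what the fixed snippet presumably belongs to: states that have an outgoing arc with a non-epsilon output label get a self-loop carrying the disambiguation token/word, so that symbols like #0 can be propagated when composing L with G. The surrounding function and its arguments are assumptions based on the quoted lines:

def add_self_loops(arcs, disambig_token, disambig_word):
    # Each arc is [src, dst, ilabel, olabel, score].
    states_needs_self_loops = set()
    for arc in arcs:
        src, dst, ilabel, olabel, score = arc
        if olabel != 0:
            states_needs_self_loops.add(src)

    self_loops = [[s, s, disambig_token, disambig_word, 0]
                  for s in states_needs_self_loops]
    return arcs + self_loops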

@csukuangfj changed the title from "Add style check tools." to "Add CTC training" on Jul 24, 2021
@csukuangfj (Collaborator, Author):

Here are the decoding results with icefall (using only the encoder part) from the pre-trained model (downloaded from https://huggingface.co/GuoLiyong/snowfall_bpe_model/tree/main/exp-duration-200-feat_batchnorm-bpe-lrfactor5.0-conformer-512-8-noam, as mentioned in k2-fsa/snowfall#227)

HLG - no LM rescoring

(output beam size is 8)

1-best decoding

[test-clean-no_rescore] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore] %WER 7.03% [3682 / 52343, 220 ins, 1024 del, 2438 sub ]
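
(For reference, each bracketed entry reads total errors / reference words followed by the insertion, deletion and substitution counts; e.g. for the test-clean line above, 127 ins + 377 del + 1152 sub = 1656 errors, and 1656 / 52576 ≈ 3.15% WER.)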

n-best decoding

For n=100,

[test-clean-no_rescore-100] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore-100] %WER 7.14% [3737 / 52343, 275 ins, 1020 del, 2442 sub ]

For n=200,

[test-clean-no_rescore-200] %WER 3.16% [1660 / 52576, 125 ins, 378 del, 1157 sub ]
[test-other-no_rescore-200] %WER 7.04% [3684 / 52343, 228 ins, 1012 del, 2444 sub ]

HLG - with LM rescoring

Whole lattice rescoring

[test-clean-lm_scale_0.8] %WER 2.77% [1456 / 52576, 150 ins, 210 del, 1096 sub ]
[test-other-lm_scale_0.8] %WER 6.23% [3262 / 52343, 246 ins, 635 del, 2381 sub ]

WERs at different LM scales:

For test-clean:
lm_scale_0.8    2.77    best for test-clean
lm_scale_0.9    2.87
lm_scale_1.0    3.06
lm_scale_1.1    3.34
lm_scale_1.2    3.71
lm_scale_1.3    4.18
lm_scale_1.4    4.8
lm_scale_1.5    5.48
lm_scale_1.6    6.08
lm_scale_1.7    6.79
lm_scale_1.8    7.49
lm_scale_1.9    8.14
lm_scale_2.0    8.82

For test-other:
lm_scale_0.8    6.23    best for test-other
lm_scale_0.9    6.37
lm_scale_1.0    6.62
lm_scale_1.1    6.99
lm_scale_1.2    7.46
lm_scale_1.3    8.13
lm_scale_1.4    8.84
lm_scale_1.5    9.61
lm_scale_1.6    10.32
lm_scale_1.7    11.17
lm_scale_1.8    12.12
lm_scale_1.9    12.93
lm_scale_2.0    13.77

n-best LM rescoring

n = 100

[test-clean-lm_scale_0.8] %WER 2.79% [1469 / 52576, 149 ins, 212 del, 1108 sub ]
[test-other-lm_scale_0.8] %WER 6.36% [3329 / 52343, 259 ins, 666 del, 2404 sub ]

WERs at different LM scales:

For test-clean:
lm_scale_0.8    2.79    best for test-clean
lm_scale_0.9    2.89
lm_scale_1.0    3.03
lm_scale_1.1    3.28
lm_scale_1.2    3.52
lm_scale_1.3    3.78
lm_scale_1.4    4.04
lm_scale_1.5    4.24
lm_scale_1.6    4.45
lm_scale_1.7    4.58
lm_scale_1.8    4.7
lm_scale_1.9    4.8
lm_scale_2.0    4.92

For test-other:
lm_scale_0.8    6.36    best for test-other
lm_scale_0.9    6.45
lm_scale_1.0    6.64
lm_scale_1.1    6.92
lm_scale_1.2    7.25
lm_scale_1.3    7.59
lm_scale_1.4    7.88
lm_scale_1.5    8.13
lm_scale_1.6    8.36
lm_scale_1.7    8.54
lm_scale_1.8    8.71
lm_scale_1.9    8.88
lm_scale_2.0    9.02

n = 150

[test-clean-lm_scale_0.8] %WER 2.80% [1472 / 52576, 149 ins, 218 del, 1105 sub ]
[test-other-lm_scale_0.8] %WER 6.35% [3325 / 52343, 262 ins, 660 del, 2403 sub ]

For test-clean:
lm_scale_0.8    2.8     best for test-clean
lm_scale_0.9    2.89
lm_scale_1.0    3.05
lm_scale_1.1    3.3
lm_scale_1.2    3.56
lm_scale_1.3    3.83
lm_scale_1.4    4.1
lm_scale_1.5    4.32
lm_scale_1.6    4.56
lm_scale_1.7    4.73
lm_scale_1.8    4.88
lm_scale_1.9    5.01
lm_scale_2.0    5.14

For test-other:
lm_scale_0.8    6.35    best for test-other
lm_scale_0.9    6.44
lm_scale_1.0    6.63
lm_scale_1.1    6.91
lm_scale_1.2    7.25
lm_scale_1.3    7.62
lm_scale_1.4    7.92
lm_scale_1.5    8.23
lm_scale_1.6    8.43
lm_scale_1.7    8.67
lm_scale_1.8    8.89
lm_scale_1.9    9.08
lm_scale_2.0    9.23

@danpovey (Collaborator):

Great! By the n-best LM rescoring, are you talking about a neural or n-gram LM?

@pzelasko (Collaborator):

Nice! Do we have the scripts for training the BPE models from scratch somewhere?

@csukuangfj (Collaborator, Author):

Great! By the n-best LM rescoring, are you talking about a neural or n-gram LM?

I am only using the transformer encoder + HLG decoding + 4-gram rescoring.

Will integrate the attention decoder for rescoring.

@csukuangfj (Collaborator, Author):

Nice! Do we have the scripts for training the BPE models from scratch somewhere?

Yes, the training code is in the pull-request: k2-fsa/snowfall#219

We're porting it to icefall and polishing the training code.

@danpovey (Collaborator) commented Jul 29, 2021 via email

@csukuangfj (Collaborator, Author) commented Jul 31, 2021

The following are partial results (1-best decoding, without LM rescoring and without the attention decoder) of the following training command:
./conformer_ctc/train.py \
  --bucketing-sampler 1 \
  --num-buckets 1000 \
  --concatenate-cuts 0 \
  --max-duration 200 \
  --full-libri 1 \
  --world-size 3

It seems it is able to reproduce what Liyong has been doing. I think it is safe to merge now. I will address the remaining review comments in other PRs.

The tensorboard log is available at
https://tensorboard.dev/experiment/plZOLw07RUGYw8sGEybXyg/
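
The "average of epoch i and j" rows below presumably refer to decoding with element-wise averaged model parameters from the corresponding checkpoints; a hedged sketch of that kind of checkpoint averaging (the file layout and the "model" key are assumptions):

import torch

def average_checkpoints(filenames):
    # Load each saved checkpoint and average the model parameters
    # element-wise; the averaged weights are then used for decoding.
    n = len(filenames)
    avg = None
    for f in filenames:
        state = torch.load(f, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.detach().clone().to(torch.float64) for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].to(torch.float64)
    return {k: v / n for k, v in avg.items()}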

epoch 0

[test-clean-no_rescore] %WER 34.43% [18104 / 52576, 56 ins, 13896 del, 4152 sub ]
[test-other-no_rescore] %WER 47.82% [25032 / 52343, 49 ins, 19273 del, 5710 sub ]

epoch 1

[test-clean-no_rescore] %WER 15.64% [8225 / 52576, 121 ins, 4689 del, 3415 sub ]
[test-other-no_rescore] %WER 29.53% [15455 / 52343, 130 ins, 9454 del, 5871 sub ]

average of epoch 0 and 1

[test-clean-no_rescore] %WER 16.69% [8775 / 52576, 222 ins, 4173 del, 4380 sub ]
[test-other-no_rescore] %WER 30.20% [15810 / 52343, 282 ins, 8426 del, 7102 sub ]

epoch 2

[test-clean-no_rescore] %WER 11.83% [6220 / 52576, 140 ins, 3309 del, 2771 sub ]
[test-other-no_rescore] %WER 23.83% [12471 / 52343, 162 ins, 7268 del, 5041 sub ]

average of epoch 1 and 2

[test-clean-no_rescore] %WER 9.40% [4940 / 52576, 172 ins, 2036 del, 2732 sub ]
[test-other-no_rescore] %WER 19.25% [10075 / 52343, 247 ins, 4521 del, 5307 sub ]

average of epoch 0, 1 and 2

[test-clean-no_rescore] %WER 15.06% [7918 / 52576, 244 ins, 3516 del, 4158 sub ]
[test-other-no_rescore] %WER 29.58% [15482 / 52343, 288 ins, 8137 del, 7057 sub ]

epoch 3

[test-clean-no_rescore] %WER 10.65% [5600 / 52576, 102 ins, 3013 del, 2485 sub ]
[test-other-no_rescore] %WER 23.10% [12091 / 52343, 127 ins, 7583 del, 4381 sub ]

average of epoch 2 and 3

[test-clean-no_rescore] %WER 7.38% [3880 / 52576, 147 ins, 1483 del, 2250 sub ]
[test-other-no_rescore] %WER 16.26% [8512 / 52343, 207 ins, 3808 del, 4497 sub ]

average of epoch 1, 2 and 3

[test-clean-no_rescore] %WER 8.49% [4464 / 52576, 164 ins, 1746 del, 2554 sub ]
[test-other-no_rescore] %WER 18.20% [9528 / 52343, 248 ins, 4221 del, 5059 sub ]

average of epoch 0, 1, 2 and 3

[test-clean-no_rescore] %WER 12.93% [6800 / 52576, 224 ins, 3028 del, 3548 sub ]
[test-other-no_rescore] %WER 26.00% [13609 / 52343, 312 ins, 6635 del, 6662 sub ]

epoch 4

[test-clean-no_rescore] %WER 8.40% [4414 / 52576, 131 ins, 2048 del, 2235 sub ]
[test-other-no_rescore] %WER 18.93% [9907 / 52343, 149 ins, 5410 del, 4348 sub ]

average of epoch 3 and 4

[test-clean-no_rescore] %WER 6.24% [3280 / 52576, 150 ins, 1100 del, 2030 sub ]
[test-other-no_rescore] %WER 14.12% [7390 / 52343, 197 ins, 3058 del, 4135 sub ]

average of epoch 2, 3 and 4

[test-clean-no_rescore] %WER 6.27% [3295 / 52576, 156 ins, 1051 del, 2088 sub ]
[test-other-no_rescore] %WER 14.61% [7645 / 52343, 232 ins, 2977 del, 4436 sub ]

average of epoch 1, 2, 3 and 4

[test-clean-no_rescore] %WER 8.83% [4643 / 52576, 170 ins, 1952 del, 2521 sub ]
[test-other-no_rescore] %WER 20.18% [10564 / 52343, 247 ins, 4855 del, 5462 sub ]

average of epoch 0, 1, 2, 3 and 4

[test-clean-no_rescore] %WER 14.56% [7657 / 52576, 202 ins, 3777 del, 3678 sub ]
[test-other-no_rescore] %WER 30.02% [15714 / 52343, 277 ins, 8482 del, 6955 sub ]

epoch 5

[test-clean-no_rescore] %WER 6.97% [3665 / 52576, 126 ins, 1484 del, 2055 sub ]
[test-other-no_rescore] %WER 16.21% [8483 / 52343, 157 ins, 4283 del, 4043 sub ]

average of epoch 4 and 5

[test-clean-no_rescore] %WER 5.47% [2877 / 52576, 144 ins, 911 del, 1822 sub ]
[test-other-no_rescore] %WER 12.90% [6752 / 52343, 218 ins, 2636 del, 3898 sub ]

average of epoch 3, 4 and 5

[test-clean-no_rescore] %WER 5.34% [2808 / 52576, 154 ins, 781 del, 1873 sub ]
[test-other-no_rescore] %WER 12.39% [6487 / 52343, 217 ins, 2265 del, 4005 sub ]

average of epoch 2, 3, 4 and 5

[test-clean-no_rescore] %WER 6.00% [3153 / 52576, 154 ins, 941 del, 2058 sub ]
[test-other-no_rescore] %WER 14.27% [7471 / 52343, 247 ins, 2648 del, 4576 sub ]

average of epoch 1, 2, 3, 4 and 5

[test-clean-no_rescore] %WER 10.10% [5311 / 52576, 160 ins, 2288 del, 2863 sub ]
[test-other-no_rescore] %WER 23.36% [12229 / 52343, 221 ins, 5877 del, 6131 sub ]

average of epoch 0, 1, 2, 3, 4 and 5

[test-clean-no_rescore] %WER 22.07% [11606 / 52576, 149 ins, 6744 del, 4713 sub ]
[test-other-no_rescore] %WER 41.52% [21733 / 52343, 159 ins, 13730 del, 7844 sub ]

epoch 6

[test-clean-no_rescore] %WER 6.67% [3507 / 52576, 101 ins, 1603 del, 1803 sub ]
[test-other-no_rescore] %WER 15.59% [8160 / 52343, 141 ins, 4465 del, 3554 sub ]

average of epoch 5 and 6

[test-clean-no_rescore] %WER 5.23% [2750 / 52576, 127 ins, 944 del, 1679 sub ]
[test-other-no_rescore] %WER 12.23% [6403 / 52343, 187 ins, 2649 del, 3567 sub ]

average of epoch 4, 5 and 6

[test-clean-no_rescore] %WER 5.07% [2666 / 52576, 134 ins, 821 del, 1711 sub ]
[test-other-no_rescore] %WER 11.69% [6118 / 52343, 205 ins, 2322 del, 3591 sub ]

average of epoch 3, 4, 5 and 6

[test-clean-no_rescore] %WER 5.09% [2674 / 52576, 151 ins, 732 del, 1791 sub ]
[test-other-no_rescore] %WER 11.86% [6209 / 52343, 207 ins, 2125 del, 3877 sub ]

average of epoch 2, 3, 4, 5 and 6

[test-clean-no_rescore] %WER 5.96% [3136 / 52576, 152 ins, 951 del, 2033 sub ]
[test-other-no_rescore] %WER 14.77% [7730 / 52343, 260 ins, 2776 del, 4694 sub ]

average of epoch 1, 2, 3, 4, 5 and 6

[test-clean-no_rescore] %WER 12.04% [6328 / 52576, 151 ins, 2859 del, 3318 sub ]
[test-other-no_rescore] %WER 27.15% [14213 / 52343, 230 ins, 7208 del, 6775 sub ]

average of epoch 0, 1, 2, 3, 4, 5 and 6

[test-clean-no_rescore] %WER 34.10% [17928 / 52576, 145 ins, 11849 del, 5934 sub ]
[test-other-no_rescore] %WER 55.40% [29000 / 52343, 117 ins, 20099 del, 8784 sub ]

epoch 7

[test-clean-no_rescore] %WER 6.48% [3405 / 52576, 120 ins, 1468 del, 1817 sub ]
[test-other-no_rescore] %WER 14.98% [7843 / 52343, 139 ins, 4147 del, 3557 sub ]

average of epoch 6 and 7

[test-clean-no_rescore] %WER 5.15% [2706 / 52576, 117 ins, 987 del, 1602 sub ]
[test-other-no_rescore] %WER 12.09% [6330 / 52343, 174 ins, 2846 del, 3310 sub ]

average of epoch 5, 6 and 7

[test-clean-no_rescore] %WER 4.87% [2560 / 52576, 133 ins, 805 del, 1622 sub ]
[test-other-no_rescore] %WER 11.34% [5934 / 52343, 195 ins, 2379 del, 3360 sub ]

average of epoch 4, 5, 6 and 7

[test-clean-no_rescore] %WER 4.82% [2533 / 52576, 136 ins, 737 del, 1660 sub ]
[test-other-no_rescore] %WER 11.12% [5821 / 52343, 203 ins, 2125 del, 3493 sub ]

average of epoch 3, 4, 5, 6 and 7

[test-clean-no_rescore] %WER 4.89% [2572 / 52576, 148 ins, 699 del, 1725 sub ]
[test-other-no_rescore] %WER 11.62% [6083 / 52343, 211 ins, 2091 del, 3781 sub ]

average of epoch 2, 3, 4, 5, 6 and 7

[test-clean-no_rescore] %WER 5.95% [3128 / 52576, 153 ins, 949 del, 2026 sub ]
[test-other-no_rescore] %WER 14.77% [7730 / 52343, 250 ins, 2756 del, 4724 sub ]

average of epoch 1, 2, 3, 4, 5, 6 and 7

[test-clean-no_rescore] %WER 13.60% [7148 / 52576, 156 ins, 3317 del, 3675 sub ]
[test-other-no_rescore] %WER 29.95% [15677 / 52343, 199 ins, 8156 del, 7322 sub ]

average of epoch 0, 1, 2, 3, 4, 5, 6 and 7

[test-clean-no_rescore] %WER 47.09% [24758 / 52576, 80 ins, 18322 del, 6356 sub ]
[test-other-no_rescore] %WER 67.21% [35182 / 52343, 52 ins, 26717 del, 8413 sub ]

epoch 8

[test-clean-no_rescore] %WER 6.44% [3386 / 52576, 117 ins, 1648 del, 1621 sub ]
[test-other-no_rescore] %WER 14.59% [7639 / 52343, 142 ins, 4293 del, 3204 sub ]

average of epoch 7 and 8

[test-clean-no_rescore] %WER 5.25% [2760 / 52576, 130 ins, 1035 del, 1595 sub ]
[test-other-no_rescore] %WER 11.71% [6131 / 52343, 159 ins, 2822 del, 3150 sub ]

average of epoch 6, 7, and 8

[test-clean-no_rescore] %WER 4.83% [2542 / 52576, 126 ins, 883 del, 1533 sub ]
[test-other-no_rescore] %WER 11.07% [5793 / 52343, 189 ins, 2513 del, 3091 sub ]

average of epoch 5, 6, 7, and 8

[test-clean-no_rescore] %WER 4.68% [2462 / 52576, 135 ins, 757 del, 1570 sub ]
[test-other-no_rescore] %WER 10.74% [5621 / 52343, 196 ins, 2187 del, 3238 sub ]

average of epoch 4, 5, 6, 7, and 8

[test-clean-no_rescore] %WER 4.64% [2438 / 52576, 139 ins, 706 del, 1593 sub ]
[test-other-no_rescore] %WER 10.63% [5562 / 52343, 200 ins, 1975 del, 3387 sub ]

average of epoch 3, 4, 5, 6, 7, and 8

[test-clean-no_rescore] %WER 4.81% [2527 / 52576, 150 ins, 710 del, 1667 sub ]
[test-other-no_rescore] %WER 11.21% [5867 / 52343, 218 ins, 1992 del, 3657 sub ]

average of epoch 2, 3, 4, 5, 6, 7, and 8

[test-clean-no_rescore] %WER 6.02% [3165 / 52576, 149 ins, 984 del, 2032 sub ]
[test-other-no_rescore] %WER 14.58% [7634 / 52343, 262 ins, 2751 del, 4621 sub ]

@danpovey merged commit cf8d762 into k2-fsa:master on Jul 31, 2021
@danpovey (Collaborator):

We'll address the remaining comments from @pzelasko later on.

@danpovey (Collaborator):

Great!
BTW, perhaps we could put shared executable scripts like parse_options.sh in shared/, which can be a soft link, rather than local/.

@csukuangfj (Collaborator, Author):

Where should the shared directory be, icefall/shared?

@danpovey (Collaborator):

Sure, that sounds OK.
Also, I think it might be a good idea, in our data-directory hierarchy, to make a very clear distinction between data that might be written to by local scripts and data that is simply downloaded from elsewhere. I'd like to make it possible to set the data-download dir to some common (possibly writable) directory so that if multiple people share it, it will just do the appropriate caching.

@csukuangfj (Collaborator, Author):

Also, I think it might be a good idea, in our data-directory hierarchy, to make a very clear distinction between data that might be written to by local scripts and data that is simply downloaded from elsewhere.

I agree. Will do it.

@pzelasko (Collaborator) commented Jul 31, 2021 via email

@csukuangfj (Collaborator, Author):

Cool! Nice work. I wondered about the choice of num buckets 1000, what was your motivation?

The training options are copied from Liyong's work in
https://github.com/k2-fsa/snowfall/pull/219/files#diff-fab538403388f199d70e4194f70d212991f5f72cad14f8357c4c40d4a2699ad4

@glynpu Maybe Liyong has something to say about this.
