Releases: SeanNaren/deepspeech.pytorch
Release V3
This release of deepspeech.pytorch moves to PyTorch Lightning!
Previous release checkpoints are not compatible, as much of the code base was deprecated and cleaned up. Please use V2.1 if you need compatibility with older checkpoints.
- Rely on PyTorch Lightning for training
- Use the native CTC loss, removing warp-ctc (see the sketch after this list)
- Refactor model objects and clean up technical debt
- Move towards a JSON structure for manifest files
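On the CTC point above: with warp-ctc removed, the loss can be computed with PyTorch's built-in `torch.nn.CTCLoss`. A minimal sketch, with illustrative tensor shapes and blank index rather than the repo's internals:

```python
import torch
import torch.nn as nn

# Illustrative shapes: T=50 frames, N=4 utterances, C=29 characters (index 0 = blank).
log_probs = torch.randn(50, 4, 29, requires_grad=True).log_softmax(dim=-1)  # (T, N, C)
targets = torch.randint(1, 29, (4, 20), dtype=torch.long)                   # (N, S) label indices
input_lengths = torch.full((4,), 50, dtype=torch.long)                      # valid frames per utterance
target_lengths = torch.randint(10, 21, (4,), dtype=torch.long)              # valid labels per utterance

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```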
Pre-Trained models
AN4
Training command:
python train.py +configs=an4
Test Command:
python test.py model.model_path=an4_pretrained_v3.ckpt test_path=data/an4_test_manifest.json
Dataset | WER | CER |
---|---|---|
AN4 test | 9.573 | 5.515 |
Download here.
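Beyond test.py, the V3 .ckpt files are ordinary PyTorch Lightning checkpoints, so they can also be loaded programmatically. A minimal sketch, assuming the model class is exposed as deepspeech_pytorch.model.DeepSpeech (verify the exact module path against the V3 code base):

```python
import torch

# Assumed import path/class name; check the repo before relying on it.
from deepspeech_pytorch.model import DeepSpeech

# load_from_checkpoint is the standard PyTorch Lightning API for restoring a
# LightningModule together with the hyper-parameters saved in the checkpoint.
model = DeepSpeech.load_from_checkpoint("an4_pretrained_v3.ckpt")
model.eval()

# The checkpoint itself is a plain dict and can be inspected directly if needed.
ckpt = torch.load("an4_pretrained_v3.ckpt", map_location="cpu")
print(sorted(ckpt.keys()))   # typically includes 'state_dict', 'hyper_parameters', ...
```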
Librispeech
Training command:
python train.py +configs=librispeech
Test Command:
python test.py model.model_path=librispeech.ckpt test_path=libri_test_clean_manifest.json
python test.py model.model_path=librispeech.ckpt test_path=libri_test_other_manifest.json
Dataset | WER | CER |
---|---|---|
Librispeech clean | 10.463 | 3.399 |
Librispeech other | 28.285 | 12.036 |
With a 3-gram ARPA LM and tuned alpha/beta values (alpha=1.97, beta=4.36, beam-width=1024):
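Here alpha scales the language-model score and beta rewards word insertions during beam search, following the Deep Speech 2-style objective Q(y) = log p_ctc(y|x) + alpha * log p_lm(y) + beta * word_count(y). A toy sketch of how one beam hypothesis would be rescored (illustrative only; the actual decoding happens inside the beam decoder used by test.py):

```python
import math

def rescore(log_p_ctc: float, log_p_lm: float, num_words: int,
            alpha: float = 1.97, beta: float = 4.36) -> float:
    """Combine acoustic, LM, and word-insertion terms for one beam hypothesis."""
    return log_p_ctc + alpha * log_p_lm + beta * num_words

# Two hypothetical hypotheses for the same utterance: the LM term can flip the ranking
# even when the second hypothesis has a slightly worse acoustic (CTC) score.
print(rescore(log_p_ctc=-4.1, log_p_lm=math.log(1e-5), num_words=3))
print(rescore(log_p_ctc=-4.6, log_p_lm=math.log(1e-3), num_words=3))
```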
Test Command:
python test.py model.model_path=librispeech.ckpt test_path=data/libri_test_clean_manifest.json lm.decoder_type=beam lm.alpha=1.97 lm.beta=4.36 lm.beam_width=1024 lm.lm_path=3-gram.arpa lm.lm_workers=16
python test.py model.model_path=librispeech.ckpt test_path=data/libri_test_other_manifest.json lm.decoder_type=beam lm.alpha=1.97 lm.beta=4.36 lm.beam_width=1024 lm.lm_path=3-gram.arpa lm.lm_workers=16
Dataset | WER | CER |
---|---|---|
Librispeech clean | 7.062 | 2.984 |
Librispeech other | 19.984 | 11.178 |
Download here.
TEDLIUM
Training command:
python train.py +configs=tedlium
Test Command:
python test.py model.model_path=ted_pretrained_v3.ckpt test_path=ted_test_manifest.json
Dataset | WER | CER |
---|---|---|
Ted test | 28.056 | 10.548 |
Download here.
Release V2.1
This is the last release before the PyTorch Lightning integration, kept for anyone who would like to use the old code base before the pivot to Lightning.
AN4
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/an4_train_manifest.csv --val-manifest data/an4_val_manifest.csv --epochs 70 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 32 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id an4 --checkpoint --save-folder deepspeech.pytorch/an4/ --model-path deepspeech.pytorch/an4/deepspeech_final.pth
Test Command:
python test.py --model-path an4_pretrained_v2.pth --test-manifest data/an4_val_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
AN4 test | 10.349 | 7.076 |
Download here.
Librispeech
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/libri_train_manifest.csv --val-manifest data/libri_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id libri --checkpoint --save-folder deepspeech.pytorch/librispeech/ --model-path deepspeech.pytorch/librispeech/deepspeech_final.pth
Test Command:
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_clean.csv --cuda --half
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_other.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Librispeech clean | 9.919 | 3.307 |
Librispeech other | 28.116 | 12.040 |
With a 3-gram ARPA LM and tuned alpha/beta values (alpha=1.97, beta=4.36, beam-width=1024):
Test Command:
python test.py --test-manifest libri_test_clean.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
python test.py --test-manifest libri_test_other.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
Dataset | WER | CER |
---|---|---|
Librispeech clean | 6.654 | 2.705 |
Librispeech other | 19.889 | 10.467 |
Download here.
TEDLIUM
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/ted_train_manifest.csv --val-manifest data/ted_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id ted --checkpoint --save-folder deepspeech.pytorch/tedlium/ --model-path deepspeech.pytorch/tedlium/deepspeech_final.pth
Test Command:
python test.py --model-path ted_pretrained_v2.pth --test-manifest data/ted_test_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Ted test | 30.886 | 11.196 |
Download here.
Release V2
Supplied is a set of pre-trained networks that can be used for evaluation on academic datasets. Do not expect these models to perform well on your own data! They are heavily tuned to the datasets they were trained on.
Most results are given using greedy decoding, with additional WER/CER reported for LibriSpeech using a language model. Expect a well-trained language model to reduce WER/CER substantially.
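Greedy decoding here means taking the argmax character per frame, collapsing repeats, and dropping CTC blanks, with no language model involved. A minimal sketch of that rule (the label set and blank index are illustrative):

```python
import torch

def greedy_ctc_decode(log_probs: torch.Tensor, labels: str, blank: int = 0) -> str:
    """log_probs: (T, C) per-frame log probabilities for a single utterance."""
    best_path = log_probs.argmax(dim=-1).tolist()    # most likely label per frame
    decoded, prev = [], None
    for idx in best_path:
        if idx != prev and idx != blank:             # collapse repeats, drop blanks
            decoded.append(labels[idx])
        prev = idx
    return "".join(decoded)

# Toy example: 4 classes ('_' = blank, then a, b, c), 6 frames.
labels = "_abc"
log_probs = torch.log(torch.tensor([
    [0.1, 0.7, 0.1, 0.1],   # a
    [0.1, 0.7, 0.1, 0.1],   # a (repeat, collapsed)
    [0.7, 0.1, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.7, 0.1],   # b
    [0.1, 0.1, 0.1, 0.7],   # c
    [0.7, 0.1, 0.1, 0.1],   # blank
]))
print(greedy_ctc_decode(log_probs, labels))          # -> "abc"
```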
Improvements:
- Removed TorchAudio in favor of SciPy for audio loading, based on speed comparisons and ease of installation
- Improved the NVIDIA Apex integration to make mixed-precision training easier to use (see the sketch after this list)
- New pre-trained models trained with mixed precision
- Documentation and improvements on how to tune and use LibriSpeech LMs, with results based on the 3-gram model
- Evaluation fixes for fairer comparison
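On the mixed-precision point above: the --opt-level O1 and --loss-scale 1 flags in the commands below map onto NVIDIA Apex's amp API. A minimal sketch of that wiring, with a placeholder model and optimizer rather than the repo's actual training loop (requires a CUDA device, matching the --cuda flag):

```python
import torch
from apex import amp   # requires NVIDIA Apex to be installed

# Placeholder model/optimizer purely for illustration.
model = torch.nn.Linear(161, 29).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4)

# O1 = mixed precision with automatic casting; loss_scale=1.0 matches --loss-scale 1.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=1.0)

inputs = torch.randn(8, 161).cuda()
loss = model(inputs).pow(2).mean()

optimizer.zero_grad()
with amp.scale_loss(loss, optimizer) as scaled_loss:  # scales the loss before backward
    scaled_loss.backward()
optimizer.step()
```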
Commit Hash used for training and testing.
AN4
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/an4_train_manifest.csv --val-manifest data/an4_val_manifest.csv --epochs 70 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 32 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id an4 --checkpoint --save-folder deepspeech.pytorch/an4/ --model-path deepspeech.pytorch/an4/deepspeech_final.pth
Test Command:
python test.py --model-path an4_pretrained_v2.pth --test-manifest data/an4_val_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
AN4 test | 10.349 | 7.076 |
Download here.
Librispeech
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/libri_train_manifest.csv --val-manifest data/libri_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id libri --checkpoint --save-folder deepspeech.pytorch/librispeech/ --model-path deepspeech.pytorch/librispeech/deepspeech_final.pth
Test Command:
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_clean.csv --cuda --half
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_other.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Librispeech clean | 9.919 | 3.307 |
Librispeech other | 28.116 | 12.040 |
With a 3-gram ARPA LM and tuned alpha/beta values (alpha=1.97, beta=4.36, beam-width=1024):
Test Command:
python test.py --test-manifest libri_test_clean.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
python test.py --test-manifest libri_test_other.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
Dataset | WER | CER |
---|---|---|
Librispeech clean | 6.654 | 2.705 |
Librispeech other | 19.889 | 10.467 |
Download here.
TEDLIUM
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/ted_train_manifest.csv --val-manifest data/ted_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id ted --checkpoint --save-folder deepspeech.pytorch/tedlium/ --model-path deepspeech.pytorch/tedlium/deepspeech_final.pth
Test Command:
python test.py --model-path ted_pretrained_v2.pth --test-manifest data/ted_test_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Ted test | 30.886 | 11.196 |
Download here.
Release v1.2
This release is functionally identical to the previous one but includes various bug fixes. The previously released models remain compatible. Performance of the pre-trained models:
Dataset | WER | CER |
---|---|---|
AN4 test | 9.573 | 3.977 |
Librispeech test clean | 10.239 | 2.765 |
Librispeech test other | 28.008 | 9.791 |
Pre-trained models V2
Supplied is a set of pre-trained networks that can be used for evaluation. Do not expect these models to perform well on your own data! They are heavily tuned to the datasets they were trained on.
Results are given using greedy decoding. Expect a well-trained language model to reduce WER/CER substantially.
These models should work with later versions of deepspeech.pytorch. Note that parameters have changed from underscores to dashes (i.e. --rnn_type is now --rnn-type).
AN4
Commit hash: e2c2d832357a992f36e68b5f378c117dd270d6ff
Training command:
python train.py --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --train_manifest data/an4_train_manifest.csv --val_manifest data/an4_val_manifest.csv --epochs 100 --num_workers $(nproc) --cuda --batch_size 32 --learning_anneal 1.01 --augment
Dataset | WER | CER |
---|---|---|
AN4 test | 10.58 | 4.88 |
Download here.
Librispeech
Commit hash: e2c2d832357a992f36e68b5f378c117dd270d6ff
Training command:
python train.py --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --visdom --train_manifest data/libri_train_manifest.csv --val_manifest data/libri_val_manifest.csv --epochs 15 --num_workers $(nproc) --cuda --checkpoint --batch_size 10 --learning_anneal 1.1
Dataset | WER | CER |
---|---|---|
Librispeech clean | 11.27 | 3.09 |
Librispeech other | 30.74 | 10.97 |
Download here.
TEDLIUM
Commit hash: e2c2d832357a992f36e68b5f378c117dd270d6ff
Training command:
python train.py --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --visdom --train_manifest data/ted_train_manifest.csv --val_manifest data/ted_val_manifest.csv --epochs 15 --num_workers $(nproc) --cuda --checkpoint --batch_size 10 --learning_anneal 1.1
Dataset | WER | CER |
---|---|---|
Ted test | 31.04 | 10.00 |
Download here.
Pre-trained models
Supplied is a set of pre-trained networks that can be used for evaluation. Do not expect these models to perform well on your own data! They are heavily tuned to the datasets they were trained on.
Results are given using greedy decoding. Expect a well-trained language model to reduce WER/CER substantially.
AN4
Download here.
Dataset | WER | CER |
---|---|---|
AN4 test | 10.52 | 4.78 |
LibriSpeech
Download here.
Dataset | WER | CER |
---|---|---|
Librispeech clean | 11.20 | 3.36 |
Librispeech other | 31.31 | 12.29 |
TEDLIUM
Download here.
Dataset | WER | CER |
---|---|---|
TED test | 34.01 | 13.14 |