Releases: SeanNaren/deepspeech.pytorch
Release V3
This release of deepspeech.pytorch moves to PyTorch Lightning!
Previous release checkpoints are not compatible, as much of the code base was deprecated and cleaned up. Please use V2.1 if you need compatibility with older checkpoints.
- Rely on PyTorch Lightning for training
- Use the native CTC loss, removing warp-ctc (see the sketch after this list)
- Refactor model objects and clean up technical debt
- Move towards a JSON structure for manifest files
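On the CTC point above: with warp-ctc removed, the loss can be computed with PyTorch's built-in `torch.nn.CTCLoss`. A minimal sketch, with illustrative tensor shapes and blank index rather than the repo's internals:

```python
import torch
import torch.nn as nn

# Illustrative shapes: T=50 frames, N=4 utterances, C=29 characters (index 0 = blank).
log_probs = torch.randn(50, 4, 29, requires_grad=True).log_softmax(dim=-1)  # (T, N, C)
targets = torch.randint(1, 29, (4, 20), dtype=torch.long)                   # (N, S) label indices
input_lengths = torch.full((4,), 50, dtype=torch.long)                      # valid frames per utterance
target_lengths = torch.randint(10, 21, (4,), dtype=torch.long)              # valid labels per utterance

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```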
Pre-Trained models
AN4
Training command:
python train.py +configs=an4
Test Command:
python test.py model.model_path=an4_pretrained_v3.ckpt test_path=data/an4_test_manifest.json
Dataset | WER | CER |
---|---|---|
AN4 test | 9.573 | 5.515 |
Download here.
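Beyond test.py, the V3 .ckpt files are ordinary PyTorch Lightning checkpoints, so they can also be loaded programmatically. A minimal sketch, assuming the model class is exposed as deepspeech_pytorch.model.DeepSpeech (verify the exact module path against the V3 code base):

```python
import torch

# Assumed import path/class name; check the repo before relying on it.
from deepspeech_pytorch.model import DeepSpeech

# load_from_checkpoint is the standard PyTorch Lightning API for restoring a
# LightningModule together with the hyper-parameters saved in the checkpoint.
model = DeepSpeech.load_from_checkpoint("an4_pretrained_v3.ckpt")
model.eval()

# The checkpoint itself is a plain dict and can be inspected directly if needed.
ckpt = torch.load("an4_pretrained_v3.ckpt", map_location="cpu")
print(sorted(ckpt.keys()))   # typically includes 'state_dict', 'hyper_parameters', ...
```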
Librispeech
Training command:
python train.py +configs=librispeech
Test Command:
python test.py model.model_path=librispeech.ckpt test_path=libri_test_clean_manifest.json
python test.py model.model_path=librispeech.ckpt test_path=libri_test_other_manifest.json
Dataset | WER | CER |
---|---|---|
Librispeech clean | 10.463 | 3.399 |
Librispeech other | 28.285 | 12.036 |
With a 3-gram ARPA LM and tuned alpha/beta values (alpha=1.97, beta=4.36, beam-width=1024):
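Here alpha scales the language-model score and beta rewards word insertions during beam search, following the Deep Speech 2-style objective Q(y) = log p_ctc(y|x) + alpha * log p_lm(y) + beta * word_count(y). A toy sketch of how one beam hypothesis would be rescored (illustrative only; the actual decoding happens inside the beam decoder used by test.py):

```python
import math

def rescore(log_p_ctc: float, log_p_lm: float, num_words: int,
            alpha: float = 1.97, beta: float = 4.36) -> float:
    """Combine acoustic, LM, and word-insertion terms for one beam hypothesis."""
    return log_p_ctc + alpha * log_p_lm + beta * num_words

# Two hypothetical hypotheses for the same utterance: the LM term can flip the ranking
# even when the second hypothesis has a slightly worse acoustic (CTC) score.
print(rescore(log_p_ctc=-4.1, log_p_lm=math.log(1e-5), num_words=3))
print(rescore(log_p_ctc=-4.6, log_p_lm=math.log(1e-3), num_words=3))
```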
Test Command:
python test.py model.model_path=librispeech.ckpt test_path=data/libri_test_clean_manifest.json lm.decoder_type=beam lm.alpha=1.97 lm.beta=4.36 lm.beam_width=1024 lm.lm_path=3-gram.arpa lm.lm_workers=16
python test.py model.model_path=librispeech.ckpt test_path=data/libri_test_other_manifest.json lm.decoder_type=beam lm.alpha=1.97 lm.beta=4.36 lm.beam_width=1024 lm.lm_path=3-gram.arpa lm.lm_workers=16
Dataset | WER | CER |
---|---|---|
Librispeech clean | 7.062 | 2.984 |
Librispeech other | 19.984 | 11.178 |
Download here.
TEDLIUM
Training command:
python train.py +configs=tedlium
Test Command:
python test.py model.model_path=ted_pretrained_v3.ckpt test_path=ted_test_manifest.json
Dataset | WER | CER |
---|---|---|
Ted test | 28.056 | 10.548 |
Download here.
Release V2.1
This is the last release before the PyTorch Lightning integration, kept for anyone who would like to use the old code base before the pivot to Lightning.
AN4
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/an4_train_manifest.csv --val-manifest data/an4_val_manifest.csv --epochs 70 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 32 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id an4 --checkpoint --save-folder deepspeech.pytorch/an4/ --model-path deepspeech.pytorch/an4/deepspeech_final.pth
Test Command:
python test.py --model-path an4_pretrained_v2.pth --test-manifest data/an4_val_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
AN4 test | 10.349 | 7.076 |
Download here.
Librispeech
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/libri_train_manifest.csv --val-manifest data/libri_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id libri --checkpoint --save-folder deepspeech.pytorch/librispeech/ --model-path deepspeech.pytorch/librispeech/deepspeech_final.pth
Test Command:
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_clean.csv --cuda --half
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_other.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Librispeech clean | 9.919 | 3.307 |
Librispeech other | 28.116 | 12.040 |
With a 3-gram ARPA LM and tuned alpha/beta values (alpha=1.97, beta=4.36, beam-width=1024):
Test Command:
python test.py --test-manifest libri_test_clean.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
python test.py --test-manifest libri_test_other.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
Dataset | WER | CER |
---|---|---|
Librispeech clean | 6.654 | 2.705 |
Librispeech other | 19.889 | 10.467 |
Download here.
TEDLIUM
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/ted_train_manifest.csv --val-manifest data/ted_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id ted --checkpoint --save-folder deepspeech.pytorch/tedlium/ --model-path deepspeech.pytorch/tedlium/deepspeech_final.pth
Test Command:
python test.py --model-path ted_pretrained_v2.pth --test-manifest data/ted_test_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Ted test | 30.886 | 11.196 |
Download here.
Release V2
Supplied is a set of pre-trained networks that can be used for evaluation on academic datasets. Do not expect these models to perform well on your own data! They are heavily tuned to the datasets they were trained on.
Most results are given using greedy decoding, with additional WER/CER reported for LibriSpeech using a language model. Expect a well-trained language model to reduce WER/CER substantially.
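Greedy decoding here means taking the argmax character per frame, collapsing repeats, and dropping CTC blanks, with no language model involved. A minimal sketch of that rule (the label set and blank index are illustrative):

```python
import torch

def greedy_ctc_decode(log_probs: torch.Tensor, labels: str, blank: int = 0) -> str:
    """log_probs: (T, C) per-frame log probabilities for a single utterance."""
    best_path = log_probs.argmax(dim=-1).tolist()    # most likely label per frame
    decoded, prev = [], None
    for idx in best_path:
        if idx != prev and idx != blank:             # collapse repeats, drop blanks
            decoded.append(labels[idx])
        prev = idx
    return "".join(decoded)

# Toy example: 4 classes ('_' = blank, then a, b, c), 6 frames.
labels = "_abc"
log_probs = torch.log(torch.tensor([
    [0.1, 0.7, 0.1, 0.1],   # a
    [0.1, 0.7, 0.1, 0.1],   # a (repeat, collapsed)
    [0.7, 0.1, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.7, 0.1],   # b
    [0.1, 0.1, 0.1, 0.7],   # c
    [0.7, 0.1, 0.1, 0.1],   # blank
]))
print(greedy_ctc_decode(log_probs, labels))          # -> "abc"
```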
Improvements:
- Removed TorchAudio in favor of SciPy for audio loading, based on speed comparisons and ease of installation
- Improved the NVIDIA Apex integration to make mixed-precision training easier to use (see the sketch after this list)
- New pre-trained models trained with mixed precision
- Documentation and improvements on how to tune and use LibriSpeech LMs, with results based on the 3-gram model
- Evaluation fixes for fairer comparison
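On the mixed-precision point above: the --opt-level O1 and --loss-scale 1 flags in the commands below map onto NVIDIA Apex's amp API. A minimal sketch of that wiring, with a placeholder model and optimizer rather than the repo's actual training loop (requires a CUDA device, matching the --cuda flag):

```python
import torch
from apex import amp   # requires NVIDIA Apex to be installed

# Placeholder model/optimizer purely for illustration.
model = torch.nn.Linear(161, 29).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4)

# O1 = mixed precision with automatic casting; loss_scale=1.0 matches --loss-scale 1.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=1.0)

inputs = torch.randn(8, 161).cuda()
loss = model(inputs).pow(2).mean()

optimizer.zero_grad()
with amp.scale_loss(loss, optimizer) as scaled_loss:  # scales the loss before backward
    scaled_loss.backward()
optimizer.step()
```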
Commit Hash used for training and testing.
AN4
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/an4_train_manifest.csv --val-manifest data/an4_val_manifest.csv --epochs 70 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 32 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id an4 --checkpoint --save-folder deepspeech.pytorch/an4/ --model-path deepspeech.pytorch/an4/deepspeech_final.pth
Test Command:
python test.py --model-path an4_pretrained_v2.pth --test-manifest data/an4_val_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
AN4 test | 10.349 | 7.076 |
Download here.
Librispeech
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/libri_train_manifest.csv --val-manifest data/libri_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id libri --checkpoint --save-folder deepspeech.pytorch/librispeech/ --model-path deepspeech.pytorch/librispeech/deepspeech_final.pth
Test Command:
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_clean.csv --cuda --half
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_other.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Librispeech clean | 9.919 | 3.307 |
Librispeech other | 28.116 | 12.040 |
With a 3-gram ARPA LM and tuned alpha/beta values (alpha=1.97, beta=4.36, beam-width=1024):
Test Command:
python test.py --test-manifest libri_test_clean.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
python test.py --test-manifest libri_test_other.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
Dataset | WER | CER |
---|---|---|
Librispeech clean | 6.654 | 2.705 |
Librispeech other | 19.889 | 10.467 |
Download here.
TEDLIUM
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/ted_train_manifest.csv --val-manifest data/ted_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id ted --checkpoint --save-folder deepspeech.pytorch/tedlium/ --model-path deepspeech.pytorch/tedlium/deepspeech_final.pth
Test Command:
python test.py --model-path ted_pretrained_v2.pth --test-manifest data/ted_test_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Ted test | 30.886 | 11.196 |
Download here.
Release v1.2
This release is functionally identical to the previous one but includes various bug fixes. The previously released models remain compatible. Performance of the pre-trained models:
Dataset | WER | CER |
---|---|---|
AN4 test | 9.573 | 3.977 |
Librispeech test clean | 10.239 | 2.765 |
Librispeech test other | 28.008 | 9.791 |
Pre-trained models V2
Supplied is a set of pre-trained networks that can be used for evaluation. Do not expect these models to perform well on your own data! They are heavily tuned to the datasets they were trained on.
Results are given using greedy decoding. Expect a well-trained language model to reduce WER/CER substantially.
These models should work with later versions of deepspeech.pytorch. Note that parameters have changed from underscores to dashes (i.e. --rnn_type is now --rnn-type).
AN4
Commit hash: e2c2d832357a992f36e68b5f378c117dd270d6ff
Training command:
python train.py --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --train_manifest data/an4_train_manifest.csv --val_manifest data/an4_val_manifest.csv --epochs 100 --num_workers $(nproc) --cuda --batch_size 32 --learning_anneal 1.01 --augment
Dataset | WER | CER |
---|---|---|
AN4 test | 10.58 | 4.88 |
Download here.
Librispeech
Commit hash: e2c2d832357a992f36e68b5f378c117dd270d6ff
Training command:
python train.py --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --visdom --train_manifest data/libri_train_manifest.csv --val_manifest data/libri_val_manifest.csv --epochs 15 --num_workers $(nproc) --cuda --checkpoint --batch_size 10 --learning_anneal 1.1
Dataset | WER | CER |
---|---|---|
Librispeech clean | 11.27 | 3.09 |
Librispeech other | 30.74 | 10.97 |
Download here.
TEDLIUM
Commit hash: e2c2d832357a992f36e68b5f378c117dd270d6ff
Training command:
python train.py --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --visdom --train_manifest data/ted_train_manifest.csv --val_manifest data/ted_val_manifest.csv --epochs 15 --num_workers $(nproc) --cuda --checkpoint --batch_size 10 --learning_anneal 1.1
Dataset | WER | CER |
---|---|---|
Ted test | 31.04 | 10.00 |
Download here.
Pre-trained models
Supplied is a set of pre-trained networks that can be used for evaluation. Do not expect these models to perform well on your own data! They are heavily tuned to the datasets they were trained on.
Results are given using greedy decoding. Expect a well-trained language model to reduce WER/CER substantially.
AN4
Download here.
Dataset | WER | CER |
---|---|---|
AN4 test | 10.52 | 4.78 |
LibriSpeech
Download here.
Dataset | WER | CER |
---|---|---|
Librispeech clean | 11.20 | 3.36 |
Librispeech other | 31.31 | 12.29 |
TEDLIUM
Download here.
Dataset | WER | CER |
---|---|---|
TED test | 34.01 | 13.14 |