Commit
[ctc] KWS with CTCloss training and CTC prefix beam search detection. (#135)

* add ctcloss training scripts.
* update compute_det_ctc
* fix typo.
* add fsmn model, can use pretrained kws model from modelscope.
* Add streaming detection of CTC model. Add CTC model onnx export. Add CTC model's result in README; For now CTC model runtime is not supported yet.
* QA run.sh, maxpooling training scripts is compatible. Ready to PR.
* Add a streaming kws demo, support fsmn online forward
* fix typo.
* Align Stream FSMN and Non-Stream FSMN, both in feature extraction and model forward.
* fix repeat activation, add a interval restrict.
* fix timestamp when subsampling!=1.
* fix flake8, update training script and README, give pretrained ckpt.
* fix quickcheck and flake8
* Add realtime CTC-KWS demo in README.

Co-authored-by: dujing <[email protected]>
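The commit message names CTC prefix beam search as the detection method. As a rough illustration of the general technique (not the code added in this commit), the sketch below runs a textbook prefix beam search over per-frame posteriors in plain Python. The function name `ctc_prefix_beam_search`, its arguments, and the toy probabilities are assumptions made for this example.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size=3, blank=0):
    """Illustrative CTC prefix beam search (hypothetical helper).

    probs: list of frames, each a list of token probabilities.
    Returns a list of (token_prefix, probability), best first.
    """
    # Each prefix tracks two scores: probability of all alignments that
    # end in blank (pb) and that end in a non-blank token (pnb).
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (pb, pnb) in beams.items():
            for token, p in enumerate(frame):
                if token == blank:
                    # Blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (pb + pnb) * p
                elif prefix and token == prefix[-1]:
                    # Repeated token merges into the same prefix when the
                    # previous alignment ended in that token...
                    next_beams[prefix][1] += pnb * p
                    # ...and extends the prefix only after a blank.
                    next_beams[prefix + (token,)][1] += pb * p
                else:
                    # A different token always extends the prefix.
                    next_beams[prefix + (token,)][1] += (pb + pnb) * p
        ranked = sorted(next_beams.items(),
                        key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)
        # Keep only the beam_size most probable prefixes.
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_size]}
    return [(list(prefix), pb + pnb) for prefix, (pb, pnb) in beams.items()]


if __name__ == "__main__":
    # Toy posteriors: two frames, vocabulary {0: blank, 1: keyword token}.
    toy = [[0.6, 0.4], [0.6, 0.4]]
    for prefix, score in ctc_prefix_beam_search(toy):
        print(prefix, round(score, 4))
```

On the toy input, greedy decoding would pick blank in both frames, while the prefix search ranks the single-token prefix first because it sums over all alignments that collapse to it (about 0.64 in total).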
Showing 22 changed files with 3,328 additions and 19 deletions.
@@ -0,0 +1,50 @@
dataset_conf:
    filter_conf:
        max_length: 2048
        min_length: 0
    resample_conf:
        resample_rate: 16000
    speed_perturb: false
    feature_extraction_conf:
        feature_type: 'fbank'
        num_mel_bins: 40
        frame_shift: 10
        frame_length: 25
        dither: 1.0
    spec_aug: true
    spec_aug_conf:
        num_t_mask: 1
        num_f_mask: 1
        max_t: 20
        max_f: 10
    shuffle: true
    shuffle_conf:
        shuffle_size: 1500
    batch_conf:
        batch_size: 256

model:
    hidden_dim: 256
    preprocessing:
        type: linear
    backbone:
        type: tcn
        ds: true
        num_layers: 4
        kernel_size: 8
        dropout: 0.1
    activation:
        type: identity


optim: adam
optim_conf:
    lr: 0.001
    weight_decay: 0.0001

training_config:
    grad_clip: 5
    max_epoch: 80
    log_interval: 10
    criterion: ctc
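The config above selects `criterion: ctc`. To make concrete what that objective measures, here is a minimal sketch of the CTC forward (alpha) recursion in plain Python. It is not the training code from this commit; the function name `ctc_neg_log_likelihood` and the toy inputs are assumptions, and it works on raw probabilities for readability, whereas practical implementations work in log space to avoid underflow on long inputs.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Illustrative CTC loss for one utterance (hypothetical helper).

    probs: list of frames, each a list of token probabilities.
    target: list of non-blank token ids.
    """
    # Extended label sequence: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: an alignment may start with blank or the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one symbol
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood)


if __name__ == "__main__":
    toy = [[0.6, 0.4], [0.6, 0.4]]  # two frames, {0: blank, 1: token}
    print(ctc_neg_log_likelihood(toy, [1]))
```

For the toy input the three alignments of the single-token target (token then blank, blank then token, token twice) sum to a likelihood of about 0.64, so the loss is the negative log of that value.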
@@ -0,0 +1,50 @@
dataset_conf:
    filter_conf:
        max_length: 2048
        min_length: 0
    resample_conf:
        resample_rate: 16000
    speed_perturb: false
    feature_extraction_conf:
        feature_type: 'fbank'
        num_mel_bins: 40
        frame_shift: 10
        frame_length: 25
        dither: 1.0
    spec_aug: true
    spec_aug_conf:
        num_t_mask: 1
        num_f_mask: 1
        max_t: 20
        max_f: 10
    shuffle: true
    shuffle_conf:
        shuffle_size: 1500
    batch_conf:
        batch_size: 200

model:
    hidden_dim: 256
    preprocessing:
        type: linear
    backbone:
        type: tcn
        ds: true
        num_layers: 4
        kernel_size: 8
        dropout: 0.1
    activation:
        type: identity


optim: adam
optim_conf:
    lr: 0.001
    weight_decay: 0.0001

training_config:
    grad_clip: 5
    max_epoch: 50
    log_interval: 100
    criterion: ctc
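The feature settings in this config (`resample_rate: 16000`, `frame_length: 25`, `frame_shift: 10`) translate into sample counts as sketched below. The helper names are hypothetical, the frame count assumes Kaldi-style framing that drops incomplete trailing windows, and reading `max_length: 2048` as a count of 10 ms frames is an assumption rather than something the diff states.

```python
def frame_params(sample_rate, frame_length_ms, frame_shift_ms):
    """Window and hop sizes in samples (hypothetical helper)."""
    win = sample_rate * frame_length_ms // 1000
    hop = sample_rate * frame_shift_ms // 1000
    return win, hop


def num_frames(num_samples, win, hop):
    """Frame count assuming incomplete trailing windows are dropped."""
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


if __name__ == "__main__":
    win, hop = frame_params(16000, 25, 10)
    print(win, hop)                      # window and hop in samples
    print(num_frames(16000, win, hop))   # frames in a one-second clip
    # Assumption: max_length is counted in 10 ms frames.
    print(2048 * 10 / 1000, "seconds upper bound")
```

Under these assumptions a 25 ms window is 400 samples, a 10 ms shift is 160 samples, a one-second clip yields 98 frames, and the length filter would admit utterances up to roughly 20.48 seconds.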
@@ -0,0 +1,64 @@
dataset_conf:
    filter_conf:
        max_length: 2048
        min_length: 0
    resample_conf:
        resample_rate: 16000
    speed_perturb: false
    feature_extraction_conf:
        feature_type: 'fbank'
        num_mel_bins: 80
        frame_shift: 10
        frame_length: 25
        dither: 1.
    context_expansion: true
    context_expansion_conf:
        left: 2
        right: 2
    frame_skip: 3
    spec_aug: true
    spec_aug_conf:
        num_t_mask: 1
        num_f_mask: 1
        max_t: 20
        max_f: 10
    shuffle: true
    shuffle_conf:
        shuffle_size: 1500
    batch_conf:
        batch_size: 256

model:
    input_dim: 400
    preprocessing:
        type: none
    hidden_dim: 128
    backbone:
        type: fsmn
        input_affine_dim: 140
        num_layers: 4
        linear_dim: 250
        proj_dim: 128
        left_order: 10
        right_order: 2
        left_stride: 1
        right_stride: 1
        output_affine_dim: 140
    classifier:
        type: identity
        dropout: 0.1
    activation:
        type: identity


optim: adam
optim_conf:
    lr: 0.001
    weight_decay: 0.0001

training_config:
    grad_clip: 5
    max_epoch: 80
    log_interval: 10
    criterion: ctc
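In the FSMN config, `input_dim: 400` lines up with the feature settings if context expansion concatenates each frame with its `left` and `right` neighbours. The sketch below checks that arithmetic and the frame rate after `frame_skip`. Both helpers are hypothetical, and the interpretation of `context_expansion_conf` and `frame_skip` (keep every third frame) is an assumption inferred from the numbers, not taken from the diff.

```python
def expanded_dim(num_mel_bins, left, right):
    """Feature size after stacking left + current + right frames."""
    return num_mel_bins * (left + 1 + right)


def effective_shift_ms(frame_shift_ms, frame_skip):
    """Time step between model inputs if every frame_skip-th frame is kept."""
    return frame_shift_ms * frame_skip


if __name__ == "__main__":
    # Values copied from the config above.
    dim = expanded_dim(num_mel_bins=80, left=2, right=2)
    step = effective_shift_ms(frame_shift_ms=10, frame_skip=3)
    print(dim)    # compare against model.input_dim
    print(step)   # milliseconds between consecutive model inputs
```

With 80 mel bins and two frames of context on each side, the stacked feature has 400 dimensions, which matches `input_dim`. A skip of 3 on a 10 ms shift gives one model input every 30 ms, which is relevant to the "fix timestamp when subsampling!=1" item in the commit message.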