add the results of gemini-df-resnet
wsstriving committed Apr 25, 2024
1 parent f163eda commit 98cf3d6
Showing 5 changed files with 104 additions and 6 deletions.
9 changes: 6 additions & 3 deletions examples/cnceleb/v3_finetune/README.md
@@ -1,11 +1,9 @@
## Fine-tuning Results Based on DINO

-* Setup: fbank80, num_frms200, epoch75 (pretrain), epoch50 (finetune), ArcMargin, aug_prob0.6, speed_perturb (no spec_aug)
-* [Pre-trained ECAPA-TDNN checkpoints](https://drive.google.com/drive/folders/1XDIUjnKPrvJE5auBWT5CcE4mqcglCwzq?usp=drive_link): teacher models extracted from `model_75.pt` (please refer to `wespeaker/ssl/bin/average_dino_model.py` for information on the extraction process)
+* Setup: fbank80, num_frms200, epoch50 (finetune), ArcMargin, aug_prob0.6, speed_perturb (no spec_aug)
* test_trials: CNC-Eval-Avg.lst
* These results are obtained by pretraining on different datasets and then finetuning with CNCeleb.


| Model | Params | FLOPs | Pretraining Data | LM | AS-Norm | EER (%) | minDCF (p=0.01) |
| :------------------------------ | :-----: | :-----: | :--------------------: | :-: | :-------: | :-------: | :--------------: |
| ECAPA_TDNN_GLOB_c1024-ASTP-emb192 | 14.65M | 2.65 G | CNCeleb | × | × | 8.217 | 0.439 |
@@ -20,3 +18,8 @@
* 🔥 UPDATE 2024.03: We support finetuning DINO-based self-supervised models, which are trained on the WenetSpeech dataset. Papers related to the finetuning results:
* [WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition](https://arxiv.org/pdf/2110.03370.pdf)
* [Leveraging In-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition](https://arxiv.org/pdf/2309.11730.pdf)

## Resources
* [Pre-trained ECAPA-TDNN checkpoints](https://drive.google.com/drive/folders/1XDIUjnKPrvJE5auBWT5CcE4mqcglCwzq?usp=drive_link)
* [The filtering metadata for wenetspeech](https://drive.google.com/file/d/1UaGuyT1wcKc5g9vRdfIBvLoDRcuOxBlX/view?usp=drive_link)

4 changes: 4 additions & 0 deletions examples/voxceleb/v2/README.md
@@ -47,6 +47,10 @@
| | | | √ | √ | 0.744 | 0.896 | 1.603 |
| Res2Net34_Base | 4.68M | 1.77G | × | × | 1.351 | 1.347 | 2.478 |
| | | | × | √ | 1.234 | 1.232 | 2.162 |
| Gemini_DFResNet114 | 6.53M | 5.42G | × | × | 0.787 | 0.963 | 1.760 |
| | | | × | √ | 0.707 | 0.889 | 1.546 |
| | | | √ | × | 0.771 | 0.906 | 1.599 |
| | | | √ | √ | 0.638 | 0.839 | 1.427 |
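
The gain across the four Gemini_DFResNet114 rows is easier to judge as a relative reduction. The sketch below compares the first row of that group (marked × for both LM and AS-Norm) with its last row. The column header lies outside this hunk, so the three result columns are left unnamed here, and the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction from baseline to improved, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Values copied from the first and last Gemini_DFResNet114 rows of the table.
first_row = [0.787, 0.963, 1.760]
last_row = [0.638, 0.839, 1.427]

for base, best in zip(first_row, last_row):
    pct = relative_reduction(base, best)
    print(f"{base:.3f} -> {best:.3f}: {pct:.1f}% relative reduction")
```

On these numbers the reduction is roughly 19%, 13%, and 19% for the three columns respectively.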


## PLDA results
4 changes: 2 additions & 2 deletions examples/voxceleb/v2/conf/gemini_dfresnet_adam.yaml
@@ -1,6 +1,6 @@
### train configuration

-exp_dir: exp/Gemini_DF_ResNet60-TSTP-emb256-fbank80-num_frms200-aug0.6-spTrue-saFalse-ArcMargin-SGD-epoch150
+exp_dir: exp/Gemini_DF_ResNet114-TSTP-emb256-fbank80-num_frms200-aug0.6-spTrue-saFalse-ArcMargin-AdamW-epoch165
gpus: "[0,1]"
num_avg: 2
enable_amp: False # whether enable automatic mixed precision training
@@ -45,7 +45,7 @@ dataset_args:
    max_f: 8
    prob: 0.6

-model: Gemini_DF_ResNet60 # Gemini_DF_ResNet60 Gemini_DF_ResNet114 Gemini_DF_ResNet183 Gemini_DF_ResNet237
+model: Gemini_DF_ResNet114 # Gemini_DF_ResNet60 Gemini_DF_ResNet114 Gemini_DF_ResNet183 Gemini_DF_ResNet237
model_init: null
model_args:
  feat_dim: 80
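
The `exp_dir` value packs the main training settings into a hyphen-separated name. As a rough illustration, the sketch below splits such a name back into fields. The function `parse_exp_dir` and the dictionary keys are hypothetical, and reading `sp` and `sa` as the `speed_perturb` and `spec_aug` switches is an assumption based on the matching config values, not something the commit states.

```python
import re


def parse_exp_dir(exp_dir: str) -> dict:
    """Split an experiment directory name into labelled fields (illustrative only)."""
    tokens = exp_dir.rsplit("/", 1)[-1].split("-")
    # Assumption: the first two tokens are the model name and the pooling function.
    fields = {"model": tokens[0], "pooling": tokens[1]}
    for tok in tokens[2:]:
        m = re.fullmatch(r"(emb|fbank|num_frms|aug|sp|sa|epoch)(.+)", tok)
        if m:
            fields[m.group(1)] = m.group(2)
        else:
            # Tokens without a known prefix (e.g. loss or optimizer names) are kept as-is.
            fields.setdefault("other", []).append(tok)
    return fields


name = ("exp/Gemini_DF_ResNet114-TSTP-emb256-fbank80-num_frms200-"
        "aug0.6-spTrue-saFalse-ArcMargin-AdamW-epoch165")
print(parse_exp_dir(name))
```

All values are kept as strings; converting them to numbers or booleans is left out to keep the sketch short.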
91 changes: 91 additions & 0 deletions examples/voxceleb/v2/conf/gemini_dfresnet_sgd_lm.yaml
@@ -0,0 +1,91 @@
### Large margin fine-tuning configuration
#
# Large margin fine-tuning is often used in speaker verification
# challenge systems to further improve performance.
# In this fine-tuning stage, a larger margin and longer segments
# are used.

exp_dir: exp/Gemini_DF_ResNet114-TSTP-emb256-fbank80-num_frms200-aug0.6-spTrue-saFalse-ArcMargin-AdamW-epoch165-LM
gpus: "[0,1]"
num_avg: 1
enable_amp: False # whether enable automatic mixed precision training
do_lm: True

seed: 42
num_epochs: 5
save_epoch_interval: 1 # save model per epoch
log_batch_interval: 100 # log every 100 batches

dataloader_args:
  batch_size: 32
  num_workers: 8
  pin_memory: False
  prefetch_factor: 8
  drop_last: True

dataset_args:
  # the number of samples traversed within one epoch; if the value equals 0,
  # the number of utterances in the dataset is used as sample_num_per_epoch.
  sample_num_per_epoch: 0
  shuffle: True
  shuffle_args:
    shuffle_size: 2500
  filter: True
  filter_args:
    min_num_frames: 100
    max_num_frames: 800
  resample_rate: 16000
  speed_perturb: True
  num_frms: 600
  aug_prob: 0.6 # prob to add reverb & noise aug per sample
  fbank_args:
    num_mel_bins: 80
    frame_shift: 10
    frame_length: 25
    dither: 1.0
  spec_aug: False
  spec_aug_args:
    num_t_mask: 1
    num_f_mask: 1
    max_t: 10
    max_f: 8
    prob: 0.6

model: Gemini_DF_ResNet114 # Gemini_DF_ResNet60, Gemini_DF_ResNet114, Gemini_DF_ResNet183, Gemini_DF_ResNet237
model_init: null
model_args:
  feat_dim: 80
  embed_dim: 256
  pooling_func: "TSTP" # TSTP, ASTP, MQMHASTP
  two_emb_layer: False
projection_args:
  project_type: "arc_margin" # add_margin, arc_margin, sphere, softmax, arc_margin_intertopk_subcenter
  scale: 32.0
  easy_margin: False

margin_scheduler: MarginScheduler
margin_update:
  initial_margin: 0.5
  final_margin: 0.5
  increase_start_epoch: 1
  fix_start_epoch: 1
  update_margin: True
  increase_type: "exp" # exp, linear

loss: CrossEntropyLoss
loss_args: {}

optimizer: SGD
optimizer_args:
  momentum: 0.9
  nesterov: True
  weight_decay: 0.0001

scheduler: ExponentialDecrease
scheduler_args:
  initial_lr: 1.0e-4
  final_lr: 2.5e-5
  warm_up_epoch: 1
  warm_from_zero: True
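
To get a feel for what the `scheduler_args` above imply over the 5 fine-tuning epochs, here is a minimal sketch of an exponentially decaying learning rate with a linear warm-up from zero. It is not WeSpeaker's `ExponentialDecrease` implementation: the function `sketch_lr`, the `steps_per_epoch` value, and the exact way warm-up and decay are combined are all assumptions made for illustration.

```python
import math


def sketch_lr(step: int, steps_per_epoch: int, num_epochs: int = 5,
              initial_lr: float = 1.0e-4, final_lr: float = 2.5e-5,
              warm_up_epoch: int = 1) -> float:
    """Illustrative schedule: exponential decay scaled by a linear warm-up from zero."""
    total_steps = steps_per_epoch * num_epochs
    warm_steps = steps_per_epoch * warm_up_epoch
    # Exponential interpolation from initial_lr (step 0) to final_lr (last step).
    progress = step / total_steps
    decayed = initial_lr * math.exp(progress * math.log(final_lr / initial_lr))
    # Assumption: during warm-up the decayed value is scaled linearly from 0 to 1.
    if step < warm_steps:
        return decayed * step / warm_steps
    return decayed


# Hypothetical run with 100 steps per epoch.
for step in (0, 50, 100, 250, 500):
    print(step, f"{sketch_lr(step, steps_per_epoch=100):.2e}")
```

Under these assumptions the rate starts at zero, passes through the geometric mean of the two endpoints (5e-5) halfway through training, and ends at `final_lr`.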

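The frame counts in this config (`num_frms: 600`, `min_num_frames: 100`, `max_num_frames: 800`) can be translated into audio durations using the `fbank_args` values. The sketch below assumes frames are taken without edge padding, so that n frames span (n - 1) shifts plus one window. The helper name `frames_to_ms` is hypothetical.

```python
def frames_to_ms(num_frames: int, frame_shift_ms: int = 10,
                 frame_length_ms: int = 25) -> int:
    """Audio span in milliseconds covered by num_frames, assuming no edge padding."""
    if num_frames <= 0:
        return 0
    return (num_frames - 1) * frame_shift_ms + frame_length_ms


# Frame counts that appear in the configs of this commit.
for n in (100, 200, 600, 800):
    print(f"{n} frames ~ {frames_to_ms(n) / 1000:.3f} s")
```

With a 10 ms shift and a 25 ms window, the 600-frame segments used for large margin fine-tuning cover about 6 s of audio, compared with about 2 s for the 200-frame segments named in the base experiment directory.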

2 changes: 1 addition & 1 deletion wespeaker/models/gemini_dfresnet.py
@@ -165,7 +165,7 @@ def Gemini_DF_ResNet237(feat_dim, embed_dim, pooling_func='TSTP', two_emb_layer=

if __name__ == '__main__':
    x = torch.zeros(1, 200, 80)
-    model = Gemini_DF_ResNet183(80, 256, 'TSTP')
+    model = Gemini_DF_ResNet114(80, 256, 'TSTP')
    model.eval()
    out = model(x)
    print(out[-1].size())