diff --git a/docs/README.md b/docs/README.md index d0c8eb27..789d0528 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,8 +2,9 @@ This directory includes the basic documents for wespeaker, including -- [Tutorial on VoxCeleb (Supervised-VoxCeleb34)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox.md) -- [Tutorial on VoxCeleb (Self-supervised-DINO)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox_ssl.md) +- [SV Tutorial on VoxCeleb v2 (Supervised)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox.md) +- [SV Tutorial on VoxCeleb v3 (Self-Supervised-DINO)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox_ssl.md) +- [Diarization Tutorial on VoxConverse v2](https://github.com/wenet-e2e/wespeaker/blob/master/docs/voxconverse_diar.md) - [Suggested papers for speaker embedding learning](https://github.com/wenet-e2e/wespeaker/blob/master/docs/speaker_recognition_papers.md) - [Provided pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md) - [Off-the-shelf Usages: from Command Line or Python Code](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md) @@ -13,6 +14,5 @@ This directory includes the basic documents for wespeaker, including ## ToDo List (possible) -- [ ] Diarization Tutorial on Voxconverse - [ ] Chinese HandBooks - [ ] Introduction in Video diff --git a/docs/contribute.md b/docs/contribute.md index 7361158b..552930a1 100644 --- a/docs/contribute.md +++ b/docs/contribute.md @@ -31,7 +31,7 @@ and [Google C++ style guide](https://google.github.io/styleguide/cppguide.html). When submitting a pull request: -1. Make sure your code has been rebased on top of the latest commit on the main branch. +1. Make sure your code has been rebased on top of the latest commit on the master branch. 2. Ensure code is properly formatted. 3. Include a detailed description of the changes in the pull request. Explain why you made the changes you did. diff --git a/docs/index.rst b/docs/index.rst index 391b049d..3f37923a 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -3,19 +3,19 @@ You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. -Welcome to wespeaker's documentation! +Welcome to Wespeaker's documentation! ===================================== -wespeaker is an research and production oriented Speaker Verification, Recognition and Diarization Toolkit. +Wespeaker is an Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit. .. toctree:: :maxdepth: 2 :caption: Contents: - ./train.rst - ./runtime.md ./python_package.md + ./train.rst ./pretrained.md + ./runtime.md ./reference.rst ./contribute.md diff --git a/docs/paper.md b/docs/paper.md index 39cf0fb6..6e0e0e69 100644 --- a/docs/paper.md +++ b/docs/paper.md @@ -1,4 +1,4 @@ -# Papers +# Wespeaker Papers * [Wespeaker: A research and production oriented speaker embedding learning toolkit](https://arxiv.org/pdf/2210.17016.pdf), accepted by ICASSP 2023. * [Wespeaker baselines for VoxSRC2023](https://arxiv.org/pdf/2306.15161.pdf) diff --git a/docs/pretrained.md b/docs/pretrained.md index 145e8746..96fbf513 100644 --- a/docs/pretrained.md +++ b/docs/pretrained.md @@ -11,7 +11,7 @@ modeling, such as For users who would like to verify the SV performance or extract speaker embeddings for the above tasks without troubling about training the speaker embedding learner, we provide two types of pretrained models. -1. 
**Checkpoint Model**, with suffix **.pt**, the model trained and saved as checkpoint by WeNet python code, you can +1. **Checkpoint Model**, with suffix **.pt**, the model trained and saved as checkpoint by WeSpeaker python code, you can reproduce our published result with it, or you can use it as checkpoint to continue. 2. **Runtime Model**, with suffix **.onnx**, the `runtime model` is exported by `Onnxruntime` on the `checkpoint model`. @@ -35,7 +35,7 @@ python wespeaker/bin/infer_onnx.py --onnx_path $onnx_path --wav_path $wav_path ``` You can easily adapt `infer_onnx.py` to your application, a speaker diarization example can be found -in [the voxconverse recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse) +in [the voxconverse recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse). ## Model List diff --git a/docs/python_package.md b/docs/python_package.md index 3e43b5ca..9c72ba0f 100644 --- a/docs/python_package.md +++ b/docs/python_package.md @@ -14,7 +14,7 @@ cd wespeaker pip install -e . ``` -## Command line Usage +## Command Line Usage ``` sh $ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt @@ -44,22 +44,22 @@ You can specify the following parameters. (use `-h` for details) * `--resample_rate`: resample rate (default: 16000) * `--vad`: apply vad or not for the input audios (default: true) * `--output_file`: output file to save speaker embedding, if you use kaldi wav_scp, output will be `output_file.ark` - and `output_file.scp` + and `output_file.scp` ### Pretrained model support We provide different pretrained models, which can be found -at [pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md) +at [pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md). **Warning** If you want to use the models provided in the above link, be sure to rename the model and config file -to `avg_model.pt` and `config.yaml` +to `avg_model.pt` and `config.yaml`. By default, specifying the `language` option will download the pretrained models as * english: `ResNet221_LM` pretrained on VoxCeleb * chinese: `ResNet34_LM` pretrained on CnCeleb -if you want to use other pretrained models, please use the `-p` or `--pretrain` to specify the directory +If you want to use other pretrained models, please use the `-p` or `--pretrain` to specify the directory containing `avg_model.pt` and `config.yaml`, which can either be the ones we provided and trained by yourself. diff --git a/docs/runtime.md b/docs/runtime.md index e0a56c81..bc916459 100644 --- a/docs/runtime.md +++ b/docs/runtime.md @@ -5,11 +5,11 @@ The Wespeaker runtime supports the following platforms. - Server - - [GPU](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/server/x86_gpu) + - [TensorRT GPU](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/server/x86_gpu) - Device - [Horizon X3 PI](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/horizonbpu) - - [onnxruntime](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/onnxruntime) + - [Onnxruntime](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/onnxruntime) - linux_x86_cpu - linux_x86_gpu - macOS @@ -17,7 +17,7 @@ The Wespeaker runtime supports the following platforms. - Android (coming) - ncnn (coming) -## onnxruntime +## Onnxruntime * Step 1. 
Export your experiment model to ONNX by https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/bin/export_onnx.py @@ -85,7 +85,7 @@ onnx_dir=your_model_dir --embedding_size 256 ``` -## horizonbpu +## Horizonbpu * Step 1. Setup environment (install horizon packages and cross compile tools) in the PC. @@ -188,7 +188,7 @@ embed_out=your_embedding_txt ``` -## server (tensorrt gpu) +## Server (tensorrt gpu) ### Introduction In this project, we use models trained in [wespeaker](https://github.com/wenet-e2e/wespeaker) as an example to show how to convert speaker model to tensorrt and deploy them on [Triton Inference Server](https://github.com/triton-inference-server/server.git). If you only have CPUs, instead of using GPUs to deploy Tensorrt model, you may deploy the exported onnx model on Triton Inference Server as well. diff --git a/docs/speaker_recognition_papers.md b/docs/speaker_recognition_papers.md index 3583551d..cabe1a4c 100644 --- a/docs/speaker_recognition_papers.md +++ b/docs/speaker_recognition_papers.md @@ -1,4 +1,4 @@ -# Speaker recognition papers +# Speaker Recognition Papers - Dataset - VoxCeleb diff --git a/docs/train.rst b/docs/train.rst index 37b12145..4eea8f26 100644 --- a/docs/train.rst +++ b/docs/train.rst @@ -7,3 +7,4 @@ How to train models? ./vox.md ./vox_ssl.md + ./voxconverse_diar.md diff --git a/docs/vox.md b/docs/vox.md index a68f30e3..8702669b 100644 --- a/docs/vox.md +++ b/docs/vox.md @@ -1,4 +1,4 @@ -## Tutorial on VoxCeleb v2 (Supervised-VoxCeleb34) +## SV Tutorial on VoxCeleb v2 (Supervised) If you meet any problems when going through this tutorial, please feel free to ask in github [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any kind of feedback. @@ -89,7 +89,7 @@ id10001/Y8hIVOBuels/00001.wav id10999/G5R2-Hl7YX8/00008.wav nontarget ... ``` -### Stage 1: Reformat the Data +### Stage 2: Reformat the Data ``` if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then diff --git a/docs/vox_ssl.md b/docs/vox_ssl.md index a62a16db..1d054197 100644 --- a/docs/vox_ssl.md +++ b/docs/vox_ssl.md @@ -1,4 +1,4 @@ -## Tutorial on VoxCeleb v3 (Self-Supervised on VoxCeleb) +## SV Tutorial on VoxCeleb v3 (Self-Supervised) If you meet any problems when going through this tutorial, please feel free to ask in github [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any kind of feedback. @@ -102,7 +102,7 @@ id10001/Y8hIVOBuels/00001.wav id10999/G5R2-Hl7YX8/00008.wav nontarget In this step, we generated **utt2spk** and **spk2utt**, but we will not use any speaker labels during the training process. -### Stage 1: Reformat the Data +### Stage 2: Reformat the Data ``` if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then diff --git a/docs/voxconverse_diar.md b/docs/voxconverse_diar.md new file mode 100644 index 00000000..e60819fa --- /dev/null +++ b/docs/voxconverse_diar.md @@ -0,0 +1,314 @@ +## Diarization Tutorial on VoxConverse v2 + +If you meet any problems when going through this tutorial, please feel free to ask in github [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any kind of feedback. + + +### First Experiment + +Speaker diarization is a typical downstream task of applying the well-learnt speaker embedding. +Here we introduce our diarization recipe `examples/voxconverse/v2/run.sh` on the Voxconverse 2020 dataset. + +Note that we provide two recipes: **v1** and **v2**. Their only difference is that in **v2**, we split the Fbank extraction, embedding extraction and clustering modules to different stages. 
+We recommend that newcomers follow the **v2** recipe, running it stage by stage and checking the results to better understand the whole process.
+
+```
+cd examples/voxconverse/v2/
+bash run.sh --stage 1 --stop_stage 1
+bash run.sh --stage 2 --stop_stage 2
+bash run.sh --stage 3 --stop_stage 3
+bash run.sh --stage 4 --stop_stage 4
+bash run.sh --stage 5 --stop_stage 5
+bash run.sh --stage 6 --stop_stage 6
+bash run.sh --stage 7 --stop_stage 7
+bash run.sh --stage 8 --stop_stage 8
+```
+
+
+### Stage 1: Download Prerequisites
+
+```
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    mkdir -p external_tools
+
+    # [1] Download evaluation toolkit
+    wget -c https://github.com/usnistgov/SCTK/archive/refs/tags/v2.4.12.zip -O external_tools/SCTK-v2.4.12.zip
+    unzip -o external_tools/SCTK-v2.4.12.zip -d external_tools
+
+    # [2] Download voice activity detection model pretrained by Silero Team
+    wget -c https://github.com/snakers4/silero-vad/archive/refs/tags/v3.1.zip -O external_tools/silero-vad-v3.1.zip
+    unzip -o external_tools/silero-vad-v3.1.zip -d external_tools
+
+    # [3] Download ResNet34 speaker model pretrained by WeSpeaker Team
+    mkdir -p pretrained_models
+
+    wget -c https://wespeaker-1256283475.cos.ap-shanghai.myqcloud.com/models/voxceleb/voxceleb_resnet34_LM.onnx -O pretrained_models/voxceleb_resnet34_LM.onnx
+fi
+```
+
+Download three prerequisites:
+* the evaluation toolkit **SCTK**: computes the DER metric
+* the open-source VAD model pre-trained by [silero-vad](https://github.com/snakers4/silero-vad): removes the silence in the audio
+* the pre-trained ResNet34 model: extracts the speaker embeddings
+
+After finishing this stage, you will get two new dirs:
+- **external_tools**
+    - SCTK-v2.4.12.zip
+    - SCTK-2.4.12
+    - silero-vad-v3.1.zip
+    - silero-vad-3.1
+- **pretrained_models**
+    - voxceleb_resnet34_LM.onnx
+
+
+### Stage 2: Download and Prepare Data
+
+```
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
+    mkdir -p data
+
+    # Download annotations for dev and test sets (version 0.0.3)
+    wget -c https://github.com/joonson/voxconverse/archive/refs/heads/master.zip -O data/voxconverse_master.zip
+    unzip -o data/voxconverse_master.zip -d data
+
+    # Download annotations from VoxSRC-23 validation toolkit (looks like version 0.0.2)
+    # cd data && git clone https://github.com/JaesungHuh/VoxSRC2023.git --recursive && cd -
+
+    # Download dev audios
+    mkdir -p data/dev
+
+    wget --no-check-certificate -c https://www.robots.ox.ac.uk/~vgg/data/voxconverse/data/voxconverse_dev_wav.zip -O data/voxconverse_dev_wav.zip
+    unzip -o data/voxconverse_dev_wav.zip -d data/dev
+
+    # Create wav.scp for dev audios
+    ls `pwd`/data/dev/audio/*.wav | awk -F/ '{print substr($NF, 1, length($NF)-4), $0}' > data/dev/wav.scp
+
+    # Test audios
+    mkdir -p data/test
+
+    wget --no-check-certificate -c https://www.robots.ox.ac.uk/~vgg/data/voxconverse/data/voxconverse_test_wav.zip -O data/voxconverse_test_wav.zip
+    unzip -o data/voxconverse_test_wav.zip -d data/test
+
+    # Create wav.scp for test audios
+    ls `pwd`/data/test/voxconverse_test_wav/*.wav | awk -F/ '{print substr($NF, 1, length($NF)-4), $0}' > data/test/wav.scp
+fi
+```
+
+Download the Voxconverse 2020 dev and test sets as well as their annotations.
+Here we use the latest version 0.0.3 by default (recommended).
+You can also try version 0.0.2, which seems to be the one used in the [VoxSRC-23 baseline repo](https://github.com/JaesungHuh/VoxSRC2023.git).
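+
+As an optional sanity check of what Stage 2 produced, a minimal Python sketch such as the one below can read `wav.scp` and count the speakers in each ground-truth RTTM. It is not part of the recipe; the script name and the `data/voxconverse-master` layout are assumptions based on the commands above, so adjust the paths if your extracted directory names differ.
+
+```
+# check_stage2.py -- optional sanity check, NOT part of run.sh.
+# Run from examples/voxconverse/v2/ after Stage 2 (assumes the 0.0.3 annotations).
+import os
+import sys
+
+partition = sys.argv[1] if len(sys.argv) > 1 else "dev"
+ref_dir = os.path.join("data", "voxconverse-master", partition)
+
+with open(os.path.join("data", partition, "wav.scp")) as f:
+    # each line: "<wav_id> <wav_path>"
+    wavs = dict(line.strip().split(maxsplit=1) for line in f if line.strip())
+
+for wav_id in sorted(wavs):
+    rttm = os.path.join(ref_dir, wav_id + ".rttm")
+    if not os.path.isfile(rttm):
+        print(f"missing ground-truth rttm for {wav_id}")
+        continue
+    with open(rttm) as f:
+        # field 8 of an RTTM line (index 7) is the speaker name
+        speakers = {line.split()[7] for line in f if line.startswith("SPEAKER")}
+    print(f"{wav_id}: {len(speakers)} speakers")
+```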
+
+After finishing this stage, you will get the new **data** dir:
+- **data**
+    - voxconverse_master.zip
+    - voxconverse_dev_wav.zip
+    - voxconverse_test_wav.zip
+    - voxconverse-master
+        - dev: ground-truth rttms
+        - test: ground-truth rttms
+    - dev
+        - audio: wav files
+        - wav.scp
+    - test
+        - voxconverse_test_wav: wav files
+        - wav.scp
+
+**wav.scp**: each line records two space-separated columns: `wav_id` and `wav_path`
+```
+abjxc /path/to/wespeaker/examples/voxconverse/v2/data/dev/audio/abjxc.wav
+afjiv /path/to/wespeaker/examples/voxconverse/v2/data/dev/audio/afjiv.wav
+...
+```
+
+
+### Stage 3: Apply SAD (i.e., VAD)
+
+```
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+    # Set VAD min duration
+    min_duration=0.255
+
+    if [[ "x${sad_type}" == "xoracle" ]]; then
+        # Oracle SAD: handling overlapping or too short regions in ground truth RTTM
+        while read -r utt wav_path; do
+            python3 wespeaker/diar/make_oracle_sad.py \
+                --rttm data/voxconverse-master/${partition}/${utt}.rttm \
+                --min-duration $min_duration
+        done < data/${partition}/wav.scp > data/${partition}/oracle_sad
+    fi
+
+    if [[ "x${sad_type}" == "xsystem" ]]; then
+        # System SAD: applying 'silero' VAD
+        python3 wespeaker/diar/make_system_sad.py \
+            --repo-path external_tools/silero-vad-3.1 \
+            --scp data/${partition}/wav.scp \
+            --min-duration $min_duration > data/${partition}/system_sad
+    fi
+fi
+```
+
+`sad_type` can be either oracle or system:
+* oracle: get the VAD info from the ground-truth RTTMs, saved in `data/${partition}/oracle_sad`
+* system: compute the VAD results using [silero-vad](https://github.com/snakers4/silero-vad), saved in `data/${partition}/system_sad`
+
+where `partition` is dev or test.
+
+Note that VAD segments shorter than `min_duration` seconds are ignored and simply regarded as silence.
+
+
+### Stage 4: Extract Fbank Features
+
+```
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+
+    [ -d "exp/${sad_type}_sad_fbank" ] && rm -r exp/${sad_type}_sad_fbank
+
+    echo "Make Fbank features and store it under exp/${sad_type}_sad_fbank"
+    echo "..."
+    bash local/make_fbank.sh \
+        --scp data/${partition}/wav.scp \
+        --segments data/${partition}/${sad_type}_sad \
+        --store_dir exp/${partition}_${sad_type}_sad_fbank \
+        --subseg_cmn ${subseg_cmn} \
+        --nj 24
+fi
+```
+
+`subseg_cmn` controls whether Cepstral Mean Normalization (CMN) is applied to the Fbank features:
+* on each sliding-window sub-segment (`subseg_cmn=true`) or
+* on the whole VAD segment (`subseg_cmn=false`)
+
+You can set `nj` according to the number of CPU cores available.
+The final Fbank features are saved under the dir `exp/${partition}_${sad_type}_sad_fbank`.
+
+
+### Stage 5: Extract Sliding-window Speaker Embeddings
+
+```
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+
+    [ -d "exp/${sad_type}_sad_embedding" ] && rm -r exp/${sad_type}_sad_embedding
+
+    echo "Extract embeddings and store it under exp/${sad_type}_sad_embedding"
+    echo "..."
+    bash local/extract_emb.sh \
+        --scp exp/${partition}_${sad_type}_sad_fbank/fbank.scp \
+        --pretrained_model pretrained_models/voxceleb_resnet34_LM.onnx \
+        --device cuda \
+        --store_dir exp/${partition}_${sad_type}_sad_embedding \
+        --batch_size 96 \
+        --frame_shift 10 \
+        --window_secs 1.5 \
+        --period_secs 0.75 \
+        --subseg_cmn ${subseg_cmn} \
+        --nj 1
+fi
+```
+
+Extract speaker embeddings from the Fbank features in a sliding-window fashion (`window=1.5s, step=0.75s`):
+an embedding is extracted from every `1.5s` speech window, the window slides forward by `0.75s` each time,
+and thus consecutive windows overlap by `1.5-0.75=0.75s`.
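+
+To make the window layout concrete, here is a minimal, illustrative sketch (not the actual implementation in `local/extract_emb.sh`) of how `1.5s` windows with a `0.75s` step can be placed over a single VAD segment; how the real script treats the final, shorter-than-window remainder is an implementation detail, so the tail handling below is only an assumption.
+
+```
+# window_sketch.py -- illustrative only; NOT the code used by local/extract_emb.sh.
+def sliding_windows(seg_start, seg_end, window=1.5, period=0.75):
+    """Return (start, end) sub-segments in seconds over one VAD segment."""
+    subsegs = []
+    start = seg_start
+    while start + window <= seg_end:
+        subsegs.append((start, start + window))
+        start += period
+    # Assumption: keep one final shorter window so the tail of the segment is covered.
+    if not subsegs or subsegs[-1][1] < seg_end:
+        subsegs.append((max(seg_start, seg_end - window), seg_end))
+    return subsegs
+
+# A 4.0s VAD segment starting at 10.0s yields
+# [(10.0, 11.5), (10.75, 12.25), (11.5, 13.0), (12.25, 13.75), (12.5, 14.0)]
+print(sliding_windows(10.0, 14.0))
+```
+
+One embedding is then extracted per window, and these per-window embeddings are what the clustering in Stage 6 operates on.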
+
+You can also specify `nj` jobs and choose between the `gpu` and `cpu` devices.
+The extracted embeddings are saved under the dir `exp/${partition}_${sad_type}_sad_embedding`.
+
+
+### Stage 6: Apply Spectral Clustering
+
+```
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+
+    [ -f "exp/spectral_cluster/${partition}_${sad_type}_sad_labels" ] && rm exp/spectral_cluster/${partition}_${sad_type}_sad_labels
+
+    echo "Doing spectral clustering and store the result in exp/spectral_cluster/${partition}_${sad_type}_sad_labels"
+    echo "..."
+    python3 wespeaker/diar/spectral_clusterer.py \
+        --scp exp/${partition}_${sad_type}_sad_embedding/emb.scp \
+        --output exp/spectral_cluster/${partition}_${sad_type}_sad_labels
+fi
+```
+
+Apply spectral clustering to the extracted sliding-window speaker embeddings
+and store the results in `exp/spectral_cluster/${partition}_${sad_type}_sad_labels`,
+where each line records two space-separated columns: `subseg_id` and `spk_id`:
+```
+abjxc-00000400-00007040-00000000-00000150 0
+abjxc-00000400-00007040-00000075-00000225 0
+abjxc-00000400-00007040-00000150-00000300 0
+abjxc-00000400-00007040-00000225-00000375 0
+...
+```
+
+
+### Stage 7: Reformat Clustering Labels into RTTMs
+
+```
+if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
+    python3 wespeaker/diar/make_rttm.py \
+        --labels exp/spectral_cluster/${partition}_${sad_type}_sad_labels \
+        --channel 1 > exp/spectral_cluster/${partition}_${sad_type}_sad_rttm
+fi
+```
+
+Convert the clustering labels into the Rich Transcription Time Marked (RTTM) format, saved in `exp/spectral_cluster/${partition}_${sad_type}_sad_rttm`.
+
+RTTM files are space-delimited text files containing one turn per line, with each line containing ten fields:
+
+* `Type` -- segment type; should always be `SPEAKER`
+* `File ID` -- file name; basename of the recording minus extension (e.g., `abjxc`)
+* `Channel ID` -- channel (1-indexed) that turn is on; should always be `1`
+* `Turn Onset` -- onset of turn in seconds from beginning of recording
+* `Turn Duration` -- duration of turn in seconds
+* `Orthography Field` -- should always be `<NA>`
+* `Speaker Type` -- should always be `<NA>`
+* `Speaker Name` -- name of speaker of turn; should be unique within scope of each file
+* `Confidence Score` -- system confidence (probability) that information is correct; should always be `<NA>`
+* `Signal Lookahead Time` -- should always be `<NA>`
+
+For instance,
+
+```
+SPEAKER abjxc 1 0.400 6.640 <NA> <NA> 0 <NA> <NA>
+SPEAKER abjxc 1 8.680 55.960 <NA> <NA> 0 <NA> <NA>
+```
+
+
+### Stage 8: Evaluate the Result (DER)
+
+```
+if [ ${stage} -le 8 ] && [ ${stop_stage} -ge 8 ]; then
+    ref_dir=data/voxconverse-master/
+    #ref_dir=data/VoxSRC2023/voxconverse/
+    echo -e "Get the DER results\n..."
+    perl external_tools/SCTK-2.4.12/src/md-eval/md-eval.pl \
+        -c 0.25 \
+        -r <(cat ${ref_dir}/${partition}/*.rttm) \
+        -s exp/spectral_cluster/${partition}_${sad_type}_sad_rttm 2>&1 | tee exp/spectral_cluster/${partition}_${sad_type}_sad_res
+
+    if [ ${get_each_file_res} -eq 1 ];then
+        single_file_res_dir=exp/spectral_cluster/${partition}_${sad_type}_single_file_res
+        mkdir -p $single_file_res_dir
+        echo -e "\nGet the DER results for each file and the results will be stored under ${single_file_res_dir}\n..."
+ + awk '{print $2}' exp/spectral_cluster/${partition}_${sad_type}_sad_rttm | sort -u | while read file_name; do + perl external_tools/SCTK-2.4.12/src/md-eval/md-eval.pl \ + -c 0.25 \ + -r <(cat ${ref_dir}/${partition}/${file_name}.rttm) \ + -s <(grep "${file_name}" exp/spectral_cluster/${partition}_${sad_type}_sad_rttm) > ${single_file_res_dir}/${partition}_${file_name}_res + done + echo "Done!" + fi +fi +``` + +Use the **SCTK** toolkit to compute the Diarization Error Rate (DER) metric, which is the sum of + +* speaker error -- percentage of scored time for which the wrong speaker id is assigned within a speech region +* false alarm speech -- percentage of scored time for which a nonspeech region is incorrectly marked as containing speech +* missed speech -- percentage of scored time for which a speech region is incorrectly marked as not containing speech + +For more details about DER, consult Section 6.1 of the [NIST RT-09 evaluation plan](https://web.archive.org/web/20100606092041if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf). + +The overall DER result would be saved in `exp/spectral_cluster/${partition}_${sad_type}_sad_res`. +Optionally, set `get_each_file_res` as `1` if you also want to get the DER result for each single file, which will be saved under dir `exp/spectral_cluster/${partition}_${sad_type}_single_file_res`. + + diff --git a/examples/voxconverse/v1/run.sh b/examples/voxconverse/v1/run.sh index d4a7948e..b852ec80 100755 --- a/examples/voxconverse/v1/run.sh +++ b/examples/voxconverse/v1/run.sh @@ -16,7 +16,7 @@ stage=-1 stop_stage=-1 -sad_type="system" +sad_type="oracle" . tools/parse_options.sh diff --git a/examples/voxconverse/v2/run.sh b/examples/voxconverse/v2/run.sh index 334a5216..8e786297 100755 --- a/examples/voxconverse/v2/run.sh +++ b/examples/voxconverse/v2/run.sh @@ -18,8 +18,8 @@ stage=-1 stop_stage=-1 -sad_type="system" -partition="test" +sad_type="oracle" +partition="dev" # do cmn on the sub-segment or on the vad segment subseg_cmn=true @@ -47,7 +47,7 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then fi -# Download VoxConverse dev audios and the corresponding annotations +# Download VoxConverse dev/test audios and the corresponding annotations if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then mkdir -p data