[docs] add diarization tutorial in doc and re-order the directory structure in the index page #300

Merged · 5 commits · Apr 2, 2024
6 changes: 3 additions & 3 deletions docs/README.md
@@ -2,8 +2,9 @@

This directory includes the basic documents for wespeaker, including

- [Tutorial on VoxCeleb (Supervised-VoxCeleb34)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox.md)
- [Tutorial on VoxCeleb (Self-supervised-DINO)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox_ssl.md)
- [SV Tutorial on VoxCeleb v2 (Supervised)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox.md)
- [SV Tutorial on VoxCeleb v3 (Self-Supervised-DINO)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox_ssl.md)
- [Diarization Tutorial on VoxConverse v2](https://github.com/wenet-e2e/wespeaker/blob/master/docs/voxconverse_diar.md)
- [Suggested papers for speaker embedding learning](https://github.com/wenet-e2e/wespeaker/blob/master/docs/speaker_recognition_papers.md)
- [Provided pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md)
- [Off-the-shelf Usages: from Command Line or Python Code](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md)
@@ -13,6 +14,5 @@ This directory includes the basic documents for wespeaker, including

## ToDo List (possible)

- [ ] Diarization Tutorial on Voxconverse
- [ ] Chinese HandBooks
- [ ] Introduction in Video
2 changes: 1 addition & 1 deletion docs/contribute.md
@@ -31,7 +31,7 @@ and [Google C++ style guide](https://google.github.io/styleguide/cppguide.html).

When submitting a pull request:

1. Make sure your code has been rebased on top of the latest commit on the main branch.
1. Make sure your code has been rebased on top of the latest commit on the master branch.
2. Ensure code is properly formatted.
3. Include a detailed description of the changes in the pull request.
Explain why you made the changes you did.
8 changes: 4 additions & 4 deletions docs/index.rst
@@ -3,19 +3,19 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to wespeaker's documentation!
Welcome to Wespeaker's documentation!
=====================================

wespeaker is an research and production oriented Speaker Verification, Recognition and Diarization Toolkit.
Wespeaker is a Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit.

.. toctree::
:maxdepth: 2
:caption: Contents:

./train.rst
./runtime.md
./python_package.md
./train.rst
./pretrained.md
./runtime.md
./reference.rst
./contribute.md

2 changes: 1 addition & 1 deletion docs/paper.md
@@ -1,4 +1,4 @@
# Papers
# Wespeaker Papers

* [Wespeaker: A research and production oriented speaker embedding learning toolkit](https://arxiv.org/pdf/2210.17016.pdf), accepted by ICASSP 2023.
* [Wespeaker baselines for VoxSRC2023](https://arxiv.org/pdf/2306.15161.pdf)
4 changes: 2 additions & 2 deletions docs/pretrained.md
@@ -11,7 +11,7 @@ modeling, such as
For users who would like to verify the SV performance or extract speaker embeddings for the above tasks without
the trouble of training the speaker embedding learner, we provide two types of pretrained models.

1. **Checkpoint Model**, with suffix **.pt**, the model trained and saved as checkpoint by WeNet python code, you can
1. **Checkpoint Model**, with suffix **.pt**, the model trained and saved as a checkpoint by the WeSpeaker python code. You can
   reproduce our published results with it, or use it as a checkpoint to continue training.

2. **Runtime Model**, with suffix **.onnx**, the `runtime model` is exported from the `checkpoint model` for `Onnxruntime`.
@@ -35,7 +35,7 @@ python wespeaker/bin/infer_onnx.py --onnx_path $onnx_path --wav_path $wav_path
```

You can easily adapt `infer_onnx.py` to your application; a speaker diarization example can be found
in [the voxconverse recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse)
in [the voxconverse recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse).
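Embeddings extracted this way are typically compared with cosine similarity. Below is a minimal pure-Python sketch of that comparison; the toy 4-dimensional vectors and the 0.5 decision threshold are illustrative assumptions, not WeSpeaker's actual values (real embeddings are much higher-dimensional, e.g. 256-dim):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim embeddings for illustration only.
enroll = [0.1, 0.8, 0.3, 0.4]
test = [0.2, 0.7, 0.4, 0.3]
score = cosine_similarity(enroll, test)

# Accept/reject with an application-chosen threshold (0.5 is hypothetical).
is_same_speaker = score > 0.5
```

In practice the threshold is tuned on a development trial list rather than fixed in advance.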

## Model List

10 changes: 5 additions & 5 deletions docs/python_package.md
@@ -14,7 +14,7 @@ cd wespeaker
pip install -e .
```

## Command line Usage
## Command Line Usage

``` sh
$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
@@ -44,22 +44,22 @@ You can specify the following parameters. (use `-h` for details)
* `--resample_rate`: resample rate (default: 16000)
* `--vad`: apply vad or not for the input audios (default: true)
* `--output_file`: output file to save speaker embedding, if you use kaldi wav_scp, output will be `output_file.ark`
and `output_file.scp`
and `output_file.scp`
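The kaldi-style `.scp` file pairs each utterance id with the location of its embedding inside the `.ark` archive. A small sketch of parsing such a file; the sample lines and offsets are made up for illustration:

```python
def parse_scp(lines):
    """Map utterance id -> ark location string ('path:byte_offset')."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # First whitespace splits the utterance id from the ark location.
        utt, loc = line.split(None, 1)
        table[utt] = loc
    return table

# Hypothetical contents of output_file.scp.
sample = [
    "utt1 embedding.ark:17",
    "utt2 embedding.ark:1053",
]
table = parse_scp(sample)
```

Reading the binary `.ark` payload itself requires a kaldi-format reader (e.g. the `kaldiio` package); the sketch only covers the text index.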

### Pretrained model support

We provide different pretrained models, which can be found
at [pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md)
at [pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md).

**Warning** If you want to use the models provided in the above link, be sure to rename the model and config file
to `avg_model.pt` and `config.yaml`
to `avg_model.pt` and `config.yaml`.

By default, specifying the `language` option will download one of the following pretrained models:

* english: `ResNet221_LM` pretrained on VoxCeleb
* chinese: `ResNet34_LM` pretrained on CnCeleb

if you want to use other pretrained models, please use the `-p` or `--pretrain` to specify the directory
If you want to use other pretrained models, please use `-p` or `--pretrain` to specify the directory
containing `avg_model.pt` and `config.yaml`,
which can be either the ones we provide or ones you trained yourself.

10 changes: 5 additions & 5 deletions docs/runtime.md
@@ -5,19 +5,19 @@
The Wespeaker runtime supports the following platforms.

- Server
- [GPU](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/server/x86_gpu)
- [TensorRT GPU](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/server/x86_gpu)

- Device
- [Horizon X3 PI](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/horizonbpu)
- [onnxruntime](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/onnxruntime)
- [Onnxruntime](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/onnxruntime)
- linux_x86_cpu
- linux_x86_gpu
- macOS
- windows
- Android (coming)
- ncnn (coming)

## onnxruntime
## Onnxruntime

* Step 1. Export your experiment model to ONNX using https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/bin/export_onnx.py

@@ -85,7 +85,7 @@ onnx_dir=your_model_dir
--embedding_size 256
```

## horizonbpu
## Horizonbpu

* Step 1. Set up the environment (install horizon packages and cross-compile tools) on the PC.

@@ -188,7 +188,7 @@ embed_out=your_embedding_txt
```


## server (tensorrt gpu)
## Server (tensorrt gpu)

### Introduction
In this project, we use models trained in [wespeaker](https://github.com/wenet-e2e/wespeaker) as an example to show how to convert a speaker model to TensorRT and deploy it on [Triton Inference Server](https://github.com/triton-inference-server/server.git). If you only have CPUs, you may deploy the exported onnx model on Triton Inference Server instead of deploying the TensorRT model on GPUs.
2 changes: 1 addition & 1 deletion docs/speaker_recognition_papers.md
@@ -1,4 +1,4 @@
# Speaker recognition papers
# Speaker Recognition Papers

- Dataset
- VoxCeleb
1 change: 1 addition & 0 deletions docs/train.rst
@@ -7,3 +7,4 @@ How to train models?

./vox.md
./vox_ssl.md
./voxconverse_diar.md
4 changes: 2 additions & 2 deletions docs/vox.md
@@ -1,4 +1,4 @@
## Tutorial on VoxCeleb v2 (Supervised-VoxCeleb34)
## SV Tutorial on VoxCeleb v2 (Supervised)

If you meet any problems when going through this tutorial, please feel free to ask in the
GitHub [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any feedback.
@@ -89,7 +89,7 @@ id10001/Y8hIVOBuels/00001.wav id10999/G5R2-Hl7YX8/00008.wav nontarget
...
```
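Each trial line above pairs an enrollment utterance with a test utterance plus a `target`/`nontarget` label, and systems scored on such lists are commonly summarized by the equal error rate (EER). A minimal pure-Python sketch of that metric, with made-up scores; this is an illustration, not WeSpeaker's actual scoring code:

```python
def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: the operating point where the false-reject rate
    on target trials equals the false-accept rate on nontarget trials.
    Simple threshold sweep; fine for small score lists."""
    thresholds = sorted(set(target_scores + nontarget_scores))
    best_gap, best_eer = 2.0, None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer

# Toy cosine scores (made up): targets should score higher than nontargets.
target_scores = [0.82, 0.91, 0.74, 0.88]
nontarget_scores = [0.12, 0.33, 0.25, 0.41]
eer = compute_eer(target_scores, nontarget_scores)
```

With perfectly separated scores like these the sweep finds a threshold with no errors on either side, so the EER is zero; overlapping score distributions yield a positive EER.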

### Stage 1: Reformat the Data
### Stage 2: Reformat the Data

```
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
4 changes: 2 additions & 2 deletions docs/vox_ssl.md
@@ -1,4 +1,4 @@
## Tutorial on VoxCeleb v3 (Self-Supervised on VoxCeleb)
## SV Tutorial on VoxCeleb v3 (Self-Supervised)

If you meet any problems when going through this tutorial, please feel free to ask in the
GitHub [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any feedback.
@@ -102,7 +102,7 @@ id10001/Y8hIVOBuels/00001.wav id10999/G5R2-Hl7YX8/00008.wav nontarget
In this step, we generated **utt2spk** and **spk2utt**, but we will not use any speaker labels during the training
process.

### Stage 1: Reformat the Data
### Stage 2: Reformat the Data

```
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then