[docs] add diarization tutorial in doc and re-order the directory structure in the index page (#300)

* [docs] fix some docs

* [docs] fix docs/vox.md and docs/vox_ssl.md

* [docs] use upper case for titles

* [docs] change default sad_type as oracle in v1&&v2/run.sh

* [docs] add docs/voxconverse_diar.md for diar tutorial and re-order the directory structure in the index.rst page
JiJiJiang authored Apr 2, 2024
1 parent 6b9e6e0 commit deb72ba
Showing 14 changed files with 345 additions and 30 deletions.
6 changes: 3 additions & 3 deletions docs/README.md
@@ -2,8 +2,9 @@

This directory includes the basic documents for wespeaker, including

-- [Tutorial on VoxCeleb (Supervised-VoxCeleb34)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox.md)
-- [Tutorial on VoxCeleb (Self-supervised-DINO)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox_ssl.md)
+- [SV Tutorial on VoxCeleb v2 (Supervised)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox.md)
+- [SV Tutorial on VoxCeleb v3 (Self-Supervised-DINO)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox_ssl.md)
+- [Diarization Tutorial on VoxConverse v2](https://github.com/wenet-e2e/wespeaker/blob/master/docs/voxconverse_diar.md)
- [Suggested papers for speaker embedding learning](https://github.com/wenet-e2e/wespeaker/blob/master/docs/speaker_recognition_papers.md)
- [Provided pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md)
- [Off-the-shelf Usages: from Command Line or Python Code](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md)
@@ -13,6 +14,5 @@ This directory includes the basic documents for wespeaker, including

## ToDo List (possible)

-- [ ] Diarization Tutorial on Voxconverse
- [ ] Chinese HandBooks
- [ ] Introduction in Video
2 changes: 1 addition & 1 deletion docs/contribute.md
@@ -31,7 +31,7 @@ and [Google C++ style guide](https://google.github.io/styleguide/cppguide.html).

When submitting a pull request:

-1. Make sure your code has been rebased on top of the latest commit on the main branch.
+1. Make sure your code has been rebased on top of the latest commit on the master branch.
2. Ensure code is properly formatted.
3. Include a detailed description of the changes in the pull request.
Explain why you made the changes you did.
8 changes: 4 additions & 4 deletions docs/index.rst
@@ -3,19 +3,19 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
-Welcome to wespeaker's documentation!
+Welcome to Wespeaker's documentation!
=====================================

-wespeaker is an research and production oriented Speaker Verification, Recognition and Diarization Toolkit.
+Wespeaker is an Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit.

.. toctree::
:maxdepth: 2
:caption: Contents:

-./train.rst
-./runtime.md
./python_package.md
+./train.rst
./pretrained.md
+./runtime.md
./reference.rst
./contribute.md

2 changes: 1 addition & 1 deletion docs/paper.md
@@ -1,4 +1,4 @@
-# Papers
+# Wespeaker Papers

* [Wespeaker: A research and production oriented speaker embedding learning toolkit](https://arxiv.org/pdf/2210.17016.pdf), accepted by ICASSP 2023.
* [Wespeaker baselines for VoxSRC2023](https://arxiv.org/pdf/2306.15161.pdf)
4 changes: 2 additions & 2 deletions docs/pretrained.md
@@ -11,7 +11,7 @@ modeling, such as
For users who would like to verify the SV performance or extract speaker embeddings for the above tasks without
troubling about training the speaker embedding learner, we provide two types of pretrained models.

-1. **Checkpoint Model**, with suffix **.pt**, the model trained and saved as checkpoint by WeNet python code, you can
+1. **Checkpoint Model**, with suffix **.pt**, the model trained and saved as checkpoint by WeSpeaker python code, you can
reproduce our published result with it, or you can use it as checkpoint to continue.

2. **Runtime Model**, with suffix **.onnx**, the `runtime model` is exported by `Onnxruntime` on the `checkpoint model`.
@@ -35,7 +35,7 @@ python wespeaker/bin/infer_onnx.py --onnx_path $onnx_path --wav_path $wav_path
```

You can easily adapt `infer_onnx.py` to your application, a speaker diarization example can be found
-in [the voxconverse recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse)
+in [the voxconverse recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse).

## Model List

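Whichever model type is used, verification scoring ultimately compares two fixed-length speaker embeddings, typically by cosine similarity. A minimal, dependency-free sketch of that step (the 4-dim vectors below are toy stand-ins for real 256-dim embeddings, not actual wespeaker output):

```python
import math

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embeddings (lists of floats)."""
    dot = sum(a * b for a, b in zip(emb1, emb2))
    norm1 = math.sqrt(sum(a * a for a in emb1))
    norm2 = math.sqrt(sum(b * b for b in emb2))
    return dot / (norm1 * norm2)

# Toy embeddings: the enrollment utterance, a same-speaker test, a different-speaker test.
enroll = [0.2, -0.5, 0.1, 0.7]
test_same = [0.19, -0.48, 0.12, 0.69]
test_diff = [-0.6, 0.3, 0.4, -0.1]

print(cosine_score(enroll, test_same) > cosine_score(enroll, test_diff))  # → True
```

A threshold on this score (tuned on a trial list such as the VoxCeleb one above) then turns the comparison into an accept/reject decision.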
10 changes: 5 additions & 5 deletions docs/python_package.md
@@ -14,7 +14,7 @@ cd wespeaker
pip install -e .
```

-## Command line Usage
+## Command Line Usage

``` sh
$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
@@ -44,22 +44,22 @@ You can specify the following parameters. (use `-h` for details)
* `--resample_rate`: resample rate (default: 16000)
* `--vad`: apply vad or not for the input audios (default: true)
* `--output_file`: output file to save speaker embedding, if you use kaldi wav_scp, output will be `output_file.ark`
-and `output_file.scp`
+  and `output_file.scp`

### Pretrained model support

We provide different pretrained models, which can be found
-at [pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md)
+at [pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md).

**Warning** If you want to use the models provided in the above link, be sure to rename the model and config file
-to `avg_model.pt` and `config.yaml`
+to `avg_model.pt` and `config.yaml`.

By default, specifying the `language` option will download the pretrained models as

* english: `ResNet221_LM` pretrained on VoxCeleb
* chinese: `ResNet34_LM` pretrained on CnCeleb

-if you want to use other pretrained models, please use the `-p` or `--pretrain` to specify the directory
+If you want to use other pretrained models, please use the `-p` or `--pretrain` to specify the directory
containing `avg_model.pt` and `config.yaml`,
which can either be the ones we provided and trained by yourself.

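The `wav_scp` input mentioned in the parameter list above follows the Kaldi convention: one `utt_id wav_path` pair per line. A minimal parser sketch, independent of wespeaker (the utterance ids and paths are hypothetical):

```python
def parse_wav_scp(lines):
    """Parse Kaldi-style wav.scp lines into an {utt_id: wav_path} dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, path = line.split(maxsplit=1)
        mapping[utt_id] = path
    return mapping

scp_text = """utt001 /data/wavs/utt001.wav
utt002 /data/wavs/utt002.wav
"""
print(parse_wav_scp(scp_text.splitlines()))
# → {'utt001': '/data/wavs/utt001.wav', 'utt002': '/data/wavs/utt002.wav'}
```

The companion `output_file.scp` produced by the embedding task maps the same utterance ids to offsets inside `output_file.ark`, so it can be read with the same line format.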
10 changes: 5 additions & 5 deletions docs/runtime.md
@@ -5,19 +5,19 @@
The Wespeaker runtime supports the following platforms.

- Server
-- [GPU](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/server/x86_gpu)
+- [TensorRT GPU](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/server/x86_gpu)

- Device
- [Horizon X3 PI](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/horizonbpu)
-- [onnxruntime](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/onnxruntime)
+- [Onnxruntime](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/onnxruntime)
- linux_x86_cpu
- linux_x86_gpu
- macOS
- windows
- Android (coming)
- ncnn (coming)

-## onnxruntime
+## Onnxruntime

* Step 1. Export your experiment model to ONNX by https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/bin/export_onnx.py

@@ -85,7 +85,7 @@ onnx_dir=your_model_dir
--embedding_size 256
```

-## horizonbpu
+## Horizonbpu

* Step 1. Setup environment (install horizon packages and cross compile tools) in the PC.

@@ -188,7 +188,7 @@ embed_out=your_embedding_txt
```


-## server (tensorrt gpu)
+## Server (tensorrt gpu)

### Introduction
In this project, we use models trained in [wespeaker](https://github.com/wenet-e2e/wespeaker) as an example to show how to convert speaker model to tensorrt and deploy them on [Triton Inference Server](https://github.com/triton-inference-server/server.git). If you only have CPUs, instead of using GPUs to deploy Tensorrt model, you may deploy the exported onnx model on Triton Inference Server as well.
2 changes: 1 addition & 1 deletion docs/speaker_recognition_papers.md
@@ -1,4 +1,4 @@
-# Speaker recognition papers
+# Speaker Recognition Papers

- Dataset
- VoxCeleb
1 change: 1 addition & 0 deletions docs/train.rst
@@ -7,3 +7,4 @@ How to train models?

./vox.md
./vox_ssl.md
+./voxconverse_diar.md
4 changes: 2 additions & 2 deletions docs/vox.md
@@ -1,4 +1,4 @@
-## Tutorial on VoxCeleb v2 (Supervised-VoxCeleb34)
+## SV Tutorial on VoxCeleb v2 (Supervised)

If you meet any problems when going through this tutorial, please feel free to ask in
github [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any kind of feedback.
@@ -89,7 +89,7 @@ id10001/Y8hIVOBuels/00001.wav id10999/G5R2-Hl7YX8/00008.wav nontarget
...
```

-### Stage 1: Reformat the Data
+### Stage 2: Reformat the Data

```
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
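The `[ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]` guard shown in the hunk above is the recipes' standard pattern for running a contiguous subset of steps from `run.sh`. The same selection logic, expressed as a standalone Python sketch (stage numbers are illustrative, not tied to any particular recipe):

```python
def stages_to_run(stage, stop_stage, all_stages=(1, 2, 3, 4, 5)):
    """Return the recipe stages a run.sh-style guard would execute:
    every stage s with stage <= s <= stop_stage."""
    return [s for s in all_stages if stage <= s <= stop_stage]

print(stages_to_run(2, 2))  # → [2]: only the "Reformat the Data" stage runs
print(stages_to_run(1, 3))  # → [1, 2, 3]
```

Setting `stage` and `stop_stage` to the same value is how a single step (such as Stage 2 above) is rerun in isolation after an earlier run has completed the rest.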
4 changes: 2 additions & 2 deletions docs/vox_ssl.md
@@ -1,4 +1,4 @@
-## Tutorial on VoxCeleb v3 (Self-Supervised on VoxCeleb)
+## SV Tutorial on VoxCeleb v3 (Self-Supervised)

If you meet any problems when going through this tutorial, please feel free to ask in
github [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any kind of feedback.
@@ -102,7 +102,7 @@ id10001/Y8hIVOBuels/00001.wav id10999/G5R2-Hl7YX8/00008.wav nontarget
In this step, we generated **utt2spk** and **spk2utt**, but we will not use any speaker labels during the training
process.

-### Stage 1: Reformat the Data
+### Stage 2: Reformat the Data

```
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then