[docs] add diarization tutorial in doc and re-order the directory structure in the index page #300

Merged · 5 commits · Apr 2, 2024
6 changes: 3 additions & 3 deletions docs/README.md
@@ -2,8 +2,9 @@

This directory includes the basic documents for wespeaker, including

- [Tutorial on VoxCeleb (Supervised-VoxCeleb34)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox.md)
- [Tutorial on VoxCeleb (Self-supervised-DINO)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox_ssl.md)
- [SV Tutorial on VoxCeleb v2 (Supervised)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox.md)
- [SV Tutorial on VoxCeleb v3 (Self-Supervised-DINO)](https://github.com/wenet-e2e/wespeaker/blob/master/docs/vox_ssl.md)
- [Diarization Tutorial on VoxConverse v2](https://github.com/wenet-e2e/wespeaker/blob/master/docs/voxconverse_diar.md)
- [Suggested papers for speaker embedding learning](https://github.com/wenet-e2e/wespeaker/blob/master/docs/speaker_recognition_papers.md)
- [Provided pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md)
- [Off-the-shelf Usages: from Command Line or Python Code](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md)
@@ -13,6 +14,5 @@ This directory includes the basic documents for wespeaker, including

## ToDo List (possible)

- [ ] Diarization Tutorial on Voxconverse
- [ ] Chinese HandBooks
- [ ] Introduction in Video
2 changes: 1 addition & 1 deletion docs/contribute.md
@@ -31,7 +31,7 @@ and [Google C++ style guide](https://google.github.io/styleguide/cppguide.html).

When submitting a pull request:

1. Make sure your code has been rebased on top of the latest commit on the main branch.
1. Make sure your code has been rebased on top of the latest commit on the master branch.
2. Ensure code is properly formatted.
3. Include a detailed description of the changes in the pull request.
Explain why you made the changes you did.
8 changes: 4 additions & 4 deletions docs/index.rst
@@ -3,19 +3,19 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to wespeaker's documentation!
Welcome to Wespeaker's documentation!
=====================================

wespeaker is an research and production oriented Speaker Verification, Recognition and Diarization Toolkit.
Wespeaker is a Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit.

.. toctree::
:maxdepth: 2
:caption: Contents:

./train.rst
./runtime.md
./python_package.md
./train.rst
./pretrained.md
./runtime.md
./reference.rst
./contribute.md

2 changes: 1 addition & 1 deletion docs/paper.md
@@ -1,4 +1,4 @@
# Papers
# Wespeaker Papers

* [Wespeaker: A research and production oriented speaker embedding learning toolkit](https://arxiv.org/pdf/2210.17016.pdf), accepted by ICASSP 2023.
* [Wespeaker baselines for VoxSRC2023](https://arxiv.org/pdf/2306.15161.pdf)
4 changes: 2 additions & 2 deletions docs/pretrained.md
@@ -11,7 +11,7 @@ modeling, such as
For users who would like to verify the SV performance or extract speaker embeddings for the above tasks without
the trouble of training the speaker embedding learner, we provide two types of pretrained models.

1. **Checkpoint Model**, with suffix **.pt**, the model trained and saved as checkpoint by WeNet python code, you can
1. **Checkpoint Model**, with suffix **.pt**, the model trained and saved as a checkpoint by the WeSpeaker python code. You can
   reproduce our published results with it, or use it as a checkpoint to continue training.

2. **Runtime Model**, with suffix **.onnx**, the `runtime model` is exported from the `checkpoint model` for `Onnxruntime`.
@@ -35,7 +35,7 @@ python wespeaker/bin/infer_onnx.py --onnx_path $onnx_path --wav_path $wav_path
```

You can easily adapt `infer_onnx.py` to your application; a speaker diarization example can be found
in [the voxconverse recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse)
in [the voxconverse recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse).
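Embeddings extracted this way are typically compared with cosine similarity. Below is a minimal pure-Python sketch of that comparison; the toy 4-dimensional vectors and the 0.5 decision threshold are illustrative assumptions, not WeSpeaker's actual values (real embeddings are much higher-dimensional, e.g. 256-dim):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim embeddings for illustration only.
enroll = [0.1, 0.8, 0.3, 0.4]
test = [0.2, 0.7, 0.4, 0.3]
score = cosine_similarity(enroll, test)

# Accept/reject with an application-chosen threshold (0.5 is hypothetical).
is_same_speaker = score > 0.5
```

In practice the threshold is tuned on a development trial list rather than fixed in advance.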

## Model List

10 changes: 5 additions & 5 deletions docs/python_package.md
@@ -14,7 +14,7 @@ cd wespeaker
pip install -e .
```

## Command line Usage
## Command Line Usage

``` sh
$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
@@ -44,22 +44,22 @@ You can specify the following parameters. (use `-h` for details)
* `--resample_rate`: resample rate (default: 16000)
* `--vad`: apply vad or not for the input audios (default: true)
* `--output_file`: output file to save speaker embedding, if you use kaldi wav_scp, output will be `output_file.ark`
and `output_file.scp`
and `output_file.scp`
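The kaldi-style `.scp` file pairs each utterance id with the location of its embedding inside the `.ark` archive. A small sketch of parsing such a file; the sample lines and offsets are made up for illustration:

```python
def parse_scp(lines):
    """Map utterance id -> ark location string ('path:byte_offset')."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # First whitespace splits the utterance id from the ark location.
        utt, loc = line.split(None, 1)
        table[utt] = loc
    return table

# Hypothetical contents of output_file.scp.
sample = [
    "utt1 embedding.ark:17",
    "utt2 embedding.ark:1053",
]
table = parse_scp(sample)
```

Reading the binary `.ark` payload itself requires a kaldi-format reader (e.g. the `kaldiio` package); the sketch only covers the text index.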

### Pretrained model support

We provide different pretrained models, which can be found
at [pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md)
at [pretrained models](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md).

**Warning** If you want to use the models provided in the above link, be sure to rename the model and config file
to `avg_model.pt` and `config.yaml`
to `avg_model.pt` and `config.yaml`.

By default, specifying the `language` option will download one of the following pretrained models:

* english: `ResNet221_LM` pretrained on VoxCeleb
* chinese: `ResNet34_LM` pretrained on CnCeleb

if you want to use other pretrained models, please use the `-p` or `--pretrain` to specify the directory
If you want to use other pretrained models, please use `-p` or `--pretrain` to specify the directory
containing `avg_model.pt` and `config.yaml`,
which can be either the ones we provide or ones you trained yourself.

10 changes: 5 additions & 5 deletions docs/runtime.md
@@ -5,19 +5,19 @@
The Wespeaker runtime supports the following platforms.

- Server
- [GPU](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/server/x86_gpu)
- [TensorRT GPU](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/server/x86_gpu)

- Device
- [Horizon X3 PI](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/horizonbpu)
- [onnxruntime](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/onnxruntime)
- [Onnxruntime](https://github.com/wenet-e2e/wespeaker/tree/master/runtime/onnxruntime)
- linux_x86_cpu
- linux_x86_gpu
- macOS
- windows
- Android (coming)
- ncnn (coming)

## onnxruntime
## Onnxruntime

* Step 1. Export your experiment model to ONNX using https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/bin/export_onnx.py

@@ -85,7 +85,7 @@ onnx_dir=your_model_dir
--embedding_size 256
```

## horizonbpu
## Horizonbpu

* Step 1. Set up the environment (install horizon packages and cross-compile tools) on the PC.

@@ -188,7 +188,7 @@ embed_out=your_embedding_txt
```


## server (tensorrt gpu)
## Server (tensorrt gpu)

### Introduction
In this project, we use models trained in [wespeaker](https://github.com/wenet-e2e/wespeaker) as an example to show how to convert a speaker model to TensorRT and deploy it on [Triton Inference Server](https://github.com/triton-inference-server/server.git). If you only have CPUs, you may deploy the exported onnx model on Triton Inference Server instead of deploying the TensorRT model on GPUs.
2 changes: 1 addition & 1 deletion docs/speaker_recognition_papers.md
@@ -1,4 +1,4 @@
# Speaker recognition papers
# Speaker Recognition Papers

- Dataset
- VoxCeleb
1 change: 1 addition & 0 deletions docs/train.rst
@@ -7,3 +7,4 @@ How to train models?

./vox.md
./vox_ssl.md
./voxconverse_diar.md
4 changes: 2 additions & 2 deletions docs/vox.md
@@ -1,4 +1,4 @@
## Tutorial on VoxCeleb v2 (Supervised-VoxCeleb34)
## SV Tutorial on VoxCeleb v2 (Supervised)

If you meet any problems when going through this tutorial, please feel free to ask in the
GitHub [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any feedback.
@@ -89,7 +89,7 @@ id10001/Y8hIVOBuels/00001.wav id10999/G5R2-Hl7YX8/00008.wav nontarget
...
```
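Each trial line above pairs an enrollment utterance with a test utterance plus a `target`/`nontarget` label, and systems scored on such lists are commonly summarized by the equal error rate (EER). A minimal pure-Python sketch of that metric, with made-up scores; this is an illustration, not WeSpeaker's actual scoring code:

```python
def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: the operating point where the false-reject rate
    on target trials equals the false-accept rate on nontarget trials.
    Simple threshold sweep; fine for small score lists."""
    thresholds = sorted(set(target_scores + nontarget_scores))
    best_gap, best_eer = 2.0, None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer

# Toy cosine scores (made up): targets should score higher than nontargets.
target_scores = [0.82, 0.91, 0.74, 0.88]
nontarget_scores = [0.12, 0.33, 0.25, 0.41]
eer = compute_eer(target_scores, nontarget_scores)
```

With perfectly separated scores like these the sweep finds a threshold with no errors on either side, so the EER is zero; overlapping score distributions yield a positive EER.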

### Stage 1: Reformat the Data
### Stage 2: Reformat the Data

```
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
4 changes: 2 additions & 2 deletions docs/vox_ssl.md
@@ -1,4 +1,4 @@
## Tutorial on VoxCeleb v3 (Self-Supervised on VoxCeleb)
## SV Tutorial on VoxCeleb v3 (Self-Supervised)

If you meet any problems when going through this tutorial, please feel free to ask in the
GitHub [issues](https://github.com/wenet-e2e/wespeaker/issues). Thanks for any feedback.
@@ -102,7 +102,7 @@ id10001/Y8hIVOBuels/00001.wav id10999/G5R2-Hl7YX8/00008.wav nontarget
In this step, we generated **utt2spk** and **spk2utt**, but we will not use any speaker labels during the training
process.

### Stage 1: Reformat the Data
### Stage 2: Reformat the Data

```
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then