SSR7000: A SYNCHRONIZED CORPUS OF ULTRASOUND TONGUE IMAGING FOR END-TO-END SILENT SPEECH RECOGNITION
The SSR7000 corpus contains 7384 training utterances and 100 test utterances recorded by a single male native English speaker. All utterances were recorded as silent speech: the participant did not speak aloud but only moved his articulatory organs. The lip and ultrasound tongue image recordings were synchronized while the speaker was silently speaking.
Here you can download the dataset and the recipe we used for the benchmark results. The corpus is publicly available under the CC BY-NC 4.0 license.
You can download the dataset from HERE.
SSR7000 provides both the raw data, without any preprocessing, and the preprocessed data. The raw data is useful for those who wish to improve the preprocessing; for those who are more interested in the recognizer than in the preprocessing, the preprocessed data is provided as well.
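As an illustration only, the sketch below shows one way to load an utterance's image frames from the preprocessed data with NumPy and Pillow. The directory layout, file names, and image format used here are assumptions made for the example, not the documented structure of the corpus.

```python
# Minimal sketch: load one utterance's image frames into NumPy arrays.
# NOTE: the directory layout and file names below are hypothetical;
# check the corpus documentation for the actual structure.
from pathlib import Path

import numpy as np
from PIL import Image


def load_frames(frame_dir: Path) -> np.ndarray:
    """Load all image frames in a directory, sorted by file name."""
    frames = [np.asarray(Image.open(p).convert("L"))  # grayscale
              for p in sorted(frame_dir.glob("*.png"))]
    return np.stack(frames)  # shape: (num_frames, height, width)


utterance_dir = Path("SSR7000/preprocessed/train/0001")  # hypothetical path
tongue = load_frames(utterance_dir / "ultrasound")
lips = load_frames(utterance_dir / "lip")
print(tongue.shape, lips.shape)
```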
- Install ESPnet (not ESPnet2) following their instructions.
- Put our recipe folder under espnet/egs, like espnet/egs/recipe (see the sketch after this list).
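As a sketch of the second step, the snippet below copies a local recipe folder into an ESPnet checkout using only the standard library; both paths are placeholders and should be adjusted to your own setup.

```python
# Minimal sketch: place the recipe folder under espnet/egs.
# NOTE: both paths are placeholders; adjust them to where you cloned
# ESPnet (v1) and where you unpacked the SSR7000 recipe.
import shutil
from pathlib import Path

recipe_src = Path("./recipe")       # hypothetical: downloaded recipe folder
espnet_egs = Path("./espnet/egs")   # hypothetical: ESPnet checkout
shutil.copytree(recipe_src, espnet_egs / recipe_src.name, dirs_exist_ok=True)
```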
You can try our benchmark recognition on Google Colab without any environment setup!
Below are our benchmark results using ESPnet and the recipe in this repository. The table compares results for different amounts of training data (number of training utterances).
| Training utterances | 1000 | 3000 | 5000 | 7384 (all) |
|---|---|---|---|---|
| CER (%) | 51.5 | 47.4 | 23.7 | 17.6 |
| WER (%) | 89.5 | 81.0 | 50.0 | 37.6 |
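For reference, CER and WER are edit-distance-based error rates over characters and words. The minimal sketch below computes them with the jiwer package on a toy example; jiwer is just one convenient choice for illustration, not necessarily what the ESPnet recipe's own scoring uses.

```python
# Minimal sketch: compute WER and CER for a toy reference/hypothesis pair.
# NOTE: jiwer is only an illustrative choice; the recipe scores results
# with ESPnet's own scripts.
import jiwer

reference = "the quick brown fox"
hypothesis = "the quick brown box"

wer = jiwer.wer(reference, hypothesis)  # word error rate
cer = jiwer.cer(reference, hypothesis)  # character error rate
print(f"WER: {wer:.3f}, CER: {cer:.3f}")
```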
kimura-naoki[at]g.ecc.u-tokyo.ac.jp