supernaiter/ssr7000

SSR7000: A SYNCHRONIZED CORPUS OF ULTRASOUND TONGUE IMAGING FOR END-TO-END SILENT SPEECH RECOGNITION

Overview

The SSR7000 corpus contains 7384 training utterances and 100 test utterances recorded from a single male native English speaker. All utterances were produced as silent speech: the speaker did not speak aloud and only moved his articulatory organs. The lip images and ultrasound tongue images were recorded in synchronization while the speaker spoke silently.

This repository provides the dataset and the recipe we used to produce the benchmark results. The corpus is publicly available under the CC BY-NC 4.0 license.

Downloads

You can download the dataset from HERE.

SSR7000 provides both the raw data, without any preprocessing, and the preprocessed data. The raw data is useful for those who want to improve the preprocessing. The preprocessed data is intended for those who are more interested in the recognizer than in the preprocessing.

How to Use the Recipe

  1. Install ESPnet (not ESPnet2) following its installation instructions.

  2. Put our recipe folder under espnet/egs, for example as espnet/egs/recipe.
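
Step 2 can also be scripted. The following is a minimal sketch, not part of the recipe itself: the function name install_recipe and the example paths are hypothetical, and it only assumes that ESPnet has already been cloned so that an egs directory exists under its root.

```python
import shutil
from pathlib import Path


def install_recipe(recipe_dir, espnet_root, name="recipe"):
    """Copy a recipe folder to <espnet_root>/egs/<name> and return the new path.

    Hypothetical helper for illustration; it is not provided by ESPnet.
    """
    recipe_dir = Path(recipe_dir)
    egs_dir = Path(espnet_root) / "egs"
    if not recipe_dir.is_dir():
        raise FileNotFoundError(f"recipe folder not found: {recipe_dir}")
    if not egs_dir.is_dir():
        raise FileNotFoundError(f"no egs directory under: {espnet_root}")
    target = egs_dir / name
    # copytree refuses to overwrite: it raises FileExistsError if target exists.
    shutil.copytree(recipe_dir, target)
    return target
```

For example, install_recipe("ssr7000/recipe", "espnet") would create espnet/egs/recipe, assuming both repositories were cloned into the current directory under those names.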

Google Colab

You can try our benchmark recognition on Google Colab without any environment setup.

Baseline

Our benchmark results were obtained with ESPnet and the recipe in this repository. The table compares error rates by the number of training utterances.

Training utterances   1000   3000   5000   7384 (all)
CER (%)               51.5   47.4   23.7   17.6
WER (%)               89.5   81.0   50.0   37.6
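
For readers unfamiliar with the metrics, CER and WER are the character and word error rates: the edit distance between the recognized text and the reference, divided by the reference length. The sketch below illustrates that definition only. It is not the scoring script used by the recipe, the function names are hypothetical, and it assumes that spaces count as characters for CER, a convention that may differ from the one behind the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and deletions
    needed to turn ref into hyp.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference tokens.

    unit="word" splits on whitespace (WER); any other value splits into
    characters, spaces included (CER under this sketch's assumption).
    """
    split = str.split if unit == "word" else list
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = split(ref), split(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    return 100.0 * edits / total
```

With the reference "the cat sat" and the hypothesis "the cat sit", one of three words is wrong, so error_rate gives a WER of about 33.3, while one of eleven characters is wrong, giving a CER of about 9.1.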

Contact

kimura-naoki[at]g.ecc.u-tokyo.ac.jp
