Tags: czech language, stt, speech recognition, voice transcription, asr, offline, free
Check out another model that is trained on a similar dataset. It is part of Coqui.ai.
The Czech language lacks a modern speech recognition engine that can be used offline, e.g. for home automation or robotics.
Mozilla Common Voice is an initiative that provides some hope, but there are very few Czech-speaking supporters giving their voice. Given the current speed of gathering data, it will still take significant time to capture enough recordings to train an official model.
Please contribute your voice: 👉 https://commonvoice.mozilla.org/cs/speak 👈
Thank you 👍
The goal of this project is to train a model now, from existing data, even though it is not perfect, and make it available to everyone.
This project is based on the Baidu Deep Speech 2 paper and https://github.com/SeanNaren/deepspeech.pytorch.
The Mozilla Deep Speech implementation in TensorFlow kept failing with a segmentation fault after several epochs of training, which I was not able to resolve.
- Czech language - well, finally at least something.
- Runs locally in Docker, no dependencies.
- REST API for transcribing whole audio files. Streaming is not supported yet.
- Streaming transcription
- Dedicated Dockerfile.serving that will be a minimized image containing just what is needed for serving the API
- Integrate additional datasets
Transcription happens in two phases. First, an acoustic model estimates the best representation of the audio as characters of the Czech alphabet. This is sometimes incorrect, as there can be multiple options for similarly sounding characters. Hence a language model kicks in as a second phase, cross-checking the candidates against Czech words and their context.
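To make the two phases concrete, below is a minimal Python sketch of phase one, greedy CTC decoding. This is only an illustration, not the project's actual decoder (deepspeech.pytorch uses beam search combined with the language model); the alphabet and probabilities here are made-up toy values.

```python
import numpy as np

ALPHABET = ["_", " ", "a", "h", "o", "j"]  # "_" is the CTC blank symbol

def greedy_ctc_decode(frame_probs: np.ndarray) -> str:
    """Pick the best character per frame, collapse repeats, drop blanks."""
    best = frame_probs.argmax(axis=1)  # best character index per frame
    chars = []
    prev = -1
    for idx in best:
        if idx != prev and ALPHABET[idx] != "_":
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)

# Fake acoustic output: 6 frames x 6 characters, spelling "ahoj".
probs = np.eye(6)[[2, 2, 0, 3, 4, 5]]
print(greedy_ctc_decode(probs))  # -> "ahoj"
# Phase two would take the N best hypotheses and pick the one with the
# highest combined acoustic + language-model score.
```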
See the dedicated README.md.
Based on:
- 45 hours of Mozilla Common Voice
See the dedicated README.md.
Based on:
- 700 MB of Czech Wikipedia text
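As an illustration of what the language model contributes, the sketch below scores two variants of the same phrase. It assumes lm.bin is a KenLM binary model (the format commonly used with deepspeech.pytorch decoders) and that the kenlm Python bindings are installed.

```python
import kenlm  # pip install kenlm

# Assumption: lm.bin is a KenLM binary model. score() returns the log10
# probability of the sentence; a less negative score means the language
# model considers the text more plausible Czech.
model = kenlm.Model("lm.bin")
for sentence in ["dobrý den", "dobrí den"]:
    print(sentence, model.score(sentence))
```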
- Docker
- Download the latest models from the releases page: https://deepspeech.slesinger.info
There are two files:
- Acoustic model:
deepspeech_final.pth
- Language model:
lm.bin
Place the files in any directory, for example /home/hass/docker-volumes/deepspeech
- Download and start a container image ready to serve the transcription API with this command:
sudo docker run -v /home/hass/docker-volumes/deepspeech:/workspace/data --tmpfs /tmp -p 8888:8888 -p 10456:10456 --net=host --ipc=host --name deepspeech -e MODEL=/workspace/data/deepspeech_final.pth -e LM=/workspace/data/lm.bin slesinger/deepspeech:latest
Inspiration for the Dockerfile.
Make sure the MODEL and LM environment variables point to the correct model files.
Once the container is started, you can make HTTP POST requests supplying a WAV file (PCM, 16-bit, mono, 16 kHz).
Example curl command:
curl -X POST http://localhost:10456/transcribe -H "Content-type: multipart/form-data" -F "file=@/tmp/wav/sample_voice.wav"
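The same request can be made from Python. This is a hedged sketch: the endpoint, port, and form field name are copied from the curl example above, and the server's exact response format may differ.

```python
import requests

# POST a WAV file (PCM, 16-bit, mono, 16 kHz) to the transcription API.
with open("/tmp/wav/sample_voice.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:10456/transcribe",
        files={"file": ("sample_voice.wav", f, "audio/wav")},
    )
resp.raise_for_status()
print(resp.text)  # the transcription returned by the server
```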