# Speech recognition

## Installation and usage

This is a Python 3 application that requires the `sounddevice` package to grab live frames from a mic. Install it with:

```bash
pip install sounddevice
```
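As a rough sketch of how live frames might be grabbed with `sounddevice` (a hypothetical illustration following the package's callback-plus-queue examples, not the actual code in `speechRecognition.py`):

```python
import queue

# Blocks of raw PCM handed over from the audio thread.
audio_q = queue.Queue()

def on_audio(indata, frames, time_info, status):
    # sounddevice invokes this from its own audio thread for every block.
    audio_q.put(bytes(indata))

def record(seconds=3, samplerate=16000, device=None):
    """Capture raw 16-bit mono PCM for `seconds` and return it as bytes."""
    import sounddevice as sd  # third-party: pip install sounddevice
    blocksize = 8000  # frames per callback (0.5 s at 16 kHz)
    n_blocks = int(seconds * samplerate / blocksize)
    with sd.RawInputStream(samplerate=samplerate, blocksize=blocksize,
                           device=device, dtype="int16", channels=1,
                           callback=on_audio):
        return b"".join(audio_q.get() for _ in range(n_blocks))
```

The recognizer backends consume these PCM chunks; keeping capture in a callback avoids dropping frames while a block is being decoded.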

Depending on the selected backend, additional dependencies might be required (see below).

Launch the program with `--help` to see the available options. You can display and select the preferred input device with `--list-devices` and `--device`, respectively (otherwise the system default will be used).

This application opens two YARP ports: a `<prefix>/rpc:s` port through which clients can request a dictionary/model change and mute/unmute the microphone, and a `<prefix>/result:o` port that broadcasts the transcribed text. The default prefix is `/speechRecognition`, but it can be changed with the `--prefix` option.
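As a quick sketch, assuming the default `/speechRecognition` prefix and a running instance, both ports can be exercised from a terminal with the standard YARP tools:

```bash
# Print transcriptions as they arrive (yarp creates a temporary reader port):
yarp read ... /speechRecognition/result:o

# Open an interactive session on the RPC port; the accepted command
# vocabulary is defined by the application itself.
yarp rpc /speechRecognition/rpc:s
```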
## PocketSphinx backend

Install the `pocketsphinx` package with:

```bash
pip install pocketsphinx
```

Then, launch the program with the `--backend pocketsphinx --dictionary xxx --language xxx` options. This combination relies on the corresponding dictionary and model files being installed (check [share/speechRecognition](/share/speechRecognition/)). For example, to use the waiter Spanish orders dictionary, run:

```bash
speechRecognition --backend pocketsphinx --dictionary waiter --language es
```

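PocketSphinx dictionary files follow the CMU pronouncing dictionary format: one word per line, followed by its phoneme sequence. A hypothetical excerpt (the actual entries live in [share/speechRecognition](/share/speechRecognition/)):

```
follow  F AA L OW
me      M IY
stop    S T AA P
```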
## Vosk (Kaldi) backend

Install the `vosk` package with:

```bash
pip install vosk
```

Then, launch the program with the `--backend vosk --model xxx` options. Model files are downloaded on demand from the [Vosk website](https://alphacephei.com/vosk/models). For example, to use the ~50 MB Spanish model, run:

```bash
speechRecognition --backend vosk --model small-es-0.42
```
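The short model name is expanded to a full archive name before download. As an illustrative, stdlib-only sketch of the assumed naming convention (the real resolution logic lives inside the `vosk` package):

```python
def vosk_model_url(name: str,
                   base: str = "https://alphacephei.com/vosk/models") -> str:
    """Compose the download URL assumed for a short model name.

    E.g. 'small-es-0.42' maps to the 'vosk-model-small-es-0.42' archive.
    """
    # Accept both the short form and an already fully-qualified name.
    full = name if name.startswith("vosk-model-") else f"vosk-model-{name}"
    return f"{base}/{full}.zip"
```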

To list and download the desired models offline and test the Vosk engine, you can use the `vosk-transcriber` application.
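For example (flag names taken from the CLI bundled with the `vosk` package; verify them with `vosk-transcriber --help`):

```bash
# List the models known to the downloader:
vosk-transcriber --list-models

# Transcribe a WAV file with the small Spanish model (hypothetical file names):
vosk-transcriber -n vosk-model-small-es-0.42 -i speech.wav -o speech.txt
```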