forked from zaf/asterisk-speech-recog
==============================================
Speech recognition script for Asterisk
==============================================
This script uses Google's Cloud Speech API to convert speech to text and
return it to the dialplan as an Asterisk channel variable.
------------
Requirements
------------
Perl            The Perl Programming Language
perl-libwww     The World-Wide Web library for Perl
perl-libjson    Module for manipulating JSON-formatted data
IO-Socket-SSL   Perl module that implements an interface to SSL sockets
flac            Free Lossless Audio Codec
A Cloud Speech API key from Google (https://cloud.google.com/speech).
Internet access in order to contact Google and get the speech data.
------------
Installation
------------
To install, copy speech-recog.agi to your agi-bin directory.
Usually this is /var/lib/asterisk/agi-bin/
To make sure, check your /etc/asterisk/asterisk.conf file.
-----
Usage
-----
agi(speech-recog.agi,[lang],[timeout],[intkey],[NOBEEP],[rtimeout],[speechContexts])
Records from the current channel until 2 seconds of silence are detected
(configurable through the 'timeout' argument, -1 for no timeout) or the
interrupt key (# by default) is pressed. If NOBEEP is set, no beep is played
back to the user to indicate the start of the recording. If 'rtimeout' is set,
it overrides the absolute recording timeout. 'speechContexts' provides hints that
favor specific words and phrases in the results. Usage: [Agamemnon,Midas]
The recorded sound is sent to the Google speech recognition service and the
returned text string is assigned to the channel variable 'utterance'.
The script sets the following channel variables:
utterance : The generated text string.
confidence : A value between 0 and 1 indicating the probability of a correct recognition.
Values above 0.95 usually mean that the resulting text is correct.
In case of an unexpected error both of these variables are set to '-1'.
--------
Examples
--------
Sample dialplan code for your extensions.conf:
;Simple speech recognition
exten => 1234,1,Answer()
exten => 1234,n,agi(speech-recog.agi,en-US)
exten => 1234,n,Verbose(1,The text you just said is: ${utterance})
exten => 1234,n,Verbose(1,The probability to be right is: ${confidence})
exten => 1234,n,Hangup()
;Speech recognition demo also using googletts.agi for text to speech synthesis:
exten => 1235,1,Answer()
exten => 1235,n,agi(googletts.agi,"Say something in English, when done press the pound key.",en)
exten => 1235,n(record),agi(speech-recog.agi,en-US)
exten => 1235,n,Verbose(1,Script returned: ${confidence} , ${utterance})
;Check the probability of a successful recognition:
exten => 1235,n,GotoIf($["${confidence}" > "0.8"]?playback:retry)
;Play back the text:
exten => 1235,n(playback),agi(googletts.agi,"The text you just said was...",en)
exten => 1235,n,agi(googletts.agi,"${utterance}",en)
exten => 1235,n,goto(end)
;Retry in case speech recognition wasn't successful:
exten => 1235,n(retry),agi(googletts.agi,"Can you please repeat more clearly?",en)
exten => 1235,n,goto(record)
exten => 1235,n(fail),agi(googletts.agi,"Failed to get speech data.",en)
exten => 1235,n(end),Hangup()
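;Bounded retry (a sketch, not part of the original examples): the same confidence
;check as above, but it gives up after 3 attempts instead of retrying indefinitely.
;Extension 1237, the 'tries' variable and the limit of 3 are illustrative assumptions.
exten => 1237,1,Answer()
exten => 1237,n,Set(tries=0)
exten => 1237,n(record),agi(speech-recog.agi,en-US)
exten => 1237,n,GotoIf($["${confidence}" > "0.8"]?done)
exten => 1237,n,Set(tries=$[${tries} + 1])
exten => 1237,n,GotoIf($[${tries} < 3]?record)
exten => 1237,n,Verbose(1,Giving up after ${tries} attempts)
exten => 1237,n,Hangup()
exten => 1237,n(done),Verbose(1,Recognized: ${utterance})
exten => 1237,n,Hangup()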
;Voice dialing example
exten => 1236,1,Answer()
exten => 1236,n,agi(googletts.agi,"Please say the number you want to dial.",en)
exten => 1236,n(record),agi(speech-recog.agi,en-US)
exten => 1236,n,GotoIf($["${confidence}" > "0.8"]?success:retry)
exten => 1236,n(success),goto(${utterance},1)
exten => 1236,n(retry),agi(googletts.agi,"Can you please repeat?",en)
exten => 1236,n,goto(record)
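;Explicit options and error handling (a sketch, assuming the arguments are positional
;in the order shown under Usage): en-GB, 3 seconds of silence, '#' as interrupt key, no beep.
;Extension 1238 and the 'fail' label are illustrative.
exten => 1238,1,Answer()
exten => 1238,n,agi(speech-recog.agi,en-GB,3,#,NOBEEP)
;Both variables are set to '-1' after an unexpected error:
exten => 1238,n,GotoIf($["${utterance}" = "-1"]?fail)
exten => 1238,n,Verbose(1,Recognized: ${utterance} with confidence ${confidence})
exten => 1238,n,Hangup()
exten => 1238,n(fail),Verbose(1,Speech recognition failed)
exten => 1238,n,Hangup()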
-------------------
Supported Languages
-------------------
"af-ZA" Afrikaans (Suid-Afrika)
"id-ID" Bahasa Indonesia (Indonesia)
"ms-MY" Bahasa Melayu (Malaysia)
"ca-ES" Català (Espanya)
"cs-CZ" Čeština (Česká republika)
"da-DK" Dansk (Danmark)
"de-DE" Deutsch (Deutschland)
"en-AU" English (Australia)
"en-CA" English (Canada)
"en-GB" English (Great Britain)
"en-IN" English (India)
"en-IE" English (Ireland)
"en-NZ" English (New Zealand)
"en-PH" English (Philippines)
"en-ZA" English (South Africa)
"en-US" English (United States)
"es-AR" Español (Argentina)
"es-BO" Español (Bolivia)
"es-CL" Español (Chile)
"es-CO" Español (Colombia)
"es-CR" Español (Costa Rica)
"es-EC" Español (Ecuador)
"es-SV" Español (El Salvador)
"es-ES" Español (España)
"es-US" Español (Estados Unidos)
"es-GT" Español (Guatemala)
"es-HN" Español (Honduras)
"es-MX" Español (México)
"es-NI" Español (Nicaragua)
"es-PA" Español (Panamá)
"es-PY" Español (Paraguay)
"es-PE" Español (Perú)
"es-PR" Español (Puerto Rico)
"es-DO" Español (República Dominicana)
"es-UY" Español (Uruguay)
"es-VE" Español (Venezuela)
"eu-ES" Euskara (Espainia)
"fil-PH" Filipino (Pilipinas)
"fr-FR" Français (France)
"gl-ES" Galego (España)
"hr-HR" Hrvatski (Hrvatska)
"zu-ZA" IsiZulu (Ningizimu Afrika)
"is-IS" Íslenska (Ísland)
"it-IT" Italiano (Italia)
"lt-LT" Lietuvių (Lietuva)
"hu-HU" Magyar (Magyarország)
"nl-NL" Nederlands (Nederland)
"nb-NO" Norsk bokmål (Norge)
"pl-PL" Polski (Polska)
"pt-BR" Português (Brasil)
"pt-PT" Português (Portugal)
"ro-RO" Română (România)
"sk-SK" Slovenčina (Slovensko)
"sl-SI" Slovenščina (Slovenija)
"fi-FI" Suomi (Suomi)
"sv-SE" Svenska (Sverige)
"vi-VN" Tiếng Việt (Việt Nam)
"tr-TR" Türkçe (Türkiye)
"el-GR" Ελληνικά (Ελλάδα)
"bg-BG" Български (България)
"ru-RU" Русский (Россия)
"sr-RS" Српски (Србија)
"uk-UA" Українська (Україна)
"he-IL" עברית (ישראל)
"ar-IL" العربية (إسرائيل)
"ar-JO" العربية (الأردن)
"ar-AE" العربية (الإمارات)
"ar-BH" العربية (البحرين)
"ar-DZ" العربية (الجزائر)
"ar-SA" العربية (السعودية)
"ar-IQ" العربية (العراق)
"ar-KW" العربية (الكويت)
"ar-MA" العربية (المغرب)
"ar-TN" العربية (تونس)
"ar-OM" العربية (عُمان)
"ar-PS" العربية (فلسطين)
"ar-QA" العربية (قطر)
"ar-LB" العربية (لبنان)
"ar-EG" العربية (مصر)
"fa-IR" فارسی (ایران)
"hi-IN" हिन्दी (भारत)
"th-TH" ไทย (ประเทศไทย)
"ko-KR" 한국어 (대한민국)
"cmn-Hant-TW" 國語 (台灣)
"yue-Hant-HK" 廣東話 (香港)
"ja-JP" 日本語(日本)
"cmn-Hans-HK" 普通話 (香港)
"cmn-Hans-CN" 普通话 (中国大陆)
-----------------------
Security Considerations
-----------------------
This script contacts Google servers in order to send the recorded voice data and get back
the resulting text. The script uses TLS by default to encrypt all the traffic between
your PBX and Google servers so no 3rd party can eavesdrop on your communication, but your
voice data will be available to Google under a policy that is not yet defined.
------------
Tiny version
------------
The '-tiny' suffixed scripts use the HTTP::Tiny Perl module instead of LWP::UserAgent and
JSON::Tiny instead of JSON. This makes them a lot faster on small embedded systems
or boards like the Raspberry Pi. They can be used as a drop-in replacement for the normal
scripts and expose the same interface and command-line arguments. To use them, just make
sure you have HTTP::Tiny and JSON::Tiny installed.
-------
License
-------
The speech-recog script for Asterisk is distributed under the GNU General Public
License v2. See COPYING for details.
--------
Homepage
--------
http://zaf.github.com/asterisk-speech-recog/