Real-time transcription on Raspberry Pi 4 #166
Replies: 18 comments 26 replies
-
Fantastic work!! On my RPi 4 with Raspberry Pi OS Lite (bullseye) installed, I had to install the libsdl2-dev package first, or the compiler would complain about a missing "SDL.h" header file. Now it works like a charm.
-
Thanks, worked pretty well on the Pi for me! (twitter clip) Yeah, I had to install libsdl2-dev as well, as @eternitybt mentioned. (For the mic, I used a ReSpeaker Mic Array v2.)
-
Made a video of the install here: https://youtu.be/caaKhWcfcCY
-
Hi, which version of the Pi 4 are you using? Is there a minimum memory requirement? I'm getting an 'Illegal instruction (core dumped)' error when I try this on a 1GB Pi 4.
-
Can this be done on a Raspberry Pi 3B+? I want to use it for speech recognition.
-
Hello, I followed the build instructions on a Pi 4 Model B and am receiving this error when attempting the make/build: "fatal error: immintrin.h: No such file or directory".
-
I typed the following in the terminal:

    pi@raspberrypi:~ $ uname -a
    Linux raspberrypi 6.1.29-v8+ #1652 SMP PREEMPT Wed May 24 14:46:55 BST 2023 aarch64

Also typed:

    pi@raspberrypi:~ $ cat /etc/os-release
    PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"

I used the Raspberry Pi Imager v1.7.4 with Pi OS 64-bit, Debian Bullseye Desktop.
-
I am running on a Raspberry Pi 4B and I can record through ffmpeg, but stream produces no output:

    ffmpeg -f pulse -i alsa_input.usb-C-Media_Electronics_Inc._USB_PnP_Sound_Device-00.analog-mono -ar 16000 -ac 1 recording.wav

    root@a0f34bc2c254:/whisper-cpp/whisper.cpp# ./stream -m ./models/ggml-tiny.bin -t 6 --step 0 --length 30000 -vth 0.6
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model size = 73.54 MB
    whisper_init_state: kv cross size = 8.79 MB
    main: processing 0 samples (step = 0.0 sec / len = 30.0 sec / keep = 0.0 sec), 6 threads, lang = en, task = transcribe, timestamps = 1 ...
    [Start speaking]
-
Did you try the default example from above?

    ./stream -m models/ggml-tiny.en.bin --step 4000 --length 8000 -c 0 -t 4 -ac 512

I wasn't able to get your command to work either:

    ./stream -m ./models/ggml-tiny.bin -t 6 --step 0 --length 30000 -vth 0.6

Try taking the default example and adding -vth 0.6 to the end for the voice activity detector (VAD), like below. Worked well for me.

    ./stream -m models/ggml-tiny.en.bin --step 4000 --length 8000 -c 0 -t 4 -ac 512 -vth 0.6

Also, the line below works with 6 threads, which surprised me, because I thought the Raspberry Pi 4 could only go up to 4 threads since it has 4 cores. In the task manager, the CPU usage would sometimes climb to near 100% when using either -t 4 or -t 6.

    ./stream -m models/ggml-tiny.en.bin --step 4000 --length 8000 -c 0 -t 6 -ac 512 -vth 0.6

I turned --step down to 0 like below and it worked once, then stopped working.

    ./stream -m models/ggml-tiny.en.bin --step 0 --length 8000 -c 0 -t 6 -ac 512 -vth 0.6

From what I've seen, upping --step to 2000 works better, and 4000 even better.
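The variations tried above differ only in a handful of flags, so they can be generated from one place. The following is a minimal Python sketch, not part of whisper.cpp: the helper names `build_stream_command` and `oversubscribed` are hypothetical, and only flags quoted in this thread are used. It also illustrates why `-t 6` is accepted on a 4-core Pi 4: the OS allows more threads than cores and simply time-shares them.

```python
import os
import shlex


def build_stream_command(model="models/ggml-tiny.en.bin", step_ms=4000,
                         length_ms=8000, threads=4, audio_ctx=512,
                         vad_threshold=None):
    """Return the argv list for ./stream, using only flags quoted in this thread."""
    cmd = [
        "./stream", "-m", model,
        "--step", str(step_ms),
        "--length", str(length_ms),
        "-c", "0",                 # value taken verbatim from the examples above
        "-t", str(threads),
        "-ac", str(audio_ctx),
    ]
    if vad_threshold is not None:
        # Only appended when requested, as in the "-vth 0.6" variants above.
        cmd += ["-vth", str(vad_threshold)]
    return cmd


def oversubscribed(threads, cores=None):
    """True when more threads are requested than CPU cores are available."""
    if cores is None:
        cores = os.cpu_count() or 1
    return threads > cores


if __name__ == "__main__":
    cmd = build_stream_command(threads=6, vad_threshold=0.6)
    print(shlex.join(cmd))
    if oversubscribed(6):
        print("note: more threads than cores; the OS will time-share them")
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` directly.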
-
The Raspberry Pi 4 is a bit slow, but some development boards equipped with the RK3588 chip have a 6 TOPS NPU. We should consider supporting these chips, as they could potentially enable "real" real-time transcription. @ggerganov
-
Well, it takes several tens of seconds for a 3-second WAV file...
-
Whisper is working on the Raspberry Pi 5, up to the small model. Video here: https://youtu.be/W39teHesXus
-
I managed to get Output from
Output from
For example, if I use I tried
But then no transcription, even using the sample jfk.wav.
-
Hi! Great job, really! One question: is there any example, tutorial, or guide on how to implement the same thing using whisper.cpp inside a Python script? Thank you.
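One possible approach, sketched here under stated assumptions rather than as an official binding, is to call the compiled whisper.cpp example binary from Python with `subprocess`. The helper names `build_main_command` and `transcribe` are hypothetical. The sketch assumes the `main` example has been built in the current directory, that it takes the input file through `-f` (a flag not quoted in this thread), and that the transcription is written to standard output.

```python
import subprocess
from pathlib import Path


def build_main_command(binary, model, wav_path, threads=4):
    """Build the argv list for the whisper.cpp `main` example (assumed flags)."""
    return [str(binary), "-m", str(model), "-f", str(wav_path), "-t", str(threads)]


def transcribe(wav_path, binary="./main", model="models/ggml-tiny.en.bin", threads=4):
    """Run the binary on one WAV file and return whatever it printed to stdout."""
    for required in (binary, model, wav_path):
        if not Path(required).exists():
            raise FileNotFoundError(f"missing: {required}")
    result = subprocess.run(
        build_main_command(binary, model, wav_path, threads),
        capture_output=True,   # collect stdout and stderr instead of printing them
        text=True,             # decode bytes to str
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        print(transcribe("samples/jfk.wav"))
    except FileNotFoundError as err:
        print(f"cannot run yet: {err}")
```

This handles a recorded file, not a live microphone. A streaming variant would need to read the output of `stream` line by line, for example through `subprocess.Popen`.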
-
I'm getting an error when compiling with make -j stream on a Raspberry Pi 5 running Pi OS Bookworm 12.2.0 (uname -a
Following the build instructions, I get an error from make, which results in no ./stream folder.
Any ideas? Thanks for any help!
-
Can you show how to fix this build error? Thanks!
-
Hi @ggerganov, I was trying to run the
-
Hi, I'm trying this on a Raspberry Pi 3 (tried on both buster and bullseye). It compiles perfectly, and the main example with the WAV files works too. The problem is with stream: it seems to lose pieces or can't hear well. I tried both a cheap USB microphone and a Zoom H1 microphone at 44100 Hz, 16-bit; if you record with arecord, the audio is perfect. I then tried to directly record what the stream program listens to by adding the
Some extra info: this is what happens when I run the program:
related:
-
It is possible, to some extent, to run Whisper in real-time mode on an embedded device such as the Raspberry Pi.
Below are a few examples + build instructions.
Real-time with 4 seconds step
whisper-raspberry-2.mp4
Real-time with 7.5 seconds step
whisper-raspberry-3.mp4
Build instructions
More information
In order to speed up the processing, the Encoder's context is reduced from the original 1500 down to 512 (using the -ac 512 flag). This allows running the above examples on a Raspberry Pi 4 Model B (2018) on 3 CPU threads using the tiny.en Whisper model. The attached microphone is from a USB camera, so not great quality.

More detailed discussion can be found in this issue: #7
Explanation of what the -ac flag does: #137
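To give a rough feel for the 1500 to 512 reduction, the sketch below works through the arithmetic in Python. It assumes Whisper's standard layout, where the full encoder context of 1500 positions covers a 30-second window (50 positions per second), and it assumes self-attention cost grows with the square of the context length. The function names are hypothetical, and the cost ratio is an estimate for the attention layers only, not a measured speed-up.

```python
WINDOW_SECONDS = 30.0   # audio covered by the full encoder context (assumption)
FULL_AUDIO_CTX = 1500   # original encoder context, as stated above


def audio_ctx_seconds(audio_ctx):
    """Seconds of audio that a reduced encoder context can cover."""
    return audio_ctx * WINDOW_SECONDS / FULL_AUDIO_CTX


def attention_cost_ratio(audio_ctx):
    """Estimated self-attention cost relative to the full context (quadratic model)."""
    return (audio_ctx / FULL_AUDIO_CTX) ** 2


if __name__ == "__main__":
    ctx = 512
    print(f"-ac {ctx} covers about {audio_ctx_seconds(ctx):.2f} s of audio")
    print(f"estimated attention cost: {attention_cost_ratio(ctx):.1%} of full context")
```

Under these assumptions, 512 positions cover a little over 10 seconds of audio, which is consistent with the `--length 8000` used in the examples fitting inside the reduced context.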