An attempt to add support for decoding input with ffmpeg in Apple M1 #2255
jingcodeguy
started this conversation in
Ideas
I have been thinking about adding FFmpeg support during the file reading process without needing to convert the file manually each time while using a Mac M1 Max. Although I can resolve this issue by using a bash script to convert the file first and then pass it to whisper.cpp, I still want to try streamlining the process.
Inspired by @WilliamTambellini's PR for adding support for decoding input with FFmpeg in Linux, I have tried to implement this feature for M1.
However, it is not fully successful: Whisper does not seem to transcribe the resulting WAV audio correctly. Here are the details of what I tried and how I implemented it.
At first, I attempted to use the existing `ffmpeg-transcode.cpp`, but soon realized that it depends on the FFmpeg version. The latest versions (after 6.x) introduce many changes, and there are few examples or references for migrating the old methods. The only option is to read the documentation and test against it, which is not easy to digest, at least for me.
Then I tried another approach: running FFmpeg as a subprocess. I ran isolated tests of a simple subprocess implementation with the Boost library and with the Poco library. Boost caused problems when I integrated it into `common.cpp`, so I chose Poco, which works properly.
I tried to replicate the FFmpeg logic implemented for Linux: pick up the input file and return WAV audio through the pipe. I have tried two approaches; one almost worked, and the other did not work. ("Did not work" means Whisper cannot transcribe the audio properly; "almost worked" means Whisper transcribes some parts but then produces garbage.)
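For readers who want to see the general shape of the subprocess idea, here is a minimal sketch. It is not the Poco-based code from my trial: it assumes a POSIX system and uses plain `popen` instead, and the helper names (`shell_quote`, `build_ffmpeg_command`, `read_command_output`, `pcm16_to_float`) are hypothetical. It also assumes FFmpeg is asked for raw 16 kHz mono signed 16-bit little-endian samples (`-f s16le`) rather than a WAV container, so there is no header to parse on the reading side.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Wrap a string in single quotes for /bin/sh, escaping embedded single quotes.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Build a command that decodes any input FFmpeg understands and writes
// raw 16 kHz mono s16le PCM to stdout ("-" as the output target).
std::string build_ffmpeg_command(const std::string& input_path) {
    assert(!input_path.empty());
    return "ffmpeg -nostdin -loglevel error -i " + shell_quote(input_path) +
           " -ar 16000 -ac 1 -f s16le -";
}

// Run a shell command and collect everything it writes to stdout.
std::vector<uint8_t> read_command_output(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed");
    std::vector<uint8_t> bytes;
    uint8_t buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0) {
        bytes.insert(bytes.end(), buf, buf + n);
    }
    if (pclose(pipe) != 0) {
        throw std::runtime_error("command exited with a non-zero status");
    }
    return bytes;
}

// Convert little-endian signed 16-bit PCM bytes to floats in [-1, 1).
std::vector<float> pcm16_to_float(const std::vector<uint8_t>& bytes) {
    std::vector<float> samples;
    samples.reserve(bytes.size() / 2);
    for (size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const uint16_t raw = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        samples.push_back(static_cast<int16_t>(raw) / 32768.0f);
    }
    return samples;
}
```

A caller would chain these as `pcm16_to_float(read_command_output(build_ffmpeg_command(path)))` and pass the resulting samples on for transcription. Reading the whole stream before converting keeps the sketch simple; a streaming version would need to handle a sample split across two reads.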
During the isolated tests, I wrote the stream read from the pipe to a file to confirm that the resulting audio file is valid.
The implementation is as follows:
The first one uses a custom WAV header:
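The code for this approach is not reproduced here, so the following is only an illustrative sketch of what a hand-built header can look like, not the code from my trial. It writes the canonical 44-byte RIFF/WAVE header for uncompressed PCM; the function name `make_wav_header` and the defaults (16 kHz, mono, 16-bit) are assumptions chosen for the example. One thing worth checking with any custom header is that the two size fields match the number of PCM bytes that actually follow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 16-bit value in little-endian byte order.
static void put_u16(std::vector<uint8_t>& out, uint16_t v) {
    out.push_back(static_cast<uint8_t>(v & 0xFF));
    out.push_back(static_cast<uint8_t>((v >> 8) & 0xFF));
}

// Append a 32-bit value in little-endian byte order.
static void put_u32(std::vector<uint8_t>& out, uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xFF));
    }
}

// Append a four-character chunk tag such as "RIFF" or "data".
static void put_tag(std::vector<uint8_t>& out, const char* tag) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(tag[i]));
}

// Build a 44-byte PCM WAV header for data_bytes bytes of sample data.
std::vector<uint8_t> make_wav_header(uint32_t data_bytes,
                                     uint32_t sample_rate = 16000,
                                     uint16_t channels = 1,
                                     uint16_t bits_per_sample = 16) {
    const uint16_t block_align = static_cast<uint16_t>(channels * bits_per_sample / 8);
    const uint32_t byte_rate = sample_rate * block_align;

    std::vector<uint8_t> h;
    put_tag(h, "RIFF");
    put_u32(h, 36 + data_bytes);   // size of everything after this field
    put_tag(h, "WAVE");
    put_tag(h, "fmt ");
    put_u32(h, 16);                // size of the PCM fmt chunk body
    put_u16(h, 1);                 // audio format 1 = uncompressed PCM
    put_u16(h, channels);
    put_u32(h, sample_rate);
    put_u32(h, byte_rate);
    put_u16(h, block_align);
    put_u16(h, bits_per_sample);
    put_tag(h, "data");
    put_u32(h, data_bytes);        // number of PCM bytes that follow
    assert(h.size() == 44);
    return h;
}
```

Because the header needs `data_bytes` up front, this sketch only fits the case where the whole PCM stream has been read before the header is written.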