Talk = GPT-2 + Whisper + WASM #167
Replies: 11 comments 8 replies
-
This sounds really fun!
-
So... this is turning out to be even better than I expected 😆
talk-0.mp4
-
These results are extremely impressive! I recently tried to implement something similar in Python, using various online APIs rather than running locally, but it felt worse than your demo video: Whisper is much better than the free Google Speech Recognition API, and your optimized version runs significantly better on CPU than the standard Whisper Python library I tried :)
-
And here is a less cringe video to demonstrate the capabilities of this implementation: talk-tech-demo-0-lq.mp4
These are 2 Chrome tabs talking and being nice to each other using the microphone and the speakers of a MacBook.
-
There is an error in both Firefox and Chrome.
-
Amazing solution. Works like a charm ;)
On Sunday, November 27, 2022 at 13:17 +03, Georgi Gerganov ***@***.***> wrote:
Likely, you haven't enabled cross-origin isolation on your HTTP server.
For more information, see my #88 (comment)
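For anyone hitting the same error while self-hosting the page: cross-origin isolation is normally enabled by sending two response headers, `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`. Browsers require this before they expose `SharedArrayBuffer`, which multi-threaded WASM builds typically depend on. Below is a minimal sketch of a local static file server that adds both headers. The names `IsolatedHandler` and `make_server` are made up for illustration, and this is not necessarily the setup described in the linked comment.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# The two response headers browsers look for before treating a page
# as cross-origin isolated.
ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler (hypothetical name) that adds the isolation headers."""

    def end_headers(self):
        # end_headers() runs for every response, so the headers are attached
        # to files, directory listings and error pages alike.
        for name, value in ISOLATION_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def make_server(port=8000):
    """Build a server for the current directory, bound to localhost only."""
    return ThreadingHTTPServer(("127.0.0.1", port), IsolatedHandler)


# To serve the current directory on http://127.0.0.1:8000 :
#     make_server(8000).serve_forever()
```

Run it from the directory that contains the built page, then open the printed address. In the browser console, `crossOriginIsolated` should evaluate to `true` once both headers are in place.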
-
This is great, I've been trying to make the same thing, except through the terminal. Why GPT-2 instead of the GPT-3 text-davinci-003 model?
-
It should use a hot mic (push-to-talk) method: hold down the spacebar to talk, so you can take longer pauses while you think about what to say, then release the spacebar to convert the speech to text.
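The hold-to-talk idea boils down to a small state machine: start buffering audio on key down, keep appending while the key is held, and hand the whole buffer to the transcriber on key up. A rough, platform-neutral sketch follows. The class `PushToTalk` and the `transcribe` callback are hypothetical and not part of the demo, which runs in the browser.

```python
class PushToTalk:
    """Buffers audio only while the talk key is held down (illustrative sketch)."""

    def __init__(self, transcribe):
        # `transcribe` is a hypothetical callback: list of samples -> text.
        self._transcribe = transcribe
        self._buffer = []
        self._recording = False

    def key_down(self):
        # Keyboards auto-repeat key-down events while a key is held,
        # so only the first one starts a new recording.
        if not self._recording:
            self._recording = True
            self._buffer = []

    def feed(self, samples):
        # Audio that arrives while the key is up is simply dropped.
        if self._recording:
            self._buffer.extend(samples)

    def key_up(self):
        # Releasing the key ends the utterance and triggers transcription.
        if not self._recording:
            return None
        self._recording = False
        audio, self._buffer = self._buffer, []
        return self._transcribe(audio)
```

Because nothing is transcribed until the key is released, pauses in the middle of a sentence do not cut the utterance short, which is the point of the suggestion.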
-
I think it would be great to have inference of a text-to-speech model like this.
-
How about running the WASM on a server?
-
Can't you accelerate the model inference with the GPU via WebGPU for C++?
-
I just had an awesome idea:
Make a web-page that:
All of this running locally in the browser - no server required
I have all the ingredients and I think the performance is just enough. I just have to put it together.
The total data that the page will have to load on startup (probably using Fetch API) is:
the tiny.en model
the small model

I think it will be very fun because you could talk to the web-page or even add extra devices that talk to each other only through the mic and the speakers. For example, you simply open the page on your phone and tablet and put them next to each other - listen to them talk about something 😄
Any ideas to make this even more fun?
Update:
This is now fully functional at: https://whisper.ggerganov.com/talk/
Source code is here: https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk.wasm
Looking for beta testers, feedback and ideas for improvement!
talk-2.mp4