Can you elaborate on what you mean by "Large Model hallucination"? As a former drug addict (hallucinogens mostly), I have some experience with hallucinations. The LLM probably does not have enough context to interpret the audio file. There are many articles on the internet about that problem, e.g. hallucinations-in-llms-what-you-need-to-know-before-integration <https://masterofcode.com/blog/hallucinations-in-llms-what-you-need-to-know-before-integration>. It probably has something to do with one or more results from whatever algorithm it used to arrive at a solution for a snippet of sound, and then maybe it just gets 'stuck'. I don't know, but functions are usually one-to-one. This is an interesting one:
-
Hi Scotti,
Thanks for replying to my discussion.
When I said "Large Model hallucination," I meant the following: I followed the instructions to download and install whisper.cpp on my MacBook Pro M1, chose the "large" model that the developer offered to install (SHA: ad82bf6a9043ceed055076d0fd39f5f186ff8062), and successfully installed and ran it.
Everything goes well until the model does the transcription work: at a particular timecode (which depends on the audio file), the model gets stuck and keeps repeating the same sentences. I can attach a screenshot to show you this situation. (I tried different languages, but the repetition always occurred.)
I've tried installing the fix (openai/whisper#1059) and following some instructions that previous users have offered, but it still doesn't work.
I hope I have articulated my question clearly; if any part is unclear, feel free to let me know. Thanks.
Sincerely,
John
-
Yes, and I installed "pip install openai-whisper==20230308".
However, the repetition issue (every sentence repeated) still appeared, and it didn't only happen in silent segments; it also replaced correct lines that were in the recording.
I haven't tried the #679 solution, because I can't find the location "After line 178 of whisper/transcribe.py".
Stephen D. Scotti <***@***.***> wrote on Wed, Nov 15, 2023 at 4:34 PM:
I am a little confused by the last e-mail I received. Did you already see this:
openai/whisper#1059&#discussion-4942423
-
I believe there is something wrong with the v3 large model, so you should try using large-v2 instead. I will soon switch the default large model back to v2.
-
I see, thanks for the information.
I've reinstalled the large-v2 model and the fix mentioned in #1059 (pip install openai-whisper==20230308).
The repetition does decrease, but it still happens at some moments, lasting between 10 and 45 seconds.
The v2 model is definitely more stable, and I've read some discussions saying the repetition issue won't be completely solved for now.
Again, thanks for the tech support. If there's any way to fix this issue completely, please let me know. Thanks!
Georgi Gerganov <***@***.***> wrote on Wed, Nov 15, 2023 at 7:16 PM:
I believe that there is something wrong with the v3 large model, so you should try using large-v2 instead.
I will soon switch back the default large to v2
-
This happens to me occasionally. I restart the transcription with the -ot switch at the point where Whisper gets stuck (or whatever is happening to it). If it happens after a stretch of music or other non-verbal content, I start it where speech begins again. It's happened with the base model as well as the large model.
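To find the point to restart from, one option is to scan the subtitle file written by -osrt for the first run of identical consecutive segments and print its start time in milliseconds, the unit -ot takes. This is a minimal POSIX shell sketch, not part of whisper.cpp: the function name find_loop and the default threshold of 3 repeats are hypothetical, and it assumes each SRT entry has its text on the single line after its timing line.

```sh
#!/bin/sh
# find_loop FILE.srt [N]
# Hypothetical helper (not part of whisper.cpp): prints the start time, in
# milliseconds, of the first run of N (default 3) identical consecutive
# segments in an SRT file, and exits with status 1 if there is no such run.
# Assumes each entry's text is on the single line after its timing line.
find_loop() {
  awk -v n="${2:-3}" '
    /-->/ {
      # Timing line, e.g. "00:05:32,000 --> 00:05:34,000"; keep the start.
      split($1, t, "[:,]")
      start = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
      getline text                  # the segment text is on the next line
      if (seen++ && text == prev) {
        run++                       # same text as the previous segment
      } else {
        run = 1                     # a different segment starts a new run
        run_start = start
        prev = text
      }
      if (run >= n + 0) { found = 1; print run_start; exit }
    }
    END { if (!found) exit 1 }
  ' "$1"
}
```

For example, `find_loop talk.wav.srt` (file name hypothetical) prints the start of the first loop, and that number can be passed straight to -ot. If the loop follows music or other non-verbal content, starting a bit later, where speech resumes, may work better.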
-
I see. Have you tried updating to the latest version (1.5.0)? I don't know whether the new update will solve or reduce this issue.
senorfunes <***@***.***> wrote on Fri, Nov 17, 2023 at 12:26 AM:
I'm experiencing a similar problem.
Please note, I'm not a coder, so I've only been doing what is accessible
through basic instructions found online. That said, I've got whisper.cpp up
and running on a 2023 Mac with an M2 Max and 96GB RAM.
I am trying to use it to generate rough transcripts of Turkish TV dramas,
meaning there are pauses and often background noises. If this worked well,
I might also see how translation works, but the hallucination problem is
serious. As far as I can tell, it may be prompted by a lack of dialog,
which often (about 50% of the time) leads to a hallucination along the
lines of "subtitles by" or "transcribed by" or "thanks for watching" (but
in Turkish), in keeping with what shows up during silences on the models on
which Whisper was trained. Other times, a line of dialog just gets
repeated. In any case, the repetitions tend to go on for VERY long (up to
20 minutes out of roughly hour-long episodes), and it's not clear to me
what causes the system to click back in to recognizing the actual dialog.
The non-hallucination portions of the transcripts are decent, but
hallucinations make up more than half of the total, so the transcripts are
ultimately not very useful.
I have made the large-v2 model and that did seem to improve things
somewhat, but the problem is still serious. I also installed "pip install
openai-whisper==20230308" as noted above, though it is unclear to me
whether this would or should affect whisper.cpp. In any case, I did not
notice a major difference in behavior after this step.
I have seen a number of suggestions for dealing with similar issues in the
broader Whisper forums, but it's not clear to me whether the suggestions
there are transferable to Whisper.cpp. I have also downloaded and installed
the version of Whisper via [pip install -U openai-whisper], but my lack of
coding ability means my attempts to use it thus far have not gotten very
far. (It's present on my machine, but I get various error messages when
attempting even the most basic commands.)
So my main question here is whether and (if so) how some of the
suggestions and solutions on the broader Whisper forums could possibly be
applied to Whisper.cpp. Examples of those forums and solutions are as
follows:
openai/whisper#1059
openai/whisper#679
https://github.com/fleek/VADtransciber
https://github.com/EtienneAb3d/WhisperHallu
https://github.com/EtienneAb3d/WhisperTimeSync
Thanks in advance for any thoughts you might be able to offer.
Best wishes,
Josh
-
I've tried large-v2 and large-v3, and it turns out that v2 is more stable and the self-repetition episodes are shorter (around 5 to 15 seconds).
It seems this issue won't be completely fixed for a while; I hope the updated version (1.5.0) will solve the problem.
-
Glad to hear that, but I still don't know how to use the -ot switch. Could you give me a hint?
senorfunes <***@***.***> wrote on Thu, Nov 23, 2023 at 7:48 PM:
Thanks cchhenwei and ArthurPeabody! The combination of large-v2 and -ot made a big difference, once I actually managed to get them working. (My command prompts were off on both counts, and it took me a while to realize this.)
Best wishes,
Josh
-
Thanks for that. I followed the instructions you offered in #1511 and tried replacing -mc with -ot. The result shows that -mc is more stable and nearly perfect (there are still a few repeated sentences), while -ot seems even worse (more hallucination). In my case, "./main -osrt -mc 0 -l tr -m ./models/ggml-large-v2.bin -f filepath", run from the whisper.cpp-master directory, is the best command so far.
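The command above can be wrapped so that every switch always travels together with its own value, which makes it harder to end up with a switch wedged between another switch and its argument. This is a hypothetical POSIX shell sketch: the function name run_whisper and the DRY_RUN variable are made up for illustration, and the flags are the ones used in this thread, with -ot added as an optional start offset in milliseconds. As far as I understand, -mc sets whisper.cpp's maximum number of text-context tokens carried over from earlier segments, so 0 keeps earlier text from being fed back as a prompt.

```sh
#!/bin/sh
# run_whisper FILE [LANG] [OFFSET_MS]   (hypothetical wrapper)
# Builds the argument list with each switch directly followed by its value.
# Defaults: language "tr", offset 0 ms. With DRY_RUN set, only prints the command.
run_whisper() {
  # All expansions happen before "set --" runs, so $1..$3 are still the
  # caller's arguments here; afterwards "$@" holds the whisper.cpp arguments.
  set -- -ot "${3:-0}" -mc 0 -osrt -l "${2:-tr}" \
         -m ./models/ggml-large-v2.bin -f "$1"
  if [ -n "${DRY_RUN:-}" ]; then
    echo ./main "$@"
  else
    ./main "$@"
  fi
}
```

For example, `DRY_RUN=1 run_whisper filepath tr 332000` prints the full command so you can check it before running it for real.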
senorfunes <***@***.***> wrote on Thu, Nov 23, 2023 at 8:08 PM:
I'm sorry, cchhenwei, I made a mistake above in confusing -mc (which I eventually got working) with -ot (which I have not yet tried). I've edited my post to avoid future confusion. -mc was recommended to me on this forum: #1511. It took me a number of trials with the order of commands in the prompt to get it to work. I found the same to be true of other commands, so it may be worth trying -ot in various positions in the prompt to see if that makes a difference.
I hope this is of some help.
Best wishes,
Josh
-
The -ot switch takes a time in milliseconds, so if you want Whisper to start at 5 minutes 32 seconds, you use -ot 332000. I don't know if order matters, but I use it as the first switch. When I have carelessly inserted it between the -m switch and its argument, I just got an error. Otherwise it has worked for me.
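The conversion is (minutes × 60 + seconds) × 1000, so 5:32 gives (5 × 60 + 32) × 1000 = 332000. A minimal POSIX shell sketch of that arithmetic, accepting ss, mm:ss or hh:mm:ss; the helper name to_ms is hypothetical and not a whisper.cpp feature:

```sh
#!/bin/sh
# to_ms TIME   (hypothetical helper)
# Converts "ss", "mm:ss" or "hh:mm:ss" into milliseconds for the -ot switch.
to_ms() {
  printf '%s\n' "$1" | awk -F: '{
    total = 0
    # Each colon-separated field is worth 60 times the one after it.
    for (i = 1; i <= NF; i++) total = total * 60 + $i
    printf "%d\n", total * 1000
  }'
}
```

For example, `to_ms 5:32` prints 332000, so `./main -ot "$(to_ms 5:32)" -m ./models/ggml-large-v2.bin -f filepath` starts at 5 minutes 32 seconds while keeping -m directly next to its model path.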
-
Hi all,
I just installed the large model of whisper.cpp and everything seems to be functioning well.
However, when I start transcribing the .wav file (40 minutes long), self-repeating transcription occurs many times.
I've tried to enhance the audio quality by lowering the noise and background music, but it always starts randomly repeating the same sentences during transcription.
I also tried some solutions posted in previous discussions (and tried transcribing in a different language), but it still didn't work and kept repeating the same sentences.
Is there any effective way to solve this problem?