Large Model hallucination and repeating issue #1490

cchhenwei · 2023-11-15T04:22:11Z

cchhenwei
Nov 15, 2023

Hi all,

I just installed the large model for whisper.cpp, and everything seems to be functioning well.
However, when I transcribe a 40-minute .wav file, the transcript repeats itself many times.
I've tried improving the audio quality by reducing the noise and background music, but the transcription still starts repeating the same sentences at random points.

I tried some solutions posted in previous discussions (and tried transcribing in a different language), but none of them worked; it kept repeating the same sentences.

Is there an effective way to solve this problem?

sscotti · 2023-11-15T07:58:07Z

sscotti
Nov 15, 2023

Can you elaborate on what you mean by "Large Model hallucination" ? As a former drug addict (Hallucinogens mostly), I have some experience with hallucinations. The LLM probably does not have enough context to interpret the audio file. There are many articles on the internet about that problem.

hallucinations-in-llms-what-you-need-to-know-before-integration

It probably has something to do with one or more results for whatever algorithm it used to arrive at a solution for a snippet of sound, and then maybe it just gets 'stuck'. I do not know, but functions are usually one-to-one.

This is an interesting one:

Discriminating and Toxic Content LLMs have the potential to perpetuate and amplify harmful biases through hallucinations, giving rise to the production of discriminatory and toxic content. Recent research highlights the impact of persona assignment on ChatGPT, indicating that its toxicity can increase up to 6 times, resulting in outputs that endorse incorrect stereotypes, harmful dialogue, and hurtful opinions. The training data utilized for LLMs often contains stereotypes, which can be reinforced by the models themselves. For instance, studies have revealed significant over-representation of younger users, primarily from developed countries and English-speaking backgrounds, within LLMs. Consequently, LLMs may generate discriminatory content targeting disadvantaged groups based on factors such as race, gender, religion, ethnicity, and more. For example, certain entities, such as specific races, are 3 times more targeted regardless of the assigned persona, indicating the presence of inherent discriminatory biases within the model. These hallucinated outputs have the potential to perpetuate harmful ideas, contributing to the marginalization and discrimination of vulnerable communities.
Read more at: https://masterofcode.com/blog/hallucinations-in-llms-what-you-need-to-know-before-integration

0 replies

cchhenwei · 2023-11-15T08:16:20Z

cchhenwei
Nov 15, 2023
Author

Hi Scotti,

Thanks for replying to my discussion. By "Large Model hallucination," I mean the following: I followed the instructions to download and install whisper.cpp on my MacBook Pro M1, chose the "large" model that the developer offers (SHA: ad82bf6a9043ceed055076d0fd39f5f186ff8062), and installed and ran it successfully. Everything goes well until the model starts transcribing: at a particular timecode (which depends on the audio file), the model gets stuck and keeps repeating the same sentences. I can attach a screenshot to show this. (I tried different languages, but the repetition always occurred.)

I've tried installing the fix mentioned in openai/whisper#1059 and following instructions that previous users have offered, but it still doesn't work.

I hope I have articulated my question clearly; if any part is unclear, feel free to let me know. Thanks.

Sincerely,
John

1 reply

sscotti Nov 15, 2023

I am a little confused by the last e-mail I received. Did you already see this: openai/whisper#1059

Maybe you have a memory problem ? What are the specs for your MacBook ? I have an old 2014 iMac and a MacBook Pro M1 chip that I purchased in the USA.

I'd have to boot up my laptop to get the specs for that.

cchhenwei · 2023-11-15T09:23:27Z

cchhenwei
Nov 15, 2023
Author

Yes, and I ran "pip install openai-whisper==20230308". However, the repeating issue (every sentence was repeated) still appeared, and it didn't just happen in silent segments; it also replaced correct lines from the recorded audio. I haven't tried the #679 solution, because I can't find the location described as "After line 178 of whisper/transcribe.py".

0 replies

ggerganov · 2023-11-15T11:16:42Z

ggerganov
Nov 15, 2023
Maintainer

I believe that there is something wrong with the v3 large model, so you should try using large-v2 instead.
I will soon switch the default large model back to v2.

0 replies

cchhenwei · 2023-11-15T14:22:47Z

cchhenwei
Nov 15, 2023
Author

I see, thanks for the information. I've reinstalled the large-v2 model and the fix mentioned in #1059 (pip install openai-whisper==20230308). The repeating issue has decreased, but it still happens at some moments, each lasting 10-45 seconds. The v2 model is definitely more stable, and I've read some discussions saying the repeating issue can't be completely solved for now. Again, thanks for the tech support. If there's any way to fix this issue completely, please let me know. Thanks!

1 reply

senorfunes Nov 16, 2023

I'm experiencing a similar problem.
Please note, I'm not a coder, so I've only been doing what is accessible through basic instructions found online. That said, I've got whisper.cpp up and running on a 2023 Mac with an M2Max and 96GB RAM.
I am trying to use it to generate rough transcripts of Turkish TV dramas, meaning there are pauses and often background noises. If this worked well, I might also see how translation works, but the hallucination problem is serious. As far as I can tell, it may be prompted by a lack of dialog, which often (about 50% of the time) leads to a hallucination along the lines of "subtitles by" or "transcribed by" or "thanks for watching" (but in Turkish), in keeping with what shows up during silences on the models on which Whisper was trained. Other times, a line of dialog just gets repeated. In any case, the repetitions tend to go on for VERY long (up to 20 minutes out of roughly hour-long episodes), and it's not clear to me what causes the system to click back in to recognizing the actual dialog. The non-hallucination portions of the transcripts are decent, but hallucinations make up more than half of the total, so the transcripts are ultimately not very useful.
I have made the large-v2 model and that did seem to improve things somewhat, but the problem is still serious. I also installed "pip install openai-whisper==20230308" as noted above, though it is unclear to me whether this would or should affect whisper.cpp. In any case, I did not notice a major difference in behavior after this step.
I have seen a number of suggestions for dealing with similar issues in the broader Whisper forums, but it's not clear to me whether the suggestions there are transferable to Whisper.cpp. I have also downloaded and installed the version of Whisper via [pip install -U openai-whisper], but my lack of coding ability means my attempts to use it thus far have not gotten very far. (It's present on my machine, but I get various error messages when attempting even the most basic commands.)
So my main question here is whether and (if so) how some of the suggestions and solutions on the broader Whisper forums could possibly be applied to Whisper.cpp. Examples of those forums and solutions are as follows:
openai/whisper#1059
openai/whisper#679
https://github.com/fleek/VADtransciber
https://github.com/EtienneAb3d/WhisperHallu
https://github.com/EtienneAb3d/WhisperTimeSync
Thanks in advance for any thoughts you might be able to offer.
Best wishes,
Josh

ArthurPeabody · 2023-11-22T17:57:10Z

ArthurPeabody
Nov 22, 2023

This happens to me occasionally. I restart the transcription, using the -ot switch, at the point Whisper gets stuck (or whatever is happening to it). If it happens after a stretch of music or other non-verbal content, I start it where speech begins again. It's happened with the base as well as the large model.
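
A rough sketch of how the restart point could be located automatically, offered as an illustration rather than anything whisper.cpp provides. It assumes the transcript has already been parsed into (start time in ms, text) pairs; the function name and the `min_repeats` threshold are hypothetical.

```python
def first_repeat_run(segments, min_repeats=3):
    """Return the start time (ms) of the first run of identical consecutive lines.

    segments: list of (start_ms, text) tuples in transcript order.
    min_repeats: assumed threshold for how many identical lines in a row
                 count as the model being "stuck".
    Returns None when no such run exists.
    """
    run_start = 0
    for i in range(1, len(segments) + 1):
        # A run ends at the end of the list or when the text changes.
        at_end = i == len(segments)
        if at_end or segments[i][1].strip() != segments[run_start][1].strip():
            if i - run_start >= min_repeats:
                return segments[run_start][0]
            run_start = i
    return None
```

The returned value is in milliseconds, the same unit the -ot switch is described as taking later in this thread. Genuine repeated dialog would also be flagged, so the result is only a hint about where to look.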

0 replies

cchhenwei · 2023-11-23T11:28:47Z

cchhenwei
Nov 23, 2023
Author

I see. Have you tried updating to the latest version (1.5.0)? I don't know whether the new update will solve or reduce this issue.

1 reply

senorfunes Nov 23, 2023

Thanks cchhenwei and ArthurPeabody! The combinations of large-V2 and -mc made a big difference, once I actually managed to get them working. (My command prompts were off on both counts, and it took me a while to realize this.)
Best wishes,
Josh

cchhenwei · 2023-11-23T11:31:08Z

cchhenwei
Nov 23, 2023
Author

I've tried large-v2 and large-v3, and v2 turns out to be more stable: the self-repeating stretches are shorter (around 5-15 secs). It seems this issue can't be completely fixed for the time being; I hope the updated version (1.5.0) will solve the problem.

0 replies

cchhenwei · 2023-11-23T11:55:49Z

cchhenwei
Nov 23, 2023
Author

Glad to hear that, but I still don't know how to use/activate the -ot switch. Could you give me a hint?

1 reply

senorfunes Nov 23, 2023

I'm sorry, cchhenwei, I made a mistake above in confusing -mc (which I eventually got working, and which improved things greatly when set to 0; e.g.: "-mc 0") for -ot (which I have not yet tried). I've edited my post to avoid future confusion. -mc was recommended to me on this forum: #1511 and it took me a number of trials with the order of commands in the prompt to get it to work. I found the same to be true of other commands, so it may be worth trying to change the position of -ot to various spots in the prompt to see if that makes a difference.
I hope this is of some help.
Best wishes,
Josh

cchhenwei · 2023-11-23T12:25:09Z

cchhenwei
Nov 23, 2023
Author

Thanks for that. I followed the instructions you pointed to in #1511 and also tried replacing -mc with -ot. The result: -mc is more stable and nearly perfect (there are still a few repeated sentences), while -ot seems even worse (more hallucination). In my case, "whisper.cpp-master % ./main -osrt -mc 0 -l tr -m ./models/ggml-large-v2.bin -f filepath" is the best command so far.
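
For anyone scripting this, here is a minimal sketch that assembles the same command as an argument list. The switches (-osrt, -mc, -l, -m, -f, -ot) are the ones quoted in this thread; the function name and its defaults are hypothetical, and the available options should be checked against the help output of your own build.

```python
def build_main_args(audio_path, model_path="./models/ggml-large-v2.bin",
                    language="tr", max_context=0, offset_ms=None):
    """Assemble the ./main command from this thread as a list of arguments.

    Every switch is added together with its own value, so a switch can
    never end up separated from the value that belongs to it.
    """
    args = ["./main"]
    if offset_ms is not None:
        # Optional start offset in milliseconds.
        args += ["-ot", str(offset_ms)]
    args += [
        "-osrt",                   # write an .srt subtitle file
        "-mc", str(max_context),   # 0 was reported above to reduce repetition
        "-l", language,
        "-m", model_path,
        "-f", audio_path,
    ]
    return args
```

The list can be passed to `subprocess.run(build_main_args("filepath"), check=True)` from inside the whisper.cpp directory.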

0 replies

ArthurPeabody · 2023-11-23T13:17:39Z

ArthurPeabody
Nov 23, 2023

The -ot switch takes time in milliseconds, so if you want whisper to start at 5 minutes, 32 seconds, you use -ot 332000 . I don't know if order matters, but I use it as the first switch. I have carelessly inserted it between the -m switch and its argument, which just gets me an error. Otherwise it has worked for me.
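
The conversion is easy to get wrong by hand, so here is a small sketch of it. The helper name is hypothetical; it only does the arithmetic described above and does not call whisper.cpp.

```python
def timestamp_to_ms(timestamp):
    """Convert "SS", "MM:SS" or "HH:MM:SS" (seconds may be fractional) to milliseconds."""
    parts = [float(p) for p in timestamp.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {timestamp!r}")
    seconds = 0.0
    for part in parts:
        # Each field to the left is worth 60 times the one after it.
        seconds = seconds * 60 + part
    return round(seconds * 1000)


print(timestamp_to_ms("5:32"))  # 332000
```

The printed value matches the 5 minutes, 32 seconds example above.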

0 replies
