The strategies for reliable long-form transcription in whisper.cpp differ from OpenAI's Whisper
#1461 · bobqianic started this conversation in Show and tell
Replies: 2 comments 4 replies
-
The reason the temperature increment is 0.4 is that processing is faster when the fallback triggers. After we add efficient batched decoding, we will reduce it to 0.2. We don't use
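For readers unfamiliar with the fallback mechanism being discussed: the decoder is retried at progressively higher temperatures until the output passes quality checks, so a larger increment means fewer retries (faster) at the cost of coarser temperature steps. A minimal sketch of that loop, where `decode` is a hypothetical callable and the thresholds mirror the defaults described in the Whisper paper (not whisper.cpp's actual implementation):

```python
def transcribe_with_fallback(
    decode,
    temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # whisper.cpp currently steps by 0.4
    compression_ratio_threshold=2.4,
    logprob_threshold=-1.0,
):
    """Retry decoding at increasing temperature until quality checks pass.

    `decode(t)` is a hypothetical callable returning
    (text, avg_logprob, compression_ratio) for temperature t.
    """
    result = None
    for t in temperatures:
        result = decode(t)
        text, avg_logprob, ratio = result
        # Accept the result if the text is not suspiciously repetitive
        # and the model was not too uncertain on average.
        if ratio <= compression_ratio_threshold and avg_logprob >= logprob_threshold:
            break  # decoding looks sane; stop falling back
    return result
```

With a 0.4 step the schedule would be (0.0, 0.4, 0.8), so a failing segment is re-decoded at most twice instead of five times.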
-
Hey @ggerganov and @Artoria2e5, just coming across this thread. Not sure what the current thinking is, or the timeline for implementation?
-
I suddenly wanted to take a closer look at the OpenAI Whisper paper, and one section that caught my attention is the one I highlighted in yellow. I then checked the whisper.cpp code and found two main discrepancies: the size of the temperature increment, and the method of calculating the compression ratio. The temperature in whisper.cpp increases by 0.4 each step instead of the 0.2 mentioned in the paper. Additionally, whisper.cpp uses entropy as a substitute for the gzip compression ratio, while OpenAI Whisper actually compresses the text and calculates the real gzip compression ratio. @ggerganov

Temperature:
whisper/transcribe.py
whisper.cpp/whisper.cpp (line 3833 in 0de8582)
whisper.cpp/whisper.cpp (lines 4545 to 4554 in 0de8582)

Gzip compression ratio:
whisper/utils.py
whisper.cpp/whisper.cpp (lines 4326 to 4372 in 0de8582)
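For context on the second point, the compression-ratio check in OpenAI Whisper is along these lines: the decoded text is compressed with zlib, and the ratio of raw to compressed byte length is used as a repetition detector, since degenerate looping output compresses extremely well. A sketch of that computation (a simplification of what `whisper/utils.py` does, not whisper.cpp's entropy-based substitute):

```python
import zlib

def compression_ratio(text: str) -> float:
    """Ratio of raw byte length to zlib-compressed length.

    Highly repetitive text (a common failure mode of greedy decoding)
    compresses well and therefore yields a high ratio; a threshold on
    this value (2.4 in the paper) triggers the temperature fallback.
    """
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))
```

For example, a looping output like `"okay okay okay ..."` scores far above 2.4, while normal prose stays well below it. An entropy estimate over the token distribution can approximate this without pulling in a compression library, which is presumably why whisper.cpp substitutes it, but the two measures are not numerically interchangeable, so the paper's 2.4 threshold does not carry over directly.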