Is there any way to get rid of [Blank Audio] in transcript? #2420

Open
janngobble opened this issue Sep 17, 2024 · 8 comments

Comments

@janngobble

I'm seeing this using medium.en:

[00:14:57.380 --> 00:14:59.900]   I'm trying to trace his next look in.
[00:14:59.900 --> 00:15:02.740]   >> Yes, he, he worked here.
[00:15:02.740 --> 00:15:05.180]   He wasn't married though.
[00:15:05.180 --> 00:15:06.820]   No family he ever mentioned.
[00:15:06.820 --> 00:15:08.020]   >> But there's something else.
[00:15:08.020 --> 00:15:10.740]   [BLANK_AUDIO]
[00:15:10.740 --> 00:15:14.740]   >> Miss Stallwood, I do not like to speak ill of the dead, but
[00:15:14.740 --> 00:15:17.900]   there was an incident while he worked here.
[00:15:17.900 --> 00:15:18.820]   >> What kind of incident?
[00:15:18.820 --> 00:15:21.460]   [BLANK_AUDIO]

I don't know why it would write "[BLANK_AUDIO]" instead of simply leaving that timestamp out of the file...

Can I suppress this token?

thanks!

Jann

@mrfragger

st=$SECONDS && for f in *.opus ; do ffmpeg -hide_banner -i "$f" -f wav -ar 16000 -ac 1 - | nice ~/whisper/whisper.cpp-1.6.2/./main -m ~/whisper/whisper.cpp-1.6.2/models/ggml-$setmodel.bin - -ovtt -of "$f" -t 8 -l "$setlanguage" $translate -ml $maxlength $splitatword $setprintcolors $setng $setfa --prompt "$setprompt" ; for f in *.vtt ; do LC_ALL=C sed -r -i .bak -e's|\[BLANK_AUDIO\]||g' "$f" ; done && for i in *opus.vtt ; do mv -i -- "$i" "$(printf '%s' "$i" | sed '1s/.opus.vtt/.vtt/')" ; [ ! -d vttsubs ] && mkdir vttsubs/ ; mv *.vtt vttsubs/ ; done && rm *.bak ; done && secs=$((SECONDS-st)); printf '\nwhisper.cpp took %02dh:%02dm:%02ds\n' $(($secs/3600)) $(($secs%3600/60)) $(($secs%60))

@janngobble
Author

> st=$SECONDS && for f in *.opus ; do ffmpeg -hide_banner -i "$f" -f wav -ar 16000 -ac 1 - | nice ~/whisper/whisper.cpp-1.6.2/./main -m ~/whisper/whisper.cpp-1.6.2/models/ggml-$setmodel.bin - -ovtt -of "$f" -t 8 -l "$setlanguage" $translate -ml $maxlength $splitatword $setprintcolors $setng $setfa --prompt "$setprompt" ; for f in *.vtt ; do LC_ALL=C sed -r -i .bak -e's|\[BLANK_AUDIO\]||g' "$f" ; done && for i in *opus.vtt ; do mv -i -- "$i" "$(printf '%s' "$i" | sed '1s/.opus.vtt/.vtt/')" ; [ ! -d vttsubs ] && mkdir vttsubs/ ; mv *.vtt vttsubs/ ; done && rm *.bak ; done && secs=$((SECONDS-st)); printf '\nwhisper.cpp took %02dh:%02dm:%02ds\n' $(($secs/3600)) $(($secs%3600/60)) $(($secs%60))

Maybe I should've specified: I'm doing subtitles, and all of these end up in the SRT file. I don't want to go through the file by hand and remove the four-line cue each time this appears. Once I do, the index numbers are no longer sequential either.

But thanks for the option...
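
For the SRT case specifically, a small script can drop the placeholder cues and renumber what is left, so nothing has to be edited by hand. This is a minimal Python sketch, assuming the cue text is exactly `[BLANK_AUDIO]` and that cues are separated by blank lines. `drop_blank_cues` is a hypothetical helper name, not part of whisper.cpp:

```python
import re

# Cue text to drop; [BLANK_AUDIO] is the placeholder reported in this thread.
PLACEHOLDER = re.compile(r"^\s*\[BLANK_AUDIO\]\s*$", re.IGNORECASE)


def drop_blank_cues(srt_text: str) -> str:
    """Remove cues whose text is only [BLANK_AUDIO] and renumber the rest."""
    # Cues in an SRT file are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    kept = []
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # not a well-formed cue (index line + timing line)
        timing, text = lines[1], "\n".join(lines[2:])
        if not text.strip() or PLACEHOLDER.match(text):
            continue  # empty or placeholder-only cue
        kept.append((timing, text))
    # Rebuild the file with sequential indices starting at 1.
    out = [f"{i}\n{timing}\n{text}" for i, (timing, text) in enumerate(kept, 1)]
    return "\n\n".join(out) + "\n" if out else ""
```

To apply it to a file, something like `Path("episode.srt").write_text(drop_blank_cues(Path("episode.srt").read_text()))` with `pathlib.Path` should do; `episode.srt` is a placeholder name.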

@jensdraht1999

You know what the problem normally is with removing silence? It can be done by replacing the silence with another sound, but I think this will lead Whisper to recognize it as another sound. I think this is currently an unfixable issue with the Whisper models we have. You can try distil-whisper (English only). At least that should work well for you.

Please close this issue and here is my script, if you would like to try it out:

`@echo off
REM Navigate to the input folder
cd input

REM Delete all .srt files in the input folder
del /q *.srt

REM Check if the "temp" folder exists
if exist temp (
REM Clean all files in the "temp" folder
del /q temp\*
if exist temp\temp (
del /q temp\temp\*
) else (
mkdir temp\temp
)
) else (
REM Create the "temp" folder
mkdir temp
mkdir temp\temp
)

REM Look for all media files in the input folder
for %%f in (*.mp4 *.avi *.mkv *.mov *.wmv *.flv *.webm *.mpeg *.mpg *.mp3 *.wav *.aac *.flac *.ogg *.wma *.m4a *.aiff *.alac) do (

REM Create a temporary WAV file from the media file using ffmpeg with 16 kHz sample rate
"..\software\ffmpeg.exe" -i "%%f" -ar 16000 -y -q:a 0 -map a "temp\\temp1.wav"

setlocal enabledelayedexpansion

REM Create or clear the CSV file
echo filename;silence > temp\silence_detection.csv

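REM Remove chunks and the concat list left over from a previous media file
del /q "temp\temp\*" 2>nul

REM Split temp1.wav into 1-second chunks
"..\software\ffmpeg.exe" -i "temp\\temp1.wav" -f segment -segment_time 1 -c copy "temp\\temp\\chunk_%%06d.wav"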

REM Check each chunk for silence
for %%i in (temp\\temp\\chunk_*.wav) do (
    echo Processing file: %%i
    REM silent=1 means ffmpeg reported silence in this chunk, 0 means it did not
    set silent=0
    REM Pipe the ffmpeg log to findstr, which sets errorlevel 0 when it finds silence_end
    "..\software\ffmpeg.exe" -i "%%i" -af silencedetect=noise=-50dB:d=0.5 -f null - 2>&1 | findstr /r /c:"silence_end" > nul
    if !errorlevel! equ 0 (
        echo Silence detected in file: %%i
        set silent=1
    )
    echo %%i;!silent! >> temp\silence_detection.csv

    REM Replace the chunk with the filler sound if silence was detected
    if !silent! equ 1 (
        copy /y "..\software\sound.wav" "%%i"
    )
)

REM Create a file list for concatenation, sorted by name
REM The inner loop uses g so that it does not shadow the outer loop variable f
for /f "tokens=*" %%g in ('dir /b /on "temp\\temp\\*.wav"') do @echo file %%g >> "temp\\temp\\filelist.txt"

REM Concatenate all WAV files into one output file
"..\software\ffmpeg.exe" -y -f concat -safe 0 -i "temp\\temp\\filelist.txt" -c copy "temp\\temp.wav"

REM Transcribe the temporary WAV file using main.exe with the specified settings
"..\software\main.exe" -t 12 -p 1 -ot 0 -bo 5 -bs 5 -wt 0.01 -et 2.40 -lpt -1.00 -nf -osrt -of "..\input\%%~nf" -pc -pp -l auto -m "..\software\models\ggml-large-v2-q5_0.bin" -f "temp\\temp.wav"
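REM No rename is needed: -of already names the .srt after the original media file

REM Close the setlocal opened earlier in this iteration so environments do not pile up
endlocal
)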

REM Check if the "temp" folder exists
if exist temp (
REM Clean all files in the "temp" folder
del /q temp\*
if exist temp\temp (
del /q temp\temp\*
) else (
mkdir temp\temp
)
) else (
REM Create the "temp" folder
mkdir temp
mkdir temp\temp
)

echo Process completed.
pause`

Just put your audio or video file in the input folder, then go into the software folder and put the contents of this release in there: https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.4
(Select the CUDA build if you have an Nvidia GPU.)

Then also put ffmpeg.exe in there, from: https://www.gyan.dev/ffmpeg/builds

The script also expects a filler sound at software\sound.wav, which it copies over the silent chunks.

Then the script should work.

THIS ONLY WORKS FOR WINDOWS.
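
The detection step of the script does not have to be Windows-specific. A minimal cross-platform sketch in Python, assuming `ffmpeg` is on the PATH and reusing the same `silencedetect=noise=-50dB:d=0.5` settings as the batch file. `parse_silences` and `detect_silences` are hypothetical helper names, and the sketch only lists silent intervals; it does not replace or cut any audio:

```python
import re
import subprocess

# silencedetect logs lines such as
#   [silencedetect @ 0x...] silence_start: 1.5
#   [silencedetect @ 0x...] silence_end: 3.25 | silence_duration: 1.75
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log: str) -> list[tuple[float, float]]:
    """Pair up silence_start / silence_end values found in an ffmpeg log."""
    intervals = []
    start = None
    for line in log.splitlines():
        m = SILENCE_START.search(line)
        if m:
            start = float(m.group(1))
            continue
        m = SILENCE_END.search(line)
        if m and start is not None:
            intervals.append((start, float(m.group(1))))
            start = None
    return intervals


def detect_silences(path: str, noise: str = "-50dB", min_len: float = 0.5):
    """Run ffmpeg on `path` and return the silent intervals in seconds."""
    cmd = ["ffmpeg", "-hide_banner", "-i", path,
           "-af", f"silencedetect=noise={noise}:d={min_len}",
           "-f", "null", "-"]
    # silencedetect reports through ffmpeg's log, which goes to stderr.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return parse_silences(proc.stderr)
```

Running the filter once over the whole file avoids splitting the audio into 1-second chunks, at the cost of leaving the decision about what to do with each interval to the caller.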

@janngobble
Author

> Just put your audio or video file in the input folder, then go into the software folder and put in there content of this: https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.4 (Select cuda, if you have Nvidia GPU).
>
> Then also please put in there ffmpeg.exe from here: https://www.gyan.dev/ffmpeg/builds
>
> Then the script should work.
>
> THIS ONLY WORKS FOR WINDOWS.

I 100% disagree. It appears in some audio, and in others it does not. This is not a problem that is “unfixable”. Unless you can tell me why some places have blank audio in transcript and others do not - and then fix it in whisper.cpp - then it is not “by definition” unfixable.

Also, a windows-only solution is not a solution for all the platforms whisper.cpp runs on. And neither is requiring an old version of whisper.cpp.

So, no. With all due respect, do not close this issue.

Jann

@StrandmonYellow

I am using this server to run with Home Assistant. I was also wondering whether it is possible to remove the [BLANK_AUDIO] outputs, because they mess up the Home Assistant command.

@jensdraht1999

> I am using this server to run with Home Assistant. I was also wondering whether it is possible to remove the [BLANK_AUDIO] outputs, because they mess up the Home Assistant command.

It might be possible with a Python script that removes it from the output.
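
A minimal sketch of such a script, assuming the transcription arrives as a plain string before it is handed to Home Assistant. `strip_blank_audio` is a hypothetical helper name, and the pattern only covers the `[BLANK_AUDIO]` marker discussed here:

```python
import re

# Only the marker reported in this thread; widen the pattern if other
# bracketed markers show up in your output.
MARKER = re.compile(r"\[BLANK_AUDIO\]", re.IGNORECASE)


def strip_blank_audio(text: str) -> str:
    """Remove [BLANK_AUDIO] markers and collapse the leftover whitespace."""
    cleaned = MARKER.sub(" ", text)
    return " ".join(cleaned.split())
```

Because it collapses all whitespace, including newlines, this suits single-line commands rather than multi-line transcripts.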

@jensdraht1999

> > Just put your audio or video file in the input folder, then go into the software folder and put in there content of this: https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.4 (Select cuda, if you have Nvidia GPU).
> > Then also please put in there ffmpeg.exe from here: https://www.gyan.dev/ffmpeg/builds
> > Then the script should work.
> > THIS ONLY WORKS FOR WINDOWS.
>
> I 100% disagree. It appears in some audio, and in others it does not. This is not a problem that is “unfixable”. Unless you can tell me why some places have blank audio in transcript and others do not - and then fix it in whisper.cpp - then it is not “by definition” unfixable.
>
> Also, a windows-only solution is not a solution for all the platforms whisper.cpp runs on. And neither is requiring an old version of whisper.cpp.
>
> So, no. With all due respect, do not close this issue.
>
> Jann

@janngobble Hi,

I don't want to upset you, but when I say this is "unfixable", I mean it. whisper.cpp could implement what you want, but that would ultimately run into the problem below.

You know what happens when there is silence: in some places Whisper will transcribe it as "Thank you", "Subbed by xyz" or whatever. And do you know why this happens?

It happens because OpenAI trained Whisper on a lot of videos that were transcribed by a lot of humans. They had the video and the subtitle file, and the model learned to match what happens in the audio to the text.

However, some people who subtitle their videos add something like "Thank you", "Thanks for watching", "Subscribe to my channel" or "This has been subbed by SubTitleFreak123" at the end of the video or at a silent spot.

What does this mean? It means that OpenAI trained Whisper on a faulty dataset, and whisper.cpp cannot fix this behavior, because those errors are baked into the model itself.

However, there is still hope. There are people trying to fix this behavior by retraining the Whisper model on a good dataset with accurate subtitles, so that the model relearns that silence is actually silence: https://github.com/huggingface/distil-whisper
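
Until a retrained model is an option, one workaround is to flag cues that look like these stock phrases and fall inside a stretch of silence. This is a heuristic sketch only, not something whisper.cpp or distil-whisper provides. It assumes the silent intervals were obtained separately (for example with ffmpeg's `silencedetect` filter, which the batch script above already uses). `is_suspect` and the phrase list are hypothetical:

```python
import re

# Phrases quoted in the comment above; extend the tuple for your own material.
SUSPECT_PHRASES = ("thank you", "thanks for watching", "subscribe to my channel")


def is_suspect(text, start, end, silences):
    """True if the cue is a stock phrase and lies inside a detected silence.

    `start` and `end` are the cue times in seconds; `silences` is a list of
    (start, end) tuples in seconds.
    """
    # Lower-case and drop punctuation so "Thank you." matches "thank you".
    normalized = re.sub(r"[^a-z ]", "", text.lower()).strip()
    if normalized not in SUSPECT_PHRASES:
        return False
    return any(s <= start and end <= e for s, e in silences)
```

Requiring both conditions keeps a genuine "Thank you" spoken over audible sound from being flagged.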

@mrfragger

I used to use ffmpeg to remove silence, but it didn't always work. This seems to work quite well, though:
https://github.com/DarkTrick/python-video-silence-cutter

"Remove silence with video silence cutter")
    clear
    echo "Remove silence with video silence cutter"
    [ ! -d output ] && mkdir output
    read -p "Enter type of audio files mp4, webm, m4a, mp3, opus, etc. : " typeaudio
    if [ -n "$typeaudio" ] ; then
      echo "Will remove all silence segments  from all $typeaudio files"
      echo "files will be in output/"
      echo "-22dB deep cleaning of silence"
      echo "-26dB aggressive removes almost all silence"
      echo "-30dB [recommended] medium removes quite a bit"
      echo "-34dB a tad conservative removing silence"
      echo "-38dB quite conservative and just removes a bit"
      echo "-42dB kinda too strict and barely removes stuff"
      echo ""
      read -p "Input dB for silence removal (22, 26, [30], 34, 38, 42):  " setdBsilence
      setdBsilence=${setdBsilence:-30}
      ls *.$typeaudio
      echo ""
    else
      echo "Didn't input extension so returning to main menu"
      echo ""
      exec "$0"
    fi
    read -p "Press ENTER to remove silence or m (Main Menu) " choice
    [[ "$choice" == [Mm]* ]] && exec "$0" || echo "silencing"
    st=$SECONDS
    dt=$( date +%Y_%m_%d_%H_%M_%S)
    dtstart=$( date +%Y_%m_%d_%H:%M:%S)
    [ ! -d output ] && mkdir output
    mkdir output/"$dt"
  parallel --bar python3 source/stuff/silence_cutter.py {} output/$dt/{/.}.opus -"$setdBsilence" ::: *.$typeaudio 
    secs=$((SECONDS-st))
    echo "All removed silence audio has been put in output/$dt/"
    echo ""
c=0 ; NAMEPREFIX="[*]" ; DELIMITER="." ; for f in *.$typeaudio ; do infstr=`ffprobe -show_entries format=duration -show_entries "format_tags=title" -sexagesimal -of compact=p=0:nk=1 "${f}" 2>/dev/null`; c=$((${c}+1)) ; echo -e "${NAMEPREFIX}${c}${DELIMITER}" $(sed -e 's/.*|//' <<<${infstr}) [$(sed -e 's/\..*//' <<<${infstr})] ; done
tdt=$(LENGTH=0; for file in *.$typeaudio; do if [ -f "$file" ]; then LENGTH="$LENGTH+$(ffprobe -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file" 2>/dev/null)"; fi; done; echo "$LENGTH" | bc )
echo "Total Seconds of all audio segments"
tdts=$(printf "%.0f" $tdt)
echo $tdts
echo ""
printf 'Total duration of audiobook with original audio files will be %02dh:%02dm:%02ds\n\n' $(($tdts/3600)) $(($tdts%3600/60)) $(($tdts%60))
c=0 ; NAMEPREFIX="[*]" ; DELIMITER="." ; for f in output/$dt/*.opus ; do infstr=`ffprobe -show_entries format=duration -show_entries "format_tags=title" -sexagesimal -of compact=p=0:nk=1 "${f}" 2>/dev/null`; c=$((${c}+1)) ; echo -e "${NAMEPREFIX}${c}${DELIMITER}" $(sed -e 's/.*|//' <<<${infstr}) [$(sed -e 's/\..*//' <<<${infstr})] ; done
tdtm=$(LENGTH=0; for file in output/$dt/*.opus; do if [ -f "$file" ]; then LENGTH="$LENGTH+$(ffprobe -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file" 2>/dev/null)"; fi; done; echo "$LENGTH" | bc )
echo "Total Seconds of all audio segments with silence removed"
tdfsm=$(printf "%.0f" $tdtm)
echo $tdfsm
echo ""
printf 'Total duration of audiobook with silence removed will be %02dh:%02dm:%02ds\n\n' $(($tdfsm/3600)) $(($tdfsm%3600/60)) $(($tdfsm%60))
diffsecs=$(echo $tdts-$tdfsm | bc )
echo "$diffsecs seconds less"
echo "Total duration difference from original audio files" 
echo "compared to silence removed of audio files is"
printf '%02dh:%02dm:%02ds\n\n' $(($diffsecs/3600)) $(($diffsecs%3600/60)) $(($diffsecs%60))
echo ""
dtend=$( date +%Y_%m_%d_%H:%M:%S)
secs=$((SECONDS-st))
printf 'Removing silence took %02dh:%02dm:%02ds\n' $(($secs/3600)) $(($secs%3600/60)) $(($secs%60))
echo "Removing silence  started at $dtstart"
echo "Removing silence finished at $dtend"
echo "-dB was set at: -'$setdBsilence'"
echo ""
dtmove=$( date +%Y_%m_%d_%H_%M_%S)
[ ! -d output ] && mkdir output
mkdir output/"$dtmove"
parallel --bar ffmpeg -i  {} -hide_banner -c:a libopus -b:a 32k -af dynaudnorm output/$dtmove/{/} ::: output/$dt/*.opus
secsh=$((SECONDS-st))
printf 'Removing silence and normalizing took %02dh:%02dm:%02ds\n' $(($secsh/3600)) $(($secsh%3600/60)) $(($secsh%60))
read -p "Press ENTER to continue"
