ASR worker that uses faster-whisper as its backend for transcribing AV material from B&G.
This is still a work in progress, so it is subject to change.
There are two ways in which the whisper-asr-worker can be tested **on the CPU**:

### Docker CPU run

1. Check if Docker is installed
2. Make sure you have the `.env.override` file in your local repo folder
3. In `.env.override`, change `W_DEVICE` from `cuda` to `cpu` (see the example after this list)
4. Comment out the lines indicated in `docker-compose.yml`
5. Open your preferred terminal and navigate to the local repository folder
6. To build the image, execute the following command:

   ```sh
   docker build . -t whisper-asr-worker
   ```

7. To run the worker, execute the following command:

   ```sh
   docker compose up
   ```

All commands should be run within WSL if on Windows or within your terminal if on Linux.
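For reference, the `W_DEVICE` change in `.env.override` would look something like this (only the relevant line is shown; the rest of the file is omitted):

```sh
# .env.override: run Whisper on the CPU instead of a CUDA GPU
W_DEVICE=cpu
```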
### Local run

1. Follow the steps here (under "Adding `pyproject.toml` and generating a `poetry.lock` based on it") to install Poetry and the dependencies required to run the worker (a sketch of the typical commands follows this list)
2. Make sure you have the `.env.override` file in your local repo folder
3. In `.env.override`, change `W_DEVICE` from `cuda` to `cpu`
4. Install `ffmpeg`. You can run this command, for example:

   ```sh
   apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
   ```

5. Navigate to `scripts`, then execute the following command:

   ```sh
   ./run.sh
   ```
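With Poetry installed, the setup from step 1 usually boils down to something like the sketch below. The linked instructions are authoritative; treat these commands as an assumption about a standard Poetry workflow:

```sh
# Sketch of a typical Poetry-based setup, assuming a standard pyproject.toml
poetry install          # install the dependencies from pyproject.toml / poetry.lock
poetry shell            # activate the virtual environment
cd scripts && ./run.sh  # start the worker (step 5)
```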
To run the worker with a CUDA-compatible GPU instead of the CPU (i.e. leaving `W_DEVICE` set to `cuda`), either:

- skip steps 3 & 4 from "Docker CPU run", or
- skip step 3 from "Local run"
(OUTDATED, BUT MIGHT STILL BE RELEVANT) To run it using a GPU via Docker, check the instructions from the dane-example-worker. Make sure to replace `dane-example-worker` in the `docker run` command with `dane-whisper-asr-worker`.
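A GPU-enabled `docker run` along those lines might look like the sketch below (the image name comes from the note above; the `--gpus` flag, env file, and mount path are assumptions about your setup):

```sh
# Hypothetical GPU run; adjust the mount and env file to your setup
docker run --gpus all \
  --env-file .env.override \
  -v /path/to/data:/data \
  dane-whisper-asr-worker
```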
The expected run of this worker (whose pipeline is defined in `asr.py`) should:

1. download the input file if it isn't downloaded already in `/data/input/` via `download.py`
2. download the model if not present via `model_download.py`
3. run `transcode.py` if the input file is a video, to convert it to audio format (though there are plans to remove this and instead use the audio-extraction-worker to extract the audio)
4. run `whisper.py` to transcribe the audio and save it in `/data/output/` if a transcription doesn't already exist
5. convert Whisper's output to DAAN index format using `daan_transcript.py`
6. (optional) transfer the output to an S3 bucket
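Since the transcription step is backed by faster-whisper, its core is conceptually similar to this minimal sketch (the model size, device, and file path are illustrative placeholders, not the worker's actual configuration, which comes from its `.env` settings):

```python
from faster_whisper import WhisperModel

# Illustrative values; the worker reads device/model settings from its .env file
model = WhisperModel("large-v2", device="cpu", compute_type="int8")

# Transcribe an audio file that has already been downloaded/transcoded
segments, info = model.transcribe("/data/input/example.mp3")

print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```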
If you prefer to use your own model that is stored locally, make sure to set `MODEL_BASE_DIR` to the path where the model files can be found. A model found locally takes precedence over downloading it from Huggingface or S3 (so, no matter how `W_MODEL` is set, it will be ignored if a model is present locally).
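For example, in your `.env` file (the path below is a placeholder):

```sh
# .env.override: point the worker at a locally stored model
MODEL_BASE_DIR=/path/to/local/model
```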
The pre-trained Whisper model version can be adjusted in the `.env` file by editing the `W_MODEL` parameter. Possible options are:

| Size | Parameters |
|---|---|
| `tiny` | 39 M |
| `base` | 74 M |
| `small` | 244 M |
| `medium` | 769 M |
| `large` | 1550 M |
| `large-v2` | 1550 M |
| `large-v3` | 1550 M |
We recommend version `large-v2`, as it performs better than `large-v3` in our benchmarks.
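For instance, selecting the recommended model looks like this:

```sh
# .env: select the recommended pre-trained model
W_MODEL=large-v2
```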
You can also specify an S3 URI (by modifying the `W_MODEL` parameter) if you have your own custom model available via S3.
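For example (the bucket and key below are hypothetical):

```sh
# .env: fetch a custom model from S3 (hypothetical URI)
W_MODEL=s3://your-bucket/path/to/custom-model
```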