Increasing WER & Validation Loss During Whisper Fine-Tuning #197

Open
monk1337 opened this issue Oct 31, 2023 · 1 comment

monk1337 commented Oct 31, 2023

Hi,
I've recently created a dataset using speech-to-text APIs on custom documents. The dataset consists of 1,000 audio samples, with 700 designated for training and 300 for testing. In total, this equates to about 4 hours of audio, where each clip is approximately 30 seconds long.

I'm attempting to fine-tune the Whisper small model using Hugging Face's script, following their tutorial "Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers".
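
The training arguments follow the tutorial closely; roughly something like this (the values below are the tutorial's defaults rather than exactly what I ran, and the output path is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

# Hyperparameters taken from the fine-tuning tutorial; they may need
# adjusting for a smaller ~4-hour dataset.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-finetuned",  # placeholder
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    predict_with_generate=True,
    generation_max_length=225,
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
)
```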

Before diving into fine-tuning, I evaluated OpenAI's pre-trained model on my test set, which gave a WER of 23.078%.
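
For reference, the baseline number came from running the pre-trained checkpoint over the held-out clips along these lines (a minimal sketch; `test_dataset` and the `text` column are placeholders for my own data):

```python
import torch
import evaluate
from transformers import WhisperProcessor, WhisperForConditionalGeneration

wer_metric = evaluate.load("wer")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").eval()

predictions, references = [], []
for sample in test_dataset:  # placeholder: the 300 held-out clips
    # Convert raw audio to log-Mel input features
    inputs = processor(
        sample["audio"]["array"],
        sampling_rate=sample["audio"]["sampling_rate"],
        return_tensors="pt",
    )
    with torch.no_grad():
        generated_ids = model.generate(input_features=inputs.input_features)
    predictions.append(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
    references.append(sample["text"])  # placeholder column name

print(f"WER: {100 * wer_metric.compute(predictions=predictions, references=references):.3f}%")
```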

However, as my fine-tuning progresses, I'm observing some unexpected behavior:

[Screenshot: evaluation metrics during fine-tuning, showing validation loss and WER increasing over training steps]

As shown, both the validation loss and the WER are rising as fine-tuning progresses. I'm at a bit of a loss here. Why might this be happening? Any insights or recommendations would be greatly appreciated.

Thank you in advance!
@Vaibhavs10 @sanchit-gandhi


sanchit-gandhi commented Dec 7, 2023

Hey @monk1337! Awesome that you reduced the WER by over half in just 1k training steps! The increasing WER after 1k steps looks like it could well be a case of overfitting. You could combat this by:

  1. Introducing regularisation through dropout and activation dropout: set the `dropout` and `activation_dropout` config attributes to 0.1 or 0.2 to activate dropout (see the sketch after this list). In my experience, a small amount of dropout helps for small datasets, while going above 0.2 is too severe and hurts performance.
  2. Using a larger dataset (in practice this may not be feasible, but it is one valid way to reduce overfitting).
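
For point 1, this is roughly what I mean (a minimal sketch using 0.1 as the example value, with the whisper-small checkpoint you mentioned):

```python
from transformers import WhisperConfig, WhisperForConditionalGeneration

# Override the dropout probabilities in the config *before* building the
# model, so the encoder/decoder layers pick the new values up.
config = WhisperConfig.from_pretrained("openai/whisper-small")
config.dropout = 0.1             # dropout applied throughout the transformer layers
config.activation_dropout = 0.1  # dropout inside the feed-forward blocks

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small", config=config
)
```

Then fine-tune as before; the dropout only acts during training and is disabled automatically at inference time.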

If you can share the script you used to train the model and push the checkpoint to the Hugging Face Hub, I'd be happy to advise further!
