1 main code #18

Merged
merged 10 commits into from
Apr 18, 2024
Conversation

greenw0lf
Collaborator

Closes #1

Added the main code that runs Whisper on audio to generate transcriptions.

When testing, I recommend running on the CPU, which is already the value set in config.yml. If you have a sufficiently powerful NVIDIA GPU available, you can change the value to cuda and test that as well (I will also test it myself on a GPU I have access to).
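
For illustration only, here is a minimal sketch of how a configured device value could be resolved with a CPU fallback. This is not the code from this PR: the function name `resolve_device` and the `cuda_available` flag are hypothetical, and the caller is assumed to determine CUDA availability itself.

```python
def resolve_device(configured: str, cuda_available: bool) -> str:
    """Return the device to run on.

    Hypothetical helper: `configured` stands for the device value read
    from config.yml ("cpu" or "cuda"); `cuda_available` is supplied by
    the caller, so no GPU library is needed here.
    """
    device = configured.strip().lower()
    if device not in ("cpu", "cuda"):
        raise ValueError(f"Unsupported device: {configured!r}")
    # Fall back to the CPU when CUDA was requested but is not usable.
    if device == "cuda" and not cuda_available:
        return "cpu"
    return device
```

Passing availability in as an argument keeps the check easy to exercise on machines without a GPU.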

I have also already tested the vad and word_timestamps settings, and both seem to work fine.

@greenw0lf greenw0lf requested a review from Veldhoen April 9, 2024 11:37
@greenw0lf greenw0lf self-assigned this Apr 9, 2024
Member

@Veldhoen Veldhoen left a comment

Looks good, and it runs on my machine. Nice work! I added some small remarks; other than that, it's ready to merge, I'd say :)

whisper.py: 4 review threads, all resolved (3 on outdated code)
@Veldhoen
Member

Oh, and of course it would be really great to add some automated tests, for instance for the config settings.
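
As a rough idea of what such tests could look like, here is a minimal sketch. It assumes the config is loaded into a plain dict; the key name `device` and the helper `validate_config` are hypothetical, while `vad` and `word_timestamps` are the settings mentioned in this PR (their being booleans is also an assumption).

```python
from typing import List

# Assumed allowed values and boolean settings; adjust to the real config.yml.
ALLOWED_DEVICES = ("cpu", "cuda")
BOOLEAN_KEYS = ("vad", "word_timestamps")


def validate_config(cfg: dict) -> List[str]:
    """Return a list of human-readable problems found in `cfg` (empty if none)."""
    errors = []
    if cfg.get("device") not in ALLOWED_DEVICES:
        errors.append("device must be one of: " + ", ".join(ALLOWED_DEVICES))
    for key in BOOLEAN_KEYS:
        if not isinstance(cfg.get(key), bool):
            errors.append(key + " must be true or false")
    return errors


def test_valid_config_has_no_errors():
    cfg = {"device": "cpu", "vad": True, "word_timestamps": False}
    assert validate_config(cfg) == []


def test_unknown_device_is_reported():
    cfg = {"device": "tpu", "vad": True, "word_timestamps": True}
    assert validate_config(cfg) == ["device must be one of: cpu, cuda"]


def test_missing_boolean_is_reported():
    cfg = {"device": "cuda", "vad": True}
    assert validate_config(cfg) == ["word_timestamps must be true or false"]
```

The `test_` functions follow the naming convention that pytest discovers automatically, but they are plain functions and can also be called directly.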

@greenw0lf
Collaborator Author

greenw0lf commented Apr 12, 2024

I tried to run it on a GPU via Docker, but it's more complicated than I thought. It expects CUDA version 11.x; I tried several approaches, and none of them worked. I might need a multi-stage build to make it work. Alternatively, I could test the GPU the S3 way since, in that case, it runs locally without needing Docker (if I remember correctly).

Otherwise, if you have ideas on how to address this, let me know.

I will also be adding tests, but as part of a separate issue.

@greenw0lf greenw0lf merged commit 62b0498 into main Apr 18, 2024
1 check passed
@greenw0lf greenw0lf deleted the 1-main-code branch April 18, 2024 09:21
Successfully merging this pull request may close these issues.

Create main_data_processor
2 participants