Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed uncessary channel crushing #48

Merged
merged 1 commit into from
Aug 30, 2024

Conversation

UsernamesLame
Copy link
Contributor

Calling set_channels(1) already converts the audio to one channel, so there was no need to call split_to_mono, instead we can just call get_array_of_samples```

This should have a negligible positive impact on performance. Can someone test this quickly for me? I'm not in a position to test at the moment.

Calling ```set_channels(1) already converts the audio to one channel, so there was no need to call ```split_to_mono```, instead we can just call ```get_array_of_samples```

This should have a negligible positive impact on performance
@UsernamesLame
Copy link
Contributor Author

@abdeladim-s Please review this. In the meantime I'm going to write a helper script to auto convert various types of files into something whisper can accept.

Copy link
Owner

@abdeladim-s abdeladim-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@UsernamesLame, Approved 👍

Looking forward your helper script :)

pywhispercpp/model.py Show resolved Hide resolved
@UsernamesLame
Copy link
Contributor Author

UsernamesLame commented Aug 30, 2024

@UsernamesLame, Approved 👍

Looking forward your helper script :)

https://github.com/UsernamesLame/WhisperWav :)

It's not remotely done, but it works. I'm thinking of making it spit out pickled numpy arrays of the audio for faster transcription. As of now, it spits out wav files that Whisper.cpp can handle

@abdeladim-s
Copy link
Owner

The script looks nice, but if I understand it correctly, it's similar to what I am already doing!
There is no need to convert the files to wav as we are calling the cpp functions directly, not calling whisper.cpp using the CLI.
Numpy array of the audio file is what we need.

@UsernamesLame
Copy link
Contributor Author

The script looks nice, but if I understand it correctly, it's similar to what I am already doing!

There is no need to convert the files to wav as we are calling the cpp functions directly, not calling whisper.cpp using the CLI.

Numpy array of the audio file is what we need.

No I know. But my understanding is you're doing some pre processing pre transcription.

I want to do all the pre processing on large batches of data while transcription is actively happening to remove one more task from the processing pipeline.

Also, if I'm understanding correctly, you're not actually converting it to 16bit files and that may harm transcription.

I'll look over it again, but I also want to remove the need to convert the wav files to numpy arrays before transcribing.

That would help with performance. At least according to my CPython internals book that recommends a lot of things to avoid.

I'm aiming to make the slowest part loading the model into memory. And even then, I want to explore making it '''deepcopy''' able so we can load the model, clone it in memory, and change its settings as needed from the defaults by using copy.deepcopy.

This way multiple instances can be loaded 100% independent of each other vs the current situation with static methods.

I explored making everything not static and deep copy friendly but ran into some issues with PyBind11.

It's late here, so I might have to advise on that front tomorrow.

@abdeladim-s
Copy link
Owner

  • Yes, pre processing is necessary to make every type of media compatible with whisper.cpp

  • The files need to be converted to 16Khz mono, similar to what you did in the script, it's actually here

  • If you removed the conversion to numpy array, how you are going to pass the data to c++ ?

Other than that, good luck with your exploration, the code is yours .. looking forward it 😄

I can merge this for now if you want ?

@UsernamesLame
Copy link
Contributor Author

UsernamesLame commented Aug 30, 2024

  • Yes, pre processing is necessary to make every type of media compatible with whisper.cpp
  • The files need to be converted to 16Khz mono, similar to what you did in the script, it's actually here
  • If you removed the conversion to numpy array, how you are going to pass the data to c++ ?

Other than that, good luck with your exploration, the code is yours .. looking forward it 😄

I can merge this for now if you want ?

Yea please do!

As for how we would load them, we could re-initialize the numpy.ndarrays I guess and pass them along to the C++ interface? I'm struggling to explain it. I need to write some code to show it. Also I was confused why the code wasn't updated, just saw you merge it.

Edit:

My PR isn't merged(?) Says 2 workflows awaiting approval.

Also yea I noticed the channel crushing, sorry. I'm super tired right now so I shouldn't comment too much. I am going to see what else I can work on tomorrow.

@abdeladim-s
Copy link
Owner

Go get some rest .. Take your time!
Looking forward your futur PRs.
I'll merge this one for now!

Thanks for the contribution 😄

@abdeladim-s abdeladim-s merged commit 4df7cfc into abdeladim-s:main Aug 30, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants