Removed unnecessary channel crushing #48
Conversation
Calling `set_channels(1)` already converts the audio to one channel, so there was no need to call `split_to_mono`; instead, we can just call `get_array_of_samples`. This should have a negligible positive impact on performance. Can someone test this quickly for me? I'm not in a position to test at the moment.
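For context, the change amounts to something like the following pydub usage. This is a minimal sketch, not the PR diff itself; the file name, 16 kHz target rate, and float normalization are assumptions on top of what the description states:

```python
import numpy as np
from pydub import AudioSegment

audio = AudioSegment.from_file("input.mp3")

# set_channels(1) already downmixes to mono, so the extra
# split_to_mono() call the PR removes was redundant.
mono = audio.set_channels(1).set_frame_rate(16000)  # whisper expects 16 kHz input

# Read the samples straight out of the mono segment.
samples = np.array(mono.get_array_of_samples()).astype(np.float32) / 32768.0  # assumes 16-bit samples
```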
@abdeladim-s Please review this. In the meantime I'm going to write a helper script to auto-convert various types of files into something whisper can accept.
@UsernamesLame, Approved 👍
Looking forward to your helper script :)
https://github.com/UsernamesLame/WhisperWav :) It's not remotely done, but it works. I'm thinking of making it spit out pickled numpy arrays of the audio for faster transcription. As of now, it spits out wav files that whisper can accept.
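The caching idea might look roughly like this. It's a hypothetical sketch, not WhisperWav's actual code; `np.save` is used here as a safer stand-in for raw pickle, and the function and file names are made up:

```python
import numpy as np
from pydub import AudioSegment

def cache_audio(src: str, dst: str) -> None:
    """Decode once, store the normalized samples for instant reloading."""
    seg = AudioSegment.from_file(src).set_channels(1).set_frame_rate(16000)
    arr = np.array(seg.get_array_of_samples()).astype(np.float32) / 32768.0
    np.save(dst, arr)

# Later, at transcription time, skip decoding entirely:
# samples = np.load("clip.npy")
```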
The script looks nice, but if I understand it correctly, it's similar to what I am already doing!
No, I know. But my understanding is you're doing some pre-processing pre-transcription. I want to do all the pre-processing on large batches of data while transcription is actively happening, to remove one more task from the processing pipeline. Also, if I'm understanding correctly, you're not actually converting to 16-bit files, and that may harm transcription.

I'll look over it again, but I also want to remove the need to convert the wav files to numpy arrays before transcribing. That would help with performance, at least according to my CPython internals book, which recommends a lot of things to avoid. I'm aiming to make the slowest part loading the model into memory.

Even then, I want to explore making the model `deepcopy`-able so we can load it, clone it in memory, and change its settings as needed from the defaults using `copy.deepcopy`. This way multiple instances can be loaded 100% independent of each other, vs the current situation with static methods. I explored making everything non-static and deepcopy-friendly but ran into some issues with PyBind11. It's late here, so I might have to advise on that front tomorrow.
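For what it's worth, one way a deepcopy-able model could be structured is via Python's `__deepcopy__` protocol. This is purely an illustrative sketch; `ModelHandle` and its fields are made-up names, not pywhispercpp's actual API:

```python
# Illustrative only: ModelHandle and its fields are hypothetical,
# not pywhispercpp's real API.
import copy

class ModelHandle:
    def __init__(self, model_path: str, **params):
        self.model_path = model_path
        self.params = dict(params)
        # self._ctx = ...  # the pybind11-wrapped whisper context would load here

    def __deepcopy__(self, memo):
        # Share the expensive C++ context (read-only weights) and copy only
        # the Python-side settings, so clones can diverge without reloading.
        clone = ModelHandle.__new__(ModelHandle)
        clone.model_path = self.model_path
        clone.params = copy.deepcopy(self.params, memo)
        # clone._ctx = self._ctx
        return clone

base = ModelHandle("ggml-base.bin", n_threads=4)
fast = copy.deepcopy(base)
fast.params["n_threads"] = 8  # independent settings, shared weights
```

Whether the underlying whisper context can safely be shared or cloned across instances is exactly the kind of PyBind11 question mentioned above.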
Other than that, good luck with your exploration, the code is yours .. looking forward to it 😄 I can merge this for now if you want?
Yea, please do! As for how we would load them, we could re-initialize the `numpy.ndarray`s, I guess, and pass them along to the C++ interface? I'm struggling to explain it; I need to write some code to show it. Also, I was confused why the code wasn't updated, just saw you merge it.

Edit: My PR isn't merged(?) Says 2 workflows awaiting approval. Also, yea, I noticed the channel crushing, sorry. I'm super tired right now, so I shouldn't comment too much. I am going to see what else I can work on tomorrow.
Go get some rest .. Take your time! Thanks for the contribution 😄