Removed uncessary channel crushing #48

UsernamesLame · 2024-08-29T20:41:52Z

Calling set_channels(1) already converts the audio to one channel, so there was no need to call split_to_mono, instead we can just call get_array_of_samples```

This should have a negligible positive impact on performance. Can someone test this quickly for me? I'm not in a position to test at the moment.

Calling ```set_channels(1) already converts the audio to one channel, so there was no need to call ```split_to_mono```, instead we can just call ```get_array_of_samples``` This should have a negligible positive impact on performance

UsernamesLame · 2024-08-29T20:42:48Z

@abdeladim-s Please review this. In the meantime I'm going to write a helper script to auto convert various types of files into something whisper can accept.

abdeladim-s

@UsernamesLame, Approved 👍

Looking forward your helper script :)

pywhispercpp/model.py

UsernamesLame · 2024-08-30T01:01:15Z

@UsernamesLame, Approved 👍

Looking forward your helper script :)

https://github.com/UsernamesLame/WhisperWav :)

It's not remotely done, but it works. I'm thinking of making it spit out pickled numpy arrays of the audio for faster transcription. As of now, it spits out wav files that Whisper.cpp can handle

abdeladim-s · 2024-08-30T01:50:19Z

The script looks nice, but if I understand it correctly, it's similar to what I am already doing!
There is no need to convert the files to wav as we are calling the cpp functions directly, not calling whisper.cpp using the CLI.
Numpy array of the audio file is what we need.

UsernamesLame · 2024-08-30T02:38:02Z

The script looks nice, but if I understand it correctly, it's similar to what I am already doing!

There is no need to convert the files to wav as we are calling the cpp functions directly, not calling whisper.cpp using the CLI.

Numpy array of the audio file is what we need.

No I know. But my understanding is you're doing some pre processing pre transcription.

I want to do all the pre processing on large batches of data while transcription is actively happening to remove one more task from the processing pipeline.

Also, if I'm understanding correctly, you're not actually converting it to 16bit files and that may harm transcription.

I'll look over it again, but I also want to remove the need to convert the wav files to numpy arrays before transcribing.

That would help with performance. At least according to my CPython internals book that recommends a lot of things to avoid.

I'm aiming to make the slowest part loading the model into memory. And even then, I want to explore making it '''deepcopy''' able so we can load the model, clone it in memory, and change its settings as needed from the defaults by using copy.deepcopy.

This way multiple instances can be loaded 100% independent of each other vs the current situation with static methods.

I explored making everything not static and deep copy friendly but ran into some issues with PyBind11.

It's late here, so I might have to advise on that front tomorrow.

abdeladim-s · 2024-08-30T03:06:16Z

Yes, pre processing is necessary to make every type of media compatible with whisper.cpp
The files need to be converted to 16Khz mono, similar to what you did in the script, it's actually here
If you removed the conversion to numpy array, how you are going to pass the data to c++ ?

Other than that, good luck with your exploration, the code is yours .. looking forward it 😄

I can merge this for now if you want ?

UsernamesLame · 2024-08-30T03:10:01Z

Yes, pre processing is necessary to make every type of media compatible with whisper.cpp

The files need to be converted to 16Khz mono, similar to what you did in the script, it's actually here

If you removed the conversion to numpy array, how you are going to pass the data to c++ ?

Other than that, good luck with your exploration, the code is yours .. looking forward it 😄

I can merge this for now if you want ?

Yea please do!

As for how we would load them, we could re-initialize the numpy.ndarrays I guess and pass them along to the C++ interface? I'm struggling to explain it. I need to write some code to show it. Also I was confused why the code wasn't updated, just saw you merge it.

Edit:

My PR isn't merged(?) Says 2 workflows awaiting approval.

Also yea I noticed the channel crushing, sorry. I'm super tired right now so I shouldn't comment too much. I am going to see what else I can work on tomorrow.

abdeladim-s · 2024-08-30T03:50:28Z

Go get some rest .. Take your time!
Looking forward your futur PRs.
I'll merge this one for now!

Thanks for the contribution 😄

Removed uncessary channel crushing

b8d3f54

Calling ```set_channels(1) already converts the audio to one channel, so there was no need to call ```split_to_mono```, instead we can just call ```get_array_of_samples``` This should have a negligible positive impact on performance

abdeladim-s approved these changes Aug 30, 2024

View reviewed changes

pywhispercpp/model.py Show resolved Hide resolved

abdeladim-s merged commit 4df7cfc into abdeladim-s:main Aug 30, 2024
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removed uncessary channel crushing #48

Removed uncessary channel crushing #48

UsernamesLame commented Aug 29, 2024

UsernamesLame commented Aug 29, 2024

abdeladim-s left a comment

UsernamesLame commented Aug 30, 2024 •

edited

Loading

abdeladim-s commented Aug 30, 2024

UsernamesLame commented Aug 30, 2024

abdeladim-s commented Aug 30, 2024

UsernamesLame commented Aug 30, 2024 •

edited

Loading

abdeladim-s commented Aug 30, 2024

Removed uncessary channel crushing #48

Removed uncessary channel crushing #48

Conversation

UsernamesLame commented Aug 29, 2024

UsernamesLame commented Aug 29, 2024

abdeladim-s left a comment

Choose a reason for hiding this comment

UsernamesLame commented Aug 30, 2024 • edited Loading

abdeladim-s commented Aug 30, 2024

UsernamesLame commented Aug 30, 2024

abdeladim-s commented Aug 30, 2024

UsernamesLame commented Aug 30, 2024 • edited Loading

abdeladim-s commented Aug 30, 2024

UsernamesLame commented Aug 30, 2024 •

edited

Loading

UsernamesLame commented Aug 30, 2024 •

edited

Loading