Switch from output seeking to combined input seeking and output seeking to accelerate extracting small segments from huge files #729

inatuwe · 2023-05-14T22:51:46Z

Without this change, seeking inside a huge file gets slower the further you seek. For example, extracting a 100-second AudioSegment starting at 7000 seconds may take up to 5 seconds. With this change, it takes less than a second!

I had the idea of trimming start and end positions with:

AudioSegment.from_file(
    file="video.mp4",
    start_second=7000,
    duration=100,
)

But this took surprisingly long!

From analysing the command line, I realized that AudioSegment.from_file() specified the input file first and the seek parameters after it:

'ffmpeg', '-y', '-i', 'video.mp4', '-ss', '7000', '-t', '100', ...

But when reading about the seek parameter in the FFmpeg wiki (https://trac.ffmpeg.org/wiki/Seeking), I learned that:
As of FFmpeg 2.1, when transcoding with ffmpeg (i.e. not stream copying), -ss is now also "frame-accurate" even when used as an input option...

So I tried the following command instead:

'ffmpeg', '-y', '-ss', '7000', '-t', '100', '-i', 'video.mp4', ...

... and the video file was processed much faster!
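The two invocations differ only in where `-ss` and `-t` sit relative to `-i`. As a minimal sketch, the two argument lists could be built as follows. The helper names `output_seek_args` and `input_seek_args` are hypothetical and not part of pydub, and the trailing arguments elided above are left out.

```python
def output_seek_args(path, start, duration):
    """Seek options AFTER -i: ffmpeg decodes from the beginning of the
    input and discards everything until `start` is reached."""
    return ["ffmpeg", "-y", "-i", path, "-ss", str(start), "-t", str(duration)]


def input_seek_args(path, start, duration):
    """Seek options BEFORE -i: ffmpeg seeks within the input file
    before it starts decoding."""
    return ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration), "-i", path]


if __name__ == "__main__":
    print(output_seek_args("video.mp4", 7000, 100))
    print(input_seek_args("video.mp4", 7000, 100))
```

Both lists contain the same options; only their order changes, which is what decides whether ffmpeg treats them as input or output options.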

When the pull request first ran through the tests, one test with an mp3 file failed:
test_partial_load_start_second_and_duration_equals_cropped_mp3_audio_segment

The reason was that input seeking is not accurate for encoded streams, so I had to add some margin to the input seek. I tried to calculate the maximum required margin based on the properties of the stream. Let me know if this makes sense to you as well.
The approximate margin needed is ~144 ms.
Because the input seek now starts slightly early, we still need to seek the output stream, so the ffmpeg command would be:

'ffmpeg', '-y', '-ss', '6999.856', '-t', '100.288', '-i', 'video.mp4', '-ss', '0.144', ...
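The arithmetic behind these numbers can be sketched as below. This is an illustration, not the code from this pull request: `combined_seek_args` is a hypothetical helper, the 0.144 s default is simply the margin quoted above (its derivation from the stream properties is not reproduced), and the formula for the input duration is an assumption chosen so that it matches the example command. The output arguments elided with `...` above are not filled in.

```python
def combined_seek_args(path, start, duration, margin=0.144):
    """Build the ffmpeg arguments for combined input and output seeking.

    Hypothetical helper for illustration; `margin` is the assumed
    worst-case inaccuracy of input seeking, in seconds.
    """
    # Seek the input a little early, but never before the start of the file.
    input_start = max(0.0, start - margin)
    # The part read in front of `start` must be cut off again on the output side.
    output_start = start - input_start
    # Assumption: read the leading part, the requested window, and one
    # extra margin at the end.
    input_duration = output_start + duration + margin

    def fmt(seconds):
        # Millisecond precision keeps float noise out of the command line.
        return f"{seconds:.3f}"

    return [
        "ffmpeg", "-y",
        "-ss", fmt(input_start), "-t", fmt(input_duration),
        "-i", path,
        "-ss", fmt(output_start),
    ]


if __name__ == "__main__":
    print(combined_seek_args("video.mp4", 7000, 100))
    # A start smaller than the margin clamps the input seek to zero.
    print(combined_seek_args("video.mp4", 0.05, 1))
```

When the start is smaller than the margin, the input seek is clamped to zero and the whole offset is handled by the output seek.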

Without this change, seeking inside a huge file gets slower the further you seek. E.g., if you want a small AudioSegment of 10 seconds starting from 6900 seconds, it may take up to 5 seconds to extract the segment. With this change, it takes only a few milliseconds!

Make input sampling accurate by respecting the worst case expected inaccuracy due to the frame sizes

Add test case for very small start_second, since the new logic must cover the case where the additional part before the start_second would be truncated to zero.

inatuwe mentioned this pull request May 14, 2023

Add Support for large audio files ( > 2GB) #135

Open

inatuwe marked this pull request as draft May 15, 2023 19:13

inatuwe added 4 commits May 15, 2023 23:54

Make input sampling accurate

734445a

Make input sampling accurate by respecting the worst case expected inaccuracy due to the frame sizes

Add test case for very small start_second

88f780a

Add test case for very small start_second, since the new logic must cover the case where the additional part before the start_second would be truncated to zero.

Revert seek strategy for untested function

ce2b6ca

Move output seeking (still required!) back to the previous place

92d410a

inatuwe marked this pull request as ready for review May 15, 2023 22:07

inatuwe changed the title ~~Switch from output seeking to input seeking~~ Switch from output seeking to combined input seeking and output seeking to accelerate small segments from huge files May 15, 2023

inatuwe changed the title ~~Switch from output seeking to combined input seeking and output seeking to accelerate small segments from huge files~~ Switch from output seeking to combined input seeking and output seeking to accelerate extracting small segments from huge files May 15, 2023
