Dynamic slicing and batch size #5671

rems75 · 2024-10-09T10:18:06Z

Describe the question.

Hello everyone,

I'm trying to optimise a torch data loading pipeline that involves video decoding and thought I'd give DALI a try (already tried things like pynvvideocodec but that ended up quite slow). I have something more or less working but at the cost of some suboptimal decisions so I'm wondering whether I missed relevant options or whether DALI is not perfectly suited for my use case.

I have a set of N 1s videos, where N changes from batch to batch, and I want to extract a certain number of frames from those videos, where the indices of the frames differ from video to video. From reading other posts, it does seem at the frontier of what DALI was designed for.

I have set up an ExternalInputCallable class with batch=False (in order to leverage parallelism) where __call__ returns a video and list of indices, and a pipeline based on fn.experimental.decoders.video.

The questions I have are the following:

how can I handle the dynamic number of videos per batch? I tried setting up a pipeline batch size larger than the max number of videos and a StopIteration in the external source but it doesn't seem to work. I could reinitialise the pipeline with the appropriate batch size every time, but it seems wasteful.
is there a way to compose fn.element_extract and decoding at the sample level in the pipeline? Right now I'm doing the slicing per sample on a torch tensor built from each tensorGPU returned by pipeline.run(), which feels very inefficient.
or maybe a different setup is more appropriate?

Check for duplicates

I have searched the open bugs/issues and have found no duplicates for this bug report

The text was updated successfully, but these errors were encountered:

mzient · 2024-10-09T11:14:59Z

Hello @rems75
I don't think you can use a parallel external source for your use case - please try ordinary external_source in batch mode. This way the batch size for the iteration is determined by whatever these external_sources produce.

rems75 · 2024-10-10T19:22:08Z

Thanks for the answer @mzient . I'll be training things on H100s, which have 7 nvdecs, will the ordinary external source in batch mode be able to leverage all of them? (In my case the batch will contain 20-30 1s videos)

Regarding the second question, any thoughts on extracting frames in the pipeline itself? Maybe a custom operator?

JanuszL · 2024-10-11T07:08:45Z

Hi @rems75,

Thank you for reaching out. Now, only experimental.decoders.video attempts to decode multiple files in parallel, while other video readers/decoders do it sequentially, using only one nvdec at a time.

rems75 · 2024-10-11T09:39:12Z

Hi @JanuszL
Good to know, that's the one I've been using. Any recommendation on profiling tools to check how things are going?

JanuszL · 2024-10-11T09:48:38Z

Hi @rems75,

You can try using NSight System and explore its video profiling capabilities.

rems75 · 2024-10-18T17:42:17Z

Thanks for the pointer @JanuszL, still looking into it.
In the mean time, any thoughts on doing indexing directly in the decoder to save memory space (and presumably speed things up)? Did I miss an option?

JanuszL · 2024-10-18T17:45:57Z

In the mean time, any thoughts on doing indexing directly in the decoder to save memory space (and presumably speed things up)? Did I miss an option?

Let me add this to our ToDo list.

rems75 added the question Further information is requested label Oct 9, 2024

dali-automaton assigned klecki Oct 9, 2024

JanuszL added the Video Video related feature/question label Oct 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dynamic slicing and batch size #5671

Dynamic slicing and batch size #5671

rems75 commented Oct 9, 2024

mzient commented Oct 9, 2024

rems75 commented Oct 10, 2024 •

edited

Loading

JanuszL commented Oct 11, 2024

rems75 commented Oct 11, 2024

JanuszL commented Oct 11, 2024

rems75 commented Oct 18, 2024

JanuszL commented Oct 18, 2024

Dynamic slicing and batch size #5671

Dynamic slicing and batch size #5671

Comments

rems75 commented Oct 9, 2024

Describe the question.

Check for duplicates

mzient commented Oct 9, 2024

rems75 commented Oct 10, 2024 • edited Loading

JanuszL commented Oct 11, 2024

rems75 commented Oct 11, 2024

JanuszL commented Oct 11, 2024

rems75 commented Oct 18, 2024

JanuszL commented Oct 18, 2024

rems75 commented Oct 10, 2024 •

edited

Loading