Distributed inference for re-captioning large datasets with `vllm` and other things #8363

sayakpaul · 2024-09-11T09:11:54Z

Hello.

vllm is just awesome!

I ran into a lot of difficulties trying to come up with a script to re-caption a large image dataset (a common task in the image generation space) with distributed inference. But when I started poking around in vllm, it felt so good to see all the optimizations right off the bat.

I am looking for feedback on the script I finally came up with.

My basic pipeline is:

1. Load webdataset shards through a dedicated dataloader.
2. Use the Llava NeXT model for recaptioning. I am distributing the inference across two GPUs.
3. Ask the MLLM to explicitly look for watermarks (it can be done with a standalone model too).
4. Obtain predictions.
5. Serialize the artifacts in a separate thread to not block the GPUs.
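
The last step (serializing in a separate thread so that the GPUs are not blocked) can be sketched with only the standard library. This is a minimal illustration, not the script from this post: `writer_loop`, `run_pipeline`, `fake_caption_fn`, the one-JSON-file-per-sample layout, and the queue size are all assumptions made for the example. In a real run, `caption_fn` would wrap the actual model call and `batches` would come from the webdataset dataloader.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path

_SENTINEL = None  # placed on the queue to tell the writer thread to stop


def writer_loop(q: queue.Queue, out_dir: Path) -> None:
    """Drain the queue, writing one JSON file per sample, until the sentinel arrives."""
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        key, caption = item
        payload = {"key": key, "caption": caption}
        (out_dir / f"{key}.json").write_text(json.dumps(payload))


def run_pipeline(batches, caption_fn, out_dir: Path) -> int:
    """Caption batches on the main thread; hand results to a background writer."""
    # A bounded queue applies back-pressure if disk writes fall far behind.
    q = queue.Queue(maxsize=64)
    writer = threading.Thread(target=writer_loop, args=(q, out_dir), daemon=True)
    writer.start()

    count = 0
    for keys, images in batches:
        captions = caption_fn(images)  # hypothetical stand-in for the model call
        for key, caption in zip(keys, captions):
            q.put((key, caption))
            count += 1

    q.put(_SENTINEL)  # no more work
    writer.join()     # wait until everything queued has been written
    return count


def fake_caption_fn(images):
    """Placeholder captioner so the sketch runs without a GPU or a model."""
    return [f"caption for {img}" for img in images]


demo_dir = Path(tempfile.mkdtemp())
demo_batches = [(["a", "b"], ["img_a", "img_b"]), (["c"], ["img_c"])]
written = run_pipeline(demo_batches, fake_caption_fn, demo_dir)
```

The main loop only pays for a `q.put` per sample, so file I/O overlaps with the next batch's inference. The sentinel plus `join()` ensures that nothing queued is lost when the loop ends.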

(The break statement in the script is intentional to quickly see the outputs)

Happy to get any feedback :)

sayakpaul · 2024-09-11T12:49:03Z

I ultimately buttoned things up and created this small repo: https://github.com/sayakpaul/simple-image-recaptioning
