Distributed inference for re-captioning large datasets with vllm and other things
#8363
sayakpaul started this conversation in Show and tell
Replies: 1 comment
-
I ultimately buttoned things up and created this small repo: https://github.com/sayakpaul/simple-image-recaptioning
-
Hello.
`vllm` is just awesome! I went through a lot of difficulty trying to write a script that re-captions a large image dataset (a common task in the image-generation space) with distributed inference. But when I started poking around in `vllm`, it felt so good to see all the optimizations right off the bat. I am looking for feedback on the script I finally came up with.
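One common way to spread re-captioning over several GPUs is plain data parallelism: launch one process per GPU, each with its own `vllm` engine, and give each process a disjoint subset of the dataset shards. The sketch below shows only the shard assignment, assuming the shard list is known up front and each process knows its `rank` and `world_size` (hypothetical names, e.g. read from a launcher's environment variables). It is an illustration, not necessarily how the script in this thread splits the work.

```python
from typing import List, Sequence


def shards_for_rank(shards: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the shards this worker should process (round-robin split).

    Every shard goes to exactly one rank, so no image is captioned twice
    and the workers need no coordination at runtime.
    """
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}")
    # Strided slice: rank 0 gets shards 0, W, 2W, ...; rank 1 gets 1, W+1, ...
    return list(shards[rank::world_size])


if __name__ == "__main__":
    # Hypothetical shard names; real webdataset shards are .tar archives.
    all_shards = [f"shard-{i:05d}.tar" for i in range(10)]
    for r in range(4):
        print(r, shards_for_rank(all_shards, r, world_size=4))
```

Round-robin striding keeps the per-rank shard counts within one of each other. A contiguous split would work just as well when shards are similar in size.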
My basic pipeline is:
`webdataset` shards through a dedicated dataloader.

(The `break` statement in the script is intentional, so I can quickly see the outputs.) Happy to get any feedback :)
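The loop below is a rough sketch of the per-worker side: samples from a dataloader are grouped into batches, each batch is passed to a captioning function, and an optional `max_batches` limit plays the role of the intentional `break` for a quick look at the outputs. `caption_fn`, `recaption`, and `max_batches` are hypothetical names. In a real script `caption_fn` would wrap the `vllm` call (for example `LLM.generate`), but the exact prompt and image-input format depends on the model and the `vllm` version, so it is left abstract here. The `"__key__"` field assumes webdataset's usual per-sample key.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def batched(samples: Iterable, batch_size: int) -> Iterator[List]:
    """Group an iterable of samples into lists of at most `batch_size` items."""
    it = iter(samples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def recaption(
    samples: Iterable[Dict],
    caption_fn: Callable[[List[Dict]], List[str]],
    batch_size: int = 8,
    max_batches: Optional[int] = None,
) -> List[Dict]:
    """Caption samples batch by batch; stop after `max_batches` if it is set."""
    results = []
    for i, batch in enumerate(batched(samples, batch_size)):
        # Hypothetical hook: in practice this would call the vllm engine.
        captions = caption_fn(batch)
        for sample, caption in zip(batch, captions):
            results.append({"key": sample["__key__"], "caption": caption})
        # Same effect as the intentional `break`: inspect outputs early.
        if max_batches is not None and i + 1 >= max_batches:
            break
    return results


if __name__ == "__main__":
    fake_samples = ({"__key__": f"{i:06d}"} for i in range(20))

    def fake_caption_fn(batch):  # hypothetical stand-in for the vllm call
        return [f"caption for {s['__key__']}" for s in batch]

    out = recaption(fake_samples, fake_caption_fn, batch_size=8, max_batches=1)
    print(len(out), out[0])
```

Batching before calling the engine matters because `vllm` schedules many prompts together. Feeding it one image at a time would leave most of its throughput unused.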