The current example documentation for Fine-tuning Stable Diffusion only demonstrates how to fine-tune on a single GPU. At the end of the tutorial, Sayak concludes that improving the quality of the Stable Diffusion model's generations is the natural next step: "To enable that, having support for gradient accumulation and distributed training is crucial. This can be thought of as the next step in this tutorial."
It is not trivial to work out from the current TensorFlow docs how to update a custom Trainer class for distributed training, as the documentation mostly covers compiled models. It would be nice to have a section that goes into greater detail on integrating a Trainer class with distributed training. This example could then also serve as an additional reference for performing distributed training in Keras with custom Trainer classes.
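For contrast, the compiled-model case that the existing TensorFlow docs do cover is short: create the model and optimizer inside a `tf.distribute.MirroredStrategy` scope and let `model.fit()` handle the rest. A minimal sketch (the tiny dense model here is a stand-in, not the diffusion model):

```python
import tensorflow as tf

# The documented compiled-model recipe: build and compile the model
# under a strategy scope, then call fit() as usual.
strategy = tf.distribute.MirroredStrategy()
print(f"Number of replicas in sync: {strategy.num_replicas_in_sync}")

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit() then shards the input data across replicas and
# aggregates gradients automatically.
```

With a custom Trainer class and a hand-written `train_step`, none of this happens automatically, which is the gap the proposed section would fill.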
I would like to update the documentation and associated files for this example with a new section that demonstrates how to fine-tune a Stable Diffusion model in Keras/TensorFlow through distributed training on multiple GPUs with a custom Trainer class. This would involve:
- an introduction to how distributed training works in Keras/TensorFlow
- modifying the Trainer class and loss function to handle multiple GPUs
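The second point is the non-obvious one. A hedged sketch of what those changes look like (this is not the actual Trainer from the tutorial; a toy model stands in for the diffusion UNet): create variables under `strategy.scope()`, disable the built-in loss reduction and average the per-example loss over the *global* batch size, and wrap the step in `strategy.run`.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
GLOBAL_BATCH_SIZE = 8

# Variables (model weights, optimizer slots) must be created
# inside the strategy scope so they are mirrored across replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    optimizer = tf.keras.optimizers.Adam(1e-3)
    # reduction="none": we reduce manually with the global batch size.
    loss_fn = tf.keras.losses.MeanSquaredError(reduction="none")

def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        per_example_loss = loss_fn(y, pred)
        # Average over the GLOBAL batch, not the per-replica batch,
        # so that summing gradients across replicas gives the
        # correct mean gradient.
        loss = tf.nn.compute_average_loss(
            per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE
        )
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function
def distributed_train_step(x, y):
    # Run the step on every replica and sum the per-replica losses
    # (each is already scaled by the global batch size).
    per_replica_losses = strategy.run(train_step, args=(x, y))
    return strategy.reduce(
        tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None
    )

# Distribute the dataset so each replica receives its shard of
# every global batch.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((32, 4)), tf.random.normal((32, 1)))
).batch(GLOBAL_BATCH_SIZE)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

for x, y in dist_dataset:
    loss = distributed_train_step(x, y)
```

The same pattern applies to the tutorial's Trainer: its `train_step` would scale the diffusion loss by the global batch size and be invoked through `strategy.run`.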
Standalone code to reproduce the issue or tutorial link
I have already expanded the code in the Trainer class for multi-GPU training, as well as the text introducing the reader to distributed training, so it shouldn't take much time as it just needs review. @sayakpaul
Issue Type
Documentation Feature Request
Source
source
Keras Version
Keras 2.13.1
Custom Code
Yes
OS Platform and Distribution
Linux Ubuntu 22.04
Python version
3.9
GPU model and memory
No response
Relevant log output
No response