Chang Liu, Rui Li, Kaidong Zhang, Xin Luo, Dong Liu
[Paper] / [Project] / [Huggingface] / [ModelScope] / [Demo]
- 1. News
- 2. To-Do Lists
- 3. Overview of LaCon
- 4. Code Structure
- 5. Prerequisites
- 6. Training of Condition Aligner
- 7. Sampling with Condition Aligner
- 8. Evaluation
- 9. Results
- 10. Citation
- 11. Stars, Forked, and Star History
If you have any questions about this work, please feel free to start a new issue or propose a PR.
- [Jun. 12th] We have updated the training and sampling code of LaCon. Pre-trained model weights are currently available at our Huggingface repo and ModelScope repo.
- Upload a newer version of paper to arXiv
- Update the codebase
- Update the repo document
- Upload the pre-trained model weights of LaCon based on Celeb and Stable Diffusion v1.4
- Update the pre-trained model weights of LaCon based on Stable Diffusion v2.1
- Update implementation for local Gradio demo
- Update online HuggingFace demo
Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability for the generation process, existing studies, termed early-constraint methods in this paper, leverage extra conditions and incorporate them into pre-trained diffusion models. In particular, some of them adopt condition-specific modules to handle conditions separately and therefore struggle to generalize to other conditions. Although follow-up studies present unified solutions to the generalization problem, they also require extra resources to implement, e.g., additional inputs or parameter optimization, so more flexible and efficient solutions for steerable guided image synthesis are still needed. In this paper, we present an alternative paradigm, namely Late-Constraint Diffusion (LaCon), to simultaneously integrate various conditions into pre-trained diffusion models. Specifically, LaCon establishes an alignment between the external condition and the internal features of diffusion models, and utilizes the alignment to incorporate the target condition, guiding the sampling process to produce tailored results. Experimental results on the COCO dataset illustrate the effectiveness and superior generalization capability of LaCon under various conditions and settings. Ablation studies investigate the functionalities of different components in LaCon, and illustrate its great potential to serve as an efficient solution to offer flexible controllability for diffusion models.
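The late-constraint idea can be illustrated with a deliberately tiny toy, shown below. It is not the LaCon implementation: scalars stand in for latents, features, and conditions, and the names `aligner` and `guidance_step` are hypothetical. The sketch only shows the general pattern of predicting a condition from an internal quantity and using the gradient of the mismatch to steer that quantity toward a target condition.

```python
# Toy illustration only: scalars stand in for latents, features, and conditions.
# In a real sampler, each guidance step would be interleaved with a denoising step.

def aligner(feature, weight=2.0):
    """Hypothetical condition aligner: predicts a condition from a feature."""
    return weight * feature


def guidance_step(latent, target_cond, cond_scale=0.1, weight=2.0):
    """Move the latent so the predicted condition gets closer to the target.

    The loss is (aligner(latent) - target_cond) ** 2, whose gradient with
    respect to the latent is 2 * weight * (aligner(latent) - target_cond).
    """
    residual = aligner(latent, weight) - target_cond
    grad = 2.0 * weight * residual
    return latent - cond_scale * grad


latent = 0.0
target_cond = 3.0
for _ in range(50):
    latent = guidance_step(latent, target_cond)

# Mismatch between the predicted and the target condition after guidance.
final_error = abs(aligner(latent) - target_cond)
print(f"latent={latent:.4f}, condition error={final_error:.2e}")
```

The `cond_scale` argument plays a role loosely analogous to the controlling scale exposed by the sampling scripts, in that a larger value pulls harder toward the target condition.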
This GitHub repo is constructed following the code structure below:
LaCon/
├── condition_aligner_src              <----- Source code of LaCon
│   ├── __init__.py
│   ├── condition_aligner_dataset.py   <----- Dataset
│   ├── condition_aligner_model.py     <----- Model
│   └── condition_aligner_runner.py    <----- Runner (train and inference)
├── configs                            <----- Configuration files
├── data-preprocessing                 <----- Code of data pre-processing
├── evaluation-metrics                 <----- Code of evaluation metrics
├── github-materials
├── ldm                                <----- Source code of LDM (Stable Diffusion)
├── taming                             <----- Source code of `taming` package
├── tools                              <----- Code of toolkits to assist data pre-processing
├── README.md
├── condition-aligner-inference.py     <----- Script to reconstruct conditions with the condition aligner
├── condition-aligner-train.py         <----- Script to train condition aligner
├── generate-batch-image.py            <----- Script to generate results in batch
├── generate-single-image.py           <----- Script to generate a single result
└── install.sh                         <----- Bash script to install the virtual environment
- To set up the virtual environment for LaCon, run the following commands:
conda create -n lacon
conda activate lacon
pip install torch==2.0.0 torchvision==0.15.1
bash install.sh
- To prepare the pre-trained model weights of the different components of Stable Diffusion, as well as our condition aligner, download the model weights from our Huggingface repo and put them in `./checkpoints`. Once the weights are downloaded, modify the configuration files in `./configs`. Check this document for more details on modifying the configuration files. We strongly recommend downloading the whole Huggingface repo of CLIP locally to avoid network issues with Huggingface.
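After downloading, a quick check that the condition aligner weights are in place can save a failed run later. The snippet below is a minimal sketch: the file names are the ones listed in the sampling settings table of this README, and it assumes they sit directly inside `./checkpoints` (as in the example commands). The helper name `missing_checkpoints` is hypothetical and not part of the repo.

```python
from pathlib import Path

# File names taken from the sampling settings table in this README.
EXPECTED_WEIGHTS = [
    "sd_celeb_edge.pth",
    "sd_celeb_color.pth",
    "sdv14_edge.pth",
    "sdv14_color.pth",
    "sdv14_mask.pth",
]


def missing_checkpoints(checkpoint_dir="./checkpoints", names=EXPECTED_WEIGHTS):
    """Return the expected weight files that are not present in checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing condition aligner weights:", ", ".join(missing))
    else:
        print("All expected condition aligner weights were found.")
```

If you only plan to use one condition type, pass a shorter `names` list so that unrelated weights are not reported as missing.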
- We use a subset of the COCO training set with approximately 10,000 data samples. To train the condition aligner, follow the instructions in this document and organize the data in the following structure:
data/
├── bdcn-edges
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── saliency-masks
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── color-strokes
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── coco-captions
│   ├── 1.txt
│   ├── 2.txt
│   └── ...
└── images
- Once the training data is ready, you need to modify the configuration files following this document.
- You can then start training with the following command:
python condition-aligner-train.py -b CONFIG_PATH -l OUTPUT_PATH
You can refer to this example command line:
python condition-aligner-train.py -b configs/sd-edge.yaml -l outputs/training/sd-edge
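Before launching training, it can help to confirm that the training data layout described above is complete. The following is a rough sketch rather than part of the repo: it assumes that files belonging to the same sample share a file stem (e.g. `1.png` and `1.txt`), and the function name `find_incomplete_samples` is hypothetical. The `images` folder is left out because its file naming is not shown here; add it to `subdirs` if it follows the same convention.

```python
from pathlib import Path

# Folder names follow the training data layout shown in this README.
CONDITION_DIRS = ["bdcn-edges", "saliency-masks", "color-strokes", "coco-captions"]


def find_incomplete_samples(data_root, subdirs=CONDITION_DIRS):
    """Map each sample id to the sub-folders in which it is missing.

    Assumption: files of the same sample share a file stem across folders.
    """
    root = Path(data_root)
    stems = {}
    for name in subdirs:
        folder = root / name
        if folder.is_dir():
            stems[name] = {p.stem for p in folder.iterdir() if p.is_file()}
        else:
            stems[name] = set()  # a missing folder counts as empty

    all_ids = set().union(*stems.values())
    report = {}
    for sample_id in sorted(all_ids):
        missing = [name for name in subdirs if sample_id not in stems[name]]
        if missing:
            report[sample_id] = missing
    return report


if __name__ == "__main__":
    for sample_id, missing in find_incomplete_samples("data").items():
        print(f"sample {sample_id} is missing from: {', '.join(missing)}")
```

An empty report means every sample id found in any of the folders is present in all of them.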
Execute the following command line to generate an image with the trained condition aligner:
python generate-single-image.py --cond_type COND_TYPE --indir CONDITION_PATH --resume CONDITION_ALIGNER_PATH --caption TEXT_PROMPT --cond_scale CONTROLLING_SCALE --unconditional_guidance_scale CLASSIFIER_FREE_GUIDANCE_SCALE --outdir OUTPUT_PATH -b CONFIG_PATH --seed SEED --truncation_steps TRUNCATION_STEPS --use_neg_prompt
You can refer to this example command line:
python generate-single-image.py --cond_type mask --indir examples/horse.png --resume checkpoints/sdv14_mask.pth --caption "a horse standing in the moon surface" --cond_scale 2.0 --unconditional_guidance_scale 6.0 --outdir outputs/ -b configs/sd-mask.yaml --seed 23 --truncation_steps 600 --use_neg_prompt
We suggest the following settings to achieve the optimal performance for various conditions:
Condition | Setting | Model Weight | Controlling Scale | Truncation Steps |
---|---|---|---|---|
Canny Edge | Unconditional Generation | sd_celeb_edge.pth | 2.0 | 500 |
HED Edge | Unconditional Generation | sd_celeb_edge.pth | 2.0 | 500 |
User Sketch | Unconditional Generation | sd_celeb_edge.pth | 2.0 | 600 |
Color Stroke | Unconditional Generation | sd_celeb_color.pth | 2.0 | 600 |
Image Palette | Unconditional Generation | sd_celeb_color.pth | 2.0 | 800 |
Canny Edge | T2I Generation | sdv14_edge.pth | 2.0 | 500 |
HED Edge | T2I Generation | sdv14_edge.pth | 2.5 | 500 |
User Sketch | T2I Generation | sdv14_edge.pth | 2.0 | 600 |
Color Stroke | T2I Generation | sdv14_color.pth | 2.0 | 600 |
Image Palette | T2I Generation | sdv14_color.pth | 2.0 | 800 |
Saliency Mask | T2I Generation | sdv14_mask.pth | 2.0 | 600 |
User Scribble | T2I Generation | sdv14_mask.pth | 2.0 | 700 |
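To avoid copying values from the table by hand, the suggested settings can be kept in a small lookup and turned into a command line. This is only a convenience sketch: the helper `build_command` is hypothetical, the flags are the ones from the example command above, and the defaults for seed and classifier-free guidance scale are taken from that example. The values for `cond_type` and the configuration file are passed in by the caller, since this README only shows them for the mask case, and the sketch does not cover whether the unconditional models need `--caption`.

```python
# (condition, setting) -> (model weight, controlling scale, truncation steps),
# copied from the suggested-settings table in this README.
SUGGESTED = {
    ("Canny Edge", "Unconditional Generation"): ("sd_celeb_edge.pth", 2.0, 500),
    ("HED Edge", "Unconditional Generation"): ("sd_celeb_edge.pth", 2.0, 500),
    ("User Sketch", "Unconditional Generation"): ("sd_celeb_edge.pth", 2.0, 600),
    ("Color Stroke", "Unconditional Generation"): ("sd_celeb_color.pth", 2.0, 600),
    ("Image Palette", "Unconditional Generation"): ("sd_celeb_color.pth", 2.0, 800),
    ("Canny Edge", "T2I Generation"): ("sdv14_edge.pth", 2.0, 500),
    ("HED Edge", "T2I Generation"): ("sdv14_edge.pth", 2.5, 500),
    ("User Sketch", "T2I Generation"): ("sdv14_edge.pth", 2.0, 600),
    ("Color Stroke", "T2I Generation"): ("sdv14_color.pth", 2.0, 600),
    ("Image Palette", "T2I Generation"): ("sdv14_color.pth", 2.0, 800),
    ("Saliency Mask", "T2I Generation"): ("sdv14_mask.pth", 2.0, 600),
    ("User Scribble", "T2I Generation"): ("sdv14_mask.pth", 2.0, 700),
}


def build_command(condition, setting, cond_type, cond_path, caption, config_path,
                  outdir="outputs/", seed=23, cfg_scale=6.0,
                  checkpoint_dir="checkpoints"):
    """Assemble the generate-single-image.py call with the suggested settings."""
    weight, cond_scale, steps = SUGGESTED[(condition, setting)]
    return [
        "python", "generate-single-image.py",
        "--cond_type", cond_type,
        "--indir", cond_path,
        "--resume", f"{checkpoint_dir}/{weight}",
        "--caption", caption,
        "--cond_scale", str(cond_scale),
        "--unconditional_guidance_scale", str(cfg_scale),
        "--outdir", outdir,
        "-b", config_path,
        "--seed", str(seed),
        "--truncation_steps", str(steps),
        "--use_neg_prompt",
    ]


if __name__ == "__main__":
    cmd = build_command(
        "Saliency Mask", "T2I Generation", "mask", "examples/horse.png",
        "a horse standing in the moon surface", "configs/sd-mask.yaml",
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.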
Prepare the test set following the data structure below:
data/
├── bdcn-edges
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── saliency-masks
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── color-strokes
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── image-palette
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── coco-captions
│   ├── 1.txt
│   ├── 2.txt
│   └── ...
└── images
Execute the following command line to test all data samples in the test set:
python generate-batch-image.py -b CONFIG_PATH --indir DATA_FILELIST_PATH --text CAPTION_PATH --target_cond CONDITION_PATH --resume CONDITION_ALIGNER_PATH --cond_scale CONTROLLING_SCALE --truncation_steps TRUNCATION_STEPS
You can refer to this example command line:
python generate-batch-image.py -b configs/sd-mask.yaml --indir data/coco2017val/data_flist.txt --text data/coco2017val/coco-captions --target_cond data/coco2017val/saliency-masks --resume checkpoints/sdv14_mask.pth --cond_scale 2.0 --truncation_steps 600
To compute evaluation metrics (e.g., FID and CLIP scores), please refer to this document. We report the performance of LaCon on the COCO 2017 validation set in the following table:
Condition | Model Weight | FID | CLIP Score |
---|---|---|---|
HED Edge | sdv14_edge.pth | 21.02 | 0.2590 |
Color Stroke | sdv14_color.pth | 20.27 | 0.2589 |
Image Palette | sdv14_color.pth | 20.61 | 0.2580 |
Saliency Mask | sdv14_mask.pth | 20.94 | 0.2617 |
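For readers unfamiliar with the CLIP score column: it is commonly computed as the cosine similarity between the CLIP embedding of a generated image and the CLIP embedding of its caption, averaged over the test set. The sketch below illustrates only that arithmetic, with toy vectors standing in for real embeddings. It is an assumption about the general metric, not a description of the evaluation code in this repo, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_clip_style_score(image_embeddings, text_embeddings):
    """Average cosine similarity over paired image and caption embeddings."""
    pairs = list(zip(image_embeddings, text_embeddings))
    if not pairs:
        raise ValueError("at least one image/caption pair is required")
    return sum(cosine_similarity(i, t) for i, t in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 2-D vectors; real CLIP embeddings have hundreds of dimensions.
    images = [[1.0, 0.0], [1.0, 0.0]]
    captions = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_clip_style_score(images, captions))
```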
We demonstrate results generated by LaCon under various conditions in the following figures.
If you find our paper helpful to your work, please cite our paper with the following BibTeX reference:
@misc{liu-etal-2024-lacon,
  title={{LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis}},
  author={Chang Liu and Rui Li and Kaidong Zhang and Xin Luo and Dong Liu},
  year={2024},
  eprint={2305.11520},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}