
LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis

Chang Liu, Rui Li, Kaidong Zhang, Xin Luo, Dong Liu

The official code implementation of "LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis".

[Paper] / [Project] / [Huggingface] / [ModelScope] / [Demo]

Table of Contents

  • News
  • To-Do Lists
  • Overview of LaCon
  • Code Structure
  • Prerequisites
  • Training of Condition Aligner
  • Sampling with Condition Aligner
  • Evaluation
  • Results
  • Citation
  • Stars, Forks, and Star History

If you have any questions about this work, please feel free to open a new issue or propose a PR.

News

  • [Jun. 12th] We have updated the training and sampling code of LaCon. Pre-trained model weights are currently available at our Huggingface repo and ModelScope repo.

To-Do Lists

  • Upload a newer version of the paper to arXiv
  • Update the codebase
  • Update the repo documentation
  • Upload the pre-trained model weights of LaCon based on Celeb and Stable Diffusion v1.4
  • Update the pre-trained model weights of LaCon based on Stable Diffusion v2.1
  • Update the implementation for a local Gradio demo
  • Update the online HuggingFace demo

Overview of LaCon

teaser

Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability over the generation process, existing studies, termed early-constraint methods in this paper, leverage extra conditions and incorporate them into pre-trained diffusion models. Some of them adopt condition-specific modules that handle each condition separately, and therefore struggle to generalize to other conditions. Although follow-up studies present unified solutions to this generalization problem, they require extra resources such as additional inputs or parameter optimization, leaving room for more flexible and efficient approaches to steerable guided image synthesis. In this paper, we present an alternative paradigm, namely Late-Constraint Diffusion (LaCon), to simultaneously integrate various conditions into pre-trained diffusion models. Specifically, LaCon establishes an alignment between the external condition and the internal features of the diffusion model, and uses this alignment to incorporate the target condition, guiding the sampling process toward tailored results. Experimental results on the COCO dataset illustrate the effectiveness and superior generalization capability of LaCon under various conditions and settings. Ablation studies investigate the functionalities of different components in LaCon and illustrate its potential to serve as an efficient solution offering flexible controllability for diffusion models.
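For intuition, here is a toy sketch of the late-constraint idea described above. It is not the repo's actual implementation: `unet`, `aligner`, and the `return_features` flag are hypothetical names. A condition aligner predicts the condition from the denoiser's internal features, and the gradient of its error with respect to the noisy latent steers each sampling step, analogous to classifier guidance.

import torch

def late_constraint_step(x_t, t, unet, aligner, target_cond, cond_scale):
    # Hypothetical sampling step: steer the noise prediction so that the
    # aligner's reconstruction of the condition moves toward the target.
    x_t = x_t.detach().requires_grad_(True)
    eps, feats = unet(x_t, t, return_features=True)  # assumed interface
    pred_cond = aligner(feats, t)                    # condition predicted from features
    loss = torch.nn.functional.mse_loss(pred_cond, target_cond)
    grad = torch.autograd.grad(loss, x_t)[0]
    # Adding the loss gradient to eps performs gradient descent on the loss
    # in the predicted-x0 space, the same mechanism classifier guidance uses.
    return eps + cond_scale * grad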

<🎯Back to Table of Contents>

Code Structure

This GitHub repo is organized according to the code structure below:

LaCon/
├── condition_aligner_src                  <----- Source code of LaCon
│   ├── __init__.py
│   ├── condition_aligner_dataset.py       <----- Dataset
│   ├── condition_aligner_model.py         <----- Model
│   └── condition_aligner_runner.py        <----- Runner (train and inference)
├── configs                                <----- Configuration files
├── data-preprocessing                     <----- Code of data pre-processing
├── evaluation-metrics                     <----- Code of evaluation metrics
├── github-materials
├── ldm                                    <----- Source code of LDM (Stable Diffusion)
├── taming                                 <----- Source code of the `taming` package
├── tools                                  <----- Toolkits to assist data pre-processing
├── README.md
├── condition-aligner-inference.py         <----- Script to reconstruct conditions with the condition aligner
├── condition-aligner-train.py             <----- Script to train the condition aligner
├── generate-batch-image.py                <----- Script to generate results in batch
├── generate-single-image.py               <----- Script to generate a single result
└── install.sh                             <----- Bash script to install the virtual environment

<🎯Back to Table of Contents>

Prerequisites

  1. To install the virtual environment of LaCon, execute the following command lines:
conda create -n lacon
conda activate lacon
pip install torch==2.0.0 torchvision==0.15.1
bash install.sh
  2. To prepare the pre-trained model weights of the different components in Stable Diffusion as well as our condition aligner, download the model weights from our Huggingface repo and put them in ./checkpoints. Once the weights are downloaded, modify the configuration files in ./configs; check this document for more details on modifying configuration files. We strongly recommend downloading the whole Huggingface repo of CLIP locally to avoid Huggingface network issues. A quick way to inspect a configuration file before editing it is shown below.
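The following is a minimal sketch for inspecting a configuration file, assuming the OmegaConf-style YAML configs that this codebase loads via the -b flag; the printed output shows which checkpoint paths need to point at ./checkpoints.

from omegaconf import OmegaConf

# Load a config and dump it back as YAML to see every field it exposes.
config = OmegaConf.load("configs/sd-edge.yaml")
print(OmegaConf.to_yaml(config))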

<🎯Back to Table of Contents>

Training of Condition Aligner

  1. We use a subset of the COCO training set with approximately 10,000 data samples. To train the condition aligner, follow the instructions in this document and construct the data in the following structure (a sanity-check sketch for this layout appears after these steps):
data/
├── bdcn-edges
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── saliency-masks
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── color-strokes
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── coco-captions
│   ├── 1.txt
│   ├── 2.txt
│   └── ...
└── images
  2. Once the training data is ready, modify the configuration files following this document.
  3. Now you are ready to go by executing the following command line:
python condition-aligner-train.py -b CONFIG_PATH -l OUTPUT_PATH

You can refer to this example command line:

python condition-aligner-train.py -b configs/sd-edge.yaml -l outputs/training/sd-edge
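Before launching training, it can help to verify that every image has matching condition maps and a caption. The following is a hypothetical sanity-check sketch (not part of the repo) based on the directory layout above:

from pathlib import Path

root = Path("data")
# Collect the sample IDs from the images folder.
stems = {p.stem for p in (root / "images").iterdir() if p.is_file()}
for sub, ext in [("bdcn-edges", ".png"), ("saliency-masks", ".png"),
                 ("color-strokes", ".png"), ("coco-captions", ".txt")]:
    # Report samples whose paired file is missing in each condition folder.
    missing = [s for s in stems if not (root / sub / f"{s}{ext}").exists()]
    if missing:
        print(f"{sub}: {len(missing)} missing, e.g., {missing[:3]}")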

<🎯Back to Table of Contents>

Sampling with Condition Aligner

Execute the following command line to generate an image with the trained condition aligner:

python generate-single-image.py --cond_type COND_TYPE --indir CONDITION_PATH --resume CONDITION_ALIGNER_PATH --caption TEXT_PROMPT --cond_scale CONTROLLING_SCALE --unconditional_guidance_scale CLASSIFIER_FREE_GUIDANCE_SCALE --outdir OUTPUT_PATH -b CONFIG_PATH --seed SEED --truncation_steps TRUNCATION_STEPS --use_neg_prompt

You can refer to this example command line:

python generate-single-image.py --cond_type mask --indir examples/horse.png --resume checkpoints/sdv14_mask.pth --caption "a horse standing in the moon surface" --cond_scale 2.0 --unconditional_guidance_scale 6.0 --outdir outputs/ -b configs/sd-mask.yaml --seed 23 --truncation_steps 600 --use_neg_prompt

We suggest the following settings to achieve optimal performance under various conditions:

| Condition | Setting | Model Weight | Controlling Scale | Truncation Steps |
|---|---|---|---|---|
| Canny Edge | Unconditional Generation | sd_celeb_edge.pth | 2.0 | 500 |
| HED Edge | Unconditional Generation | sd_celeb_edge.pth | 2.0 | 500 |
| User Sketch | Unconditional Generation | sd_celeb_edge.pth | 2.0 | 600 |
| Color Stroke | Unconditional Generation | sd_celeb_color.pth | 2.0 | 600 |
| Image Palette | Unconditional Generation | sd_celeb_color.pth | 2.0 | 800 |
| Canny Edge | T2I Generation | sdv14_edge.pth | 2.0 | 500 |
| HED Edge | T2I Generation | sdv14_edge.pth | 2.5 | 500 |
| User Sketch | T2I Generation | sdv14_edge.pth | 2.0 | 600 |
| Color Stroke | T2I Generation | sdv14_color.pth | 2.0 | 600 |
| Image Palette | T2I Generation | sdv14_color.pth | 2.0 | 800 |
| Saliency Mask | T2I Generation | sdv14_mask.pth | 2.0 | 600 |
| User Scribble | T2I Generation | sdv14_mask.pth | 2.0 | 700 |
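If you sample with different conditions frequently, a small wrapper can fill in the recommended T2I settings from the table above. This is a hypothetical convenience script, not part of the repo; only the "mask" condition-type string is confirmed by the example above, so the other dictionary keys and config paths are assumptions.

import subprocess

# cond_type -> (condition aligner weight, cond_scale, truncation_steps),
# following the T2I rows of the table above; keys other than "mask" are assumed.
RECOMMENDED = {
    "edge": ("checkpoints/sdv14_edge.pth", 2.0, 500),
    "color": ("checkpoints/sdv14_color.pth", 2.0, 600),
    "mask": ("checkpoints/sdv14_mask.pth", 2.0, 600),
}

def generate(cond_type, cond_path, caption, config, seed=23):
    weight, scale, trunc = RECOMMENDED[cond_type]
    subprocess.run([
        "python", "generate-single-image.py",
        "--cond_type", cond_type, "--indir", cond_path,
        "--resume", weight, "--caption", caption,
        "--cond_scale", str(scale), "--unconditional_guidance_scale", "6.0",
        "--outdir", "outputs/", "-b", config, "--seed", str(seed),
        "--truncation_steps", str(trunc), "--use_neg_prompt",
    ], check=True)

For example, generate("mask", "examples/horse.png", "a horse standing in the moon surface", "configs/sd-mask.yaml") reproduces the single-image example above.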

<🎯Back to Table of Contents>

Evaluation

Prepare the test set following the data structure below:

data/
├── bdcn-edges
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── saliency-masks
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── color-strokes
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── image-palette
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── coco-captions
│   ├── 1.txt
│   ├── 2.txt
│   └── ...
└── images

Execute the following command line to test all data samples in the test set:

python generate-batch-image.py -b CONFIG_PATH --indir DATA_FILELIST_PATH --text CAPTION_PATH --target_cond CONDITION_PATH --resume CONDITION_ALIGNER_PATH --cond_scale CONTROLLING_SCALE --truncation_steps TRUNCATION_STEPS

You can refer to this example command line:

python generate-batch-image.py -b configs/sd-mask.yaml --indir data/coco2017val/data_flist.txt --text data/coco2017val/coco-captions --target_cond data/coco2017val/saliency-masks --resume checkpoints/sdv14_mask.pth --cond_scale 2.0 --truncation_steps 600

To compute evaluation metrics (e.g., FID and CLIP score), please refer to this document for more details. We report the performance of LaCon on the COCO 2017 validation set in the following table:

| Condition | Model Weight | FID | CLIP Score |
|---|---|---|---|
| HED Edge | sdv14_edge.pth | 21.02 | 0.2590 |
| Color Stroke | sdv14_color.pth | 20.27 | 0.2589 |
| Image Palette | sdv14_color.pth | 20.61 | 0.2580 |
| Saliency Mask | sdv14_mask.pth | 20.94 | 0.2617 |
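For a quick independent sanity check of these metrics, here is a minimal sketch using torchmetrics. The repo's evaluation-metrics code is the reference implementation; preprocessing differences (resizing, normalization) can shift the numbers.

import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

fid = FrechetInceptionDistance(feature=2048)
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

def evaluate(real_images, fake_images, captions):
    # Images are uint8 tensors of shape [N, 3, H, W] with values in [0, 255].
    fid.update(real_images, real=True)
    fid.update(fake_images, real=False)
    clip_score.update(fake_images, captions)
    return fid.compute().item(), clip_score.compute().item()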

<🎯Back to Table of Contents>

Results

We demonstrate results generated by LaCon under various conditions in the following figures.
Canny Edge

canny-edge

HED Edge

hed-edge

User Sketch

user-sketch

Color Stroke

Color Stroke

Image Palette

image-palette

Mask

mask

<🎯Back to Table of Contents>

Citation

If you find our paper helpful to your work, please cite it with the following BibTeX reference:

@misc{liu-etal-2024-lacon,
      title={{LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis}},
      author={Chang Liu and Rui Li and Kaidong Zhang and Xin Luo and Dong Liu},
      year={2024},
      eprint={2305.11520},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

<🎯Back to Table of Contents>

Stars, Forks, and Star History

Stargazers repo roster for @AlonzoLeeeooo/LCDG

Forkers repo roster for @AlonzoLeeeooo/LCDG

Star History Chart

<🎯Back to Table of Contents>
