Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps

This paper proposes invertible Consistency Distillation (iCD), enabling:

- highly efficient and accurate text-guided image editing
- diverse and high-quality image generation

Installation

# Clone the repo and enter it
git clone https://github.com/yandex-research/invertible-cd
cd invertible-cd

# Create an environment and install packages
conda create -n icd python=3.10 -y
conda activate icd

pip3 install -r requirements/req.txt

We provide the following checkpoints:

Guidance distilled diffusion models
- Stable Diffusion 1.5, 3GB
- SDXL, 8.9GB

These models are saved as .pt files.

Invertible Consistency Distillation (forward and reverse CD) on top of the guidance distilled models

Model, size	Steps	Time steps
iCD-SD1.5, 0.5GB	4	Reverse: [259, 519, 779, 999]; Forward: [19, 259, 519, 779]
iCD-SD1.5, 0.5GB	4	Reverse: [249, 499, 699, 999]; Forward: [19, 249, 499, 699]
iCD-SD1.5, 0.5GB	3	Reverse: [339, 699, 999]; Forward: [19, 339, 699]
iCD-SDXL, 1.4GB	4	Reverse: [259, 519, 779, 999]; Forward: [19, 259, 519, 779]
iCD-SDXL, 1.4GB	4	Reverse: [249, 499, 699, 999]; Forward: [19, 249, 499, 699]
iCD-SDXL, 1.4GB	3	Reverse: [339, 699, 999]; Forward: [19, 339, 699]

These models are saved as .safetensors files.
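
The time-step lists in the table follow a visible pattern: in every row, the forward list is the reverse list shifted by one position, with 19 prepended and the final 999 dropped. The sketch below encodes the three configurations from the table and checks that pattern. The names `ICD_TIMESTEPS` and `forward_from_reverse` are illustrative and not part of the repository, and the shift rule is an observation about the table rather than a documented guarantee.

```python
# Time steps copied from the table, keyed by a hypothetical configuration
# name. The table lists the same time steps for iCD-SD1.5 and iCD-SDXL.
ICD_TIMESTEPS = {
    "4_steps_a": {"reverse": [259, 519, 779, 999], "forward": [19, 259, 519, 779]},
    "4_steps_b": {"reverse": [249, 499, 699, 999], "forward": [19, 249, 499, 699]},
    "3_steps": {"reverse": [339, 699, 999], "forward": [19, 339, 699]},
}


def forward_from_reverse(reverse, start_timestep=19):
    """Shift the reverse list by one: prepend the start step, drop the last."""
    return [start_timestep] + list(reverse[:-1])


for name, cfg in ICD_TIMESTEPS.items():
    derived = forward_from_reverse(cfg["reverse"])
    print(name, derived, "matches table:", derived == cfg["forward"])
```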

Easy-to-run examples

Step 0. Download the models and put them in the checkpoints folder

For this example, we use iCD-SD1.5 with the reverse time steps [259, 519, 779, 999] and the forward time steps [19, 259, 519, 779].
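
Before loading, it can help to confirm that the files are where Step 1 expects them. The following is a minimal sketch using only the standard library. The helper names `checkpoint_name` and `missing_checkpoints` are hypothetical, and the file-name pattern is inferred from the paths used in Step 1, so other checkpoints may be named differently.

```python
from pathlib import Path


def checkpoint_name(direction, timesteps, prefix="iCD-SD15"):
    """Build a name such as 'iCD-SD15-forward_19_259_519_779.safetensors'."""
    steps = "_".join(str(t) for t in timesteps)
    return f"{prefix}-{direction}_{steps}.safetensors"


def missing_checkpoints(root, names):
    """Return the names that are not present as files under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).is_file()]


expected = [
    checkpoint_name("forward", [19, 259, 519, 779]),
    checkpoint_name("reverse", [259, 519, 779, 999]),
    "sd15_cfg_distill.pt",  # guidance-distilled teacher passed in Step 1
]
missing = missing_checkpoints("checkpoints", expected)
print("Missing files:", missing if missing else "none")
```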

Step 1. Load the models

from utils.loading import load_models
from diffusers import DDPMScheduler

root = 'checkpoints'
ldm_stable, reverse_cons_model, forward_cons_model = load_models(
    model_id="runwayml/stable-diffusion-v1-5",
    device='cuda',
    forward_checkpoint=f'{root}/iCD-SD15-forward_19_259_519_779.safetensors',
    reverse_checkpoint=f'{root}/iCD-SD15-reverse_259_519_779_999.safetensors',
    r=64,
    w_embed_dim=512,
    teacher_checkpoint=f'{root}/sd15_cfg_distill.pt',
)

tokenizer = ldm_stable.tokenizer
noise_scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler", )

Step 2. Specify the configuration according to the downloaded model

from utils import p2p, generation

NUM_REVERSE_CONS_STEPS = 4
REVERSE_TIMESTEPS = [259, 519, 779, 999]
NUM_FORWARD_CONS_STEPS = 4
FORWARD_TIMESTEPS = [19, 259, 519, 779]
NUM_DDIM_STEPS = 50

solver = generation.Generator(
    model=ldm_stable,
    noise_scheduler=noise_scheduler,
    n_steps=NUM_DDIM_STEPS,
    forward_cons_model=forward_cons_model,
    forward_timesteps=FORWARD_TIMESTEPS,
    reverse_cons_model=reverse_cons_model,
    reverse_timesteps=REVERSE_TIMESTEPS,
    num_endpoints=NUM_REVERSE_CONS_STEPS,
    num_forward_endpoints=NUM_FORWARD_CONS_STEPS,
    max_forward_timestep_index=49,
    start_timestep=19)

p2p.NUM_DDIM_STEPS = NUM_DDIM_STEPS
p2p.tokenizer = tokenizer
p2p.device = 'cuda'
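
Since the constants in Step 2 must agree with the downloaded checkpoints, a small consistency check can catch mismatches before sampling. This is a sketch under assumptions: `check_timesteps` is a hypothetical helper, and the rules it enforces (matching lengths, strictly increasing values, a forward list that starts at `start_timestep`, values within a 1000-step training schedule) are inferred from the example values rather than taken from the repository.

```python
def check_timesteps(reverse, forward, num_reverse, num_forward,
                    start_timestep=19, num_train_timesteps=1000):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(reverse) != num_reverse:
        problems.append(f"expected {num_reverse} reverse steps, got {len(reverse)}")
    if len(forward) != num_forward:
        problems.append(f"expected {num_forward} forward steps, got {len(forward)}")
    for name, steps in (("reverse", reverse), ("forward", forward)):
        # Each list should move monotonically through the noise levels.
        if any(a >= b for a, b in zip(steps, steps[1:])):
            problems.append(f"{name} time steps are not strictly increasing")
        if any(not 0 <= t < num_train_timesteps for t in steps):
            problems.append(f"{name} time steps fall outside [0, {num_train_timesteps})")
    if forward and forward[0] != start_timestep:
        problems.append(f"forward list should start at {start_timestep}")
    return problems


issues = check_timesteps(
    reverse=[259, 519, 779, 999],
    forward=[19, 259, 519, 779],
    num_reverse=4,
    num_forward=4,
)
print("Configuration problems:", issues if issues else "none")
```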

Generation with iCD-SD1.5

Step 3. Generate

import torch

prompt = ['a cute owl with a graduation cap']
controller = p2p.AttentionStore()

generator = torch.Generator().manual_seed(150)
tau = 1.0
image, _ = generation.runner(
    # Playing params
    guidance_scale=19.0,
    tau1=tau,  # Dynamic guidance if tau < 1.0
    tau2=tau,

    # Fixed params
    is_cons_forward=True,
    model=reverse_cons_model,
    w_embed_dim=512,
    solver=solver,
    prompt=prompt,
    controller=controller,
    generator=generator,
    latent=None,
    return_type='image')

# Save and display the generated image.
generation.to_pil_images(image).save('test_generation_iCD-SD1.5.jpg')
generation.view_images(image)
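
The `tau1` and `tau2` arguments switch on dynamic guidance when they are below 1.0, but the snippet does not show how `generation.runner` applies them. As an illustration only, the sketch below assumes one common form of dynamic guidance: no guidance at high noise levels (normalized time at or above `tau1`), full guidance at or below `tau2`, and a linear ramp in between. The function `guidance_at` and this schedule are assumptions, not the repository's implementation.

```python
def guidance_at(timestep, guidance_scale, tau1, tau2, num_train_timesteps=1000):
    """Assumed dynamic-guidance schedule; requires tau2 <= tau1."""
    if tau2 > tau1:
        raise ValueError("tau2 must not exceed tau1")
    t = timestep / num_train_timesteps  # normalized time in [0, 1)
    if t <= tau2:
        return guidance_scale  # low noise: full guidance
    if t >= tau1:
        return 0.0  # high noise: guidance switched off
    # Linear ramp between the two thresholds (only reached if tau2 < tau1).
    return guidance_scale * (tau1 - t) / (tau1 - tau2)


# Compare constant guidance (tau = 1.0) with dynamic guidance (tau = 0.8)
# on the reverse time steps used in this example.
for tau in (1.0, 0.8):
    schedule = [guidance_at(t, 19.0, tau, tau) for t in (999, 779, 519, 259)]
    print(f"tau={tau}: {schedule}")
```

Under this assumed schedule, `tau = 1.0` keeps the guidance scale constant across all four steps, while `tau = 0.8` disables guidance only at the noisiest step.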

Editing with iCD-SD1.5

Step 3. Load and invert a real image

from utils import inversion

image_path = "assets/bird.jpg"
prompt = ["a photo of a bird standing on a branch"]

(image_gt, image_rec), ddim_latent, uncond_embeddings = inversion.invert(
    # Playing params
    image_path=image_path,
    prompt=prompt,

    # Fixed params
    is_cons_inversion=True,
    w_embed_dim=512,
    inv_guidance_scale=0.0,
    stop_step=50,
    solver=solver,
    seed=10500)
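
The inversion call returns the original image together with its reconstruction, so a simple way to judge inversion quality is to compare the two. The sketch below computes PSNR with the standard library only. `psnr` is a hypothetical helper, and it assumes both images have been flattened to equal-length sequences of 8-bit pixel values (for example via `image_gt.ravel().tolist()` if they are NumPy arrays, which is an assumption about the return type).

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 grayscale example; real inputs would be the flattened images.
original = [0, 64, 128, 255]
reconstructed = [2, 62, 130, 253]
print(f"PSNR: {psnr(original, reconstructed):.2f} dB")
```

Higher values indicate a closer reconstruction; identical inputs give infinity.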

Step 4. Edit the image

p2p.NUM_DDIM_STEPS = 4
p2p.tokenizer = tokenizer
p2p.device = 'cuda'

prompts = ["a photo of a bird standing on a branch",
           "a photo of a lego bird standing on a branch"
           ]

# Playing params
cross_replace_steps = {'default_': 0.2, }
self_replace_steps = 0.2
blend_word = (('bird',), ('lego',))
eq_params = {"words": ("lego",), "values": (3.,)}

controller = p2p.make_controller(prompts,
                                 False, # (is_replacement) True if only one word is changed
                                 cross_replace_steps,
                                 self_replace_steps,
                                 blend_word,
                                 eq_params)

tau = 0.8
image, _ = generation.runner(
    # Playing params
    guidance_scale=19.0,
    tau1=tau,  # Dynamic guidance if tau < 1.0
    tau2=tau,

    # Fixed params
    model=reverse_cons_model,
    is_cons_forward=True,
    w_embed_dim=512,
    solver=solver,
    prompt=prompts,
    controller=controller,
    num_inference_steps=50,
    generator=None,
    latent=ddim_latent,
    uncond_embeddings=uncond_embeddings,
    return_type='image')

# The left image is the inversion, the right one is the edited result.
generation.to_pil_images(image).save('test_editing_iCD-SD1.5.jpg')
generation.view_images(image)

Note: zero-shot editing is highly sensitive to hyperparameters. We therefore recommend tuning cross_replace_steps (from 0.0 to 1.0), self_replace_steps (from 0.0 to 1.0), tau (0.7 or 0.8 seems to work best), the guidance scale (up to 19), and the amplification factor (eq_params).
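
Because good settings vary from image to image, one practical approach is a small grid search over the parameters listed in the note. The sketch below only enumerates candidate settings. The specific values are illustrative picks within the stated ranges, and `editing_grid` is a hypothetical helper. Each resulting dictionary would then be unpacked into the `p2p.make_controller` and `generation.runner` calls from Step 4.

```python
from itertools import product


def editing_grid(cross_values, self_values, taus, guidance_scales, amplify_values):
    """Yield one settings dict per combination of the given values."""
    combos = product(cross_values, self_values, taus, guidance_scales, amplify_values)
    for cross, self_steps, tau, scale, amplify in combos:
        yield {
            "cross_replace_steps": {"default_": cross},
            "self_replace_steps": self_steps,
            "tau": tau,
            "guidance_scale": scale,
            "amplify": amplify,  # value to place in eq_params["values"]
        }


grid = list(editing_grid(
    cross_values=(0.2, 0.4, 0.6),
    self_values=(0.2, 0.4),
    taus=(0.7, 0.8),
    guidance_scales=(11.0, 19.0),
    amplify_values=(3.0,),
))
print(len(grid), "candidate settings; first:", grid[0])
```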

You can also try the similar easy-to-run examples for the SDXL model or move on to the in-depth examples.

Citation

@article{starodubcev2024invertible,
  title={Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps},
  author={Starodubcev, Nikita and Khoroshikh, Mikhail and Babenko, Artem and Baranchuk, Dmitry},
  journal={arXiv preprint arXiv:2406.14539},
  year={2024}
}
