[SDXL] Add SDXL image to image support #239

Merged Oct 6, 2023 · 55 commits
eabcc2f
draft img2img sd pipe
JingyaHuang Aug 7, 2023
ce6b506
Merge branch 'main' into add-sd-img2img
JingyaHuang Aug 7, 2023
0de6a76
refactor inheritance
JingyaHuang Aug 16, 2023
ae8ae70
Merge branch 'main' into add-sd-img2img
JingyaHuang Aug 16, 2023
848888c
refactoring
JingyaHuang Aug 16, 2023
b816f8e
refactoring
JingyaHuang Aug 16, 2023
9ce8dde
Merge branch 'main' into add-sd-img2img
JingyaHuang Aug 24, 2023
41ee9d7
fix sdxl unet inf
JingyaHuang Sep 4, 2023
010304f
inference done
JingyaHuang Sep 5, 2023
ffe164d
add post processing and doc
JingyaHuang Sep 5, 2023
050b50d
fix style
JingyaHuang Sep 5, 2023
ff2b904
Update docs/source/guides/models.mdx
JingyaHuang Sep 6, 2023
712d410
add test
JingyaHuang Sep 6, 2023
5021f90
update doc prompt
JingyaHuang Sep 6, 2023
aca666f
fix num images per prompt issue
JingyaHuang Sep 6, 2023
4390ceb
fix test
JingyaHuang Sep 6, 2023
d8e14c9
Merge branch 'add-sdxl-inf' into add-sd-img2img
JingyaHuang Sep 6, 2023
708387a
add img2img pipe
JingyaHuang Sep 7, 2023
6b7ebe4
better scale
JingyaHuang Sep 7, 2023
6e6f2d5
test no resize
JingyaHuang Sep 10, 2023
5f345db
Merge branch 'main' into add-sd-img2img
JingyaHuang Sep 11, 2023
cabe5ce
remove image
JingyaHuang Sep 11, 2023
c6232a6
remove hack
JingyaHuang Sep 11, 2023
823ec98
hack for debug vae encoder
JingyaHuang Sep 14, 2023
045b688
Merge branch 'main' into add-sd-img2img
JingyaHuang Sep 19, 2023
8a4bd86
img2img done
JingyaHuang Sep 19, 2023
cd2e09c
add inpaint pipe
JingyaHuang Sep 20, 2023
0aefd5b
add tests
JingyaHuang Sep 20, 2023
fc28225
update doc
JingyaHuang Sep 20, 2023
743196c
title upper class
JingyaHuang Sep 20, 2023
f60e666
add results img
JingyaHuang Sep 20, 2023
0d5c979
fix shape
JingyaHuang Sep 20, 2023
6530550
Merge branch 'main' into add-sd-img2img
JingyaHuang Sep 20, 2023
5da8f4b
address comments & api doc
JingyaHuang Sep 21, 2023
5852370
fix doc
JingyaHuang Sep 21, 2023
581b344
due with name
JingyaHuang Sep 21, 2023
2fb532d
improve api doc
JingyaHuang Sep 21, 2023
0660230
update doc
JingyaHuang Sep 21, 2023
f45dc46
Update docs/source/guides/models.mdx
JingyaHuang Sep 21, 2023
a97552e
Update docs/source/guides/models.mdx
JingyaHuang Sep 21, 2023
f6c6e2f
Update docs/source/guides/models.mdx
JingyaHuang Sep 21, 2023
b786cbb
apply suggestion
JingyaHuang Sep 21, 2023
c17ba14
refactoring other pipes
JingyaHuang Sep 21, 2023
1ff6e9b
Merge branch 'main' into add-sdxl-refiner
JingyaHuang Sep 21, 2023
a04efaf
update sdxl base with neg
JingyaHuang Sep 22, 2023
56c88f6
add refiner export support
JingyaHuang Oct 2, 2023
b7d04d9
finish img2img pipe
JingyaHuang Oct 3, 2023
837c480
add inpaint pipe
JingyaHuang Oct 3, 2023
5c4a0e8
update img2img doc
JingyaHuang Oct 4, 2023
a9348b4
update doc refiner
JingyaHuang Oct 5, 2023
d1c3544
add tests
JingyaHuang Oct 5, 2023
c87b05a
add title for doc
JingyaHuang Oct 5, 2023
9de1e8f
complete docstring
JingyaHuang Oct 6, 2023
873a229
apply comments
JingyaHuang Oct 6, 2023
23869a3
add changes indicators
JingyaHuang Oct 6, 2023
118 changes: 117 additions & 1 deletion docs/source/guides/models.mdx
@@ -67,6 +67,12 @@ And the next time when you want to run inference, just load your compiled model
As you can see, there is no need to pass the neuron arguments used during export, as they are
saved in a `config.json` file and will be restored automatically by the `NeuronModelForXXX` class.

<Tip>

When running inference for the first time, there is a warmup phase that can take 3x-4x the latency of a regular run.

</Tip>
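A framework-agnostic sketch for measuring the warmup effect — the `stable_diffusion` pipeline name in the trailing comments is an assumption for illustration, not part of the API:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage with a compiled pipeline:
# _, first_latency = timed_call(stable_diffusion, prompt)   # warmup run, ~3x-4x slower
# _, steady_latency = timed_call(stable_diffusion, prompt)  # regular run
```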

## Discriminative NLP models

As explained in the previous section, you only need a few modifications to your Transformers code to export and run NLP models:
@@ -282,6 +288,7 @@ prompt = "ghibli style, a fantasy landscape with snowcapped mountains, trees, la
image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")
```

`image` | `prompt` | output |
:-------------------------:|:-------------------------:|:-------------------------:
<img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/03-sd-img2img-init.png" alt="landscape photo" width="256" height="256"/> | ***ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection. warm colors, 8K*** | <img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/04-sd-img2img.png" alt="drawing" width="250"/> |
@@ -322,6 +329,8 @@ image.save("cat_on_bench.png")

## Stable Diffusion XL

### Text-to-Image

Similar to Stable Diffusion, you can use the `NeuronStableDiffusionXLPipeline` API to export and run inference on Neuron devices with SDXL models.

```python
... )
```

Now generate an image with a prompt on neuron:
Now generate an image with a text prompt on neuron:

```python
>>> prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
@@ -355,5 +364,112 @@ Now generate an image with a prompt on neuron:
alt="sdxl generated image"
/>

### Image-to-Image

With `NeuronStableDiffusionXLImg2ImgPipeline`, you can pass an initial image and a text prompt to condition the generated images:

```python
from optimum.neuron import NeuronStableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

prompt = "a dog running, lake, moat"
url = "https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png"
init_image = load_image(url).convert("RGB")

pipe = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl/", device_ids=[0, 1])
image = pipe(prompt=prompt, image=init_image).images[0]
```

`image` | `prompt` | output |
:-------------------------:|:-------------------------:|:-------------------------:
<img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png" alt="castle photo" width="256" height="256"/> | ***a dog running, lake, moat*** | <img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/06-sdxl-img2img.png" alt="castle with dog" width="250"/> |

### Inpaint

With `NeuronStableDiffusionXLInpaintPipeline`, pass the original image and a mask marking the area you want to replace; the masked area is then filled with the content described in the prompt.

```python
from optimum.neuron import NeuronStableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"
mask_url = (
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-inpaint-mask.png"
)

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")
prompt = "A deep sea diver floating"

pipe = NeuronStableDiffusionXLInpaintPipeline.from_pretrained("sd_neuron_xl/", device_ids=[0, 1])
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.85, guidance_scale=12.5).images[0]
```

`image` | `mask_image` | `prompt` | output |
:-------------------------:|:-------------------------:|:-------------------------:|-------------------------:|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png" alt="drawing" width="250"/> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-inpaint-mask.png" alt="drawing" width="250"/> | ***A deep sea diver floating*** | <img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/07-sdxl-inpaint.png" alt="drawing" width="250"/> |
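If you don't have a ready-made mask, one can be drawn with Pillow — white pixels mark the region the pipeline repaints (the size and rectangle coordinates below are arbitrary, for illustration only):

```python
from PIL import Image, ImageDraw

def make_rect_mask(size, box):
    """Return an RGB mask: white rectangle (area to replace) on a black background."""
    mask = Image.new("L", size, 0)                 # start fully black (keep everything)
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white = area to repaint
    return mask.convert("RGB")

mask_image = make_rect_mask((1024, 1024), (300, 200, 724, 824))
```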

### Refine Image Quality

SDXL includes a [refiner model](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) specialized in denoising the low-noise-stage images generated by the base model. There are two ways to use the refiner:

1. Use the base and refiner models together to produce a refined image.
2. Use the base model to produce an image, then use the refiner model to add more detail to it.

#### Base + refiner model

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline, NeuronStableDiffusionXLImg2ImgPipeline

prompt = "A majestic lion jumping from a big stone at night"
num_images_per_prompt = 1
base = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/", device_ids=[0, 1])
image = base(
prompt=prompt,
num_images_per_prompt=num_images_per_prompt,
num_inference_steps=40,
denoising_end=0.8,
output_type="latent",
).images[0]
del base # To avoid neuron device OOM

refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl_refiner/", device_ids=[0, 1])
image = refiner(
prompt=prompt,
num_inference_steps=40,
denoising_start=0.8,
image=image,
).images[0]
```
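`denoising_end`/`denoising_start` split the noise schedule between the two models; roughly (actual schedulers may round the boundary differently):

```python
num_inference_steps = 40
denoising_end = 0.8  # the base model handles the first 80% of the schedule

base_steps = int(num_inference_steps * denoising_end)  # steps run by the base model
refiner_steps = num_inference_steps - base_steps       # steps left for the refiner
print(base_steps, refiner_steps)                       # prints: 32 8
```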

<img
src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/08-sdxl-base-refine.png"
width="256"
height="256"
alt="sdxl base + refiner"
/>

#### Base to refiner model

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline, NeuronStableDiffusionXLImg2ImgPipeline

prompt = "A majestic lion jumping from a big stone at night"
base = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/", device_ids=[0, 1])
image = base(prompt=prompt, output_type="latent").images[0]
del base # To avoid neuron device OOM

refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl_refiner/", device_ids=[0, 1])
image = refiner(prompt=prompt, image=image[None, :]).images[0]
```
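`image[None, :]` adds the leading batch dimension the refiner expects; the same indexing is illustrated here with NumPy as a stand-in for the latent tensor (the latent shape is assumed):

```python
import numpy as np

latent = np.zeros((4, 128, 128))  # stand-in for a single latent image
batched = latent[None, :]         # prepend a batch axis, as in image[None, :]
print(batched.shape)              # prints: (1, 4, 128, 128)
```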

`Base Image` | Refined Image |
:-------------------------:|-------------------------:|
<img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/09-sdxl-base-full.png" alt="drawing" width="250"/> | <img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/010-sdxl-refiner-detailed.png" alt="drawing" width="250"/> |

<Tip>

To avoid running out of memory on the Neuron device, it is suggested to finish all base-model inference and release the device memory before running the refiner.

</Tip>

Happy inference with Neuron! 🚀
8 changes: 8 additions & 0 deletions docs/source/package_reference/modeling.mdx
@@ -87,4 +87,12 @@ The following Neuron model classes are available for natural language processing
### NeuronStableDiffusionXLPipeline

[[autodoc]] modeling_diffusion.NeuronStableDiffusionXLPipeline
- __call__

### NeuronStableDiffusionXLImg2ImgPipeline
[[autodoc]] modeling_diffusion.NeuronStableDiffusionXLImg2ImgPipeline
- __call__

### NeuronStableDiffusionXLInpaintPipeline
[[autodoc]] modeling_diffusion.NeuronStableDiffusionXLInpaintPipeline
- __call__
19 changes: 14 additions & 5 deletions optimum/exporters/neuron/__main__.py
@@ -138,7 +138,12 @@ def infer_stable_diffusion_shapes_from_diffusers(
input_shapes: Dict[str, Dict[str, int]],
model: Union["StableDiffusionPipeline", "StableDiffusionXLPipeline"],
):
sequence_length = model.tokenizer.model_max_length
if model.tokenizer is not None:
sequence_length = model.tokenizer.model_max_length
elif hasattr(model, "tokenizer_2") and model.tokenizer_2 is not None:
sequence_length = model.tokenizer_2.model_max_length
else:
raise AttributeError(f"Cannot infer sequence_length from {type(model)} as it has no tokenizer attribute.")
unet_num_channels = model.unet.config.in_channels
vae_encoder_num_channels = model.vae.config.in_channels
vae_decoder_num_channels = model.vae.config.latent_channels
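The tokenizer fallback introduced above (the SDXL refiner checkpoint has no first tokenizer) can be sketched as a standalone helper — a simplified illustration, not the exporter's actual code:

```python
def infer_sequence_length(model):
    """Prefer model.tokenizer, fall back to model.tokenizer_2 (e.g. the SDXL refiner)."""
    if getattr(model, "tokenizer", None) is not None:
        return model.tokenizer.model_max_length
    if getattr(model, "tokenizer_2", None) is not None:
        return model.tokenizer_2.model_max_length
    raise AttributeError(f"Cannot infer sequence_length from {type(model)}: no tokenizer attribute.")
```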
@@ -227,8 +232,9 @@ def main_export(

# Saving the model config and preprocessor as this is needed sometimes.
model.scheduler.save_pretrained(output.joinpath("scheduler"))
model.tokenizer.save_pretrained(output.joinpath("tokenizer"))
if hasattr(model, "tokenizer_2"):
if hasattr(model, "tokenizer") and model.tokenizer is not None:
model.tokenizer.save_pretrained(output.joinpath("tokenizer"))
if hasattr(model, "tokenizer_2") and model.tokenizer_2 is not None:
model.tokenizer_2.save_pretrained(output.joinpath("tokenizer_2"))
if hasattr(model, "feature_extractor"):
model.feature_extractor.save_pretrained(output.joinpath("feature_extractor"))
**input_shapes,
)
output_model_names = {
DIFFUSION_MODEL_TEXT_ENCODER_NAME: os.path.join(DIFFUSION_MODEL_TEXT_ENCODER_NAME, NEURON_FILE_NAME),
DIFFUSION_MODEL_UNET_NAME: os.path.join(DIFFUSION_MODEL_UNET_NAME, NEURON_FILE_NAME),
DIFFUSION_MODEL_VAE_ENCODER_NAME: os.path.join(DIFFUSION_MODEL_VAE_ENCODER_NAME, NEURON_FILE_NAME),
DIFFUSION_MODEL_VAE_DECODER_NAME: os.path.join(DIFFUSION_MODEL_VAE_DECODER_NAME, NEURON_FILE_NAME),
}
if hasattr(model, "text_encoder_2"):
if hasattr(model, "text_encoder") and model.text_encoder is not None:
output_model_names[DIFFUSION_MODEL_TEXT_ENCODER_NAME] = os.path.join(
DIFFUSION_MODEL_TEXT_ENCODER_NAME, NEURON_FILE_NAME
)
if hasattr(model, "text_encoder_2") and model.text_encoder_2 is not None:
output_model_names[DIFFUSION_MODEL_TEXT_ENCODER_2_NAME] = os.path.join(
DIFFUSION_MODEL_TEXT_ENCODER_2_NAME, NEURON_FILE_NAME
)
4 changes: 4 additions & 0 deletions optimum/exporters/neuron/utils.py
@@ -19,6 +19,7 @@
from collections import OrderedDict
from typing import TYPE_CHECKING, Dict, Optional, Tuple, Union

import torch
from transformers import PretrainedConfig

from ...neuron.utils import (
@@ -282,6 +283,9 @@ def _get_submodels_for_export_stable_diffusion(
)
models_for_export.append((DIFFUSION_MODEL_UNET_NAME, copy.deepcopy(pipeline.unet)))

if pipeline.vae.config.get("force_upcast", None) is True:
pipeline.vae.to(dtype=torch.float32)

# VAE Encoder
vae_encoder = copy.deepcopy(pipeline.vae)
vae_encoder.forward = lambda sample: {"latent_sample": vae_encoder.encode(x=sample)["latent_dist"].sample()}
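The export trick above — deep-copying the VAE and overriding `forward` so that tracing captures only the encode path — can be mimicked with a plain dummy object (a sketch; the real pipeline uses a diffusers `AutoencoderKL`):

```python
import copy

class DummyVAE:
    """Stand-in for an autoencoder whose encode() returns a dict-like result."""
    def encode(self, x):
        return {"latent_dist": [v * 0.5 for v in x]}  # placeholder transform

vae_encoder = copy.deepcopy(DummyVAE())
# Route forward through encode, mirroring the lambda used for the Neuron export
vae_encoder.forward = lambda sample: {"latent_sample": vae_encoder.encode(sample)["latent_dist"]}

out = vae_encoder.forward([2.0, 4.0])  # → {"latent_sample": [1.0, 2.0]}
```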
4 changes: 4 additions & 0 deletions optimum/neuron/__init__.py
@@ -37,6 +37,8 @@
"NeuronStableDiffusionImg2ImgPipeline",
"NeuronStableDiffusionInpaintPipeline",
"NeuronStableDiffusionXLPipeline",
"NeuronStableDiffusionXLImg2ImgPipeline",
"NeuronStableDiffusionXLInpaintPipeline",
],
"modeling_decoder": ["NeuronDecoderModel"],
"accelerate": [
@@ -65,6 +67,8 @@
NeuronStableDiffusionImg2ImgPipeline,
NeuronStableDiffusionInpaintPipeline,
NeuronStableDiffusionPipeline,
NeuronStableDiffusionXLImg2ImgPipeline,
NeuronStableDiffusionXLInpaintPipeline,
NeuronStableDiffusionXLPipeline,
)
from .pipelines import pipeline