[Stable Diffusion] Image2image and inpaint pipeline support (#161)
* draft img2img sd pipe

* refactor inheritance

* refactoring

* refactoring

* fix sdxl unet inf

* inference done

* add post processing and doc

* fix style

* add test

* update doc prompt

* fix num images per prompt issue

* fix test

* add img2img pipe

* better scale

* test no resize

* remove image

* remove hack

* hack for debug vae encoder

* img2img done

* add inpaint pipe

* add tests

* update doc

* title upper class

* add results img

* fix shape

* address comments & api doc

* fix doc

* due with name

* improve api doc

* update doc

* Update docs/source/guides/models.mdx

Co-authored-by: Michael Benayoun <[email protected]>

* Update docs/source/guides/models.mdx

Co-authored-by: Michael Benayoun <[email protected]>

* Update docs/source/guides/models.mdx

Co-authored-by: Michael Benayoun <[email protected]>

* apply suggestion

---------

Co-authored-by: JingyaHuang <[email protected]>
Co-authored-by: Michael Benayoun <[email protected]>
3 people authored Sep 21, 2023
1 parent d2f5c9d commit 16115ac
Showing 16 changed files with 1,456 additions and 383 deletions.
74 changes: 73 additions & 1 deletion docs/source/guides/models.mdx
@@ -211,7 +211,11 @@ You can also accelerate the inference of stable diffusion on neuronx devices (in
* VAE encoder
* VAE decoder

The export can be done either with the CLI or with `NeuronStableDiffusionPipeline` API. Here is an example of exporting stable diffusion components with `NeuronStableDiffusionPipeline`:
### Text-to-Image

The `NeuronStableDiffusionPipeline` class allows you to generate images from a text prompt on Neuron devices, similar to the experience with `diffusers`.

As with other tasks, you need to compile the models before running inference. The export can be done either via the CLI or via the `NeuronStableDiffusionPipeline` API. Here is an example of exporting stable diffusion components with `NeuronStableDiffusionPipeline`:
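
For orientation, here is a minimal sketch of the flow this paragraph describes (the checkpoint id and shapes are illustrative, not part of this diff; the guide's full example is in the collapsed lines below):

```python
from optimum.neuron import NeuronStableDiffusionPipeline

# Compile (export) once with static input shapes, then reuse the compiled graphs.
model_id = "runwayml/stable-diffusion-v1-5"  # illustrative checkpoint
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipeline = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("sd_neuron/")

# Prompts can change freely; input shapes are fixed at export time.
image = pipeline(prompt="a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```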

<Tip>

@@ -247,9 +251,75 @@ Now generate an image with a prompt on neuron:

<img
src="https://raw.githubusercontent.com/huggingface/optimum-neuron/main/docs/assets/guides/models/01-sd-image.png"
width="256"
height="256"
alt="stable diffusion generated image"
/>

### Image-to-Image

With the `NeuronStableDiffusionImg2ImgPipeline` class, you can generate a new image conditioned on a text prompt and an initial image.

```python
import requests
from PIL import Image
from io import BytesIO
from optimum.neuron import NeuronStableDiffusionImg2ImgPipeline

model_id = "nitrosocke/Ghibli-Diffusion"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipeline = NeuronStableDiffusionImg2ImgPipeline.from_pretrained(model_id, export=True, **input_shapes, device_ids=[0, 1])
pipeline.save_pretrained("sd_img2img/")

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))

prompt = "ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection. sunlight and cloud in the sky, warm colors, 8K"

image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")
```
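
The `strength` argument controls how much the initial image is noised before denoising begins. Assuming the Neuron mixin keeps the upstream `diffusers` img2img scheduling, roughly the first `1 - strength` fraction of the schedule is skipped, as in this back-of-the-envelope sketch:

```python
# diffusers-style img2img step math (assumption: mirrored by the Neuron mixin).
num_inference_steps = 50
strength = 0.75
effective_steps = min(int(num_inference_steps * strength), num_inference_steps)
print(effective_steps)  # 37 denoising steps on a 75%-noised init image
# strength=1.0 fully noises the init image, behaving almost like text-to-image.
```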
`image` | `prompt` | output |
:-------------------------:|:-------------------------:|:-------------------------:|
<img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/03-sd-img2img-init.png" alt="landscape photo" width="256" height="256"/> | ***ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection. warm colors, 8K*** | <img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/04-sd-img2img.png" alt="drawing" width="250"/> |

### Inpaint

With the `NeuronStableDiffusionInpaintPipeline` class, you can edit specific parts of an image by providing a mask and a text prompt.

```python
import requests
from PIL import Image
from io import BytesIO
from optimum.neuron import NeuronStableDiffusionInpaintPipeline

model_id = "runwayml/stable-diffusion-inpainting"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes, device_ids=[0, 1])
pipeline.save_pretrained("sd_inpaint/")

def download_image(url):
response = requests.get(url)
return Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")
```
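
Because the compiled pipeline was saved with `save_pretrained`, a later session can presumably reload it without re-exporting (a sketch under that assumption):

```python
from optimum.neuron import NeuronStableDiffusionInpaintPipeline

# Loads the pre-compiled Neuron graphs from disk; no `export=True` needed.
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained("sd_inpaint/")
```

Keep in mind that the compiled shapes are static: images and masks must be resized to the `height`/`width` used at export time (512x512 here).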

`image` | `mask_image` | `prompt` | output |
:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|
<img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" alt="drawing" width="250"/> | <img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" alt="drawing" width="250"/> | ***Face of a yellow cat, high resolution, sitting on a park bench*** | <img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/05-sd-inpaint.png" alt="drawing" width="250"/> |

## Stable Diffusion XL

Similar to Stable Diffusion, you can use the `NeuronStableDiffusionXLPipeline` API to export and run inference with SDXL models on Neuron devices.
@@ -280,6 +350,8 @@ Now generate an image with a prompt on neuron:

<img
src="https://raw.githubusercontent.com/huggingface/optimum-neuron/main/docs/assets/guides/models/02-sdxl-image.jpeg"
width="256"
height="256"
alt="sdxl generated image"
/>
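
The collapsed example above presumably follows the same pattern as the Stable Diffusion ones; for orientation, a sketch (checkpoint id and shapes are illustrative):

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"  # illustrative checkpoint
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
pipeline = NeuronStableDiffusionXLPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("sd_xl/")

image = pipeline(prompt="a close-up of a fox in an autumn forest, golden hour").images[0]
image.save("fox.png")
```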

3 changes: 2 additions & 1 deletion docs/source/package_reference/export.mdx
@@ -70,7 +70,8 @@ Since many architectures share similar properties for their Neuron configuration
| RoFormer | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| XLM | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| XLM-RoBERTa | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| Stable Diffusion | text-to-image |
| Stable Diffusion | text-to-image, image-to-image, inpaint |
| Stable Diffusion XL | text-to-image |


<Tip>
18 changes: 17 additions & 1 deletion docs/source/package_reference/modeling.mdx
@@ -71,4 +71,20 @@ The following Neuron model classes are available for natural language processing

### NeuronStableDiffusionPipeline

[[autodoc]] modeling_diffusion.NeuronStableDiffusionPipeline
[[autodoc]] modeling_diffusion.NeuronStableDiffusionPipeline
- __call__

### NeuronStableDiffusionImg2ImgPipeline

[[autodoc]] modeling_diffusion.NeuronStableDiffusionImg2ImgPipeline
- __call__

### NeuronStableDiffusionInpaintPipeline

[[autodoc]] modeling_diffusion.NeuronStableDiffusionInpaintPipeline
- __call__

### NeuronStableDiffusionXLPipeline

[[autodoc]] modeling_diffusion.NeuronStableDiffusionXLPipeline
- __call__
15 changes: 11 additions & 4 deletions optimum/exporters/neuron/__main__.py
@@ -143,18 +143,25 @@ def infer_stable_diffusion_shapes_from_diffusers(
vae_encoder_num_channels = model.vae.config.in_channels
vae_decoder_num_channels = model.vae.config.latent_channels
vae_scale_factor = 2 ** (len(model.vae.config.block_out_channels) - 1) or 8
height = input_shapes["unet_input_shapes"]["height"] // vae_scale_factor
width = input_shapes["unet_input_shapes"]["width"] // vae_scale_factor
height = input_shapes["unet_input_shapes"]["height"]
scaled_height = height // vae_scale_factor
width = input_shapes["unet_input_shapes"]["width"]
scaled_width = width // vae_scale_factor

input_shapes["text_encoder_input_shapes"].update({"sequence_length": sequence_length})
input_shapes["unet_input_shapes"].update(
{"sequence_length": sequence_length, "num_channels": unet_num_channels, "height": height, "width": width}
{
"sequence_length": sequence_length,
"num_channels": unet_num_channels,
"height": scaled_height,
"width": scaled_width,
}
)
input_shapes["vae_encoder_input_shapes"].update(
{"num_channels": vae_encoder_num_channels, "height": height, "width": width}
)
input_shapes["vae_decoder_input_shapes"].update(
{"num_channels": vae_decoder_num_channels, "height": height, "width": width}
{"num_channels": vae_decoder_num_channels, "height": scaled_height, "width": scaled_width}
)

return input_shapes
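
The refactor separates pixel-space shapes from latent-space shapes: the UNet and VAE decoder operate in latent space (dimensions divided by `vae_scale_factor`), while the newly exported VAE encoder consumes full-resolution pixels. A quick numeric sketch with typical SD 1.5 values (illustrative, not taken from this diff):

```python
# Typical SD 1.5 VAE config (illustrative values).
block_out_channels = [128, 256, 512, 512]
vae_scale_factor = 2 ** (len(block_out_channels) - 1)  # 8

height, width = 512, 512
scaled_height, scaled_width = height // vae_scale_factor, width // vae_scale_factor

print(scaled_height, scaled_width)  # 64 64   -> UNet and VAE decoder shapes
print(height, width)                # 512 512 -> VAE encoder shapes
```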
19 changes: 18 additions & 1 deletion optimum/exporters/neuron/model_configs.py
@@ -256,6 +256,7 @@ def outputs(self) -> List[str]:

def generate_dummy_inputs(self, return_tuple: bool = False, **kwargs):
# For neuron, we use static shapes for compiling the unet. Unlike `optimum`, we use the given `height` and `width` instead of the `sample_size`.
# TODO: Modify optimum.utils.DummyVisionInputGenerator to support unequal height and width (it currently prioritizes `image_size` over custom h/w)
if self.height == self.width:
self._normalized_config.image_size = self.height
else:
@@ -302,7 +303,7 @@ def check_model_inputs_order(self, model, dummy_inputs):

@register_in_tasks_manager("vae-encoder", *["semantic-segmentation"])
class VaeEncoderNeuronConfig(VisionNeuronConfig):
ATOL_FOR_VALIDATION = 1e-2
ATOL_FOR_VALIDATION = 1e-3
MODEL_TYPE = "vae-encoder"

NORMALIZED_CONFIG_CLASS = NormalizedConfig.with_args(
Expand All @@ -319,6 +320,22 @@ def inputs(self) -> List[str]:
def outputs(self) -> List[str]:
return ["latent_sample"]

def generate_dummy_inputs(self, return_tuple: bool = False, **kwargs):
# For neuron, we use static shapes for compiling the vae encoder. Unlike `optimum`, we use the given `height` and `width` instead of the `sample_size`.
# TODO: Modify optimum.utils.DummyVisionInputGenerator to support unequal height and width (it currently prioritizes `image_size` over custom h/w)
if self.height == self.width:
self._normalized_config.image_size = self.height
else:
raise ValueError(
f"You need to input the same value for `self.height({self.height})` and `self.width({self.width})`."
)
dummy_inputs = super().generate_dummy_inputs(**kwargs)

if return_tuple is True:
return tuple(dummy_inputs.values())
else:
return dummy_inputs


@register_in_tasks_manager("vae-decoder", *["semantic-segmentation"])
class VaeDecoderNeuronConfig(VisionNeuronConfig):
4 changes: 4 additions & 0 deletions optimum/neuron/__init__.py
@@ -34,6 +34,8 @@
],
"modeling_diffusion": [
"NeuronStableDiffusionPipeline",
"NeuronStableDiffusionImg2ImgPipeline",
"NeuronStableDiffusionInpaintPipeline",
"NeuronStableDiffusionXLPipeline",
],
"modeling_decoder": ["NeuronDecoderModel"],
@@ -60,6 +62,8 @@
from .modeling_base import NeuronBaseModel
from .modeling_decoder import NeuronDecoderModel
from .modeling_diffusion import (
NeuronStableDiffusionImg2ImgPipeline,
NeuronStableDiffusionInpaintPipeline,
NeuronStableDiffusionPipeline,
NeuronStableDiffusionXLPipeline,
)
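
With the lazy `_import_structure` entries above, the new pipelines become importable from the package root (a quick check, assuming an environment with `optimum-neuron` installed):

```python
from optimum.neuron import (
    NeuronStableDiffusionImg2ImgPipeline,
    NeuronStableDiffusionInpaintPipeline,
)
```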
49 changes: 37 additions & 12 deletions optimum/neuron/modeling_diffusion.py
@@ -59,8 +59,12 @@
from diffusers.schedulers.scheduling_utils import SCHEDULER_CONFIG_NAME
from diffusers.utils import CONFIG_NAME, is_invisible_watermark_available

from .pipelines.diffusers.pipeline_stable_diffusion import StableDiffusionPipelineMixin
from .pipelines.diffusers.pipeline_stable_diffusion_xl import StableDiffusionXLPipelineMixin
from .pipelines import (
NeuronStableDiffusionImg2ImgPipelineMixin,
NeuronStableDiffusionInpaintPipelineMixin,
NeuronStableDiffusionPipelineMixin,
NeuronStableDiffusionXLPipelineMixin,
)


if TYPE_CHECKING:
@@ -158,16 +162,16 @@ def __init__(
self.unet = NeuronModelUnet(
unet, self, self.configs[DIFFUSION_MODEL_UNET_NAME], self.neuron_configs[DIFFUSION_MODEL_UNET_NAME]
)
self.vae_encoder = (
NeuronModelVaeEncoder(
if vae_encoder is not None:
self.vae_encoder = NeuronModelVaeEncoder(
vae_encoder,
self,
self.configs[DIFFUSION_MODEL_VAE_ENCODER_NAME],
self.neuron_configs[DIFFUSION_MODEL_VAE_ENCODER_NAME],
)
if vae_encoder is not None
else None
)
else:
self.vae_encoder = None

self.vae_decoder = NeuronModelVaeDecoder(
vae_decoder,
self,
@@ -623,15 +627,36 @@ def __init__(
):
super().__init__(model, parent_model, config, neuron_config, DIFFUSION_MODEL_VAE_DECODER_NAME)

def forward(self, latent_sample: torch.Tensor):
def forward(
self,
latent_sample: torch.Tensor,
image: Optional[torch.Tensor] = None,
mask: Optional[torch.Tensor] = None,
):
inputs = (latent_sample,)
if image is not None:
inputs += (image,)
if mask is not None:
inputs += (mask,)
outputs = self.model(*inputs)

return tuple(output for output in outputs.values())


class NeuronStableDiffusionPipeline(NeuronStableDiffusionPipelineBase, StableDiffusionPipelineMixin):
__call__ = StableDiffusionPipelineMixin.__call__
class NeuronStableDiffusionPipeline(NeuronStableDiffusionPipelineBase, NeuronStableDiffusionPipelineMixin):
__call__ = NeuronStableDiffusionPipelineMixin.__call__


class NeuronStableDiffusionImg2ImgPipeline(
NeuronStableDiffusionPipelineBase, NeuronStableDiffusionImg2ImgPipelineMixin
):
__call__ = NeuronStableDiffusionImg2ImgPipelineMixin.__call__


class NeuronStableDiffusionInpaintPipeline(
NeuronStableDiffusionPipelineBase, NeuronStableDiffusionInpaintPipelineMixin
):
__call__ = NeuronStableDiffusionInpaintPipelineMixin.__call__


class NeuronStableDiffusionXLPipelineBase(NeuronStableDiffusionPipelineBase):
@@ -689,5 +714,5 @@ def __init__(
self.watermark = None


class NeuronStableDiffusionXLPipeline(NeuronStableDiffusionXLPipelineBase, StableDiffusionXLPipelineMixin):
__call__ = StableDiffusionXLPipelineMixin.__call__
class NeuronStableDiffusionXLPipeline(NeuronStableDiffusionXLPipelineBase, NeuronStableDiffusionXLPipelineMixin):
__call__ = NeuronStableDiffusionXLPipelineMixin.__call__
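
All of the concrete classes above follow one pattern: the base class owns the compiled components, the mixin owns the generation loop, and binding `__call__` explicitly pins the mixin's entry point regardless of method resolution order. In miniature (a simplified sketch, not the real classes):

```python
class Base:
    def __init__(self, unet):
        self.unet = unet  # stands in for the compiled components

class Img2ImgMixin:
    def __call__(self, prompt, image):
        return f"running {self.unet} on {prompt!r} with init image {image!r}"

class Img2ImgPipeline(Base, Img2ImgMixin):
    __call__ = Img2ImgMixin.__call__  # pin the mixin's entry point

print(Img2ImgPipeline("compiled-unet")("a cat", "init.png"))
```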
12 changes: 12 additions & 0 deletions optimum/neuron/pipelines/__init__.py
@@ -20,9 +20,21 @@

_import_structure = {
"transformers": ["pipeline"],
"diffusers": [
"NeuronStableDiffusionPipelineMixin",
"NeuronStableDiffusionImg2ImgPipelineMixin",
"NeuronStableDiffusionInpaintPipelineMixin",
"NeuronStableDiffusionXLPipelineMixin",
],
}

if TYPE_CHECKING:
from .diffusers import (
NeuronStableDiffusionImg2ImgPipelineMixin,
NeuronStableDiffusionInpaintPipelineMixin,
NeuronStableDiffusionPipelineMixin,
NeuronStableDiffusionXLPipelineMixin,
)
from .transformers import (
pipeline,
)
19 changes: 19 additions & 0 deletions optimum/neuron/pipelines/diffusers/__init__.py
@@ -0,0 +1,19 @@
# coding=utf-8
# Copyright 2023 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .pipeline_stable_diffusion import NeuronStableDiffusionPipelineMixin
from .pipeline_stable_diffusion_img2img import NeuronStableDiffusionImg2ImgPipelineMixin
from .pipeline_stable_diffusion_inpaint import NeuronStableDiffusionInpaintPipelineMixin
from .pipeline_stable_diffusion_xl import NeuronStableDiffusionXLPipelineMixin
