improve image-to-image task page #867

Merged
merged 40 commits into from
Sep 3, 2024
0e803b4
change image-to-image task description
linoytsaban Aug 26, 2024
18271c1
Update packages/tasks/src/tasks/image-to-image/data.ts
linoytsaban Aug 26, 2024
af83fc4
Merge branch 'main' into img2img-task
linoytsaban Aug 26, 2024
884ad91
add text-to-image note
linoytsaban Aug 26, 2024
bb27566
changes to use cases - add links and adapt layout
linoytsaban Aug 27, 2024
f86c162
changes to use cases - add links and adapt layout
linoytsaban Aug 27, 2024
8be738c
changes to use cases - add links and adapt layout
linoytsaban Aug 27, 2024
821232a
changes to use cases - add links and adapt layout
linoytsaban Aug 27, 2024
71ffb50
changes to use cases - add links and adapt layout
linoytsaban Aug 27, 2024
6d860e3
changes to use cases - add links and adapt layout
linoytsaban Aug 27, 2024
2175a55
changes to use cases - add links and adapt layout
linoytsaban Aug 27, 2024
31ca4e4
changes to use cases - add links and adapt layout
linoytsaban Aug 27, 2024
4d0255b
format
linoytsaban Aug 27, 2024
165ae3c
Update packages/tasks/src/tasks/image-to-image/about.md
linoytsaban Aug 27, 2024
16556f5
Update packages/tasks/src/tasks/image-to-image/about.md
linoytsaban Aug 27, 2024
acf2b37
Merge branch 'main' into img2img-task
linoytsaban Aug 27, 2024
89b9fa3
shorten task summary
linoytsaban Aug 29, 2024
c1f3582
move links
linoytsaban Aug 29, 2024
68fe0e4
format
linoytsaban Aug 29, 2024
1fa4152
add style transfer inference example & link
linoytsaban Aug 29, 2024
444ea0a
format
linoytsaban Aug 29, 2024
76a4762
Merge branch 'main' into img2img-task
linoytsaban Aug 29, 2024
8a56c96
add to comment
linoytsaban Aug 29, 2024
bafe332
Merge remote-tracking branch 'origin/img2img-task' into img2img-task
linoytsaban Aug 29, 2024
2422178
Merge branch 'main' into img2img-task
linoytsaban Sep 2, 2024
45a3ee8
update example (to not use runwayml/sd1.5)
linoytsaban Sep 2, 2024
02e338c
Update packages/tasks/src/tasks/image-to-image/data.ts
linoytsaban Sep 2, 2024
de31621
add text-to-image explaination
linoytsaban Sep 2, 2024
798f6bd
move link & add image example for controlnet
linoytsaban Sep 2, 2024
6582eac
format
linoytsaban Sep 2, 2024
a6d2d3d
Merge branch 'main' into img2img-task
linoytsaban Sep 2, 2024
e84b1f8
remove space
linoytsaban Sep 2, 2024
187e1e2
Merge remote-tracking branch 'origin/img2img-task' into img2img-task
linoytsaban Sep 2, 2024
8560bf5
fix string
linoytsaban Sep 2, 2024
ad2e430
fix string
linoytsaban Sep 2, 2024
1d60e11
Merge branch 'main' into img2img-task
linoytsaban Sep 2, 2024
0ef35b5
Update packages/tasks/src/tasks/image-to-image/about.md
linoytsaban Sep 2, 2024
d3b0548
Merge branch 'main' into img2img-task
linoytsaban Sep 2, 2024
201fa70
add comments to img2img inference example
linoytsaban Sep 3, 2024
00abbb6
format
linoytsaban Sep 3, 2024
91 changes: 70 additions & 21 deletions packages/tasks/src/tasks/image-to-image/about.md
@@ -1,15 +1,10 @@
## Use Cases

### Style transfer
Image-to-image pipelines can also be used in text-to-image tasks, to provide visual guidance to the text-guided generation process.

One of the most popular use cases of image-to-image is style transfer. Style transfer models can convert an ordinary photograph into a painting in the style of a famous painter.

## Task Variants
## Use Cases

### Image inpainting

Image inpainting is widely used during photography editing to remove unwanted objects, such as poles, wires, or sensor
dust.
Image inpainting is widely used during photography editing to remove unwanted objects, such as poles, wires, or sensor dust.

### Image colorization

@@ -24,18 +19,27 @@ Super-resolution models increase the resolution of an image, allowing for higher
You can use the image-to-image pipelines in the 🧨 diffusers library to run image-to-image models easily. See an example for `StableDiffusionImg2ImgPipeline` below.

```python
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

model_id_or_path = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)

init_image = Image.open("mountains_image.jpeg").convert("RGB").resize((768, 512))
prompt = "A fantasy landscape, trending on artstation"
# Reduce GPU memory usage by offloading the model to the CPU. SDXL is fairly heavy,
# and offloading has little impact on performance.
pipeline.enable_model_cpu_offload()

images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("fantasy_landscape.png")
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-sdxl-init.png"
init_image = load_image(url)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, strength=0.5).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```
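
The `strength` argument passed to the pipeline controls how far the result may drift from the input image. The sketch below assumes the step-count rule used by the Stable Diffusion img2img pipelines in diffusers, where only a fraction of the scheduled denoising steps is actually run. The helper name `effective_steps` is hypothetical and not part of the library, and the exact behavior may differ between pipelines and schedulers.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps an img2img call actually runs.

    Assumption: the pipeline skips the first part of the schedule and runs
    roughly int(num_inference_steps * strength) steps, capped at the total.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


# With 50 scheduled steps and strength=0.5, about half of them are run.
steps = effective_steps(50, 0.5)
print(steps)  # 25
```

Under this assumption, a low `strength` keeps the output close to the input image because few denoising steps are applied, while `strength=1.0` runs the full schedule and largely ignores the input.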

You can use [huggingface.js](https://github.com/huggingface/huggingface.js) to infer image-to-image models on Hugging Face Hub.
@@ -53,13 +57,53 @@ await inference.imageToImage({
});
```

## ControlNet
## Use Cases for Text Guided Image Generation

Controlling the outputs of diffusion models only with a text prompt is a challenging problem. ControlNet is a neural network model that provides image-based control to diffusion models. Control images can be edges or other landmarks extracted from a source image.
### Style Transfer

One of the most popular use cases of image-to-image is style transfer. With style transfer models:

Many ControlNet models were trained in our community event, JAX Diffusers sprint. You can see the full list of the ControlNet models available [here](https://huggingface.co/spaces/jax-diffusers-event/leaderboard).
- a regular photo can be transformed into a variety of artistic styles or genres, such as a watercolor painting, a comic book illustration and more.
- new images can be generated using a text prompt, in the style of a reference input image.

See 🧨diffusers example for style transfer with `AutoPipelineForText2Image` below.

```python
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

# load pipeline
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

# set the adapter and scales - this is a component that lets us add the style control from an image to the text-to-image model
scale = {
"down": {"block_2": [0.0, 1.0]},
"up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")

generator = torch.Generator(device="cpu").manual_seed(26)
image = pipeline(
prompt="a cat, masterpiece, best quality, high quality",
ip_adapter_image=style_image,
negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
guidance_scale=5,
num_inference_steps=30,
generator=generator,
).images[0]
image
```

### ControlNet

Controlling the outputs of diffusion models only with a text prompt is a challenging problem. ControlNet is a neural network model that provides image-based control to diffusion models. Control images can be edges or other landmarks extracted from a source image.
![Examples](https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/12-sdxl-text2img-controlnet.png)
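
To make the idea of an edge-based control image concrete, here is a minimal sketch in plain Python. It is not the Canny detector that ControlNet checkpoints such as `lllyasviel/sd-controlnet-canny` are typically paired with. It is a simple intensity-difference threshold on a toy grayscale grid, and the function name `edge_map` and the threshold value are assumptions for illustration only.

```python
from typing import List

Grid = List[List[int]]


def edge_map(gray: Grid, threshold: int = 128) -> Grid:
    """Return a binary edge map (0 or 255) for a grayscale grid.

    A pixel is marked as an edge when the intensity jump to its right or
    lower neighbor is at least `threshold`. Border pixels are compared
    with themselves, so they never produce a spurious edge.
    """
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            right = gray[y][min(x + 1, width - 1)]
            down = gray[min(y + 1, height - 1)][x]
            jump = max(abs(right - gray[y][x]), abs(down - gray[y][x]))
            if jump >= threshold:
                edges[y][x] = 255
    return edges


# Toy 4x4 image: dark left half, bright right half.
toy = [[0, 0, 255, 255] for _ in range(4)]
control = edge_map(toy)
```

Here `control` marks only the column where the dark and bright halves meet. In a real workflow, an edge map like this (usually produced by a proper edge detector) would be passed to the pipeline as the control image alongside the text prompt.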

## Most Used Model for the Task
## Pix2Pix

Pix2Pix is a popular model used for image-to-image translation tasks. It is based on a conditional-GAN (generative adversarial network) where instead of a noise vector a 2D image is given as input. More information about Pix2Pix can be retrieved from this [link](https://phillipi.github.io/pix2pix/) where the associated paper and the GitHub repository can be found.
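
For reference, the conditional-GAN objective can be written as follows. This is a sketch following the notation of the Pix2Pix paper, where $x$ is the input image, $y$ the target image, $z$ an optional noise source, $G$ the generator, $D$ the discriminator, and $\lambda$ a weighting hyperparameter:

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

The discriminator sees the input image together with either the real or the generated output, which is what makes the GAN conditional on the 2D input.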

@@ -70,8 +114,13 @@ The images below show some examples extracted from the Pix2Pix paper. This model
## Useful Resources

- [Image-to-image guide with diffusers](https://huggingface.co/docs/diffusers/using-diffusers/img2img)
- Image inpainting: [inpainting with 🧨diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/inpaint), [demo](https://huggingface.co/spaces/diffusers/stable-diffusion-xl-inpainting)
- Colorization: [demo](https://huggingface.co/spaces/modelscope/old_photo_restoration)
- Super resolution: [image upscaling with 🧨diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/upscale#super-resolution), [demo](https://huggingface.co/spaces/radames/Enhance-This-HiDiffusion-SDXL)
- [Style transfer and layout control with diffusers 🧨](https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter#style--layout-control)
- [Train your ControlNet with diffusers 🧨](https://huggingface.co/blog/train-your-controlnet)
- [Ultra fast ControlNet with 🧨 Diffusers](https://huggingface.co/blog/controlnet)
- [List of ControlNets trained in the community JAX Diffusers sprint](https://huggingface.co/spaces/jax-diffusers-event/leaderboard)

## References

2 changes: 1 addition & 1 deletion packages/tasks/src/tasks/image-to-image/data.ts
@@ -93,7 +93,7 @@ const taskData: TaskDataCustom = {
},
],
summary:
"Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain. Any image manipulation and enhancement is possible with image to image models.",
"Image-to-image is the task of transforming an input image through a variety of possible manipulations and enhancements, such as super-resolution, image inpainting, colorization, and more.",
widgetModels: ["lllyasviel/sd-controlnet-canny"],
youtubeId: "",
};