
Provide an AI image upscaler #97

Open
JensKorte opened this issue Dec 24, 2023 · 9 comments

@JensKorte

To provide better image quality, it would be nice to have an option in the menu line to activate/deactivate AI upscaling. A GPL-3 upscaler is available at https://www.upscayl.org; as a ZIP file it is ~300 MB. Note that an AI upscaler could produce wrong information.

Since it needs several seconds per image on an Intel Core i5 (2016), one solution might be to create a caching directory and provide an image srcset link for the upscaled images that are already available. The upscale icon could have three modes: off; on, but only partly available for this page; and on, with all upscaled images available.
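
A rough sketch of how the server side could build such a srcset (everything here is hypothetical: the cache directory, the URL routes, and the file naming scheme are assumptions, not existing Kiwix code):

```ts
// Hypothetical sketch: advertise the cached upscaled variant via srcset only
// when it already exists, so pages degrade gracefully while upscaling runs.
import { existsSync } from "fs";
import { join } from "path";

const CACHE_DIR = "/var/cache/kiwix-upscale"; // assumed cache location

function buildSrcset(imagePath: string): string {
  // Naming scheme for cached files is an assumption (path flattened + ".4x.webp").
  const cached = join(CACHE_DIR, imagePath.replace(/\//g, "_") + ".4x.webp");
  const sources = [`/content/${imagePath} 1x`];
  if (existsSync(cached)) {
    sources.push(`/upscaled/${imagePath} 4x`); // assumed route serving CACHE_DIR
  }
  return sources.join(", ");
}
```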

A simple solution would be for the user to just reload the page manually. A further step could be a script behind the "partly available" upscale icon that polls once every 5 seconds to check whether all images have finished upscaling, and shows a "!" once they are all available. The next step would be a configurable reload once all upscaled images are available.
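
A minimal client-side sketch of that polling idea (the `/upscale-status` endpoint and the `upscale-icon` element are assumptions for illustration):

```ts
// Hypothetical sketch: poll every 5 seconds until the server reports that all
// images on the current page have been upscaled, then flag the icon with "!".
function pollUpscaleStatus(pageUrl: string, autoReload: boolean): void {
  const timer = setInterval(async () => {
    const res = await fetch(`/upscale-status?page=${encodeURIComponent(pageUrl)}`);
    const status = (await res.json()) as { done: boolean };
    if (status.done) {
      clearInterval(timer);
      const icon = document.getElementById("upscale-icon");
      if (icon) icon.textContent = "!";   // all upscaled images are available
      if (autoReload) location.reload();  // the configurable-reload step
    }
  }, 5000);
}
```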

Rules for reducing the caching dir could be (a rough sketch follows this list):

  1. Is the content bigger than the max cache dir size? a) Remove the oldest entries whose ZIM file is not loaded; b) then remove the oldest entries regardless.
  2. Is an image older than four weeks (or the configured time)? Remove it.
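
Here is one way those two rules could look in code (a hypothetical sketch: `isZimLoaded()` is an assumed callback telling us whether the ZIM a cached image belongs to is currently open, and the size and age limits are placeholder values):

```ts
// Hypothetical sketch of the two eviction rules above.
import { readdirSync, statSync, unlinkSync } from "fs";
import { join } from "path";

const MAX_CACHE_BYTES = 2 * 1024 ** 3;        // assumed 2 GiB budget
const MAX_AGE_MS = 28 * 24 * 60 * 60 * 1000;  // four weeks, as suggested

function pruneCache(cacheDir: string, isZimLoaded: (file: string) => boolean): void {
  const now = Date.now();
  let total = 0;
  const files = readdirSync(cacheDir)
    .map((name) => {
      const path = join(cacheDir, name);
      return { path, stat: statSync(path) };
    })
    // Rule 2: anything older than the configured age goes immediately.
    .filter((f) => {
      if (now - f.stat.mtimeMs > MAX_AGE_MS) {
        unlinkSync(f.path);
        return false;
      }
      total += f.stat.size;
      return true;
    })
    .sort((a, b) => a.stat.mtimeMs - b.stat.mtimeMs); // oldest first

  const removed = new Set<string>();
  // Rule 1a: while over budget, first evict oldest files whose ZIM is not loaded.
  for (const f of files) {
    if (total <= MAX_CACHE_BYTES) break;
    if (!isZimLoaded(f.path)) {
      unlinkSync(f.path);
      removed.add(f.path);
      total -= f.stat.size;
    }
  }
  // Rule 1b: if still over budget, evict the oldest remaining files.
  for (const f of files) {
    if (total <= MAX_CACHE_BYTES) break;
    if (!removed.has(f.path)) {
      unlinkSync(f.path);
      total -= f.stat.size;
    }
  }
}
```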

(I converted the original d.webp from WP-de to PNG since GitHub doesn't support webp.)

https://library.kiwix.org/content/wikipedia_de_all_maxi/I/Germany_in_the_European_Union_on_the_globe_(Europe_centered).svg.png.webp

[Attached images: d (original) and d_upscayl_4x_realesrgan-x4plus (4× upscaled with Real-ESRGAN)]

@Jaifroid (Member)

Jaifroid commented Jan 1, 2024

It seems like a lot of extra work for the server for small gain which would only be relevant if the user zooms in on a page or image. Even then, what they're seeing in an AI-upscaled image is essentially invented detail, albeit based on best high-probability guess. It wouldn't actually reproduce exact detail lost in the original downscaling process. We could either end up with fake-looking images or else, in the worst case, hallucinated detail. Call me cynical...

kelson42 self-assigned this Jan 1, 2024
@kelson42 (Contributor)

kelson42 commented Jan 1, 2024

@JensKorte Thank you for your ticket. It's an interesting one. I'm not sure I fully understand your use case, but if I get it right, it would be a software solution to allow better images, dynamically improved, without actually packing high-quality images in the ZIM. That would imply embedding the software solution in Kiwix, and considering this is all in TypeScript… this is not something very simple… But it could be considered in the longer term. I will move this ticket to kiwix/overview.

kelson42 transferred this issue from kiwix/kiwix-tools Jan 1, 2024
@Jaifroid (Member)

@kelson42 My view is that this issue should be closed as not planned. We live in an era where AI is eroding the difference between reality and fabulation, and upscaling low-resolution images, while a neat party trick, can only add invented detail, which in my view corrupts what one is seeing. If we want higher-resolution images, we can always do that by decreasing the amount of downscaling at scrape time, at least for smaller Wikipedia ZIMs, and this would be a much more accurate way of undoing the loss of detail.

@kelson42 (Contributor)

> @kelson42 My view is that this issue should be closed as not planned. We live in an era where AI is eroding the difference between reality and fabulation, and upscaling low-resolution images, while a neat party trick, can only add invented detail, which in my view corrupts what one is seeing. If we want higher-resolution images, we can always do that by decreasing the amount of downscaling at scrape time, at least for smaller Wikipedia ZIMs, and this would be a much more accurate way of undoing the loss of detail.

@Jaifroid Very strong statement. I have no strong opinion on this, even if I believe I would not go so far. But we would definitely need to clearly inform users that such a picture has been partly "invented".

Anyway, at this stage, the problem is mostly technical. We would need a library able to do this, in a native format… so it can be embedded in Kiwix… before even considering using it.

@Jaifroid (Member)

@kelson42 Sorry, I'm just getting increasingly worried about the proliferation of fake imagery. I think a "unique selling point" for Kiwix (at least its major offline Wikipedia role) is that unlike AI, Offline Wikipedia provides content you can rely on not to be contaminated with hallucination, and that includes hallucinated image detail. But of course this is just my opinion. I think it would be worth having some kind of discussion about such things!

@kelson42 (Contributor)

> I think it would be worth having some kind of discussion about such things!

@Jaifroid Definitely, this is worth it. We can even make a policy about it.

@JensKorte (Author)

One way to be sure that the upscaling is alright would be to use a reproducible upscaling algorithm: after compressing the image, run the reproducible upscaling, automatically compare the result to the original hi-res image, and if the comparison is OK, give the file e.g. an XMP comment saying that upscaling with scaler x (at a given version) and upscaling factor y seems to be OK. A hash of the upscaled image could also be included.
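
A sketch of that verification flow (the upscaler and the similarity metric are passed in as assumed callbacks; the threshold, scaler name, and version are placeholders, not real Kiwix or Upscayl APIs):

```ts
// Hypothetical sketch: re-run a pinned, reproducible upscaler on the compressed
// image, compare the result to the original hi-res image, and emit a proof
// record (which could be stored as an XMP comment) including a hash.
import { createHash } from "crypto";

interface UpscaleProof {
  scaler: string;   // e.g. "realesrgan-x4plus" (placeholder)
  version: string;  // pinned version, required for reproducibility
  factor: number;   // upscaling factor y
  sha256: string;   // hash of the upscaled image bytes
}

function verifyUpscale(
  original: Buffer,                                  // the original hi-res image
  compressed: Buffer,                                // the downscaled ZIM image
  upscale: (img: Buffer, factor: number) => Buffer,  // assumed deterministic upscaler
  similarity: (a: Buffer, b: Buffer) => number,      // assumed metric (e.g. SSIM, 0..1)
): UpscaleProof | null {
  const upscaled = upscale(compressed, 4);
  if (similarity(upscaled, original) < 0.95) return null; // threshold is an assumption
  return {
    scaler: "realesrgan-x4plus",
    version: "0.3.0", // hypothetical pinned version
    factor: 4,
    sha256: createHash("sha256").update(upscaled).digest("hex"),
  };
}
```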

@Jaifroid (Member)

@JensKorte It's an interesting idea. But correct me if I'm wrong: if we're using AI to upscale, then it's non-deterministic, right? The AI will add detail according to its input prompt (in this case an image) and its specific training, and we might get subtly different upscaled images each time. The upscaling done client-side might never be the exact same upscaling done at scrape time in order to store a hash...

I may be extrapolating from the way language models work, as opposed to Stable Diffusion, etc.

I'm also slightly worried about the compute power required. Maybe it's small for upscaling as opposed to making a new image from a text or image prompt.
