PNG images are not handled/extracted correctly #2317

syntax-surgeon · 2023-11-28T11:18:38Z

Hi all

I have been trying to extract all images in a PDF file as separate image files. Extracting the PNG images seems to cause errors, whereas the JPEG images work just fine. The script I am using is as follows:

import PyPDF2
from PIL import Image
import io

doc = PyPDF2.PdfReader("data/big_lorem_multipic.pdf")

for page in doc.pages:
    for image in page.images:
        image_bytes = io.BytesIO(image.data)
        pil_image = Image.open(image_bytes)
        pil_image.save(f"data/{image.name}")

The JPEG image is saved perfectly, but the PNG images are not. What seems to be the issue here? I have also tried writing the raw bytes to disk manually, e.g. with open(f"data/{image.name}", "wb") as file: file.write(image.data), and the results were exactly the same. I can see that a related bug fix (#1834) was recently made, but I still cannot identify the cause of this issue.
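One way to narrow this down might be to check what the extracted bytes actually are before handing them to PIL. The sketch below uses only the standard library and looks at the leading magic bytes. Note that `sniff_image_format` is a hypothetical helper name, not part of PyPDF2 or pypdf, and it only recognizes the PNG and JPEG signatures.

```python
# Hypothetical helper (not part of PyPDF2/pypdf): guess the image
# container format from the leading magic bytes of the data.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature
JPEG_PREFIX = b"\xff\xd8\xff"         # JPEG start-of-image marker plus next marker byte


def sniff_image_format(data: bytes) -> str:
    """Return "png", "jpeg", or "unknown" based on the first bytes of data."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_PREFIX):
        return "jpeg"
    return "unknown"
```

Inside the loop of the script above, something like print(image.name, sniff_image_format(image.data)) would show whether the bytes behind a .png name really start with a PNG signature.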

For reference I am attaching the PDF below. Thank you
big_lorem_multipic.pdf

stefan6419846 · 2023-11-28T12:37:52Z

Maintainer

This should probably be reported as an issue once you have migrated from the deprecated PyPDF2 to pypdf (simplified code):

import pypdf


doc = pypdf.PdfReader("big_lorem_multipic.pdf")

for page in doc.pages:
    for image in page.images:
        image.image.save(image.name)

1 reply

syntax-surgeon Nov 28, 2023
Author

Thank you. Moved as an issue here: #2318
