PNG images are not handled/extracted correctly #2317
-
This should probably be reported as an issue once you have migrated from the deprecated PyPDF2 to pypdf. Simplified code:

    import pypdf

    # Open the PDF and save every embedded image under its own name
    doc = pypdf.PdfReader("big_lorem_multipic.pdf")
    for page in doc.pages:
        for image in page.images:
            image.image.save(image.name)
-
Hi all
I have been trying to extract all images in a PDF file as separate image files. Extraction produces errors for the PNG images, whereas the JPEG images work fine. The script I am using is as follows:
The JPEG image is saved perfectly, but the PNG images are not. What seems to be the issue here? I have also tried writing the file manually:

    with open(f"data/{image.name}", "wb") as file:
        file.write(image.data)

and the results were exactly the same. I can see that a related bug fix (#1834) was recently made, but I still cannot identify the cause of this issue. For reference, I am attaching the PDF below. Thank you
big_lorem_multipic.pdf
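One way to narrow this down might be to check whether the extracted bytes really start with a PNG or JPEG file signature before blaming the save step. The sketch below is only a diagnostic aid using the standard library; `sniff_image_format` and `extension_matches` are hypothetical helper names, not part of pypdf, and it assumes the bytes come from something like `image.data` above.

```python
from typing import Optional

# Well-known file signatures (magic bytes) for the two formats discussed here.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" based on the leading bytes, or None if neither matches."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def extension_matches(name: str, data: bytes) -> bool:
    """Hypothetical helper: does the file extension agree with the sniffed format?"""
    detected = sniff_image_format(data)
    # Take the text after the last dot, lowercased; empty if there is no dot.
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if detected == "jpeg":
        return ext in ("jpg", "jpeg")
    return detected is not None and ext == detected
```

Calling `extension_matches(image.name, image.data)` for each `image` in `page.images` would show whether a file named `.png` actually contains PNG data, which may help separate a naming problem from a decoding problem.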