PDF-hul: NullPointerException with weird escaped characters in PDF trailer #876

Open
matthias-fratz-bsz opened this issue Aug 9, 2023 · 2 comments
Labels
bug (A product defect that needs fixing), P2 (Medium priority issues to be scheduled in a future release)

Comments

@matthias-fratz-bsz

We have several files that trigger the following NullPointerException:

java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.Token.isSimpleToken()" because "tok" is null
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:287)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readArray(Parser.java:304)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:275)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readDictionary(Parser.java:340)
	at edu.harvard.hul.ois.jhove.module.PdfModule.parseTrailer(PdfModule.java:1322)
	at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:820)
	at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:782)
	at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:567)
	at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:439)
	at Jhove.main(Jhove.java:295)

I cannot share the offending files, but jhove_npe.zip is a synthetic example I made that also triggers the NPE. It seems to be related to escaped characters in the /ID entry of the file's trailer dictionary: \376\377\377\377 causes the NPE, while \377\377\377\377 reports "Valid and well-formed". Other combinations around \3xx sometimes work and sometimes don't; I was unable to investigate this further.
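The difference between the two /ID values is easier to see once the octal escapes are decoded to bytes. The sketch below is a minimal illustration, not JHOVE code: `PdfOctalEscapes` and its methods are hypothetical names, and it handles only `\ddd` escapes. It shows that \376\377 decodes to FE FF, the byte pair PDF text strings use as a UTF-16BE byte order mark. Whether that prefix is what actually trips the tokenizer is an assumption, not something confirmed in this thread.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper (not part of JHOVE): decodes \ddd octal escapes in a PDF literal string body. */
class PdfOctalEscapes {

    /** Decodes a literal string body (without the enclosing parentheses).
     *  Only \ddd escapes are handled; every other character is passed through as one byte. */
    static byte[] decode(String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length() && isOctal(body.charAt(i + 1))) {
                i++; // skip the backslash
                int value = 0;
                int digits = 0;
                // An escape consists of one to three octal digits.
                while (digits < 3 && i < body.length() && isOctal(body.charAt(i))) {
                    value = value * 8 + (body.charAt(i) - '0');
                    i++;
                    digits++;
                }
                out.write(value & 0xFF); // keep the low byte only
            } else {
                out.write(c & 0xFF);
                i++;
            }
        }
        return out.toByteArray();
    }

    static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    /** True if the bytes begin with FE FF, the UTF-16BE byte order mark. */
    static boolean startsWithUtf16BeBom(byte[] b) {
        return b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF;
    }

    static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = { "\\376\\377\\377\\377", "\\377\\377\\377\\377" };
        for (String s : samples) {
            byte[] bytes = decode(s);
            System.out.println(s + " -> " + toHex(bytes)
                    + " startsWithBom=" + startsWithUtf16BeBom(bytes));
        }
    }
}
```

The value that triggers the NPE decodes to FE FF FF FF and starts with the byte order mark; the value that validates decodes to FF FF FF FF and does not. That may be a useful starting point for narrowing down which \3xx combinations fail.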

@matthias-fratz-bsz
Author

The original example only triggers the NPE on JHOVE 1.20.0 with an old version of PDF-hul. Sorry, my fault for not testing against the latest release...

Anyway, jhove_npe_1224.zip is an updated version that also causes the NPE on JHOVE 1.28.0 with PDF-hul 1.12.4. The ID is taken from the original file. I'm not sure why it was written like that (a hex string would have been shorter), but it seems to be valid according to the PDF standard.
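To illustrate the size remark, the sketch below writes the same bytes both as a literal string with octal escapes and as a hex string. `PdfStringForms` is a hypothetical name, and the sketch assumes the writer escapes every byte; a real PDF writer may leave printable bytes unescaped, which would shrink the literal form.

```java
/** Hypothetical helper (not part of JHOVE): renders bytes in two PDF string syntaxes. */
class PdfStringForms {

    /** Literal string with every byte written as a three-digit octal escape. */
    static String asOctalLiteral(byte[] data) {
        StringBuilder sb = new StringBuilder("(");
        for (byte b : data) {
            sb.append(String.format("\\%03o", b & 0xFF));
        }
        return sb.append(")").toString();
    }

    /** Hex string with two hex digits per byte. */
    static String asHexString(byte[] data) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append(">").toString();
    }

    public static void main(String[] args) {
        // The four bytes from the synthetic example in this issue.
        byte[] id = { (byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF };
        String literal = asOctalLiteral(id);
        String hex = asHexString(id);
        System.out.println(literal + " length=" + literal.length());
        System.out.println(hex + " length=" + hex.length());
    }
}
```

With every byte escaped, the literal form costs four characters per byte against two for the hex form, so the hex string is roughly half the size for the same data.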

@carlwilson
Member

Thanks for the report. There are a couple of issues that are similar. The pointers and examples you've given will help us to track this down, I think.

@carlwilson added the labels bug (A product defect that needs fixing) and P2 (Medium priority issues to be scheduled in a future release) on Sep 14, 2023