Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: PDF 2.0 header detection #964

Open
wants to merge 4 commits into
base: dev/1.33
Choose a base branch
from
Open

Conversation

carlwilson
Copy link
Member

@carlwilson carlwilson commented Sep 10, 2024

  • refactored PDF-hul PDF Header parsing, particularly the version parsing;
    • no more null return handling, specific exceptions are thrown instead;
    • these are caught and handled in the PdfModule class;
    • new PdfVersion class to handle version parsing and comparison;
    • PDF version detection now handled by pattern match, with better reporting of errors;
    • offsets added to PdfExceptions that were in the scope of refactoring;
    • added new %PDF-2. signature to the PDF-hul module;
  • new edu.harvard.hul.ois.jhove.messages.JhoveMessages.getMessageInstance convenience method to clone message with sub-message;
  • added two new test cases to PDF-hul regression tests for header version detection;
  • added necessary sed insertions and file copies to handle changed integration test results to jhove-bbt/scripts/create-1.33-target.sh; and
  • refactored "not found" reporting for modules and handlers in jhove-apps edu.harvard.hul.ois.jhove.Jhove class.

Error messages changed to give more detailed information:

  • PDF-HUL-137:
  • PDF-HUL-148: No longer mentions specific minor versions but has a more detailed sub-message
  • PDF-HUL-155: Reports the header that was too short

New Error Messages:

  • PDF-HUL-160: IOException reading PDF header
  • PDF-HUL-161: PDF major version number should be 1.x or 2.x
  • PDF-HUL-162: Malformed PDF version number

Closes #470

- added second signature for PDF-2 style header (this can be improved);
- added support for PDF-2 headers to `PdfHeader`; and
- more work required on valid major/minor version combinations.
- refactored `PDF-hul` PDF Header parsing, particularly the version parsing;
  - no more `null` return handling, specific exceptions are thrown instead;
  - these are caught and handled in the `PdfModule` class;
  - new `PdfVersion` class to handle version parsing and comparison;
  - PDF version detection now handled by pattern match, with better reporting of errors;
  - offsets added to `PdfException`s that were in the scope of refactoring;
  - added new `%PDF-2.` signature to the `PDF-hul` module;
- new `edu.harvard.hul.ois.jhove.messages.JhoveMessages.getMessageInstance` convenience method to clone message with sub-message;
- added two new test cases to `PDF-hul` regression tests for header version detection;
- added necessary `sed` insertions and file copies to handle changed integration test results to `jhove-bbt/scripts/create-1.33-target.sh`; and
- refactored "not found" reporting for modules and handlers in jhove-apps `edu.harvard.hul.ois.jhove.Jhove` class.

Error messages changed to give more detailed information:

- `PDF-HUL-137`:
- `PDF-HUL-148`: No longer mentions specific minor versions but has a more detailed sub-message
- `PDF-HUL-155`: Reports the header that was too short

New Error Messages:

- `PDF-HUL-160`: IOException reading PDF header
- `PDF-HUL-161`: PDF major version number should be 1.x or 2.x
- `PDF-HUL-162`: Malformed PDF version number
@carlwilson carlwilson added the feature New functionality to be developed label Sep 10, 2024
@carlwilson carlwilson added this to the JHOVE 1.34 milestone Sep 10, 2024
@carlwilson carlwilson self-assigned this Sep 10, 2024
- moved field declarations.
} catch (final IOException ee) {
throw new PdfMalformedException(
JhoveMessages.getMessageInstance(MessageConstants.PDF_HUL_160, ee.getMessage()),
offset); // PDF-HUL-137
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small detail // PDF-HUL-137 should be // PDF-HUL-160

throw new PdfMalformedException(
JhoveMessages.getMessageInstance(MessageConstants.PDF_HUL_160, ee.getMessage()),
offset); // PDF-HUL-137
} catch (PdfException e) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this swallow the PDF_HUL_33 and PDF_HUL_34 errors in the parser.getNext?

* @param versionString
* the version number from the PDF Header, should be of
* form
* <code>1.x</code> where x should be of the range 0-7.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should also be changed to reflect PDF 2.0

PDF-HUL-149 = Invalid indirect destination - referenced object ''{0}'' cannot be found
PDF-HUL-150 = Cross-reference stream must be a stream
PDF-HUL-151 = Unexpected error occurred while attempting to read the cross-reference table
PDF-HUL-152 = Encrypt dictionary has no OE key or it has a null value
PDF-HUL-153 = Encrypt dictionary has no UE key or it has a null value
PDF-HUL-154 = Unknown Developer Prefix:
PDF-HUL-155 = Error parsing mandatory version number from PDF header.
PDF-HUL-155 = Error parsing mandatory version number, PDF header too short: "{0}"
PDF-HUL-156 = Extension can't be defined as an indirect reference
PDF-HUL-157 = Unexpected exception {0}
PDF-HUL-158 = Unexpected exception {0}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm the wiki needs to be updated :-) #967

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality to be developed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants