JHOVE 1.26.1 with AM 1.15.1 is buggy #1683

Closed
5 tasks
fitnycdigitalinitiatives opened this issue Apr 29, 2024 · 4 comments

@fitnycdigitalinitiatives

Expected behaviour

File validation with JHOVE should work.

Current behaviour
Certain files (TIFFs in my case) hang indefinitely during validation and completely stall ingests. Other times, JHOVE returns a 'java.lang.OutOfMemoryError: Java heap space' error. The same files, when tested on AM 1.14 on RHEL 7, do not cause any problems. See: openpreserve/jhove#920

Steps to reproduce

Run JHOVE 1.26.1 on one of the affected TIFF files (I can't share them publicly because they are private student work).

Your environment (version of Archivematica, operating system, other relevant details)
RHEL 9, AM 1.15.1


For Artefactual use:

Before you close this issue, you must check off the following:

  • All pull requests related to this issue are properly linked
  • All pull requests related to this issue have been merged
  • A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
  • Documentation regarding this issue has been written and merged (if applicable)
  • Details about this issue have been added to the release notes (if applicable)
@mamedin

mamedin commented Apr 30, 2024

@fitnycdigitalinitiatives can you give us more info about the transfer and the TIFF files you tested?

  • How many TIFF files does the transfer contain?
  • Does it happen in a transfer with a single TIFF file?
  • Can you print the output of "file FILE.tiff" and the size for any tiff?

I just tested on Rocky 9 with the following sample-data transfer, which contains a TIFF:

https://github.com/artefactual/archivematica-sampledata/tree/master/SampleTransfers/Images

About that tiff:

[root@am115rocky9 Images]# file G31DS.TIF
G31DS.TIF: TIFF image data, big-endian, direntries=17, height=3248, bps=1, compression=bi-level group 3, PhotometricIntepretation=WhiteIsZero, name=/tmp/test.pbm, description=converted PBM file, orientation=upper-left, width=2464

And the validation using JHOVE works (no error in the "Validate format" microservice):

Running Validate using JHOVE
Command "Validate using JHOVE" was successful
Creating validation event for /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/extractPackagesChoice/images-mamedin-49b2e501-2471-427b-b698-cd68a9e9d792/objects/G31DS.TIF (1f58054b-27e9-406d-ab5d-9db1e21bcb98)

The jhove package in our test VM is:

[root@am115rocky9 ~]# rpm -qa | grep jhove
jhove-1.26.1-2.el9.x86_64

@fitnycdigitalinitiatives
Author

Hi Miguel,

This occurred in a transfer with 8 TIFFs in total. Each is about 35 MB, so not very large. I stopped the transfer after the validate step had stalled for 3 hours. To see what was going on, I manually ran JHOVE (1.26.1) on each of the TIFFs, and it turned out that only 2 of them were causing issues. I took those two files and ran them through JHOVE (1.20.0) on our old AM server, and it worked instantly without incident. I have also noticed that the "Validate format" microservice step generally runs a little slower than it did on the previous version of AM.

That said, I resaved the two offending TIFFs in Photoshop, and after that they validated without a problem. So it may be the TIFFs themselves that are the problem, but they don't cause issues in the older JHOVE and they open without issue in any image-viewing application. I have also successfully ingested about 1,000 TIFFs without issues, so maybe these two are complete one-offs. I can send you one of the files directly if you want to share your email.

About this tiff:

/data/transfers/sl_fa_000776.tif: TIFF image data, little-endian, direntries=23, height=2800, bps=290, compression=none, PhotometricIntepretation=RGB, width=4200

When I run jhove -h xml /data/transfers/sl_fa_000776.tif it either hangs indefinitely or returns this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3210)
    at java.util.Arrays.copyOf(Arrays.java:3181)
    at java.util.Vector.grow(Vector.java:271)
    at java.util.Vector.ensureCapacityHelper(Vector.java:251)
    at java.util.Vector.add(Vector.java:787)
    at edu.harvard.hul.ois.jhove.module.pdf.Literal.processLiteral(Literal.java:302)
    at edu.harvard.hul.ois.jhove.module.pdf.Tokenizer.getNext(Tokenizer.java:406)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Parser.java:94)
    at edu.harvard.hul.ois.jhove.module.pdf.PdfHeader.parseHeader(PdfHeader.java:130)
    at edu.harvard.hul.ois.jhove.module.PdfModule.parseHeader(PdfModule.java:1053)
    at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:809)
    at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:813)
    at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:608)
    at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:461)
    at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:265)
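
The frames in this trace are in JHOVE's PDF module classes even though the input is a TIFF, and the command above does not name a module. A minimal sketch for running one module at a time, to see which one is consuming the memory. It assumes JHOVE's `-m` option and the stock module names `TIFF-hul` and `PDF-hul`; the helper name and the `JHOVE_BIN` / `JHOVE_TIMEOUT` variables are made up for illustration and are not JHOVE settings.

```shell
# Hypothetical helper: run JHOVE on one file with a single, named module,
# and stop the run if it exceeds a time limit (default 300 seconds).
# JHOVE_BIN and JHOVE_TIMEOUT are illustrative variables, not JHOVE options.
run_jhove_module() {
    module="$1"
    file="$2"
    status=0
    timeout "${JHOVE_TIMEOUT:-300}" "${JHOVE_BIN:-jhove}" \
        -m "$module" -h xml "$file" || status=$?
    # GNU timeout exits with 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "timed out: $module on $file" >&2
    fi
    return "$status"
}

# Example calls (commented out so that sourcing this file runs nothing):
# run_jhove_module TIFF-hul /data/transfers/sl_fa_000776.tif
# run_jhove_module PDF-hul  /data/transfers/sl_fa_000776.tif
```

If only one of the two calls hangs or runs out of memory, that narrows the problem to a single module. This only helps with diagnosis and is not a fix by itself.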

So while I was able to remedy this fairly easily by resaving the TIFFs, my concern is that this will happen again, given that the same files work fine on the previous AM with the older JHOVE. I'm also concerned about the general sluggishness I'm seeing during validation. Again, let me know if you want me to send you the file directly.

Thanks,

Joseph
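
The manual check described above (running JHOVE on each TIFF to find the ones that stall) can be sketched as a loop with a time limit, so that a hanging file is reported instead of blocking the run. This is a rough sketch, not part of Archivematica: the function name and the `JHOVE_BIN` / `JHOVE_TIMEOUT` variables are hypothetical, and it relies on GNU `timeout` being installed.

```shell
# Hypothetical helper: run JHOVE on every .tif/.tiff file in a directory and
# print one status line per file.
# JHOVE_BIN and JHOVE_TIMEOUT are illustrative variables, not JHOVE options.
find_slow_tiffs() {
    dir="$1"
    for f in "$dir"/*.tif "$dir"/*.tiff; do
        [ -e "$f" ] || continue   # skip glob patterns that matched nothing
        status=0
        timeout "${JHOVE_TIMEOUT:-120}" "${JHOVE_BIN:-jhove}" \
            -h xml "$f" >/dev/null 2>&1 || status=$?
        case "$status" in
            0)   echo "ok $f" ;;
            124) echo "timeout $f" ;;   # GNU timeout exits 124 at the limit
            *)   echo "error($status) $f" ;;
        esac
    done
}

# Example call (commented out so that sourcing this file runs nothing):
# find_slow_tiffs /data/transfers
```

Files reported as `timeout` or `error(...)` are the candidates to inspect or resave before starting a transfer.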

@fitnycdigitalinitiatives
Author

Following up on this,

I had this occur again and did some more investigating. On my local Mac, I installed JHOVE 1.26.1 and the latest 1.30 and ran both on the problematic TIFF files without issue. The same files, however, hang indefinitely when run on my AM server, which runs RHEL 9. I tested there with both JHOVE 1.26.1 and 1.30. So I suspect it's an issue with RHEL 9 or the version of Java running on RHEL 9. Can I send you one of the files to test on your end with Rocky 9?

@fitnycdigitalinitiatives
Author

I think I figured it out. The default Java max heap size on our server was set on the low side, so I raised it, and now the problematic TIFFs no longer cause problems. For future reference, I edited /usr/bin/jhove so that the last line sets a bigger max heap:
java -Xmx2048m -Xss1024k -classpath "${jar_list}:${extra_jar_list}" edu.harvard.hul.ois.jhove.Jhove -c "${CONFIG}" "${@}"
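
To check what maximum heap the JVM picks by default on a given server, before or after a change like the one above, here is a minimal sketch built on HotSpot's `-XX:+PrintFlagsFinal` flag. The helper name `max_heap_mib` is made up for illustration, and the parsing assumes the usual `type name = value` layout of that flag listing.

```shell
# Hypothetical helper: read `java -XX:+PrintFlagsFinal` output on stdin and
# print the MaxHeapSize value converted from bytes to MiB.
max_heap_mib() {
    awk '$2 == "MaxHeapSize" { printf "%d\n", $4 / 1048576 }'
}

# Example calls (commented out so that sourcing this file runs nothing):
# java -XX:+PrintFlagsFinal -version 2>/dev/null | max_heap_mib
# java -Xmx2048m -XX:+PrintFlagsFinal -version 2>/dev/null | max_heap_mib
```

Running the first call on both the old and the new server would show whether their default heap limits actually differ.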
