-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[23.2] Use python-isal for fast zip deflate compression in rocrate export #17342
[23.2] Use python-isal for fast zip deflate compression in rocrate export #17342
Conversation
5c08f00
to
7e9c02b
Compare
8ee3b56
to
a4b14be
Compare
@@ -2753,7 +2753,7 @@ def _finalize(self) -> None: | |||
out_file = out_file_name[: -len(".zip")] | |||
else: | |||
out_file = out_file_name | |||
rval = shutil.make_archive(out_file, "zip", self.export_directory) | |||
rval = shutil.make_archive(out_file, "fastzip", self.export_directory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works only because this files also imports from galaxy.util.compression_utils import CompressedFile
. Should we re-export shutil
(maybe as gx_shutil
) in lib/galaxy/util/compression_utils.py
and import it in this file from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works only because this files also imports
from galaxy.util.compression_utils import CompressedFile
yes
Should we re-export
shutil
(maybe asgx_shutil
) inlib/galaxy/util/compression_utils.py
and import it in this file from there?
makes sense, sure
@@ -34,6 +34,7 @@ include_package_data = True | |||
install_requires = | |||
galaxy-util | |||
fs | |||
isal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvdbeek Do you remember why you added the isal
dependency here as well?
I don't think it's currently used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/util/compression_utils.py#L361-L372 or am I missing something ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, why it's used in the files package ? I don't know, it should be enough to just have it be a dependency of data I thuok ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, exactly. I'll drop it in #18449 .
We're currently archiving a lot of old and unused histories to tape by exporting histories to RO-crates, and CPU utilization appears to be the bottleneck.
I've switched out the default zlib DEFLATE compression with the more efficient DEFLATE implementation provided by https://github.com/intel/isa-l via https://pypi.org/project/isal/. I've also tried zlib-ng (https://github.com/zlib-ng/zlib-ng) but that was only marginally faster.
This is about 3 times faster on 8GB sampled from /dev/urandom (~160 MB/s vs ~56 MB/s), both locally and on galaxy07 under heavy load.
How to test the changes?
(Select all options that apply)
License