[23.2] Use python-isal for fast zip deflate compression in rocrate export #17342

Merged: 3 commits, Jan 23, 2024

@mvdbeek (Member) commented Jan 23, 2024

We're currently archiving a lot of old and unused histories to tape by exporting histories to RO-crates, and CPU utilization appears to be the bottleneck.

I've replaced the default zlib DEFLATE compression with the more efficient DEFLATE implementation provided by ISA-L (https://github.com/intel/isa-l) via python-isal (https://pypi.org/project/isal/). I also tried zlib-ng (https://github.com/zlib-ng/zlib-ng), but that was only marginally faster.
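For readers unfamiliar with how a custom archive format can be plugged into `shutil.make_archive`, here is a minimal sketch. It is not necessarily the code in this PR: the names `FastZipFile` and `make_fast_zipfile` are illustrative, the override relies on private `zipfile` internals (`_open_to_write`, `_compressor`) that may change between Python versions, and it falls back to plain zlib when `isal` is not installed.

```python
import os
import shutil
import zipfile

try:
    # Optional dependency: https://pypi.org/project/isal/
    from isal import isal_zlib
except ImportError:
    isal_zlib = None


class FastZipFile(zipfile.ZipFile):
    """ZipFile that swaps in the ISA-L DEFLATE compressor when available.

    Assumption: this relies on private zipfile internals and is only a sketch.
    """

    def _open_to_write(self, zinfo, *args, **kwargs):
        writer = super()._open_to_write(zinfo, *args, **kwargs)
        if isal_zlib is not None and zinfo.compress_type == zipfile.ZIP_DEFLATED:
            # wbits=-15 requests a raw DEFLATE stream, which is what zip entries hold.
            writer._compressor = isal_zlib.compressobj(
                isal_zlib.ISAL_DEFAULT_COMPRESSION, isal_zlib.DEFLATED, -15, 9
            )
        return writer


def make_fast_zipfile(base_name, base_dir, **kwargs):
    """Archiver callback for shutil: zip the files under base_dir.

    Extra keyword arguments passed by shutil (dry_run, logger, owner, group)
    are accepted and ignored. Empty directories are not stored.
    """
    zip_filename = base_name + ".zip"
    archive_dir = os.path.dirname(base_name)
    if archive_dir and not os.path.exists(archive_dir):
        os.makedirs(archive_dir)
    with FastZipFile(zip_filename, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(base_dir):
            for name in sorted(filenames):
                path = os.path.normpath(os.path.join(dirpath, name))
                if os.path.isfile(path):
                    zf.write(path, path)
    return zip_filename


# Registration is global to the process: once this runs, "fastzip" is a
# valid format name for shutil.make_archive.
shutil.register_archive_format("fastzip", make_fast_zipfile, description="ISA-L zip")
```

After registration, `shutil.make_archive(out_file, "fastzip", export_directory)` produces an ordinary `.zip` file that any zip reader can open, since only the compressor implementation differs.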

This is about 3 times faster than zlib on 8 GB sampled from /dev/urandom (~160 MB/s vs. ~56 MB/s), both locally and on galaxy07 under heavy load.
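A rough way to reproduce this kind of comparison on a small scale is sketched below. It is an assumption-laden illustration, not the benchmark used for the numbers above: it compresses an in-memory random buffer rather than 8 GB of exported files, the function names are hypothetical, and it only reports an ISA-L figure when `isal` is installed.

```python
import os
import time
import zlib

try:
    from isal import isal_zlib  # optional; mirrors the zlib module's interface
except ImportError:
    isal_zlib = None


def deflate_throughput(module, data, chunk_size=1 << 20):
    """Compress data in chunks with module.compressobj() and return MB/s."""
    compressor = module.compressobj()  # each library's default level
    start = time.perf_counter()
    for offset in range(0, len(data), chunk_size):
        compressor.compress(data[offset:offset + chunk_size])
    compressor.flush()
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(data) / elapsed / 1e6


def compare(size=16 * 1024 * 1024):
    """Return a dict of throughput figures (MB/s) keyed by implementation."""
    data = os.urandom(size)  # incompressible input, like /dev/urandom
    results = {"zlib": deflate_throughput(zlib, data)}
    if isal_zlib is not None:
        results["isal"] = deflate_throughput(isal_zlib, data)
    return results


if __name__ == "__main__":
    for name, mb_per_s in compare().items():
        print(f"{name}: {mb_per_s:.0f} MB/s")
```

Absolute numbers depend heavily on the CPU and on how compressible the data is, so only the ratio between the two implementations is meaningful here.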

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs merged commit 26fb60d into galaxyproject:release_23.2 Jan 23, 2024
42 of 46 checks passed
@@ -2753,7 +2753,7 @@ def _finalize(self) -> None:
             out_file = out_file_name[: -len(".zip")]
         else:
             out_file = out_file_name
-        rval = shutil.make_archive(out_file, "zip", self.export_directory)
+        rval = shutil.make_archive(out_file, "fastzip", self.export_directory)
Member

I think this works only because this file also imports CompressedFile (from galaxy.util.compression_utils import CompressedFile). Should we re-export shutil (maybe as gx_shutil) in lib/galaxy/util/compression_utils.py and import it in this file from there?

Member Author

> I think this works only because this file also imports from galaxy.util.compression_utils import CompressedFile

Yes.

> Should we re-export shutil (maybe as gx_shutil) in lib/galaxy/util/compression_utils.py and import it in this file from there?

Makes sense, sure.
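The concern in this thread is that archive formats are registered in process-wide state inside `shutil`, so a custom format name only resolves if the registering module has already been imported. The sketch below illustrates that behavior with a hypothetical format name (`demozip`) and hypothetical helper names; the `register()` function stands in for what a utility module might do at import time before re-exporting `shutil` under a name such as `gx_shutil`.

```python
import os
import shutil
import zipfile

FORMAT = "demozip"  # hypothetical format name, used only for this illustration


def _make_demo_zip(base_name, base_dir, **kwargs):
    """Archiver callback: zip every file under base_dir."""
    zip_filename = base_name + ".zip"
    with zipfile.ZipFile(zip_filename, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(base_dir):
            for name in filenames:
                path = os.path.normpath(os.path.join(dirpath, name))
                zf.write(path, path)
    return zip_filename


def registered_formats():
    """Names currently known to shutil.make_archive."""
    return {name for name, _description in shutil.get_archive_formats()}


def try_archive(src_dir, out_base):
    """Return the archive path, or a 'failed: ...' string for unknown formats."""
    try:
        return shutil.make_archive(out_base, FORMAT, src_dir)
    except ValueError as exc:
        return f"failed: {exc}"


def register():
    """Register the format and hand back shutil.

    Importing shutil via a module that does this (the 'gx_shutil' idea) makes
    the dependency on the registration explicit instead of incidental.
    """
    shutil.register_archive_format(FORMAT, _make_demo_zip, description="demo zip")
    return shutil
```

Calling `try_archive` before `register()` yields a `failed: ...` message because `shutil.make_archive` raises `ValueError` for unknown formats; after `register()` the same call succeeds.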

@@ -34,6 +34,7 @@ include_package_data = True
 install_requires =
     galaxy-util
     fs
+    isal
Member

@mvdbeek Do you remember why you added the isal dependency here as well?
I don't think it's currently used?

Member Author

Ah, why is it listed in the files package? I don't know; it should be enough to have it as a dependency of the data package, I think.

Member

Yep, exactly. I'll drop it in #18449.
