Speedup packing #45
The packaging step was very inefficient, especially for metagenomes that have a lot of MAGs. This was because the process was serialized and the pass for each MAG had to read through the input files over and over. This changes it so each file is opened and read once and the output is multiplexed to the various output files. It also parallelizes the tar file generation. For one test case the previous way was taking 6-7 hours and it now runs in a few minutes.
- Fix KO naming
- Clean up log output a bit
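For reviewers who want the idea in miniature, here is a rough sketch of the single-pass multiplexing and parallel tar generation described above. It is not the code in this PR: the function names (`multiplex_file`, `make_tar`, `package_bins`), the tab-separated input with a contig ID in the first column, the `contig_to_bin` mapping, and the per-bin directory layout are all assumptions made for illustration. A thread pool is used here for simplicity; a process pool may suit compression-heavy workloads better.

```python
import tarfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import ExitStack
from pathlib import Path


def multiplex_file(in_path, contig_to_bin, out_dir, suffix):
    """Read in_path once and route each line to the output file of its bin.

    Assumption: lines are tab-separated and the first column is a contig ID.
    """
    out_dir = Path(out_dir)
    handles = {}
    with ExitStack() as stack:  # closes every opened file on exit
        src = stack.enter_context(open(in_path))
        for line in src:
            contig = line.split("\t", 1)[0].rstrip("\n")
            bin_id = contig_to_bin.get(contig)
            if bin_id is None:
                continue  # contig is not assigned to any bin
            if bin_id not in handles:
                bin_dir = out_dir / bin_id
                bin_dir.mkdir(parents=True, exist_ok=True)
                handles[bin_id] = stack.enter_context(
                    open(bin_dir / f"{bin_id}{suffix}", "w")
                )
            handles[bin_id].write(line)
    return sorted(handles)


def make_tar(bin_dir):
    """Create <bin_dir>.tar.gz next to the bin directory."""
    bin_dir = Path(bin_dir)
    tar_path = bin_dir.parent / (bin_dir.name + ".tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(bin_dir, arcname=bin_dir.name)
    return tar_path


def package_bins(out_dir, bin_ids, workers=4):
    """Build one tar file per bin, several at a time."""
    bin_dirs = [Path(out_dir) / b for b in bin_ids]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(make_tar, bin_dirs))
```

One caveat with this sketch: it keeps one open file handle per bin, so a metagenome with a very large number of MAGs could run into the per-process open file limit.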
There are a few errors with undefined variables that need to be addressed.
I think all the comments have been addressed.
I think we need to update the version to 0.6.0 in Dockerfile_vis and rebuild it. In addition, the version in the docker string in the main WDL file, mbin_nmdc.wdl line 25, needs to be updated for this merge.
Let's merge this, then we can do as @chienchi suggested to rebuild the image and then update the WDL.
These changes are actually from me.