Compression is one of the bigger bottlenecks of the pipeline right now. 4mc is nearly 20x faster than bzip2 or bgzip and may offer a reasonable trade-off between compression delay and transfer speed.
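As a rough sanity check of that claim, here is a minimal sketch that compares single-threaded throughput of gzip (DEFLATE, which bgzip uses internally) against plain LZ4 block compression (the algorithm 4mc builds on). It assumes the lz4-java dependency (`net.jpountz.lz4`) is on the classpath and uses a synthetic DNA-like payload, so the numbers are only an order-of-magnitude hint, not a benchmark of 4mc or of the pipeline itself:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

import net.jpountz.lz4.LZ4BlockOutputStream;

// Rough single-threaded comparison of gzip (DEFLATE, as used by bgzip) vs.
// plain LZ4 block compression (the algorithm 4mc builds on). Synthetic
// DNA-like payload and in-memory sink: treat the result only as an
// order-of-magnitude hint, not a benchmark of 4mc or of the pipeline.
public class CompressionThroughputSketch {

    interface StreamFactory {
        OutputStream open() throws Exception;
    }

    public static void main(String[] args) throws Exception {
        byte[] alphabet = "ACGTN".getBytes();
        byte[] data = new byte[64 * 1024 * 1024];       // 64 MiB of fake sequence
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) {
            data[i] = alphabet[rnd.nextInt(alphabet.length)];
        }

        time("gzip", () -> new GZIPOutputStream(new ByteArrayOutputStream()), data);
        time("lz4 ", () -> new LZ4BlockOutputStream(new ByteArrayOutputStream()), data);
    }

    static void time(String label, StreamFactory factory, byte[] data) throws Exception {
        long start = System.nanoTime();
        try (OutputStream out = factory.open()) {
            out.write(data);                            // compress the whole buffer
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%s %.1f MiB/s%n", label, (data.length / (1024.0 * 1024.0)) / seconds);
    }
}
```

Closing the stream inside the timed region is deliberate so the final flush is counted; for the real pipeline the same comparison would be run against actual FASTQ input rather than synthetic data.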
If we are going to use ReadTools, the improvement may not be that large. As we discussed in the ReadTools repository, on-the-fly compression might not be the bottleneck, and it should be profiled properly (there are Java tools for that, such as https://www.ej-technologies.com/products/jprofiler/overview.html, that can help locate where the slow hotspot is).
Another option is to benchmark some uploads/downloads using ReadTools with different compression settings (running each several times and taking the average, maximum and minimum). I am still not sure that compression is the major bottleneck: before on-the-fly upload existed, the pipeline took even longer because it compressed locally (adding I/O overhead and disk usage on the local filesystem) and then uploaded with hdfs (network bottleneck and I/O in HDFS). The improvement was huge, but it might be that compression is now the limiting factor (there is going to be a limit to the improvement at some point).
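A minimal sketch of that kind of timing harness is below, assuming a hypothetical ReadTools invocation: the tool name, flags and paths are placeholders to be replaced by the real upload command and the codec under test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;

// Minimal wall-clock benchmark harness: runs a command several times and
// reports min / max / average runtime. The ReadTools command line below is
// only a placeholder -- substitute the actual upload invocation and the
// compression codec under test.
public class UploadBenchmark {

    public static void main(String[] args) throws Exception {
        List<String> command = Arrays.asList(
                "java", "-jar", "ReadTools.jar",        // placeholder invocation
                "ReadsToDistmap",                        // hypothetical tool name, adjust as needed
                "--input", "sample.fastq.gz",            // placeholder flags and paths
                "--output", "hdfs://namenode/path/sample.4mc");

        int repeats = 5;
        List<Long> runtimesMs = new ArrayList<>();
        for (int i = 0; i < repeats; i++) {
            long start = System.nanoTime();
            Process p = new ProcessBuilder(command).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IllegalStateException("run " + i + " failed");
            }
            runtimesMs.add((System.nanoTime() - start) / 1_000_000);
        }

        LongSummaryStatistics stats = runtimesMs.stream()
                .mapToLong(Long::longValue)
                .summaryStatistics();
        System.out.printf("min=%d ms  max=%d ms  avg=%.0f ms%n",
                stats.getMin(), stats.getMax(), stats.getAverage());
    }
}
```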
If people are complaining about speed, they should have been at the institute 3 years ago! That's one of the reasons ReadTools has Distmap support! Hahaha