For the full band (4 ch, 4 s) the initial chunking took me over 12 hours. After turning on compression and switching dir_local to a ramdisk (/dev/shm) instead of the node's local disk, I got it down to roughly 7 hours, which still seems extreme. (I did have to limit the number of chunking tasks running simultaneously per frequency band.) I've tweaked the chunk size to produce 8 chunks per band (an integer multiple of the thread limit on I/O-heavy tasks, thread_io), so it is now producing chunks of ~1.5 GB (pre-compression was 1.5 GB). The work dir is on a large shared disk.
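For concreteness, the chunk-count arithmetic described above amounts to rounding the minimum chunk count up to an integer multiple of thread_io. A minimal sketch, with illustrative numbers rather than my actual settings:

```python
import math

def num_chunks(ms_size_gb, target_chunk_gb, thread_io):
    """Smallest multiple of thread_io that keeps chunks under the target size."""
    minimum = math.ceil(ms_size_gb / target_chunk_gb)
    return math.ceil(minimum / thread_io) * thread_io

# Example (hypothetical values): a ~12 GB band, ~1.5 GB target chunks,
# thread_io = 4 -> 8 chunks, i.e. an integer multiple of thread_io.
print(num_chunks(12.0, 1.5, 4))  # 8
```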
That does seem slow. The chunking script could likely be improved quite a bit, as it does a lot of copying of columns. I'll take a look at it.
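For context, the chunking essentially selects a time range of the MS and writes it out. A hedged sketch of how one chunk could be extracted with python-casacore in a single deep copy, rather than column by column (paths and the chunk length are placeholders, not what the script actually does):

```python
from casacore.tables import table

ms = table('band.MS', readonly=True)
t0 = ms.getcell('TIME', 0)  # assumes the first row holds the earliest time
width = 3600.0              # chunk length in seconds; illustrative only

# query() returns a cheap reference table; copy(deep=True) then writes the
# whole chunk in one pass instead of copying each column separately.
sel = ms.query('TIME >= {0} && TIME < {1}'.format(t0, t0 + width))
sel.copy('band_chunk0.MS', deep=True)
sel.close()
ms.close()
```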
Another issue is that the chunking is limited to a single node, so it can't take advantage of multiple nodes of a cluster. We could get around this by making a "chunking pipeline" or perhaps by moving the whole chunking operation into the initial-subtract pipeline.
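As a rough illustration of the "chunking pipeline" idea, the per-band jobs could be fanned out over the cluster nodes instead of all running on one node. A sketch assuming passwordless ssh and a hypothetical chunk_ms.py script (neither is part of Factor as it stands):

```python
import subprocess

nodes = ['node01', 'node02', 'node03']                  # hypothetical hosts
bands = ['band{0}.MS'.format(i) for i in range(9)]      # hypothetical inputs

procs = []
for i, band in enumerate(bands):
    host = nodes[i % len(nodes)]  # simple round-robin over the nodes
    procs.append(subprocess.Popen(['ssh', host, 'python', 'chunk_ms.py', band]))

# Wait for all per-band chunking jobs to finish
for p in procs:
    p.wait()
```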
Using multiple nodes would be quite useful here. I've been using only one node for the initial subtract (deep), but 3-4 for Factor, so for me it would go faster if the chunking were pipelined in Factor.
Doing the chunking in the initial-subtract pipeline would be possible. I'm not too fond of this, because it would make the initial-subtract pipeline even more of a "Factor pipeline", but it probably already is one, so there is no real harm done.
Another question is whether the chunking part of Factor would speed up if the input data were already compressed with Dysco.
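As a side note, whether an MS is already Dysco-compressed can be checked from the storage manager of its DATA column with python-casacore; this is just a quick check, not part of Factor itself:

```python
from casacore.tables import table

ms = table('band.MS', readonly=True)   # path is a placeholder
dminfo = ms.getdminfo('DATA')          # data-manager info for the DATA column
print(dminfo['TYPE'])                  # 'DyscoStMan' if Dysco is in use
ms.close()
```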
Wendy: Is the chunking limited by CPU speed or by IO speed?
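One way to find out would be to sample CPU load and disk throughput while a chunking job runs, e.g. with psutil (iostat or dstat would work just as well). If the CPU sits near 100% the job is CPU-bound; if it idles while bytes written climb, it is I/O-bound. A minimal sketch:

```python
import psutil

for _ in range(30):  # sample once per second for 30 s while chunking runs
    cpu = psutil.cpu_percent(interval=1.0)
    io = psutil.disk_io_counters()
    print('cpu={0:5.1f}%  read={1} MB  written={2} MB'.format(
        cpu, io.read_bytes // 2**20, io.write_bytes // 2**20))
```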