On newer Illumina machines that write only a few large concatenated bcl files (*.cbcl) per cycle, capturing incremental snapshots of modified files in a run directory too often can result in a tarball that is artificially inflated in size. This occurs because multiple snapshots of each *.cbcl file are included, with each snapshot capturing the progressive changes to a *.cbcl file as the sequencer appended to it over time. Ideally, each *.cbcl file should be included only once, after it has been finalized.
This behavior can be improved by setting or overriding the value for DELAY_BETWEEN_INCREMENTS_SEC to a larger number to reduce the frequency at which snapshots are captured.
A value of DELAY_BETWEEN_INCREMENTS_SEC=600 should limit snapshots to a maximum of two per cycle (in the worst case, one in-progress capture plus one capture after each cbcl is finalized), as 600 seconds is at or above the 95th percentile of observed cycle durations.
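The "maximum of two per cycle" reasoning can be sketched as a small calculation. This is a simplified model, not part of the upload script: it assumes snapshots start at least DELAY_BETWEEN_INCREMENTS_SEC apart and that a cbcl file is being written for the whole cycle. The function name `max_snapshots_per_cycle` and the sample durations are hypothetical.

```shell
# Hypothetical helper: upper bound on the number of snapshots that can
# contain a given cycle's cbcl files.
#   $1: cycle duration in seconds (how long the cbcl is being appended to)
#   $2: delay between snapshots in seconds (DELAY_BETWEEN_INCREMENTS_SEC)
# At most ceil(duration / delay) snapshots can land while the file is still
# being written, plus one snapshot after the file has been finalized.
max_snapshots_per_cycle() {
    cycle_sec="$1"
    delay_sec="$2"
    # Integer ceiling division, then add the final (complete) capture.
    echo $(( (cycle_sec + delay_sec - 1) / delay_sec + 1 ))
}

max_snapshots_per_cycle 600 600   # cycle no longer than the delay
max_snapshots_per_cycle 600 60    # same cycle with a much shorter delay
```

Under this model, any cycle that lasts no longer than the delay is captured at most twice, which is why a delay at or above the 95th percentile of cycle durations keeps most cycles to two copies.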
An additional improvement should be possible by excluding from a snapshot any files that are currently open or that we anticipate may be changed by the sequencer in the (very) near future. Rather than inspect open file descriptors, we can exclude the paths of the most recent cycle by adding this pattern to a file:
find "$PATH_TO_RUN_FOLDER"/Data/Intensities/BaseCalls/L00*/ \
    -type d \
    -regextype posix-extended \
    -iregex '^.+\/C[0-9]+\.[0-9]$' |
  sort -r -k1,1 -V |
  head -n1 |
  sed --regexp-extended 's/(BaseCalls\/)L([0-9]+)/\1L\*/g' |
  tee recent_cycle_exclusions.txt
(Set $PATH_TO_RUN_FOLDER as appropriate, e.g. to one of the run directories within /usr/local/illumina/runs/ on a NextSeq 2000.)
...and then exclude the patterns in that file as part of the tar call by passing --exclude-from=recent_cycle_exclusions.txt
This should only be added to the call if run_is_finished is not true, so upon run completion any previously-excluded files will be swept up into the final tarball.
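A minimal sketch of that conditional, assuming `run_is_finished` holds the string `true` once the run has completed. The helper name `exclusion_args` is hypothetical, and the check that the exclusion file is non-empty is an added assumption rather than something the existing script is known to do.

```shell
# Hypothetical helper: prints the extra tar argument for an incremental
# snapshot, or nothing when the exclusions should not apply.
#   $1: "true" once the run has finished (mirrors run_is_finished)
#   $2: exclusion file (defaults to recent_cycle_exclusions.txt)
exclusion_args() {
    # Exclude the most recent cycle only while the run is still in progress,
    # and only if the exclusion file exists and is non-empty.
    if [ "$1" != "true" ] && [ -s "${2:-recent_cycle_exclusions.txt}" ]; then
        printf '%s\n' "--exclude-from=${2:-recent_cycle_exclusions.txt}"
    fi
}

# Example usage (archive name is a placeholder; the unquoted substitution
# assumes the exclusion file path contains no whitespace):
#   tar -cf snapshot.tar $(exclusion_args "$run_is_finished") -C "$PATH_TO_RUN_FOLDER" .
```

Because the function prints nothing once the run is finished, the final tar call runs without exclusions and picks up the previously skipped cycle directories.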
We could also optionally append to the exclusion file the paths of files that have changed in the past few minutes. For example:
find . -mmin -3 -type f >> recent_cycle_exclusions.txt