Checking File Generation Scripts Work
NOTE: This procedure was only used in the DUNE DC4 data challenge in 2022. We have kept the scripts in case we ever need them again, but we haven't used them since. By the time the data challenge was done we had the scripts on all four machines.
Three of the four np04-srv-xxx machines are set up via crontab to generate sets of data files at certain intervals using shell scripts. These shell scripts all run as the "np04daq" user and live in ~np04daq/dc4/bin. The initial data samples live in /data0/dc4/sample on the respective machines. In general, three scripts run on each machine, described in turn below.
The first script (createDataFile.sh, or its per-machine variant) prepends a prefix and a timestamp to each original file name in /data0/dc4/sample and copies the file to /data0/dc4, so that all file names are unique. Each run of this script effectively creates a new faux run number.
This script will not run unless a lock file has been touched in /tmp. Note that /tmp is cleaned up on these machines about once a week on average.
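The copy step described above can be sketched in Python as follows. This is only an illustration, not the actual shell script: the function name, the lock file path, the prefix, and the timestamp format are all assumptions made for the example.

```python
import shutil
import time
from pathlib import Path


def clone_sample(sample_dir, out_dir, lock_file, prefix, now=None):
    """Copy every sample file to out_dir under a unique, timestamped name.

    Returns the list of new paths, or an empty list if the lock file is absent.
    The naming scheme (prefix_timestamp_originalname) is hypothetical.
    """
    if not Path(lock_file).exists():
        # No lock file: do nothing, mirroring the behaviour described above.
        return []
    # One timestamp per invocation, so each run looks like a new faux run.
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(Path(sample_dir).iterdir()):
        if src.is_file():
            dest = out / f"{prefix}_{stamp}_{src.name}"
            shutil.copy2(src, dest)  # a real copy, not a symlink
            created.append(dest)
    return created
```

A call such as `clone_sample("/data0/dc4/sample", "/data0/dc4", "/tmp/dc4.lock", "dc4")` would then produce one uniquely named copy per sample file, where `/tmp/dc4.lock` stands in for whatever lock file name the real script checks.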
The second script (createMetadataFile_dc4.sh, or its per-machine variant) is a modified version of Kurt Biery's script. It makes a rudimentary metadata file for each of the data files created above. A variant is needed for each data type because the metadata fields differ slightly.
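As a rough illustration of what writing a rudimentary metadata file involves, here is a minimal Python sketch. It is not Kurt Biery's script or the DC4 variant of it: the field names (`file_name`, `file_size`, `data_tier`, `runs`) and the `<data file>.json` naming are assumptions chosen for the example, not the real schema.

```python
import json
from pathlib import Path


def write_metadata(data_file, data_tier="raw", run_number=None):
    """Write <data_file>.json next to data_file and return its path."""
    data_path = Path(data_file)
    metadata = {
        # Field names here are illustrative, not the real DC4 schema.
        "file_name": data_path.name,
        "file_size": data_path.stat().st_size,
        "data_tier": data_tier,
    }
    if run_number is not None:
        metadata["runs"] = [run_number]
    # Hypothetical convention: metadata lives beside the data file.
    json_path = data_path.with_name(data_path.name + ".json")
    json_path.write_text(json.dumps(metadata, indent=2))
    return json_path
```

Per-data-type variants would differ mainly in which fields go into the dictionary.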
Once the JSON file with the metadata is present in /data0/dc4, the ingest daemon sees it, arranges to copy the data file and the JSON file to public EOS, and renames the files to *.copied.
The third script runs once an hour to remove all the *.copied files.
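The hourly cleanup amounts to deleting everything that matches *.copied in the output directory. A minimal Python sketch of that step, with a hypothetical function name and assuming the files sit directly in the directory rather than in subdirectories:

```python
from pathlib import Path


def remove_copied(directory):
    """Delete every *.copied file in directory and return the names removed."""
    removed = []
    for path in sorted(Path(directory).glob("*.copied")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Returning the removed names makes it easy to log what each hourly pass cleaned up.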
np04-srv-002: Data files are np02_bde_coldbox_run012352*.hdf5. This is run 12352 from the np02 bottom drift electronics cold box. 60 files x 4 GB each.
Scripts run in crontab are createDataFile.sh and createMetadataFile_dc4.sh.
np04-srv-003: Data files are of the form 455_*_cb.test. These are from run 455 of the np02 coldbox top drift electronics. They are raw binary files from the legacy np02 DAQ system. 81 files x 3 GB each.
Scripts run in crontab are createDataFile_top.sh and createMetadataFile_dc4_top.sh.
np04-srv-001 (will move to np04-srv-004 once it is back): Data files are of the form bc38ee1a-3092-441c-9b37-4c106ae5cf48-gen_protodunehd_1GeV_56895279_0_g4_detsim.root and are detsim files of the ProtoDUNE II HD detector. There are 48 unique files in the sample, each about 1.3 GB, each cloned 4x to make a total sample of ~240 GB.
Scripts run in crontab are createDataFile_hd.sh and createMetadataFile_dc4_hd.sh.
If all these scripts are running successfully, you should see data files being copied into /data0/dc4, then JSON files appearing, and then the files showing up as *.copied.
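One way to check this at a glance is to count the files in the output directory by stage. The sketch below is a hypothetical helper, not one of the DC4 scripts. It assumes that a metadata file is named after its data file with `.json` appended; adjust the matching if the real naming differs.

```python
from collections import Counter
from pathlib import Path


def summarize(directory):
    """Count files in directory by pipeline stage."""
    counts = Counter()
    names = {p.name for p in Path(directory).iterdir() if p.is_file()}
    for name in names:
        if name.endswith(".copied"):
            counts["copied"] += 1          # already handled by the ingest daemon
        elif name.endswith(".json"):
            counts["metadata"] += 1        # metadata file waiting for ingest
        elif name + ".json" in names:
            counts["data_with_metadata"] += 1
        else:
            counts["data_awaiting_metadata"] += 1
    return dict(counts)
```

Running `summarize("/data0/dc4")` a few minutes apart should show files moving from `data_awaiting_metadata` towards `copied`. A count that stays stuck in one stage points at the script or daemon responsible for the next one.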
These scripts currently run 4x an hour, generating in aggregate 750 GB every time they run and 100 TB over the course of a 24-hour day. Because we are actually copying files rather than symlinking them, this puts a significant and noticeable I/O load on the machines. We have made directories /data1/dc4, /data2/dc4, and /data3/dc4 to spread out the load. Eventually, to reach the full rate, we will have to either add another machine or switch to symlinking rather than copying files.
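As a rough cross-check, the aggregate volume can be recomputed from the per-machine sample sizes listed above. The calculation assumes that every run copies each machine's full sample exactly once and uses decimal units (1 TB = 1000 GB). Under those assumptions the total comes out slightly below 750 GB per run and at roughly 70 TB per day, so the quoted figures are best read as approximate.

```python
# Sample sizes quoted on this page, in GB (file count x size per file).
samples_gb = {
    "np04-srv-002": 60 * 4,
    "np04-srv-003": 81 * 3,
    "np04-srv-001": 240,  # stated total for the cloned detsim sample
}

RUNS_PER_HOUR = 4  # "4x an hour", as stated above

# Assumption: each run copies the whole sample on every machine once.
per_run_gb = sum(samples_gb.values())
per_day_tb = per_run_gb * RUNS_PER_HOUR * 24 / 1000

print(f"per run: {per_run_gb} GB")      # per run: 723 GB
print(f"per day: {per_day_tb:.1f} TB")  # per day: 69.4 TB
```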
We are working on getting all the various data challenge operators access to the np04daq account. We don't have it yet.