Tutorial for ProcessND.py

Warnings (Important please read)

ProcessND is outdated

This is a tutorial for ProcessND.py, but it is outdated. The official way of making samples is on NERSC. Please talk to #nd_production or #nd_reco_sim on Slack for more information about that.

Try to use existing samples

Generating your own samples should always be a last resort. Chances are there is already a sample that meets your needs, or that others have similar needs and a combined sample should be made that meets everyone's.

Never run large samples without permission

Running a large sample is expensive. All samples should be processed with permission from #nd_production and/or #nd_reco_sim. They should be kept in the loop in all cases. If they don't know what you're up to, you probably shouldn't be running on the grid.

Run tests before running large samples

Running a large sample and then needing to scrap it is a waste of resources. Make sure all the files are as you expect them. Unfortunately, the errors are sometimes not evident. For example, there will always be an edep file even if the genie stage crashed. This can confuse the tmsreco stage running on that edep file, because the file contains no events. The log files will also sometimes show multiple errors, because nothing checks whether the previous step completed successfully.
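
One quick sanity check before scaling up is to count the events in an edep file from a test job. A minimal sketch, assuming the edep-sim output tree is named EDepSimEvents and using a placeholder file name:

python -c "import ROOT; f = ROOT.TFile.Open('output.edep.root'); print(f.Get('EDepSimEvents').GetEntries())"  # file name is a placeholder; EDepSimEvents is the usual edep-sim tree name

If this prints 0 (or crashes), an upstream stage likely failed even though the file exists.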

AL9 will probably make these scripts even more outdated

The AL9 transition is coming soon and we are already seeing problems with these scripts, so use them at your own risk. For now, running in a Singularity container seems to work: --singularity-image /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-sl7:latest
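
If you go this route, the flag is simply added to the jobsub_submit call (a sketch only; the full examples below already include it):

jobsub_submit --group dune --role=Analysis --singularity-image /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-sl7:latest <other jobsub options> file://processnd.sh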

Please leave files somewhere where all can benefit, and document them

Make sure the files are somewhere where everyone can use them. Also make sure what is in the files is well documented, including the individual settings used to produce them.

Then why does this tutorial exist?

  1. In some cases, waiting for a NERSC sample might take too long. Especially when just rerunning the dune-tms code, the turnaround is much faster running on the grid.
  2. Some functionality (like pileup) is not yet perfectly replicated in the NERSC samples, at least for the dune-tms code.

Setup

source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
cd <dir with your local copy of ND_Production>
cd scripts # For ProcessND.py

Description of ProcessND.py

ProcessND.py takes in a bunch of parameters and then creates processnd.sh. This is the actual script that is submitted to the grid with jobsub_submit.
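
In other words, the workflow is a two-step pattern like the following sketch (the concrete options for both steps are given in the examples below):

python ProcessND.py <ProcessND.py options>   # creates processnd.sh
jobsub_submit <jobsub options> file://processnd.sh   # submits it to the grid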

jobsub_submit parameters

-N - The number of files. In the case of generation, this is the final number of files, so if each file is 1e15 POT and you run 1000 files, the total is 1e18 POT. If using an indir with existing files, -N should be set to that number of files. Each file is generated on a separate grid node.
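
For example, if you want -N to match an existing indir, you can just count its files; the path here is the indir from the TMS-only example below:

ls /pnfs/dune/persistent/users/abooth/Production/MiniProdN1p2-v1r1/run-spill-build/output/MiniProdN1p2_NDLAr_1E19_RHC.spill/EDEPSIM_SPILLS/00000/ | wc -l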

--memory=4000MB - This is a good starting point. When using overlay, this needs to be increased (see below).

--expected-lifetime=24h - 24h is probably too long, but running all stages with 1e16 POT takes a while. A TMS-only sample usually takes < 1 hr. The closer this is set to the actual run time, the higher the chance a job has of finding a node, i.e. the quicker things get processed.

--tar_file_name - This should point to the dune-tms tar file. Eventually this will be its own product but for now all the code needed to run dune-tms lives in its tar file.

tar -czvf dune-tms.tar.gz dune-tms
mv dune-tms.tar.gz /pnfs/dune/persistent/users/kleykamp/dune-tms_tarfiles/2024-04-18_add_truth_info_test.tar.gz

Please make sure to at least tag the version of dune-tms you used, so that we have some reproducibility. Here's an example corresponding to the tar file above:

git tag kleykamp_2024-04-18_add_truth_info_test
git push origin kleykamp_2024-04-18_add_truth_info_test

Ideally we all use the same release. Talk to #nd_muon_spectrometer or #nd_muon_spectrometer_code if unsure. It's unlikely this tutorial will have the latest version.

Example of running TMS-only sample

python ProcessND.py --stages tmsreco --indir /pnfs/dune/persistent/users/abooth/Production/MiniProdN1p2-v1r1/run-spill-build/output/MiniProdN1p2_NDLAr_1E19_RHC.spill/EDEPSIM_SPILLS/00000/ --outdir /pnfs/dune/scratch/users/kleykamp/nd_production/2024-04-11_add_truth_info_test

jobsub_submit --group dune --role=Analysis -N 100 --singularity-image /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-sl7:latest --expected-lifetime=24h --memory=4000MB --tar_file_name dropbox:///pnfs/dune/persistent/users/kleykamp/dune-tms_tarfiles/2024-04-11_add_truth_info_test.tar.gz file://processnd.sh/

Example of running all stages

python ProcessND.py --outdir /pnfs/dune/scratch/users/kleykamp/nd_production/2024-04-11_add_truth_info_test_with_overlay --geometry_location /pnfs/dune/persistent/physicsgroups/dunendsim/geometries/TDR_Production_geometry_v_1.0.3/nd_hall_with_lar_tms_sand_TDR_Production_geometry_v_1.0.3.gdml --manual_geometry_override nd_hall_with_lar_tms_sand_TDR_Production_geometry_v_1.0.3.gdml --topvol volDetEnclosure --pot 1e15 --stages gen+g4+tmsreco

Then run jobsub_submit as before. This adds the gen+g4 stages, which are genie (neutrino interaction simulation) and edep_sim (simulation of particles through the detector). In this case, we need to point to the correct version of the geometry, which is geometry_v_1.0.3 as of 2024-04-17 (always check that this is still true). We first point to the location on pnfs it is copied from, and then give the geometry file name. --topvol volDetEnclosure is there so that events are only simulated in LAr and TMS (and possibly SAND?); otherwise events would also be simulated in the surrounding rock.

Other ways of processing

Overlay (pileup) simulation

Add --timing spill --overlay. Also increase the memory requirement to 8000MB.
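
For example, starting from the all-stages command above, an overlay run might look like this (a sketch; the output path is just a placeholder and the other options are unchanged):

python ProcessND.py --outdir /pnfs/dune/scratch/users/kleykamp/nd_production/2024-04-11_add_truth_info_test_with_overlay --geometry_location /pnfs/dune/persistent/physicsgroups/dunendsim/geometries/TDR_Production_geometry_v_1.0.3/nd_hall_with_lar_tms_sand_TDR_Production_geometry_v_1.0.3.gdml --manual_geometry_override nd_hall_with_lar_tms_sand_TDR_Production_geometry_v_1.0.3.gdml --topvol volDetEnclosure --pot 1e15 --stages gen+g4+tmsreco --timing spill --overlay

Then use --memory=8000MB in the jobsub_submit call.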

Off axis

We can process with off-axis fluxes using the -oa (off-axis) option, which takes the number of meters off axis. This also requires --dk2nu, because only the dk2nu flux files contain the full neutrino information. The gsimple files used by default "flatten down" the dk2nu files so that they give you the flux assuming a detector at 0 m off axis, and nothing else, so they are not suitable.
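
A sketch of what this might look like, together with the geometry options from the all-stages example above (the 12 m value and the output path are purely illustrative):

python ProcessND.py --stages gen+g4+tmsreco --pot 1e15 -oa 12 --dk2nu --outdir /pnfs/dune/scratch/users/<your user>/nd_production/<off-axis sample name>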

B-field

It is possible to run with alternate B-fields, but I'm not sure how. I don't think the existing ProcessND.py handles it correctly, at least I haven't seen it validated.
