Setting up and running the code
It is generally a good idea to work in a Python virtual environment. A virtual environment provides a completely separate Python environment in which you can easily install all the packages you need, while staying isolated from other packages installed on your machine. This makes it easy to define the environment exactly and prevents misconfiguration.

Instructions for how to set up a virtual environment are all over the internet; one example is here.
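A quick, generic way to check whether a virtual environment is currently active (standard Python, not specific to this project) is:

```python
import sys

# Inside a virtual environment, sys.prefix points at the venv directory,
# while sys.base_prefix still points at the system interpreter.
# Outside a venv, the two are equal.
in_venv = sys.prefix != sys.base_prefix
print("virtualenv active:", in_venv)
```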
If you want to run on the CERN LXPlus8 computing cluster, you can use the instructions below to create and activate a working environment. Instructions for LXPlus7 are provided in a separate section further below.
```
# Create Python3 virtual env for this project
ENVNAME="jmecoffteaenv"
python3 -m venv ${ENVNAME}

# Activate the environment & add library search path to PYTHONPATH
source ${ENVNAME}/bin/activate
export PYTHONPATH=${PWD}/${ENVNAME}/lib/python3.6/site-packages:$PYTHONPATH

# Get GRID proxy
voms-proxy-init --voms cms
```
This package can be cloned locally, built, and installed via pip. The instructions are as follows:
```
git clone git@github.com:cms-jet/jme-cofftea.git

# Build the project
cd jme-cofftea/
python setup.py sdist bdist_wheel

# Install via pip (in editable mode)
# Make sure to execute this from the root directory of the project (i.e. /path/to/jme-cofftea)
pip install -e .
```
If you want to run on the CERN LXPlus7 computing cluster, you can use the instructions below to create and activate a working environment:
```
# First, get a GRID proxy, necessary to access files on DAS
voms-proxy-init --voms cms

# Source the underlying Python3 env
source /cvmfs/sft.cern.ch/lcg/views/LCG_95apython3/x86_64-centos7-gcc8-opt/setup.sh

# Create Python3 virtual env for this project
ENVNAME="jmecoffteaenv"
python -m venv ${ENVNAME}

# Activate the environment & add library search path to PYTHONPATH
source ${ENVNAME}/bin/activate
export PYTHONPATH=${PWD}/${ENVNAME}/lib/python3.6/site-packages:$PYTHONPATH
```
You can leave the environment by typing `deactivate`. You can later reactivate it by sourcing the `activate` script shown above. It is useful to save these commands into an `env.sh` script to conveniently set up the environment every time:
```
source /cvmfs/sft.cern.ch/lcg/views/LCG_95apython3/x86_64-centos7-gcc8-opt/setup.sh
ENVNAME="jmecoffteaenv"
source ${ENVNAME}/bin/activate
export PYTHONPATH=${PWD}/${ENVNAME}/lib/python3.6/site-packages:$PYTHONPATH
```
For installing the required packages, see the Package Installation section above.
Check out the `./scripts/run_quick.py` script to see how to quickly run over a few handpicked files, which may be useful for testing.

Note: If you're accessing files from DAS (which is likely), make sure you have a valid GRID proxy, otherwise this will fail. You can check your proxy with `voms-proxy-info`.
```
cd scripts/
./run_quick.py hlt
```
The output will be saved in a file named `hlt_${dataset}.coffea`, where `${dataset}` is the name of the dataset you ran over. This output file contains all the histograms saved for the trigger efficiency measurement (i.e. the numerator and denominator regions).
To run the HLT analyzer over entire datasets (e.g. from DAS), the `jexec` command can be used. An example is shown below:
```
jexec hlt --datasrc das submit --dataset 'Muon.*2023.*' --name '2023-05-12_Muon_2023' --filesperjob 2 --asynchronous
```
Here, we use the `submit` directive to submit jobs to HTCondor. The following arguments are specified:
- `--dataset`: A regular expression matching the names of the datasets to be submitted. Note that the regex is matched not against the full dataset name on DAS, but against a short name constructed here.
- `--name`: Name for the job. The submission files and the output `.coffea` files will be saved under `submission/<name>`.
- `--filesperjob`: Defines how many files each job should analyze. Note that there is also an `--eventsperjob` option, though for DAS it is recommended to use `--filesperjob` instead, because remote file access takes time (to count events per file).
- `--asynchronous`: Once the job files are all ready, submits them asynchronously (in parallel). Omitting this flag will result in HTCondor jobs being submitted one by one.
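The `--dataset` regex matching can be sketched as follows; the short dataset names below are hypothetical stand-ins for the names the framework constructs:

```python
import re

# Hypothetical short dataset names, standing in for the names
# constructed by the framework from the full DAS dataset names
short_names = ["Muon0_2023B", "Muon1_2023C", "JetMET_2023B"]

# Same pattern as in the jexec example above
pattern = re.compile("Muon.*2023.*")
selected = [name for name in short_names if pattern.match(name)]
print(selected)  # ['Muon0_2023B', 'Muon1_2023C']
```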
Once the jobs are submitted, you can use `condor_q` to keep track of them. The output files will be in `submission/<name>`. Under the `submission/<name>/files` directory, you can find the HTCondor job files and log files for each job.
Once all the jobs finish, the next step is to merge all the output `coffea` files into a single accumulator. This is explained here.
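Conceptually, the merge adds the per-job histograms bin by bin. The toy sketch below illustrates this semantics with plain-Python `Counter`s and made-up bin contents; it is not the actual coffea API:

```python
# Toy illustration of accumulator merging (NOT the coffea API):
# each job's output maps histogram names to binned counts, and
# merging adds the counts bin by bin.
from collections import Counter

# Hypothetical outputs from two jobs
job_outputs = [
    {"numerator": Counter({"pt_bin0": 3, "pt_bin1": 5}),
     "denominator": Counter({"pt_bin0": 10, "pt_bin1": 12})},
    {"numerator": Counter({"pt_bin0": 2, "pt_bin1": 4}),
     "denominator": Counter({"pt_bin0": 9, "pt_bin1": 11})},
]

merged = {}
for out in job_outputs:
    for name, hist in out.items():
        merged[name] = merged.get(name, Counter()) + hist

print(merged["numerator"]["pt_bin0"])    # 5
print(merged["denominator"]["pt_bin1"])  # 23
```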
You can also run other types of processors using `jexec`. For example, to run `customNanoProcessor`, you can specify the `customnano` option:

```
jexec customnano submit --dataset 'Muon.*2023.*' --name '2023-05-12_Muon_2023' --filesperjob 2 --asynchronous
```