Setting up and running the code

Setup

Python virtual environments

It is generally a good idea to work in a Python virtual environment. A virtual environment provides a completely separate Python environment in which you can easily install all the packages you need, while keeping them isolated from other packages installed on your machine. This makes it easy to define the environment exactly and to prevent misconfiguration.

Instructions for setting up a virtual environment are widely available online; a generic example is sketched below, and LXPlus-specific instructions follow.
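
This sketch assumes a generic machine with Python 3 available; the environment name is arbitrary:

# Create and activate a virtual environment
python3 -m venv myenv
source myenv/bin/activate

# Install packages as needed, then leave the environment with:
deactivate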

Setup on LXPlus8

If you want to run on the CERN LXPlus8 computing cluster, you can use the instructions below to create and activate a working environment. Instructions for LXPlus7 are provided in a later section.

# Create Python3 virtual env for this project
ENVNAME="jmecoffteaenv"
python3 -m venv ${ENVNAME}

# Activate the environment & add library search path to PYTHONPATH
source ${ENVNAME}/bin/activate
export PYTHONPATH=${PWD}/${ENVNAME}/lib/python3.6/site-packages:$PYTHONPATH

# Get GRID proxy
voms-proxy-init --voms cms
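
As an optional sanity check (these commands are not part of the setup itself, just a way to confirm it worked), you can verify that the virtual environment's Python is picked up and that the proxy is valid:

# Confirm the environment is active and the proxy is valid
which python3 && python3 --version
voms-proxy-info --timeleft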

Package installation

This package can be cloned locally, built and installed via pip. The instructions are as follows:

git clone git@github.com:cms-jet/jme-cofftea.git

# Build the project
cd jme-cofftea/
python setup.py sdist bdist_wheel

# Install via pip (in editable mode)
# Make sure to execute this from the root directory of the project (i.e. /path/to/jme-cofftea)
pip install -e .
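
To confirm that the installation succeeded, something along these lines can be used (the exact distribution name, and the jexec entry point used later in this guide, are assumptions based on the repository name):

# Verify that the package is visible to pip (name assumed to match the repository)
pip list | grep -i cofftea

# If the install provides the jexec entry point used later in this guide,
# it should now be on your PATH
which jexec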

Setup on LXPlus7

If you want to run on the CERN LXPlus7 computing cluster, you can use the instructions below to create and activate a working environment:

# First, get a GRID proxy, necessary to access files on DAS
voms-proxy-init --voms cms

# Source the underlying Python3 env
source /cvmfs/sft.cern.ch/lcg/views/LCG_95apython3/x86_64-centos7-gcc8-opt/setup.sh

# Create Python3 virtual env for this project
ENVNAME="jmecoffteaenv"
python -m venv ${ENVNAME}

# Activate the environment & add library search path to PYTHONPATH
source ${ENVNAME}/bin/activate
export PYTHONPATH=${PWD}/${ENVNAME}/lib/python3.6/site-packages:$PYTHONPATH

You can leave the environment by typing deactivate. You can later reactivate it by sourcing the activate script shown above.
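
For example, with the environment name used above:

# Leave the virtual environment
deactivate

# Re-activate it later (run from the directory where the environment was created)
source jmecoffteaenv/bin/activate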

It is useful to save these commands into an env.sh script so you can conveniently set up the environment every time:

source /cvmfs/sft.cern.ch/lcg/views/LCG_95apython3/x86_64-centos7-gcc8-opt/setup.sh
ENVNAME="jmecoffteaenv"
source ${ENVNAME}/bin/activate
export PYTHONPATH=${PWD}/${ENVNAME}/lib/python3.6/site-packages:$PYTHONPATH
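
Then, at the start of each new session, the environment can be restored with a single command (note that the script uses ${PWD}, so it should be sourced from the directory where the environment was created):

# Restore the working environment in a fresh shell
source env.sh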

For installing the required packages, see the Package installation section above.

Running the HLT Processor

Just over a few files

Check out the ./scripts/run_quick.py script to see how to quickly run over a few handpicked files, which may be useful for testing:

Note: If you're accessing files from DAS (which is likely), make sure you have a valid GRID proxy; otherwise this will fail. You can check with voms-proxy-info.

cd scripts/
./run_quick.py hlt

The output will be saved in a file named hlt_${dataset}.coffea, where ${dataset} is the name of the dataset you ran over. This output file contains all the histograms saved for the trigger efficiency measurement (i.e. the numerator and denominator regions).
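
If you want to peek at what was stored, a rough sketch along the following lines can be used (assuming the file can be read back with coffea.util.load and that the accumulator is dict-like; the file name below is just a placeholder):

# Inspect the histograms stored in an output file (file name is a placeholder)
python3 - <<'EOF'
from coffea import util

acc = util.load("hlt_MyDataset.coffea")  # placeholder file name
print(list(acc.keys()))                  # names of the saved objects, if dict-like
EOF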

Using large data sets

To run the HLT analyzer over entire datasets (e.g. from DAS), the jexec command can be used. An example is shown below:

jexec hlt --datasrc das submit --dataset 'Muon.*2023.*' --name '2023-05-12_Muon_2023' --filesperjob 2 --asynchronous

Here, we use the submit directive to submit jobs to HTCondor. The following arguments are specified:

  • --dataset: A regular expression matching the names of the datasets to be submitted. Note that the regex is matched not against the full dataset name on DAS, but against a short dataset name constructed in the code.
  • --name: Name for the job. The submission files and the output .coffea files will be saved under submission/<name>.
  • --filesperjob: Defines how many files each job should analyze. There is also an --eventsperjob option, but for DAS it is recommended to use --filesperjob instead, since counting the events in each file requires remote file access, which takes time.
  • --asynchronous: Once the job files are all ready, submits them asynchronously (in parallel). Omitting this flag will result in the HTCondor jobs being submitted one by one.

Once the jobs are submitted, you can use condor_q to keep track of them. The output files will be written to submission/<name>. Under the submission/<name>/files directory, you can find the HTCondor job files and the log files for each job.
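
For example, with the job name used in the submission command above:

# Monitor the HTCondor jobs
condor_q

# Inspect the outputs and the per-job files as they appear
ls submission/2023-05-12_Muon_2023/
ls submission/2023-05-12_Muon_2023/files/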

Once all the jobs finish, the next step is to merge all the output coffea files into a single accumulator. This is explained here.

You can also run other types of processors using jexec. For example, to run customNanoProcessor, you can specify the customnano option:

jexec customnano submit --dataset 'Muon.*2023.*' --name '2023-05-12_Muon_2023' --filesperjob 2 --asynchronous