Top quark EFT analyses using the Coffea framework
The `topeft/topeft` directory is set up to be installed as a pip installable package.

- `topeft/topeft`: A package containing modules and files that will be installed into the environment.
- `topeft/setup.py`: File for installing the `topeft` package.
- `topeft/analysis`: Subfolders with different analyses or studies.
- `topeft/tests`: Scripts for testing the code with `pytest`. For additional details, please see the README in the `tests` directory.
- `topeft/input_samples`: Configuration files that point to root files to process.
If conda is not already available, download and install it:
```
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh > conda-install.sh
bash conda-install.sh
```
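If the installer's shell setup has not taken effect yet (e.g. you are still in the same terminal), you can load conda manually. The `~/miniconda3` prefix below is the installer's default, so adjust it if you chose a different location:

```
# Make conda available in the current shell (assumes the default ~/miniconda3 prefix)
source ~/miniconda3/etc/profile.d/conda.sh
```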
The topeft directory is set up to be installed as a python package. First clone the repository as shown, then run the following commands to set up the environment (note that `environment.yml` is a file that is part of the `topeft` repository, so you should `cd` into `topeft` before running the command):
```
git clone https://github.com/TopEFT/topeft.git
cd topeft
unset PYTHONPATH # To avoid conflicts.
conda env create -f environment.yml
conda activate coffea-env
pip install -e .
```
The `-e` option installs the project in editable mode (i.e. setuptools "develop mode"). If you wish to uninstall the package, you can do so by running `pip uninstall topeft`.
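A quick way to confirm the editable install points at your checkout is the standard `pip show` command:

```
# For an editable install, "Location" should point into your local topeft clone
pip show topeft
```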
The `topcoffea` package upon which this analysis also depends is not yet available on PyPI, so we need to clone the `topcoffea` repo and install it ourselves.
```
cd /your/favorite/directory
git clone https://github.com/TopEFT/topcoffea.git
cd topcoffea
pip install -e .
```
Now all of the dependencies have been installed and the `topeft` repository is ready to be used. The next time you want to use it, all you have to do is activate the environment via `conda activate coffea-env`.
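As a quick sanity check that both packages are importable from your checkouts (assuming they import under their repo names, `topeft` and `topcoffea`):

```
# Both printed paths should point into your local clones
python -c "import topeft, topcoffea; print(topeft.__file__, topcoffea.__file__)"
```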
First `cd` into `analysis/topeft_run2` and run the `run_analysis.py` script, passing it the path to your config file or json file. In this example we'll process a single root file locally, using a json file that is already set up.
```
cd analysis/topeft_run2
wget -nc http://www.crc.nd.edu/~kmohrman/files/root_files/for_ci/ttHJet_UL17_R1B14_NAOD-00000_10194_NDSkim.root
python run_analysis.py ../../input_samples/sample_jsons/test_samples/UL17_private_ttH_for_CI.json -x futures
```
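The json file tells the processor which root files to run over, along with sample metadata. To see what it contains before running, you can pretty-print it with python's built-in `json.tool`:

```
# Pretty-print the sample config to inspect its schema and input root files
python -m json.tool ../../input_samples/sample_jsons/test_samples/UL17_private_ttH_for_CI.json
```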
To make use of distributed resources, the `work_queue` executor can be used. To use the work queue executor, just change the executor option to `-x work_queue` and run the run script as before. Next, you will need to request some workers to execute the tasks on the distributed resources. Please note that the workers must be submitted from the same environment that you are running the run script from (so this will usually mean you want to activate the env in another terminal, and run the `condor_submit_workers` command from there). Here is an example `condor_submit_workers` command (remembering to activate the env prior to running the command):
```
conda activate coffea-env
condor_submit_workers -M ${USER}-workqueue-coffea -t 900 --cores 12 --memory 48000 --disk 100000 10
```
The workers will terminate themselves after 15 minutes of inactivity. More details on the work queue executor can be found here.
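To confirm the worker jobs actually made it into the batch queue, the standard HTCondor listing is enough:

```
# Workers show up as ordinary condor jobs under your username
condor_q
```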
If you would like to push changes to the repo, please make a branch and open a PR and ensure that the CI passes. Note that if you are developing on a fork, the CodeCov CI will fail. Also note that if your branch gets out of date as other PRs are merged into the master branch, you may need to merge those changes into your branch and fix any conflicts prior to your PR being merged, as sketched below.
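A minimal way to bring your branch up to date with `master` (standard git; adjust the remote or branch names if yours differ):

```
# From your feature branch: fetch the latest master and merge it in
git fetch origin
git merge origin/master
# resolve any conflicts and commit the merge, then update your PR
git push
```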
If your branch changes anything that is expected to cause the yields to change, please run the following to update the reference yields:
```
cd analysis/topEFT/
sh remake_ci_ref_yields.sh
sh remake_ci_ref_datacard.sh
```
The first script remakes the reference `json` file for the yields, and the second remakes the reference `txt` file for the datacard maker. If you are sure these changes are expected, commit and push them to the PR.
To install `pytest` for local testing, run:

```
conda install -c conda-forge pytest pytest-cov
```

where `pytest-cov` is only used if you want to locally check the code coverage.
The `pytest` commands are run automatically in the CI. If you would like to run them locally, you can simply run:

```
pytest
```

from the main `topeft` directory. This will run all the tests, which will take ~20 minutes. To run a subset, use e.g.:

```
pytest -k test_futures
```
where `test_futures` is the file/test you would like to run (check the `tests` directory for all the available tests, or write your own and push it!). If you would also like to see how the coverage changes, you can add `--cov=./ --cov-report=html` to the `pytest` commands. This will create an `html` directory that you can then copy to any folder which you have web access to (e.g. `~/www/` on Earth). For a better printout of what passed and failed, add `-rP` to the `pytest` commands, as in the example below.
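Putting those options together, a local run of a single test with coverage and verbose pass/fail reporting looks like:

```
# Runs only test_futures, writes an html coverage report, and prints passed tests too
pytest -k test_futures --cov=./ --cov-report=html -rP
```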
The v0.5 tag was used to produce the results in the TOP-22-006 paper.
1. Run the processor to obtain the histograms (from the skimmed naod files). Use the `fullR2_run.sh` script in the `analysis/topEFT` directory.

   ```
   time source fullR2_run.sh
   ```

2. Run the datacard maker to obtain the cards and templates (from the pickled histogram file produced in Step 1; be sure to use the version with the nonprompt estimation, i.e. the one with `_np` appended to the name you specified for the `OUT_NAME` in `fullR2_run.sh`).

   ```
   time python make_cards.py /path/to/your/examplename_np.pkl.gz -C --do-nuisance --var-lst lj0pt ptz -d /scratch365/you/somedir --unblind --do-mc-stat
   ```

3. Run the post-processing checks on the cards to look for any unexpected errors in the condor logs and to grab the right set of ptz and lj0pt templates and cards used in TOP-22-006. The script will copy the relevant cards/templates to a directory called `ptz-lj0pt_withSys` that it makes inside of the directory you pass that points to the cards and templates made in Step 2. This `ptz-lj0pt_withSys` is the directory that can be copied to wherever you plan to run the `combine` steps (e.g. PSI); see the copy sketch after this list.

   ```
   time python datacards_post_processing.py /scratch365/you/somedir -c -s
   ```

4. Check the yields with the `get_datacard_yields.py` script. This script will read the datacards in the directory produced in Step 3 and will dump the SM yields (summed over jet bins) to the screen (the text is formatted as a latex table). Use the `--unblind` option if you want to also see the data numbers.

   ```
   python get_datacard_yields.py /scratch365/you/somedir/ptz-lj0pt_withSys/ --unblind
   ```

5. Proceed to the "Steps for reproducing the official TOP-22-006 workspace" listed in the EFTFit README. Remember that in addition to the cards and templates, you will also need the `selectedWCs.txt` file.
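For the hand-off between Steps 3 and 5, a plain recursive copy of the post-processed directory (plus `selectedWCs.txt`) is all that is needed; the host and destination path below are placeholders:

```
# Copy the cards/templates (and selectedWCs.txt) to wherever combine will run
scp -r /scratch365/you/somedir/ptz-lj0pt_withSys you@remote-host:/path/to/combine/area/
```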