Skip to content

Commit

Permalink
changes in inputs.json and run sequence.sh adding also dir runImpacts…
Browse files Browse the repository at this point in the history
…ALT_0M_ALT_0M
  • Loading branch information
fderiggi committed Mar 28, 2024
2 parents 1ce8b56 + 1f5e2a6 commit 9935bbd
Show file tree
Hide file tree
Showing 12 changed files with 719 additions and 11 deletions.
57 changes: 57 additions & 0 deletions Combine/Checks/Bias_in_significance/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
# Bias in significance

Here we provide a script to perform the bias studies developed for the HIG-22-012 analysis.

For extremely low stat categories the nominal bias studies can fail as the fitted PDF goes negative for a large fraction of the toys, unless we truncate r at 0 which then means we get a truncated pull distribution.

For HIG-22-012 we developed a method which looks at the **bias in significance**. In short, the method uses the significance as a metric, and compares the Type-1 error rate (fraction of toys with significance greater than some threshold) for fixed (best-fit) and floating PDF indices, where the toys are generated with the best-fit index. If the curves differ significantly between these two choices then it indicates some bias in the method. The plot also compares the Type-1 error rate to the asymptotic approximation curve, indicating whether the analysis will need to use the `HybridNew` method in the limit setting. More information is provided in `AN2022_074_v7` (section 9.5), which is attached to the HIG-22-012 CADI.

A description of what the plots show is provided in the following slide:
![](plot_explanation.png)

## Inputs
You can run the study for the full workspace or for each category individually.
To prepare individual category workspaces you can use the existing combineCards functionality, for example:
```
combineCards.py Datacard.txt --ic cat_name > Datacard_cat_name.txt
```

This creates a .txt datacard with only categories matching the reg exp `cat_name` included.
Be careful: you will probably have to manually delete some pdfindex lines at the bottom;
the script does not know that these correspond to the analysis categories,
and therefore will leave them all in (you only want the one corresponding to the category you are studying).

Once that is done, you can run your usual `text2workspace` command to generate the `-d, --datacard` input for this script.

For the full workspace, just use the compiled datacard with all categories as input.

## Usage

The script is split into four different stages. You can configure the number of toys with the `--nToys` option. If the run-time is too long you could consider creating a stat-only workspace e.g. remove all systematics (not including the PDF indices) from the txt datacard and recompile. The bias studies are probably fine to do stat-only (for stat dominated analyses). If still too long then you will want to parallelize into separate jobs and hadd the fit results at the end. An example on how to do this for the SGE batch submission system is provided in the `RunBiasInSignificance_hig22012.py` script.

### Setup
Do an initial fit to the workspace, fixing the signal strength (r) to zero. Save the values of the best-fit pdf indices. Combine requires at least one parameter to fit which you can define with the `--initial-fit-param` option. You can set this to any parameter in the workspace (ideally a nuisance a low impact). The default is a parameter named `lumi_13TeV_uncorrelated_2016`.
```
python RunBiasInSignificance.py --inputWSFile Datacard.root --MH 125.38 --mode setup
```

### Generate toys
```
python RunBiasInSignificance.py --inputWSFile Datacard.root --MH 125.38 --nToys N --mode generate
```
### Fit with the PDF indices fixed
Extract significance for each toy, where the PDF is fixed to the generator PDF (best-fit from setup step).
```
python RunBiasInSignificance.py --inputWSFile Datacard.root --MH 125.38 --nToys N --mode fixed
```

### Fit with the PDF indices floating
Extract significance for each toy, where the PDF index if floating
```
python RunBiasInSignificance.py --inputWSFile Datacard.root --MH 125.38 --nToys N --mode envelope
```

## Plotting the bias in significance results
```
python SummaryBiasSignificance.py
```
80 changes: 80 additions & 0 deletions Combine/Checks/Bias_in_significance/RunBiasInSignificance.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
import ROOT
import os
import glob
import re
from optparse import OptionParser
import subprocess
import json

def rooiter(x):
iter = x.iterator()
ret = iter.Next()
while ret:
yield ret
ret = iter.Next()

def get_options():
parser = OptionParser()
parser.add_option('--inputWSFile', dest='inputWSFile', default='Datacard.root', help="Input workspace")
parser.add_option('--MH', dest='MH', default='125.38', help="MH")
parser.add_option('--initial-fit-param', dest='initial_fit_param', default='lumi_13TeV_uncorrelated_2016', help="Initial fit parameter (combine must have an input parameter to fit to, pick any low impact nuisance)")
parser.add_option('--nToys', dest='nToys', default=2000, type='int', help="Number of toys")
parser.add_option('--mode', dest='mode', default="setup", help="[setup,generate,fixed,envelope]")
return parser.parse_args()
(opt,args) = get_options()

if opt.mode == "setup":

# Get list of pdfindeices
f = ROOT.TFile(opt.inputWSFile)
w = f.Get("w")
cats = w.allCats()
pdf_index = []
for cat in rooiter(cats):
if "pdfindex" in cat.GetName(): pdf_index.append(cat.GetName())
f.Close()

# Initial fit fixing params to be zero
cmd = "combine -m %s -d %s -M MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s,r=0 --freezeParameters MH,r -P %s -n _initial --saveWorkspace --saveSpecifiedIndex %s --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2; cd .."%(opt.MH,opt.inputWSFile,opt.MH,opt.initial_fit_param,",".join(pdf_index))
print(cmd)
os.system(cmd)

# Save best fit pdf indices to json file
f_res = ROOT.TFile("higgsCombine_initial.MultiDimFit.mH%s.root"%opt.MH)
t = f_res.Get("limit")
t.GetEntry(0)
pdf_index_bf = {}
for index in pdf_index: pdf_index_bf[index] = getattr(t,index)
f_res.Close()
with open("pdfindex.json","w") as jf:
json.dump(pdf_index_bf, jf)

if opt.mode == "generate":

cmd = "combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M GenerateOnly --setParameters MH=%s --freezeParameters MH --expectSignal 0 -n _toy_ --saveToys --snapshotName MultiDimFit -t %s -s -1\n\n"%(opt.MH,opt.MH,opt.MH,opt.nToys)
print(cmd)
os.system(cmd)

cmd = "mv higgsCombine_toy_* toys.root"
os.system(cmd)

if opt.mode == "fixed":

# Get pdf index and the best fit values
with open("pdfindex.json", "r") as jf:
pdf_index_bf = json.load(jf)

pdf_index_freeze = ",".join(pdf_index_bf.keys())
pdf_index_set = ""
for k,v in pdf_index_bf.items(): pdf_index_set += "%s=%s,"%(k,v)
pdf_index_set = pdf_index_set[:-1]

cmd = "combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M Significance --snapshotName MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s,%s --expectSignal 0 --freezeParameters MH,%s -n _fixed_ -t %s --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2 --toysFile toys.root --X-rtd ADDNLL_RECURSIVE=1; mv higgsCombine_fixed_* fit_fixed.root"%(opt.MH,opt.MH,opt.MH,pdf_index_set,pdf_index_freeze,opt.nToys)
print(cmd)
os.system(cmd)

if opt.mode == "envelope":

cmd = "combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M Significance --snapshotName MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s --expectSignal 0 --freezeParameters MH -n _envelope_ -t %s --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2 --toysFile toys.root --X-rtd ADDNLL_RECURSIVE=1; mv higgsCombine_envelope_* fit_envelope.root"%(opt.MH,opt.MH,opt.MH,opt.nToys)
print(cmd)
os.system(cmd)
193 changes: 193 additions & 0 deletions Combine/Checks/Bias_in_significance/RunBiasInSignificance_hig22012.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
import ROOT
import os
import glob
import re
from optparse import OptionParser
import subprocess
import json

def rooiter(x):
iter = x.iterator()
ret = iter.Next()
while ret:
yield ret
ret = iter.Next()

def write_preamble(_file):
_file.write("#!/bin/bash\n")
_file.write("ulimit -s unlimited\n")
_file.write("set -e\n")
_file.write("cd %s/src\n"%os.environ['CMSSW_BASE'])
_file.write("export SCRAM_ARCH=%s\n"%os.environ['SCRAM_ARCH'])
_file.write("source /cvmfs/cms.cern.ch/cmsset_default.sh\n")
_file.write("eval `scramv1 runtime -sh`\n")
_file.write("cd %s\n"%os.environ['PWD'])

def get_options():
parser = OptionParser()
parser.add_option('--MH', dest='MH', default='125.38', help="MH")
parser.add_option('--initial-fit-param', dest='initial_fit_param', default='lumi_13TeV_2016', help="Initial fit parameters")
parser.add_option('--nToys', dest='nToys', default=2000, type='int', help="Number of toys")
parser.add_option('--nJobs', dest='nJobs', default=10, type='int', help="Number of jobs")
parser.add_option('--mode', dest='mode', default="setup", help="[setup,generate,fixed,envelope,hadd]")
return parser.parse_args()
(opt,args) = get_options()

if opt.mode == "setup":

os.system("cp %s/Combine/Datacard/Datacard_ggtt_resonant_mx%smy%s.txt %s"%(opt.inputDir,opt.MX,opt.MY,run_dir))

if not os.path.isdir("Models%s"%ext_str):
os.system("cp -rp %s/Combine/Models ./Models%s"%(opt.inputDir,ext_str))

# Make stat only datacard and delete pdf indices
with open("%s/Datacard_ggtt_resonant_mx%smy%s.txt"%(run_dir,opt.MX,opt.MY), "r") as f:
lines = f.readlines()

new_lines_all = []
for line in lines:
line = re.sub("\./Models", "%s/Models%s"%(os.environ['PWD'],ext_str), line)
new_lines_all.append(line)
with open("%s/Datacard.txt"%run_dir, "w") as f:
for line in new_lines_all:
f.write(line)

cmd = "cd %s; text2workspace.py Datacard.txt -m %s higgsMassRange=65,150; cd .."%(run_dir,opt.MH)
print(cmd)
os.system(cmd)

# Get list of pdfindeices
f = ROOT.TFile("%s/Datacard.root"%run_dir)
w = f.Get("w")
cats = w.allCats()
pdf_index = []
for cat in rooiter(cats):
if "pdfindex" in cat.GetName(): pdf_index.append(cat.GetName())
f.Close()

# Initial fit fixing params to be zero
cmd = "cd %s; combine -m %s -d Datacard.root -M MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s,MY=%s,MX=%s,r=0 --freezeParameters MH,MX,MY,r -P %s -n _initial --setParameterRanges r=0,0.5 --saveWorkspace --saveSpecifiedIndex %s --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2; cd .."%(run_dir,opt.MH,opt.MH,opt.MY,opt.MX,opt.initial_fit_param,",".join(pdf_index))
print(cmd)
os.system(cmd)

# Save best fit pdf indices to json file
f_res = ROOT.TFile("%s/higgsCombine_initial.MultiDimFit.mH%s.root"%(run_dir,opt.MH))
t = f_res.Get("limit")
t.GetEntry(0)
pdf_index_bf = {}
for index in pdf_index: pdf_index_bf[index] = getattr(t,index)
f_res.Close()
with open("%s/pdfindex.json"%run_dir,"w") as jf:
json.dump(pdf_index_bf, jf)

if opt.mode == "generate":

if not os.path.isdir("toys"):
os.system("mkdir -p toys")

if not os.path.isdir("jobs_toys"):
os.system("mkdir -p jobs_toys")
else:
# Remove previous jobs
os.system("rm jobs_toys/sub*")

for i_job in range( opt.nJobs ):
f_sub_name = "jobs_toys/sub_%g.sh"%(i_job)
f_sub = open(f_sub_name, "w")
write_preamble(f_sub)
f_sub.write("combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M GenerateOnly --setParameters MH=%s --freezeParameters MH --expectSignal 0 -n _toy_%g_ --saveToys --snapshotName MultiDimFit -t %s -s -1\n\n"%(opt.MH,opt.MH,opt.MH,i_job,opt.nToys))
# Move toy to toys folder
f_sub.write("mv higgsCombine_toy_%g_* toys/toy_%g.root"%(i_job,i_job))
f_sub.close()

os.system("chmod 775 %s"%f_sub_name)

f_out_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".out",f_sub_name))
f_err_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".err",f_sub_name))
cmd = "qsub -q hep.q -l h_rt=1:0:0 -o %s -e %s %s"%(f_out_name,f_err_name,f_sub_name)
print(cmd)
os.system(cmd)

if opt.mode == "fixed":

if not os.path.isdir("fits_fixed"):
os.system("mkdir -p fits_fixed")

if not os.path.isdir("jobs_fits_fixed"):
os.system("mkdir -p jobs_fits_fixed")
else:
# Remove previous jobs
os.system("rm jobs_fits_fixed/sub*")

# Get pdf index and the best fit values
with open("pdfindex.json", "r") as jf:
pdf_index_bf = json.load(jf)

pdf_index_save = ",".join(pdf_index_bf.keys())
pdf_index_freeze = ",".join(pdf_index_bf.keys())
pdf_index_set = ""
for k,v in pdf_index_bf.items(): pdf_index_set += "%s=%s,"%(k,v)
pdf_index_set = pdf_index_set[:-1]

for i_job in range( opt.nJobs ):
f_sub_name = "jobs_fits_fixed/sub_%g.sh"%(i_job)
f_sub = open(f_sub_name, "w")
write_preamble(f_sub)
f_sub.write("combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M Significance --snapshotName MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s,%s --expectSignal 0 --freezeParameters MH,%s --trackCats %s -n _fixed_%g_ -t %s --setParameterRanges r=0,50 --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2 --toysFile toys/toy_%g.root --X-rtd ADDNLL_RECURSIVE=1\n\n"%(opt.MH,opt.MH,opt.MH,pdf_index_set,pdf_index_freeze,pdf_index_save,i_job,opt.nToys,i_job))
# Move toy to toys folder
f_sub.write("mv higgsCombine_fixed_%g_* fits_fixed/fit_fixed_%g.root"%(i_job,i_job))
f_sub.close()

os.system("chmod 775 %s"%f_sub_name)

f_out_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".out",f_sub_name))
f_err_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".err",f_sub_name))
cmd = "qsub -q hep.q -l h_rt=2:0:0 -o %s -e %s %s"%(f_out_name,f_err_name,f_sub_name)
print(cmd)
os.system(cmd)

if opt.mode == "envelope":

if not os.path.isdir("fits_envelope"):
os.system("mkdir -p fits_envelope")

if not os.path.isdir("jobs_fits_envelope"):
os.system("mkdir -p jobs_fits_envelope")
else:
# Remove previous jobs
os.system("rm jobs_fits_envelope/sub*")

# Get pdf index and the best fit values
with open("pdfindex.json", "r") as jf:
pdf_index_bf = json.load(jf)

pdf_index_save = ",".join(pdf_index_bf.keys())

for i_job in range( opt.nJobs ):
f_sub_name = "jobs_fits_envelope/sub_%g.sh"%(i_job)
f_sub = open(f_sub_name, "w")
write_preamble(f_sub)
f_sub.write("combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M Significance --snapshotName MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s --expectSignal 0 --freezeParameters MH --trackCats %s -n _envelope_%g_ -t %s --setParameterRanges r=0,50 --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2 --toysFile toys/toy_%g.root --X-rtd ADDNLL_RECURSIVE=1\n\n"%(opt.MH,opt.MH,opt.MH,pdf_index_save,i_job,opt.nToys,i_job))
# Move toy to toys folder
f_sub.write("mv higgsCombine_envelope_%g_* fits_envelope/fit_envelope_%g.root"%(i_job,i_job))
f_sub.close()

os.system("chmod 775 %s"%f_sub_name)

f_out_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".out",f_sub_name))
f_err_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".err",f_sub_name))
cmd = "qsub -q hep.q -l h_rt=2:0:0 -o %s -e %s %s"%(f_out_name,f_err_name,f_sub_name)
print(cmd)
os.system(cmd)



if opt.mode == "hadd":

cmd = "hadd -f fit_fixed.root fits_fixed/fit_fixed_*.root"
print(cmd)
os.system(cmd)

cmd = "hadd -f fit_envelope.root fits_envelope/fit_envelope_*.root"
print(cmd)
os.system(cmd)
Loading

0 comments on commit 9935bbd

Please sign in to comment.