Prepare the signal-process MC samples used in the analysis stage.
Select the final state we are interested in. Build a framework to convert MiniAOD or NanoAOD into ROOT tree files.
Find a region that contains little signal, called the control region. In this region, tune the selection until the comparison between data and MC shows good agreement.
Several kinds of scale factors and corrections are applied, most of them provided as functions of the object kinematics (a lookup sketch follows the list):
- electron/muon/photon ID scale factor
- muon isolation scale factor
- electron reconstruction scale factor
- photon veto scale factor
- trigger scale factor
- pileup reweighting
- egamma energy scale and smearing
- L1 prefiring
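Most of these SFs are delivered as 2D histograms binned in eta and p_T. A minimal lookup sketch for one electron, where the file name, histogram name, and kinematics are placeholders (the histogram name follows the usual Egamma convention):
// Hypothetical per-electron SF lookup from a POG-provided 2D histogram
// binned in (eta, pT); file and histogram names are placeholders.
double ele_eta = 1.2, ele_pt = 40.;          // example kinematics
TFile* fsf = TFile::Open("egammaEffi_EGM2D.root");
TH2F*  hsf = (TH2F*)fsf->Get("EGamma_SF2D");
int bin    = hsf->FindBin(ele_eta, ele_pt);  // clamp pT to the histogram range in practice
double sf  = hsf->GetBinContent(bin);
double err = hsf->GetBinError(bin);          // SF uncertainty from the same histogram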
For the application of electron SFs, please refer to ElectronSFs.
For muon SFs, please refer to muonSFs.
For photon SFs, please refer to photonSFs.
Most scale factors are provided by the corresponding POG; the exception is the trigger efficiency measurement, since trigger scale factors depend on the working points and the baseline selection requirements. For example, in the Zgamma measurement we require electron p_T > 25 GeV, while other analyses may require p_T > 30 GeV. An official tool is provided by the Egamma POG, see TnpTools. Some approved HLT SFs (eleHLTSFs) are provided by the Egamma HLT group, which may be exactly what you need.
We always use Z->ee events to measure the trigger efficiency. The MC samples are Drell-Yan (DY->ll) processes produced with different generators at LO or NLO; this difference is used to estimate one source of uncertainty, called the alternative-MC uncertainty. The TnpTools page has a brief introduction that you can follow step by step. I also attach some discussion and explanation from my own understanding in ref_Tnp.
Besides these SFs, we need to normalize the MC to its corresponding cross section: each event is assigned a weight of the cross section divided by the total number of generated events, so that multiplying by the luminosity (in fb^-1) yields the expected event count. A table of cross sections for common processes is provided officially and can be referred to at XS_Table; if a published paper gives a more accurate cross section, that can be used instead. The GenXSAnalyzer is also often used. I provide some code on my GitHub, Add_weights, used to calculate the normalization weights and add the SFs. Since the same process can differ slightly between years and the SF files are different, it is better to handle each year separately. Admittedly, the code I show is not convenient to write; you may find calling SetBranchAddress many times annoying, so below I introduce a tool called TTreeFormula.
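For illustration, a minimal sketch of the normalization weight; the numbers are placeholders and the units must be kept consistent:
// Each MC event carries xs / N_gen, so the weights sum to the cross section
// and multiplying by the luminosity gives the expected yield.
double xs   = 1000.;   // process cross section [fb] (placeholder)
double nGen = 1.0e6;   // total generated events (use the sum of gen weights at NLO)
double lumi = 35.9;    // integrated luminosity [fb^-1] (placeholder)
double normWeight = xs / nGen;  // stored per event
// expected yield of the whole sample: normWeight * nGen * lumi = xs * lumi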
In this step, we want to add extra branches that store the different SFs and slim the ROOT files at the same time.
void add(){
   // Open the input file and create the output (skimmed) file
   TFile* fin  = new TFile("fin.root");
   TFile* fout = new TFile("fout.root", "recreate");
   TTree* treeIn = (TTree*)fin->Get("tree");

   // Clone the structure only (0 entries); selected entries are filled below
   fout->cd();
   TTree* treeOut = treeIn->CloneTree(0);

   // Extra branch holding the electron ID scale factor
   Double_t ele1_id_sf = 1.;
   treeOut->Branch("ele1_id_sf", &ele1_id_sf, "ele1_id_sf/D");

   // The selection expressed once as a TTreeFormula, evaluated per entry
   TString cut = "basic cuts";  // replace with the actual selection string
   TTreeFormula* tformula1 = new TTreeFormula("formula1", cut, treeIn);

   for (Long64_t i = 0; i < treeIn->GetEntries(); i++) {
      treeIn->GetEntry(i);
      if (!tformula1->EvalInstance())
         continue;               // skim: keep only entries passing the cut
      ele1_id_sf = 1.;           // placeholder: look up the actual SF here
      treeOut->Fill();
   }
   fout->cd();
   treeOut->Write();
   fout->Close();
   fin->Close();
}
A very simple example is listed above.
- Prepare the fake-photon sample from data, see plj_production
- Calculate the fake fraction, see fakefraction
- Calculate the weights for the fake-photon sample, see pljweight
Since it is still tedious to repeat the same procedure for different years and channels, I recommend defining functions and calling them later, so that you produce all the histograms you want for the different years and channels in one go. See an example here: Build_hist. These histograms can be used to prepare the data cards for the significance measurement and to calculate uncertainties.
// ...
TString th1name = "hist_name";
TString cut = "(basic cuts)";   // replace with the actual selection
TString var = "ptlep1";
TH1D* h1 = new TH1D(th1name, "", nbin, xlow, xhigh);  // nbin, xlow, xhigh: your binning
tree->Draw(var + ">>" + th1name, cut + "*weight", "goff");
A simple example of filling histograms without an explicit event loop is listed above.
Since the renormalization (mu_R) and factorization (mu_F) scales enter the MC simulation of the pp collision, their uncertainty must be considered. In CMS MC simulation, mu_R and mu_F are each varied by factors of 1, 0.5, and 2 in all combinations. From the envelope, the largest variation compared with the central one is taken as the uncertainty, excluding the combinations where one scale is 0.5 and the other is 2.0.
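A minimal sketch of the envelope, assuming nine scale-variation yields filled elsewhere; the index order below follows the common CMS LHE-weight convention but should be verified for your own samples:
// Envelope of the QCD scale variations: the largest deviation from the
// nominal yield, skipping the anti-correlated combinations (0.5,2) and
// (2,0.5). Assumed order: (muR,muF) = (0.5,0.5),(0.5,1),(0.5,2),(1,0.5),
// (1,1),(1,2),(2,0.5),(2,1),(2,2).
std::vector<double> yields(9, 1.0);  // per-variation yields, filled elsewhere
double nominal = yields[4];          // (1,1)
double maxDev  = 0.;
for (int k = 0; k < 9; k++) {
   if (k == 2 || k == 6) continue;   // anti-correlated combinations, excluded
   maxDev = TMath::Max(maxDev, TMath::Abs(yields[k] - nominal));
}
double scaleUnc = maxDev / nominal;  // relative scale uncertainty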
Besides the scale uncertainties, the uncertainty from the PDF (parton distribution function) also needs to be considered. The PDF uncertainty is calculated as the standard deviation over the hundreds of PDF replica weights.
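A minimal sketch for replica-type PDF sets (Hessian sets use a different prescription; check which set your sample uses):
// PDF uncertainty for replica-type sets: standard deviation of the yields
// obtained by reweighting with each PDF replica.
std::vector<double> pdfYields(100, 1.0);  // per-replica yields, filled elsewhere
double sum = 0., sum2 = 0.;
for (size_t k = 0; k < pdfYields.size(); k++) {
   sum  += pdfYields[k];
   sum2 += pdfYields[k] * pdfYields[k];
}
double mean   = sum / pdfYields.size();
double pdfUnc = TMath::Sqrt(sum2 / pdfYields.size() - mean * mean);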
Making use of the MadGraph syntax, the interference process can be produced by:
generate p p > lep+ lep- a j j QCD^2==2
Take its fraction relative to the EW signal process in the SR as the uncertainty.
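In practice this is just a ratio of SR yields, e.g. hInterference->Integral() / hSignal->Integral(), where the two hypothetical histograms hold the SR distribution of the interference and EW signal samples.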
Introduction
Several sources of uncertainty are handled here; the code is at fakeuncertainties:
- particle ID and reconstruction
- pileup
- luminosity
- cross section estimation
- L1 prefiring in 2016 and 2017
I usually print all the uncertainties into a txt file; see an example at summary_uncer. When preparing the data card, use a Python function to parse them.
import re
import sys

# Parse each line of the summary, e.g. "name=[0.01,0.02,...]",
# into a dict mapping the uncertainty name to a list of floats
arr = {}
f = open('/home/pku/anying/cms/PKU-Cluster/Significance/Uncer/summary_uncer_' + sys.argv[2] + sys.argv[1] + '.txt')
for line in f:
    if not line.strip():
        continue
    line = line.replace('[', '').replace(']', '').replace('\n', '')
    arr_Temp = re.split(',|=', line)
    name = arr_Temp[0]
    arr[name] = [float(x) for x in arr_Temp[1:]]
print arr
Then we obtain arrays addressed as [string][double], which is convenient for the subsequent usage.
Introduction
A simultaneous fit is performed in both the CR and the SR.
We usually add optimization cuts to increase the signal-to-background ratio. These cuts are applied only in the significance calculation, not in the cross section measurement. A scan method is used to determine the values of the optimization cuts, as sketched below.
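A minimal sketch of the scan, assuming hSig and hBkg are SR histograms of the variable being optimized and using S/sqrt(S+B) as an approximate figure of merit:
// Step the threshold across the bins and keep the value that maximizes
// the approximate significance S/sqrt(S+B).
double bestCut = 0., bestSig = 0.;
for (int b = 1; b <= hSig->GetNbinsX(); b++) {
   double S = hSig->Integral(b, hSig->GetNbinsX());  // signal above threshold
   double B = hBkg->Integral(b, hBkg->GetNbinsX());  // background above threshold
   if (S + B <= 0.) continue;
   double sig = S / TMath::Sqrt(S + B);
   if (sig > bestSig) { bestSig = sig; bestCut = hSig->GetBinLowEdge(b); }
}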
- Histogram preparation: code
- Data card: code
- Combine from HiggsCombineTools
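With the data card in hand, the expected significance can typically be extracted with a command such as combine -M Significance datacard.txt; see the Combine documentation for the exact options used in your analysis.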
Introduction
The signal strength is obtained by performing the fit as in the significance measurement. Pay attention to commenting out the theory uncertainties related to the signal process; only their shape effect needs to be considered.
First, please save the branches of gen-level variables. You should build the gen variables in the same way as the reconstruction-level ones in the ntuple code. An example for VBS Zgamma can be seen in genparticles.
Apart from a few differences, the calculation is the same as for the significance: the selection changes slightly, the optimization cuts are not included when calculating the signal strength, the EW theoretical uncertainties are handled differently, and the EW signal sample is divided into in-fiducial (reco && gen) and out-of-fiducial (reco && !gen) parts, as sketched below.
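A minimal sketch of the division; the selection flags are placeholders:
// Division of the EW signal sample; passReco/passGen stand for the full
// reconstruction-level and gen-level fiducial selections (placeholders).
bool passReco = true;                      // passes the reco-level selection
bool passGen  = true;                      // inside the gen-level fiducial region
bool inFiducial  = passReco && passGen;    // reco && gen -> in-fiducial template
bool outFiducial = passReco && !passGen;   // reco && !gen -> out-of-fiducial template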
The remaining steps are very similar to those in the significance calculation.
Unfolding is the generic term used to describe this detector-level to particle-level correction for distributions. In general, this can account for limited acceptance, finite resolution, and inefficiencies of the detector, as well as bin migrations in the distribution between measured and corrected. Simulated EW samples from MC event generators are used to perform the unfolding. Distributions obtained from the generated events correspond to particle-level (will be referred to as gen). Then the events are passed through a detector simulation programme, mimicking the behaviour of the CMS detector as closely as possible, and the same distributions obtained using these events correspond to detector-level (will be referred to as reco).
The unfolded differential cross section can be measured as a function of some variables.
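As a concrete illustration, the response matrix used in the unfolding can be filled from the EW MC roughly as follows; the binning, variables, and selection flags are placeholders:
// Response matrix: (reco, gen) pairs for events passing both selections.
// Gen-only events ("misses") are booked separately to track the efficiency.
double bins[] = {0., 50., 100., 200., 400.};       // placeholder binning
TH2D* hResponse = new TH2D("response", ";reco;gen", 4, bins, 4, bins);
TH1D* hMiss     = new TH1D("miss", "gen && !reco", 4, bins);
bool passReco = true, passGen = true;              // per-event flags (placeholders)
double recoVar = 120., genVar = 110., weight = 1.; // per-event values (placeholders)
if (passReco && passGen)  hResponse->Fill(recoVar, genVar, weight);
else if (passGen)         hMiss->Fill(genVar, weight);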
First, please save the gen-variable branches as done in the signal strength measurement. Since the unfolded differential cross section is a function of some variable, histograms of several different distributions need to be prepared. As in the signal strength measurement, events still need to be divided into in-fiducial (reco && gen) and out-of-fiducial parts. Moreover, the uncertainties differ between the distributions of the unfolded variables, so you need to calculate them separately. A simple example of defining a function and then calling it to produce a series of histograms in one go is listed below.
void run(TString vec_branchname, TString cut, vector<double> bins){
   // ... (open the input file; the tree name "tree" is a placeholder)
   TTree* tree = (TTree*)gDirectory->Get("tree");

   // Map a branch name to its buffer so one function serves any variable
   map<TString, double> variables;
   Double_t weight = 1.;
   tree->SetBranchAddress(vec_branchname, &variables[vec_branchname]);
   tree->SetBranchAddress("weight", &weight);

   TString th1name = "name";
   TH1D* th1 = new TH1D(th1name, th1name, bins.size()-1, &bins[0]);
   TTreeFormula* tformula1 = new TTreeFormula("formula1", cut, tree);
   for (Long64_t i = 0; i < tree->GetEntries(); i++) {
      tree->GetEntry(i);
      if (!tformula1->EvalInstance())
         continue;
      th1->Fill(variables[vec_branchname], weight);
   }
}
Or use tree->Draw("var>>h1","cut*weight","goff") directly.
- Histograms: Build_Hist
- Data card: card
- Combine: HiggsCombine_Unfolding