
MC preparation

Preparation of the signal-process MC samples used in the analysis stage.

Ntuple code setup

Select the final state that we are interested in, and build a framework to convert MiniAOD or NanoAOD into ROOT tree files.

Analysis stage

Control Region definition and selection determination

Find a region that contains little signal, called the control region. In this region, determine the selection such that data and MC agree.

Scale factor application

Several kinds of scale factors, in most cases functions of $\eta$ and $p_{T}$, are applied, including:

  • electron/muon/photon ID scale factor
  • muon isolation scale factor
  • electron reconstruction scale factor
  • photon veto scale factor
  • trigger scale factor
  • pileup reweighting
  • egamma energy scale and smearing
  • L1 prefiring

For the application of electron scale factors, please refer to ElectronSFs

For muon SFs, please refer to muonSFs

For photon SFs, please refer to photonSFs

Almost all scale factors are provided by the corresponding POG, except for the trigger efficiency measurement, since the trigger scale factors vary with the working points and the baseline selection requirements. For example, in the Zgamma measurement we require electron $p_{T}>25$ GeV, while other analyses may require electron $p_{T}>30$ GeV. An official tool is provided by the Egamma POG, see TnpTools. Some approved HLT SFs (eleHLTSFs) are provided by the Egamma HLT group, which may be exactly what you need.

Trigger efficiency

We usually use Z to ee events to measure the trigger efficiency. The MC samples are Drell-Yan (DY to ll) processes generated at LO or NLO with different generators; the difference between them is used to estimate one source of uncertainty, called the alternative-MC uncertainty. The TnpTools page has a brief introduction that you can follow step by step. I also attach some discussion and introduction from my understanding in ref_Tnp.
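For reference, the trigger scale factor at each point of the measurement is simply the ratio of the tag-and-probe efficiencies in data and MC, $SF(p_{T},\eta)=\varepsilon_{data}(p_{T},\eta)/\varepsilon_{MC}(p_{T},\eta)$, and it is this ratio that is applied to the MC events.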

Add weights

Besides these SFs, we need to normalize the MC to its corresponding cross section: each event is weighted such that the normalized sample corresponds to the cross section, so multiplying by the luminosity (in fb$^{-1}$) gives the expected yields. A table of cross sections for popular processes is provided officially and can be referred to at XS_Table. If a published paper gives a more accurate cross section, it can also be used. Besides, the GenXSAnalyzer is often used. I provide some code in my GitHub, Add_weights, used to calculate the normalization weights and add the SFs. Since the same process may differ slightly between years and the SF files are different, it is better to handle each year separately. Admittedly, the code I show there is not convenient to write; you may also find calling SetBranchAddress many times annoying. Below I introduce a way around this, based on TTreeFormula.
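Concretely, a minimal sketch of the normalization weight itself (xs, lumi, and nGenEvents are placeholder names, not taken from the repository):

double normWeight(double xs, double lumi, double nGenEvents){
   // xs in pb and lumi in fb^-1; 1 fb^-1 = 1000 pb^-1, so
   // xs * 1000 * lumi is the expected total yield, and each event
   // carries its share of that yield as a weight.
   // For NLO samples with negative weights, use the sum of
   // generator weights instead of the raw event count.
   return xs * 1000.0 * lumi / nGenEvents;
}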

In this step, we want to add extra branches that store the different SFs and slim the ROOT files at the same time.

void add(){
TFile *fin  = new TFile("fin.root");
TFile *fout = new TFile("fout.root","recreate");
TTree *treeIn = (TTree*)fin->Get("tree");
// CloneTree(0) copies the branch structure but no entries,
// so only the events passing the cut below are kept
TTree *treeOut = treeIn->CloneTree(0);
double ele1_id_sf;
treeOut->Branch("ele1_id_sf",&ele1_id_sf,"ele1_id_sf/D");
TString cut = "basic cuts"; // replace with a real selection expression
TTreeFormula *tformula1 = new TTreeFormula("formula1",cut,treeIn);
for(Long64_t i=0; i<treeIn->GetEntries(); i++){
   treeIn->GetEntry(i);
   ele1_id_sf = xxx; // compute the SF for this event here
   if( !tformula1->EvalInstance() )
        continue;
   treeOut->Fill();
}
fout->cd();
treeOut->Write();
fout->Close();
fin->Close();
}

A very simple example is listed above.

Background estimation

Fake photon

Fake lepton

Uncertainty calculation

As it is still annoying to repeat the same procedure for different years and channels, I recommend defining functions and calling them later to produce the histograms you want for all years and channels at one time. See an example here: Build_hist. These histograms can be used to prepare data cards for the significance measurement and to calculate uncertainties.

.....
TString th1name = "hist_name";
TString cut = "(basic cuts)";
TString var = "ptlep1";
// bin, xlow, xhigh: choose your binning
TH1D *h1 = new TH1D(th1name, "", bin, xlow, xhigh);
// weighted fill without an explicit event loop; "goff" disables graphics
tree->Draw(var + ">>" + th1name, cut + "*weight", "goff");

A simple example of filling histograms without an explicit event loop is listed above.

Scale and PDF uncertainties

Since the renormalization ($\mu_{R}$) and factorization ($\mu_{F}$) scales enter the MC simulation used to model the pp collisions, it is necessary to consider their uncertainty. In CMS MC simulation, we vary $\mu_{R}$ and $\mu_{F}$ by combinations of the factors 1, 0.5, and 2. From the envelope, the largest variation compared with the central one is taken as the uncertainty, excluding the combinations where one of $\mu_{R}$ and $\mu_{F}$ is 0.5 while the other is 2.0.

Besides the scale uncertainties, the uncertainty from the PDFs (parton distribution functions) also needs to be considered. The way of handling the PDF uncertainty is to calculate the standard deviation of the yields over the hundreds of PDF weight variations.
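A rough sketch of both calculations, assuming the yields obtained with each weight variation have been collected into vectors (scaleYields and pdfYields are assumed names; the positions of the anti-correlated scale combinations depend on your sample's LHE weight ordering, so check your own convention):

#include <vector>
#include <cmath>
#include <algorithm>

// Scale: envelope of the mu_R/mu_F variations around the nominal yield,
// skipping the anti-correlated (2,0.5) and (0.5,2) combinations
// (assumed here to sit at indices 5 and 7 of the usual 9-weight block).
double scaleUnc(const std::vector<double> &scaleYields, double nominal){
   double maxDev = 0.;
   for(size_t i = 0; i < scaleYields.size(); i++){
      if(i == 5 || i == 7) continue;
      maxDev = std::max(maxDev, std::fabs(scaleYields[i] - nominal));
   }
   return maxDev;
}

// PDF: standard deviation over the PDF replica yields.
double pdfUnc(const std::vector<double> &pdfYields){
   double mean = 0.;
   for(double y : pdfYields) mean += y;
   mean /= pdfYields.size();
   double var = 0.;
   for(double y : pdfYields) var += (y - mean) * (y - mean);
   return std::sqrt(var / (pdfYields.size() - 1));
}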

Interference effect

Making use of the MadGraph syntax, the interference contribution can be generated with:

generate p p > lep+ lep- a j j QCD^2==2

Take its fraction relative to the signal process in the SR as the uncertainty.

Uncertainty from data-driven method

Introduction

There are three sources of uncertainty here. See the code here: fakeuncertainties

Jet energy correction uncertainties

Common uncertainties

  • particle ID and reconstruction
  • pileup
  • luminosity
  • cross section estimation
  • L1 prefiring in 2016 and 2017

I usually print all uncertainties into a txt file; see an example in summary_uncer. When preparing the data card, use a Python function to parse them.

import re
import sys

arr = {}
f = open('/home/pku/anying/cms/PKU-Cluster/Significance/Uncer/summary_uncer_'+sys.argv[2]+sys.argv[1]+'.txt')
for line in f:
        if not line.strip():
            continue
        # strip the brackets and the trailing newline
        line = line.replace('[','').replace(']','').replace('\n','')
        # each line looks like: name=value1,value2,...
        arr_Temp = re.split(',|=', line)
        name = arr_Temp[0]
        arr[name] = [float(x) for x in arr_Temp[1:]]
f.close()
print(arr)

Then we get arrays addressed in a [string][double] style, which is convenient for the subsequent usage.

Significance

Introduction

A simultaneous fit is performed in both the CR and the SR.

Optimization cuts

We usually add optimization cuts to increase the ratio of signal to background. These cuts are only applied in the significance calculation, not in the cross section measurement. A scan method is used to determine the values of the optimization cuts, as sketched below.
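A minimal sketch of such a scan, using S/sqrt(S+B) as a simple figure of merit and assuming hSig and hBkg are histograms of the candidate variable for signal and background (both names are placeholders):

// Scan thresholds bin by bin and keep the cut maximizing S/sqrt(S+B).
double bestCut = 0., bestSig = -1.;
for(int b = 1; b <= hSig->GetNbinsX(); b++){
   double S = hSig->Integral(b, hSig->GetNbinsX() + 1); // yield above threshold, incl. overflow
   double B = hBkg->Integral(b, hBkg->GetNbinsX() + 1);
   if(S + B <= 0.) continue;
   double sig = S / std::sqrt(S + B);
   if(sig > bestSig){ bestSig = sig; bestCut = hSig->GetBinLowEdge(b); }
}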

Code preparation

Signal Strength

Introduction

The signal strength is obtained by performing the fit as was done in the significance measurement. Pay attention to commenting out the theory uncertainties related to the signal process; only their shape effect needs to be considered.

First, please save branches of gen variables. You should build the gen variables in the same way as the reconstruction-level ones in the ntuple code. An example for VBS Zgamma can be seen in genparticles.

Apart from a few differences, namely a minor change of the selection (the optimization cuts are not included when calculating the signal strength), the method of handling the EW theoretical uncertainties, and the division of the EW signal sample into in-fiducial (reco && gen) and out-of-fiducial (reco && !gen) parts, the rest is the same as in the significance calculation.
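As a hedged sketch of how the fiducial division might be expressed as cut strings (recoCut and genCut are placeholder selections, not from the repository):

TString recoCut = "(reco selection)";
TString genCut  = "(gen fiducial selection)";
TString inFiducial  = "(" + recoCut + ")&&("  + genCut + ")"; // reco && gen
TString outFiducial = "(" + recoCut + ")&&!(" + genCut + ")"; // reco && !gen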

Code preparation

It is very similar to the steps in the significance calculation.

Unfolding

Unfolding is the generic term used to describe the correction of distributions from detector level to particle level. In general, this can account for the limited acceptance, finite resolution, and inefficiencies of the detector, as well as bin migrations between the measured and corrected distributions. Simulated EW samples from MC event generators are used to perform the unfolding. Distributions obtained from the generated events correspond to particle level (referred to as gen). The events are then passed through a detector simulation programme, mimicking the behaviour of the CMS detector as closely as possible, and the same distributions obtained using these events correspond to detector level (referred to as reco).

The unfolded differential cross section can be measured as a function of some variables.

Code preparation

First, please save branches of gen variables as done in the signal strength measurement. Since the unfolded differential cross section is a function of some variable, histograms of several different distributions need to be prepared. As in the signal strength measurement, events still need to be divided into in-fiducial (reco && gen) and out-of-fiducial parts. What's more, the uncertainties differ between distributions of the unfolded variable, which means you need to calculate them separately. A simple example of defining a function and then calling it to produce a series of histograms at one time is listed below.

void run(TString vec_branchname, TString cut, vector<double> bins){

......
map<TString, double> variables;
// std::map elements keep stable addresses, so handing one to ROOT is safe
tree->SetBranchAddress(vec_branchname, &variables[vec_branchname]);
double weight;
tree->SetBranchAddress("weight", &weight);
TString th1name = "name";
TH1D *th1 = new TH1D(th1name, th1name, bins.size()-1, &bins[0]);
TTreeFormula *tformula1 = new TTreeFormula("formula1", cut, tree);
for(Long64_t i = 0; i < tree->GetEntries(); i++){
   tree->GetEntry(i);
   if( !tformula1->EvalInstance() )
        continue;
   th1->Fill(variables[vec_branchname], weight);
}
}

Or use tree->Draw("var>>h1","cut*weight","goff") directly.
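For example, a hypothetical call producing one histogram of one unfolded variable could look like (the binning is illustrative only):

vector<double> bins = {0., 50., 100., 200., 400.}; // assumed binning
run("ptlep1", "(basic cuts)", bins);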
