87944 statistical data analysis for nuclear and
play

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS - PowerPoint PPT Presentation

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare 1 RECAP We talk about Data Modelling (a scientific narrative) Conceptual building blocks for


  1. 87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare 1

  2. RECAP • We talk about – Data Modelling (a scientific narrative…) – Conceptual building blocks for modeling in High En. Phys. •  Documentation: – K. Cranmer, «Practical Statistic for the LHC», https://arxiv.org/abs/1503.07622 • “Introduction” • “Conceptual blocks for modeling” • Topics discussed during past examinations (just examples and not exhaustive): – Terminology: Ensamble, Experiment, Channel, Observable, Parameters of Interest, Nuisance, Distribution, ... – The Marked Poisson Model – Extended Likelihood, .. 2

  3. RECAP • We talk about – Modeling with RooFit – RooFit Workspace •  Documentation: http://root.cern.ch/drupal/content/roofit – Quick start guide (20 pages) (Includes Workspace & Factory – Users Manual – RooFit macros • root.cern.ch → documentation → tutorials → roofit • There are over 80 macros illustrating many aspects of RooFit functionality • Topics discussed during past examinations (just examples and not exhaustive): – difference between variables, observables, parameters – difference between binned and unbinned datasets, … – sometimes we asked something like which Roofit class do you use for variables, observables, parameters, datasets 3

  4. Tip of the week http://www.isocpp.org Teach yourself Modern C++ • Stack-based scope instead of heap or static global scope. • Auto type inference instead of explicit type names. • Smart pointers instead of raw pointers. • std::string instead of raw char[] arrays. • Standard template library (STL) containers like vector, list, and map instead of raw arrays or custom containers. See <vector>, <list>, and <map>. • STL algorithms instead of manually coded ones. • Inline lambda functions instead of small functions implemented separately. • Range-based for loops to write more robust loops that work with arrays, STL containers in the form for ( for-range-declaration : expression ). • Exceptions, to report and handle error conditions. • Lock-free inter-thread communication using STL std::atomic<> (see <atomic>) instead of other inter-thread communication mechanisms. 4

  5. Further about C++ Programming • Force yourself coding with a style: ex: Google C++ Style Guide https://google.github.io/styleguide/cppguide.html – install clang-format or astyle • Set up a repository. Always. Also for small project. – – Simple, powerful Git GUI: SourceTree • boost C++ libraries: http://www.boost.org/ – Lot of work already done for you !!! 5

  6.  RECAP… Tutorial [fit gaus model to unbinned dataset] Edit the macro roofit_empty.cpp and, following the comments inside, create a Gaussian p.d.f. with mean = 0, and sigma = 1. Change the sigma to 3. Visualize the p.d.f. . Generate an unbinned dataset of 10000 events. Make a Fit with Maximum Likelihood. Visualize the results. Tips: Use information from the slides shown during the lecture or from RooFit Manual at par 2 (Submit ex11.cxx and the image of the canvas in png format) Laboratorio Analisi Statistica dei Dati per HEP - G.Sirri 6

  7. ROOT RooFit

  8.  ROOT vs RooFit (2) type of dataset ROOT fit after booking and filling an histogram ( binned data ). ROOT ROOFIT fit using the whole dataset ( unbinned data ) . Fit method (default) ROOT Least Squares, 𝑦 2 ROOFIT maximum likelihood, − log ℒ can be done also with a binned dataset Error bars ROOT by default gaussian approximation √𝑜 ROOFIT ROOFIT exact limits based on Poisson distribution (asymmetric bars when 𝑜 is small !!) Axis Label left to user  …. ROOT ROOFIT X taken from the variable definition Y is automatically: ex Events / BIN SIZE !!!! Laboratorio Analisi Statistica dei Dati per HEP - G.Sirri 8

  9. void rooofit_ex1 () ROOT ROOFIT void recap_root() { { // Setup model // define a gaussian p.d.f (mean 1 and sigma 1) // ============ // gaus(0) is a substitute for [0]*exp(-0.5*((x-[1])/[2])**2) // Declare variables x,mean,sigma with associated name, title, initial // and (0) means start numbering parameters at 0. value and allowed range auto g1 = new TF1{"gaus_1", "gaus(0)", -10, 10 }; RooRealVar x( "x" , "x" ,-10,10) ; double mean = 1 ; RooRealVar mean( "mean" , "mean of gaussian" ,1,-10,10) ; double sigma = 1 ; RooRealVar sigma( "sigma" , "width of gaussian" ,1,0.1,10) ; g1->SetParameter( 0, 1./( sigma * sqrt(2* 3.1415926 ) ) ); // Build gaussian p.d.f in terms of x,mean and sigma g1->SetParameter( 1, mean ) ; // set mean RooGaussian gauss( "gauss" , "gaussian PDF" ,x,mean,sigma) ; g1->SetParameter( 2, sigma) ; // set sigma // Construct plot frame in 'x' g1->SetLineColor( kBlue ) ; RooPlot* xframe = x.frame(Title( "Gaussian p.d.f." )) ; // change sigma to 3 // Plot model and change parameter values TF1* g2 = new TF1{"gaus_2", "gaus(0)", -10, 10 }; // ======================================= mean = 1 ; // Plot gauss in frame (i.e. in x) sigma = 3 ; gauss.plotOn(xframe) ; g2->SetParameter( 0, 1./( sigma * sqrt(2* 3.1415926 ) ) ); // Change the value of sigma to 3 g2->SetParameter( 1, mean ) ; // set mean sigma.setVal(3) ; g2->SetParameter( 2, sigma) ; // set sigma // Plot gauss in frame (i.e. in x) and draw frame on canvas g2->SetLineColor( kRed ) ; gauss.plotOn(xframe,LineColor(kRed)) ; // Book histograms // Generate events auto h_data = // =============== new TH1D{"h_data", "gaussian random numbers", 100, -10, 10}; // Generate a dataset of 1000 events in x from gauss RooDataSet* data = gauss.generate(x,1000) ; // Create a TRandom3 object to generate random numbers // Make a second plot frame in x and draw both the auto seed = 12345; // data and the p.d.f in the frame TRandom3 ran{seed}; RooPlot* xframe2 = x.frame(Title( "Gaussian p.d.f. with data" )) ; data->plotOn(xframe2) ; // Generate some random numbers and fill histograms gauss.plotOn(xframe2) ; auto const nvalues = 1000; // Fit model to data for (int i=0; i<nvalues; i++){ // ================== auto x = ran->Gaus(1,3); // gaussian in mean = 1 sigma = 3 // Fit pdf to data h_data->Fill(x); gauss.fitTo(*data) ; } // Print values of mean and sigma (that now reflect fitted values and errors) h_data->SetXTitle("x"); mean.Print() ; h_data->SetYTitle("f(x)"); sigma.Print() ; h_data->SetMarkerStyle(20); // Draw all frames on a canvas h_data->Fit("gaus") ; TCanvas* c = new TCanvas( "rf101_basics" , "rf101_basics" ,800,400) ; c->Divide(2) ; // Draw the canvas c->cd(1) ; c = new TCanvas("c1","c1",800,400) ; gPad->SetLeftMargin(0.15) ; c->Divide(2,1) ; xframe->GetYaxis()->SetTitleOffset(1.6) ; c->cd(1); xframe->Draw() ; g1->Draw() ; c->cd(2) ; g2->Draw("SAME") ; gPad->SetLeftMargin(0.15) ; c->cd(2); xframe2->GetYaxis()->SetTitleOffset(1.6) ; h_data->Draw("E1"); xframe2->Draw() ; } }

  10. RECAP • We talk about – Composite Models – Common Fitting Errors: • Understanding MINUIT output • Instabilities and correlation coefficients •  Documentation: http://root.cern.ch/drupal/content/roofit – Composite Models • RooFit User Manual pp17-28 – Common Fitting Errors • Slides • F. James, Interpretation of the errors on parameters as given by MINUIT http://lmu.web.psi.ch/docu/manuals/software_manuals/minuit2/mnerror.pdf • Topics discussed during past examinations (just examples and not exhaustive): – The p.d.f. of a generic extended (signal+background) composite model – The 4 MINUIT algorithms – The difference MINOS vs MIGRAD/HESSE – The theory behind the error computation by HESSE – MINOS and Profile Likelihood Interval – How to recognize/Which strategy to mitigate fit stabilities problems 10

  11. Tip of the week http://root.cern ROOT has a modern tool to manipulate and analyze datasets RDataFrame offers a high level interface for analyses of data stored in TTrees, CSV files and other data formats. In addition, multi-threading and other low-level optimisations allow users to exploit all the resources available on their machines transparently. In a nutshell: ROOT::EnableImplicitMT(); // Tell ROOT you want to go parallel ROOT::RDataFrame d("myTree", "file_*.root"); // Interface to TTree and CSV auto d2 = d.Filter( “x>0" ) // only accept events for which x>0 .Define(“r2”,”x* x+y *y”) ; // define a custom variable r2 auto myHisto = d2.Histo1D( “r2" ); // This happens in parallel! myHisto->Draw(); Lazy execution guarantees that all operations are performed in one event loop less typing, better expressivity, … in place of explicity for-loops on TTree events (SetBranchAddress(..), GetEntry (..), …) Good for histograms, profiles, data reductions, user-defined operations on events. 11

Recommend


More recommend