Machine learning – a very brief introduction Jaime Norman LPSC Grenoble Workshop on Heavy-flavour tagging, Inha University 06/12/2018
Outline • I will briefly outline the motivation for using machine learning to solve physics problems • We can then go through 2 examples using the Toolkit for MultiVariate Analysis (TMVA) – Toy dataset example – Charmed baryon example • I have tried to give an intuitive picture of how the methods work – no detail about any specific method – Very good summary: P. Bhat, Multivariate analysis methods in Particle Physics
Introduction • Many physics experiments (not just particle physics) search for rare signals • Often the main challenge is to extract the signal from the huge background arising from other (uninteresting) physics processes • Information from different detectors gives us features by which we can distinguish ‘signal’ from ‘background’ – Particle identification – Transverse momentum – Other kinematic / topological properties (relative angles, displaced vertices, more complex variables … ) • Knowledge of physics of background/signal crucial
Cut optimisation • The simplest way to try to remove background is by performing 1-dimensional cuts on features – E.g. PID – Nσ cut on dE/dx – pT of tracks • Signal candidates passing cut are kept, while others are rejected • Often not optimal! Especially if we are using many variables, which most likely have more complex, non-linear correlation • How can we optimise our selection?
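As a rough illustration of a one-dimensional cut, the sketch below scans a threshold on a single toy feature and keeps the value that maximises S/sqrt(S+B). The figure of merit, the Gaussian toy distributions and all names (`significance`, `scan_cut`) are assumptions made for this example, not taken from the talk.

```python
import math
import random

def significance(n_sig, n_bkg):
    """Approximate significance S / sqrt(S + B); 0 if nothing passes the cut."""
    total = n_sig + n_bkg
    return n_sig / math.sqrt(total) if total > 0 else 0.0

def scan_cut(signal, background, thresholds):
    """Return (best_threshold, best_significance) for the selection x > threshold."""
    best_threshold, best_z = None, -1.0
    for t in thresholds:
        n_sig = sum(1 for x in signal if x > t)
        n_bkg = sum(1 for x in background if x > t)
        z = significance(n_sig, n_bkg)
        if z > best_z:
            best_threshold, best_z = t, z
    return best_threshold, best_z

# Hypothetical toy feature: signal and background are unit-width Gaussians
# with different means, and background is ten times more abundant.
rng = random.Random(1)
signal = [rng.gauss(2.0, 1.0) for _ in range(1000)]
background = [rng.gauss(0.0, 1.0) for _ in range(10000)]

thresholds = [i * 0.1 for i in range(-30, 51)]
best_cut, best_z = scan_cut(signal, background, thresholds)
no_cut_z = significance(len(signal), len(background))
print(f"best cut: x > {best_cut:.1f}, S/sqrt(S+B) = {best_z:.1f} (no cut: {no_cut_z:.1f})")
```

With several correlated features, each additional cut multiplies the number of combinations to scan, which is one reason a multivariate method becomes attractive.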
Multivariate approach • Represent a dataset in terms of feature variables, or a vector x = (x_1, x_2, x_3, …, x_d) • Given a vector of features, we want to know (for example) the probability of an entry in our dataset being signal or background • Construct a function y = f(x) which maps feature space into a form which is constructed to be useful for classification – That is, f provides a map ℜ^d → ℜ^N – Preferable to have dimensionality N << d • In practice: dataset is finite, and functional form of data is unknown – approximate function f(x,w) is learned
e.g. top quark mass measurement, D0: ℜ^d → ℜ^n
Supervised learning • Using a training dataset with known output (signal/background) to approximate the function is known as supervised learning – Classification if f(x,w) is discrete (binary classification if classifying into 2 classes) • E.g. identifying Higgs decays from other SM processes – Regression if f(x,w) is continuous • E.g. functional form of TPC dE/dx curve (as a function of many variables) – see M. Ivanov talk • Also unsupervised learning, reinforcement learning
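To make "learning f(x,w) from a training dataset of known output" concrete, here is a minimal sketch of binary classification with logistic regression fitted by gradient descent. The choice of logistic regression, the two-feature Gaussian toy data and the names (`predict`, `train`, `toy_sample`) are assumptions for illustration; the talk itself uses TMVA.

```python
import math
import random

def predict(x, w):
    """f(x, w): logistic function of a weighted sum of features; w[0] is the bias."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train(features, labels, n_epochs=200, learning_rate=0.1):
    """Fit w by batch gradient descent on the mean cross-entropy loss."""
    w = [0.0] * (len(features[0]) + 1)
    for _ in range(n_epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = predict(x, w) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - learning_rate * g / len(features) for wj, g in zip(w, grad)]
    return w

def toy_sample(rng, n_per_class):
    """Hypothetical toy data: label 1 = 'signal', label 0 = 'background'."""
    features, labels = [], []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)])
            labels.append(label)
    return features, labels

rng = random.Random(2)
train_x, train_y = toy_sample(rng, 200)
test_x, test_y = toy_sample(rng, 500)   # independent sample, not used in the fit

w = train(train_x, train_y)
test_accuracy = sum(1 for x, y in zip(test_x, test_y)
                    if (predict(x, w) > 0.5) == (y == 1)) / len(test_x)
print(f"learned weights: {w}, test accuracy: {test_accuracy:.3f}")
```

The output of `predict` is continuous between 0 and 1; thresholding it at 0.5 turns it into a binary classifier.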
Example: Gaussian, linearly correlated variables
Example: Boosted decision trees • Decision trees employ sequential cuts to perform classification • ‘Variable’ space split into partitions, and mapped onto one-dimensional classifier • Selection on classifier corresponds to decision boundary in feature space • Boosted decision trees: create many small trees, and combine - reduce misbehaviour due to fluctuations ✓ Can often perform more optimally than ‘standard’ rectangular cuts ✓ Deals with lots of input data very well – automatic selection of strongly discriminating features ✓ ‘Algorithm-of-choice’ for many other collaborations • Top quark mass [1], Higgs discovery [2], B_s^0 → µµ [3] … [1] Phys. Rev. D 58, 052001 (1998) [2] Phys. Lett. B 716 (2012) 30 [3] Nature 522, 68-72 (04 June 2015)
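The idea of combining many small trees can be sketched with AdaBoost over one-cut trees ("stumps"). This is an illustrative toy, assuming AdaBoost as the boosting scheme and Gaussian toy data; it is not TMVA's implementation, and every name below (`stump_predict`, `best_stump`, `adaboost`, `response`) is hypothetical.

```python
import math
import random

def stump_predict(stump, x):
    """One-cut 'tree': +sign if the chosen feature exceeds the threshold, else -sign."""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign

def best_stump(features, labels, weights):
    """Find the single cut with the lowest weighted misclassification rate."""
    best, best_err = None, float("inf")
    for feature in range(len(features[0])):
        for threshold in sorted({x[feature] for x in features}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for x, y, w in zip(features, labels, weights)
                          if stump_predict(stump, x) != y)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(features, labels, n_rounds=10):
    """Return a list of (alpha, stump); labels are +1 (signal) or -1 (background)."""
    weights = [1.0 / len(features)] * len(features)
    ensemble = []
    for _ in range(n_rounds):
        stump, err = best_stump(features, labels, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Boosting step: misclassified entries get a larger weight next round.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(features, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def response(ensemble, x):
    """Weighted vote of all stumps, normalised to the range [-1, 1]."""
    norm = sum(alpha for alpha, _ in ensemble)
    return sum(alpha * stump_predict(s, x) for alpha, s in ensemble) / norm

def accuracy(ensemble, features, labels):
    """Fraction of entries whose label matches the sign of the response."""
    return sum(1 for x, y in zip(features, labels)
               if (response(ensemble, x) > 0) == (y > 0)) / len(features)

def toy_sample(rng, n_per_class):
    """Hypothetical toy data: two Gaussian blobs in two features."""
    features, labels = [], []
    for label, mean in ((1, 1.0), (-1, -1.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)])
            labels.append(label)
    return features, labels

rng = random.Random(3)
train_x, train_y = toy_sample(rng, 100)
test_x, test_y = toy_sample(rng, 500)
bdt = adaboost(train_x, train_y, n_rounds=10)
print("single stump accuracy:", accuracy(bdt[:1], test_x, test_y))
print("boosted accuracy:     ", accuracy(bdt, test_x, test_y))
```

A selection on `response` then plays the role of the cut on the one-dimensional classifier described above.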
Signal probability
Example: Λc [Figure: normalised distributions for signal, background and data (sidebands), 6 < pT < 8 GeV/c, labelled "This Thesis". Panels: pT of p (GeV/c), pT of K (GeV/c), cosine pointing angle, decay length (cm). Additional panel: BDT response, (1/N) dN/dx, for signal, background and data, p-Pb, √sNN = 5.02 TeV.]
Validation • Just because you have a trained model using the most state-of-the-art, high-performing ML algorithm, it doesn’t mean the output is right! – Training data must be accurate representation of real data – Trained model must be tested using independent dataset • Overfitting can occur
Testing [Figure: TMVA overtraining check for classifier BDT_pt2to3. (1/N) dN/dx versus BDT_pt2to3 response for signal and background, test and training samples overlaid. Kolmogorov-Smirnov test: signal (background) probability = 0 (0.083). U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)%.]
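An overtraining check of this kind compares the classifier response on the training sample with the response on an independent test sample. The sketch below computes an unbinned two-sample Kolmogorov-Smirnov statistic and its asymptotic probability; it is an assumption-laden stand-in, not the binned histogram test that TMVA reports, and the toy response values are hypothetical.

```python
import math
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max

def ks_probability(d, n_a, n_b, n_terms=100):
    """Asymptotic probability of a distance >= d if both samples share one parent."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the probability is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, n_terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical classifier responses: one test sample drawn like the training
# sample, and one shifted sample mimicking an overtrained classifier.
rng = random.Random(4)
train_resp = [rng.gauss(0.3, 0.2) for _ in range(500)]
test_resp = [rng.gauss(0.3, 0.2) for _ in range(500)]
shifted_resp = [rng.gauss(0.1, 0.2) for _ in range(500)]

for name, sample in (("compatible", test_resp), ("shifted", shifted_resp)):
    d = ks_statistic(train_resp, sample)
    p = ks_probability(d, len(train_resp), len(sample))
    print(f"{name}: D = {d:.3f}, probability = {p:.3g}")
```

A very small probability signals that the training and test responses differ, which is the symptom of overfitting that the check is meant to expose.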
MC/Data comparison [Figure: normalised distributions for signal, background and data (sidebands), 6 < pT < 8 GeV/c, labelled "This Thesis". Panels: pT of p (GeV/c), pT of K (GeV/c), cosine pointing angle, decay length (cm).] • Often we don’t have a ‘pure’ signal sample – Can be difficult to evaluate agreement with MC • We can always expect some MC/data difference – should enter in the systematic uncertainty evaluation
Tutorial • Now we can try the tutorial • https://dfs.cern.ch/dfs/websites/j/jnorman/mvatutorial/ • Download https://cernbox.cern.ch/index.php/s/RvIESWYQF1u5zNI
References • Figures/info taken from P. Bhat, Multivariate analysis methods in Particle Physics, DOI: 10.1146/annurev.nucl.012809.104427
Towards a unified framework for ML in ALICE G.M.Innocenti
Quark vs. gluon jet tagging D mesons
• See https://indico.cern.ch/event/766450/contributions/3225284/attachments/1765169/2865695/20181204_WorkshowQA.pdf#search=gian%20michele for more info
Backup
Λc hadronic decay reconstruction • PID using TPC via dE/dx and TOF via time-of-flight measurement • nσ cuts, or Bayesian approach to identify particles • Cuts on decay topologies exploiting decay vertex displacement from primary vertex • Signal extraction via invariant mass distribution • Feed-down (b) subtracted using pQCD-based estimation of charmed baryon production • Correct for efficiency + normalisation