Search for top Squarks Using Multivariate Methods Jonas Graw Max Planck Institute for Physics (Werner-Heisenberg-Institut) Thursday 20 th July, 2017
Motivation Standard Strategy: Cut-based analysis ATLAS Work in progress ATLAS Work in progress ∼ ∼ ~ ~ ~ 0 ~ ~ ~ 0 -1 → → χ → -1 → → χ → s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W 1 1 7 7 7 7 [GeV] [GeV] 600 Significance Significance 600 Significance Significance SRA_TT SRB_TT 6 6 6 6 500 500 0 1 0 1 ∼ χ ∼ χ m m 5 5 5 5 400 400 4 4 4 4 300 300 3 3 3 3 200 200 2 2 2 2 100 100 1 1 1 1 0 0 0 0 200 400 600 800 1000 1200 200 400 600 800 1000 1200 m [GeV] m [GeV] ~ ~ t t → Try to look at multivariate methods using Monte Carlo-samples 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 2/12
Machine Learning Classification: Signal or Background Supervised learning: Training done with labeled simulated events Events divided into training and testing (e. g. 50%-50%) Overtraining (”learning by heart”) needs to be avoided Labeled samples Training T esting Machine Learning ATLAS data Model Predicted Label 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 3/12
Boosted Decision Tree (BDT) Division into two processes: Signal and Background Decision which variable to take is done by exactly one discriminating variable (cut) Chosen discriminating variable gives the best possible signal background separation Boosting : Training of a new tree, for which falsely classified events get a bigger weight BDT response r ( i ) ∈ [ − 1 , 1] of an event i : Classification measure dependent on the trees with limits { } { } r ( i ) = +1 signal : All trees classify i as r ( i ) = − 1 background 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 4/12
Multivariate Methods: Boosted Decision Tree Boosted Decision Tree (BDT) utilized Training for models with m ˜ t ≥ 1TeV to optimize the usage of reaches better significances than discriminating variables Training for model with m ˜ t = 1TeV Training on region with 0 ℓ , E miss > T and m ˜ χ 0 = 1 GeV 250 GeV, ≥ 4 Jets & ≥ 1 b -Jet ROC-Curve is an indicator of how good the training was TMVA Overtraining Check for Classifier: BDT (1/N) dN/dx 4 Signal (test sample) Background (test sample) Signal (training sample) 3.5 Background (training sample) 3 2.5 2 1.5 1 0.5 0 − − − − 0.4 0.3 0.2 0.1 0 0.1 0.2 0.3 0.4 0.5 BDT response BDT-response for MC-test- and The area under the curve is a good MC-training data is in very good measure of the training accordance → No overtraining 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 5/12
Cut on BDT-respones Significance 10 ATLAS Work in Progress 8 -1 s = 13 TeV, 36.1 fb TMVAweighted 6 1000 300 1000 1 1000 100 800 300 800 100 800 1 600 300 600 100 600 1 4 2 0 − − − − 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 0.8 Cut on BDT response Partially very high significances Important variables: m T 2 , p T ( top ) , E miss T Cut on BDT-response ≥ 0,34 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 6/12
Expected Significances in m ˜ t - m ˜ 1 -Parameter Space χ 0 Cut-based method BDT method ATLAS Work in progress ATLAS Internal ∼ ~ ~ ~ ~ ~ ~ ∼ -1 → → χ 0 → -1 → → χ 0 → s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W 1 7 7 1 6 6 [GeV] Significance Significance 600 4-th parameter 600 Significance Significance SRA_TT 6 6 MVA 5 5 500 500 0 1 ∼ χ BDT>0.34 m 5 5 4 4 400 400 4 4 3 3 300 300 3 3 2 2 200 200 2 2 100 100 1 1 1 1 0 0 0 0 200 400 600 800 1000 1200 200 400 600 800 1000 1200 m [GeV] m [GeV] ∼ ~ χ 0 t 1 Cut-based BDT 1 . 7 σ 3 . 0 σ ⇒ Expected significance of 3 σ up to m ˜ t = 1 TeV ⇒ Can we increase this by changing the BDT settings? 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 7/12
Optimization of the BDT-Settings Expected significances of a sample with m ˜ t = 1 TeV and m ˜ χ 0 = 1 GeV MinNodeSize Significance of Signal InputConfig_TT_directTT_1000_1_a821_r7676_Input 4 4.5 3.8 4 3.5 3.6 3 3.4 2.5 3.2 2 3 1.5 2.8 1 0.5 2.6 1 2 3 4 5 6 7 8 9 10 MaxDepth Maximal Depth : How many different layers can an event surpass? Minimal Node Size : Fraction (%) of events required to be in a leaf ⇒ Are these really the best settings? 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 8/12
Optimization of the BDT-Settings Area under ROC-Curve of a sample with m ˜ t = 1 TeV and m ˜ χ 0 = 1 GeV MinNodeSize ROC AUC 0.958 4.5 4 0.957 3.5 0.956 3 2.5 0.955 2 1.5 0.954 ATLAS Internal 1 0.953 0.5 1 2 3 4 5 6 7 8 9 10 MaxDepth Area under ROC-Curve tests against overtraining High Maximal Depth and Small Minimal Node Size minimizes ROC-area ⇒ Sign for overtraining! Settings: MaxDepth = 4 , MinNodeSize = 1 . 5 % 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 9/12
BDT Training for different models Training for model with Training with all samples m ˜ t ≥ 1 TeV ATLAS Internal ATLAS Internal ∼ ∼ ~ ~ ~ 0 ~ ~ ~ 0 -1 → → χ → -1 → → χ → s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W 1 6 6 1 6 6 4-th parameter 600 Significance Significance 4-th parameter 600 Significance Significance MVA MVA 5 5 5 5 500 500 BDT>0.34 BDT>0.36 4 4 4 4 400 400 3 3 3 3 300 300 2 2 2 2 200 200 100 1 1 100 1 1 0 0 0 0 200 400 600 800 1000 1200 200 400 600 800 1000 1200 m [GeV] m [GeV] ∼ ∼ 0 0 χ χ 1 1 Significant increase of the sensitive regions in the paramter space, especially in direction towards kinematic border ( m ˜ t = m ˜ χ 0 + m t ) For large m ˜ t : Training with MC-data with m ˜ t ≥ 1 TeV only more promising 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 10/12
Comparing against other MVA methods Training for model with m ˜ t = 1 TeV and m ˜ χ 0 = 100 GeV Support Vector Machine (SVM) BDT TMVA overtraining check for classifier: BDT TMVA overtraining check for classifier: SVM dx dx 8 Signal (test sample) Signal (training sample) Signal (test sample) Signal (training sample) 6 / / Background (test sample) Background (training sample) Background (test sample) Background (training sample) (1/N) dN (1/N) dN 7 Kolmogorov-Smirnov test: signal (background) probability = 0 (0.084) Kolmogorov-Smirnov test: signal (background) probability = 0.507 (0.833) 5 6 4 5 U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)% U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)% 4 3 3 2 2 1 1 0 0 − − − − 0.4 0.3 0.2 0.1 0 0.1 0.2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 BDT response SVM response ATLAS Internal ATLAS Internal ∼ ∼ -1 → ~ ~ ~ → χ 0 → -1 → ~ ~ ~ → χ 0 → s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W 1 6 6 1 6 6 4-th parameter 600 Significance Significance 4-th parameter 600 Significance Significance MVA MVA 5 5 5 5 500 500 BDT>0.17 SVM>0.94 4 4 4 4 400 400 3 3 3 3 300 300 2 2 2 2 200 200 100 1 1 100 1 1 0 0 0 0 200 400 600 800 1000 1200 200 400 600 800 1000 1200 m [GeV] m [GeV] ∼ ∼ 0 0 χ χ 1 1 BDT achieves better significances 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 11/12
Summary & Outlook Started to look at MVA methods for increasing sensitivity for ˜ t → 0 ℓ analysis Training with several signal samples seems reasonable Current Steps: Checking dependency on BDT settings and BDT input variables Look at neural networks 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA 12/12
BACKUP 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA
Training for model with m ˜ t = 1 TeV and m ˜ χ 0 = 1 GeV ATLAS Internal ∼ ~ ~ ~ → → χ 0 → -1 s = 13 TeV, 36.1 fb pp t t ; t +t; t b+W 1 6 6 4-th parameter Significance Significance 600 MVA 5 5 500 BDT>0.14 4 4 400 3 3 300 2 2 200 1 1 100 0 0 200 400 600 800 1000 1200 m [GeV] ∼ 0 χ 1 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA B/U 1
BDT response After training: Compare true values y i with forecast s i . w i : weights, with ∑ i w i = 1 error fraction: e = ∑ i w i 1 s i ̸ = y i ( 1 − e ) boost factor α = β · ln with β constant, usually β ∈ [0 , 1] e New weights: w i → w i · exp ( ) α · 1 s i ̸ = y i ( ) ∑ s i m α m BDT response of event i : r i = , where m ∑ m α m { 1 if tree m predicts signal ( s i ) m = if tree m predicts background − 1 07/20/2017 Jonas Graw - ˜ t → 0 ℓ - MVA B/U 2
Recommend
More recommend