Towards Data-Driven Particle Physics Classifiers
Deep Learning in the Natural Sciences, University of Hamburg, March 1, 2019
Eric M. Metodiev, Center for Theoretical Physics, Massachusetts Institute of Technology
Based on work with Patrick Komiske, Benjamin Nachman, Matthew Schwartz, and Jesse Thaler [1708.02949] [1801.10158] [1802.00008] [1809.01140]
Outline: Classification at Colliders; Training on Data; Disentangling Categories
Eric M. Metodiev, MIT Data-Driven Particle Physics Classifiers
Jet Classification: which particle initiated the jet? Quark (u, d, s), gluon (g), c, b, W/Z, t, H, or new physics (???).
Quark vs. gluon jets: new physics signal, quark jets; QCD background, gluon jets.
Color factors: $C_q = 4/3$ for quarks and $3$ for gluons (a ratio of 9/4), so gluon jets are roughly "twice as wide" as quark jets.
Machine Learning with Jets: jets can be represented as images, observables, sequences, point clouds, and more. Examples: [L. de Oliveira, et al., 1511.05190] [P.T. Komiske, EMM, M.D. Schwartz, 1612.01551] [G. Louppe, et al., 1702.00748] [K. Datta, A. Larkoski, 1704.08249] [T. Cheng, 1711.02633] [P.T. Komiske, EMM, J. Thaler, 1712.07124] [G. Kasieczka, N. Kiefer, T. Plehn, J. Thompson, 1812.09223] [M. Andrews, et al., 1902.08276]
All supervised classification methods require labeled training data, but it is impossible to isolate pure samples of quark jets and gluon jets. Training therefore often relies on simulation, which is sensitive to mismodeling.
Simulation vs. Data: a simple two-feature quark vs. gluon jet classifier, built from simulation and from data. [Figure: two panels, Simulation and Data; axes: "number of particles in the jet" and "width of the jet".] The two are very different! [ATLAS Collaboration, 1405.6583] Is it possible to train classifiers on data?
Outline: Classification at Colliders (classifying jets based on their originating particles); Training on Data; Disentangling Categories
Training on pure samples (cat vs. dog jets): cat jets and dog jets are labeled 1 and 0 and fed to a classifier.
Training on mixed samples (cat vs. dog jets): cat-enriched jets and dog-enriched jets are labeled 1 and 0 and fed to a classifier. This defines an equivalent classifier to the pure case!
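Classification without labels (CWoLa): take two mixed samples $M_1$ and $M_2$ with cat fractions $f_1^{\text{cat}}$ and $f_2^{\text{cat}}$:
$$p_{M_1}(x) = f_1^{\text{cat}}\, p_{\text{cat}}(x) + (1 - f_1^{\text{cat}})\, p_{\text{dog}}(x), \qquad p_{M_2}(x) = f_2^{\text{cat}}\, p_{\text{cat}}(x) + (1 - f_2^{\text{cat}})\, p_{\text{dog}}(x)$$
$$L_{M_1/M_2}(x) = \frac{p_{M_1}(x)}{p_{M_2}(x)} = \frac{f_1^{\text{cat}}\, L_{\text{cat/dog}}(x) + (1 - f_1^{\text{cat}})}{f_2^{\text{cat}}\, L_{\text{cat/dog}}(x) + (1 - f_2^{\text{cat}})}, \qquad L_{\text{cat/dog}}(x) = \frac{p_{\text{cat}}(x)}{p_{\text{dog}}(x)}$$
The optimal mixed-sample classifier $L_{M_1/M_2}$ is a monotonic rescaling of the optimal cat vs. dog classifier $L_{\text{cat/dog}}$ (provided $f_1^{\text{cat}} \neq f_2^{\text{cat}}$). Hence they define equivalent classifiers.
[EMM, B. Nachman, J. Thaler, 1708.02949] [P.T. Komiske, EMM, B. Nachman, M.D. Schwartz, 1801.10158]; see also [L. Dery, B. Nachman, F. Rubbo, A. Schwartzman, 1702.00414] [T. Cohen, M. Freytsis, B. Ostdiek, 1706.09451]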
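To see why the rescaling is monotonic, one can differentiate the mixed-sample likelihood ratio with respect to the pure one. The shorthand here is introduced only for illustration: $L$ is the cat vs. dog likelihood ratio, $f_1$ and $f_2$ are the cat fractions of the two mixed samples, and $f_1 > f_2$ is assumed.

```latex
\[
\frac{d}{dL}\,\frac{f_1 L + (1 - f_1)}{f_2 L + (1 - f_2)}
= \frac{f_1\,[f_2 L + (1 - f_2)] - f_2\,[f_1 L + (1 - f_1)]}{[f_2 L + (1 - f_2)]^2}
= \frac{f_1 - f_2}{[f_2 L + (1 - f_2)]^2} > 0 .
\]
```

The derivative never changes sign, so thresholding the mixed-sample ratio selects the same jets as thresholding the pure ratio. For $f_1 < f_2$ the ordering is simply reversed.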
Training on pure samples (quark vs. gluon jets): quark jets and gluon jets are labeled 1 and 0 and fed to a classifier.
Training on mixed samples (quark vs. gluon jets): quark-enriched jets and gluon-enriched jets, e.g. from dijet and Z + jet events, are labeled 1 and 0 and fed to a classifier. This defines an equivalent classifier to the pure case!
Performance: compare training on 80%-20% mixtures to training on pure samples, and vary the mixture purity. [Figure: classifier performance for mixed vs. pure training, including expert observables.] Can train on mixed samples! Works for very impure mixtures! [EMM, B. Nachman, J. Thaler, 1708.02949] Also works for convolutional neural networks and jet images. [P.T. Komiske, EMM, B. Nachman, M.D. Schwartz, 1801.10158]
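The comparison between mixed and pure training can be sketched with a toy model. Everything in this sketch is an assumption made for illustration and is not taken from the cited papers: two Gaussian features with hypothetical means `MU`, a hand-written logistic regression, and 80%/20% mixture fractions. Only the mixture membership is used as the training label in the mixed case; the true labels are used solely to evaluate the AUC.

```python
import math
import random

MU = (1.0, 0.5)  # hypothetical feature means: signal at +MU, background at -MU

def sample(n, signal_fraction, rng):
    """Draw n toy jets as ((x1, x2), true_label) with the given signal fraction."""
    jets = []
    for _ in range(n):
        label = 1 if rng.random() < signal_fraction else 0
        sign = 1.0 if label else -1.0
        x = (rng.gauss(sign * MU[0], 1.0), rng.gauss(sign * MU[1], 1.0))
        jets.append((x, label))
    return jets

def fit_logistic(xs, ys, steps=200, lr=0.5):
    """Two-feature logistic regression fitted by full-batch gradient descent."""
    w1 = w2 = b = 0.0
    n = len(xs)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(xs, ys):
            err = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b))) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def auc(scores, labels):
    """Area under the ROC curve from the rank-sum (Mann-Whitney) statistic."""
    ranked = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(i + 1 for i, (_, y) in enumerate(ranked) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def score(model, x):
    w1, w2, b = model
    return w1 * x[0] + w2 * x[1] + b

rng = random.Random(0)

# Mixed training: label each jet by the sample it came from, not by its true type.
m1 = sample(2000, 0.8, rng)  # signal-enriched mixture
m2 = sample(2000, 0.2, rng)  # background-enriched mixture
mixed_model = fit_logistic([x for x, _ in m1 + m2], [1] * len(m1) + [0] * len(m2))

# Fully supervised reference: pure samples with true labels.
pure = sample(2000, 1.0, rng) + sample(2000, 0.0, rng)
pure_model = fit_logistic([x for x, _ in pure], [y for _, y in pure])

# Evaluate both classifiers against true labels on an independent test set.
test = sample(4000, 0.5, rng)
test_y = [y for _, y in test]
auc_mixed = auc([score(mixed_model, x) for x, _ in test], test_y)
auc_pure = auc([score(pure_model, x) for x, _ in test], test_y)
print(f"AUC mixed-trained: {auc_mixed:.3f}, AUC pure-trained: {auc_pure:.3f}")
```

Changing the `0.8` and `0.2` fractions toward `0.5` is a quick way to explore how impure the mixtures can become in this toy before the finite training sample limits performance.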
Outline: Classification at Colliders (classifying jets based on their originating particles); Training on Data (weak supervision with mixed jet samples); Disentangling Categories
What do we even mean by quark and gluon jets? Quarks are color triplets and gluons are color octets, but the hadrons in jets are color singlets, so there is no unambiguous definition of quark and gluon jets. [P. Gras, et al., 1704.03878] gives various definitions of increasing verbosity. We obtained a quark vs. gluon jet classifier without a definition… This points to an operational, data-driven definition of quark and gluon jets.
Topic Modeling and Blind Source Separation [Image: D. Blei] [Image: J. Bobin]
Disentangling Categories: let's model cats and dogs as random animal noise producers. [Figure: animal noises Meow, Purr, Growl, Woof, Howl.]
Disentangling Categories: listen to the animal noises from two different pet stores, A and B, and disentangle the cat and dog vocabularies from them. [Figure: animal noises Meow, Purr, Growl, Woof, Howl.] Pure cat and dog noise "phase space" is key:
$$\frac{N_{\text{Store A}}(\text{"Meow"})}{N_{\text{Store B}}(\text{"Meow"})} = \frac{f_{\text{Cat}}^{\text{Store A}}}{f_{\text{Cat}}^{\text{Store B}}}, \qquad \frac{N_{\text{Store A}}(\text{"Bark"})}{N_{\text{Store B}}(\text{"Bark"})} = \frac{1 - f_{\text{Cat}}^{\text{Store A}}}{1 - f_{\text{Cat}}^{\text{Store B}}}$$
Disentangling Categories: an operational definition of quark and gluon jets. [EMM, J. Thaler, 1802.00008] [P.T. Komiske, EMM, J. Thaler, 1809.01140]
For two mixed samples $A$ and $B$ with quark fractions $f_A^q$ and $f_B^q$, the reducibility factors are
$$\kappa_{AB} \equiv \min_x \frac{p_A(x)}{p_B(x)} = \frac{1 - f_A^q}{1 - f_B^q}, \qquad \kappa_{BA} \equiv \min_x \frac{p_B(x)}{p_A(x)} = \frac{f_B^q}{f_A^q}$$
[Figure: distributions of the number of particles in the jet.]
With reducibility factors $\kappa_{AB}$ and $\kappa_{BA}$, solve for the quark and gluon distributions:
$$p_{\text{quark}}(x) = \frac{p_A(x) - \kappa_{AB}\, p_B(x)}{1 - \kappa_{AB}}, \qquad p_{\text{gluon}}(x) = \frac{p_B(x) - \kappa_{BA}\, p_A(x)}{1 - \kappa_{BA}}$$
Can also use machine learning to determine the feature space.
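The demixing step can be illustrated with the pet-store analogy. The sketch below is a toy under stated assumptions, not the procedure of the cited papers: the cat and dog "vocabularies" (`p_cat`, `p_dog`) and the store fractions are hypothetical numbers chosen so that each animal has noises the other never makes. The reducibility factors are written `kappa_AB` and `kappa_BA`, and exact fractions are used so that the recovery can be checked without rounding.

```python
from fractions import Fraction as F

# Hypothetical vocabularies: probability of each noise for pure cats and pure dogs.
# "Meow"/"Purr" are cat-only and "Woof"/"Howl" are dog-only; "Growl" is shared.
p_cat = {"Meow": F(1, 2), "Purr": F(1, 4), "Growl": F(1, 4), "Woof": F(0), "Howl": F(0)}
p_dog = {"Meow": F(0), "Purr": F(0), "Growl": F(1, 4), "Woof": F(1, 2), "Howl": F(1, 4)}

def mix(f_cat):
    """Noise distribution heard in a store whose animals are a fraction f_cat cats."""
    return {w: f_cat * p_cat[w] + (1 - f_cat) * p_dog[w] for w in p_cat}

# Only these two mixed distributions are "observed"; the fractions are treated as unknown.
p_A = mix(F(4, 5))   # store A: cat-enriched
p_B = mix(F(3, 10))  # store B: dog-enriched

def reducibility(p_num, p_den):
    """kappa = min over noises of p_num / p_den, skipping noises absent from p_den."""
    return min(p_num[w] / p_den[w] for w in p_num if p_den[w] > 0)

kappa_AB = reducibility(p_A, p_B)
kappa_BA = reducibility(p_B, p_A)

# Subtract as much of the other store as possible, then renormalize.
rec_cat = {w: (p_A[w] - kappa_AB * p_B[w]) / (1 - kappa_AB) for w in p_A}
rec_dog = {w: (p_B[w] - kappa_BA * p_A[w]) / (1 - kappa_BA) for w in p_A}

print("kappa_AB =", kappa_AB, " kappa_BA =", kappa_BA)
print("recovered cat vocabulary:", {w: str(v) for w, v in rec_cat.items()})
print("recovered dog vocabulary:", {w: str(v) for w, v in rec_dog.items()})
```

The minimum of each ratio is reached on the noises that only one animal makes, which is why the pure regions of phase space are what make the extraction possible. With real, finite-statistics histograms the minimum would have to be estimated with care, for instance by restricting to well-populated bins.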
Collider data as mixtures of jet types: a theoretical and experimental definition of jet categories. Theoretically tractable: the reducibility factors can be calculated from perturbative QCD for certain observables. The extracted fractions can be used to calibrate ROC curves, and the distribution of any observable can be extracted for quark and gluon jets separately. See extra slides for more.
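The fractions themselves follow from inverting the two reducibility-factor relations. A short worked derivation, assuming sample $A$ is the quark-enriched one and writing the reducibility factors as $\kappa_{AB} = (1 - f_A)/(1 - f_B)$ and $\kappa_{BA} = f_B/f_A$, with $f_A$ and $f_B$ the quark fractions of the two samples:

```latex
\begin{align*}
f_B &= \kappa_{BA}\, f_A
  && \text{(from } \kappa_{BA} = f_B / f_A \text{)} \\
1 - f_A &= \kappa_{AB}\,(1 - \kappa_{BA} f_A)
  && \text{(substitute into } \kappa_{AB} = (1 - f_A)/(1 - f_B) \text{)} \\
f_A &= \frac{1 - \kappa_{AB}}{1 - \kappa_{AB}\,\kappa_{BA}},
\qquad
f_B = \frac{\kappa_{BA}\,(1 - \kappa_{AB})}{1 - \kappa_{AB}\,\kappa_{BA}} .
\end{align*}
```

As a quick check with illustrative numbers, $\kappa_{AB} = 2/7$ and $\kappa_{BA} = 3/8$ give $f_A = 4/5$ and $f_B = 3/10$.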
Summary: Classification at Colliders (classifying jets based on their originating particles); Training on Data (weak supervision with mixed jet samples); Disentangling Categories (topic modeling to define data-driven jet categories)
The End. Thank you!
Extra Slides