Uncovering latent jet substructure
Barry M. Dillon, Jozef Stefan Institute, Ljubljana, Slovenia
Based on: arXiv:1904.04200 [hep-ph] (BMD, Darius A. Faroughy, Jernej F. Kamenik)
Dark Machines, Trieste, April 11th 2019
Overview
• Goal: build an unsupervised ML tagger that can be used in new physics searches at colliders
• How? Latent Dirichlet Allocation (LDA). See the talks ‘Probabilistic programming’ (Rajat Mani Thomas) and ‘Probabilistic Programming and Inference in Particle Physics’ (Atılım Güneş Baydin)
• Why? Model independence, data-driven, anomaly detection, and interpretability: you can see what the machine has learned
Jets and substructure
Collider events produce collimated bunches of hadrons initiated by a coloured seed particle from some underlying process: after hadronization, the hadrons (π+, π−, K+, π0, ...) are clustered into composite objects called jets.
A jet is defined by the algorithm used to cluster the particles.
Jets and substructure
The Cambridge/Aachen algorithm (taken from: M. Cacciari, G. P. Salam, G. Soyez (2008)):
$d_{ij} = \Delta R_{ij}^2 / R^2$, $d_{iB} = 1$
1 - compute $d_{ij}$ and $d_{iB}$ for each particle in the final state
2 - if the minimum is $d_{iB}$, declare particle $i$ a jet, and remove it from the list
3 - if the minimum is $d_{ij}$, combine particles $i$ and $j$ and go back to step 1
4 - repeat until there are no particles left
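The four steps above can be sketched in plain Python, as a minimal, unoptimised illustration rather than a production clusterer. Assumptions not fixed by the slide: particles are four-vectors (px, py, pz, E) with nonzero transverse momentum, two objects are merged by simple four-vector addition, and the helper names `cambridge_aachen`, `delta_r2` and `rapidity_phi` are hypothetical. Each jet keeps its merge history as nested tuples, so that it can be un-clustered later.

```python
import math


def rapidity_phi(p):
    """Rapidity y and azimuth phi of a four-vector p = (px, py, pz, E)."""
    px, py, pz, E = p
    y = 0.5 * math.log((E + pz) / (E - pz))
    return y, math.atan2(py, px)


def delta_r2(a, b):
    """Squared angular distance Delta R^2 = Delta y^2 + Delta phi^2."""
    ya, phia = rapidity_phi(a)
    yb, phib = rapidity_phi(b)
    dphi = abs(phia - phib)
    if dphi > math.pi:  # wrap the azimuthal difference into [0, pi]
        dphi = 2 * math.pi - dphi
    return (ya - yb) ** 2 + dphi ** 2


def cambridge_aachen(particles, R=1.0):
    """Cluster four-vectors with d_ij = Delta R_ij^2 / R^2 and d_iB = 1.

    Returns a list of (four_vector, history) pairs, one per jet. The history
    is a nested tuple of input indices recording the order of the merges.
    """
    objs = [(tuple(p), i) for i, p in enumerate(particles)]
    jets = []
    while objs:
        # Step 1: find the smallest d_ij over all pairs.
        best = None
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                dij = delta_r2(objs[i][0], objs[j][0]) / R ** 2
                if best is None or dij < best[0]:
                    best = (dij, i, j)
        if best is None or best[0] >= 1.0:
            # Step 2: the minimum is d_iB = 1, so declare an object a jet.
            jets.append(objs.pop(0))
        else:
            # Step 3: combine i and j (four-vector sum) and go back to step 1.
            _, i, j = best
            (p_i, h_i), (p_j, h_j) = objs[i], objs[j]
            merged = (tuple(x + y for x, y in zip(p_i, p_j)), (h_i, h_j))
            objs = [o for k, o in enumerate(objs) if k not in (i, j)]
            objs.append(merged)
    # Step 4: the loop ends when no objects are left.
    return jets
```

With two nearly collinear particles and a third one back-to-back, the first two end up in one jet with history `(0, 1)` and the third forms a jet on its own.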
Jets and substructure
What was the initial process that led to the jet production?
[Figure: hadrons (π+, π−, K+, π0) from hadronization, clustered into a jet that contains subjets]
Study the clustering history of the jet: it contains information on how the jet formed (jet substructure; J. M. Butterworth, A. R. Davison, M. Rubin, G. P. Salam (2008)).
Un-cluster the jet by opening subjets one by one: $j_0 \to j_1 j_2$, with $m_{j_1} > m_{j_2}$.
Useful substructure observables at each stage (the first entry is the subjet mass, the second the mass drop):
$o_{j_0} = \left\{ m_{j_0},\ \frac{m_{j_1}}{m_{j_0}},\ \frac{\min(p_{T,1}^2,\, p_{T,2}^2)}{m_{j_0}^2}\, \Delta R_{1,2}^2 \right\}$
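A sketch of the un-clustering step, computing $o_{j_0}$ at each stage. Assumptions not taken from the slides: the clustering history is stored as a binary tree of hypothetical `Node` objects holding four-vectors (px, py, pz, E), and after each split the procedure continues into the heavier subjet $j_1$.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

FourVector = Tuple[float, float, float, float]  # (px, py, pz, E)


@dataclass
class Node:
    """Hypothetical clustering-tree node: a four-vector plus its two parents."""
    p: FourVector
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def merge(a, b):
    """Combine two nodes by four-vector addition, keeping the history."""
    return Node(tuple(x + y for x, y in zip(a.p, b.p)), a, b)


def mass(p):
    px, py, pz, E = p
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))


def pt2(p):
    return p[0] ** 2 + p[1] ** 2


def delta_r2(a, b):
    """Delta y^2 + Delta phi^2 between two four-vectors (nonzero pT assumed)."""
    ya = 0.5 * math.log((a[3] + a[2]) / (a[3] - a[2]))
    yb = 0.5 * math.log((b[3] + b[2]) / (b[3] - b[2]))
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    dphi = min(dphi, 2 * math.pi - dphi)
    return (ya - yb) ** 2 + dphi ** 2


def uncluster(jet):
    """Open subjets one by one, j0 -> j1 j2 with m_j1 >= m_j2, and return
    (m_j0, m_j1 / m_j0, min(pT1^2, pT2^2) * dR12^2 / m_j0^2) for each stage."""
    stages = []
    node = jet
    while node.left is not None and node.right is not None:
        j1, j2 = node.left, node.right
        if mass(j1.p) < mass(j2.p):
            j1, j2 = j2, j1  # order the subjets so that m_j1 >= m_j2
        m0 = mass(node.p)
        if m0 == 0.0:
            break  # observables are undefined for a massless parent
        kt_term = min(pt2(j1.p), pt2(j2.p)) * delta_r2(j1.p, j2.p) / m0 ** 2
        stages.append((m0, mass(j1.p) / m0, kt_term))
        node = j1  # assumption: follow the heavier subjet
    return stages
```

For a toy tree `merge(merge(a, b), c)` of three massless particles, this returns two stages, one per split, each holding the three observables.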
Top tagging
Top tagging: ‘was this jet seeded by a top quark or not?’
Signal: top jets from $t\bar{t}$ production in the SM, $pp \to t\bar{t}$, with $t \to W^+ b$ and $W \to jj$
Features: subjet mass $m_{j_0} \sim m_t$ (175 GeV), then $m_{j_0} \sim m_W$ (80 GeV) after the first split; mass drop $m_{j_1}/m_{j_0} \sim m_W/m_t \sim 0.45$, since $m_{j_1} \sim m_W$
Background: QCD di-jets, $pp \to gg \to jj$
Features: subjet mass: smoothly decaying distribution, peaked at zero; mass drop: smoothly decaying distribution, peaked at one
Tagging tops manually, e.g. the Johns Hopkins (JH) top tagger (D. E. Kaplan, K. Rehermann, M. D. Schwartz, B. Tweedie (2008)):
1 - cluster with C/A and then uncluster
2 - cuts are applied manually to filter out jets which have top-like features
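The numbers on this slide fix the expected mass drop for a top jet, $m_W/m_t = 80/175 \approx 0.46$. A toy version of step 2 then keeps jets whose first split looks like $t \to Wb$. The cut windows below are hypothetical placeholders chosen for illustration; they are not the JH tagger's actual cuts.

```python
# Masses quoted on the slide, in GeV.
M_TOP, M_W = 175.0, 80.0

# Expected mass drop m_j1 / m_j0 at the first split of a top jet (t -> W b).
EXPECTED_MASS_DROP = M_W / M_TOP  # roughly 0.46

# Hypothetical mass windows in GeV, for illustration only.
TOP_WINDOW = (145.0, 205.0)
W_WINDOW = (65.0, 95.0)


def in_window(x, window):
    low, high = window
    return low <= x <= high


def is_top_like(m_j0, m_j1):
    """Toy manual cut on the first un-clustering stage: keep the jet if its
    mass is close to m_t and its heavier subjet mass is close to m_W."""
    return in_window(m_j0, TOP_WINDOW) and in_window(m_j1, W_WINDOW)


print(round(EXPECTED_MASS_DROP, 3))  # 0.457
print(is_top_like(172.0, 81.0), is_top_like(60.0, 50.0))  # True False
```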
Latent Dirichlet Allocation
D. M. Blei, A. Y. Ng, M. I. Jordan, J. Lafferty (2003)
Characterising documents as a set of ‘topics’ or ‘themes’: LDA is based on a generative process for writing documents.
Assumptions:
• short-distance physics is represented by a set of ‘themes’
• a ‘theme’ is a distribution over substructure features
• a jet, or event, is represented by a list (document) of features
• each jet, or event, can have different proportions of each theme
A mixed sample of jets or events can be parameterised by a set of ‘latent’ hyper-parameters:
• the number of themes $K$ (finite)
• theme concentration parameters $\alpha_i$, $i = 1, \dots, K$
• the theme-feature matrix $\beta_{ij}$, with features $j = 1, \dots, N_f$
Latent Dirichlet Allocation
The LDA process for generating jets or events, given the theme concentration parameters $\alpha$ and the theme-feature matrix $\beta$:
1 - the Dirichlet distribution $\mathrm{Dir}(\alpha)$ is defined on a simplex, from which we draw the theme proportions for each document; it is a prior that lets us increase the probability that certain theme proportions are selected
2 - from the Dirichlet, we draw the theme proportions $\omega$ for a single jet or event
3 - to choose a feature for the jet or event, we first draw a theme $t$ from the theme proportions $\omega$
4 - given the theme $t$ and the theme-feature matrix $\beta$, a feature is chosen and added to the jet or event
5 - this process is repeated for each feature, $n_f = 1, \dots, N_f$, and for each jet or event, $n_{j,e} = 1, \dots, N_{j,e}$
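The generative process can be sketched with Python's `random` module. A Dirichlet draw is obtained by normalising independent Gamma$(\alpha_i, 1)$ variates (a standard construction), then themes and features are drawn with `random.choices`. Features are represented by integer ids, and `generate_documents` is a hypothetical helper name.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw theme proportions omega ~ Dir(alpha) by normalising Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_documents(alpha, beta, n_docs, n_features, seed=0):
    """Generate n_docs jets/events, each a list of n_features feature ids.

    alpha: K theme concentration parameters.
    beta:  K x N_f theme-feature matrix; row t is p(feature | theme t).
    """
    rng = random.Random(seed)
    themes = range(len(beta))
    features = range(len(beta[0]))
    docs = []
    for _ in range(n_docs):
        omega = sample_dirichlet(alpha, rng)  # theme proportions for this jet
        doc = []
        for _ in range(n_features):
            t = rng.choices(themes, weights=omega)[0]      # draw a theme
            f = rng.choices(features, weights=beta[t])[0]  # draw a feature given t
            doc.append(f)
        docs.append(doc)
    return docs


# Two themes over four features: theme 0 emits features 0-1, theme 1 emits 2-3.
beta = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
docs = generate_documents([0.9, 0.1], beta, n_docs=5, n_features=8, seed=1)
print(docs)
```

Here the mixture of themes within each generated jet is controlled entirely by $\alpha$, mirroring how a mixed sample is described on the slide.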
Latent Dirichlet Allocation
The probability of a jet being generated, given the choice of latent parameters, is
$p(j \mid \alpha, \beta) = \int d\omega\; p(\omega \mid \alpha) \prod_{f \in j} \left( \sum_t p(t \mid \omega)\, p(f \mid t, \beta) \right)$
The goal: to infer the latent parameters in the theme-feature matrix by analysing a collection of documents.
How? Variational Bayesian methods, implemented using the gensim software (R. Rehurek, P. Sojka (2010); M. D. Hoffman, D. M. Blei, F. Bach (2010)).
Given a collection of jets or events, we choose the number of themes and the $\alpha_i$; the LDA algorithm then estimates the latent $\beta_{ij}$.
We can thus disentangle short-distance physics, based on its features in the jet substructure, in an unsupervised way.
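To make the formula for $p(j \mid \alpha, \beta)$ concrete, the integral over $\omega$ can be estimated by plain Monte Carlo: sample $\omega \sim \mathrm{Dir}(\alpha)$ and average $\prod_{f \in j} \sum_t \omega_t \beta_{tf}$. This only illustrates the likelihood; it is not the variational inference that gensim performs, and the function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """omega ~ Dir(alpha), via normalised Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def p_jet_given_omega(jet, omega, beta):
    """Product over features f in the jet of sum_t p(t | omega) p(f | t, beta)."""
    prob = 1.0
    for f in jet:
        prob *= sum(omega[t] * beta[t][f] for t in range(len(omega)))
    return prob


def p_jet(jet, alpha, beta, n_samples=20000, seed=0):
    """Monte Carlo estimate of p(j | alpha, beta): average over omega ~ Dir(alpha)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += p_jet_given_omega(jet, sample_dirichlet(alpha, rng), beta)
    return total / n_samples


# alpha = (1, 1) makes omega_0 uniform on [0, 1]; with beta = identity the exact
# answers are E[omega_0] = 1/2 for jet [0] and E[omega_0 omega_1] = 1/6 for [0, 1].
identity = [[1.0, 0.0], [0.0, 1.0]]
print(p_jet([0], [1.0, 1.0], identity), p_jet([0, 1], [1.0, 1.0], identity))
```

The printed estimates should agree with 1/2 and 1/6 up to Monte Carlo noise.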
Latent Dirichlet Allocation
Useful substructure observables, computed at each stage of the un-clustering:
$o_{j_0} = \left\{ m_{j_0},\ \frac{m_{j_1}}{m_{j_0}},\ \frac{\min(p_{T,1}^2,\, p_{T,2}^2)}{m_{j_0}^2}\, \Delta R_{1,2}^2 \right\}$
This is a feature in the substructure.
1 - un-cluster the jet, and calculate the above observables at each stage
2 - bin the observables, and form a feature for each stage from the observables
3 - form a ‘document’ describing each jet, and a mixed sample of different jets
4 - analyse these documents using LDA: find the ‘themes’ describing the physics
5 - use inference to identify themes in new jets: identify the origin of the jet
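Steps 1 to 3 turn each un-clustering stage into one discrete token. The bin edges and the token format below are hypothetical choices for illustration only, not the binning used in the study; lists of string tokens like these are the kind of input gensim's `corpora.Dictionary` accepts when building a corpus.

```python
import bisect

# Hypothetical bin edges, for illustration only.
MASS_EDGES = [20.0 * k for k in range(1, 16)]           # m_j0 in GeV: 20, 40, ..., 300
DROP_EDGES = [round(0.1 * k, 1) for k in range(1, 10)]  # m_j1 / m_j0: 0.1, ..., 0.9
KT_EDGES = [round(0.05 * k, 2) for k in range(1, 10)]   # third observable: 0.05, ..., 0.45


def stage_to_feature(m_j0, mass_drop, kt_term):
    """Step 2: bin the observables of one un-clustering stage into one token."""
    return "m{}_d{}_k{}".format(
        bisect.bisect_right(MASS_EDGES, m_j0),
        bisect.bisect_right(DROP_EDGES, mass_drop),
        bisect.bisect_right(KT_EDGES, kt_term),
    )


def jet_to_document(stages):
    """Step 3: a jet's 'document' is the list of tokens from all its stages."""
    return [stage_to_feature(*stage) for stage in stages]


doc = jet_to_document([(175.0, 0.46, 0.12), (85.0, 0.35, 0.22)])
print(doc)  # ['m8_d4_k2', 'm4_d3_k4']
```

Combining all three observables into one token keeps their correlations inside a single feature, at the cost of a larger vocabulary.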
LDA top tagging
For our study:
1 - train LDA on mixed samples with S/B = 1, 1/9, 1/99
2 - $p_T \in [350, 450]$ GeV
3 - sample size: $\sim 8 \times 10^4$
4 - in accordance with S/B: $\alpha = [0.5, 0.5]$, $[0.9, 0.1]$, $[0.99, 0.01]$
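For the numbers on this slide, setting $\alpha$ ‘in accordance with S/B’ matches using the expected background and signal fractions of the mixed sample, $f_S = (S/B)/(1 + S/B)$. A small sketch with hypothetical helper names reproduces $\alpha = [0.5, 0.5]$, $[0.9, 0.1]$, $[0.99, 0.01]$ and the corresponding split of an $8 \times 10^4$ jet sample.

```python
def alpha_from_s_over_b(s_over_b):
    """Theme concentrations [background, signal] equal to the expected sample fractions."""
    f_sig = s_over_b / (1.0 + s_over_b)
    return [1.0 - f_sig, f_sig]


def mixed_sample_sizes(n_total, s_over_b):
    """Split n_total jets into (n_background, n_signal) for a given S/B."""
    n_sig = round(n_total * alpha_from_s_over_b(s_over_b)[1])
    return n_total - n_sig, n_sig


for s_over_b in (1.0, 1 / 9, 1 / 99):
    alpha = [round(a, 3) for a in alpha_from_s_over_b(s_over_b)]
    print(alpha, mixed_sample_sizes(80000, s_over_b))
```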
LDA top tagging
[Figure: the two themes learned by LDA (theme 1, theme 2), shown as $p(m_{j_0} \mid t)$ and as distributions in the ($m_{j_0}$ [GeV], $m_{j_1}/m_{j_0}$) plane]
[Figure: the same distributions for QCD jets and top jets, for comparison]
LDA top tagging
Measure performance with ROC curves:
• results are compared to the JH top tagger (purple star) and DeepTop (G. Kasieczka, T. Plehn, M. Russell, T. Schell (2017))
• results have been k-folded, with k = 10, to estimate robustness
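A sketch of how ROC curves like these can be built from per-jet scores. Which score is used (for example, the inferred weight of the top-like theme) is an assumption here, as are the helper names. Each threshold gives a signal efficiency $\epsilon_S$ and a background efficiency $\epsilon_B$ (background rejection is $1/\epsilon_B$); the k-fold spread can be summarised by the mean and standard deviation of a figure of merit such as the AUC over the folds.

```python
import statistics


def roc_curve(sig_scores, bkg_scores):
    """(eps_S, eps_B) for every threshold, assuming higher score = more signal-like."""
    points = []
    for cut in sorted(set(sig_scores) | set(bkg_scores), reverse=True):
        eps_s = sum(s >= cut for s in sig_scores) / len(sig_scores)
        eps_b = sum(b >= cut for b in bkg_scores) / len(bkg_scores)
        points.append((eps_s, eps_b))
    return points


def auc(points):
    """Area under eps_S as a function of eps_B (trapezoid rule), starting at (0, 0)."""
    area = 0.0
    prev_s, prev_b = 0.0, 0.0
    for eps_s, eps_b in points:
        area += (eps_b - prev_b) * (eps_s + prev_s) / 2.0
        prev_s, prev_b = eps_s, eps_b
    return area


def kfold_summary(values):
    """Mean and standard deviation of a figure of merit over the k folds."""
    return statistics.mean(values), statistics.stdev(values)


points = roc_curve([0.9, 0.8, 0.4], [0.1, 0.2, 0.5])
# Background rejection 1/eps_B at each point (infinite where eps_B = 0).
rejection = [1.0 / b if b > 0 else float("inf") for _, b in points]
print(auc(points), rejection)
```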
LDA new physics tagging
Now for a NP process: $pp \to W' \to \phi W \to WWW$, with $m_{W'} = 3$ TeV, $m_\phi = 400$ GeV
S/B = 0.011, $\alpha = [0.989, 0.011]$
[Figure: the two themes learned by LDA (theme 1, theme 2), shown as $p(m_{j_0} \mid t)$ and as distributions in the ($m_{j_0}$ [GeV], $m_{j_1}/m_{j_0}$) plane]