Dualities in and from Machine Learning
Sven Krippendorf
Deep Learning and Physics 2019, Yukawa Institute for Theoretical Physics, Kyoto
October 31st 2019
Current ML applications in high energy physics
Improving sensitivity — Day, SK 1907.07642
• ML techniques are heavily used in deriving experimental bounds.
• Brief example: improving the sensitivity to ultra-light axion-like particles, compared to previous bounds.
• ML algorithms are good at classification, and detecting particles is a classification problem. Our classifiers distinguish spectra after photon–axion conversion:
  |γ(E)⟩ → α|γ(E)⟩ + β|a(E)⟩
  Spectrum → Classifier → 0 (no axions) / 1 (axions)
[Figure: schematic — X-rays from an AGN pass through the cluster magnetic field B (photon–axion conversion) and galactic absorption before reaching the X-ray telescope]
• Training: simulate data with and without axions for appropriate X-ray sources.
• Bounds: compare performance on fake vs. real data.
• Algorithms (sklearn): decision trees, boosted decision trees, random forests, Gaussian naive Bayes, Gaussian process classifier, SVM, …
Previous bounds: NGC1275: 1605.01043; other sources: 1704.05256; Athena bounds: 1707.00176
with: Conlon, Day, Jennings; Berg, Muia, Powell, Rummel
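A toy sketch of this pipeline: simulate spectra with (y = 1) and without (y = 0) an axion-induced modulation, then train one of the off-the-shelf sklearn classifiers listed above. The spectrum shape, modulation pattern, and noise model here are invented for illustration; the real analysis simulates Chandra spectra.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_bins = 400, 50
E = np.linspace(1, 10, n_bins)            # energy grid in keV (illustrative)

base = E ** -1.8                          # toy power-law source spectrum
# Spectra without axions: multiplicative noise only
X0 = base * rng.normal(1.0, 0.05, (n_samples, n_bins))
# Spectra with axions: add an oscillatory modulation (toy stand-in for
# the characteristic photon-axion conversion pattern)
mod = 1 + 0.2 * np.sin(3 * E)
X1 = base * mod * rng.normal(1.0, 0.05, (n_samples, n_bins))

X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n_samples), np.ones(n_samples)])
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
print(clf.score(Xte, yte))
```

With a modulation this strong the problem is easy; the physics analysis works at much weaker modulation and compares fake-data to real-data performance.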
Constraining ALPs
with: Conlon, Day, Jennings, Rummel; Berg, Muia, Powell
Previous bounds: NGC1275: 1605.01043; other sources: 1704.05256; Athena bounds: 1707.00176
• Photon–axion interconversion in background magnetic fields:
  ℒ ⊃ −(g_aγγ/4) a F F̃ = g_aγγ a E·B
• One interesting parameter region arises for photons from sources in and behind galaxy-cluster magnetic fields:
  Θ = 0.28 (B⊥/1 µG) (10⁻³ cm⁻³/n_e) (10¹¹ GeV/M) (ω/1 keV)
  P_{γ→a} = Θ²/(1+Θ²) · sin²(Δ √(1+Θ²))
  Δ = 0.54 (n_e/10⁻³ cm⁻³) (L/10 kpc) (1 keV/ω)
[Figure: P_{γ→a} as a function of ω from 0.01 to 100 keV — suppressed conversion and too-rapid oscillations at low energies, a "sweet spot" with P_{γ→a} up to ~0.25 around keV energies, no oscillations at high energies]
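A numerical sketch of these formulas; the parameters are expressed in the reference units of the quoted scalings (B⊥ in µG, n_e in 10⁻³ cm⁻³, M in 10¹¹ GeV, L in 10 kpc), and the fiducial values chosen below are simply those reference values.

```python
import math

def P_conversion(omega_keV, B_muG=1.0, n_e=1.0, M11=1.0, L10=1.0):
    """Photon-to-axion conversion probability from the slide's scalings."""
    Theta = 0.28 * B_muG * (1.0 / n_e) * (1.0 / M11) * omega_keV
    Delta = 0.54 * n_e * L10 * (1.0 / omega_keV)
    return Theta**2 / (1 + Theta**2) * math.sin(Delta * math.sqrt(1 + Theta**2))**2

p = P_conversion(1.0)   # around the keV "sweet spot"
print(p)
```

The prefactor Θ²/(1+Θ²) bounds the probability, so P stays well below unity in this regime.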
Improving sensitivity
[Picture: spectral distortion] [Picture: bounds overview]
• Data: Chandra X-ray observations of bright point sources (AGN, quasar) in or behind galaxy clusters.
• Bounds for ALPs with m < 10⁻¹² eV from the absence of characteristic spectral modulations caused by interconversion between photons and axions in the cluster's background magnetic field.

Our results — bounds on the coupling [10⁻¹² GeV⁻¹]:

Source                    DTC   GaussianNB  QDA   RFC   AB    Previous
A1367 (resid.)            1.9   -           -     -     -     2.4
A1367 (up-resid.)         2.0   -           1.9   -     -     2.4
A1795 Quasar (resid.)     -     -           1.7   -     1.4   >10.0
A1795 Quasar (up-resid.)  -     -           -     -     -     >10.0
A1795 Sy1 (resid.)        1.0   0.8         1.2   1.1   0.7   1.5
A1795 Sy1 (up-resid.)     1.1   1.1         1.1   1.0   0.8   1.5

[Figure: overview of ALP bounds — axion coupling |g_aγγ| (GeV⁻¹) vs axion mass m_a (eV), showing LSW (OSQAR), VMB (PVLAS), helioscopes (CAST), horizontal branch stars, SN 1987A, HESS, Fermi-LAT, NGC1275 Chandra/Athena, haloscopes (ADMX), and the KSVZ/DFSZ lines]
ML for the string landscape?
(many talks: Halverson, Ruehle, Shiu)
Other avenues?
“Don’t ask what ML can do for you, ask what you can do for ML.” – Gary Shiu
Physics ∩ ML
Dualities
Betzler, SK: 191x.xxxxx
The problem
• Obtain correlators ⟨f(ϕ_i)⟩ at high(er) accuracy: think of these as properties of your data.
• In physics (incl. condensed matter, holography, string/field theory), we often use clever data representations to evaluate correlators.
• Multiple representations can be useful → dualities.
• Dual representations are very good representations for evaluating certain correlation functions (mapping strongly coupled data products to weakly coupled data products).
• Examples …
How are dualities useful in practice?
aka connecting physics questions to data questions
Examples: dual representations
• Discrete Fourier transform:
  x_k = (1/n) Σ_{j=1}^{n} p_j e^{2πijk/n}
  p_k = Σ_{j=1}^{n} x_j e^{−2πijk/n}
• Is there a signal under the noise?
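A minimal sketch of the "signal under the noise" question: a weak sinusoid is invisible in position space but appears as a sharp peak in the dual (momentum) space. The frequency bin, amplitude, and noise level are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
k_signal = 50                     # true frequency bin (illustrative)
t = np.arange(n)
# Sinusoid at half the noise amplitude, buried in unit-variance noise
x = 0.5 * np.sin(2 * np.pi * k_signal * t / n) + rng.normal(0, 1, n)

# In momentum space the signal concentrates into a single bin
power = np.abs(np.fft.rfft(x)) ** 2
k_detected = int(np.argmax(power[1:]) + 1)   # skip the DC bin
print(k_detected)
```

The coherent sum over n samples lifts the signal bin far above the noise floor, which is exactly why the momentum-space representation makes this data question easy.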
Examples: dual representations
• 2D Ising model: high–low temperature self-duality
  Original: H = −J Σ_{⟨i,j⟩} σ_i σ_j,  Z = Σ_σ e^{−βH(σ)},  β = 1/(k_B T)
  Dual:     H̃ = −J Σ_{⟨i,j⟩} s_i s_j,  Z̃ = Σ_s e^{−β̃H̃(s)},  β̃ = −½ log tanh β
  (self-dual point: T_critical)
  Ordered rep. ↔ Disordered rep. — position space? momentum space?
Kramers, Wannier 1941; Onsager 1944; review: Savit 1980
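A quick numerical check of the dual coupling map (taking J = k_B = 1): the map β → β̃ = −½ log tanh β has the critical coupling β_c = ½ log(1+√2) as its fixed point, which is the Kramers–Wannier route to T_critical.

```python
import math

def dual_beta(beta: float) -> float:
    """Kramers-Wannier dual coupling, J = k_B = 1."""
    return -0.5 * math.log(math.tanh(beta))

# Self-dual point: tanh(beta_c) = e^{-2 beta_c}  =>  beta_c = log(1+sqrt(2))/2
beta_c = 0.5 * math.log(1 + math.sqrt(2))
print(dual_beta(beta_c))   # maps to itself
```

Any β below β_c maps above it and vice versa, exchanging the ordered and disordered representations.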
Which data problem?
• Some correlation functions are more easily evaluated on dual variables:
  ⟨σ_i σ_j⟩, ⟨E(σ)⟩, ⟨M(σ)⟩
• Can we classify the temperature of low-temperature configurations? Which temperature is a sample drawn from (at low temperatures)? The configurations look rather similar. How about in the dual representation?
Data question on Ising
• But at the dual temperatures, our data takes a different shape (duality).
• It is easier to classify the temperature of a low-temperature configuration in the dual representation …
• How come?
  P(σ) = e^{−E/T}/Z,  P(s) = e^{−Ẽ/T̃}/Z̃
  ΔT ≪ ΔT̃,  ⟨ΔE⟩ ≪ ⟨ΔẼ⟩
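The separation ΔT ≪ ΔT̃ can be made quantitative with the dual temperature map (again J = k_B = 1; the two low temperatures below are illustrative): nearby low temperatures map to widely separated dual temperatures, so the dual-frame distributions overlap much less.

```python
import math

def dual_T(T: float) -> float:
    """Dual temperature under the Kramers-Wannier map, J = k_B = 1."""
    beta_dual = -0.5 * math.log(math.tanh(1.0 / T))
    return 1.0 / beta_dual

T1, T2 = 1.0, 1.1                    # two nearby low temperatures
dT = T2 - T1
dT_dual = dual_T(T1) - dual_T(T2)    # the map reverses the ordering
print(dT, dT_dual)
```

A temperature difference of 0.1 in the original frame becomes a gap of order 1 in the dual frame, which is why a classifier sees much more distinguishable samples there.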
Data question on Ising • Let’s look at the overlap of energy distributions in finite size samples Original variables Let’s check for performance. Dual variables � 17
Ising: simple network
• Let's confirm this with simple networks.
• Side remark: they way outperform standard sklearn classifiers.
Example: dual representations
• 1D Ising with multiple spin interactions:
  normal: H(s) = −J Σ_{k=1}^{N−n+1} Π_{l=0}^{n−1} s_{k+l} − B Σ_{k=1}^{N} s_k  (here: B = 0)
  dual:   H(σ) = −J Σ_{k=1}^{N−n+1} σ_k,  where σ_k = Π_{l=0}^{n−1} s_{k+l}
[Figure: s and σ chains for N = 10, n = 3, with ghost spins (fixed value)]
• Two data questions: 1) energy of a spin configuration, 2) metastable configuration?
cf. 1605.05199
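The dual map is a one-line transformation. A sketch (random spins; N, n, J are the illustrative values from the slide) verifying that both representations assign the same energy at B = 0:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, J = 10, 3, 1.0
s = rng.choice([-1, 1], size=N)

# Dual variables: products of n consecutive spins
sigma = np.array([np.prod(s[k:k + n]) for k in range(N - n + 1)])

# Normal frame: n-spin interaction terms; dual frame: linear in sigma
E_normal = -J * sum(np.prod(s[k:k + n]) for k in range(N - n + 1))
E_dual = -J * sigma.sum()
print(E_normal, E_dual)   # identical by construction
```

In the dual frame the energy is a simple sum over independent ±1 variables, which is what makes the data questions below easy there.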
Example: dual representations
• Two data questions: 1) energy of a spin configuration, 2) metastable configuration?
[Figure: example s and σ chains (N = 10, n = 3, ghost spins of fixed value) contrasting a metastable configuration with a stable one, with their energies]
cf. 1605.05199
Example: dual representations
• Performance on metastability classification (single hidden layer), accuracy by interaction range n and number of training samples:

Normal variables
                  n=4     n=5     n=8     n=9     n=12
6·10² samples     0.9113  0.8688  0.8788  0.8813  0.8803
3·10³ samples     0.9243  0.9215  0.9223  0.9295  −
9.5·10³ samples   0.9424  0.9475  0.9739  −       −

Dual variables
                  n=4     n=5     n=8     n=9     n=12
6·10² samples     0.9911  0.9783  0.9819  0.9855  0.9909
3·10³ samples     0.9958  0.9977  0.9994  1.0000  −
9.5·10³ samples   1.0000  1.0000  1.0000  −       −

• Deeper networks or CNNs perform better to a certain degree, but at large N or n they show the same feature.
Upshot: dual representations simplify answering certain data questions (i.e. simple networks sufficient)
Why interesting for ML?
Why interesting for ML?
• Finding good representations which allow one to answer the data question is hard (if not impossible).
[Diagram: input data in the normal frame — a neural network attacking the data question directly fails (✘); mapping to the dual representation first and running a neural network there succeeds (✔)]
• In this talk we use physics examples; a generalisation to other data products/questions would be interesting. Here, the data question in the dual frame can be addressed with very simple networks.
DFT: simple network
• Supervised learning task (binary classification) on N discrete values:
  position space: {((x_R, x_I), y)},  momentum space: {((p_R, p_I), y)}
  y = 1: noise + signal,  y = 0: noise
• Network:

Layer        Shape     Parameters
Conv1D       (2000,2)  4
Activation   (2000,2)  −
Dense        1         4001
Activation   1         −

• For this network, classification works in momentum space but not in position space.
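A numpy sketch of this architecture's forward pass, reproducing the parameter counts from the table. The weights here are random and untrained, and the tanh/sigmoid activations are assumptions; only the shapes and counts come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))              # one sample: (x_R, x_I) or (p_R, p_I)

# Conv1D with kernel size 1, 2 channels -> 2 channels, no bias: 4 parameters
W_conv = rng.normal(size=(2, 2))
h = np.tanh(x @ W_conv)                  # shape (2000, 2)

# Dense layer to a single unit: 4000 weights + 1 bias = 4001 parameters
w_dense = rng.normal(size=h.size) / np.sqrt(h.size)   # scaled to keep logit O(1)
b = 0.0
logit = h.ravel() @ w_dense + b
y_hat = 1 / (1 + np.exp(-logit))         # sigmoid output in (0, 1)
print(W_conv.size, w_dense.size + 1)
```

The whole classifier has only 4005 parameters, which is enough in momentum space but not in position space.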
Utilising dual representations
• Goal: improve performance in position space.
• A deeper network? It can do the job in principle [the DFT can be implemented with a single dense layer].
• However, finding it dynamically is `impossible' with standard optimisers, initialisations, and regularisers.

Layer        Shape     Parameters
Dense        (2000,2)  16000000
Conv1D       (2000,2)  4
Activation   (2000,2)  −
Dense        1         4001
Activation   1         −

[Figure: training from a random starting point vs. initialising the dense layer at the DFT]
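That a single dense layer can implement the DFT exactly can be checked directly: the DFT is a fixed linear map, so stacking real and imaginary parts gives a real (2n)×(2n) weight matrix — (2·2000)² = 16,000,000 weights, matching the table. The stacking convention below is one choice among several; we verify it at a small n for speed.

```python
import numpy as np

n = 16
idx = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n)   # complex DFT matrix

# Real-valued dense weight matrix acting on the stacked vector [Re x, Im x]
W = np.block([[F.real, -F.imag],
              [F.imag,  F.real]])

x = np.random.default_rng(0).normal(size=n)
out = W @ np.concatenate([x, np.zeros(n)])         # purely real input signal
p = out[:n] + 1j * out[n:]
print(np.allclose(p, np.fft.fft(x)))               # the dense layer IS a DFT
```

The catch on the slide: although this weight matrix exists, gradient descent from a random starting point does not find it.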
Can we improve the situation by favouring dual representations?
How to utilise dualities?
• Learn dual transformations when explicitly known (trivial) and use them as an intermediate step in the architecture.
• Enforce dual representations via feature separation.
• When not explicitly known, we can match features of distributions (example: 2D Ising high–low temperature duality).
• Re-obtain "dualities" (1D Ising) by demanding good performance on a medium-hard correlator in an intermediate layer where no information is lost (beyond known dualities).