

  1. Dualities in and from Machine Learning
 Sven Krippendorf
 Deep Learning and Physics 2019
 Yukawa Institute for Theoretical Physics, Kyoto, October 31st 2019

  2. [Speaker note: spend two more slides on current ML applications in high energy]

  3. Improving sensitivity (Day, SK 1907.07642)
 • ML techniques are heavily used in experimental bounds.
 • Brief example: improving sensitivity for ultra-light axion-like particles, compared to previous bounds.
 • ML algorithms are good at classification, and detecting particles is a classification problem. Our classifiers take a spectrum affected by photon-axion conversion,
 |γ(E)⟩ → α|γ(E)⟩ + β|a(E)⟩
 and map it to a label: Spectrum → Classifier → 0 (no axions) / 1 (axions).
 • Training: simulate data with and without axions for appropriate X-ray sources.
 • Bounds: compare performance on fake and real data.
 • Algorithms (sklearn): decision trees, boosted decision trees, random forests, Gaussian naive Bayes, Gaussian process classifier, SVM, … (see the sketch below)
 [Figure: photons from an AGN undergo axion conversion in the cluster B-field and galactic absorption before reaching an X-ray telescope]
 Previous bounds: NGC1275: 1605.01043, other sources: 1704.05256, Athena bounds: 1707.00176
 with: Conlon, Day, Jennings; Berg, Muia, Powell, Rummel
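As an illustration of this classification setup (not the paper's actual pipeline), a minimal sklearn sketch; `simulate_spectrum` is a hypothetical stand-in for the real source simulations, and the modulation model is an assumption:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def simulate_spectrum(axions, n_bins=200, rng=np.random):
    """Hypothetical stand-in for the real X-ray source simulations."""
    counts = rng.poisson(100, n_bins).astype(float)
    if axions:
        # axion-photon conversion imprints quasi-sinusoidal spectral modulations
        counts *= 1 + 0.05 * np.sin(np.linspace(0, 40, n_bins) + rng.uniform(0, 6.3))
    return counts

labels = np.array([0] * 500 + [1] * 500)                 # 0: no axions, 1: axions
spectra = np.array([simulate_spectrum(ax) for ax in labels])
X_tr, X_te, y_tr, y_te = train_test_split(spectra, labels, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # classification accuracy on held-out fake data
```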

  4. Constraining ALPs
 Previous bounds: NGC1275: 1605.01043, other sources: 1704.05256, Athena bounds: 1707.00176
 with: Conlon, Day, Jennings, Rummel; Berg, Muia, Powell
 • Photon-axion interconversion in background magnetic fields:
 ℒ ⊃ −(g_aγγ/4) a F F̃ = g_aγγ a E⋅B
 • One interesting parameter region can be obtained for photons from sources in and behind galaxy cluster magnetic fields (transcribed into code below):
 Θ = 0.28 (B⊥/1 µG) (10⁻³ cm⁻³/n_e) (10¹¹ GeV/M) (ω/1 keV)
 P_{γ→a} = Θ²/(1+Θ²) sin²(Δ √(1+Θ²))
 Δ = 0.54 (n_e/10⁻³ cm⁻³) (L/10 kpc) (1 keV/ω)
 [Figure: P_{γ→a} vs ω [keV] over 0.01–100 keV, marking the sweet spot between regions of suppressed conversion, too rapid oscillations, and no oscillations]
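A direct transcription of the slide's formulas into code; the parameter names and defaults are just the slide's reference values:

```python
import numpy as np

def P_gamma_to_a(omega_keV, B_perp_muG=1.0, n_e_cm3=1e-3, M_GeV=1e11, L_kpc=10.0):
    """Photon-to-axion conversion probability from the slide's formulas."""
    Theta = 0.28 * B_perp_muG * (1e-3 / n_e_cm3) * (1e11 / M_GeV) * omega_keV
    Delta = 0.54 * (n_e_cm3 / 1e-3) * (L_kpc / 10.0) * (1.0 / omega_keV)
    return Theta**2 / (1 + Theta**2) * np.sin(Delta * np.sqrt(1 + Theta**2))**2

omega = np.logspace(-2, 2, 400)      # 0.01 to 100 keV, as in the figure
print(P_gamma_to_a(omega).max())     # conversion probability in the sweet spot
```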

  5. Improving sensitivity
 [Speaker notes: picture of spectral distortions; picture of bounds overview; table of our results]
 • Data: Chandra X-ray observations of bright point sources (AGN, quasar) in or behind galaxy clusters.
 • Bounds for ALPs with m < 10⁻¹² eV due to the absence of characteristic spectral modulations caused by interconversion between photons and axions in the cluster background magnetic field.

 Bounds on the coupling [×10⁻¹² GeV⁻¹]:

 Source                   | DTC | GaussianNB | QDA | RFC | AB  | Previous
 A1367 (resid.)           | 1.9 | -          | -   | -   | -   | 2.4
 A1367 (up-resid.)        | 2.0 | -          | 1.9 | -   | -   | 2.4
 A1795 Quasar (resid.)    | -   | -          | 1.7 | -   | 1.4 | >10.0
 A1795 Quasar (up-resid.) | -   | -          | -   | -   | -   | >10.0
 A1795 Sy1 (resid.)       | 1.0 | 0.8        | 1.2 | 1.1 | 0.7 | 1.5
 A1795 Sy1 (up-resid.)    | 1.1 | 1.1        | 1.1 | 1.0 | 0.8 | 1.5

 [Figure: standard ALP exclusion plot, axion coupling |G_Aγγ| (GeV⁻¹) vs axion mass m_A (eV), showing LSW (OSQAR), VMB (PVLAS), helioscopes (CAST), horizontal branch stars, SN 1987A, HESS, Fermi-LAT, NGC1275 with Chandra and Athena, haloscopes (ADMX), and the KSVZ/DFSZ lines]

  6. ML for the string landscape? Many talks: Halverson, Ruehle, Shiu
 [Speaker note: remove slides]

  7. Other avenues?

  8. “Don’t ask what ML can do for you, ask what you can do for ML.” – Gary Shiu

  9. Physics ⋂ ML

  10. Dualities (Betzler, SK: 191x.xxxxx)

  11. The problem
 • Obtain correlators ⟨f(ϕ_i)⟩ at high(er) accuracy: think of these as properties of your data (see the sketch below).
 • In physics (incl. condensed matter, holography, string/field theory), we often use clever data representations to evaluate correlators.
 • Multiple representations can be useful ➔ dualities.
 • Dual representations are very good representations for evaluating certain correlation functions (mapping strongly coupled data products to weakly coupled data products).
 • Examples …
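As a toy illustration of "correlators as properties of your data" (my own sketch, not an example from the talk): a two-point function estimated by Monte Carlo from samples.

```python
import numpy as np

# Toy example: a correlator <f(phi)> estimated from samples phi ~ P(phi).
# Here phi is a two-component Gaussian field with true correlation 0.6.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean=[0.0, 0.0],
                                  cov=[[1.0, 0.6], [0.6, 1.0]],
                                  size=100_000)

# Monte Carlo estimate of the two-point function <phi_0 phi_r>
corr = np.mean(samples[:, 0] * samples[:, 1])
print(corr)  # ≈ 0.6
```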

  12. How are dualities useful in practice?
 aka connecting physics questions to data questions

  13. Examples: dual representations
 • Discrete Fourier transformation:
 x_k = (1/n) Σ_{j=1}^n p_j e^{2πijk/n}
 p_k = Σ_{j=1}^n x_j e^{−2πijk/n}
 • Is there a signal under the noise? (see the sketch below)
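A minimal numpy sketch of the data question (signal frequency and noise level are arbitrary choices for illustration): a weak sinusoid buried in noise is hard to spot in position space but shows up as a sharp peak in momentum space.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
t = np.arange(n)

noise = rng.normal(0.0, 1.0, n)
signal = 0.3 * np.sin(2 * np.pi * 50 * t / n)   # weak sinusoid in frequency bin 50

x = noise + signal                  # position-space data: looks like pure noise
p = np.fft.fft(x)                   # dual (momentum-space) representation

# The signal is invisible in x but gives a sharp spike at k = 50 in |p|
k_peak = np.argmax(np.abs(p[1:n // 2])) + 1
print(k_peak)                       # -> 50
```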

  14. Examples: dual representations
 • 2D Ising model: high-low temperature self-duality
 Original: H(σ) = −J Σ_{⟨i,j⟩} σ_i σ_j,  Z = Σ_σ e^{−βH(σ)},  β = 1/(k_B T)
 Dual: H(s) = −J Σ_{⟨i,j⟩} s_i s_j,  Z̃ = Σ_s e^{−β̃H(s)},  β̃ = −(1/2) log tanh β
 Ordered rep. ↔ Disordered rep. Position space? Momentum space?
 [Figure: original vs dual spin configurations around T_critical]
 Kramers, Wannier 1941; Onsager 1944; review: Savit 1980
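A quick numerical check of the duality map (my own sketch; k_B = 1 here): the map β̃(β) = −½ log tanh β has the exact critical coupling β_c = ½ log(1+√2) as its fixed point.

```python
import numpy as np

def beta_dual(beta):
    """Kramers-Wannier map: beta_tilde = -1/2 log tanh beta."""
    return -0.5 * np.log(np.tanh(beta))

beta_c = 0.5 * np.log(1 + np.sqrt(2))   # exact 2D Ising critical coupling
print(beta_c, beta_dual(beta_c))        # fixed point: both ≈ 0.4407
```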

  15. Which data problem?
 • Some correlation functions are more easily evaluated on dual variables:
 ⟨σ_i σ_j⟩, ⟨E(σ)⟩, ⟨M(σ)⟩
 • Can we classify the temperature of low-temperature configurations? Which temperature is a sample drawn from (at low temperatures)? They look rather similar. How about in the dual rep.?
 [Speaker note: replace images]

  16. Data question on Ising
 • But at the dual temperatures, our data takes a different shape (see the sketch below):
 P(σ) = e^{−E/T}/Z  ↔  P(s) = e^{−Ẽ/T̃}/Z̃  (duality)
 ΔT ≪ ΔT̃,  ⟨ΔE⟩ ≪ ⟨ΔẼ⟩
 • It is easier to classify the temperature of a low-temperature configuration in the dual representation …
 • How come? A small temperature difference at low T maps to a large difference in dual temperature, so the dual energy distributions separate much more.
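A numeric illustration of ΔT ≪ ΔT̃ (my sketch, k_B = 1): two nearby low temperatures map to widely separated dual temperatures under the same Kramers-Wannier map as above.

```python
import numpy as np

def T_dual(T):
    """Dual temperature from beta_tilde = -1/2 log tanh(1/T), with k_B = 1."""
    return -2.0 / np.log(np.tanh(1.0 / T))

T1, T2 = 1.0, 1.2                  # two nearby low temperatures
print(T2 - T1)                     # ΔT  = 0.2
print(T_dual(T1) - T_dual(T2))     # ΔT̃ ≈ 2.1: the gap is strongly amplified
```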

  17. Data question on Ising
 • Let's look at the overlap of the energy distributions in finite-size samples.
 [Figure: energy distributions at nearby temperatures, original variables vs dual variables]
 • Let's check the performance.

  18. Ising: simple network
 [Speaker note: change figures]
 • Let's confirm this with simple networks (see the sketch below).
 • Side remark: these way outperform standard sklearn classifiers.
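The slides don't spell out the architecture; a minimal sketch of the kind of single-hidden-layer classifier that could be meant, taking flattened spin configurations and predicting the temperature class (all layer sizes are assumptions):

```python
import tensorflow as tf

# Hypothetical single-hidden-layer classifier for Ising configurations:
# input = flattened L x L spin configuration, output = temperature class.
L, n_temps = 32, 2
model = tf.keras.Sequential([
    tf.keras.Input(shape=(L * L,)),
    tf.keras.layers.Dense(32, activation='relu'),        # one small hidden layer
    tf.keras.layers.Dense(n_temps, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```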

  19. Example: dual representations
 • 1D Ising with multiple spin interactions:
 normal: H(s) = −J Σ_{k=1}^{N−n+1} Π_{l=0}^{n−1} s_{k+l} − B Σ_{k=1}^N s_k  (here: B = 0)
 dual: H(σ) = −J Σ_{k=1}^{N−n+1} σ_k  where  σ_k = Π_{l=0}^{n−1} s_{k+l}
 [Figure: spin chain s (N = 10) mapped to dual variables σ (n = 3), with ghost spins of fixed value at the boundary]
 • Two data questions: 1) energy of a spin configuration;
 2) metastable configuration (see the sketch below). cf. 1605.05199
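A minimal numpy sketch of the duality map on the slide (the ghost-spin boundary treatment is omitted for simplicity):

```python
import numpy as np

def dual_variables(s, n):
    """sigma_k = product of n consecutive spins s_k ... s_{k+n-1}."""
    N = len(s)
    return np.array([np.prod(s[k:k + n]) for k in range(N - n + 1)])

def energy(s, n, J=1.0):
    """H(s) = -J sum_k prod_l s_{k+l}; in dual variables H = -J sum_k sigma_k."""
    return -J * dual_variables(s, n).sum()

s = np.random.choice([-1, 1], size=10)   # N = 10 random spin chain
sigma = dual_variables(s, 3)             # n = 3 spin interactions
print(sigma, energy(s, 3))
```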

  20. Example: dual representations
 • Two data questions: 1) energy of a spin configuration;
 2) metastable configuration (see the sketch below).
 [Figure: example configurations for N = 10, n = 3 in s and σ variables, comparing a metastable and a stable configuration and their energies]
 [Speaker note: add evaluation of metastability on dual and normal variables]
 cf. 1605.05199
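The slide doesn't define metastability; one common convention (an assumption here) is that no single spin flip lowers the energy. A sketch building on `dual_variables`/`energy` from the previous slide:

```python
def is_metastable(s, n, J=1.0):
    """Assumed convention: metastable iff no single spin flip lowers H(s)."""
    E0 = energy(s, n, J)
    for i in range(len(s)):
        s[i] *= -1                   # trial flip
        lowered = energy(s, n, J) < E0
        s[i] *= -1                   # flip back
        if lowered:
            return False
    return True

print(is_metastable(np.ones(10, dtype=int), 3))   # fully ordered chain -> True
```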

  21. Example: dual representations
 • Performance on metastability classification (single hidden layer), accuracy by number of training samples and interaction range n:

 Normal variables:
 samples   | n = 4  | n = 5  | n = 8  | n = 9  | n = 12
 6 · 10²   | 0.9113 | 0.8688 | 0.8788 | 0.8813 | 0.8803
 3 · 10³   | 0.9243 | 0.9215 | 0.9223 | 0.9295 | −
 9.5 · 10³ | 0.9424 | 0.9475 | 0.9739 | −      | −

 Dual variables:
 samples   | n = 4  | n = 5  | n = 8  | n = 9  | n = 12
 6 · 10²   | 0.9911 | 0.9783 | 0.9819 | 0.9855 | 0.9909
 3 · 10³   | 0.9958 | 0.9977 | 0.9994 | 1.0000 | −
 9.5 · 10³ | 1.0000 | 1.0000 | 1.0000 | −      | −

 • Deeper networks or CNNs perform better to a certain degree, but at large N or n they show the same feature.

  22. Upshot: dual representations simplify answering certain data questions
 (i.e. simple networks are sufficient)

  23. Why interesting for ML?

  24. Why interesting for ML?
 • Finding good representations which allow one to answer the data question is hard (if not impossible).
 [Diagram: input data in the normal frame feeds a neural network; answering the data question directly in the normal frame fails (✘), while first mapping to the dual representation with one network and then answering the data question with a second network succeeds (✔)]
 • In this talk we use physics examples. A generalisation to other data products/questions would be interesting.
 • Here, the data question on the dual representation can be addressed with very simple networks.

  25. DFT: simple network
 • Supervised learning task (binary classification) on N discrete values:
 {((x_R, x_I), y)} (position space) and {((p_R, p_I), y)} (momentum space)
 y = 1: noise + signal,  y = 0: noise
 • Network (see the sketch below):

 Layer      | Shape    | Parameters
 Conv1D     | (2000,2) | 4
 Activation | (2000,2) | −
 Dense      | 1        | 4001
 Activation | 1        | −

 • For this network, classification works in momentum space, but not in position space.
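A Keras sketch consistent with the layer table; the kernel size, bias settings, and activations are my assumptions, chosen to reproduce the parameter counts:

```python
import tensorflow as tf

# Input: (2000, 2) = 2000 discrete values with (real, imaginary) channels.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2000, 2)),
    tf.keras.layers.Conv1D(2, kernel_size=1, use_bias=False),  # 2*2*1 = 4 params
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),                                  # 4000 + 1 = 4001 params
    tf.keras.layers.Activation('sigmoid'),
])
model.summary()  # parameter counts match the table above
```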

  26. Utilising dual representation
 • Goal: improve performance in position space.
 • Deeper network? Can do the job in principle
 [the DFT can be implemented with a single dense layer; see the sketch below]
 • However, finding it dynamically is `impossible' with standard optimisers, initialisations, and regularisers.

 Layer      | Shape    | Parameters
 Dense      | (2000,2) | 16000000
 Conv1D     | (2000,2) | 4
 Activation | (2000,2) | −
 Dense      | 1        | 4001
 Activation | 1        | −

 [Figure: training curves comparing a random starting point with a DFT starting point]
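To make the bracketed claim concrete, a numpy sketch (my own, not from the talk) of the DFT written as one dense layer acting on stacked real/imaginary parts, which is where the 4000 × 4000 = 16M weights come from:

```python
import numpy as np

# The DFT as a single dense layer on stacked (real, imaginary) parts.
# With the talk's n = 2000 this is a 4000 x 4000 weight matrix = 16M parameters;
# a smaller n is used here to keep the demo light.
n = 256
j, k = np.meshgrid(np.arange(n), np.arange(n))
W = np.exp(-2j * np.pi * j * k / n)            # complex DFT matrix

dense_W = np.block([[W.real, -W.imag],         # real-valued block form:
                    [W.imag,  W.real]])        # (2n x 2n) dense weights

x = np.random.randn(n) + 1j * np.random.randn(n)
x_stacked = np.concatenate([x.real, x.imag])
p_stacked = dense_W @ x_stacked                # the "dense layer" forward pass

p = p_stacked[:n] + 1j * p_stacked[n:]
print(np.allclose(p, np.fft.fft(x)))           # True: matches the FFT
```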

  27. Can we improve the situation by favouring dual representations?

  28. How to utilise dualities?
 • Learn dual transformations when they are explicitly known (trivial) and use them as an intermediate step in the architecture.
 • Enforce dual representations via feature separation.
 • When they are not explicitly known, we can match features of distributions (example: 2D Ising high-low temperature duality).
 • Re-obtain “dualities” (1D Ising) by demanding good performance on a medium-hard correlation in an intermediate layer where no loss of information is present (beyond known dualities).
