Machine learning for lattice theories


  1. Machine learning for lattice theories
     Michael S. Albergo, Gurtej Kanwar, Phiala E. Shanahan (Center for Theoretical Physics, MIT)
     Deep Learning and Physics, Kyoto, Japan (November 1, 2019)
     [Albergo, GK, Shanahan, PRD 100 (2019) 034515]

  2. Machine learning for lattice theories: Real-world lattices

  3. Machine learning for lattice theories: Real-world lattices → Quantum field theories

  4. Lattices in the real world
     ● Many materials have degrees of freedom pinned to a lattice structure [Ella Maru Studio] [Mazurenko et al. 1612.08436]

  5. Lattices in the real world
     ● Thermodynamics describes the collective behavior of many degrees of freedom
     ● At some temperature T, the microstates follow a Boltzmann distribution, p(s) ∝ exp(-E(s) / k_B T)

  6. Lattices in the real world
     ● Thermodynamics describes the collective behavior of many degrees of freedom
     ● At some temperature T, the microstates follow a Boltzmann distribution, p(s) ∝ exp(-E(s) / k_B T)
     ● The Ising model has a spin s = {↑,↓} per site, with an energy penalty when neighboring spins differ. Typical microstates have patches of the same spin at some scale. ["Ising Model and Metropolis Algorithm", MathWorks Physics Team]
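
     As an illustrative sketch (not taken from the slides), the 2D Ising energy and its unnormalized Boltzmann weight can be written in a few lines of Python; the coupling J, temperature T, and lattice size are placeholder choices.

         import numpy as np

         def ising_energy(spins, J=1.0):
             """Energy of a 2D Ising configuration with periodic boundaries.
             spins: (L, L) array of +1/-1; J: nearest-neighbor coupling."""
             # Sum over right and down neighbors so each bond is counted once
             return -J * np.sum(spins * (np.roll(spins, 1, axis=0) + np.roll(spins, 1, axis=1)))

         def boltzmann_weight(spins, T=2.0):
             """Unnormalized Boltzmann weight exp(-E/T), with k_B = 1."""
             return np.exp(-ising_energy(spins) / T)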

  7. Lattices in the real world
     ● Derive thermodynamic observables by averaging over microstates with the Boltzmann distribution: partition function, total energy, Helmholtz free energy, correlation functions, ...
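
     For reference, a minimal reconstruction of the standard relations behind the quantities listed on this slide (the slide's own formulas did not survive extraction, so the notation below is assumed):

         Z = \sum_{s} e^{-E(s)/k_B T}, \qquad
         p(s) = \frac{1}{Z} \, e^{-E(s)/k_B T}, \qquad
         \langle \mathcal{O} \rangle = \sum_{s} \mathcal{O}(s) \, p(s), \qquad
         F = -k_B T \log Z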

  8. Lattices for quantum field theories
     ● Quantum-mechanical properties are also computed as statistical expectation values, via a path integral analogous to the partition function
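
     A hedged sketch of the Euclidean path-integral expectation value this slide refers to; the field 𝜚 and action S follow the notation used elsewhere in the talk, and the rest is the standard form:

         \langle \mathcal{O} \rangle = \frac{1}{Z} \int \mathcal{D}\varrho \; \mathcal{O}(\varrho) \, e^{-S(\varrho)}, \qquad
         Z = \int \mathcal{D}\varrho \; e^{-S(\varrho)}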

  9. Lattice Quantum Chromodynamics
     ● Predictions relevant to interpreting upcoming high-energy experiments
       ○ The Electron-Ion Collider will investigate detailed nuclear structure [bnl.gov/eic]
       ○ The Deep Underground Neutrino Experiment requires nuclear cross sections with neutrinos [dunescience.org]
     ● Pen-and-paper methods fail; numerical evaluation of the path integral is required [D. Leinweber, Visual QCD Archive]
     (See Hong-Ye's talk for holography ideas)

  10. Computational approach to lattice theories
     ● Partition functions and path integrals are typically intractable analytically
     ● Numerical approximation by Monte Carlo sampling: draw configurations according to p(𝜚) and estimate observables from the sample average of the integrand
     ● Markov Chain Monte Carlo converges to samples approximately distributed ~ p(𝜚)
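
     A minimal sketch of a local Metropolis Markov chain for the Ising example introduced earlier (parameter values are placeholders; this is not the algorithm or code used in the paper):

         import numpy as np

         def metropolis_sweep(spins, T=2.0, J=1.0, rng=np.random.default_rng()):
             """One sweep of single-site Metropolis updates on a 2D Ising configuration.
             spins: (L, L) array of +1/-1; T: temperature (k_B = 1); J: coupling."""
             L = spins.shape[0]
             for _ in range(L * L):
                 i, j = rng.integers(L), rng.integers(L)
                 # Energy change from flipping spin (i, j), with periodic boundaries
                 nn = spins[(i + 1) % L, j] + spins[(i - 1) % L, j] + spins[i, (j + 1) % L] + spins[i, (j - 1) % L]
                 dE = 2.0 * J * spins[i, j] * nn
                 # Metropolis accept/reject: always accept if dE <= 0, else with probability exp(-dE/T)
                 if rng.random() < np.exp(-dE / T):
                     spins[i, j] *= -1
             return spins

         # Usage sketch: start from a random configuration and run many sweeps
         spins = np.random.default_rng(0).choice([-1, 1], size=(32, 32))
         for sweep in range(1000):
             spins = metropolis_sweep(spins)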

  11. Machine learning for lattice theories: Real-world lattices → Quantum field theories → lattice theories

  12. Machine learning for lattice theories: Real-world lattices → Quantum field theories → lattice theories → numerical methods (Thermodynamics, Collective phenomena, Spectrum, ...)

  13. Machine learning for lattice theories: Real-world lattices → Quantum field theories → lattice theories → numerical methods (Thermodynamics, Collective phenomena, Spectrum, ...) ✘ hard to reach continuum limit / critical point in some theories

  14. Machine learning for lattice theories: Real-world lattices → Quantum field theories → lattice theories → numerical methods + ML (Thermodynamics, Collective phenomena, Spectrum, ...) ✘ hard to reach continuum limit / critical point in some theories

  15. Machine learning for lattice theories: Real-world lattices → Quantum field theories → lattice theories → numerical methods + ML. Talk outline: 1. Critical slowing down, 2. Sampling using ML, 3. Toy model results

  16. Difficulties with Markov Chain Monte Carlo
     ● Need to wait for a "burn-in" period
     ● Configurations close to each other on the chain are correlated, so many steps must be taken before drawing independent samples
     ● Burn-in and correlations are both related to the Markov chain "autocorrelation time" → a smaller autocorrelation time means less computational cost; typically quantified with the integrated autocorrelation time τ_int
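
     The integrated autocorrelation time mentioned at the end of this slide is conventionally defined as follows (standard definition; the slide's own formula did not survive extraction):

         \tau_{\mathrm{int}} = \frac{1}{2} + \sum_{t=1}^{\infty} \rho(t), \qquad
         \rho(t) = \frac{\langle \mathcal{O}_i \, \mathcal{O}_{i+t} \rangle - \langle \mathcal{O} \rangle^2}{\langle \mathcal{O}^2 \rangle - \langle \mathcal{O} \rangle^2}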

  17. Critical slowing down
     ● As the parameters defining the distribution approach criticality (the continuum limit), the autocorrelation time diverges for Markov chains using local updates
     ● Fitting τ_int to power-law behavior gives dynamical critical exponents
     ● A smaller dynamical critical exponent means a cheaper, closer approach to criticality
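
     The power-law fit referred to here has the standard form below, with ξ the correlation length (which diverges toward the continuum limit) and z the dynamical critical exponent:

         \tau_{\mathrm{int}} \propto \xi^{z}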

  18. Critical slowing down
     ● As the parameters defining the distribution approach criticality (the continuum limit), the autocorrelation time diverges for Markov chains using local updates
     ● Fitting τ_int to power-law behavior gives dynamical critical exponents
     ● A smaller dynamical critical exponent means a cheaper, closer approach to criticality
     ● CSD appears in the scalar theory used in this work, and also affects more realistic, complex models:
       ○ CP^(N-1) [Flynn et al. 1504.06292]
       ○ O(N) [Frick et al., PRL 63, 2613]
       ○ QCD [ALPHA collaboration 1009.5228]
       ○ ...

  19. Machine learning for lattice theories: Real-world lattices → Quantum field theories → lattice theories → numerical methods + ML. Talk outline: 1. Critical slowing down, 2. Sampling using ML, 3. Toy model results

  20. Sampling lattice configs: example configurations labelled likely (log prob = 22, 25, 5) and unlikely (log prob = -6107)

  21. Sampling lattice configs ≅ generating images: example images labelled likely or unlikely [Karras, Laine, Aila / NVIDIA 1812.04948]

  22. Unique features of the lattice sampling problem
     ✓ Probability density computable (up to normalization)
     ✓ Many symmetries in physics
       ○ Lattice symmetries like translation, rotation, and reflection
       ○ Per-site symmetries like negation
     ✘ High-dimensional (10^9 to 10^12) samples
     ✘ Few (~1000) samples available ahead of time (fewer than the number of variables!)
       ○ Hard to use training paradigms that rely on existing samples from the distribution

  23. Image generation via ML
     1. Likelihood-free methods, e.g. Generative Adversarial Networks (GANs) [Goodfellow et al. 1406.2661]
        ✘ Needs many real samples
        ✘ No associated likelihood for each produced sample
     2. Autoencoding, e.g. Variational Auto-Encoders (VAEs) [Kingma & Welling 1312.6114] [Shen & Liu 1612.05363]
        ✔ Good for human interpretability
        ✘ Same issues as GANs
     3. Normalizing flows [Rezende & Mohamed 1505.05770]: flow-based models learn a change of variables that transforms a known distribution into the desired distribution
        ✔ Exactly known likelihood for each sample
        ✔ Can be trained with samples from itself

  24. Image generation via ML (same content as slide 23)

  25. Many related approaches
     ● Continuous flows [Zhang, E, Wang 1809.10188]
     ● Normalizing flows for many-body systems [Noé, Olsson, Köhler, Wu, Science 365 (2019) Iss. 6457, 982]
     ● Hamiltonian transforms [Li, Dong, Zhang, Wang 1910.00024]
     ● Self-Learning Monte Carlo [Liu, Qi, Meng, Fu 1610.03137]
     See talks by Junwei Liu, Lei Wang, and Hong-Ye Hu

  26. Flow-based generative models
     Using a change of variables, produce a distribution approximating the one you want [Rezende & Mohamed 1505.05770]

  27. Flow-based generative models
     Using a change of variables, produce a distribution approximating the one you want [Rezende & Mohamed 1505.05770]
     Diagram: an easily sampled prior is mapped through an invertible function with a tractable Jacobian to approximate the desired distribution
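
     The change of variables behind these models is the standard normalizing-flow density formula; here r(z) denotes the easily sampled prior and f the invertible map (notation assumed, not copied from the slide):

         \tilde{p}_f(\varrho) = r(z) \left| \det \frac{\partial f(z)}{\partial z} \right|^{-1}, \qquad \varrho = f(z)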

  28. Flow-based generative models
     We chose real non-volume-preserving (real NVP) flows for our work [Dinh et al. 1605.08803]
     Many simple layers are composed to produce the invertible function f with a tractable Jacobian, mapping the easily sampled prior to an approximation of the desired distribution

  29. Flow-based generative models
     We chose real non-volume-preserving (real NVP) flows for our work [Dinh et al. 1605.08803]

  30. Real NVP coupling layer
     Application of g_i^{-1}:
       1. Freeze half of the inputs, z_a
       2. Feed the frozen variables into neural networks s and t
       3. Apply the scale exp(-s) and offset -t to the unfrozen variables, z_b
     ● Simple inverse and Jacobian
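
     A minimal Python sketch of one coupling layer in the spirit of this slide; the masking convention and the toy "networks" s_net and t_net are placeholders, not the architecture used in the paper.

         import numpy as np

         def coupling_forward(z, mask, s_net, t_net):
             """One real NVP coupling layer g_i (forward direction).
             mask selects the frozen half z_a; s_net and t_net are arbitrary
             neural networks (placeholders here) acting on the frozen variables."""
             z_a = mask * z                                  # frozen half, passed through unchanged
             s, t = s_net(z_a), t_net(z_a)
             x_b = (1 - mask) * (z * np.exp(s) + t)          # scale and offset the unfrozen half
             log_det_J = np.sum((1 - mask) * s)              # tractable Jacobian: sum of log-scales
             return z_a + x_b, log_det_J

         def coupling_inverse(x, mask, s_net, t_net):
             """Inverse g_i^{-1}, as on the slide: scale by exp(-s) and offset by -t."""
             x_a = mask * x
             s, t = s_net(x_a), t_net(x_a)
             z_b = (1 - mask) * (x - t) * np.exp(-s)
             return x_a + z_b

         # Toy usage with trivial "networks" (a real model would use trained neural nets)
         mask = np.array([1.0, 1.0, 0.0, 0.0])
         s_net = t_net = lambda z_a: 0.1 * z_a
         x, log_det_J = coupling_forward(np.ones(4), mask, s_net, t_net)
         z = coupling_inverse(x, mask, s_net, t_net)   # recovers the original input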

  31. Loss function
     ● Use the known target probability density
     ● For our application, train to minimize a shifted KL divergence; the shift removes the unknown normalization Z
     ● Can apply self-training: sample the model distribution p̃_f(𝜚) to estimate the loss
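
     A sketch of the shifted KL divergence loss in the form commonly used for this kind of self-training setup (reconstructed, since the slide's formulas did not survive extraction; Z is the unknown normalization removed by the shift and S the action):

         L(\tilde{p}_f) = D_{\mathrm{KL}}\!\left(\tilde{p}_f \,\|\, p\right) - \log Z
                        = \int \mathcal{D}\varrho \; \tilde{p}_f(\varrho) \left[ \log \tilde{p}_f(\varrho) + S(\varrho) \right]
                        \approx \frac{1}{N} \sum_{i=1}^{N} \left[ \log \tilde{p}_f(\varrho_i) + S(\varrho_i) \right],
                        \quad \varrho_i \sim \tilde{p}_f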

  32. Correcting for model error
     ● With known model and target densities, there are many options to correct for the error
     ● We use MCMC with proposals from the ML model (interoperable with standard MC updates)
     ● Metropolis-Hastings step: the model proposal is independent of the previous sample (diagram: a Markov chain built from ML model proposals, with rejected proposals marked ✘)
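
     The Metropolis-Hastings acceptance probability for an independence proposal 𝜚' drawn from the model p̃_f, given the current configuration 𝜚, takes the standard form:

         p_{\mathrm{acc}}(\varrho \to \varrho') = \min\!\left( 1, \; \frac{p(\varrho')}{p(\varrho)} \, \frac{\tilde{p}_f(\varrho)}{\tilde{p}_f(\varrho')} \right)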

  33. Overview of algorithm
     ● Parameterize the flow using real NVP coupling layers
     ● Each layer contains arbitrary neural nets s and t
