Machine learning for lattice theories
Michael S. Albergo, Gurtej Kanwar, Phiala E. Shanahan
Center for Theoretical Physics, MIT
Albergo, Kanwar, Shanahan [PRD 100 (2019) 034515]
Deep Learning and Physics, Kyoto, Japan (November 1, 2019)
Machine learning for lattice theories: Real-world lattices, Quantum field theories
Lattices in the real world ● Many materials have degrees of freedom pinned to a lattice structure [Ella Maru Studio] [Mazurenko et al. 1612.08436]
Lattices in the real world ● Thermodynamics describes the collective behavior of many degrees of freedom ● At some temperature T, microstates occur with Boltzmann probability p(s) ∝ e^{-E(s)/k_B T} ● The Ising model has a spin s = {↑,↓} per site, with an energy penalty when neighboring spins differ; typical microstates have patches of the same spin at some characteristic scale ["Ising Model and Metropolis Algorithm", MathWorks Physics Team]
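As a concrete illustration, a minimal sketch of the Ising energy and the resulting Boltzmann weight on a small 2D lattice; the lattice size, coupling J, and temperature T are illustrative choices, not values from the slides:

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """Energy of a 2D Ising configuration with periodic boundaries:
    E = -J * sum over nearest-neighbor pairs of s_i * s_j."""
    return -J * np.sum(spins * (np.roll(spins, 1, axis=0) + np.roll(spins, 1, axis=1)))

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(16, 16))   # random microstate, {↑,↓} -> {+1,-1}
T = 2.0                                      # temperature in units where k_B = 1
# Unnormalized Boltzmann weight exp(-E/T); aligned patches lower E and raise the weight
weight = np.exp(-ising_energy(spins) / T)
```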
Lattices in the real world ● Derive thermodynamic observables by averaging over microstates: the Boltzmann distribution and partition function give the total energy, Helmholtz free energy, correlation functions, ...
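The formulas this slide displays were lost in extraction; the standard textbook definitions they correspond to are (notation may differ slightly from the original slide):

```latex
p(s) = \frac{e^{-E(s)/k_B T}}{Z}, \qquad Z = \sum_{s} e^{-E(s)/k_B T},
\qquad \langle E \rangle = \sum_{s} E(s)\, p(s),
\qquad F = -k_B T \ln Z,
\qquad \langle s_i s_j \rangle = \sum_{s} s_i s_j\, p(s).
```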
Lattices for quantum field theories ● Quantum-mechanical properties are also computed as statistical expectation values via the path integral, which plays a role similar to the partition function
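In the Euclidean lattice formulation this takes the standard form (reconstructing the formula the slide alludes to):

```latex
\langle \mathcal{O} \rangle = \frac{1}{Z} \int \mathcal{D}\phi \; \mathcal{O}(\phi)\, e^{-S[\phi]},
\qquad Z = \int \mathcal{D}\phi \; e^{-S[\phi]}.
```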
Lattice Quantum Chromodynamics ● Predictions relevant to interpreting upcoming high-energy experiments ○ The Electron-Ion Collider will investigate detailed nuclear structure [bnl.gov/eic] ○ The Deep Underground Neutrino Experiment requires nuclear cross sections with neutrinos [dunescience.org] (see Hong-Ye's talk for holography ideas) ● Pen-and-paper methods fail, so numerical evaluation of the path integral is required [D. Leinweber, Visual QCD Archive]
Computational approach to lattice theories ● Partition functions and path integrals are typically intractable analytically ● Numerical approximation by Monte Carlo: sample configurations 𝜚 according to p(𝜚) and estimate observables as sample averages ● Markov chain Monte Carlo converges so that, after many steps, samples are approximately distributed ~ p(𝜚)
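A minimal sketch of local-update Metropolis sampling for the 2D Ising model (illustrative only, not the authors' code): it produces a chain of configurations approximately distributed according to the Boltzmann weight and estimates an observable by a sample average.

```python
import numpy as np

def metropolis_sweep(spins, T, rng, J=1.0):
    """One sweep of single-site Metropolis updates (local updates)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # Energy change from flipping spin (i, j), periodic boundaries
        nn = spins[(i+1) % L, j] + spins[(i-1) % L, j] + spins[i, (j+1) % L] + spins[i, (j-1) % L]
        dE = 2.0 * J * spins[i, j] * nn
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1
    return spins

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(16, 16))
T = 2.5
mags = []
for sweep in range(2000):
    spins = metropolis_sweep(spins, T, rng)
    if sweep >= 500:                      # discard burn-in
        mags.append(abs(spins.mean()))    # observable: |magnetization| per site
print("estimated <|m|> =", np.mean(mags))
```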
Machine learning for lattice theories: Real-world lattices, Quantum field theories → lattice theories
Lattice theories → numerical methods → Thermodynamics, Collective phenomena, Spectrum, ...
✘ Hard to reach the continuum limit / critical point in some theories → numerical methods + ML
Outline: 1. Critical slowing down  2. Sampling using ML  3. Toy model results
Difficulties with Markov chain Monte Carlo ● Need to wait out a "burn-in" period before the chain reaches ~ p(𝜚) ● Configurations close to each other on the chain are correlated, so many steps must be taken before drawing independent samples ● Burn-in and correlations are both related to the Markov chain "autocorrelation time" → a smaller autocorrelation time means less computational cost; typically quantified with the integrated autocorrelation time
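The integrated autocorrelation time referenced above is conventionally defined (standard definition; the slide's formula was lost in extraction) for an observable O with normalized autocorrelation function ρ_O(t) as:

```latex
\tau^{\mathcal{O}}_{\mathrm{int}} = \frac{1}{2} + \sum_{t=1}^{\infty} \rho_{\mathcal{O}}(t),
\qquad
\rho_{\mathcal{O}}(t) = \frac{\big\langle (\mathcal{O}_i - \bar{\mathcal{O}})(\mathcal{O}_{i+t} - \bar{\mathcal{O}}) \big\rangle_i}{\big\langle (\mathcal{O}_i - \bar{\mathcal{O}})^2 \big\rangle_i}.
```

A minimal numpy estimator of this quantity from a Markov chain time series, using a simple truncation of the sum (the truncation rule is an assumption, not the authors' procedure):

```python
import numpy as np

def integrated_autocorr_time(series, max_lag=None):
    """Estimate tau_int = 1/2 + sum_t rho(t) from a 1D Markov chain time series."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    var = np.mean(x * x)
    n = len(x)
    max_lag = max_lag or n // 10          # crude truncation of the infinite sum
    tau = 0.5
    for t in range(1, max_lag):
        rho = np.mean(x[:n - t] * x[t:]) / var
        if rho <= 0:                      # stop once noise dominates
            break
        tau += rho
    return tau
```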
Critical slowing down ● As the parameters defining the distribution approach criticality (the continuum limit), the autocorrelation time diverges for Markov chains using local updates ● Fitting 𝜏_int to its power-law behavior gives dynamical critical exponents; a smaller dynamical critical exponent means a cheaper, closer approach to criticality ● CSD appears in the scalar theory used in this work, and also affects more realistic, complex models: ○ CP^{N-1} [Flynn et al. 1504.06292] ○ O(N) [Frick et al., PRL 63, 2613] ○ QCD [ALPHA collaboration 1009.5228] ○ ...
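The power-law fit mentioned above is conventionally parameterized as follows (standard notation, assumed here; z is the dynamical critical exponent and ξ the correlation length):

```latex
\tau_{\mathrm{int}} \propto \xi^{z}.
```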
Machine learning for lattice theories: 2. Sampling using ML
Sampling lattice configs: example configurations range from likely (log prob = 22, 25, 5) to unlikely (log prob = -6107)
Sampling lattice configs ≅ generating images: as with face images from a generative model, some samples are likely and others unlikely [Karras, Laine, Aila / NVIDIA 1812.04948]
Unique features of the lattice sampling problem ✓ Probability density computable (up to normalization) ✓ Many symmetries in physics ○ Lattice symmetries like translation, rotation, and reflection ○ Per-site symmetries like negation ✘ High-dimensional (10^9 to 10^12) samples ✘ Few (~1000) samples available ahead of time (fewer than the number of variables!) ○ Hard to use training paradigms that rely on existing samples from the distribution
Image generation via ML
1. Likelihood-free methods, e.g. Generative Adversarial Networks (GANs) [Goodfellow et al. 1406.2661]: ✘ needs many real samples ✘ no associated likelihood for each produced sample
2. Autoencoding, e.g. Variational Auto-Encoders (VAEs) [Kingma & Welling 1312.6114] [Shen & Liu 1612.05363]: ✔ good for human interpretability ✘ same issues as GANs
3. Normalizing flows [Rezende & Mohamed 1505.05770]: flow-based models learn a change of variables that transforms a known distribution into the desired distribution ✔ exactly known likelihood for each sample ✔ can be trained with samples from itself
Many related approaches ● Continuous flows [Zhang, E, Wang 1809.10188] ● Normalizing flows for many-body systems [Noé, Olsson, Köhler, Wu, Science 365 (2019) 982] ● Hamiltonian transforms [Li, Dong, Zhang, Wang 1910.00024] ● Self-Learning Monte Carlo [Liu, Qi, Meng, Fu 1610.03137] ● See talks by Junwei Liu, Lei Wang, and Hong-Ye Hu
Flow-based generative models: using a change of variables, produce a distribution approximating the one you want [Rezende & Mohamed 1505.05770] ● An easily sampled prior distribution is pushed through an invertible function f with a tractable Jacobian, yielding a model density that approximates the desired distribution
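Written out explicitly, this is the standard change-of-variables identity (the symbols r for the prior and p̃_f for the model density follow the notation used later in the slides):

```latex
\tilde{p}_f(\phi) = r(z) \left| \det \frac{\partial f(z)}{\partial z} \right|^{-1}
\quad \text{with } \phi = f(z),
\qquad \text{equivalently} \qquad
\tilde{p}_f(\phi) = r\!\left(f^{-1}(\phi)\right) \left| \det \frac{\partial f^{-1}(\phi)}{\partial \phi} \right|.
```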
Flow-based generative models: we chose real non-volume-preserving (real NVP) flows for our work [Dinh et al. 1605.08803] ● Many simple layers are composed to produce f, keeping it invertible with a tractable Jacobian, so the easily sampled prior is mapped to an approximation of the desired distribution
Real NVP coupling layer ● Application of g_i^{-1}: 1. Freeze half of the inputs, z_a 2. Feed the frozen variables into neural networks s and t 3. Apply the scale exp(-s) and offset -t to the unfrozen variables, z_b ● Simple inverse and Jacobian
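A minimal PyTorch sketch of such an affine coupling layer. The class name, network sizes, and the exact ordering of the scale and offset are illustrative assumptions rather than the authors' implementation; only the structure (freeze half, transform the other half, triangular Jacobian) follows the slide:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, n_frozen, n_active, hidden=64):
        super().__init__()
        # Networks s and t act only on the frozen half z_a
        self.s = nn.Sequential(nn.Linear(n_frozen, hidden), nn.ReLU(),
                               nn.Linear(hidden, n_active))
        self.t = nn.Sequential(nn.Linear(n_frozen, hidden), nn.ReLU(),
                               nn.Linear(hidden, n_active))

    def forward(self, z_a, z_b):
        # g^{-1}: freeze z_a, apply scale exp(-s) and offset -t to z_b
        s, t = self.s(z_a), self.t(z_a)
        x_b = (z_b - t) * torch.exp(-s)
        log_det_jacobian = -s.sum(dim=-1)   # triangular Jacobian: just a sum over -s
        return z_a, x_b, log_det_jacobian

    def inverse(self, x_a, x_b):
        # g: the inverse transformation is equally simple
        s, t = self.s(x_a), self.t(x_a)
        z_b = x_b * torch.exp(s) + t
        log_det_jacobian = s.sum(dim=-1)
        return x_a, z_b, log_det_jacobian
```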
Loss function ● Use the known target probability density p(𝜚) = e^{-S(𝜚)} / Z ● For our application, train to minimize a shifted KL divergence; the shift removes the unknown normalization Z ● Can apply self-training: sample the model distribution p̃_f(𝜚) to estimate the loss
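One way to write a shifted KL divergence consistent with this description (the precise expression on the original slide was lost in extraction; this is the standard modified KL objective for flow-based samplers):

```latex
L(\tilde{p}_f) \equiv D_{\mathrm{KL}}\!\left(\tilde{p}_f \,\|\, p\right) - \ln Z
= \int \mathcal{D}\phi \; \tilde{p}_f(\phi)\left[ \ln \tilde{p}_f(\phi) + S(\phi) \right].
```

A minimal self-training estimate of this loss, assuming hypothetical helpers `flow.sample` (returning configurations and their log model density) and `action` (the lattice action S):

```python
def kl_loss(flow, action, batch_size=1024):
    phi, log_q = flow.sample(batch_size)   # phi ~ p̃_f, log_q = log p̃_f(phi)
    # Monte Carlo estimate of E_{p̃_f}[ log p̃_f(phi) + S(phi) ] = D_KL(p̃_f || p) - log Z
    return (log_q + action(phi)).mean()
```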
Correcting for model error ● With known model and target densities, there are many options to correct for model error ● We use MCMC with proposals from the ML model (interoperable with standard MC updates) ● Metropolis-Hastings step: each model proposal is independent of the previous sample, and proposals are accepted or rejected (✘) to build the Markov chain
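A minimal sketch of this independence Metropolis-Hastings step; `flow.sample` and `action` are the same hypothetical helpers as above, and all names are illustrative:

```python
import math
import random

def mh_step(phi_old, log_q_old, log_p_old, flow, action):
    """One Metropolis-Hastings step with a flow proposal independent of the chain state."""
    phi_new, log_q_new = flow.sample(1)          # proposal drawn from the ML model
    log_p_new = -action(phi_new)                 # log p up to the unknown log Z (it cancels)
    # Acceptance probability for an independent proposal:
    #   A = min(1, [p(new) q(old)] / [p(old) q(new)])
    log_accept = (log_p_new - log_q_new) - (log_p_old - log_q_old)
    if math.log(random.random()) < log_accept:
        return phi_new, log_q_new, log_p_new     # accept proposal
    return phi_old, log_q_old, log_p_old         # reject: repeat previous configuration
```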
Overview of algorithm ● Parameterize the flow using real NVP coupling layers; each layer contains arbitrary neural nets s and t