Casimir effect and 3d QED from machine learning



  1. Casimir effect and 3d QED from machine learning Harold Erbin Università di Torino & INFN (Italy) In collaboration with: M. Chernodub (Tours), V. Goy, I. Grishmanovky, A. Molochkov (Vladivostok) [1911.07571 + to appear] 1 / 49

  2. Outline (current section: 1. Motivations): Motivations, Machine learning, Introduction to lattice QFT, Casimir effect, 3d QED, Conclusion 2 / 49

  5. Machine learning Machine Learning (ML) Set of techniques for pattern recognition / function approximation without explicit programming. ◮ learn to perform a task implicitly by optimizing a cost function ◮ flexible → wide range of applications ◮ general theory unknown (black box problem) Question Where does it fit in theoretical physics? → particle physics, cosmology, many-body physics, quantum information, lattice simulations, string vacua. . . 3 / 49

  7. Lattice QFT Ideas: ◮ discretization of action and path integral ◮ Monte Carlo (MC) algorithms Applications: ◮ access non-perturbative effects, strong-coupling regime ◮ study phase transitions ◮ QCD phenomenology (confinement, quark-gluon plasma. . . ) ◮ Regge / CDT approaches to quantum gravity ◮ supersymmetric gauge theories for AdS/CFT Limitations: ◮ computationally expensive ◮ convergence only for some regions of the parameter space → use machine learning 4 / 49
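
To make the Monte Carlo idea concrete: field configurations are sampled with probability proportional to exp(−S[φ]) and observables are averaged over them. Below is a minimal sketch, not the code used in this work, of single-site Metropolis updates for a free scalar field on a small periodic 2d lattice; the lattice size, mass and proposal step are arbitrary illustrative choices.

```python
import numpy as np

def action(phi, m2=0.5):
    """Euclidean action of a free massive scalar on a periodic 2d lattice."""
    kinetic = sum(((np.roll(phi, -1, axis=d) - phi) ** 2).sum() for d in (0, 1))
    return 0.5 * kinetic + 0.5 * m2 * (phi ** 2).sum()

def metropolis_sweep(phi, rng, step=0.5, m2=0.5):
    """One sweep of single-site Metropolis updates (the full action is
    recomputed for clarity; a real code would use the local change only)."""
    L = phi.shape[0]
    for x in range(L):
        for y in range(L):
            old = phi[x, y]
            s_old = action(phi, m2)
            phi[x, y] = old + rng.uniform(-step, step)
            # Accept with probability min(1, exp(-dS)); otherwise revert.
            if rng.random() >= np.exp(-(action(phi, m2) - s_old)):
                phi[x, y] = old
    return phi

rng = np.random.default_rng(0)
phi = np.zeros((8, 8))            # cold start
for sweep in range(100):          # thermalisation; measurements would follow
    phi = metropolis_sweep(phi, rng)
```

In practice one recomputes only the local change in the action, tunes the proposal step for a reasonable acceptance rate, and discards thermalisation sweeps before measuring; this is where the computational cost mentioned above comes from.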

  8. Machine learning for Monte Carlo Support MC with ML [1605.01735, Carrasquilla-Melko]: ◮ compute useful quantities, predict phase ◮ learn field distribution ◮ identify important (order) parameters ◮ generalize to other regions of parameter space ◮ reduce autocorrelation times ◮ avoid fermion sign problem Selected references: 1608.07848, Broecker et al.; 1703.02435, Wetzel; 1705.05582, Wetzel-Scherzer; 1805.11058, Abe et al.; 1801.05784, Shanahan-Trewartha-Detmold; 1807.05971, Yoon-Bhattacharya-Gupta; 1810.12879, Zhou-Endrődi-Pang; 1811.03533, Urban-Pawlowski; 1904.12072, Albergo-Kanwar-Shanahan; 1909.06238, Matsumoto-Kitazawa-Kohno 5 / 49

  9. Plan 1. Casimir energy for arbitrary boundaries for a 3d scalar field → speed improvement and accuracy 2. deconfinement phase transition in 3d compact QED → extrapolation to different lattice sizes 6 / 49

  10. Outline (current section: 2. Machine learning): Motivations, Machine learning, Introduction to lattice QFT, Casimir effect, 3d QED, Conclusion 7 / 49

  11. Definition Machine learning (Samuel) The field of study that gives computers the ability to learn without being explicitly programmed. Machine learning (Mitchell) A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. 8 / 49

  12. Approaches to machine learning Learning approaches (task: x → y): ◮ supervised: learn a map from a set (x_train, y_train), then predict y_data from x_data ◮ unsupervised: give x_data and let the machine find structure (i.e. appropriate y_data) ◮ reinforcement: give x_data, let the machine choose output following rules, reward good and/or punish bad results, iterate 9 / 49
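
A minimal sketch of the first two modes, assuming scikit-learn is available; the data and model choices below are purely illustrative, not taken from the talk.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Supervised: learn a map x -> y from labelled pairs, then predict y for new x.
x_train = rng.uniform(0, 1, size=(100, 1))
y_train = 3.0 * x_train[:, 0] + rng.normal(0, 0.1, size=100)
model = LinearRegression().fit(x_train, y_train)
y_pred = model.predict(rng.uniform(0, 1, size=(10, 1)))

# Unsupervised: give only x and let the algorithm find structure (here, two clusters).
x_data = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])
labels = KMeans(n_clusters=2, n_init=10).fit(x_data).labels_
```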

  13. Applications General idea = pattern recognition: ◮ classification / clustering ◮ regression (prediction) ◮ transcription / translation ◮ structuring ◮ anomaly detection ◮ denoising ◮ synthesis and sampling ◮ density estimation ◮ conjecture generation Applications in industry: computer vision, language processing, medical diagnosis, fraud detection, recommendation system, autonomous driving. . . 10 / 49

  14. Examples Media applications: ◮ MuZero, AlphaZero (DeepMind): play chess, shogi, Go ◮ MuZero, AlphaStar (DeepMind), OpenAI Five, etc.: play video games (StarCraft 2, Dota 2, Atari...) ◮ GPT-2 (OpenAI): conditional synthetic text sampling (+ question answering, reading comprehension, summarization, translation) ◮ YOLO: real-time object detection [1804.02767] ◮ Face2Face: real-time face reenactment ◮ CycleGAN: image generation and translation Science applications: ◮ AlphaFold (DeepMind): protein folding ◮ (astro)particles [1806.11484, 1807.02876, darkmachines.org] ◮ astronomy [1904.07248] ◮ geometrical structures [geometricdeeplearning.com] 11 / 49

  15. Examples (AlphaZero) Some facts: [Silver et al. '18] ◮ AlphaGo (supervised): 4–1 against human champion (03/2016) ◮ AlphaGo Zero (reinforcement): 100–0 against AlphaGo (04/2017) ◮ positions/s: Stockfish = 5.8 × 10^7, AlphaZero = 6 × 10^4 ◮ training time: 9 hours (chess), 12 hours (shogi), 13 days (Go) 12 / 49

  16. Examples (GPT) Human entry: A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its whereabouts are unknown. Machine answer: The incident occurred on the downtown train line, which runs from Covington and Ashland stations. In an email to Ohio news outlets, the U.S. Department of Energy said it is working with the Federal Railroad Administration to find the thief. “The theft of this nuclear material will have significant negative consequences on public and environmental health, our workforce and the economy of our nation,” said Tom Hicks, the U.S. Energy Secretary, in a statement. “Our top priority is to secure the theft and ensure it doesn’t happen again.” The stolen material was taken from the University of Cincinnati’s Research Triangle Park nuclear research site, according to a news release from Department officials. (...) [https://openai.com/blog/better-language-models] 13 / 49

  17. Examples (videos) ◮ YOLO [https://www.youtube.com/watch?v=VOC3huqHrss] ◮ Deepfake [https://www.youtube.com/watch?v=ohmajJTcpNk] 14 / 49

  18. Examples (CycleGAN) [Figure: image-to-image translations between domains: Monet ↔ photo, zebras ↔ horses, summer ↔ winter; a photograph rendered in the styles of Monet, Van Gogh, Cezanne and Ukiyo-e] [1703.10593] 15 / 49

  19. Examples (protein) [ https://deepmind.com/blog/article/alphafold ] 16 / 49

  20. Deep neural network Architecture: ◮ 1–many hidden layers, vector $x^{(n)}$ ◮ link: weighted input, matrix $W^{(n)}$ ◮ neuron: non-linear “activation function” $g^{(n)}$, with $x^{(n+1)} = g^{(n+1)}\big(W^{(n)} x^{(n)}\big)$ Generic method: fixed functions $g^{(n)}$, learn weights $W^{(n)}$ 17 / 49
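
The layer recursion above fits in a few lines of code; a minimal NumPy sketch with no bias terms, matching the slide's formula (the function name `forward` is an illustrative choice, not from the talk):

```python
import numpy as np

def forward(x, weights, activations):
    """Apply the recursion x^(n+1) = g^(n+1)(W^(n) x^(n)) layer by layer."""
    for W, g in zip(weights, activations):
        x = g(W @ x)
    return x
```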

  21. Deep neural network $x^{(1)}_{i_1} := x_{i_1}$, $x^{(2)}_{i_2} = g^{(2)}\big(W^{(1)}_{i_2 i_1} x^{(1)}_{i_1}\big)$, $f_{i_3}(x_{i_1}) := x^{(3)}_{i_3} = g^{(3)}\big(W^{(2)}_{i_3 i_2} x^{(2)}_{i_2}\big)$, with $i_1 = 1, 2, 3$; $i_2 = 1, \ldots, 4$; $i_3 = 1, 2$ 17 / 49
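
In code, the 3 → 4 → 2 network of this example could look as follows; the random weights and tanh activations are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 3))    # W^(1)_{i2 i1}
W2 = rng.standard_normal((2, 4))    # W^(2)_{i3 i2}
g = np.tanh                         # activation function (arbitrary choice here)

x1 = np.array([0.1, -0.3, 0.7])     # x^(1)_{i1},  i1 = 1, 2, 3
x2 = g(W1 @ x1)                     # x^(2)_{i2},  i2 = 1, ..., 4
x3 = g(W2 @ x2)                     # f_{i3}(x) = x^(3)_{i3},  i3 = 1, 2
```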

  23. Learning method ◮ define a loss function $L = \sum_{i=1}^{N_{\text{train}}} \mathrm{distance}\big(y^{(\text{train})}_i, y^{(\text{pred})}_i\big)$ ◮ minimize the loss function (iterated gradient descent...) ◮ main risk: overfitting (= cannot generalize) → various solutions (regularization, dropout...) → split data set in two (training and test) 18 / 49
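
A minimal sketch of this loop for a one-parameter linear model with a squared-error loss and a hand-written gradient, including the training/test split mentioned above; all data and hyperparameters below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + rng.normal(0, 0.1, size=200)

# Split the data set in two: training and test.
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

w, lr = 0.0, 0.1                     # single weight of the model y = w * x, learning rate
for step in range(200):              # iterated gradient descent
    grad = 2.0 * ((w * x_train - y_train) * x_train).mean()   # dL/dw for the mean squared error
    w -= lr * grad

test_loss = ((w * x_test - y_test) ** 2).mean()   # monitors generalization / overfitting
```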

  24. ML workflow “Naive” workflow: 1. get raw data 2. write neural network with many layers 3. feed raw data to neural network 4. get nice results (or give up) https://xkcd.com/1838 19 / 49

  25. ML workflow Real-world workflow: 1. understand the problem 2. exploratory data analysis ◮ feature engineering ◮ feature selection 3. baseline model ◮ full working pipeline ◮ lower-bound on accuracy 4. validation strategy 5. machine learning model 6. ensembling Pragmatic ref.: [ coursera.org/learn/competitive-data-science ] 19 / 49

  27. Complex neural network Particularities: ◮ $f_i(I)$: engineered features ◮ identical outputs (stabilisation) 20 / 49

  29. Some results Universal approximation theorem Under mild assumptions, a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$. Comparisons ◮ results comparable and sometimes superior to human experts (cancer diagnosis, traffic sign recognition...) ◮ perform generically better than any other machine learning algorithm 21 / 49
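
As an informal numerical illustration of the theorem (not a proof), one can fit a single-hidden-layer network to a smooth function on a compact interval. A sketch using scikit-learn's MLPRegressor, where the 50 hidden units, the target function and the other settings are arbitrary choices:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, size=(1000, 1))
y = np.sin(x[:, 0])

# Single hidden layer with a finite number of neurons, as in the theorem.
net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=5000, random_state=0)
net.fit(x, y)

x_test = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
max_error = np.max(np.abs(net.predict(x_test) - np.sin(x_test[:, 0])))
```

With enough hidden units and training, max_error can be driven small on the compact interval, in line with the theorem's statement.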
