Bayesian Coarse-Graining in Atomistic Simulations: Adaptive Identification of the Dimensionality and Salient Features

M. Schöberl (1), N. Zabaras (2,3), P.S. Koutsourelakis (1)

(1) Continuum Mechanics Group, Technical University of Munich
(2) Institute for Advanced Study, Technical University of Munich
(3) Institute of Computational Sciences and Informatics, University of Notre Dame

SIAM CSE, Atlanta, GA, USA — March 1, 2017
Contact: p.s.koutsourelakis@tum.de
Problem Definition: Equilibrium Statistical Mechanics

Fine-Grained (FG) model:
• x \in \mathcal{M}: fine-scale DOFs
• U_f(x): atomistic potential
• p_f(x) \propto e^{-\beta U_f(x)}

Coarse-Grained (CG) model:
• X = R(x), with \dim(X) \ll \dim(x)
• X: coarse-scale DOFs
• R: restriction operator (mapping fine → coarse)

Observables:
\mathbb{E}_{p_f(x)}[a] = \int a(x)\, p_f(x)\, dx
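As a concrete illustration of how such observables are estimated in practice, here is a minimal Metropolis Monte Carlo sketch. It assumes a toy one-dimensional double-well potential in place of a real atomistic U_f; the potential, the value of beta, and the observable a(x) = x^2 are all illustrative choices, not taken from the talk.

import numpy as np

rng = np.random.default_rng(0)
beta = 1.0

def U_f(x):
    # Toy double-well potential standing in for an atomistic potential.
    return (x**2 - 1.0)**2

def a(x):
    # Example observable.
    return x**2

# Metropolis sampling of p_f(x) ∝ exp(-beta * U_f(x)).
x, samples = 0.0, []
for step in range(200_000):
    x_prop = x + rng.normal(scale=0.5)
    dU = U_f(x_prop) - U_f(x)
    if dU <= 0.0 or rng.random() < np.exp(-beta * dU):
        x = x_prop
    if step > 10_000:  # discard burn-in
        samples.append(x)

# Monte Carlo estimate of E_{p_f}[a].
print("E[a] ≈", np.mean([a(s) for s in samples]))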
Motivation: Questions

• What are good coarse-grained variables X? How many are needed, and how are they related to the FG description?
• What is the right CG model?
• Given a good CG model for X, how much can one predict about the full fine-scale configuration x (reconstruction)?
• How much information is lost during coarse-graining, and how does this affect the predictions produced by the CG model?
• Given finite simulation data at the fine scale, how (un)certain can one be in the resulting predictions?
Motivation: Two Roads in Coarse-Graining

1) Variational (mean field, among many others):
\min_{\bar p_f} \mathrm{KL}(\bar p_f(x) \,\|\, p_f(x))

2) Data-driven (e.g., relative entropy [Shell 2008]):
\min_{\bar p_f} \mathrm{KL}(p_f(x) \,\|\, \bar p_f(x))

where:
• \bar p_f(x): the approximation
• p_f(x) \propto e^{-\beta U_f(x)}: the exact distribution
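The two KL directions behave very differently: the reverse form in 1) is mode-seeking, while the forward form in 2) is mass-covering. A minimal numerical sketch of this asymmetry, assuming a hypothetical bimodal target and a single-Gaussian approximation (these densities are illustrative, not the talk's models):

import numpy as np

x = np.linspace(-6, 6, 4001)
dx = x[1] - x[0]

def gauss(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig)**2) / (sig * np.sqrt(2 * np.pi))

p = 0.5 * gauss(x, -2, 0.5) + 0.5 * gauss(x, 2, 0.5)  # bimodal "exact" p_f
q = gauss(x, 0, 2.0)                                   # single-Gaussian approximation

def kl(a, b):
    # KL(a || b) by simple quadrature on the grid.
    return np.sum(a * np.log(a / b)) * dx

print("forward KL(p || q) =", kl(p, q))  # penalizes q for missing mass of p
print("reverse KL(q || p) =", kl(q, p))  # penalizes q for putting mass where p has none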
Motivation: Existing Methods vs. Proposed Generative Model

Existing methods (fine → coarse):
p_f(x) \;\xrightarrow{\,R(x) = X\,}\; \bar p_c(X)
Fine-scale configurations x are mapped to coarse-scale variables X through a restriction operator R.

Proposed (generative model, coarse → fine):
\bar p_f(x) = \int p_{cf}(x \mid X)\, p_c(X)\, dX

Notes:
• No restriction operator (fine-to-coarse map R(x) = X) is required.
• A probabilistic coarse-to-fine map p_{cf}(x \mid X) is prescribed instead.
• The coarse model p_c(X) is not the marginal of X (given R(x) = X).
Motivation

Given p_c(X) and p_{cf}(x \mid X):
1) Draw X from p_c(X) (i.e., simulate the CG model).
2) Draw x from p_{cf}(x \mid X).
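This two-step (ancestral) sampling is straightforward once both densities are specified. A minimal sketch, assuming a Gaussian coarse model and a Gaussian coarse-to-fine map with a hypothetical linear mean W X + b; the dimensions and parameters are placeholders, not the talk's:

import numpy as np

rng = np.random.default_rng(1)
dim_X, dim_x = 2, 10  # dim(X) << dim(x)

# Hypothetical parameters: p_c(X) = N(mu_c, I), p_cf(x|X) = N(W X + b, s^2 I).
mu_c = np.zeros(dim_X)
W = rng.normal(size=(dim_x, dim_X))
b = np.zeros(dim_x)
s = 0.1

# 1) Simulate the CG model.
X = mu_c + rng.normal(size=dim_X)
# 2) Probabilistic reconstruction of a fine-scale configuration.
x = W @ X + b + s * rng.normal(size=dim_x)
print("coarse sample X:", X)
print("fine sample x:", x)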
Learning

Proposed probabilistic generative model; parametrize:
• p_c(X \mid \theta_c): coarse model
• p_{cf}(x \mid X, \theta_{cf}): coarse-to-fine map

Optimize:
\min_{\theta_c, \theta_{cf}} \mathrm{KL}\big( p_f(x) \,\|\, \bar p_f(x \mid \theta_c, \theta_{cf}) \big)
\;\leftrightarrow\; \min_{\theta_c, \theta_{cf}} -\int p_f(x) \log \frac{\int p_{cf}(x \mid X, \theta_{cf})\, p_c(X \mid \theta_c)\, dX}{p_f(x)}\, dx
\;\leftrightarrow\; \max_{\theta_c, \theta_{cf}} \int p_f(x) \log \Big( \int p_{cf}(x \mid X, \theta_{cf})\, p_c(X \mid \theta_c)\, dX \Big)\, dx
\;\leftrightarrow\; \max_{\theta_c, \theta_{cf}} \sum_{i=1}^{N} \log \int p_{cf}(x^{(i)} \mid X, \theta_{cf})\, p_c(X \mid \theta_c)\, dX
(replacing the expectation over p_f with an empirical average over N FG samples x^{(i)} \sim p_f(x))
\;\leftrightarrow\; \max_{\theta_c, \theta_{cf}} \mathcal{L}(\theta_c, \theta_{cf})   (MLE)

• MAP estimate: \max_{\theta_c, \theta_{cf}} \mathcal{L}(\theta_c, \theta_{cf}) + \log p(\theta_c, \theta_{cf}), where p(\theta_c, \theta_{cf}) is the prior.
• Fully Bayesian, i.e., posterior: p(\theta_c, \theta_{cf} \mid x^{(1:N)}) \propto \exp\{\mathcal{L}(\theta_c, \theta_{cf})\}\, p(\theta_c, \theta_{cf})
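Each term of \mathcal{L} is a log marginal likelihood. When the inner integral is not available in closed form, a simple (if high-variance) Monte Carlo estimate draws X from the coarse model; the talk itself uses a variational bound instead (next slide), so the sketch below is only an assumed baseline, reusing the hypothetical linear-Gaussian model from above:

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
dim_X, dim_x, s = 2, 10, 0.1
W = rng.normal(size=(dim_x, dim_X))
x_obs = rng.normal(size=dim_x)  # stand-in for one FG sample x^(i)

# log p(x^(i) | theta) = log ∫ p_cf(x|X) p_c(X) dX ≈ log mean_j p_cf(x_obs | X_j),
# with X_j drawn from p_c = N(0, I).
J = 5000
Xs = rng.normal(size=(J, dim_X))
# By symmetry of the Gaussian, N(x_obs; W X_j, s^2 I) = N(W X_j; x_obs, s^2 I),
# so we can evaluate one logpdf with fixed mean x_obs at the J points W X_j.
log_pcf = multivariate_normal.logpdf(Xs @ W.T, mean=x_obs, cov=s**2 * np.eye(dim_x))
# Log-mean-exp for numerical stability.
m = log_pcf.max()
log_marginal = m + np.log(np.mean(np.exp(log_pcf - m)))
print("log p(x^(i)) ≈", log_marginal)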
Learning: Stochastic VB Expectation-Maximization [Beal & Ghahramani 2003]

\mathcal{L}(\theta_c, \theta_{cf}) = \sum_{i=1}^{N} \log \int p_{cf}(x^{(i)} \mid X^{(i)}, \theta_{cf})\, p_c(X^{(i)} \mid \theta_c)\, dX^{(i)}
= \sum_{i=1}^{N} \log \int q(X^{(i)})\, \frac{p_{cf}(x^{(i)} \mid X^{(i)}, \theta_{cf})\, p_c(X^{(i)} \mid \theta_c)}{q(X^{(i)})}\, dX^{(i)}
\geq \sum_{i=1}^{N} \int q(X^{(i)}) \log \frac{p_{cf}(x^{(i)} \mid X^{(i)}, \theta_{cf})\, p_c(X^{(i)} \mid \theta_c)}{q(X^{(i)})}\, dX^{(i)}   (by Jensen's inequality)
= \sum_{i=1}^{N} \mathcal{F}_i(q(X^{(i)}), \theta_c, \theta_{cf}) = \mathcal{F}(q, \theta_c, \theta_{cf})

• E-step: approximate the optimal q_i^{opt}(X^{(i)}) by a multivariate Gaussian, q_i(X^{(i)}) = \mathcal{N}(\mu_i^{opt}, \Sigma_i^{opt}).
• M-step: compute the gradients \sum_{i=1}^{N} \nabla_{\theta_c} \mathcal{F}, \sum_{i=1}^{N} \nabla_{\theta_{cf}} \mathcal{F} (and the Hessian) and update (\theta_c, \theta_{cf}).

Essential ingredient: stochastic optimization with ADAptive Moment estimation (ADAM, [Kingma & Ba 2014]). A toy VB-EM loop with an ADAM update is sketched below.
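To make the E/M alternation concrete, here is a minimal, self-contained sketch of VB-EM on a hypothetical one-dimensional linear-Gaussian model, p_c(X | \theta_c) = N(X; \theta_c, 1) and p_cf(x | X, \theta_{cf}) = N(x; \theta_{cf} X, \sigma^2), chosen so that the E-step posterior is Gaussian in closed form. The model, data, and step sizes are illustrative, not the talk's:

import numpy as np

rng = np.random.default_rng(3)
sigma2 = 0.25
theta_c_true, theta_cf_true = 1.5, 2.0

# Synthetic "FG" data from the true generative model.
N = 500
X_true = theta_c_true + rng.normal(size=N)
x_data = theta_cf_true * X_true + np.sqrt(sigma2) * rng.normal(size=N)

theta_c, theta_cf = 0.0, 1.0
# Adam optimizer state (update rule of Kingma & Ba 2014).
m, v = np.zeros(2), np.zeros(2)
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8

for t in range(1, 2001):
    # E-step: for this model q_i(X) = N(mu_i, Sig) is the exact posterior.
    lam = theta_cf**2 / sigma2 + 1.0          # posterior precision
    Sig = 1.0 / lam
    mu = (theta_cf * x_data / sigma2 + theta_c) * Sig

    # M-step: gradients of the lower bound F w.r.t. (theta_c, theta_cf).
    g_c = np.sum(mu - theta_c)
    g_cf = np.sum(x_data * mu - theta_cf * (mu**2 + Sig)) / sigma2
    g = np.array([g_c, g_cf])

    # Adam ascent step on F.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    theta_c, theta_cf = np.array([theta_c, theta_cf]) + lr * m_hat / (np.sqrt(v_hat) + eps)

print("estimated (theta_c, theta_cf):", theta_c, theta_cf)

The E-step here is exact only because of the toy model's conjugacy; in general the Gaussian q_i is a variational approximation, as stated on the slide.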