Fast Robustness Quantification with Variational Bayes Tamara - PowerPoint PPT Presentation

Variational Bayes
• Variational Bayes (VB)
  • Approximation $q^*(\theta)$ for the posterior $p(\theta \mid x)$
  • Minimize Kullback-Leibler (KL) divergence: $\mathrm{KL}(q \,\|\, p(\cdot \mid x))$
• VB practical success
  • point estimates and prediction
  • fast, streaming, distributed [Broderick, Boyd, Wibisono, Wilson, Jordan 2013]
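To make the KL objective concrete, here is a minimal sketch, assuming a toy one-dimensional setting that is not from the slides: the target is a known Gaussian $N(\mu, \sigma^2)$ and the variational family is $q = N(m, s^2)$. The function names `kl_gauss` and `fit_vb`, the step size, and the starting point are all hypothetical choices for illustration.

```python
import math

def kl_gauss(m, s, mu, sigma):
    """Closed-form KL( N(m, s^2) || N(mu, sigma^2) )."""
    return math.log(sigma / s) + (s**2 + (m - mu)**2) / (2 * sigma**2) - 0.5

def fit_vb(mu, sigma, steps=500, lr=0.5):
    """Minimize the KL over (m, log s) by plain gradient descent.

    Toy illustration only: the target is assumed known and Gaussian,
    so the optimum is simply m = mu, s = sigma.
    """
    m, log_s = 0.0, 0.0  # hypothetical starting point
    for _ in range(steps):
        s = math.exp(log_s)
        grad_m = (m - mu) / sigma**2          # d KL / d m
        grad_log_s = s**2 / sigma**2 - 1.0    # d KL / d log s
        m -= lr * grad_m
        log_s -= lr * grad_log_s
    return m, math.exp(log_s)

m_star, s_star = fit_vb(mu=1.0, sigma=2.0)
kl_at_optimum = kl_gauss(m_star, s_star, 1.0, 2.0)
```

Because the target lies inside the variational family here, the minimizer recovers it and the KL goes to zero; in realistic models the family is restricted and the minimum KL is strictly positive.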

What about uncertainty?
• Variational Bayes
  $$\mathrm{KL}(q \,\|\, p(\cdot \mid x)) = \int_\theta q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)} \, d\theta$$
• Mean-field variational Bayes (MFVB)
  $$q(\theta) = \prod_{j=1}^{J} q(\theta_j)$$
• Underestimates variance (sometimes severely)
• No covariance estimates
[Figure: posterior $p(\theta \mid x)$ and MFVB approximation $q^*(\theta)$, axes $\theta_1$ and $\theta_2$; Bishop 2006]
[MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]
[Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2015]
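The variance underestimation can be seen in the standard textbook case of a correlated Gaussian target (Bishop 2006, Section 10.1.2). For a Gaussian posterior with covariance $\Sigma$ and precision $\Lambda = \Sigma^{-1}$, the optimal mean-field factor for $\theta_j$ is Gaussian with variance $1/\Lambda_{jj}$. The sketch below assumes a hypothetical bivariate target with correlation 0.9; the function name is illustrative.

```python
import numpy as np

def mfvb_gaussian_variances(Sigma):
    """Marginal variances of the optimal mean-field approximation
    to a Gaussian with covariance Sigma: 1 / Lambda_jj."""
    Lambda = np.linalg.inv(Sigma)
    return 1.0 / np.diag(Lambda)

rho = 0.9  # hypothetical posterior correlation
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

true_var = np.diag(Sigma)                  # true marginal variances
mfvb_var = mfvb_gaussian_variances(Sigma)  # equals 1 - rho^2 in this case
ratio = mfvb_var / true_var                # fraction of variance retained
```

With unit marginal variances the mean-field variance is $1 - \rho^2$, so at $\rho = 0.9$ the approximation keeps only 19% of the true variance, and by construction it reports zero covariance between $\theta_1$ and $\theta_2$.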

Linear response
• Cumulant-generating function
  $$C(t) := \log \mathbb{E}\, e^{t^T \theta}, \qquad \text{mean} = \frac{d}{dt} C(t) \Big|_{t=0}$$
• True posterior covariance vs MFVB covariance
  $$\Sigma := \frac{d^2}{dt^T dt} C_{p(\cdot \mid x)}(t) \Big|_{t=0}, \qquad V := \frac{d^2}{dt^T dt} C_{q^*}(t) \Big|_{t=0}$$
• "Linear response"
  $$\log p_t(\theta) := \log p(\theta \mid x) + t^T \theta - C(t), \qquad \text{MFVB approximation } q_t^*$$
• The LRVB approximation
  $$\Sigma = \frac{d}{dt^T} \left[ \frac{d}{dt} C_{p(\cdot \mid x)}(t) \right] \Big|_{t=0} = \frac{d}{dt^T} \mathbb{E}_{p_t} \theta \Big|_{t=0} \approx \frac{d}{dt^T} \mathbb{E}_{q_t^*} \theta \Big|_{t=0} =: \hat{\Sigma}$$
[Figure: posterior $p(\theta \mid x)$ and MFVB approximation $q^*(\theta)$; Bishop 2006]
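The step from the cumulant-generating function to the derivative of the perturbed mean can be spelled out as follows. This is a sketch of the standard argument, assuming derivatives and expectations can be exchanged; it uses only the symbols defined on the slide.

```latex
% C(t) is the log normalizer of the tilted posterior:
%   p_t(\theta) = p(\theta \mid x) \exp\{ t^T \theta - C(t) \},
%   C(t) = C_{p(\cdot \mid x)}(t) = \log \mathbb{E}_{p(\cdot \mid x)} e^{t^T \theta}.
\begin{align*}
\frac{d}{dt} C(t)
  &= \frac{\mathbb{E}_{p(\cdot \mid x)}\!\left[ \theta \, e^{t^T \theta} \right]}
          {\mathbb{E}_{p(\cdot \mid x)}\!\left[ e^{t^T \theta} \right]}
   = \mathbb{E}_{p_t} \theta, \\
\frac{d}{dt^T} \mathbb{E}_{p_t} \theta \Big|_{t=0}
  &= \frac{d^2}{dt^T dt} C(t) \Big|_{t=0}
   = \Sigma .
\end{align*}
```

So the posterior covariance is exactly the sensitivity of the posterior mean to a linear perturbation of the log density. The LRVB idea is to compute that same sensitivity for the MFVB mean $\mathbb{E}_{q_t^*} \theta$ instead of reading the covariance off $q^*$ directly.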

LRVB estimator
• LRVB covariance estimate
  $$\hat{\Sigma} := \frac{d}{dt^T} \mathbb{E}_{q_t^*} \theta \Big|_{t=0}$$
• Suppose $q_t$ is in an exponential family with mean parametrization $m_t$
  $$\hat{\Sigma} = \left( \frac{\partial^2 \mathrm{KL}}{\partial m \, \partial m^T} \right)^{-1} \Bigg|_{m = m^*} = (I - VH)^{-1} V$$
• Symmetric and positive definite at local min of KL
• The LRVB assumption: $\mathbb{E}_{p_t} \theta \approx \mathbb{E}_{q_t^*} \theta$
• LRVB estimate is exact when MFVB gives exact mean (e.g. multivariate normal)
[Figure: posterior $p(\theta \mid x)$ and MFVB approximation $q^*(\theta)$; Bishop 2006]
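The exactness claim for the multivariate normal can be checked numerically with the formula $(I - VH)^{-1} V$. The slide does not define $H$; the sketch below assumes it is the Hessian of the expected log posterior $\mathbb{E}_q[\log p(\theta \mid x)]$ with respect to the variational means, which for a Gaussian target with precision $\Lambda$ is $\mathrm{diag}(\Lambda) - \Lambda$ on the $\theta$ block. The covariance matrix and the function name `lrvb_covariance` are hypothetical.

```python
import numpy as np

def lrvb_covariance(V, H):
    """LRVB estimate (I - V H)^{-1} V, computed with a linear solve."""
    I = np.eye(V.shape[0])
    return np.linalg.solve(I - V @ H, V)

# Hypothetical Gaussian "posterior" covariance (positive definite).
Sigma = np.array([[1.0, 0.9, 0.3],
                  [0.9, 1.0, 0.5],
                  [0.3, 0.5, 2.0]])
Lambda = np.linalg.inv(Sigma)

# MFVB covariance of theta: diagonal, with variances 1 / Lambda_jj.
V = np.diag(1.0 / np.diag(Lambda))

# Assumed H: Hessian of E_q[log p] in the means, i.e. minus the
# off-diagonal part of the precision matrix.
H = np.diag(np.diag(Lambda)) - Lambda

Sigma_hat = lrvb_covariance(V, H)
max_abs_error = np.max(np.abs(Sigma_hat - Sigma))
```

In this case $I - VH = \mathrm{diag}(\Lambda)^{-1} \Lambda$, so the product collapses to $\Lambda^{-1} = \Sigma$: the diagonal MFVB covariance $V$ is corrected to the full posterior covariance, off-diagonal terms included.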

Microcredit Experiment
• Simplified from Meager (2015)
• $K$ microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
• $N_k$ businesses in $k$th site (~900 to ~17K)
• Profit of $n$th business at $k$th site:
  $$y_{kn} \overset{\text{indep}}{\sim} \mathcal{N}(\mu_k + T_{kn} \tau_k, \sigma_k^2)$$
  where $y_{kn}$ is profit and $T_{kn} = 1$ if microcredit
• Priors and hyperpriors:
  $$\begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{\text{iid}}{\sim} \mathcal{N}\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix}, C \right), \qquad \begin{pmatrix} \mu \\ \tau \end{pmatrix} \overset{\text{iid}}{\sim} \mathcal{N}\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix}, \Lambda^{-1} \right)$$
  $$\sigma_k^{-2} \overset{\text{iid}}{\sim} \Gamma(a, b), \qquad C \sim \text{Sep\&LKJ}(\eta, c, d)$$
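The generative structure of this hierarchical model can be sketched by simulating from it. Everything numeric below is a hypothetical placeholder rather than a value from the study: the site sizes, the hyperparameters, the 50/50 treatment assignment, and the reading of $b$ as a rate parameter. The Sep&LKJ prior on $C$ is not sampled; $C$ is simply held fixed.

```python
import numpy as np

def simulate_microcredit(site_sizes, mu0, tau0, Lambda, C, a, b, rng):
    """Draw one synthetic data set from the hierarchical model.

    Assumptions (not from the slides): C is fixed instead of drawn
    from the Sep&LKJ prior, b is a rate, and treatment is assigned
    to each business independently with probability 1/2.
    """
    # Global mean: (mu, tau) ~ N((mu0, tau0), Lambda^{-1})
    global_mean = rng.multivariate_normal([mu0, tau0], np.linalg.inv(Lambda))
    sites = []
    for n in site_sizes:
        # Site-level effects: (mu_k, tau_k) ~ N((mu, tau), C)
        mu_k, tau_k = rng.multivariate_normal(global_mean, C)
        # Site-level precision: sigma_k^{-2} ~ Gamma(a, b)
        precision_k = rng.gamma(shape=a, scale=1.0 / b)
        sigma_k = precision_k ** -0.5
        # T_kn = 1 if the business received microcredit
        T = rng.integers(0, 2, size=n)
        # Profit: y_kn ~ N(mu_k + T_kn * tau_k, sigma_k^2)
        y = rng.normal(mu_k + T * tau_k, sigma_k)
        sites.append((T, y))
    return global_mean, sites

rng = np.random.default_rng(0)
site_sizes = [900, 1200, 2500, 5000, 8000, 12000, 17000]  # hypothetical
global_mean, sites = simulate_microcredit(
    site_sizes, mu0=0.0, tau0=0.0,
    Lambda=np.eye(2),
    C=np.array([[1.0, 0.2], [0.2, 0.5]]),
    a=2.0, b=2.0, rng=rng,
)
```

Each element of `sites` holds the treatment indicators and profits for one of the seven sites. The quantity of interest in such a model is typically the shared treatment effect $\tau$, whose posterior uncertainty depends on the covariance structure that plain MFVB discards.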

Fast Robustness Quantification with Variational Bayes Tamara Broderick ITT Career Development Assistant Professor, MIT With: Ryan Giordano, Rachael Meager, Jonathan Huggins, Michael I. Jordan Bayesian inference Complex, modular
