Variational Inference for Bayes vMF Mixture, Hanxiao Liu, September 23, 2014 - PowerPoint PPT Presentation

  1. Variational Inference for Bayes vMF Mixture. Hanxiao Liu. September 23, 2014.

  2. Variational Inference Review
     Lower bound the likelihood:
       L(θ; X) = E_q log p(X|θ)
               = E_q log [ p(X, Z|θ) / q(Z) ] + E_q log [ q(Z) / p(Z|X, θ) ]
               = VLB(q, θ) + D_KL( q(Z) || p(Z|X, θ) )
     Raise VLB(q, θ) by coordinate ascent:
       1. q^{t+1} = argmax_{q = ∏_{i=1}^M q_i} VLB(q, θ^t)
       2. θ^{t+1} = argmax_θ VLB(q^{t+1}, θ)
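
As a sanity check on the decomposition above, here is a minimal numerical example; the probabilities are made up, not from the deck. For a single observation with a binary latent variable, the evidence splits exactly into the variational lower bound plus the KL term.

```python
import numpy as np

# Toy check of log p(X|theta) = VLB(q, theta) + D_KL(q(Z) || p(Z|X, theta))
# for one observation with a binary latent variable (numbers are made up).
p_xz = np.array([0.15, 0.35])    # p(x, z=0 | theta), p(x, z=1 | theta)
q_z  = np.array([0.6, 0.4])      # an arbitrary variational distribution q(Z)

log_px = np.log(p_xz.sum())                                # log p(x | theta)
vlb    = np.sum(q_z * np.log(p_xz / q_z))                  # E_q log[p(x, Z)/q(Z)]
kl     = np.sum(q_z * np.log(q_z / (p_xz / p_xz.sum())))   # KL(q || p(Z|x))

assert np.isclose(log_px, vlb + kl)                        # decomposition holds exactly
```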

  3. Variational Inference Review
     Goal: solve argmax_{q = ∏_{i=1}^M q_i} VLB(q, θ^t) by coordinate ascent, i.e. by sequentially updating a single q_i in each iteration.
     Each coordinate step has a closed-form solution:
       VLB(q_j; q_{-j}, θ^t) = E_q log [ p(X, Z|θ^t) / q(Z) ]
                             = E_q log p(X, Z|θ^t) - ∑_{i=1}^M E_q log q_i
                             = E_{q_j} [ E_{q_{-j}} log p(X, Z|θ^t) ] - E_{q_j} log q_j + const     (define log q̃_j := E_{q_{-j}} log p(X, Z|θ^t))
                             = E_{q_j} log ( q̃_j / q_j ) + const
                             = -D_KL( q_j || q̃_j ) + const
       ⇒ log q_j* = E_{q_{-j}} log p(X, Z|θ^t) + const

  4. Bayes vMF Mixture [Gopal and Yang, 2014]
     Generative model (left) and variational family (right):
     - π ∼ Dirichlet(·|α)                  q(π) ≡ Dirichlet(·|ρ)
     - µ_k ∼ vMF(·|µ_0, C_0)               q(µ_k) ≡ vMF(·|ψ_k, γ_k)
     - κ_k ∼ logNormal(·|m, σ^2)           q(κ_k) ≡ logNormal(·|a_k, b_k)
     - z_i ∼ Multi(·|π)                    q(z_i) ≡ Multi(·|λ_i)
     - x_i ∼ vMF(·|µ_{z_i}, κ_{z_i})
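
For reference, one possible way to organize these quantities in code is with two plain containers; the field names simply mirror the symbols on the slide and are an implementation choice, not part of the original deck.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Hyperparams:            # theta in the deck
    alpha: float              # Dirichlet concentration for pi
    mu0: np.ndarray           # vMF prior mean direction for mu_k, shape (D,)
    C0: float                 # vMF prior concentration for mu_k
    m: float                  # logNormal prior mean of log kappa_k
    sigma2: float             # logNormal prior variance of log kappa_k

@dataclass
class VariationalParams:      # parameters of the factorized q
    rho: np.ndarray           # Dirichlet(rho) for q(pi), shape (K,)
    lam: np.ndarray           # Multinomial weights for q(z_i), shape (N, K)
    psi: np.ndarray           # vMF mean directions for q(mu_k), shape (K, D)
    gamma: np.ndarray         # vMF concentrations for q(mu_k), shape (K,)
    a: np.ndarray             # logNormal means for q(kappa_k), shape (K,)
    b: np.ndarray             # logNormal variances for q(kappa_k), shape (K,)
```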

  5. Compute log p(X, Z|θ)
       p(X, Z|θ) = Dirichlet(π|α) × ∏_{i=1}^N Multi(z_i|π) vMF(x_i | µ_{z_i}, κ_{z_i}) × ∏_{k=1}^K vMF(µ_k | µ_0, C_0) logNormal(κ_k | m, σ^2)
       log p(X, Z|θ) = -log B(α) + ∑_{k=1}^K (α - 1) log π_k
                       + ∑_{i=1}^N ∑_{k=1}^K z_ik log π_k
                       + ∑_{i=1}^N ∑_{k=1}^K z_ik ( log C_D(κ_k) + κ_k x_i^⊤ µ_k )
                       + ∑_{k=1}^K ( log C_D(C_0) + C_0 µ_k^⊤ µ_0 )
                       + ∑_{k=1}^K ( -(log κ_k - m)^2 / (2σ^2) - log κ_k - (1/2) log(2πσ^2) )
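
The vMF log-normalizer log C_D(κ) appears repeatedly in the updates that follow. A small helper, assuming the usual convention C_D(κ) = κ^{D/2-1} / ((2π)^{D/2} I_{D/2-1}(κ)), might look like this; scipy's exponentially scaled Bessel function keeps it stable for large κ.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v

def log_C_D(kappa, D):
    """log of the vMF normalizer C_D(kappa) = kappa^{D/2-1} / ((2 pi)^{D/2} I_{D/2-1}(kappa)).

    Uses ive(v, x) = I_v(x) * exp(-x), so log I_v(x) = log(ive(v, x)) + x stays
    finite even for large kappa.
    """
    v = D / 2.0 - 1.0
    kappa = np.asarray(kappa, dtype=float)
    log_bessel = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (D / 2.0) * np.log(2.0 * np.pi) - log_bessel
```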

  6. Updating q(π)
     q(π) ≡ Dirichlet(·|ρ)
       log q*(π) = E_{q\π} log p(X, Z|θ) + const
                 = E_{q\π} [ ∑_{k=1}^K (α - 1) log π_k + ∑_{i=1}^N ∑_{k=1}^K z_ik log π_k ] + const
                 = ∑_{k=1}^K ( α + ∑_{i=1}^N E_q[z_ik] - 1 ) log π_k + const
       ⇒ q*(π) ∝ ∏_{k=1}^K π_k^{α + ∑_{i=1}^N E_q[z_ik] - 1} ∼ Dirichlet
       ⇒ ρ_k* = α + ∑_{i=1}^N E_q[z_ik]
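
In code this update is a one-line sketch, where `lam` holds E_q[z_ik] = λ_ik as an N×K array (names are illustrative, not from the deck).

```python
def update_q_pi(lam, alpha):
    """rho_k = alpha + sum_i E_q[z_ik], with E_q[z_ik] = lam[i, k]."""
    return alpha + lam.sum(axis=0)    # shape (K,)
```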

  7. Updating q(z_i)
     q(z_i) ≡ Multi(·|λ_i)
       log q*(z_i) = E_{q\z_i} log p(X, Z|θ) + const
                   = E_{q\z_i} [ ∑_{i=1}^N ∑_{k=1}^K z_ik log π_k + ∑_{i=1}^N ∑_{k=1}^K z_ik ( log C_D(κ_k) + κ_k x_i^⊤ µ_k ) ] + const
                   = ∑_{k=1}^K z_ik ( E_q log π_k + E_q log C_D(κ_k) + E_q[κ_k] x_i^⊤ E_q[µ_k] ) + const
       ⇒ q*(z_i) ∼ Multi,  λ_ik* ∝ exp( E_q log π_k + E_q log C_D(κ_k) + E_q[κ_k] x_i^⊤ E_q[µ_k] )
     Assume E_q log π_k, E_q log C_D(κ_k), E_q[κ_k] and E_q[µ_k] are already known. We will compute them explicitly later.
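
A vectorized sketch of this update, assuming the four expectations are passed in as arrays (they are obtained on the "Intermediate Quantities" slide); the names are illustrative.

```python
import numpy as np

def update_q_z(X, E_log_pi, E_log_CD, E_kappa, E_mu):
    """lambda_ik ∝ exp(E[log pi_k] + E[log C_D(kappa_k)] + E[kappa_k] x_i^T E[mu_k])."""
    # X: (N, D); E_mu: (K, D); E_log_pi, E_log_CD, E_kappa: (K,)
    logits = E_log_pi + E_log_CD + (X @ E_mu.T) * E_kappa   # (N, K)
    logits -= logits.max(axis=1, keepdims=True)             # stabilize the softmax
    lam = np.exp(logits)
    return lam / lam.sum(axis=1, keepdims=True)
```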

  8. Updating q(µ_k)
     q(µ_k) ≡ vMF(·|ψ_k, γ_k)
       log q*(µ_k) = E_{q\µ_k} log p(X, Z|θ) + const
                   = E_{q\µ_k} [ ∑_{i=1}^N ∑_{j=1}^K z_ij κ_j x_i^⊤ µ_j + ∑_{j=1}^K C_0 µ_j^⊤ µ_0 ] + const
                   = E_q[κ_k] ( ∑_{i=1}^N E_q[z_ik] x_i^⊤ ) µ_k + C_0 µ_k^⊤ µ_0 + const
       ⇒ q*(µ_k) ∝ exp{ ( E_q[κ_k] ∑_{i=1}^N E_q[z_ik] x_i + C_0 µ_0 )^⊤ µ_k } ∼ vMF
       ⇒ γ_k* = || E_q[κ_k] ∑_{i=1}^N E_q[z_ik] x_i + C_0 µ_0 ||,
          ψ_k* = ( E_q[κ_k] ∑_{i=1}^N E_q[z_ik] x_i + C_0 µ_0 ) / γ_k*
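
A possible implementation of this update, with `lam` again holding E_q[z_ik]; the natural-parameter vector on the slide is formed for all components at once.

```python
import numpy as np

def update_q_mu(X, lam, E_kappa, mu0, C0):
    """gamma_k = ||E[kappa_k] sum_i lam_ik x_i + C0 mu0||, psi_k = that vector / gamma_k."""
    # X: (N, D); lam: (N, K); E_kappa: (K,); mu0: (D,)
    natural = E_kappa[:, None] * (lam.T @ X) + C0 * mu0   # (K, D) natural parameters
    gamma = np.linalg.norm(natural, axis=1)               # (K,) concentrations
    psi = natural / gamma[:, None]                        # (K, D) unit mean directions
    return psi, gamma
```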

  9. Updating q(κ_k)
     q(κ_k) ≡ logNormal(·|a_k, b_k)?
       log q*(κ_k) = E_{q\κ_k} log p(X, Z|θ) + const
                   = E_{q\κ_k} [ ∑_{i=1}^N ∑_{j=1}^K z_ij ( log C_D(κ_j) + κ_j x_i^⊤ µ_j ) + ∑_{j=1}^K ( -log κ_j - (log κ_j - m)^2 / (2σ^2) ) ] + const
                   = ∑_{i=1}^N E_q[z_ik] ( log C_D(κ_k) + κ_k x_i^⊤ E_q[µ_k] ) - log κ_k - (log κ_k - m)^2 / (2σ^2) + const
       ⇒ q*(κ_k) is not logNormal, due to the existence of the log C_D(κ_k) term

  10. Intermediate Quantities
     Some intermediate quantities are available in closed form:
     - q(z_i) ≡ Multi(z_i|λ_i)  ⇒  E_q[z_ij] = λ_ij
     - q(π) ≡ Dirichlet(π|ρ)  ⇒  E_q log π_k = Ψ(ρ_k) - Ψ( ∑_j ρ_j )
     - q(µ_k) ≡ vMF(µ_k|ψ_k, γ_k)  ⇒  E_q[µ_k] = ( I_{D/2}(γ_k) / I_{D/2-1}(γ_k) ) ψ_k  [Rothenbuehler, 2005] (1)
     Some are not: E_q[κ_k] and E_q log C_D(κ_k), because
     1. there is no good parametric form for q(κ_k)  →  apply sampling;
     2. even if κ_k ∼ logNormal is assumed, E_q log C_D(κ_k) is still hard to deal with  →  bound log C_D(·) by some simple functions.
     (1) Can be derived from the characteristic function of the vMF.
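
Sketches of the two closed-form expectations, using scipy's digamma and exponentially scaled Bessel functions (the exponential scaling cancels in the ratio):

```python
import numpy as np
from scipy.special import digamma, ive

def dirichlet_E_log_pi(rho):
    """E_q[log pi_k] = Psi(rho_k) - Psi(sum_j rho_j)."""
    return digamma(rho) - digamma(rho.sum())

def vmf_E_mu(psi, gamma, D):
    """E_q[mu_k] = (I_{D/2}(gamma_k) / I_{D/2-1}(gamma_k)) * psi_k.

    ive(v, x) = I_v(x) exp(-x), so the ratio of ive values equals the ratio of
    the unscaled Bessel functions while avoiding overflow for large gamma.
    """
    ratio = ive(D / 2.0, gamma) / ive(D / 2.0 - 1.0, gamma)   # (K,)
    return ratio[:, None] * psi                                # (K, D)
```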

  11. Sampling
     In principle we could sample κ_k from p(κ_k | X, θ). Unfortunately, that sampling procedure requires samples of z_i, µ_k, π, ..., which are not maintained by variational inference.
     Recall that the optimal posterior for κ_k satisfies (2)
       log q*(κ_k) = ∑_{i=1}^N E[z_ik] ( log C_D(κ_k) + κ_k x_i^⊤ E_q[µ_k] ) - log κ_k - (log κ_k - m)^2 / (2σ^2) + const
       ⇒ q*(κ_k) ∝ exp{ ∑_{i=1}^N E[z_ik] ( log C_D(κ_k) + κ_k x_i^⊤ E_q[µ_k] ) } × logNormal(κ_k | m, σ^2)
     We can sample from q*(κ_k)!
     (2) See the derivation on p. 8.
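
The deck does not say which sampler is used; as one concrete option, a random-walk Metropolis sampler on log κ can target q*(κ_k) directly, reusing the log_C_D helper sketched earlier. Note that working in log κ space introduces a +log κ Jacobian that exactly cancels the -log κ of the logNormal density.

```python
import numpy as np

def sample_q_kappa(X, lam_k, E_mu_k, m, sigma2, D, n_samples=2000, step=0.3, seed=0):
    """Random-walk Metropolis on log(kappa), targeting the unnormalized q*(kappa_k) above.

    In log-kappa space the Jacobian (+log kappa) cancels the -log kappa of the
    logNormal density, leaving
        log target(log kappa) = S_k log C_D(kappa) + kappa r_k - (log kappa - m)^2 / (2 sigma2),
    where S_k = sum_i E[z_ik] and r_k = sum_i E[z_ik] x_i^T E[mu_k].
    Requires log_C_D from the earlier sketch.
    """
    rng = np.random.default_rng(seed)
    S_k = float(lam_k.sum())
    r_k = float(lam_k @ (X @ E_mu_k))

    def log_target(log_kappa):
        kappa = np.exp(log_kappa)
        return S_k * log_C_D(kappa, D) + kappa * r_k - (log_kappa - m) ** 2 / (2.0 * sigma2)

    log_kappa, samples = float(m), []
    for _ in range(n_samples):
        proposal = log_kappa + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_target(proposal) - log_target(log_kappa):
            log_kappa = proposal
        samples.append(np.exp(log_kappa))
    kappa = np.array(samples)
    # Monte Carlo estimates of the two expectations needed by the other updates.
    return kappa.mean(), np.mean(log_C_D(kappa, D))
```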

  12. Bounding Outline
     - Assume q(κ_k) ≡ logNormal(·|a_k, b_k).
     - Lower bound E_q log C_D(κ_k) in the VLB by some simple terms.
     - To optimize q(κ_k), use gradient ascent w.r.t. a_k and b_k to raise the VLB.
     Empirically, sampling outperforms bounding.

  13. Empirical Bayes for Hyperparameters
     Raise VLB(q, θ) by coordinate ascent:
       1. q^{t+1} = argmax_{q = ∏_{i=1}^M q_i} VLB(q, θ^t)
       2. θ^{t+1} = argmax_θ VLB(q^{t+1}, θ) = argmax_θ E_{q^{t+1}} log p(X, Z|θ)
     For example, one can use gradient ascent to optimize α:
       max_{α>0}  -log B(α) + (α - 1) ∑_{k=1}^K E_{q^{t+1}}[log π_k]
     m, σ^2, µ_0 and C_0 can be optimized in a similar manner. (3)
     (3) Unlike α, their solutions can be written in closed form.
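
A sketch of the gradient-ascent step for α, assuming a symmetric Dirichlet with scalar concentration so that log B(α) = K log Γ(α) - log Γ(Kα); both the symmetry assumption and the fixed step size are illustrative choices, not from the deck.

```python
import numpy as np
from scipy.special import digamma

def update_alpha(E_log_pi, alpha_init=1.0, lr=1e-2, n_steps=200):
    """Gradient ascent on f(alpha) = -log B(alpha) + (alpha - 1) sum_k E[log pi_k],
    assuming a symmetric Dirichlet: f'(alpha) = K digamma(K alpha) - K digamma(alpha) + sum_k E[log pi_k]."""
    K, s = len(E_log_pi), float(np.sum(E_log_pi))
    alpha = alpha_init
    for _ in range(n_steps):
        grad = K * digamma(K * alpha) - K * digamma(alpha) + s
        alpha = max(alpha + lr * grad, 1e-6)   # keep alpha > 0
    return alpha
```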

  14. References
     Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. (2005). Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, pages 1345-1382.
     Gopal, S. and Yang, Y. (2014). Von Mises-Fisher clustering models. In Proceedings of the 31st International Conference on Machine Learning, pages 154-162.
     Rothenbuehler, J. (2005). Dependence Structures beyond Copulas: A New Model of a Multivariate Regular Varying Distribution Based on a Finite von Mises-Fisher Mixture Model. PhD thesis, Cornell University.
