Extended Variational Inference for Non-Gaussian Statistical Models
Zhanyu Ma (mazhanyu@bupt.edu.cn)
Pattern Recognition and Intelligent System Lab., Beijing University of Posts and Telecommunications, Beijing, China
VALSE Webinar, May 20, 2015
Collaborators
References
[1] Z. Ma, A. E. Teschendorff, A. Leijon, Y. Qiao, H. Zhang, and J. Guo, "Variational Bayesian Matrix Factorization for Bounded Support Data", IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), vol. 37, no. 4, pp. 876-889, Apr. 2015.
[2] Z. Ma and A. Leijon, "Bayesian Estimation of Beta Mixture Models with Variational Inference", IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), vol. 33, pp. 2160-2173, Nov. 2011.
[3] Z. Ma, P. K. Rana, J. Taghia, M. Flierl, and A. Leijon, "Bayesian Estimation of Dirichlet Mixture Model with Variational Inference", Pattern Recognition (PR), vol. 47, no. 9, pp. 3143-3157, Sep. 2014.
[4] J. Taghia, Z. Ma, and A. Leijon, "Bayesian Estimation of the von-Mises Fisher Mixture Model with Variational Inference", IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), vol. 36, no. 9, pp. 1701-1715, Sep. 2014.
[5] P. K. Rana, J. Taghia, Z. Ma, and M. Flierl, "Probabilistic Multiview Depth Image Enhancement Using Variational Inference", IEEE Journal of Selected Topics in Signal Processing (J-STSP), vol. 9, no. 3, pp. 435-448, Apr. 2015.
Outline
Non-Gaussian Statistical Models
• Non-Gaussian vs. Gaussian
• Advantages and Challenges
Variational Inference (VI) and Extended VI
• Formulations and Conditions
• Convergence and Bias
Related Applications
• Beta/Dirichlet Mixture Model
• BG-NMF
Non-Gaussian Statistical Models
• Definition
– Statistical models for non-Gaussian data
– Belong to the exponential family
• Typical cases
– Directional data (L2 norm = 1): von Mises-Fisher
– Bounded support data (L1 norm = 1): Dirichlet/Beta
– Semi-bounded support data: Gamma
Non-Gaussian Statistical Models
Why non-Gaussian? Or: why not Gaussian?
Real-life data are often not Gaussian:
• Speech spectra
• Image pixel values
• Edge strengths in complex networks
• DNA methylation levels
• ...
Non-Gaussian Statistical Models
Gaussian distribution
Advantages
• The most widely used probability distribution
• Analytically tractable solutions
• Gaussian mixture models can approximate arbitrary distributions
• Vast range of applications
Disadvantages
• Not all data are Gaussian distributed
• Unbounded support and symmetric shape are mismatched to bounded/semi-bounded/well-structured data
• Flexibility comes at the cost of high model complexity
Non-Gaussian Statistical Models
Non-Gaussian distributions
Advantages
• Well defined for bounded/semi-bounded/well-structured data
• Belong to the exponential family: mathematical convenience and conjugate match
• Non-Gaussian mixture models can model such data more efficiently
Disadvantages
• Numerically challenging parameter estimation, for both ML and Bayesian estimation
• Lack of closed-form solutions for real applications
Non-Gaussian Statistical Models
• Example 1: beta distribution
\[
\text{beta}(x; u, v) = \frac{\Gamma(u+v)}{\Gamma(u)\Gamma(v)}\, x^{u-1}(1-x)^{v-1}, \qquad \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt
\]
– Bounded support and flexible shape
– Image processing, speech coding, DNA methylation analysis
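As a quick numerical illustration (not part of the talk), the beta density above is available in SciPy, and a few arbitrary (u, v) pairs show how flexible its shape is on the bounded support:

```python
# Hedged sketch: evaluate beta(x; u, v) for a few illustrative parameter
# pairs to see U-shaped, left-skewed, and right-skewed densities.
import numpy as np
from scipy.stats import beta

x = np.linspace(0.01, 0.99, 99)
for u, v in [(0.5, 0.5), (2.0, 5.0), (5.0, 2.0)]:
    pdf = beta.pdf(x, u, v)  # the density defined on the slide
    print(f"u={u}, v={v}: max density {pdf.max():.2f} at x={x[pdf.argmax()]:.2f}")
```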
Non-Gaussian Statistical Models
• Example 2: Dirichlet distribution (neutral vector)
\[
\text{Dir}(\mathbf{x}; \mathbf{a}) = \frac{\Gamma\big(\sum_{k=1}^{K} a_k\big)}{\prod_{k=1}^{K} \Gamma(a_k)} \prod_{k=1}^{K} x_k^{a_k - 1}, \qquad \sum_{k=1}^{K} x_k = 1, \quad x_k > 0, \quad a_k > 0
\]
– Conventionally used as the conjugate prior of the categorical/multinomial distribution, e.g., to describe the mixture weights in mixture modeling
– Recently applied to model proportional data directly (i.e., data with unit L1 norm)
– Speech coding, skin color detection, multiview 3D enhancement, etc.
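A small sketch (illustrative concentration values, not from the talk) of sampling proportional data with unit L1 norm and evaluating the Dirichlet density with SciPy:

```python
# Hedged sketch: draw Dirichlet samples (rows sum to 1) and evaluate
# Dir(x; a) from the formula on the slide.
import numpy as np
from scipy.stats import dirichlet

a = np.array([2.0, 3.0, 5.0])                 # a_k > 0, illustrative
x = np.random.default_rng(0).dirichlet(a, size=4)
print(x.sum(axis=1))                          # each sample has unit L1 norm
print(dirichlet.pdf(x[0], a))                 # density of the first sample
```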
Non-Gaussian Statistical Models
• Example 3: von Mises-Fisher distribution
\[
f(\mathbf{x}; \boldsymbol{\mu}, \lambda) = \frac{\lambda^{K/2 - 1}}{(2\pi)^{K/2}\, I_{K/2-1}(\lambda)}\, e^{\lambda \boldsymbol{\mu}^{T} \mathbf{x}}, \qquad \mathbf{x}^{T}\mathbf{x} = \boldsymbol{\mu}^{T}\boldsymbol{\mu} = 1
\]
where I_v(·) denotes the modified Bessel function of the first kind at order v
– Distributed on the K-dimensional unit sphere
– The two-dimensional vMF is a distribution on the circle
– Directional statistics, gene expression analysis, speech coding
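The density can be evaluated directly from the formula above; a minimal sketch using scipy.special.iv for the Bessel term, with illustrative values for μ, x, and λ:

```python
# Hedged sketch: vMF density on the unit sphere, computed straight from
# the slide's formula (not a library routine; SciPy has no vMF pdf).
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind

def vmf_pdf(x, mu, lam):
    K = len(x)  # x and mu are assumed to be unit L2-norm vectors
    c = lam**(K / 2 - 1) / ((2 * np.pi)**(K / 2) * iv(K / 2 - 1, lam))
    return c * np.exp(lam * mu @ x)

mu = np.array([0.0, 0.0, 1.0])   # mean direction on the 2-sphere
x = np.array([0.0, 0.6, 0.8])    # a unit vector
print(vmf_pdf(x, mu, lam=5.0))
```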
Non-Gaussian Statistical Models
• Summary
– Non-Gaussian distributions form a family of distributions that are not Gaussian
– Not in conflict with the central limit theorem
– Well defined for bounded/semi-bounded/structured data
– More efficient than the Gaussian distribution for such data
– But hard to estimate, computationally costly, and difficult to use in practice
Outline
Non-Gaussian Statistical Models
• Non-Gaussian vs. Gaussian
• Advantages and Challenges
Variational Inference (VI) and Extended VI
• Formulations and Conditions
• Convergence and Bias
Related Applications
• Beta/Dirichlet Mixture Model
• BG-NMF
Formulation and Conditions
• Maximum likelihood (ML) estimation
– Widely used for point estimation of parameters
– Typically carried out with the expectation-maximization (EM) algorithm
– Converges to a local maximum and may overfit
– No analytically tractable solution for most non-Gaussian distributions
Formulation and Conditions
• Bayesian estimation
– Estimates the distributions of the parameters, rather than point estimates
– Exploits the conjugate match within the exponential family
– Resists overfitting; feasible for online learning
– Without approximation, there is no analytically tractable solution for non-Gaussian distributions
Formulation and Conditions
• Example: ML estimation for the beta mixture model [1]
– M step:
\[
\psi(u + v) - \psi(u) + \frac{1}{N}\sum_{n=1}^{N} \ln x_n = 0, \qquad
\psi(u + v) - \psi(v) + \frac{1}{N}\sum_{n=1}^{N} \ln(1 - x_n) = 0
\]
where
\[
\psi(z) = \frac{d \ln \Gamma(z)}{dz} = \int_0^{\infty} \left( \frac{e^{-t}}{t} - \frac{e^{-zt}}{1 - e^{-t}} \right) dt
\]
– Requires numerical solutions: Newton-Raphson, Gibbs sampling, MCMC, etc. (a sketch follows)
[1] Z. Ma and A. Leijon, "Beta Mixture Model and the Application to Image Classification", IEEE International Conference on Image Processing, pp. 2045-2048, 2009.
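The two M-step equations have no closed form in (u, v), but a generic root finder handles them readily. A minimal sketch on synthetic data, using SciPy's digamma and fsolve; this stands in for, and is not, the exact routine of [1]:

```python
# Hedged sketch: solve the M-step equations above numerically.
# Synthetic single-component data replace the responsibilities of a
# real E step; fsolve assumes the iterates stay in the positive region.
import numpy as np
from scipy.optimize import fsolve
from scipy.special import digamma

rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=1000)        # bounded-support data in (0, 1)
s1, s2 = np.mean(np.log(x)), np.mean(np.log1p(-x))

def m_step(params):
    u, v = params
    return [digamma(u + v) - digamma(u) + s1,
            digamma(u + v) - digamma(v) + s2]

u_hat, v_hat = fsolve(m_step, x0=[1.0, 1.0])
print(u_hat, v_hat)                      # close to the true (2, 5)
```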
Formulation and Conditions
• Example: Bayesian estimation of the beta distribution [1]
– Prior:
\[
p(u, v; \alpha_0, \beta_0, \nu_0) \propto \left[ \frac{\Gamma(u+v)}{\Gamma(u)\Gamma(v)} \right]^{\nu_0} e^{-\alpha_0 (u - 1)}\, e^{-\beta_0 (v - 1)}
\]
– Likelihood:
\[
\text{beta}(x; u, v) = \frac{\Gamma(u+v)}{\Gamma(u)\Gamma(v)}\, x^{u-1} (1-x)^{v-1}
\]
– Posterior:
\[
p(u, v \mid \mathbf{X}; \alpha_0, \beta_0, \nu_0) \propto \left[ \frac{\Gamma(u+v)}{\Gamma(u)\Gamma(v)} \right]^{\nu_0 + N} e^{-\left(\alpha_0 - \sum_{n=1}^{N} \ln x_n\right)(u - 1)}\, e^{-\left(\beta_0 - \sum_{n=1}^{N} \ln(1 - x_n)\right)(v - 1)}
\]
– No closed-form expressions for the mean, variance, etc.
– No analytically tractable solution for the mixture model
– Not applicable in practice without approximation
[1] Z. Ma and A. Leijon, "Bayesian Estimation of Beta Mixture Models with Variational Inference", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 2160-2173, Nov. 2011.
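To make the intractability concrete: the unnormalized log-posterior above is easy to evaluate pointwise, yet moments still require numerical integration. A minimal grid sketch, with hyperparameters chosen arbitrarily for illustration (they are not values from [1]):

```python
# Hedged sketch: evaluate the unnormalized log-posterior on a grid and
# read off a crude MAP estimate; no closed-form mean/variance exists.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=100)
alpha0, beta0, nu0 = 1.0, 1.0, 1.0       # illustrative hyperparameters
N, s1, s2 = len(x), np.sum(np.log(x)), np.sum(np.log1p(-x))

def log_post(u, v):  # log p(u, v | X) up to an additive constant
    return ((nu0 + N) * (gammaln(u + v) - gammaln(u) - gammaln(v))
            - (alpha0 - s1) * (u - 1) - (beta0 - s2) * (v - 1))

uu, vv = np.meshgrid(np.linspace(0.5, 6, 200), np.linspace(0.5, 12, 200))
lp = log_post(uu, vv)
print(uu.ravel()[lp.argmax()], vv.ravel()[lp.argmax()])  # grid MAP, near (2, 5)
```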
Formulation and Conditions
• Variational inference [1]
– Rooted in mean-field theory in physics; the calculus of variations dates back to the 18th century (Euler, Lagrange, etc.)
– Optimizes a functional, i.e., a function over functions
– Closed-form solutions under certain constraints
\[
f(x) = \int f(x \mid \theta) f(\theta)\, d\theta
\]
\[
\ln f(x) = \int g(\theta) \ln \frac{f(x, \theta)}{g(\theta)}\, d\theta + \int g(\theta) \ln \frac{g(\theta)}{f(\theta \mid x)}\, d\theta = \mathcal{L}(g) + \mathrm{KL}(g \,\|\, f)
\]
– Goal: approximate f(θ|x) by g(θ), either by maximizing \(\mathcal{L}(g)\) or by minimizing \(\mathrm{KL}(g \,\|\, f)\)
[1] C. M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
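The decomposition can be checked numerically on a toy conjugate model where every term is closed form. A sketch, assuming x ~ N(θ, 1), prior θ ~ N(0, 1), and an arbitrary Gaussian g(θ); none of this setup is from the talk:

```python
# Hedged sketch: verify ln f(x) = L(g) + KL(g || f(theta|x)) on a
# conjugate Gaussian toy model, where the posterior is N(x/2, 1/2)
# and the evidence is N(x; 0, 2).
import numpy as np
from scipy.stats import norm

x = 1.3                      # a single observation
m, s = 0.2, 0.7              # arbitrary variational factor g = N(m, s^2)

# L(g) = E_g[ln f(x, theta)] + entropy of g; the expectations are exact
e_loglik   = norm.logpdf(x, m, 1) - 0.5 * s**2   # E_g[ln N(x; theta, 1)]
e_logprior = norm.logpdf(m, 0, 1) - 0.5 * s**2   # E_g[ln N(theta; 0, 1)]
entropy    = 0.5 * np.log(2 * np.pi * np.e * s**2)
L = e_loglik + e_logprior + entropy

# KL between two Gaussians in closed form, against the exact posterior
mp, sp2 = x / 2, 0.5
KL = np.log(np.sqrt(sp2) / s) + (s**2 + (m - mp)**2) / (2 * sp2) - 0.5

print(L + KL, norm.logpdf(x, 0, np.sqrt(2)))     # the two sides agree
```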
Formulation and Conditions
• Factorized approximation [1]
\[
g(\boldsymbol{\theta}) \approx \prod_i g_i(\theta_i), \qquad \ln g_i^{*}(\theta_i) = \mathbb{E}_{j \neq i}\big[\ln f(x, \boldsymbol{\theta})\big] + C
\]
– No constraints on the functional form of \(g_i(\theta_i)\)
– Directly maximizes \(\mathcal{L}(g)\)
– Always converges, but may fall into local maxima
– Analytically tractable solutions for Gaussian models (a sketch follows)
[1] C. M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
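A minimal sketch of this coordinate update on the classic textbook case [1, Ch. 10]: a factorized Gaussian fitted to a correlated two-dimensional Gaussian, where each update ln g_i* = E_{j≠i}[ln f] + C is again Gaussian. The target mean and precision below are arbitrary illustrative values:

```python
# Hedged sketch: mean-field updates for g(t1)g(t2) approximating a
# correlated 2-D Gaussian with mean mu and precision Lam. Each factor
# is N(m_i, 1/Lam[i, i]); only the means need iterating.
import numpy as np

mu = np.array([1.0, -1.0])        # target mean, illustrative
Lam = np.array([[2.0, 0.8],
                [0.8, 2.0]])      # target precision, illustrative

m1, m2 = 0.0, 0.0                 # initial factor means
for _ in range(20):               # coordinate ascent on L(g)
    m1 = mu[0] - Lam[0, 1] / Lam[0, 0] * (m2 - mu[1])
    m2 = mu[1] - Lam[1, 0] / Lam[1, 1] * (m1 - mu[0])

print(m1, m2)                     # converges to the true mean (1, -1)
```

The means are recovered exactly, but the factor variances 1/Λ_ii underestimate the true marginal variances; this variance underestimation is a well-known bias of minimizing KL(g‖f), which is why the outline treats "Convergence and Bias" together.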