Spectral Experts for Estimating Mixtures of Linear Regressions
Arun Tejasvi Chaganty, Percy Liang
Stanford University
January 28, 2016
Chaganty, Liang (Stanford University) Spectral Experts January 28, 2016 1 / 22

Introduction
Latent Variable Models

◮ Generative Models (latent h, observed x)
  ◮ Gaussian Mixture Models
  ◮ Hidden Markov Models
  ◮ Latent Dirichlet Allocation
  ◮ PCFGs
  ◮ . . .
◮ Discriminative Models (input x, latent h, output y)
  ◮ Mixture of Experts
  ◮ Latent CRFs
  ◮ Discriminative LDA
  ◮ . . .
  ◮ Easy to include features and tend to be more accurate.

Introduction
Parameter Estimation is Hard

[Figure: plot of $-\log p_\theta(x)$ against $\theta$, marking $\theta_{\mathrm{MLE}}$ and two $\theta_{\mathrm{EM}}$ points]

◮ Log-likelihood function is non-convex.
◮ MLE is consistent but intractable.
◮ Local methods (EM, gradient descent, etc.) are tractable but inconsistent.
◮ Can we build an efficient and consistent estimator?

Introduction
Related Work

◮ Method of Moments [Pearson, 1894]
◮ Observable operators
  ◮ Control Theory [Ljung, 1987]
  ◮ Observable operator models [Jaeger, 2000; Littman/Sutton/Singh, 2004]
  ◮ Hidden Markov models [Hsu/Kakade/Zhang, 2009]
  ◮ Low-treewidth graphs [Parikh et al., 2012]
  ◮ Weighted finite state automata [Balle & Mohri, 2012]
◮ Parameter Estimation
  ◮ Mixture of Gaussians [Kalai/Moitra/Valiant, 2010]
  ◮ Mixture models, HMMs [Anandkumar/Hsu/Kakade, 2012]
  ◮ Latent Dirichlet Allocation [Anandkumar/Hsu/Kakade, 2012]
  ◮ Stochastic block models [Anandkumar/Ge/Hsu/Kakade, 2012]
  ◮ Linear Bayesian networks [Anandkumar/Hsu/Javanmard/Kakade, 2012]

Outline

◮ Introduction
◮ Tensor Factorization for a Generative Model
◮ Tensor Factorization for a Discriminative Model
◮ Experimental Insights
◮ Conclusions

Tensor Factorization for a Generative Model
Aside: Tensor Operations

◮ Tensor product
  $x^{\otimes 3} = x \otimes x \otimes x$, with entries $(x^{\otimes 3})_{ijk} = x_i x_j x_k$.
◮ Inner product
  $\langle A, B \rangle = \sum_{ijk} A_{ijk} B_{ijk} = \langle \operatorname{vec} A, \operatorname{vec} B \rangle$.
  [Figure: pictorial example in which the inner product of two tensors equals 0.5]
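The two operations above can be sketched in NumPy. This is only an illustration, not code from the talk: the helper names `tensor_power3` and `tensor_inner` and the example vectors are assumptions made here.

```python
import numpy as np

def tensor_power3(x):
    # Third-order tensor power: entry (i, j, k) equals x_i * x_j * x_k.
    return np.einsum("i,j,k->ijk", x, x, x)

def tensor_inner(A, B):
    # Inner product: sum over all entries of the elementwise product.
    return float(np.sum(A * B))

# Hypothetical example vectors, chosen only for illustration.
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

A = tensor_power3(x)
B = tensor_power3(y)

# The tensor inner product agrees with the dot product of the flattened tensors.
inner_tensor = tensor_inner(A, B)
inner_vec = float(np.dot(A.ravel(), B.ravel()))
```

For rank-one tensors of this form, the inner product also equals the cube of the ordinary dot product of x and y, which gives a quick sanity check.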

Tensor Factorization for a Generative Model
Example: Gaussian Mixture Model [anandkumar12moments]

◮ Generative process:
  $h \sim \mathrm{Mult}([\pi_1, \pi_2, \cdots, \pi_k])$
  $x \sim \mathcal{N}(\beta_h, \sigma^2 I)$.
◮ Moments:
  $E[x \mid h] = \beta_h$
  $E[x] = \sum_h \pi_h \beta_h$
  $E[x^{\otimes 2}] = \sum_h \pi_h (\beta_h \beta_h^T) + \sigma^2 I = \sum_h \pi_h \beta_h^{\otimes 2} + \sigma^2 I$
  $E[x^{\otimes 3}] = \sum_h \pi_h \beta_h^{\otimes 3} + \text{bias}$.

[Figure: graphical model h → x, a scatter plot of samples over axes $x_1$ and $x_2$, and the moments $E[x]$, $E[x^{\otimes 2}]$, $E[x^{\otimes 3}]$ drawn as arrays with side length d]
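The first two moment identities can be checked numerically. The sketch below samples from a small Gaussian mixture and compares empirical moments with the formulas. All numbers (`pi`, `betas`, `sigma`, `n`) are made up for illustration, and the noise is assumed to be isotropic with covariance $\sigma^2 I$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, not taken from the talk.
pi = np.array([0.3, 0.7])                  # mixing weights pi_h
betas = np.array([[2.0, 0.0, 1.0],
                  [-1.0, 1.0, 0.0]])       # row h is the mean beta_h
sigma = 0.5                                # assumed isotropic noise level
n, d = 200_000, betas.shape[1]

# Generative process: draw h, then x given h.
h = rng.choice(len(pi), size=n, p=pi)
x = betas[h] + sigma * rng.standard_normal((n, d))

# Empirical first and second moments.
emp_m1 = x.mean(axis=0)
emp_m2 = x.T @ x / n

# Population moments from the formulas above.
pop_m1 = pi @ betas
pop_m2 = np.einsum("h,hi,hj->ij", pi, betas, betas) + sigma**2 * np.eye(d)

err1 = np.abs(emp_m1 - pop_m1).max()
err2 = np.abs(emp_m2 - pop_m2).max()
```

Both errors shrink at roughly the usual $1/\sqrt{n}$ rate as the sample size grows. The third moment can be checked the same way once its bias term is written out, which the slide leaves unspecified.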

Tensor Factorization for a Generative Model
Solution: Tensor Factorization [AnandkumarGeHsu2012]

◮ $E[x^{\otimes 3}] = \sum_{h=1}^{k} \pi_h \beta_h^{\otimes 3}$.
◮ If $\beta_h$ are orthogonal, they are eigenvectors!
  $E[x^{\otimes 3}](\beta_h, \beta_h) = \pi_h \beta_h$.
◮ In general, whiten $E[x^{\otimes 3}]$ first.

[Figure: the moment tensor drawn as a sum of k rank-one tensors, next to the graphical model h → x and a scatter plot over axes $x_1$ and $x_2$]
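The eigenvector property suggests a power-iteration style recovery. The following is a minimal sketch, assuming the $\beta_h$ are exactly orthonormal (unit norm as well as orthogonal) and that the tensor is known without noise. The function name `tensor_power_iteration` and all parameter values are illustrative assumptions, and the whitening step needed in the general case is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: k orthonormal components in d dimensions.
d, k = 4, 3
pi = np.array([0.5, 0.3, 0.2])
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
betas = Q[:, :k].T                         # rows are orthonormal beta_h

# T = sum_h pi_h * beta_h (x) beta_h (x) beta_h
T = np.einsum("h,hi,hj,hk->ijk", pi, betas, betas, betas)

def tensor_power_iteration(T, v0, n_iter=100):
    # Repeatedly apply v <- T(I, v, v) and renormalize.
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        v = np.einsum("ijk,j,k->i", T, v, v)
        v /= np.linalg.norm(v)
    # The eigenvalue is T(v, v, v).
    eigval = float(np.einsum("ijk,i,j,k->", T, v, v, v))
    return v, eigval

v, lam = tensor_power_iteration(T, rng.standard_normal(d))

# Index of the component the iteration converged to.
match = int(np.argmax(betas @ v))
```

Which component is found depends on the starting vector. Recovering all k components would typically involve subtracting the found term from the tensor and repeating, a step this sketch omits.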

[Figure: graphical models side by side, labeled Generative Models (variables h, x) and Discriminative Models (variables x, h, y)]

Tensor Factorization for a Discriminative Model
Mixture of Linear Regressions

◮ Given x
  ◮ $h \sim \mathrm{Mult}([\pi_1, \pi_2, \cdots, \pi_k])$.

[Figure: graphical model over x, h, y and a scatter plot of y against x]
