Spectral Experts for Estimating Mixtures of Linear Regressions
Arun Tejasvi Chaganty, Percy Liang
Stanford University
January 28, 2016
Chaganty, Liang (Stanford University) Spectral Experts January 28, 2016 1 / 22
Introduction
Latent Variable Models
◮ Generative Models (latent h, observed x)
  ◮ Gaussian Mixture Models
  ◮ Hidden Markov Models
  ◮ Latent Dirichlet Allocation
  ◮ PCFGs
  ◮ . . .
◮ Discriminative Models (input x, latent h, output y)
  ◮ Mixture of Experts
  ◮ Latent CRFs
  ◮ Discriminative LDA
  ◮ . . .
  ◮ Easy to include features and tend to be more accurate.
Introduction
Parameter Estimation is Hard
[Figure: negative log-likelihood $-\log p_\theta(x)$ plotted against $\theta$, with two points labeled EM and one labeled MLE.]
◮ Log-likelihood function is non-convex.
◮ MLE is consistent but intractable.
◮ Local methods (EM, gradient descent, etc.) are tractable but inconsistent.
◮ Can we build an efficient and consistent estimator?
Introduction
Related Work
◮ Method of Moments [Pearson, 1894]
◮ Observable operators
  ◮ Control Theory [Ljung, 1987]
  ◮ Observable operator models [Jaeger, 2000; Littman/Sutton/Singh, 2004]
  ◮ Hidden Markov models [Hsu/Kakade/Zhang, 2009]
  ◮ Low-treewidth graphs [Parikh et al., 2012]
  ◮ Weighted finite state automata [Balle & Mohri, 2012]
◮ Parameter Estimation
  ◮ Mixture of Gaussians [Kalai/Moitra/Valiant, 2010]
  ◮ Mixture models, HMMs [Anandkumar/Hsu/Kakade, 2012]
  ◮ Latent Dirichlet Allocation [Anandkumar/Hsu/Kakade, 2012]
  ◮ Stochastic block models [Anandkumar/Ge/Hsu/Kakade, 2012]
  ◮ Linear Bayesian networks [Anandkumar/Hsu/Javanmard/Kakade, 2012]
Introduction
Outline
◮ Introduction
◮ Tensor Factorization for a Generative Model
◮ Tensor Factorization for a Discriminative Model
◮ Experimental Insights
◮ Conclusions
Tensor Factorization for a Generative Model
Aside: Tensor Operations
◮ Tensor product
  $x^{\otimes 3} = x \otimes x \otimes x$, with entries $x^{\otimes 3}_{ijk} = x_i x_j x_k$.
◮ Inner product
  $\langle A, B \rangle = \sum_{ijk} A_{ijk} B_{ijk} = \langle \operatorname{vec} A, \operatorname{vec} B \rangle$.
[Figure: pictorial examples of the tensor product and of the inner product.]
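The two operations above can be sketched in NumPy. This is a minimal illustration, not code from the talk; the helper names `tensor_power3` and `tensor_inner` and the example vectors are made up for the sketch.

```python
import numpy as np

def tensor_power3(x):
    """Third-order tensor power: entry (i, j, k) equals x[i] * x[j] * x[k]."""
    return np.einsum("i,j,k->ijk", x, x, x)

def tensor_inner(A, B):
    """Inner product: sum over all (i, j, k) of A[i, j, k] * B[i, j, k]."""
    return float(np.sum(A * B))

# Hypothetical example vectors.
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
T_x = tensor_power3(x)
T_y = tensor_power3(y)

# The same inner product computed on the flattened (vectorized) tensors.
via_vec = float(np.dot(T_x.ravel(), T_y.ravel()))
```

For rank-one tensors the inner product collapses to a power of the ordinary dot product: here `tensor_inner(T_x, T_y)` equals `(x @ y) ** 3`.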
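Tensor Factorization for a Generative Model
Example: Gaussian Mixture Model [anandkumar12moments]
[Figure: graphical model with latent h and observed x; scatter plot of samples x over axes $x_1$, $x_2$; the moments $\mathbb{E}[x]$, $\mathbb{E}[x^{\otimes 2}]$, $\mathbb{E}[x^{\otimes 3}]$ drawn with side length d.]
◮ Generative process:
  $h \sim \operatorname{Mult}([\pi_1, \pi_2, \cdots, \pi_k])$
  $x \sim \mathcal{N}(\beta_h, \sigma^2)$.
◮ Moments:
  $\mathbb{E}[x \mid h] = \beta_h$
  $\mathbb{E}[x] = \sum_h \pi_h \beta_h$
  $\mathbb{E}[x^{\otimes 2}] = \sum_h \pi_h (\beta_h \beta_h^T) + \sigma^2 = \sum_h \pi_h \beta_h^{\otimes 2} + \sigma^2$
  $\mathbb{E}[x^{\otimes 3}] = \sum_h \pi_h \beta_h^{\otimes 3} + \text{bias}$.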
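As a sanity check on the first two moment formulas, one can sample from a small Gaussian mixture and compare empirical moments against the closed forms. The sketch below is illustrative only: the values of `pi`, `beta`, `sigma`, and `n` are made up, and it assumes isotropic noise, so the $\sigma^2$ term in the second moment is implemented as $\sigma^2 I$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture: k = 2 components in d = 3 dimensions.
k, d, n, sigma = 2, 3, 200_000, 0.5
pi = np.array([0.3, 0.7])
beta = np.array([[1.0, 0.0, -1.0],
                 [0.0, 2.0, 1.0]])  # row h holds beta_h

# Generative process: h ~ Mult(pi), then x ~ N(beta_h, sigma^2 I) (assumed isotropic).
h = rng.choice(k, size=n, p=pi)
x = beta[h] + sigma * rng.standard_normal((n, d))

# Empirical first and second moments.
m1_emp = x.mean(axis=0)
m2_emp = np.einsum("ni,nj->ij", x, x) / n

# Closed forms: E[x] = sum_h pi_h beta_h, E[x x^T] = sum_h pi_h beta_h beta_h^T + sigma^2 I.
m1 = pi @ beta
m2 = np.einsum("h,hi,hj->ij", pi, beta, beta) + sigma**2 * np.eye(d)

err1 = np.abs(m1_emp - m1).max()
err2 = np.abs(m2_emp - m2).max()
```

Both `err1` and `err2` shrink as `n` grows, which is the sense in which the observable moments expose the parameters $\pi_h$ and $\beta_h$.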
Tensor Factorization for a Generative Model
Solution: Tensor Factorization [AnandkumarGeHsu2012]
◮ $\mathbb{E}[x^{\otimes 3}] = \sum_{h=1}^{k} \pi_h \beta_h^{\otimes 3}$.
◮ If the $\beta_h$ are orthonormal, they are eigenvectors!
  $\mathbb{E}[x^{\otimes 3}](\beta_h, \beta_h) = \pi_h \beta_h$.
◮ In general, whiten $\mathbb{E}[x^{\otimes 3}]$ first.
[Figure: graphical model with latent h and observed x; scatter plot over axes $x_1$, $x_2$; the third-moment tensor drawn as a sum of k rank-one terms.]
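The eigenvector property can be checked numerically, and it suggests a simple way to recover a component: repeatedly apply $v \mapsto T(v, v)$ and normalize. The sketch below uses tensor power iteration as one illustrative recovery method; it is not necessarily the exact procedure used in the talk. The helpers `apply_tensor` and `power_iteration`, the weights `pi`, and the randomly generated orthonormal `beta` are assumptions made for the example, and the whitening step is skipped because the components are orthonormal by construction.

```python
import numpy as np

def apply_tensor(T, u, v):
    """Contract T with u and v: result[i] = sum_jk T[i, j, k] * u[j] * v[k]."""
    return np.einsum("ijk,j,k->i", T, u, v)

def power_iteration(T, v0, iters=50):
    """Tensor power iteration: v <- T(v, v) / ||T(v, v)||."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(iters):
        w = apply_tensor(T, v, v)
        v = w / np.linalg.norm(w)
    eigval = float(v @ apply_tensor(T, v, v))
    return eigval, v

rng = np.random.default_rng(0)
d, k = 4, 3

# Hypothetical orthonormal components: rows of beta come from a QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
beta = Q[:, :k].T
pi = np.array([0.5, 0.3, 0.2])

# T = sum_h pi_h * beta_h (x) beta_h (x) beta_h
T = np.einsum("h,hi,hj,hk->ijk", pi, beta, beta, beta)

# Eigenvector check: T(beta_h, beta_h) should equal pi_h * beta_h for every h.
residuals = [np.abs(apply_tensor(T, b, b) - p * b).max() for p, b in zip(pi, beta)]

# Recover one component from a random start; which one depends on the start.
eigval, v = power_iteration(T, rng.standard_normal(d))
match = int(np.argmax(np.abs(beta @ v)))
```

After the loop, `v` should align with `beta[match]` and `eigval` with `pi[match]`. Recovering the remaining components would need deflation or restarts, which this sketch omits.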
Tensor Factorization for a Generative Model
[Figure: two graphical models side by side. Generative Models: latent h, observed x. Discriminative Models: input x, latent h, output y.]
Tensor Factorization for a Discriminative Model
Mixture of Linear Regressions
[Figure: graphical model over x, h, y; scatter plot of y against x.]
◮ Given $x$
  ◮ $h \sim \operatorname{Mult}([\pi_1, \pi_2, \cdots, \pi_k])$.