Independent Component Analysis

Independent sources, unknown mixing: blind source separation. Applications: speech, image, video. k sources, d dimensions.
[Figure: sources h_1, ..., h_k mixed through A into observations x_1, ..., x_d]

x = Ah + z,  z ∼ N(0, σ² I).  Sources h_i are independent.

Form the fourth-order cumulant tensor
M_4 := E[x^{⊗4}] − (products of pairwise covariances E[x_{i1} x_{i2}] E[x_{i3} x_{i4}] over the pairings)
     = Σ_i κ_i a_i ⊗ a_i ⊗ a_i ⊗ a_i.

Kurtosis: κ_i := E[h_i^4] − 3.  Assumption: sources have non-zero kurtosis (κ_i ≠ 0).
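A minimal NumPy sketch of forming the empirical fourth-order cumulant tensor from simulated ICA data. The sizes, the uniform source distribution, and the noise level are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 6, 3, 200_000                                   # dimensions, sources, samples (assumed)
A = rng.normal(size=(d, k))                               # unknown mixing matrix
H = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(k, n))     # unit-variance sources with kurtosis -1.2
X = A @ H + 0.1 * rng.normal(size=(d, n))                 # x = Ah + z

# Empirical fourth moment E[x⊗4]
M4 = np.einsum('in,jn,kn,ln->ijkl', X, X, X, X) / n

# Subtract the three pairings of second moments; the Gaussian part cancels,
# leaving (approximately) sum_i kappa_i a_i⊗a_i⊗a_i⊗a_i.
S = X @ X.T / n                                           # E[x x^T]
M4 -= (np.einsum('ij,kl->ijkl', S, S)
       + np.einsum('ik,jl->ijkl', S, S)
       + np.einsum('il,jk->ijkl', S, S))
```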
Outline
1. Introduction
2. Latent Variable Models and Moments
3. Community Detection in Graphs
4. Analysis of Tensor Power Method
5. Advanced Topics
6. Conclusion
Social Networks & Recommender Systems

Social Networks: network of social ties, e.g. friendships, co-authorships. Hidden: communities of actors.

Recommender Systems: observed ratings of users for various products. Goal: new recommendations. Modeling: user/product groups.
Network Community Models

How are communities formed? How do communities interact?

[Figure: nodes grouped into communities, with within- and across-community connection probabilities such as 0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
Mixed Membership Model (Airoldi et al.)

k communities and n nodes. Graph G ∈ R^{n×n} (adjacency matrix).

Fractional memberships: π_x ∈ R^k is the membership vector of node x,
Δ^{k−1} := { π_x ∈ R^k : π_x(i) ∈ [0, 1], Σ_i π_x(i) = 1 }, for all x ∈ [n].

Node memberships {π_u} are drawn from a Dirichlet distribution.

Edges are conditionally independent given community memberships:
G_{i,j} ⊥⊥ G_{a,b} | π_i, π_j, π_a, π_b.

Edge probability averaged over community memberships:
P[G_{i,j} = 1 | π_i, π_j] = E[G_{i,j} | π_i, π_j] = π_i^⊤ P π_j,
where P ∈ R^{k×k} is the average edge connectivity for pure communities.

Airoldi, Blei, Fienberg, and Xing. Mixed membership stochastic blockmodels. J. of Machine Learning Research, June 2008.
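A small simulation sketch of this generative model. The community count, Dirichlet concentration, and connectivity matrix P below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                               # nodes, communities (assumed)
alpha = np.full(k, 0.1)                     # Dirichlet concentration (assumed)
P = 0.05 + 0.4 * np.eye(k)                  # connectivity of pure communities (assumed)

Pi = rng.dirichlet(alpha, size=n)           # rows are the membership vectors pi_x

# Edge probabilities pi_i^T P pi_j; edges are conditionally independent Bernoulli draws
probs = Pi @ P @ Pi.T
G = (rng.random((n, n)) < probs).astype(int)
G = np.triu(G, 1); G = G + G.T              # keep a simple undirected graph
```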
Networks under Community Models

[Figures: sample networks under the Stochastic Block Model (α_0 = 0) and the Mixed Membership Model (α_0 = 1, α_0 = 10)]

Unifying Assumption: edges are conditionally independent given community memberships.
Subgraph Counts as Graph Moments

3-star counts are sufficient for identifiability and learning of the MMSB.

3-Star Count Tensor (stars centered in a node set X, with leaves in sets A, B, C):
M̃_3(a, b, c) = (1/|X|) · (# of common neighbors of a, b, c in X)
             = (1/|X|) Σ_{x∈X} G(x, a) G(x, b) G(x, c),
i.e.
M̃_3 = (1/|X|) Σ_{x∈X} [ G_{x,A}^⊤ ⊗ G_{x,B}^⊤ ⊗ G_{x,C}^⊤ ].
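As a concrete sketch, the 3-star count tensor reduces to a single einsum over the rows of the adjacency matrix. The random graph and the node partition below are placeholders, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
G = (rng.random((n, n)) < 0.1).astype(float)      # placeholder adjacency matrix (assumed)

# Hypothetical partition of the nodes into disjoint sets X, A, B, C
X, A, B, C = np.array_split(rng.permutation(n), 4)

# M3(a, b, c) = (1/|X|) * sum_{x in X} G[x, a] G[x, b] G[x, c]
M3 = np.einsum('xa,xb,xc->abc',
               G[np.ix_(X, A)], G[np.ix_(X, B)], G[np.ix_(X, C)]) / len(X)
print(M3.shape)                                   # (|A|, |B|, |C|)
```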
Multi-view Representation

Conditional independence of the three views, where π_x is the community membership vector of node x.

[Figure: a 3-star centered at x ∈ X with leaves in A, B, C, and the corresponding graphical model: hidden π_x generating the views G_{x,A}^⊤, G_{x,B}^⊤, G_{x,C}^⊤ through matrices U, V, W]

Linear multiview model: E[G_{x,A}^⊤ | Π] = Π_A^⊤ P^⊤ π_x = U π_x.
Subgraph Counts as Graph Moments

Second and third order moments:
M̂_2 := (1/|X|) Σ_x Z_C G_{x,C}^⊤ G_{x,B} Z_B^⊤ − shift,
M̂_3 := (1/|X|) Σ_x [ G_{x,A}^⊤ ⊗ Z_B G_{x,B}^⊤ ⊗ Z_C G_{x,C}^⊤ ] − shift.

Symmetrizing transition matrices:
Pairs_{C,B} := G_{X,C}^⊤ ⊗ G_{X,B}^⊤,
Z_B := Pairs(A, C) (Pairs(B, C))^†,  Z_C := Pairs(A, B) (Pairs(C, B))^†.

Linear multiview model: E[G_{x,A}^⊤ | Π] = U π_x.

E[M̂_2 | Π_{A,B,C}] = Σ_i (α_i/α_0) u_i ⊗ u_i,   E[M̂_3 | Π_{A,B,C}] = Σ_i (α_i/α_0) u_i ⊗ u_i ⊗ u_i.
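A rough NumPy sketch of the symmetrization step. The shift terms are omitted, the graph and partition are placeholders (regenerated so the snippet stands alone), and the 1/|X| normalization inside Pairs cancels in the pseudoinverse ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
G = (rng.random((n, n)) < 0.1).astype(float)          # placeholder adjacency matrix (assumed)
X, A, B, C = np.array_split(rng.permutation(n), 4)
GA, GB, GC = G[np.ix_(X, A)], G[np.ix_(X, B)], G[np.ix_(X, C)]

def pairs(U, V):
    """Empirical cross moment Pairs(U, V) = (1/|X|) sum_x U[x]^T ⊗ V[x]^T."""
    return U.T @ V / U.shape[0]

# Symmetrizing transition matrices via pseudoinverses of the cross moments
Z_B = pairs(GA, GC) @ np.linalg.pinv(pairs(GB, GC))
Z_C = pairs(GA, GB) @ np.linalg.pinv(pairs(GC, GB))

# Symmetrized second- and third-order moments (shift terms omitted in this sketch)
M2 = sum(np.outer(Z_C @ GC[x], Z_B @ GB[x]) for x in range(len(X))) / len(X)
M3 = sum(np.einsum('a,b,c->abc', GA[x], Z_B @ GB[x], Z_C @ GC[x])
         for x in range(len(X))) / len(X)
```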
Outline
1. Introduction
2. Latent Variable Models and Moments
3. Community Detection in Graphs
4. Analysis of Tensor Power Method
5. Advanced Topics
6. Conclusion
Recap of Tensor Method

M_2 = Σ_i w_i a_i ⊗ a_i,   M_3 = Σ_i w_i a_i ⊗ a_i ⊗ a_i.

Whitening matrix W obtained from the SVD of M_2.
Multilinear transform: T = M_3(W, W, W).
[Figure: whitening maps the components a_1, a_2, a_3 to orthonormal v_1, v_2, v_3; tensor M_3 becomes tensor T]
Eigenvectors of T are found through the power method and deflation:
v ↦ T(I, v, v) / ‖T(I, v, v)‖.
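A minimal sketch of this pipeline in NumPy with synthetic components. Deflation is omitted, whitening uses an eigendecomposition of M_2 (equivalent to the SVD for a symmetric PSD matrix), and the component matrix, weights, and iteration counts are assumptions for illustration.

```python
import numpy as np

def whiten(M2, k):
    """Whitening matrix W with W^T M2 W = I_k, from the top-k eigendecomposition of M2."""
    vals, vecs = np.linalg.eigh(M2)
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    return vecs / np.sqrt(vals)

def tensor_power_iteration(T, iters=50, seed=0):
    """Repeat v <- T(I, v, v) / ||T(I, v, v)|| from a random start; return (v, lambda)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=T.shape[0]); v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)
        v /= np.linalg.norm(v)
    return v, np.einsum('ijk,i,j,k->', T, v, v)

# Synthetic M2, M3 from known components, then recover one (whitened) component
rng = np.random.default_rng(1)
d, k = 8, 3
A = rng.normal(size=(d, k)); w = np.array([1.0, 2.0, 3.0])
M2 = (A * w) @ A.T
M3 = np.einsum('i,ai,bi,ci->abc', w, A, A, A)

W = whiten(M2, k)
T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)     # T = M3(W, W, W)
v, lam = tensor_power_iteration(T)
```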
Orthogonal Tensor Eigen Decomposition

T = Σ_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i,   ⟨v_i, v_j⟩ = δ_{i,j} for all i, j.

T(I, v_1, v_1) = Σ_i λ_i ⟨v_i, v_1⟩² v_i = λ_1 v_1, so the v_i are eigenvectors of the tensor T.

Tensor Power Method: start from an initial vector v and iterate
v ↦ T(I, v, v) / ‖T(I, v, v)‖.

Questions
Is there convergence?
Does the convergence depend on the initialization?
What about performance under noise?
Recap of Matrix Eigen Analysis

For symmetric M ∈ R^{k×k}, eigendecomposition: M = Σ_i λ_i v_i v_i^⊤.
Eigenvectors are fixed points: Mv = λv (in our notation, M(I, v) = λv).
Uniqueness (identifiability): holds if and only if the λ_i are distinct.

Power method: v ↦ M(I, v) / ‖M(I, v)‖.

Convergence properties
Let λ_1 > λ_2 > ⋯ > λ_d, and let {v_i} form a basis.
Write the initialization as v = Σ_i c_i v_i. If c_1 ≠ 0, the power method converges to v_1.

Perturbation analysis (Davis-Kahan): for a perturbed matrix M + E, require ‖E‖ < min_{i≠j} |λ_i − λ_j|.
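For reference, the matrix power method in a few lines of NumPy. This is a sketch: the iteration count is arbitrary and a positive spectrum with a gap is assumed.

```python
import numpy as np

def matrix_power_iteration(M, iters=200, seed=0):
    """Repeat v <- M v / ||M v||; converges to the top eigenvector when c_1 = v_1^T v != 0."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=M.shape[0]); v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return v, v @ M @ v          # (eigenvector, Rayleigh quotient)
```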
Optimization viewpoint of matrix analysis

M = Σ_{i∈[k]} λ_i v_i ⊗ v_i,   λ_1 > λ_2 > ⋯.

Rayleigh quotient at v: M(v, v) = v^⊤ M v = Σ_i λ_i ⟨v_i, v⟩².

Optimization problem: max_v M(v, v) s.t. ‖v‖ = 1.

Non-convex problem. The global maximizer is v_1 (the top eigenvector).
What are the local optimizers?
Optimization viewpoint of matrix analysis

Optimization: max_v M(v, v) s.t. ‖v‖ = 1.
Lagrangian: L(v, λ) := M(v, v) − λ(v^⊤ v − 1).
First derivative: ∇L(v, λ) = 2(M(I, v) − λv).
Stationary points are eigenvectors: ∇L(v, λ) = 0.
Power method v ↦ M(I, v)/‖M(I, v)‖ is a version of gradient ascent.
Second derivative: ∇²L(v, λ) = 2(M − λI).

Local optimality condition for constrained optimization:
w^⊤ ∇²L(v, λ) w < 0 for all w ⊥ v, at a stationary point v.

Verify: v_1 is the only local optimum. Verify: all other eigenvectors are saddle points.

Power method recovers v_1 whenever the initialization v satisfies ⟨v, v_1⟩ ≠ 0.
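A quick check of the two "verify" claims (a sketch, assuming distinct eigenvalues): at the stationary point v = v_j the multiplier is λ = λ_j, and for w = v_i with i ≠ j,
w^⊤ ∇²L(v_j, λ_j) w = 2 v_i^⊤ (M − λ_j I) v_i = 2(λ_i − λ_j).
This is negative for every i ≠ j only when j = 1, so v_1 is a local maximum; for any j > 1 the direction w = v_1 gives a positive value, so v_j is a saddle point.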
Analysis of Tensor Power Method

T = Σ_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i.

Bad news about tensors
A decomposition may not always exist for general tensors.
Finding the decomposition is NP-hard in general.
We will see that a tractable case is when we are promised that an orthogonal decomposition exists.

Characterization of components {v_i}
The {v_i} are eigenvectors: T(I, v_i, v_i) = λ_i v_i.
Bad news: there can be other eigenvectors (unlike the matrix case).
With λ_i ≡ 1, v = (v_1 + v_2)/√2 satisfies T(I, v, v) = (1/√2) v.

How do we avoid such spurious solutions (eigenvectors that are not part of the decomposition)?
Optimization viewpoint of tensor analysis

Optimization: max_v T(v, v, v) s.t. ‖v‖ = 1.
Lagrangian: L(v, λ) := T(v, v, v) − 1.5 λ(v^⊤ v − 1).
First derivative: ∇L(v, λ) = 3(T(I, v, v) − λv).
Stationary points are eigenvectors: ∇L(v, λ) = 0.
Power method v ↦ T(I, v, v)/‖T(I, v, v)‖ is a version of gradient ascent.
Second derivative: ∇²L(v, λ) = 3(2 T(I, I, v) − λI).

Local optimality condition for constrained optimization:
w^⊤ ∇²L(v, λ) w < 0 for all w ⊥ v, at a stationary point v.

Verify: the {v_i} are the only local optima. Verify: all other eigenvectors are saddle points.

For an orthogonal tensor, no spurious local optima!
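A quick check of these claims in the same spirit as the matrix case (a sketch, assuming λ_i > 0): at v = v_j we have T(I, I, v_j) = λ_j v_j v_j^⊤, so for any w ⊥ v_j,
w^⊤ ∇²L(v_j, λ_j) w = 3(2λ_j ⟨v_j, w⟩² − λ_j ‖w‖²) = −3λ_j ‖w‖² < 0,
and every v_j is a local optimum. For the spurious eigenvector v = (v_1 + v_2)/√2 with λ_1 = λ_2 = 1 (eigenvalue 1/√2), take w = (v_1 − v_2)/√2 ⊥ v; then w^⊤ T(I, I, v) w = 1/√2 and
w^⊤ ∇²L w = 3(2/√2 − 1/√2) = 3/√2 > 0,
so the spurious eigenvector is a saddle point, not a local optimum.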
Review: matrix power iteration

Recall matrix power iteration for M := Σ_i λ_i v_i v_i^⊤: start with some v, and for t = 1, 2, ...,
v ↦ Mv = Σ_i λ_i (v_i^⊤ v) v_i,
i.e., the component in the v_i direction is scaled by λ_i.

If λ_1 > λ_2 ≥ ⋯, then in t iterations
(v_1^⊤ v)² / Σ_i (v_i^⊤ v)² ≥ 1 − k (λ_2/λ_1)^{2t}.

Converges linearly to v_1, assuming the gap λ_2/λ_1 < 1.
Tensor power iteration convergence analysis

Let c_i := v_i^⊤ v be the initial component in the v_i direction; assume WLOG
λ_1 |c_1| > λ_2 |c_2| ≥ λ_3 |c_3| ≥ ⋯.

Then
v ↦ Σ_i λ_i (v_i^⊤ v)² v_i = Σ_i λ_i c_i² v_i,
i.e., the component in the v_i direction is squared and then scaled by λ_i.

By induction, after t iterations
v ∝ Σ_i λ_i^{2^t − 1} c_i^{2^t} v_i,
so
(v_1^⊤ v)² / Σ_i (v_i^⊤ v)² ≥ 1 − k · max_{i≠1} (λ_1/λ_i)² · |λ_2 c_2 / (λ_1 c_1)|^{2^{t+1}}.
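A small numerical check of this analysis (the sizes, eigenvalues, and seed below are arbitrary): the iteration converges in a handful of steps to the component i maximizing λ_i |c_i|, which depends on the initialization and need not be i = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
lam = np.array([4.0, 3.0, 2.0, 1.0])
V = np.linalg.qr(rng.normal(size=(k, k)))[0]         # orthonormal columns v_i
T = np.einsum('i,ai,bi,ci->abc', lam, V, V, V)       # orthogonal tensor

v = rng.normal(size=k); v /= np.linalg.norm(v)
c = V.T @ v
print('argmax_i lambda_i|c_i| :', np.argmax(lam * np.abs(c)))

for t in range(8):                                   # a few iterations suffice
    v = np.einsum('ijk,j,k->i', T, v, v)
    v /= np.linalg.norm(v)
print('converged to component :', np.argmax(np.abs(V.T @ v)))
```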
Matrix vs. tensor power iteration

Matrix power iteration:
1. Requires a gap between the largest and second-largest eigenvalue (a property of the matrix only).
2. Converges to the top eigenvector.
3. Linear convergence: needs O(log(1/ǫ)) iterations.

Tensor power iteration:
1. Requires a gap between the largest and second-largest λ_i |c_i| (a property of the tensor and the initialization v).
2. Converges to the v_i for which λ_i |c_i| is largest; this could be any of them.
3. Quadratic convergence: needs O(log log(1/ǫ)) iterations.