  1. Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures. Vincent Tan, Animashree Anandkumar, Alan Willsky. Stochastic Systems Group, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology. Allerton Conference (Sep 30, 2009).

  2. Motivation: Given a set of i.i.d. samples drawn from p, a Gaussian tree model, infer the structure of the tree. Example: inferring the structure of phylogenetic trees from observed data (Carlson et al. 2008, PLoS Comp. Bio.).

  3. More motivation: What is the exact rate of decay of the probability of error? How do the structure and parameters of the model influence the error exponent (rate of decay)? What are the extremal tree distributions for learning? Consistency is well established (Chow and Wagner 1973); the error exponent is a quantitative measure of the “goodness” of learning.

  4. Main Contributions: (1) Provide the exact rate of decay for a given p. (2) Rate of decay ≈ SNR for learning. (3) Characterize the extremal tree structures for learning, i.e., stars and Markov chains: stars have the slowest rate; chains have the fastest rate.

  5. Notation and Background: p = N(0, Σ) is a d-dimensional Gaussian tree model. Samples x^n = {x_1, x_2, ..., x_n} are drawn i.i.d. from p. p is Markov on the tree T_p = (V, E_p) and factorizes according to T_p; e.g., for a 4-node star with center node 1,

  $$p(\mathbf{x}) = p_1(x_1) \, \frac{p_{1,2}(x_1, x_2)}{p_1(x_1)} \, \frac{p_{1,3}(x_1, x_3)}{p_1(x_1)} \, \frac{p_{1,4}(x_1, x_4)}{p_1(x_1)},$$

  and the inverse covariance inherits the sparsity pattern of the tree (♠ on the diagonal, ♣ on the edges of T_p, 0 elsewhere):

  $$\Sigma^{-1} = \begin{pmatrix} ♠ & ♣ & ♣ & ♣ \\ ♣ & ♠ & 0 & 0 \\ ♣ & 0 & ♠ & 0 \\ ♣ & 0 & 0 & ♠ \end{pmatrix}.$$
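As a sanity check of this sparsity pattern, here is a minimal sketch (ours, not from the slides; the edge correlations 0.5, 0.6, 0.7 are illustrative) that builds the covariance of a 4-node star and verifies that Σ^{-1} vanishes off the tree edges:

```python
import numpy as np

d = 4
rho = np.array([0.5, 0.6, 0.7])        # center-to-leaf edge correlations (illustrative)
Sigma = np.eye(d)
Sigma[0, 1:] = Sigma[1:, 0] = rho      # covariance between center and each leaf
# Leaves are conditionally independent given the center, so for leaves i, j:
# Sigma_ij = rho_i * rho_j.
Sigma[1:, 1:] += np.outer(rho, rho) - np.diag(rho ** 2)

J = np.linalg.inv(Sigma)
print(np.round(J, 6))
# Only the diagonal and the first row/column (the star's edges) are nonzero;
# the leaf-leaf entries of Sigma^{-1} vanish, matching the slide's pattern.
```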

  6. Max-Likelihood Learning of Tree Distributions (Chow-Liu): Denote by p̂ = p̂_{x^n} the empirical distribution of x^n, i.e., p̂(x) := N(x; 0, Σ̂), where Σ̂ is the empirical covariance matrix of x^n, and by p̂_e the empirical distribution on node pair e. ML structure learning reduces to a max-weight spanning tree problem (Chow and Liu 1968):

  $$\hat{\mathcal{E}}_{\mathrm{CL}}(x^n) = \operatorname*{argmax}_{\mathcal{E}_q : \, q \in \mathrm{Trees}} \; \sum_{e \in \mathcal{E}_q} I(\hat{p}_e), \qquad I(\hat{p}_e) := I(X_i; X_j) \text{ for } e = (i, j).$$
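A minimal runnable sketch of this estimator (ours, assuming numpy and scipy; the name chow_liu_gaussian is illustrative): compute empirical correlations, convert them to Gaussian mutual informations via I = -(1/2) log(1 - ρ²), and take a max-weight spanning tree.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_gaussian(X):
    """Chow-Liu for zero-mean Gaussian samples X of shape (n, d).
    Returns the estimated edge set as a sorted list of (i, j) pairs."""
    n = X.shape[0]
    Sigma_hat = X.T @ X / n                    # empirical covariance (zero mean)
    std = np.sqrt(np.diag(Sigma_hat))
    rho = Sigma_hat / np.outer(std, std)       # empirical correlations
    # Gaussian mutual information: I(X_i; X_j) = -0.5 * log(1 - rho_ij^2).
    mi = -0.5 * np.log(1.0 - np.clip(rho, -0.999999, 0.999999) ** 2)
    np.fill_diagonal(mi, 0.0)
    # Max-weight spanning tree == min spanning tree on the negated MI matrix.
    mst = minimum_spanning_tree(-mi)
    rows, cols = mst.nonzero()
    return sorted((min(i, j), max(i, j)) for i, j in zip(rows, cols))
```

The negation trick works because spanning-tree optimality is unchanged by flipping the sign of every weight; any MST routine then returns the max-weight tree.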

  7. Max-Likelihood Learning of Tree Distributions (illustration): [Figure] With the true MIs {I(p_e)}, the max-weight spanning tree recovers E_p; with the empirical MIs {I(p̂_e)} computed from x^n, the max-weight spanning tree can return Ê_CL(x^n) ≠ E_p.

  8. Problem Statement: The estimated edge set is Ê_CL(x^n) and the error event is {Ê_CL(x^n) ≠ E_p}. Find and analyze the error exponent K_p:

  $$K_p := \lim_{n \to \infty} -\frac{1}{n} \log P\big(\{\hat{\mathcal{E}}_{\mathrm{CL}}(x^n) \neq \mathcal{E}_p\}\big).$$

  Equivalently, P({Ê_CL(x^n) ≠ E_p}) ≐ exp(−n K_p), where ≐ denotes equality to first order in the exponent.
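For small trees, K_p can be probed empirically. The following Monte Carlo sketch (ours, reusing chow_liu_gaussian from above; the 4-node chain and ρ = 0.6 are arbitrary choices) estimates the error probability for increasing n and the crude exponent −(1/n) log P(error):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, d = 0.6, 4
true_edges = [(0, 1), (1, 2), (2, 3)]                   # Markov chain 0-1-2-3
# For a Gaussian chain with equal edge correlations, Sigma_ij = rho^|i-j|.
Sigma = rho ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))

def error_prob(n, trials=2000):
    errors = 0
    for _ in range(trials):
        X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
        errors += (chow_liu_gaussian(X) != true_edges)  # structure error?
    return errors / trials

for n in (25, 50, 100):
    p_err = error_prob(n)
    print(n, p_err, -np.log(max(p_err, 1e-12)) / n)     # crude exponent estimate
```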

  9. The Crossover Rate I: Consider two node pairs e, e' ∈ $\binom{V}{2}$ with joint distribution p_{e,e'} such that I(p_e) > I(p_{e'}), and the crossover event {I(p̂_e) ≤ I(p̂_{e'})}.

  Definition (Crossover Rate):

  $$J_{e,e'} := \lim_{n \to \infty} -\frac{1}{n} \log P\big(\{I(\hat{p}_e) \le I(\hat{p}_{e'})\}\big).$$

  This event may potentially lead to an error in structure learning. Why? If e is a true edge and e' a non-edge, a crossover can cause the max-weight spanning tree to pick e' over e.
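To make the definition concrete, here is a small simulation sketch (ours, not the paper's): on a 3-node Markov chain with edge correlation 0.5, e = (0, 1) is a true edge with correlation 0.5 and e' = (0, 2) a non-edge with correlation 0.25, so I(p_e) > I(p_{e'}); we estimate the crossover probability and a crude estimate of J_{e,e'}:

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(3), np.arange(3)))  # chain 0-1-2

def emp_mi(X, i, j):
    # Empirical Gaussian MI from the empirical correlation coefficient.
    r = np.corrcoef(X[:, i], X[:, j])[0, 1]
    return -0.5 * np.log(1.0 - r ** 2)

def crossover_prob(n, trials=5000):
    count = 0
    for _ in range(trials):
        X = rng.multivariate_normal(np.zeros(3), Sigma, size=n)
        count += (emp_mi(X, 0, 1) <= emp_mi(X, 0, 2))   # crossover event
    return count / trials

for n in (20, 50, 100):
    p = crossover_prob(n)
    print(n, p, -np.log(max(p, 1e-12)) / n)             # crude estimate of J_{e,e'}
```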

  10. The Crossover Rate II. Theorem: The crossover rate is

  $$J_{e,e'} = \inf_{q \in \mathrm{Gaussians}} \big\{ D(q \,\|\, p_{e,e'}) : I(q_{e'}) = I(q_e) \big\}.$$
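This infimum can be evaluated numerically for small cases. The sketch below (ours, assuming scipy; the paper develops more explicit expressions) handles the shared-node case e = (0, 1), e' = (1, 2) on three variables, parametrizing Σ_q by its Cholesky factor and imposing I(q_e) = I(q_{e'}) as an equality constraint, which holds iff the squared correlations agree since Gaussian MI is monotone in ρ²:

```python
import numpy as np
from scipy.optimize import minimize

# p_{e,e'} on 3 nodes: chain 0-1-2 with unequal edge correlations,
# so I(p_e) > I(p_e') for e = (0, 1), e' = (1, 2). Values are illustrative.
Sigma_p = np.array([[1.00, 0.70, 0.21],
                    [0.70, 1.00, 0.30],
                    [0.21, 0.30, 1.00]])
Sigma_p_inv = np.linalg.inv(Sigma_p)

def unpack(theta):
    # Sigma_q = L L^T with L lower triangular keeps Sigma_q positive semidefinite.
    L = np.zeros((3, 3))
    L[np.tril_indices(3)] = theta
    return L @ L.T

def kl(theta):
    # D(q || p) for zero-mean Gaussians.
    Sq = unpack(theta)
    _, logdet_q = np.linalg.slogdet(Sq)
    _, logdet_p = np.linalg.slogdet(Sigma_p)
    return 0.5 * (np.trace(Sigma_p_inv @ Sq) - 3 + logdet_p - logdet_q)

def mi_gap(theta):
    # I(q_e) = I(q_e') iff the squared correlations are equal.
    Sq = unpack(theta)
    r01 = Sq[0, 1] / np.sqrt(Sq[0, 0] * Sq[1, 1])
    r12 = Sq[1, 2] / np.sqrt(Sq[1, 1] * Sq[2, 2])
    return r01 ** 2 - r12 ** 2

theta0 = np.linalg.cholesky(Sigma_p)[np.tril_indices(3)]   # start at q = p_{e,e'}
res = minimize(kl, theta0, method="SLSQP",
               constraints=[{"type": "eq", "fun": mi_gap}])
print("numerical J_{e,e'} ~", res.fun)
```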
