Error Exponents for Composite Hypothesis Testing of Markov Forest Distributions

Vincent Tan, Anima Anandkumar, Alan S. Willsky
Stochastic Systems Group, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology
ISIT, June 18, 2010
Motivation

- Continuation of a line of work on error exponents for learning tree-structured graphical models:
  - Discrete case: Tan, Anandkumar, Tong, Willsky, ISIT 2009.
  - Gaussian case: Tan, Anandkumar, Willsky, Trans. SP 2010.
- Instead of learning, we focus here on hypothesis testing.
- Provides intuition for which classes of graphical models are easy to learn, in terms of the detection error exponent.
- Is there a relation between the detection error exponent and the exponent associated with structure learning?
Background on Tree-Structured Graphical Models

- Graphical model: a family of multivariate probability distributions that factorize according to a given graph G = (V, E).
- Vertices in V = {1, …, d} correspond to variables; edges in E ⊂ (V choose 2), the unordered pairs of V, encode conditional independence relations.
- Example for a tree-structured P(x) with d = 4:

  [Figure: star tree with centre X_1 joined to leaves X_2, X_3, X_4.]

  P(x_1, x_2, x_3, x_4) = P_1(x_1) · [P_{1,2}(x_1, x_2) / P_1(x_1)] · [P_{1,3}(x_1, x_3) / P_1(x_1)] · [P_{1,4}(x_1, x_4) / P_1(x_1)].
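To make the factorization concrete, here is a minimal numerical sketch (all probability values are invented for illustration) that assembles the joint distribution of the 4-variable star tree from its node and pairwise marginals and checks that it normalizes:

```python
import itertools
import numpy as np

# Binary variables x1..x4 on the star tree with centre x1.
# P1 is the marginal of x1; P1j are the pairwise marginals P_{1,j}(x1, xj).
# These numbers are illustrative, not from the talk.
P1 = np.array([0.6, 0.4])
P12 = np.array([[0.5, 0.1], [0.1, 0.3]])   # rows index x1, columns index x2
P13 = np.array([[0.4, 0.2], [0.2, 0.2]])
P14 = np.array([[0.45, 0.15], [0.15, 0.25]])

def joint(x1, x2, x3, x4):
    """Tree factorization: P = P1 * (P12/P1) * (P13/P1) * (P14/P1)."""
    return (P1[x1]
            * P12[x1, x2] / P1[x1]
            * P13[x1, x3] / P1[x1]
            * P14[x1, x4] / P1[x1])

# Sanity check: the factorized joint sums to 1 over all 16 configurations.
total = sum(joint(*x) for x in itertools.product([0, 1], repeat=4))
print(f"sum of joint = {total:.6f}")   # -> 1.000000
```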
Learning vs Hypothesis Testing

- Canonical problem: given samples x_1, …, x_n ∼ P, learn the structure of P.
- If P is a tree, the Chow-Liu algorithm (1968) is an efficient implementation of maximum-likelihood structure estimation (see the sketch after this slide).
- Denote the set of distributions Markov on a fixed tree T_0 ∈ T by D(T_0); the set of distributions Markov on some tree is D(T).
- Composite hypothesis testing problem considered here:

  H_0 : x_1, …, x_n drawn i.i.d. from some P ∈ Λ_0 ⊂ D(T)
  H_1 : x_1, …, x_n drawn i.i.d. from some Q ∈ Λ_1 ⊂ D(T)

  with each Λ_i closed and Λ_0 ∩ Λ_1 = ∅.
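A minimal self-contained sketch of the Chow-Liu procedure, assuming discrete samples in an (n, d) array; the helper names (`empirical_mi`, `chow_liu_tree`) are ours, not from the talk:

```python
from itertools import combinations
import numpy as np

def empirical_mi(xi, xj):
    """Empirical mutual information (in nats) of two discrete sample vectors."""
    mi = 0.0
    for a in np.unique(xi):
        for b in np.unique(xj):
            p_ab = np.mean((xi == a) & (xj == b))
            p_a, p_b = np.mean(xi == a), np.mean(xj == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(X):
    """Chow-Liu: maximum-weight spanning tree under empirical MI edge weights.

    X is an (n, d) array of discrete samples; returns a list of (i, j) edges."""
    n, d = X.shape
    # All candidate edges, sorted by decreasing mutual information.
    edges = sorted(((empirical_mi(X[:, i], X[:, j]), i, j)
                    for i, j in combinations(range(d), 2)), reverse=True)
    # Kruskal's algorithm with a union-find over the d nodes.
    parent = list(range(d))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:               # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```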
Definition of Worst-Case Type-II Error Exponent

- Neyman-Pearson setup; acceptance regions (A_n) for H_0, so Q^n(A_n) is the type-II (missed-detection) error probability.
- Def (type-II error exponent for a fixed Q ∈ Λ_1, given (A_n)):

  J(Λ_0, Q; A_n) := lim inf_{n→∞} −(1/n) log Q^n(A_n)

- Def (optimal type-II error exponent, over tests whose type-I error is at most α uniformly on Λ_0):

  J*(Λ_0, Q) := sup_{ (A_n) : P^n(A_n^c) ≤ α, ∀ P ∈ Λ_0 } J(Λ_0, Q; A_n)

- Def (worst-case optimal type-II error exponent):

  J*(Λ_0, Λ_1) := inf_{Q ∈ Λ_1} J*(Λ_0, Q)

- The optimizing distribution Q* is called the least favorable distribution.
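For intuition on these definitions: in the simple-vs-simple special case Λ_0 = {P}, Λ_1 = {Q}, the Chernoff-Stein lemma gives J*({P}, {Q}) = D(P‖Q), and the composite worst case then infimizes such divergences over Λ_1. A quick sketch with illustrative values:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) in nats for finite distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Illustrative binary-source example (values made up):
P = [0.7, 0.3]   # null distribution
Q = [0.4, 0.6]   # alternative distribution
# By the Chernoff-Stein lemma, the optimal type-II exponent for the
# simple test {P} vs {Q} is D(P || Q).
print(f"D(P||Q) = {kl(P, Q):.4f} nats")   # -> 0.1838
```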
Why Difficult?

- Many trees: on d nodes there are d^{d−2} labelled trees (Cayley's formula; see the sketch below), so searching for the dominant error event may be intractable.
- Natural questions:
  - Are there closed-form expressions for the worst-case error exponent for special choices of Λ_0, Λ_1? How does the exponent depend on the true distribution?
  - What are the connections to learning? Can we characterize, and build intuition for, the least favorable distribution?
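A two-line check of the scale of Cayley's count d^{d−2}:

```python
# Cayley's formula: the number of labelled trees on d nodes is d**(d - 2).
for d in (4, 8, 16, 32):
    print(f"d = {d:2d}: {d**(d - 2):.3e} labelled trees")
# Already at d = 32 there are ~1.2e45 candidate tree structures.
```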
A Simplification

- Assume H_0 is simple: P is known and Markov on a fixed tree T_0 = (V, E_0).

  H_0 : x_1, …, x_n ∼ P
  H_1 : x_1, …, x_n drawn from some Q ∈ Λ_1 = D(T) \ D(T_0)

- One natural detector connecting this test to structure learning is sketched below.
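An illustrative detector linking detection to learning, reusing the `chow_liu_tree` helper sketched earlier: learn a tree from the samples and accept H_0 iff the learned edge set equals E_0. This is our own illustration of the learning-detection connection, not necessarily the decision rule analyzed in the talk:

```python
# Illustrative plug-in detector (our construction, not the talk's method):
# learn a tree with Chow-Liu and declare H0 iff the structure matches T0.
def structure_test(X, E0):
    """Accept H0 iff the Chow-Liu tree of the samples equals E0.

    X: (n, d) array of discrete samples; E0: iterable of edges (i, j) of T0.
    Assumes chow_liu_tree from the earlier sketch is in scope."""
    learned = {frozenset(e) for e in chow_liu_tree(X)}
    return learned == {frozenset(e) for e in E0}
```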