Lecture 8: Hypothesis Testing
I-Hsiang Wang
Department of Electrical Engineering, National Taiwan University
ihwang@ntu.edu.tw
December 20, 2016
In this lecture, we elaborate further on binary hypothesis testing, focusing on the following aspects:
1 Fundamental performance limits of binary hypothesis testing.
  Log likelihood, Neyman-Pearson test
  Optimal trade-off between α and β
  (α: probability of false alarm / type-I error / false positive)
  (β: probability of miss detection / type-II error / false negative)
2 Asymptotic performance of testing from n i.i.d. samples as n → ∞.
  Stein's regime vs. Chernoff's regime
  Error exponents
Along the way, we will introduce large deviation theory, an important set of probabilistic tools that not only helps characterize the asymptotic performance limits of binary hypothesis testing but also plays an important role in other problems.
1 Binary Hypothesis Testing: More Details
  Recap: Log Likelihood, Neyman-Pearson Test
  Tradeoff between α and β
  Asymptotic Performance: Prelude
Recap: Log Likelihood, Neyman-Pearson Test
Setup (Recap)
  H0: X ∼ P0 (null hypothesis, θ = 0)
  H1: X ∼ P1 (alternative hypothesis, θ = 1)    (1)
Unknown binary parameter θ; data-generating distribution P_θ.
Data/observation/sample X ∼ P_θ.
Decision rule (randomized test) φ: X → [0, 1]. Outcome θ̂ = 1 with probability φ(X).
Loss function: 0-1 loss 1{θ̂ ≠ θ}.

Probability of Errors (prove the following as an exercise)
  Probability of type-I error:  α_φ = E_{X∼P0}[φ(X)]
  Probability of type-II error: β_φ = E_{X∼P1}[1 − φ(X)]    (2)
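To make the error probabilities concrete, here is a minimal numerical sketch (not part of the slides; the three-letter alphabet, the distributions P0, P1, and the test φ are hypothetical choices) that evaluates α_φ = E_{X∼P0}[φ(X)] and β_φ = E_{X∼P1}[1 − φ(X)] on a finite alphabet.

```python
# Hypothetical example: a randomized test phi on a three-letter alphabet.
P0 = {'a': 0.5, 'b': 0.4, 'c': 0.1}   # null hypothesis H0
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}   # alternative hypothesis H1
phi = {'a': 0.0, 'b': 0.2, 'c': 1.0}  # phi(x) = Pr[declare H1 | X = x]

alpha = sum(P0[x] * phi[x] for x in P0)        # type-I error (false alarm)
beta  = sum(P1[x] * (1 - phi[x]) for x in P1)  # type-II error (miss detection)
print(alpha, beta)   # ≈ 0.18 and ≈ 0.34 for this particular test
```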
Likelihood, Log Likelihood Ratio, and Likelihood Ratio Test
1 L(θ | x) ≜ P_θ(x) (viewed as a function of the parameter θ given the data x) is called the likelihood function of θ.
2 For binary HT, the likelihood ratio is L(x) ≜ L(1 | x) / L(0 | x) = P1(x) / P0(x).
3 The log likelihood ratio (LLR) is l(x) ≜ log L(x) = log P1(x) − log P0(x).
4 A (randomized) likelihood ratio test (LRT) is a test φ_{τ,γ}, parametrized by constants τ ∈ ℝ and γ ∈ (0, 1), defined as follows:
  φ_{τ,γ}(x) = 1 if l(x) > τ,  γ if l(x) = τ,  0 if l(x) < τ.
Remark: In this lecture we assume the logarithm above is base-2.
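A small sketch of how the randomized LRT could be implemented for a finite alphabet. The dictionary representation of P0, P1 and the function names llr and lrt are illustrative assumptions (not from the lecture), and full support (P0(x), P1(x) > 0) is assumed so the LLR is finite.

```python
import math
import random

def llr(P0, P1, x):
    """Log-likelihood ratio l(x) = log2 P1(x) - log2 P0(x) (base-2, as in the lecture)."""
    return math.log2(P1[x]) - math.log2(P0[x])

def lrt(P0, P1, tau, gamma, x, rng=random):
    """Randomized LRT phi_{tau,gamma}: return the decision theta_hat in {0, 1}."""
    l = llr(P0, P1, x)
    if l > tau:
        return 1                                # declare H1
    if l < tau:
        return 0                                # declare H0
    return 1 if rng.random() < gamma else 0     # tie l(x) = tau: declare H1 w.p. gamma

# Example usage with hypothetical distributions over a three-letter alphabet.
P0 = {'a': 0.5, 'b': 0.4, 'c': 0.1}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}
print(lrt(P0, P1, tau=0.0, gamma=0.5, x='c'))   # llr('c') = log2(6) > 0, so prints 1
```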
Performance of LRT
For an LRT φ_{τ,γ}, the probabilities of errors are
  α = P0{l(X) > τ} + γ P0{l(X) = τ} = L0{l > τ} + γ L0{l = τ}
  β = P1{l(X) < τ} + (1 − γ) P1{l(X) = τ} = L1{l ≤ τ} − γ L1{l = τ}    (3)
where L0, L1 are the distributions of the LLR under P0 and P1, respectively.
The following facts will be useful later. The proofs are left as an exercise.

Proposition 1
For an LRT φ_{τ,γ}, its probabilities of type-I and type-II errors satisfy α ≤ 2^{−τ}, β ≤ 2^{τ}, L0{l > τ} ≤ α ≤ L0{l ≥ τ}, and L1{l < τ} ≤ β ≤ L1{l ≤ τ}. Furthermore, the distributions of the LLR satisfy L1(l) = 2^{l} L0(l).
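The following sketch (same hypothetical P0, P1 as before; τ and γ picked arbitrarily) computes the exact (α, β) of an LRT via (3) and numerically checks the claims of Proposition 1.

```python
import math
from collections import defaultdict

P0 = {'a': 0.5, 'b': 0.4, 'c': 0.1}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}
tau, gamma = 0.5, 0.3

llr = {x: math.log2(P1[x] / P0[x]) for x in P0}

# Exact error probabilities of the LRT phi_{tau,gamma}, per (3).
alpha = sum(P0[x] for x in P0 if llr[x] > tau) + gamma * sum(P0[x] for x in P0 if llr[x] == tau)
beta  = sum(P1[x] for x in P1 if llr[x] < tau) + (1 - gamma) * sum(P1[x] for x in P1 if llr[x] == tau)

# Proposition 1: alpha <= 2^{-tau} and beta <= 2^{tau}.
assert alpha <= 2 ** (-tau) + 1e-12 and beta <= 2 ** tau + 1e-12

# Change of measure on the LLR: L1(l) = 2^l * L0(l) for every value l.
L0, L1 = defaultdict(float), defaultdict(float)
for x in P0:
    L0[llr[x]] += P0[x]
    L1[llr[x]] += P1[x]
for l in L0:
    assert abs(L1[l] - 2 ** l * L0[l]) < 1e-12
```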
Neyman-Pearson Theorem and Neyman-Pearson Test
Recall the Neyman-Pearson problem, which aims to find the lowest probability of type-II error under the constraint that the probability of type-I error is at most α:
  β*(α) ≜ inf { β_φ : φ: X → [0, 1], α_φ ≤ α }    (4)
Let us re-state the Neyman-Pearson theorem to emphasize the fact that β*(α) can be attained by a randomized LRT φ_{τ,γ}, called the Neyman-Pearson test.

Theorem 1 (Neyman-Pearson: (Randomized) LRT is Optimal)
For any α ∈ [0, 1], β*(α) is attained by a (randomized) LRT φ_{τ*,γ*}, where the pair (τ*, γ*) ∈ ℝ × [0, 1] is the unique solution to α = L0{l > τ} + γ L0{l = τ}.
Hence, the inf{·} in (4) is attained and becomes min{·}.
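One hedged way to realize the Neyman-Pearson test on a finite alphabet: sort symbols by LLR, find the LLR level at which the accumulated P0-mass reaches α, and randomize on the tie so that the type-I error is exactly α. The helper name neyman_pearson and the example distributions are assumptions for illustration, not part of the lecture.

```python
import math

def neyman_pearson(P0, P1, alpha):
    """Return (tau, gamma, beta) of the NP test phi_{tau,gamma} with type-I error exactly alpha.
    Finite-alphabet sketch; assumes P0(x), P1(x) > 0 for all x."""
    llr = {x: math.log2(P1[x] / P0[x]) for x in P0}
    order = sorted(P0, key=lambda x: -llr[x])   # reject H0 on the largest LLRs first
    acc = 0.0
    for x in order:
        if acc + P0[x] >= alpha:                # threshold level reached at this LLR value
            tau = llr[x]
            mass0 = sum(P0[y] for y in P0 if llr[y] == tau)
            above0 = sum(P0[y] for y in P0 if llr[y] > tau)
            gamma = (alpha - above0) / mass0    # randomize on the tie {l = tau}
            beta = (sum(P1[y] for y in P1 if llr[y] < tau)
                    + (1 - gamma) * sum(P1[y] for y in P1 if llr[y] == tau))
            return tau, gamma, beta
        acc += P0[x]
    return -math.inf, 1.0, 0.0                  # only reached when alpha exceeds the total P0 mass

P0 = {'a': 0.5, 'b': 0.4, 'c': 0.1}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}
print(neyman_pearson(P0, P1, alpha=0.2))        # beta*(0.2) ≈ 0.325 for this example
```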
Tradeoff between α and β
Tradeoff between Probability of Type-I and Type-II Errors
Define the collection of all feasible pairs (probability of type-I error, probability of type-II error) as follows:
  R(P0, P1) ≜ { (α_φ, β_φ) | φ: X → [0, 1] }    (5)

Proposition 2 (Properties of R(P0, P1))
R(P0, P1) satisfies the following properties:
1 It is closed and convex.
2 It contains the diagonal line { (a, 1 − a) | a ∈ [0, 1] }.
3 It is symmetric w.r.t. the diagonal line: (α, β) ∈ R(P0, P1) ⟺ (1 − α, 1 − β) ∈ R(P0, P1).
4 The lower boundary (below the diagonal line), { β*(α) | α ∈ [0, 1] }, is attained by the Neyman-Pearson test.
[Figure: the region R(P0, P1) in the unit square with axes α (P_FA) and β (P_MD); panel (a) |X| = ∞, panel (b) |X| < ∞.]
Intuition: R(P0, P1) tells how "dissimilar" P0 and P1 are. The larger R(P0, P1) is, the easier it is to distinguish P0 and P1.
Proof Sketch of Proposition 2
Closedness is due to the Neyman-Pearson theorem (the inf{·} is attained and becomes min{·}) and the symmetry property (Property 3).
Convexity is proved by considering a convex combination φ^{(λ)} of two tests φ^{(0)} and φ^{(1)}, where φ^{(λ)}(x) ≜ (1 − λ) φ^{(0)}(x) + λ φ^{(1)}(x). Deriving its (α, β) immediately proves convexity.
For Property 2, consider a blind test that flips a biased (Ber(a)) coin to make the decision regardless of the observation x; in other words, φ(x) = a for all x ∈ X. Then show that its type-I and type-II error probabilities are indeed a and 1 − a, respectively.
Symmetry is proved by considering the opposite test φ̄ of a test φ achieving (α, β), where φ̄(x) = 1 − φ(x) for all x ∈ X.
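A quick numerical illustration (same hypothetical P0, P1 and test φ as earlier) of two ingredients of the proof: the blind test lands on the diagonal, and the opposite test reflects (α, β) to (1 − α, 1 − β).

```python
P0 = {'a': 0.5, 'b': 0.4, 'c': 0.1}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}

def errors(phi):
    """Return (alpha, beta) of a randomized test phi given as a dict x -> phi(x)."""
    alpha = sum(P0[x] * phi[x] for x in P0)
    beta = sum(P1[x] * (1 - phi[x]) for x in P1)
    return alpha, beta

a = 0.3
blind = {x: a for x in P0}                   # phi(x) = a for all x
print(errors(blind))                         # (0.3, 0.7): a point on the diagonal

phi = {'a': 0.0, 'b': 0.2, 'c': 1.0}
opposite = {x: 1 - phi[x] for x in phi}      # the opposite test
print(errors(phi), errors(opposite))         # (alpha, beta) and (1 - alpha, 1 - beta)
```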
Example 1
Draw R(P0, P1) for the following cases:
  P0 = Ber(a) and P1 = Ber(b).
  P0 = P1.
  P0 ⊥ P1, that is, ⟨P0, P1⟩ = 0 (disjoint supports).
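For the first case, here is a sketch of the lower boundary β*(α) when P0 = Ber(a) and P1 = Ber(b) with a < b; the specific values a = 0.2, b = 0.6 and the function name np_boundary are illustrative assumptions. The other two cases are noted in the comments.

```python
def np_boundary(a, b, alpha):
    """beta*(alpha) for P0 = Ber(a), P1 = Ber(b), assuming 0 < a < b < 1.
    The boundary is piecewise linear with one corner at (a, 1 - b),
    obtained by randomizing the LRT (Neyman-Pearson test)."""
    if alpha <= a:
        # Randomize between "always declare H0" and "declare H1 iff X = 1".
        return 1 - (alpha / a) * b
    # Randomize between "declare H1 iff X = 1" and "always declare H1".
    lam = (alpha - a) / (1 - a)
    return (1 - lam) * (1 - b)

for alpha in [0.0, 0.1, 0.2, 0.5, 1.0]:
    print(alpha, np_boundary(0.2, 0.6, alpha))

# The other two cases:
# - P0 = P1: beta*(alpha) = 1 - alpha, so R(P0, P1) collapses to the diagonal line.
# - Disjoint supports: beta*(alpha) = 0 for all alpha, so R(P0, P1) is the whole unit square.
```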
Bounds on R(P0, P1)
We can run Neyman-Pearson tests over all α ∈ [0, 1] and obtain the lower boundary { β*(α) | α ∈ [0, 1] } of the region R(P0, P1), which suffices to characterize the entire region. However, this might be more challenging than it initially looks, especially when the observation becomes high-dimensional (as in the decoding of a channel code). Hence, we often would like to have inner and outer bounds on the region R(P0, P1).
Inner bound is about achievability: come up with tests whose performance can be bounded tractably. Often we use a deterministic LRT with a carefully chosen threshold, as in the sketch below.
Outer bound is about converse: show that the performance of all feasible tests must satisfy certain properties.
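A sketch of the inner-bound recipe above (same hypothetical P0, P1 as before): sweep deterministic LRT thresholds over the distinct LLR values and collect the achievable (α, β) corner points.

```python
import math

P0 = {'a': 0.5, 'b': 0.4, 'c': 0.1}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}
llr = {x: math.log2(P1[x] / P0[x]) for x in P0}

corners = []
for tau in sorted(set(llr.values())) + [math.inf]:
    # Deterministic LRT: declare H1 iff l(x) >= tau (ties resolved toward H1).
    alpha = sum(P0[x] for x in P0 if llr[x] >= tau)
    beta = sum(P1[x] for x in P1 if llr[x] < tau)
    corners.append((alpha, beta))
print(corners)
# Randomizing (time-sharing) between consecutive corner tests traces the line segments
# joining them, giving an achievable (inner) bound on the lower boundary of R(P0, P1).
```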
Outer Bounds

Lemma 1 (Weak Converse)
For all (α, β) ∈ R(P0, P1),
  d_b(1 − α ∥ β) ≤ D(P0 ∥ P1),  d_b(β ∥ 1 − α) ≤ D(P1 ∥ P0).
Remark: The weak converse bound is characterized by information divergences. Interestingly, the information divergences are expectations of the LLR:
  D(P0 ∥ P1) = E_{X∼P0}[−l(X)] = −E_{L0}[l],  D(P1 ∥ P0) = E_{X∼P1}[l(X)] = E_{L1}[l].

Lemma 2 (Strong Converse)
For all (α, β) ∈ R(P0, P1) and τ ∈ ℝ,
  α + 2^{−τ} β ≥ L0{l > τ},  β + 2^{τ} α ≥ L1{l ≤ τ}.
Remark: The strong converse requires knowledge of the distributions of the LLR, while the weak converse only needs the expected values of the LLR.
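A small numerical check of Lemmas 1 and 2 on one feasible point (α, β), using the same hypothetical P0, P1 and test φ as earlier; d_b denotes the binary divergence in bits.

```python
import math

P0 = {'a': 0.5, 'b': 0.4, 'c': 0.1}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}
phi = {'a': 0.0, 'b': 0.2, 'c': 1.0}   # an arbitrary feasible test

alpha = sum(P0[x] * phi[x] for x in P0)
beta = sum(P1[x] * (1 - phi[x]) for x in P1)

def d_b(p, q):
    """Binary divergence d_b(p || q) in bits (assumes 0 < p, q < 1)."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

D01 = sum(P0[x] * math.log2(P0[x] / P1[x]) for x in P0)   # D(P0 || P1) = -E_{L0}[l]
D10 = sum(P1[x] * math.log2(P1[x] / P0[x]) for x in P1)   # D(P1 || P0) =  E_{L1}[l]

# Weak converse (Lemma 1).
assert d_b(1 - alpha, beta) <= D01
assert d_b(beta, 1 - alpha) <= D10

# Strong converse (Lemma 2), checked for a few thresholds tau.
llr = {x: math.log2(P1[x] / P0[x]) for x in P0}
for tau in [-1.0, 0.0, 0.5, 2.0]:
    assert alpha + 2 ** (-tau) * beta >= sum(P0[x] for x in P0 if llr[x] > tau)
    assert beta + 2 ** tau * alpha >= sum(P1[x] for x in P1 if llr[x] <= tau)
```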