Second-Order Asymptotics of Sequential Hypothesis Testing


  1. Second-Order Asymptotics of Sequential Hypothesis Testing. Yonglong Li and Vincent Y. F. Tan. June 4, 2020.

  2. Outline
     ◮ Problem Setup
     ◮ Literature Review
     ◮ Main Result
     ◮ Numerical Examples
     ◮ Proof of the Main Result


  7. Sequential Hypothesis Testing: Problem Setup
     ◮ Binary hypothesis testing: $H_0 : P = P_0$ and $H_1 : P = P_1$, where $P_0$ and $P_1$ are probability distributions defined on the same alphabet $\mathcal{X}$.
     ◮ $\{X_k\}_{k=1}^{\infty}$ are i.i.d. random variables distributed according to $P$.
     ◮ $\mathcal{F}_n$ is the σ-algebra generated by $X_1^n$.
     ◮ $T$ is a stopping time adapted to the filtration $\{\mathcal{F}_n\}_{n=1}^{\infty}$, and $\mathcal{F}_T$ is the σ-algebra associated with $T$.
     ◮ $\delta$ is a $\{0, 1\}$-valued function measurable with respect to $\mathcal{F}_T$. $\delta = i$ means that $H_i$ is declared to be the underlying hypothesis. A pair $(\delta, T)$ is called a sequential hypothesis test (SHT).

  8. Sequential Hypothesis Testing: Problem Setup
     ◮ $P_{1|0}(\delta, T) = P_0^T(\delta = 1)$ and $P_{0|1}(\delta, T) = P_1^T(\delta = 0)$.
     ◮ Expectation constraint on the sample size $T$: for any integer $n$, $\max_{i=0,1} E_{P_i}[T] \le n$.

     Sequential Probability Ratio Test
     One important class of SHTs is the family of sequential probability ratio tests (SPRTs). Let $Y_k = \log \frac{p_0(X_k)}{p_1(X_k)}$ and $S_n = \sum_{k=1}^{n} Y_k$. For any pair of positive real numbers $\alpha$ and $\beta$, an SPRT with parameters $(\alpha, \beta)$ is defined as follows:
     $$\delta = \begin{cases} 0 & \text{if } S_T > \beta, \\ 1 & \text{if } S_T < -\alpha, \end{cases}$$
     where $T = \inf\{n \ge 1 : S_n \notin [-\alpha, \beta]\}$.
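The SPRT definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian pair $P_0 = N(0,1)$, $P_1 = N(1,1)$, the thresholds $\alpha = \beta = 4$, and the names `sprt`, `log_lr` and `draw_h0` are all hypothetical choices, not taken from the slides.

```python
import random

def sprt(draw, log_lr, alpha, beta, max_steps=1_000_000):
    """Run one SPRT with parameters (alpha, beta).

    draw()    -> one observation X_k
    log_lr(x) -> Y_k = log p0(x) / p1(x)
    Returns (decision, stopping_time, S_T).
    """
    s = 0.0
    for n in range(1, max_steps + 1):
        s += log_lr(draw())
        if s > beta:        # S_T > beta: declare H0
            return 0, n, s
        if s < -alpha:      # S_T < -alpha: declare H1
            return 1, n, s
    raise RuntimeError("no threshold crossed within max_steps")

# Assumed example pair: P0 = N(theta0, 1), P1 = N(theta1, 1).
theta0, theta1 = 0.0, 1.0

def log_lr(x):
    # log of the ratio of the two unit-variance Gaussian densities
    return ((x - theta1) ** 2 - (x - theta0) ** 2) / 2.0

rng = random.Random(0)

def draw_h0():
    return rng.gauss(theta0, 1.0)   # data generated under H0

runs = [sprt(draw_h0, log_lr, alpha=4.0, beta=4.0) for _ in range(2000)]
type_one_rate = sum(d for d, _, _ in runs) / len(runs)  # estimate of P_{1|0}
mean_T = sum(n for _, n, _ in runs) / len(runs)         # estimate of E_{P0}[T]
print(type_one_rate, mean_T)
```

Raising `alpha` and `beta` lowers both error estimates and lengthens the test, which is the trade-off the expectation constraint controls.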

  9. Problem Setup: Error Exponents
     Given a sequence of SHTs $\{(\delta_n, T_n)\}_{n=1}^{\infty}$ satisfying the expectation constraint, we are concerned with the error exponents $(E_0, E_1)$ defined as
     $$E_0 = \liminf_{n \to \infty} \frac{1}{n} \log \frac{1}{P_{1|0}(\delta_n, T_n)} \quad \text{and} \quad E_1 = \liminf_{n \to \infty} \frac{1}{n} \log \frac{1}{P_{0|1}(\delta_n, T_n)}.$$


  12. Literature Review
      ◮ In 1948, Wald and Wolfowitz showed that $E_0 \le D(P_1 \| P_0)$ and $E_1 \le D(P_0 \| P_1)$, and that these error exponents can be achieved by a sequence of SPRTs.
      ◮ In 1986 and 1988, Lotov studied the series expansion of the error probabilities for a sequence of SPRTs.
      ◮ For the fixed-length binary hypothesis testing problem, Strassen showed that the backoff from the optimal exponent $D(P_0 \| P_1)$ is of the order $\Theta(1/\sqrt{n})$ and characterized the implied constant as a function of the relative entropy variance and the Gaussian cumulative distribution function.

  13. Main Result: Second-order Term under the Expectation Constraint
      For fixed $\lambda \in [0, 1]$, let
      $$F_n(\lambda) = \sup_{(\delta_n, T_n) :\, \max_{i=0,1} E_{P_i}[T_n] \le n} \Big\{ \lambda \big( \log P_{1|0}(\delta_n, T_n) + n D(P_1 \| P_0) \big) + (1 - \lambda) \big( \log P_{0|1}(\delta_n, T_n) + n D(P_0 \| P_1) \big) \Big\}. \quad (1)$$
      Let $\overline{F}(\lambda) = \limsup_{n \to \infty} F_n(\lambda)$ and $\underline{F}(\lambda) = \liminf_{n \to \infty} F_n(\lambda)$. If $\overline{F}(\lambda) = \underline{F}(\lambda)$, then we call this common value the second-order exponent of SHT under the expectation constraint and denote it simply by $F(\lambda)$.

  14. Main Result: Second-order Asymptotics under the Expectation Constraint
      Let $\{\alpha_k\}_{k=1}^{\infty}$ and $\{\beta_k\}_{k=1}^{\infty}$ be two increasing sequences of positive real numbers such that $\alpha_k \to \infty$ and $\beta_k \to \infty$ as $k \to \infty$. Let
      $$T(\beta_k) = \inf\{n \ge 1 : S_n > \beta_k\} \quad \text{and} \quad \tilde{T}(\alpha_k) = \inf\{n \ge 1 : -S_n > \alpha_k\}.$$
      Furthermore, let the overshoots be
      $$R_k = S_{T(\beta_k)} - \beta_k \quad \text{and} \quad \tilde{R}_k = -S_{\tilde{T}(\alpha_k)} - \alpha_k.$$
      It is known that
      ◮ if the true hypothesis is $H_0$, $\{R_k\}_{k=1}^{\infty}$ converges in distribution to some random variable $R$, and the limit is independent of the choice of $\{\beta_k\}_{k=1}^{\infty}$;
      ◮ if the true hypothesis is $H_1$, $\{\tilde{R}_k\}_{k=1}^{\infty}$ converges in distribution to some random variable $\tilde{R}$, and the limit is independent of the choice of $\{\alpha_k\}_{k=1}^{\infty}$.
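The overshoot $R_k = S_{T(\beta_k)} - \beta_k$ can be explored by simulation. The sketch below is a rough Monte Carlo illustration, not the authors' numerical method: it assumes the Gaussian pair $P_0 = N(0,1)$, $P_1 = N(1,1)$, for which $Y_k \sim N(1/2, 1)$ under $H_0$, and it uses one fixed large threshold as a stand-in for the limit $k \to \infty$. The function name `overshoot_stats` is hypothetical.

```python
import math
import random

def overshoot_stats(draw_increment, threshold, trials, rng):
    """Monte Carlo estimates of E[R] and log E[exp(-R)],
    where R = S_T - threshold and T is the first n with S_n > threshold."""
    sum_r = 0.0
    sum_exp = 0.0
    for _ in range(trials):
        s = 0.0
        while s <= threshold:          # walk until the threshold is crossed
            s += draw_increment(rng)
        r = s - threshold              # overshoot at the crossing time
        sum_r += r
        sum_exp += math.exp(-r)
    return sum_r / trials, math.log(sum_exp / trials)

# Assumption: under H0 with P0 = N(0,1), P1 = N(1,1), Y_k = 1/2 - X_k ~ N(1/2, 1).
rng = random.Random(1)
a_hat, b_hat = overshoot_stats(lambda r: r.gauss(0.5, 1.0),
                               threshold=8.0, trials=5000, rng=rng)
print(a_hat, b_hat)
```

The two returned numbers are finite-threshold estimates of $E[R]$ and $\log E[e^{-R}]$. By Jensen's inequality their sum is non-negative.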

  15. Main Result: Second-order Asymptotics under the Expectation Constraint
      Define
      $$A(P_0, P_1) = E[R], \qquad \tilde{A}(P_0, P_1) = E[\tilde{R}],$$
      $$B(P_0, P_1) = \log E[e^{-R}], \qquad \tilde{B}(P_0, P_1) = \log E[e^{-\tilde{R}}].$$
      We note that $\tilde{A}(P_0, P_1) \ne A(P_1, P_0)$ and $\tilde{B}(P_0, P_1) \ne B(P_1, P_0)$ in general.

      Theorem 1
      Let $P_0$ and $P_1$ be such that $\max_{i=0,1} E_{P_i}\Big[\Big(\log \frac{p_0(X_1)}{p_1(X_1)}\Big)^2\Big] < \infty$ and $\log \frac{p_0(X_1)}{p_1(X_1)}$ is non-arithmetic when $X_1 \sim P_0$. Then for every $\lambda \in [0, 1]$,
      $$\overline{F}(\lambda) = \underline{F}(\lambda) = F(\lambda) = \lambda \big( \tilde{A}(P_0, P_1) + \tilde{B}(P_0, P_1) \big) + (1 - \lambda) \big( A(P_0, P_1) + B(P_0, P_1) \big).$$

  16. Second-order Results
      Remark 1
      The rate of convergence of the optimal $\lambda$-weighted finite-length exponents
      $$\sup_{(\delta_n, T_n)} \Big\{ -\frac{\lambda}{n} \log P_{1|0}(\delta_n, T_n) - \frac{1 - \lambda}{n} \log P_{0|1}(\delta_n, T_n) \Big\}$$
      to the $\lambda$-weighted exponents $\lambda D(P_1 \| P_0) + (1 - \lambda) D(P_0 \| P_1)$ is $\Theta(1/n)$.

  17. Numerical Examples
      Example 1
      Let $\gamma_0$ and $\gamma_1$ be two positive real numbers such that $\gamma_0 < \gamma_1$. Let $p_0(x) = \gamma_0 e^{-\gamma_0 x}$ and $p_1(x) = \gamma_1 e^{-\gamma_1 x}$ for $x > 0$. We can numerically compute the second-order exponent under the expectation constraint. This is illustrated in the figure for various values of $\lambda$.
      [Figure: second-order exponent $F$ versus $\gamma$, with curves for $\lambda = 0.1$, $0.5$ and $0.9$. Exponential distributions as in Example 1 with $\gamma_0 = \gamma$ and $\gamma_1 = 1$.]
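For the exponential pair of Example 1, one half of the computation can be checked against a closed form. Under $H_0$ the increment is $Y_k = \log(\gamma_0/\gamma_1) + (\gamma_1 - \gamma_0) X_k$ with $X_k$ exponential of rate $\gamma_0$, so by the memoryless property the overshoot above the upper threshold is exactly exponential with mean $(\gamma_1 - \gamma_0)/\gamma_0$. This gives $A(P_0, P_1) = (\gamma_1 - \gamma_0)/\gamma_0$ and $B(P_0, P_1) = \log(\gamma_0/\gamma_1)$. That closed form is an observation added here, not a statement from the slides, and it does not cover $\tilde{A}$ or $\tilde{B}$. The values $\gamma_0 = 0.5$, $\gamma_1 = 1$ and all names below are illustrative assumptions.

```python
import math
import random

gamma0, gamma1 = 0.5, 1.0        # assumed example values with gamma0 < gamma1
rng = random.Random(2)

def draw_y_under_h0():
    # Y = log p0(X)/p1(X) for X ~ Exp(rate gamma0)
    x = rng.expovariate(gamma0)
    return math.log(gamma0 / gamma1) + (gamma1 - gamma0) * x

def estimate_a_b(threshold, trials):
    """Monte Carlo estimates of A = E[R] and B = log E[exp(-R)] under H0."""
    sum_r = 0.0
    sum_exp = 0.0
    for _ in range(trials):
        s = 0.0
        while s <= threshold:
            s += draw_y_under_h0()
        r = s - threshold            # overshoot above the upper threshold
        sum_r += r
        sum_exp += math.exp(-r)
    return sum_r / trials, math.log(sum_exp / trials)

a_hat, b_hat = estimate_a_b(threshold=5.0, trials=20000)

# Closed forms implied by memorylessness (an added observation, see lead-in).
a_exact = (gamma1 - gamma0) / gamma0
b_exact = math.log(gamma0 / gamma1)
print(a_hat, a_exact, b_hat, b_exact)
```

With these closed forms, $A + B = \gamma_1/\gamma_0 - 1 - \log(\gamma_1/\gamma_0)$, which is the $\lambda = 0$ end of the second-order exponent in Theorem 1 for this example.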

  18. Numerical Examples
      Example 2
      Let $\theta_0$ and $\theta_1$ be two distinct real numbers. Let
      $$p_0(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x - \theta_0)^2}{2}} \quad \text{and} \quad p_1(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x - \theta_1)^2}{2}} \quad \text{for } x \in \mathbb{R}.$$
      Let $\Delta\theta = |\theta_1 - \theta_0|$. Then we can numerically compute the second-order exponent under the expectation constraint. This is illustrated in the figure. We note that for this case of discriminating between two Gaussians, $F(\lambda)$ does not depend on $\lambda \in [0, 1]$.
      [Figure: second-order exponent $F$ versus $|\Delta\theta|$. Gaussian distributions.]

  19. Proof of the Main Result: Auxiliary Tools
      In the proof of Theorem 1, we use the following results on the asymptotics of the first passage time. Let $\{\alpha_i\}_{i=1}^{\infty}$ and $\{\beta_i\}_{i=1}^{\infty}$ be two increasing sequences of positive real numbers such that $\alpha_i \to \infty$ and $\beta_i \to \infty$ as $i \to \infty$. Let $(\delta_i, T_i)$ be an SPRT with parameters $(\alpha_i, \beta_i)$.

      Theorem 2 (Woodroofe)
      Assume that $\max\{E_{P_1}[Y_1^2], E_{P_0}[Y_1^2]\} < \infty$ and $Y_1$ is non-arithmetic. Then as $i \to \infty$,
      $$E_{P_0}[T_i] = \frac{\beta_i}{D(P_0 \| P_1)} + \frac{A(P_0, P_1)}{D(P_0 \| P_1)} + o(1),$$
      and
      $$E_{P_1}[T_i] = \frac{\alpha_i}{D(P_1 \| P_0)} + \frac{\tilde{A}(P_0, P_1)}{D(P_1 \| P_0)} + o(1).$$

  20. Proof of the Main Result: Auxiliary Tools
      Theorem 3 (Woodroofe)
      Assume that $\max\{E_{P_1}[Y_1^2], E_{P_0}[Y_1^2]\} < \infty$ and $Y_1$ is non-arithmetic. Then,
      $$\lim_{i \to \infty} P_{1|0}(\delta_i, T_i)\, e^{\alpha_i} = e^{\tilde{B}(P_0, P_1)}, \qquad \lim_{i \to \infty} P_{0|1}(\delta_i, T_i)\, e^{\beta_i} = e^{B(P_0, P_1)}.$$
      The following lemma characterizes the optimality of the SPRT.

      Lemma 4 (Ferguson)
      Let $(\delta, T)$ be an SPRT. Let $(\tilde{\delta}, \tilde{T})$ be any SHT such that
      $$E_{P_0}[\tilde{T}] \le E_{P_0}[T] \quad \text{and} \quad E_{P_1}[\tilde{T}] \le E_{P_1}[T].$$
      Then
      $$P_{0|1}(\delta, T) \le P_{0|1}(\tilde{\delta}, \tilde{T}) \quad \text{and} \quad P_{1|0}(\delta, T) \le P_{1|0}(\tilde{\delta}, \tilde{T}).$$
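The way Theorems 2 and 3 combine to produce the constants in Theorem 1 can be sketched as follows. This is a heuristic outline only, not the authors' proof: the particular threshold choice is an assumption made for illustration, and the $o(1)$ slack in the expectation constraint is ignored.

```latex
% Assumed threshold choice: meet both expectation constraints up to o(1).
\begin{align*}
\alpha_n &= n D(P_1 \| P_0) - \tilde{A}(P_0, P_1), &
\beta_n  &= n D(P_0 \| P_1) - A(P_0, P_1). \\
\intertext{By Theorem 2, both expected sample sizes are then $n + o(1)$:}
E_{P_1}[T_n] &= \frac{\alpha_n + \tilde{A}(P_0, P_1)}{D(P_1 \| P_0)} + o(1) = n + o(1), &
E_{P_0}[T_n] &= \frac{\beta_n + A(P_0, P_1)}{D(P_0 \| P_1)} + o(1) = n + o(1). \\
\intertext{By Theorem 3, the error probabilities satisfy}
\log P_{1|0}(\delta_n, T_n) &= -\alpha_n + \tilde{B}(P_0, P_1) + o(1), &
\log P_{0|1}(\delta_n, T_n) &= -\beta_n + B(P_0, P_1) + o(1). \\
\intertext{Substituting the thresholds gives}
\log P_{1|0}(\delta_n, T_n) + n D(P_1 \| P_0) &\to \tilde{A}(P_0, P_1) + \tilde{B}(P_0, P_1), &
\log P_{0|1}(\delta_n, T_n) + n D(P_0 \| P_1) &\to A(P_0, P_1) + B(P_0, P_1).
\end{align*}
```

Weighting the two limits by $\lambda$ and $1 - \lambda$ recovers the expression for $F(\lambda)$ in Theorem 1. Lemma 4 is what rules out any other test with the same expected sample sizes having smaller error probabilities.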
