Nonparametric testing by convex optimization




  1. Nonparametric testing by convex optimization. Anatoli Juditsky ∗, joint research with Alexander Goldenshluger ‡ and Arkadi Nemirovski †. ∗ University J. Fourier, ‡ University of Haifa, † ISyE, Georgia Tech, Atlanta. Gargantua, November 26, 2013.

  2. Motivation: event detection in sensor networks [Tartakovsky, Veeravalli, 2004, 2008]. [Figure: array of 20 sensors on the uniform grid along the left and bottom edges of $[0,1]^2$. "+" marks the points of the uniform 20 × 20 grid $\Gamma$, "•" marks the sensor positions, overlaid on a contour plot of the response of the 6th sensor.]

  3. Suppose that $m$ sensors are deployed on a domain $G \subseteq \mathbb{R}^d$, and that we are given a grid $\Gamma = (\gamma_i)_{i=1,\dots,n} \subset G$. An event at a node $\gamma_i \in \Gamma$ produces the signal $s = r\,e[i] \in \mathbb{R}^n$ with known signature $e[i]$ and unknown real factor $r$. The signal is contaminated by a nuisance (a background signal) $v \in V$, where $V$ is a known convex compact set in $\mathbb{R}^n$. The observation $\omega = [\omega_1; \dots; \omega_m]$ of the array of $m$ sensors is a linear transformation of the signal, contaminated with random noise: $\omega \sim P_\mu$ is a random vector in $\mathbb{R}^m$ whose distribution is parameterized by $\mu \in \mathbb{R}^m$, where $\mu = A(s + v)$ and $A \in \mathbb{R}^{m \times n}$ is a known matrix of sensor responses.

  4. Objective: test the (null) hypothesis $H_0$ that no event happened against the alternative $H_1$ that exactly one event took place. We require that
     • $A e[i] \neq 0$ for all $i$;
     • under $H_1$, when an event occurs at a node $\gamma_i \in \Gamma$, we have $s = r\,e[i]$ with $|r| \ge \rho_i$ for some given $\rho_i > 0$.
     Problem $(D_\rho)$: given $\rho = [\rho_1; \dots; \rho_n] > 0$, decide between
     • hypothesis $H_0$: $s = 0$, and
     • alternative $H_1(\rho)$: $s = r\,e[i]$ for some $i \in \{1, \dots, n\}$ and $r$ with $|r| \ge \rho_i$.
     The risk of a test is the maximal probability of rejecting $H_0$ when it is true or of accepting $H_0$ when $H_1(\rho)$ is true. Our goal is, given $\epsilon \in (0,1)$, to construct a test with risk $\le \epsilon$ for as wide an alternative $H_1(\rho)$ as possible (i.e., with $\rho$ as small as possible).

  5. A particular case: signal detection in convolution [Yin, 1988, Wang, 1995, Muller 1999, Gustavson, 2000, Antoniadis, Gijbels, 2002, Goldenshluger et al., 2008,...]. We consider the model with observation $\omega = A(s + v) + \sigma\xi$, where $s, v \in \mathbb{R}^n$ and $\xi \sim N(0, I_m)$ with known $\sigma > 0$. Let $\mu = [\mu_1; \dots; \mu_m]$ be the vector of $m$ consecutive outputs of a discrete-time linear dynamical system with a given impulse response $\{g_k\}$, $k = 1, \dots, T$; i.e., $\mu \in \mathbb{R}^m$ is the convolution image of an $n$-dimensional "signal" $s$ (that is, $n = m + T - 1$), and $A$ is the Toeplitz $m \times n$ matrix of the described linear mapping $x \mapsto \mu$. [Figure: convolution kernel, $m = 100$, $n = 159$.] We want to detect the presence of the signal $s = r\,e[i]$, where $e[i]$, $i = 1, \dots, n$, are some given vectors in $\mathbb{R}^n$.
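The Toeplitz structure of $A$ on this slide can be illustrated with a minimal sketch. The indexing convention (output $j$ equals $\sum_k g_k\, s_{j+T-1-k}$ with 0-based indices) and the helper names `convolution_matrix` and `apply_matrix` are assumptions made for illustration; the slides do not fix them.

```python
def convolution_matrix(g, m):
    """Build the m x n Toeplitz matrix A, n = m + T - 1, such that
    (A s)[j] = sum_{k=0}^{T-1} g[k] * s[j + T - 1 - k].
    The alignment of the kernel inside each row is an assumed convention."""
    T = len(g)
    n = m + T - 1
    A = [[0.0] * n for _ in range(m)]
    for j in range(m):
        for k in range(T):
            A[j][j + T - 1 - k] = g[k]
    return A


def apply_matrix(A, s):
    """Plain matrix-vector product A s."""
    return [sum(a * x for a, x in zip(row, s)) for row in A]


# Toy kernel with T = 3 and m = 4 outputs, hence n = 6 inputs.
A = convolution_matrix([1.0, 2.0, 3.0], m=4)
impulse_response = apply_matrix(A, [0, 0, 1, 0, 0, 0])
```

Each row of `A` is the previous row shifted one position to the right, which is the Toeplitz property; feeding a unit impulse reproduces the kernel in the output.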

  6. Situation, formally. Given are
     • "Observation space" $(\Omega, P)$, where $\Omega$ is a Polish (complete separable metric) space and $P$ is a $\sigma$-finite $\sigma$-additive Borel measure on $\Omega$;
     • a family $\mathcal{P} = \{P_\mu(d\omega) = p_\mu(\omega) P(d\omega) : \mu \in \mathcal{M}\}$ of probability distributions on $\Omega$, where $\mu$ is the distribution's parameter running through the "parameter space" $\mathcal{M} \subset \mathbb{R}^m$, and $p_\mu$ is the density of the distribution $P_\mu$ w.r.t. the reference measure $P$;
     • "parameter spaces": two nonempty convex compact subsets $M_0 \subset \mathcal{M}$ and $M_1 \subset \mathcal{M}$.

  7. Assumptions. We assume that
     • $\mathcal{M} \subset \mathbb{R}^m$ is a convex set which coincides with its relative interior;
     • the distributions $P_\mu \in \mathcal{P}$ possess densities $p_\mu(\omega)$ w.r.t. the measure $P$ on the space $\Omega$, and $p_\mu(\omega)$ is continuous in $\mu \in \mathcal{M}$ and positive for all $\omega \in \Omega$;
     • we are given a finite-dimensional linear space $\mathcal{F}$ of continuous functions on $\Omega$, containing constants, such that $\ln(p_\mu(\cdot)/p_\nu(\cdot)) \in \mathcal{F}$ whenever $\mu, \nu \in \mathcal{M}$.

  8. Assumptions (continued).
     • For every $\phi \in \mathcal{F}$, the function $F_\phi(\mu) = \ln\left(\int_\Omega \exp\{\phi(\omega)\}\, p_\mu(\omega)\, P(d\omega)\right)$ is well defined and concave in $\mu \in \mathcal{M}$.
     We call the situation just described a good observation scheme.

  9. ... and goal. Given an observation scheme [observation space $(\Omega, P)$ and family of distributions $\{p_\mu(\cdot)\}_{\mu \in \mathcal{M}}$], "parameter spaces" $M_0$, $M_1$, and a random observation $\omega \sim p_\mu(\cdot)$ coming from some unknown $\mu$ known to belong either to $M_0$ (hypothesis $H_0$) or to $M_1$ (hypothesis $H_1$), decide between $H_0$ and $H_1$. Risk of the test: given a test (we interpret its value 0 as accepting $H_0$ and 1 as accepting $H_1$), we consider the quantities
     $\epsilon_0 = \sup_{\mu \in M_0} \mathrm{Prob}_{\omega \sim P_\mu}\{\text{test rejects } H_0\}$,
     $\epsilon_1 = \sup_{\mu \in M_1} \mathrm{Prob}_{\omega \sim P_\mu}\{\text{test rejects } H_1\}$.
     We say that the risk of the test is $\le \epsilon$ if both error probabilities are $\le \epsilon$.

  10. Example: Gaussian case. Given a noisy observation $\omega = \mu + \xi$, $\xi \sim N(0, I)$, make conclusions about $\mu$. The observation scheme is
     • $(\Omega, P)$: $\mathbb{R}^m$ with the Lebesgue measure;
     • $p_\mu(\omega) = N(\mu, I)$, $\mu \in \mathcal{M} := \mathbb{R}^m$;
     • $\mathcal{F} = \{\phi(\omega) = a^T\omega + b : a \in \mathbb{R}^m, b \in \mathbb{R}\}$, and $\ln\left(\int_{\mathbb{R}^m} e^{a^T\omega + b}\, p_\mu(\omega)\, d\omega\right) = b + a^T\mu + \frac{a^T a}{2}$ is concave (in fact, affine) in $\mu$.
     The Gaussian observation scheme is good!
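As a sanity check of the Gaussian formula above, the following sketch compares the closed form $b + a\mu + a^2/2$ with a numerical integral in dimension $m = 1$. The trapezoid rule, the integration window, and the function names are illustrative choices, not part of the slides.

```python
import math


def gaussian_log_mgf_numeric(a, b, mu, half_width=12.0, step=0.01):
    """Trapezoid-rule approximation of ln E exp(a*w + b) for w ~ N(mu, 1).
    The window [mu - half_width, mu + half_width] is an assumed truncation."""
    n_steps = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n_steps + 1):
        w = mu - half_width + i * step
        density = math.exp(-0.5 * (w - mu) ** 2) / math.sqrt(2 * math.pi)
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoid end weights
        total += weight * math.exp(a * w + b) * density
    return math.log(total * step)


def gaussian_log_mgf_closed_form(a, b, mu):
    """Closed form from the slide, specialised to m = 1."""
    return b + a * mu + 0.5 * a * a


numeric = gaussian_log_mgf_numeric(0.7, -0.3, 1.2)
closed = gaussian_log_mgf_closed_form(0.7, -0.3, 1.2)
```

The two values should agree to many digits, and the closed form makes it evident that the dependence on `mu` is affine.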

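  11. Example: Poisson case. Given $m$ realizations of independent Poisson random variables $\omega_i \sim \mathrm{Poisson}(\mu_i)$ with parameters $\mu_i$, make conclusions about $\mu$. The observation scheme is
     • $(\Omega, P)$: $\mathbb{Z}^m_+$ with the counting measure;
     • $p_\mu(\omega) = \frac{\mu^\omega}{\omega!} e^{-\sum_i \mu_i}$ (with $\mu^\omega/\omega! = \prod_i \mu_i^{\omega_i}/\omega_i!$), $\mu \in \mathcal{M} = \mathrm{int}\, \mathbb{R}^m_+$;
     • $\mathcal{F} = \{\phi(\omega) = a^T\omega + b : a \in \mathbb{R}^m, b \in \mathbb{R}\}$, and $\ln\left(\sum_{\omega \in \mathbb{Z}^m_+} e^{a^T\omega + b}\, p_\mu(\omega)\right) = b + \sum_{i=1}^m [e^{a_i} - 1]\mu_i$ is concave (in fact, linear) in $\mu$.
     The Poisson observation scheme is good!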
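The Poisson identity can be checked numerically in the same spirit. This sketch uses the fact that the sum over $\mathbb{Z}^m_+$ factorizes over coordinates because the $\omega_i$ are independent; the truncation at 200 terms and the function names are assumptions made for illustration.

```python
import math


def poisson_log_mgf_numeric(a, b, mu, terms=200):
    """ln sum_w exp(a^T w + b) p_mu(w), computed coordinate by coordinate
    with each series truncated after `terms` terms (assumed sufficient)."""
    log_sum = b
    for a_i, mu_i in zip(a, mu):
        term = math.exp(-mu_i)  # k = 0 term: e^{-mu_i}
        total = term
        for k in range(1, terms):
            # Ratio of consecutive terms of sum_k e^{a_i k} mu_i^k e^{-mu_i} / k!
            term *= math.exp(a_i) * mu_i / k
            total += term
        log_sum += math.log(total)
    return log_sum


def poisson_log_mgf_closed_form(a, b, mu):
    """Closed form from the slide: b + sum_i (e^{a_i} - 1) mu_i."""
    return b + sum((math.exp(a_i) - 1.0) * mu_i for a_i, mu_i in zip(a, mu))


a_vec, b_val, mu_vec = [0.4, -1.0, 0.1], 0.2, [2.5, 0.7, 4.0]
numeric = poisson_log_mgf_numeric(a_vec, b_val, mu_vec)
closed = poisson_log_mgf_closed_form(a_vec, b_val, mu_vec)
```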

  12. Example: discrete case. Given a realization of a random variable $\omega$ taking values $1, \dots, m$ with probabilities $\mu_i := \mathrm{Prob}\{\omega = i\}$, make conclusions about $\mu$. The observation scheme is
     • $(\Omega, P)$: $\{1, \dots, m\}$ with the counting measure;
     • $p_\mu(\omega) = \mu_\omega$, $\mu \in \mathcal{M} = \{\mu \in \mathbb{R}^m : \mu > 0, \sum_{\omega=1}^m \mu_\omega = 1\}$;
     • $\mathcal{F} = \mathbb{R}(\Omega) = \mathbb{R}^m$, and $\ln\left(\sum_{\omega \in \Omega} e^{\phi(\omega)}\, p_\mu(\omega)\right) = \ln\left(\sum_{\omega=1}^m e^{\phi(\omega)} \mu_\omega\right)$ is concave in $\mu$.
     The discrete observation scheme is good!
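In the discrete case the function is the logarithm of a linear function of $\mu$, which is why it is concave. The sketch below only spot-checks midpoint concavity on one pair of distributions; the numbers and the name `log_moment` are made up for illustration and prove nothing in general.

```python
import math


def log_moment(phi, mu):
    """ln sum_w exp(phi[w]) * mu[w] for a distribution mu on {1, ..., m}."""
    return math.log(sum(math.exp(p) * m for p, m in zip(phi, mu)))


phi = [0.5, -1.0, 2.0]   # an arbitrary detector, i.e. a vector in R^m
mu = [0.2, 0.5, 0.3]     # two arbitrary points of the open simplex
nu = [0.6, 0.1, 0.3]
mid = [0.5 * (x + y) for x, y in zip(mu, nu)]

# Midpoint concavity: F(mid) >= (F(mu) + F(nu)) / 2, so the gap is nonnegative.
gap = log_moment(phi, mid) - 0.5 * (log_moment(phi, mu) + log_moment(phi, nu))
```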

  13. Simple test. Simple (Cramer's) test: a simple test is specified by a detector $\phi(\cdot) \in \mathcal{F}$; given the observation $\omega$, it accepts $H_0$ if $\phi(\omega) \ge 0$ and accepts $H_1$ otherwise. We can easily bound the risk of a simple test $\phi$: for $\mu \in M_0$ we have
     $\mathrm{Prob}_{\omega \sim P_\mu}(\phi(\omega) < 0) \le E_{\omega \sim P_\mu}(e^{-\phi(\omega)}) = \int_\Omega e^{-\phi(\omega)}\, p_\mu(\omega)\, P(d\omega)$,
     and for $\nu \in M_1$,
     $\mathrm{Prob}_{\omega \sim P_\nu}(\phi(\omega) \ge 0) \le E_{\omega \sim P_\nu}(e^{\phi(\omega)}) = \int_\Omega e^{\phi(\omega)}\, p_\nu(\omega)\, P(d\omega)$.
     We associate with $\phi(\cdot) \in \mathcal{F}$ and $[\mu; \nu] \in M_0 \times M_1$ the aggregate
     $\Phi(\phi, [\mu; \nu]) = \ln\left(\int_\Omega e^{-\phi(\omega)}\, p_\mu(\omega)\, P(d\omega)\right) + \ln\left(\int_\Omega e^{\phi(\omega)}\, p_\nu(\omega)\, P(d\omega)\right)$.
     Key observation: in a good observation scheme, $\Phi(\phi, [\mu; \nu])$ is continuous on its domain, convex in $\phi(\cdot) \in \mathcal{F}$, and concave in $[\mu; \nu] \in M_0 \times M_1$.
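The first risk bound on this slide can be illustrated in the one-dimensional Gaussian scheme, where both sides have closed forms. This is a sketch under assumptions: an affine detector $\phi(\omega) = a\omega + b$ with $a > 0$, unit variance, and hypothetical function names.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def false_alarm_probability(a, b, mu):
    """Exact Prob{a*w + b < 0} for w ~ N(mu, 1); requires a > 0."""
    return normal_cdf(-b / a - mu)


def exponential_bound(a, b, mu):
    """E exp(-(a*w + b)) for w ~ N(mu, 1), the right-hand side of the bound."""
    return math.exp(-b - a * mu + 0.5 * a * a)


# A few (a, b, mu) triples chosen arbitrarily for illustration.
cases = [(1.0, -0.5, 2.0), (0.3, 0.0, 1.0), (2.0, -1.0, 0.5)]
bound_holds = [false_alarm_probability(a, b, mu) <= exponential_bound(a, b, mu)
               for a, b, mu in cases]
```

The bound can be loose for a poorly chosen detector, which is one way to read the role of minimizing $\Phi$ over $\phi$.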

  14. Main result. Theorem 1 (i) $\Phi(\phi, [\mu; \nu])$ possesses a saddle point (min in $\phi$, max in $[\mu; \nu]$) $(\phi_*(\cdot), [\mu_*; \nu_*])$ on $\mathcal{F} \times (M_0 \times M_1)$ with the saddle value
     $\min_{\phi \in \mathcal{F}} \max_{[\mu; \nu] \in M_0 \times M_1} \Phi(\phi, [\mu; \nu]) := 2\ln(\varepsilon_*)$.
     The risk of the simple test associated with the detector $\phi_*$ on the composite hypotheses $H_{M_0}$, $H_{M_1}$ is $\le \varepsilon_*$.

  15. Main result (continued). Theorem 1 (ii) The detector $\phi_*$ is readily given by the $[\mu; \nu]$-component $[\mu_*; \nu_*]$ of the associated saddle point of $\Phi$; specifically, $\phi_*(\cdot) = \frac{1}{2} \ln\left[p_{\mu_*}(\cdot) / p_{\nu_*}(\cdot)\right]$.
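For the Gaussian scheme, part (ii) can be made concrete with a short sketch. It assumes the saddle-point component $[\mu_*; \nu_*]$ is already known (for instance, when $M_0$ and $M_1$ are single points). Under that assumption, substituting the Gaussian densities gives the affine detector $\phi_*(\omega) = \frac{1}{2}(\mu_* - \nu_*)^T\omega - \frac{1}{4}(\|\mu_*\|^2 - \|\nu_*\|^2)$, and the Gaussian formula for the aggregate $\Phi$ gives $\varepsilon_* = \exp(-\|\mu_* - \nu_*\|^2/8)$. The function name is hypothetical.

```python
import math


def gaussian_detector(mu_star, nu_star):
    """Return (phi, eps) for the Gaussian scheme, given an assumed saddle-point
    component [mu_star; nu_star]: phi = 0.5 * ln(p_mu_star / p_nu_star)."""
    a = [0.5 * (m - n) for m, n in zip(mu_star, nu_star)]
    b = -0.25 * (sum(m * m for m in mu_star) - sum(n * n for n in nu_star))
    dist_sq = sum((m - n) ** 2 for m, n in zip(mu_star, nu_star))
    eps = math.exp(-dist_sq / 8.0)  # from 2 ln(eps) = -dist_sq / 4

    def phi(w):
        # Affine detector a^T w + b; phi(w) >= 0 means "accept H_0".
        return sum(a_i * w_i for a_i, w_i in zip(a, w)) + b

    return phi, eps


phi, eps = gaussian_detector([1.0, 0.0], [-1.0, 0.0])
```

With these inputs the detector is positive at `mu_star`, negative at `nu_star`, and zero on the hyperplane equidistant from the two points, so the test amounts to deciding which of the two points the observation is closer to.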
