Gaussian Model Selection with Unknown Variance
Y. Baraud, C. Giraud and S. Huet
Université de Nice - Sophia Antipolis, INRA Jouy-en-Josas
Luminy, November 13-17, 2006

The statistical setting

The statistical model
Observations: $Y_i = \mu_i + \sigma \varepsilon_i$, $i = 1, \dots, n$
• $\mu = (\mu_1, \dots, \mu_n)' \in \mathbb{R}^n$ and $\sigma > 0$ are unknown
• $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. standard Gaussian

Collection of models / estimators
• $\mathcal{S} = \{S_m, m \in \mathcal{M}\}$ is a countable collection of linear subspaces of $\mathbb{R}^n$ (models)
• $\hat\mu_m$ is the least-squares estimator of $\mu$ on $S_m$

Example: change-point detection
• $\mu_i = f(x_i)$ with $f : [0, 1] \to \mathbb{R}$ piecewise constant.
• $\mathcal{M}$ is the set of increasing sequences $m = (t_0, \dots, t_q)$ with $q \in \{1, \dots, p\}$, $t_0 = 0$, $t_q = 1$, and $\{t_1, \dots, t_{q-1}\} \subset \{x_1, \dots, x_n\}$.
• Models: $S_m = \{(g(x_1), \dots, g(x_n))', \ g \in \mathcal{F}_m\}$, where
\[ \mathcal{F}_{(t_0, \dots, t_q)} = \Big\{ g = \sum_{j=1}^{q} a_j \mathbf{1}_{[t_{j-1}, t_j[} \ \text{with} \ (a_1, \dots, a_q) \in \mathbb{R}^q \Big\}. \]
• No residual sum of squares is available to estimate the variance.
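
For the change-point models, the least-squares estimator on $S_m$ reduces to averaging $Y$ over each interval $[t_{j-1}, t_j[$. The Python sketch below illustrates this; the function names, the design points $x_i = (i-1)/n$ and the toy data are assumptions made for illustration and do not come from the talk.

```python
def segment_means_fit(x, y, m):
    """Least-squares projection of y onto S_m for m = (t_0, ..., t_q).

    On each half-open interval [t_{j-1}, t_j) the fitted value is the mean
    of the observations whose design point falls in that interval.
    """
    fitted = [0.0] * len(y)
    for left, right in zip(m[:-1], m[1:]):
        idx = [i for i, xi in enumerate(x) if left <= xi < right]
        if idx:
            mean = sum(y[i] for i in idx) / len(idx)
            for i in idx:
                fitted[i] = mean
    return fitted


def residual_sum_of_squares(y, fitted):
    """Squared Euclidean distance between y and its projection."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Toy data (hypothetical): n = 8 design points in [0, 1), one change at 0.5.
n = 8
x = [i / n for i in range(n)]
y = [1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 12.0, 12.0]
m = (0.0, 0.5, 1.0)  # q = 2 segments, so the model has dimension 2

fitted = segment_means_fit(x, y, m)
rss = residual_sum_of_squares(y, fitted)
```

When every interval contains at least one design point, the dimension of $S_m$ equals the number $q$ of segments.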
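
Risk on a single model
Euclidean risk on $S_m$:
\[ \mathbb{E}\big[\|\mu - \hat\mu_m\|^2\big] = \underbrace{\|\mu - \mu_m\|^2}_{\text{bias}} + \underbrace{D_m \sigma^2}_{\text{variance}}, \]
where $\mu_m$ is the orthogonal projection of $\mu$ onto $S_m$ and $D_m = \dim(S_m)$.
Ideal: estimate $\mu$ with $\hat\mu_{m^*}$, where $m^*$ minimizes $m \mapsto \mathbb{E}\big[\|\mu - \hat\mu_m\|^2\big]$.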
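
The bias-variance identity for the risk on $S_m$ follows from a short projection argument, sketched here. The projector notation $\Pi_m$ is introduced only for this derivation and is not used in the talk.

```latex
% Assumed notation: \Pi_m is the orthogonal projector onto S_m,
% so that \hat\mu_m = \Pi_m Y and \mu_m = \Pi_m \mu.
\begin{align*}
\mu - \hat\mu_m &= (\mu - \mu_m) - \sigma\,\Pi_m \varepsilon, \\
\|\mu - \hat\mu_m\|^2 &= \|\mu - \mu_m\|^2 + \sigma^2 \|\Pi_m \varepsilon\|^2
  \quad \text{(the two terms are orthogonal)}, \\
\mathbb{E}\big[\|\Pi_m \varepsilon\|^2\big] &= \operatorname{tr}(\Pi_m) = \dim(S_m) = D_m .
\end{align*}
```

Taking expectations in the second line gives the squared bias plus $D_m \sigma^2$.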

Model selection
Selection rule: we set $D_m = \dim(S_m)$ and select $\hat m$ minimizing
\[ \mathrm{Crit}_L(m) = \|Y - \hat\mu_m\|^2 \Big(1 + \frac{\mathrm{pen}(m)}{n - D_m}\Big) \tag{1} \]
or
\[ \mathrm{Crit}_K(m) = \frac{n}{2} \log\Big(\frac{\|Y - \hat\mu_m\|^2}{n}\Big) + \frac{1}{2}\,\mathrm{pen}'(m). \tag{2} \]
Some classical penalties:
Criterion   Penalty
FPE         $\mathrm{pen}(m) = 2 D_m$
AIC         $\mathrm{pen}'(m) = 2 D_m$
BIC         $\mathrm{pen}'(m) = D_m \log n$
AMDL        $\mathrm{pen}'(m) = 3 D_m \log n$
Criteria (1) and (2) are equivalent with
\[ \mathrm{pen}'(m) = n \log\Big(1 + \frac{\mathrm{pen}(m)}{n - D_m}\Big). \]
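
The equivalence can be checked numerically: with $\mathrm{pen}'(m) = n \log(1 + \mathrm{pen}(m)/(n - D_m))$, criterion (2) equals $(n/2) \log(\mathrm{Crit}_L(m)/n)$, an increasing function of criterion (1), so both select the same model. The sketch below uses hypothetical residual sums of squares and the FPE penalty; the function names are illustrative choices.

```python
import math


def crit_L(rss, n, D, pen):
    """Criterion (1): RSS inflated by the factor 1 + pen / (n - D)."""
    return rss * (1.0 + pen / (n - D))


def crit_K(rss, n, pen_prime):
    """Criterion (2): (n / 2) * log(RSS / n) + pen' / 2."""
    return 0.5 * n * math.log(rss / n) + 0.5 * pen_prime


def pen_to_pen_prime(pen, n, D):
    """Penalty pen' for which criterion (2) ranks models like criterion (1)."""
    return n * math.log(1.0 + pen / (n - D))


# Hypothetical (D_m, RSS) pairs for four models, with n = 50 observations.
n = 50
models = [(1, 120.0), (2, 80.0), (5, 60.0), (10, 55.0)]

# FPE corresponds to pen(m) = 2 * D_m in criterion (1).
values_L = [crit_L(rss, n, D, 2 * D) for D, rss in models]
values_K = [crit_K(rss, n, pen_to_pen_prime(2 * D, n, D)) for D, rss in models]

best_L = min(range(len(models)), key=values_L.__getitem__)
best_K = min(range(len(models)), key=values_K.__getitem__)
```

On these toy values both criteria pick the same index, the model of dimension 5.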

Objectives
• For classical criteria: to analyze the Euclidean risk of $\hat\mu_{\hat m}$ with regard to the complexity of the family of models $\mathcal{S}$, and to compare this risk to $\inf_{m \in \mathcal{M}} \mathbb{E}\big[\|\mu - \hat\mu_m\|^2\big]$.
• To propose penalties versatile enough to take into account the complexity of $\mathcal{S}$ and the sample size.

Complexity: we say that $\mathcal{S}$ has an index of complexity $(M, a)$ if, for all $D \ge 1$,
\[ \mathrm{card}\{m \in \mathcal{M}, \ D_m = D\} \le M e^{aD}. \]
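
Theorem 1: Performances of classical penalties
Let $K > 1$ and let $\mathcal{S}$ have complexity $(M, a) \in \mathbb{R}_+^2$. If for all $m \in \mathcal{M}$,
\[ D_m \le D_{\max}(K, M, a) \ \text{(explicit)} \quad \text{and} \quad \mathrm{pen}(m) \ge K^2 \phi^{-1}(a) D_m, \]
with $\phi(x) = (x - 1 - \log x)/2$ for $x \ge 1$, then
\[ \mathbb{E}\big[\|\mu - \hat\mu_{\hat m}\|^2\big] \le \frac{K}{K - 1} \inf_{m \in \mathcal{M}} \Big\{ \|\mu - \mu_m\|^2 \Big(1 + \frac{\mathrm{pen}(m)}{n - D_m}\Big) + \mathrm{pen}(m)\,\sigma^2 \Big\} + R, \]
where
\[ R = \frac{K \sigma^2}{K - 1} \Big[ K^2 \phi^{-1}(a) + 2K + \frac{8 K M e^{-a}}{\big(e^{\phi(K)/2} - 1\big)^2} \Big]. \]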

Performances of $\hat\mu_{\hat m}$
• Under the above hypotheses, if $\mathrm{pen}(m) = K \phi^{-1}(a) D_m$ with $K > 1$, then
\[ \mathbb{E}\big[\|\mu - \hat\mu_{\hat m}\|^2\big] \le c(K, M)\, \phi^{-1}(a) \Big[ \inf_{m \in \mathcal{M}} \mathbb{E}\big[\|\mu - \hat\mu_m\|^2\big] + \sigma^2 \Big]. \]
• The condition "$\mathrm{pen}(m) \ge K^2 \phi^{-1}(a) D_m$ with $K > 1$" is sharp (at least when $a = 0$ and $a = \log n$).
Roughly, for large values of $n$ this imposes the restrictions:
Criteria     FPE        AIC        BIC                 AMDL
Complexity   a < 0.15   a < 0.15   a < (1/2) log(n)    a < (3/2) log(n)
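
The function $\phi$ has no closed-form inverse, but it is increasing on $[1, +\infty)$, so $\phi^{-1}$ can be obtained by bisection. The sketch below is one possible implementation (the function names and tolerance are assumptions). It also suggests a reading of the FPE column of the table: with $\mathrm{pen}(m) = 2 D_m$, the condition $\mathrm{pen}(m) \ge K^2 \phi^{-1}(a) D_m$ for some $K > 1$ amounts to $\phi^{-1}(a) < 2$, that is $a < \phi(2) = (1 - \log 2)/2 \approx 0.153$.

```python
import math


def phi(x):
    """phi(x) = (x - 1 - log x) / 2, defined for x >= 1."""
    return (x - 1.0 - math.log(x)) / 2.0


def phi_inverse(a, tol=1e-12):
    """Inverse of phi on [1, +inf), computed by bisection.

    phi is increasing on [1, +inf) with phi(1) = 0, so for a >= 0 the
    equation phi(x) = a has a unique solution x >= 1.
    """
    lo, hi = 1.0, 2.0
    while phi(hi) < a:  # enlarge the bracket until it contains the solution
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if phi(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Complexity threshold below which phi^{-1}(a) < 2 (FPE-type penalty 2 * D_m).
threshold_fpe = phi(2.0)
```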

Dkhi function
For $x \ge 0$, we define
\[ \mathrm{Dkhi}[D, N, x] = \frac{1}{\mathbb{E}(X_D)} \times \mathbb{E}\Big[\Big(X_D - x\,\frac{X_N}{N}\Big)_+\Big] \in \,]0, 1], \]
where $X_D$ and $X_N$ are two independent $\chi^2(D)$ and $\chi^2(N)$ random variables.
Computation: $x \mapsto \mathrm{Dkhi}[D, N, x]$ is decreasing and
\[ \mathrm{Dkhi}[D, N, x] = \mathbb{P}\Big(F_{D+2,N} \ge \frac{x}{D + 2}\Big) - \frac{x}{D}\, \mathbb{P}\Big(F_{D,N+2} \ge \frac{(N + 2)\,x}{D N}\Big), \]
where $F_{D,N}$ is a Fisher random variable with $D$ and $N$ degrees of freedom.
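
As a rough numerical check, the definition of Dkhi and its expression through Fisher tail probabilities can both be estimated by simulation. The sketch below uses only the Python standard library and Monte Carlo sampling, so its outputs are approximations; the function names, sample size and seeds are arbitrary choices. In practice the tail probabilities would be taken from a statistical library rather than simulated.

```python
import random


def dkhi_mc(D, N, xs, n_draws=50_000, seed=0):
    """Monte Carlo estimate of Dkhi[D, N, x] for each x in xs.

    Uses the definition E[(X_D - x * X_N / N)_+] / E[X_D], with E[X_D] = D.
    The same draws are reused for every x, so the estimates are
    non-increasing in x by construction.
    """
    rng = random.Random(seed)
    # A chi-square variable with k degrees of freedom is Gamma(k / 2, scale 2).
    draws = [(rng.gammavariate(D / 2.0, 2.0), rng.gammavariate(N / 2.0, 2.0))
             for _ in range(n_draws)]
    estimates = []
    for x in xs:
        total = sum(max(xd - x * xn / N, 0.0) for xd, xn in draws)
        estimates.append(total / (n_draws * D))
    return estimates


def dkhi_via_fisher_mc(D, N, x, n_draws=50_000, seed=1):
    """Same quantity from the two Fisher tail probabilities, by simulation."""
    rng = random.Random(seed)

    def chi2(k):
        return rng.gammavariate(k / 2.0, 2.0)

    def fisher(a, b):
        return (chi2(a) / a) / (chi2(b) / b)

    p1 = sum(fisher(D + 2, N) >= x / (D + 2) for _ in range(n_draws)) / n_draws
    p2 = sum(fisher(D, N + 2) >= (N + 2) * x / (D * N)
             for _ in range(n_draws)) / n_draws
    return p1 - (x / D) * p2


grid = [0.0, 1.0, 2.0, 5.0]
estimates = dkhi_mc(3, 10, grid)
via_fisher = dkhi_via_fisher_mc(3, 10, 2.0)
```

Up to simulation error, `estimates[0]` should be close to 1 (the value at $x = 0$) and `estimates[2]` close to `via_fisher`.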
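
Theorem 2: a general risk bound
Let pen be an arbitrary non-negative penalty function and assume that $N_m = n - D_m \ge 2$ for all $m \in \mathcal{M}$. If $\hat m$ exists a.s., then for any $K > 1$,
\[ \mathbb{E}\big[\|\mu - \hat\mu_{\hat m}\|^2\big] \le \frac{K}{K - 1} \inf_{m \in \mathcal{M}} \Big\{ \|\mu - \mu_m\|^2 \Big(1 + \frac{\mathrm{pen}(m)}{N_m}\Big) + \mathrm{pen}(m)\,\sigma^2 \Big\} + \Sigma, \tag{3} \]
where
\[ \Sigma = \frac{K^2 \sigma^2}{K - 1} \sum_{m \in \mathcal{M}} (D_m + 1)\, \mathrm{Dkhi}\Big[D_m + 1,\ N_m - 1,\ \frac{N_m - 1}{K N_m}\,\mathrm{pen}(m)\Big]. \]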
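
Minimal penalties
• Choose $K > 1$ and $\mathcal{L} = \{L_m, m \in \mathcal{M}\}$ non-negative numbers (weights) such that
\[ \Sigma' = \sum_{m \in \mathcal{M}} (D_m + 1)\, e^{-L_m} < +\infty. \]
• For any $m \in \mathcal{M}$ set
\[ \mathrm{pen}^{L}_{K,\mathcal{L}}(m) = K\, \frac{N_m}{N_m - 1}\, \mathrm{Dkhi}^{-1}\big[D_m + 1,\ N_m - 1,\ e^{-L_m}\big], \]
where $\mathrm{Dkhi}^{-1}[D, N, \cdot]$ denotes the inverse of $x \mapsto \mathrm{Dkhi}[D, N, x]$.
• When $L_m \vee D_m \le \kappa n$ with $\kappa < 1$:
\[ \mathrm{pen}^{L}_{K,\mathcal{L}}(m) \le C(K, \kappa)\,(L_m \vee D_m). \]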
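
Since $x \mapsto \mathrm{Dkhi}[D, N, x]$ is decreasing, its inverse can be computed by bisection. The sketch below does this on a Monte Carlo approximation of Dkhi built from fixed draws, using only the standard library. The function names, the default $K = 1.1$, the sample size and the example values are all assumptions, and the resulting penalty is only as accurate as the simulation; a production version would evaluate Dkhi through Fisher tail probabilities from a statistical library.

```python
import math
import random


def make_dkhi(D, N, n_draws=20_000, seed=0):
    """Return x -> Monte Carlo estimate of Dkhi[D, N, x] on fixed draws."""
    rng = random.Random(seed)
    # Each pair holds a chi-square(D) draw and a chi-square(N) draw over N.
    draws = [(rng.gammavariate(D / 2.0, 2.0),
              rng.gammavariate(N / 2.0, 2.0) / N)
             for _ in range(n_draws)]

    def dkhi(x):
        return sum(max(xd - x * r, 0.0) for xd, r in draws) / (n_draws * D)

    return dkhi


def dkhi_inverse(dkhi, q, tol=1e-9):
    """Solve dkhi(x) = q for x >= 0 by bisection (dkhi is non-increasing)."""
    if dkhi(0.0) <= q:
        return 0.0
    lo, hi = 0.0, 1.0
    while dkhi(hi) > q:  # enlarge the bracket until dkhi(hi) <= q
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dkhi(mid) > q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def minimal_penalty(n, D_m, L_m, K=1.1):
    """K * N_m / (N_m - 1) * Dkhi^{-1}[D_m + 1, N_m - 1, exp(-L_m)]."""
    N_m = n - D_m
    dkhi = make_dkhi(D_m + 1, N_m - 1)
    return K * N_m / (N_m - 1) * dkhi_inverse(dkhi, math.exp(-L_m))


# Hypothetical example: n = 50 observations, a model of dimension 3, L_m = 3.
pen = minimal_penalty(n=50, D_m=3, L_m=3.0)
```

With this choice of penalty, each Dkhi term in the sum $\Sigma$ of Theorem 2 equals $e^{-L_m}$, which is how the weights control the remainder term.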

How to choose the $L_m$?
• When $\mathcal{S}$ has a complexity $(M, a)$: a possible choice is $L_m = a D_m + 3 \log(D_m + 1)$. Then
\[ \Sigma' = \sum_{m \in \mathcal{M}} (D_m + 1)\, e^{-L_m} \le M \sum_{D \ge 1} D^{-2}. \]
• For change-point detection: we choose
\[ L_m = L(|m|) = \log \binom{n}{|m| - 2} + 2 \log(|m|), \]
for which
\[ \Sigma' = \sum_{D=2}^{p+1} \binom{n}{D - 2}\, D\, e^{-L(D)} = \sum_{D=2}^{p+1} \frac{1}{D} \le \log(p + 1). \]
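
The bound on $\Sigma'$ for change-point detection is easy to verify numerically. The sketch below evaluates the sum term by term, working on the log scale to avoid overflow from large binomial coefficients; the function names and the values $n = 100$, $p = 10$ are illustrative assumptions.

```python
import math


def change_point_weight(n, size):
    """L(|m|) = log C(n, |m| - 2) + 2 log |m| for a sequence m of length |m|."""
    return math.log(math.comb(n, size - 2)) + 2.0 * math.log(size)


def sigma_prime(n, p):
    """Sum over |m| = 2, ..., p + 1 of C(n, |m| - 2) * |m| * exp(-L(|m|))."""
    total = 0.0
    for size in range(2, p + 2):
        log_term = (math.log(math.comb(n, size - 2)) + math.log(size)
                    - change_point_weight(n, size))
        total += math.exp(log_term)
    return total


n, p = 100, 10
value = sigma_prime(n, p)
harmonic_tail = sum(1.0 / size for size in range(2, p + 2))
bound = math.log(p + 1)
```

Here `value` coincides with `harmonic_tail` up to rounding, and both lie below `bound`.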