Fast adaptive estimation of log-additive exponential models in Kullback-Leibler divergence

  1. Fast adaptive estimation of log-additive exponential models in Kullback-Leibler divergence. Colloque Jeunes Probabilistes et Statisticiens, 18/04/2016. Richard Fischer (EDF R&D MRI, CERMICS, LAMA). Supervisors: Cristina Butucea (LAMA), Jean-François Delmas (CERMICS), Anne Dutfoy (EDF R&D MRI).

  2. Summary: (1) Theoretic results; (2) Simulation study.

  4. Estimation problem

     Suppose that we have an i.i.d. sample $\mathbb{X}^n = (X^1, X^2, \ldots, X^n)$ from a $d$-dimensional distribution whose density has a product form on $\triangle = \{x = (x_1, \ldots, x_d) \in \mathbb{R}^d : 0 \le x_1 \le x_2 \le \ldots \le x_d \le 1\}$:

     $$f(x) = \prod_{i=1}^{d} p_i(x_i)\, \mathbf{1}_\triangle(x) = \exp\Big(\sum_{i=1}^{d} \ell^0_i(x_i) - a_0\Big)\, \mathbf{1}_\triangle(x),$$

     such that $\int_{[0,1]} \ell^0_i\, q_i\, dx = 0$, with $q_i$ the $i$-th marginal of the Lebesgue measure on $\triangle$ and $a_0$ a normalizing constant.

     Suppose that for all $1 \le i \le d$, $\ell^0_i$ belongs to a Sobolev space $W^2_{r_i}(q_i)$ with $r_i \in \mathbb{N}^*$ unknown:

     $$W^2_{r_i}(q_i) = \big\{ h \in L^2(q_i);\ h^{(r_i - 1)} \text{ is abs. cont. and } h^{(r_i)} \in L^2(q_i) \big\}.$$

     The product structure of the density suggests a log-additive model, which reduces the $d$-variate problem to $d$ univariate problems.
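
The simplest density of this product form is the uniform density on $\triangle$, for which $\ell^0_i = 0$ and $a_0 = -\log(d!)$. The sketch below is an illustration, not part of the talk: it evaluates the marginals $q_i$ of the Lebesgue measure on $\triangle$ in closed form, checks numerically that each has total mass $1/d!$, and draws a uniform point on $\triangle$ by sorting independent uniforms. The function names are hypothetical.

```python
import math
import random

def q_marginal(i, d, t):
    """Density at t of the i-th marginal of the Lebesgue measure on
    {0 <= x_1 <= ... <= x_d <= 1} (total mass 1/d!, not normalized)."""
    below = t ** (i - 1) / math.factorial(i - 1)        # volume of {x_1 <= ... <= x_{i-1} <= t}
    above = (1 - t) ** (d - i) / math.factorial(d - i)  # volume of {t <= x_{i+1} <= ... <= x_d <= 1}
    return below * above

def integrate(g, n_nodes=2000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n_nodes
    return h * sum(g((j + 0.5) * h) for j in range(n_nodes))

def sample_uniform_on_simplex(d, rng):
    """Sorting d independent uniforms gives a uniform point on the ordered simplex."""
    return sorted(rng.random() for _ in range(d))

d = 3
masses = [integrate(lambda t, i=i: q_marginal(i, d, t)) for i in range(1, d + 1)]
rng = random.Random(0)
point = sample_uniform_on_simplex(d, rng)
```

Each entry of `masses` should be close to $1/3! = 1/6$, the Lebesgue volume of $\triangle$ for $d = 3$.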

  5. Log-Additive Exponential Series Estimator

     Log-additive exponential family. For $\theta = (\theta_{i,k};\ 1 \le i \le d,\ 1 \le k \le m_i)$:

     $$f_\theta(x) = \exp\Big(\sum_{i=1}^{d} \sum_{k=1}^{m_i} \theta_{i,k}\, \varphi_{i,k}(x_i) - \psi(\theta)\Big)\, \mathbf{1}_\triangle(x).$$

     We require a family of functions $(\varphi_{i,k}(x_i);\ 1 \le i \le d,\ k \in \mathbb{N})$ adapted to $\triangle$ (“orthonormality” w.r.t. the Lebesgue measure on $\triangle$).

     Basis functions. For $1 \le i \le d$, $k \in \mathbb{N}$, we define for $t \in I$:

     $$\varphi_{i,k}(t) = \rho_{i,k}\, P_k^{(d-i,\, i-1)}(2t - 1),$$

     where $P_k^{(d-i,\, i-1)}$ is the $k$-th degree Jacobi polynomial and $\rho_{i,k}$ a constant.
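
To see why the Jacobi parameters $(d-i, i-1)$ fit $\triangle$: the $i$-th marginal of the Lebesgue measure on $\triangle$ is proportional to $t^{i-1}(1-t)^{d-i}$, which is the Jacobi weight $(1-x)^{d-i}(1+x)^{i-1}$ after the change of variable $x = 2t - 1$. The sketch below checks this orthogonality numerically. It assumes $I = [0, 1]$ and picks $\rho_{i,k}$ so that the basis function has unit norm for this unnormalized weight; the talk does not spell out the constant, so that choice and all function names are assumptions.

```python
import math

def jacobi_shifted(k, alpha, beta, t):
    """P_k^{(alpha, beta)}(2t - 1) for non-negative integer alpha, beta,
    using the explicit binomial sum with (x - 1)/2 = t - 1 and (x + 1)/2 = t."""
    return sum(
        math.comb(k + alpha, k - s) * math.comb(k + beta, s)
        * (t - 1) ** s * t ** (k - s)
        for s in range(k + 1)
    )

def weight(i, d, t):
    """Unnormalized i-th marginal of the Lebesgue measure on the ordered simplex."""
    return t ** (i - 1) * (1 - t) ** (d - i)

def inner(i, d, j, k, n_nodes=4000):
    """Midpoint-rule approximation of the weighted inner product of P_j and P_k."""
    alpha, beta = d - i, i - 1
    h = 1.0 / n_nodes
    total = 0.0
    for node in range(n_nodes):
        t = (node + 0.5) * h
        total += (jacobi_shifted(j, alpha, beta, t)
                  * jacobi_shifted(k, alpha, beta, t)
                  * weight(i, d, t))
    return h * total

def rho(i, d, k):
    """Normalizing constant (an assumed convention): unit norm for the weight above."""
    return 1.0 / math.sqrt(inner(i, d, k, k))

# Gram matrix of degrees 1..3 in direction i = 2 of a d = 3 problem.
gram = [[inner(2, 3, j, k) for k in (1, 2, 3)] for j in (1, 2, 3)]
```

The off-diagonal entries of `gram` should vanish up to quadrature error, and the diagonal entries give $1/\rho_{i,k}^2$ under the normalization assumed here.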

  6. Maximum likelihood estimator

     We have a sample of size $n$: $\mathbb{X}^n = \big(X^j = (X^j_1, \ldots, X^j_d)\big)_{j = 1..n}$.

     The maximum likelihood estimator $\hat f_{m,n} = f_{\hat\theta_{m,n}}$ verifies, for $1 \le i \le d$, $1 \le k \le m_i$:

     $$\mathbb{E}_{f_{\hat\theta_{m,n}}}\big[\varphi_{i,k}(X_i)\big] = \hat\mu_{m,n,i,k} = \underbrace{\frac{1}{n} \sum_{j=1}^{n} \varphi_{i,k}(X^j_i)}_{\text{empirical mean}}.$$

     This is equivalent to (with $|m| = \sum_{i=1}^{d} m_i$):

     $$\hat\theta_{m,n} = \operatorname*{argmax}_{\theta \in \mathbb{R}^{|m|}}\ \theta \cdot \hat\mu_{m,n} - \psi(\theta) = \operatorname*{argmax}_{\theta \in \mathbb{R}^{|m|}}\ \underbrace{\frac{1}{n} \sum_{j=1}^{n} \log\big(f_\theta(X^j)\big)}_{\text{log-likelihood}}.$$

  7. Result of non-adaptive convergence rate I.

     Theorem. Let $f^0(x) = \exp\Big(\sum_{i=1}^{d} \ell^0_i(x_i) - a_0\Big)\, \mathbf{1}_\triangle(x)$. Assume that $\ell^0_i \in W^2_{r_i}(q_i)$, $r_i \in \mathbb{N}$, $r_i > d$. Choose $m_i = m_i(n) \to \infty$ such that:

     $$|m|^{2d} \sum_{i=1}^{d} m_i^{-2 r_i} \to 0 \quad \text{and} \quad |m|^{2d+1} / n \to 0.$$

     Then the Kullback-Leibler divergence between $f^0$ and $\hat f_{m,n} = f_{\hat\theta_{m,n}}$ satisfies:

     $$D\big(f^0 \,\|\, \hat f_{m,n}\big) = O_{\mathbb{P}}\Big(\sum_{i=1}^{d} \Big(m_i^{-2 r_i} + \frac{m_i}{n}\Big)\Big).$$
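
For reference, the loss in this bound is the standard Kullback-Leibler divergence, written here for two densities $f$ and $g$ supported on $\triangle$. The reading of the two terms given in the comments is the usual interpretation of such bounds and is an added remark, not a statement from the slide.

```latex
\[
  D(f \,\|\, g) = \int_{\triangle} f(x) \, \log\frac{f(x)}{g(x)} \, dx .
\]
% In the bound of the theorem, for each direction i:
%   m_i^{-2 r_i} decreases with m_i (approximation of \ell^0_i by m_i basis functions),
%   m_i / n      increases with m_i (estimation of m_i coefficients from n observations).
```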

  8. Result of non-adaptive convergence rate II.

     Optimal convergence rate. If we choose $m_i$ proportional to $n^{1/(2 r_i + 1)}$, we obtain the optimal univariate rate:

     $$D\big(f \,\|\, \hat f_{m,n}\big) = O_{\mathbb{P}}\Big(\sum_{i=1}^{d} n^{-\frac{2 r_i}{2 r_i + 1}}\Big) = O_{\mathbb{P}}\Big(n^{-\frac{2 \min(r)}{2 \min(r) + 1}}\Big).$$

     The same rate is obtained with $m_i = n^{1/(2 \min(r) + 1)}$ for all $1 \le i \le d$.

     Uniform convergence. Let

     $$\mathcal{K}_r(\kappa) = \Big\{ f^0 = e^{\sum_{i=1}^{d} \ell^0_{[i]} - a_0};\ \|\ell^0_i\|_\infty \le \kappa,\ \|(\ell^0_i)^{(r_i)}\|_{L^2(q_i)} \le \kappa \Big\}.$$

     The convergence in probability is uniform on the set $\mathcal{K}_r(\kappa)$ of densities:

     $$\lim_{K \to \infty}\ \limsup_{n \to \infty}\ \sup_{f^0 \in \mathcal{K}_r(\kappa)} \mathbb{P}\Big( D\big(f^0 \,\|\, \hat f_{m,n}\big) \ge K \Big(\sum_{i=1}^{d} m_i^{-2 r_i} + \frac{|m|}{n}\Big) \Big) = 0.$$
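
The choice $m_i \propto n^{1/(2 r_i + 1)}$ can be recovered by balancing the two terms of the non-adaptive bound in each direction. The short derivation below is an added worked step, not taken from the slides; it also checks that this choice is compatible with the requirement $|m|^{2d+1}/n \to 0$ exactly when $\min(r) > d$.

```latex
\[
  m_i^{-2 r_i} = \frac{m_i}{n}
  \iff m_i^{2 r_i + 1} = n
  \iff m_i = n^{1/(2 r_i + 1)},
  \qquad\text{so that}\qquad
  m_i^{-2 r_i} = \frac{m_i}{n} = n^{-\frac{2 r_i}{2 r_i + 1}} .
\]
% With this choice |m| is of order n^{1/(2 \min(r) + 1)}, hence
\[
  \frac{|m|^{2d+1}}{n} \asymp n^{\frac{2d+1}{2\min(r)+1} - 1}
  = n^{\frac{2 (d - \min(r))}{2\min(r)+1}} \longrightarrow 0
  \quad\text{when } \min(r) > d .
\]
```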

  9. Adaptive estimation

     The optimal choice $m_i \sim n^{1/(2 \min(r) + 1)}$ depends on $r$, which is unknown.

     Adaptation method:

     1. Split the sample $\mathbb{X}^n$ into two parts: $\mathbb{X}^n_1$ (used for the estimators) and $\mathbb{X}^n_2$ (used for the aggregation).
     2. Create multiple estimators $\hat f_{m,n} = f_{\hat\theta_{m,n}}$ with $m \in \mathcal{M}_n$, based on the sample $\mathbb{X}^n_1$.
        - Number of estimators: $N_n$, increasing with $n$.
        - Each $m \in \mathcal{M}_n$ corresponds to regularity parameters $r$ with $\min(r)$ fixed.
     3. Perform a convex aggregation on the logarithms of $\hat f_{m,n}$ with the sample $\mathbb{X}^n_2$ to obtain the final estimator $f_{\hat\lambda^*_n}$.
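
The first step of the adaptation method can be sketched as follows. The slides do not say how large each part is, so the equal split used here is an assumption, as are the function name and the toy observations; shuffling is not needed for i.i.d. data and is included only to make the split independent of the storage order.

```python
import random

def split_sample(sample, first_fraction=0.5, seed=0):
    """Shuffle a copy of the sample and cut it into two disjoint parts (X_1, X_2)."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy observations on the ordered simplex with d = 2 (each pair is increasing).
observations = [(0.1, 0.4), (0.2, 0.9), (0.3, 0.5), (0.05, 0.6), (0.5, 0.7), (0.25, 0.8)]
x1, x2 = split_sample(observations)  # x1: fit the estimators, x2: aggregate them
```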

  10. Choice of estimators

      Number of estimators: $N_n = o(\log(n))$, $\lim_{n \to \infty} N_n = +\infty$.

      The grid:

      $$\mathcal{N}_n = \Big\{ \big\lfloor n^{\frac{1}{2(d + j) + 1}} \big\rfloor,\ 1 \le j \le N_n \Big\}.$$

      Same number of basis functions in each direction:

      $$\mathcal{M}_n = \big\{ m = (v, \ldots, v) \in \mathbb{R}^d,\ v \in \mathcal{N}_n \big\}.$$

      [Figure: the grid $\mathcal{M}_n$ plotted in the $(m_1, m_2)$ plane.]
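
As a concrete illustration of the grid, the sketch below computes $\lfloor n^{1/(2(d+j)+1)} \rfloor$ with exact integer arithmetic, so that floating-point rounding cannot shift the floor, and builds the corresponding vectors $m = (v, \ldots, v)$. The number of estimators is passed in as a parameter because the slides only require $N_n = o(\log(n))$; the function names are hypothetical.

```python
def floor_root(n, k):
    """Largest integer v with v**k <= n, corrected with exact integer arithmetic."""
    v = int(round(n ** (1.0 / k)))
    while v ** k > n:
        v -= 1
    while (v + 1) ** k <= n:
        v += 1
    return v

def grid(n, d, n_estimators):
    """Values floor(n^(1 / (2(d + j) + 1))) for j = 1, ..., n_estimators."""
    return [floor_root(n, 2 * (d + j) + 1) for j in range(1, n_estimators + 1)]

def model_set(n, d, n_estimators):
    """Vectors m = (v, ..., v) with the same v in each of the d directions."""
    return [(v,) * d for v in grid(n, d, n_estimators)]

models = model_set(10**6, 2, 3)
```

For $n = 10^6$, $d = 2$ and three estimators, the exponents are $1/7$, $1/9$ and $1/11$, which gives `[(7, 7), (4, 4), (3, 3)]`. For small $n$ several values of $j$ can yield the same $v$, so the set $\mathcal{M}_n$ may contain fewer than $N_n$ distinct models.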

  14. Convex aggregation of log-densities

      Convex combination of log-densities. Let $\hat\ell_{m,n}(x) = \sum_{i=1}^{d} \sum_{k=1}^{m_i} \hat\theta_{i,k}\, \varphi_{i,k}(x_i)$ for $m \in \mathcal{M}_n$, and

      $$f_\lambda(x) = \exp\Big(\sum_{m \in \mathcal{M}_n} \lambda_m\, \hat\ell_{m,n}(x) - \psi_\lambda\Big)\, \mathbf{1}_\triangle(x),$$

      with $\lambda \in \Lambda^+ = \{(\lambda_m,\ m \in \mathcal{M}_n),\ \lambda_m \ge 0 \text{ and } \sum_{m \in \mathcal{M}_n} \lambda_m = 1\}$.

      Selection of the weights $\hat\lambda^*_n$ based on the sample $\mathbb{X}^n_2$:

      $$\hat\lambda^*_n = \operatorname*{argmax}_{\lambda \in \Lambda^+}\ \underbrace{\frac{1}{|\mathbb{X}^n_2|} \sum_{X^j \in \mathbb{X}^n_2} \log\big(f_\lambda(X^j)\big)}_{\text{log-likelihood}} - \underbrace{\frac{1}{2}\, \mathrm{pen}(\lambda)}_{\text{penalty}},$$

      with $\mathrm{pen}(\lambda) = \sum_{m \in \mathcal{M}_n} \lambda_m\, D\big(f_\lambda \,\|\, \hat f_{m,n}\big)$.
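
The penalized criterion can be made concrete in a toy setting. The sketch below assumes $d = 1$ (so $\triangle = [0, 1]$), two hypothetical candidate log-densities standing in for $\hat\ell_{m,n}$, midpoint quadrature for $\psi_\lambda$ and for the Kullback-Leibler terms, and a coarse grid search over $\Lambda^+$ in place of a proper optimizer. None of these choices come from the talk.

```python
import math
import random

NODES = [(j + 0.5) / 500 for j in range(500)]   # midpoint quadrature on [0, 1]

def normalized(log_unnorm):
    """Return t -> log_unnorm(t) - psi, with psi the log of the integral of exp(log_unnorm)."""
    psi = math.log(sum(math.exp(log_unnorm(t)) for t in NODES) / len(NODES))
    return lambda t: log_unnorm(t) - psi

def kl(log_f, log_g):
    """D(f || g) = integral of f log(f / g), for normalized log-densities on [0, 1]."""
    return sum(math.exp(log_f(t)) * (log_f(t) - log_g(t)) for t in NODES) / len(NODES)

def aggregate(ell_hats, weights):
    """Log-density of f_lambda = exp(sum_m lambda_m ell_m - psi_lambda)."""
    return normalized(lambda t: sum(w * ell(t) for w, ell in zip(weights, ell_hats)))

def penalty(ell_hats, weights):
    """pen(lambda) = sum_m lambda_m D(f_lambda || f_m)."""
    log_f_lam = aggregate(ell_hats, weights)
    return sum(w * kl(log_f_lam, normalized(ell)) for w, ell in zip(weights, ell_hats))

def criterion(ell_hats, weights, sample2):
    """Average log-likelihood on the second sample minus half the penalty."""
    log_f_lam = aggregate(ell_hats, weights)
    loglik = sum(log_f_lam(x) for x in sample2) / len(sample2)
    return loglik - 0.5 * penalty(ell_hats, weights)

# Two hypothetical candidate log-densities (up to normalization).
ell_hats = [lambda t: 0.0, lambda t: 2.0 * t]
rng = random.Random(2)
sample2 = [rng.random() ** 0.7 for _ in range(200)]       # illustrative second sample
candidates = [(w / 20, 1 - w / 20) for w in range(21)]   # grid over the weights
best = max(candidates, key=lambda lam: criterion(ell_hats, lam, sample2))
```

The penalty vanishes at the vertices of $\Lambda^+$, where $f_\lambda$ coincides with one of the candidates, and is positive in between.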

  15. Sharp oracle inequality for aggregation

      Lemma. Let $n \in \mathbb{N}^*$ be fixed. The convex aggregate estimator $f_{\hat\lambda^*_n}$ verifies, for any $x > 0$, with probability greater than $1 - \exp(-x)$:

      $$D\big(f^0 \,\|\, f_{\hat\lambda^*_n}\big) - \min_{m \in \mathcal{M}_n} D\big(f^0 \,\|\, \hat f_{m,n}\big) \le \frac{\beta\, (\log(N_n) + x)}{n},$$

      with a constant $\beta = \beta\big(\|\ell^0\|_\infty,\ \|(\ell^0_i)^{(r_i)}\|_{L^2(q_i)}\big)$.

      The remainder term, of order $\log(N_n) / n$, is negligible compared to $n^{-2 \min(r) / (2 \min(r) + 1)}$.
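
The claim that the remainder is negligible can be checked in one line. The computation below is an added worked step for a fixed regularity vector $r$, using only the condition $N_n = o(\log(n))$ from the choice of estimators.

```latex
\[
  \frac{\log(N_n)/n}{\,n^{-\frac{2\min(r)}{2\min(r)+1}}\,}
  = \log(N_n)\, n^{-\frac{1}{2\min(r)+1}}
  \le \log(\log(n))\, n^{-\frac{1}{2\min(r)+1}}
  \longrightarrow 0
  \qquad (n \text{ large enough that } N_n \le \log(n)).
\]
```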

  16. Adaptive estimation - Main result

      Theorem. The convex aggregate estimator $f_{\hat\lambda^*_n}$ converges to $f$ in probability with the convergence rate:

      $$D\big(f \,\|\, f_{\hat\lambda^*_n}\big) = O_{\mathbb{P}}\Big(n^{-\frac{2 \min(r)}{2 \min(r) + 1}}\Big).$$

      Uniform convergence. The convergence is uniform for $r \in (\mathcal{R}_n)^d$, where $\mathcal{R}_n = \{j,\ d + 1 \le j \le R_n\}$:

      $$\lim_{K \to \infty}\ \limsup_{n \to \infty}\ \sup_{r \in (\mathcal{R}_n)^d}\ \sup_{f^0 \in \mathcal{K}_r(\kappa)} \mathbb{P}\Big( D\big(f^0 \,\|\, f_{\hat\lambda^*_n}\big) \ge K\, n^{-\frac{2 \min(r)}{2 \min(r) + 1}} \Big) = 0,$$

      where $R_n$ satisfies:

      $$R_n \le N_n + d, \qquad R_n \le \Big\lfloor n^{\frac{1}{2(d + N_n) + 1}} \Big\rfloor, \qquad R_n \le \frac{\log(n)}{2 \log(\log(N_n))} - \frac{1}{2}.$$

  17. Summary: (1) Theoretic results; (2) Simulation study.
