On oracle inequalities related to high dimensional linear models

Spectral regularization for high dimensional linear models The Empirical Risk Minimization An oracle inequality for a known noise variance Unknown noise variance On oracle inequalities related to high dimensional linear models Yuri Golubev


  1. On oracle inequalities related to high dimensional linear models
     Yuri Golubev (CNRS, Université de Provence)
     Conference on Applied Inverse Problems, July 21, Vienna

  2. Outline of the talk
     1 Spectral regularization for high dimensional linear models
         Ordered regularizations
     2 The Empirical Risk Minimization
         Excess risk penalties
     3 An oracle inequality for a known noise variance
         Short discussion
     4 Unknown noise variance
         Example: the Tikhonov-Phillips regularization

  3. This talk deals with recovering $\theta = (\theta(1), \ldots, \theta(n))^\top \in \mathbb{R}^n$ from the noisy data
     $$Y = A\theta + \sigma\xi,$$
     where
     $A$ is a known $m \times n$ matrix with $m \ge n$;
     $\xi \in \mathbb{R}^m$ is a standard white Gaussian noise with $\mathbf{E}\,\xi(k)\xi(l) = \delta_{kl}$, $k, l = 1, \ldots, m$;
     $n$ is large (possibly infinite);
     $\sigma$ may be known or unknown.
     Example: the linear model can be used to approximate the equation
     $$y(u) = \int A(u, v)\,\theta(v)\,dv + \varepsilon(u).$$

  4. Maximum likelihood estimator
     The standard ML estimator is defined by
     $$\hat\theta_0 = \arg\min_{\theta \in \mathbb{R}^n} \|Y - A\theta\|^2, \quad \text{where } \|x\|^2 = \sum_{k=1}^{m} x^2(k).$$
     With simple algebra we obtain (Moore (1920), Penrose (1955))
     $$\hat\theta_0 = (A^\top A)^{-1} A^\top Y.$$

  5. Risk of the MP inversion
     The risk of this inversion is computed as follows:
     $$\mathbf{E}\|\hat\theta_0 - \theta\|^2 = \mathbf{E}\|(A^\top A)^{-1} A^\top \epsilon\|^2 = \sigma^2 \sum_{k=1}^{n} \lambda_k,$$
     where $\epsilon = \sigma\xi$ is the noise term and $\lambda_k$ are the eigenvalues of $(A^\top A)^{-1}$,
     $$\lambda_k A^\top A \psi_k = \psi_k, \quad \lambda_1 \le \lambda_2 \le \ldots \le \lambda_n,$$
     and $\psi_k \in \mathbb{R}^n$ are the eigenvectors of $A^\top A$.
     If $A$ has a large condition number or $n$ is large, the risk of $\hat\theta_0$ may be very large.
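The growth of the risk $\sigma^2 \sum_k \lambda_k$ can be illustrated with a minimal sketch. It assumes, purely for illustration, that the singular values of $A$ decay polynomially as $k^{-\beta}$, so that $\lambda_k = k^{2\beta}$; this decay law, the value of `sigma` and the function names are assumptions, not part of the talk.

```python
def mp_inversion_risk(lambdas, sigma):
    """Risk sigma^2 * sum_k lambda_k of the unregularized estimator."""
    return sigma ** 2 * sum(lambdas)


def polynomial_lambdas(n, beta):
    # Hypothetical spectrum: singular values k^(-beta) of A give
    # eigenvalues lambda_k = k^(2*beta) of (A^T A)^(-1).
    return [float(k) ** (2.0 * beta) for k in range(1, n + 1)]


sigma = 0.1
risks = {n: mp_inversion_risk(polynomial_lambdas(n, beta=1.0), sigma)
         for n in (10, 100, 1000)}
for n, risk in risks.items():
    print(n, risk)
```

Under this assumed spectrum the risk grows like $n^{2\beta+1}$, which is the blow-up that motivates regularization.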

  6. Spectral regularization
     The basic idea of spectral regularization is to suppress large $\lambda_k$ in the risk of $\hat\theta_0$. We smooth $\hat\theta_0$ with the help of properly chosen matrices $H_\alpha$, $\alpha \in \mathbb{R}^+$:
     $$\hat\theta_\alpha = H_\alpha[(A^\top A)^{-1}]\,\hat\theta_0 = H_\alpha[(A^\top A)^{-1}]\,(A^\top A)^{-1} A^\top Y,$$
     where
     $$H_\alpha[(A^\top A)^{-1}](s, l) = \sum_{k=1}^{n} H_\alpha(\lambda_k)\,\psi_k(s)\,\psi_k(l).$$
     Typically
     $$\lim_{\alpha \to 0} H_\alpha(\lambda) = 1, \quad \lim_{\lambda \to \infty} H_\alpha(\lambda) = 0 \ \text{ for all } \alpha > 0.$$
     $\alpha$ is called the regularization parameter.

  7. Bias-variance decomposition
     For the risk of $\hat\theta_\alpha$ we get a standard bias-variance decomposition
     $$\mathbf{E}\|\hat\theta_\alpha - \theta\|^2 = \sum_{k=1}^{n} [1 - H_\alpha(\lambda_k)]^2 \langle\theta, \psi_k\rangle^2 + \sigma^2 \sum_{k=1}^{n} \lambda_k H_\alpha^2(\lambda_k),$$
     where $\langle\theta, \psi_k\rangle = \sum_{l=1}^{n} \theta(l)\,\psi_k(l)$.
     Remarks:
     The spectral regularization may substantially improve on $\hat\theta_0$ when $\langle\theta, \psi_k\rangle^2$ are small for large $k$.
     The best regularization parameter depends on $\theta$, and therefore it should be data-driven.
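The decomposition can be evaluated directly once the coefficients $\langle\theta, \psi_k\rangle$, the eigenvalues $\lambda_k$ and the filter values $H_\alpha(\lambda_k)$ are given. The sketch below does this; the spectrum $\lambda_k = k^2$, the coefficients $k^{-2}$, the noise level and the filter $(1+\alpha\lambda)^{-1}$ are illustrative assumptions, and the function name is hypothetical.

```python
def regularized_risk(theta_coefs, lambdas, h_values, sigma):
    """Squared bias plus variance of a spectral regularization.

    theta_coefs[k] stands for <theta, psi_k>, h_values[k] for H_alpha(lambda_k).
    """
    bias2 = sum((1.0 - h) ** 2 * t ** 2 for h, t in zip(h_values, theta_coefs))
    variance = sigma ** 2 * sum(lam * h ** 2 for lam, h in zip(lambdas, h_values))
    return bias2 + variance


# Illustrative (assumed) setting: lambda_k = k^2, <theta, psi_k> = k^(-2).
n, sigma, alpha = 50, 0.1, 1.0
lambdas = [float(k) ** 2 for k in range(1, n + 1)]
theta_coefs = [1.0 / k ** 2 for k in range(1, n + 1)]

no_smoothing = [1.0] * n                                  # H = 1 gives theta_hat_0
smoothing = [1.0 / (1.0 + alpha * lam) for lam in lambdas]

risk_ml = regularized_risk(theta_coefs, lambdas, no_smoothing, sigma)
risk_reg = regularized_risk(theta_coefs, lambdas, smoothing, sigma)
print(risk_ml, risk_reg)
```

With $H \equiv 1$ the function returns $\sigma^2 \sum_k \lambda_k$, the risk of $\hat\theta_0$; in this assumed setting the smoothed estimator trades a small bias for a much smaller variance.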

  8. Spectral cut-off (requires the SVD)
     $$H_\alpha(\lambda) = \mathbf{1}\{\alpha\lambda \le 1\}.$$
     Tikhonov's regularization
     $$\hat\theta_\alpha = \arg\min_{\theta} \left\{ \|Y - A\theta\|^2 + \alpha\|\theta\|^2 \right\}$$
     or, equivalently,
     $$\hat\theta_\alpha = [\alpha I + A^\top A]^{-1} A^\top Y, \quad H_\alpha(\lambda) = (1 + \alpha\lambda)^{-1}.$$
     Landweber's iterations (solve $A^\top Y = A^\top A\theta$)
     $$\hat\theta_i = \left(I - a^{-1} A^\top A\right)\hat\theta_{i-1} + a^{-1} A^\top Y.$$
     The iterations converge if $a\lambda_1 > 1$ (then $0 \le 1 - (a\lambda_k)^{-1} < 1$ for every $k$). It is easy to check that
     $$H_\alpha(\lambda) = 1 - \left[1 - (a\lambda)^{-1}\right]^{1/\alpha}, \quad \alpha = 1/(i+1).$$
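The three smoothers can be written as scalar functions of $\lambda$. The sketch below also checks the Landweber formula by running the recursion in a single eigen-direction, where $A^\top A$ acts as multiplication by $1/\lambda$. It assumes the recursion starts from the zero vector, so that $\hat\theta_i$ is obtained after $i+1$ updates; the function names and test values are hypothetical.

```python
def cutoff_filter(alpha, lam):
    """Spectral cut-off: H_alpha(lambda) = 1{alpha * lambda <= 1}."""
    return 1.0 if alpha * lam <= 1.0 else 0.0


def tikhonov_filter(alpha, lam):
    """Tikhonov: H_alpha(lambda) = 1 / (1 + alpha * lambda)."""
    return 1.0 / (1.0 + alpha * lam)


def landweber_filter(i, a, lam):
    """Landweber closed form with alpha = 1 / (i + 1)."""
    return 1.0 - (1.0 - 1.0 / (a * lam)) ** (i + 1)


def landweber_filter_by_iteration(i, a, lam):
    """Same quantity from the recursion in one eigen-direction.

    b is the eigenvalue of A^T A, z the coefficient of A^T Y along psi_k;
    the unregularized coefficient is z / b = z * lam.
    Assumption: the recursion is started from zero.
    """
    b, z, t = 1.0 / lam, 1.0, 0.0
    for _ in range(i + 1):
        t = (1.0 - b / a) * t + z / a
    return t / (z * lam)


for lam in (1.0, 2.5, 10.0):
    print(lam, landweber_filter(3, 2.0, lam), landweber_filter_by_iteration(3, 2.0, lam))
```

Working in one eigen-direction avoids any matrix computation and makes the geometric-series structure of the Landweber filter visible.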

  9. Ordered functions
     In the above examples the families of functions (smoothers) $H_\alpha(\cdot)$, $\alpha \in \mathbb{R}^+$, are ordered (see Kneip (1995)):
     $$0 \le H_\alpha(\lambda) \le 1 \ \text{ for all } \lambda \in \mathbb{R}^+,$$
     $$H_{\alpha_1}(\lambda) \ge H_{\alpha_2}(\lambda), \quad \alpha_1 \le \alpha_2.$$
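The two ordering conditions can be checked numerically on a finite grid. The helper below is a hypothetical sketch: a grid check can refute the ordering but cannot prove it for all $\alpha$ and $\lambda$.

```python
def is_ordered_family(h, alphas, lambdas, tol=1e-12):
    """Check 0 <= H <= 1 and monotonicity in alpha on a finite grid."""
    alphas = sorted(alphas)
    for lam in lambdas:
        values = [h(alpha, lam) for alpha in alphas]
        if any(v < -tol or v > 1.0 + tol for v in values):
            return False
        # H must not increase when alpha increases.
        if any(values[j] < values[j + 1] - tol for j in range(len(values) - 1)):
            return False
    return True


def tikhonov_filter(alpha, lam):
    return 1.0 / (1.0 + alpha * lam)


def cutoff_filter(alpha, lam):
    return 1.0 if alpha * lam <= 1.0 else 0.0


alphas = [0.01 * j for j in range(1, 101)]
lambdas = [float(k) ** 2 for k in range(1, 31)]
print(is_ordered_family(tikhonov_filter, alphas, lambdas))
print(is_ordered_family(cutoff_filter, alphas, lambdas))
```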

  10. Our goal is to find the best estimate within the family of spectral regularization methods
      $$\hat\theta_\alpha = H_\alpha[(A^\top A)^{-1}]\,(A^\top A)^{-1} A^\top Y, \quad \alpha \in [0, \alpha^\circ].$$
      In other words, we are looking for $\hat\alpha$ that minimizes $\mathbf{E}\|\theta - \hat\theta_{\hat\alpha}\|^2$ uniformly in $\theta \in \mathbb{R}^n$.
      This idea is put into practice with the help of the empirical risk minimization principle:
      $$\hat\alpha = \arg\min_{\alpha} R_\alpha[Y], \quad \text{where } R_\alpha[Y] = \|\hat\theta_0 - \hat\theta_\alpha\|^2 + \sigma^2 Pen(\alpha),$$
      and $Pen(\alpha): (0, \alpha^\circ] \to \mathbb{R}^+$ is a given function of $\alpha$.
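The empirical risk minimization step can be sketched in the eigenbasis $\psi_k$, where the coefficients of $\hat\theta_0$ are $y_k = \langle\theta, \psi_k\rangle + \sigma\sqrt{\lambda_k}\,\xi_k$ with standard Gaussian $\xi_k$, and $\|\hat\theta_0 - \hat\theta_\alpha\|^2 = \sum_k [1 - H_\alpha(\lambda_k)]^2 y_k^2$. The filter, the spectrum, the signal, the grid of $\alpha$ and the helper names are assumptions made for illustration. The penalty is passed in as a function; the example plugs in $2\sum_k \lambda_k H_\alpha(\lambda_k)$, which the talk names as the penalty related to unbiased risk estimation.

```python
import random


def tikhonov_filter(alpha, lam):
    return 1.0 / (1.0 + alpha * lam)


def empirical_risk(y, lambdas, alpha, sigma, penalty):
    """R_alpha[Y] = ||theta_hat_0 - theta_hat_alpha||^2 + sigma^2 * Pen(alpha)."""
    h = [tikhonov_filter(alpha, lam) for lam in lambdas]
    fit = sum((1.0 - hk) ** 2 * yk ** 2 for hk, yk in zip(h, y))
    return fit + sigma ** 2 * penalty(alpha, lambdas)


def ure_penalty(alpha, lambdas):
    """Pen(alpha) = 2 * sum_k lambda_k * H_alpha(lambda_k)."""
    return 2.0 * sum(lam * tikhonov_filter(alpha, lam) for lam in lambdas)


def select_alpha(y, lambdas, alphas, sigma, penalty):
    """Grid search for the minimizer of the empirical risk."""
    return min(alphas, key=lambda a: empirical_risk(y, lambdas, a, sigma, penalty))


# Simulated data in the eigenbasis (all values are illustrative assumptions).
rng = random.Random(0)
n, sigma = 50, 0.1
lambdas = [float(k) ** 2 for k in range(1, n + 1)]
theta = [1.0 / k ** 2 for k in range(1, n + 1)]
y = [t + sigma * lam ** 0.5 * rng.gauss(0.0, 1.0) for t, lam in zip(theta, lambdas)]

alphas = [10.0 ** (-j / 2.0) for j in range(13)]
alpha_hat = select_alpha(y, lambdas, alphas, sigma, ure_penalty)
print(alpha_hat)
```

Any other penalty with the same signature can be substituted for `ure_penalty` without changing the selection code.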

  11. A good data-driven regularization should minimize in some sense the risk
      $$L_\alpha(\theta) = \mathbf{E}\|\theta - \hat\theta_\alpha\|^2.$$
      This is why we are looking for a minimal penalty that ensures the following inequality
      $$L_\alpha(\theta) \lesssim R_\alpha[Y] + C,$$
      where $C$ is a random variable that does not depend on $\alpha$ and $\theta$. It is easy to check that
      $$C = -\|\theta - \hat\theta_0\|^2 = -\sigma^2 \sum_{k=1}^{n} \lambda_k \xi^2(k).$$
      The traditional approach to solving this inequality is based on unbiased risk estimation, which defines the penalty as a root of the equation
      $$L_\alpha(\theta) = \mathbf{E} R_\alpha[Y] + \mathbf{E} C.$$
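The unbiased risk equation can be checked in closed form. In the eigenbasis, $\mathbf{E}\|\hat\theta_0 - \hat\theta_\alpha\|^2 = \sum_k [1 - H_\alpha(\lambda_k)]^2 (\langle\theta,\psi_k\rangle^2 + \sigma^2\lambda_k)$ and $\mathbf{E}C = -\sigma^2\sum_k \lambda_k$, so the equation holds when $Pen(\alpha) = 2\sum_k \lambda_k H_\alpha(\lambda_k)$, the penalty the talk later attributes to unbiased risk estimation. The sketch below verifies this numerically; the spectrum, signal and filter are illustrative assumptions and the function names are hypothetical.

```python
def tikhonov_filter(alpha, lam):
    return 1.0 / (1.0 + alpha * lam)


def risk(theta, lambdas, h, sigma):
    """L_alpha(theta): squared bias plus variance."""
    bias2 = sum((1.0 - hk) ** 2 * t ** 2 for hk, t in zip(h, theta))
    variance = sigma ** 2 * sum(lam * hk ** 2 for lam, hk in zip(lambdas, h))
    return bias2 + variance


def expected_empirical_risk_plus_c(theta, lambdas, h, sigma):
    """E R_alpha[Y] + E C with Pen(alpha) = 2 * sum_k lambda_k * H_alpha(lambda_k)."""
    fit = sum((1.0 - hk) ** 2 * (t ** 2 + sigma ** 2 * lam)
              for hk, t, lam in zip(h, theta, lambdas))
    pen = 2.0 * sum(lam * hk for lam, hk in zip(lambdas, h))
    expected_c = -sigma ** 2 * sum(lambdas)
    return fit + sigma ** 2 * pen + expected_c


n, sigma = 50, 0.1
lambdas = [float(k) ** 2 for k in range(1, n + 1)]
theta = [1.0 / k ** 2 for k in range(1, n + 1)]
for alpha in (0.001, 0.01, 0.1, 1.0):
    h = [tikhonov_filter(alpha, lam) for lam in lambdas]
    print(alpha, risk(theta, lambdas, h, sigma),
          expected_empirical_risk_plus_c(theta, lambdas, h, sigma))
```

The two printed columns agree because $(1-H)^2 + 2H - 1 = H^2$ term by term.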

  12. Excess risk penalties
      Unfortunately, the penalty obtained this way is not good for ill-posed problems (see e.g. Cavalier and Golubev (2006)).
      The main idea in this talk is to compute the penalty in a slightly different way, namely as a minimal root of the equation
      $$\mathbf{E} \sup_{\alpha \le \alpha^\circ} \left[ L_\alpha(\theta) - R_\alpha[Y] - C \right]_+ \le K\, \mathbf{E} \left[ L_{\alpha^\circ}(\theta) - R_{\alpha^\circ}[Y] - C \right]_+,$$
      where $[x]_+ = \max\{0, x\}$ and $K > 1$ is a constant.
      Heuristic motivation: we are looking for the minimal penalty balancing all the excess risks.

  13. It turns out that for ordered smoothers the penalty may be found as a solution of the marginal equation
      $$\mathbf{E} \left[ L_\alpha(\theta) - R_\alpha[Y] - C \right]_+ \le \mathbf{E} \left[ L_{\alpha^\circ}(\theta) - R_{\alpha^\circ}[Y] - C \right]_+, \quad \alpha \in [0, \alpha^\circ].$$
      To compute the penalty, we assume that it has the following structure
      $$Pen(\alpha) = 2 \sum_{k=1}^{n} \lambda_k H_\alpha[\lambda_k] + (1 + \gamma)\, Q(\alpha),$$
      where
      $2 \sum_{k=1}^{n} \lambda_k H_\alpha[\lambda_k]$ is the penalty related to the unbiased risk estimation;
      $\gamma$ is a positive number and $Q(\alpha)$, $\alpha > 0$, is a positive function of $\alpha$ to be defined later on.

  14. The large deviation approach results in the following algorithm for computing $Q(\alpha)$:
      $$Q(\alpha) = 2 D(\alpha)\, \mu_\alpha \sum_{k=1}^{n} \frac{\rho_\alpha^2(k)}{1 - 2\mu_\alpha \rho_\alpha(k)},$$
      where
      $$D^2(\alpha) = 2 \sum_{k=1}^{n} \lambda_k^2 \left[ 2 H_\alpha[\lambda_k] - H_\alpha^2[\lambda_k] \right]^2,$$
      $$\rho_\alpha(k) = \sqrt{2}\, D^{-1}(\alpha)\, \lambda_k \left[ 2 H_\alpha[\lambda_k] - H_\alpha^2[\lambda_k] \right],$$
      and $\mu_\alpha$ is a root of the equation
      $$\sum_{k=1}^{n} F[\mu_\alpha \rho_\alpha(k)] = \log \frac{D(\alpha)}{D(\alpha^\circ)}, \quad F(x) = \frac{1}{2} \log(1 - 2x) + x + \frac{2x^2}{1 - 2x}.$$
