The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation

Rosemary Renaut
Department of Mathematics and Statistics
GAMM Workshop 2008
Outline

1. Introduction
2. Statistical Results for Least Squares
3. Implications of Statistical Results for Regularized Least Squares
4. Newton algorithm
5. Results
6. Conclusions and Future Work
7. Further Results and More Details
Least Squares for $Ax = b$ (Weighted)

Consider discrete systems with $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x \in \mathbb{R}^n$:
$$ Ax = b + e, $$
where $e$ is the $m$-vector of random measurement errors with mean $0$ and positive definite covariance matrix $C_b = E(ee^T)$. Assume that $C_b$ is known. (It can be calculated if multiple samples of $b$ are given.)

For uncorrelated measurements, $C_b$ is the diagonal matrix of the variances of the errors. (Colored noise.)

For correlated measurements, let $W_b = C_b^{-1}$ with Cholesky-type factorization $W_b = L_b^T L_b$, and weight the equation:
$$ L_b A x = L_b b + \tilde{e}, $$
so that the entries of $\tilde{e}$ are uncorrelated (white noise): $\tilde{e} \sim N(0, I)$, normally distributed with mean $0$ and covariance $I$.
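Below is a minimal numerical sketch of this whitening step, with synthetic $A$, $b$, and $C_b$; the sizes and names are illustrative, not from the slides. It uses the convention $W_b = L_b^T L_b$ above, under which $L_b$ is the whitening factor.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)

# Build a positive definite covariance C_b for correlated noise.
B = rng.standard_normal((m, m))
C_b = B @ B.T / m + np.eye(m)

# Sample correlated errors e ~ N(0, C_b) and form b = A x + e.
e = np.linalg.cholesky(C_b) @ rng.standard_normal(m)
b = A @ x_true + e

# Whiten: with W_b = L_b^T L_b, multiplying by L_b gives errors with
# covariance L_b C_b L_b^T = I. numpy returns the lower factor L of
# W_b = L L^T, so L_b here is its transpose.
W_b = np.linalg.inv(C_b)
L_b = np.linalg.cholesky(W_b).T
A_w, b_w = L_b @ A, L_b @ b

x_ls = np.linalg.lstsq(A_w, b_w, rcond=None)[0]  # weighted LS estimate
```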
Weighted Regularized Least Squares for numerically ill-posed systems

Formulation:
$$ \hat{x} = \operatorname{argmin}_x J(x) = \operatorname{argmin}_x \{ \|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_x} \}. \quad (1) $$

$x_0$ is a reference solution, often $x_0 = 0$.
Standard choice: $W_x = \lambda^2 I$, with $\lambda$ an unknown penalty parameter.
Statistically, $W_x$ is the inverse covariance matrix for the model $x$, i.e. $\lambda = 1/\sigma_x$, where $\sigma_x^2$ is the common variance in $x$. This assumes the resulting estimates for $x$ are uncorrelated.
$\hat{x}$ is the standard maximum a posteriori (MAP) estimate of the solution, when all a priori information is provided.

The Problem
How do we find an appropriate regularization parameter $\lambda$? More generally, what is the correct $W_x$?
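A minimal sketch of solving (1) for a fixed $\lambda$, assuming the whitened A_w, b_w from the previous sketch; the function name and defaults are illustrative. It uses the standard augmented least squares formulation of the Tikhonov problem.

```python
import numpy as np

def tikhonov_solve(A_w, b_w, lam, x0=None):
    """Solve min ||A_w x - b_w||^2 + lam^2 ||x - x0||^2 by stacking
    the augmented system [A_w; lam*I] x = [b_w; lam*x0]."""
    n = A_w.shape[1]
    if x0 is None:
        x0 = np.zeros(n)
    A_aug = np.vstack([A_w, lam * np.eye(n)])
    b_aug = np.concatenate([b_w, lam * x0])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# e.g. x_hat = tikhonov_solve(A_w, b_w, lam=0.1)
```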
The General Case: Generalized Tikhonov Regularization

Formulation: Regularization with Solution Mapping
Generalized Tikhonov regularization, with an operator $D$ acting on $x$:
$$ \hat{x} = \operatorname{argmin}_x J_D(x) = \operatorname{argmin}_x \{ \|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_D} \}. \quad (2) $$

Assume invertibility: $\mathcal{N}(A) \cap \mathcal{N}(D) = \{0\}$.
Then solutions depend on $W_D = \lambda^2 D^T D$:
$$ \hat{x}(\lambda) = \operatorname{argmin}_x J_D(x) = \operatorname{argmin}_x \{ \|Ax - b\|^2_{W_b} + \lambda^2 \|D(x - x_0)\|^2 \}. \quad (3) $$

GOAL
Can we estimate $\lambda$ efficiently when $W_b$ is known? Use statistics of the solution to find $\lambda$.
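The sketch below extends the previous solver to the mapped form (3); the first-difference choice of $D$ is only an illustrative example, not prescribed by the slides.

```python
import numpy as np

def gen_tikhonov_solve(A_w, b_w, D, lam, x0=None):
    """Solve min ||A_w x - b_w||^2 + lam^2 ||D (x - x0)||^2 by stacking
    the augmented system [A_w; lam*D] x = [b_w; lam*D*x0]."""
    n = A_w.shape[1]
    if x0 is None:
        x0 = np.zeros(n)
    A_aug = np.vstack([A_w, lam * D])
    b_aug = np.concatenate([b_w, lam * (D @ x0)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# Example D: first-difference operator with p = n - 1 rows; the stacked
# system is full rank as long as N(A) and N(D) intersect only in {0}.
n = 10
D = np.diff(np.eye(n), axis=0)
```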
Background: Statistics of the Least Squares Problem

Theorem (Rao 1973: First Fundamental Theorem)
Let $r$ be the rank of $A$, and let $b \sim N(Ax, \sigma_b^2 I)$ (the errors in the measurements are normally distributed with mean $0$ and covariance $\sigma_b^2 I$). Then
$$ J = \min_x \|Ax - b\|^2 \sim \sigma_b^2 \, \chi^2(m - r), $$
i.e. $J$ follows a $\chi^2$ distribution with $m - r$ degrees of freedom.

Corollary (Weighted Least Squares)
For $b \sim N(Ax, C_b)$ and $W_b = C_b^{-1}$,
$$ J = \min_x \|Ax - b\|^2_{W_b} \sim \chi^2(m - r). $$
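A quick Monte Carlo check of the corollary (an illustrative experiment, not from the slides): with white noise and a full-rank $A$, the minimum residual should behave like $\chi^2(m - n)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n = 40, 8
A = rng.standard_normal((m, n))          # full column rank, so r = n
x_true = rng.standard_normal(n)

J_samples = []
for _ in range(5000):
    e = rng.standard_normal(m)           # C_b = I, so W_b = I
    b = A @ x_true + e
    _, res, *_ = np.linalg.lstsq(A, b, rcond=None)
    J_samples.append(res[0])             # ||A x - b||^2 at the minimum

print(np.mean(J_samples), m - n)         # sample mean ~ m - r = 32
print(stats.kstest(J_samples, stats.chi2(df=m - n).cdf))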
Extension: Statistics of the Regularized Least Squares Problem

Theorem: $\chi^2$ distribution of the regularized functional
$$ \hat{x} = \operatorname{argmin}_x J_D(x) = \operatorname{argmin}_x \{ \|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_D} \}, \quad W_D = D^T W_x D. \quad (4) $$

Assume $W_b$ and $W_x$ are symmetric positive definite, and that the problem is uniquely solvable: $\mathcal{N}(A) \cap \mathcal{N}(D) = \{0\}$. Let $C_D$ denote the Moore-Penrose generalized inverse of $W_D$.

Statistics: $(b - Ax) = e \sim N(0, C_b)$ and $(x - x_0) = f \sim N(0, C_D)$, where $x_0$ is the mean vector of the model parameters.

Then $J_D \sim \chi^2(m + p - n)$.
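A quick Monte Carlo check of the theorem (illustrative, not from the slides): with $D = I$ (so $p = n$) and exact priors, $J_D$ at the minimizer should follow $\chi^2(m + p - n) = \chi^2(m)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
m, n = 40, 8
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
sig_b, sig_x = 0.1, 2.0                  # known noise and prior scales

J_samples = []
for _ in range(5000):
    x = x0 + sig_x * rng.standard_normal(n)       # x - x0 ~ N(0, C_D)
    b = A @ x + sig_b * rng.standard_normal(m)    # b - A x ~ N(0, C_b)
    A_aug = np.vstack([A / sig_b, np.eye(n) / sig_x])
    b_aug = np.concatenate([b / sig_b, x0 / sig_x])
    _, res, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    J_samples.append(res[0])             # J_D at the regularized minimum

print(np.mean(J_samples), m)             # sample mean ~ m + p - n = m
print(stats.kstest(J_samples, stats.chi2(df=m).cdf))
```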
Key Aspects of the Proof I: The Functional $J$

Algebraic simplifications: rewrite the functional as a quadratic form. The regularized solution is given in terms of the resolution matrix $R(W_D)$:
$$ \hat{x} = x_0 + (A^T W_b A + D^T W_x D)^{-1} A^T W_b \, r, \qquad r = b - A x_0, \quad (5) $$
$$ \hat{x} = x_0 + R(W_D) W_b^{1/2} r = x_0 + y(W_D), \quad (6) $$
$$ R(W_D) = (A^T W_b A + D^T W_x D)^{-1} A^T W_b^{1/2}. \quad (7) $$

The functional is given in terms of the influence matrix $A(W_D)$:
$$ A(W_D) = W_b^{1/2} A \, R(W_D), \quad (8) $$
$$ J_D(\hat{x}) = r^T W_b^{1/2} (I_m - A(W_D)) W_b^{1/2} r, \qquad \text{let } \tilde{r} = W_b^{1/2} r, \quad (9) $$
$$ J_D(\hat{x}) = \tilde{r}^T (I_m - A(W_D)) \, \tilde{r}. \quad (10) $$
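A small numerical check (illustrative, with $W_b = I$ and $D = I$ for simplicity, so the $W_b^{1/2}$ factors drop out) that the quadratic form (10) reproduces the value of the functional at the minimizer.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, lam = 30, 6, 0.5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x0 = np.zeros(n)

# Regularized solution (5) and influence matrix (8).
H = np.linalg.inv(A.T @ A + lam**2 * np.eye(n))
r = b - A @ x0
x_hat = x0 + H @ (A.T @ r)
A_infl = A @ H @ A.T

# J_D two ways: by direct evaluation, and via the quadratic form (10).
J_direct = np.sum((A @ x_hat - b)**2) + lam**2 * np.sum((x_hat - x0)**2)
J_form = r @ (np.eye(m) - A_infl) @ r
print(J_direct, J_form)   # agree to rounding error
```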
Key Aspects of the Proof II: Properties of a Quadratic Form

$\chi^2$ distribution of quadratic forms $x^T P x$ for normal variables (Fisher-Cochran theorem):

Suppose the components $x_i$ are independent normal variables, $x_i \sim N(0, 1)$, $i = 1 : n$.
A necessary and sufficient condition for $x^T P x$ to have a central $\chi^2$ distribution is that $P$ is idempotent, $P^2 = P$, in which case the number of degrees of freedom of the $\chi^2$ is $\mathrm{rank}(P) = \mathrm{trace}(P)$.
When the means of the $x_i$ are $\mu_i \neq 0$, $x^T P x$ has a non-central $\chi^2$ distribution with non-centrality parameter $c = \mu^T P \mu$.
A $\chi^2$ random variable with $n$ degrees of freedom and non-centrality parameter $c$ has mean $n + c$ and variance $2(n + 2c)$.
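An illustrative check of this property: for an orthogonal projector $P$ (idempotent), $x^T P x$ with $x \sim N(0, I)$ follows $\chi^2(\mathrm{rank}(P))$. The construction of $P$ below is an arbitrary example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 12, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
P = Q @ Q.T                            # idempotent: P @ P == P, rank k

x = rng.standard_normal((20000, n))
q = np.einsum('ij,jk,ik->i', x, P, x)  # x^T P x for each sample
print(q.mean(), k)                     # mean ~ rank(P) = trace(P) = k
print(stats.kstest(q, stats.chi2(df=k).cdf))
```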
Key Aspects of the Proof III: Requires the GSVD

Lemma. Assume invertibility and $m \geq n \geq p$. There exist unitary matrices $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{p \times p}$, and a nonsingular matrix $X \in \mathbb{R}^{n \times n}$ such that
$$ A = U \begin{bmatrix} \Upsilon \\ 0_{(m-n) \times n} \end{bmatrix} X^T, \qquad D = V [M, \; 0_{p \times (n-p)}] X^T, \quad (11) $$
$$ \Upsilon = \mathrm{diag}(\upsilon_1, \ldots, \upsilon_p, 1, \ldots, 1) \in \mathbb{R}^{n \times n}, \qquad M = \mathrm{diag}(\mu_1, \ldots, \mu_p) \in \mathbb{R}^{p \times p}, $$
$$ 0 \leq \upsilon_1 \leq \cdots \leq \upsilon_p \leq 1, \quad 1 \geq \mu_1 \geq \cdots \geq \mu_p > 0, \quad \upsilon_i^2 + \mu_i^2 = 1, \; i = 1, \ldots, p. \quad (12) $$

The Functional with the GSVD
Let $\tilde{Q} = \mathrm{diag}(\mu_1, \ldots, \mu_p, 0_{n-p}, I_{m-n})$. Then
$$ \tilde{J} = \tilde{r}^T (I_m - A(W_D)) \, \tilde{r} = \| \tilde{Q} U^T \tilde{r} \|_2^2. $$
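As a worked step (a derivation sketch from (11)-(12) for the $\lambda$-dependent penalty $W_D = \lambda^2 D^T D$, writing $s = U^T \tilde{r}$; the expansion is reconstructed from the definitions above), minimizing (3) in GSVD coordinates gives
$$ J(\lambda) = \sum_{i=1}^{p} \frac{\lambda^2 \mu_i^2 s_i^2}{\upsilon_i^2 + \lambda^2 \mu_i^2} + \sum_{i=n+1}^{m} s_i^2, $$
which reduces to $\|\tilde{Q} U^T \tilde{r}\|_2^2$ at $\lambda = 1$ by $\upsilon_i^2 + \mu_i^2 = 1$, and is the scalar function of $\lambda$ to which the Newton root-finding of the later sections is applied.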