6. Approximation and fitting
Prof. Ying Cui
Department of Electrical Engineering
Shanghai Jiao Tong University, 2018
Outline
◮ Norm approximation
◮ Least-norm problems
◮ Regularized approximation
◮ Robust approximation
Basic norm approximation problem

$$\min_x \; \|Ax - b\|$$

where $A \in \mathbf{R}^{m \times n}$ with $m \ge n$ and independent columns and $b \in \mathbf{R}^m$ are given, and $\|\cdot\|$ is a norm on $\mathbf{R}^m$

◮ a solvable convex problem
◮ optimal value is zero iff $b \in \mathcal{R}(A) = \{Ax \mid x \in \mathbf{R}^n\}$
◮ solution is $A^{-1}b$ when $m = n$
◮ optimal value is nonzero, and the problem is more interesting and useful, if $b \notin \mathcal{R}(A)$
◮ an optimal point $x^\star = \arg\min_x \|Ax - b\|$ is called an approximate solution of $Ax \approx b$ in the norm $\|\cdot\|$
◮ $r = Ax - b$ is called the residual for the problem
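As a concrete illustration, a minimal sketch solving the problem with cvxpy (assuming cvxpy and numpy are installed; the problem data $A$, $b$ below are made up):

```python
import numpy as np
import cvxpy as cp

# made-up problem data: m > n, so b is generally not in R(A)
np.random.seed(0)
m, n = 30, 10
A = np.random.randn(m, n)
b = np.random.randn(m)

x = cp.Variable(n)
# any norm works here: cp.norm(..., 2), cp.norm(..., 1), cp.norm(..., 'inf')
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2)))
prob.solve()
print("optimal value:", prob.value)
print("residual norm:", np.linalg.norm(A @ x.value - b))
```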
Interpretations

approximation interpretation:
◮ fit/approximate vector $b$ by a linear combination of columns of $A$, as closely as possible, with deviation measured in $\|\cdot\|$
◮ $Ax = x_1 a_1 + \cdots + x_n a_n$ ($a_1, \ldots, a_n \in \mathbf{R}^m$: columns of $A$)
◮ $Ax^\star$ is the best approximation of $b$
◮ the approximation problem is also called a regression problem
  ◮ $a_1, \ldots, a_n$ are called regressors
  ◮ $x_1^\star a_1 + \cdots + x_n^\star a_n$ is called the regression of $b$

estimation interpretation:
◮ estimate a parameter vector $x$ based on an imperfect linear vector measurement $b$
◮ consider a linear measurement model $y = Ax + v$, where $y \in \mathbf{R}^m$ is a vector measurement, $x \in \mathbf{R}^n$ is a vector of parameters to be estimated, and $v \in \mathbf{R}^m$ is some measurement error that is unknown but presumed to be small in $\|\cdot\|$
◮ $x^\star$ is the best guess of $x$, given $y = b$
Interpretations

geometric interpretation:
◮ find the projection of point $b$ onto the subspace $\mathcal{R}(A)$ in $\|\cdot\|$, i.e., a point in $\mathcal{R}(A)$ that is closest to $b$, i.e., an optimal point of
$$\min_u \; \|u - b\| \quad \text{s.t.} \quad u \in \mathcal{R}(A)$$
◮ $Ax^\star$ is the point in $\mathcal{R}(A)$ closest to $b$

design interpretation:
◮ choose a vector of design variables $x$ that achieves, as closely as possible, the target/desired results $b$
◮ the residual vector $r = Ax - b$ can be interpreted as the deviation between actual results $Ax$ and target/desired results $b$
◮ $x^\star$ is the design that best achieves the desired results $b$
Examples

least-squares approximation ($\|\cdot\|_2$): equivalent problem (QP, obtained by squaring the objective):

$$\min_x \; \|Ax - b\|_2^2 = r_1^2 + \cdots + r_m^2$$

◮ the objective function $f(x) = x^T A^T A x - 2 x^T A^T b + b^T b$ is convex quadratic
◮ a point $x$ is optimal iff it satisfies the normal equations
$$\nabla f(x) = 2A^T A x - 2A^T b = 0 \;\Longrightarrow\; A^T A x = A^T b,$$
which always have a solution
◮ unique solution $x^\star = (A^T A)^{-1} A^T b$ if the columns of $A$ are independent (i.e., $\operatorname{rank} A = n$)
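In practice one rarely forms $A^T A$ explicitly; QR- or SVD-based routines such as numpy.linalg.lstsq are numerically safer. A minimal sketch checking that both routes agree (made-up data):

```python
import numpy as np

np.random.seed(0)
m, n = 30, 10
A = np.random.randn(m, n)   # independent columns with probability 1
b = np.random.randn(m)

# via the normal equations A^T A x = A^T b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# via a numerically stabler built-in least-squares solver
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_ne, x_ls))  # True
```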
Examples

Chebyshev or minimax approximation ($\|\cdot\|_\infty$):
$$\min_x \; \|Ax - b\|_\infty = \max\{|r_1|, \ldots, |r_m|\}$$
◮ equivalent problem (LP):
$$\min_{x \in \mathbf{R}^n,\, t \in \mathbf{R}} \; t \quad \text{s.t.} \quad -t\mathbf{1} \preceq Ax - b \preceq t\mathbf{1}$$

Sum of absolute residuals approximation ($\|\cdot\|_1$):
$$\min_x \; \|Ax - b\|_1 = |r_1| + \cdots + |r_m|$$
◮ equivalent problem (LP):
$$\min_{x \in \mathbf{R}^n,\, t \in \mathbf{R}^m} \; \mathbf{1}^T t \quad \text{s.t.} \quad -t \preceq Ax - b \preceq t$$
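The epigraph trick above maps directly onto a generic LP solver. A sketch of the Chebyshev LP using scipy.optimize.linprog (made-up data; note that linprog's variables are nonnegative by default, so the bounds must be relaxed explicitly):

```python
import numpy as np
from scipy.optimize import linprog

np.random.seed(0)
m, n = 30, 10
A = np.random.randn(m, n)
b = np.random.randn(m)

# variables z = (x, t); minimize t subject to -t*1 <= Ax - b <= t*1
c = np.r_[np.zeros(n), 1.0]
ones = np.ones((m, 1))
A_ub = np.block([[A, -ones], [-A, -ones]])
b_ub = np.r_[b, -b]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + 1))  # all variables free
x_cheb, t_opt = res.x[:n], res.x[-1]
print(t_opt, np.max(np.abs(A @ x_cheb - b)))   # the two values agree
```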
Penalty function approximation

$$\min_{x \in \mathbf{R}^n,\, r \in \mathbf{R}^m} \; \phi(r_1) + \cdots + \phi(r_m) \quad \text{s.t.} \quad r = Ax - b$$

◮ the (residual) penalty function $\phi : \mathbf{R} \to \mathbf{R}$ is a measure of dislike of a residual, and is assumed to be convex
◮ in many cases, symmetric, nonnegative, and $\phi(0) = 0$
◮ interpretation: minimize the total penalty incurred by the residuals of the approximation $Ax$ of $b$
◮ extension of the equivalent problem of $\ell_p$-norm ($1 \le p < \infty$) approximation
$$\min_x \; \|Ax - b\|_p^p = |r_1|^p + \cdots + |r_m|^p$$
with a separable and symmetric function of the residuals as objective function
Common penalty functions

◮ $\ell_p$-norm penalty function $\phi(u) = |u|^p$ ($1 \le p < \infty$)
  ◮ quadratic penalty function $\phi(u) = u^2$
  ◮ absolute value penalty function $\phi(u) = |u|$
◮ deadzone-linear penalty function with deadzone width $a > 0$
$$\phi(u) = \max\{0, |u| - a\} = \begin{cases} 0, & |u| \le a \\ |u| - a, & |u| > a \end{cases}$$
◮ log-barrier penalty function with limit $a > 0$
$$\phi(u) = \begin{cases} -a^2 \log\!\left(1 - (u/a)^2\right), & |u| < a \\ \infty, & |u| \ge a \end{cases}$$
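For reference, a minimal NumPy rendering of these penalty functions (the parameter defaults a = 0.5 for the deadzone and a = 1 for the log barrier are chosen to match the example on the next slide):

```python
import numpy as np

def quadratic(u):
    return np.asarray(u, dtype=float) ** 2

def absolute(u):
    return np.abs(u)

def deadzone_linear(u, a=0.5):
    # zero penalty inside the deadzone, linear growth outside
    return np.maximum(0.0, np.abs(u) - a)

def log_barrier(u, a=1.0):
    # finite only for |u| < a; +inf at or beyond the limit
    u = np.atleast_1d(np.asarray(u, dtype=float))
    out = np.full(u.shape, np.inf)
    inside = np.abs(u) < a
    # log1p(-(u/a)^2) = log(1 - (u/a)^2)
    out[inside] = -a**2 * np.log1p(-(u[inside] / a) ** 2)
    return out
```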
Example

histogram of residual amplitudes for four penalty functions:
$$\phi(u) = |u|, \quad \phi(u) = u^2, \quad \phi(u) = \max\{0, |u| - 0.5\}, \quad \phi(u) = -\log(1 - u^2)$$
◮ $|u|$: many zero or very small residuals, more large ones
◮ $u^2$: many modest residuals, relatively fewer large ones
◮ $\max\{0, |u| - 0.5\}$: many residuals right at the edge of the 'free' zone
◮ $-\log(1 - u^2)$: residual distribution similar to that of the quadratic, except no residuals larger than 1
◮ the shape of the penalty function has a large effect on the distribution of residuals
Sensitivity to outliers or large errors

◮ in an estimation or regression context, an outlier is a measurement $y_i = a_i^T x + v_i$ with a relatively large noise $v_i$
  ◮ often associated with a flawed measurement or faulty data
◮ ideally we would guess which measurements are outliers, and either remove them or greatly lower their weight
◮ we cannot assign zero penalty to very large residuals: otherwise all residuals could be made large, yielding a total penalty of zero
◮ sensitivity to outliers depends on the (relative) value of the penalty function for large residuals
◮ the least sensitive convex penalty functions are those that grow linearly, i.e., like $|u|$, for large $u$; these are called robust (against outliers)
Robust convex penalty functions

◮ absolute value penalty function: $\phi(u) = |u|$
◮ Huber penalty function (with parameter $M > 0$):
$$\phi_{\text{hub}}(u) = \begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}$$
◮ example: use an affine function $f(t) = \alpha + \beta t$ to fit 42 points $(t_i, y_i)$ (circles) with two obvious outliers
  ◮ left: Huber penalty for $M = 1$
  ◮ right: the fit using the quadratic penalty (dashed) is rotated away from the non-outlier data, toward the outliers; the fit using the Huber penalty (solid) gives a far better fit to the non-outlier data
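A sketch of the affine-fitting example using cvxpy, whose huber atom implements exactly the penalty above (the data here is synthetic: a line plus noise with two points corrupted by hand, standing in for the 42-point dataset of the figure):

```python
import numpy as np
import cvxpy as cp

# synthetic data: a line plus small noise, with two corrupted points
np.random.seed(1)
t = np.linspace(-1, 1, 42)
y = 1.0 + 0.5 * t + 0.1 * np.random.randn(42)
y[5] += 4.0    # outlier
y[30] -= 4.0   # outlier

alpha, beta = cp.Variable(), cp.Variable()
resid = alpha + beta * t - y

# Huber fit (M = 1): linear growth for large residuals limits outlier pull
huber_fit = cp.Problem(cp.Minimize(cp.sum(cp.huber(resid, M=1))))
huber_fit.solve()
print("huber:        ", alpha.value, beta.value)

# quadratic (least-squares) fit for comparison: rotated toward the outliers
ls_fit = cp.Problem(cp.Minimize(cp.sum_squares(resid)))
ls_fit.solve()
print("least squares:", alpha.value, beta.value)
```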
Approximation with constraints

add constraints to the basic norm approximation problem
◮ in an approximation problem, constraints can be used to ensure that the approximation $Ax$ of $b$ satisfies certain properties
◮ in an estimation problem, constraints arise as prior knowledge of the vector $x$ to be estimated, or from prior knowledge of the estimation error $v$
◮ in a geometric problem, constraints arise in determining the projection of a point $b$ onto a set more complicated than a subspace, e.g., a cone or polyhedron
Examples

Nonnegativity constraints:
$$\min_x \; \|Ax - b\| \quad \text{s.t.} \quad x \succeq 0$$
◮ approximate $b$ using a conic combination of columns of $A$
◮ estimate $x$ known to be nonnegative, e.g., powers, rates
◮ determine the projection of $b$ onto the cone generated by the columns of $A$

Variable bounds:
$$\min_x \; \|Ax - b\| \quad \text{s.t.} \quad l \preceq x \preceq u$$
◮ estimate $x$ with prior knowledge of an interval for each variable
◮ determine the projection of $b$ onto the image of a box under the linear mapping induced by $A$
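For the Euclidean norm, both variants above have dedicated SciPy solvers. A minimal sketch (made-up data; the bounds of ±0.5 are arbitrary):

```python
import numpy as np
from scipy.optimize import nnls, lsq_linear

np.random.seed(0)
m, n = 30, 10
A = np.random.randn(m, n)
b = np.random.randn(m)

# nonnegativity constraints: min ||Ax - b||_2  s.t.  x >= 0
x_nn, rnorm = nnls(A, b)

# variable bounds: min ||Ax - b||_2  s.t.  l <= x <= u
res = lsq_linear(A, b, bounds=(-0.5, 0.5))

print(x_nn.min() >= 0, res.x.min(), res.x.max())
```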
Examples

Probability distribution:
$$\min_x \; \|Ax - b\| \quad \text{s.t.} \quad x \succeq 0, \; \mathbf{1}^T x = 1$$
◮ approximate $b$ using a convex combination of columns of $A$
◮ estimate proportions or relative frequencies

Norm ball constraint:
$$\min_x \; \|Ax - b\| \quad \text{s.t.} \quad \|x - x_0\| \le d$$
◮ estimate $x$ with prior guess $x_0$ and maximum plausible deviation $d$
◮ approximate $b$ using a linear combination of columns of $A$ within the trust region $\|x - x_0\| \le d$
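More general norm-plus-constraint combinations are easiest to state directly in cvxpy. A sketch of the probability-distribution variant (made-up data):

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 30, 10
A = np.random.randn(m, n)
b = np.random.randn(m)

x = cp.Variable(n)
constraints = [x >= 0, cp.sum(x) == 1]   # probability simplex
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2)), constraints)
prob.solve()
print(x.value.sum(), x.value.min())      # ~1.0 and >= 0, up to solver tolerance
```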
Least-norm problems

$$\min_x \; \|x\| \quad \text{s.t.} \quad Ax = b$$

where $A \in \mathbf{R}^{m \times n}$ with $m \le n$ and independent rows, $b \in \mathbf{R}^m$, and $\|\cdot\|$ is a norm on $\mathbf{R}^n$

◮ a solvable convex problem
◮ the only feasible point is $A^{-1}b$ when $m = n$
◮ the problem is interesting only when $m < n$ ($Ax = b$ underdetermined)
◮ an optimal point $x^\star$ is called a least-norm solution of $Ax = b$ in the norm $\|\cdot\|$
◮ reformulation as a norm approximation problem: $\min_u \|x_0 + Zu\|$, where $x_0 + Zu$ is the general solution of $Ax = b$ ($x_0$: any particular solution; $Z \in \mathbf{R}^{n \times (n-m)}$: columns form a basis of $\mathcal{N}(A)$; $u \in \mathbf{R}^{n-m}$)
◮ extension: least-penalty problem
$$\min_x \; \phi(x_1) + \cdots + \phi(x_n) \quad \text{s.t.} \quad Ax = b$$
with $\phi$ convex, nonnegative, and $\phi(0) = 0$
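For the Euclidean norm the least-norm solution has the closed form $x^\star = A^T (AA^T)^{-1} b$, which the pseudoinverse also computes. A quick numerical sketch (made-up data), also constructing the nullspace basis $Z$ used in the reformulation above:

```python
import numpy as np
from scipy.linalg import null_space

np.random.seed(0)
m, n = 10, 30
A = np.random.randn(m, n)   # independent rows with probability 1
b = np.random.randn(m)

# closed form for the l2 least-norm solution: x = A^T (A A^T)^{-1} b
x_ln = A.T @ np.linalg.solve(A @ A.T, b)

# the same via the pseudoinverse
x_pinv = np.linalg.pinv(A) @ b
print(np.allclose(x_ln, x_pinv))          # True

# nullspace basis Z for the reformulation x = x0 + Z u
Z = null_space(A)                         # shape (n, n - m)
print(Z.shape, np.allclose(A @ x_ln, b))  # feasibility check
```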
Interpretations

control or design interpretation:
◮ $x$ are $n$ design variables (inputs), $b$ are $m$ required results (outputs), and $Ax = b$ represents $m$ requirements on the design
◮ the design is underspecified with $n - m$ degrees of freedom (as $m < n$)
◮ choose the smallest ('most efficient') design, measured by the norm $\|\cdot\|$, that satisfies the requirements

estimation interpretation:
◮ $x$ are $n$ parameters, and $b$ are $m$ perfect measurements
◮ the measurements do not completely determine the parameters (as $m < n$), and the prior information is that the parameters are small (measured by the norm $\|\cdot\|$)
◮ choose the smallest ('most plausible') estimate consistent with the measurements

geometric interpretation:
◮ find the point in the affine set $\{x \mid Ax = b\}$ with minimum distance (measured by the norm $\|\cdot\|$) to $0$