Nonlinear Programming Models
Fabio Schoen, 2008
http://gol.dsi.unifi.it/users/schoen
Introduction
NLP problems

min f(x),  x ∈ S ⊆ R^n

Standard form:

min f(x)
s.t. h_i(x) = 0,  i = 1, …, m
     g_j(x) ≤ 0,  j = 1, …, k

Here S = {x ∈ R^n : h_i(x) = 0 ∀i, g_j(x) ≤ 0 ∀j}.
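To make the standard form concrete, here is a minimal sketch in Python using scipy.optimize on a made-up instance (the objective and constraints are illustrative, not from the slides). Note that SciPy's inequality convention is fun(x) ≥ 0, the opposite of the g_j(x) ≤ 0 convention used here:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance of the standard form:
#   min  f(x) = (x1 - 1)^2 + (x2 - 2)^2
#   s.t. h(x) = x1 + x2 - 2  = 0
#        g(x) = x1^2 - x2   <= 0
f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2
constraints = [
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 2.0},
    # SciPy's "ineq" means fun(x) >= 0, so g(x) <= 0 becomes -g(x) >= 0
    {"type": "ineq", "fun": lambda x: -(x[0]**2 - x[1])},
]
result = minimize(f, x0=np.zeros(2), constraints=constraints)
print(result.x)
```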
Local and global optima

A global minimum or global optimum is any x⋆ ∈ S such that

x ∈ S ⇒ f(x) ≥ f(x⋆).

A point x̄ is a local optimum if ∃ ε > 0 such that

x ∈ S ∩ B(x̄, ε) ⇒ f(x) ≥ f(x̄),

where B(x̄, ε) = {x ∈ R^n : ‖x − x̄‖ ≤ ε} is a ball in R^n.

Any global optimum is also a local optimum, but the converse is generally false.
Convex Functions

A set S ⊆ R^n is convex if x, y ∈ S ⇒ λx + (1 − λ)y ∈ S for all choices of λ ∈ [0, 1].

Let Ω ⊆ R^n be a non-empty convex set. A function f : Ω → R is convex iff

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)  for all x, y ∈ Ω, λ ∈ [0, 1].
Convex Functions

[Figure: a convex function of one variable, with two points x and y marked on the horizontal axis]
Properties of convex functions

Every convex function is continuous in the interior of Ω. It might be discontinuous, but only on the boundary.

If f is continuously differentiable, then it is convex iff

f(y) ≥ f(x) + (y − x)^T ∇f(x)  for all x, y ∈ Ω.
Convex functions

[Figure: illustration of the first-order condition; the tangent line at a point lies below the graph]
If f is twice continuously differentiable, then f is convex iff its Hessian matrix

∇²f(x) := [∂²f / ∂x_i ∂x_j]

is positive semi-definite: ∇²f(x) ⪰ 0, i.e.

v^T ∇²f(x) v ≥ 0  ∀v ∈ R^n

or, equivalently, all eigenvalues of ∇²f(x) are non-negative.
Example: an affine function is convex (and concave).

For a quadratic function (Q: symmetric matrix)

f(x) = ½ x^T Qx + b^T x + c

we have

∇f(x) = Qx + b,  ∇²f(x) = Q

⇒ f is convex iff Q ⪰ 0.
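This test is easy to run numerically; a minimal sketch (the matrix Q here is an arbitrary example):

```python
import numpy as np

# Convexity test for f(x) = 0.5 x^T Q x + b^T x + c:
# since the Hessian is Q everywhere, f is convex iff all
# eigenvalues of the symmetric matrix Q are non-negative.
Q = np.array([[4.0, 1.0],
              [1.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(Q)   # eigvalsh: for symmetric matrices
print(eigenvalues)                    # approximately [1.586, 4.414]
print("convex:", bool(np.all(eigenvalues >= 0)))   # True: Q is PSD
```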
Convex Optimization Problems

min f(x),  x ∈ S

is a convex optimization problem iff S is a convex set and f is convex on S. For a problem in standard form

min f(x)
s.t. h_i(x) = 0,  i = 1, …, m
     g_j(x) ≤ 0,  j = 1, …, k

if f is convex, the h_i(x) are affine functions, and the g_j(x) are convex functions, then the problem is convex.
Maximization

Slight abuse of notation: a problem

max f(x),  x ∈ S

is called convex iff S is a convex set and f is a concave function. This is not to be confused with minimization of a concave function (or maximization of a convex function), which are NOT convex optimization problems.
Convex and non-convex optimization

Convex optimization "is easy"; non-convex optimization is usually very hard.

Fundamental property of convex optimization problems: every local optimum is also a global optimum (a proof will be given later).

Minimizing a positive semidefinite quadratic function over a polyhedron is easy (polynomially solvable); if even a single eigenvalue of the Hessian is negative, the problem becomes NP-hard.
Convex functions: examples

Many (of course not all…) functions are convex!

- affine functions a^T x + b
- quadratic functions ½ x^T Qx + b^T x + c with Q = Q^T, Q ⪰ 0
- any norm is a convex function
- x log x (however, log x is concave)
- f is convex if and only if, ∀ x_0, d ∈ R^n, its restriction to any line, φ(α) = f(x_0 + αd), is a convex function
- a non-negative linear combination of convex functions is convex
- g(x, y) convex in x for all y ⇒ ∫ g(x, y) dy is convex
More examples… (a numerical spot-check of the last item follows this list)

- max_i {a_i^T x + b_i} is convex
- f, g convex ⇒ max{f(x), g(x)} is convex
- f_a a convex function for every a ∈ A (a possibly uncountable set) ⇒ sup_{a∈A} f_a(x) is convex
- f convex ⇒ f(Ax + b) is convex
- let S ⊆ R^n be any set ⇒ f(x) = sup_{s∈S} ‖x − s‖ is convex
- Trace(A^T X) = Σ_{i,j} A_ij X_ij is convex (it is linear!)
- log det X⁻¹ is convex over the set of matrices X ∈ R^{n×n}, X ≻ 0
- λ_max(X) (the largest eigenvalue of a symmetric matrix X) is convex
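As a sanity check of the last claim, one can numerically spot-check midpoint convexity of λ_max on random symmetric matrices (a spot-check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

def lam_max(X):
    # largest eigenvalue of a symmetric matrix (eigvalsh returns sorted values)
    return np.linalg.eigvalsh(X)[-1]

# Midpoint convexity: lam_max((X + Y)/2) <= (lam_max(X) + lam_max(Y))/2
for _ in range(1000):
    A, B = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
    X, Y = (A + A.T) / 2, (B + B.T) / 2
    assert lam_max((X + Y) / 2) <= (lam_max(X) + lam_max(Y)) / 2 + 1e-9
```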
Data Approximation
Table of contents

- norm approximation
- maximum likelihood
- robust estimation
Norm approximation

Problem:

min_x ‖Ax − b‖

where A, b are parameters. Usually the system is over-determined, i.e. b ∉ Range(A). For example, this happens when A ∈ R^{m×n} with m > n and A has full rank.

r := Ax − b is the "residual".
Examples (each choice yields a convex problem; a code sketch follows):

- ‖r‖ = √(r^T r): least squares (or "regression")
- ‖r‖ = √(r^T Pr) with P ≻ 0: weighted least squares
- ‖r‖ = max_i |r_i|: minimax, ℓ∞, or Chebyshev approximation
- ‖r‖ = Σ_i |r_i|: absolute or ℓ1 approximation

Possible (convex) additional constraints:

- maximum deviation from an initial estimate: ‖x − x_est‖ ≤ ε
- simple bounds ℓ_i ≤ x_i ≤ u_i
- ordering: x_1 ≤ x_2 ≤ ⋯ ≤ x_n
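A sketch using the cvxpy modeling library on random data (the 100×30 shape matches the histograms on the next slides, but the data here is made up):

```python
import cvxpy as cp
import numpy as np

# Hypothetical over-determined instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
b = rng.standard_normal(100)

x = cp.Variable(30)
r = A @ x - b
for name, objective in [("l1",   cp.norm1(r)),
                        ("l2",   cp.norm2(r)),
                        ("linf", cp.norm_inf(r))]:
    cp.Problem(cp.Minimize(objective)).solve()
    print(name, objective.value)
```

Additional convex constraints such as the bounds ℓ_i ≤ x_i ≤ u_i above would be passed as a constraint list in the second argument of cp.Problem.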
Example: ℓ1 norm

Matrix A ∈ R^{100×30}.

[Figure: histogram of the residuals of the ℓ1-norm approximation, residual values on [−5, 5]]
ℓ∞ norm

[Figure: histogram of the residuals of the ℓ∞-norm approximation, residual values on [−5, 5]]
ℓ2 norm

[Figure: histogram of the residuals of the ℓ2-norm approximation, residual values on [−5, 5]]
Variants

min Σ_i h(y_i − a_i^T x)

where h is a convex function, for example:

- linear-quadratic: h(z) = z² if |z| ≤ 1, 2|z| − 1 if |z| > 1
- "dead zone": h(z) = 0 if |z| ≤ 1, |z| − 1 if |z| > 1
- logarithmic barrier: h(z) = −log(1 − z²) if |z| < 1, ∞ if |z| ≥ 1
Comparison

[Figure: plots of h(z) for the ℓ1 and ℓ2 penalties, the linear-quadratic penalty, the dead zone, and the logarithmic barrier, over z ∈ [−2, 2]]
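The three piecewise penalties translate directly into code; a vectorized numpy sketch (the unit thresholds follow the definitions above):

```python
import numpy as np

def linear_quadratic(z):
    # quadratic inside [-1, 1], linear (slope 2) outside
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= 1, z**2, 2 * np.abs(z) - 1)

def dead_zone(z):
    # zero inside [-1, 1], linear outside
    return np.maximum(np.abs(np.asarray(z, dtype=float)) - 1, 0)

def log_barrier(z):
    # -log(1 - z^2) for |z| < 1, +infinity otherwise
    z = np.asarray(z, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        val = -np.log1p(-z**2)   # log1p(-z^2) == log(1 - z^2)
    return np.where(np.abs(z) < 1, val, np.inf)
```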
Maximum likelihood

Given a sample X_1, X_2, …, X_k and a parametric family of probability density functions L(·; θ), the maximum likelihood estimate of θ given the sample is

θ̂ = arg max_θ L(X_1, …, X_k; θ).

Example: linear measurements with additive i.i.d. (independent, identically distributed) noise:

X_i = a_i^T θ + ε_i    (1)

where the ε_i are i.i.d. random variables with density p(·):

L(X_1, …, X_k; θ) = ∏_{i=1}^{k} p(X_i − a_i^T θ).
Max likelihood estimate – MLE

Taking the logarithm (which does not change optimum points):

θ̂ = arg max_θ Σ_i log p(X_i − a_i^T θ).

If p is log-concave ⇒ this problem is convex. Examples:

- ε ∼ N(0, σ²), i.e. p(z) = (2πσ²)^{−1/2} exp(−z²/2σ²) ⇒ MLE is the ℓ2 estimate: θ̂ = arg min_θ ‖Aθ − X‖_2
- p(z) = (1/(2a)) exp(−|z|/a) ⇒ ℓ1 estimate: θ̂ = arg min_θ ‖Aθ − X‖_1
- p(z) = (1/a) exp(−z/a) 1{z ≥ 0} (negative exponential) ⇒ the estimate can be found by solving the LP problem:

  min 1^T (X − Aθ)
  s.t. Aθ ≤ X

- p uniform on [−a, a] ⇒ the MLE is any θ such that ‖Aθ − X‖_∞ ≤ a
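For the negative-exponential case, the LP can be written down almost verbatim; a sketch with cvxpy on synthetic data (the shapes and the noise scale are made up):

```python
import cvxpy as cp
import numpy as np

# Hypothetical data: X_i = a_i^T theta + eps_i, with eps_i >= 0 exponential.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
theta_true = rng.standard_normal(5)
X = A @ theta_true + rng.exponential(scale=0.5, size=200)

theta = cp.Variable(5)
# MLE of theta under one-sided exponential noise: the LP above.
problem = cp.Problem(cp.Minimize(cp.sum(X - A @ theta)),
                     [A @ theta <= X])
problem.solve()
print(theta.value)
```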
Ellipsoids

An ellipsoid is a subset of R^n of the form

E = {x ∈ R^n : (x − x_0)^T P^{−1} (x − x_0) ≤ 1}

where x_0 ∈ R^n is the center of the ellipsoid and P is a symmetric positive-definite matrix. Alternative representations:

E = {x ∈ R^n : ‖Ax − b‖_2 ≤ 1}  where A ≻ 0,

or

E = {x ∈ R^n : x = x_0 + Au, ‖u‖_2 ≤ 1}

where A is square and non-singular (an affine transformation of the unit ball).
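A quick numerical confirmation that the affine-image representation matches the first definition with P = AA^T (x_0 and A here are arbitrary examples):

```python
import numpy as np

# Hypothetical 2-d ellipsoid: x = x0 + A u with ||u||_2 <= 1, A nonsingular.
x0 = np.array([1.0, 2.0])
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])

# Points on the boundary (||u|| = 1) should satisfy
# (x - x0)^T P^{-1} (x - x0) = 1 with P = A A^T.
theta = np.linspace(0, 2 * np.pi, 100)
U = np.stack([np.cos(theta), np.sin(theta)])    # unit circle, shape (2, 100)
Xpts = x0[:, None] + A @ U
P = A @ A.T
d = Xpts - x0[:, None]
vals = np.einsum("ik,ij,jk->k", d, np.linalg.inv(P), d)
assert np.allclose(vals, 1.0)
```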
Robust Least Squares

Least Squares: x̂ = arg min_x √(Σ_i (a_i^T x − b_i)²).

Assumption: the a_i are not known exactly, but it is known that

a_i ∈ E_i = {ā_i + P_i u : ‖u‖ ≤ 1}  where P_i = P_i^T ⪰ 0.

Definition: worst-case residuals:

max_{a_i ∈ E_i} √(Σ_i (a_i^T x − b_i)²).

A robust estimate of x is the solution of

x̂_r = arg min_x max_{a_i ∈ E_i} √(Σ_i (a_i^T x − b_i)²).
RLS

It holds that |α + β^T y| ≤ |α| + ‖β‖‖y‖. Choosing y⋆ = β/‖β‖ if α ≥ 0 and y⋆ = −β/‖β‖ if α < 0, we get ‖y⋆‖ = 1 and

|α + β^T y⋆| = |α + β^T β sign(α)/‖β‖| = |α| + ‖β‖.

Then:

max_{a_i ∈ E_i} |a_i^T x − b_i| = max_{‖u‖≤1} |ā_i^T x − b_i + u^T P_i x|
                                = |ā_i^T x − b_i| + ‖P_i x‖.
…

Thus the Robust Least Squares problem reduces to

min_x (Σ_i (|ā_i^T x − b_i| + ‖P_i x‖)²)^{1/2}

(a convex optimization problem). Transformation:

min_{x,t} ‖t‖_2
s.t. |ā_i^T x − b_i| + ‖P_i x‖ ≤ t_i  ∀i,

i.e. a second-order cone program: each constraint splits into ±(ā_i^T x − b_i) + ‖P_i x‖ ≤ t_i.
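The transformed problem is directly expressible in cvxpy; a sketch (the data and the choice P_i = 0.1·I are illustrative, not from the slides):

```python
import cvxpy as cp
import numpy as np

# Hypothetical instance: nominal rows abar_i and uncertainty shapes P_i,
# with E_i = {abar_i + P_i u : ||u|| <= 1}.
rng = np.random.default_rng(0)
m, n = 20, 5
Abar = rng.standard_normal((m, n))
b = rng.standard_normal(m)
Ps = [0.1 * np.eye(n) for _ in range(m)]

x = cp.Variable(n)
t = cp.Variable(m)
constraints = [cp.abs(Abar[i] @ x - b[i]) + cp.norm2(Ps[i] @ x) <= t[i]
               for i in range(m)]
problem = cp.Problem(cp.Minimize(cp.norm2(t)), constraints)
problem.solve()
print("robust estimate:", x.value)
```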