Metti 5: Optimization for nonlinear parameter estimation and function estimation. Lecture 7 – Roscoff, June 13-18, 2011
Objectives
Direct problem: inputs (BC, IC, parameters) → Model → State solution.
Notations: R(u, ψ) = 0 is the model, u the state, ψ the unknown.
Inverse problem: from state measurements u_d, find the unknown ψ that minimizes j(ψ) := J(u).
Examples
Thermal conductivity λ (enters the model S(u, ψ) = 0 through a BC):
• λ = const
• λ(u ≡ T) ⇒ λ = Σ_i λ_i ξ_i(T)
• λ(x) ⇒ λ = Σ_i λ_i ξ_i(x), ψ ← λ(x)
Heat transfer coefficient h (enters the model S(u, ψ) = 0 through a BC):
• h = const
• h(u ≡ T)
• h(x), ψ ← h(x)
Inverse problem
From state measurements u_d, find the unknown ψ that minimizes j(ψ) := J(u), where R(u, ψ) = 0 : ψ ↦ u.
Contents
1 n-D Optimization
2 Gradient computation
3 An example of heat transfer coefficient identification
Non-linear optimization
Direct methods of the kind seen in Lecture 2 are usable
• for linear estimation,
• when dim ψ is “low”.
We need specific algorithms
• for non-linear parameter estimation (iterations),
• for function estimation, i.e. when dim ψ is “high”.
Function → parameters: ψ ← ψ(s) = Σ_i ψ_i ξ_i(s) (see the sketch below).
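To make the function-to-parameters reduction concrete, here is a minimal sketch (not from the lecture; the node positions and coefficient values are hypothetical) that evaluates ψ(s) = Σ_i ψ_i ξ_i(s) on a piecewise-linear (hat-function) basis:

```python
import numpy as np

def hat_basis(s, nodes, i):
    """Piecewise-linear (hat) basis function xi_i(s): equals 1 at nodes[i], 0 at the other nodes."""
    values = np.zeros(len(nodes))
    values[i] = 1.0
    return np.interp(s, nodes, values)

def psi(s, coeffs, nodes):
    """psi(s) = sum_i psi_i xi_i(s): the unknown function is now described by the vector coeffs."""
    return sum(c * hat_basis(s, nodes, i) for i, c in enumerate(coeffs))

# Hypothetical example: a heat transfer coefficient h(x) on [0, 1] reduced to 5 parameters.
nodes = np.linspace(0.0, 1.0, 5)
coeffs = np.array([10.0, 12.0, 15.0, 11.0, 9.0])   # the psi_i to be identified
x = np.linspace(0.0, 1.0, 101)
h_x = psi(x, coeffs, nodes)                        # h(x) evaluated on a fine grid
```

With this choice of basis the whole evaluation collapses to np.interp(x, nodes, coeffs); any other basis ξ_i (polynomials, splines, ...) works the same way, only the coefficients ψ_i are estimated.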
Optimization
We search ψ̄ = arg min_{ψ ∈ K ⊂ V} j(ψ).
Methods (quite a lot...): n-D optimization methods
• Gradient-free
  – deterministic: simplex, ...
  – stochastic: PSO, GA (genetic algorithms), ...
• With gradient
  – order 1: steepest descent, conjugate gradients, ...
  – order between 1 & 2: Levenberg, DFP, BFGS, ...
  – order 2: Newton, ...
and much more than that! For gradient-free methods, see [Onwubolu, G.C. and Babu, B.V., New Optimization Techniques in Engineering, Springer, 2003].
Gradient-type methods
Gradient-type methods: steepest descent. First iteration (figure: gradient ∇j(ψ) and descent direction d).
Gradient-type methods: steepest descent. Second iteration (figure: gradients ∇j(ψ^0), ∇j(ψ^1) and directions d^0, d^1).
Gradient-type methods: steepest descent. Successive displacements: orthogonality → zig-zag.
Gradient-type methods: steepest descent
Algorithm 1: Steepest descent
while (stopping criterion not satisfied) do (we are at the point ψ^p, iteration p)
• compute the gradient ∇j(ψ^p),
• the descent direction d^p = −∇j(ψ^p),
• line search: find ᾱ = arg min_{α>0} g(α) = j(ψ^p + α d^p),
• update ψ^{p+1} = ψ^p + ᾱ d^p.
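A minimal Python sketch of Algorithm 1, assuming the cost j and its gradient are available as callables; the crude backtracking line search and the tolerances are illustrative choices, not prescribed by the lecture:

```python
import numpy as np

def line_search(j, psi, d, alpha0=1.0, shrink=0.5, max_trials=30):
    """Crude backtracking: find alpha > 0 that decreases g(alpha) = j(psi + alpha d)."""
    alpha, j0 = alpha0, j(psi)
    for _ in range(max_trials):
        if j(psi + alpha * d) < j0:
            return alpha
        alpha *= shrink
    return alpha

def steepest_descent(j, grad_j, psi0, eps=1e-6, max_iter=1000):
    """Algorithm 1: descend along d^p = -grad j(psi^p) until the gradient norm is below eps."""
    psi = np.asarray(psi0, dtype=float)
    for _ in range(max_iter):
        g = grad_j(psi)
        if np.linalg.norm(g) <= eps:       # stopping criterion on ||grad j||
            break
        d = -g                             # descent direction
        alpha = line_search(j, psi, d)     # line search
        psi = psi + alpha * d              # update
    return psi

# Usage on a simple quadratic cost j(psi) = 0.5 (A psi, psi) - (b, psi):
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
j = lambda p: 0.5 * p @ A @ p - b @ p
grad_j = lambda p: A @ p - b
print(steepest_descent(j, grad_j, [0.0, 0.0]))
```

The quadratic cost at the end is only a test case; in the inverse-problem setting j and ∇j come from the model R(u, ψ) = 0.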
Stopping criterion
• ‖∇j(ψ^p)‖_2 (or ‖∇j(ψ^p)‖_∞) ≤ ε
• |j(ψ^p) − j(ψ^{p−1})| ≤ ε
• ‖ψ^p − ψ^{p−1}‖ ≤ ε
• j(ψ^p) ≤ ε
Gradient-type methods: steepest descent
Successive displacements: orthogonality → zig-zag. Why such zig-zagging?
Step p:
• direction of descent: d^p = −∇j(ψ^p),
• line search: find ᾱ = arg min_{α>0} g(α) = j(ψ^p + α d^p).
So: g′(α^p) = 0 = (d^p, ∇j(ψ^p + α^p d^p)) = (d^p, ∇j(ψ^{p+1})).
So: (d^p, d^{p+1}) = 0, i.e. successive directions are orthogonal.
Gradient-type methods. Admissible directions: any d such that (∇j(ψ), d) < 0 (figure: half-space of admissible directions opposite ∇j(ψ)).
Conjugate directions
Two parallel lines ℓ_1(α) = x_1 + α p and ℓ_2(α) = x_2 + α p; x̄_1 and x̄_2 are the minimizers of the quadratic cost along ℓ_1 and ℓ_2 (see figure).
⇒ The vector x̄_1 − x̄_2 is conjugate to the direction p.
Conjugate directions
(Figure: starting from x_0, minimize along e_1 to reach x̄_1; z denotes the minimizer of the quadratic cost; directions e_1, e_2.)
⇒ The vector z − x̄_1 is conjugate to the direction e_1.
Conjugate directions for n-D
Algorithm. Let the quadratic cost be j(ψ) = ½ (Aψ, ψ).
First iteration: d^0 = −∇j(ψ^0).
Then, from gradient orthogonality:
0 = (d^0, ∇j(ψ^1)) = (d^0, Aψ^1) = (d^0, A(ψ^0 + α^0 d^0)) = (d^0, Aψ^0) + α^0 (d^0, A d^0).
So we have the step length:
α^0 = −(d^0, Aψ^0) / (d^0, A d^0).
Conjugate directions
Algorithm, step p. The direction d^p = −∇j(ψ^p) + β^p d^{p−1} is chosen A-conjugate to d^{p−1}:
(d^p, A d^{p−1}) = (−∇j(ψ^p) + β^p d^{p−1}, A d^{p−1}) = −(∇j(ψ^p), A d^{p−1}) + β^p (d^{p−1}, A d^{p−1}) = 0.
So:
β^p = (∇j(ψ^p), A d^{p−1}) / (d^{p−1}, A d^{p−1}).
Conjugate directions
Algorithm 2: The conjugate gradient algorithm applied to quadratic functions
Let p = 0 and ψ^0 be the starting point.
• Compute the gradient and the descent direction d^0 = −∇j(ψ^0).
• Compute the step size α^0 = −(d^0, Aψ^0) / (d^0, A d^0).
while (stopping criterion not satisfied) do
At step p, we are at the point ψ^p. We define ψ^{p+1} = ψ^p + α^p d^p with:
• the step size α^p = −(d^p, ∇j(ψ^p)) / (d^p, A d^p),
• the direction d^p = −∇j(ψ^p) + β^p d^{p−1},
• where the coefficient needed for conjugate directions is β^p = (∇j(ψ^p), A d^{p−1}) / (d^{p−1}, A d^{p−1}).
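Algorithm 2 translates almost line by line into code. The sketch below is written from the formulas above; the matrix A, right-hand side b and starting point are illustrative, with b = 0 recovering the lecture's cost j(ψ) = ½(Aψ, ψ):

```python
import numpy as np

def conjugate_gradient_quadratic(A, psi0, b=None, eps=1e-10, max_iter=None):
    """Algorithm 2 for j(psi) = 0.5 (A psi, psi) - (b, psi); the lecture's case is b = 0."""
    n = len(psi0)
    b = np.zeros(n) if b is None else b
    max_iter = n if max_iter is None else max_iter
    psi = np.asarray(psi0, dtype=float)
    grad = A @ psi - b                    # grad j(psi)
    d = -grad                             # d^0 = -grad j(psi^0)
    for _ in range(max_iter):
        if np.linalg.norm(grad) <= eps:
            break
        Ad = A @ d
        alpha = -(d @ grad) / (d @ Ad)    # step size alpha^p = -(d^p, grad j(psi^p)) / (d^p, A d^p)
        psi = psi + alpha * d
        grad = A @ psi - b
        beta = (grad @ Ad) / (d @ Ad)     # beta = (grad j(psi^{p+1}), A d^p) / (d^p, A d^p)
        d = -grad + beta * d              # next A-conjugate direction
    return psi

# With an SPD matrix, exact convergence is reached in at most n iterations (exact arithmetic).
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(conjugate_gradient_quadratic(A, psi0=np.zeros(3), b=b))
print(np.linalg.solve(A, b))              # same answer: the minimizer satisfies A psi = b
```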
Gradient-type methods (figure: conjugate-gradient iterates with directions d^0, d^1 and gradients ∇j(ψ^0), ∇j(ψ^1)).
Conjugate gradients for non-quadratic functions
We use:
∇j(ψ^p) − ∇j(ψ^{p−1}) = A(ψ^p − ψ^{p−1}) = A(ψ^{p−1} + α^{p−1} d^{p−1} − ψ^{p−1}) = α^{p−1} A d^{p−1},
and combine with the previously seen relationships to get β^p through
• Polak and Ribière's method: β^p = (∇j(ψ^p), ∇j(ψ^p) − ∇j(ψ^{p−1})) / (∇j(ψ^{p−1}), ∇j(ψ^{p−1})),
• Fletcher and Reeves' method: β^p = (∇j(ψ^p), ∇j(ψ^p)) / (∇j(ψ^{p−1}), ∇j(ψ^{p−1})).
Conjugate gradients for non-quadratic functions
Algorithm 3: The conjugate gradient algorithm applied to arbitrary functions
Let p = 0, ψ^0 be the starting point, d^0 = −∇j(ψ^0); perform the line search.
while (stopping criterion not satisfied) do
At step p, we are at the point ψ^p; we define ψ^{p+1} = ψ^p + α^p d^p with:
• the step size α^p = arg min_{α ∈ R+} g(α) = j(ψ^p + α d^p),
• the direction d^p = −∇j(ψ^p) + β^p d^{p−1},
• where the conjugacy parameter β^p is given by either Polak and Ribière's or Fletcher and Reeves' method.
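A sketch of Algorithm 3 with Polak and Ribière's β. The Armijo backtracking replaces the exact 1-D minimization and the restart safeguard is a common practical addition; both are illustrative choices rather than part of the lecture's algorithm:

```python
import numpy as np

def backtracking(j, grad_j, psi, d, alpha=1.0, c=1e-4, shrink=0.5, max_trials=40):
    """Armijo backtracking: approximate arg min over alpha > 0 of j(psi + alpha d)."""
    j0, slope = j(psi), grad_j(psi) @ d
    for _ in range(max_trials):
        if j(psi + alpha * d) <= j0 + c * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha

def nonlinear_cg(j, grad_j, psi0, eps=1e-6, max_iter=5000):
    """Algorithm 3 with beta from Polak and Ribiere's formula."""
    psi = np.asarray(psi0, dtype=float)
    g = grad_j(psi)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        alpha = backtracking(j, grad_j, psi, d)
        psi = psi + alpha * d
        g_new = grad_j(psi)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))   # Polak-Ribiere (restart if negative)
        d = -g_new + beta * d
        if g_new @ d >= 0:                               # safeguard: keep a descent direction
            d = -g_new
        g = g_new
    return psi

# Usage on the Rosenbrock function, starting from the lecture's guess (-1, 1):
rosen = lambda p: (1 - p[0])**2 + 100 * (p[1] - p[0]**2)**2
rosen_grad = lambda p: np.array([-2 * (1 - p[0]) - 400 * p[0] * (p[1] - p[0]**2),
                                 200 * (p[1] - p[0]**2)])
print(nonlinear_cg(rosen, rosen_grad, [-1.0, 1.0]))      # -> close to (1, 1)
```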
Newton
Assume that j(ψ) is twice continuously differentiable, i.e. its second derivatives exist.
Approximate j(ψ) by its quadratic expansion:
∇j(ψ^{p+1}) = ∇j(ψ^p) + [∇²j(ψ^p)] δψ^p + O(‖δψ^p‖²),
so that
[∇²j(ψ^p)] δψ^p = −∇j(ψ^p), with ψ^{p+1} = ψ^p + δψ^p.
Convergence rate: quadratic.
But:
• ∇²j is difficult and expensive to compute,
• convergence is ensured only if ∇²j is positive definite.
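A minimal sketch of the Newton iteration, solving [∇²j(ψ^p)] δψ^p = −∇j(ψ^p) at each step; the Rosenbrock example (gradient and Hessian written by hand) is only an illustration:

```python
import numpy as np

def newton(grad_j, hess_j, psi0, eps=1e-10, max_iter=50):
    """Newton's method: solve hess(psi^p) dpsi = -grad(psi^p), then psi^{p+1} = psi^p + dpsi.
    Quadratic convergence near the minimum, but requires the (expensive) Hessian."""
    psi = np.asarray(psi0, dtype=float)
    for _ in range(max_iter):
        g = grad_j(psi)
        if np.linalg.norm(g) <= eps:
            break
        dpsi = np.linalg.solve(hess_j(psi), -g)   # meaningful only if the Hessian is positive definite
        psi = psi + dpsi
    return psi

# Usage on the Rosenbrock function:
grad = lambda p: np.array([-2 * (1 - p[0]) - 400 * p[0] * (p[1] - p[0]**2),
                           200 * (p[1] - p[0]**2)])
hess = lambda p: np.array([[2 - 400 * (p[1] - 3 * p[0]**2), -400 * p[0]],
                           [-400 * p[0], 200.0]])
print(newton(grad, hess, [-1.0, 1.0]))            # converges to (1, 1) in a few iterations
```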
Quasi-Newton
Newton: ψ^{p+1} = ψ^p − [∇²j(ψ^p)]^{−1} ∇j(ψ^p).
Idea: replace [∇²j(ψ^p)]^{−1} by an approximation H^p, updated as H^{p+1} = H^p + Λ^p.
Imposed condition: H [∇j(ψ^p) − ∇j(ψ^{p−1})] = ψ^p − ψ^{p−1}.
Different methods exist for the correction Λ^p.
Quasi-Newton
Different methods for the correction Λ^p. We set δ^p = ψ^{p+1} − ψ^p and γ^p = ∇j(ψ^{p+1}) − ∇j(ψ^p).
• Davidon–Fletcher–Powell (DFP):
H^{p+1} = H^p + δ^p (δ^p)^t / ((δ^p)^t γ^p) − H^p γ^p (γ^p)^t H^p / ((γ^p)^t H^p γ^p)
• Broyden–Fletcher–Goldfarb–Shanno (BFGS):
H^{p+1} = H^p + [1 + (γ^p)^t H^p γ^p / ((δ^p)^t γ^p)] δ^p (δ^p)^t / ((δ^p)^t γ^p) − [δ^p (γ^p)^t H^p + H^p γ^p (δ^p)^t] / ((δ^p)^t γ^p).
Convergence rate: superlinear.
Remark: BFGS is less sensitive than DFP to line-search inaccuracy.
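A sketch of the quasi-Newton iteration using the BFGS update of H written from the formula above; the identity initialization of H^0, the crude backtracking and the curvature safeguard are illustrative choices:

```python
import numpy as np

def bfgs_update(H, delta, gamma):
    """BFGS update of the approximate inverse Hessian H from delta = psi^{p+1} - psi^p
    and gamma = grad j(psi^{p+1}) - grad j(psi^p), as in the formula above."""
    dg = delta @ gamma
    Hg = H @ gamma
    return (H
            + (1.0 + gamma @ Hg / dg) * np.outer(delta, delta) / dg
            - (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg)

def quasi_newton_bfgs(j, grad_j, psi0, eps=1e-8, max_iter=500):
    """Quasi-Newton iteration: d = -H grad j, line search, then update H."""
    psi = np.asarray(psi0, dtype=float)
    H = np.eye(len(psi))                      # H^0: identity (no curvature information yet)
    g = grad_j(psi)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        d = -H @ g
        alpha, j0 = 1.0, j(psi)
        while j(psi + alpha * d) > j0 and alpha > 1e-12:   # crude backtracking
            alpha *= 0.5
        psi_new = psi + alpha * d
        g_new = grad_j(psi_new)
        delta, gamma = psi_new - psi, g_new - g
        if delta @ gamma > 1e-12:             # update only if the curvature condition holds
            H = bfgs_update(H, delta, gamma)
        psi, g = psi_new, g_new
    return psi

# Usage on a quadratic cost (for Rosenbrock, reuse rosen/rosen_grad from the sketch above):
A = np.array([[3.0, 1.0], [1.0, 2.0]])
j = lambda p: 0.5 * p @ A @ p
grad_j = lambda p: A @ p
print(quasi_newton_bfgs(j, grad_j, [4.0, -2.0]))   # -> approx. (0, 0)
```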
Test: Rosenbrock function. Guess: (−1, 1); optimum: (1, 1). (Figure: surface plot of the Rosenbrock function over x and y.)
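For comparison, the same test can be run with off-the-shelf optimizers; the sketch below uses SciPy from the slide's starting guess (the following slides show runs with PSO and the GSL library):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.0, 1.0])                     # starting guess from the slide
for method in ("CG", "BFGS", "Nelder-Mead"):   # conjugate gradient, quasi-Newton, simplex
    res = minimize(rosen, x0, jac=rosen_der if method != "Nelder-Mead" else None,
                   method=method)
    print(f"{method:12s} -> x = {res.x}, iterations = {res.nit}")
```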
PSO (figure: iterates on the Rosenbrock contours) ⊲⊲ http://clerc.maurice.free.fr/pso/
Steepest descent (figure: iterates on the Rosenbrock contours) ⊲⊲ GSL Library
Conjugate Gradient (figure: iterates on the Rosenbrock contours) ⊲⊲ GSL Library
BFGS (figure: iterates on the Rosenbrock contours) ⊲⊲ GSL Library
Algorithms based on the cost gradient
The previous algorithms are based only on the cost gradient ∇j(ψ):
• steepest descent,
• conjugate gradient,
• DFP, BFGS.
There are others that also use the “sensitivity” of the state with respect to the parameters (cf. Lecture 2), for costs of the kind
j(ψ) := J(u) = ∫_S (u − u_d)² ds.
Definition (directional derivative)
u′(ψ; δψ) is the derivative of the state u(ψ) at the point ψ in the direction δψ:
u′(ψ; δψ) := lim_{ε→0} [u(ψ + ε δψ) − u(ψ)] / ε.
Then the directional derivative of the cost function writes:
j′(ψ; δψ) = (J′(u), u′(ψ; δψ)), where j′(ψ; δψ) = (∇j(ψ), δψ).
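The directional derivative can be checked numerically against its definition; in this sketch the cost function is a stand-in (not the lecture's heat-transfer cost):

```python
import numpy as np

# Stand-in cost and gradient (any differentiable j would do):
j = lambda p: np.sum((p - 1.0)**4) + 0.5 * p @ p
grad_j = lambda p: 4.0 * (p - 1.0)**3 + p

psi = np.array([0.3, -0.7, 2.0])
dpsi = np.array([1.0, 0.5, -1.0])              # direction delta psi

exact = grad_j(psi) @ dpsi                     # j'(psi; dpsi) = (grad j(psi), dpsi)
for eps in (1e-2, 1e-4, 1e-6):
    fd = (j(psi + eps * dpsi) - j(psi)) / eps  # the definition as a limit
    print(f"eps = {eps:.0e}: finite difference = {fd:.8f}, exact = {exact:.8f}")
```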
Gauss–Newton
Second derivative: the second derivative of j(ψ) at the point ψ in the directions δψ and δφ is given by:
j″(ψ; δψ, δφ) = (J′(u), u″(ψ; δψ, δφ)) + ((J″(u), u′(ψ; δψ)), u′(ψ; δφ)).
Neglecting the second-order term (this is actually the Gauss–Newton approach), we have:
j″(ψ; δψ, δφ) ≈ ((J″(u), u′(ψ; δψ)), u′(ψ; δφ)).
Gauss–Newton: S^t S δψ^k = −∇j(ψ^k).
The matrix S^t S is usually badly conditioned.
Damp the system
Levenberg–Marquardt:
[S^t S + ℓ I] δψ^k = −∇j(ψ^k),
or better:
[S^t S + ℓ diag(S^t S)] δψ^k = −∇j(ψ^k).
Remark: ℓ → 0 yields the Gauss–Newton algorithm, while a larger ℓ gives an approximation of the steepest-descent algorithm. In practice, the parameter ℓ may be adjusted at each iteration (see the sketch below).
Remark: when dim ψ is high → prefer gradient-based methods.
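A sketch of the damped iteration for a least-squares cost, written here with the convention j(ψ) = ½‖r(ψ)‖² so that ∇j = S^t r, S being the sensitivity (Jacobian) matrix of the residuals; the exponential-fit example and the adjustment schedule for ℓ are illustrative assumptions:

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, psi0, ell=1e-2, eps=1e-10, max_iter=100):
    """Damped Gauss-Newton: solve [S^t S + ell * diag(S^t S)] dpsi = -S^t r at each step,
    with j(psi) = 0.5 * ||r(psi)||^2 so that grad j = S^t r."""
    psi = np.asarray(psi0, dtype=float)
    for _ in range(max_iter):
        r, S = residual(psi), jacobian(psi)
        grad = S.T @ r
        if np.linalg.norm(grad) <= eps:
            break
        M = S.T @ S
        dpsi = np.linalg.solve(M + ell * np.diag(np.diag(M)), -grad)
        r_new = residual(psi + dpsi)
        if r_new @ r_new < r @ r:
            psi, ell = psi + dpsi, ell * 0.5     # good step: accept it and reduce the damping
        else:
            ell *= 10.0                          # bad step: increase damping (towards steepest descent)
    return psi

# Hypothetical usage: fit u(t) = a * exp(b * t) to synthetic measurements u_d.
t = np.linspace(0.0, 1.0, 20)
u_d = 2.0 * np.exp(-3.0 * t)
residual = lambda p: p[0] * np.exp(p[1] * t) - u_d
jacobian = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
print(levenberg_marquardt(residual, jacobian, psi0=[1.0, -1.0]))   # -> approx. (2, -3)
```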