Stochastic constrained optimization in Hilbert spaces with applications
Georg Ch. Pflug / C. Geiersbach
March 27, 2019
Iteration methods
Problems:
◮ Finding roots of equations: given f(·), find a root x* such that f(x*) = 0.
◮ Finding optima of functions: given f(·), find a candidate for an optimum, i.e. x* such that ∇f(x*) = 0.
Newton (1669). Iterative solution method for the equation f(x) = x^3 − 2x − 5 = 0.
Raphson (1690). General version:
  roots:  x_{n+1} = x_n − f(x_n)/f'(x_n)        optima:  x_{n+1} = x_n − [∇²f(x_n)]^{−1} ∇f(x_n)
v. Mises, Pollaczek-Geiringer (1929). Fixed stepsize t:
  roots:  x_{n+1} = x_n − t·f(x_n)              optima:  x_{n+1} = x_n − t·∇f(x_n)
  converges if t ≤ [sup_x f'(x)]^{−1}           converges if t < 1/λ_max, with λ_max the maximal eigenvalue of ∇²f(x)
Decreasing stepsize:
  roots:  x_{n+1} = x_n − t_n f(x_n)            optima:  x_{n+1} = x_n − t_n ∇f(x_n)
  with t_n ≥ 0, t_n → 0, Σ_n t_n = ∞.
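A minimal numerical sketch of the root-finding column above, using Newton's example f(x) = x³ − 2x − 5; the starting point, the fixed step t, the decreasing-step constant 0.1 and the iteration counts are illustrative assumptions.

```python
# Sketch of the three root-finding schemes for f(x) = x^3 - 2x - 5.
# The starting point, step sizes and iteration counts are illustrative choices.

def f(x):
    return x**3 - 2.0 * x - 5.0

def fprime(x):
    return 3.0 * x**2 - 2.0

x = 2.0
for _ in range(6):                  # Newton-Raphson: x_{n+1} = x_n - f(x_n)/f'(x_n)
    x = x - f(x) / fprime(x)
print("Newton-Raphson:", x)         # approx. 2.09455

x, t = 2.0, 0.05
for _ in range(500):                # fixed step: x_{n+1} = x_n - t*f(x_n)
    x = x - t * f(x)
print("fixed step:", x)

x = 2.0
for n in range(1, 5001):            # decreasing steps t_n -> 0 with sum t_n = infinity
    x = x - (0.1 / n) * f(x)
print("decreasing steps:", x)       # converges slowly to the same root
```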
Rosen (1960). Gradient projection for optimization under linear equality constraints:
  min { f(x) : Ax = b },    x_{n+1} = x_n − t_n (I − A^T (A A^T)^{−1} A) ∇f(x_n)
Goldstein (1964). Gradient projection for optimization under convex constraints:
  min { f(x) : x ∈ C (convex) },    x_{n+1} = π_C(x_n − t_n ∇f(x_n)),
where π_C is the convex projection.
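A small finite-dimensional sketch of both projection ideas on a toy quadratic f(x) = ½‖x − c‖²; the matrix A, the vector b, the box bounds and the step size are made-up illustration data.

```python
import numpy as np

# Toy sketch of Rosen's and Goldstein's gradient projection for f(x) = 0.5*||x - c||^2.
# A, b, the box bounds and the step size are made-up illustration data.
c = np.array([3.0, -1.0, 2.0])

def grad_f(x):
    return x - c

# Rosen (1960): project the gradient onto the null space of A, staying on Ax = b.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
P = np.eye(3) - A.T @ np.linalg.solve(A @ A.T, A)   # I - A^T (A A^T)^{-1} A
x = np.linalg.lstsq(A, b, rcond=None)[0]            # a feasible starting point
for _ in range(200):
    x = x - 0.1 * P @ grad_f(x)
print("Rosen iterate:", x, " Ax =", A @ x)

# Goldstein (1964): project the full step onto a convex set C, here the box [0, 1]^3.
pi_C = lambda z: np.clip(z, 0.0, 1.0)               # convex projection onto the box
x = np.zeros(3)
for _ in range(200):
    x = pi_C(x - 0.1 * grad_f(x))
print("Goldstein iterate:", x)
```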
Stochastic iterations
f resp. ∇f is only observable together with noise, i.e. f_ω(·) resp. ∇f_ω(·):
  E[f_ω(x)] = f(x) + bias,    resp.    E[∇f_ω(x)] = ∇f(x) + bias.
Robbins-Monro (1951):                                X_{n+1} = X_n − t_n f_{ω_n}(X_n)
Ermoliev (1967-1976), stochastic (quasi-)gradients:  X_{n+1} = X_n − t_n ∇f_{ω_n}(X_n)
Gupal (1974), Kushner (1974), stochastic (quasi-)gradient projection:
  X_{n+1} = π_C(X_n − t_n ∇f_{ω_n}(X_n))
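A short Robbins-Monro sketch; the target equation f(x) = x − 1 = 0 and the unbiased Gaussian observation noise are assumptions made only for illustration.

```python
import numpy as np

# Robbins-Monro sketch: find the root of f(x) = x - 1 when only noisy values
# f_w(x) = f(x) + noise are observable (unbiased Gaussian noise, assumed here).
rng = np.random.default_rng(1)

X = 5.0
for n in range(1, 10001):
    t_n = 1.0 / n                        # t_n >= 0, t_n -> 0, sum_n t_n = infinity
    noisy_f = (X - 1.0) + rng.normal(scale=0.5)
    X = X - t_n * noisy_f
print("Robbins-Monro estimate of the root:", X)   # close to 1
```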
The projected stochastic quasigradient method
While more sophisticated methods (such as Armijo line search, level-set methods, mirror descent methods, or operator splitting) have been developed and become popular for deterministic optimization, the good old gradient search is still nearly the only method for stochastic optimization.
Stochastic optimization is applied in two different cases:
(1) for problems of huge dimension, where subproblems of smaller dimension are generated by random selection;
(2) for intrinsically stochastic problems, where external risk factors have to be considered.
Problems of type (1) include, e.g., digital image classification and restoration, speech recognition, deep machine learning using neural networks, and deterministic shape optimization.
In this talk, we discuss a problem of type (2): shape optimization in an intrinsically random environment.
Projected Stochastic Gradient (PSG) Algorithm in Hilbert spaces
Let H be a Hilbert space with inner product ⟨·,·⟩ and norm ‖·‖, and let the projection onto C be denoted by π_C : H → C.
Problem: min_{u ∈ C} { j(u) = E[J_ω(u)] }.
The PSG algorithm:
◮ Initialization: u_0 ∈ H
◮ For n = 0, 1, ...: generate an independent ω_n, choose t_n > 0, and set
  u_{n+1} := π_C(u_n − t_n g_n(ω_n))
  with stochastic gradient g_n.
Possible choices for the stochastic gradient:
◮ Single realization: g_n = ∇J_{ω_n}(u_n)
◮ Batch method: g_n = (1/m_n) Σ_{i=1}^{m_n} ∇J_{ω_n,i}(u_n)
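A finite-dimensional sketch of the PSG loop, with H = R³, C a box and the toy objective J_ω(u) = ½‖u − ω‖²; the distribution of ω, the step-size rule and all constants are illustrative assumptions, not the setting of the talk.

```python
import numpy as np

# PSG sketch on H = R^3 with C = [-1, 1]^3 and J_w(u) = 0.5*||u - w||^2, w ~ N(mean, I).
# Everything here is an illustrative assumption, not the PDE problem treated later.
rng = np.random.default_rng(2)
mean = np.array([0.5, 2.0, -1.0])
pi_C = lambda u: np.clip(u, -1.0, 1.0)          # projection onto C

def grad_J(u, w):                               # gradient of J_w at u
    return u - w

def psg(batch_size, n_iter=2000):
    u = np.zeros(3)                             # u_0
    for n in range(n_iter):
        t_n = 1.0 / (n + 10)                    # Robbins-Monro step sizes
        ws = rng.normal(loc=mean, size=(batch_size, 3))
        g_n = np.mean([grad_J(u, w) for w in ws], axis=0)   # single sample or batch
        u = pi_C(u - t_n * g_n)
    return u

print("single realization:", psg(batch_size=1))     # both approach clip(mean, -1, 1)
print("batch m_n = 20:    ", psg(batch_size=20))
```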
Illustration
[Figure] Left: projection to the tangent space. Right: projection to the constraint set.
[Figure] Left: line search? Right: a stationary point.
Assumptions for Convergence
1. ∅ ≠ C ⊂ H is closed and convex.
2. J_ω is convex and continuously Fréchet differentiable for a.e. ω ∈ Ω on a neighborhood of C ⊂ H.
3. j is bounded below by j̄ ∈ R and finitely valued over C.
4. Robbins-Monro step sizes: t_n ≥ 0, Σ_{n=0}^∞ t_n = ∞, Σ_{n=0}^∞ t_n² < ∞.
5. ∇J_{ω_n}(u_n) = ∇j(u_n) + w_{n+1} + r_{n+1} with an increasing filtration {F_n}, where
   (i) w_{n+1} and r_{n+1} are F_{n+1}-measurable;
   (ii) E[w_{n+1} | F_n] = 0;
   (iii) Σ_{n=0}^∞ t_n ess sup ‖r_{n+1}‖ < ∞;
   (iv) ∃ M_1, M_2: E[‖∇J_{ω_n}(u_n)‖² | F_n] ≤ M_1 + M_2 ‖u_n‖².
Convergence Results
Theorem (Geiersbach and Pflug: weak convergence in probability for a general convex objective).
Under Assumptions 1-5, the PSG algorithm with S := {w ∈ C : j(w) = j(ũ)}, where ũ is a minimizer of j, satisfies:
1. {‖u_n − ũ‖} converges a.s. for all ũ ∈ S,
2. {j(u_n)} converges a.s. and lim_{n→∞} j(u_n) = j(ũ),
3. {u_n} converges weakly a.s. and the weak limit lies in S.
This is stronger than "any weak cluster point of (u_n) lies in S"!
Corollary (a.s. strong convergence for a strongly convex objective).
Given Assumptions 1-5, assume in addition that j is strongly convex. Then {u_n} converges strongly a.s. to the unique optimum ū.
Efficiency Estimates in the Strongly Convex Case
If j is strongly convex with growth µ and t_n = θ/(n + ν) with θ > 1/(2µ) and ν ≥ K_1, then there are computable constants K_1, K_2 such that the expected error in the control at step n is
  E[‖u_n − ū‖] ≤ √(K_2/(n + ν))
and the expected error in the objective at step n is
  E[j(u_n) − j(ū)] ≤ L K_2 / (2(n + ν)),
where L is the Lipschitz constant for j. This generalizes a result by Nemirovski et al. (2009).
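A one-dimensional toy check of the √(K_2/(n+ν)) decay; the objective j(u) = ½µ(u − 1)², the noise level, θ, ν and the replication count are all assumed values, and the constants K_1, K_2 are not computed here.

```python
import numpy as np

# Toy check of the O(1/sqrt(n + nu)) error decay for a strongly convex objective
# j(u) = 0.5*mu*(u - 1)^2 with noisy gradients; mu, theta, nu and the noise level
# are assumed values, and the constants K_1, K_2 are not computed here.
rng = np.random.default_rng(3)
mu, theta, nu = 2.0, 0.5, 10.0                  # theta = 0.5 > 1/(2*mu) = 0.25
pi_C = lambda u: np.clip(u, -5.0, 5.0)

def run(n_iter):
    u = 5.0
    for n in range(n_iter):
        t_n = theta / (n + nu)                  # the step-size policy of the slide
        g = mu * (u - 1.0) + rng.normal()       # noisy gradient
        u = pi_C(u - t_n * g)
    return abs(u - 1.0)

for N in (100, 1000, 10000):
    errs = [run(N) for _ in range(200)]
    # if the bound is sharp, this product stays roughly constant (~ sqrt(K_2))
    print(N, np.mean(errs) * np.sqrt(N + nu))
```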
Efficiency Estimates in the General Convex Case I
Polyak and Juditsky (1992), Ruppert (1992): convergence improvement by taking larger stepsizes and averaging. Define
  γ_k := t_k / (Σ_{ℓ=1}^N t_ℓ)    and    ũ_1^N := Σ_{k=1}^N γ_k u_k.
Let D_S be a bound such that sup_{u ∈ S} ‖u_0 − u‖ ≤ D_S. We can show that
  E[j(ũ_1^N) − j(ū)] ≤ (D_S² + R Σ_{k=1}^N t_k²) / (2 Σ_{k=1}^N t_k)
with a computable constant R. With the constant stepsize policy t_n = D_S R^{−1/2} N^{−1/2} for a fixed number of iterations n = 1, ..., N, we get the efficiency estimate
  E[j(ũ_1^N) − j(ū)] ≤ D_S √R / √N.
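A sketch of the weighted averaging with the constant step-size policy on a one-dimensional toy problem; D_S, R, the noise level and the problem itself are assumed for illustration.

```python
import numpy as np

# Averaged PSG with the constant step t_n = D_S/(sqrt(R)*sqrt(N)) on a toy problem
# j(u) = 0.5*(u - 1)^2; D_S, R, N and the noise level are illustrative assumptions.
rng = np.random.default_rng(4)
pi_C = lambda u: np.clip(u, -2.0, 2.0)
u_star = 1.0                                    # known optimum of the toy problem
D_S, R, N = 3.0, 1.0, 10000

t = D_S / (np.sqrt(R) * np.sqrt(N))             # constant step for the fixed horizon N
u = -2.0                                        # u_0, chosen so that ||u_0 - u*|| <= D_S
iterates, steps = [], []
for k in range(1, N + 1):
    g = (u - u_star) + rng.normal()             # noisy gradient of j
    u = pi_C(u - t * g)
    iterates.append(u)
    steps.append(t)

gamma = np.array(steps) / np.sum(steps)         # weights gamma_k = t_k / sum_l t_l
u_avg = float(np.dot(gamma, iterates))          # averaged iterate u~_1^N
print("last iterate error:    ", abs(iterates[-1] - u_star))
print("averaged iterate error:", abs(u_avg - u_star))   # typically much smaller
```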
Efficiency Estimates in the General Convex Case II
With the choice of a variable stepsize t_n = θ D_S / √(nR) we can show that
  E[j(ũ_1^n) − j(ū)] = O(log n / √n).
And if one starts averaging only after N_1 steps, with N_1 = [rn], one also gets
  E[j(ũ_{N_1}^n) − j(ū)] = O(1/√n).
These bounds are extensions of Nemirovski et al. (2009).
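The same toy sketch with the decreasing steps t_n = θ D_S/√(nR) and averaging started only after N_1 = ⌈rn⌉ steps; θ and r are assumed values.

```python
import numpy as np

# Tail averaging: same toy problem, now with t_n = theta*D_S/sqrt(n*R) and averaging
# started only after N_1 = ceil(r*n) steps; theta and r are assumed values.
rng = np.random.default_rng(5)
pi_C = lambda u: np.clip(u, -2.0, 2.0)
u_star, D_S, R = 1.0, 3.0, 1.0
theta, r, n_total = 0.2, 0.5, 10000
N_1 = int(np.ceil(r * n_total))

u, tail_iterates, tail_steps = -2.0, [], []
for n in range(1, n_total + 1):
    t_n = theta * D_S / np.sqrt(n * R)
    g = (u - u_star) + rng.normal()
    u = pi_C(u - t_n * g)
    if n >= N_1:                                # average only the tail of the run
        tail_iterates.append(u)
        tail_steps.append(t_n)

w = np.array(tail_steps) / np.sum(tail_steps)
u_tail_avg = float(np.dot(w, tail_iterates))
print("tail-averaged error:", abs(u_tail_avg - u_star))
```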
A PDE-constrained problem: Optimal Control of a Stationary Heat Source
  min_{u ∈ C} E[J_ω(u)] = E[ (1/2)‖y − y_D‖²_{L²(D)} + (λ/2)‖u‖²_{L²(D)} ]
  s.t.  −∇·(a(x,ω)∇y(x,ω)) = u(x),   (x,ω) ∈ D × Ω,
        y(x,ω) = 0,                   (x,ω) ∈ ∂D × Ω,
  C = { u ∈ L²(D) : u_a(x) ≤ u(x) ≤ u_b(x) a.e. x ∈ D }.
◮ Random (positive) conductivity a(x,ω) ∈ (a_min, a_max).
◮ Random temperature y = y(x,ω) controlled by a deterministic source density u = u(x).
◮ Deterministic target distribution y_D = y_D(x) ∈ L²(D).
The Problem Satisfies the Convergence Assumptions
◮ ∅ ≠ C ⊂ H is closed and convex.
◮ J_ω is convex and continuously Fréchet differentiable for a.e. ω ∈ Ω on a neighborhood of C ⊂ H.
◮ j is bounded below by j̄ ∈ R and finite-valued over C.
◮ Robbins-Monro step sizes: t_n ≥ 0, Σ_{n=0}^∞ t_n = ∞, Σ_{n=0}^∞ t_n² < ∞.
◮ For a fixed realization ω, there exists a unique solution y(·,ω) ∈ H_0^1(D) of the PDE constraint, with ‖y(·,ω)‖_{L²(D)} ≤ C_1 ‖u‖_{L²(D)}.
◮ ∇J_ω(u) = λu − p(·,ω), where p(·,ω) solves the adjoint PDE
    ∫_D a(x,ω) ∇v·∇p dx = ∫_D (y_D − y(·,ω)) v dx    for all v ∈ H_0^1(D),
  with the bound ‖p(·,ω)‖_{L²(D)} ≤ C_2 ‖y_D − y(·,ω)‖_{L²(D)}.
◮ ‖∇J_ω(u)‖_{L²(D)} ≤ λ‖u‖_{L²(D)} + C_2(‖y_D‖_{L²(D)} + C_1‖u‖_{L²(D)}).
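A strongly simplified numerical sketch of the PSG step for this problem: a one-dimensional finite-difference discretization of the state and adjoint equations, a conductivity that is random but constant in x, and the gradient ∇J_ω(u) = λu − p(·,ω) from the slide above; the mesh, the random field, the target y_D and all constants are illustrative assumptions, not the discretization used by the authors.

```python
import numpy as np

# 1D sketch of PSG for the heat-source control problem:
#   state   -(a(w) y')' = u on (0,1), y(0) = y(1) = 0,
#   adjoint -(a(w) p')' = y_D - y,
#   grad J_w(u) = lambda*u - p.
# Mesh, random conductivity (constant in x, log-uniform in w), target y_D and
# all constants are illustrative assumptions, not the authors' discretization.
rng = np.random.default_rng(6)
m = 200                                        # interior grid points
h = 1.0 / (m + 1)
x = np.linspace(h, 1.0 - h, m)
lam = 1e-3
u_a, u_b = 0.0, 5.0                            # box constraints on the control
y_D = np.sin(np.pi * x)                        # target temperature (assumed)
a_min, a_max = 0.5, 2.0

# Dirichlet Laplacian by finite differences; with a conductivity constant in x,
# the stiffness matrix of a realization is simply a * L.
L = (np.diag(2.0 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
     - np.diag(np.ones(m - 1), -1)) / h**2

def solve_pde(a, rhs):                         # solves -(a v')' = rhs, v = 0 on the boundary
    return np.linalg.solve(a * L, rhs)

def stochastic_gradient(u):
    a = np.exp(rng.uniform(np.log(a_min), np.log(a_max)))   # random conductivity
    y = solve_pde(a, u)                        # state equation
    p = solve_pde(a, y_D - y)                  # adjoint equation
    return lam * u - p                         # grad J_w(u)

pi_C = lambda u: np.clip(u, u_a, u_b)          # projection onto the box constraints

u = np.zeros(m)
for n in range(1000):
    t_n = 100.0 / (n + 100)                    # Robbins-Monro steps (assumed constants)
    u = pi_C(u - t_n * stochastic_gradient(u))

print("max of the computed control:", u.max())
print("grid points with active upper bound:", int(np.sum(u >= u_b - 1e-8)))
```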