
  1. Between Discrete and Continuous Optimization: 
 Submodularity & Optimization Stefanie Jegelka, MIT 
 Simons Bootcamp Aug 2017

2. Submodularity
A set function $F$ assigns a value $F(S)$ to every subset $S$ of a ground set $V$.
• submodularity = "diminishing returns": $\forall\, S \subseteq T,\ a \notin T$:
$F(S \cup \{a\}) - F(S) \;\ge\; F(T \cup \{a\}) - F(T)$

3. Submodularity
• diminishing returns: $\forall\, S \subseteq T,\ a \notin T$: $F(S \cup \{a\}) - F(S) \ge F(T \cup \{a\}) - F(T)$
• equivalent general definition: $\forall\, A, B \subseteq V$: $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$

4. Why is this interesting?
Importance of convex functions (Lovász, 1983):
• "occur in many models in economy, engineering and other sciences", "often the only nontrivial property that can be stated in general"
• preserved under many operations and transformations: larger effective range of results
• sufficient structure for a "mathematically beautiful and practically useful theory"
• efficient minimization
"It is less apparent, but we claim and hope to prove to a certain extent, that a similar role is played in discrete optimization by submodular set-functions." […]


5. Examples of submodular set functions
• linear functions
• discrete entropy
• discrete mutual information
• matrix rank functions
• matroid rank functions ("combinatorial rank")
• coverage
• diffusion in networks
• volume (via log-determinant)
• graph cuts
• …

6. Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections

7. Roadmap
• Optimizing submodular set functions via continuous optimization
Key question: is submodularity discrete convexity or discrete concavity? (Lovász, Fujishige, Murota, …)

8. Continuous extensions
Identify subsets $S \subseteq V$ with indicator vectors, so $F: \{0,1\}^n \to \mathbb{R}$ and
$\min_{S \subseteq V} F(S) \;\Leftrightarrow\; \min_{x \in \{0,1\}^n} F(x)$
• LP relaxation? Linearizing the nonlinear cost function would need exponentially many variables… instead, extend $F$ directly to a function $f: [0,1]^n \to \mathbb{R}$.

9. Nonlinear extensions & optimization
Extend $F: \{0,1\}^n \to \mathbb{R}$ to $f: [0,1]^n \to \mathbb{R}$, and relax
$\min_{x \in C \subseteq \{0,1\}^n} F(x)$ to $\min_{z \in \mathrm{conv}(C) \subseteq [0,1]^n} f(z)$.

10. Generic construction: $F: \{0,1\}^n \to \mathbb{R}$ to $f: [0,1]^n \to \mathbb{R}$
[Figure: over coordinates a, b, c, d, a discrete point is a 0/1 indicator vector, e.g. T = {a, d} as (1, 0, 0, 1); a continuous point is e.g. z = (.5, .5, 0, .8).]
• Define a probability measure over subsets (a joint distribution over the coordinates) whose marginals agree with z: $P(i \in S) = z_i$
• Extension: $f(z) = \mathbb{E}[F(S)]$
• for discrete z: $f(z) = F(z)$

11. Independent coordinates
$P(S) = \prod_{i \in S} z_i \,\prod_{j \notin S} (1 - z_j), \qquad f(z) = \mathbb{E}[F(S)]$
• $f(z)$ is a multilinear polynomial: the multilinear extension (sampling sketch below)
• neither convex nor concave…
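As a concrete illustration (not from the talk), the multilinear extension can be estimated by straightforward Monte Carlo sampling; the small coverage function `cover` below is a hypothetical example, and the sample count is arbitrary.

```python
import random

def multilinear_extension(F, z, num_samples=5000, seed=0):
    """Monte Carlo estimate of the multilinear extension f(z) = E[F(S)],
    where each element i enters S independently with probability z[i]."""
    rng = random.Random(seed)
    n = len(z)
    total = 0.0
    for _ in range(num_samples):
        S = frozenset(i for i in range(n) if rng.random() < z[i])
        total += F(S)
    return total / num_samples

# Example: a small coverage function (submodular): F(S) = size of the covered union.
cover = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}}
F = lambda S: len(set().union(*(cover[i] for i in S)))
print(multilinear_extension(F, [0.5, 0.5, 0.0, 0.8]))  # estimate of f(z) at z = (.5, .5, 0, .8)
```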

12. Lovász extension
$P(i \in S) = z_i, \qquad f(z) = \mathbb{E}[F(S)]$
• "coupled" distribution defined by the level sets of z: for z = (.5, .5, 0, .8) over (a, b, c, d),
$S_0 = \{\},\; S_1 = \{d\},\; S_2 = \{a, b, d\},\; S_3 = \{a, b, c, d\}$
• = Choquet integral of F
Theorem (Lovász 1983): $f(z)$ is convex iff $F(S)$ is submodular.

13. Convexity and subgradients
If F is submodular (Edmonds 1971, Lovász 1983):
$f(z) = \mathbb{E}[F(S)] = \max_{s \in B_F} \langle s, z \rangle$, where $B_F$ is the base polytope of F
• can compute a subgradient of f(z) in O(n log n) (greedy sketch below)
• rounding: use one of the level sets of z*
⇒ exact convex relaxation: $\min_{z \in [0,1]^n} f(z) \;=\; \min_{S \subseteq V} F(S)$
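To make the O(n log n) subgradient concrete, here is a minimal sketch of Edmonds' greedy algorithm on the sorted coordinates. It assumes a normalized F with F(∅) = 0; the edge-cut example at the end is a hypothetical illustration.

```python
def lovasz_extension(F, z):
    """Value and a subgradient of the Lovász extension at z (assumes F(empty) = 0).

    Sort coordinates of z in decreasing order; the subgradient s (a vertex of
    the base polytope B_F, by Edmonds' greedy algorithm) collects the marginal
    gains F(S_k ∪ {i}) - F(S_k) along the chain of level sets; then f(z) = <s, z>.
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: -z[i])
    s = [0.0] * n
    S, prev = set(), F(frozenset())
    for i in order:
        S.add(i)
        cur = F(frozenset(S))
        s[i] = cur - prev
        prev = cur
    f_z = sum(s[i] * z[i] for i in range(n))
    return f_z, s

# Example: cut function of the single edge (0,1): F(S) = 1 iff exactly one endpoint in S.
F = lambda S: 1 if len(S) == 1 else 0
print(lovasz_extension(F, [0.6, 0.2]))  # (0.4, [1, -1]); indeed |z0 - z1| = 0.4
```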

14. Submodular minimization: a brief overview
$\min_{z \in [0,1]^n} f(z)$
convex optimization:
• ellipsoid method (Grötschel-Lovász-Schrijver 81)
• subgradient method (improved: Chakrabarty-Lee-Sidford-Wong 16)
combinatorial optimization:
• network-flow based (Schrijver 00, Iwata-Fleischer-Fujishige 01); $O(n^4 T + n^5 \log M)$ (Iwata 03), $O(n^6 + n^5 T)$ (Orlin 09)
convex + combinatorial:
• cutting planes (Lee-Sidford-Wong 15): $O(n^2 T \log nM + n^3 \log^c nM)$, $O(n^3 T \log^2 n + n^4 \log^c n)$
(here T is the cost of one evaluation-oracle call and M bounds the function values)

15. How far does the relaxation go?
• strongly convex version: $\min_{z \in \mathbb{R}^n} f(z) + \frac{1}{2}\|z\|^2$; dual: $\min_{s \in B_F} \frac{1}{2}\|s\|^2$
• Fujishige-Wolfe / minimum-norm-point algorithm (sketch below)
• actually solves parametric submodular minimization
• But: no relaxation is tight for constrained minimization, which is typically hard to approximate
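For intuition only, here is a plain Frank-Wolfe sketch of the min-norm-point problem $\min_{s \in B_F} \frac{1}{2}\|s\|^2$; the actual Fujishige-Wolfe algorithm is a more elaborate active-set scheme, and the example function is made up. The linear oracle over $B_F$ is again Edmonds' greedy, now sorting coordinates by increasing weight.

```python
def greedy_vertex(F, w):
    """argmin over the base polytope B_F of <w, g>: Edmonds' greedy with
    coordinates sorted by increasing w (assumes F(empty) = 0)."""
    n = len(w)
    order = sorted(range(n), key=lambda i: w[i])
    g = [0.0] * n
    S, prev = set(), 0.0
    for i in order:
        S.add(i)
        cur = F(frozenset(S))
        g[i] = cur - prev
        prev = cur
    return g

def min_norm_point(F, n, iters=500):
    """Frank-Wolfe on min ||s||^2 / 2 over B_F; the strictly negative
    coordinates of the (approximate) optimum recover a minimizer of F:
    S* ≈ {i : s*_i < 0} (Fujishige). Slow but illustrative."""
    s = greedy_vertex(F, [0.0] * n)           # start at some vertex of B_F
    for t in range(iters):
        g = greedy_vertex(F, s)               # linear minimization oracle
        gamma = 2.0 / (t + 3.0)               # standard FW step size
        s = [(1 - gamma) * si + gamma * gi for si, gi in zip(s, g)]
    return [i for i in range(n) if s[i] < 0]

# Example: F = cut of edge (0,1), minus 1.5 if element 0 is chosen (submodular, F(∅)=0).
def F(S):
    cut = 1 if len(S) == 1 else 0
    return cut - 1.5 * (0 in S)
print(min_norm_point(F, 2))   # [0, 1]: the minimizer {0, 1} with F = -1.5
```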

16. Submodular maximization
$\max_{S \subseteq V} F(S)$ and $\max_{|S| \le k} F(S)$ (*) are NP-hard
• simple cases ((*), monotone F): the discrete greedy algorithm is optimal (Nemhauser-Wolsey-Fisher 1978); see the sketch below
• more complex cases (complicated constraints, non-monotone): continuous extension $f: [0,1]^n \to \mathbb{R}$ of $F: \{0,1\}^n \to \mathbb{R}$ + rounding; the concave envelope is intractable, but …
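For the simple monotone case, the discrete greedy algorithm is a few lines; a minimal sketch (assuming k ≤ n, with the same hypothetical coverage function as before):

```python
def greedy_max(F, n, k):
    """Discrete greedy for max F(S) s.t. |S| <= k, F monotone submodular:
    repeatedly add the element with the largest marginal gain.
    Guarantees F(S) >= (1 - 1/e) * OPT (Nemhauser-Wolsey-Fisher)."""
    S = set()
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for i in range(n):
            if i in S:
                continue
            gain = F(frozenset(S | {i})) - F(frozenset(S))
            if gain > best_gain:
                best, best_gain = i, gain
        S.add(best)
    return S

cover = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}}
F = lambda S: len(set().union(*(cover[i] for i in S)))
print(greedy_max(F, n=4, k=2))   # {3, 1}: covers {1, 2, 3, 4, 5}
```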

17. Independent coordinates
$P(S) = \prod_{i \in S} z_i \,\prod_{j \notin S} (1 - z_j), \qquad f(z) = \mathbb{E}[F(S)]$
• $\partial^2 f / \partial x_i \partial x_j \le 0$ for all i, j
• concave in increasing directions (diminishing returns)
• convex in "swap" directions
• continuous maximization (monotone case) works despite nonconvexity! (Calinescu-Chekuri-Pál-Vondrák 2007, Feldman-Naor-Schwartz 2011, …, Hassani-Soltanolkotabi-Karbasi 2017, …)
• similar approach for non-monotone functions (Buchbinder-Naor-Feldman 2012, …)

18. "Continuous greedy" as Frank-Wolfe
• concavity in positive directions: for every $z \in [0,1]^n$ there is a $v \in P$ with $\langle v, \nabla f(z) \rangle \ge \mathrm{OPT} - f(z)$

Algorithm: initialize $z_0 = 0$; for $t = 1, \dots, T$:
$s_t \in \arg\max_{s \in P} \langle s, \nabla f(z_t) \rangle, \qquad z_{t+1} = z_t + \alpha_t s_t$

• Analysis (with smoothness constant C):
$f(z_{t+1}) \ge f(z_t) + \alpha \langle s_t, \nabla f(z_t) \rangle - \tfrac{C}{2}\alpha^2 \ge f(z_t) + \alpha\,[\mathrm{OPT} - f(z_t)] - \tfrac{C}{2}\alpha^2$
$\Rightarrow\; \mathrm{OPT} - f(z_{t+1}) \le (1 - \alpha)\,[\mathrm{OPT} - f(z_t)] + \tfrac{C}{2}\alpha^2$
• with $\alpha = 1/T$: $f(z_T) \ge \left(1 - (1 - \tfrac{1}{T})^T\right)\mathrm{OPT} - \tfrac{C}{2T} \ge (1 - \tfrac{1}{e})\,\mathrm{OPT} - \tfrac{C}{2T}$

A sampling-based sketch of this scheme follows below.
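A sketch of the scheme above, under the assumptions that F is monotone and the constraint polytope is the cardinality polytope $\{z \in [0,1]^n : \sum_i z_i \le k\}$, so the linear maximization just picks the top-k gradient coordinates; the gradient of the multilinear extension is estimated by sampling, and all sample counts and the coverage example are illustrative.

```python
import random

def grad_estimate(F, z, num_samples=200, rng=None):
    """Sampled gradient of the multilinear extension f(z) = E[F(S)]:
    df/dz_i = E[F(S ∪ {i}) - F(S \ {i})], with S drawn so that P(i in S) = z_i."""
    rng = rng or random.Random(0)
    n = len(z)
    grad = [0.0] * n
    for _ in range(num_samples):
        S = {i for i in range(n) if rng.random() < z[i]}
        for i in range(n):
            grad[i] += F(frozenset(S | {i})) - F(frozenset(S - {i}))
    return [g / num_samples for g in grad]

def continuous_greedy(F, n, k, T=50):
    """Continuous greedy for monotone F over {z in [0,1]^n : sum(z) <= k}:
    the linear maximization takes the k largest gradient coordinates.
    Returns a fractional z with f(z) >= (1 - 1/e) OPT (up to sampling error);
    round by, e.g., pipage or swap rounding."""
    z = [0.0] * n
    rng = random.Random(0)
    for _ in range(T):
        g = grad_estimate(F, z, rng=rng)
        chosen = set(sorted(range(n), key=lambda i: -g[i])[:k])
        z = [min(1.0, zi + (1.0 / T) * (1.0 if i in chosen else 0.0))
             for i, zi in enumerate(z)]
    return z

cover = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}}
F = lambda S: len(set().union(*(cover[i] for i in S)))
print(continuous_greedy(F, n=4, k=2))
```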

19. Binary / set function optimization: summary
Minimization:
• exact convex relaxation (Lovász extension, convexity)
• But: constrained minimization is hard
Maximization:
• NP-hard; But: constant-factor approximations even with constraints
• via the multilinear extension (diminishing returns)

20. Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections

21. Submodularity beyond sets
• sets: for all subsets $A, B \subseteq V$: $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$
• replace sets by vectors, with elementwise max $\vee$ and min $\wedge$: $F(x) + F(y) \ge F(x \vee y) + F(x \wedge y)$
• or (for twice-differentiable F): all off-diagonal entries of the Hessian are nonpositive, $\partial^2 F / \partial x_i \partial x_j \le 0$ for $i \ne j$ (Topkis 1978)
A quick empirical check of the lattice inequality follows below.
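A small numeric sanity check of the vector lattice inequality on random points; both test functions are hypothetical instances of the example classes on the next slide ($h(\sum_i x_i)$ with h concave vs. convex).

```python
import math, random

def is_lattice_submodular(F, n, trials=10000, seed=0):
    """Empirically test F(x) + F(y) >= F(x ∨ y) + F(x ∧ y)
    (elementwise max / min) on random points in [0,1]^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        hi = [max(a, b) for a, b in zip(x, y)]
        lo = [min(a, b) for a, b in zip(x, y)]
        if F(x) + F(y) < F(hi) + F(lo) - 1e-12:
            return False
    return True

# h(sum x) with h concave (sqrt) is lattice-submodular; with h convex it is not.
print(is_lattice_submodular(lambda x: math.sqrt(sum(x)), n=4))   # True
print(is_lattice_submodular(lambda x: sum(x) ** 2, n=4))         # False
```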

22. Examples
$F(x) + F(y) \ge F(x \vee y) + F(x \wedge y), \qquad \partial^2 F / \partial x_i \partial x_j \le 0$
A submodular function can be convex, concave, or neither!
• any separable function $F(x) = \sum_{i=1}^n F_i(x_i)$
• $F(x) = g(x_i - x_j)$ for convex $g$ (cross-derivative $-g'' \le 0$)
• $F(x) = h\left(\sum_i x_i\right)$ for concave $h$ (cross-derivative $h'' \le 0$)

23. Maximization
• General case: diminishing returns is stronger than submodularity
• DR-submodular function: $\partial^2 F / \partial x_i \partial x_j \le 0$ for all $i, j$
• with DR, many results generalize (including "continuous greedy") (Kapralov-Post-Vondrák 2010, Soma et al. 2014-15, Ene & Nguyen 2016, Bian et al. 2016, Gottschalk & Peis 2016)

24. Minimization
• discretize continuous functions: factor $O(1/\epsilon)$
• Option I: transform into set-function optimization (Birkhoff 1937, Schrijver 2000, Orlin 2007); better for DR-submodular (Ene & Nguyen 2016)
• Option II: convex extension for integer submodular functions (Bach 2015)

25. Convex extension
• Set functions: efficient minimization via the convex extension $f: [0,1]^n \to \mathbb{R}$ of $F: \{0,1\}^n \to \mathbb{R}$, $f(z) = \mathbb{E}[F(S)]$
[Figure: a 0/1 indicator vector (1, 0, 0, 1) vs. a fractional point z = (.5, .5, 0, .8).]
• Integer vectors, $F: \{0, \dots, k\}^n \to \mathbb{R}$: a distribution over $\{0, \dots, k\}$ for each coordinate, $f(z) = \mathbb{E}[F(x)]$
[Figure: an example integer vector (1, 4, 0, 2).]

26. Applications
• robust optimization of bipartite influence (Staib-Jegelka 2017): $\max_{y \in B} \min_{p \in P} I(y; p)$
• non-convex isotonic regression (Bach 2017): $\min_{x \in [0,1]^n} \sum_{i=1}^n G(x_i - z_i)$ s.t. $x_i \ge x_j\ \forall (i, j) \in E$

27. Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections

28. Log-sub/supermodular distributions
$P(S) \propto \exp(F(S)), \qquad P(x) \propto \exp(F(x))$
• if $-F(S)$ is submodular (i.e., F is supermodular): multivariate totally positive, FKG lattice condition
• implies positive association: for all monotonically increasing G, H:
$\mathbb{E}[G(S)H(S)] \ge \mathbb{E}[G(S)]\,\mathbb{E}[H(S)]$
• what if F(S) is submodular?

29. Negative association and stable polynomials
• a sub-class satisfies negative association: for all monotonically increasing G, H with disjoint support:
$\mathbb{E}[G(S)H(S)] \le \mathbb{E}[G(S)]\,\mathbb{E}[H(S)]$
• a sufficient condition, which even implies conditional negative association: the generating polynomial
$q(z) = \sum_{S \subseteq V} P(S) \prod_{i \in S} z_i, \quad z \in \mathbb{C}^n$
should be real stable ⇒ strongly Rayleigh measures (Borcea, Brändén, Liggett 2009)

