Between Discrete and Continuous Optimization: Submodularity & Optimization
Stefanie Jegelka, MIT
Simons Bootcamp, Aug 2017
Submodularity
set function $F: 2^V \to \mathbb{R}$, defined on subsets $S \subseteq V$
• submodularity = "diminishing returns": $\forall\, S \subseteq T,\ a \notin T$:
  $F(S \cup \{a\}) - F(S) \;\ge\; F(T \cup \{a\}) - F(T)$
Submodularity
set function $F: 2^V \to \mathbb{R}$
• diminishing returns: $\forall\, S \subseteq T,\ a \notin T$: $F(S \cup \{a\}) - F(S) \ge F(T \cup \{a\}) - F(T)$
• equivalent general definition: $\forall\, A, B \subseteq V$: $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$
Why is this interesting?
Importance of convex functions (Lovász, 1983):
• "occur in many models in economy, engineering and other sciences", "often the only nontrivial property that can be stated in general"
• preserved under many operations and transformations: larger effective range of results
• sufficient structure for a "mathematically beautiful and practically useful theory"
• efficient minimization
"It is less apparent, but we claim and hope to prove to a certain extent, that a similar role is played in discrete optimization by submodular set-functions" […]
Examples of submodular set functions
• linear functions
• discrete entropy
• discrete mutual information
• matrix rank functions
• matroid rank functions ("combinatorial rank")
• coverage
• diffusion in networks
• volume (via log-determinant)
• graph cuts
• …
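To make one of these concrete, here is a small numerical check of the diminishing-returns property for a coverage function; the universe, its member sets, and the ground set are illustrative choices, not from the slides:

```python
from itertools import combinations

# ground set V indexes the covering sets; U maps each index to what it covers
U = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"a", "e"}}
V = set(U)

def coverage(S):
    # F(S) = number of universe elements covered by the sets indexed by S
    return len(set().union(*(U[i] for i in S))) if S else 0

# check F(S ∪ {a}) − F(S) ≥ F(T ∪ {a}) − F(T) for all S ⊆ T ⊆ V, a ∉ T
for r in range(len(V) + 1):
    for T in map(set, combinations(V, r)):
        for rr in range(len(T) + 1):
            for S in map(set, combinations(sorted(T), rr)):
                for a in V - T:
                    assert (coverage(S | {a}) - coverage(S)
                            >= coverage(T | {a}) - coverage(T))
print("diminishing returns holds on this instance")
```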
Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections
Roadmap
• Optimizing submodular set functions via continuous optimization
Key question: Submodularity = discrete convexity or discrete concavity? (Lovász, Fujishige, Murota, …)
Continuous extensions
$\min_{S \subseteq V} F(S) \;\Leftrightarrow\; \min_{x \in \{0,1\}^n} F(x)$
• LP relaxation? With a nonlinear cost function, the standard LP formulation needs exponentially many variables…
• instead: extend $F: \{0,1\}^n \to \mathbb{R}$ to a continuous function $f: [0,1]^n \to \mathbb{R}$
Nonlinear extensions & optimization
nonlinear extension/optimization: extend $F: \{0,1\}^n \to \mathbb{R}$ to $f: [0,1]^n \to \mathbb{R}$ and relax:
$\min_{x \in C \subseteq \{0,1\}^n} F(x) \;\longrightarrow\; \min_{z \in \mathrm{conv}(C) \subseteq [0,1]^n} f(z)$
Generic construction
[figure: a discrete set $T = \{a, d\}$ as the indicator vector $(1, 0, 0, 1)$ vs. a continuous point $z = (0.5, 0.5, 0, 0.8)$ over coordinates $a, b, c, d$]
• define a probability measure over subsets (joint over coordinates) such that the marginals agree with $z$: $P(i \in S) = z_i$
• extension: $f(z) = \mathbb{E}[F(S)]$
• for discrete $z = 1_S$: $f(1_S) = F(S)$
Independent coordinates
$P(S) = \prod_{i \in S} z_i \cdot \prod_{j \notin S} (1 - z_j)$, e.g. $z = (0.5, 0.5, 0, 0.8)$
$f(z) = \mathbb{E}[F(S)]$
• $f(z)$ is a multilinear polynomial: the multilinear extension
• neither convex nor concave…
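A minimal Monte Carlo sketch of this extension (exact evaluation would sum over all $2^n$ subsets); the function name and sample count are arbitrary choices:

```python
import numpy as np

def multilinear_extension(F, z, n_samples=20000, seed=0):
    """Estimate f(z) = E[F(S)], where each i enters S independently
    with probability z_i."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    total = 0.0
    for _ in range(n_samples):
        S = {i for i in range(len(z)) if rng.random() < z[i]}
        total += F(S)
    return total / n_samples
```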
Lovász extension
$P(i \in S) = z_i$, $f(z) = \mathbb{E}[F(S)]$
• "coupled" distribution defined by the level sets of $z$: for $z = (0.5, 0.5, 0, 0.8)$ over coordinates $a, b, c, d$:
  $S_0 = \{\}$, $S_1 = \{d\}$, $S_2 = \{a, b, d\}$, $S_3 = \{a, b, c, d\}$
• $\mathbb{E}[F(S)]$ = Choquet integral of $F$
Theorem (Lovász 1983): $f(z)$ is convex iff $F(S)$ is submodular.
Convexity and subgradients
if $F$ is submodular (Edmonds 1971, Lovász 1983):
$f(z) = \mathbb{E}[F(S)] = \max_{s \in B_F} \langle s, z \rangle$, with $B_F$ the base polytope of $F$
• can compute a subgradient of $f(z)$ in $O(n \log n)$ (plus $n$ evaluations of $F$)
• rounding: use one of the level sets of $z^*$, so this is an exact convex relaxation!
$\min_{z \in [0,1]^n} f(z) = \min_{S \subseteq V} F(S)$
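A sketch of this subgradient computation via Edmonds' greedy algorithm, assuming a normalized set function ($F(\emptyset) = 0$) given as a Python callable on sets; the returned vector $s$ is a vertex of $B_F$ attaining $\max_{s \in B_F} \langle s, z \rangle$:

```python
import numpy as np

def lovasz_extension(F, z):
    """Lovász extension f(z) and a subgradient s via Edmonds' greedy.
    Assumes F is normalized: F(set()) == 0. Cost: one sort + n calls to F."""
    z = np.asarray(z, dtype=float)
    order = np.argsort(-z)        # coordinates in decreasing order of z
    s = np.zeros_like(z)
    chain, prev = [], 0.0
    for i in order:
        chain.append(int(i))
        cur = F(set(chain))
        s[i] = cur - prev         # marginal gain along the chain of level sets
        prev = cur
    return float(s @ z), s        # f(z) = <s, z>, maximized over B_F
```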
Submodular minimization: a brief overview
$\min_{z \in [0,1]^n} f(z)$
convex optimization:
• ellipsoid method (Grötschel-Lovász-Schrijver 81)
• subgradient method (improved: Chakrabarty-Lee-Sidford-Wong 16)
combinatorial optimization:
• network-flow based (Schrijver 00, Iwata-Fleischer-Fujishige 01)
• $O(n^4 T + n^5 \log M)$ (Iwata 03), $O(n^6 + n^5 T)$ (Orlin 09)
convex + combinatorial:
• cutting planes (Lee-Sidford-Wong 15): $O(n^2 T \log nM + n^3 \log^c nM)$, $O(n^3 T \log^2 n + n^4 \log^c n)$
(here $T$ = time per evaluation of $F$, $M$ = an upper bound on $\max_S |F(S)|$)
How far does relaxation go?
• strongly convex version of $\min_{z \in [0,1]^n} f(z)$:
  $\min_{z \in \mathbb{R}^n} f(z) + \tfrac{1}{2}\|z\|^2$, with dual $\min_{s \in B_F} \tfrac{1}{2}\|s\|^2$
• Fujishige-Wolfe / minimum-norm point algorithm
• actually solves parametric submodular minimization
• But: no relaxation is tight for constrained minimization; typically hard to approximate
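The Fujishige-Wolfe algorithm solves the dual with Wolfe's minimum-norm-point procedure; the sketch below substitutes plain Frank-Wolfe for that procedure, so it is only an approximate stand-in, under the same normalized-oracle assumption. For the exact minimum-norm point $s^*$, the set $\{i : s^*_i < 0\}$ minimizes $F$; with finitely many steps the returned set is approximate.

```python
import numpy as np

def min_norm_point_fw(F, n, iters=500):
    """Approximate min_{s in B_F} 0.5 ||s||^2 by Frank-Wolfe (F normalized)."""
    def greedy(w):
        # linear oracle: argmin_{v in B_F} <w, v> = Edmonds' greedy,
        # visiting coordinates in increasing order of w
        order = np.argsort(w)
        v, chain, prev = np.zeros(n), [], 0.0
        for i in order:
            chain.append(int(i))
            cur = F(set(chain))
            v[i] = cur - prev
            prev = cur
        return v

    s = greedy(np.zeros(n))              # start at any vertex of B_F
    for t in range(iters):
        v = greedy(s)                    # gradient of 0.5||s||^2 is s itself
        s = s + 2.0 / (t + 3) * (v - s)  # standard Frank-Wolfe step size
    return s, {i for i in range(n) if s[i] < 0}   # rounded minimizer of F
```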
Submodular maximization
$\max_{S \subseteq V} F(S)$, $\max_{|S| \le k} F(S)$: NP-hard
• simple cases (cardinality constraint, monotone $F$): the discrete greedy algorithm is optimal (Nemhauser-Wolsey-Fisher 1978)
• more complex cases (complicated constraints, non-monotone): continuous extension $f: [0,1]^n \to \mathbb{R}$ of $F: \{0,1\}^n \to \mathbb{R}$ + rounding
  the concave envelope is intractable, but …
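A minimal sketch of that discrete greedy algorithm for the monotone, cardinality-constrained case; `F` and `V` are assumed to be a set-function oracle and ground set as above:

```python
def greedy_max(F, V, k):
    """Greedy for max_{|S| <= k} F(S): repeatedly add the element with the
    largest marginal gain; achieves a (1 - 1/e) factor for monotone
    submodular F (Nemhauser-Wolsey-Fisher 1978)."""
    S = set()
    for _ in range(min(k, len(V))):
        S.add(max(set(V) - S, key=lambda e: F(S | {e}) - F(S)))
    return S
```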
Independent coordinates
$P(S) = \prod_{i \in S} z_i \cdot \prod_{j \notin S} (1 - z_j)$, $f(z) = \mathbb{E}[F(S)]$
• $\frac{\partial^2 f}{\partial x_i \partial x_j} \le 0$ for all $i, j$
• $f(z)$ is concave in increasing directions (diminishing returns)
• $f(z)$ is convex in "swap" directions (e.g. $e_i - e_j$)
• continuous maximization (monotone) works despite nonconvexity! (Calinescu-Chekuri-Pál-Vondrák 2007, Feldman-Naor-Schwartz 2011, …, Hassani-Soltanolkotabi-Karbasi 2017, …)
• similar approach for non-monotone functions (Buchbinder-Naor-Feldman 2012, …)
"Continuous greedy" as Frank-Wolfe
Initialize: $z_0 = 0$
for $t = 1, \dots, T$:
  $s_t \in \arg\max_{s \in P} \langle s, \nabla f(z_t) \rangle$
  $z_{t+1} = z_t + \alpha_t s_t$
• concavity in positive directions: for all $z \in [0,1]^n$ there is a $v \in P$ with $\langle v, \nabla f(z) \rangle \ge \mathrm{OPT} - f(z)$
• Analysis:
  $f(z_{t+1}) \ge f(z_t) + \alpha \langle s_t, \nabla f(z_t) \rangle - \tfrac{C}{2}\alpha^2 \ge f(z_t) + \alpha\,[\mathrm{OPT} - f(z_t)] - \tfrac{C}{2}\alpha^2$
  $\Rightarrow\ \mathrm{OPT} - f(z_{t+1}) \le (1 - \alpha)\,[\mathrm{OPT} - f(z_t)] + \tfrac{C}{2}\alpha^2$
• with $\alpha = 1/T$: $f(z_T) \ge \big(1 - (1 - \tfrac{1}{T})^T\big)\,\mathrm{OPT} - \tfrac{C}{2T} \ge (1 - 1/e)\,\mathrm{OPT} - \tfrac{C}{2T}$
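A sketch of the scheme above; `grad_f` and `lin_max` are placeholder oracles (e.g. a sampled gradient of the multilinear extension, and a linear-program solver over $P$), not fixed APIs:

```python
import numpy as np

def continuous_greedy(grad_f, lin_max, n, T=100):
    """Frank-Wolfe scheme from the slide: s_t maximizes <s, grad f(z_t)>
    over P, then z moves a 1/T step; guarantee f(z_T) >= (1-1/e) OPT - C/(2T),
    followed by rounding z_T to a set."""
    z = np.zeros(n)
    for _ in range(T):
        s = lin_max(grad_f(z))
        z = z + s / T          # alpha = 1/T keeps z_T in P for down-closed P
    return z

# e.g. the linear oracle for the cardinality polytope
# {z in [0,1]^n : sum(z) <= k} just picks the k largest gradient coordinates:
def topk_oracle(g, k):
    s = np.zeros(len(g))
    s[np.argsort(-np.asarray(g))[:k]] = 1.0
    return s
```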
Binary / set function optimization
Minimization:
• Lovász extension
• convexity
• exact convex relaxation
• But: constrained minimization is hard
Maximization:
• multilinear extension
• diminishing returns
• NP-hard
• But: constant-factor approximations, even with constraints
Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections
Submodularity beyond sets
• sets: for all subsets $A, B \subseteq V$: $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$
• replace sets by vectors: $F(x) + F(y) \ge F(x \vee y) + F(x \wedge y)$
• or: all off-diagonal Hessian entries are nonpositive, $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$ for $i \ne j$ (Topkis 1978)
Examples
$F(x) + F(y) \ge F(x \vee y) + F(x \wedge y)$, i.e. $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$ for $i \ne j$
A submodular function can be convex, concave, or neither!
• any separable function $F(x) = \sum_{i=1}^n F_i(x_i)$
• $F(x) = g(x_i - x_j)$ for convex $g$
• $F(x) = h\big(\sum_i x_i\big)$ for concave $h$
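A one-line Hessian check (not on the slide) confirms the convex/concave pairing above:

```latex
\frac{\partial^2}{\partial x_i \partial x_j}\, g(x_i - x_j)
  = -\,g''(x_i - x_j) \le 0 \iff g \text{ convex},
\qquad
\frac{\partial^2}{\partial x_i \partial x_j}\, h\Big(\sum\nolimits_k x_k\Big)
  = h''\Big(\sum\nolimits_k x_k\Big) \le 0 \iff h \text{ concave}.
```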
Maximization
• general case: diminishing returns is stronger than submodularity
• DR-submodular function: $\partial^2 F / \partial x_i \partial x_j \le 0$ for all $i, j$ (including $i = j$)
• with DR, many results generalize (including "continuous greedy") (Kapralov-Post-Vondrák 2010, Soma et al. 2014-15, Ene & Nguyen 2016, Bian et al. 2016, Gottschalk & Peis 2016)
Minimization
• discretize continuous functions: costs a factor $O(1/\epsilon)$
• Option I: transform into set function optimization (Birkhoff 1937, Schrijver 2000, Orlin 2007); better for DR-submodular (Ene & Nguyen 2016)
• Option II: convex extension for integer submodular functions (Bach 2015)
Convex extension
• set functions: efficient minimization via the convex (Lovász) extension
  $F: \{0,1\}^n \to \mathbb{R}$, $f: [0,1]^n \to \mathbb{R}$, $f(z) = \mathbb{E}[F(S)]$, e.g. $z = (0.5, 0.5, 0, 0.8)$
• integer vectors: a distribution over $\{0, \dots, k\}$ for each coordinate
  $F: \{0, \dots, k\}^n \to \mathbb{R}$, $f(z) = \mathbb{E}[F(x)]$
Applications
• robust optimization of bipartite influences (Staib-Jegelka 2017):
  $\max_{y \in B} \min_{p \in P} I(y; p)$
• non-convex isotonic regression (Bach 2017):
  $\min_{x \in [0,1]^n} \sum_{i=1}^n G(x_i - z_i)$ s.t. $x_i \ge x_j\ \ \forall (i, j) \in E$
Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections
Log-sub/supermodular distributions
$P(S) \propto \exp(F(S))$, or continuous: $P(x) \propto \exp(F(x))$
• $-F(S)$ submodular (i.e. $F$ supermodular): multivariate totally positive; FKG lattice condition
• implies positive association: for all monotonically increasing $G, H$:
  $\mathbb{E}[G(S)H(S)] \ge \mathbb{E}[G(S)]\,\mathbb{E}[H(S)]$
• what if $F(S)$ is submodular?
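For intuition, a generic single-site Gibbs sampler for such distributions; this is a standard technique, not one named on the slide, and its mixing time depends heavily on $F$:

```python
import numpy as np

def gibbs_sample(F, n, steps=1000, seed=0):
    """Single-site Gibbs sampler for P(S) ∝ exp(F(S)) over subsets of
    {0, ..., n-1}: resample one element's membership at a time from its
    exact conditional distribution."""
    rng = np.random.default_rng(seed)
    S = set()
    for _ in range(steps):
        i = int(rng.integers(n))
        S_in, S_out = S | {i}, S - {i}
        # P(i in S | rest) = exp(F(S_in)) / (exp(F(S_in)) + exp(F(S_out)))
        p_in = 1.0 / (1.0 + np.exp(F(S_out) - F(S_in)))
        S = S_in if rng.random() < p_in else S_out
    return S
```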
Negative association and stable polynomials
• a sub-class satisfies negative association: for all monotonically increasing $G, H$ with disjoint support:
  $\mathbb{E}[G(S)H(S)] \le \mathbb{E}[G(S)]\,\mathbb{E}[H(S)]$
• a sufficient condition (implying negative association, even under conditioning): the generating polynomial
  $q(z) = \sum_{S \subseteq V} P(S) \prod_{i \in S} z_i$, $z \in \mathbb{C}^n$,
  should be real stable: strongly Rayleigh measures (Borcea, Brändén, Liggett 2009)