Between Discrete and Continuous Optimization: Submodularity & Optimization
Stefanie Jegelka, MIT
Simons Bootcamp, Aug 2017
Submodularity
set function $F: 2^V \to \mathbb{R}$, defined on subsets $S \subseteq V$
• submodularity = "diminishing returns": $\forall\, S \subseteq T,\ a \notin T$:
  $F(S \cup \{a\}) - F(S) \;\ge\; F(T \cup \{a\}) - F(T)$
Submodularity
set function $F: 2^V \to \mathbb{R}$
• diminishing returns: $\forall\, S \subseteq T,\ a \notin T$: $F(S \cup \{a\}) - F(S) \ge F(T \cup \{a\}) - F(T)$
• equivalent general definition: $\forall\, A, B \subseteq V$: $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$
Why is this interesting?
Importance of convex functions (Lovász, 1983):
• "occur in many models in economy, engineering and other sciences", "often the only nontrivial property that can be stated in general"
• preserved under many operations and transformations: larger effective range of results
• sufficient structure for a "mathematically beautiful and practically useful theory"
• efficient minimization
"It is less apparent, but we claim and hope to prove to a certain extent, that a similar role is played in discrete optimization by submodular set-functions" […]
Examples of submodular set functions
• linear functions
• discrete entropy
• discrete mutual information
• matrix rank functions
• matroid rank functions ("combinatorial rank")
• coverage
• diffusion in networks
• volume (via log-determinant)
• graph cuts
• …
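To make one of these concrete, here is a small numerical check of the diminishing-returns property for a coverage function; the universe, its member sets, and the ground set are illustrative choices, not from the slides:

```python
from itertools import combinations

# ground set V indexes the covering sets; U maps each index to what it covers
U = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"a", "e"}}
V = set(U)

def coverage(S):
    # F(S) = number of universe elements covered by the sets indexed by S
    return len(set().union(*(U[i] for i in S))) if S else 0

# check F(S ∪ {a}) − F(S) ≥ F(T ∪ {a}) − F(T) for all S ⊆ T ⊆ V, a ∉ T
for r in range(len(V) + 1):
    for T in map(set, combinations(V, r)):
        for rr in range(len(T) + 1):
            for S in map(set, combinations(sorted(T), rr)):
                for a in V - T:
                    assert (coverage(S | {a}) - coverage(S)
                            >= coverage(T | {a}) - coverage(T))
print("diminishing returns holds on this instance")
```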
Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections
Roadmap
• Optimizing submodular set functions via continuous optimization
Key question: Submodularity = discrete convexity or discrete concavity? (Lovász, Fujishige, Murota, …)
Continuous extensions
$\min_{S \subseteq V} F(S) \;\Leftrightarrow\; \min_{x \in \{0,1\}^n} F(x)$
• LP relaxation? With a nonlinear cost function, the standard LP formulation needs exponentially many variables…
• instead: extend $F: \{0,1\}^n \to \mathbb{R}$ to a continuous function $f: [0,1]^n \to \mathbb{R}$
Nonlinear extensions & optimization
nonlinear extension/optimization: extend $F: \{0,1\}^n \to \mathbb{R}$ to $f: [0,1]^n \to \mathbb{R}$ and relax:
$\min_{x \in C \subseteq \{0,1\}^n} F(x) \;\longrightarrow\; \min_{z \in \mathrm{conv}(C) \subseteq [0,1]^n} f(z)$
Generic construction
[figure: a discrete set $T = \{a, d\}$ as the indicator vector $(1, 0, 0, 1)$ vs. a continuous point $z = (0.5, 0.5, 0, 0.8)$ over coordinates $a, b, c, d$]
• define a probability measure over subsets (joint over coordinates) such that the marginals agree with $z$: $P(i \in S) = z_i$
• extension: $f(z) = \mathbb{E}[F(S)]$
• for discrete $z = 1_S$: $f(1_S) = F(S)$
Independent coordinates
$P(S) = \prod_{i \in S} z_i \cdot \prod_{j \notin S} (1 - z_j)$, e.g. $z = (0.5, 0.5, 0, 0.8)$
$f(z) = \mathbb{E}[F(S)]$
• $f(z)$ is a multilinear polynomial: the multilinear extension
• neither convex nor concave…
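A minimal Monte Carlo sketch of this extension (exact evaluation would sum over all $2^n$ subsets); the function name and sample count are arbitrary choices:

```python
import numpy as np

def multilinear_extension(F, z, n_samples=20000, seed=0):
    """Estimate f(z) = E[F(S)], where each i enters S independently
    with probability z_i."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    total = 0.0
    for _ in range(n_samples):
        S = {i for i in range(len(z)) if rng.random() < z[i]}
        total += F(S)
    return total / n_samples
```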
Lovász extension
$P(i \in S) = z_i$, $f(z) = \mathbb{E}[F(S)]$
• "coupled" distribution defined by the level sets of $z$: for $z = (0.5, 0.5, 0, 0.8)$ over coordinates $a, b, c, d$:
  $S_0 = \{\}$, $S_1 = \{d\}$, $S_2 = \{a, b, d\}$, $S_3 = \{a, b, c, d\}$
• $\mathbb{E}[F(S)]$ = Choquet integral of $F$
Theorem (Lovász 1983): $f(z)$ is convex iff $F(S)$ is submodular.
Convexity and subgradients
if $F$ is submodular (Edmonds 1971, Lovász 1983):
$f(z) = \mathbb{E}[F(S)] = \max_{s \in B_F} \langle s, z \rangle$, with $B_F$ the base polytope of $F$
• can compute a subgradient of $f(z)$ in $O(n \log n)$ (plus $n$ evaluations of $F$)
• rounding: use one of the level sets of $z^*$, so this is an exact convex relaxation!
$\min_{z \in [0,1]^n} f(z) = \min_{S \subseteq V} F(S)$
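A sketch of this subgradient computation via Edmonds' greedy algorithm, assuming a normalized set function ($F(\emptyset) = 0$) given as a Python callable on sets; the returned vector $s$ is a vertex of $B_F$ attaining $\max_{s \in B_F} \langle s, z \rangle$:

```python
import numpy as np

def lovasz_extension(F, z):
    """Lovász extension f(z) and a subgradient s via Edmonds' greedy.
    Assumes F is normalized: F(set()) == 0. Cost: one sort + n calls to F."""
    z = np.asarray(z, dtype=float)
    order = np.argsort(-z)        # coordinates in decreasing order of z
    s = np.zeros_like(z)
    chain, prev = [], 0.0
    for i in order:
        chain.append(int(i))
        cur = F(set(chain))
        s[i] = cur - prev         # marginal gain along the chain of level sets
        prev = cur
    return float(s @ z), s        # f(z) = <s, z>, maximized over B_F
```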
Submodular minimization: a brief overview
$\min_{z \in [0,1]^n} f(z)$
convex optimization:
• ellipsoid method (Grötschel-Lovász-Schrijver 81)
• subgradient method (improved: Chakrabarty-Lee-Sidford-Wong 16)
combinatorial optimization:
• network-flow based (Schrijver 00, Iwata-Fleischer-Fujishige 01)
• $O(n^4 T + n^5 \log M)$ (Iwata 03), $O(n^6 + n^5 T)$ (Orlin 09)
convex + combinatorial:
• cutting planes (Lee-Sidford-Wong 15): $O(n^2 T \log nM + n^3 \log^c nM)$, $O(n^3 T \log^2 n + n^4 \log^c n)$
(here $T$ = time per evaluation of $F$, $M$ = an upper bound on $\max_S |F(S)|$)
How far does relaxation go?
• strongly convex version of $\min_{z \in [0,1]^n} f(z)$:
  $\min_{z \in \mathbb{R}^n} f(z) + \tfrac{1}{2}\|z\|^2$, with dual $\min_{s \in B_F} \tfrac{1}{2}\|s\|^2$
• Fujishige-Wolfe / minimum-norm point algorithm
• actually solves parametric submodular minimization
• But: no relaxation is tight for constrained minimization; typically hard to approximate
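The Fujishige-Wolfe algorithm solves the dual with Wolfe's minimum-norm-point procedure; the sketch below substitutes plain Frank-Wolfe for that procedure, so it is only an approximate stand-in, under the same normalized-oracle assumption. For the exact minimum-norm point $s^*$, the set $\{i : s^*_i < 0\}$ minimizes $F$; with finitely many steps the returned set is approximate.

```python
import numpy as np

def min_norm_point_fw(F, n, iters=500):
    """Approximate min_{s in B_F} 0.5 ||s||^2 by Frank-Wolfe (F normalized)."""
    def greedy(w):
        # linear oracle: argmin_{v in B_F} <w, v> = Edmonds' greedy,
        # visiting coordinates in increasing order of w
        order = np.argsort(w)
        v, chain, prev = np.zeros(n), [], 0.0
        for i in order:
            chain.append(int(i))
            cur = F(set(chain))
            v[i] = cur - prev
            prev = cur
        return v

    s = greedy(np.zeros(n))              # start at any vertex of B_F
    for t in range(iters):
        v = greedy(s)                    # gradient of 0.5||s||^2 is s itself
        s = s + 2.0 / (t + 3) * (v - s)  # standard Frank-Wolfe step size
    return s, {i for i in range(n) if s[i] < 0}   # rounded minimizer of F
```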
Submodular maximization
$\max_{S \subseteq V} F(S)$, $\max_{|S| \le k} F(S)$: NP-hard
• simple cases (cardinality constraint, monotone $F$): the discrete greedy algorithm is optimal (Nemhauser-Wolsey-Fisher 1978)
• more complex cases (complicated constraints, non-monotone): continuous extension $f: [0,1]^n \to \mathbb{R}$ of $F: \{0,1\}^n \to \mathbb{R}$ + rounding
  the concave envelope is intractable, but …
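A minimal sketch of that discrete greedy algorithm for the monotone, cardinality-constrained case; `F` and `V` are assumed to be a set-function oracle and ground set as above:

```python
def greedy_max(F, V, k):
    """Greedy for max_{|S| <= k} F(S): repeatedly add the element with the
    largest marginal gain; achieves a (1 - 1/e) factor for monotone
    submodular F (Nemhauser-Wolsey-Fisher 1978)."""
    S = set()
    for _ in range(min(k, len(V))):
        S.add(max(set(V) - S, key=lambda e: F(S | {e}) - F(S)))
    return S
```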
Independent coordinates
$P(S) = \prod_{i \in S} z_i \cdot \prod_{j \notin S} (1 - z_j)$, $f(z) = \mathbb{E}[F(S)]$
• $\frac{\partial^2 f}{\partial x_i \partial x_j} \le 0$ for all $i, j$
• $f(z)$ is concave in increasing directions (diminishing returns)
• $f(z)$ is convex in "swap" directions (e.g. $e_i - e_j$)
• continuous maximization (monotone) works despite nonconvexity! (Calinescu-Chekuri-Pál-Vondrák 2007, Feldman-Naor-Schwartz 2011, …, Hassani-Soltanolkotabi-Karbasi 2017, …)
• similar approach for non-monotone functions (Buchbinder-Naor-Feldman 2012, …)
"Continuous greedy" as Frank-Wolfe
Initialize: $z_0 = 0$
for $t = 1, \dots, T$:
  $s_t \in \arg\max_{s \in P} \langle s, \nabla f(z_t) \rangle$
  $z_{t+1} = z_t + \alpha_t s_t$
• concavity in positive directions: for all $z \in [0,1]^n$ there is a $v \in P$ with $\langle v, \nabla f(z) \rangle \ge \mathrm{OPT} - f(z)$
• Analysis:
  $f(z_{t+1}) \ge f(z_t) + \alpha \langle s_t, \nabla f(z_t) \rangle - \tfrac{C}{2}\alpha^2 \ge f(z_t) + \alpha\,[\mathrm{OPT} - f(z_t)] - \tfrac{C}{2}\alpha^2$
  $\Rightarrow\ \mathrm{OPT} - f(z_{t+1}) \le (1 - \alpha)\,[\mathrm{OPT} - f(z_t)] + \tfrac{C}{2}\alpha^2$
• with $\alpha = 1/T$: $f(z_T) \ge \big(1 - (1 - \tfrac{1}{T})^T\big)\,\mathrm{OPT} - \tfrac{C}{2T} \ge (1 - 1/e)\,\mathrm{OPT} - \tfrac{C}{2T}$
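A sketch of the scheme above; `grad_f` and `lin_max` are placeholder oracles (e.g. a sampled gradient of the multilinear extension, and a linear-program solver over $P$), not fixed APIs:

```python
import numpy as np

def continuous_greedy(grad_f, lin_max, n, T=100):
    """Frank-Wolfe scheme from the slide: s_t maximizes <s, grad f(z_t)>
    over P, then z moves a 1/T step; guarantee f(z_T) >= (1-1/e) OPT - C/(2T),
    followed by rounding z_T to a set."""
    z = np.zeros(n)
    for _ in range(T):
        s = lin_max(grad_f(z))
        z = z + s / T          # alpha = 1/T keeps z_T in P for down-closed P
    return z

# e.g. the linear oracle for the cardinality polytope
# {z in [0,1]^n : sum(z) <= k} just picks the k largest gradient coordinates:
def topk_oracle(g, k):
    s = np.zeros(len(g))
    s[np.argsort(-np.asarray(g))[:k]] = 1.0
    return s
```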
Binary / set function optimization
Minimization:
• Lovász extension
• convexity
• exact convex relaxation
• But: constrained minimization is hard
Maximization:
• multilinear extension
• diminishing returns
• NP-hard
• But: constant-factor approximations, even with constraints
Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections
Submodularity beyond sets
• sets: for all subsets $A, B \subseteq V$: $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$
• replace sets by vectors: $F(x) + F(y) \ge F(x \vee y) + F(x \wedge y)$
• or: all off-diagonal Hessian entries are nonpositive, $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$ for $i \ne j$ (Topkis 1978)
Examples
$F(x) + F(y) \ge F(x \vee y) + F(x \wedge y)$, i.e. $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$ for $i \ne j$
A submodular function can be convex, concave, or neither!
• any separable function $F(x) = \sum_{i=1}^n F_i(x_i)$
• $F(x) = g(x_i - x_j)$ for convex $g$
• $F(x) = h\big(\sum_i x_i\big)$ for concave $h$
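A one-line Hessian check (not on the slide) confirms the convex/concave pairing above:

```latex
\frac{\partial^2}{\partial x_i \partial x_j}\, g(x_i - x_j)
  = -\,g''(x_i - x_j) \le 0 \iff g \text{ convex},
\qquad
\frac{\partial^2}{\partial x_i \partial x_j}\, h\Big(\sum\nolimits_k x_k\Big)
  = h''\Big(\sum\nolimits_k x_k\Big) \le 0 \iff h \text{ concave}.
```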
Maximization
• general case: diminishing returns is stronger than submodularity
• DR-submodular function: $\partial^2 F / \partial x_i \partial x_j \le 0$ for all $i, j$ (including $i = j$)
• with DR, many results generalize (including "continuous greedy") (Kapralov-Post-Vondrák 2010, Soma et al. 2014-15, Ene & Nguyen 2016, Bian et al. 2016, Gottschalk & Peis 2016)
Minimization
• discretize continuous functions: costs a factor $O(1/\epsilon)$
• Option I: transform into set function optimization (Birkhoff 1937, Schrijver 2000, Orlin 2007); better for DR-submodular (Ene & Nguyen 2016)
• Option II: convex extension for integer submodular functions (Bach 2015)
Convex extension
• set functions: efficient minimization via the convex (Lovász) extension
  $F: \{0,1\}^n \to \mathbb{R}$, $f: [0,1]^n \to \mathbb{R}$, $f(z) = \mathbb{E}[F(S)]$, e.g. $z = (0.5, 0.5, 0, 0.8)$
• integer vectors: a distribution over $\{0, \dots, k\}$ for each coordinate
  $F: \{0, \dots, k\}^n \to \mathbb{R}$, $f(z) = \mathbb{E}[F(x)]$
Applications
• robust optimization of bipartite influences (Staib-Jegelka 2017):
  $\max_{y \in B} \min_{p \in P} I(y; p)$
• non-convex isotonic regression (Bach 2017):
  $\min_{x \in [0,1]^n} \sum_{i=1}^n G(x_i - z_i)$ s.t. $x_i \ge x_j\ \ \forall (i, j) \in E$
Roadmap
• Optimizing submodular set functions: discrete optimization via continuous optimization
• Submodularity more generally: continuous optimization via discrete optimization
• Further connections
Log-sub/supermodular distributions
$P(S) \propto \exp(F(S))$, or continuous: $P(x) \propto \exp(F(x))$
• $-F(S)$ submodular (i.e. $F$ supermodular): multivariate totally positive; FKG lattice condition
• implies positive association: for all monotonically increasing $G, H$:
  $\mathbb{E}[G(S)H(S)] \ge \mathbb{E}[G(S)]\,\mathbb{E}[H(S)]$
• what if $F(S)$ is submodular?
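For intuition, a generic single-site Gibbs sampler for such distributions; this is a standard technique, not one named on the slide, and its mixing time depends heavily on $F$:

```python
import numpy as np

def gibbs_sample(F, n, steps=1000, seed=0):
    """Single-site Gibbs sampler for P(S) ∝ exp(F(S)) over subsets of
    {0, ..., n-1}: resample one element's membership at a time from its
    exact conditional distribution."""
    rng = np.random.default_rng(seed)
    S = set()
    for _ in range(steps):
        i = int(rng.integers(n))
        S_in, S_out = S | {i}, S - {i}
        # P(i in S | rest) = exp(F(S_in)) / (exp(F(S_in)) + exp(F(S_out)))
        p_in = 1.0 / (1.0 + np.exp(F(S_out) - F(S_in)))
        S = S_in if rng.random() < p_in else S_out
    return S
```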
Negative association and stable polynomials
• a sub-class satisfies negative association: for all monotonically increasing $G, H$ with disjoint support:
  $\mathbb{E}[G(S)H(S)] \le \mathbb{E}[G(S)]\,\mathbb{E}[H(S)]$
• a sufficient condition (implying negative association, even under conditioning): the generating polynomial
  $q(z) = \sum_{S \subseteq V} P(S) \prod_{i \in S} z_i$, $z \in \mathbb{C}^n$,
  should be real stable: strongly Rayleigh measures (Borcea, Brändén, Liggett 2009)