New Algorithms for Approximate Minimization of the Difference Between Submodular Functions, with Applications (PowerPoint presentation by Rishabh Iyer and Jeff Bilmes)


1. New Algorithms for Approximate Minimization of the Difference Between Submodular Functions, with Applications
   Rishabh Iyer and Jeff Bilmes
   University of Washington, Seattle, Department of Electrical Engineering
   December 13, 2013

2. Outline
   1. Background
   2. Optimizing v(X) = f(X) − g(X) with f, g submodular
   3. Procedures for minimizing v(X): the Submodular-Supermodular Procedure, the Supermodular-Submodular Procedure, the Modular-Modular Procedure
   4. Some Additional Theoretical Results
   5. Experiments

3. Submodular Functions
   A function f : 2^V → R is submodular if for all A, B ⊆ V, f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B): the coverage of the intersection of two sets of elements is at most their common coverage.
   [Coverage diagram: with A = A_r ∪ C and B = B_r ∪ C overlapping in the common region C, f(A) + f(B) = f(A_r) + 2 f(C) + f(B_r), while f(A ∪ B) = f(A_r) + f(C) + f(B_r) and f(A ∩ B) ≤ f(C), so the inequality holds.]
   Equivalently, diminishing returns: f is submodular iff for all A ⊆ B ⊆ V \ {v}, f(v | A) := f(A + v) − f(A) ≥ f(B + v) − f(B) =: f(v | B). (1)
   I.e., conditioning reduces valuation (like entropy).
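To make the definition concrete, here is a minimal Python sketch (not from the slides) that brute-force checks the inequality f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for a small coverage function; the ground set, the coverage map, and all names are invented for illustration.

```python
from itertools import chain, combinations

# Hypothetical toy coverage function: each element of V covers a set of "areas".
coverage = {
    'a': {1, 2},
    'b': {2, 3},
    'c': {3, 4},
}
V = list(coverage)

def f(S):
    """Coverage function: number of areas covered by the elements of S."""
    return len(set().union(*(coverage[e] for e in S))) if S else 0

def subsets(ground):
    return chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))

# Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) over all pairs of subsets.
ok = all(
    f(set(A)) + f(set(B)) >= f(set(A) | set(B)) + f(set(A) & set(B))
    for A in subsets(V) for B in subsets(V)
)
print("coverage function is submodular on this ground set:", ok)  # expected: True
```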

4. Optimizing the Difference Between Two Submodular Functions
   In this paper, we address the following problem: given two submodular functions f and g, solve
   min_{X ⊆ V} [f(X) − g(X)] ≡ min_{X ⊆ V} v(X),   (2)
   with v : 2^V → R, v = f − g. A function r is said to be supermodular if −r is submodular.
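For very small ground sets, problem (2) can be solved exactly by enumerating all 2^|V| subsets, which is useful as a sanity check for the approximate procedures discussed later. The sketch below is my own illustration with invented modular f and g, not one of the paper's algorithms.

```python
from itertools import chain, combinations

def all_subsets(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

def brute_force_ds_min(f, g, V):
    """Exact minimizer of v(X) = f(X) - g(X); exponential in |V|, so only a baseline."""
    return min(all_subsets(V), key=lambda X: f(set(X)) - g(set(X)))

# Toy instance: modular functions are in particular submodular.
V = {'a', 'b', 'c'}
f = lambda X: 2 * len(X)                                     # modular "cost"
g = lambda X: sum({'a': 3, 'b': 1, 'c': 1}[e] for e in X)    # modular "gain"
print(brute_force_ds_min(f, g, V))   # ('a',): the only element whose gain exceeds its cost
```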

5. Applications
   - Sensor placement with submodular costs: let V be a set of possible sensor locations, let f(A) = I(X_A; X_{V\A}) measure the quality of a subset A of placed sensors, and let c(A) be the submodular cost. We have min_A f(A) − λ c(A).
   - Discriminatively structured graphical models: the EAR measure I(X_A; X_{V\A}) − I(X_A; X_{V\A} | C); synergy in neuroscience.
   - Feature selection: the problem of maximizing I(X_A; C) − λ c(A) = H(X_A) − [H(X_A | C) + λ c(A)], a difference between two submodular functions, where H is the entropy and c is a feature cost function (a small worked sketch follows this slide).
   - Graphical model inference: finding x that maximizes p(x) ∝ exp(−v(x)), where x ∈ {0, 1}^n and v is a pseudo-Boolean function; when v is non-submodular, it can be represented as a difference between submodular functions.
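As a small worked illustration of the feature-selection objective, the sketch below estimates H(X_A) and H(X_A | C) from a tiny made-up binary dataset and evaluates I(X_A; C) − λ c(A) as f(A) − g(A). The dataset, the cost function, and every name here are invented for illustration and are not from the paper.

```python
import math
from collections import Counter

# Made-up dataset: rows of three binary features together with a class label C.
data = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 0), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]

def entropy(counts):
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def H_A(A):
    """Empirical entropy H(X_A) of the feature columns indexed by A (submodular in A)."""
    return entropy(Counter(tuple(x[i] for i in A) for x, _ in data))

def H_A_given_C(A):
    """Empirical conditional entropy H(X_A | C) (also submodular in A)."""
    n = len(data)
    class_sizes = Counter(label for _, label in data)
    return sum(
        n_c / n * entropy(Counter(tuple(x[i] for i in A) for x, label in data if label == c))
        for c, n_c in class_sizes.items()
    )

lam = 0.1
cost = lambda A: len(A)                          # toy modular feature cost
f = lambda A: H_A(A)                             # submodular part
g = lambda A: H_A_given_C(A) + lam * cost(A)     # submodular part
A = (0, 2)
print("I(X_A; C) - lam*c(A) =", f(A) - g(A))     # evaluated as a difference f(A) - g(A)
```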

6. Heuristics for General Set Function Optimization
   Lemma (Narasimhan & Bilmes, 2005). Given any set function v, it can be expressed as v(X) = f(X) − g(X), ∀ X ⊆ V, for some submodular functions f and g.
   We give a new proof that depends on computing α_v = min_{j ∈ V, X ⊂ Y ⊆ V \ {j}} [v(j | X) − v(j | Y)], which can be intractable for general v. However, we show that for those functions where α_v can be bounded efficiently, f and g can be computed efficiently.
   Lemma. For a given set function v, if α_v or a lower bound on it can be found in polynomial time, a corresponding decomposition f and g can also be found in polynomial time.
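One way to make the lemma operational on a tiny ground set, assuming we can afford the exponential computation of α_v: if α_v < 0, add a suitably scaled strictly submodular function h to v and use the same scaled h as g. The choice h(X) = sqrt(|X|) below is my own, and the code is only a sketch of this idea, not necessarily the exact construction used in the paper.

```python
import math
from itertools import chain, combinations

def subsets(V):
    return [set(S) for S in chain.from_iterable(combinations(sorted(V), r) for r in range(len(V) + 1))]

def gain(F, j, X):
    return F(X | {j}) - F(X)

def alpha(v, V):
    """alpha_v = min over j and X strict-subset Y (both excluding j) of v(j|X) - v(j|Y)."""
    return min(
        gain(v, j, X) - gain(v, j, Y)
        for j in V
        for X in subsets(V - {j})
        for Y in subsets(V - {j})
        if X < Y
    )

def decompose(v, V):
    """Return (f, g) with v = f - g and both submodular, by adding a scaled strictly submodular h."""
    h = lambda X: math.sqrt(len(X))
    a_v, b_h = alpha(v, V), alpha(h, V)   # b_h > 0 because h is strictly submodular
    scale = max(0.0, -a_v) / b_h
    g = lambda X: scale * h(X)
    f = lambda X: v(X) + g(X)
    return f, g

# Toy non-submodular set function (supermodular, in fact) on a 3-element ground set.
V = {0, 1, 2}
v = lambda X: len(X) ** 2
f, g = decompose(v, V)
print(alpha(f, V) >= -1e-12, alpha(g, V) > 0)   # both components should be submodular
```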

7. Convex/Concave and Semigradients
   A convex function φ has a subgradient at any in-domain point y: there exists h_y such that φ(x) − φ(y) ≥ ⟨h_y, x − y⟩ for all x. (3)
   A concave function ψ has a supergradient at any in-domain point y: there exists g_y such that ψ(x) − ψ(y) ≤ ⟨g_y, x − y⟩ for all x. (4)
   If a function has both a subgradient and a supergradient at every point, then it must be affine.
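A quick numeric illustration (mine, not from the slides): for the convex function φ(x) = x², the gradient h_y = 2y serves as a subgradient in inequality (3) at every point y, which the snippet below verifies on a grid.

```python
# Check the subgradient inequality (3) for phi(x) = x**2 with h_y = 2*y at every grid point y.
phi = lambda x: x * x
grid = [i / 10 for i in range(-30, 31)]

ok = all(phi(x) - phi(y) >= 2 * y * (x - y) - 1e-12 for y in grid for x in grid)
print("subgradient inequality holds at all grid points:", ok)   # expected: True
```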

8. Submodular Subgradients
   For a submodular function f, the subdifferential at X can be defined as
   ∂f(X) = { x ∈ R^V : ∀ Y ⊆ V, x(Y) − x(X) ≤ f(Y) − f(X) }. (5)
   Extreme points of the subdifferential are easily computable via the greedy algorithm.
   Theorem (Fujishige 2005, Theorem 6.11). A point y is an extreme point of ∂f(Y) iff there exists a chain ∅ = S_0 ⊂ S_1 ⊂ · · · ⊂ S_n with Y = S_j for some j, such that y(S_i \ S_{i−1}) = y(S_i) − y(S_{i−1}) = f(S_i) − f(S_{i−1}).
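Definition (5) can be checked directly on small ground sets. The sketch below (an invented example, not from the slides) tests whether a modular vector x belongs to ∂f(X) by enumerating every Y; the vector chosen is built from greedy-style gains along a chain through X, so by the theorem above it should pass.

```python
from itertools import chain, combinations

def subsets(V):
    return [set(S) for S in chain.from_iterable(combinations(sorted(V), r) for r in range(len(V) + 1))]

def in_subdifferential(x, X, f, V, tol=1e-12):
    """Membership test for (5): x(Y) - x(X) <= f(Y) - f(X) for every Y subset of V."""
    m = lambda S: sum(x[e] for e in S)            # modular evaluation x(S)
    return all(m(Y) - m(X) <= f(Y) - f(X) + tol for Y in subsets(V))

# Toy submodular function (a concave function of the cardinality).
V = {0, 1, 2}
f = lambda S: len(S) ** 0.5
X = {0, 1}
# Greedy-style gains of f along the chain {} -> {0} -> {0,1} -> {0,1,2}, which passes through X.
x = {0: 1.0, 1: 2 ** 0.5 - 1.0, 2: 3 ** 0.5 - 2 ** 0.5}
print(in_subdifferential(x, X, f, V))             # expected: True
```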

9. The Submodular Subgradients (Fujishige 2005)
   Let σ be a permutation of V and define S^σ_i = {σ(1), σ(2), . . . , σ(i)}; we say that σ's chain contains Y if S^σ_{|Y|} = Y. Then a subgradient h^f_{Y,σ} corresponding to f is given by
   h^f_{Y,σ}(σ(i)) = f(S^σ_1) if i = 1, and f(S^σ_i) − f(S^σ_{i−1}) otherwise.
   This gives a tight modular lower bound on f:
   h^f_{Y,σ}(X) := Σ_{x ∈ X} h^f_{Y,σ}(x) ≤ f(X), ∀ X ⊆ V. Note that h^f_{Y,σ}(Y) = f(Y).
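The greedy construction on this slide is straightforward to implement. Below is a minimal sketch with an invented submodular f (the square root of a modular weight) that builds h^f_{Y,σ} from a permutation whose chain contains Y and then checks tightness at Y and the lower-bound property on every subset.

```python
from itertools import chain, combinations

def greedy_subgradient(f, sigma):
    """h(sigma[i]) = f(S_i) - f(S_{i-1}) along the chain S_i = {sigma[0], ..., sigma[i-1]}."""
    h, S, prev = {}, set(), 0.0
    for e in sigma:
        S = S | {e}
        h[e] = f(S) - prev
        prev = f(S)
    return h

def subsets(V):
    return [set(S) for S in chain.from_iterable(combinations(sorted(V), r) for r in range(len(V) + 1))]

# Invented submodular function: square root of a modular weight (concave of modular).
w = {0: 1.0, 1: 4.0, 2: 2.0}
f = lambda S: sum(w[e] for e in S) ** 0.5
V = set(w)

Y = {1, 2}
sigma = [1, 2, 0]                  # its chain contains Y: S_2 = {1, 2} = Y
h = greedy_subgradient(f, sigma)
h_val = lambda S: sum(h[e] for e in S)

print(abs(h_val(Y) - f(Y)) < 1e-12)                          # tight at Y
print(all(h_val(X) <= f(X) + 1e-12 for X in subsets(V)))     # modular lower bound everywhere
```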
