New Algorithms for Approximate Minimization of the Difference Between Submodular Functions, with Applications (PowerPoint presentation by Rishabh Iyer and Jeff Bilmes)


1. New Algorithms for Approximate Minimization of the Difference Between Submodular Functions, with Applications
   Rishabh Iyer and Jeff Bilmes
   University of Washington, Seattle, Department of Electrical Engineering
   December 13, 2013

2. Outline
   1. Background
   2. Optimizing v(X) = f(X) − g(X) with f, g submodular
   3. Procedures for minimizing v(X): the Submodular-Supermodular Procedure, the Supermodular-Submodular Procedure, the Modular-Modular Procedure
   4. Some Additional Theoretical Results
   5. Experiments

3. Submodular Functions
   A function f : 2^V → R is submodular if for all A, B ⊆ V, f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B): the coverage of the intersection of two sets of elements is at most their common coverage.
   [Coverage diagram: with A = A_r ∪ C and B = B_r ∪ C overlapping in the common region C, f(A) + f(B) = f(A_r) + 2 f(C) + f(B_r), while f(A ∪ B) = f(A_r) + f(C) + f(B_r) and f(A ∩ B) ≤ f(C), so the inequality holds.]
   Equivalently, diminishing returns: f is submodular iff for all A ⊆ B ⊆ V \ {v}, f(v | A) := f(A + v) − f(A) ≥ f(B + v) − f(B) =: f(v | B). (1)
   I.e., conditioning reduces valuation (like entropy).
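To make the definition concrete, here is a minimal Python sketch (not from the slides) that brute-force checks the inequality f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for a small coverage function; the ground set, the coverage map, and all names are invented for illustration.

```python
from itertools import chain, combinations

# Hypothetical toy coverage function: each element of V covers a set of "areas".
coverage = {
    'a': {1, 2},
    'b': {2, 3},
    'c': {3, 4},
}
V = list(coverage)

def f(S):
    """Coverage function: number of areas covered by the elements of S."""
    return len(set().union(*(coverage[e] for e in S))) if S else 0

def subsets(ground):
    return chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))

# Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) over all pairs of subsets.
ok = all(
    f(set(A)) + f(set(B)) >= f(set(A) | set(B)) + f(set(A) & set(B))
    for A in subsets(V) for B in subsets(V)
)
print("coverage function is submodular on this ground set:", ok)  # expected: True
```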

4. Optimizing the Difference Between Two Submodular Functions
   In this paper, we address the following problem: given two submodular functions f and g, solve
   min_{X ⊆ V} [f(X) − g(X)] ≡ min_{X ⊆ V} v(X),   (2)
   with v : 2^V → R, v = f − g. A function r is said to be supermodular if −r is submodular.
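For very small ground sets, problem (2) can be solved exactly by enumerating all 2^|V| subsets, which is useful as a sanity check for the approximate procedures discussed later. The sketch below is my own illustration with invented modular f and g, not one of the paper's algorithms.

```python
from itertools import chain, combinations

def all_subsets(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

def brute_force_ds_min(f, g, V):
    """Exact minimizer of v(X) = f(X) - g(X); exponential in |V|, so only a baseline."""
    return min(all_subsets(V), key=lambda X: f(set(X)) - g(set(X)))

# Toy instance: modular functions are in particular submodular.
V = {'a', 'b', 'c'}
f = lambda X: 2 * len(X)                                     # modular "cost"
g = lambda X: sum({'a': 3, 'b': 1, 'c': 1}[e] for e in X)    # modular "gain"
print(brute_force_ds_min(f, g, V))   # ('a',): the only element whose gain exceeds its cost
```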

5. Applications
   - Sensor placement with submodular costs: let V be a set of possible sensor locations, let f(A) = I(X_A; X_{V\A}) measure the quality of a subset A of placed sensors, and let c(A) be the submodular cost. We have min_A f(A) − λ c(A).
   - Discriminatively structured graphical models: the EAR measure I(X_A; X_{V\A}) − I(X_A; X_{V\A} | C); synergy in neuroscience.
   - Feature selection: the problem of maximizing I(X_A; C) − λ c(A) = H(X_A) − [H(X_A | C) + λ c(A)], a difference between two submodular functions, where H is the entropy and c is a feature cost function (a small worked sketch follows this slide).
   - Graphical model inference: finding x that maximizes p(x) ∝ exp(−v(x)), where x ∈ {0, 1}^n and v is a pseudo-Boolean function; when v is non-submodular, it can be represented as a difference between submodular functions.
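As a small worked illustration of the feature-selection objective, the sketch below estimates H(X_A) and H(X_A | C) from a tiny made-up binary dataset and evaluates I(X_A; C) − λ c(A) as f(A) − g(A). The dataset, the cost function, and every name here are invented for illustration and are not from the paper.

```python
import math
from collections import Counter

# Made-up dataset: rows of three binary features together with a class label C.
data = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 0), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]

def entropy(counts):
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def H_A(A):
    """Empirical entropy H(X_A) of the feature columns indexed by A (submodular in A)."""
    return entropy(Counter(tuple(x[i] for i in A) for x, _ in data))

def H_A_given_C(A):
    """Empirical conditional entropy H(X_A | C) (also submodular in A)."""
    n = len(data)
    class_sizes = Counter(label for _, label in data)
    return sum(
        n_c / n * entropy(Counter(tuple(x[i] for i in A) for x, label in data if label == c))
        for c, n_c in class_sizes.items()
    )

lam = 0.1
cost = lambda A: len(A)                          # toy modular feature cost
f = lambda A: H_A(A)                             # submodular part
g = lambda A: H_A_given_C(A) + lam * cost(A)     # submodular part
A = (0, 2)
print("I(X_A; C) - lam*c(A) =", f(A) - g(A))     # evaluated as a difference f(A) - g(A)
```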

6. Heuristics for General Set Function Optimization
   Lemma (Narasimhan & Bilmes, 2005). Given any set function v, it can be expressed as v(X) = f(X) − g(X), ∀ X ⊆ V, for some submodular functions f and g.
   We give a new proof that depends on computing α_v = min_{j ∈ V, X ⊂ Y ⊆ V \ {j}} [v(j | X) − v(j | Y)], which can be intractable for general v. However, we show that for those functions where α_v can be bounded efficiently, f and g can be computed efficiently.
   Lemma. For a given set function v, if α_v or a lower bound on it can be found in polynomial time, a corresponding decomposition f and g can also be found in polynomial time.
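One way to make the lemma operational on a tiny ground set, assuming we can afford the exponential computation of α_v: if α_v < 0, add a suitably scaled strictly submodular function h to v and use the same scaled h as g. The choice h(X) = sqrt(|X|) below is my own, and the code is only a sketch of this idea, not necessarily the exact construction used in the paper.

```python
import math
from itertools import chain, combinations

def subsets(V):
    return [set(S) for S in chain.from_iterable(combinations(sorted(V), r) for r in range(len(V) + 1))]

def gain(F, j, X):
    return F(X | {j}) - F(X)

def alpha(v, V):
    """alpha_v = min over j and X strict-subset Y (both excluding j) of v(j|X) - v(j|Y)."""
    return min(
        gain(v, j, X) - gain(v, j, Y)
        for j in V
        for X in subsets(V - {j})
        for Y in subsets(V - {j})
        if X < Y
    )

def decompose(v, V):
    """Return (f, g) with v = f - g and both submodular, by adding a scaled strictly submodular h."""
    h = lambda X: math.sqrt(len(X))
    a_v, b_h = alpha(v, V), alpha(h, V)   # b_h > 0 because h is strictly submodular
    scale = max(0.0, -a_v) / b_h
    g = lambda X: scale * h(X)
    f = lambda X: v(X) + g(X)
    return f, g

# Toy non-submodular set function (supermodular, in fact) on a 3-element ground set.
V = {0, 1, 2}
v = lambda X: len(X) ** 2
f, g = decompose(v, V)
print(alpha(f, V) >= -1e-12, alpha(g, V) > 0)   # both components should be submodular
```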

7. Convex/Concave and Semigradients
   A convex function φ has a subgradient at any in-domain point y: there exists h_y such that φ(x) − φ(y) ≥ ⟨h_y, x − y⟩ for all x. (3)
   A concave function ψ has a supergradient at any in-domain point y: there exists g_y such that ψ(x) − ψ(y) ≤ ⟨g_y, x − y⟩ for all x. (4)
   If a function has both a subgradient and a supergradient at every point, then it must be affine.
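A quick numeric illustration (mine, not from the slides): for the convex function φ(x) = x², the gradient h_y = 2y serves as a subgradient in inequality (3) at every point y, which the snippet below verifies on a grid.

```python
# Check the subgradient inequality (3) for phi(x) = x**2 with h_y = 2*y at every grid point y.
phi = lambda x: x * x
grid = [i / 10 for i in range(-30, 31)]

ok = all(phi(x) - phi(y) >= 2 * y * (x - y) - 1e-12 for y in grid for x in grid)
print("subgradient inequality holds at all grid points:", ok)   # expected: True
```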

8. Submodular Subgradients
   For a submodular function f, the subdifferential at X can be defined as
   ∂f(X) = { x ∈ R^V : ∀ Y ⊆ V, x(Y) − x(X) ≤ f(Y) − f(X) }. (5)
   Extreme points of the subdifferential are easily computable via the greedy algorithm.
   Theorem (Fujishige 2005, Theorem 6.11). A point y is an extreme point of ∂f(Y) iff there exists a chain ∅ = S_0 ⊂ S_1 ⊂ · · · ⊂ S_n with Y = S_j for some j, such that y(S_i \ S_{i−1}) = y(S_i) − y(S_{i−1}) = f(S_i) − f(S_{i−1}).
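Definition (5) can be checked directly on small ground sets. The sketch below (an invented example, not from the slides) tests whether a modular vector x belongs to ∂f(X) by enumerating every Y; the vector chosen is built from greedy-style gains along a chain through X, so by the theorem above it should pass.

```python
from itertools import chain, combinations

def subsets(V):
    return [set(S) for S in chain.from_iterable(combinations(sorted(V), r) for r in range(len(V) + 1))]

def in_subdifferential(x, X, f, V, tol=1e-12):
    """Membership test for (5): x(Y) - x(X) <= f(Y) - f(X) for every Y subset of V."""
    m = lambda S: sum(x[e] for e in S)            # modular evaluation x(S)
    return all(m(Y) - m(X) <= f(Y) - f(X) + tol for Y in subsets(V))

# Toy submodular function (a concave function of the cardinality).
V = {0, 1, 2}
f = lambda S: len(S) ** 0.5
X = {0, 1}
# Greedy-style gains of f along the chain {} -> {0} -> {0,1} -> {0,1,2}, which passes through X.
x = {0: 1.0, 1: 2 ** 0.5 - 1.0, 2: 3 ** 0.5 - 2 ** 0.5}
print(in_subdifferential(x, X, f, V))             # expected: True
```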

9. The Submodular Subgradients (Fujishige 2005)
   Let σ be a permutation of V and define S^σ_i = {σ(1), σ(2), . . . , σ(i)}; we say that σ's chain contains Y if S^σ_{|Y|} = Y. Then a subgradient h^f_{Y,σ} corresponding to f is given by
   h^f_{Y,σ}(σ(i)) = f(S^σ_1) if i = 1, and f(S^σ_i) − f(S^σ_{i−1}) otherwise.
   This gives a tight modular lower bound on f:
   h^f_{Y,σ}(X) := Σ_{x ∈ X} h^f_{Y,σ}(x) ≤ f(X), ∀ X ⊆ V. Note that h^f_{Y,σ}(Y) = f(Y).
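The greedy construction on this slide is straightforward to implement. Below is a minimal sketch with an invented submodular f (the square root of a modular weight) that builds h^f_{Y,σ} from a permutation whose chain contains Y and then checks tightness at Y and the lower-bound property on every subset.

```python
from itertools import chain, combinations

def greedy_subgradient(f, sigma):
    """h(sigma[i]) = f(S_i) - f(S_{i-1}) along the chain S_i = {sigma[0], ..., sigma[i-1]}."""
    h, S, prev = {}, set(), 0.0
    for e in sigma:
        S = S | {e}
        h[e] = f(S) - prev
        prev = f(S)
    return h

def subsets(V):
    return [set(S) for S in chain.from_iterable(combinations(sorted(V), r) for r in range(len(V) + 1))]

# Invented submodular function: square root of a modular weight (concave of modular).
w = {0: 1.0, 1: 4.0, 2: 2.0}
f = lambda S: sum(w[e] for e in S) ** 0.5
V = set(w)

Y = {1, 2}
sigma = [1, 2, 0]                  # its chain contains Y: S_2 = {1, 2} = Y
h = greedy_subgradient(f, sigma)
h_val = lambda S: sum(h[e] for e in S)

print(abs(h_val(Y) - f(Y)) < 1e-12)                          # tight at Y
print(all(h_val(X) <= f(X) + 1e-12 for X in subsets(V)))     # modular lower bound everywhere
```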
