Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

  1. Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits. Pierre Perrault (INRIA Lille; CMLA, ENS PS), Vianney Perchet (CMLA, ENS PS; Criteo AI Lab), Michal Valko (INRIA Lille). Perrault et al. Exploiting Structure of Uncertainty 1 / 6

  2. Semi-bandits confidence regions. Not very accurate: $\max_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$. Accurate: $\sum_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$.
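The two regions on this slide can be compared with a short membership check. This is only a sketch: the function names are hypothetical, and it assumes that $\mu_{i,t-1}$ is the empirical mean of arm $i$ and $N_{i,t-1}$ its number of pulls, which the slide does not state explicitly.

```python
import math

def in_box_region(delta, mu_hat, counts, t):
    """Check max_i (delta_i - mu_i)^2 * N_i <= log(t)."""
    worst = max((d - m) ** 2 * n for d, m, n in zip(delta, mu_hat, counts))
    return worst <= math.log(t)

def in_ellipsoid_region(delta, mu_hat, counts, t):
    """Check sum_i (delta_i - mu_i)^2 * N_i <= log(t)."""
    total = sum((d - m) ** 2 * n for d, m, n in zip(delta, mu_hat, counts))
    return total <= math.log(t)

# Hypothetical toy values: two arms, ten pulls each, round t = 100.
mu_hat = [0.5, 0.5]
counts = [10, 10]
t = 100

# A candidate that deviates on both coordinates at once.
corner = [1.1, 1.1]
inside_box = in_box_region(corner, mu_hat, counts, t)
inside_ellipsoid = in_ellipsoid_region(corner, mu_hat, counts, t)
```

Since every term in the sum is non-negative, any point satisfying the sum constraint also satisfies the max constraint, so the sum region is contained in the max region. The candidate above lies in the max region but not in the sum region.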

  3. Efficiency. Algorithms use the OFU principle: $A_t \in \arg\max_{A \in \mathcal{A},\, \mu \in \mathcal{C}_t} e_A^T \mu = \arg\max_{A \in \mathcal{A}} \underbrace{e_A^T \mu_{t-1}}_{L(A)} + \underbrace{\max_{\mu \in \mathcal{C}_t - \mu_{t-1}} e_A^T \mu}_{F(A)}$. Theorem (Perrault et al.): for the region $\max_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$ (not very accurate, efficient), $F$ is linear; for the region $\sum_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$ (accurate, inefficient), $F$ is submodular.
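For these two regions the bonus $F(A)$ has a closed form, which makes the linear versus submodular distinction concrete. The sketch below is an illustration under assumptions, not the authors' code: the names `f_box` and `f_ellipsoid` are hypothetical, and the formulas follow from maximizing $e_A^T \mu$ over the centered region (coordinate-wise for the max constraint, by Cauchy-Schwarz for the sum constraint).

```python
import math

def f_box(A, counts, t):
    """F(A) under max_i mu_i^2 * N_i <= log(t).

    Each coordinate can sit at its own bound, so F is a plain sum over A.
    """
    return sum(math.sqrt(math.log(t) / counts[i]) for i in A)

def f_ellipsoid(A, counts, t):
    """F(A) under sum_i mu_i^2 * N_i <= log(t).

    By Cauchy-Schwarz the maximum is sqrt(log(t) * sum_{i in A} 1 / N_i).
    """
    return math.sqrt(math.log(t) * sum(1.0 / counts[i] for i in A))

# Hypothetical pull counts for three arms at round t = 100.
counts = [4, 9, 16]
t = 100

# Marginal gain of adding arm 2 to the empty set and to {0, 1}.
box_gain_small = f_box({2}, counts, t) - f_box(set(), counts, t)
box_gain_large = f_box({0, 1, 2}, counts, t) - f_box({0, 1}, counts, t)
ell_gain_small = f_ellipsoid({2}, counts, t) - f_ellipsoid(set(), counts, t)
ell_gain_large = f_ellipsoid({0, 1, 2}, counts, t) - f_ellipsoid({0, 1}, counts, t)
```

For the max region the marginal gain of an arm does not depend on the set it joins, so $L + F$ stays linear and a linear optimization oracle suffices. For the sum region the gain shrinks as the set grows, which is the diminishing-returns property of a submodular function.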

  6. NEW: Approximation for matroid. Assume non-negative rewards. $\mathcal{A}$ is the family of independent sets. GREEDY: $\frac{L(S) + F(S)}{1 - 1/e} \ge L(O) + F(O)$, $\forall O \in \mathcal{A}$. Gives linear regret. We expect a constant close to 1 for $F$ small. $L(S_1) \ge L(O)$, $\forall O \in \mathcal{A}$. $\frac{F(S_2)}{1 - 1/e} \ge F(O)$, $\forall O \in \mathcal{A}$. Theorem (Perrault et al.): GREEDY for maximizing $L + F$ gives $S$ such that $L(S) + 2F(S) \ge L(O) + F(O)$, $\forall O \in \mathcal{A}$.
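A minimal sketch of greedy maximization of $L + F$ over a matroid, with a brute-force check of the bound $L(S) + 2F(S) \ge L(O) + F(O)$ on a toy instance. Everything specific here is an assumption for illustration: the function `greedy`, the uniform matroid (sets of size at most `m`), the sum-constraint form of $F$, and the numbers.

```python
import math
from itertools import combinations

def greedy(ground, is_independent, objective):
    """Add, one at a time, the feasible element giving the largest objective."""
    S = set()
    while True:
        candidates = [i for i in ground
                      if i not in S and is_independent(S | {i})]
        if not candidates:
            return S
        best = max(candidates, key=lambda i: objective(S | {i}))
        S.add(best)

# Hypothetical instance: non-negative empirical means and pull counts.
mu = [0.9, 0.8, 0.5, 0.4, 0.1]
counts = [50, 40, 5, 4, 1]
t, m = 100, 2

def L(A):
    return sum(mu[i] for i in A)

def F(A):
    # Assumed bonus: the closed form for the sum-constraint region.
    return math.sqrt(math.log(t) * sum(1.0 / counts[i] for i in A))

def is_independent(A):
    # Uniform matroid: any set of at most m arms is independent.
    return len(A) <= m

arms = range(len(mu))
S = greedy(arms, is_independent, lambda A: L(A) + F(A))

# Brute force over every independent set O, feasible only on tiny instances.
best_value = max(L(O) + F(O)
                 for k in range(m + 1)
                 for O in combinations(arms, k))
bound_holds = L(S) + 2 * F(S) >= best_value
```

The brute-force comparison is only there to sanity-check the inequality on a small example; it plays no role in the algorithm itself.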

  7. When reward can be negative... LOCAL SEARCH based algorithm. Theorem (Perrault et al.): $L(S) + 2(1 + \varepsilon) F(S) \ge L(O) + F(O)$, $\forall O \in \mathcal{A}$. Time complexity per round: $O(m^2 n \log(mt)/\varepsilon)$. Start from the greedy solution $S_{\text{init}} \in \arg\max_{A} L(A)$. Then, repeatedly try three basic operations in order to improve the current solution. Improvements must be greater than $\frac{\varepsilon}{m} F(S)$.
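The procedure on this slide can be sketched as follows. The slide does not name the three basic operations, so this sketch assumes they are delete, add, and swap; it also assumes the quantity being improved is $L + F$ and that a move is accepted only when it gains more than $\frac{\varepsilon}{m} F(S)$. The function name, the uniform matroid, and the numbers are hypothetical.

```python
import math

def local_search(mu, F, is_independent, eps, m):
    """Local search on L + F, started from a greedy maximizer of L alone."""
    n = len(mu)

    def objective(A):
        return sum(mu[i] for i in A) + F(A)

    # S_init: scan arms by decreasing mean, keep positive ones that fit.
    S = frozenset()
    for i in sorted(range(n), key=lambda i: -mu[i]):
        if mu[i] > 0 and is_independent(S | {i}):
            S = S | {i}

    def neighbours(S):
        inside = sorted(S)
        outside = [j for j in range(n) if j not in S]
        for i in inside:                      # assumed operation 1: delete
            yield S - {i}
        for j in outside:                     # assumed operation 2: add
            yield S | {j}
        for i in inside:                      # assumed operation 3: swap
            for j in outside:
                yield (S - {i}) | {j}

    improved = True
    while improved:
        improved = False
        threshold = eps / m * F(S)
        for T in neighbours(S):
            if is_independent(T) and objective(T) - objective(S) > threshold:
                S, improved = T, True
                break
    return set(S)

# Hypothetical instance with some negative empirical means.
mu = [0.6, -0.2, 0.3, -0.5]
counts = [30, 2, 20, 1]
t, m, eps = 100, 2, 0.1

def F(A):
    # Assumed bonus: the closed form for the sum-constraint region.
    return math.sqrt(math.log(t) * sum(1.0 / counts[i] for i in A))

S = local_search(mu, F, lambda A: len(A) <= m, eps, m)
```

Each accepted move strictly increases the objective, so the loop terminates on a finite ground set. The threshold is what limits the number of accepted moves, at the cost of the extra $(1 + \varepsilon)$ factor in the guarantee.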

  8. Thank you! Poster: Pacific Ballroom #53. Extension to budgeted bandit, where we want to minimize $\frac{(L_1 - F_1)^+}{L_2 + F_2}$. Solution uses NEW concept: Approximation Lagrangian $\mathcal{L}_\kappa(\lambda, S) \triangleq L_1(S) - \kappa F_1(S) - \lambda (L_2(S) + \kappa F_2(S))$. Experiments: [Figure: two plots of regret $R_T$ (in units of $10^2$) against $T$ (log scale, $10^1$ to $10^5$), comparing CUCB and ESCB.]
