Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits Pierre Perrault (INRIA Lille — CMLA, ENS PS) Vianney Perchet (CMLA, ENS PS — Criteo AI Lab) Michal Valko (INRIA Lille) Perrault et al. Exploiting Structure of Uncertainty 1 / 6
Semi-bandits confidence regions

Box region (not very accurate):
$$\max_i \, (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$$

Ellipsoidal region (accurate):
$$\sum_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$$
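To make the difference between the two regions concrete, here is a minimal Python sketch of the two membership tests, assuming the regions on this slide are the max-based (box) and sum-based (ellipsoid) constraints. The function names and the numbers are hypothetical, chosen only for illustration.

```python
import math

def in_box(delta, mu_hat, counts, t):
    """Box region: max_i (delta_i - mu_i)^2 * N_i <= log(t)."""
    worst = max((d - m) ** 2 * n for d, m, n in zip(delta, mu_hat, counts))
    return worst <= math.log(t)

def in_ellipsoid(delta, mu_hat, counts, t):
    """Ellipsoidal region: sum_i (delta_i - mu_i)^2 * N_i <= log(t)."""
    total = sum((d - m) ** 2 * n for d, m, n in zip(delta, mu_hat, counts))
    return total <= math.log(t)

# Hypothetical estimates and pull counts for three arms.
mu_hat = [0.5, 0.2, 0.8]
counts = [10, 20, 5]
t = 100

# A point near a corner of the box: every coordinate spends 90% of log(t).
corner = [m + math.sqrt(0.9 * math.log(t) / n) for m, n in zip(mu_hat, counts)]

corner_in_box = in_box(corner, mu_hat, counts, t)
corner_in_ellipsoid = in_ellipsoid(corner, mu_hat, counts, t)
```

Any point of the ellipsoid also lies in the box, since a sum of non-negative terms bounds each of them. The corner point above lies in the box but not in the ellipsoid, which is one way to read "not very accurate" versus "accurate".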
Efficiency

Algorithms use the OFU principle:
$$A_t \in \arg\max_{A \in \mathcal{A},\, \mu \in \mathcal{C}_t} e_A^{\mathsf T} \mu = \arg\max_{A \in \mathcal{A}} \Big( \underbrace{e_A^{\mathsf T} \mu_{t-1}}_{L(A)} + \underbrace{\max_{\mu \in \mathcal{C}_t - \mu_{t-1}} e_A^{\mathsf T} \mu}_{F(A)} \Big).$$

Theorem (Perrault et al.): with the box region, F is linear; with the ellipsoidal region, F is submodular.

Box region: $\max_i \, (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$. Not very accurate, but efficient.
Ellipsoidal region: $\sum_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$. Accurate, but inefficient.
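The linear versus submodular distinction can be checked numerically. Assuming the regions are centered at $\mu_{t-1}$, the inner maximum has a closed form in each case: $\sum_{i \in A} \sqrt{\log(t)/N_{i,t-1}}$ for the box, and $\sqrt{\log(t) \sum_{i \in A} 1/N_{i,t-1}}$ for the ellipsoid (by Cauchy-Schwarz). The sketch below uses hypothetical names and pull counts to compare the marginal gain of adding one arm to a small set and to a larger set.

```python
import math

def f_box(action, counts, t):
    """Bonus under the box region: a sum of per-arm terms, hence linear in the set."""
    return sum(math.sqrt(math.log(t) / counts[i]) for i in action)

def f_ellipsoid(action, counts, t):
    """Bonus under the ellipsoidal region: square root of a sum, hence submodular."""
    return math.sqrt(math.log(t) * sum(1.0 / counts[i] for i in action))

def marginal_gain(bonus, base, arm, counts, t):
    """Increase of the bonus when `arm` is added to the set `base`."""
    return bonus(base | {arm}, counts, t) - bonus(base, counts, t)

# Hypothetical pull counts for four arms.
counts = [10, 20, 5, 8]
t = 100
small, large, new_arm = {0}, {0, 1, 2}, 3

box_small = marginal_gain(f_box, small, new_arm, counts, t)
box_large = marginal_gain(f_box, large, new_arm, counts, t)
ell_small = marginal_gain(f_ellipsoid, small, new_arm, counts, t)
ell_large = marginal_gain(f_ellipsoid, large, new_arm, counts, t)
```

Under the box region the gain of the new arm does not depend on the set it joins, while under the ellipsoid the gain shrinks as the set grows (diminishing returns).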
NEW: Approximation for matroid

Assume non-negative rewards. $\mathcal{A}$ is the family of independent sets.

GREEDY:
$$\frac{L(S) + F(S)}{1 - 1/e} \ge L(O) + F(O), \quad \forall O \in \mathcal{A}.$$
This gives linear regret. We expect a constant close to 1 for small F.
$$L(S_1) \ge L(O), \quad \forall O \in \mathcal{A}.$$
$$\frac{F(S_2)}{1 - 1/e} \ge F(O), \quad \forall O \in \mathcal{A}.$$

Theorem (Perrault et al.): GREEDY for maximizing L + F gives S such that
$$L(S) + 2F(S) \ge L(O) + F(O), \quad \forall O \in \mathcal{A}.$$
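As an illustration of the guarantee stated in the theorem, the following Python sketch runs a standard greedy on L + F and checks the inequality by brute force on a tiny instance. Several things are assumptions made for the example rather than details from the slides: the matroid is a uniform matroid of rank 2, F is the ellipsoidal bonus $\sqrt{\log(t) \sum_{i \in A} 1/N_i}$, and all names and numbers are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical non-negative mean estimates and pull counts for four arms.
mu_hat = [0.9, 0.8, 0.3, 0.1]
counts = [50, 40, 2, 1]
t = 100
rank = 2  # uniform matroid: any set of at most `rank` arms is independent

def is_independent(arms):
    return len(arms) <= rank

def linear_part(arms):
    """L(A): sum of the mean estimates of the chosen arms."""
    return sum(mu_hat[i] for i in arms)

def bonus(arms):
    """F(A): ellipsoidal exploration bonus (assumed form)."""
    return math.sqrt(math.log(t) * sum(1.0 / counts[i] for i in arms))

def objective(arms):
    return linear_part(arms) + bonus(arms)

def greedy(n_arms):
    """Add, one at a time, the feasible arm giving the largest value of L + F."""
    chosen = set()
    while True:
        candidates = [i for i in range(n_arms)
                      if i not in chosen and is_independent(chosen | {i})]
        if not candidates:
            return chosen
        # Maximizing the new objective is the same as maximizing the marginal gain.
        chosen.add(max(candidates, key=lambda i: objective(chosen | {i})))

solution = greedy(len(mu_hat))

# Brute-force check of L(S) + 2 F(S) >= L(O) + F(O) over all independent sets O.
guarantee_holds = all(
    linear_part(solution) + 2 * bonus(solution) >= linear_part(o) + bonus(o) - 1e-12
    for r in range(rank + 1)
    for o in combinations(range(len(mu_hat)), r)
)
```

The brute-force check is only feasible for toy instances; it is here to show what the inequality compares, not to suggest how the bound is proved.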
When reward can be negative...

LOCAL SEARCH based algorithm.

Theorem (Perrault et al.):
$$L(S) + 2(1+\varepsilon) F(S) \ge L(O) + F(O), \quad \forall O \in \mathcal{A}.$$
Time complexity per round: $O\big(m^2 n \log(mt)/\varepsilon\big)$.

Start from the greedy solution $S_{\text{init}} \in \arg\max_{A} L(A)$.
Then, repeatedly try three basic operations in order to improve the current solution.
Improvements must be greater than $\frac{\varepsilon}{m} F(S)$.
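The procedure described on this slide can be sketched as follows. The slide does not name the three basic operations; this sketch assumes they are add, delete and swap, which is a common choice in local search for submodular maximization. It also assumes that m is the matroid rank, that the acceptance threshold is $(\varepsilon/m) F(S)$, and that F is the ellipsoidal bonus. All names and numbers are hypothetical.

```python
import math

# Hypothetical estimates (some negative) and pull counts for four arms.
MU_HAT = [0.6, -0.2, 0.4, -0.5]
COUNTS = [30, 1, 25, 2]
T = 100
RANK = 2  # uniform matroid of rank 2, used here as the assumed value of m

def is_independent(arms):
    return len(arms) <= RANK

def linear_part(arms):
    return sum(MU_HAT[i] for i in arms)

def bonus(arms):
    return math.sqrt(math.log(T) * sum(1.0 / COUNTS[i] for i in arms))

def objective(arms):
    return linear_part(arms) + bonus(arms)

def greedy_linear_init():
    """Maximize L alone: scan arms by decreasing estimate, keep the positive feasible ones."""
    chosen = set()
    for i in sorted(range(len(MU_HAT)), key=lambda i: MU_HAT[i], reverse=True):
        if MU_HAT[i] > 0 and is_independent(chosen | {i}):
            chosen.add(i)
    return chosen

def neighbours(arms):
    """Sets reachable by one add, delete or swap (assumed operations)."""
    outside = [i for i in range(len(MU_HAT)) if i not in arms]
    for i in outside:
        yield arms | {i}
    for j in arms:
        yield arms - {j}
    for j in arms:
        for i in outside:
            yield (arms - {j}) | {i}

def local_search(eps):
    current = greedy_linear_init()
    improved = True
    while improved:
        improved = False
        threshold = eps / RANK * bonus(current)
        for candidate in neighbours(current):
            gain = objective(candidate) - objective(current)
            if is_independent(candidate) and gain > threshold:
                current, improved = candidate, True
                break  # restart the scan from the new solution
    return current

initial = greedy_linear_init()
final = local_search(eps=0.1)
```

Each accepted move strictly increases L + F, so the loop terminates on a finite ground set. This sketch takes the first improving move it finds, which is one possible design choice among several.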
Thank you!

Poster: Pacific Ballroom #53

Extension to budgeted bandit, where we want to minimize
$$\frac{(L_1 - F_1)^+}{L_2 + F_2}.$$
The solution uses a NEW concept, the Approximation Lagrangian:
$$\mathcal{L}_\kappa(\lambda, S) \triangleq L_1(S) - \kappa F_1(S) - \lambda \big( L_2(S) + \kappa F_2(S) \big).$$

Experiments: [two plots of $R_T$ (scale $\times 10^2$) against $T$ (log axis, $10^1$ to $10^5$), each comparing CUCB and ESCB]