Adaptive Algorithmic Behavior for solving Mixed Integer Programs using Bandit Algorithms
Gregor Hendel, Matthias Miltenberger, Jakob Witzig
International Conference on Operations Research 2018, Sep 12, Brussels, Belgium
Overview
• Introduction
• Adaptive Large Neighborhood Search
• Adaptive LP Pricing
• Adaptive Diving
Introduction
Mixed Integer Programs

\[
\begin{aligned}
\min\ & c^T x \\
\text{s.t.}\ & Ax \geq b \\
& \ell \leq x \leq u \\
& x \in \{0,1\}^{n_b} \times \mathbb{Z}^{n_i - n_b} \times \mathbb{Q}^{n - n_i}
\end{aligned}
\tag{MIP}
\]

Solution method:
• typically solved with branch-and-cut
• at each node, an LP relaxation is (re-)solved with the dual Simplex algorithm
• primal heuristics, e.g., Large Neighborhood Search and diving methods, support the solution process
The Multi-Armed Bandit Problem
• Discrete time steps $t = 1, 2, \ldots$
• Finite set of actions $\mathcal{H}$
1. Choose $h_t \in \mathcal{H}$
2. Observe reward $r(h_t, t) \in [0, 1]$
3. Goal: Maximize $\sum_t r(h_t, t)$

Two main scenarios:
• stochastic: i.i.d. rewards for each action over time
• adversarial: an opponent tries to maximize the player's regret.

Literature: [Bubeck and Cesa-Bianchi, 2012]
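The three-step protocol can be sketched as a small simulation of the stochastic scenario. This is only an illustration: the Bernoulli reward model, the function names `run_bandit` and `uniform_policy`, and the pseudo-regret bookkeeping are assumptions made for the example and do not come from the slides.

```python
import random


def run_bandit(means, policy, horizon, seed=0):
    """Play `horizon` rounds of a stochastic bandit with Bernoulli rewards.

    means[h] is the (assumed) success probability of action h, so every
    observed reward lies in [0, 1]. Returns (total reward, pseudo-regret).
    """
    rng = random.Random(seed)
    best_mean = max(means)
    total_reward = 0.0
    pseudo_regret = 0.0
    for t in range(1, horizon + 1):
        h = policy(t, len(means), rng)                      # 1. choose h_t
        reward = 1.0 if rng.random() < means[h] else 0.0    # 2. observe r(h_t, t)
        total_reward += reward                              # 3. accumulate the sum
        pseudo_regret += best_mean - means[h]               # expected loss vs. best action
    return total_reward, pseudo_regret


def uniform_policy(t, n_actions, rng):
    """Hypothetical baseline: ignore all feedback and pick uniformly at random."""
    return rng.randrange(n_actions)


if __name__ == "__main__":
    print(run_bandit([0.2, 0.8], uniform_policy, horizon=1000))
```

A policy that always plays the best action accumulates zero pseudo-regret, which is the yardstick against which adaptive selection rules are usually compared.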
Bandit Algorithms

Let
\[
T_h(t) = \sum_{t' \leq t} \mathbb{1}_{h = h_{t'}}
\quad\text{and}\quad
\bar{r}_h(t) = \frac{1}{T_h(t)} \sum_{t' \leq t} \mathbb{1}_{h = h_{t'}}\, r_{h,t'}
\]

ε-greedy
Select heuristic at random with probability $\varepsilon_t = \varepsilon \sqrt{|\mathcal{H}| / t}$, otherwise use best.

Upper Confidence Bound (UCB)
\[
h_t \in
\begin{cases}
\operatorname{argmax}_{h \in \mathcal{H}} \left\{ \bar{r}_h(t-1) + \sqrt{\dfrac{\alpha \ln(1+t)}{T_h(t-1)}} \right\} & \text{if } t > |\mathcal{H}|, \\[2ex]
\{H_t\} & \text{if } t \leq |\mathcal{H}|.
\end{cases}
\]

Exp.3
\[
p_{h,t} = (1 - \gamma) \cdot \frac{\exp(w_{h,t})}{\sum_{h'} \exp(w_{h',t})} + \gamma \cdot \frac{1}{|\mathcal{H}|}
\]

Individual parameters $\alpha, \varepsilon, \gamma \geq 0$ can be calibrated to the problem at hand.
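The three selection rules can be written down compactly. The following is a minimal sketch under stated assumptions: actions are indexed 0..|H|-1, `counts[h]` and `means[h]` hold $T_h(t-1)$ and $\bar{r}_h(t-1)$, the function names are hypothetical, and the Exp.3 weight update is left out because the slide does not specify it. Capping $\varepsilon_t$ at 1 and shifting the weights before exponentiation are additions for numerical safety.

```python
import math
import random


def eps_greedy_select(t, means, eps, rng):
    """Explore uniformly with probability eps * sqrt(|H| / t), else exploit."""
    n = len(means)
    eps_t = min(1.0, eps * math.sqrt(n / t))  # cap at 1 is an added safeguard
    if rng.random() < eps_t:
        return rng.randrange(n)
    return max(range(n), key=lambda h: means[h])


def ucb_select(t, counts, means, alpha):
    """Play every action once, then maximize mean plus confidence width.

    Assumes counts[h] >= 1 for all h once t > |H|, which holds when the
    first |H| rounds were chosen by this function.
    """
    n = len(counts)
    if t <= n:
        return t - 1  # the t-th action H_t (0-based index)
    return max(
        range(n),
        key=lambda h: means[h] + math.sqrt(alpha * math.log(1 + t) / counts[h]),
    )


def exp3_probabilities(weights, gamma):
    """Mix the softmax of the weights with the uniform distribution.

    Assumes 0 <= gamma <= 1 so that the result is a probability vector.
    """
    shift = max(weights)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(w - shift) for w in weights]
    total = sum(exps)
    n = len(weights)
    return [(1 - gamma) * e / total + gamma / n for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    print(eps_greedy_select(5, [0.1, 0.7, 0.3], 0.5, rng))
    print(ucb_select(13, [10, 1, 2], [0.5, 0.4, 0.4], alpha=1.0))
    print(exp3_probabilities([0.0, 1.0, 2.0], gamma=0.1))
```

With $\alpha = 0$ the UCB rule degenerates to pure exploitation after the initial round, and with $\gamma = 1$ Exp.3 samples uniformly, which makes the role of each parameter easy to probe.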
Adaptive Large Neighborhood Search
LNS and the auxiliary MIP
Large Neighborhood Search (LNS) heuristics solve auxiliary MIPs and can be distinguished by their respective neighborhoods.

Auxiliary MIP
Let $P$ be a MIP with solution set $F_P$. For a polyhedron $N \subseteq \mathbb{Q}^n$ and objective coefficients $c_{\text{aux}} \in \mathbb{Q}^n$, a MIP $P_{\text{aux}}$ defined as
\[
\min \left\{ c_{\text{aux}}^T x \;\middle|\; x \in F_P \cap N \right\}
\]
is called an auxiliary MIP of $P$, and $N$ is called neighborhood.
Famous LNS Heuristics
• RENS [Berthold, 2014]
• DINS [Ghosh, 2007]
• Relaxation Induced Neighborhood Search (RINS) [Danna et al., 2005]
• Local Branching [Fischetti and Lodi, 2003]
• Crossover, Mutation [Rothberg, 2007]
• Proximity [Fischetti and Monaci, 2014]
• Zeroobjective [in SCIP, Gurobi, XPress,…]
• Analytic Center Search [Berthold et al., 2017]
• …
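To make the auxiliary MIP definition concrete, here is how Local Branching fits it in the pure binary case. The symbols $\bar{x}$ (incumbent solution), $B$ (index set of binary variables), and $k$ (neighborhood radius) are notation introduced for this illustration and are not taken from the slides.

\[
N_{\text{LB}} = \Big\{ x \in \mathbb{Q}^n \;\Big|\; \sum_{j \in B:\, \bar{x}_j = 0} x_j \;+ \sum_{j \in B:\, \bar{x}_j = 1} (1 - x_j) \;\leq\; k \Big\},
\qquad c_{\text{aux}} = c .
\]

The left-hand side counts how many binary variables flip relative to $\bar{x}$, so $N_{\text{LB}}$ is a polyhedron that restricts the search to solutions within Hamming distance $k$ of the incumbent while keeping the original objective.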
Rewarding Neighborhoods

Goal: A suitable reward function $r_{\text{alns}}(h_t, t) \in [0, 1]$

Solution Reward
\[
r_{\text{sol}}(h_t, t) =
\begin{cases}
1, & \text{if } x_{\text{old}} \neq x_{\text{new}} \\
0, & \text{else}
\end{cases}
\]

Gap Reward
\[
r_{\text{gap}}(h_t, t) = \frac{c^T x_{\text{old}} - c^T x_{\text{new}}}{c^T x_{\text{old}} - c_{\text{dual}}}
\]

Failure Penalty
\[
r_{\text{fail}}(h_t, t) =
\begin{cases}
1, & \text{if } x_{\text{old}} \neq x_{\text{new}} \\
1 - \phi(h_t, t)\, \dfrac{n(h_t)}{n_{\text{lim}}}, & \text{else}
\end{cases}
\]

Combination (flow diagram on the slide): $r_{\text{sol}}$ and $r_{\text{gap}}$ are mixed with weights $\eta_1$ and $1 - \eta_1$, the result is mixed with $r_{\text{fail}}$ using weights $\eta_2$ and $1 - \eta_2$, and the outcome is optionally scaled:
\[
r_{\text{alns}}(\cdot) = \eta_2 \big( \eta_1\, r_{\text{sol}}(\cdot) + (1 - \eta_1)\, r_{\text{gap}}(\cdot) \big) + (1 - \eta_2)\, r_{\text{fail}}(\cdot)
\]

Default settings in ALNS: $\eta_1 = 0.8$, $\eta_2 = 0.5$
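A small numerical sketch shows how the three reward components combine with the default weights. Several points are assumptions rather than facts from the slide: the function name `alns_reward`, reading $\phi$ as a fixing rate and $n / n_{\text{lim}}$ as the share of a node budget that was used, the weight order ($\eta_2$ on the solution/gap mix, $1 - \eta_2$ on the failure term), the guard against a zero denominator, and the omission of the optional scaling step.

```python
def alns_reward(found_new, c_old, c_new, c_dual,
                fixing_rate, nodes_used, node_limit,
                eta1=0.8, eta2=0.5):
    """Combine solution reward, gap reward, and failure penalty into [0, 1].

    Assumes a minimization problem with c_dual <= c_new <= c_old.
    """
    # Solution reward: 1 if a new incumbent was found, else 0.
    r_sol = 1.0 if found_new else 0.0

    # Gap reward: fraction of the primal-dual gap closed by the new solution.
    # The zero-denominator guard is an addition for this sketch.
    if found_new and c_old > c_dual:
        r_gap = (c_old - c_new) / (c_old - c_dual)
    else:
        r_gap = 0.0

    # Failure penalty: unsuccessful runs lose more the more effort they used.
    if found_new:
        r_fail = 1.0
    else:
        r_fail = 1.0 - fixing_rate * nodes_used / node_limit

    combined = eta1 * r_sol + (1.0 - eta1) * r_gap
    return eta2 * combined + (1.0 - eta2) * r_fail


if __name__ == "__main__":
    # Success: incumbent improves from 10 to 8 with dual bound 6.
    print(alns_reward(True, 10.0, 8.0, 6.0, 0.5, 100, 1000))
    # Failure after exhausting the node budget at fixing rate 0.5.
    print(alns_reward(False, 10.0, 10.0, 6.0, 0.5, 1000, 1000))
```

In the first call half of the gap is closed, giving $0.5 \cdot (0.8 + 0.2 \cdot 0.5) + 0.5 \cdot 1 = 0.95$; in the second only the failure term contributes, giving $0.5 \cdot 0.5 = 0.25$.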
Simulation for parameter calibration
• Always execute all 8 neighborhoods with ALNS (disable old LNS heuristics)
• Fixing rates 0.1 to 0.9
• Disable solution transfer
• Record each reward

Test Set: 666 instances from the test sets MIPLIB3, MIPLIB2003, MIPLIB2010, Cor@l, 5h time limit.

[Figure: number of instances (y-axis, 0 to 750) over ALNS calls (x-axis, 0 to 60), one curve per fixing rate 0.1, 0.3, 0.5, 0.7, 0.9]
UCB Calibration
Simulate 100 repetitions of UCB, Exp.3, and ε-greedy on the data

[Figure (UCB): solution rate (y-axis, 0.40 to 0.55) over fixing rate (x-axis, 0.1 to 0.9); legend: alpha_0, alpha_0.0016, alpha_0.2, alpha_0.4, alpha_0.6, alpha_0.8, alpha_1, avg]

(UCB)
\[
h_t \in
\begin{cases}
\operatorname{argmax}_{h \in \mathcal{H}} \left\{ \bar{r}_h(t-1) + \sqrt{\dfrac{\alpha \ln(1+t)}{T_h(t-1)}} \right\} & \text{if } t > |\mathcal{H}|, \\[2ex]
\{H_t\} & \text{if } t \leq |\mathcal{H}|.
\end{cases}
\]
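Because every neighborhood was executed and rewarded at every ALNS call, a selection rule can be replayed offline on the recorded data. The sketch below replays UCB for several values of α. The data layout `rewards[t][h]`, the function names, the random tie-breaking, and the use of average collected reward as the score are all assumptions for illustration; the slide reports solution rate, which is a different measure.

```python
import math
import random


def replay_ucb(rewards, alpha, rng):
    """Replay UCB on a recorded table rewards[t][h]; return the mean reward."""
    n = len(rewards[0])
    counts = [0] * n
    sums = [0.0] * n
    total = 0.0
    for t, row in enumerate(rewards, start=1):
        if t <= n:
            h = t - 1  # initial round: try each neighborhood once
        else:
            scores = [
                sums[i] / counts[i]
                + math.sqrt(alpha * math.log(1 + t) / counts[i])
                for i in range(n)
            ]
            best = max(scores)
            # Break ties at random so that repetitions can differ.
            h = rng.choice([i for i, s in enumerate(scores) if s == best])
        counts[h] += 1
        sums[h] += row[h]
        total += row[h]
    return total / len(rewards)


def calibrate(rewards, alphas, repetitions=100, seed=0):
    """Average the replayed score over several repetitions for each alpha."""
    rng = random.Random(seed)
    return {
        a: sum(replay_ucb(rewards, a, rng) for _ in range(repetitions)) / repetitions
        for a in alphas
    }


if __name__ == "__main__":
    # Toy table: neighborhood 1 always succeeds, the other two never do.
    table = [[0.0, 1.0, 0.0] for _ in range(30)]
    print(calibrate(table, alphas=[0.0, 0.2, 1.0]))
```

On the toy table, α = 0 locks onto the successful neighborhood right after the initial round, while larger α values keep spending calls on the other two.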