Online linear optimization and adaptive routing
Baruch Awerbuch, Robert Kleinberg
Motivation
● Overlay network routing
  – Send a packet from a source to a target using the route with minimum delay
● Only the total delay of the chosen route is revealed
[Figure: example graph with source s, target r, and per-edge delays]
Using previous algorithms
● We can use EXP3, with each route as an arm. Since there can be n! routes, the regret becomes $O(\sqrt{K G_{\max} \ln K}) \to O(\sqrt{n! \ln n!})$
● We have also seen online shortest paths with full information, achieving $E[\text{cost}] \le (1+\epsilon)\,\text{mincost}_T + O(mn \log n / \epsilon)$
Problem definition
● G = (V, E) – a directed graph
● For each $j = 1, \dots, T$ the adaptive adversary selects a cost for each edge, $c_j : E \to [0,1]$
● The algorithm selects a path of length ≤ H
● It receives only the cost of the entire path
● Goal: minimize the difference between the algorithm's expected total cost and the cost of the best single source–target path
Regret
$O\!\left(H^2 (mH \log \Delta \log mHT)^{1/3}\, T^{2/3}\right)$
Pre-processing
● We will transform the graph G into a leveled directed acyclic graph $\tilde{G} = (\tilde{V}, \tilde{E})$
● Start from the product G × {0, 1, …, H}
  – Vertex set V × {0, 1, …, H}
  – An edge $e_i$ from (u, i−1) to (v, i) for every e = (u, v) in E
● The graph $\tilde{G}$ is obtained by deleting vertices and edges that do not lie on a path reaching r (a sketch of the construction follows)
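A minimal sketch of this pre-processing in Python, assuming the graph is given as a list of directed edges; the function name and representation are illustrative, not from the paper:

```python
from collections import defaultdict

def build_leveled_dag(edges, s, r, H):
    """Vertex (v, i) means 'at v after i hops'; keep only s-r material."""
    out = defaultdict(list)
    for i in range(1, H + 1):
        for (u, v) in edges:
            out[(u, i - 1)].append((v, i))
    # Forward pass: vertices reachable from (s, 0).
    reach, stack = {(s, 0)}, [(s, 0)]
    while stack:
        for nxt in out[stack.pop()]:
            if nxt not in reach:
                reach.add(nxt)
                stack.append(nxt)
    # Backward pass: vertices that can still reach some copy (r, i).
    rev = defaultdict(list)
    for node in list(reach):
        for nxt in out[node]:
            if nxt in reach:
                rev[nxt].append(node)
    alive = {(r, i) for i in range(H + 1)} & reach
    stack = list(alive)
    while stack:
        for prev in rev[stack.pop()]:
            if prev not in alive:
                alive.add(prev)
                stack.append(prev)
    # Surviving leveled adjacency.
    return {v: [w for w in out[v] if w in alive] for v in alive}
```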
Main idea
● We can traverse the graph by querying BEX (a black-box best-expert algorithm) for a distribution on the outgoing edges of the current vertex, repeating until we reach r
● To do so we need to feed BEX with information on all experts
● We run in phases: during each phase we estimate the cost of every expert, and at the end of the phase we update BEX
● We feed BEX with the total path cost
Sampling experts
● We can sample the experts according to the distribution BEX returns (based on the costs of previous phases)
● The problem
  – We might ignore some edges that could become better in later phases
● We therefore add some exploration steps in each phase
Exploration
● Occurs with probability δ
● Choose an edge e = (u, v) uniformly at random
● Construct a path by joining prefix(u), e, and suffix(v) (a sketch of one round's sampling follows)
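A minimal sketch of one round's path choice, assuming paths are edge lists and that suffix_sample / prefix_sample helpers exist (sketched on the following slides); all names are illustrative:

```python
import random

def sample_path(delta, edges, s, suffix_sample, prefix_sample):
    """Pick one round's path: exploit via BEX, or explore a random edge."""
    if random.random() >= delta:
        return suffix_sample(s)            # exploitation: follow BEX from s
    (u, v) = random.choice(edges)          # exploration: uniform random edge
    return prefix_sample(u) + [(u, v)] + suffix_sample(v)
```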
Suffix
● Suffix(v) returns a distribution on v–r paths
● Implementation
  – Choose an edge according to the BEX(v) probabilities, traverse it, and repeat until r is reached (sketched below)
● Why can't it be uniformly random?
[Figure: example v–r graph with edge costs 1, 2, 10, 1000]
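A minimal sketch, assuming each vertex keeps a BEX instance whose current distribution over its outgoing edges is exposed by a hypothetical bex_dist(v) accessor:

```python
import random

def suffix_sample(v, r, bex_dist):
    """Walk from v to r, choosing each outgoing edge by BEX(v)'s weights."""
    path = []
    while v != r:
        edges, probs = bex_dist(v)               # outgoing edges and weights
        e = random.choices(edges, weights=probs)[0]
        path.append(e)
        v = e[1]                                 # move to the edge's head
    return path
```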
Prefix
● Prefix(v) returns a distribution on s–v paths
● Let suffix(u | v) be the distribution on u–v paths obtained by sampling from suffix(u) conditioned on the event that the path passes through v
Prefix
● Sample from suffix(s | v) with probability $(1-\delta)\Pr(v \in \text{suffix}(s)) / P_\phi(v)$
● For each edge e = (q, u) in $\tilde{E}$: with probability $(\delta/\tilde{m})\Pr(v \in \text{suffix}(u)) / P_\phi(v)$, sample from suffix(u | v), prepend e, and then prepend a sample from prefix(q)
● Here $P_\phi(v)$ is the probability that v is contained in the path sampled in phase φ (a sketch follows)
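A minimal sketch of this mixture, assuming hypothetical helpers pr_contains(u, v) = Pr(v ∈ suffix(u)) and suffix_given(u, v) sampling suffix(u | v). Note that dividing by $P_\phi(v)$ is a shared normalizer, so sampling proportional to the numerators is equivalent:

```python
import random

def prefix_sample(v, s, delta, m_tilde, edges, pr_contains, suffix_given):
    """Sample an s-v path from the mixture defining prefix(v)."""
    if v == s:
        return []
    branches = [None]                       # None = the suffix(s | v) branch
    weights = [(1 - delta) * pr_contains(s, v)]
    for (q, u) in edges:
        branches.append((q, u))
        weights.append((delta / m_tilde) * pr_contains(u, v))
    pick = random.choices(branches, weights=weights)[0]
    if pick is None:
        return suffix_given(s, v)           # an s-v path from suffix(s | v)
    q, u = pick
    return (prefix_sample(q, s, delta, m_tilde, edges,
                          pr_contains, suffix_given)
            + [(q, u)] + suffix_given(u, v))
```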
Updating costs
● Phase length $\tau = \lceil 2mH \log(mHT)/\delta \rceil$
● In each phase we count the cost of an edge only in rounds where the edge was not part of the path chosen by prefix
● The reason is that we cannot control the distribution those prefix edges are drawn from
Updating costs
● At the end of each phase, for all $e \in \tilde{E}$:
$\mu_\phi(e) \leftarrow E\!\left[\sum_{j \in \tau_\phi} \chi_j(e)\right]$
$\tilde{c}_\phi(e) \leftarrow \left(\sum_{j \in \tau_\phi} \chi_j(e)\, c_j(\pi_j)\right) / \mu_\phi(e)$
where $\phi = 1, \dots, \lceil T/\tau \rceil$ and phase φ consists of rounds $j = \tau(\phi-1)+1, \tau(\phi-1)+2, \dots, \tau\phi$ (a sketch follows)
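A minimal sketch of the end-of-phase update, assuming chi[j][e] ∈ {0, 1} records whether e was counted in round j, path_cost[j] = $c_j(\pi_j)$, and mu holds the (assumed known) values $\mu_\phi(e)$; names are illustrative:

```python
def end_of_phase_update(edges, chi, path_cost, mu):
    """Return the estimated per-edge phase costs fed to BEX."""
    c_tilde = {}
    for e in edges:
        # Sum observed total path costs over rounds where e was counted.
        total = sum(chi[j][e] * path_cost[j] for j in range(len(path_cost)))
        c_tilde[e] = total / mu[e]          # normalize by E[sum_j chi_j(e)]
    return c_tilde
```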
Algorithm analysis
● Let
$C^-(v) = \sum_{j=1}^{T} E[c_j(\text{prefix}(v))]$
$C^+(v) = \sum_{j=1}^{T} E[c_j(\text{suffix}(v))]$
$OPT(v) = \min_{\text{paths } \pi: v \to r} \sum_{j=1}^{T} c_j(\pi)$
Algorithm analysis
● We know that for BEX with K experts and per-step costs in [0, M],
$\sum_{j=1}^{t} \sum_{i=1}^{K} p_j(i)\, c_j(i) \le \sum_{j=1}^{t} c_j(k) + O(\epsilon M t + M \log K / \epsilon)$
● Let $p_\phi$ be the probability distribution supplied by BEX(v) during phase φ. Then for any fixed edge $e_0 \in \Delta(v)$,
$\sum_{\phi=1}^{t} \sum_{e \in \Delta(v)} p_\phi(e)\, \tilde{c}_\phi(e) \le \sum_{\phi=1}^{t} \tilde{c}_\phi(e_0) + O(\epsilon H t + H \log \Delta / \epsilon)$
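The slides treat BEX as a black box satisfying this bound. One standard algorithm with a guarantee of this form is Hedge (multiplicative weights); the sketch below is an illustrative instantiation, not necessarily the one the paper prescribes:

```python
import math

class Hedge:
    """Best-expert algorithm: exponential weights over K experts."""
    def __init__(self, K, eps):
        self.eps = eps
        self.weights = [1.0] * K

    def distribution(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def update(self, costs):                # costs[i] in [0, M]
        for i, c in enumerate(costs):
            self.weights[i] *= math.exp(-self.eps * c)
```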
Algorithm analysis
● We used the fact that the cost fed to BEX in a phase is at most M = 3H with high probability. Since $\tau = 2mH\log(mHT)/\delta$, we have $\mu_\phi > \delta\tau/(mH) = 2\log(mHT)$, and by a Chernoff bound
$\Pr\!\left(\sum_{j \in \tau_\phi} \chi_j \ge 3 \cdot 2\log(mHT)\right) \le e^{-(2/3) \cdot 2\log(mHT)} \le \frac{1}{mHT}$
Algorithm analysis
● Now, applying a union bound over all phases, this low-probability event contributes at most HT/(mHT) < 1 to the total cost, so we can safely ignore it
Algorithm analysis
● Expanding $\tilde{c}_\phi$ (Eq. 12):
$\sum_{\phi=1}^{t} \sum_{e \in \Delta(v)} \sum_{j \in \tau_\phi} p_\phi(e)\, \chi_j(e)\, c_j(\pi_j) / \mu_\phi(e) \le \sum_{\phi=1}^{t} \sum_{j \in \tau_\phi} \chi_j(e_0)\, c_j(\pi_j) / \mu_\phi(e_0) + O(\epsilon H t + H \log \Delta / \epsilon)$
Algorithm analysis
● Claim 3.2. For e = (v, w) and any path π: s → v,
$\Pr(\pi \subseteq \pi_j \mid \chi_j(e) = 1) = \Pr(\text{prefix}(v) = \pi)$
Algorithm analysis
● Proof of Claim 3.2
$\chi_j(e) = 1 \;\Rightarrow\; e \in \pi_j^0 \lor e \in \pi_j^+$
$\Pr(\pi \subseteq \pi_j \mid e \in \pi_j^0) = \Pr(\text{prefix}(v) = \pi)$
$\Pr(\pi \subseteq \pi_j \mid e \in \pi_j^+) = \Pr(\text{prefix}(v) = \pi)$
● The first equality holds by definition; let's prove the second
Algorithm analysis
● e is sampled independently of the path preceding v, so
$\Pr(\pi \subseteq \pi_j \mid e \in \pi_j^+) = \Pr(\pi \subseteq \pi_j \mid v \in \pi_j^+)$
$\Pr(\pi \subseteq \pi_j \mid v \in \pi_j^+) = \Pr(\pi \subseteq \pi_j \cap v \in \pi_j^+) / \Pr(v \in \pi_j^+)$
and the numerator equals
$(1-\delta)\Pr(v \in \text{suffix}(s))\Pr(\pi = \text{suffix}(s \mid v)) + \sum_{e=(q,u) \in \tilde{E}} \frac{\delta}{\tilde{m}} \Pr(v \in \text{suffix}(u))\Pr(\pi = \text{prefix}(q) \cup \{e\} \cup \text{suffix}(u \mid v))$
$= \Pr(v \in \pi_j^+)\Pr(\pi = \text{prefix}(v))$
Algorithm analysis
● Claim 3.3. If e = (v, w) then
$E[\chi_j(e)\, c_j(\pi_j)] = (\mu(e)/\tau)\left(A_j(v) + B_j(w) + c_j(e)\right)$
where $A_j(v) = E[c_j(\text{prefix}(v))]$ and $B_j(w) = E[c_j(\text{suffix}(w))]$
● This follows from Claim 3.2: the portion of the path preceding e is distributed as prefix(v)
Algorithm analysis
● Taking the expectation of Eq. 12, the left side becomes
$\sum_{\phi=1}^{t} \sum_{e \in \Delta(v)} \sum_{j \in \tau_\phi} \frac{1}{\tau}\, p_\phi(e)\left(A_j(v) + B_j(w) + c_j(e)\right) = \frac{1}{\tau} \sum_{j=1}^{T} \sum_{e \in \Delta(v)} p_\phi(e)\left(A_j(v) + B_j(w) + c_j(e)\right)$
● The right side becomes
$\frac{1}{\tau} \sum_{j=1}^{T} \left(A_j(v) + B_j(w_0) + c_j(e_0)\right)$
Algorithm analysis
● Remove $A_j(v)$ from both sides and notice that
$\sum_{e \in \Delta(v)} p_\phi(e)\left(B_j(w) + c_j(e)\right) = E[c_j(\text{suffix}(v))]$
● So the left side becomes
$\frac{1}{\tau} \sum_{j=1}^{T} E[c_j(\text{suffix}(v))] = C^+(v)/\tau$
Algorithm analysis
● The right side becomes
$\frac{1}{\tau} \sum_{j=1}^{T} c_j(e_0) + C^+(w_0)/\tau + O(\epsilon H t + H \log \Delta / \epsilon)$
● Multiplying through by τ, we have derived the local performance guarantee (Eq. 13):
$C^+(v) \le C^+(w_0) + \sum_{j=1}^{T} c_j(e_0) + O(\epsilon H T + \tau H \log \Delta / \epsilon)$
Global performance guarantee
● Claim 3.4.
$C^+(v) \le OPT(v) + h(v)\, O(\epsilon H T + \tau H \log \Delta / \epsilon)$
● To prove it we can use the following observation (a small sketch of this recursion follows):
$OPT(v) = \min_{e_0 = (v, w_0)} \left\{ \sum_{j=1}^{T} c_j(e_0) + OPT(w_0) \right\}$
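A minimal sketch of this recursion as a backward dynamic program over the leveled DAG, assuming total_cost[e] holds $\sum_j c_j(e)$ and topo_order is a topological order of the vertices; names are illustrative:

```python
def compute_opt(topo_order, out_edges, total_cost, r):
    """opt[v] = min over e0 = (v, w0) of sum_j c_j(e0) + opt[w0]."""
    opt = {r: 0.0}
    # Visit vertices in reverse topological order so opt[w] is ready.
    for v in reversed(topo_order):
        if v == r:
            continue
        # Pre-processing guarantees every vertex has an edge toward r.
        opt[v] = min(total_cost[(v, w)] + opt[w] for w in out_edges[v])
    return opt
```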
Global performance guarantee
● Proof – by induction on h(v), using the local performance guarantee
● Write $F = O(\epsilon H T + \tau H \log \Delta / \epsilon)$
● Now rewrite the claim and Eq. 13:
$C^+(v) \le OPT(v) + h(v) F$
$C^+(v) \le C^+(w_0) + \sum_{j=1}^{T} c_j(e_0) + F$
Global performance guarantee
● h(v) = 1: every v–r path is a single edge, so for the minimizing edge $e_0 = (v, r)$,
$C^+(v) \le OPT(v) + F = \sum_{j=1}^{T} c_j(e_0) + OPT(r) + F$
● Since $OPT(r) = C^+(r) = 0$, this reduces to
$C^+(v) \le \sum_{j=1}^{T} c_j(e_0) + F$
which is true by the local performance guarantee
Global performance guarantee
● h(v) = k + 1: let $e_{k+1} = (v, v_k)$ be the first edge of an optimal path from v, where $h(v_k) = k$. Then
$C^+(v) \le C^+(v_k) + \sum_{j=1}^{T} c_j(e_{k+1}) + F \le \sum_{j=1}^{T} c_j(e_{k+1}) + OPT(v_k) + kF + F = OPT(v) + (k+1)F$
Regret
● Theorem 3.5. The algorithm suffers regret
$O\!\left(H^2 (mH \log \Delta \log mHT)^{1/3}\, T^{2/3}\right)$
● The exploration steps contribute $\delta T H$
● The exploitation steps contribute $C^+(s) - OPT(s)$
● Also $\tau = 2mH \log(mHT)/\delta$
● Substituting into Claim 3.4 (with $h(s) \le H$) we get total exploitation cost
$C^+(s) - OPT(s) = O\!\left(\epsilon H^2 T + \frac{2mH^3 \log \Delta \log(mHT)}{\epsilon \delta}\right)$
Regret
$\text{Regret} \le O\!\left(\delta T H + \epsilon H^2 T + \frac{2mH^3 \log \Delta \log(mHT)}{\epsilon \delta}\right)$
● We can assign
$\epsilon = \delta = (2mH \log \Delta \log(mHT))^{1/3}\, T^{-1/3}$
and we get the desired regret
$O\!\left(H^2 (mH \log \Delta \log mHT)^{1/3}\, T^{2/3}\right)$
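As a quick sanity check (not on the original slides), write $X = 2mH \log \Delta \log(mHT)$ and substitute $\epsilon = \delta = X^{1/3} T^{-1/3}$:
\[
\delta T H + \epsilon H^2 T + \frac{H^2 X}{\epsilon \delta}
= H X^{1/3} T^{2/3} + H^2 X^{1/3} T^{2/3} + \frac{H^2 X}{X^{2/3} T^{-2/3}}
= O\!\left(H^2 X^{1/3} T^{2/3}\right)
\]
so all three terms balance at the stated bound.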