Online linear optimization and adaptive routing
Baruch Awerbuch, Robert Kleinberg
Motivation
● Overlay network routing
  – Send a packet from a source to a target using the route with minimum delay
● Only the total delay of the chosen route is revealed
[Figure: example graph with source s, target r, and per-edge delays]
Using previous algorithms
● We can use EXP3, with each route as an arm. Since there can be n! routes, the regret becomes $O(\sqrt{K G_{\max} \ln K}) \to O(\sqrt{n! \ln n!})$
● We have also seen online shortest paths with full information, achieving $E[\text{cost}] \le (1+\epsilon)\,\text{mincost}_T + O(mn \log n / \epsilon)$
Problem definition
● G = (V, E) – a directed graph
● For each $j = 1, \dots, T$ the adaptive adversary selects a cost for each edge, $c_j : E \to [0,1]$
● The algorithm selects a path of length ≤ H
● It receives only the cost of the entire path
● Goal: minimize the difference between the algorithm's expected total cost and the cost of the best single source–target path
Regret
$O\!\left(H^2 (mH \log \Delta \log mHT)^{1/3}\, T^{2/3}\right)$
Pre-processing
● We will transform the graph G into a leveled directed acyclic graph $\tilde{G} = (\tilde{V}, \tilde{E})$
● Start from the product G × {0, 1, …, H}
  – Vertex set V × {0, 1, …, H}
  – An edge $e_i$ from (u, i−1) to (v, i) for every e = (u, v) in E
● The graph $\tilde{G}$ is obtained by deleting vertices and edges that do not lie on a path reaching r (a sketch of the construction follows)
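A minimal sketch of this pre-processing in Python, assuming the graph is given as a list of directed edges; the function name and representation are illustrative, not from the paper:

```python
from collections import defaultdict

def build_leveled_dag(edges, s, r, H):
    """Vertex (v, i) means 'at v after i hops'; keep only s-r material."""
    out = defaultdict(list)
    for i in range(1, H + 1):
        for (u, v) in edges:
            out[(u, i - 1)].append((v, i))
    # Forward pass: vertices reachable from (s, 0).
    reach, stack = {(s, 0)}, [(s, 0)]
    while stack:
        for nxt in out[stack.pop()]:
            if nxt not in reach:
                reach.add(nxt)
                stack.append(nxt)
    # Backward pass: vertices that can still reach some copy (r, i).
    rev = defaultdict(list)
    for node in list(reach):
        for nxt in out[node]:
            if nxt in reach:
                rev[nxt].append(node)
    alive = {(r, i) for i in range(H + 1)} & reach
    stack = list(alive)
    while stack:
        for prev in rev[stack.pop()]:
            if prev not in alive:
                alive.add(prev)
                stack.append(prev)
    # Surviving leveled adjacency.
    return {v: [w for w in out[v] if w in alive] for v in alive}
```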
Main idea
● We can traverse the graph by querying BEX (a black-box best-expert algorithm) for a distribution on the outgoing edges of the current vertex, repeating until we reach r
● To do so we need to feed BEX with information on all experts
● We run in phases: during each phase we estimate the cost of every expert, and at the end of the phase we update BEX
● We feed BEX with the total path cost
Sampling experts
● We can sample the experts according to the distribution BEX returns (based on the costs of previous phases)
● The problem
  – We might ignore some edges that could become better in later phases
● We therefore add some exploration steps in each phase
Exploration
● Occurs with probability δ
● Choose an edge e = (u, v) uniformly at random
● Construct a path by joining prefix(u), e, and suffix(v) (a sketch of one round's sampling follows)
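A minimal sketch of one round's path choice, assuming paths are edge lists and that suffix_sample / prefix_sample helpers exist (sketched on the following slides); all names are illustrative:

```python
import random

def sample_path(delta, edges, s, suffix_sample, prefix_sample):
    """Pick one round's path: exploit via BEX, or explore a random edge."""
    if random.random() >= delta:
        return suffix_sample(s)            # exploitation: follow BEX from s
    (u, v) = random.choice(edges)          # exploration: uniform random edge
    return prefix_sample(u) + [(u, v)] + suffix_sample(v)
```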
Suffix
● Suffix(v) returns a distribution on v–r paths
● Implementation
  – Choose an edge according to the BEX(v) probabilities, traverse it, and repeat until r is reached (sketched below)
● Why can't it be uniformly random?
[Figure: example v–r graph with edge costs 1, 2, 10, 1000]
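A minimal sketch, assuming each vertex keeps a BEX instance whose current distribution over its outgoing edges is exposed by a hypothetical bex_dist(v) accessor:

```python
import random

def suffix_sample(v, r, bex_dist):
    """Walk from v to r, choosing each outgoing edge by BEX(v)'s weights."""
    path = []
    while v != r:
        edges, probs = bex_dist(v)               # outgoing edges and weights
        e = random.choices(edges, weights=probs)[0]
        path.append(e)
        v = e[1]                                 # move to the edge's head
    return path
```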
Prefix
● Prefix(v) returns a distribution on s–v paths
● Let suffix(u | v) be the distribution on u–v paths obtained by sampling from suffix(u) conditioned on the event that the path passes through v
Prefix
● Sample from suffix(s | v) with probability $(1-\delta)\Pr(v \in \text{suffix}(s)) / P_\phi(v)$
● For each edge e = (q, u) in $\tilde{E}$: with probability $(\delta/\tilde{m})\Pr(v \in \text{suffix}(u)) / P_\phi(v)$, sample from suffix(u | v), prepend e, and then prepend a sample from prefix(q)
● Here $P_\phi(v)$ is the probability that v is contained in the path sampled in phase φ (a sketch follows)
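A minimal sketch of this mixture, assuming hypothetical helpers pr_contains(u, v) = Pr(v ∈ suffix(u)) and suffix_given(u, v) sampling suffix(u | v). Note that dividing by $P_\phi(v)$ is a shared normalizer, so sampling proportional to the numerators is equivalent:

```python
import random

def prefix_sample(v, s, delta, m_tilde, edges, pr_contains, suffix_given):
    """Sample an s-v path from the mixture defining prefix(v)."""
    if v == s:
        return []
    branches = [None]                       # None = the suffix(s | v) branch
    weights = [(1 - delta) * pr_contains(s, v)]
    for (q, u) in edges:
        branches.append((q, u))
        weights.append((delta / m_tilde) * pr_contains(u, v))
    pick = random.choices(branches, weights=weights)[0]
    if pick is None:
        return suffix_given(s, v)           # an s-v path from suffix(s | v)
    q, u = pick
    return (prefix_sample(q, s, delta, m_tilde, edges,
                          pr_contains, suffix_given)
            + [(q, u)] + suffix_given(u, v))
```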
Updating costs
● Phase length $\tau = \lceil 2mH \log(mHT)/\delta \rceil$
● In each phase we count the cost of an edge only in rounds where the edge was not part of the path chosen by prefix
● The reason is that we cannot control the distribution those prefix edges are drawn from
Updating costs
● At the end of each phase, for all $e \in \tilde{E}$:
$\mu_\phi(e) \leftarrow E\!\left[\sum_{j \in \tau_\phi} \chi_j(e)\right]$
$\tilde{c}_\phi(e) \leftarrow \left(\sum_{j \in \tau_\phi} \chi_j(e)\, c_j(\pi_j)\right) / \mu_\phi(e)$
where $\phi = 1, \dots, \lceil T/\tau \rceil$ and phase φ consists of rounds $j = \tau(\phi-1)+1, \tau(\phi-1)+2, \dots, \tau\phi$ (a sketch follows)
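A minimal sketch of the end-of-phase update, assuming chi[j][e] ∈ {0, 1} records whether e was counted in round j, path_cost[j] = $c_j(\pi_j)$, and mu holds the (assumed known) values $\mu_\phi(e)$; names are illustrative:

```python
def end_of_phase_update(edges, chi, path_cost, mu):
    """Return the estimated per-edge phase costs fed to BEX."""
    c_tilde = {}
    for e in edges:
        # Sum observed total path costs over rounds where e was counted.
        total = sum(chi[j][e] * path_cost[j] for j in range(len(path_cost)))
        c_tilde[e] = total / mu[e]          # normalize by E[sum_j chi_j(e)]
    return c_tilde
```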
Algorithm analysis
● Let
$C^-(v) = \sum_{j=1}^{T} E[c_j(\text{prefix}(v))]$
$C^+(v) = \sum_{j=1}^{T} E[c_j(\text{suffix}(v))]$
$OPT(v) = \min_{\text{paths } \pi: v \to r} \sum_{j=1}^{T} c_j(\pi)$
Algorithm analysis
● We know that for BEX with K experts and per-step costs in [0, M],
$\sum_{j=1}^{t} \sum_{i=1}^{K} p_j(i)\, c_j(i) \le \sum_{j=1}^{t} c_j(k) + O(\epsilon M t + M \log K / \epsilon)$
● Let $p_\phi$ be the probability distribution supplied by BEX(v) during phase φ. Then for any fixed edge $e_0 \in \Delta(v)$,
$\sum_{\phi=1}^{t} \sum_{e \in \Delta(v)} p_\phi(e)\, \tilde{c}_\phi(e) \le \sum_{\phi=1}^{t} \tilde{c}_\phi(e_0) + O(\epsilon H t + H \log \Delta / \epsilon)$
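The slides treat BEX as a black box satisfying this bound. One standard algorithm with a guarantee of this form is Hedge (multiplicative weights); the sketch below is an illustrative instantiation, not necessarily the one the paper prescribes:

```python
import math

class Hedge:
    """Best-expert algorithm: exponential weights over K experts."""
    def __init__(self, K, eps):
        self.eps = eps
        self.weights = [1.0] * K

    def distribution(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def update(self, costs):                # costs[i] in [0, M]
        for i, c in enumerate(costs):
            self.weights[i] *= math.exp(-self.eps * c)
```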
Algorithm analysis
● We used the fact that the cost fed to BEX in a phase is at most M = 3H with high probability. Since $\tau = 2mH\log(mHT)/\delta$, we have $\mu_\phi > \delta\tau/(mH) = 2\log(mHT)$, and by a Chernoff bound
$\Pr\!\left(\sum_{j \in \tau_\phi} \chi_j \ge 3 \cdot 2\log(mHT)\right) \le e^{-(2/3) \cdot 2\log(mHT)} \le \frac{1}{mHT}$
Algorithm analysis
● Now, applying a union bound over all phases, this low-probability event contributes at most HT/(mHT) < 1 to the total cost, so we can safely ignore it
Algorithm analysis
● Expanding $\tilde{c}_\phi$ (Eq. 12):
$\sum_{\phi=1}^{t} \sum_{e \in \Delta(v)} \sum_{j \in \tau_\phi} p_\phi(e)\, \chi_j(e)\, c_j(\pi_j) / \mu_\phi(e) \le \sum_{\phi=1}^{t} \sum_{j \in \tau_\phi} \chi_j(e_0)\, c_j(\pi_j) / \mu_\phi(e_0) + O(\epsilon H t + H \log \Delta / \epsilon)$
Algorithm analysis
● Claim 3.2. For e = (v, w) and any path π: s → v,
$\Pr(\pi \subseteq \pi_j \mid \chi_j(e) = 1) = \Pr(\text{prefix}(v) = \pi)$
Algorithm analysis
● Proof of Claim 3.2
$\chi_j(e) = 1 \;\Rightarrow\; e \in \pi_j^0 \lor e \in \pi_j^+$
$\Pr(\pi \subseteq \pi_j \mid e \in \pi_j^0) = \Pr(\text{prefix}(v) = \pi)$
$\Pr(\pi \subseteq \pi_j \mid e \in \pi_j^+) = \Pr(\text{prefix}(v) = \pi)$
● The first equality holds by definition; let's prove the second
Algorithm analysis
● e is sampled independently of the path preceding v, so
$\Pr(\pi \subseteq \pi_j \mid e \in \pi_j^+) = \Pr(\pi \subseteq \pi_j \mid v \in \pi_j^+)$
$\Pr(\pi \subseteq \pi_j \mid v \in \pi_j^+) = \Pr(\pi \subseteq \pi_j \cap v \in \pi_j^+) / \Pr(v \in \pi_j^+)$
and the numerator equals
$(1-\delta)\Pr(v \in \text{suffix}(s))\Pr(\pi = \text{suffix}(s \mid v)) + \sum_{e=(q,u) \in \tilde{E}} \frac{\delta}{\tilde{m}} \Pr(v \in \text{suffix}(u))\Pr(\pi = \text{prefix}(q) \cup \{e\} \cup \text{suffix}(u \mid v))$
$= \Pr(v \in \pi_j^+)\Pr(\pi = \text{prefix}(v))$
Algorithm analysis
● Claim 3.3. If e = (v, w) then
$E[\chi_j(e)\, c_j(\pi_j)] = (\mu(e)/\tau)\left(A_j(v) + B_j(w) + c_j(e)\right)$
where $A_j(v) = E[c_j(\text{prefix}(v))]$ and $B_j(w) = E[c_j(\text{suffix}(w))]$
● This follows from Claim 3.2: the portion of the path preceding e is distributed as prefix(v)
Algorithm analysis
● Taking the expectation of Eq. 12, the left side becomes
$\sum_{\phi=1}^{t} \sum_{e \in \Delta(v)} \sum_{j \in \tau_\phi} \frac{1}{\tau}\, p_\phi(e)\left(A_j(v) + B_j(w) + c_j(e)\right) = \frac{1}{\tau} \sum_{j=1}^{T} \sum_{e \in \Delta(v)} p_\phi(e)\left(A_j(v) + B_j(w) + c_j(e)\right)$
● The right side becomes
$\frac{1}{\tau} \sum_{j=1}^{T} \left(A_j(v) + B_j(w_0) + c_j(e_0)\right)$
Algorithm analysis
● Remove $A_j(v)$ from both sides and notice that
$\sum_{e \in \Delta(v)} p_\phi(e)\left(B_j(w) + c_j(e)\right) = E[c_j(\text{suffix}(v))]$
● So the left side becomes
$\frac{1}{\tau} \sum_{j=1}^{T} E[c_j(\text{suffix}(v))] = C^+(v)/\tau$
Algorithm analysis
● The right side becomes
$\frac{1}{\tau} \sum_{j=1}^{T} c_j(e_0) + C^+(w_0)/\tau + O(\epsilon H t + H \log \Delta / \epsilon)$
● Multiplying through by τ, we have derived the local performance guarantee (Eq. 13):
$C^+(v) \le C^+(w_0) + \sum_{j=1}^{T} c_j(e_0) + O(\epsilon H T + \tau H \log \Delta / \epsilon)$
Global performance guarantee
● Claim 3.4.
$C^+(v) \le OPT(v) + h(v)\, O(\epsilon H T + \tau H \log \Delta / \epsilon)$
● To prove it we can use the following observation (a small sketch of this recursion follows):
$OPT(v) = \min_{e_0 = (v, w_0)} \left\{ \sum_{j=1}^{T} c_j(e_0) + OPT(w_0) \right\}$
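A minimal sketch of this recursion as a backward dynamic program over the leveled DAG, assuming total_cost[e] holds $\sum_j c_j(e)$ and topo_order is a topological order of the vertices; names are illustrative:

```python
def compute_opt(topo_order, out_edges, total_cost, r):
    """opt[v] = min over e0 = (v, w0) of sum_j c_j(e0) + opt[w0]."""
    opt = {r: 0.0}
    # Visit vertices in reverse topological order so opt[w] is ready.
    for v in reversed(topo_order):
        if v == r:
            continue
        # Pre-processing guarantees every vertex has an edge toward r.
        opt[v] = min(total_cost[(v, w)] + opt[w] for w in out_edges[v])
    return opt
```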
Global performance guarantee
● Proof – by induction on h(v), using the local performance guarantee
● Write $F = O(\epsilon H T + \tau H \log \Delta / \epsilon)$
● Now rewrite the claim and Eq. 13:
$C^+(v) \le OPT(v) + h(v) F$
$C^+(v) \le C^+(w_0) + \sum_{j=1}^{T} c_j(e_0) + F$
Global performance guarantee
● h(v) = 1: every v–r path is a single edge, so for the minimizing edge $e_0 = (v, r)$,
$C^+(v) \le OPT(v) + F = \sum_{j=1}^{T} c_j(e_0) + OPT(r) + F$
● Since $OPT(r) = C^+(r) = 0$, this reduces to
$C^+(v) \le \sum_{j=1}^{T} c_j(e_0) + F$
which is true by the local performance guarantee
Global performance guarantee
● h(v) = k + 1: let $e_{k+1} = (v, v_k)$ be the first edge of an optimal path from v, where $h(v_k) = k$. Then
$C^+(v) \le C^+(v_k) + \sum_{j=1}^{T} c_j(e_{k+1}) + F \le \sum_{j=1}^{T} c_j(e_{k+1}) + OPT(v_k) + kF + F = OPT(v) + (k+1)F$
Regret
● Theorem 3.5. The algorithm suffers regret
$O\!\left(H^2 (mH \log \Delta \log mHT)^{1/3}\, T^{2/3}\right)$
● The exploration steps contribute $\delta T H$
● The exploitation steps contribute $C^+(s) - OPT(s)$
● Also $\tau = 2mH \log(mHT)/\delta$
● Substituting into Claim 3.4 (with $h(s) \le H$) we get total exploitation cost
$C^+(s) - OPT(s) = O\!\left(\epsilon H^2 T + \frac{2mH^3 \log \Delta \log(mHT)}{\epsilon \delta}\right)$
Regret
$\text{Regret} \le O\!\left(\delta T H + \epsilon H^2 T + \frac{2mH^3 \log \Delta \log(mHT)}{\epsilon \delta}\right)$
● We can assign
$\epsilon = \delta = (2mH \log \Delta \log(mHT))^{1/3}\, T^{-1/3}$
and we get the desired regret
$O\!\left(H^2 (mH \log \Delta \log mHT)^{1/3}\, T^{2/3}\right)$
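As a quick sanity check (not on the original slides), write $X = 2mH \log \Delta \log(mHT)$ and substitute $\epsilon = \delta = X^{1/3} T^{-1/3}$:
\[
\delta T H + \epsilon H^2 T + \frac{H^2 X}{\epsilon \delta}
= H X^{1/3} T^{2/3} + H^2 X^{1/3} T^{2/3} + \frac{H^2 X}{X^{2/3} T^{-2/3}}
= O\!\left(H^2 X^{1/3} T^{2/3}\right)
\]
so all three terms balance at the stated bound.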