Diagnosing Missing Events in Distributed Systems with Negative Provenance
Yang Wu*, Mingchen Zhao*, Andreas Haeberlen*, Wenchao Zhou+, Boon Thau Loo*
* University of Pennsylvania, + Georgetown University

Motivation: Network debugging
- Example: Software-Defined Networks (SDN)
- SDN offers flexibility, but can have bugs
- Need good debuggers!
[Figure: network diagram with labels SDN Controller, DNS Query, HTTP Request, Internet, Data Center Network, and HTTP Server. Question posed: "Why is the HTTP server getting DNS queries?"]

Approach: Provenance
- Existing tools: SNP (SOSP '11), NetSight (NSDI '14)
- They produce "backtraces", or provenance
[Figure: provenance graph answering "Why is the HTTP server getting DNS queries?". Vertices: "DNS Query arrived at HTTP Server", "DNS Query received at Switch", "Broken FlowEntry existed at Switch", and the SDN Controller Program. Network labels: Internet, Data Center Network, HTTP Server.]

Challenge: Missing events
- What if an expected event does not happen?
- Cannot be handled by existing tools
- No starting point for a backtrace
[Figure: the same network (SDN Controller, Internet, Data Center Network, HTTP Server) with the question "Why is the HTTP server NOT getting requests?"; the cause is marked "???".]

Survey: How common are missing events?
- Missing events are consistently in the majority
- Email threads for missing events are longer
[Figure: bar chart of missing events vs. positive events for the Outages, NANOG-user, and floodlight-dev lists. Values shown: 17%, 26%, 52%, 48%, 74%, 83%.]

Approach: Counter-factual reasoning
Find all the ways a missing event could have occurred, and show why each of them did not happen.
[Figure: "Why did Bob NOT arrive at SIGCOMM?", with Philadelphia and Chicago marked.]

Result: Debugger for missing events
[Figure: negative provenance graph answering "Why is the HTTP server NOT getting requests?". Vertices: "No HTTP Request arrived at HTTP Server", "No Forwarding-FlowEntry installed at Switch", "HTTP Request received at Switch", "Dropping-FlowEntry existed at Switch", and the Controller Program. Network labels: Internet, Data Center Network, HTTP Server.]

Challenge: Too many possible explanations!
Why did Bob NOT arrive at SIGCOMM?
When an event happens, there is one reason.
When an event does not happen, there can be many reasons.

WHY NOT?
- Overview
  - Goal: Diagnose missing events
  - Approach: Counter-factual reasoning
  - Challenge: Too many explanations
- Background: Provenance
- Approach
  - Generating Negative Provenance
  - Improving readability
- Y! System: R-tree indexing
- Experiments
- Evaluation: Query speed, Size reduction, Usability

Figure 3: Graph construction algorithm. Some rules have been omitted; for instance, the handling of +τ and −τ messages is …

function QUERY(EXIST([t1,t2], N, τ))
    S ← ∅
    for each (+τ,N,t,r,c) ∈ Log: t1 ≤ t ≤ t2
        S ← S ∪ { APPEAR(t,N,τ,r,c) }
    for each (−τ,N,t,r,c) ∈ Log: t1 ≤ t ≤ t2
        S ← S ∪ { DISAPPEAR(t,N,τ,r,c) }
    RETURN S

function QUERY(APPEAR(t,N,τ,r,c))
    if BaseTuple(τ) then
        RETURN { INSERT(t,N,τ) }
    else if LocalTuple(N,τ) then
        RETURN { DERIVE(t,N,τ,r) }
    else RETURN { RECEIVE(t, N ← r.N, τ) }

function QUERY(INSERT(t,N,τ))
    RETURN ∅

function QUERY(DERIVE(t,N,τ, τ :- τ1, τ2 ...))
    S ← ∅
    for each τi:
        if (+τi,N,t,r,c) ∈ Log:
            S ← S ∪ { APPEAR(t,N,τi,c) }
        else
            tx ← max t' < t : (+τ,N,t',r,1) ∈ Log
            S ← S ∪ { EXIST([tx,t],N,τi,c) }
    RETURN S

function QUERY(RECEIVE(t, N1 ← N2, +τ))
    ts ← max t' < t : (+τ,N2,t',r,1) ∈ Log
    RETURN { SEND(ts, N1 → N2, +τ),
             DELAY(ts, N2 → N1, +τ, t − ts) }

function QUERY(SEND(t, N → N', +τ))
    FIND (+τ,N,t,r,c) ∈ Log
    RETURN { APPEAR(t,N,τ,r) }

function QUERY(NEXIST([t1,t2], N, τ))
    if ∃ t < t1 : (−τ,N,t,r,1) ∈ Log then
        tx ← max t < t1 : (−τ,N,t,r,1) ∈ Log
        RETURN { DISAPPEAR(tx,N,τ),
                 NAPPEAR((tx,t2],N,τ) }
    else RETURN { NAPPEAR([0,t2],N,τ) }

function QUERY(NAPPEAR([t1,t2], N, τ))
    if BaseTuple(τ) then
        RETURN { NINSERT([t1,t2],N,τ) }
    else if LocalTuple(N,τ) then
        RETURN ∪ over r ∈ Rules(N) : Head(r) = τ of { NDERIVE([t1,t2],N,τ,r) }
    else RETURN { NRECEIVE([t1,t2],N,+τ) }

function QUERY(NDERIVE([t1,t2], N, τ, r))
    S ← ∅
    for (τi, Ii) ∈ PARTITION([t1,t2],N,τ,r)
        S ← S ∪ { NEXIST(Ii,N,τi) }
    RETURN S

function QUERY(NRECEIVE([t1,t2], N, +τ))
    S ← ∅, t0 ← t1 − Δmax
    for each N' ∈ SENDERS(τ,N):
        X ← { t0 ≤ t ≤ t2 | (+τ,N',t,r,1) ∈ Log }
        tx ← t0
        for (i=0; i < |X|; i++)
            S ← S ∪ { NSEND((tx,Xi),N',+τ),
                      NARRIVE((t1,t2), N' → N, Xi, +τ) }
            tx ← Xi
        S ← S ∪ { NSEND([tx,t2],N',+τ) }
    RETURN S

function QUERY(NSEND([t1,t2], N, +τ))
    if ∃ t1 < t < t2 : (−τ,N,t,r,1) ∈ Log then
        RETURN { EXIST([t1,t],N,τ),
                 NAPPEAR((t,t2],N,τ) }
    else RETURN { NAPPEAR([t1,t2],N,τ) }

function QUERY(NARRIVE([t1,t2], N1 → N2, t0, +τ))
    FIND (+τ,N2,t3,(N1,t0),1) ∈ Log
    RETURN { SEND(t0, N1 → N2, +τ),
             DELAY(t0, N1 → N2, +τ, t3 − t0) }

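As one concrete instance of the graph construction rules in Figure 3, the NEXIST query can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `query_nexist`, the 4-field log entries `(sign, node, tuple_name, time)`, and the string encoding of intervals are hypothetical and are not the authors' implementation (Figure 3 uses 5-field log entries).

```python
def query_nexist(log, node, tup, t1, t2):
    """Explain why `tup` did not exist on `node` during [t1, t2].

    `log` is a list of (sign, node, tuple_name, time) entries, where sign is
    "+" for an appearance and "-" for a disappearance (simplified format).
    Returns the child vertices that the explanation continues with.
    """
    # Times before t1 at which the tuple disappeared from this node.
    deletions = [t for (sign, n, name, t) in log
                 if sign == "-" and n == node and name == tup and t < t1]
    if deletions:
        # The tuple vanished at tx and never reappeared during (tx, t2].
        tx = max(deletions)
        return [("DISAPPEAR", tx, node, tup),
                ("NAPPEAR", f"({tx},{t2}]", node, tup)]
    # The tuple was never deleted before t1, so it never appeared at all.
    return [("NAPPEAR", f"[0,{t2}]", node, tup)]
```

Each returned vertex would in turn be expanded by its own query (for example, NAPPEAR), which is how the explanation grows into a graph.
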
Background: Provenance
- Captures causality between events
- Example: SNP (SOSP '11)
Network Datalog (NDlog) rule: PacketSent :- PacketReceived, FlowEntry.
[Figure: provenance graph. Vertices are events and edges are causal relationships: "DNS Query arrived at HTTP Server" is linked to "DNS Query received at Switch" and "Broken FlowEntry existed at Switch".]

Background: How to generate provenance?
Step 1: Collect events from the distributed system
Step 2: Issue a query when a relevant event occurs
Step 3: The provenance graph is generated
Rules:
PacketSent :- PacketReceived, FlowEntry.
PacketSent :- PacketOut.
[Figure: timeline of PacketReceived, FlowEntry, PacketOut, and PacketSent events up to "now", with t4 and t5 marked. The query "PacketSent during [t4,t5]" is explained by "FlowEntry during [t4,t5]" and "PacketReceived during [t4,t5]".]

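The three steps can be illustrated with a minimal Python sketch. It is not SNP's implementation: the `Event` record, the `RULES` table, and the `explain` function are hypothetical names, and the interval-overlap test is a simplification of how a real system would match a derivation to its preconditions.

```python
from dataclasses import dataclass

# Hypothetical rule table: head tuple -> alternative rule bodies,
# mirroring the two NDlog rules shown on the slide.
RULES = {
    "PacketSent": [["PacketReceived", "FlowEntry"], ["PacketOut"]],
}

@dataclass(frozen=True)
class Event:
    tuple_name: str
    start: int  # time at which the tuple appeared
    end: int    # time at which the tuple disappeared

def exists_during(log, name, t1, t2):
    """Step 1 output: logged events for `name` that overlap [t1, t2]."""
    return [e for e in log if e.tuple_name == name and e.start <= t2 and e.end >= t1]

def explain(log, name, t1, t2):
    """Steps 2 and 3: answer a query by building a provenance tree (nested dicts)."""
    node = {"vertex": f"{name} during [{t1},{t2}]", "causes": []}
    for body in RULES.get(name, []):
        # Use the first rule whose preconditions were all present.
        if all(exists_during(log, pre, t1, t2) for pre in body):
            node["causes"] = [explain(log, pre, t1, t2) for pre in body]
            break
    return node  # base tuples have no rules, so their "causes" stay empty
```

With a log in which PacketReceived, FlowEntry, and PacketSent all exist during [4,5], `explain(log, "PacketSent", 4, 5)` returns a root vertex with two children, one per precondition of the first rule.
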
Generating negative provenance graphs
- Goal: Explain why something does not exist
- Use missing preconditions to explain missing events
Rule: PacketSent :- PacketReceived, FlowEntry.
[Figure: timeline of PacketSent, PacketReceived, and FlowEntry events with t1 through t5 and "now" marked. The query vertex "No PacketSent during [t1,now]" is not yet explained ("???").]

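The idea of explaining a missing event by its missing preconditions can be sketched as follows. This is an illustrative Python fragment, not the system's code: `RULES`, `holds_at`, and `why_not` are hypothetical names, the log format (tuple name mapped to half-open `[start, end)` intervals) is an assumption, and the second rule (`PacketSent :- PacketOut.`) is taken from the earlier background slide. Every rule that could have derived the event must be shown not to have fired.

```python
# Hypothetical rule table: head tuple -> alternative rule bodies.
RULES = {"PacketSent": [["PacketReceived", "FlowEntry"], ["PacketOut"]]}

def holds_at(log, name, t):
    """True if tuple `name` exists at time t; log maps names to [start, end) intervals."""
    return any(start <= t < end for start, end in log.get(name, []))

def why_not(log, head, t):
    """For each rule that could derive `head`, list the preconditions missing at time t."""
    reasons = []
    for body in RULES.get(head, []):
        missing = [pre for pre in body if not holds_at(log, pre, t)]
        if not missing:
            # All preconditions held, so the event is not actually missing.
            raise ValueError(f"{head} is derivable at time {t}; nothing to explain")
        reasons.append({"rule": f"{head} :- {', '.join(body)}.", "missing": missing})
    return reasons
```

This handles a single point in time; explaining absence over a whole interval such as [t1,now] requires splitting the interval, since different preconditions may be missing at different times.
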
Generating negative provenance graphs
- Explanation can be unnecessarily complex
[Figure: timeline of PacketSent, PacketReceived, and FlowEntry events with t1 through t5 and "now" marked. The root vertex "No PacketSent during [t1,now]" is explained by five vertices: "No PacketReceived during [t1,t2]", "No FlowEntry during [t2,t3]", "No PacketReceived during [t3,t4]", "No FlowEntry during [t4,t5]", and "No PacketReceived during [t5,now]".]

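A small Python sketch shows how such a fragmented explanation can arise. It is an assumption-laden illustration, not the PARTITION procedure from the paper: the `partition` and `present` functions, the half-open `[start, end)` intervals, and the rule of picking the first missing precondition are all hypothetical choices. The sample timeline (t1..t5 = 1..5, now = 6) is likewise invented so that the output has the same five-piece shape as the figure.

```python
RULE_BODY = ["PacketReceived", "FlowEntry"]  # PacketSent :- PacketReceived, FlowEntry.

def present(intervals, lo, hi):
    """True if one logged [start, end) interval covers all of [lo, hi)."""
    return any(s <= lo and hi <= e for s, e in intervals)

def partition(log, body, t1, t2):
    """Split [t1, t2) into pieces, each explained by one missing precondition."""
    # Cut the window wherever a precondition appears or disappears.
    cuts = {t1, t2}
    for name in body:
        for s, e in log.get(name, []):
            cuts.update(t for t in (s, e) if t1 < t < t2)
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        missing = [n for n in body if not present(log.get(n, []), lo, hi)]
        if not missing:
            raise ValueError(f"rule could have fired during [{lo},{hi})")
        # Naive choice: blame the first missing precondition in rule order.
        pieces.append((missing[0], lo, hi))
    return pieces

# Hypothetical timeline: the two preconditions never hold at the same time.
LOG = {
    "PacketReceived": [(2, 3), (4, 5)],
    "FlowEntry": [(1, 2), (3, 4), (5, 6)],
}
EXPLANATION = partition(LOG, RULE_BODY, 1, 6)
```

Here `EXPLANATION` contains five pieces that alternate between the two preconditions, one vertex per piece, which is the kind of complexity the slide points out.
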