Continuous Distributed Monitoring in the Evolved Packet Core: Industry Experience Report
Romaric Duvignau (1), Marina Papatriantafilou (1), Konstantinos Peratinos (3), Patrik Nyman (2), Eric Nordström (2)
DEBS 2019, Darmstadt (June 26).
(1) Chalmers University of Technology, (2) Ericsson, (3) Chalmers student and Ericsson intern.
Introduction
Context: Monitoring the Evolved Packet Core (EPC) in 4G
[Figure: The Evolved Packet Core. Labels: User Equipment (UE), Base station, LTE, EPC, PDN, EPG, Control Plane (CP: MME, QoS, billing, ...), User Plane (UP: Packet Gateway servers W 1, W 2, ..., W N), Sx a / Sx b (PFCP), GTP, TEID.]
• Large-scale, distributed, performance-critical system.
• Strong need to continuously monitor the EPC, e.g. to detect under- or over-used subcomponents.
Continuous Distributed Monitoring
Continuous Distributed Monitoring (CDM) Model
[Figure: the CDM model, with the monitored function $f(S_1, S_2, \dots, S_k)$.]
• Instant computation & communication
• $f$ depends on $\bigcup_i S_i$
There exist variants (unidirectional, relay nodes, etc.).
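The model above can be illustrated with a minimal sketch. It assumes that each of the k sites observes its own stream S_i, that a single coordinator tracks f, and that f is simply the total number of items seen over all streams. The class names `Coordinator` and `Site` and the choice of f are hypothetical and serve only as illustration; the naive policy of one message per item reflects the idealized "instant communication" assumption, not a practical protocol.

```python
class Coordinator:
    """Tracks f(S_1, ..., S_k); here f is the total item count (an assumption)."""

    def __init__(self, k):
        self.local_counts = [0] * k  # last count reported by each site
        self.messages = 0            # number of messages received so far

    def receive(self, site_id, count):
        # Communication is treated as instantaneous, as in the idealized model.
        self.local_counts[site_id] = count
        self.messages += 1

    def f(self):
        # f depends only on the union of the streams: the sum of local counts.
        return sum(self.local_counts)


class Site:
    """One of the k sites; reports every local change (naive policy)."""

    def __init__(self, site_id, coordinator):
        self.site_id = site_id
        self.coordinator = coordinator
        self.count = 0

    def observe(self, item):
        # The item itself is not inspected, only counted.
        self.count += 1
        self.coordinator.receive(self.site_id, self.count)


if __name__ == "__main__":
    coord = Coordinator(k=3)
    sites = [Site(i, coord) for i in range(3)]
    for i, item in enumerate("abcdefg"):
        sites[i % 3].observe(item)
    print(coord.f(), coord.messages)
```

The interesting CDM algorithms reduce `messages` while keeping `f()` exact or within a stated error; this sketch only fixes the vocabulary.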
System Architecture
System Architecture Overview
[Figure: Incoming Traffic reaches a Load Balancer and is spread over aggregators Agg 1, Agg 2, ..., each with workers indexed 1, 2, ..., ℓ. Aggregators collect Fetched Statistics from their workers and send Monitoring Messages to C, which feeds the Display (analysts). A timeline shows the Monitoring Period, the Fetches within it, and the Sliding Window over time.]
Differences with CDM models
• Site identity matters, performance statistics ≠ "events", etc.
• Need to account for computation and communication delays!
→ At the Agg: monitoring decisions, then 1 monitoring message.
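One way to picture the aggregator's cycle is the sketch below: several fetches per monitoring period feed a per-worker sliding window, then the monitoring decisions are taken and a single monitoring message is produced. Everything specific here is an assumption made for illustration: the names `Aggregator`, `run_period` and `fetch`, the use of the window average as the monitored value, and the "send only what changed" decision rule.

```python
from collections import deque


class Aggregator:
    """Hypothetical aggregator: fetch, update sliding windows, decide, send once."""

    def __init__(self, worker_ids, fetches_per_period, window_size):
        self.fetches_per_period = fetches_per_period
        # One bounded sliding window of fetched statistics per worker.
        self.windows = {w: deque(maxlen=window_size) for w in worker_ids}
        self.last_sent = {}  # last value reported for each worker

    def run_period(self, fetch):
        """Run one monitoring period; `fetch(worker_id)` returns one statistic."""
        for _ in range(self.fetches_per_period):
            for worker, window in self.windows.items():
                window.append(fetch(worker))

        # Monitoring decisions: report a worker only if its value changed.
        message = {}
        for worker, window in self.windows.items():
            value = sum(window) / len(window)  # assumed statistic: window mean
            if self.last_sent.get(worker) != value:
                message[worker] = value
                self.last_sent[worker] = value
        return message  # the single monitoring message for this period


if __name__ == "__main__":
    agg = Aggregator(["w1", "w2"], fetches_per_period=4, window_size=4)
    print(agg.run_period(lambda worker: 5.0))
    print(agg.run_period(lambda worker: 5.0))
```

Batching all decisions into one message per period keeps the message rate independent of the fetch rate, which is one plausible reading of the timeline in the figure.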
Monitoring Algorithms
Selected CDM Algorithms for Counting problems
Basic Mode: Exact Monitoring
• Send an update if the last value sent differs from the measured value
• Keep an exact sliding window of the last $n$ values
Approximation Mode: Relative Error of $\varepsilon$
• Uses Exponential Histograms for approximate counting
• Send the approximate count when it is beyond some error bound from the last value sent
• Requires in all $O(\log(n\varepsilon)/\varepsilon)$ words
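The approximation mode can be sketched with a small exponential histogram that counts the 1s among the last n bits. This is an illustration, not the deployed implementation: the class and function names are hypothetical, and the bucket limit of ceil(1/ε) + 1 per size is a conservative choice made here so that the ε bound is easy to argue, not necessarily the constant used in the system. The `beyond_bound` rule (relative deviation from the last value sent) is likewise an assumed reading of "beyond some error bound".

```python
import math


class ExponentialHistogram:
    """Approximate count of 1s in the last n bits, relative error at most eps."""

    def __init__(self, n, eps):
        self.n = n
        # Keep between m and m + 1 buckets per size, with m = ceil(1 / eps).
        self.max_per_size = math.ceil(1 / eps) + 1
        self.buckets = []  # (timestamp of newest 1, size), oldest first
        self.t = 0

    def add(self, bit):
        self.t += 1
        # Drop the oldest bucket once its newest 1 has left the window.
        while self.buckets and self.buckets[0][0] <= self.t - self.n:
            self.buckets.pop(0)
        if bit:
            self.buckets.append((self.t, 1))
            self._merge()

    def _merge(self):
        size = 1
        while True:
            idx = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(idx) <= self.max_per_size:
                return
            # Merge the two oldest buckets of this size into one of double size,
            # keeping the timestamp of the more recent of the two.
            i, j = idx[0], idx[1]
            self.buckets[i:j + 1] = [(self.buckets[j][0], 2 * size)]
            size *= 2

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only the oldest bucket may straddle the window edge: count half of it.
        return total - self.buckets[0][1] // 2


def beyond_bound(last_sent, current, eps):
    """Assumed send rule: report when the count drifts by more than eps."""
    if last_sent is None:
        return True
    return abs(current - last_sent) > eps * abs(last_sent)
```

With at least m buckets of every size smaller than the oldest one, the uncertainty (half the oldest bucket) stays below a 1/m fraction of the true count, which is where the relative error guarantee comes from; the number of buckets grows only logarithmically in n.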
Results
Experimental setup
• EPG setup: 2 aggregators, 72 workers per aggregator
• 2 phases: increasing load (20 min), then stable load (15 min)
[Figure: two panels over the run (x-axis ticks 0 to 2000): CPU utilization (%) and packet rate (packets/s, up to 3M), each plotted as Max, p95, Median, p5 and Min. Annotations: "1000 fetches/s" (high precision) and "1 fetch/s" (low precision).]