Traffic Driven Analysis of Cellular Data Networks
Samir R. Das, Computer Science Department, Stony Brook University
Joint work with Utpal Paul, Luis Ortiz (Stony Brook U), Milind Buddhikot, Anand Prabhu Subramanian (Alcatel-Lucent Bell Labs)
Mobile Data Usage
[Figure: Forecast of Global Mobile Data Traffic (Source: Cisco VNI Mobile). Annotations: 10.8 EB/month — higher than the traffic volume in the entire global Internet in 2006; 0.6 EB/month; 1 Exabyte = 1 million Terabytes.]
• Relatively little research on the nature of mobile data traffic.
[Overview diagram: Traffic Analysis, Modeling and Forecasting, Traffic Management]
Measurement Infrastructure
[Diagram: packet flows between the Radio Access Network and the Internet are captured by a Flow Monitoring Tool; flow records and data from the Mobility and Session Manager are stored in an SQL Database.]
Sample Results from Traffic Analysis
• Data collected from a nationwide 2G/3G network circa 2007 – about 10K BSes, 1M subscribers.
• Significant traffic imbalance per subscriber and per BS
  – 1% of subscribers create more than 60% of the load.
  – 10% of BSes experience more than 50% of the load.
• Mobility is generally low
  – More than 50% of subscribers stick to just one BS daily.
  – Median radius of gyration is ~1 mile.
Sample Results from Traffic Analysis
• Mobility is predictable
  – Subscribers are almost always found in their top 2-3 most visited locations.
  – They return to the same location at the same time of the day with high probability.
• More mobile subscribers tend to generate more traffic.
• Radio resource usage efficiency is very poor
  – Much poorer for light users relative to heavy users.
Functional Influence Among BSes
• Model each BS load as a time series. Explore causal relationships between pairs of time series.
• Granger Causality
  – Determines whether one time series is useful in forecasting another under an autoregressive model.
  – Has been used in economics and neuroscience.
• Statistically significant causality exists among neighboring BSes (for roughly half of the neighbor pairs).
• Causality graph and causal paths
  – Build a graph from the pairwise causality relationships. Long paths exist in this graph (median = 15 hops, 90th percentile = 37 hops).
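Below is a minimal sketch of the pairwise test described above, using the Granger-causality implementation in statsmodels; the hourly load series here are synthetic placeholders, not data from the study.

```python
# Sketch: pairwise Granger-causality test between two BS load time series.
# `load_a` and `load_b` are hypothetical hourly loads; the real study used
# loads measured on a nationwide 2G/3G network.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
load_a = rng.poisson(100, size=24 * 7).astype(float)           # one week of hourly load at BS A
load_b = 0.5 * np.roll(load_a, 1) + rng.poisson(50, 24 * 7)    # BS B lags BS A, so A should "cause" B

# Column order matters: the test checks whether the 2nd column helps forecast the 1st.
data = np.column_stack([load_b, load_a])
results = grangercausalitytests(data, maxlag=3)

for lag, res in results.items():
    f_stat, p_value = res[0]["ssr_ftest"][:2]
    print(f"lag={lag}: F={f_stat:.2f}, p={p_value:.4f}")  # p < 0.05 => statistically significant causality
```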
Modeling Study
• Model BS traffic loads exploiting any interactions/dependencies
  – Exploit tools from machine learning.
  – Many possible directions – purely static/spatial, dynamic/temporal.
• Goals:
  – Intellectual – a broad understanding of any underlying structure would help future network architectures.
  – Utilitarian – models can help estimation/forecasting, useful for various resource management tasks.
Spatial Modeling Approach: Probabilistic Graphical Modeling
• Assume the loads on the n base stations are jointly (multivariate) Gaussian: X = (X_1, …, X_n) ~ N(μ, Σ), with mean vector μ and covariance matrix Σ.
• Learn the parameters from a set of training data (p observations) – specifically the "inverse covariance matrix" (precision matrix) Θ = Σ⁻¹.
• Θ is easier to estimate than Σ and exposes interesting properties.
Inverse Covariance Matrix: Properties
• If Θ_ij = 0, then the load variables X_i and X_j are conditionally independent given the rest of the variables.
• Most problems produce a 'sparse' model.
• Related to probabilistic graphical models (e.g., Gaussian Markov Random Field): Θ_ij ≠ 0 → edge between i and j; Θ_ij = 0 → no edge.
• [Figure: undirected graphical model over 5 BSes.] Graph properties translate to probabilistic (in)dependencies.
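A toy illustration of the property above, assuming a made-up 5x5 precision matrix: zero off-diagonal entries correspond to missing edges in the Gaussian Markov random field.

```python
# Sketch: read off the conditional-independence graph from a precision matrix.
# The 5x5 matrix below is a fabricated toy example, not learned from real traffic.
import numpy as np

theta = np.array([
    [ 2.0, -0.8,  0.0,  0.0,  0.0],
    [-0.8,  2.5, -0.6,  0.0,  0.0],
    [ 0.0, -0.6,  2.2, -0.5,  0.0],
    [ 0.0,  0.0, -0.5,  2.1, -0.4],
    [ 0.0,  0.0,  0.0, -0.4,  1.9],
])

n = theta.shape[0]
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if abs(theta[i, j]) > 1e-9]
print("Edges of the Gaussian Markov random field:", edges)
# theta[0, 2] == 0  =>  X_0 and X_2 are conditionally independent given the other BS loads.
```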
Inference Problem
• Estimate the load of BS i, given the loads of a measured subset S of BSes, as the conditional mean E[X_i | X_S = x_S].
• Measure only a subset and estimate the rest.
• Broad questions:
  – How large should S be? Effort vs. accuracy tradeoff.
  – How to choose S?
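A small sketch of this estimator, using the standard Gaussian conditioning formula E[X_U | X_S = x_S] = μ_U + Σ_US Σ_SS⁻¹ (x_S − μ_S); the mean vector, covariance, and measured values below are illustrative placeholders.

```python
# Sketch: estimate unmeasured BS loads as the Gaussian conditional mean.
# mu, sigma, and the measured values are toy placeholders, not real traffic statistics.
import numpy as np

def conditional_mean(mu, sigma, measured_idx, measured_vals):
    """Condition a multivariate Gaussian on the measured subset S."""
    all_idx = np.arange(len(mu))
    unmeasured_idx = np.setdiff1d(all_idx, measured_idx)
    sigma_us = sigma[np.ix_(unmeasured_idx, measured_idx)]
    sigma_ss = sigma[np.ix_(measured_idx, measured_idx)]
    delta = measured_vals - mu[measured_idx]
    return unmeasured_idx, mu[unmeasured_idx] + sigma_us @ np.linalg.solve(sigma_ss, delta)

mu = np.array([10.0, 12.0, 8.0, 9.0, 11.0])            # mean hourly load per BS (toy values)
a = np.random.default_rng(1).normal(size=(5, 5))
sigma = a @ a.T + np.eye(5)                            # any positive-definite covariance works here

idx, est = conditional_mean(mu, sigma, np.array([0, 3]), np.array([14.0, 7.5]))
print(dict(zip(idx.tolist(), est.round(2))))           # estimated loads for the unmeasured BSes
```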
First Solve the Learning Problem
• Learn the inverse covariance matrix Θ from training data.
• How? Exploit the relationship with linear regression modeling.
  – Express the load of BS i as a linear function of all other BS loads and regress: Y_i = Σ_{j≠i} β_ij X_j + ε_i, where Y_i is the load of BS i.
  – The regression coefficients β_ij can be shown to be directly related to the inverse covariance matrix elements (β_ij = −Θ_ij / Θ_ii).
Sparse Models
• Sparse model → many regression coefficients β_ij are zero.
• Reduces the danger of over-fitting (lowers variance). Also computationally efficient.
• Introduce a regularization term in the regression; we used the "Lasso":
  minimize over β:  Σ (Y_i − Σ_{j≠i} β_ij X_j)²  +  λ Σ_{j≠i} |β_ij|
  (empirical error)                               (regularization term / modeling penalty)
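A minimal sketch of this per-BS Lasso regression (often called neighborhood selection), using scikit-learn's Lasso on synthetic load data; the regularization weight alpha is an arbitrary illustrative value, not the one used in the study.

```python
# Sketch: "neighborhood selection" -- regress BS i's load on all other BS loads
# with an L1 (Lasso) penalty so that most coefficients are driven to zero.
# X is a synthetic (observations x base stations) load matrix; alpha is illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_obs, n_bs = 200, 20
X = rng.normal(size=(n_obs, n_bs))
X[:, 1] += 0.7 * X[:, 0]                    # make BS 1 depend on BS 0 so a nonzero coefficient appears

i = 1                                       # target base station
y = X[:, i]
X_others = np.delete(X, i, axis=1)

model = Lasso(alpha=0.1).fit(X_others, y)   # alpha = regularization strength (lambda)
nonzero = np.flatnonzero(model.coef_)
print("Nonzero neighbors of BS", i, ":", nonzero)   # indices into the "other BSes" matrix
```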
Regularization
• Cross-validate using additional training samples (not used for model creation).
• Use various values of λ to create different models.
• Choose the one with maximum likelihood on the held-out samples.
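A sketch of this λ-selection step, assuming scikit-learn's GraphicalLasso as the sparse Gaussian model and a simple held-out split; the alpha grid and data are illustrative, not the study's actual setup.

```python
# Sketch: pick the regularization weight lambda by held-out likelihood,
# using scikit-learn's GraphicalLasso as the sparse Gaussian model.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=np.zeros(10), cov=np.eye(10), size=500)
X_train, X_val = X[:400], X[400:]

best_alpha, best_loglik = None, -np.inf
for alpha in [0.01, 0.05, 0.1, 0.5]:          # candidate lambda values (illustrative grid)
    model = GraphicalLasso(alpha=alpha).fit(X_train)
    loglik = model.score(X_val)               # mean log-likelihood on held-out samples
    if loglik > best_loglik:
        best_alpha, best_loglik = alpha, loglik

print(f"Chosen lambda = {best_alpha}, held-out log-likelihood = {best_loglik:.2f}")
```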
Data Processing
• Hourly load of 400 BSes covering a 75 x 84 mile area, including a busy downtown and surrounding suburbs.
• No temporal dimension in the model; create different models for different parts of the day (every 4 hours).
• Account for diurnal variation of load: use residuals from a fitting function. The residuals pass a normality test.
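A sketch of the detrending and normality check, assuming a simple per-hour-of-day mean as the fitting function (one plausible choice; the study's exact fitting function is not specified here) and SciPy's D'Agostino-Pearson test.

```python
# Sketch: remove the diurnal trend from a BS's hourly load and test the residuals
# for normality. Subtracting the per-hour-of-day mean is an assumed detrending choice.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
hours = np.arange(24 * 28)                                   # four weeks of hourly samples
diurnal = 50 + 30 * np.sin(2 * np.pi * (hours % 24) / 24)    # synthetic day/night pattern
load = diurnal + rng.normal(scale=5, size=hours.size)

hour_of_day = hours % 24
hourly_mean = np.array([load[hour_of_day == h].mean() for h in range(24)])
residuals = load - hourly_mean[hour_of_day]

stat, p_value = stats.normaltest(residuals)                  # D'Agostino-Pearson test
print(f"normality test p-value = {p_value:.3f}")             # large p => normality not rejected
```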
Average Edge Length in the Model Graph
[Figures: edge length distributions, in miles and in hops on the Voronoi graph of BS locations.]
• Apparent spatial/regional significance.
Choosing the Measured Set S
• Greedy strategy: each iteration picks the BS that minimizes the error estimate.
• Selecting the highest-load BSes first achieves almost similar performance.
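A sketch of the greedy selection loop, assuming the total conditional variance of the unmeasured BSes as the error estimate; the actual error metric used in the study may differ.

```python
# Sketch: greedily grow the measured set S, at each step adding the BS that most
# reduces the total conditional variance of the remaining BSes.
# Using conditional variance as the "error estimate" is an assumption here.
import numpy as np

def remaining_uncertainty(sigma, S):
    """Sum of conditional variances of unmeasured BSes given the measured set S."""
    n = sigma.shape[0]
    U = [i for i in range(n) if i not in S]
    if not S:
        return float(np.trace(sigma))
    S = list(S)
    cond = sigma[np.ix_(U, U)] - sigma[np.ix_(U, S)] @ np.linalg.solve(
        sigma[np.ix_(S, S)], sigma[np.ix_(S, U)])
    return float(np.trace(cond))

a = np.random.default_rng(2).normal(size=(8, 8))
sigma = a @ a.T + np.eye(8)                     # toy covariance over 8 BSes

S, budget = [], 3
for _ in range(budget):
    best = min((i for i in range(8) if i not in S),
               key=lambda i: remaining_uncertainty(sigma, S + [i]))
    S.append(best)
print("Greedily chosen measured set S:", S)
```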
Impact of Estimation Accuracy on Applications
• We understand the measurement complexity (size of S) vs. error tradeoff.
• But how much accuracy do we need? Need to turn to applications.
• Studied two applications:
  – Energy management
  – Opportunistic traffic scheduling
Opportunistic Traffic Scheduling
• Similar to the smart electric grid: move non-urgent traffic from peak to off-peak periods.
  – What is non-urgent? p2p, large downloads, sync, push, etc.
  – Who decides? A user agent on the mobile. May have multiple levels of priority or deadlines to aid scheduling.
  – Carriers can incentivize such scheduling.
• Similar to QoS scheduling, but at a higher layer and on a longer time scale.
• Two components in the system architecture:
  – Server (scheduler) in the core network.
  – User agent on the mobile that coordinates with the server.
[Figure: Server (scheduler) in the core network. Example timeline (2 PM–3:30 PM): a low-priority flow is created with a 2-hour deadline.]
Solving the Scheduling Problem
• Several approaches are possible, based on how flows are prioritized.
• For any approach, the server needs to be able to estimate current/future loads at all BSes.
  – It also needs to model/estimate subscriber mobility (a separate problem).
• Poor estimation leads to poor scheduler performance.
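A toy sketch of deadline-aware deferral, where the server places a low-priority flow in the least-loaded hour before its deadline based on an (estimated) load forecast; the forecast values and the simple "least-loaded hour" rule are assumptions, not the scheduler actually evaluated in this work.

```python
# Sketch: defer a low-priority flow to the least-loaded hour before its deadline,
# based on the server's (estimated) per-hour load forecast for the flow's BS.
# The forecast values and the selection rule are illustrative assumptions.
def schedule_low_priority(forecast_load, current_hour, deadline_hour):
    """Return the hour in [current_hour, deadline_hour] with the lowest forecast load."""
    candidates = range(current_hour, deadline_hour + 1)
    return min(candidates, key=lambda h: forecast_load[h])

forecast = {14: 0.9, 15: 0.7, 16: 0.4, 17: 0.8}   # hypothetical utilization forecast per hour
start_hour = schedule_low_priority(forecast, current_hour=14, deadline_hour=16)
print(f"Start the low-priority flow at hour {start_hour}")  # picks 16, the off-peak slot
```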
Evaluation Approach
• Trace-driven simulator based on a capacity model of the BSes.
• Opportunistic scheduling is meant to admit more traffic with the same network capacity.
• We always use the same traffic trace, but reduce the network capacity to demonstrate the impact.
• Impact?
  – Do low-priority flows still finish within a reasonable time?
  – Are high-priority flows impacted?
Results
• Low-priority flows: a random subset of long-lived flows (over 25 mins), about 8% of all flows, with randomly chosen deadlines of 1–4 hours.
• The rest are high priority.
• Scheduling epoch: hourly.
• Only a subset of the 400 BSes is measured; the rest are estimated.
Conclusions
• Discovering structures in mobile traffic is a rich area of study.
• Applications in network and resource management.
Questions?