
Analysis of Network Flow Data



  1. Analysis of Network Flow Data
     Gonzalo Mateos
     Dept. of ECE and Goergen Institute for Data Science
     University of Rochester
     gmateosb@ece.rochester.edu
     http://www.ece.rochester.edu/~gmateosb/
     April 26, 2016
     Network Science Analytics

  2. Network flows
     Network flows, measurements and statistical analysis
     Gravity models
     Traffic matrix estimation
     Case study: Internet traffic matrix estimation
     Estimation of network flow costs
     Case study: Dynamic delay cartography

  3. Traffic flows
     ◮ Networks often serve as conduits for traffic flows
     Example
       ◮ Commodities and people flow over transportation networks;
       ◮ Data flows over communication networks; and
       ◮ Capital flows over networks of trade relations
     ◮ Flow-related questions on network design, provisioning and routing
       ⇒ Solutions involve tools in optimization and algorithms
     ◮ Our focus: statistical analysis and modeling of network flow data
       ⇒ Regression-based prediction of unknown flow characteristics

  4. Routing matrix
     ◮ Let $G(V, E)$ be a digraph. Flows are directed: origin → destination
       ⇒ Directed edges (arcs) here referred to as links
       ⇒ Number of flows is $N_f$, typically have $N_f = O(N_v^2)$
       ⇒ Flows traverse multiple links en route to their destinations
     ◮ Routing matrix $R \in \{0, 1\}^{N_e \times N_f}$ states incidence of routes with links
       $$ r_{e,f} = \begin{cases} 1, & \text{if flow } f \text{ routed via link } e, \\ 0, & \text{otherwise} \end{cases} $$
     ◮ Assumed flows follow a single route from origin to destination
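The definition of $r_{e,f}$ can be made concrete with a minimal sketch that assembles a routing matrix from a list of routes. The function name `routing_matrix`, the 0-based link indices and the two example routes are hypothetical choices made for illustration only.

```python
def routing_matrix(num_links, routes):
    """Return R as a list of rows, with R[e][f] = 1 if flow f is routed via link e.

    routes[f] is the list of (0-based) link indices traversed by flow f.
    Each flow is assumed to follow a single route.
    """
    R = [[0] * len(routes) for _ in range(num_links)]
    for f, route in enumerate(routes):
        for e in route:
            R[e][f] = 1
    return R


# Hypothetical example: 7 links, 2 flows.
routes = [[0, 2, 6],  # flow 0 traverses links 0, 2 and 6
          [4, 5]]     # flow 1 traverses links 4 and 5
R = routing_matrix(7, routes)
```

Each column of `R` is the indicator vector of one route, so the column sums give the route lengths in hops.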

  5. Example: Routing of two flows
     Ex: Consider a digraph with $N_e = 7$ links and $N_f = 2$ active flows
     [Figure: digraph with links $e_1, \dots, e_7$ and flows $f_1$, $f_2$]
     $$ R = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} $$
     ◮ Strongly connected digraph: flows can be as many as $N_v (N_v - 1)$

  6. Traffic matrix
     ◮ Central to study of network flows is the traffic matrix $Z \in \mathbb{R}^{N_v \times N_v}$
     ◮ Entry $z_{ij}$ is total volume of flow from origin vertex $i$ to destination $j$
     ◮ Ex: net out-flow from $i$ and net in-flow to $j$ given by
       $$ z_{i+} = \sum_j z_{ij} \quad \text{and} \quad z_{+j} = \sum_i z_{ij} $$
     ◮ Link-level aggregate traffic vector $x := [x_1, \dots, x_{N_e}]^T$ related to $Z$ as
       $$ x = Rz, \quad \text{where } z := \mathrm{vec}(Z) $$
       ⇒ Link counts $x_e$ equal the sum of flow volumes routed through $e$
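As a rough illustration of the margins $z_{i+}$, $z_{+j}$ and of the relation $x = Rz$, the sketch below uses plain Python lists. The helper names and all numbers are hypothetical. It also assumes that $z$ holds one volume per column of $R$, i.e., only the active flows.

```python
def margins(Z):
    """Return (out_flow, in_flow): row sums z_{i+} and column sums z_{+j} of Z."""
    out_flow = [sum(row) for row in Z]
    in_flow = [sum(col) for col in zip(*Z)]
    return out_flow, in_flow


def link_counts(R, z):
    """Return x = R z, the total volume carried by each link."""
    return [sum(r_ef * z_f for r_ef, z_f in zip(row, z)) for row in R]


# Hypothetical traffic matrix on 3 vertices (no self-flows).
Z = [[0, 5, 2],
     [1, 0, 4],
     [3, 0, 0]]
out_flow, in_flow = margins(Z)

# Hypothetical routing of 2 active flows over 7 links, with their volumes.
R = [[1, 0], [0, 0], [1, 0], [0, 0], [0, 1], [0, 1], [1, 0]]
z = [10, 4]
x = link_counts(R, z)
```

Links that carry no route get a zero count, and a link shared by several flows would accumulate the sum of their volumes.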

  7. Flow costs and time dependencies
     ◮ Notion of cost $c$ associated with paths or links also important
       Ex: generalized socioeconomic cost for transportation analysis
       ⇒ Study choices made by consumers of transportation resources
       Ex: quality of service (QoS) in network traffic analysis
       ⇒ Monitor delays to unveil congestion or anomalies
     ◮ Implicitly assumed a static snapshot taken of the network flows
       ⇒ Flows dynamic in nature. Time-varying models more realistic
       ⇒ When appropriate will denote $x(t)$, $Z(t)$ or $R(t)$
     ◮ Common assumption to treat routing matrix $R$ as being fixed
       ⇒ Routing changes at slower time scale than flow dynamics

  8. Example: Internet2 traffic matrix
     ◮ Internet2 backbone: $N_f = 110$ flows (8 shown) over a week
       ⇒ Temporal periodicity and “spatial” correlation apparent

  9. Roadmap
     ◮ Roadmap dictated by types of measurement and analysis goal
     ◮ Measure: origin-destination (OD) flow volumes $z_{ij}$ in full
     ◮ Goal: model flows to understand and predict future traffic
       ⇒ Gravity models
     ◮ Measure: link counts $x_e$, flow volumes unavailable
     ◮ Goal: traffic matrix estimation, i.e., predict unobserved OD flows $z_{ij}$
       ⇒ Gaussian and Poisson models, entropy minimization
     ◮ Measure: OD costs $c_{ij}$ for a subset of paths
     ◮ Goal: predict unobserved OD and link costs
       ⇒ Active network tomography and network kriging

  10. Gravity models
     Network flows, measurements and statistical analysis
     Gravity models
     Traffic matrix estimation
     Case study: Internet traffic matrix estimation
     Estimation of network flow costs
     Case study: Dynamic delay cartography

  11. Gravity models
     ◮ Gravity models originate in the social sciences [Stewart ’41]
       ⇒ Describe aggregate level of interactions among populations
     ◮ Ex: geography, economics, sociology, hydrology, computer networks
     ◮ Newton’s law of gravitation for masses $m_1$, $m_2$ separated by $d_{12}$
       $$ F_{12} = G \frac{m_1 m_2}{d_{12}^2} $$
     ◮ Gravity models specify that interactions among populations vary:
       ⇒ In direct proportion to the populations’ sizes; and
       ⇒ Inversely with some measure of their separation
     ◮ Intuition: OD flows as “population interactions”, makes sense!

  12. Model specification
     ◮ Sets of origins $\mathcal{I}$ and destinations $\mathcal{J}$. Flows $Z_{ij}$ from $i \in \mathcal{I}$ to $j \in \mathcal{J}$
     ◮ Gravity models state $Z_{ij}$ are independent, Poisson, with mean
       $$ E[Z_{ij}] = h_O(i) \, h_D(j) \, h_S(c_{ij}) $$
       ⇒ Origin $h_O(\cdot)$, destination $h_D(\cdot)$, and separation function $h_S(\cdot)$
       ⇒ “Distance” between $i$, $j$ captured by separation attributes $c_{ij}$
     ◮ Ex: Stewart’s theory of demographic gravitation specifies
       $$ E[Z_{ij}] = \gamma \, \pi_{O,i} \, \pi_{D,j} \, d_{ij}^{-2} $$
       ⇒ Population sizes measured by $\pi_{O,i}$ and $\pi_{D,j}$, distance by $d_{ij}$
       ⇒ Demographic gravitational constant $\gamma$
     ◮ Unlike Newton’s law, no empirical or theoretical support here
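To give a feel for Stewart's specification $E[Z_{ij}] = \gamma \pi_{O,i} \pi_{D,j} d_{ij}^{-2}$, the sketch below tabulates the mean flows for a toy setting. The function name `stewart_mean_flows` and every number (population sizes, distances, $\gamma$) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def stewart_mean_flows(gamma, pi_o, pi_d, dist):
    """Return the matrix of means E[Z_ij] = gamma * pi_o[i] * pi_d[j] / dist[i][j]**2."""
    return [[gamma * pi_o[i] * pi_d[j] / dist[i][j] ** 2
             for j in range(len(pi_d))]
            for i in range(len(pi_o))]


# Hypothetical sizes and distances for 2 origins and 2 destinations.
pi_o = [10, 20]
pi_d = [5, 40]
dist = [[1, 2],
        [2, 4]]
mean_flows = stewart_mean_flows(2, pi_o, pi_d, dist)
```

Doubling a distance divides the corresponding mean flow by four, while doubling either population size doubles it.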

  13. Origin, destination and separation functions
     ◮ Multiple origin, destination and separation functions proposed
       ⇒ Motivated from sociophysics and economic utility theory
     ◮ Ex: power functions for $h_O(i)$ and $h_D(j)$, where for $\alpha, \beta \geq 0$
       $$ h_O(i) = (\pi_{O,i})^{\alpha} \quad \text{and} \quad h_D(j) = (\pi_{D,j})^{\beta} $$
     ◮ Ex: power function $h_S(c_{ij}) = c_{ij}^{-\theta}$, $\theta \geq 0$. General exponential form
       $$ h_S(c_{ij}) = \exp(\theta^T c_{ij}), \quad \theta, c_{ij} \in \mathbb{R}^K $$
     ◮ Convenient for inference of model parameters, since
       $$ \log E[Z_{ij}] = \log \gamma + \alpha \log \pi_{O,i} + \beta \log \pi_{D,j} + \theta^T c_{ij} $$
       ⇒ Log-linear form facilitates standard regression software
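The log-linear identity can be checked numerically with a small sketch. The function names and parameter values are hypothetical. The example also writes a power-law separation $d^{-2}$ in the exponential form by taking $c_{ij} = \log d_{ij}$ and $\theta = -2$.

```python
import math


def gravity_mean(gamma, pi_o, pi_d, c, alpha, beta, theta):
    """E[Z_ij] = gamma * pi_o^alpha * pi_d^beta * exp(theta^T c)."""
    separation = math.exp(sum(t * x for t, x in zip(theta, c)))
    return gamma * pi_o ** alpha * pi_d ** beta * separation


def log_gravity_mean(gamma, pi_o, pi_d, c, alpha, beta, theta):
    """Same quantity on the log scale, written as a linear predictor."""
    return (math.log(gamma) + alpha * math.log(pi_o) + beta * math.log(pi_d)
            + sum(t * x for t, x in zip(theta, c)))


# Hypothetical values; power-law separation d^(-2) expressed as exp(theta * log d).
d = 50.0
args = dict(gamma=0.5, pi_o=1200.0, pi_d=800.0,
            c=[math.log(d)], alpha=1.0, beta=0.8, theta=[-2.0])
mean_flow = gravity_mean(**args)
log_mean_flow = log_gravity_mean(**args)
```

Because the log mean is linear in $\log \gamma$, $\alpha$, $\beta$ and $\theta$, these parameters can be treated as regression coefficients.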

  14. Example: Austrian phone-call data
     ◮ Q: Structure of telecommunication interactions among populations?
       ⇒ Planning for government (de)regulation of the sector
       ⇒ Predict influence of technologies in regional development
     ◮ Gravity models to model telecommunication patterns as flows
     ◮ Data for phone-call traffic among 32 Austrian districts in 1991
       ⇒ $32 \times 31 = 992$ flow measurements $z_{ij}$, $i \neq j = 1, \dots, 32$
       ⇒ Gross regional product (GRP) per region → Size proxy
       ⇒ Road-based distance among regions → Separation proxy

  15. Phone-call data scatterplots
     [Figure: scatterplots of $Z_{ij}$ against $\mathrm{GRP}_i$, $\mathrm{GRP}_j$ and $d_{ij}$, all in $\log_{10}$ scale]
     ◮ Data (in $\log_{10}$ scale) suggest a gravity model of the form
       $$ E[Z_{ij}] = \gamma \, (\pi_{O,i})^{\alpha} (\pi_{D,j})^{\beta} (c_{ij})^{-\theta} $$
       ⇒ $\pi_{O,i} = \mathrm{GRP}_i$, $\pi_{D,j} = \mathrm{GRP}_j$, $c_{ij} = d_{ij}$ the road-based distance between $i$ and $j$
     ◮ Typical that flow volumes vary widely in scale

  16. Inference for gravity models
     ◮ Specified $Z_{ij}$ as independent Poisson RVs, with means $\mu_{ij} = E[Z_{ij}]$
       ⇒ ML for statistical inference in the general gravity model
     ◮ Let $\alpha_i = \log h_O(i)$, $\beta_j = \log h_D(j)$ and $\theta \in \mathbb{R}^K$. Will focus on
       $$ \log \mu_{ij} = \alpha_i + \beta_j + \theta^T c_{ij} $$
       ⇒ Log-linear model ∈ class of generalized linear models
     ◮ P. McCullagh and J. Nelder, Generalized Linear Models. CRC, 1989
     ◮ Given flow observations $Z = z$, the Poisson log-likelihood for $\mu$ is
       $$ \ell(\mu) = \sum_{i,j \in \mathcal{I} \times \mathcal{J}} \left( z_{ij} \log \mu_{ij} - \mu_{ij} \right) $$
       ⇒ Substitute the gravity model and maximize $\ell(\mu)$ for MLE
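A minimal sketch of the Poisson log-likelihood $\ell(\mu)$ follows, with flows and means stored as nested lists. The function name and the numbers are hypothetical. As in the formula, the term $-\log z_{ij}!$ is dropped because it does not depend on $\mu$.

```python
import math


def poisson_loglik(z, mu):
    """Return sum over (i, j) of z_ij * log(mu_ij) - mu_ij (constant terms dropped)."""
    return sum(z_ij * math.log(mu_ij) - mu_ij
               for z_row, mu_row in zip(z, mu)
               for z_ij, mu_ij in zip(z_row, mu_row))


# Hypothetical observed flows and two candidate mean matrices.
z = [[3, 1],
     [2, 4]]
mu_saturated = [[3.0, 1.0], [2.0, 4.0]]   # means equal to the observations
mu_constant = [[2.5, 2.5], [2.5, 2.5]]    # same total, no structure

ll_saturated = poisson_loglik(z, mu_saturated)
ll_constant = poisson_loglik(z, mu_constant)
```

Setting the means equal to the observations gives the largest attainable value, so any constrained model such as the gravity model scores at or below `ll_saturated`.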

  17. ML parameter estimates
     ◮ MLEs $\hat{\alpha} := \{\hat{\alpha}_i\}_{i \in \mathcal{I}}$, $\hat{\beta} := \{\hat{\beta}_j\}_{j \in \mathcal{J}}$ and $\hat{\theta}$ satisfy
       $$ \log \hat{\mu}_{ij} = \hat{\alpha}_i + \hat{\beta}_j + \hat{\theta}^T c_{ij}, \quad i, j \in \mathcal{I} \times \mathcal{J} \quad \Rightarrow \quad \log \hat{\mu} = M \hat{\gamma} $$
     ◮ Defined $\hat{\gamma} := [\hat{\alpha}^T \ \hat{\beta}^T \ \hat{\theta}^T]^T$, mean flow estimates $\hat{\mu}_{ij}$ solve
       $$ \sum_j \hat{\mu}_{ij} = z_{i+}, \ i \in \mathcal{I} \quad \text{and} \quad \sum_i \hat{\mu}_{ij} = z_{+j}, \ j \in \mathcal{J} $$
       $$ \sum_{i,j} c_{ij}(k) \, \hat{\mu}_{ij} = \sum_{i,j} c_{ij}(k) \, z_{ij}, \quad k = 1, \dots, K $$
     ◮ Unique MLE $\hat{\theta}$ under mild conditions, e.g., $\mathrm{rank}(M) = I + J + K - 1$
       ⇒ Values $\hat{\alpha}_i$, $\hat{\beta}_j$ unique only up to a constant
     ◮ A. Sen, “Maximum likelihood estimation of gravity model parameters,” J. Regional Science, vol. 26, pp. 461-474, 1986
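One way to solve these equations numerically, for a scalar separation attribute ($K = 1$), is sketched below. It alternates row and column scaling, which enforces the two margin equations, with a one-dimensional Newton step on $\theta$ for the cost equation. This alternating scheme is an illustrative choice and is not taken from the slides or from Sen's paper. The function name, starting values and toy data are hypothetical, and all flows are assumed positive.

```python
import math


def fit_gravity(z, c, max_iter=10000, tol=1e-12):
    """Fit log mu_ij = alpha_i + beta_j + theta * c_ij by alternating updates.

    Returns (a, b, theta, mu) with a_i = exp(alpha_i) and b_j = exp(beta_j).
    """
    n_i, n_j = len(z), len(z[0])
    pairs = [(i, j) for i in range(n_i) for j in range(n_j)]
    row_tot = [sum(row) for row in z]            # z_{i+}
    col_tot = [sum(col) for col in zip(*z)]      # z_{+j}
    a, b, theta = [1.0] * n_i, [1.0] * n_j, 0.0
    for _ in range(max_iter):
        w = [[math.exp(theta * c[i][j]) for j in range(n_j)] for i in range(n_i)]
        # Rescale origins so fitted row sums equal z_{i+}.
        a = [row_tot[i] / sum(b[j] * w[i][j] for j in range(n_j)) for i in range(n_i)]
        # Rescale destinations so fitted column sums equal z_{+j}.
        b = [col_tot[j] / sum(a[i] * w[i][j] for i in range(n_i)) for j in range(n_j)]
        mu = [[a[i] * b[j] * w[i][j] for j in range(n_j)] for i in range(n_i)]
        # Newton step on theta for sum c_ij mu_ij = sum c_ij z_ij.
        grad = sum(c[i][j] * (z[i][j] - mu[i][j]) for i, j in pairs)
        hess = sum(c[i][j] ** 2 * mu[i][j] for i, j in pairs)
        step = grad / hess
        theta += step
        if abs(step) < tol:
            break
    mu = [[a[i] * b[j] * math.exp(theta * c[i][j]) for j in range(n_j)]
          for i in range(n_i)]
    return a, b, theta, mu


# Hypothetical noise-free data generated from the model, so the MLE recovers theta.
theta_true = -0.5
a_true, b_true = [10.0, 20.0, 15.0, 5.0], [4.0, 2.0, 6.0, 3.0]
c = [[abs(i - j) for j in range(4)] for i in range(4)]
z = [[a_true[i] * b_true[j] * math.exp(theta_true * c[i][j]) for j in range(4)]
     for i in range(4)]
a_hat, b_hat, theta_hat, mu_hat = fit_gravity(z, c)
```

Only the products $a_i b_j$ are identified: multiplying every $a_i$ by a constant and dividing every $b_j$ by it leaves the fitted means unchanged, which is the "unique only up to a constant" remark in code form. In practice a Poisson GLM routine from a statistics package would be the usual tool.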
