Analyzing Big Data From Complex Systems: Smart Cards in Urban Transportation Networks • Soong Moon Kang, School of Management, University College London (smkang@ucl.ac.uk) • The Institute for Korean Regional Studies, Seoul National University • September 6, 2016
Transport for London (TfL) Oyster Card (image: Wikicommons) • Introduced in 2003 • By June 2012: - More than 43 million cards issued - Used for more than 80% of all public transport journeys in London
Agenda: • Study 1: Patterns of Urban Movement • Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions • Study 3: Extensions of the Study on the Patterns of Urban Movement • Study 4: Extensions of the Study on the Effects of Disruptions • Discussion
Study 1: Patterns of Urban Movement • “Structure of Urban Movements: Polycentric Activity and Entangled Hierarchical Flows,” PLoS ONE, January 7, 2011, 6(1):e15923 (with Camille Roth, Michael Batty and Marc Barthélémy)
Data: • March 31, 2008 to April 6, 2008 (1 week) - 11.22 million journeys (trips) - 2.03 million individual users (IDs) • Information for each ID: - time and location of tap-in and tap-out (individual movements)
Descriptives: Distribution of travel distances can be fitted with a negative binomial distribution [Figure: distribution of distances between stations vs. distribution of journeys; annotated value: 9.28 km]
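The slide states that journey distances can be fitted with a negative binomial distribution but does not say how the fit was done. Below is a minimal sketch that assumes distances are first rounded to whole kilometres, because the negative binomial is a discrete distribution. It uses method-of-moments estimates, which are not necessarily what the study used. The function name `fit_negative_binomial_mom` and the sample distances are hypothetical.

```python
import statistics

def fit_negative_binomial_mom(distances_km):
    """Method-of-moments fit of a negative binomial (r, p) to distances.

    Hypothetical helper. Distances are rounded to integer kilometres.
    With mean = r(1-p)/p and variance = r(1-p)/p^2, the moments give
    p = mean/var and r = mean^2/(var - mean). This requires var > mean
    (overdispersion); otherwise the negative binomial is not appropriate.
    """
    counts = [round(d) for d in distances_km]  # discretize to whole km
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    if var <= mean:
        raise ValueError("data are not overdispersed; NB fit undefined")
    p = mean / var
    r = mean * mean / (var - mean)
    return r, p

# Hypothetical journey distances in km, for illustration only.
sample = [1.2, 2.8, 3.1, 4.9, 5.5, 6.0, 7.4, 8.8, 9.9, 12.3, 15.7, 21.4, 30.2]
r, p = fit_negative_binomial_mom(sample)
print(f"r = {r:.2f}, p = {p:.3f}, implied mean = {r * (1 - p) / p:.2f} km")
```

By construction, the fitted distribution reproduces the sample mean and variance exactly. A maximum-likelihood fit would usually be preferred for a final analysis.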
Descriptives: Travel propensity: actual flow ($w_{ij}$) vs. random simulation (given the in- and out-flows at stations), i.e., a null model of randomized journeys
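The null model described here can be sketched under some stated assumptions. Journeys are taken as (origin, destination) pairs. The randomization shuffles destinations across all journeys, which keeps every station's in-flow and out-flow fixed. Travel propensity is then taken as the ratio of the observed flow w_ij to its average randomized flow. The function names, the number of runs and this particular shuffling scheme are hypothetical illustrations, not the study's documented procedure.

```python
import random
from collections import Counter

def randomized_flows(journeys, n_runs=100, seed=0):
    """Average flow per (origin, destination) pair under a null model.

    Hypothetical helper. Shuffling destinations across journeys preserves
    each station's total out-flow (tap-ins) and in-flow (tap-outs) but
    destroys the origin-destination pairing. Note that this scheme can
    create i -> i pairs; a stricter null model might exclude them.
    """
    rng = random.Random(seed)
    origins = [o for o, _ in journeys]
    dests = [d for _, d in journeys]
    totals = Counter()
    for _ in range(n_runs):
        rng.shuffle(dests)  # in-place shuffle of the destination list
        totals.update(zip(origins, dests))
    return {pair: c / n_runs for pair, c in totals.items()}

def travel_propensity(journeys, n_runs=100, seed=0):
    """Ratio of observed flow w_ij to its average under the null model.

    Pairs never produced by the randomization are skipped, because their
    ratio would be undefined.
    """
    observed = Counter(journeys)
    expected = randomized_flows(journeys, n_runs, seed)
    return {pair: w / expected[pair]
            for pair, w in observed.items() if pair in expected}

journeys = [("A", "B")] * 5 + [("B", "A")] * 3 + [("A", "C")] * 2 + [("C", "B")] * 2
print(travel_propensity(journeys, n_runs=50, seed=1))
```

A propensity above 1 marks a pair that is travelled more often than the station totals alone would predict.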
Descriptives: Flow distribution: the normalized histogram of the flows of individuals, $w_{ij}$ (flow of passengers between stations $i$ and $j$), follows a power law with exponent ≈ 1.3, indicating strong heterogeneity of individual movements
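The slide does not say how the exponent ≈ 1.3 was estimated. As a swapped-in illustration, the sketch below uses the standard continuous maximum-likelihood estimator for a power-law tail, α̂ = 1 + n / Σ ln(x_i / x_min). It checks the estimator on synthetic data drawn with α = 1.3. The choice of x_min, the function name and the synthetic data are assumptions. Real flows are integers, so a discrete estimator may be preferable.

```python
import math
import random

def powerlaw_exponent_mle(values, x_min):
    """Continuous MLE of alpha for P(x) ~ x^-alpha, x >= x_min.

    Hypothetical helper. alpha_hat = 1 + n / sum(ln(x / x_min)) over the
    tail values, with standard error (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    s = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + n / s
    return alpha, (alpha - 1.0) / math.sqrt(n)

# Synthetic check: inverse-transform sampling from a continuous power law,
# using x = x_min * (1 - u)^(-1 / (alpha - 1)) with u uniform on [0, 1).
rng = random.Random(42)
alpha_true, x_min = 1.3, 1.0
synthetic = [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0))
             for _ in range(20000)]
alpha_hat, se = powerlaw_exponent_mle(synthetic, x_min)
print(f"alpha_hat = {alpha_hat:.3f} +/- {se:.3f}")
```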
Descriptives: Distribution of total flows: Zipf plot for morning peak hours (7am–10am) • Exponential decay: most of the total flow is concentrated on a few stations
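One common way to check for exponential decay on a Zipf (rank) plot is to regress log flow on rank and look for a straight line. The sketch below does this with ordinary least squares and also computes the share of total flow carried by the busiest stations. It assumes that total morning-peak flow per station has already been computed and is strictly positive. The function names are hypothetical, and this is not necessarily how the study fitted its curve.

```python
import math

def zipf_exponential_fit(station_flows):
    """Fit log(flow) = a - b * rank to rank-ordered station totals.

    Hypothetical helper. Returns (a, b, ranked), where ranked holds the
    flows in decreasing order (the Zipf plot) and b is the decay rate per
    rank. Flows must be > 0 because of the logarithm.
    """
    ranked = sorted(station_flows, reverse=True)
    ranks = list(range(1, len(ranked) + 1))
    logs = [math.log(f) for f in ranked]
    n = len(ranks)
    mean_r = sum(ranks) / n
    mean_l = sum(logs) / n
    cov = sum((r - mean_r) * (l - mean_l) for r, l in zip(ranks, logs))
    var = sum((r - mean_r) ** 2 for r in ranks)
    slope = cov / var  # OLS slope of log flow on rank
    return mean_l - slope * mean_r, -slope, ranked

def top_share(station_flows, k):
    """Fraction of the total flow carried by the k busiest stations."""
    ranked = sorted(station_flows, reverse=True)
    return sum(ranked[:k]) / sum(ranked)
```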
Polycenters: Identifying polycenters: 1. Arrange stations in decreasing order of inflow, which defines centers in decreasing order of importance 2. Account for geographical proximity: aggregate into the defined center all stations within a distance of 1,500 meters 3. Continue until a large percentage of the total flow (60%) is captured
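The three steps can be sketched as a greedy procedure. The slide does not specify whether the 1,500 m radius is measured from the seed station or from any station already in the center. This sketch assumes it is measured from the seed. It also assumes station coordinates are already in metres on a projected plane. The function name and the toy data in the usage lines are hypothetical.

```python
import math

def identify_polycenters(stations, radius_m=1500.0, target_share=0.60):
    """Greedy polycenter detection (steps 1 to 3 on the slide).

    Hypothetical helper. stations: dict name -> (x_m, y_m, inflow).
    Each center is seeded by the unassigned station with the largest
    inflow and absorbs every unassigned station within radius_m of that
    seed (an assumption). Stops once the assigned stations capture
    target_share of the total inflow.
    """
    total = sum(v[2] for v in stations.values())
    order = sorted(stations, key=lambda s: stations[s][2], reverse=True)
    assigned = set()
    centers = []
    captured = 0.0
    for seed in order:
        if captured >= target_share * total:
            break
        if seed in assigned:
            continue
        sx, sy, _ = stations[seed]
        members = [s for s in order
                   if s not in assigned
                   and math.hypot(stations[s][0] - sx, stations[s][1] - sy) <= radius_m]
        assigned.update(members)
        captured += sum(stations[s][2] for s in members)
        centers.append((seed, members))
    return centers, captured / total

toy = {"A": (0, 0, 100), "B": (800, 0, 40), "C": (5000, 0, 80),
       "D": (5500, 0, 10), "E": (20000, 0, 20), "F": (30000, 0, 5)}
print(identify_polycenters(toy))
```

In the toy data, A absorbs B and C absorbs D. Together these two centers capture 230 of 255 units of inflow, which passes the 60% threshold, so E and F remain unassigned.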
Polycenters: Hierarchical organization
Polycenters: [Figure: map of the identified centers: Northern Stations, West End, Western Stations, City, Docklands, West London, Museums, Parliament, Mid-Town, Government]
Polycenters: Anisotropy - Use the random simulation from the travel-propensity analysis to study the relative orientation of incoming flows - The anisotropy measure equals 1 if there is no directional bias (fully isotropic)
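The slide does not give the exact anisotropy formula, so the following is only one hypothetical way to operationalize it. Incoming flows to a center are split into direction sectors according to where the origin station lies. Observed flow in each sector is then compared with the flow expected under the random simulation. A ratio of 1 in every sector corresponds to no directional bias. The sector count, the function name and the input format are assumptions.

```python
import math

def sector_ratios(center_xy, observed, expected, n_sectors=8):
    """Observed/expected incoming flow per direction sector around a center.

    Hypothetical helper. observed, expected: dict origin_xy -> flow into
    the center, where expected comes from a randomized (null-model)
    simulation. A ratio of 1 in every sector means isotropic inflow.
    Sectors with zero expected flow return NaN.
    """
    cx, cy = center_xy
    width = 2 * math.pi / n_sectors
    obs = [0.0] * n_sectors
    exp = [0.0] * n_sectors
    for flows, acc in ((observed, obs), (expected, exp)):
        for (x, y), w in flows.items():
            # Angle of the origin as seen from the center, in [0, 2*pi).
            angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
            acc[int(angle // width) % n_sectors] += w
    return [o / e if e > 0 else float("nan") for o, e in zip(obs, exp)]

observed = {(1, 1): 20, (-1, 1): 10, (-1, -1): 10, (1, -1): 10}
expected = {(1, 1): 10, (-1, 1): 10, (-1, -1): 10, (1, -1): 10}
print(sector_ratios((0, 0), observed, expected, n_sectors=4))
```

In this toy example, the north-east sector receives twice its expected inflow, and the other sectors match the null model.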
Polycenters:
Structure of Flows: How flows from single stations (sources) go to centers [Figure legend: squares = sources (single stations); circles = centers; grey = 20% of total inflow; red = 40% of total inflow]
Structure of Flows: Proportion of links going from sources to centers, by center group (Group I, Group II, Group III) • For more than 80% of the sources, the most important link (1st link) connects to a center of Group I • For more than 80% of the sources, the least important link (10th link) connects to a center of Group III
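The proportions on this slide can be computed by ranking each source's links to centers by flow and counting the group reached by the k-th link, for k = 1 to 10. The sketch below assumes flows from single stations to centers are given as a dictionary and that each center has already been assigned a group label. The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def kth_link_group_shares(flows, center_group, k):
    """Share of sources whose k-th most important link goes to each group.

    Hypothetical helper. flows: dict (source, center) -> flow.
    center_group: dict center -> group label (e.g. "I", "II", "III").
    Each source's links are ranked by decreasing flow. Sources with fewer
    than k links are skipped.
    """
    by_source = defaultdict(list)
    for (src, ctr), w in flows.items():
        by_source[src].append((w, ctr))
    counts = Counter()
    n = 0
    for links in by_source.values():
        links.sort(reverse=True)  # most important link first
        if len(links) >= k:
            counts[center_group[links[k - 1][1]]] += 1
            n += 1
    return {g: c / n for g, c in counts.items()} if n else {}

flows = {("s1", "c1"): 50, ("s1", "c3"): 5, ("s2", "c1"): 30,
         ("s2", "c2"): 20, ("s3", "c2"): 40, ("s3", "c3"): 1}
groups = {"c1": "I", "c2": "II", "c3": "III"}
print(kth_link_group_shares(flows, groups, 1))
print(kth_link_group_shares(flows, groups, 2))
```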
Study 1: Patterns of Urban Movement • Contributions: - application of complex-systems analytical tools to novel data - a new approach to determining polycenters - an attempt to model the hierarchical nature of urban movements • Limitations: - exploratory - naive
Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions • “Predicting Traffic Volumes and Estimating the Effects of Shocks in Massive Transportation Systems,” Proceedings of the National Academy of Sciences (PNAS), May 5, 2015, 112(18):5643–5648 (with Ricardo Silva and Edoardo M. Airoldi) • Introducing statistical analysis into complex systems
Data: • February 2011 to February 2012 - 70 weekdays and 25 weekend days - 211 million journeys (trips) - 10.7 million individual users (IDs) - 1.71 journeys per user per day - 1.76 million users per day - 3 million journeys per day - 374 stations open during the period (Underground + Overground + DLR)
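The per-day figures on this slide are consistent with one another. Multiplying users per day by journeys per user reproduces the daily journey count. Dividing the 211 million journeys by the 70 weekdays gives a similar figure. This suggests that the journey total covers the weekdays, although the slide does not state it.

$$
1.76 \times 10^{6}\,\frac{\text{users}}{\text{day}} \times 1.71\,\frac{\text{journeys}}{\text{user}} \approx 3.01 \times 10^{6}\,\frac{\text{journeys}}{\text{day}},
\qquad
\frac{211 \times 10^{6}\ \text{journeys}}{70\ \text{weekdays}} \approx 3.01 \times 10^{6}\,\frac{\text{journeys}}{\text{day}}
$$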
Data: • Weekdays only
Statistical Model: Basic Idea:
Statistical Model: Basic Idea: [Diagram: data sources (Smart Card Data, Network Structure Data, Disruption Logs, Passenger Route Surveys) and two model components (“Natural Regime” Model, “Disruption” Model)]
“Natural Regime” Model: Basic Idea:
“Natural Regime” Model: Assessment: - Fivefold cross-validation (i.e., 14 of the 70 weekdays held out as test data in each fold): test whether the fine-grained model with 374 × 374 ≈ 140,000 components overfits compared with the fully aggregated (black-box) models, and under which conditions the fine-grained model does better
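With 70 weekdays and five folds, each fold holds out 14 test days and trains on the remaining 56. The fine-grained model has one component per origin-destination pair (374 × 374 = 139,876). The sketch below assumes that folds are contiguous blocks of days. The study may have assigned days to folds differently, for example at random. The function name is hypothetical.

```python
def fivefold_day_splits(days, n_folds=5):
    """Yield (train, test) splits of a list of days for cross-validation.

    Hypothetical helper. Folds are contiguous blocks (an assumption).
    With 70 days and 5 folds, each test fold has 14 days and each
    training set has 56.
    """
    if len(days) % n_folds:
        raise ValueError("number of days must be divisible by n_folds")
    size = len(days) // n_folds
    for f in range(n_folds):
        test = days[f * size:(f + 1) * size]
        train = days[:f * size] + days[(f + 1) * size:]
        yield train, test

# Usage: day indices 0..69 stand in for the 70 weekdays.
for train, test in fivefold_day_splits(list(range(70))):
    print(len(train), len(test), test[0], test[-1])
```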
“Disruption” Model: Basic Model:
“Disruption” Model: Results: Average number of exits per minute at Victoria LU station on Tuesday, January 17, 2012. The blue curve is the 1-min-ahead prediction under the natural regime from the tracking model. Given a disruption from 6:00 PM to 7:00 PM between Victoria and Brixton stations on the Victoria line: - blue horizontal line: the average expected exit rate given by the tracking model under the natural regime - red horizontal line: the average observed exit count - black horizontal line: the prediction given by the disruption model
“Disruption” Model: Assessment: (A) Relative errors for line-segment events. The absolute error of the tracking model for line-segment disruptions ranges from 3.0 (all stations) to 12.2 (stations with 85 or more tap-outs per minute) persons per minute. (B) Relative errors for station events. The absolute error ranges from 3.5 (all stations) to 10.5 (stations with 75 or more tap-outs per minute) persons per minute.
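The absolute and relative errors reported here, broken down by station volume, can be computed along the following lines. This is a sketch under stated assumptions. Predictions and observations are given as exits per minute per station during an event. The volume threshold is applied to the observed rate, although the slide does not say which rate defines "stations with 85 or more tap-outs per minute". Relative error is taken as |predicted − observed| / observed. The function name and the toy numbers are hypothetical.

```python
def error_summary(predicted, observed, min_rate=0.0):
    """Mean absolute and mean relative error over stations.

    Hypothetical helper. predicted, observed: dict station -> exits per
    minute during an event. Only stations whose observed rate is at least
    min_rate are included (threshold choice is an assumption). Stations
    with zero observed exits are skipped for the relative error.
    """
    stations = [s for s in observed if observed[s] >= min_rate]
    abs_errs = [abs(predicted[s] - observed[s]) for s in stations]
    rel_errs = [abs(predicted[s] - observed[s]) / observed[s]
                for s in stations if observed[s] > 0]
    mae = sum(abs_errs) / len(abs_errs) if abs_errs else float("nan")
    mre = sum(rel_errs) / len(rel_errs) if rel_errs else float("nan")
    return mae, mre

pred = {"A": 12.0, "B": 3.0, "C": 0.0}
obs = {"A": 10.0, "B": 2.0, "C": 0.0}
print(error_summary(pred, obs))               # all stations
print(error_summary(pred, obs, min_rate=5))   # busy stations only
```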
Station Sensitivity Index: how sensitive stations are to line closures • Red dots: top 10% of stations by number of tap-outs
Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions • Contributions: - application of statistical and machine learning techniques to complex systems - a model that describes and predicts the effects of disruptions well • Limitation: - simplistic
Study 3: Extensions of the Study on the Patterns of Urban Movement • with Michael Batty, Hae Ran Shin, Ricardo Silva and Chen Zhong Introducing statistical analysis into the study of urban movement patterns
Study 3a: Passenger Travel Distributions Basic Idea:
Study 3a: Passenger Travel Distributions Basic Idea: [Figure: frequency vs. travel distance for passengers entering Station A and Station B]
Study 3a: Passenger Travel Distributions Some Research Questions: - Do the travel distributions of passengers entering specific stations reveal a more generic pattern? “local” versus “global” - If a generic pattern exists, how does it relate to urban geography? “center” versus “periphery”
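The slide poses "local versus global" as an open question. One hypothetical way to quantify it is to compare each station's travel-distance sample with the pooled sample of all journeys. The sketch below uses the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between two empirical CDFs. This is a swapped-in choice, not the study's method. The function names and the input format are assumptions.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (largest ECDF gap)."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both ECDFs are right-continuous step functions, so checking every
    # sample point covers every interval where the gap is constant.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def local_vs_global(distances_by_station):
    """KS distance between each station's travel distances and the pooled
    (global) sample. The pooled sample includes the station itself."""
    pooled = [x for xs in distances_by_station.values() for x in xs]
    return {s: ks_statistic(xs, pooled) for s, xs in distances_by_station.items()}

print(local_vs_global({"A": [1, 2], "B": [3, 4]}))
```

Small values would indicate that a station follows the generic pattern. Large values would flag stations whose passengers travel distinctively short or long distances.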
Study 3b: Passenger Travel Distributions and Geographic Socio-Economic Characteristics Basic Idea: - Correlate passenger travel distributions with geographic socio-economic characteristics such as income, education, age, employment and family composition.
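A minimal version of this correlation step could summarize each area's travel distribution by one number, such as mean journey distance. That summary is then correlated with one socio-economic attribute across areas. The sketch assumes both are keyed by the same hypothetical area identifiers and uses a plain Pearson correlation. The study may use richer summaries or models, and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlate_with_area_attribute(mean_distance_by_area, attribute_by_area):
    """Correlate each area's mean travel distance with one attribute
    (e.g. median income), using only areas present in both inputs."""
    areas = sorted(set(mean_distance_by_area) & set(attribute_by_area))
    return pearson_r([mean_distance_by_area[a] for a in areas],
                     [attribute_by_area[a] for a in areas])
```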
Study 3: Extensions of the Study on the Patterns of Urban Movement • Data: London and Seoul • Major challenges: - only one day of data from Seoul - obtaining fine-grained socio-economic data for Seoul
Study 4: Extensions of the Study on the Effects of Disruptions • with Ricardo Silva Refining the statistical analyses Ultimate goal: real-time assessment of effects of disruptions system-wide
Study 4a: Probabilistic and Causal Approaches Basic Idea:
Study 4a: Probabilistic and Causal Approaches Basic Ideas: - provide a full probabilistic model of movement inside the subway network - estimate the distribution (instead of only the expectation) of travel times, link loads and exit numbers given a disruption (causal inference)
Study 4b: Passenger-level modeling Basic Ideas: - model by taking into account the behaviour of individual travellers, instead of aggregated counts - collect fine-grained passenger movement data using mobile apps
Discussion