Welcome to DS504/CS586: Big Data Analytics. Introduction & Logistics. Prof. Yanhua Li. Time: 6:00pm–8:50pm, Thursday. Location: KH 116. Fall 2017
Who am I? Yanhua Li, PhD. Assistant Professor, Computer Science & Data Science. PhD, Computer Science, U of Minnesota, 2013. PhD, Electrical Engineering, BUPT, 2009. Research Interests: Big data analytics, Smart Cities, Measurement, Spatio-temporal Data Mining. Industrial Experience: Bell-Labs, Microsoft Research, HUAWEI research Labs
What is DS504/CS586 about?
• A second-level DS/CS course, primarily for graduate students
• Intended first for CS/DS Ph.D. students in big data analytics and related areas;
• then for other Ph.D. or MS students with experience in databases and/or data mining, or equivalent knowledge.
• Sufficient programming experience is expected, so that you are comfortable undertaking a course project.
Introduction: What is “Big Data”?
Big Data – What is it? • A “big” buzzword… • No single standard definition… • Talk to 1000 people, and there will be 1000 “definitions”… “Big Data” is data whose scale, diversity, complexity, and/or quality require new architectures, techniques, algorithms, analytics, and interfaces to manage it and extract value and hidden knowledge from it…
Why Now? Big Data and Big Challenges
Big Data • Volume • Variety • Velocity • Veracity
Big Data • Volume • Variety • Velocity Thanks: http://www-01.ibm.com/software/data/bigdata/images/4-Vs-of-big-data.jpg
4Vs
The Model Has Changed… The old model of generating and consuming data: a few privileged companies generate and “own” the data; all others consume it (in controlled packages)
The Model Has Changed… The new model of generating and consuming data. Producers: • Everyone (man, woman, and child) and devices. Consumers: • Professionals • Businesses • Scientists • And us. Everyone wants a piece of this pie…
What Sectors Can Benefit? • Businesses • Transportation • Science & Engineering • Governments • Energy • Healthcare • Education • Entertainment. Utilize data to improve people’s quality of life
Big Data Analytics: techniques and tools for managing, analyzing, and extracting knowledge from “big data”
Roadmap
1. Intro to Big Data Analytics
   • 5-minute break
2. Logistics
   • 10-minute break, talk to other students
3. Application stories
Self-intro (and group forming)
Hand in your survey
Email you for permission or not
You will need to find your team and let me know
Done with the high level introduction Begin with application stories
Big Challenges in Big Cities
Big Data in Cities
Urban Computing: tackle the big challenges in big cities using big data! Win-win-win for people, cities, and the environment.
[Framework diagram, four layers]
• Service Providing: improve urban planning, ease traffic congestion, save energy, reduce air pollution, ...
• Urban Data Analytics: data mining, machine learning, visualization
• Urban Data Management: spatio-temporal index, streaming, trajectory, and graph data management, ...
• Urban Sensing & Data Acquisition: participatory sensing, crowd sensing, mobile sensing
Data sources: human mobility, air quality, meteorology, social media, road networks, energy, POIs, traffic
Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology.
Urban Computing framework: challenges in Urban Sensing & Data Acquisition
• Data sparsity and missing data
• Skewed sample distribution
• Limited resources
[Framework diagram: Service Providing; Urban Data Analytics; Urban Data Management; Urban Sensing & Data Acquisition]
Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology.
Urban Sensing: a sample of data → an entire dataset
• Data sparsity and missing data
• Biased distribution
[Figures: taxi flow vs. entire traffic flow; map of air quality monitoring stations, labeled S1 to S22]
Inferring Gas Consumption and Pollution Emission of Vehicles throughout a City. KDD 2014.
Zheng, Y., et al. U-Air: When Urban Air Quality Inference Meets Big Data. KDD 2013.
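One simple way to picture going from a sparse sample to an entire dataset is spatial interpolation. The sketch below uses inverse distance weighting (IDW), a classical baseline; it is not the method of the papers cited on this slide. The function name `idw_estimate`, the `(x, y, value)` station tuples, and the `power` parameter are assumptions made for illustration.

```python
import math

def idw_estimate(stations, query, power=2.0):
    """Estimate a reading at `query` from sparse (x, y, value) station readings.

    Hypothetical helper: closer stations get larger weights (1 / distance**power).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - query[0], y - query[1])
        if distance == 0.0:
            # The query sits exactly on a station: use its reading directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

A location midway between two stations receives the average of their readings; a baseline like this ignores the biased station distribution noted on the slide, which is part of why inference from sparse urban data is hard.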
Urban Sensing: a limited resource (budget, labor, land, …)
• Static sensing: where to deploy sensors to maximize the gain?
• Crowdsensing: how to arrange the incentives dynamically?
[Figure: map of monitoring stations, labeled S1 to S22]
Suggesting locations for monitoring stations, KDD 2015
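The static sensing question (where to deploy a limited number of sensors to maximize the gain) can be sketched as a greedy coverage problem. This is a generic heuristic under stated assumptions, not the approach of the KDD 2015 work cited on the slide: `greedy_placement`, the `coverage` mapping from candidate site to the set of regions it observes, and `budget` are all hypothetical.

```python
def greedy_placement(coverage, budget):
    """Pick up to `budget` sites, each time taking the site that adds the most
    not-yet-covered regions. `coverage` maps site name -> set of region ids."""
    chosen = []
    covered = set()
    for _ in range(budget):
        best_site = None
        best_gain = 0
        for site in sorted(coverage):  # sorted for deterministic tie-breaking
            if site in chosen:
                continue
            gain = len(coverage[site] - covered)
            if gain > best_gain:
                best_site, best_gain = site, gain
        if best_site is None:
            break  # no remaining site adds new coverage
        chosen.append(best_site)
        covered |= coverage[best_site]
    return chosen, covered
```

With sites `A` covering regions {1, 2, 3}, `B` covering {3, 4}, and `C` covering {4, 5, 6, 7}, a budget of two selects `C` and then `A`.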
Improving Medical Emergency Services using Big Data. Save 30+% time!
[Diagram: dispatching center, ambulance stations, patients, hospital]
• Select locations for ambulance stations
• Dynamic ambulance allocation
Yilun Wang, Yu Zheng, et al. Travel Time Estimation of a Path using Sparse Trajectories. KDD 2014.
Location Selection for Ambulance Stations: A Data-Driven Approach. ACM SIGSPATIAL 2015.
Urban Computing framework: challenges in Urban Data Management
• Management in spatio-temporal spaces
• Multi-modality data
• Dynamic, high velocity and volume
[Framework diagram: Service Providing; Urban Data Analytics; Urban Data Management; Urban Sensing & Data Acquisition]
Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology.
Urban Data Management
• Managing multi-modality data
  – Categorical and numeric data
  – Different scales, densities, updating frequency, and ST properties
• Dynamic and big volume
  – Group query strategy
  – Computing in parallel
Data types (rows: structure; columns: Spatio-temporal Static Data | Spatial Static, Temporal Dynamic Data | Spatio-Temporal Dynamic Data)
Point-Based: POI Distributions | Weather/AQI Station Data | Spatial-temporal Crowd Sourcing Data
Network-Based: Road/Transportation Networks | Road Traffic Data | Trajectory Data
Yu Zheng. Trajectory Data Mining: An Overview. ACM Transactions on Intelligent Systems and Technology (ACM TIST). 2015
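As a rough illustration of managing point data in spatio-temporal spaces, the sketch below buckets points into a uniform grid over space and time so that a range query only inspects nearby buckets. It is a minimal teaching example, not a structure taken from the cited paper; the class `GridIndex` and its parameters `cell_size` and `slot_seconds` are hypothetical.

```python
from collections import defaultdict

class GridIndex:
    """Uniform spatio-temporal grid: points are bucketed by (x cell, y cell, time slot)."""

    def __init__(self, cell_size, slot_seconds):
        self.cell_size = cell_size
        self.slot_seconds = slot_seconds
        self.buckets = defaultdict(list)

    def _key(self, x, y, t):
        return (int(x // self.cell_size),
                int(y // self.cell_size),
                int(t // self.slot_seconds))

    def insert(self, x, y, t, item):
        self.buckets[self._key(x, y, t)].append((x, y, t, item))

    def query(self, x_min, x_max, y_min, y_max, t_min, t_max):
        """Return items whose point lies inside the (inclusive) space-time box."""
        kx0, ky0, kt0 = self._key(x_min, y_min, t_min)
        kx1, ky1, kt1 = self._key(x_max, y_max, t_max)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for kt in range(kt0, kt1 + 1):
                    # .get avoids creating empty buckets while scanning
                    for x, y, t, item in self.buckets.get((kx, ky, kt), []):
                        if (x_min <= x <= x_max and y_min <= y <= y_max
                                and t_min <= t <= t_max):
                            hits.append(item)
        return hits
```

Because buckets are independent, a design like this also lends itself to processing cells in parallel, which is one reason grid partitioning is a common starting point for large, dynamic data.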
Urban Computing framework: challenges in Urban Data Analytics
• Texts and images → spatial and spatio-temporal data
• A single data source → data across different domains
• Separate data mining algorithms → machine learning + data management
[Framework diagram: Service Providing; Urban Data Analytics; Urban Data Management; Urban Sensing & Data Acquisition]
Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology.
Data Integration vs Knowledge Fusion
A) Paradigm of the conventional data fusion: Datasets A, B, C → schema mapping → duplicate detection → data merge → object in Domain S
Cross-domain data fusion: Datasets A, B, C (Domains A, B, C) → knowledge extraction per dataset → knowledge fusion → latent object
Yu Zheng. Methodologies for Cross-Domain Data Fusion: An Overview. IEEE Transactions on Big Data, 1, 1, 2015.
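The conventional pipeline on this slide (schema mapping, duplicate detection, data merge) can be sketched in a few lines. This is a simplified illustration under stated assumptions: records are plain dictionaries, duplicates are detected by exact equality of a shared key field, and the names `map_schema`, `integrate`, `mappings`, and `key` are hypothetical. Real duplicate detection is usually fuzzier than an exact key match.

```python
def map_schema(record, mapping):
    """Schema mapping: rename source fields to the unified schema, dropping unmapped ones."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}

def integrate(datasets, mappings, key):
    """Merge records from several datasets into one object per `key` value.

    datasets: dataset name -> list of record dicts
    mappings: dataset name -> {source field: unified field}
    """
    merged = {}
    for name, records in datasets.items():
        for record in records:
            unified = map_schema(record, mappings[name])
            identity = unified[key]  # duplicate detection by exact key match
            merged.setdefault(identity, {}).update(unified)  # data merge
    return merged
```

Cross-domain knowledge fusion differs in that each dataset describes a different domain, so there is no shared schema to map onto; knowledge is extracted from each dataset first and then combined.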
Multi-View-Based Learning
Urban Computing for Urban Planning. Best Paper Nominee Award at UbiComp 2011. The Most Cited Paper
City-Wide Traffic Modeling • Partition a city into regions with major roads • Regions are root causes of the problem. Yu Zheng, et al. Urban Computing with Taxicabs. In Proc. of UbiComp 2011
Shanghai Big Data Hotpot Restaurant
When Urban Air Meets Big Data KDD 2013 http://urbanair.msra.cn/
Air Pollution: A Global Concern! Air quality monitor stations measure PM2.5, PM10, NO2, SO2, CO, O3.
[Figure: map of a 50 km x 40 km area with air quality monitor stations, labeled S1 to S22]