Pick up a handout on the front table
Welcome to DS504/CS586: Big Data Analytics (Review)
Prof. Yanhua Li
Time: 6:00pm–8:50pm R
Location: AK232
Fall 2016
Today
• 1. Review
  – Key topics and techniques discussed in the semester
• 2. Future opportunities
  – Big data analytics
  – Urban Computing
10 min break
• 3. Team 1 presentation
• 4. Course evaluation
• 5. Group discussion for final projects
Introduction: What is “Big Data”?
Big Data Analytics: techniques and tools for managing, analyzing and extracting knowledge from “big data”
CS586/DS504-2016Fall: course map
Layers:
• 1. Data Acquisition & Measurement: Representative data collection (Sampling)
• 2. Data Preprocessing/Cleaning: Error Correction, Map-Matching
• 3. Data Management: Indexing, Query Processing
• 4. Big Data Mining: Graph Mining, Data Clustering, Recommender systems, Deep Learning
• 5. Applications: Urban Computing, Social Network Analysis, Networking
Techniques:
• Sampling and index: 1. Graph Mining; 3. Index, Query; 4. Data Collection
• Clustering: 4. K-means, DBSCAN; 4. BFR, DENCLUE; 4. Trajectory Clustering
• More techniques: 2. Map-Matching; 4. Recommender Systems; 4. Deep Learning (Guest)
• 5. Urban: Bike sharing
Topics in Big Data Mining
• 1. Graph Mining
  – Graph Sampling
  – Node Importance Ranking
  – Facebook/Social graph estimation
  – Social influence
  – Topic-sensitive PageRank
• 2. Clustering
  – Hierarchical
  – K-means, BFR
  – DBSCAN, DENCLUE
  – Trajectory clustering
• 3. Recommender Systems
  – Content-Based
  – Collaborative Filtering
  – User-User Based
  – Item-Item Based
  – Location-based recommender systems
  – Personalized Geo-Social Recommendation
• 4. Deep Learning
  – Deep Neural Networks
  – AlphaGo
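As a reminder of how topic-sensitive PageRank differs from plain PageRank, here is a minimal sketch. It assumes a power iteration in which the random surfer teleports only to a chosen topic set rather than to all nodes. The graph `LINKS`, the function name, and the parameter values (`beta=0.85`, 100 iterations) are hypothetical choices for illustration, not taken from the lectures.

```python
# Toy directed graph: node -> list of out-links (hypothetical data).
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def topic_sensitive_pagerank(graph, topic_nodes, beta=0.85, iters=100):
    """Power iteration where teleportation lands only on topic_nodes."""
    nodes = list(graph)
    # Teleport distribution: uniform over the topic set, zero elsewhere.
    teleport = {n: (1.0 / len(topic_nodes) if n in topic_nodes else 0.0)
                for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1.0 - beta) * teleport[n] for n in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                # Follow an out-link with probability beta.
                share = beta * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dead end: send its mass back to the topic set.
                for n in nodes:
                    nxt[n] += beta * rank[u] * teleport[n]
        rank = nxt
    return rank

if __name__ == "__main__":
    for node, score in sorted(topic_sensitive_pagerank(LINKS, {"a"}).items()):
        print(node, round(score, 4))
```

With the topic set `{"a"}`, node `d` receives no rank because nothing links to it and the surfer never teleports there. Switching the topic set to `{"d"}` gives it a nonzero score.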
Roadmap
• 1. Sampling & Indexing
  – Random prefix / region / zoom-in sampling
  – Index structures: B-tree, Quad-tree, R-tree, etc.
• 2. Clustering
  – Hierarchical
  – K-means, DBSCAN
• 3. Recommender Systems, Deep Learning, Map-Matching, etc.
• 4. Applications
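For the clustering item, the following is a minimal sketch of K-means using Lloyd's iteration: assign each point to its nearest centroid, then recompute the centroids. It assumes squared Euclidean distance and initial centroids drawn from the input points. The function names, the seed, and the toy data are hypothetical.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2
                                 for a, b in zip(point, centroids[j])))

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(max_iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                # Keep the old centroid if a cluster lost all its points.
                updated.append(centroids[j])
        if updated == centroids:  # assignments can no longer change
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(pts, k=2)
    print(centers, labels)
```

K-means needs `k` up front and favors compact, roughly spherical clusters. DBSCAN, by contrast, derives the number of clusters from density parameters.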
Sampling Big Data
• 1.1 Random sampling (uniform & independent)
  – vertex sampling
  – edge sampling
• 1.2 Crawling
  – BFS sampling
  – random walk sampling
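The four sampling strategies can be sketched on a toy graph as follows. This is an illustrative sketch only: the adjacency list `GRAPH`, the function names, and the choice to sample with replacement are all assumptions made for the example.

```python
import random
from collections import deque

# Toy undirected graph as an adjacency list (hypothetical data).
GRAPH = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2, 4],
    4: [3],
}

def vertex_sample(graph, k, rng):
    """Uniform, independent vertex sampling (with replacement)."""
    nodes = list(graph)
    return [rng.choice(nodes) for _ in range(k)]

def edge_sample(graph, k, rng):
    """Uniform, independent edge sampling (with replacement)."""
    edges = [(u, v) for u in graph for v in graph[u] if u < v]
    return [rng.choice(edges) for _ in range(k)]

def bfs_sample(graph, start, k):
    """Crawl breadth-first from start; return the first k nodes visited."""
    seen, order, queue = {start}, [], deque([start])
    while queue and len(order) < k:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

def random_walk_sample(graph, start, k, rng):
    """Simple random walk: move to a uniformly chosen neighbor each step."""
    walk, u = [start], start
    for _ in range(k - 1):
        u = rng.choice(graph[u])
        walk.append(u)
    return walk

if __name__ == "__main__":
    rng = random.Random(42)
    print(vertex_sample(GRAPH, 3, rng))
    print(edge_sample(GRAPH, 3, rng))
    print(bfs_sample(GRAPH, 0, 4))
    print(random_walk_sample(GRAPH, 0, 6, rng))
```

Random sampling needs the full node or edge list, while the crawling methods only need neighbor queries. The price is bias: a simple random walk visits high-degree nodes more often, so estimates built from it usually need reweighting.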
Class Outcomes
What is DS504/CS586 about?
• We’ll learn about
  – Advanced techniques for Big Data Analytics
    • Large-scale data sampling and estimation
    • Data cleaning
    • Graph data mining
    • Data management, clustering, etc.
  – Applications of Big Data Analytics
    • Urban Computing
    • Social network analysis
    • Recommender systems, etc.
• Learning outcomes
  – Understand and explain challenges and advances in the state of the art in big data analytics.
  – Design, develop and fully execute a big data analytics project.
  – Communicate the ideas effectively to a technical audience in the form of a presentation and written documents.
Project 1 (Single Data Source)
• T1: Allstate Claim Prediction Challenge
• T2: Predicting YouTube 3D Videos Trends
• T3: Sampling Method for Sum Aggregation of Points of Interest on Map
• T4: Mining of Stack Overflow reviews for insights
• T5: Measuring restaurant diversity index for different cities
• T6: GitHub – Sizing up online social networks
Project 2 (Heterogeneous Data)
• T1: Restaurants Location Recommendation
• T2: Online learning performance vs Offline Geographic Information
• T3: Airbnb user behavior prediction
• T4: Community detection in large networks
• T5: Demand-Supply analysis on Regional Restaurant Distribution
• T6: Social Network Marketing through Influence Prediction
• Real application problems
• Data collection/processing/management/mining/evaluation/visualization
Workload
• Focus more on critical thinking, problem solving, “heads-on/hands-on” experiences!
• Understand, formulate and solve problems
• Read and critique research papers
• Two course projects
• Oral presentation
• Team work
• Coding, logistics
Workload and Grading
• Grading
  – Projects (40%)
    • Project 1 (10%)
    • Project 2 (30%)
      – Final reports in the discussion forum (by 11:59pm 12/13)
      – Self-and-peer evaluation form for Project 2 (by 11:59pm 12/13)
  – Written work (30%)
    • Critiques + project reports (20%)
    • Quizzes (10%, 5% each)
  – Oral work (30%)
    • Presentation
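The percentages above add up to 100%. A minimal sketch of the weighted total follows, assuming each component is scored out of 100. The dictionary keys and the function name are hypothetical, and how the course actually combines or rounds scores is not stated here.

```python
# Component weights as listed on the slide (keys are hypothetical labels).
WEIGHTS = {
    "project_1": 0.10,
    "project_2": 0.30,
    "critiques_and_reports": 0.20,
    "quizzes": 0.10,
    "oral_presentation": 0.30,
}

def final_grade(scores):
    """Weighted sum of component scores, each assumed to be out of 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

if __name__ == "__main__":
    example = {name: 90 for name in WEIGHTS}  # made-up scores
    print(final_grade(example))
```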
Models and Algorithms
[Figure: a data scientist connecting Problems and Data through models and algorithms. Recoverable labels: Features, Regions, Time slots, Categories; matrices X = R × U and Y = T × R; ANN.]
Next Session: Final Project Presentation
• 12/15 R
• 22 min each team (including Q&A)
• Team 1
• Team 2
• Team 3
• Team 4
• Team 5
• Team 6
• Snacks and drinks will be provided.
Want to learn more? Future Opportunities.
Spring 2017
• DS595/CS525 Special Topics in DS/CS
• Urban Computing, applications and methodologies
Urban Computing Research Group at WPI • Hub-and-Spoke Urban Transportation
Urban Computing Research Group at WPI
• Most influential k-location mining
[Figure: three example panels: (a) EV Charging Station Placement, (b) Advertisement Placement, (c) Observation Station Placement]
Urban Computing Research Group at WPI • Human-in-Loop Urban Computing
Research opportunities are available in my group.
Contact: yli15@wpi.edu
Website: http://wpi.edu/~yli15/index.html