
DS504/CS586 Big Data Analytics Review: PowerPoint Presentation



  1. Pick up a handout on the front table.

  2. Welcome to DS504/CS586: Big Data Analytics (Review)
     • Prof. Yanhua Li
     • Time: 6:00pm to 8:50pm, Thursdays (R)
     • Location: AK232
     • Fall 2016

  3. Today
     • 1. Review: key topics and techniques discussed this semester
     • 2. Future opportunities: big data analytics, urban computing
     • 10-minute break
     • 3. Team 1 presentation
     • 4. Course evaluation
     • 5. Group discussion for final projects

  4. Introduction: What is “Big Data”?

  5. Big Data Analytics: techniques and tools for managing, analyzing, and extracting knowledge from “big data”.

  6. CS586/DS504-2016Fall course map (figure). The course is organized into five layers:
     • 1. Data acquisition & measurement: representative data collection (sampling)
     • 2. Data preprocessing/cleaning: error correction, map-matching
     • 3. Data management: indexing, query processing
     • 4. Big data mining: graph mining, data clustering, recommender systems, deep learning
     • 5. Applications: urban computing, social network analysis, networking
     Techniques covered: sampling and indexing; graph mining; index and query; K-means, DBSCAN; BFR, DENCLUE; trajectory clustering; map-matching; recommender systems; deep learning (guest lecture); urban application: bike sharing.

  7. Big Data Mining Topics
     • 1. Graph mining: graph sampling; node importance ranking; Facebook/social graph estimation; social influence; topic-sensitive PageRank
     • 2. Clustering: hierarchical; K-means, BFR; DBSCAN, DENCLUE; trajectory clustering
     • 3. Recommender systems: content-based; collaborative filtering (user-user based, item-item based); location-based recommender systems; personalized geo-social recommendation
     • 4. Deep learning: deep neural networks; AlphaGo
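Node importance ranking and topic-sensitive PageRank, listed under graph mining, can be sketched with power iteration. This is a minimal illustration, not the course's implementation: the dict-of-lists graph format, the damping factor `beta = 0.85`, and the helper name `pagerank` are assumptions. Passing a `teleport` set (which must be a subset of the graph's nodes) restricts random jumps to those nodes, which is the topic-sensitive variant; leaving it empty gives ordinary PageRank.

```python
def pagerank(graph, beta=0.85, teleport=None, iters=100, tol=1e-10):
    """Power-iteration PageRank on a dict {node: [out-neighbors]}.

    Hypothetical helper for illustration. If `teleport` is a set of nodes,
    random jumps land only on that set (topic-sensitive PageRank);
    otherwise jumps are uniform over all nodes.
    """
    nodes = list(graph)
    jump_set = set(teleport) if teleport else set(nodes)
    jump = {v: (1.0 / len(jump_set) if v in jump_set else 0.0) for v in nodes}
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        # Teleport term: a (1 - beta) share of the mass is spread over the jump set.
        new = {v: (1 - beta) * jump[v] for v in nodes}
        dangling = 0.0
        for u in nodes:
            outs = graph[u]
            if outs:
                # Follow a link: split u's damped rank evenly over its out-links.
                share = beta * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dead end: its damped mass is redistributed over the jump set.
                dangling += beta * rank[u]
        for v in nodes:
            new[v] += dangling * jump[v]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

On the three-node cycle `a → b → c → a`, uniform teleport leaves every node at 1/3, while `teleport={"a"}` ranks `a` above `b` above `c`, because every random jump lands on `a`.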

  8. Roadmap
     • 1. Sampling & indexing
       – Random prefix, region, and zoom-in sampling
       – Index structures: B-tree, quadtree, R-tree, etc.
     • 2. Clustering
       – Hierarchical
       – K-means, DBSCAN
     • 3. Recommender systems, deep learning, map-matching, etc.
     • 4. Applications
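K-means, from the clustering part of the roadmap, is commonly computed with Lloyd's algorithm, which alternates an assignment step and an update step. The sketch below is a minimal 2-D version under stated assumptions: points are `(x, y)` tuples, the initial centers are `k` input points chosen with a seeded `random.Random`, and `kmeans` is a hypothetical name. Like any Lloyd's implementation, it converges to a local optimum that can depend on the initialization.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means on a list of 2-D (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels
```

For two well-separated groups such as `(0, 0), (0, 1), (1, 0)` and `(10, 10), (10, 11), (11, 10)` with `k=2`, the centers settle at `(1/3, 1/3)` and `(31/3, 31/3)`.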

  9. Sampling Big Data
     • 1.1 Random sampling (uniform & independent): vertex sampling, edge sampling
     • 1.2 Crawling: BFS sampling, random walk sampling
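The four sampling strategies on this slide can be sketched on an undirected graph stored as an adjacency dict. This is an illustrative sketch, not the course's code: the function names, the adjacency-dict format, and the choice of average degree as the quantity being estimated are assumptions. A uniform vertex sample can be averaged directly. A random walk on a connected, non-bipartite graph visits each vertex in proportion to its degree, so the sketch re-weights every visit by 1/degree, which turns the average-degree estimate into a harmonic mean.

```python
import random
from collections import deque


def uniform_vertex_sample(graph, m, rng):
    """Random sampling: m vertices drawn uniformly and independently (with replacement)."""
    nodes = list(graph)
    return [rng.choice(nodes) for _ in range(m)]


def uniform_edge_sample(graph, m, rng):
    """Random sampling: m undirected edges drawn uniformly (assumes comparable node ids)."""
    edges = [(u, v) for u in graph for v in graph[u] if u < v]
    return [rng.choice(edges) for _ in range(m)]


def bfs_sample(graph, start, m):
    """Crawling: the first m vertices reached by breadth-first search from start."""
    seen, order, queue = {start}, [], deque([start])
    while queue and len(order) < m:
        v = queue.popleft()
        order.append(v)
        for u in graph[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order


def random_walk_sample(graph, start, m, rng):
    """Crawling: m steps of a simple random walk (visits are degree-biased)."""
    walk, v = [], start
    for _ in range(m):
        v = rng.choice(graph[v])
        walk.append(v)
    return walk


def avg_degree_uniform(graph, sample):
    """Plain mean degree: unbiased for a uniform vertex sample."""
    return sum(len(graph[v]) for v in sample) / len(sample)


def avg_degree_rw(graph, sample):
    """Re-weight each visit by 1/degree to undo the random walk's degree bias."""
    return len(sample) / sum(1.0 / len(graph[v]) for v in sample)
```

For example, take the graph `{0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0], 5: [0]}`, whose true average degree is 12/6 = 2. The degree-proportional sample `[0]*5 + [1, 1, 2, 2, 3, 4, 5]` gives 3 under `avg_degree_uniform` but 2 under `avg_degree_rw`, which is the bias the re-weighting removes.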

  10. Class Outcomes

  11. What is DS504/CS586 about?
     • We'll learn about:
       – Advanced techniques for big data analytics: large-scale data sampling and estimation, data cleaning, graph data mining, data management, clustering, etc.
       – Applications of big data analytics: urban computing, social network analysis, recommender systems, etc.
     • Learning outcomes:
       – Understand and explain the challenges and advances in the state of the art in big data analytics.
       – Design, develop, and fully execute a big data analytics project.
       – Communicate ideas effectively to a technical audience through presentations and written documents.

  12. CS586/DS504-2016Fall course map (repeat of the slide 6 figure).

  13. Project 1 (Single Data Source)
     • T1: Allstate Claim Prediction Challenge
     • T2: Predicting YouTube 3D Video Trends
     • T3: Sampling Method for Sum Aggregation of Points of Interest on a Map
     • T4: Mining Stack Overflow Reviews for Insights
     • T5: Measuring the Restaurant Diversity Index for Different Cities
     • T6: GitHub – Sizing Up Online Social Networks

  14. Project 2 (Heterogeneous Data)
     • T1: Restaurant Location Recommendation
     • T2: Online Learning Performance vs. Offline Geographic Information
     • T3: Airbnb User Behavior Prediction
     • T4: Community Detection in Large Networks
     • T5: Demand-Supply Analysis of Regional Restaurant Distribution
     • T6: Social Network Marketing through Influence Prediction
     • Real application problems
     • Data collection, processing, management, mining, evaluation, and visualization

  15. Workload
     • Focus more on critical thinking, problem solving, and “heads-on/hands-on” experiences!
     • Understand, formulate, and solve problems
     • Read and critique research papers
     • Two course projects
     • Oral presentation
     • Teamwork
     • Coding
     • Logistics

  16. Workload and Grading
     • Grading
       – Projects (40%)
         • Project 1 (10%)
         • Project 2 (30%)
       – Final reports in the discussion forum (due 11:59pm, 12/13)
       – Self- and peer-evaluation form for Project 2 (due 11:59pm, 12/13)
       – Written work (30%)
         • Critiques + project reports (20%)
         • Quizzes (10%, 5% each)
       – Oral work (30%)
         • Presentation
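The grading weights above add up to 100% (40% projects, 30% written work, 30% oral work). As a hedged illustration of how such a weighted total is computed, the sketch below assumes each component is scored out of 100, splits the quizzes into two 5% components, and treats oral work as a single 30% component because the slide's breakdown of it is cut off. The component names and the `final_grade` helper are hypothetical.

```python
# Hypothetical component names; weights taken from the grading slide.
WEIGHTS = {
    "project1": 0.10,
    "project2": 0.30,
    "critiques_and_reports": 0.20,
    "quiz1": 0.05,
    "quiz2": 0.05,
    "oral": 0.30,  # assumption: oral work scored as one component
}


def final_grade(scores):
    """Weighted course total, given per-component scores on a 0-100 scale."""
    if abs(sum(WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 100%")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

For instance, scores of 80, 90, 85, 70, 90, and 95 (in the order listed) give 8 + 27 + 17 + 3.5 + 4.5 + 28.5 = 88.5.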

  17. Models and Algorithms
     (Figure: diagram linking a data scientist, problems, data, and models such as an artificial neural network (ANN); panel labels include features, regions, time slots, and categories.)

  18. Next Session: Final Project Presentations
     • Thursday, 12/15
     • 22 minutes per team (including Q&A)
     • Teams 1, 2, 3, 4, 5, and 6
     • Snacks and drinks will be provided.

  19. Want to learn more? Future opportunities.

  20. Spring 2017
     • DS595/CS525 Special Topics in DS/CS
     • Urban Computing: applications and methodologies

  21. Urban Computing Research Group at WPI • Hub-and-Spoke Urban Transportation

  22. Urban Computing Research Group at WPI
     • Most influential k-location mining
     (Figure panels: (a) EV charging station placement; (b) advertisement placement; (c) observation station placement.)

  23. Urban Computing Research Group at WPI • Human-in-Loop Urban Computing

  24. Research opportunities are available in my group. Contact: yli15@wpi.edu; website: http://wpi.edu/~yli15/index.html
