
Analytics Building Blocks: Duen Horng (Polo) Chau, Assistant Professor



  1. http://poloclub.gatech.edu/cse6242 
 CSE6242 / CX4242: Data & Visual Analytics 
 Analytics Building Blocks 
 Duen Horng (Polo) Chau 
 Assistant Professor 
 Associate Director, MS Analytics 
 Georgia Tech 
 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos

  2. Collection Cleaning Integration Analysis Visualization Presentation Dissemination

  3. Building blocks. Not rigid “steps”. 
 Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination 
 Can skip some 
 Can go back (two-way street) 
 • Data types inform visualization design 
 • Data size informs choice of algorithms 
 • Visualization motivates more data cleaning 
 • Visualization challenges algorithm assumptions, e.g., user finds that results don’t make sense

  4. How does “big data” affect the process? 
 (Hint: almost everything is harder!) 
 The Vs of big data (3Vs originally, then 7, now 42) 
 • Volume: “billions”, “petabytes” are common 
 • Velocity: think Twitter, fraud detection, etc. 
 • Variety: text (webpages), video (YouTube)… 
 • Veracity: uncertainty of data 
 • Variability 
 • Visualization 
 • Value 
 http://www.ibmbigdatahub.com/infographic/four-vs-big-data 
 http://dataconomy.com/seven-vs-big-data/ 
 https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx

  5. Two Example Projects 
 from Polo Club

  6. Apolo Graph Exploration: 
 Machine Learning + Visualization 
 Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. 
 Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. CHI 2011.

  8. Beautiful Hairball Death Star Spaghetti

  9. Finding More Relevant Nodes 
 Figure labels: HCI Paper, Data Mining Paper, Citation network

  11. Finding More Relevant Nodes 
 Figure labels: HCI Paper, Data Mining Paper, Citation network 
 Apolo uses guilt-by-association (Belief Propagation)

  12. Demo: Mapping the Sensemaking Literature 
 Nodes: 80k papers from Google Scholar (node size: number of citations) 
 Edges: 150k citations

  13. Key Ideas (Recap) 
 • Specify exemplars 
 • Find other relevant nodes (BP)
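
The guilt-by-association idea in the recap above can be illustrated with a minimal sketch of loopy belief propagation on a toy citation graph. This is not Apolo's implementation: the two-state setup, the `homophily` value, the function name `belief_propagation`, and the node names are all assumptions chosen for illustration. Exemplars get strong priors, and every other node starts undecided.

```python
from collections import defaultdict


def belief_propagation(edges, priors, homophily=0.9, iterations=10):
    """Loopy BP on an undirected graph with two states (group 0, group 1).

    edges:  list of (u, v) pairs
    priors: dict node -> [p_group0, p_group1] for user-chosen exemplars
    """
    # Assumed edge potential: linked nodes tend to belong to the same group.
    psi = [[homophily, 1 - homophily], [1 - homophily, homophily]]

    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Nodes without a prior start undecided.
    phi = {n: priors.get(n, [0.5, 0.5]) for n in neighbors}
    # messages[(u, v)] is what u currently tells v about v's state.
    messages = {(u, v): [1.0, 1.0] for u in neighbors for v in neighbors[u]}

    def local_evidence(node, exclude=None):
        # Prior times all incoming messages, optionally skipping one neighbor.
        result = list(phi[node])
        for k in neighbors[node]:
            if k != exclude:
                for s in range(2):
                    result[s] *= messages[(k, node)][s]
        return result

    for _ in range(iterations):
        updated = {}
        for (u, v) in messages:
            local = local_evidence(u, exclude=v)
            msg = [sum(local[a] * psi[a][b] for a in range(2)) for b in range(2)]
            total = sum(msg)
            updated[(u, v)] = [m / total for m in msg]
        messages = updated

    beliefs = {}
    for n in neighbors:
        b = local_evidence(n)
        total = sum(b)
        beliefs[n] = [x / total for x in b]
    return beliefs


# Toy chain: two exemplars at the ends, three unlabeled papers in between.
edges = [("hci1", "hci2"), ("hci2", "x"), ("x", "dm2"), ("dm2", "dm1")]
priors = {"hci1": [0.99, 0.01], "dm1": [0.01, 0.99]}
beliefs = belief_propagation(edges, priors)
```

On this five-node chain, `hci2` leans toward the first group, `dm2` toward the second, and the middle node `x` stays undecided because it is equally far from both exemplars.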

  14. What did Apolo go through? 
 Collection: Scrape Google Scholar. No API. 😪 
 Cleaning 
 Integration 
 Analysis: Design inference algorithm (Which nodes to show next?) 
 Visualization: Interactive visualization you just saw 
 Presentation: Paper, talks, lectures 
 Dissemination: You will use a new Apolo prototype (called Argo)

  15. Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. ACM Conference on Human Factors in Computing Systems (CHI) 2011. May 7-12, 2011.

  16. NetProbe: 
 Fraud Detection in Online Auction 
 NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007

  17. NetProbe: The Problem 
 Find bad sellers (fraudsters) on eBay who don’t deliver their items 
 Figure labels: Buyer, $$$, Seller 
 Non-delivery fraud is a common auction fraud 
 source: https://www.fbi.gov/contact-us/field-offices/portland/news/press-releases/fbi-tech-tuesday---building-a-digital-defense-against-auction-fraud

  19. NetProbe: Key Ideas 
 • Fraudsters fabricate their reputation by “trading” with their accomplices 
 • Fake transactions form near bipartite cores 
 • How to detect them?

  20. NetProbe: Key Ideas 
 Use Belief Propagation 
 Figure legend: F = Fraudster, A = Accomplice, H = Honest. Darker means more likely.
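
To make the three-state idea concrete, here is a small sketch of a single belief propagation message under an assumed propagation matrix. The matrix values, the `EPS` constant, and the helper names `message` and `most_likely` are illustrative assumptions, not figures taken from these slides. The matrix encodes the near-bipartite-core intuition: fraudsters mostly trade with accomplices, and accomplices trade with both fraudsters and honest users.

```python
STATES = ("fraudster", "accomplice", "honest")
EPS = 0.05  # assumed small probability; illustrative only

# PROPAGATION[a][b]: assumed chance that a neighbor is in state b,
# given that the sending node is in state a. Each row sums to 1.
PROPAGATION = [
    [EPS, 1 - 2 * EPS, EPS],              # fraudster -> mostly accomplices
    [0.5, 2 * EPS, 0.5 - 2 * EPS],        # accomplice -> fraudsters and honest users
    [EPS, (1 - EPS) / 2, (1 - EPS) / 2],  # honest -> accomplices or honest users
]


def message(belief):
    """Message a node sends to a neighbor, given the sender's belief over STATES."""
    raw = [sum(belief[a] * PROPAGATION[a][b] for a in range(3)) for b in range(3)]
    total = sum(raw)
    return [r / total for r in raw]


def most_likely(dist):
    """Name of the state with the highest probability."""
    return STATES[max(range(3), key=lambda i: dist[i])]


# A node believed to be a fraudster pushes its neighbors toward "accomplice".
from_fraudster = message([0.9, 0.05, 0.05])
# A node believed to be an accomplice pushes its neighbors toward "fraudster".
from_accomplice = message([0.05, 0.9, 0.05])
```

Unlike a same-label ("homophily") setting, a suspected fraudster here raises suspicion that its trading partners are accomplices rather than fraudsters, which is what lets the two sides of a bipartite core be told apart.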

  21. NetProbe: Main Results

  24. “Belgian Police”

  26. What did NetProbe go through? 
 Collection: Scraping (built a “scraper”/“crawler”) 
 Cleaning 
 Integration 
 Analysis: Design detection algorithm 
 Visualization 
 Presentation: Paper, talks, lectures 
 Dissemination: Not released

  27. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. International Conference on World Wide Web (WWW) 2007. May 8-12, 2007. Banff, Alberta, Canada. Pages 201-210.

  28. Homework 1 (out next week; tasks subject to change) 
 • Simple “end-to-end” analysis 
 • Collect data using Twitter API 
 • Store in SQLite database 
 • Create graph from data 
 • Analyze, using SQL queries (e.g., create graph’s degree distribution) 
 • Visualize graph using Gephi 
 • Describe your discoveries
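
The "store in SQLite, analyze with SQL" steps might look like the following sketch, which computes a degree distribution for an undirected graph. The `edges` table, its `source` and `target` columns, and the four sample edges are hypothetical; the actual homework schema and data may differ.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")],  # made-up sample edges
)

# Inner query: list every edge endpoint, so each node appears once per edge.
# Middle query: count appearances per node, giving its degree.
# Outer query: count how many nodes share each degree.
query = """
SELECT degree, COUNT(*) AS num_nodes
FROM (
    SELECT node, COUNT(*) AS degree
    FROM (
        SELECT source AS node FROM edges
        UNION ALL
        SELECT target AS node FROM edges
    ) AS endpoints
    GROUP BY node
) AS degrees
GROUP BY degree
ORDER BY degree
"""
degree_distribution = conn.execute(query).fetchall()
conn.close()
```

Here `degree_distribution` is a list of `(degree, num_nodes)` rows. For the sample edges, one node has degree 1, two have degree 2, and one has degree 3.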
