
Building Mashups by Example: Rattapoom Tuchinda, Doctoral Defense



  1. Building Mashups by Example. Rattapoom Tuchinda. Doctoral Defense, July 22, 2008.

  2. What’s a Mashup? A website or application that combines content from more than one source into an integrated experience [wikipedia]. Examples: a) LA crime map (crime reports from different counties; map); b) zillow.com (real estate listings; property tax); c) Ski bonk (weather; snow reports; snow resorts). Combined data gives new insight and provides new services. Outline: Introduction • Approach • Evaluation • Related Work • Conclusion

  3. Statistics and Trends. Survey of the top 50 Mashups, divided into five categories based on programming structure. The focus of this thesis is on the first four categories, which account for 47% of the most popular Mashups.

  4. Mashup Building Issues. Diagram: Data Retrieval (two Wrappers); Calibration: source modeling (Attribute) and cleaning (Clean) for each source; Integration (Combine); Display (Customize Display).

  5. Type 1: One Simple Source. Diagram: Data Retrieval (one Wrapper); Calibration: source modeling (Attribute) and cleaning (Clean); Display (Customize Display).

  6. Type 2: Union. Diagram: Data Retrieval (two Wrappers); Calibration: source modeling (Attribute) and cleaning (Clean) for each source; Integration (Union); Display (Customize Display).

  7. Type 3: One Source with Form. Diagram: Data Retrieval (one Wrapper); Calibration: source modeling (Attribute) and cleaning (Clean); Display (Customize Display).

  8. Type 4: Database Join. Diagram: Data Retrieval (two Wrappers); Calibration: source modeling (Attribute) and cleaning (Clean) for each source; Integration (Join); Display (Customize Display).

  9. Type 5: Customized Display. Diagram: Data Retrieval (two Wrappers); Calibration: source modeling (Attribute) and cleaning (Clean) for each source; Integration (Combine); Display (Customize Display).

  10. Existing Approaches. Goal: create Mashups without programming. That doesn’t translate to not having to understand programming. Widget paradigm (screenshot: Yahoo’s Pipes): each widget (43 for Pipes, 300+ for MS) represents an operation on the data; locating widgets and learning to customize them can be time consuming; most tools focus on particular issues and ignore others. Can we come up with a framework that addresses all of the issues while still making the Mashup building process easy?

  11. Thesis Statement. Web users can build Mashups effectively using an integrated framework that lets them solve the problems of data extraction, source modeling, data cleaning, and data integration by specifying examples instead of programming operations.

  12. Contributions. • A programming-by-demonstration approach that uses a single table for building a Mashup. • An integrated approach that links data extraction, source modeling, data cleaning, and data integration together. • A query formulation technique that allows users to specify examples to build complicated queries.

  13. Key Ideas. • Focus on data, not operations: users are more familiar with data. • Leverage the existing database: it helps source modeling, cleaning, and data integration. • Consolidate as opposed to divide-and-conquer: solving a problem in one issue can help solve another issue; interaction happens within a single spreadsheet platform.

  14. Our system: Karma. Screenshot labels: Embedded Browser.

  15. Our system: Karma. Screenshot labels: Embedded Browser.

  16. Our system: Karma. Screenshot labels: Embedded Browser, Table.

  17. Our system: Karma. Screenshot labels: Embedded Browser, Table, Interaction Modes.

  18. Diagram: Extract {Restaurant name, address, phone, Review}, then Clean; Extract {Restaurant name, address, Date of Inspection, Score}, then Clean; combined into {Restaurant name, address, phone, review, Date of Inspection, Score}; Map.

  19. Diagram: Extract {Restaurant name, address, phone, Review}, then Clean; Extract {Restaurant name, address, Date of Inspection, Score}, then Clean; combined into {Restaurant name, address, phone, review, Date of Inspection, Score}; Map.

  20. Diagram: Extract {Restaurant name, address, phone, Review}; Database; Clean; {Restaurant name, address, phone, review, Date of Inspection, Score}; Map.

  21. Diagram: Extract {Restaurant name, address, phone, Review}; Database; Clean; {Restaurant name, address, phone, review, Date of Inspection, Score}; Map.

  22. Diagram: Extract {Restaurant name, address, phone, Review}; Database (contains past Mashups and data tables); Clean; {Restaurant name, address, phone, review, Date of Inspection, Score}; Map.
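One way to read the diagram is as a join between the newly extracted rows and a table already held in the database. The sketch below assumes that reading and assumes the restaurant name is the join key; neither is stated on the slide. The health scores come from the LA Health Rating table shown later in the deck, while the phone numbers, review text, and inspection dates are made-up placeholders.

```python
# Rows as extracted from the review site. Phone numbers and review text are
# made-up placeholders, not values from the slides.
extracted = [
    {"Restaurant name": "Japon Bistro", "phone": "555-0100", "Review": "placeholder review A"},
    {"Restaurant name": "Hokusai", "phone": "555-0101", "Review": "placeholder review B"},
    {"Restaurant name": "Sushi Sasabune", "phone": "555-0102", "Review": "placeholder review C"},
]

# Rows already stored in the database. Inspection dates are made up.
database = [
    {"Restaurant name": "Hokusai", "Date of Inspection": "2008-01-15", "Score": 90},
    {"Restaurant name": "Japon Bistro", "Date of Inspection": "2008-02-03", "Score": 95},
]


def left_join(rows, table, key):
    """Attach matching columns from `table` to each row, matching on `key`."""
    index = {record[key]: record for record in table}
    joined = []
    for row in rows:
        merged = dict(row)
        # Rows with no match in the table keep only their own columns.
        for column, value in index.get(row[key], {}).items():
            merged.setdefault(column, value)
        joined.append(merged)
    return joined


mashup_rows = left_join(extracted, database, "Restaurant name")
for row in mashup_rows:
    print(row)
```

A left join keeps restaurants that have no inspection record yet, which seems the safer default when the result is meant for display on a map.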

  23. Data Retrieval: Extraction. DOM tree: TBODY with two tr rows; each tr holds two td cells, an index cell ("1.", "2.") and a cell containing a, br, br. Link text: Japon Bistro; Hokusai. Following lines: 970 E Colora..; 8400 Wilshir. Then: Upscale yet affordabl..; Chic elegance….. Path of the selected value: Tbody/tr[1]/td[2]/a.

  24. Data Retrieval: Extraction. DOM tree: TBODY with two tr rows; each tr holds two td cells, an index cell ("1.", "2.") and a cell containing a, br, br. Link text: Japon Bistro; Hokusai. Following lines: 970 E Colora..; 8400 Wilshir. Then: Upscale yet affordabl..; Chic elegance….. Path of the selected value: Tbody/tr[1]/td[2]/a, generalized to Tbody/tr*/td*/a.
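The step from Tbody/tr[1]/td[2]/a to Tbody/tr*/td*/a can be illustrated with a small sketch. It is not Karma's implementation: it assumes that generalizing simply means dropping the positional indices, uses Python's `xml.etree.ElementTree` path syntax (where `tr/td/a` plays the role of the slide's `tr*/td*/a`), and runs on a hand-written sample that mimics the tree on the slide.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written sample shaped like the DOM tree on the slide.
SAMPLE = """<table><tbody>
<tr><td>1.</td><td><a href="#">Japon Bistro</a><br/>970 E Colora..<br/>Upscale yet affordabl..</td></tr>
<tr><td>2.</td><td><a href="#">Hokusai</a><br/>8400 Wilshir.<br/>Chic elegance..</td></tr>
</tbody></table>"""


def generalize(path):
    """Drop positional predicates such as [1] so the path matches every sibling."""
    return re.sub(r"\[\d+\]", "", path)


def extract(root, path):
    """Return the text of every node matched by the path."""
    return [(node.text or "").strip() for node in root.findall(path)]


root = ET.fromstring(SAMPLE)
example_path = "tbody/tr[1]/td[2]/a"          # path of the one value the user picked
general_path = generalize(example_path)       # "tbody/tr/td/a"

print(extract(root, example_path))
print(extract(root, general_path))
```

With the indices removed, the single example selects the link in every row, which is how one picked value can fill a whole column.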

  25. Data Retrieval: Navigation. DOM tree: TBODY with two tr rows; each tr holds two td cells, an index cell ("1.", "2.") and a cell containing a, br, br. Link text: Japon Bistro; Hokusai. Following lines: 970 E Colora..; 8400 Wilshir. Then: Upscale yet affordab; Chic elegance…

  26. Data Retrieval: Navigation. DOM tree: TBODY with two tr rows; each tr holds two td cells, an index cell ("1.", "2.") and a cell containing a, br, br. Link text: Japon Bistro; Hokusai. Following lines: 970 E Colora..; 8400 Wilshir. Then: Upscale yet affordab; Chic elegance…

  27. Data Retrieval: Navigation. DOM tree: TBODY with two tr rows; each tr holds two td cells, an index cell ("1.", "2.") and a cell containing a, br, br. Link text: Japon Bistro; Hokusai. Following lines: 970 E Colora..; 8400 Wilshir. Then: Upscale yet affordab; Chic elegance…

  28. Source Modeling (Attribute selection). Newly extracted data: Japon Bistro, Hokusai, Sushi Sasabune. Database tables: LA Health Rating (restaurant name, Address, .., Health Rating): Hokusai, 8400.., .., 90; Katana, 8439.., .., 99; Japon Bistro, 927 E.., .., 95. Artist Info (artist name, nationality, .., ..): Hokusai, Japanese; Renoir, French. Zagat (restaurant name, zagat Rating, .., ..): Sushi Sasabune, 27; Sushi Roku, 25; Katana, 23. Possible attributes: {a | ∀ a, s : a ∈ att(s) ∧ (val(a, s) ⊂ V)}, here restaurant name (3) and artist name (1).
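The counts on the slide, restaurant name (3) and artist name (1), can be reproduced with a short sketch. It assumes that each count is the number of newly extracted values found under that attribute anywhere in the database; this reading is inferred from the slide's numbers, not from a stated definition. The function name and the dictionary layout are illustrative choices, and only the columns whose values are legible on the slide are included.

```python
from collections import defaultdict

# Toy repository copied from the slide: source -> attribute -> values.
REPOSITORY = {
    "LA Health Rating": {"restaurant name": ["Hokusai", "Katana", "Japon Bistro"]},
    "Artist Info": {
        "artist name": ["Hokusai", "Renoir"],
        "nationality": ["Japanese", "French"],
    },
    "Zagat": {"restaurant name": ["Sushi Sasabune", "Sushi Roku", "Katana"]},
}


def candidate_attributes(extracted_values, repository):
    """Rank attributes by how many extracted values appear under them."""
    wanted = set(extracted_values)
    matched = defaultdict(set)
    for columns in repository.values():
        for attribute, values in columns.items():
            overlap = wanted & set(values)
            if overlap:
                # Same-named attributes from different sources are pooled.
                matched[attribute] |= overlap
    ranked = [(attribute, len(values)) for attribute, values in matched.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


suggestions = candidate_attributes(
    ["Japon Bistro", "Hokusai", "Sushi Sasabune"], REPOSITORY
)
print(suggestions)
```

Pooling same-named attributes across sources is what lets "restaurant name" reach three matches even though no single table contains all three restaurants.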

  29. Data Cleaning: using existing values. Newly extracted data (Restaurant name): Japon Bistro, Hokusai, Sushi Sasabune, Sushi Roka. Data repository: LA Health Rating (restaurant name, Address, .., Health Rating): Hokusai, 8400.., .., 90; Katana, 8439.., .., 99; Japon Bistro, 927 E.., .., 95. Zagat (restaurant name, zagat Rating, .., ..): Sushi Sasabune, 27; Sushi Roku, 25; Katana, 23.

  30. Data Cleaning: using existing values. Newly extracted data (Restaurant name): Japon Bistro, Hokusai, Sushi Sasabune, Sushi Roka. Data repository: LA Health Rating (restaurant name, Address, .., Health Rating): Hokusai, 8400.., .., 90; Katana, 8439.., .., 99; Japon Bistro, 927 E.., .., 95. Zagat (restaurant name, zagat Rating, .., ..): Sushi Sasabune, 27; Sushi Roku, 25; Katana, 23.
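The extracted value "Sushi Roka" differs by one letter from "Sushi Roku" in the Zagat table, which suggests how existing values can drive cleaning. The sketch below is a stand-in, not Karma's method: the slide does not say how values are compared, so it uses `difflib.get_close_matches` from the Python standard library, and the `cutoff` of 0.8 is an arbitrary assumption.

```python
import difflib

# Restaurant names already present in the data repository on the slide.
KNOWN_RESTAURANT_NAMES = [
    "Hokusai",
    "Katana",
    "Japon Bistro",
    "Sushi Sasabune",
    "Sushi Roku",
]


def suggest(value, known, cutoff=0.8):
    """Return the known value closest to `value`, or None if nothing is close."""
    if value in known:
        return value  # already clean
    matches = difflib.get_close_matches(value, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for name in ["Japon Bistro", "Hokusai", "Sushi Sasabune", "Sushi Roka"]:
    print(name, "->", suggest(name, KNOWN_RESTAURANT_NAMES))
```

Returning None instead of the nearest weak match leaves unfamiliar values for the user to confirm rather than silently rewriting them.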

  31. Data Cleaning: using predefined rules. Example: 31 Reviews → 31. Predefined rules include the Subset Rule: (s1 s2 .. sk) → (d1 d2 … dt), with (k <= t), si ∈ {d1, d2, …, dt}, and a further condition relating di and dj.

  32. Data Cleaning: using predefined rules. Example: 31 Reviews → 31. Predefined rules include the Subset Rule: (s1 s2 .. sk) → (d1 d2 … dt), with (k <= t), si ∈ {d1, d2, …, dt}, and a further condition relating di and dj.
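The example of "31 Reviews" being cleaned to "31" can be sketched as a token-subset check. This covers only the part of the rule that is legible on the slide (every token of the shorter value occurs among the tokens of the longer one). Reusing the token positions on other rows is an assumption added for illustration, and the extra input values "25 Reviews" and "104 Reviews" are hypothetical.

```python
def learn_kept_positions(original, cleaned):
    """Return the token positions of `original` that make up `cleaned`.

    Returns None when `cleaned` is not a token subset of `original`,
    meaning this rule does not explain the user's example.
    """
    source_tokens = original.split()
    positions = []
    for token in cleaned.split():
        if token not in source_tokens:
            return None
        positions.append(source_tokens.index(token))
    return positions


def apply_positions(value, positions):
    """Keep only the tokens at the learned positions; leave short values alone."""
    tokens = value.split()
    if any(position >= len(tokens) for position in positions):
        return value
    return " ".join(tokens[position] for position in positions)


# One user-supplied example: "31 Reviews" should become "31".
positions = learn_kept_positions("31 Reviews", "31")

# Hypothetical other rows in the same column.
column = ["31 Reviews", "25 Reviews", "104 Reviews"]
cleaned_column = [apply_positions(value, positions) for value in column]
print(cleaned_column)
```

A single example is enough here because the rule is predefined: the user only demonstrates the desired result, and the system checks which rule accounts for it.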

  33. Data Integration [tuchinda 2007]
