
Outline: Motivation & Goal, Framework & Design, Examples



  1. Hazy. Lixing Lian, Cheng Ren

  2. Outline: Motivation & Goal; Framework & Design; Examples; Future Work; Conclusion

  3. Two Trends that Drive Hazy: (1) Data in a large number of formats (text, audio, video, OCR, sensor data, etc.). (2) An arms race to deeply understand data. Statistical tools attack both 1 and 2. Hazy = statistical tools + data management.

  4. Hazy’s Thesis: The next breakthrough in data analysis may not be a new data analysis algorithm, but may be in the ability to rapidly combine, deploy, and maintain existing algorithms.

  5. Hazy’s Goal: Make big-data analytics-driven systems easier to build and maintain. Find common patterns when deploying statistical tools on data: programming abstractions and infrastructure abstractions.

  6. Programming abstractions: Enable developers to try many algorithms on the same data set. When one algorithm improves, all applications using that algorithm improve automatically.

  7. Infrastructure abstractions: No need to reinvent or reengineer the wheel when adding a new algorithm to the system. When one component of the infrastructure improves, all algorithms benefit automatically.

  8. Markov logic: Easily represents common statistical models, such as logistic regression and conditional random fields. Can also be used to build more sophisticated statistical models.

  9. Markov Logic by Example

  10. Markov Logic by Example. Rule: wrote(s, t) ∧ advisedBy(s, p) -> wrote(p, t). Step 1: Grounding (find the matching fields and extract the data, using an advisedBy table with columns advisee and advisor and rows including (Tom, Jerry) and (Tom, Bob)). Ground clauses: wrote(Tom, P1) ∧ advisedBy(Tom, Jerry) -> wrote(Jerry, P1); wrote(Tom, P1) ∧ advisedBy(Tom, Bob) -> wrote(Bob, P1); wrote(Bob, P1) ∧ advisedBy(Bob, Jerry) -> wrote(Jerry, P1). Step 2: Sampling.
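The grounding step on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not Hazy's or Tuffy's implementation: the evidence sets and the helper name `ground_rule` are assumptions made for the example, and only the names Tom, Jerry, Bob, and P1 come from the slide.

```python
from itertools import product

# Assumed evidence for illustration; the slide only shows the resulting ground clauses.
wrote = {("Tom", "P1"), ("Bob", "P1")}
advised_by = {("Tom", "Jerry"), ("Tom", "Bob"), ("Bob", "Jerry")}


def ground_rule(wrote, advised_by):
    """Ground wrote(s, t) AND advisedBy(s, p) -> wrote(p, t) against the evidence.

    Returns a list of (body, head) pairs, one per ground clause.
    """
    groundings = []
    for (s, t), (advisee, p) in product(sorted(wrote), sorted(advised_by)):
        # The variable s must bind to the same constant in both body atoms.
        if s == advisee:
            body = (("wrote", s, t), ("advisedBy", s, p))
            head = ("wrote", p, t)
            groundings.append((body, head))
    return groundings


groundings = ground_rule(wrote, advised_by)
for body, head in groundings:
    body_text = " AND ".join(f"{name}({a}, {b})" for name, a, b in body)
    print(f"{body_text} -> {head[0]}({head[1]}, {head[2]})")
```

With this assumed evidence the sketch yields the same three ground clauses listed on the slide. The sampling step would then run over these ground clauses and is not shown here.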

  11. Grounding via SQL in Tuffy: The program is transformed into many SQL queries (bottom-up). Rule: wrote(s, t) ∧ advisedBy(s, p) -> wrote(p, t). Query: SELECT w1.id, a.id, w2.id FROM wrote w1, advisedBy a, wrote w2 WHERE w1.person = a.advisee AND w1.paper = w2.paper AND a.advisor = w2.person AND …
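To see what the query on this slide returns, it can be run against a small in-memory SQLite database. This is a sketch under stated assumptions: the table schemas, the atom ids, and the sample rows are invented for illustration (the slide shows only the query), and the conditions elided by "…" on the slide are left out rather than guessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schemas and rows; each row stands for one ground atom with an id.
conn.executescript(
    """
    CREATE TABLE wrote (id INTEGER PRIMARY KEY, person TEXT, paper TEXT);
    CREATE TABLE advisedBy (id INTEGER PRIMARY KEY, advisee TEXT, advisor TEXT);
    INSERT INTO wrote (id, person, paper) VALUES
        (1, 'Tom', 'P1'), (2, 'Bob', 'P1'), (3, 'Jerry', 'P1');
    INSERT INTO advisedBy (id, advisee, advisor) VALUES
        (1, 'Tom', 'Jerry'), (2, 'Tom', 'Bob'), (3, 'Bob', 'Jerry');
    """
)

# The join from the slide, without the elided trailing conditions.
# Each result row is one ground clause, given as the ids of its three atoms.
rows = conn.execute(
    """
    SELECT w1.id, a.id, w2.id
    FROM wrote w1, advisedBy a, wrote w2
    WHERE w1.person = a.advisee
      AND w1.paper = w2.paper
      AND a.advisor = w2.person
    ORDER BY w1.id, a.id
    """
).fetchall()

for w1_id, a_id, w2_id in rows:
    print(w1_id, a_id, w2_id)

conn.close()
```

Each returned triple identifies the two body atoms and the head atom of one ground clause, so a single relational join produces all groundings of the rule at once instead of enumerating them one by one.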

  12. Grounding: Top-down vs. Bottom-up

  13. Example 1: DeepDive. Enrich Wikipedia with structured data that is extracted from unstructured sources.

  14. DeepDive: DeepDive’s Origin. Build a system that is able to read the Web and answer questions. Machine Reading: “List members of the Brazilian Olympic Team in this corpus with years of membership”

  15. DeepDive

  16. DeepDive

  17. DeepDive: Given a name, collects all the information related to this name and displays it together.

  18. DeepDive: Demo. Wikipedia: http://en.wikipedia.org/wiki/Barack_Obama; WiscI: http://research.cs.wisc.edu/hazy/wikidemo/index.php/Barack_Obama; DeepDive: http://research.cs.wisc.edu/hazy/demos/deepdive/index.php/Barack_Obama

  19. DeepDive: Demo. Tasks it performs: Web Crawling; Information Extraction; Deep Linguistic Processing; Audio/Video Transcription; Terabyte Parallel Joins. Some Information: 50TB Data; 500K Machine hours; 500M Webpages; 400K Videos; 7Bn Entity Mentions; 114M Relationship Mentions. Declare graphical models at Web scale.

  20. Example 2: GeoDeepDive. http://hazy.cs.wisc.edu/demo/geo/ The goal is to help geoscientists extract data that is buried in the text, tables, and figures of journal articles and web sites, sometimes called dark data. Extends a database called Macrostrat.

  21. Future work: Assisted Development (expertise, experience of data and algorithms); New Data Platforms (Hadoop environment).

  22. Conclusion: Key technical hypothesis: A large fraction of the processing performed by applications that use and analyze these new sources of data can be captured using a small handful of primitives. The Hazy group is building several applications. More information: http://hazy.cs.wisc.edu/hazy/

  23. Question?
