Hazy
Lixing Lian, Cheng Ren
Outline
• Motivation & Goal
• Framework & Design
• Examples
• Future Work
• Conclusion
Two Trends that Drive Hazy
• Data arrives in a large number of formats (text, audio, video, OCR, sensor data, etc.)
• An arms race to deeply understand data
Statistical tools attack both trends.
Hazy = statistical tools + data management
Hazy’s Thesis
• The next breakthrough in data analysis
  - may not be a new data analysis algorithm…
  - …but may be in the ability to rapidly combine, deploy, and maintain existing algorithms.
Hazy’s Goal
• Make big-data, analytics-driven systems easier to build and maintain.
• Find common patterns in deploying statistical tools on data:
  - Programming abstractions
  - Infrastructure abstractions
Programming abstractions
• Enable developers to try many algorithms on the same data set.
• When one algorithm improves, all applications using that algorithm automatically improve.
Infrastructure abstractions
• No need to reinvent or reengineer the wheel when adding a new algorithm to the system.
• When one component of the infrastructure improves, all algorithms benefit automatically.
Markov logic
• Easily represents common statistical models: logistic regression and conditional random fields.
• Provides a base for building more sophisticated statistical models.
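For reference (an addition, not on the original slide): in Markov logic, each first-order formula F_i carries a weight w_i, and the probability of a possible world x grows with the number n_i(x) of satisfied groundings of each formula:

    P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big)

where Z normalizes over all worlds. Infinite weights recover hard logical constraints; finite weights make violated formulas improbable rather than impossible.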
Markov Logic by Example
Markov Logic by Example
Rule: wrote(s, t) ∧ advisedBy(s, p) -> wrote(p, t)

Step 1: Grounding
wrote(Tom, P1), advisedBy(Tom, Jerry) -> wrote(Jerry, P1)
wrote(Tom, P1), advisedBy(Tom, Bob) -> wrote(Bob, P1)
wrote(Bob, P1), advisedBy(Bob, Jerry) -> wrote(Jerry, P1)

Step 2: Sampling
[Figure: advisee/advisor graph over Tom, Bob, and Jerry, annotated "find the field and extract data"]
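As a minimal sketch of what Step 2 computes (an addition, not on the original slide): the snippet below does exact inference over the three ground clauses above by enumerating all worlds of the two unknown atoms, wrote(Bob, P1) and wrote(Jerry, P1). The uniform clause weight of 1.0 is an assumption (the slide shows no weights), and real systems sample (e.g., with MCMC) instead of enumerating, since the number of worlds is exponential.

    from itertools import product
    from math import exp

    # Ground clauses from Step 1, reduced to functions of the two unknown
    # atoms; evidence atoms wrote(Tom, P1) and all advisedBy facts are true.
    W = 1.0  # assumed uniform clause weight (not given on the slide)
    clauses = [
        lambda bob, jerry: jerry,               # ... -> wrote(Jerry, P1)
        lambda bob, jerry: bob,                 # ... -> wrote(Bob, P1)
        lambda bob, jerry: (not bob) or jerry,  # wrote(Bob, P1), advisedBy(Bob, Jerry) -> wrote(Jerry, P1)
    ]

    # Weight of a world: exp(W * number of satisfied ground clauses).
    world_weight = {
        (bob, jerry): exp(W * sum(c(bob, jerry) for c in clauses))
        for bob, jerry in product([False, True], repeat=2)
    }

    Z = sum(world_weight.values())
    p_jerry = sum(w for (b, j), w in world_weight.items() if j) / Z
    print(f"P(wrote(Jerry, P1)) = {p_jerry:.3f}")  # ≈ 0.835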
Grounding via SQL in Tuffy
The program is transformed into many SQL queries (bottom-up).

Rule: wrote(s, t) ∧ advisedBy(s, p) -> wrote(p, t)

SELECT w1.id, a.id, w2.id
FROM wrote w1, advisedBy a, wrote w2
WHERE w1.person = a.advisee
  AND w1.paper = w2.paper
  AND a.advisor = w2.person
  AND …
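A runnable sketch of the bottom-up idea (an addition, not on the original slide), using SQLite in place of Tuffy's RDBMS. The schema and the inclusion of candidate query atoms in the wrote table are assumptions for illustration, not Tuffy's actual layout; each row the join returns is one ground clause, identified by the ids of its three atoms.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE wrote(id INTEGER PRIMARY KEY, person TEXT, paper TEXT);
    CREATE TABLE advisedBy(id INTEGER PRIMARY KEY, advisee TEXT, advisor TEXT);
    -- Evidence plus candidate query atoms (Jerry's wrote tuple is a query atom).
    INSERT INTO wrote(person, paper)
      VALUES ('Tom', 'P1'), ('Bob', 'P1'), ('Jerry', 'P1');
    INSERT INTO advisedBy(advisee, advisor)
      VALUES ('Tom', 'Jerry'), ('Tom', 'Bob'), ('Bob', 'Jerry');
    """)

    # The slide's query: one output row per ground clause, set-at-a-time.
    rows = conn.execute("""
    SELECT w1.id, a.id, w2.id
    FROM wrote w1, advisedBy a, wrote w2
    WHERE w1.person = a.advisee
      AND w1.paper = w2.paper
      AND a.advisor = w2.person
    """).fetchall()
    print(rows)  # three ground clauses, matching the previous slide

Pushing grounding into the database lets the join optimizer do the work set-at-a-time rather than tuple-at-a-time.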
Grounding: Top-down vs. Bottom-up
Example 1: DeepDive
• Enrich Wikipedia with structured data extracted from both structured and unstructured sources.
DeepDive’s Origin
• Build a system that is able to read the Web and answer questions.
• Machine Reading: “List members of the Brazilian Olympic Team in this corpus, with years of membership.”
DeepDive
Given a name, DeepDive collects all the information related to that name and displays it together.
DeepDive Demo
• Wikipedia: http://en.wikipedia.org/wiki/Barack_Obama
• WiscI: http://research.cs.wisc.edu/hazy/wikidemo/index.php/Barack_Obama
• DeepDive: http://research.cs.wisc.edu/hazy/demos/deepdive/index.php/Barack_Obama
DeepDive: Demo
Tasks it performs:
• Web Crawling
• Information Extraction
• Deep Linguistic Processing
• Audio/Video Transcription
• Tera-byte Parallel Joins
Some information:
• 50TB Data
• 500K Machine hours
• 500M Webpages
• 400K Videos
• 7Bn Entity Mentions
• 114M Relationship Mentions
Declare graphical models at Web scale.
Example 2: GeoDeepDive
• http://hazy.cs.wisc.edu/demo/geo/
• The goal is to help geoscientists extract data that is buried in the text, tables, and figures of journal articles and web sites, sometimes called "dark data."
• Extends a database called Macrostrat.
Future work
• Assisted Development
  - Capture expertise and experience with data and algorithms
• New Data Platforms
  - Hadoop environment
Conclusion
• Key technical hypothesis: a large fraction of the processing performed by applications that use and analyze these new sources of data can be captured using a small handful of primitives.
• The Hazy group is building several applications on top of this hypothesis.
• More information: http://hazy.cs.wisc.edu/hazy/
Questions?