Interactively Building Geospatial Mashups
Craig A. Knoblock
University of Southern California
Work in collaboration with Shubham Gupta, Pedro Szekely, and Rattapoom Tuchinda
MASHUPS
• A website or application that combines content from more than one source into an integrated experience [wikipedia]
a) LA crime map: crime reports from different counties, map
b) zillow.com: real estate listings, property tax
c) Ski bonk: weather, snow report, snow resorts
Combined data gives new insight and provides new services
Introduction • Approach • Evaluation • Related Work • Conclusion
PROBLEM
• Most mashups require significant expertise to create
• Demand for creating integrated applications is huge
• Every user has unique requirements for an integrated application
• The number of available sources and the need to integrate data continue to grow
MASHUP BUILDING ISSUES
[Figure: mashup building pipeline. Data Retrieval (wrappers) → Calibration (source modeling and cleaning of attributes) → Integration (combine) → Display (customize display)]
EXISTING APPROACHES
Goal: create mashups without programming
But "without programming" does not translate to not having to understand programming
• Widget paradigm
  ‐ Each widget (e.g., 43 for Pipes, 300+ for MS) represents an operation on the data
  ‐ Locating widgets and learning to customize them can be time consuming
  ‐ Most tools focus on particular issues and ignore others
[Figure: Yahoo's Pipes]
Can we come up with a framework that addresses all of the issues while still making the mashup building process easy?
KEY CONTRIBUTIONS
• A programming-by-demonstration approach that uses a single table for building a mashup
• An integrated approach that links data extraction, source modeling, data cleaning, and data integration together
• A query formulation technique that allows users to specify examples to build complicated queries
KEY IDEAS
• Focus on data, not operations
  – Users are more familiar with data
• Leverage existing data
  – Helps source modeling, cleaning, and data integration
• Consolidate as opposed to divide-and-conquer
  – Solving a problem in one issue can help solve another issue
  – Interacting within a single spreadsheet platform
KARMA USER INTERFACE
[Figure: Karma screenshot with three labeled regions: the data table (a spreadsheet-type interface), the various information integration operations, and the data source types currently supported by Karma]
INTEGRATION SCENARIO
• Injury statistics in Excel spreadsheet → extract {Date, Injuries, Fatalities} → visualize as chart
• Evacuation Centers CSV → extract {EvacCenter_ID, Address, City}
• Emergency Coordinator MySQL database → extract {Name, City, Phone No.}
• Clean and combine into {EvacCenter_ID, Address, City, Name, Phone No.} → visualize as map
• Google News website → extract {Headlines, Summary, Date, Link} → visualize as bulleted list
RETRIEVING DATA FROM DIVERSE SOURCES
• Karma facilitates retrieval of data from structured data sources, such as Excel spreadsheets, MySQL databases, and CSV files
• Karma also facilitates the extraction of data from semi-structured data sources, such as web pages
[Figure: supported sources: CSV text file, MySQL database, Excel spreadsheet, HTML web page]
EXTRACTION BY EXAMPLE
• Data is retrieved from structured data sources, such as Excel sheets and CSV files, through a drag-and-drop mechanism
• The user only needs to select a sample data element and drop it into Karma's data table
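The idea of generalizing from one dropped sample to the rest of the source can be illustrated with a minimal Python sketch. It is not Karma's implementation: the function name `extract_column_by_example`, the exact-match lookup, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

def extract_column_by_example(csv_text, example):
    """Return every value in the column that contains the example value."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    for row in rows:
        for idx, cell in enumerate(row):
            if cell == example:
                # Generalize from the one dropped cell to its whole column.
                return [r[idx] for r in rows if idx < len(r)]
    return []  # the example does not occur in this source

# Placeholder data invented for this sketch (no header row).
SAMPLE = "EC1,1 Example St,Springfield\nEC2,2 Example Ave,Shelbyville\n"

cities = extract_column_by_example(SAMPLE, "Springfield")
print(cities)
```

A real tool would also have to handle header rows and values that occur in more than one column; this sketch simply takes the first match.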
EXTRACTION FROM THE WEB
[Figure: DOM tree of a restaurant listing page. TBODY contains two tr rows; each row has td cells holding a row number (1., 2.), an a element (Japon Bistro; Hokusai), and text separated by br elements (970 E Colora.., Upscale yet affordabl..; 8400 Wilshir., Chic elegance…..)]
The path of the example element, Tbody/tr[1]/td[2]/a, is generalized to Tbody/tr*/td*/a
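The step from the example path to the generalized path can be sketched in Python with the standard library. This is an illustrative reading of the two paths on the slide, not Karma's extraction code: the helper names `generalize` and `to_etree_path` are hypothetical, the page fragment is a reduced stand-in for the figure, and the `tr*` wildcard is translated into ElementTree's path syntax by dropping the positional index.

```python
import re
import xml.etree.ElementTree as ET

# Reduced, well-formed stand-in for the page in the figure (names only).
PAGE = """
<tbody>
  <tr><td>1.</td><td><a>Japon Bistro</a><br/><br/></td></tr>
  <tr><td>2.</td><td><a>Hokusai</a><br/><br/></td></tr>
</tbody>
"""

def generalize(path):
    """Replace each positional index such as [1] with the slide's * wildcard."""
    return re.sub(r"\[\d+\]", "*", path)

def to_etree_path(path):
    """Translate a slide-style path into one ElementTree understands.

    The leading root step is dropped because the search starts at the root,
    and a trailing * is removed so the step matches every sibling with that tag.
    """
    steps = path.split("/")[1:]
    return "/".join(step.rstrip("*") for step in steps)

root = ET.fromstring(PAGE.strip())
example = "Tbody/tr[1]/td[2]/a"
general = generalize(example)

first = [a.text for a in root.findall(to_etree_path(example))]
names = [a.text for a in root.findall(to_etree_path(general))]
print(general, first, names)
```

The example path selects only the element the user pointed at, while the generalized path selects the corresponding element in every row.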
EXPLOITING WRAPPER LIBRARIES
Wrapper library: Karma lists all the available wrappers on the local machine.
SOURCE MODELING
• Karma automatically assigns a semantic type to each attribute to learn the underlying model of the data source
• Supervised machine learning techniques are used to generate a set of patterns for each semantic type from training data
• The user manually labels the data with the correct semantic type to train Karma
• When new data of the same type is imported, Karma automatically labels it with the correct type
[Figure: Initial Type]
LEARNING SEMANTIC TYPES
Idea: learn a model of the content of the data and use it to recognize new examples
:StreetAddress: 4DIG CAPS Rd; 3DIG N CAPS Ave; …
:Email: ALPHA@ALPHA.edu; ALPHA@ALPHA.com; …
:State: CA; 2UPPER; …
:Telephone: (3DIG) 3DIG-4DIG; +1 3DIG 2DIG 4DIG; …
[Figure labels: Background knowledge, Patterns, learn, label]
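A much simplified Python sketch of pattern-based type recognition follows. It only abstracts digit runs (nDIG) and runs of two or more capital letters (nUPPER), so it covers patterns such as (3DIG) 3DIG-4DIG and 2UPPER but not the CAPS and ALPHA tokens from the slide. The function names, the exact-match labeling, and the training values are assumptions for illustration, not Karma's learner.

```python
import re

def to_pattern(value):
    """Abstract digit runs to nDIG and runs of 2+ capitals to nUPPER."""
    def abstract(match):
        token = match.group(0)
        kind = "DIG" if token.isdigit() else "UPPER"
        return f"{len(token)}{kind}"
    return re.sub(r"\d+|[A-Z]{2,}", abstract, value)

def learn(training):
    """Build a set of patterns per semantic type from labeled examples."""
    return {sem_type: {to_pattern(v) for v in values}
            for sem_type, values in training.items()}

def label(value, model):
    """Return the first semantic type with a matching pattern, else None."""
    pattern = to_pattern(value)
    for sem_type, patterns in model.items():
        if pattern in patterns:
            return sem_type
    return None

# Invented training values (fictitious 555-01xx phone numbers).
TRAINING = {
    "Telephone": ["(213) 555-0100", "(310) 555-0199"],
    "State": ["CA", "NY"],
}
model = learn(TRAINING)
print(model)
print(label("(818) 555-0142", model), label("TX", model), label("Texas", model))
```

Both telephone examples collapse to the single pattern (3DIG) 3DIG-4DIG, which is why an unseen number in the same format is recognized.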
DATA CLEANING
• Karma cleans data by learning transformation rules from examples and applying them
[Figure: initial data source → user provides an example → Karma learns a transformation rule and applies it to the remaining data → data source after cleaning]
DATA CLEANING: PREDEFINED TRANSFORMATIONS
Example: 31 Reviews → 31
Subset Rule: $(s_1 s_2 \ldots s_k) \rightarrow (d_1 d_2 \ldots d_t) \wedge (k \le t) \wedge s_i \in \{d_1, d_2, \ldots, d_t\} \wedge d_i \neq d_j$
[Figure: list of predefined rules]
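One way to read the example 31 Reviews → 31 is that the cleaned value keeps a subset of the distinct tokens of the original value. The Python sketch below follows that reading. Remembering the kept token positions and replaying them on the other rows is an assumption made for illustration, as are the function names; the slide does not say how Karma applies the rule.

```python
def learn_subset_rule(original, cleaned):
    """Return the kept token positions if cleaned is a token subset of original.

    Returns None when the example does not fit the subset reading, i.e. the
    cleaned value has more tokens, repeated tokens, or tokens not in original.
    """
    src = original.split()
    dst = cleaned.split()
    if len(dst) > len(src) or len(set(dst)) != len(dst):
        return None
    positions = []
    for token in dst:
        if token not in src:
            return None
        positions.append(src.index(token))
    return positions

def apply_subset_rule(positions, value):
    """Keep the tokens at the learned positions; leave short values unchanged."""
    tokens = value.split()
    if any(p >= len(tokens) for p in positions):
        return value
    return " ".join(tokens[p] for p in positions)

rule = learn_subset_rule("31 Reviews", "31")
column = ["31 Reviews", "12 Reviews", "7 Reviews"]
cleaned = [apply_subset_rule(rule, v) for v in column]
print(rule, cleaned)
```

A single user example is enough here because every remaining row has the same token layout as the example.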
DATA INTEGRATION
• Karma discovers related sources by detecting and ranking associations based on common attribute names and matching semantic types
• Karma suggests potential joins between the current data sources in the form of column completions
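The ranking idea can be sketched in Python as follows. The scoring function (number of shared attribute names plus number of shared semantic types) is an assumption chosen for simplicity, not Karma's ranking. The attribute names come from the integration scenario, while the semantic type labels other than PR-Address (PR-City, ID, PersonName, Telephone, Date, Number) are hypothetical.

```python
def rank_sources(current, candidates):
    """Rank candidate sources by overlap with the current table.

    current: dict mapping attribute name -> semantic type
    candidates: dict mapping source name -> dict of the same shape
    Returns (score, source name, columns the source could add), best first.
    """
    ranked = []
    for name, attrs in candidates.items():
        shared_names = set(current) & set(attrs)
        shared_types = set(current.values()) & set(attrs.values())
        score = len(shared_names) + len(shared_types)
        if score:
            completions = sorted(set(attrs) - set(current))
            ranked.append((score, name, completions))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked

current = {"EvacCenter_ID": "ID", "Address": "PR-Address", "City": "PR-City"}
candidates = {
    "coordinators": {"Name": "PersonName", "City": "PR-City", "Phone No.": "Telephone"},
    "injuries": {"Date": "Date", "Injuries": "Number", "Fatalities": "Number"},
}
suggestions = rank_sources(current, candidates)
print(suggestions)
```

Only the coordinators source shares an attribute (City) with the current table, so its remaining columns become the suggested column completions.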
USER SELECTS FROM COLUMN COMPLETIONS
• Karma suggests the possible column completions in a drop-down list
• The MySQL database is loaded as another source in Karma
• Karma executes the join query once the user selects an option
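The kind of join query that a selected column completion could trigger is sketched below with Python's built-in sqlite3 module standing in for the MySQL source. The table names, the choice of City as the join attribute, the use of a LEFT JOIN so that existing rows are kept, and all row values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE centers (evac_center_id TEXT, address TEXT, city TEXT)")
conn.execute("CREATE TABLE coordinators (name TEXT, city TEXT, phone TEXT)")

# Placeholder rows invented for this sketch.
conn.executemany(
    "INSERT INTO centers VALUES (?, ?, ?)",
    [("EC1", "1 Example St", "Springfield"), ("EC2", "2 Example Ave", "Shelbyville")],
)
conn.executemany(
    "INSERT INTO coordinators VALUES (?, ?, ?)",
    [("A. Example", "Springfield", "555-0100")],
)

def complete_columns(connection):
    """Add coordinator name and phone to each evacuation center row."""
    query = """
        SELECT c.evac_center_id, c.address, c.city, k.name, k.phone
        FROM centers AS c
        LEFT JOIN coordinators AS k ON c.city = k.city
        ORDER BY c.evac_center_id
    """
    return connection.execute(query).fetchall()

rows = complete_columns(conn)
for row in rows:
    print(row)
```

With a LEFT JOIN, a center without a matching coordinator stays in the table with empty values in the new columns.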
DATA VISUALIZATION
• Visualization-by-demonstration approach
  – The user demonstrates the desired kind of visualization to Karma by specifying examples through a drag-and-drop mechanism
DATA VISUALIZATION
Karma currently supports four types of visualization formats:
1. Chart format: useful for visualizing numerical statistics, time-based events, etc.
2. Paragraph format: useful for visualizing descriptive text data, such as Wikipedia definitions
DATA VISUALIZATION
3. List format: useful for visualizing information in a bulleted list, such as a list of summarized news articles
4. Table format: useful for visualizing information that is best presented in rows and columns, such as numerical values
RESULTS CAN BE PUBLISHED IN MULTIPLE FORMATS
• Karma lets you export your final mashup in a variety of formats:
  ‐ HTML page
  ‐ Database table
  ‐ KML layer
  ‐ XML file
  ‐ CSV text file
[Figure: the different mashup publishing options]
AUTOMATICALLY FINDS GEOSPATIAL REFERENCES
• Final mashup output in HTML web page format:
  ‐ Karma identifies geospatial information in the current data with the help of geographic semantic types such as PR‐Address and PR‐Latitude
  ‐ The Google geocoding service is used to find the coordinates for a given address
  ‐ Karma uses the coordinates to place the markers in the final mashup
[Figure: potential geographic information and the options to publish the mashup as an HTML web page]
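The three steps above can be sketched in Python. The geocoder is passed in as a plain function so that the sketch makes no network calls and invents nothing about the Google geocoding API; the stub used here returns made-up coordinates. The type name PR-Longitude, the function names, and the marker dictionary layout are assumptions for illustration.

```python
# PR-Address and PR-Latitude appear on the slide; PR-Longitude is assumed.
GEO_TYPES = {"PR-Address", "PR-Latitude", "PR-Longitude"}

def find_geo_columns(column_types):
    """Return the columns whose semantic type is geographic."""
    return [col for col, sem_type in column_types.items() if sem_type in GEO_TYPES]

def place_markers(rows, column_types, geocode):
    """Turn each row with a geocodable address into a map marker.

    geocode is any callable mapping an address string to (lat, lng) or None.
    """
    address_cols = [c for c, t in column_types.items() if t == "PR-Address"]
    if not address_cols:
        return []
    col = address_cols[0]
    markers = []
    for row in rows:
        coords = geocode(row[col])
        if coords is not None:
            markers.append({"label": row[col], "lat": coords[0], "lng": coords[1]})
    return markers

# Stub geocoder with invented coordinates, standing in for a real service.
stub_geocode = {"1 Example St": (34.0, -118.0)}.get

column_types = {"EvacCenter_ID": "ID", "Address": "PR-Address"}
rows = [
    {"EvacCenter_ID": "EC1", "Address": "1 Example St"},
    {"EvacCenter_ID": "EC2", "Address": "unknown place"},
]
geo_columns = find_geo_columns(column_types)
markers = place_markers(rows, column_types, stub_geocode)
print(geo_columns, markers)
```

Rows whose address cannot be geocoded are skipped rather than placed at a default location.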
CONSTRUCTS A MAP WITH USER‐DEFINED LAYOUT
• Final mashup as an HTML web page
RESULTS CAN BE EXPORTED AS KML
• Final mashup output as a KML layer
[Figure: options to publish the mashup as a KML layer]
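A minimal Python sketch of what a KML export might produce is shown below, using only the standard library. It writes one Placemark per marker inside a Document element, with coordinates in KML's longitude,latitude order. The function name `markers_to_kml`, the marker dictionary layout, and the sample marker are assumptions; the layer Karma generates may contain more than this.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def markers_to_kml(markers):
    """Serialize markers (label, lat, lng) as a KML document string."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for marker in markers:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = marker["label"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML lists longitude first, then latitude.
        coordinates = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
        coordinates.text = f"{marker['lng']},{marker['lat']}"
    return ET.tostring(kml, encoding="unicode")

# Invented marker for illustration.
kml_text = markers_to_kml([{"label": "1 Example St", "lat": 34.0, "lng": -118.0}])
print(kml_text)
```

Saving the returned string to a file with a .kml extension gives a layer that a KML viewer can load.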
KML LAYERS CAN BE OPENED IN GOOGLE EARTH
The generated KML layer can be viewed in GIS software such as Google Earth