
Learning Semantic Definitions for Information Sources on the Internet



  1. Doctoral Thesis: Learning Semantic Definitions for Information Sources on the Internet. Mark James Carman. Advisors: Prof. Paolo Traverso, Prof. Craig A. Knoblock

  2. Abundance of Information Sources
     [Figure: a collage of online information sources, among them Google Base, Hotels, Hotel Deals, Exchange Rates, Currency Rates, Package Deals, Travelocity, Airfares, Orbitz Travel Deals, Cheap Flights, Last Minute Flights, Yahoo Flight Status, Stock Quotes, Used Cars, Classifieds, Weather Realtime Conditions, Weather Forecasts, Earthquake Data and Tsunami Warnings.]
     Outline: Motivation, Approach, Search, Scoring, Extensions, Experiments, Related Work, Conclusions
     Footer: 24 April 2007, Thesis Defense - Mark James Carman

  3. Bringing the Data Together
     [Figure: the travel-related sources Google Base, Hotels, Hotel Deals, Exchange Rates, Package Deals, Travelocity, Airfares, Orbitz Travel Deals, Cheap Flights, Flight Status, Last Minute Flights and Weather Forecasts.]

  4. Bringing the Data Together
     [Figure: the same sources as on the previous slide (Google Base, Hotels, Travelocity, Airfares, Exchange Rates, Hotel Deals, Package Deals, Cheap Flights, Orbitz Travel Deals, Last Minute Flights, Flight Status, Weather Forecasts) in a new arrangement.]

  5. Mediators resolve Heterogeneity
     [Figure: the same sources (Google Base, Hotels, Travelocity, Airfares, Exchange Rates, Hotel Deals, Package Deals, Cheap Flights, Orbitz Travel Deals, Last Minute Flights, Flight Status, Weather Forecasts), now with a Mediator among them.]

  6. Mediators Require Source Definitions
     - New service => no source definition!
     - Can we discover a definition automatically?
     [Figure: the query SELECT MIN(price) FROM flight WHERE depart="LAX" AND arrive="MXP" is sent to the Mediator, which issues reformulated queries to its sources: lowestFare("LAX","MXP") to Orbitz Flight Search, calcPrice("LAX","MXP","economy") to United Airlines, and a further reformulated query to Qantas Specials. Alitalia also appears in the figure but is not among the listed source definitions.]
     Source Definitions: Orbitz Flight Search, United Airlines, Qantas Specials

  7. Inducing Source Definitions by Example
     Known sources 1 to 3:
         source1($zip, lat, long) :- centroid(zip, lat, long).
         source2($lat1, $long1, $lat2, $long2, dist) :- greatCircleDist(lat1, long1, lat2, long2, dist).
         source3($dist1, dist2) :- convertKm2Mi(dist1, dist2).
     New source 4:
         source4($startZip, $endZip, separation)
     - Step 1: classify input & output semantic types (zipcode, distance). Assume this problem has been solved!

  8. Inducing Source Definitions - Step 2
     Known sources 1 to 3:
         source1($zip, lat, long) :- centroid(zip, lat, long).
         source2($lat1, $long1, $lat2, $long2, dist) :- greatCircleDist(lat1, long1, lat2, long2, dist).
         source3($dist1, dist2) :- convertKm2Mi(dist1, dist2).
     New source 4:
         source4($zip1, $zip2, dist)
     - Step 1: classify input & output semantic types
     - Step 2: generate plausible definitions
     A plausible definition in terms of the known sources:
         source4($zip1, $zip2, dist) :-
             source1(zip1, lat1, long1),
             source1(zip2, lat2, long2),
             source2(lat1, long1, lat2, long2, dist2),
             source3(dist2, dist).
     The same definition unfolded into domain predicates:
         source4($zip1, $zip2, dist) :-
             centroid(zip1, lat1, long1),
             centroid(zip2, lat2, long2),
             greatCircleDist(lat1, long1, lat2, long2, dist2),
             convertKm2Mi(dist2, dist).
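The candidate definition on this slide is a plain composition of the three known sources, which can be sketched in Python. This is an illustration only, not code from the thesis: the centroid table, its approximate coordinates, the haversine formula, the Earth radius, and the assumption that `greatCircleDist` returns kilometres are all choices made here for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
KM_PER_MILE = 1.609344

# Hypothetical stand-in for source1; coordinates are rough, illustrative values.
CENTROIDS = {
    "80210": (39.68, -104.96),
    "90266": (33.89, -118.40),
}

def source1(zipcode):
    """centroid(zip, lat, long): look up the centroid of a zipcode."""
    return CENTROIDS[zipcode]

def source2(lat1, long1, lat2, long2):
    """greatCircleDist: haversine distance, assumed to be in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(long2 - long1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def source3(dist_km):
    """convertKm2Mi: convert kilometres to miles."""
    return dist_km / KM_PER_MILE

def source4_candidate(zip1, zip2):
    """Candidate definition: source1, source1, source2, then source3."""
    lat1, long1 = source1(zip1)
    lat2, long2 = source1(zip2)
    dist2 = source2(lat1, long1, lat2, long2)
    return source3(dist2)

print(source4_candidate("80210", "90266"))
```

Each function call corresponds to one literal in the body of the rule, and the shared variable names (`lat1`, `dist2`, and so on) play the role of the join variables.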

  9. Inducing Source Definitions - Step 3
     - Step 1: classify input & output semantic types
     - Step 2: generate plausible definitions
     - Step 3: invoke service & compare output
     Candidate definition:
         source4($zip1, $zip2, dist) :-
             source1(zip1, lat1, long1),
             source1(zip2, lat2, long2),
             source2(lat1, long1, lat2, long2, dist2),
             source3(dist2, dist).
     Unfolded:
         source4($zip1, $zip2, dist) :-
             centroid(zip1, lat1, long1),
             centroid(zip2, lat2, long2),
             greatCircleDist(lat1, long1, lat2, long2, dist2),
             convertKm2Mi(dist2, dist).
     Predicted and actual outputs match:
         $zip1    $zip2    dist (predicted)    dist (actual)
         80210    90266    842.37              843.65
         60601    15201    410.31              410.83
         10005    35555    899.50              899.21
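The comparison in Step 3 can be illustrated with the three rows from the slide. The sketch below is not the scoring function used in the thesis: the function name `fraction_matching` and the 1% relative tolerance are assumptions made here, simply to show what comparing predicted against actual outputs might look like for numeric values.

```python
# (predicted, actual) distances copied from the table on the slide.
ROWS = [
    (842.37, 843.65),
    (410.31, 410.83),
    (899.50, 899.21),
]

def fraction_matching(rows, rel_tol=0.01):
    """Return the fraction of rows whose predicted value lies within
    rel_tol (a hypothetical threshold) of the actual value."""
    if not rows:
        raise ValueError("need at least one (predicted, actual) pair")
    matches = sum(
        1 for predicted, actual in rows
        if abs(predicted - actual) <= rel_tol * abs(actual)
    )
    return matches / len(rows)

print(fraction_matching(ROWS))
```

An exact equality test would reject all three rows, so some tolerance is needed for numeric outputs; the appropriate threshold is a design choice that this sketch does not settle.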

  10. Overlapping Data Requirement
     - Assumption: overlap between new & known sources
     - Nonetheless, the technique is widely applicable:
         - Redundancy (e.g. Bloomberg Currency Rates and Yahoo Exchange Rates)
         - Scope or Completeness (e.g. Worldwide Hotel Deals and US Hotel Rates)
         - Binding Constraints (e.g. 5* Hotels By State and Hotels By Zipcode)
         - Composed Functionality (e.g. Distance Between Zipcodes, Great Circle Distance and Centroid of Zipcode)
         - Access Time (e.g. Google Hotel Search and Government Hotel List)

  11. Searching for Definitions
     - Search space of conjunctive queries:
         target(X) :- source1(X1), source2(X2), ...
       Expressive language: sufficient for modeling most online sources.
     - For scalability, don't allow negation or union
     - Perform Top-Down Best-First Search
     1. First sample the New Source
     2. Then perform best-first search through the space of candidate definitions
         Invoke target with set of random inputs;
         Add empty clause to queue;
         while (queue not empty)
             v := best definition from queue;
             forall (v' in Expand(v))
                 if (Eval(v') > Eval(v))
                     insert v' into queue;
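The search loop on this slide can be sketched in Python with a priority queue. Everything beyond the loop structure is an assumption made for illustration: candidates are modelled as tuples of source names, `toy_expand` and `toy_evaluate` are hypothetical stand-ins for Expand and Eval, the sampling of the target is folded into the evaluation function, and the `max_steps` guard and best-so-far tracking are additions not shown on the slide.

```python
import heapq
from itertools import count

def best_first_search(expand, evaluate, max_steps=10_000):
    """Top-down best-first search starting from the empty clause."""
    empty_clause = ()
    tie_breaker = count()  # keeps heap ordering stable for equal scores
    start_score = evaluate(empty_clause)
    # heapq is a min-heap, so scores are negated to pop the best first.
    queue = [(-start_score, next(tie_breaker), empty_clause)]
    best, best_score = empty_clause, start_score
    steps = 0
    while queue and steps < max_steps:
        steps += 1
        neg_score, _, v = heapq.heappop(queue)
        score = -neg_score
        if score > best_score:
            best, best_score = v, score
        for child in expand(v):
            child_score = evaluate(child)
            if child_score > score:  # keep only strict improvements
                heapq.heappush(queue, (-child_score, next(tie_breaker), child))
    return best, best_score

# Toy problem (hypothetical): recover a hidden sequence of source literals.
SOURCES = ("source1", "source2", "source3")
HIDDEN = ("source1", "source1", "source2", "source3")

def toy_expand(v):
    """Append one more source literal, up to the length of HIDDEN."""
    if len(v) >= len(HIDDEN):
        return []
    return [v + (s,) for s in SOURCES]

def toy_evaluate(v):
    """+1 for each literal agreeing with HIDDEN, -1 for each that does not."""
    return sum(1 if a == b else -1 for a, b in zip(v, HIDDEN))

best, score = best_first_search(toy_expand, toy_evaluate)
print(best, score)
```

Because a refinement is only queued when it scores strictly higher than its parent, the queue stays small, at the cost of possibly missing definitions that are reachable only through a non-improving step.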
