
MPRI Internship Defense: Advances in Holistic Ontology Alignment



  1. Sections: Background, Paris, Performance, Joins, Theory, Literals, Application to IE, Conclusion. MPRI Internship Defense: Advances in Holistic Ontology Alignment. Antoine Amarilli, supervised by Pierre Senellart, Télécom ParisTech.

  2. The Semantic Web. Facts on the Web: <p><b>Paris</b> is the <a href="Capital_city">capital</a> of <a href="France">France</a></p>. Facts on the Semantic Web: capitalOf(Paris, France).
     The Web: lots of information in semi-structured HTML documents.
     The Semantic Web: an effort to represent information in a structured and semantic way.
     Uses: interoperability, integration of sources, constraints, complex queries, inference.

  3. Ontologies. Ontologies are the information sources of the Semantic Web. Vertices are entities or literals. Edges are facts labeled with a relation. Sources: manual creation, existing databases, information extraction.
     [Figure: example graph with the facts dbp:France dbp:capital dbp:Paris; dbp:Paris foaf:homepage http://www.paris.fr/; dbp:Paris foaf:name 'Paris'.]
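
The graph view of an ontology can be sketched as a set of (subject, relation, object) triples. This is only an illustration of the data model described on the slide: the `Ontology` class and its `facts_about` method are hypothetical names, not part of Paris.

```python
from collections import defaultdict


class Ontology:
    """A toy ontology: a set of (subject, relation, object) triples."""

    def __init__(self, triples):
        self.triples = set(triples)
        # Index outgoing edges by subject so lookups do not scan all triples.
        self.out_edges = defaultdict(set)
        for subject, relation, obj in self.triples:
            self.out_edges[subject].add((relation, obj))

    def facts_about(self, subject):
        """Return the (relation, object) pairs of a subject, sorted."""
        return sorted(self.out_edges.get(subject, ()))


# The three facts of the slide's example graph.
onto = Ontology([
    ("dbp:France", "dbp:capital", "dbp:Paris"),
    ("dbp:Paris", "foaf:homepage", "http://www.paris.fr/"),
    ("dbp:Paris", "foaf:name", "'Paris'"),
])
```

Here `onto.facts_about("dbp:Paris")` lists the homepage and name facts, and literals such as `'Paris'` are simply vertices without outgoing edges.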

  4. Linked Data Cloud. Many ontologies are created independently: different entities and relations express the same things. Linked Data: integrate existing ontologies in a network structured by equality links between equivalent concepts. To automatically derive those links, we need to perform ontology alignment.
     [Figure: excerpt of the Linked Data cloud diagram, with datasets such as DBpedia, DBLP, UniProt, PubMed and Drug Bank.]

  5. Ontology Alignment. Sometimes URIs do not help us and literals are ambiguous or have minor differences:
     imdb:p138992 imdb:label 'Charles Brackett'; imdb:p138992 imdb:producerOf imdb:tt0046435; imdb:tt0046435 imdb:label 'Titanic'.
     dbp:Charles_Brackett foaf:name 'Charles William Brackett'; dbp:Titanic_(1953_film) dbp:producer dbp:Charles_Brackett; dbp:Titanic_(1953_film) foaf:name 'Titanic'.
     Sometimes the structures of the two ontologies do not match:
     bnb:AdamsDouglas1952-2001 bio:event bnb:AdamsDouglas1952-2001/birth; bnb:AdamsDouglas1952-2001/birth rdf:type bio:Birth; bnb:AdamsDouglas1952-2001/birth bio:date '1952'.
     dbp:Douglas_Adams dbp:birthDate '1952-03-11'.

  6. Table of Contents. 1. Background: the Semantic Web; 2. The Paris System; 3. Performance Improvements; 4. Join Relations; 5. Theoretical Analysis; 6. Approximate Literal Matching; 7. Application to Information Extraction; 8. Conclusion.

  7. Paris. Paris: Probabilistic Alignment of Relations, Instances, and Schema. To bootstrap a matching, Paris uses an equality function on literals and applies propagation rules.
     [Figure: the two propagation rules, drawn on facts r(x, y) and r′(x′, y′), linking the equivalences x = x′ and y = y′ with the inclusion r ⊆ r′.]
     The rules are represented as a system of equations which we iterate until a fixpoint is reached:
     $$\mathrm{Pr}_{n+1}(x \equiv x') = 1 - \prod_{\substack{r(x,y) \\ r'(x',y')}} \Bigl(1 - \mathrm{Pr}_n(r' \subseteq r) \times \mathit{fun}^{-1}(r) \times \mathrm{Pr}_n(y \equiv y')\Bigr) \times \Bigl(1 - \mathrm{Pr}_n(r \subseteq r') \times \mathit{fun}^{-1}(r') \times \mathrm{Pr}_n(y \equiv y')\Bigr)$$
     $$\mathrm{Pr}_{n+1}(r \subseteq r') = \frac{\displaystyle\sum_{r(x,y)} \Bigl(1 - \prod_{r'(x',y')} \bigl(1 - \mathrm{Pr}_n(x \equiv x') \times \mathrm{Pr}_n(y \equiv y')\bigr)\Bigr)}{\displaystyle\sum_{r(x,y)} \Bigl(1 - \prod_{x',y'} \bigl(1 - \mathrm{Pr}_n(x \equiv x') \times \mathrm{Pr}_n(y \equiv y')\bigr)\Bigr)}$$
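
The entity-equivalence equation can be sketched in a few lines of Python for a single pair (x, x′). This is a minimal illustration, not the Java implementation: the function name, the dictionary-based inputs and the starting value 0.1 for the sub-relation scores are all assumptions made for the example.

```python
def update_equivalence(facts_x, facts_x2, sub, inv_fun, equiv):
    """Compute Pr_{n+1}(x = x') from the scores of iteration n.

    facts_x  -- list of (r, y) pairs such that r(x, y) holds
    facts_x2 -- list of (r2, y2) pairs such that r2(x', y') holds
    sub      -- dict: (r_a, r_b) -> Pr_n(r_a is a sub-relation of r_b)
    inv_fun  -- dict: relation -> inverse functionality fun^-1
    equiv    -- dict: (y, y') -> Pr_n(y = y')
    Missing scores are treated as 0 (assumption of this sketch).
    """
    product = 1.0
    for r, y in facts_x:
        for r2, y2 in facts_x2:
            p_y = equiv.get((y, y2), 0.0)
            # First factor of the equation: Pr(r' included in r) * fun^-1(r).
            product *= 1.0 - sub.get((r2, r), 0.0) * inv_fun[r] * p_y
            # Second factor: Pr(r included in r') * fun^-1(r').
            product *= 1.0 - sub.get((r, r2), 0.0) * inv_fun[r2] * p_y
    return 1.0 - product


# Toy input: both entities have one name fact with the same literal.
score = update_equivalence(
    facts_x=[("a:name", "'Elvis Presley'")],
    facts_x2=[("b:name", "'Elvis Presley'")],
    sub={("a:name", "b:name"): 0.1, ("b:name", "a:name"): 0.1},
    inv_fun={"a:name": 1.0, "b:name": 1.0},
    equiv={("'Elvis Presley'", "'Elvis Presley'"): 1.0},
)
```

With these toy inputs each factor is 1 − 0.1, so the score is 1 − 0.9 × 0.9, about 0.19. Every additional shared value multiplies in further factors below 1 and pushes the score up.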

  8. Paris by Example.
     Ontology a: a:Elvis a:name 'Elvis Presley'; a:Elvis a:birthdate '1935-01-08'; a:Elvis a:spouse a:Priscilla; a:Priscilla a:name 'Priscilla Presley'.
     Ontology b: b:Elvis b:name 'Elvis Presley'; b:Elvis b:birthdate '1935-01-08'; b:Elvis b:spouse b:Priscilla; b:Priscilla b:name 'Priscilla Presley'.

  12. Relation Functionalities.
     [Figure: entity A with name 'Eiffel tower' and position '48.8583°N 2.2945°E'; entity B with name 'Tour Eiffel' and position '48.8583°N 2.2945°E'.]
     Two instances should be aligned when they share the same values for aligned functional relations. In theory, the ontology schema should indicate which relations are functional. In practice, there is no schema and no “strict” functionality: a fuzzy functionality in [0, 1] is computed from the data.
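
One way to compute such a fuzzy functionality is the ratio of distinct subjects to facts. The slide does not give the formula, so this definition is an assumption of the sketch, as are the function names and the toy facts.

```python
def functionality(facts):
    """Fuzzy functionality of a relation given as (subject, object) pairs.

    Assumed definition: number of distinct subjects divided by the number
    of distinct facts. It is 1.0 when every subject has a single value.
    """
    pairs = set(facts)
    if not pairs:
        return 0.0
    return len({subject for subject, _ in pairs}) / len(pairs)


def inverse_functionality(facts):
    """Functionality of the reversed relation (the fun^-1 of the equations)."""
    return functionality([(obj, subject) for subject, obj in facts])


# Hypothetical name facts: one entity has two names, the other has one.
name_facts = [
    ("Eiffel", "'Eiffel tower'"),
    ("Eiffel", "'Tour Eiffel'"),
    ("Louvre", "'Louvre'"),
]
```

On this toy data `functionality(name_facts)` is 2/3 because one subject has two names, while `inverse_functionality(name_facts)` is 1.0 because each name identifies a single entity.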

  13. Existing Implementation and Previous Results. Paris is implemented in Java. Paris was evaluated on: toy datasets from the OAEI; DBpedia and Yago (two ontologies extracted from Wikipedia); Yago and IMDb. The evaluation is done in terms of precision, recall and F-measure.

     | Dataset | Instances Prec | Instances Rec | Instances F | Classes Prec | Classes Rec | Classes F | Relations Prec | Relations Rec | Relations F |
     |---|---|---|---|---|---|---|---|---|---|
     | OAEI person | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
     | OAEI restaurant | 95% | 88% | 91% | 100% | 100% | 100% | 100% | 66% | 88% |
     | DBpedia–Yago | 90% | 73% | 81% | 94% | - | - | 93% | - | - |
     | IMDb–Yago | 94% | 90% | 92% | 28% | - | - | 100% | 80% | 89% |
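
For reference, the three measures can be sketched as follows, using the standard definitions with the F-measure as the harmonic mean of precision and recall. The function names and the set-based input format are illustrative assumptions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(found, gold):
    """Return (precision, recall, F) of a found alignment against a gold one.

    found, gold -- sets of aligned pairs, e.g. {("a:Elvis", "b:Elvis")}
    """
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)
```

As a check against the instance figures for the OAEI restaurant dataset, `f_measure(0.95, 0.88)` rounds to 0.91.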

  16. Performance Improvements. The original Paris takes a few hours per iteration. Ways to improve this:
     - Replace BerkeleyDB by an in-memory representation of the ontologies.
     - Parallelize the propagation of entity alignment scores over all entities. Aggregate results at the end to avoid races.
     - Change the hardware (now that the computation is CPU-bound).
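
The "compute in parallel, aggregate at the end" pattern can be sketched as below. Paris itself is written in Java, so this Python version only illustrates the structure: `align_in_parallel` and `toy_score` are hypothetical names, and Python threads would not actually speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def align_in_parallel(entities, score_entity, workers=4):
    """Score entities in worker threads, then merge the partial results.

    score_entity -- function: entity -> dict of {(x, x'): probability}
    Workers never write to shared state. Each returns its own partial
    dict, and only the calling thread merges them, which avoids races.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(score_entity, entities):
            merged.update(partial)
    return merged


def toy_score(entity):
    """Placeholder scorer: pairs a:X with b:X at a fixed score."""
    return {(entity, entity.replace("a:", "b:")): 0.5}
```

Calling `align_in_parallel(["a:Elvis", "a:Priscilla"], toy_score)` returns one score per entity pair. The merge is safe because each entity contributes disjoint keys.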

  17. Performance Improvement Results.

     | Iteration | Original PARIS | New PARIS (1 thread) | New PARIS (4 threads) |
     |---|---|---|---|
     | Startup | 0h00 | 0h27 | 0h10 |
     | 1 | 4h04 | 0h40 | 0h27 |
     | 2 | 5h06 | 3h00 | 1h02 |
     | 3 | 5h00 | 0h34 | 0h24 |
     | 4 | 5h30 | 0h29 | 0h16 |
     | Total | 20h | 5h | 2h |

     Table: Running times for the DBpedia–Yago alignment task. The original Paris was run on an Intel Xeon E5620 CPU clocked at 2.40 GHz on a machine with 12 GB of RAM. The new Paris was run on an Intel Core i7-3820 CPU clocked at 3.60 GHz with 48 GB of RAM.

  19. Join Relations.
     [Figure: a:Douglas_Adams a:countryOfBirth a:UK in one ontology; b:Douglas_Adams b:birthPlace b:Cambridge and b:Cambridge b:country b:UK in the other, giving the join relation (b:birthPlace, b:country).]
     The simplest possible difference in structure between ontologies: relations of one ontology correspond to join relations in the other ontology. The terminology is motivated by the “join” operator of relational algebra. We see the join as a binary predicate: the intermediate nodes are existentially quantified but projected away.
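
Reading the join as a binary predicate can be sketched as follows, with each relation represented as a set of (subject, object) pairs. The function name and this representation are assumptions of the example.

```python
from collections import defaultdict


def join(r1, r2):
    """Join two binary relations given as sets of (subject, object) pairs.

    (x, y) is in the result when some intermediate node z satisfies
    r1(x, z) and r2(z, y). The node z is not kept in the output.
    """
    # Index r2 by subject so each r1 fact is matched without a full scan.
    by_subject = defaultdict(set)
    for z, y in r2:
        by_subject[z].add(y)
    return {(x, y) for x, z in r1 for y in by_subject.get(z, ())}


# The facts of the slide's example.
birth_place = {("b:Douglas_Adams", "b:Cambridge")}
country = {("b:Cambridge", "b:UK")}
country_of_birth = join(birth_place, country)
```

The result contains the single pair (b:Douglas_Adams, b:UK), which has the same shape as the a:countryOfBirth fact. The intermediate node b:Cambridge no longer appears.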
