
Improving Link Discovery using context-aware link specifications



  1. Improving Link Discovery using context-aware link specifications. LDOW 2016. PhD candidate: Andrea Cimmino. Supervised by David Ruiz (University of Seville, Spain) and Carlos R. Rivero (Rochester Institute of Technology, USA).

  2. Hi! My name is Andrea. BARI, ROCHESTER, SEVILLE

  3. Roadmap
     Problem statement
     Results
     Future work

  4. To be or not to be … the same. DATASET 1: name: “Wei Wang”. DATASET 2: name: “Wei Wang”, email: “weiwang@cs.unc.edu”; name: “Wei Wang”, email: “wwang@unm.edu”. The same?

  5. To be or not to be … the same. Link Specification (LS AR): Levenshtein(name, full-name) ≤ 0.42. DATASET 1: name: “Wei Wang”. DATASET 2: full-name: “Wei Wang”, email: “weiwang@cs.unc.edu”; full-name: “Wei Wang”, email: “wwang@unm.edu”. The same?

  6. To be or not to be … the same. [Diagram: LS AR; relations: leads, writes, supports; classes: Award, Paper, Article] Some publications in common?

  7. To be or not to be … the same. 1. RDF, OWL. [Diagram: LS AR; relations: leads, writes, supports; classes: Award, Paper, Article] Some publications in common?

  8. To be or not to be … the same. 1. RDF, OWL. 2. ≠ Vocabularies. [Diagram: LS AR; relations: leads, writes, supports; classes: Award, Paper, Article] Some publications in common?

  9. To be or not to be … the same. 1. RDF, OWL. 2. ≠ Vocabularies. 3. Rule generation. [Diagram: LS AR; relations: leads, writes, supports; classes: Award, Paper, Article] Some publications in common?

  10. To be or not to be … the same. 1. RDF, OWL. 2. ≠ Vocabularies. 3. Rule generation. 4. Context. [Diagram: LS AR; relations: leads, writes, supports; classes: Award, Paper, Article] Some publications in common?

  11. Overlap Factor. owl:sameAs (LS AR): FOR ALL. owl:sameAs (LS AP): EXISTS. Context-Aware Link Specification: FOR ALL Levenshtein(name, full-name) ≤ 0.42 AND EXISTS Levenshtein(title, title) < 1.20

  12. Applying LS AR. Dataset 1: name: “Wei Wang”. Dataset 2: full-name: “Wei Wang”, email: “weiwang@cs.unc.edu” (The same?); full-name: “Wei Wang”, email: “wwang@unm.edu” (The same?).

  13. Applying LS AR. Dataset 1: name: “Wei Wang”. Dataset 2: full-name: “Wei Wang”, email: “weiwang@cs.unc.edu” (The same? owl:sameAs); full-name: “Wei Wang”, email: “wwang@unm.edu” (The same? owl:sameAs). Labels: wrongly linked; correctly linked.

  14. Applying CALS. Dataset 1: name: “Wei Wang”. Dataset 2: full-name: “Wei Wang”, email: “weiwang@cs.unc.edu” (The same?); full-name: “Wei Wang”, email: “wwang@unm.edu” (The same?). Publications: title: “Efficient computation … ”, date: “2007”; title: “Efficient computation … ”, year: “2007”; title: “Holistic Twig … ”; title: “Direct Oxidative Conversion … ”, date: “2012”.

  15. Applying CALS. Dataset 1: name: “Wei Wang”. Dataset 2: full-name: “Wei Wang”, email: “weiwang@cs.unc.edu”; full-name: “Wei Wang”, email: “wwang@unm.edu”. Publications: title: “Efficient computation … ”, date: “2007”; title: “Efficient computation … ”, year: “2007”; title: “Holistic Twig … ”; title: “Direct Oxidative Conversion … ”, date: “2012”. Labels: owl:sameAs; wrongly linked; correctly linked.
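The difference between applying LS AR alone and applying the context-aware specification can be sketched in a few lines of Python. This is only one plausible reading of the "FOR ALL ... AND EXISTS ..." form on the slides, not the authors' implementation: a candidate pair accepted by the name rule is kept only if at least one pair of their publications is accepted by the title rule. The function names (`apply_ls`, `apply_cals`), the record layout, and the assignment of publications to each record are assumptions made for illustration; the thresholds are the ones shown on the slides, and the elided titles are kept elided.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def ls_ar(author: Dict, researcher: Dict) -> bool:
    # LS AR from the slides: Levenshtein(name, full-name) <= 0.42
    return levenshtein(author["name"], researcher["full-name"]) <= 0.42


def ls_ap(article_title: str, paper_title: str) -> bool:
    # Title rule from the slides: Levenshtein(title, title) < 1.20
    return levenshtein(article_title, paper_title) < 1.20


def apply_ls(author: Dict, researchers: List[Dict]) -> List[str]:
    """Link the author to every researcher accepted by LS AR alone."""
    return [r["id"] for r in researchers if ls_ar(author, r)]


def apply_cals(author: Dict, researchers: List[Dict]) -> List[str]:
    """Keep an LS AR link only if some publication pair also matches."""
    return [
        r["id"]
        for r in researchers
        if ls_ar(author, r)
        and any(ls_ap(a, p) for a in author["articles"] for p in r["papers"])
    ]


# Hypothetical records modelled on the "Wei Wang" example; which
# publications belong to which record is an assumption.
author = {
    "name": "Wei Wang",
    "articles": ["Efficient computation …", "Holistic Twig …"],
}
researchers = [
    {"id": "weiwang@cs.unc.edu", "full-name": "Wei Wang",
     "papers": ["Efficient computation …"]},
    {"id": "wwang@unm.edu", "full-name": "Wei Wang",
     "papers": ["Direct Oxidative Conversion …"]},
]

print(apply_ls(author, researchers))
print(apply_cals(author, researchers))
```

With these records the name rule alone accepts both researchers, while the context-aware variant keeps only the one that shares a publication title with the author.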

  16. Roadmap
      Problem statement
      Results
      Future work

  17. Experiments ♦ Scenarios

      Scenario 1: DBLP-NSF
      DBLP                 NSF
      Author    764        Researcher  235
      Article   47,225     Award       235
                           Paper       6,877
      owl:sameAs Author ~ Researcher: 188

      Scenario 2: DBLP-DBLP
      DBLP
      Author    58
      Article   5,284
      owl:sameAs Author ~ Author: 62

  18. DBLP-NSF: improving precision. [Chart comparing Link Specification (LS 1) and Context-Aware Link Specification (CALS); figures shown: 1.00, 0.83] LS 1: Jaro(name, full-name) < Threshold. CALS: for all BEST(LS 1) and exists Jaro(title, title) < Threshold

  19. DBLP-DBLP: improving recall. [Chart comparing Link Specification (LS 1) and Context-Aware Link Specification (CALS); figures shown: 1.00, 0.83] LS 1: Jaro(name, name) < Threshold. CALS: for all Jaro(title, title) < Threshold

  20. DBLP-NSF GenLink evaluation results

      LS for DBLP-NSF
      ID       Examples (+/-)   Link
      LSN 1    (+1, -1)         Author ~ Researcher
      LSN 5    (+5, -5)         Author ~ Researcher
      LSN 10   (+5, -5)         Author ~ Researcher
      LST 1    (+1, -1)         Article ~ Paper
      LST 5    (+5, -5)         Article ~ Paper
      LST 10   (+10, -10)       Article ~ Paper

      LS for DBLP-NSF          CALS for DBLP-NSF for link Author ~ Researcher
      ID       P      R        ID                                   P      R
      LSN 1    0.76   1.0      for all LSN 1 and exists LST 1       0.94   1.0
      LSN 5    0.76   1.0      for all LSN 5 and exists LST 5       1.0    0.38
      LSN 10   0.76   1.0      for all LSN 10 and exists LST 10     1.0    0.95
      Best improvement: 0.24

  21. DBLP-DBLP GenLink evaluation results

      LS for DBLP-DBLP
      ID       Examples (+/-)   Link
      LSN 1    (+1, -1)         Author ~ Author
      LSN 5    (+5, -5)         Author ~ Author
      LSN 10   (+5, -5)         Author ~ Author
      LST 1    (+1, -1)         Article ~ Article
      LST 5    (+5, -5)         Article ~ Article
      LST 10   (+10, -10)       Article ~ Article

      LS for DBLP-DBLP         CALS for DBLP-DBLP for link Author ~ Author
      ID       P      R        ID                P      R
      LSN 1    1.00   0.26     for all LST 1     1.00   0.84
      LSN 5    1.00   0.30     for all LST 5     1.00   0.84
      LSN 10   1.00   0.26     for all LST 10    1.00   0.84
      Best improvement: 0.58
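As a reminder of what the P and R columns measure, the sketch below computes precision and recall of a set of generated links against a gold standard, and then reproduces the two "best improvement" figures from the values reported in the GenLink results tables. The helper name `precision_recall` and the toy link sets are assumptions for illustration; only the P and R numbers passed to the subtraction come from the slides.

```python
from typing import Set, Tuple

Link = Tuple[str, str]


def precision_recall(found: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Precision = correct links / generated links; recall = correct links / gold links."""
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example (hypothetical links): two generated links, one of them correct.
found = {("author", "researcher-1"), ("author", "researcher-2")}
gold = {("author", "researcher-1")}
print(precision_recall(found, gold))  # (0.5, 1.0)

# Improvements derived from the reported figures.
# DBLP-NSF: precision rises from 0.76 (LS) to 1.00 (CALS).
dblp_nsf_precision_gain = round(1.00 - 0.76, 2)
# DBLP-DBLP: recall rises from 0.26 (LS) to 0.84 (CALS).
dblp_dblp_recall_gain = round(0.84 - 0.26, 2)
print(dblp_nsf_precision_gain, dblp_dblp_recall_gain)  # 0.24 0.58
```

In the DBLP-NSF case the precision gain with ten examples comes with a small drop in recall (1.0 to 0.95), while in the DBLP-DBLP case recall improves with precision unchanged at 1.00.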

  22. Roadmap
      Problem statement
      Results
      Future work

  23. Current work. [Diagram: LS AR; LS AR1, LS AR2 …; relations: co-author, leads, co-leads, writes, supports; classes: Award, Paper, Article] WWW2017, Australia

  24. Future work. SETUP, DATASETS, TECHNIQUES. Experiments and Analyses track

  25. Future work. [Diagram: owl:sameAs; relations: co-author, leads, co-leads, writes, supports; classes: Award, Paper, Article]

  26. THANKS! Queries? Andrea Cimmino cimmino@us.es http://tdg-seville.info/acimmino

  27. Features
      ♦ R1: Input RDF, not OWL
      ♦ R2: Handle different schemas/vocabularies
      ♦ R3: Rule based (LS)
      ♦ R4: Context aware
      ♦ R5: Efficient context

  28. Related Work (Technique: R1 R2 R3 R4 R5)
      RiMOM: ¡ - - -
      Nikolov et al.: -
      AgreementMaker: ¡ - -
      GenLink: ¡ - -
      CODI: ¡ ¡ ¡ - - - -
      EAGLE: ¡ - -
      LOGMAP: ¡ - - -
      Zhishi.links: ¡ - - -
      SLINT+: ¡ - - -
      SignoProsik: ¡ ~ - ~
      SERIMI: ¡ - - -
      Song and Heflin: - - -
      PARIS: - - - -
      Hassanzadeh et al.: - -

  29. DBLP-NSF GenLink LS
      ID       Properties               Comparison
      LSN 1    dblp:name, nsf:name      Jaccard ≤ 0.37
      LSN 5    dblp:name, nsf:name      Jaccard ≤ 0.37
      LSN 10   dblp:name, nsf:name      Jaccard ≤ 0.21
      LST 1    dblp:title, nsf:title    Levenshtein ≤ 29.48
      LST 5    dblp:title, nsf:title    Levenshtein ≤ 0.59
      LST 10   dblp:title, nsf:title    Levenshtein ≤ 7.05

  30. DBLP-DBLP GenLink LS
      ID       Properties               Comparison
      LSN 1    dblp:name, nsf:name      Jaccard ≤ 0.15
      LSN 5    dblp:name, nsf:name      Levenshtein ≤ 1.48
      LSN 10   dblp:name, nsf:name      Levenshtein ≤ 1.15
      LST 1    dblp:title, nsf:title    Levenshtein ≤ 1.76
      LST 5    dblp:title, nsf:title    Levenshtein ≤ 1.46
      LST 10   dblp:title, nsf:title    Levenshtein ≤ 1.76

  31. Link Specification model

  32. Link Specification extended (context). [Class diagram: C-ALinkSpecification (source: Class, target: Class); * C-ACondition; C-ASameAsCondition; ConditionComposite; oF: OverlapFactor; f: Aggregation; 2; LinkSpecification; LeafNode; source: Class; prop: ObjectProperty; *; target: Class; dataset: {SRC, TRG}]
