Improving Link Discovery using context-aware link specifications
LDOW 2016
PhD candidate: Andrea Cimmino
Supervised by David Ruiz, University of Seville, Spain, and Carlos R. Rivero, Rochester Institute of Technology, USA
Hi! My name is Andrea (Bari, Rochester, Seville)
Roadmap Problem statement Results Future work
To be or not to be … the same
DATASET 1: name: “Wei Wang”
DATASET 2: name: “Wei Wang”, email: “weiwang@cs.unc.edu”
DATASET 2: name: “Wei Wang”, email: “wwang@unm.edu”
The same?
To be or not to be … the same
Link Specification (LS AR): Levenshtein(name, full-name) ≤ 0.42
DATASET 1: name: “Wei Wang”
DATASET 2: full-name: “Wei Wang”, email: “weiwang@cs.unc.edu”
DATASET 2: full-name: “Wei Wang”, email: “wwang@unm.edu”
The same?
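A minimal sketch of how a link specification such as LS AR could be evaluated. It assumes that Levenshtein is the raw edit distance and that a pair is linked when the distance does not exceed the threshold; the slide fixes neither detail, and the function names `levenshtein` and `ls_ar` are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def ls_ar(author: dict, researcher: dict, threshold: float = 0.42) -> bool:
    """Assumed reading of LS AR: link when the name distance is within the threshold."""
    return levenshtein(author["name"], researcher["full-name"]) <= threshold


# Records taken from the slide.
author = {"name": "Wei Wang"}
researcher_unc = {"full-name": "Wei Wang", "email": "weiwang@cs.unc.edu"}
researcher_unm = {"full-name": "Wei Wang", "email": "wwang@unm.edu"}

links = [ls_ar(author, r) for r in (researcher_unc, researcher_unm)]
```

Under these assumptions both researchers are linked to the same author, because the names are identical and the rule never looks at anything else.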
To be or not to be … the same
1. RDF, OWL
2. ≠ Vocabularies
3. Rule generation
4. Context
LS AR: some publications in common? (diagram labels: leads, Award, writes, supports, Paper, Article)
Overlap Factor
owl:sameAs (LS AR): FOR ALL
owl:sameAs (LS AP): EXISTS
Context-Aware Link Specification: FOR ALL Levenshtein(name, full-name) ≤ 0.42 AND EXISTS Levenshtein(title, title) < 1.20
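One possible reading of the context-aware link specification, sketched under stated assumptions: a pair is linked when the main rule holds and the publications of both resources overlap according to the overlap factor ("exists" means at least one publication is matched, "for all" means every publication is matched). The names `cals` and `context_overlaps`, the `publications` key, and the exact-match stand-in rules are hypothetical; the slide uses thresholded Levenshtein rules instead.

```python
from typing import Any, Callable, Dict, List

Resource = Dict[str, Any]
Rule = Callable[[Resource, Resource], bool]


def context_overlaps(src_ctx: List[Resource], tgt_ctx: List[Resource],
                     rule: Rule, overlap_factor: str) -> bool:
    """Check how much of the source context is matched in the target context."""
    # For each source resource: does the rule link it to some target resource?
    matched = [any(rule(s, t) for t in tgt_ctx) for s in src_ctx]
    if overlap_factor == "exists":
        return any(matched)
    if overlap_factor == "for all":
        return bool(matched) and all(matched)  # an empty context never matches
    raise ValueError(f"unknown overlap factor: {overlap_factor}")


def cals(src: Resource, tgt: Resource, main_rule: Rule, context_rule: Rule,
         overlap_factor: str = "exists") -> bool:
    """Main rule AND overlap of the related publications."""
    return main_rule(src, tgt) and context_overlaps(
        src["publications"], tgt["publications"], context_rule, overlap_factor)


# Exact-match stand-ins for the thresholded rules (an assumption for brevity).
same_name: Rule = lambda a, r: a["name"] == r["full-name"]
same_title: Rule = lambda x, y: x["title"] == y["title"]

# Illustrative data; which titles belong to which record is an assumption.
author = {"name": "Wei Wang",
          "publications": [{"title": "Efficient computation …"},
                           {"title": "Holistic Twig …"}]}
unc = {"full-name": "Wei Wang",
       "publications": [{"title": "Efficient computation …"}]}
unm = {"full-name": "Wei Wang",
       "publications": [{"title": "Direct Oxidative Conversion …"}]}

linked_unc = cals(author, unc, same_name, same_title)
linked_unm = cals(author, unm, same_name, same_title)
```

With this data only the first researcher is linked: the names match in both cases, but only one pair shares a publication title.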
Applying LS AR
name: “Wei Wang”
full-name: “Wei Wang”, email: “weiwang@cs.unc.edu”
full-name: “Wei Wang”, email: “wwang@unm.edu”
The same?
Applying LS AR
name: “Wei Wang”
owl:sameAs full-name: “Wei Wang”, email: “weiwang@cs.unc.edu”
owl:sameAs full-name: “Wei Wang”, email: “wwang@unm.edu”
The same?
Legend: wrongly linked, correctly linked
Applying CALS
title: “Efficient computation …”, date: “2007”
title: “Efficient computation …”, year: “2007”
title: “Holistic Twig …”
full-name: “Wei Wang”, email: “weiwang@cs.unc.edu”
name: “Wei Wang”
full-name: “Wei Wang”, email: “wwang@unm.edu”
title: “Direct Oxidative Conversion …”, date: “2012”
The same?
Applying CALS
title: “Efficient computation …”, date: “2007”
title: “Efficient computation …”, year: “2007”
title: “Holistic Twig …”
full-name: “Wei Wang”, email: “weiwang@cs.unc.edu”
owl:sameAs
name: “Wei Wang”
full-name: “Wei Wang”, email: “wwang@unm.edu”
title: “Direct Oxidative Conversion …”, date: “2012”
Legend: wrongly linked, correctly linked
Roadmap Problem statement Results Future work
Experiments ♦ Scenarios
Scenario 1 – DBLP-NSF
  DBLP: Author 764; Article 47,225
  NSF: Researcher 235; Award 235; Paper 6,877
  owl:sameAs: Author ~ Researcher 188
Scenario 2 – DBLP-DBLP
  DBLP: Author 58; Article 5,284
  owl:sameAs: Author ~ Author 62
DBLP-NSF: improving precision
[Plot: Link Specification (LS 1) vs. Context-Aware Link Specification (CALS); values marked on the plot: 1.00, 0.83]
LS 1: Jaro(name, full-name) < Threshold
CALS: for all BEST(LS 1) and exists Jaro(title, title) < Threshold
DBLP-DBLP: improving recall
[Plot: Link Specification (LS 1) vs. Context-Aware Link Specification (CALS); values marked on the plot: 1.00, 0.83]
LS 1: Jaro(name, name) < Threshold
CALS: for all Jaro(title, title) < Threshold
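The LS 1 rules in both scenarios compare strings with Jaro. A minimal sketch of the standard Jaro similarity follows. Treating the slide's "Jaro(...) < Threshold" as a distance, i.e. 1 − similarity, is an assumption, and no threshold value is given in the slides, so none is applied here.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1 = [False] * len1
    used2 = [False] * len2
    matches = 0
    for i in range(len1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not used2[j] and s1[i] == s2[j]:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_distance(s1: str, s2: str) -> float:
    """Assumed distance form used for thresholding: 1 - similarity."""
    return 1.0 - jaro_similarity(s1, s2)
```

For identical names such as “Wei Wang” the distance is 0, so any positive threshold links the pair; this is the ambiguity that the context condition on titles is meant to resolve.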
DBLP-NSF GenLink evaluation results

LS for DBLP-NSF
ID       Examples (+/-)   Link
LSN 1    (+1, -1)         Author ~ Researcher
LSN 5    (+5, -5)         Author ~ Researcher
LSN 10   (+5, -5)         Author ~ Researcher
LST 1    (+1, -1)         Article ~ Paper
LST 5    (+5, -5)         Article ~ Paper
LST 10   (+10, -10)       Article ~ Paper

LS for DBLP-NSF vs. CALS for DBLP-NSF, for link Author ~ Researcher
LS ID    P      R      CALS                               P      R
LSN 1    0.76   1.0    for all LSN 1 and exists LST 1     0.94   1.0
LSN 5    0.76   1.0    for all LSN 5 and exists LST 5     1.0    0.38
LSN 10   0.76   1.0    for all LSN 10 and exists LST 10   1.0    0.95
Best improvement: 0.24
DBLP-DBLP GenLink evaluation results

LS for DBLP-DBLP
ID       Examples (+/-)   Link
LSN 1    (+1, -1)         Author ~ Author
LSN 5    (+5, -5)         Author ~ Author
LSN 10   (+5, -5)         Author ~ Author
LST 1    (+1, -1)         Article ~ Article
LST 5    (+5, -5)         Article ~ Article
LST 10   (+10, -10)       Article ~ Article

LS vs. CALS for DBLP-DBLP, for link Author ~ Author
LS ID    P      R      CALS             P      R
LSN 1    1.00   0.26   for all LST 1    1.00   0.84
LSN 5    1.00   0.30   for all LST 5    1.00   0.84
LSN 10   1.00   0.26   for all LST 10   1.00   0.84
Best impr.: 0.58
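Assuming P and R in the evaluation tables denote precision and recall against the reference owl:sameAs links, they could be computed as sketched below. The link sets are toy data invented for illustration, not the DBLP or NSF data. The reported improvements are consistent with simple differences: 1.0 − 0.76 = 0.24 in precision for DBLP-NSF and 0.84 − 0.26 = 0.58 in recall for DBLP-DBLP.

```python
from typing import Set, Tuple

Link = Tuple[str, str]


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Precision = correct / predicted; recall = correct / gold."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy reference links and rule outputs (hypothetical identifiers).
gold = {("a1", "r1"), ("a2", "r2"), ("a3", "r3")}
ls_links = gold | {("a1", "r9")}   # plain LS: one wrong link on a shared name
cals_links = set(gold)             # CALS: the wrong link is filtered out

ls_scores = precision_recall(ls_links, gold)
cals_scores = precision_recall(cals_links, gold)

# Differences behind the figures reported on the slides.
best_precision_gain = round(1.0 - 0.76, 2)
best_recall_gain = round(0.84 - 0.26, 2)
```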
Roadmap Problem statement Results Future work
Current work
LS AR; LS AR1, LS AR2 (diagram labels: co-author, leads, co-leads, writes, Award, supports, Paper, Article)
WWW2017, Australia
Future work
SETUP, DATASETS, TECHNIQUES
Experiments and Analyses track
Future work
owl:sameAs, owl:sameAs (diagram labels: co-author, leads, co-leads, writes, Award, supports, Paper, Article)
THANKS! Queries? Andrea Cimmino cimmino@us.es http://tdg-seville.info/acimmino
Features
♦ R1: Input RDF, not OWL
♦ R2: Handle different schemas/vocabularies
♦ R3: Rule based (LS)
♦ R4: Context aware
♦ R5: Efficient context
Related Work
Technique R1 R2 R3 R4 R5
RiMOM ¡ - - -
Nikolov et al. -
AgreementMaker ¡ - -
GenLink ¡ - -
CODI ¡ ¡ ¡ - - - -
EAGLE ¡ - -
LOGMAP ¡ - - -
Zhishi.links ¡ - - -
SLINT+ ¡ - - -
SignoProsik ¡ ~ - ~
SERIMI ¡ - - -
Song and Heflin - - -
PARIS - - - -
Hassanzadeh et al. - -
DBLP-NSF GenLink LS
Properties dblp:name, nsf:name
  LSN 1:  Jaccard ≤ 0.37
  LSN 5:  Jaccard ≤ 0.37
  LSN 10: Jaccard ≤ 0.21
Properties dblp:title, nsf:title
  LST 1:  Levenshtein ≤ 29.48
  LST 5:  Levenshtein ≤ 0.59
  LST 10: Levenshtein ≤ 7.05
DBLP-DBLP GenLink LS
Properties dblp:name, nsf:name
  LSN 1:  Jaccard ≤ 0.15
  LSN 5:  Levenshtein ≤ 1.48
  LSN 10: Levenshtein ≤ 1.15
Properties dblp:title, nsf:title
  LST 1:  Levenshtein ≤ 1.76
  LST 5:  Levenshtein ≤ 1.46
  LST 10: Levenshtein ≤ 1.76
Link Specification model
Link Specification extended (context)
[Class diagram; labels in extraction order: C-ALinkSpecification, source: Class, target: Class, *, C-ACondition, C-ASameAsCondition, ConditionComposite, oF: OverlapFactor, f: Aggregation, 2, LinkSpecification, LeafNode, source: Class, prop: ObjectProperty, *, target: Class, dataset: {SRC, TRG}]