
Similar Structures inside RDF-Graphs, Workshop on Linked Data on the Web (LDOW 2013)



  1. Similar Structures inside RDF-Graphs
     Workshop on Linked Data on the Web (LDOW 2013), collocated with the 22nd International World Wide Web Conference (WWW 2013), 14 May 2013
     Anas Alzogbi, Georg Lausen
     University of Freiburg, Databases & Information Systems

  2. 1. Motivation
     • RDF datasets are growing constantly (e.g. LOD).
     • The minimal constraints on RDF data make it irregular and difficult to comprehend and visualize.
     • Idea
       ◦ Discover RDF subjects that exhibit similar structures
       ◦ Preserve the meaning by preserving the structure

  3. 2. Our Approach
     • Two-phase approach
       ◦ Collapse equivalent structures (bisimilarity equivalence)
       ◦ Collapse similar structures (clustering)
     [Figure: RDF graph (non-literal entities) → perfect typing via bisimilarity equivalence → PTG → similarity-based reduction via complete-link agglomerative clustering → reduced RDF graph]

  4. 3. Perfect Typing
     • Bisimilarity equivalence
       Let $G = (V, E, L)$ be an RDF graph. Two nodes $v, u \in V$ are bisimilar ($v \approx_B u$) if they have the same set of outgoing paths: $P_v = P_u$.
     • Example
       $P_{v_2} = P_{v_6} = \{i\} \Rightarrow v_2 \approx_B v_6$
       $P_{v_3} = P_{v_5} = \{a, \langle b,i\rangle, c, \langle d,g\rangle, \langle d,h\rangle, e\} \Rightarrow v_3 \approx_B v_5$
     [Figure: example RDF graph with nodes $v_2, \dots, v_6$]
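
The path-set criterion above can be illustrated with a short Python sketch. It is not the authors' implementation and rests on stated assumptions: the graph is acyclic and given as a hypothetical adjacency dict (subject → list of (predicate, object) pairs), $P_v$ is taken to be the label sequences of the maximal outgoing paths (matching the $v_3$/$v_5$ example), and only nodes with outgoing edges (subjects) are grouped. The example graph is made up to mirror the slide.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input format: subject -> list of (predicate label, object)
Graph = Dict[str, List[Tuple[str, str]]]
PathSet = FrozenSet[Tuple[str, ...]]


def path_sets(graph: Graph) -> Dict[str, PathSet]:
    """Compute P_v, the label sequences of all maximal outgoing paths of each subject v.

    Assumes the graph is acyclic; nodes without outgoing edges have no paths."""
    memo: Dict[str, PathSet] = {}

    def paths(node: str) -> PathSet:
        if node not in memo:
            result = set()
            for label, target in graph.get(node, []):
                tails = paths(target)
                if tails:
                    # extend every maximal path of the target by this edge label
                    result.update((label,) + tail for tail in tails)
                else:
                    # the target is a sink, so the path ends with this edge
                    result.add((label,))
            memo[node] = frozenset(result)
        return memo[node]

    return {v: paths(v) for v in graph}


def equivalence_classes(graph: Graph) -> List[FrozenSet[str]]:
    """Group subjects whose outgoing path sets are equal (P_v = P_u)."""
    groups: Dict[PathSet, set] = defaultdict(set)
    for node, ps in path_sets(graph).items():
        groups[ps].add(node)
    return sorted((frozenset(members) for members in groups.values()), key=min)


# Hypothetical graph mirroring the example: v3 and v5 share the same path set
g: Graph = {
    "v3": [("a", "o1"), ("b", "v2"), ("c", "o2"), ("d", "v4"), ("e", "o3")],
    "v5": [("a", "o4"), ("b", "v6"), ("c", "o5"), ("d", "v4"), ("e", "o6")],
    "v2": [("i", "o7")],
    "v6": [("i", "o8")],
    "v4": [("g", "o9"), ("h", "o10")],
}
print(equivalence_classes(g))
```

The sketch follows the slide's criterion literally. For graphs with cycles, or for the stricter classical notion of bisimulation, a partition-refinement algorithm would be needed instead.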

  5. 4. Similarity-Based Reduction
     • Hierarchical clustering
       ◦ Exclusive, unsupervised
       ◦ Requires a similarity matrix
     • Instance tree & intersection tree [Lösch et al. 2012]
       ◦ $T_\sigma(v)$ is the instance tree of node $v$
     [Figure: a PTG over nodes $v_1, \dots, v_4$ and the instance tree $T_\sigma(v_1)$]
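
A depth-limited instance tree in the spirit of [Lösch et al. 2012] can be sketched as follows. Everything beyond "unfold the outgoing edges of a node" is an assumption made for illustration: the graph is a hypothetical adjacency dict, the unfolding depth is a free parameter `depth`, each tree edge is identified by its full path of (label, target) steps from the root, and `size` counts tree edges.

```python
from typing import Dict, FrozenSet, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]
# A tree edge is identified by the path of (label, target) steps leading to it
TreePath = Tuple[Tuple[str, str], ...]


def instance_tree(graph: Graph, root: str, depth: int) -> FrozenSet[TreePath]:
    """Unfold the outgoing edges of `root` up to `depth` levels.

    A graph node reached along different paths becomes distinct tree nodes,
    and cycles are cut off by the depth limit."""
    edges = set()
    frontier: List[Tuple[TreePath, str]] = [((), root)]
    for _ in range(depth):
        next_frontier = []
        for path, node in frontier:
            for label, target in graph.get(node, []):
                step = path + ((label, target),)
                edges.add(step)
                next_frontier.append((step, target))
        frontier = next_frontier
    return frozenset(edges)


def size(tree: FrozenSet[TreePath]) -> int:
    """Assumed size measure: the number of edges in the tree."""
    return len(tree)


# Hypothetical graph: v1 has one extra edge (f) compared with v2
g: Graph = {
    "v1": [("b", "v3"), ("c", "v4"), ("f", "v9")],
    "v2": [("b", "v3"), ("c", "v4")],
    "v3": [("a", "v5"), ("d", "v6"), ("e", "v7")],
    "v4": [("g", "v8"), ("h", "v10"), ("i", "v11")],
}
print(size(instance_tree(g, "v1", 2)), size(instance_tree(g, "v2", 2)))  # 9 8
```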

  6. 4. Similarity-Based Reduction
     • Instance tree & intersection tree
       [Figure: instance trees $T_\sigma(v_1)$ and $T_\sigma(v_2)$ and their intersection tree]
       $\mathrm{size}(T_\sigma(v_1)) = 9$, $\mathrm{size}(T_\sigma(v_2)) = 8$, $\mathrm{size}(\mathrm{intersect}(T_\sigma(v_1), T_\sigma(v_2))) = 8$
     • Pairwise similarity
       $$\mathrm{sim}(v_1, v_2) = \frac{\mathrm{size}(\mathrm{intersect}(T_\sigma(v_1), T_\sigma(v_2)))}{\bigl(\mathrm{size}(T_\sigma(v_1)) + \mathrm{size}(T_\sigma(v_2))\bigr)/2} = \frac{8}{8.5} \approx 0.94$$
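
The pairwise similarity can be checked with a small sketch. As a simplification (not necessarily the authors' representation), each instance tree is a set of root paths of (label, target) steps and size counts edges. Under that representation the intersection tree is plain set intersection: an edge survives only if the same root path exists in both trees, and because both sets are prefix-closed the result is again a tree. The helper `depth2_tree` and the trees are hypothetical, built to reproduce the sizes 9, 8 and 8 from the example.

```python
from typing import Dict, FrozenSet, List, Tuple

Step = Tuple[str, str]            # (edge label, target node)
TreePath = Tuple[Step, ...]       # root path identifying one tree edge
Tree = FrozenSet[TreePath]


def intersect(t1: Tree, t2: Tree) -> Tree:
    """Intersection tree: the edges whose root path occurs in both trees."""
    return t1 & t2


def sim(t1: Tree, t2: Tree) -> float:
    """size(intersect(t1, t2)) divided by the mean size of the two trees."""
    mean_size = (len(t1) + len(t2)) / 2
    return len(intersect(t1, t2)) / mean_size if mean_size else 0.0


def depth2_tree(children: Dict[Step, List[Step]]) -> Tree:
    """Hypothetical helper: build a depth-2 tree from first-level edges and their children."""
    paths = set()
    for first, seconds in children.items():
        paths.add((first,))
        paths.update((first, second) for second in seconds)
    return frozenset(paths)


t2 = depth2_tree({
    ("b", "v3"): [("a", "v5"), ("d", "v6"), ("e", "v7")],
    ("c", "v4"): [("g", "v8"), ("h", "v10"), ("i", "v11")],
})
t1 = t2 | depth2_tree({("f", "v9"): []})  # one extra first-level edge
print(len(t1), len(t2), len(intersect(t1, t2)), round(sim(t1, t2), 2))  # 9 8 8 0.94
```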

  7. 4. Similarity-Based Reduction
     • Agglomerative algorithm for complete-link clustering
       G(∞) = {{x1}, {x2}, {x3}, {x4}, {x5}}
       G(0.9) = {{x1}, {x2, x3}, {x4}, {x5}}
       G(0.8) = {{x1, x4}, {x2, x3}, {x5}}
       G(0.3) = {{x1, x4, x5}, {x2, x3}}
       G(0) = {{x1, x4, x5, x2, x3}}
     [Figure: threshold graph and dendrogram over x1, ..., x5]
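
A naive version of agglomerative complete-link clustering is sketched below. It records each partition together with the similarity level at which it forms, mirroring the G(·) list. The similarity matrix `S` is hypothetical, chosen so that the run reproduces the merges of the example (indices 0 to 4 stand for x1 to x5). The exhaustive pair scan gives roughly cubic running time and is meant for clarity only.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Partition = List[FrozenSet[int]]


def complete_link(S: Sequence[Sequence[float]]) -> List[Tuple[float, Partition]]:
    """Agglomerative complete-link clustering on a symmetric similarity matrix.

    Starts from singletons (level infinity) and repeatedly merges the two
    clusters whose least similar pair of members is the most similar.
    Returns (level, partition) pairs in merge order."""
    clusters: Partition = [frozenset([i]) for i in range(len(S))]
    history = [(math.inf, sorted(clusters, key=min))]
    while len(clusters) > 1:
        best = None  # (link similarity, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete link: similarity of the least similar cross pair
                link = min(S[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        link, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((link, sorted(clusters, key=min)))
    return history


# Hypothetical similarity matrix; indices 0..4 stand for x1..x5
S = [
    [1.0, 0.1, 0.0, 0.8, 0.3],
    [0.1, 1.0, 0.9, 0.2, 0.1],
    [0.0, 0.9, 1.0, 0.1, 0.2],
    [0.8, 0.2, 0.1, 1.0, 0.5],
    [0.3, 0.1, 0.2, 0.5, 1.0],
]
# levels printed: inf, 0.9, 0.8, 0.3, 0.0
for level, partition in complete_link(S):
    print(level, [sorted(c) for c in partition])
```

For real workloads, an established implementation such as SciPy's hierarchical clustering with complete linkage would be the more practical choice.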

  8. 4. Similarity-Based Reduction
     • List of partitions: G(∞), G(0.9), G(0.8), G(0.3), G(0) from the complete-link run
     • Which partition is appropriate? IntraSim serves as the selection metric:
       $$\mathrm{IntraSim}(\mathcal{P}_\tau) = \frac{1}{|\mathcal{P}_\tau|} \sum_{c \in \mathcal{P}_\tau} \mathrm{IntraSim}(c)\, n^{1/\lambda}, \qquad \lambda = 2,$$
       where
       $$\mathrm{IntraSim}(c) = \frac{\sum_{i<j} S[c_i, c_j]}{n(n-1)/2},$$
       $n$ is the number of elements in cluster $c$, and $S$ is the similarity matrix.
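
The selection metric can be sketched as follows, using the formulas as written here: the average pairwise similarity inside each cluster, weighted by $n^{1/\lambda}$ and averaged over the clusters of a partition. Two points are assumptions, not statements from the slides: a singleton cluster (where $n(n-1) = 0$) is assigned IntraSim 1, and the candidate partitions are compared simply by their scores. The 5×5 matrix is hypothetical; its complete-link run yields the G(·) list of the example.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

Matrix = Sequence[Sequence[float]]


def intra_sim_cluster(S: Matrix, cluster: FrozenSet[int]) -> float:
    """Average of S[c_i][c_j] over all member pairs i < j of the cluster."""
    members = sorted(cluster)
    n = len(members)
    if n < 2:
        return 1.0  # assumption: singletons count as perfectly homogeneous
    total = sum(S[i][j] for i, j in combinations(members, 2))
    return total / (n * (n - 1) / 2)


def intra_sim_partition(S: Matrix, partition: List[FrozenSet[int]], lam: float = 2.0) -> float:
    """Mean over clusters of IntraSim(c) * n ** (1 / lam), with n the cluster size."""
    weighted = (intra_sim_cluster(S, c) * len(c) ** (1 / lam) for c in partition)
    return sum(weighted) / len(partition)


# Hypothetical similarity matrix; indices 0..4 stand for x1..x5
S = [
    [1.0, 0.1, 0.0, 0.8, 0.3],
    [0.1, 1.0, 0.9, 0.2, 0.1],
    [0.0, 0.9, 1.0, 0.1, 0.2],
    [0.8, 0.2, 0.1, 1.0, 0.5],
    [0.3, 0.1, 0.2, 0.5, 1.0],
]
candidates = {
    "inf": [frozenset({0}), frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4})],
    "0.9": [frozenset({0}), frozenset({1, 2}), frozenset({3}), frozenset({4})],
    "0.8": [frozenset({0, 3}), frozenset({1, 2}), frozenset({4})],
    "0.3": [frozenset({0, 3, 4}), frozenset({1, 2})],
    "0": [frozenset({0, 1, 2, 3, 4})],
}
for level, partition in candidates.items():
    print(level, round(intra_sim_partition(S, partition), 3))
```

With this matrix the partition at level 0.8 scores highest: the $n^{1/\lambda}$ weight rewards larger clusters, while the averaged pairwise similarity penalizes merging dissimilar ones.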

  9. 5. Evaluation
     Data set       Subjects   Objects   Predicates   Edges
     SP2Bench250K   50K        100K      61           250K
     LUBM2          40K        20K       32           240K
     BSBM500K       48K        100K      40           500K
     SwDogFood      25K        55K       170          290K

  10. 5. Evaluation
     • Experimental results
       [Figure: IntraSim vs. similarity value]

  11. 5. Evaluation
     • Experimental results
       [Figure: IntraSim vs. partition size]

  12. 5. Evaluation
     • Experimental results
     Data set       Subjects   RDF types   Clusters   Errors
     SP2Bench250K   50K        9           85         0
     LUBM2          40K        14          6          2
     BSBM500K       48K        9           7          0
     SwDogFood      25K        43          1918       22
       ◦ LUBM2: 2 universities appeared together with 3728 courses
       ◦ SwDogFood: 21 ResearchTopics appeared together with 36 SpatialThings

  13. 5. Evaluation
     • Experimental results
       ◦ SwDogFood: 22K typed subjects, 43 different types
     [Figure: number of clusters per partition]
     Partition               𝒫_64%   𝒫_50%   𝒫_45%   𝒫_40%   𝒫_35%   𝒫_30%   𝒫_23%
     #Clusters               1918    424     287     196     119     70      25
     #Clusters with types    1795    413     280     191     116     68      25
     #Multi-type clusters    83      58      51      46      33      23      17
     #Errors                 22      133     209     209     209     210     251
     Error ratio             0.09%   0.6%    0.94%   0.94%   0.94%   0.95%   1.26%

  14. 6. Conclusion & Future Work
     • Conclusion
       ◦ Two-phase approach
       ◦ Discover equivalent, then similar structures
       ◦ Use bisimilarity equivalence + agglomerative clustering
       ◦ Apply IntraSim as a metric to choose the best partition
     • Future Work
       ◦ Edge filtering: consider only important edges
       ◦ Experiment on bigger data sets
     [http://www.superscholar.org]

  15. Thank you for your attention!

  16. References
     [Lösch et al. 2012] U. Lösch, S. Bloehdorn, and A. Rettinger, "Graph Kernels for RDF Data," in ESWC, 2012.

  17. SP2Bench250K

  18. BSBM500K

  19. LUBM2
