14 May 2013
Similar Structures inside RDF-Graphs
Workshop on Linked Data on the Web (LDOW 2013)
Collocated with the 22nd International World Wide Web Conference (WWW 2013)
Anas Alzogbi, Georg Lausen
University of Freiburg, Databases & Information Systems
1. Motivation
RDF datasets are growing constantly (e.g. LOD)
RDF imposes only minimal constraints on data, which makes datasets irregular and difficult to comprehend and visualize
Idea
◦ Discover RDF subjects that exhibit similar structures
◦ Preserve the meaning by preserving the structure
2. Our Approach
Two-phase approach
◦ Collapse equivalent structures (bisimilarity equivalence)
◦ Collapse similar structures (clustering)
[Figure: pipeline. RDF Graph → non-literal entities → Perfect Typing (bisimilarity equivalence) → PTG → similarity-based reduction (complete-link agglomerative clustering) → reduced RDF Graph]
3. Perfect Typing
Bisimilarity equivalence
Let $G = (V, E, L)$ be an RDF graph. Two nodes $v, u \in V$ are bisimilar ($v \approx_B u$) if they have the same set of outgoing paths: $P_v = P_u$
◦ $P_{v_2} = P_{v_6} = \{i\} \Rightarrow v_2 \approx_B v_6$
◦ $P_{v_5} = P_{v_3} = \{a, b, i, c, d, g, d, , e\} \Rightarrow v_3 \approx_B v_5$
[Figure: example graph with nodes $v_2, \dots, v_6$ and edge labels a, b, c, d, e, g, h, i]
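
The definition above (two nodes are equivalent when their sets of outgoing paths coincide) can be illustrated with a minimal Python sketch. The toy graph, the node names `n1` to `n7`, and the helper functions are hypothetical and are not taken from the talk; the sketch also assumes an acyclic graph so that every path set is finite.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> list of (edge label, target node).
EDGES = {
    "n1": [("a", "n2"), ("b", "n3")],
    "n4": [("a", "n5"), ("b", "n6")],
    "n2": [("i", "x")],
    "n5": [("i", "y")],
    "n3": [],
    "n6": [],
    "n7": [("a", "n2")],
}


def outgoing_paths(node, edges):
    """Return the label sequences of all outgoing paths (acyclic graphs only)."""
    paths = set()
    for label, target in edges.get(node, []):
        paths.add((label,))
        for suffix in outgoing_paths(target, edges):
            paths.add((label,) + suffix)
    return frozenset(paths)


def equivalence_classes(edges):
    """Group nodes whose sets of outgoing paths are identical."""
    groups = defaultdict(list)
    for node in edges:
        groups[outgoing_paths(node, edges)].append(node)
    return sorted(sorted(group) for group in groups.values())


print(equivalence_classes(EDGES))
```

In this toy graph `n1` and `n4` collapse into one class, while `n7` stays alone because it lacks the outgoing `b` edge. A graph with cycles would need a depth bound or a partition-refinement procedure instead of plain recursion.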
4. Similarity Based Reduction
Hierarchical clustering
◦ Exclusive, unsupervised
◦ Requires similarity matrix
Instance tree & intersection tree [Lösch et al. 2012]
◦ $T_\sigma(v)$ is the instance tree of node $v$
[Figure: a PTG and the instance tree $T_\sigma(v_1)$ extracted from it]
4. Similarity Based Reduction
Instance tree & intersection tree
[Figure: instance trees $T_\sigma(v_1)$ and $T_\sigma(v_2)$ and their intersection tree $\mathit{intersect}(T_\sigma(v_1), T_\sigma(v_2))$]
◦ $\mathit{size}(T_\sigma(v_1)) = 9$
◦ $\mathit{size}(T_\sigma(v_2)) = 8$
◦ $\mathit{size}(\mathit{intersect}) = 8$
Pairwise similarity
$$\mathit{sim}(v_1, v_2) = \frac{\mathit{size}(\mathit{intersect}(T_\sigma(v_1), T_\sigma(v_2)))}{(\mathit{size}(T_\sigma(v_1)) + \mathit{size}(T_\sigma(v_2))) / 2} = \frac{8}{8.5} = 0.94$$
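
The arithmetic of the pairwise similarity can be checked with a short Python sketch. The function name `pairwise_similarity` is an illustrative assumption; it takes the three tree sizes as plain numbers and does not build the instance or intersection trees themselves.

```python
def pairwise_similarity(size_intersection, size_a, size_b):
    """Size of the intersection tree divided by the mean size of both instance trees."""
    mean_size = (size_a + size_b) / 2
    return size_intersection / mean_size


# Sizes from the slide: 9 and 8 for the instance trees, 8 for the intersection.
print(round(pairwise_similarity(8, 9, 8), 2))  # 0.94
```

Two identical trees give a similarity of 1, and the value drops as the intersection shrinks relative to the average tree size.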
4. Similarity Based Reduction
Agglomerative algorithm for complete-link clustering
◦ G(∞) = {{x1}, {x2}, {x3}, {x4}, {x5}}
◦ G(0.9) = {{x1}, {x2, x3}, {x4}, {x5}}
◦ G(0.8) = {{x1, x4}, {x2, x3}, {x5}}
◦ G(0.3) = {{x1, x4, x5}, {x2, x3}}
◦ G(0) = {{x1, x4, x5, x2, x3}}
[Figure: threshold graph over x1, ..., x5 and the resulting dendrogram]
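
A minimal Python sketch of complete-link agglomerative clustering on a similarity matrix follows. The similarity values in `SIM` are hypothetical, chosen only so that the merges happen at the thresholds listed above; the talk does not give the underlying matrix. Pairs missing from `SIM` are assumed to have similarity 0.

```python
ITEMS = ["x1", "x2", "x3", "x4", "x5"]

# Hypothetical pairwise similarities; unlisted pairs default to 0.0.
SIM = {
    frozenset(("x2", "x3")): 0.9,
    frozenset(("x1", "x4")): 0.8,
    frozenset(("x1", "x5")): 0.5,
    frozenset(("x4", "x5")): 0.3,
}


def complete_link(items, sim):
    """Return [(threshold, partition)], from singletons down to one cluster."""
    clusters = [frozenset([x]) for x in items]
    history = [(float("inf"), list(clusters))]

    def link(a, b):
        # Complete link: two clusters are as similar as their least similar pair.
        return min(sim.get(frozenset((p, q)), 0.0) for p in a for q in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest complete-link similarity.
        candidates = [
            (link(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        ]
        best, i, j = max(candidates, key=lambda t: t[0])
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((best, list(clusters)))
    return history


for threshold, partition in complete_link(ITEMS, SIM):
    print(threshold, sorted(sorted(c) for c in partition))
```

Each entry of the returned list corresponds to one level of the dendrogram. The quadratic search for the best pair keeps the sketch short; it is not meant for large inputs.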
4. Similarity Based Reduction
List of partitions
◦ G(∞) = {{x1}, {x2}, {x3}, {x4}, {x5}}
◦ G(0.9) = {{x1}, {x2, x3}, {x4}, {x5}}
◦ G(0.8) = {{x1, x4}, {x2, x3}, {x5}}
◦ G(0.3) = {{x1, x4, x5}, {x2, x3}}
◦ G(0) = {{x1, x4, x5, x2, x3}}
Which partition is appropriate?
$$\mathit{IntraSim}(\mathcal{P}_\tau) = \frac{1}{|\mathcal{P}_\tau|} \sum_{c \in \mathcal{P}_\tau} \mathit{IntraSim}(c)$$
where
$$\mathit{IntraSim}(c) = \frac{1}{\lambda} \sum_{i<j}^{n} S[c_i, c_j], \qquad \lambda = \frac{n(n-1)}{2}$$
$n$: the number of elements in $c$
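
The partition score can be sketched in Python as the mean, over all clusters, of the average pairwise similarity inside each cluster. The similarity values in `SIM` are hypothetical. The formula is undefined for a single-element cluster because the number of pairs is 0; returning 1.0 in that case is an assumption of this sketch, not something stated in the talk.

```python
from itertools import combinations

# Hypothetical pairwise similarities; unlisted pairs default to 0.0.
SIM = {
    frozenset(("x2", "x3")): 0.9,
    frozenset(("x1", "x4")): 0.8,
    frozenset(("x1", "x5")): 0.5,
    frozenset(("x4", "x5")): 0.3,
}


def intra_sim_cluster(cluster, sim):
    """Average similarity over all unordered pairs of cluster members."""
    members = sorted(cluster)
    n = len(members)
    if n < 2:
        return 1.0  # assumption: a singleton counts as perfectly cohesive
    pair_count = n * (n - 1) / 2
    total = sum(sim.get(frozenset(pair), 0.0) for pair in combinations(members, 2))
    return total / pair_count


def intra_sim_partition(partition, sim):
    """Mean of the per-cluster scores over all clusters of the partition."""
    return sum(intra_sim_cluster(c, sim) for c in partition) / len(partition)


partition = [{"x1", "x4"}, {"x2", "x3"}, {"x5"}]
print(round(intra_sim_partition(partition, SIM), 2))
```

With the singleton convention used here, partitions made of many single-element clusters score high by construction, so the score is best compared across partitions of similar size.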
5. Evaluation
Data set       Subjects  Objects  Predicates  Edges
SP2Bench250K   50K       100K     61          250K
LUBM2          40K       20K      32          240K
BSBM500K       48K       100K     40          500K
SwDogFood      25K       55K      170         290K
5. Evaluation
Experimental Results
1. IntraSim & Similarity value
5. Evaluation
Experimental Results
1. IntraSim & Partition size
5. Evaluation
Experimental Results
Data set       Subjects  RDF types  Clusters  Errors
SP2Bench250K   50K       9          85        0
LUBM2          40K       14         6         2
BSBM500K       48K       9          7         0
SwDogFood      25K       43         1918      22
◦ LUBM2: 2 universities appeared in a cluster with 3728 courses
◦ SwDogFood: 21 ResearchTopics appeared in a cluster with 36 SpatialThings
5. Evaluation
Experimental Results
◦ SwDogFood: 22K typed subjects, 43 different types
[Figure: plot with axis "Number of Clusters" (scale ·10^4)]
Partition             P_64%  P_50%  P_45%  P_40%  P_35%  P_30%  P_23%
#Clusters             1918   424    287    196    119    70     25
#Clusters with Types  1795   413    280    191    116    68     25
Multi Types Clusters  83     58     51     46     33     23     17
#Errors               22     133    209    209    209    210    251
Error Ratio           0.09%  0.6%   0.94%  0.94%  0.94%  0.95%  1.26%
6. Conclusion & Future Work
Conclusion
◦ Two-phase approach
◦ Discover equivalent, then similar structures
◦ Use bisimilarity equivalence + agglomerative clustering
◦ Apply IntraSim as a metric to choose the best partition
Future Work
◦ Edge filtering: consider only important edges
◦ Experiment on bigger data sets
[http://www.superscholar.org]
Thank you for your attention!
References
[Lösch et al. 2012] U. Lösch, S. Bloehdorn, and A. Rettinger. Graph Kernels for RDF Data. In ESWC, 2012.
SP2Bench250K
BSBM500K
LUBM2