Graph Indexing: Tree + Δ ≥ Graph - PowerPoint PPT Presentation


SLIDE 1

Graph Indexing: Tree + Δ ≥ Graph

Peixian Zhao, Jeffrey Xu Yu, Philip S. Yu

The Chinese University of Hong Kong, {pxzhao, yu}@se.cuhk.edu.hk
IBM Watson Research Center, psyu@us.ibm.com

SLIDE 2

An Overview

  • Graph containment query
  • The framework and query cost model
  • Some existing path/graph based solutions
  • A new tree-based approach
  • Experimental studies
  • Conclusion

SLIDE 3

Graph Containment Query

  • Given a graph database G = {g1, g2, …, gN} and a query graph q, find the set
    sup(q) = { gi | q ⊆ gi, gi ∈ G }
  • It is infeasible to check subgraph isomorphism for every gi in G, because subgraph isomorphism is NP-complete.
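The definition above amounts to a naive scan: run a subgraph-isomorphism test against every database graph. A minimal sketch follows; the graph encoding (label dicts plus edge sets) and the brute-force matcher are illustrative, not the paper's.

```python
from itertools import permutations

def subgraph_iso(q, g):
    """Brute-force test: does query q embed in graph g (label-preserving)?
    q and g are (labels, edges) pairs: labels maps node id to label,
    edges is a set of frozenset node pairs. Exponential in |q| -- the
    whole point of indexing is to avoid running this on every graph."""
    q_labels, q_edges = q
    g_labels, g_edges = g
    for image in permutations(g_labels, len(q_labels)):
        m = dict(zip(q_labels, image))                     # candidate mapping
        if any(q_labels[u] != g_labels[m[u]] for u in q_labels):
            continue                                       # labels must match
        if all(frozenset((m[u], m[v])) in g_edges for u, v in q_edges):
            return True                                    # every edge preserved
    return False

def sup(q, G):
    """sup(q) = { i | q is a subgraph of gi, gi in G } by exhaustive scan."""
    return [i for i, g in enumerate(G) if subgraph_iso(q, g)]
```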

SLIDE 4

The Framework

  • Index construction generates a set of features, F, from the graph database G. Each feature, f, maintains the set of ids of the graphs in G that contain f, sup(f).
  • Query processing is a filtering-verification process.
  • The filtering phase uses the features contained in the query graph, q, to compute the candidate set Cq.
  • The verification phase checks subgraph isomorphism for every graph in Cq. False positives are pruned.
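The filtering-verification loop can be sketched in a few lines. The data layout (an inverted index from feature to sup(f), a `verify` callback for the exact test) is an assumed one for illustration:

```python
def filter_verify(q_features, index, num_graphs, verify):
    """Filtering-verification sketch. `index` maps each feature f to the
    set of ids of database graphs containing it (sup(f)); `verify(i)` is
    the exact subgraph-isomorphism test on graph i."""
    candidates = set(range(num_graphs))      # Cq starts as all of G
    for f in q_features:                     # filtering: intersect sup(f) lists
        candidates &= index.get(f, set())
    # verification: prune false positives from Cq
    return sorted(i for i in candidates if verify(i))
```

Only graphs surviving every feature's inverted list reach the expensive verification step.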

SLIDE 5

Query Cost Model

  • The cost of processing a graph containment query q upon G is modeled as C = Cf + |Cq| · Cv, where
  • Cf : the filtering cost, and
  • Cv : the verification cost per candidate (subgraph isomorphism, NP-complete)
  • Several facts:
  • To improve query performance is to minimize |Cq|.
  • The feature set F selected has great impact on Cf and |Cq|.
  • There is also an index construction cost, which is the cost of discovering the feature set F.
SLIDE 6

Existing Solutions: Paths vs Graphs

  • Path-based indexing approach: GraphGrep (PODS'02)
    • All paths up to a certain length lp are enumerated as indexing features
      – An efficient index construction process
      – Index size is determined by lp
      – Limited pruning power, because structural information is lost
  • Graph-based indexing approach: gIndex (SIGMOD'04)
    • Discriminative frequent subgraphs are mined from G as indexing features
      – A costly index construction process
      – Compact index structure
      – Great pruning power, because structural information is well preserved
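The path-enumeration idea behind GraphGrep can be sketched as a depth-first walk that records label sequences of all simple paths with at most lp edges. This is a sketch of the idea, not GraphGrep's code:

```python
def path_features(adj, labels, lp):
    """Enumerate label sequences of all simple paths with at most lp
    edges (GraphGrep-style path features, sketched).
    adj: node -> list of neighbours; labels: node -> vertex label."""
    feats = set()
    def dfs(path):
        feats.add(tuple(labels[v] for v in path))   # record label sequence
        if len(path) - 1 < lp:                      # extend while under lp edges
            for w in adj[path[-1]]:
                if w not in path:                   # simple paths only
                    dfs(path + [w])
    for v in adj:                                   # start a path at every vertex
        dfs([v])
    return feats
```

Enumeration is cheap, but since only label sequences survive, two structurally different graphs can share the same path features, which is exactly the lost-structure weakness noted above.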

SLIDE 7

Tree Features?

  • Regarding paths and graphs as index features:
    • The cost of generating path features is small, but the candidate set can be large.
    • The cost of generating frequent graph features is high, but the candidate set can be small.
  • The key observation: the majority of frequent graph-features (more than 95%) are trees.
  • How well can tree features do?

SLIDE 8

A New Approach: Tree+Δ

  • To explore the indexability of path, tree and graph.
  • A new approach, Tree+Δ:
    • Select frequent tree features.
    • Select a small number of discriminative graph-features that can prune graphs effectively, on demand, without costly graph mining.
SLIDE 9

Indexability of Path, Tree and Graph

  • We consider three main factors to assess indexability:
    • The frequent feature set size: |F|
    • The feature selection (mining) cost: CFS
    • The candidate set size: |Cq|
SLIDE 10

The Frequent Feature Set Size: |F|

  • 95% of frequent graph features are trees. Why?
  • Consider non-tree frequent graph features g and g'.
  • By the Apriori principle, all of g's subtrees, t1, t2, …, tn, are frequent.
  • Because of structural diversity and vertex/edge label variety, there is little chance that the subtrees of g coincide with those of g'.

SLIDE 11

Frequent Feature Distributions

The Real Dataset (AIDS antivirus screen dataset), N = 1,000, σ = 0.1

SLIDE 12

The Feature Selection Cost: CFS

  • Given a graph database, G, and a minimum support threshold, σ, discover the frequent feature set F from G.
  • Graph: two prohibitive operations are unavoidable
    – Subgraph isomorphism
    – Graph isomorphism
  • Tree: one prohibitive operation is unavoidable
    – Tree-in-graph testing
  • Path: polynomial time
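Once the features present in each graph are extracted (the prohibitive part), the minimum-support filter itself is a simple count. A minimal sketch, assuming features have already been extracted per graph:

```python
from collections import Counter

def frequent_features(per_graph_features, sigma):
    """Keep features whose support ratio meets the minimum support
    threshold sigma. per_graph_features holds one feature set per
    database graph; extraction cost is assumed paid elsewhere."""
    n = len(per_graph_features)
    support = Counter(f for feats in per_graph_features for f in set(feats))
    return {f for f, c in support.items() if c / n >= sigma}
```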
SLIDE 13

The Candidate Set Size: |Cq|

  • Define the pruning power of a frequent feature, f, and of a frequent feature set S = {f1, f2, …, fn}.
  • Let the frequent subtree feature set of a graph, g, be T(g) = {t1, t2, …, tn}. Then power(g) ≥ power(T(g)).
  • Let the frequent subpath feature set of a tree, t, be P(t) = {p1, p2, …, pn}. Then power(t) ≥ power(P(t)).
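The slide leaves the power formulas to the figure. Under one natural reading (an assumption, not the paper's exact definition), a feature's pruning power is the fraction of database graphs it filters out, and a set's power comes from intersecting the sup() lists:

```python
def power(sup_f, n):
    """Pruning power of one feature, under an assumed definition:
    the fraction of the n database graphs it filters out."""
    return 1 - len(sup_f) / n

def power_set(sups, n):
    """Pruning power of a feature set: only graphs containing every
    feature survive, so the candidate set is the intersection."""
    return 1 - len(set.intersection(*sups)) / n
```

Since the intersection can only shrink the candidate set, power(S) ≥ power(fi) for every member, matching the inequalities power(g) ≥ power(T(g)) and power(t) ≥ power(P(t)) in spirit.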

SLIDE 14

The Pruning Power

The Real Dataset (AIDS antivirus screen dataset), N = 1,000, σ = 0.1

SLIDE 15

Indexability of Tree

  • The frequent tree-feature set dominates (95%).
  • Discovering frequent tree-features can be done much more efficiently than mining frequent general graph-features.
  • Frequent tree features can contribute pruning power similar to that of frequent graph features.

SLIDE 16

Add Graph Features On Demand

  • Consider a query graph q which contains a subgraph g.
  • If power(T(g)) ≈ power(g), there is no need to index the graph-feature g.
  • If power(g) >> power(T(g)), g should be selected as an index feature, because g is more discriminative than T(g) in terms of pruning.
  • Select discriminative graph-features on demand, without mining the whole set of frequent graph-features from G.
  • The selected graph features are additional indexing features, denoted Δ, kept for later reuse.

SLIDE 17

Discriminative Ratio

  • A discriminative ratio, ε(g), is defined to measure the similarity of pruning power between a graph-feature g and its subtrees T(g).
  • A non-tree graph feature, g, is discriminative if ε(g) ≥ ε0.
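The formula for ε(g) lives in the slide figure; one plausible reading (an assumption on our part) compares the candidate set that g's subtrees produce against sup(g) itself:

```python
def discriminative_ratio(sup_g, tree_sups):
    """One plausible form of eps(g): how much larger the candidate set
    obtained by filtering with g's frequent subtrees T(g) is than
    sup(g). sup_g: ids of graphs containing g; tree_sups: the sup()
    sets of g's frequent subtrees. Exact formula is assumed here."""
    candidates_via_trees = set.intersection(*tree_sups)
    return len(candidates_via_trees) / len(sup_g)
```

A ratio near 1 means T(g) prunes almost as well as g, so indexing g buys little; a large ratio flags g as discriminative.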

SLIDE 18

Discriminative Graph Selection (1)

  • Consider two graphs g and g', where g ⊆ g'.
  • If the gap between power(g') and power(g) is large, reclaim g' from G. Otherwise, do not reclaim g' in the presence of g.
  • Approximate how discriminative g' is relative to g, in the presence of the frequent tree-features already discovered.

SLIDE 19

Discriminative Graph Selection (2)

  • Let the occurrence probability of g in the graph database be Pr(g) = |sup(g)| / |G|.
  • The conditional occurrence probability of g' w.r.t. g is Pr(g'|g); since g ⊆ g' implies sup(g') ⊆ sup(g), this reduces to Pr(g'|g) = |sup(g')| / |sup(g)|.
  • When Pr(g'|g) is small, g' has a higher probability of being discriminative w.r.t. g.
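These two probabilities are direct ratios of sup() sizes. A small sketch of the relation (not the paper's exact derivation; the subset relation sup(g') ⊆ sup(g) is what makes the conditional form collapse):

```python
def occurrence_prob(sup_x, n):
    """Pr(x) = |sup(x)| / |G|, the occurrence probability of feature x
    in a database of n graphs (as recalled on a later slide)."""
    return len(sup_x) / n

def cond_occurrence_prob(sup_g2, sup_g):
    """Pr(g'|g): with g a subgraph of g', every graph containing g'
    also contains g, so Pr(g'|g) = |sup(g')| / |sup(g)|."""
    return len(sup_g2) / len(sup_g)
```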

SLIDE 20

Discriminative Graph Selection (3)

  • The upper and lower bounds of Pr(g'|g) follow because ε(g) ≥ ε0 and ε(g') ≥ ε0.
  • Recall: Pr(x) = |sup(x)| / |G|.

SLIDE 21

Discriminative Graph Selection (4)

  • Because 0 ≤ Pr(g'|g) ≤ 1, the conditional occurrence probability Pr(g'|g) is upper-bounded solely in terms of T(g').

SLIDE 22

An Experimental Study

  • We compared our Tree+Δ with gIndex (X. Yan, P.S. Yu, and J. Han, SIGMOD'04) and C-Tree (H. He and A.K. Singh, ICDE'06).
  • We used the AIDS Antiviral Screen Dataset from the Developmental Therapeutics Program at NCI/NIH (http://dtp.nci.nih.gov/docs/aids/aids_data.html).
    • 42,390 compounds from the DTP's Drug Information System.
    • 63 kinds of atoms (vertex labels).
    • On average, a compound has 43 vertices and 45 edges.
    • At max, 221 vertices and 234 edges.
  • We also used the graph generator (M. Kuramochi and G. Karypis, ICDM'01).
  • We tested on a 3.4GHz Intel PC with 2GB memory.

SLIDE 23

Index Construction (Real Dataset)

Feature Size / Construction Time / Index Size

SLIDE 24

Real Dataset: False Positive Ratio (|Cq|/|sup(q)|)

N = 1,000

SLIDE 25

Conclusion

  • Tree is an effective and efficient graph indexing feature for answering graph containment queries.
  • We analyze the indexability of tree features.
  • We propose the Tree+Δ approach, which holds a compact index structure, achieves better performance in index construction, and provides satisfactory query performance for answering graph containment queries.