
PAPER DISCUSSION: Adding Regular Expressions to Graph Reachability and Pattern Queries



  1. PAPER DISCUSSION: Adding Regular Expressions to Graph Reachability and Pattern Queries Xilun Wu Purdue University

  2. Contribution
     • Describe reachability queries (RQs) and graph pattern queries (PQs) in a (tractable) subset of regular expressions.
     • Define queries using graph simulation instead of subgraph isomorphism.
     • Low polynomial time algorithms for the containment, equivalence and minimization problems for RQs and PQs.
     • An O(N^2) algorithm for RQ answering and an O(N^3) algorithm for PQ answering.

  3. Problem Definition
     • Reachability queries: whether there exists a path from one node to another.
     • Graph pattern queries: find all subgraphs of a graph that are isomorphic to a pattern graph.
     • Example graph: multiple edge types (fa, fn, sa, sn) indicating various relationships.

  4. Problem Definition
     • Reachability queries: whether there exists a path from one node to another.
     • Graph pattern queries: find all subgraphs of a graph that are isomorphic to a pattern graph.
     • Q1: find all biologists (nodes C) who support “cloning”, along with those doctors (nodes B) who are friends-nemeses (via fn) of some users supported by C within 2 hops (via fa ≤ 2).
       Identify connectivity via a path: (a) with edges of particular types and patterns, and (b) with a bound on its length (hops).
     • Q2:
       1. “Alice”’s friends-nemeses (via fn) who are doctors, and are against “cloning”.
       2. Biologists who support “cloning research”, and are connected within 2 hops via fa relationships to someone who is within 2 hops of person D via sa.
       3. A scientist group with friends all sharing the same view towards cloning.
       4. These biologists are against those doctor friends of Alice, and vice versa, via paths of certain patterns.

  5. Notation: G = (V, E, f_A, f_C)

  6. Notation (Pattern Query, Unfinished): G = (V, E, f_A, f_C)

  7. Subgraph Isomorphism
     • Pattern graph Q, subgraph G_s of data graph G
     • Q matches G_s if there exists a bijective function f: V_Q → V_Gs such that
       – for each node u in Q, u and f(u) have the same label
       – (u, u′) is an edge in Q if and only if (f(u), f(u′)) is an edge in G_s
     • Goodness: keeps the exact structure topology between Q and G_s
     • Badness:
       – The decision problem is NP-complete
       – May return exponentially many matched subgraphs
       – In certain scenarios, too restrictive to find matches
     • These hinder the usability in emerging applications, e.g., social networks
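
A brute-force sketch of the matching condition on this slide may help. It assumes node labels are stored in dictionaries and edges are directed pairs; the function name `subgraph_isomorphisms` and the data layout are illustrative and not taken from the paper. It takes G_s to be the image of Q, so it checks that every pattern edge maps to a data edge, and it tries every injective mapping, which makes the exponential cost visible.

```python
from itertools import permutations


def subgraph_isomorphisms(q_labels, q_edges, g_labels, g_edges):
    """Return every injective, label-preserving mapping f from pattern
    nodes to data nodes such that each pattern edge (u, u2) maps to a
    data edge (f[u], f[u2]). Hypothetical helper for illustration only."""
    q_nodes = list(q_labels)
    g_edge_set = set(g_edges)
    matches = []
    # Enumerate all ordered selections of distinct data nodes (injective maps).
    for image in permutations(g_labels, len(q_nodes)):
        f = dict(zip(q_nodes, image))
        labels_ok = all(q_labels[u] == g_labels[f[u]] for u in q_nodes)
        edges_ok = all((f[u], f[u2]) in g_edge_set for u, u2 in q_edges)
        if labels_ok and edges_ok:
            matches.append(f)
    return matches


if __name__ == "__main__":
    q_labels = {"x": "A", "y": "B"}
    g_labels = {1: "A", 2: "B", 3: "B"}
    g_edges = [(1, 2), (1, 3)]
    print(subgraph_isomorphisms(q_labels, [("x", "y")], g_labels, g_edges))
```

Each returned dictionary is one matched subgraph, so the result list itself can grow exponentially with the pattern size.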

  8. Graph Simulation
     • Given pattern graph Q(V_q, E_q) and data graph G(V, E), a binary relation R ⊆ V_q × V is said to be a match if
       – (1) for each (u, v) ∈ R, u and v have the same label; and
       – (2) for each (u, v) ∈ R and each edge (u, u′) ∈ E_q, there exists an edge (v, v′) in E such that (u′, v′) ∈ R.
     • Graph G matches pattern Q via graph simulation if there exists a total match relation M, i.e.
       – for each u ∈ V_q, there exists v ∈ V such that (u, v) ∈ M.
     • Goodness: solvable in quadratic time
     • Badness:
       – Loses structure topology (how much? open question)
       – Returns a single unique matched subgraph
     • Subgraph isomorphism (NP-complete) vs. graph simulation (O(n^2))!
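
The definition above can be sketched as a fixpoint computation: start from all label-compatible pairs and repeatedly drop pairs that violate condition (2). This is a naive refinement loop written for clarity, not the quadratic-time algorithm the slide refers to; the function name `graph_simulation` and the dictionary-based graph layout are assumptions made for this example.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Compute the maximum match relation as {pattern node: set of data nodes}.
    Returns None when some pattern node has no match, i.e. the relation
    is not total. Hypothetical helper for illustration only."""
    succ = {v: set() for v in g_labels}
    for v, v2 in g_edges:
        succ[v].add(v2)

    # Condition (1): candidates must carry the same label.
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]}
           for u in q_labels}

    # Condition (2): keep v for u only if, for every pattern edge (u, u2),
    # some successor of v is still a candidate for u2. Repeat until stable.
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    if any(not candidates for candidates in sim.values()):
        return None
    return sim


if __name__ == "__main__":
    q_labels = {"pm": "PM", "dev": "DEV"}
    g_labels = {1: "PM", 2: "DEV", 3: "DEV", 4: "PM"}
    print(graph_simulation(q_labels, [("pm", "dev")], g_labels, [(1, 2), (1, 3)]))
```

Unlike the isomorphism case, the result is a single relation: one pattern node may map to several data nodes at once.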

  9. Graph Simulation
     • Example: set up a team to develop a new software product.
     • Graph simulation returns F3, F4 and F5; subgraph isomorphism returns empty!
     • Subgraph isomorphism is too strict for emerging applications!

  10. Graph Simulation Loses Structures
     • Connected pattern graphs match disconnected subgraphs (Q vs. G_s): S(HR) = {HR}, S(SE) = {SE}, S(Bio) = {Bio1, Bio2}
     • Cyclic pattern graphs match tree subgraphs (Q vs. G_s): S(HR) = {HR}, S(SE) = {SE}, S(Bio) = {Bio1, Bio2}
     • These motivate us to propose a new matching model!

  11. Complexity (Bounded Regex)
     1. The subgraph isomorphism problem is NP-complete.
     2. But bounded graph simulation can be computed in polynomial time.
     3. Bounded = bounds are allowed on the number of hops.

  12. To Be Continued
     1. Two efficient algorithms for RQs
     2. Two efficient algorithms for PQs
     3. Containment, equivalence, and minimization problems for RQs and PQs, their complexity bounds, and algorithms
     4. Evaluations
     5. Perhaps more on graph simulation and subgraph isomorphism

  13. PAPER DISCUSSION: Adding Regular Expressions to Graph Reachability and Pattern Queries (Cont’d) Xilun Wu Purdue University

  14. Content 1. Two efficient algorithms for RQs 2. Two efficient algorithms for PQs

  15. Reachability Queries (RQs)
     1. G = (V, E, f_A, f_C)
     2. Q_r = (u1, u2, f_u1, f_u2, f_e)
     3. If the nodes and the edge between them satisfy the predicates, (v1, v2) is a match of (u1, u2): v1 ∼ u1, v2 ∼ u2, (v1, v2) ≈ f_e

  16. Graph Pattern Queries (PQs)
     1. G = (V, E, f_A, f_C)
     2. Q_p = (V_p, E_p, f_v, f_e)
     3. If the nodes and edges satisfy the predicates, (V1, E1) is a match of (V_p, E_p).
     4. PQs can be answered using RQs.

  17. Answer RQs
     1. Two methods:
        1. Shortest distance matrix
        2. Bi-directional BFS with an auxiliary LRU cache
     2. Start with single-color RQs. A multiple-color RQ F = F1 F2 … Fk can be decomposed into k single-color RQs.

  18. Single-Color RQs: 1. Shortest distance matrix
     1. Matrix M_k contains the pairwise shortest distances of nodes, with the bound k on the number of edges. The third dimension is the color: M_k[v1][v2][c] is the distance along a color-c path of no more than k edges.
     2. Assumption: this matrix is pre-computed in O((m + 1)|V|^2 + |V|(|V| + |E|)).
     3. Answer time:
        1. (_, _) ∼ O(|V|^2)
        2. (v1, _) or (_, v2) ∼ O(|V|)
        3. (v1, v2) ∼ O(1)
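
A minimal sketch of the matrix approach, assuming edges are given as `(source, target, color)` triples. The names `build_distance_matrix`, `rq_pair` and `rq_from` are hypothetical. For simplicity the sketch stores unbounded per-color shortest distances in a dictionary keyed by `(v1, v2, c)` and applies the bound k at query time, rather than storing a bounded matrix M_k.

```python
from collections import defaultdict, deque


def build_distance_matrix(nodes, edges):
    """dist[(v1, v2, c)] = fewest color-c edges on a nonempty path v1 -> v2.
    Pairs with no such path are simply absent from the dictionary."""
    adj = defaultdict(lambda: defaultdict(set))
    for u, v, c in edges:
        adj[c][u].add(v)

    dist = {}
    for c, succ in adj.items():
        for src in nodes:
            # BFS that starts one hop out, so (src, src, c) needs a real cycle.
            queue = deque((w, 1) for w in succ.get(src, ()))
            while queue:
                node, hops = queue.popleft()
                if (src, node, c) in dist:
                    continue
                dist[(src, node, c)] = hops
                queue.extend((w, hops + 1) for w in succ.get(node, ()))
    return dist


def rq_pair(dist, v1, v2, color, k):
    """(v1, v2) query: a single dictionary lookup."""
    return dist.get((v1, v2, color), k + 1) <= k


def rq_from(dist, nodes, v1, color, k):
    """(v1, _) query: one lookup per candidate target."""
    return {v2 for v2 in nodes if rq_pair(dist, v1, v2, color, k)}


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b", "fa"), ("b", "c", "fa"), ("c", "d", "fn")]
    dist = build_distance_matrix(nodes, edges)
    print(rq_pair(dist, "a", "c", "fa", 2), rq_from(dist, nodes, "a", "fa", 2))
```

The three query shapes on the slide correspond to one lookup, one scan over the nodes, and one scan over all node pairs.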

  19. Single-Color RQs: 2. Bi-directional BFS with an auxiliary LRU cache
     1. Iterate BFS from src and dest until those two sets intersect or one becomes empty.
     2. Answer time:
        1. (_, _) ∼ O(|V|^2 (|V| + |E|))
        2. (v1, _) or (_, v2) ∼ O(|V| (|V| + |E|))
        3. (v1, v2) ∼ O(|V| + |E|)
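
A sketch of the bi-directional search for a single (v1, v2) query, under the assumption that edges are `(source, target, color)` triples and that a match needs a nonempty path of at most k edges. The function name `reachable_within` is hypothetical, and the auxiliary LRU cache mentioned on the slide is left out to keep the example short.

```python
from collections import defaultdict


def reachable_within(edges, src, dst, color, k):
    """True if a nonempty path of at most k edges, all of `color`,
    leads from src to dst. Expands from both ends, one level at a time."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for u, v, c in edges:
        if c == color:
            fwd[u].add(v)
            bwd[v].add(u)
    if k < 1:
        return False

    # The forward side starts one hop out, so src == dst requires a cycle.
    seen_f = set(fwd[src])
    seen_b = {dst}
    frontier_f, frontier_b = set(seen_f), {dst}
    hops = 1  # forward levels + backward levels expanded so far

    while frontier_f and frontier_b:
        if seen_f & seen_b:
            return True   # the two sets intersect
        if hops == k:
            return False  # hop budget used up
        # Grow the smaller frontier by one level.
        if len(frontier_f) <= len(frontier_b):
            frontier_f = {w for u in frontier_f for w in fwd[u]} - seen_f
            seen_f |= frontier_f
        else:
            frontier_b = {w for u in frontier_b for w in bwd[u]} - seen_b
            seen_b |= frontier_b
        hops += 1
    return False          # one side ran out of nodes to expand


if __name__ == "__main__":
    edges = [("a", "b", "fa"), ("b", "c", "fa"), ("c", "d", "fn")]
    print(reachable_within(edges, "a", "c", "fa", 2))
```

Because the total number of expanded levels never exceeds k, any intersection found corresponds to a path within the bound.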

  20. Multi-Color RQs: 1. Shortest distance matrix
     1. F = F1 F2 … Fk can be decomposed into k single-color RQs.
     2. (_, v2); (_, _); …; (v1, _)
     3. Answer time:
        1. (_, _) ∼ O(|V|^2) per single-color RQ, O(k |V|^2) for all k
        2. (v1, _) or (_, v2) ∼ O(|V|)
        3. (v1, v2) ∼ O(1)
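
The decomposition can be sketched as a chain of single-color steps, where the nodes reached by one step become the starting set of the next. The sketch assumes each F_i has the form "color c, at least 1 and at most b hops", written here as a hypothetical `(color, bound)` pair; `step` and `answer_rq` are illustrative names, and the per-step search is a plain bounded BFS rather than a matrix lookup.

```python
from collections import defaultdict


def step(adj, sources, bound):
    """Nodes reachable from any node in `sources` by 1..bound edges of `adj`."""
    reached, frontier = set(), set(sources)
    for _ in range(bound):
        frontier = {w for u in frontier for w in adj.get(u, ())} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


def answer_rq(edges, v1, v2, stages):
    """Evaluate F = F1 F2 ... Fk for the pair (v1, v2), where `stages`
    is a list of (color, bound) pairs, one per single-color sub-query."""
    by_color = defaultdict(lambda: defaultdict(set))
    for u, v, c in edges:
        by_color[c][u].add(v)

    current = {v1}
    for color, bound in stages:
        current = step(by_color[color], current, bound)
        if not current:
            return False  # some sub-query has no match, so F has none
    return v2 in current


if __name__ == "__main__":
    edges = [("a", "b", "fa"), ("b", "c", "fa"), ("c", "d", "fn")]
    print(answer_rq(edges, "a", "d", [("fa", 2), ("fn", 1)]))
```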

  21. Multi-Color RQs: 2. Bi-directional BFS with an auxiliary LRU cache
     1. Extend the set from src and dest.
     2. (v1, _), (_, v2); (_, _); …; (_, _)
     3. Terminate when those two sets intersect or one becomes empty.
     4. Answer time:
        1. (_, _) ∼ O(|V|^2 (|V| + |E|)) per single-color RQ, O(k |V|^2 (|V| + |E|)) for all k
        2. (v1, _) or (_, v2) ∼ O(|V| (|V| + |E|))
        3. (v1, v2) ∼ O(|V| + |E|)

  22. Graph Pattern Queries (PQs) 1. JoinMatch 2. SplitMatch

  23. JoinMatch
     1. Create a candidate match set for each pattern node.
     2. Use RQs to remove ineligible nodes from the sets.
     3. The input graph has to be a DAG. Otherwise, compute its strongly connected components (SCCs) instead.
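
Steps 1 and 2 can be sketched as a simple refinement loop. This is a simplified illustration, not the paper's JoinMatch: it ignores the DAG/SCC ordering of step 3 and re-runs a bounded BFS instead of using a precomputed structure. The names `join_match_sketch` and `bounded_reach`, the predicate-per-pattern-node layout, and the `(u, u2, color, bound)` edge format are all assumptions made for this example.

```python
from collections import defaultdict


def bounded_reach(adj, src, bound):
    """Nodes reachable from src by 1..bound edges of `adj`."""
    reached, frontier = set(), {src}
    for _ in range(bound):
        frontier = {w for u in frontier for w in adj.get(u, ())} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


def join_match_sketch(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """pattern_nodes: {u: predicate(attrs) -> bool}
    pattern_edges: [(u, u2, color, bound)]
    graph_nodes:   {v: attrs}
    graph_edges:   [(v, v2, color)]
    Returns {u: set of matching data nodes}; all sets are empty if any is."""
    by_color = defaultdict(lambda: defaultdict(set))
    for v, v2, c in graph_edges:
        by_color[c][v].add(v2)

    # Step 1: candidate sets from the node predicates.
    mat = {u: {v for v, attrs in graph_nodes.items() if pred(attrs)}
           for u, pred in pattern_nodes.items()}

    # Step 2: drop v from mat[u] when no candidate of u2 is reachable
    # from v within the bound, via edges of the required color.
    changed = True
    while changed:
        changed = False
        for u, u2, color, bound in pattern_edges:
            keep = {v for v in mat[u]
                    if bounded_reach(by_color[color], v, bound) & mat[u2]}
            if keep != mat[u]:
                mat[u] = keep
                changed = True

    if any(not candidates for candidates in mat.values()):
        return {u: set() for u in mat}
    return mat


if __name__ == "__main__":
    graph_nodes = {"C1": {"job": "biologist"}, "C2": {"job": "biologist"},
                   "B1": {"job": "doctor"}, "X": {"job": "other"}}
    graph_edges = [("C1", "X", "fa"), ("X", "B1", "fa"), ("C2", "B1", "fn")]
    pattern_nodes = {"C": lambda a: a["job"] == "biologist",
                     "B": lambda a: a["job"] == "doctor"}
    print(join_match_sketch(pattern_nodes, [("C", "B", "fa", 2)],
                            graph_nodes, graph_edges))
```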

  24. JoinMatch

  25. SplitMatch
     1. Treat query nodes and graph nodes uniformly.
     2. Group nodes into blocks. Each block contains a set of nodes from both sources.
     3. Compute a partition-relation pair. This pair is recursively refined by splitting the blocks based on constraints (the same rmv-set concept as in JoinMatch).

  26. SplitMatch

  27. SplitMatch

  28. SplitMatch
