Graph Cube: On Warehousing and OLAP Multidimensional Networks



  1. Graph Cube: On Warehousing and OLAP Multidimensional Networks
     Peixiang Zhao†, Xiaolei Li‡, Dong Xin§, Jiawei Han†
     † Department of Computer Science, UIUC; ‡ Groupon Inc.; § Google Cooperation
     † pzhao4@illinois.edu, hanj@cs.illinois.edu; ‡ me@xiaolei.org; § dongxin@gmail.com
     June 16th, 2011, SIGMOD 2011, Athens, Greece

  2. Outline
     1 Introduction
     2 The Graph Cube Model
     3 OLAP on Graph Cube
       - Cuboid Query
       - Crossboid Query
     4 Implementing Graph Cube
     5 Experiment
     6 Conclusion

  3. Introduction
     - Recent years have seen an astounding growth of networks in a wide spectrum of application domains
       - Communication networks
       - Social networks
       - Biological networks
       - The Web
     - Multidimensional networks
       1. An underlying graph structure comprising entities and relationships
       2. Multidimensional attributes are specified and associated with entities of the network
     - There exist considerable technology gaps in managing, querying and summarizing multidimensional networks effectively

  4. A Sample Multidimensional Network
     Figure: A Multidimensional Network Comprising a Graph Structure and a Multidimensional Vertex Attribute Table
     (a) Graph [not reproduced; vertices 1 to 10]
     (b) Vertex Attribute Table

     ID  Gender  Location  Profession  Income
     1   Male    CA        Teacher     $70,000
     2   Female  WA        Teacher     $65,000
     3   Female  CA        Engineer    $80,000
     4   Female  NY        Teacher     $90,000
     5   Male    IL        Lawyer      $80,000
     6   Female  WA        Teacher     $90,000
     7   Male    NY        Lawyer      $100,000
     8   Male    IL        Engineer    $75,000
     9   Female  CA        Lawyer      $120,000
     10  Male    IL        Engineer    $95,000

  5. Introduction
     Motivation: Can we extend decision support facilities to multidimensional networks?
     - Data warehouses and OLAP are advantageous in the multidimensional network scenario
       - Summarizing massive networks into different levels of granularity for more effective analysis and exploration
       - Business intelligence: on Facebook and Twitter, advertisers and marketers take advantage of social networks within different multidimensional spaces to better promote their products via social targeting or viral marketing
     - However, in multidimensional networks, much of the value and interest lies in the network itself!
     - Simple numeric-value-based group-bys in traditional data warehouses are no longer insightful and are of limited use, because the structural information of the networks is simply ignored

  6. Network Aggregation vs. Traditional Group-by
     Figure: Multidimensional Network Aggregation vs. Traditional RDB Aggregation (Group by Gender)
     (a) Aggregate Network [not reproduced; vertices Male and Female]
     (b) Aggregate Table

     Gender  COUNT(*)
     Male    5
     Female  5

     Figure: Multidimensional Network Aggregation vs. Traditional RDB Aggregation (Group by Gender and Location)
     (a) Aggregate Network [not reproduced; one vertex per (Gender, Location) combination]
     (b) Aggregate Table

     Gender  Location  COUNT(*)
     Male    CA        1
     Female  CA        2
     Female  WA        2
     Male    IL        3
     Male    NY        1
     Female  NY        1

  7. Introduction
     Graph Cube
     - A multidimensional network can be summarized to aggregate networks in coarser levels of granularity within different multidimensional spaces
       - Vertex coalescence
       - Structure summarization
     - Different query models and OLAP solutions are proposed for multidimensional networks
       - Cuboid Queries
       - Crossboid Queries
     - Efficient implementation is based on a combination of
       - Well-studied data cube implementation techniques
       - Special characteristics of multidimensional networks
     - The first to systematically address warehousing and OLAP issues on large multidimensional networks

  8. The Graph Cube Model
     Multidimensional Network: A multidimensional network, $N$, is a graph denoted as $N = (V, E, A)$, where $V$ is a set of vertices, $E \subseteq V \times V$ is a set of edges, and $A = \{A_1, A_2, \ldots, A_n\}$ is a set of $n$ vertex-specific attributes, i.e., $\forall u \in V$, there is a tuple $A(u)$ of $u$, denoted as $A(u) = (A_1(u), A_2(u), \ldots, A_n(u))$, where $A_i(u)$ is the value of $u$ on the $i$-th attribute, $1 \le i \le n$. $A$ is called the dimensions of the network $N$.
     - Some (or all) dimensions $A_i$ could be $*$ (ALL), representing a super-aggregation along $A_i$
     - Given a set of $n$ dimensions of a network, there exist $2^n$ multidimensional spaces (aggregations)
     - The measure within each possible space is no longer a simple numeric value, but an aggregate network
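
To make the $2^n$ count concrete, here is a minimal Python sketch that enumerates every aggregation of a dimension set and projects a vertex tuple onto one of them. The function names `all_aggregations` and `project` are illustrative assumptions, not names from the talk; the three dimensions are the ones that appear in the lattice figure.

```python
from itertools import combinations

# Dimensions used in the graph cube lattice of the running example.
DIMENSIONS = ("Gender", "Location", "Profession")


def all_aggregations(dimensions):
    """Yield every subset of the dimensions, from the apex () to the base."""
    for k in range(len(dimensions) + 1):
        for subset in combinations(dimensions, k):
            yield subset


def project(attributes, dimensions, kept):
    """Replace every dimension not in `kept` with '*' (ALL)."""
    return tuple(attributes[d] if d in kept else "*" for d in dimensions)


spaces = list(all_aggregations(DIMENSIONS))

# Vertex 1 from the sample vertex attribute table.
vertex_1 = {"Gender": "Male", "Location": "CA", "Profession": "Teacher"}
projected = project(vertex_1, DIMENSIONS, ("Gender",))
```

With three dimensions, `spaces` holds $2^3 = 8$ aggregations, and `projected` keeps only the gender of vertex 1 while the other two dimensions collapse to `*`.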

  9. The Graph Cube Model
     Graph Cube: Given a multidimensional network $N = (V, E, A)$, the graph cube is obtained by restructuring $N$ in all possible aggregations of $A$. For each possible aggregation $A'$ of $A$, the grouping measure is an aggregate network $G'$ w.r.t. $A'$.
     [Figure: The Graph Cube Lattice. From top to bottom: Apex; (Gender), (Location), (Profession); (Gender, Location), (Gender, Profession), (Location, Profession); Base.]

  10. OLAP on Graph Cubes
     Cuboid Query: return as output the aggregate network corresponding to a specific aggregation of the dimensions of the multidimensional network
     - What is the network structure between various genders?
     - What is the network structure between the various gender and location combinations?
     [Figure: the aggregate network grouped by Gender (vertices Male and Female) and the aggregate network grouped by Gender and Location (one vertex per combination); weights not reproduced.]
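
A cuboid query can be sketched in a few lines of Python, assuming COUNT(*) is the aggregate for both coalesced vertices and summarized edges. The vertex attributes come from the sample vertex attribute table; the edge list is hypothetical, because the graph figure is not recoverable from this transcript, and `cuboid_query` is an illustrative name.

```python
from collections import Counter

# Vertex attributes from the sample vertex attribute table.
VERTICES = {
    1: {"Gender": "Male", "Location": "CA"},
    2: {"Gender": "Female", "Location": "WA"},
    3: {"Gender": "Female", "Location": "CA"},
    4: {"Gender": "Female", "Location": "NY"},
    5: {"Gender": "Male", "Location": "IL"},
    6: {"Gender": "Female", "Location": "WA"},
    7: {"Gender": "Male", "Location": "NY"},
    8: {"Gender": "Male", "Location": "IL"},
    9: {"Gender": "Female", "Location": "CA"},
    10: {"Gender": "Male", "Location": "IL"},
}

# Hypothetical undirected edges (NOT the edges of the original figure).
EDGES = [(1, 2), (1, 5), (2, 3), (3, 9), (4, 7), (5, 8), (8, 10), (2, 6)]


def cuboid_query(vertices, edges, dims):
    """Return (vertex_weights, edge_weights) of the aggregate network on dims."""

    def key(v):
        return tuple(vertices[v][d] for d in dims)

    # Vertex coalescence: count the vertices that share the same group key.
    vertex_weights = Counter(key(v) for v in vertices)

    # Structure summarization: count edges between (or within) groups.
    edge_weights = Counter()
    for u, v in edges:
        pair = tuple(sorted((key(u), key(v))))
        edge_weights[pair] += 1
    return vertex_weights, edge_weights


by_gender = cuboid_query(VERTICES, EDGES, ("Gender",))
by_gender_location = cuboid_query(VERTICES, EDGES, ("Gender", "Location"))
```

The vertex weights reproduce the COUNT(*) columns of the aggregate tables (5 males, 5 females; 3 vertices in (Male, IL)); the edge weights depend entirely on the hypothetical edge list.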

  11. OLAP on Graph Cubes
     - A cuboid query is within a single multidimensional space, which follows the traditional OLAP model
     - A crossboid query crosses multiple multidimensional spaces of the network, i.e., more than one cuboid is involved in a query
       - What is the network structure between the user with ID = 3 and various locations?
       - What is the network structure between users grouped by gender vs. users grouped by location?
     [Figure: two crossboid results, one linking the vertex with ID 3 to the locations CA, IL, WA, NY and one linking the gender groups Male and Female to the same locations; weights not reproduced.]
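
The second crossboid question can be sketched as follows. This is one possible reading, stated as an assumption: each undirected edge {u, v} contributes once to the pair (group of u in the first cuboid, group of v in the second) and once with the roles of u and v swapped. The edge list is hypothetical and `crossboid_query` is an illustrative name.

```python
from collections import Counter

# Gender and Location columns from the sample vertex attribute table.
VERTICES = {
    1: {"Gender": "Male", "Location": "CA"},
    2: {"Gender": "Female", "Location": "WA"},
    3: {"Gender": "Female", "Location": "CA"},
    4: {"Gender": "Female", "Location": "NY"},
    5: {"Gender": "Male", "Location": "IL"},
    6: {"Gender": "Female", "Location": "WA"},
    7: {"Gender": "Male", "Location": "NY"},
    8: {"Gender": "Male", "Location": "IL"},
    9: {"Gender": "Female", "Location": "CA"},
    10: {"Gender": "Male", "Location": "IL"},
}

# Hypothetical undirected edges (NOT the edges of the original figure).
EDGES = [(1, 2), (1, 5), (2, 3), (3, 9), (4, 7), (5, 8), (8, 10), (2, 6)]


def crossboid_query(vertices, edges, dims_a, dims_b):
    """Count edges between groups of cuboid dims_a and groups of cuboid dims_b."""

    def key(v, dims):
        return tuple(vertices[v][d] for d in dims)

    weights = Counter()
    for u, v in edges:
        # Assumption: an undirected edge is seen from both endpoints.
        weights[(key(u, dims_a), key(v, dims_b))] += 1
        weights[(key(v, dims_a), key(u, dims_b))] += 1
    return weights


gender_vs_location = crossboid_query(VERTICES, EDGES, ("Gender",), ("Location",))
```

Under this reading, every edge is counted twice in total, so the weights sum to twice the number of edges; a different counting convention would change the numbers but not the shape of the result.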

  12. Cuboid Queries vs. Crossboid Queries
     [Figure: two copies of the graph cube lattice over Gender, Location and Profession. (a) Traditional Cuboid Queries; (b) Crossboid Queries Straddling Multiple Cuboids. Example queries shown: "What is the network structure between users and the locations?" and "What is the network structure between users grouped by gender and users grouped by location?"]

  13. Graph Cube Implementation
     Objective: compute the aggregate networks of different cuboids grouping on all possible dimension combinations of a multidimensional network
     1. Full materialization: best query response time, worst space cost
     2. No materialization: best space cost, worst query response time
     3. Partial materialization: a small portion of cuboids is materialized in order to balance the tradeoff between query response time and cube resource requirement
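
Partial materialization pays off when a coarser cuboid can be answered from a finer one that is already stored. The sketch below rolls a (Gender, Location) aggregate network up to (Gender) without touching the base network, assuming COUNT(*) weights, which can simply be summed. The vertex weights match the Gender and Location aggregate table; the edge weights are hypothetical, and `roll_up` is an illustrative name.

```python
from collections import Counter

FROM_DIMS = ("Gender", "Location")

# Vertex weights from the (Gender, Location) aggregate table.
VERTEX_WEIGHTS = {
    ("Male", "CA"): 1, ("Female", "CA"): 2, ("Female", "WA"): 2,
    ("Male", "IL"): 3, ("Male", "NY"): 1, ("Female", "NY"): 1,
}

# Hypothetical edge weights of the materialized (Gender, Location) cuboid.
EDGE_WEIGHTS = {
    (("Female", "WA"), ("Male", "CA")): 1,
    (("Male", "CA"), ("Male", "IL")): 1,
    (("Female", "CA"), ("Female", "WA")): 1,
    (("Female", "CA"), ("Female", "CA")): 1,
    (("Female", "NY"), ("Male", "NY")): 1,
    (("Male", "IL"), ("Male", "IL")): 2,
    (("Female", "WA"), ("Female", "WA")): 1,
}


def roll_up(vertex_weights, edge_weights, from_dims, to_dims):
    """Aggregate a materialized cuboid on from_dims down to to_dims."""
    idx = [from_dims.index(d) for d in to_dims]

    def coarse(key):
        return tuple(key[i] for i in idx)

    new_vertices = Counter()
    for key, weight in vertex_weights.items():
        new_vertices[coarse(key)] += weight

    new_edges = Counter()
    for (a, b), weight in edge_weights.items():
        pair = tuple(sorted((coarse(a), coarse(b))))
        new_edges[pair] += weight
    return new_vertices, new_edges


gender_vertices, gender_edges = roll_up(
    VERTEX_WEIGHTS, EDGE_WEIGHTS, FROM_DIMS, ("Gender",)
)
```

The rolled-up vertex weights agree with the Group by Gender aggregate table (5 and 5), and the work done is proportional to the size of the stored cuboid rather than the size of the original network.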

  14. Graph Cube Implementation: Partial Materialization
     Problem: select a set $S$ of $k$ cuboids in the graph cube for materialization, such that the average time taken to evaluate the queries is minimized
     - The partial materialization problem is NP-complete, by reduction from set cover
     - Greedy Algorithm: selecting $k$ cuboids with the highest size-reduction benefit
     - Theorem: Let $B_{\text{greedy}}$ be the benefit of $k$ cuboids chosen by the greedy algorithm and let $B_{\text{opt}}$ be the benefit of any optimal set of $k$ cuboids. Then $B_{\text{greedy}} \ge (1 - 1/e) \times B_{\text{opt}}$, and this bound is tight
     - MinLevel Algorithm: materializing cuboids $c$ where $\dim(c) = l_0$, with $l_0$ indicating the level in the cube lattice at which we start materializing cuboids
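
Both selection strategies can be sketched in Python under stated assumptions. The benefit model below is the classic one for data cube view selection: a query on cuboid q costs the size of the smallest materialized cuboid whose dimensions include those of q, and the base cuboid is always available. The talk only says "size-reduction benefit", so this model, the cuboid sizes in `SIZES`, and all function names are hypothetical.

```python
from itertools import combinations

DIMS = ("Gender", "Location", "Profession")
BASE = frozenset(DIMS)

# Hypothetical cuboid sizes (e.g. vertices plus edges of each aggregate network).
SIZES = {
    frozenset(): 1,
    frozenset({"Gender"}): 2,
    frozenset({"Location"}): 10,
    frozenset({"Profession"}): 6,
    frozenset({"Gender", "Location"}): 14,
    frozenset({"Gender", "Profession"}): 12,
    frozenset({"Location", "Profession"}): 18,
    BASE: 23,
}


def query_cost(cuboid, materialized):
    """Size of the smallest materialized cuboid that can answer `cuboid`."""
    return min(SIZES[m] for m in materialized if cuboid <= m)


def benefit(candidate, materialized):
    """Total cost reduction over all cuboids that `candidate` can answer."""
    return sum(
        max(0, query_cost(q, materialized) - SIZES[candidate])
        for q in SIZES
        if q <= candidate
    )


def greedy_select(k):
    """Pick k cuboids, each time taking the one with the highest benefit."""
    materialized = {BASE}  # assumption: the base cuboid is always available
    chosen = []
    for _ in range(k):
        candidates = [c for c in SIZES if c not in materialized]
        if not candidates:
            break
        best = max(candidates, key=lambda c: benefit(c, materialized))
        materialized.add(best)
        chosen.append(best)
    return chosen


def min_level(l0):
    """MinLevel: materialize every cuboid c with dim(c) = l0."""
    return [frozenset(c) for c in combinations(DIMS, l0)]


greedy_choice = greedy_select(2)
level_two = min_level(2)
```

With these made-up sizes, the greedy pass first picks (Gender, Profession) and then (Gender), while `min_level(2)` returns the three two-dimensional cuboids regardless of size.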

  15. Experimental Evaluation
     - DBLP data set
       - A co-authorship graph with 28,702 authors as vertices and 66,832 coauthor relationships as edges
       - Three dimensions: name, area, productivity
         - area: DB, DM, AI, IR
         - productivity: Excellent, Good, Fair, Poor
     - IMDB data set
       - A movie rating network with 116,164 vertices and 5,452,350 edges
       - Seven dimensions: Title, Year, Length, Budget, Rating, MPAA and Type
         - MPAA: G, PG, PG-13, R, NC-17, NR
         - Type: action, animation, comedy, drama, documentary, romance, short

  16. Effectiveness Evaluation
     [Figure: Cuboid Queries of the Graph Cube on DBLP Data Set. (c) (Area): aggregate network over DB, DM, AI, IR; (d) (Productivity): aggregate network over Excellent, Good, Fair, Poor. Vertex and edge weights not reproduced.]

  17. Effectiveness Evaluation
     [Figure: Cuboid Queries of the Graph Cube on DBLP Data Set. (a) (Area, Productivity): aggregate network over the 16 combinations of area (DB, DM, AI, IR) and productivity (Excellent, Good, Fair, Poor). Vertex and edge weights not reproduced.]
