incremental maintenance for non distributive aggregate
play

Incremental Maintenance for Non-Distributive Aggregate Functions - PowerPoint PPT Presentation

Incremental Maintenance for Non-Distributive Aggregate Functions Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh work done at IBM Almaden Research Center Motivation large amounts of data stored in databases


  1. Incremental Maintenance for Non-Distributive Aggregate Functions Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh work done at IBM Almaden Research Center

  2. Motivation large amounts of data stored in databases � expensive OLAP queries, but with nice properties: � based on same set of tables � perform similar aggregations can efficiently support such queries with � Automatic Summary Tables (ASTs) � precomputed once, used many times � answer complex queries fast must maintain ASTs when base tables change � inserts, updates, deletes VLDB, Aug 2002 Themis Palpanas - U of Toronto 2

  3. Motivation (cont’d) AST AST definition base tables insert/update/delete 3

  4. Problem Statement given ASTs with aggregate functions � distributive � SUM, COUNT � non -distributive � STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG, … when base tables change � incrementally maintain affected ASTs efficient maintenance of ASTs with non-distributive aggregate functions VLDB, Aug 2002 Themis Palpanas - U of Toronto 4

  5. Outline Current Approach Our Solution Experimental Evaluation Related Work Conclusions VLDB, Aug 2002 Themis Palpanas - U of Toronto 5

  6. Current Approach Apply combine phase old and new values AST Propagate phase delta AST definition base tables insert/update/delete 6

  7. Current Approach (cont’d) works for distributive � SUM, COUNT does not work for non -distributive � STDDEV, CORRELATION, REGRESSION � MIN/MAX � XMLAGG need new way to deal with these functions VLDB, Aug 2002 Themis Palpanas - U of Toronto 7

  8. Our Solution selective recomputation � no longer enough to compute delta � must recompute some aggregation groups minimize work to be done � choose which groups to recompute � optimize query plan VLDB, Aug 2002 Themis Palpanas - U of Toronto 8

  9. Our Solution (cont’d) Apply combine phase old and Propagate new values recompute phase affected groups AST delta AST definition base tables insert/update/delete 9

  10. Our Solution (cont’d) the 5 steps compute new aggregate values 1. change column derivation 2. recompute only affected groups 3. eliminate unnecessary operations 4. optimize for special cases 5. VLDB, Aug 2002 Themis Palpanas - U of Toronto 10

  11. Initial Query Plan Query Graph Model (QGM) UDI LOJ AST prop VLDB, Aug 2002 Themis Palpanas - U of Toronto 11

  12. 1. Compute New Aggregate Values compute delta for distributive functions UDI recompute non- distributive functions LOJ get those values only for affected groups AST LOJ duplicate computation for prop AST distributive functions! VLDB, Aug 2002 Themis Palpanas - U of Toronto 12

  13. 2. Change Column Derivation change column UDI derivation rewrite phase projects LOJ out unused columns AST LOJ non-distributive entire AST gets only recomputed! prop AST distributive only VLDB, Aug 2002 Themis Palpanas - U of Toronto 13

  14. 3. Recompute Affected Groups push join predicate UDI down in AST non-distributive LOJ � only affected groups only are recomputed LOJ AST distributive only special rules for AST* super-aggregates J J prop … T1 Tk VLDB, Aug 2002 Themis Palpanas - U of Toronto 14

  15. 3. Recompute Affected Groups special treatment for ASTs with super-aggregates � predicates not pushdownable � caution not to compute totals of totals build special join predicate � ensure correct aggregations change rewrite rules � allow predicate pushdown through super aggregates � applicable only for special join predicate VLDB, Aug 2002 Themis Palpanas - U of Toronto 15

  16. 4. Remove Unnecessary Operations outerjoin not always needed UDI when changes are only inserts LOJ all columns � reroute columns from propagate AST phase through AST distributive � remove outerjoin AST only operator same for updates J J prop not referencing AST … grouping columns T1 Tk and predicates VLDB, Aug 2002 Themis Palpanas - U of Toronto 16

  17. 5. Optimize for Special Cases recomputation step not needed when � only insertions and only MIN/MAX functions � build predicate in apply phase � check if new min/max should replace old values � only deletions referring only to grouping columns of AST � can only cause entire groups to be deleted � handled in apply phase VLDB, Aug 2002 Themis Palpanas - U of Toronto 17

  18. Experimental Evaluation prototype implementation in IBM DB2 UDB star schema database sales of products over 5 year time period � fact table: 10 million tuples � AST with non-distributive aggregate function 240,000 tuples � workload simulates nightly updates add/delete data for first day of month 1. add/delete data for second day of month 2. add/delete data for full month 3. VLDB, Aug 2002 Themis Palpanas - U of Toronto 18

  19. Experimental Evaluation (cont’d) workload 1 workload 2 workload 3 incremental 286 294 420 full refresh 699 702 692 deletions require 40-60% of full refresh time workload 1 workload 2 workload 3 incremental 3 n/a 31 full refresh 699 702 692 optimized deletions require 1-4% of full refresh time VLDB, Aug 2002 Themis Palpanas - U of Toronto 19

  20. Experimental Evaluation (cont’d) workload 1 workload 2 workload 3 incremental 151 158 180 full refresh 702 702 721 insertions/updates require 20-25% of full refresh time VLDB, Aug 2002 Themis Palpanas - U of Toronto 20

  21. Related Work incremental view maintenance � differential refresh algorithms � Lindsay et al. 1986, Blakeley et al. 1986, Qian and Wiederhold 1991, Ceri and Widom 1991 � deferred incremental maintenance � Colby et al. 1996, Salem et al. 2000 � views with aggregation � Quass 1996, Mumick et al. 1997 VLDB, Aug 2002 Themis Palpanas - U of Toronto 21

  22. Conclusions incremental maintenance for ASTs with non-distributive aggregate functions � support MIN/MAX, STDDEV, CORRELATION, REGRESSION, XMLAGG, … efficient selective recomputation � recompute only affected groups � optimize query plan � customize for special cases significant performance improvements VLDB, Aug 2002 Themis Palpanas - U of Toronto 22

Recommend


More recommend