Incremental Maintenance for Non-Distributive Aggregate Functions Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh work done at IBM Almaden Research Center
Motivation large amounts of data stored in databases � expensive OLAP queries, but with nice properties: � based on same set of tables � perform similar aggregations can efficiently support such queries with � Automatic Summary Tables (ASTs) � precomputed once, used many times � answer complex queries fast must maintain ASTs when base tables change � inserts, updates, deletes VLDB, Aug 2002 Themis Palpanas - U of Toronto 2
Motivation (cont’d) AST AST definition base tables insert/update/delete 3
Problem Statement given ASTs with aggregate functions � distributive � SUM, COUNT � non -distributive � STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG, … when base tables change � incrementally maintain affected ASTs efficient maintenance of ASTs with non-distributive aggregate functions VLDB, Aug 2002 Themis Palpanas - U of Toronto 4
Outline Current Approach Our Solution Experimental Evaluation Related Work Conclusions VLDB, Aug 2002 Themis Palpanas - U of Toronto 5
Current Approach Apply combine phase old and new values AST Propagate phase delta AST definition base tables insert/update/delete 6
Current Approach (cont’d) works for distributive � SUM, COUNT does not work for non -distributive � STDDEV, CORRELATION, REGRESSION � MIN/MAX � XMLAGG need new way to deal with these functions VLDB, Aug 2002 Themis Palpanas - U of Toronto 7
Our Solution selective recomputation � no longer enough to compute delta � must recompute some aggregation groups minimize work to be done � choose which groups to recompute � optimize query plan VLDB, Aug 2002 Themis Palpanas - U of Toronto 8
Our Solution (cont’d) Apply combine phase old and Propagate new values recompute phase affected groups AST delta AST definition base tables insert/update/delete 9
Our Solution (cont’d) the 5 steps compute new aggregate values 1. change column derivation 2. recompute only affected groups 3. eliminate unnecessary operations 4. optimize for special cases 5. VLDB, Aug 2002 Themis Palpanas - U of Toronto 10
Initial Query Plan Query Graph Model (QGM) UDI LOJ AST prop VLDB, Aug 2002 Themis Palpanas - U of Toronto 11
1. Compute New Aggregate Values compute delta for distributive functions UDI recompute non- distributive functions LOJ get those values only for affected groups AST LOJ duplicate computation for prop AST distributive functions! VLDB, Aug 2002 Themis Palpanas - U of Toronto 12
2. Change Column Derivation change column UDI derivation rewrite phase projects LOJ out unused columns AST LOJ non-distributive entire AST gets only recomputed! prop AST distributive only VLDB, Aug 2002 Themis Palpanas - U of Toronto 13
3. Recompute Affected Groups push join predicate UDI down in AST non-distributive LOJ � only affected groups only are recomputed LOJ AST distributive only special rules for AST* super-aggregates J J prop … T1 Tk VLDB, Aug 2002 Themis Palpanas - U of Toronto 14
3. Recompute Affected Groups special treatment for ASTs with super-aggregates � predicates not pushdownable � caution not to compute totals of totals build special join predicate � ensure correct aggregations change rewrite rules � allow predicate pushdown through super aggregates � applicable only for special join predicate VLDB, Aug 2002 Themis Palpanas - U of Toronto 15
4. Remove Unnecessary Operations outerjoin not always needed UDI when changes are only inserts LOJ all columns � reroute columns from propagate AST phase through AST distributive � remove outerjoin AST only operator same for updates J J prop not referencing AST … grouping columns T1 Tk and predicates VLDB, Aug 2002 Themis Palpanas - U of Toronto 16
5. Optimize for Special Cases recomputation step not needed when � only insertions and only MIN/MAX functions � build predicate in apply phase � check if new min/max should replace old values � only deletions referring only to grouping columns of AST � can only cause entire groups to be deleted � handled in apply phase VLDB, Aug 2002 Themis Palpanas - U of Toronto 17
Experimental Evaluation prototype implementation in IBM DB2 UDB star schema database sales of products over 5 year time period � fact table: 10 million tuples � AST with non-distributive aggregate function 240,000 tuples � workload simulates nightly updates add/delete data for first day of month 1. add/delete data for second day of month 2. add/delete data for full month 3. VLDB, Aug 2002 Themis Palpanas - U of Toronto 18
Experimental Evaluation (cont’d) workload 1 workload 2 workload 3 incremental 286 294 420 full refresh 699 702 692 deletions require 40-60% of full refresh time workload 1 workload 2 workload 3 incremental 3 n/a 31 full refresh 699 702 692 optimized deletions require 1-4% of full refresh time VLDB, Aug 2002 Themis Palpanas - U of Toronto 19
Experimental Evaluation (cont’d) workload 1 workload 2 workload 3 incremental 151 158 180 full refresh 702 702 721 insertions/updates require 20-25% of full refresh time VLDB, Aug 2002 Themis Palpanas - U of Toronto 20
Related Work incremental view maintenance � differential refresh algorithms � Lindsay et al. 1986, Blakeley et al. 1986, Qian and Wiederhold 1991, Ceri and Widom 1991 � deferred incremental maintenance � Colby et al. 1996, Salem et al. 2000 � views with aggregation � Quass 1996, Mumick et al. 1997 VLDB, Aug 2002 Themis Palpanas - U of Toronto 21
Conclusions incremental maintenance for ASTs with non-distributive aggregate functions � support MIN/MAX, STDDEV, CORRELATION, REGRESSION, XMLAGG, … efficient selective recomputation � recompute only affected groups � optimize query plan � customize for special cases significant performance improvements VLDB, Aug 2002 Themis Palpanas - U of Toronto 22
Recommend
More recommend