deciding equivalences among aggregate queries w erner
play

Deciding Equivalences among Aggregate Queries W erner Nutt - PDF document

Deciding Equivalences among Aggregate Queries W erner Nutt German Resea rch Center fo r AI (DFKI) Saa rb r uck en, Germany Y ehoshua Sagiv, Sa ra Shurin The Heb rew Universit y Jerusalem, Israel Deciding


  1. Deciding Equivalences among Aggregate Queries W erner Nutt German Resea rch Center fo r AI (DFKI) Saa rb r� uck en, Germany Y ehoshua Sagiv, Sa ra Shurin The Heb rew Universit y Jerusalem, Israel � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 1

  2. Ackno wledgement � This w o rk o riginated within the ESPRIT Long T erm Resea rch Project "F oundations of Data W a rehouse Qualit y" (D W Q) � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 2

  3. Motivation In recent y ea r, increased interest in optimization of aggregate queries � data w a rehousing decision supp o rt � Aggregate queries a re costly � they touch many data items need fo r sp ecialized optimizati on techniques ; Idea: Use p revious results to answ er new queries � exploit redundancy! � create redundancy! T o do so, w e have to b e able to answ er the question: \What can b e computed from what?" (= the view usabilit y p roblem) � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 3

  4. Decision Supp o rt Queries Gr o wth Change in in Sales Sales as Sales Sales as vs. % of This % of Last Ca tegor y vs. Pr oduct Region Month Ca tegor y Month Last Month F ramis Cen tral 110 � 12% 31% 3% F ramis Eastern 179 -3% 28% -1% F ramis W estern 55 5% � 12% 1% 344 � 6% 33% 1% T otal F r amis Widget Cen tral 66 2% 18% 2% Widget Eastern 102 4% 12% 5% Widget W estern 39 -9% 9% -1% 207 1% 13% 4% T otal Widget Grand T otal 551 4% 20% 2% Example of a business rep ort. Exceptionally high v alues are mark ed with an asterisk ( � ). Exceptionally lo w v alues are sho wn as b old. (The example is tak en from R. Kim ball, The Data W arehouse T o olkit, Addison W esley) An SQL-query fo r the �rst column: Select p.product -na me as Product, m.region name as Region, sum(f.sal es) as Sales This Month From sales fact f, product p, market m, time t Where f.product key = p.product key, f.market key = m.market key, f.time key = t.time key, p.product name in ('Framis', 'Widget' ), t.month = 'May', t.year = 1996 Groupby p.product name, m.region name � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 4

  5. Aggregate Queries: Abstract Notation SQL notation: Select p.A, s.B, max(r.C), sum(s.D), count(*) From p, r, s Where p.Z = r.Z, p.A = s.A, r.W = 'Joe', s.B < 10 Groupby p.A, s.B Abstract notation: ( A; max ( C ) ; sum ( D ) ; coun t ) q B ; s ( A; ) & p ( A; ) & ( Z ) & B ; D Z r ; C ; W 0 0 = & � W Joe B 10 In general: ( x ( y ) ; ( y )) & q ; : : : ; x ; � : : : ; � R C 1 m 1 1 n n Sho rt: ( � � ( � )) & , q x ; � y R C with conjunction of relational atoms R conjunction of compa risons C � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 5

  6. The View Usabilit y Problem Given views ( � � ( � )) & v x ; � y R C i i i i i i and a query ( � � ( � )) & q x ; � y R C ; is there a query � ~ ~ ~ ( � ( � )) & q x; � y R C ; such that � consists of instantiations of the R v i � and ~ a re equivalent q q (i.e., and ~ p ro duce the same results q q over all databases ) W e need a syntactic cha racterization of ; equivalence! � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 6

  7. Dimensions of the Problem ( � � ( � ) ) & q x ; � y R C Which aggregate functions? � { min , max { coun t { sum { coun t distinct { . . . � Queries { without compa risons { with compa risons over the: rationals , � � integers � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 7

  8. Previous W o rk � Equivalence of conjunctive queries Chandra/Merlin 1977, Klug 1988 View usabilit y fo r conjunctive queries � Levy et al. 1995 � Containment and equivalence of conjunc- tive queries under bag-semantics Chaudhuri/V a rdi 1993 � Equivalence p reserving transfo rmations of aggregate queries Levy/Mumick 1994, Gupta et al. 1995 � View usabilit y fo r aggregate queries (su�- cient criteria) Srivastava et al. 1996 � View usabilit y fo r data cub es Ha rina ra y an et al. 1996, Gupta et al. 1997 (Almost) no complete cha racterizations fo r ; aggregate queries! � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 8

  9. Decouple Aggregations Observation: In ( Pro ( Sales ) ; ( Pro�t )) & , q d ; max sum R C the aggregates ( Sales ), ( Pro�t ), max sum a re functionally dep endent on Pro d . De�nition: ( � ( y )) & q x ; � R C j j is the -th k ernel of j ( � ( y ) ; ( y )) & . q x ; � : : : ; � R C n n 1 1 � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 9

  10. Divide and Conquer Theo rem: ( � ( y ) ; ( y )) q x ; � : : : ; � 1 1 n n 0 ( � ( y ) ; ( y )) q x ; � : : : ; � 1 1 n n a re equivalent if and only if their k ernels ( � ( y )) q x; � j j j 0 ( � ( y )) q x; � j j j a re pairwise equivalent fo r all 2 1 ::n . j it su�ces to solve the equivalence p roblem ; fo r queries with a single aggregate term ( � � ( y )) & q x ; R C ( simple aggregate queries). � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 10

  11. Aggregate Queries and Conjunctive Queries The co re of ( � � ( y )) & q x ; R C is the conjunctive query � ( � ) & q x ; y R C : Examples: � The co re of ( � ( y )) & q x ; sum R C is � ( � ) & q x; y R C : The co re of � ( � coun t ) & q x; R C is � ( � ) & q x R C : Strategy: Reduce equivalence of simple aggre- gate queries to p rop erties of their co res. � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 11

  12. Reminder on Conjunctive Queries � Conjunctive queries have the fo rm ( � ) & q x R C relational atom conjunction conjunction with va riables of relational of compa risons atoms D � := the result of over database D q q 0 0 � and a re equivalent (written � ) i� q q q q D 0D = fo r all db's D q q 0 0 � is contained in (written � ) i� q q q q D 0D � fo r all db's D q q Ho w can w e check containment? ; � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 12

  13. Query Homomo rphisms An homomo r phism from 0 0 0 ( � ) & q x R C to ( � ) & q x R C is a substitution such that � � = � � � x x 0 � � � R R 0 � j = . C � C Theo rem (Chandra/Merlin 77): F o r relational conjunctive queries: 0 � , there is an homomo rphism q q 0 from to q q Finding an homomo rphism is NP-complete! � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 13

  14. Containment and Compa risons Classical example: 0 p ( u; ) & � q v u v p ( y ) & p ( z ) q ; z ; y 0 W e have , � q q 0 but no homomo rphism from to . q q Idea: replace with its linea r expansion ( q ) ! q L L p ( y ) & p ( z ) & q ; z ; y y < z f y g <z p ( y ) & p ( z ) & = q ; z ; y y z f y = z g p ( y ) & p ( z ) & q ; z ; y y > z f y g >z (case analysis) Theo rem (Klug 88): 0 fo r every in ( q ) , � , q q q L L L there is an homomo rphism 0 from to q q L P Containment with compa risons is � -complete. 2 (van der Meyden) � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 14

  15. Relational Max-Queries 0 0 ( � max ( y )) � ( � max ( y )) ? q x ; R q x; R Theo rem: F o r relational max-queries: 0 0 the co res � and � � , q q q q a re equivalent Relational queries deliver the same max only if they deliver the same values ! � Deciding Equivalences among Aggregate Queries PODS June 1998 { Slide 15

Recommend


More recommend