Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska Dan Suciu University of Washington [walter,magda,suciu]@cs.washington.edu July 19th, 2019 Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 1 / 27
Systematic Underestimation Query optimizers assume: ◮ Uniformity ◮ Independence Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 2 / 27
Background: Cardinality Bounds Background: Cardinality Bounds 1 Tightened Cardinality Bounds 2 Evaluation 3 Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 3 / 27
Background: Cardinality Bounds Background: Cardinality Bounds 1 Tightened Cardinality Bounds 2 Evaluation 3 Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 3 / 27
Background: Cardinality Bounds Example Query (SQL) SELECT * FROM pseudonym , cast, movie_companies , company_name WHERE pseudonym.person id = cast.person id AND cast.movie id id = movie_companies.movie id AND movie_companies.company id = company_name.id; Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 4 / 27
Background: Cardinality Bounds Example Query (Join Graph & Datalog) pseudonym Q ( x , y , z , w ) � pseudo ( x , y ) , cast ( y , z ) , cast mc ( z , w ) , cn ( w ) movie companies company name y �→ person z �→ movie Figure: Join Graph. w �→ company Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 5 / 27
Background: Cardinality Bounds Review: Entropy Take random variable X : � h ( X ) = − P ( X = a ) · log( P ( X = a )) a Multiple variables: � h ( X , Y ) = − P ( X = a , Y = b ) · log( P ( X = a , Y = b )) a , b Conditional entropy: � P ( X = a , Y = b ) � � h ( X | Y ) = − P ( X = a , Y = b ) · log P ( Y = b ) a , b Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 6 / 27
Background: Cardinality Bounds Review: Entropy X ∼ P ( x 1 , . . . , x n ) ◮ Fact: h ( X ) ≤ log( n ) ◮ h ( X ) = log( n ) iff P is uniform Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 7 / 27
Background: Cardinality Bounds Connection to Entropy Q ( x , y , z , w ) � pseudo ( x , y ) , cast ( y , z ) , mc ( z , w ) , cn ( w ) ◮ Create random variable for each attribute. x → X , y → Y , z → Z , w → W ◮ Let ( X , Y , Z , W ) be uniformly distributed over true output of Q . � � � � h ( X , Y , Z , W ) = log � Q ( x , y , z , w ) � � � � � � � � � exp h ( X , Y , Z , W ) = � Q ( x , y , z , w ) � � � Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 8 / 27
Background: Cardinality Bounds Entropic Bound � � � � � � � Q ( x , y , z , w ) � = exp h ( X , Y , Z , W ) � � ◮ Suffices to bound h ( X , Y , Z , W ) . Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 9 / 27
Background: Cardinality Bounds Entropic Bound h ( X , Y , Z , W ) ≤ h ( X | Y ) + h ( Y , Z ) + h ( W | Z ) Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 10 / 27
Background: Cardinality Bounds Entropic Bound � � � � � Q ( x , y , z , w ) � = exp( h ( X , Y , Z , W )) ≤ exp( h ( X | Y ) + h ( Y , Z ) + h ( W | Z )) � � h ( Y , Z ) ≤ log( count ( cast )) h ( X | Y ) ≤ log( max degree ( pseudonym )) h ( W | Z ) ≤ log( max degree ( movie companies )) Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 27
Background: Cardinality Bounds Entropic Bound � � � � � Q ( x , y , z , w ) � = exp( h ( X , Y , Z , W )) ≤ exp( h ( X | Y ) + h ( Y , Z ) + h ( W | Z )) � � h ( Y , Z ) ≤ log c cast h ( X | Y ) ≤ log d y pseudo h ( W | Z ) ≤ log d z mc Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 27
Background: Cardinality Bounds Cardinality Bound � � � � � Q ( x , y , z , w ) � = exp( h ( X , Y , Z , W )) � � ≤ exp( h ( X | Y ) + h ( Y , Z ) + h ( W | Z ) ) � �� �� �� � � ��� �� ��� � � ��� �� ��� � ≤ log d y ≤ log c cast ≤ log d z mc pseudo ≤ d y pseudo · c cast · d z mc Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 12 / 27
Background: Cardinality Bounds Many Entropic Bounds h ( X , Y , Z , W ) ≤ ... h ( X , Y ) + h ( Z | Y ) + h ( W | Z ) h ( X , Y ) + h ( Z | Y ) + h ( W ) h ( X , Y ) + h ( Z , W ) h ( X , Y ) + h ( Z | W ) + h ( W ) h ( X | Y ) + h ( Y , Z ) + h ( W | Z ) h ( X | Y ) + h ( Y | Z ) + h ( Z , W ) h ( X | Y ) + h ( Y | Z ) + h ( Z | W ) + h ( Z ) Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 13 / 27
Background: Cardinality Bounds Entropic Bounds Q ( x , y , z , w ) � pseudo ( x , y ) , cast ( y , z ) , mc ( z , w ) , cn ( w ) c pseudo · d y cast · d z mc c pseudo · d y cast · c cn c pseudo · c mc c pseudo · d w mc · c cn � � � � � Q ( x , y , z , w ) � ≤ min � � d y pseudo · c cast · d z mc d y pseudo · c cast · c cn d y pseudo · d z cast · c mc d y pseudo · d z cast · d w mc · c cn Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 14 / 27
Background: Cardinality Bounds Entropic Bounds Neat! But is it useful? ◮ Short answer: ‘No’. (Not yet, anyway) ◮ Bounds still too loose (overestimation) ◮ Need to tighten ◮ How to tighten? Partitioning Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 15 / 27
Tightened Cardinality Bounds Background: Cardinality Bounds 1 Tightened Cardinality Bounds 2 Evaluation 3 Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 16 / 27
Tightened Cardinality Bounds Q ( x , y , z , w ) � pseudo ( x , y ) , cast ( y , z ) , mc ( z , w ) , cn ( w ) company name pseudonym movie companies cast pseudo cast mc cn Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27
Tightened Cardinality Bounds Q ( x , y , z , w ) � pseudo ( x , y ) , cast ( y , z ) , mc ( z , w ) , cn ( w ) company name pseudonym movie companies cast pseudo [0,0] pseudo [1,0] cast [0,0] cast [1,0] mc [0,0] mc [1,0] cn [0] pseudo [0,1] pseudo [1,1] cast [0,1] cast [1,1] mc [0,1] mc [1,1] cn [1] ◮ Value based hashing ◮ Analagous to hash join Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27
Tightened Cardinality Bounds Q ( x , y , z , w ) � pseudo ( x , y ) , cast ( y , z ) , mc ( z , w ) , cn ( w ) company name pseudonym movie companies cast cast [1,0] pseudo [0,1] mc [0,1] cn [1] ◮ Value based hashing ◮ Analagous to hash join Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27
Tightened Cardinality Bounds Q ( x , y , z , w ) � pseudo ( x , y ) , cast ( y , z ) , mc ( z , w ) , cn ( w ) company name pseudonym movie companies cast pseudo [0,0] pseudo [1,0] cast [0,0] cast [1,0] mc [0,0] mc [1,0] cn [0] pseudo [0,1] pseudo [1,1] cast [0,1] cast [1,1] mc [0,1] mc [1,1] cn [1] ◮ Q ( D ) : query evaluated on database D � Q ( D ) = Q ( D [ J ]) ◮ Q ( D [ J ]) : query evaluated on parition D [ J ] J Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27
Tightened Cardinality Bounds Q ( x , y , z , w ) � pseudo ( x , y ) , cast ( y , z ) , mc ( z , w ) , cn ( w ) company name pseudonym movie companies cast pseudo [0,0] pseudo [1,0] cast [0,0] cast [1,0] mc [0,0] mc [1,0] cn [0] pseudo [0,1] pseudo [1,1] cast [0,1] cast [1,1] mc [0,1] mc [1,1] cn [1] � Q ( D ) = Q ( D [ J ]) ◮ Bound each partition D [ J ] J � � � ◮ Sum will be bound on full database � � � Q ( D ) bound ( Q ( D [ J ])) � ≤ � � J Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27
Tightened Cardinality Bounds Partition Bounding c pseudo [ J ] · d y cast [ J ] · d z mc [ J ] c pseudo [ J ] · d y cast [ J ] · c cn [ J ] c pseudo [ J ] · c mc [ J ] c pseudo [ J ] · d w mc [ J ] · c cn [ J ] � � � � � � Q ( D ) min � ≤ � � d y pseudo [ J ] · c cast [ J ] · d z mc [ J ] J ∈{ 0 , 1 } 4 d y pseudo [ J ] · c cast [ J ] · c cn [ J ] d y pseudo [ J ] · d z cast [ J ] · c mc [ J ] d y pseudo [ J ] · d z cast [ J ] · d w mc [ J ] · c cn [ J ] Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 18 / 27
Tightened Cardinality Bounds Optimizations ◮ Bound Formula Generation ◮ Partition Budgeting ◮ Combats exponential runtime w.r.t. hash size ◮ Non-monotonic behaviour ◮ Filter Predicates Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 19 / 27
Recommend
More recommend