cs411
play

CS411 balanced search tree Database Systems hash table R etc - PDF document

One-pass Algorithms Duplicate elimination (R) Need to keep a dictionary in memory: CS411 balanced search tree Database Systems hash table R etc Cost: B(R) 12: Query Optimization Scan Assumption: B( (R)) <=


  1. One-pass Algorithms Duplicate elimination δ (R) • Need to keep a dictionary in memory: CS411 – balanced search tree Database Systems – hash table R – etc • Cost: B(R) 12: Query Optimization Scan • Assumption: B( δ (R)) <= M before? Input buffer Kazuhiro Minami Output M-1 buffers buffer One-pass Algorithms Optimization • Step 1: convert the SQL query to some logical Grouping: γ city, sum(price) (R) plan – Remove subqueries from conditions • Need to keep a dictionary in memory – Map the SFW statement into RA expression • Also store the sum(price) for each city • Step 2: find a better logical plan, find an • Cost: B(R) associated physical plan • Assumption: number of cities fits in memory – Algebraic laws: • foundation for every optimization – Two approaches to optimizations: • Heuristics: apply laws that seem to result in cheaper plans • Cost based: estimate size and cost of intermediate results, search systematically for best plan

  2. Converting from SQL to Logical Plans Select a 1 , …, a n Select a 1 , …, a n From R 1 , …, R k Π a1,…,an ( σ C (R 1 × R 2 × … × R k )) From R 1 , …, R k SQL –> Logical Query Plans Where C Where C Select a 1 , …, a n Select a 1 , …, a n From R 1 , …, R k From R 1 , …, R k Where C Where C Group by b 1 , …, b l Group by b 1 , …, b l Π a1,…,an ( γ b1, …, bm, aggs ( σ C (R 1 × R 2 × … × R k ))) Some nested queries can be flattened Converting Nested Queries Q: Give a list of product-manufacture pairs where the color of the product is blue and its prices is the Select distinct product.name highest among the products with blue color from Select distinct product.name From product that manufacture. From product Where product.maker in (Select company.name Where product.maker in (Select company.name Select distinct x.name, x.maker Select distinct x.name, x.maker From company From company From product x From product x where company.city=“Urbana”) where company.city=“Urbana”) Where x.color= “blue” Where x.color= “blue” AND x.price >= ALL (Select y.price AND x.price >= ALL (Select y.price Select distinct product.name Select distinct product.name From product y From product y From product, company From product, company Where x.maker = y.maker Where x.maker = y.maker Where product.maker = company.name AND Where product.maker = company.name AND AND y.color=“blue”) company.city=“Urbana” AND y.color=“blue”) company.city=“Urbana” Q: How do we convert this one to logical plan ?

  3. Converting Nested Queries Converting Nested Queries Let’s compute the complement first: This one becomes a query without subqueries: Select distinct x.name, x.maker Select distinct x.name, x.maker From product x From product x Select distinct x.name, x.maker Where x.color= “blue” Select distinct x.name, x.maker Where x.color= “blue” AND x.price < SOME (Select y.price AND x.price < SOME (Select y.price From product x, product y From product x, product y From product y From product y Where x.color= “blue” AND x.maker = y.maker Where x.color= “blue” AND x.maker = y.maker Where x.maker = y.maker Where x.maker = y.maker AND y.color=“blue” AND x.price < y.price AND y.color=“blue”) AND y.color=“blue” AND x.price < y.price AND y.color=“blue”) This returns exactly the products we DON’T want, so… A set difference operator finishes the job Now rewrite the logical plan to an equivalent but better one Will probably run faster (Select x.name, x.maker Same query (Select x.name, x.maker answer From product x From product x Optimizer uses algebraic Heuristic : Where x.color = “blue”) Where x.color = “blue”) laws likely to result in cheaper plans EXCEPT EXCEPT Equivalen (Select x.name, x.maker (Select x.name, x.maker Original t From product x, product y From product x, product y Cost-based : Where x.color= “blue” AND x.maker = y.maker Where x.color= “blue” AND x.maker = y.maker estimate size and cost AND y.color=“blue” AND x.price < y.price) AND y.color=“blue” AND x.price < y.price) of intermediate results, search systematically for best plan

  4. Algebraic Laws • Commutative and Associative Laws – R ∪ S = S ∪ R, R ∪ (S ∪ T) = (R ∪ S) ∪ T – R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T Algebraic Laws – R ⋈ S = S ⋈ R, R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T • Distributive Laws – R ⋈ (S ∪ T) = (R ⋈ S) ∪ (R ⋈ T) R( A,B,C,D ) S( E,F,G ) Algebraic laws about selections σ F=3 (R ⋈ D=E S) = (R ⋈ D=E σ F=3 (S)) σ C AND D (R) = σ C ( σ D (R)) = σ C (R) ∩ σ D (R) σ C OR D (R) = σ C (R) ∪ σ D (R) σ C (R ∪ S) = σ C (R) ∪ σ C (S) σ A=5 AND G=9 (R ⋈ D=E S) = σ A=5 ( σ G=9 (R ⋈ D=E S)) = σ A=5 (R ⋈ D=E σ G=9 (S)) = σ A=5 (R) ⋈ D=E σ G=9 (S) o o l l v v e e s s C C i i n n v v i i f f σ C (R ⋈ S) = σ C (R) ⋈ S t t e e s s t t r r i i b b u u l l y y a a t t o o n n σ C (R – S) = σ C (R) – S o o f f R R σ C (R ∩ S) = σ C (R) ∩ S

  5. Algebraic laws for grouping and Algebraic laws for projection aggregation Π M (R ⋈ S) = Π N ( Π P (R) ⋈ Π Q (S)) SUM δ ( γ A, agg(B) (R)) = γ A, agg(B) (R) where N, P, Q are appropriate subsets of attributes of M COUNT γ A, agg(B) ( δ (R)) = γ A, agg(B) (R) , AVG Π M ( Π N (R)) = Π M,N (R) MIN if agg is duplicate insensitive MAX R(A,B,C,D) S(E,F,G) The book describes additional algebraic laws, but even the book doesn’t cover them all. Π A,B,G (R ⋈ D=E S) = Π ? ( Π ? (R) ⋈ D=E Π ? (S)) Heuristic Based Optimizations • Query rewriting based on algebraic laws • Result in better queries most of the time Heuristics-based Optimization • Heuristics number 1: - or – – Push selections down Do projections and selections as • Heuristics number 2: early as possible – Sometimes push selections up, then down

  6. For each company with a product costing Predicate Pushdown more than $100, find the max price of its products pname Select y.name, y.address, Select y.name, y.address, Select y.name, y.address, Select y.name, y.address, pname Max(x.price) Max(x.price) Max(x.price) Max(x.price) σ price>100 AND city=“Urbana” From product x, company y From product x, company y From product x, company y From product x, company y Where x.maker=y.name and Where x.maker = y.name Where x.maker=y.name and Where x.maker = y.name maker=name x.price > 100 GroupBy y.name x.price > 100 GroupBy y.name GroupBy y.name Having Max(x.price) > 100 maker=name GroupBy y.name Having Max(x.price) > 100 city=“Urbana” price>100 Having Max(x.price) > 100 Having Max(x.price) > 100 Company Product Company Product •Advantage: the size of the join will be smaller. • Requires transformation rules specific to the grouping/aggregation (but may cause us to lose an important ordering operators. of the tuples, if we use indexes). • Won’t work if we replace Max by Min. Query Rewrite: Pushing predicates up Pushing predicates up Select V2.category, V2.cname, V2.price Select V2.category, V2.cname, V2.price From V1, V2 Select V2.cname, V2.price From V1, V2 Select V2.cname, V2.price From V1, V2 Where V1.category = V2.category and From V1, V2 Where V1.category = V2.category and V1.p = V2.price Where V1.category = V2.category and V1.p = V2.price Where V1.category = V2.category and V1.p = V2.price AND V1.p < 20 V1.p = V2.price AND V1.p < 20 V1: categories and the cheapest V2: Company name, product price of the product category, and the price of the in that category under $20 product made by that company Create View V1 AS Create View V1 AS Create View V1 AS Create View V1 AS Select x.category, Select x.category, Create View V2 AS Create View V2 AS Select x.category, Create View V2 AS Select x.category, Create View V2 AS Min(x.price) AS p Select y.cname, x.category, x.price Min(x.price) AS p Select y.cname, x.category, x.price Min(x.price) AS p Min(x.price) AS p Select y.cname, x.category, x.price Select y.cname, x.category, x.price From product x From product x, company y From product x From product x, company y From product x From product x, company y From product x From product x, company y Where x.price < 20 Where x.price < 20 Where x.maker=y.cname Where x.maker=y.cname Where x.price < 20 Where x.price < 20 Where x.maker=y.cname Where x.maker=y.cname GroupBy x.category GroupBy x.category GroupBy x.category GroupBy x.category

Recommend


More recommend