DS 1300 - Introductjon to SQL Part 3 – Aggregatjon & other Topics by Michael Hahsler Based on slides for CS145 Introductjon to Databases (Stanford)
Lecture Overview 1. Aggregatjon & GROUP BY 2. Advanced SQL-izing (set operatjons, NULL, Outer Joins, etc.) 2
AGGREGATION, GROUP BY AND HAVING CLAUSE 3
Aggregatjon SELECT COUNT(*) SELECT AVG(price) SELECT COUNT(*) SELECT AVG(price) FROM Product FROM Product FROM Product FROM Product WHERE year > 1995 WHERE year > 1995 WHERE maker = ‘T WHERE maker = ‘T oyota’ oyota’ • SQL supports several aggregatjon operatjons: • SUM, COUNT, MIN, MAX, AVG Except for COUNT, all Except for COUNT, all aggregatjons apply to aggregatjons apply to a single aturibute! a single aturibute! 4
Aggregatjon: COUNT COUNT counts the number of tuples including duplicates. SELECT COUNT(category) SELECT COUNT(category) Note: Same as Note: Same as FROM Product FROM Product COUNT(*)! COUNT(*)! WHERE year > 1995 WHERE year > 1995 We probably want count the number of “difgerent” categories: SELECT COUNT(DISTINCT category) SELECT COUNT(DISTINCT category) FROM Product FROM Product WHERE year > 1995 WHERE year > 1995 5
More Examples Purchase(product, date, price, quantity) Purchase(product, date, price, quantity) SELECT SUM(price * quantity) SELECT SUM(price * quantity) FROM Purchase FROM Purchase What do these mean? SELECT SUM(price * quantity) SELECT SUM(price * quantity) FROM Purchase FROM Purchase WHERE product = ‘bagel’ WHERE product = ‘bagel’ 6
Simple Aggregatjons Purchase Product Date Price Quantity bagel 10/21 1 20 banana 10/3 0.5 10 banana 10/10 1 10 bagel 10/25 1.50 20 SELECT SUM(price * quantity) SELECT SUM(price * quantity) 50 (= 1 * 20 + 1.50 * 20) FROM Purchase FROM Purchase WHERE product = ‘bagel’ WHERE product = ‘bagel’ 7
Grouping and Aggregatjon Find total Purchase(product, date, price, quantity) Purchase(product, date, price, quantity) sales afuer Oct 1, 2010, SELECT product, SELECT product, per product. SUM(price * quantity) AS T otalSales SUM(price * quantity) AS T otalSales FROM Purchase FROM Purchase Note: Be very careful WHERE date > ‘2000-10-01’ WHERE date > ‘2000-10-01’ with dates! GROUP BY product GROUP BY product Use date/tjme related Let’s see what this means… functjons! 8
Grouping and Aggregatjon Semantjcs of the query: 1. Compute the FROM and WHERE clauses 2. Group by the atuributes in the GROUP BY 3. Compute the SELECT clause: grouped atuributes and aggregates 9
1. Compute the FROM and WHERE clauses SELECT product, SUM(price*quantity) AS SELECT product, SUM(price*quantity) AS T T otalSales otalSales FROM Purchase FROM Purchase WHERE date > ‘2000-10-01’ WHERE date > ‘2000-10-01’ GROUP BY product GROUP BY product Product Date Price Quantity FROM Bagel 2000-10-21 1 20 Bagel 2000-10-25 1.50 20 Banana 2000-10-03 0.5 10 Banana 2000-10-10 1 10 10
2. Group by the atuributes in the GROUP BY SELECT product, SUM(price*quantity) AS SELECT product, SUM(price*quantity) AS T otalSales T otalSales FROM Purchase FROM Purchase WHERE date > ‘2000-10-01’ WHERE date > ‘2000-10-01’ GROUP BY product GROUP BY product Product Date Price Quantity GROUP BY Product Date Price Quantity Bagel 2000-10-21 1 20 2000-10-21 1 20 Bagel 2000-10-25 1.50 20 Bagel 2000-10-25 1.50 20 Banana 2000-10-03 0.5 10 2000-10-03 0.5 10 Banana Banana 2000-10-10 1 10 2000-10-10 1 1 11
3. Compute the SELECT clause: grouped atuributes and aggregates SELECT product, SUM(price*quantity) AS T SELECT product, SUM(price*quantity) AS T otalSales otalSales FROM Purchase FROM Purchase WHERE date > ‘2000-10-01’ WHERE date > ‘2000-10-01’ GROUP BY product GROUP BY product Product Date Price Quantity Product TotalSales SELECT 2000-10-21 1 20 Bagel 50 Bagel 2000-10-25 1.50 20 Banana 15 2000-10-03 0.5 10 Banana 2000-10-10 1 10 12
Actjvity Company(Cname, country) Company(Cname, country) Product(PName, price, category, manufacturer) Product(PName, price, category, manufacturer) Purchase(id, product, buyer) Purchase(id, product, buyer) 1) What do the next two queries calculate? SELECT SUM(price) AS total, SUM(price) *1.08 AS totalPlusTax FROM Product pr JOIN Purchase p ON pr.PName = p.product WHERE p.buyer = 'Joe Blow' SELECT p.buyer, SUM(price) AS total, SUM(price) *1.08 AS totalPlusTax FROM Product pr JOIN Purchase p ON pr.PName = p.product GROUP BY p.buyer ORDER BY 1 13 2) Write a query to fjnd the price of the most expensive product in each category.
HAVING Clause Purchase(product, date, price, quantity) Purchase(product, date, price, quantity) SELECT product, SUM(price*quantity) SELECT product, SUM(price*quantity) Same query as before, FROM Purchase FROM Purchase except that we consider WHERE date > ‘2005-10-01’ WHERE date > ‘2005-10-01’ only products that have GROUP BY product GROUP BY product more than 100 buyers HAVING SUM(quantity) > 100 HAVING SUM(quantity) > 100 HAVING clauses contains conditjons on aggregates HAVING clauses contains conditjons on aggregates Whereas WHERE clauses conditjon on individual tuples… Whereas WHERE clauses conditjon on individual tuples… 14
General form of Grouping and Aggregatjon SELECT S SELECT S FROM R 1 ,…,R n FROM R 1 ,…,R n WHERE C 1 WHERE C 1 GROUP BY a 1 ,…,a k GROUP BY a 1 ,…,a k HAVING C 2 HAVING C 2 • S = Can ONLY contain atuributes a 1 ,…,a k and/or aggregates Why? Why? over other atuributes • C 1 = is any conditjon on the atuributes in R 1 ,…,R n • C 2 = is any conditjon on the aggregate expressions 15
General form of Grouping and Aggregatjon SELECT S SELECT S 3 FROM R 1 ,…,R n FROM R 1 ,…,R n 1 WHERE C 1 WHERE C 1 2 GROUP BY a 1 ,…,a k GROUP BY a 1 ,…,a k 4 HAVING C 2 HAVING C 2 Evaluatjon steps: 1. Evaluate FROM-WHERE: apply conditjon C 1 on the atuributes in R 1 ,…,R n 2. GROUP BY the atuributes a 1 ,…,a k 3. Compute aggregates in S and do projectjon (SELECT) 4. Apply conditjon C 2 to each group (may have aggregates) 16
Actjvity Company(Cname, country) Company(Cname, country) Product(PName, price, manufacturer) Product(PName, price, manufacturer) Purchase(id, product, buyer) Purchase(id, product, buyer) 1) What does this query do? SELECT p.buyer, SUM(price) AS total, COUNT(*) AS purchases FROM Product pr JOIN Purchase p ON pr.PName = p.product GROUP BY p.buyer HAVING purchases >2 ORDER BY 1 2) What products in the DB have a revenue of more then $10,000? 17
OTHER SQL TOPICS: SUBQUERIES, NULLS, CASTING, OUTER JOINS AND ADDING DATA 29
Subqueries SELECT * SELECT * FROM ( SELECT product, COUNT(product) AS count FROM ( SELECT product, COUNT(product) AS count FROM Purchase GROUP BY product ) FROM Purchase GROUP BY product ) WHERE count > 2 WHERE count > 2 SELECT *, (SELECT count(*) FROM Product p1 SELECT *, (SELECT count(*) FROM Product p1 WHERE p1.category = p2.category) AS '# Prod. in Cat.' WHERE p1.category = p2.category) AS '# Prod. in Cat.' FROM Product p2 FROM Product p2 Subqueries can appear wherever a table or a value is needed. Subqueries can appear wherever a table or a value is needed. 30
NULL VALUES & OTHER DETAILS 43
NULL Values • Whenever we do not have a value, we can use NULL • Can mean many things: – Value does not exists – Value exists but is unknown (n/a, not available) – Value not applicable • The schema specifjes for each aturibute if it can be null ( nullable aturibute) or not with NOT NULL 44
NULL Values and Operators For numerical operatjons : – If x = NULL then 4*(3-x)/7 is also NULL For boolean operatjons, in SQL there are three values: SQLite does not have a SQLite does not have a FALSE = 0 boolean datatype. It uses boolean datatype. It uses TRUE = 1 Integer instead! Integer instead! UNKNOWN Try: Try: • • SELECT 2>1 SELECT 2>1 • • SELECT 2>NULL SELECT 2>NULL If x= NULL then x=‘Joe’ is UNKNOWN • • SELECT 1+NULL SELECT 1+NULL Note: comparison in SQL is a single ‘=‘ 45
Null Values in the WHERE Clause SELECT * SELECT * FROM Person FROM Person WHERE (age < 25) WHERE (age < 25) AND (height > 6 AND weight > 190) AND (height > 6 AND weight > 190) Will not return age=20, height=NULL, weight=200 Will not return age=20, height=NULL, weight=200 Since NULL > 6 is UNKNOWN! Since NULL > 6 is UNKNOWN! 46
NULL Values in WHERE Clauses Unexpected behavior: SELECT * SELECT * FROM Person FROM Person WHERE age < 25 OR age >= 25 WHERE age < 25 OR age >= 25 Should return all persons, but Should return all persons, but persons with NULL as age are not included! persons with NULL as age are not included! You can use CASE with IS NULL, ISNULL(), IFNULL() or COALESCE() to handle NULL values. 47
Recommend
More recommend