group by and having
play

GROUP BY AND HAVING So far, weve applied aggregation operatorsto all - PDF document

M ATH IN SQL A GGREGATION O PERATORS Operators on sets of tuples. Significant extension of relational algebra. SUM ( [DISTINCT] A): the sum of all (unique) values in attribute A. AVG ( [DISTINCT] A): the average of all (unique)


  1. M ATH IN SQL A GGREGATION O PERATORS  Operators on sets of tuples.  Significant extension of relational algebra.  SUM ( [DISTINCT] A): the sum of all (unique) values in attribute A.  AVG ( [DISTINCT] A): the average of all (unique) values in attribute A. SELECT AVG (S.age) FROM Sailors S; SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERE S.rating=10; 2 A GGREGATION O PERATORS  Operators on sets of tuples.  Significant extension of relational algebra.  MAX (A): the maximum value in attribute A.  MIN (A): the minimum value in attribute A. SELECT MAX (rating) FROM Sailors; SELECT S.sname FROM Sailors S WHERE S.rating= ( SELECT MAX (S2.rating) FROM Sailors S2); 3 3 3

  2. A GGREGATION O PERATORS  Operators on sets of tuples.  Significant extension of relational algebra.  COUNT (*): the number of tuples. SELECT COUNT (*) FROM Sailors S 4 A GGREGATION O PERATORS  Operators on sets of tuples.  Significant extension of relational algebra.  COUNT ( [DISTINCT] A): the number of (unique) values in attribute A. SELECT COUNT ( DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’; 5 A GGREGATION O PERATORS  Find name and age of the oldest sailor(s). SELECT S.sname, MAX (S.age)  The first query looks FROM Sailors S; correct, but is illegal.  Thoughts as to why?  The second query is a SELECT S.sname, S.age correct and legal FROM Sailors S solution. WHERE S.age = ( SELECT MAX (S2.age) FROM Sailors S2); 6 6 6

  3. GROUP BY AND HAVING  So far, we’ve applied aggregation operatorsto all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples.  Find the age of the youngest sailor for each rating value.  Suppose we know that rating values go from 1 to 10; we can write ten (!) queries that look like this: For i = 1, 2, ... , 10: SELECT MIN (S.age) FROM Sailors S WHERE S.rating= i ;  But in general, we don’t know how many rating values exist, and what these rating values are.  Plus, it’s a waste of time to write so many queries 7 GROUP BY AND HAVING [DISTINCT] target-list SELECT relation-list FROM Notice the qualification WHERE notation GROUP BY grouping-list group-qualification HAVING  A group is a set of tuples that have the same value for all attributes grouping-list .  The target-list contains  attribute names  terms with aggregation operations.  Attribute list must be a subset of grouping-list.  Each answer tuple correspondsto a group, and output attributes must have a single value per group. 8 C ONCEPTUAL E VALUATION  Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 1  The cross-product of relation-list is computed  In this instance, it’s only Sailors 9 9 9

  4. C ONCEPTUAL E VALUATION  Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 2  Tuples that fail qualification are discarded  ‘unnecessary’ attributes are deleted 10 10 10 C ONCEPTUAL E VALUATION  Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 3  Remaining tuples are partitioned into groups by the value of attributes in grouping-list 11 11 11 C ONCEPTUAL E VALUATION  Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 4  The group-qualification is then applied to eliminate groups that do not satisfy this condition. 12 12 12

  5. C ONCEPTUAL E VALUATION  Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 5  One answer tuple is generated per qualifying group by applying the aggregation operator. 13 13 13 GROUP BY AND HAVING  Find the age of the youngest  sailor with age 18, for each sid sname rating age rating with at least 2 such sailors. 22 dustin 7 45.0 SELECT S.rating, MIN 31 lubber 8 55.5 (S.age) 71 zorba 10 16.0 FROM Sailors S WHERE S.age >= 18 64 horatio 7 35.0 GROUP BY S.rating 29 brutus 1 33.0 HAVING COUNT (*) > 1; 58 rusty 10 35.0  Only S.rating and S.age are mentioned in the SELECT, rating age GROUP BY or HAVING clauses; 1 33.0 other attributes ` unnecessary ’. Answer relation 7 45.0  2nd column of result is rating 7 35.0 unnamed 7 35.0 8 55.5 What to do?  10 35.0 14 14 14 GROUP BY AND HAVING  For each red boat, find the number of reservations for this boat. SELECT B.bid, COUNT (*) AS scount FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid;  Grouping over a join of three relations.  What do we get if we remove B.color=‘red’ from the WHERE clause and add a HAVING clause with this condition?  What if we drop Sailors and the condition involving S.sid? 15 15 15

  6. GROUP BY AND HAVING  Find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age). SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating=S2.rating);  Shows HAVINGclause can also contain a subquery.  What if HAVING clause is replaced by: HAVINGCOUNT(*) >1  16 16 16 GROUP BY AND HAVING  Find those ratings for which the average age is the minimum over all ratings.  Aggregation operations cannot be nested!  WRONG: SELECT S.rating FROM Sailors S WHERE S.age = (SELECT MIN (AVG (S2.age)) FROM Sailors S2);  Correctsolution: SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp); 17 17 17 ORDERING & TOP/BOTTOM

  7. ORDER BY  The ORDER BY keyword is used to sort the result-set by a specified column.  The ORDER BY keyword sort the recordsin ascending order by default.  If you want to sort the records in a descending order, you can use the DESC keyword. 19 19 19 TOP/BOTTOM  The TOP clause is used to specify the number of records to return.  The TOP clause can be very useful on large tables with thousands of records  Returning a large number of records can impact on performance  Can ‘sample’ the table using TOP  Not all database systems support the TOP clause or implement it in different fashion 20 20 20 TOP/BOTTOM SQL Server SELECT TOP number|percent column_name(s) FROM table_name Ex: SELECT TOP 5 * FROM Persons MySQL SELECT column_name(s) FROM table_name LIMIT number Ex: SELECT * FROM Persons LIMIT 5 21 21 21

  8. TOP/BOTTOM Oracle SELECT column_name(s) FROM table_name WHERE ROWNUM <= number Ex: SELECT * FROM Persons WHERE ROWNUM <=5 DB2 SELECT column_name(s) FROM table_name FETCH FIRST number ROWS ONL Y Ex: SELECT * FROM Persons FETCH FIRST 5 ROWS ONL Y 22 22 22 TOP/BOTTOM  Can specify  Fixed number  SELECT TOP 10 * …  A percent  SELECT TOP 10 PERCENT * … 23 23 23 TOP/BOTTOM  How to return the oldest 5 rentals?  How to return the newest 5 rentals? 24 24 24

  9. TOP/BOTTOM  How to return the 3 rd newest rental? 25 25 25 S UMMARY  SQL was an important factor in the early acceptance of the relational model; more natural than earlier, procedural query languages.  All queries that can be expressed in relational algebra can also be formulated in SQL.  In addition, SQL has significantly more expressive power than relational algebra, in particular aggregation operationsand grouping.  Many alternative ways to write a query; query optimizer looks for most efficient evaluation plan.  In practice, users need to be aware of how queries are optimized and evaluated for most efficient results. 26 26 26

Recommend


More recommend