Advanced SQL II — Advanced Aggregation and OLAP 5DV120 — Database System Principles Ume˚ a University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 1 of 35
Aggregation Operators — Review Query: For each department, find the minimum, maximum, average salary, as well as the number of employees. SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor GROUP BY dept_name UNION SELECT dept_name , 0, 0, 0, 0 FROM department as D WHERE (NOT EXISTS (SELECT * FROM instructor NATURAL JOIN department AS DI WHERE D.dept_name=DI.dept_name )); An important rule: The attributes which are listed in the GROUP BY clause must include those which occur in the SELECT and are not aggregated. • Will not work (even though it clearly should): SELECT dept_name , building , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUME02RIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor NATURAL JOIN department GROUP BY dept_name UNION ... Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 2 of 35
Omission of GROUP BY — Review • If the GROUP BY clause is omitted, the aggregation is over the entire table. SELECT MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor ; • In this case, there must be no non-aggregated attributes in the SELECT clause. • Does not work: SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor ; • In the above case, dept name must appear in the GROUP BY clause. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 3 of 35
The HAVING Clause — Review Query: For each department with at least two distinct instructors, find the minimum, maximum, average salary, as well as the number of employees. • The following does not work: SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor WHERE (COUNT (*) >= 2); GROUP BY dept_name; • The problem is that the WHERE clause is evaluated before the aggregation. • The solution is to use a HAVING clause, which is evaluated after the aggregation. SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor GROUP BY dept_name HAVING (COUNT (*) >= 2); • The HAVING clause must come after the GROUP BY clause. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 4 of 35
Embedded Queries in the HAVING Clause — Review Query: Find the department(s) with the greatest number of instructors. SELECT I1.dept_name , COUNT(I1 .*) FROM instructor as I1 GROUP BY dept_name HAVING NOT EXISTS (SELECT I2.dept_name , COUNT(I2 .*) FROM instructor AS I2 GROUP BY I2.dept_name HAVING (COUNT(I2 .*) > COUNT(I1 .*)) ); • It is also possible to do this with an embedded subquery in the FROM clause. SELECT I.dept_name , I.n_instr FROM (SELECT dept_name , COUNT (*) AS n_instr FROM instructor GROUP BY dept_name ) AS I WHERE I.n_instr >= ALL (SELECT n_instr FROM (SELECT dept_name , COUNT (*) AS n_instr FROM instructor GROUP BY dept_name ) AS Pointless ); • Note the alias Pointless which is required by SQL rules. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 5 of 35
Window Functions I Review Query: For each building, find the average budget for all departments in that building. SELECT building , AVG(budget) AS avg_for_bldg FROM department GROUP BY building ORDER BY AVG(budget) DESC; • To provide this averaged information while still providing full information for each individual department, a window function is very useful. SELECT dept_name , building , budget , CAST ( AVG(budget) OVER (PARTITION BY building) AS NUMERIC (8 ,2) ) AS avg_for_bldg FROM department ORDER BY building , budget DESC ; dept_name | building | budget | avg_for_bldg -- ----------+----------+-----------+-------------- Music | Packard | 80000.00 | 80000.00 Finance | Painter | 120000.00 | 85000.00 History | Painter | 50000.00 | 85000.00 Comp. Sci. | Taylor | 100000.00 | 92500.00 Elec. Eng. | Taylor | 85000.00 | 92500.00 Biology | Watson | 90000.00 | 80000.00 Physics | Watson | 70000.00 | 80000.00 • Note that a GROUP BY clause is not required, despite the aggregation. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 6 of 35
Window Functions II • When a window function is used more than once, it is helpful to declare it by name. SELECT dept_name , building , budget , CAST ( AVG(budget) OVER (PARTITION BY building) AS NUMERIC (8 ,2) ) AS avg_for_bldg , CAST ( budget - AVG(budget) OVER (PARTITION BY building) AS NUMERIC (8 ,2) ) As deviation FROM department ; • The above query is represented more compactly as: SELECT dept_name , building , budget , CAST (AVG(budget) OVER W AS NUMERIC (8 ,2)) AS avg_for_bldg , CAST (budget - AVG(budget) OVER W AS NUMERIC (8 ,2)) AS deviation FROM department WINDOW W AS (PARTITION BY building ); dept_name | building | budget | avg_for_bldg | deviation -- ----------+----------+-----------+--------------+----------- Music | Packard | 80000.00 | 80000.00 | 0.00 Finance | Painter | 120000.00 | 85000.00 | 35000.00 History | Painter | 50000.00 | 85000.00 | -35000.00 Comp. Sci. | Taylor | 100000.00 | 92500.00 | 7500.00 Elec. Eng. | Taylor | 85000.00 | 92500.00 | -7500.00 Biology | Watson | 90000.00 | 80000.00 | 10000.00 Physics | Watson | 70000.00 | 80000.00 | -10000.00 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 7 of 35
Window Functions III • Here is a more complex query which provides, for each instructor salary, its percentage of the average within the department. SELECT ID , name , dept_name , salary , CAST (( salary *100)/ AVG(salary) OVER (PARTITION BY dept_name) AS NUMERIC (5 ,2) ) AS pct_avg_in_dept FROM instructor NATURAL JOIN department ; id | name | dept_name | salary | pct_avg_in_dept -- -----+------------+------------+----------+----------------- 76766 | Crick | Biology | 72000.00 | 100.00 10101 | Srinivasan | Comp. Sci. | 65000.00 | 84.05 45565 | Katz | Comp. Sci. | 75000.00 | 96.98 83821 | Brandt | Comp. Sci. | 92000.00 | 118.97 98345 | Kim | Elec. Eng. | 80000.00 | 100.00 12121 | Wu | Finance | 90000.00 | 105.88 76543 | Singh | Finance | 80000.00 | 94.12 32343 | El Said | History | 60000.00 | 98.36 58583 | Califieri | History | 62000.00 | 101.64 15151 | Mozart | Music | 40000.00 | 100.00 33456 | Gold | Physics | 87000.00 | 95.60 22222 | Einstein | Physics | 95000.00 | 104.40 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 8 of 35
Window Functions IV • Window functions are also very useful for computing running averages. • Suppose that the following simple table of years and sales amounts is given. CREATE TABLE Sales (year NUMERIC (4 ,0) NOT NULL , amount NUMERIC (8 ,2) NOT NULL , PRIMARY KEY (year) ); • The following query provides a moving average of sales for the current year and two preceding. SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS 2 PRECEDING) AS NUMERIC (8 ,2) ) AS mvg_avg_3_yr FROM Sales; year | amount | mvg_avg_3_yr year | amount | mvg_avg_3_yr -- ----+-----------+-------------- -- ----+-----------+-------------- 1990 | 52134.00 | 52134.00 2001 | 95000.00 | 93333.33 1991 | 66000.00 | 59067.00 2002 | 98000.00 | 96000.00 1992 | 75000.00 | 64378.00 2003 | 96000.00 | 96333.33 1993 | 65000.00 | 68666.67 2004 | 101000.00 | 98333.33 1994 | 70000.00 | 70000.00 2005 | 99000.00 | 98666.67 1995 | 80000.00 | 71666.67 2006 | 104000.00 | 101333.33 1996 | 85000.00 | 78333.33 2007 | 120000.00 | 107666.67 1997 | 82000.00 | 82333.33 2008 | 125000.00 | 116333.33 1998 | 88000.00 | 85000.00 2009 | 115000.00 | 120000.00 1999 | 90000.00 | 86666.67 2010 | 130000.00 | 123333.33 2000 | 95000.00 | 91000.00 2011 | 175000.00 | 140000.00 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 9 of 35
Window Functions V • Other options for the moving average are also possible. • All preceding years as well as the current: SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS UNBOUNDED PRECEDING) AS NUMERIC (8 ,2) ) AS ubd_prec_mvg_avg FROM Sales; • A three-year running average which includes the preceding as well as the following year: SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS NUMERIC (8 ,2) ) AS moving_avg_1p_1f FROM Sales; Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 10 of 35
Recommend
More recommend