windows 3 1
play

Windows 3.1? So why stick with SQL-92? @ModernSQL - - PowerPoint PPT Presentation

Still using Windows 3.1? So why stick with SQL-92? @ModernSQL - https://modern-sql.com/ @MarkusWinand SQL:1999 WITH (Common Table Expressions) WITH (non-recursive) The Problem Nested queries are hard to read: SELECT FROM (SELECT


  1. Since SQL:1999 WITH RECURSIVE Recursive common table expressions may refer to themselves in a leg of a UNION [ALL] : WITH RECURSIVE cte (n) AS (SELECT 1 It's a UNION ALL n loop! SELECT n+1 --- FROM cte 1 WHERE n < 3) 2 SELECT * FROM cte 3 (3 rows)

  2. Since SQL:1999 WITH RECURSIVE Recursive common table expressions may refer to themselves in a leg of a UNION [ALL] : WITH RECURSIVE cte (n) AS (SELECT 1 It's a UNION ALL n loop! SELECT n+1 --- FROM cte 1 WHERE n < 3) 2 SELECT * FROM cte 3 (3 rows)

  3. Since SQL:1999 WITH RECURSIVE Recursive common table expressions may refer to themselves in a leg of a UNION [ALL] : WITH RECURSIVE cte (n) AS (SELECT 1 It's a UNION ALL n loop! SELECT n+1 --- FROM cte 1 WHERE n < 3) 2 SELECT * FROM cte 3 (3 rows)

  4. Since SQL:1999 WITH RECURSIVE Recursive common table expressions may refer to themselves in a leg of a UNION [ALL] : WITH RECURSIVE cte (n) AS (SELECT 1 UNION ALL n n=3 
 doesn't SELECT n+1 --- match FROM cte 1 WHERE n < 3) 2 SELECT * FROM cte 3 (3 rows)

  5. Since SQL:1999 WITH RECURSIVE Recursive common table expressions may refer to themselves in a leg of a UNION [ALL] : WITH RECURSIVE cte (n) AS (SELECT 1 UNION ALL n n=3 
 doesn't SELECT n+1 --- match FROM cte 1 WHERE n < 3) 2 Loop terminates SELECT * FROM cte 3 (3 rows)

  6. 
 
 
 Use Cases WITH RECURSIVE ‣ Row generators 
 As shown on previous slide To fill gaps (e.g., in time series), generate test data. ‣ Processing graphs 
 http://aprogrammerwrites.eu/?p=1391 Shortest route from person A to B “[…] for certain classes of graphs, solutions utilizing in LinkedIn/Facebook/Twitter/… relational database technology […] can offer performance superior to that of the dedicated graph databases.” event.cwi.nl/grades2013/07-welc.pdf ‣ Finding distinct values 
 http://wiki.postgresql.org/wiki/Loose_indexscan with n*log(N) † time complexity. […many more…] † n … # distinct values, N … # of table rows. Suitable index required

  7. In a Nutshell WITH RECURSIVE WITH RECURSIVE is the “ while ” of SQL WITH RECURSIVE "supports" infinite loops Except PostgreSQL, databases generally don't require the RECURSIVE keyword. DB2, SQL Server & Oracle don’t even know the keyword RECURSIVE , but allow recursive CTEs anyway.

  8. Availability WITH RECURSIVE 9 1 3 5 7 9 1 3 5 7 9 0 0 0 0 0 1 1 1 1 9 0 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 MariaDB 5.1 10.2 8.0 MySQL PostgreSQL 8.4 SQLite 3.8.3 [0] DB2 LUW 7.0 Oracle 11gR2 SQL Server 2005 [0] Only for top-level SELECT statements

  9. GROUPING SETS

  10. Before SQL:1999 GROUPING SETS Only one GROUP BY operation at a time: Monthly revenue Yearly revenue SELECT year SELECT year , month , sum(revenue) , sum(revenue) FROM tbl FROM tbl GROUP BY year, month GROUP BY year

  11. Before SQL:1999 GROUPING SETS SELECT year , month , sum(revenue) FROM tbl GROUP BY year, month SELECT year , sum(revenue) FROM tbl GROUP BY year

  12. Before SQL:1999 GROUPING SETS SELECT year , month , sum(revenue) FROM tbl GROUP BY year, month UNION ALL SELECT year , null , sum(revenue) FROM tbl GROUP BY year

  13. Since SQL:1999 GROUPING SETS SELECT year SELECT year , month , month , sum(revenue) , sum(revenue) FROM tbl FROM tbl GROUP BY year, month GROUP BY UNION ALL GROUPING SETS ( SELECT year (year, month) , null , (year) , sum(revenue) ) FROM tbl GROUP BY year

  14. In a Nutshell GROUPING SETS GROUPING SETS are multiple GROUP BY s in one go () (empty parenthesis) build a group over all rows GROUPING (function) disambiguates the meaning of NULL 
 (was the grouped data NULL or is this column not currently grouped?) Permutations can be created using ROLLUP and CUBE 
 ( ROLLUP(a,b,c) = GROUPING SETS ((a,b,c), (a,b),(a),())

  15. Availability GROUPING SETS 9 1 3 5 7 9 1 3 5 7 9 0 0 0 0 0 1 1 1 1 9 0 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 MariaDB 5.1 [0] MySQL 5.0 [1] PostgreSQL 9.5 SQLite DB2 LUW 5 Oracle 9iR1 SQL Server 2008 [0] Only ROLLUP (properitery syntax). [1] Only ROLLUP (properitery syntax). GROUPING function since MySQL 8.0.

  16. SQL:2003

  17. OVER and PARTITION BY

  18. OVER (PARTITION BY) The Problem Two distinct concepts could not be used independently: ‣ Merge rows with the same key properties ‣ GROUP BY to specify key properties ‣ DISTINCT to use full row as key ‣ Aggregate data from related rows ‣ Requires GROUP BY to segregate the rows ‣ COUNT , SUM , AVG , MIN , MAX to aggregate grouped rows

  19. OVER (PARTITION BY) The Problem No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , c2 , c2 FROM t FROM t SELECT DISTINCT SELECT c1 SELECT c1 c1 , SUM (c2) tot , SUM (c2) tot , c2 FROM t FROM t FROM t GROUP BY c1 GROUP BY c1

  20. OVER (PARTITION BY) The Problem No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , c2 , c2 FROM t FROM t SELECT DISTINCT SELECT c1 SELECT c1 c1 , SUM (c2) tot , SUM (c2) tot , c2 FROM t FROM t FROM t GROUP BY c1 GROUP BY c1

  21. OVER (PARTITION BY) The Problem No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , c2 , c2 FROM t FROM t JOIN ( ) ta ON (t.c1=ta.c1) SELECT DISTINCT SELECT c1 SELECT c1 c1 , SUM (c2) tot , SUM (c2) tot , c2 FROM t FROM t FROM t GROUP BY c1 GROUP BY c1

  22. OVER (PARTITION BY) The Problem No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , tot , c2 , c2 FROM t FROM t SELECT c1 JOIN ( ) ta , SUM (c2) tot FROM t GROUP BY c1 ON (t.c1=ta.c1) SELECT DISTINCT SELECT c1 c1 , SUM (c2) tot , c2 FROM t FROM t GROUP BY c1

  23. OVER (PARTITION BY) The Problem No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , tot , c2 , c2 FROM t FROM t SELECT c1 JOIN ( ) ta , SUM (c2) tot FROM t GROUP BY c1 ON (t.c1=ta.c1) SELECT DISTINCT SELECT c1 c1 , SUM (c2) tot , c2 FROM t FROM t GROUP BY c1

  24. OVER (PARTITION BY) Since SQL:2003 No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , c2 , c2 FROM t FROM t , SUM(c2) OVER (PARTITION BY c1) FROM t SELECT DISTINCT SELECT c1 c1 , SUM (c2) tot , c2 FROM t FROM t GROUP BY c1

  25. OVER (PARTITION BY) How it works dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  26. OVER (PARTITION BY) How it works Look here dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  27. OVER (PARTITION BY) How it works dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  28. OVER (PARTITION BY) How it works dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  29. OVER (PARTITION BY) How it works dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() PARTITION BY dep ) 333 1000 3000 FROM emp 333 1000 3000 333 1000 3000

  30. OVER and ORDER BY (Framing & Ranking)

  31. OVER (ORDER BY) The Problem acnt id value balance SELECT id, 1 1 +10 +10 value, FROM transactions t 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20

  32. OVER (ORDER BY) The Problem acnt id value balance SELECT id, 1 1 +10 +10 value, (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) 333 4 +50 +70 FROM transactions t 333 5 -30 +40 Range segregation (<=) 
 not possible with 
 333 6 -20 +20 GROUP BY or 
 PARTITION BY

  33. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t

  34. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t

  35. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  36. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  37. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  38. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  39. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  40. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  41. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  42. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 value, SUM(value) 22 2 +20 OVER ( 22 3 -10 ORDER BY id 333 4 +50 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 AND CURRENT ROW 333 6 -20 ) FROM transactions t

  43. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +20 OVER ( PARTITION BY acnt 22 3 -10 +10 ORDER BY id 333 4 +50 +50 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +20 AND CURRENT ROW 333 6 -20 . 0 ) FROM transactions t

  44. OVER (ORDER BY) Since SQL:2003 With OVER (ORDER BY n) a new type of functions make sense: n ROW_NUMBER RANK DENSE_RANK PERCENT_RANK CUME_DIST 1 1 1 1 0 0.25 2 2 2 2 0.33… 0.75 3 3 2 2 0.33… 0.75 4 4 4 3 1 1

  45. Use Cases OVER (SQL:2003) ‣ Aggregates without GROUP BY ‣ Running totals, 
 AVG(…) OVER(ORDER BY … moving averages ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) moving_avg ‣ Ranking SELECT * FROM (SELECT ROW_NUMBER() ‣ Top-N per Group OVER(PARTITION BY … ORDER BY …) rn , t.* ‣ Avoiding self-joins FROM t) numbered_t WHERE rn <= 3 [… many more …]

  46. In a Nutshell OVER (SQL:2003) OVER may follow any aggregate function OVER defines which rows are visible at each row OVER() makes all rows visible at every row OVER(PARTITION BY …) segregates like GROUP BY OVER(ORDER BY … BETWEEN) segregates using < , >

  47. Availability OVER (SQL:2003) Impala Spark Hive 9 1 3 5 7 9 1 3 5 7 9 0 0 0 0 0 1 1 1 1 NuoDB 9 0 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 MariaDB 5.1 10.2 8.0 MySQL PostgreSQL 8.4 SQLite [0] DB2 LUW 7.0 Oracle 8i SQL Server 2005 [0] Expected for release 3.25.0 (available in snapshot release).

  48. NULLS FIRST/LAST

  49. NULLS FIRST/LAST Before SQL:2003 The sorting of NULL is implementation defined If you know a value 
 (some DBs sort NULL as great, others as very small value) larger/smaller than any 
 actual value… SELECT … FROM … ORDER BY COALESCE(nullable, ?);

  50. NULLS FIRST/LAST Before SQL:2003 The sorting of NULL is implementation defined (some DBs sort NULL as great, others as very small value) This shows NULLs first (no matter if nullable 
 SELECT … is sorted ASC or DESC) FROM … ORDER BY COALESCE(nullable, ?); ORDER BY CASE WHEN nullable IS NULL THEN 0 ELSE 1 Using an extra sort key 
 END to put NULL and NOT NULL , nullable; apart is more robust

  51. NULLS FIRST/LAST Since SQL:2003 SQL:2003 introduced ORDER BY … NULLS FIRST/LAST SELECT … FROM … ORDER BY nullable NULLS FIRST This returns 
 NULLs first 
 (for ASC and DESC) Note: PostgreSQL accepts NULLS FIRST/LAST in index definitions.

  52. NULLS FIRST/LAST Since SQL:2003 9 1 3 5 7 9 1 3 5 7 9 0 0 0 0 0 1 1 1 1 9 0 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 MariaDB [0] MySQL [0] PostgreSQL 8.3 [1] SQLite [0] [1] DB2 LUW 11.1 Oracle 11gR1 [1] SQL Server [0] [0] By default sorted as smallest [1] By default sorted as greatest

  53. FILTER

  54. The Problem FILTER Pivot table: Years on the Y axis, month on X: SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue 
 ELSE 0 
 END) JAN, SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, … FROM sales GROUP BY YEAR

  55. The Problem FILTER Pivot table: Years on the Y axis, month on X: SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue 
 Aggregates 
 ELSE 0 
 ignore NULL* Optional: 
 END) JAN, ELSE NULL SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, is default … FROM sales GROUP BY YEAR *Exceptions: array_agg , json_objectagg , xmlagg See: https://modern-sql.com/concept/null#aggregates

  56. The Problem FILTER Pivot table: Years on the Y axis, month on X: SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue 
 ELSE 0 
 END) JAN, SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, … FROM sales GROUP BY YEAR

  57. Since SQL:2003 FILTER SQL:2003 allows FILTER (WHERE…) after aggregates: SELECT YEAR, SUM(revenue) FILTER (WHERE MONTH = 1) JAN, SUM(revenue) FILTER (WHERE MONTH = 2) FEB, … FROM sales GROUP BY YEAR;

  58. Since SQL:2003 FILTER Pivot in SQL Year Jan Feb Mar ... Dec 2016 1 23 345 ... 1234 1. Use GROUP BY to combine rows ) ) … 2 ) = 3 2. Use FILTER to pick E h = R t h E n t H ) rows per column o n W … m o ( = m R h E E ) t R T E 2 n E L R 1 o H I E = m F W H h ( ) W Year Month Revenue R E t … ( ( E R n M R U T S E o L E H I m T F W L 2016 1 1 I ( ) E … F R ( R M E U S ) E T e H L u W n I e 2016 2 23 F ( v e R r ) ( E M e U T S u L n e I v F e 2016 3 345 r ( ) M U e S u n e v e r ( 2016 ... ... M U S 2016 12 1234 See: https://modern-sql.com/use-case/pivot

  59. Availability FILTER 9 1 3 5 7 9 1 3 5 7 9 0 0 0 0 0 1 1 1 1 9 0 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 MariaDB 5.1 MySQL PostgreSQL 9.4 SQLite [0] DB2 LUW Oracle SQL Server [0] Only with OVER clause

  60. Inverse Distribution Functions (percentiles)

  61. The Problem Inverse Distribution Functions Grouped rows cannot be ordered prior aggregation. (how to get the middle value (median) of a set) SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = (SELECT FLOOR(COUNT(*)/2) FROM data d3)

  62. The Problem Inverse Distribution Functions Grouped rows cannot be ordered prior aggregation. (how to get the middle value (median) of a set) Number rows SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = Pick middle one (SELECT FLOOR(COUNT(*)/2) FROM data d3)

Recommend


More recommend