modern sql
play

Modern SQL: Evolution of a dinosaur Markus Winand Krakw, 9-11 May - PowerPoint PPT Presentation

Modern SQL: Evolution of a dinosaur Markus Winand Krakw, 9-11 May 2018 Still using Windows 3.1? So why stick with SQL-92? @ModernSQL - https://modern-sql.com/ @MarkusWinand SQL:1999 WITH (Common Table Expressions) WITH (non-recursive)


  1. OVER (PARTITION BY) The Problem Two distinct concepts could not be used independently: ‣ Merge rows with the same key properties ‣ GROUP BY to specify key properties ‣ DISTINCT to use full row as key ‣ Aggregate data from related rows ‣ Requires GROUP BY to segregate the rows ‣ COUNT , SUM , AVG , MIN , MAX to aggregate grouped rows

  2. OVER (PARTITION BY) The Problem No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , c2 , c2 FROM t FROM t SELECT DISTINCT SELECT c1 SELECT c1 c1 , SUM (c2) tot , SUM (c2) tot , c2 FROM t FROM t FROM t GROUP BY c1 GROUP BY c1

  3. OVER (PARTITION BY) The Problem No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , tot , c2 , c2 FROM t FROM t SELECT c1 JOIN ( ) ta , SUM (c2) tot FROM t GROUP BY c1 ON (t.c1=ta.c1) SELECT DISTINCT SELECT c1 c1 , SUM (c2) tot , c2 FROM t FROM t GROUP BY c1

  4. OVER (PARTITION BY) The Problem No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , tot , c2 , c2 FROM t FROM t SELECT c1 JOIN ( ) ta , SUM (c2) tot FROM t GROUP BY c1 ON (t.c1=ta.c1) SELECT DISTINCT SELECT c1 c1 , SUM (c2) tot , c2 FROM t FROM t GROUP BY c1

  5. OVER (PARTITION BY) Since SQL:2003 No ⇠ Aggregate ⇢ Yes SELECT c1 SELECT c1 Yes ⇠ Merge rows ⇢ No , c2 , c2 FROM t FROM t , SUM(c2) OVER (PARTITION BY c1) FROM t SELECT DISTINCT SELECT c1 c1 , SUM (c2) tot , c2 FROM t FROM t GROUP BY c1

  6. OVER (PARTITION BY) How it works dep salary ts SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  7. OVER (PARTITION BY) How it works dep salary ts SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  8. OVER (PARTITION BY) How it works dep salary ts SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  9. OVER (PARTITION BY) How it works dep salary ts SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  10. OVER (PARTITION BY) How it works dep salary ts SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() 333 1000 6000 FROM emp 333 1000 6000 333 1000 6000

  11. OVER (PARTITION BY) How it works dep salary ts SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() PARTITION BY dep ) 333 1000 3000 FROM emp 333 1000 3000 333 1000 3000

  12. OVER and ORDER BY (Framing & Ranking)

  13. OVER (ORDER BY) The Problem acnt id value balance SELECT id, 1 1 +10 +10 value, FROM transactions t 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20

  14. OVER (ORDER BY) The Problem acnt id value balance SELECT id, 1 1 +10 +10 value, (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) 333 4 +50 +70 FROM transactions t 333 5 -30 +40 333 6 -20 +20

  15. OVER (ORDER BY) The Problem acnt id value balance SELECT id, 1 1 +10 +10 value, (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) 333 4 +50 +70 FROM transactions t 333 5 -30 +40 Range segregation (<=) 
 not possible with 
 333 6 -20 +20 GROUP BY or 
 PARTITION BY

  16. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t

  17. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t

  18. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t

  19. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  20. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  21. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  22. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  23. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  24. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t

  25. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 value, SUM(value) 22 2 +20 OVER ( 22 3 -10 ORDER BY id 333 4 +50 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 AND CURRENT ROW 333 6 -20 ) FROM transactions t

  26. OVER (ORDER BY) Since SQL:2003 acnt id value balance SELECT id, 1 1 +10 +10 value, SUM(value) 22 2 +20 +20 OVER ( PARTITION BY acnt 22 3 -10 +10 ORDER BY id 333 4 +50 +50 ROWS BETWEEN 
 UNBOUNDED PRECEDING 333 5 -30 +20 AND CURRENT ROW 333 6 -20 . 0 ) FROM transactions t

  27. OVER (ORDER BY) Since SQL:2003 With OVER (ORDER BY n) a new type of functions make sense: n ROW_NUMBER RANK DENSE_RANK PERCENT_RANK CUME_DIST 1 1 1 1 0 0.25 2 2 2 2 0.33… 0.75 3 3 2 2 0.33… 0.75 4 4 4 3 1 1

  28. Use Cases OVER (SQL:2003) ‣ Aggregates without GROUP BY ‣ Running totals, 
 AVG(…) OVER(ORDER BY … moving averages ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) moving_avg ‣ Ranking SELECT * FROM (SELECT ROW_NUMBER() ‣ Top-N per Group OVER(PARTITION BY … ORDER BY …) rn , t.* ‣ Avoiding self-joins FROM t) numbered_t WHERE rn <= 3 [… many more …]

  29. In a Nutshell OVER (SQL:2003) OVER may follow any aggregate function OVER defines which rows are visible at each row OVER() makes all rows visible at every row OVER(PARTITION BY …) segregates like GROUP BY OVER(ORDER BY … BETWEEN) segregates using < , >

  30. Availability OVER (SQL:2003) Impala Spark Hive 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 NuoDB MariaDB 5.1 10.2 8.0 MySQL PostgreSQL 8.4 SQLite DB2 LUW 7.0 Oracle 8i SQL Server 2005

  31. SQL:2006

  32. XMLTABLE

  33. Since SQL:2006 XMLTABLE Stored in tbl.x : SELECT id , c1 <d> XPath * expression , n to identify rows <e id="42"> FROM tbl <c1>…</c1> , XMLTABLE( </e> '/d/e' </d> PASSING x COLUMNS id INT PATH '@id' , c1 VARCHAR(255) PATH 'c1' , n FOR ORDINALITY ) r *Standard SQL allows XQuery

  34. Since SQL:2006 XMLTABLE Stored in tbl.x : SELECT id , c1 <d> , n <e id="42"> FROM tbl <c1>…</c1> , XMLTABLE( </e> '/d/e' </d> PASSING x COLUMNS id INT PATH '@id' , c1 VARCHAR(255) PATH 'c1' , n FOR ORDINALITY ) r *Standard SQL allows XQuery

  35. Since SQL:2006 XMLTABLE Stored in tbl.x : SELECT id , c1 <d> , n <e id="42"> FROM tbl <c1>…</c1> XPath * expressions , XMLTABLE( </e> to extract data '/d/e' </d> PASSING x COLUMNS id INT PATH '@id' , c1 VARCHAR(255) PATH 'c1' , n FOR ORDINALITY ) r *Standard SQL allows XQuery

  36. Since SQL:2006 XMLTABLE Stored in tbl.x : SELECT id , c1 <d> , n <e id="42"> FROM tbl <c1>…</c1> , XMLTABLE( </e> '/d/e' </d> Row number PASSING x (like for unnest) COLUMNS id INT PATH '@id' , c1 VARCHAR(255) PATH 'c1' , n FOR ORDINALITY ) r *Standard SQL allows XQuery

  37. Since SQL:2006 XMLTABLE Stored in tbl.x : SELECT id Result , c1 <d> id | c1 | n , n <e id="42"> ----+----+--- FROM tbl <c1>…</c1> 42 | … | 1 , XMLTABLE( </e> '/d/e' </d> PASSING x COLUMNS id INT PATH '@id' , c1 VARCHAR(255) PATH 'c1' , n FOR ORDINALITY ) r *Standard SQL allows XQuery

  38. Availability XMLTABLE 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB MySQL 10 [0] PostgreSQL SQLite DB2 LUW 9.7 Oracle 11gR1 SQL Server [0] No XQuery (only XPath). No default namespace declaration.

  39. SQL:2008

  40. FETCH FIRST

  41. The Problem FETCH FIRST Limit the result to a number of rows. ( LIMIT , TOP and ROWNUM are all proprietary) SELECT * FROM (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn <=10 SQL:2003 introduced ROW_NUMBER() to number rows. 
 But this still requires wrapping to limit the result. And how about databases not supporting ROW_NUMBER() ?

  42. The Problem FETCH FIRST Limit the result to a number of rows. ( LIMIT , TOP and ROWNUM are all proprietary) Dammit! SELECT * Let's take 
 FROM (SELECT * LIMIT , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn <=10 SQL:2003 introduced ROW_NUMBER() to number rows. 
 But this still requires wrapping to limit the result. And how about databases not supporting ROW_NUMBER() ?

  43. Since SQL:2008 FETCH FIRST SQL:2008 introduced the FETCH FIRST … ROWS ONLY clause: SELECT * FROM data ORDER BY x FETCH FIRST 10 ROWS ONLY

  44. Availability FETCH FIRST 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB 5.1 MySQL 3.19.3 [0] PostgreSQL 6.5 8.4 [1] SQLite 2.1.0 [1] DB2 LUW 7.0 Oracle 12cR1 SQL Server 7.0 2012 [2] [0] Earliest mention of LIMIT . Probably inherited from mSQL [1] Functionality available using LIMIT [2] SELECT TOP n ... SQL Server 2000 also supports expressions and bind parameters

  45. SQL:2011

  46. OFFSET

  47. The Problem OFFSET How to fetch the rows after a limit? 
 (pagination anybody?) SELECT * FROM (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn > 10 and rn <= 20

  48. Since SQL:2011 OFFSET SQL:2011 introduced OFFSET , unfortunately! SELECT * FROM data ORDER BY x OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY

  49. Since SQL:2011 OFFSET SQL:2011 introduced OFFSET , unfortunately! OFFSET SELECT * FROM data ORDER BY x OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY e r s o a s t a b c G r e r s ! s t i c k & https://use-the-index-luke.com/no-offset

  50. Since SQL:2011 OFFSET 1999 2001 2003 2005 2007 2009 2011 2013 2015 MariaDB 5.1 MySQL 3.20.3 4.0.6 [0] [1] PostgreSQL 6.5 SQLite 2.1.0 DB2 LUW 9.7 11.1 [2] Oracle 12c SQL Server 2012 [0] LIMIT [offset,] limit : "With this it's easy to do a poor man's next page/previous page WWW application." [1] The release notes say "Added PostgreSQL compatible LIMIT syntax" [2] Requires enabling the MySQL compatibility vector: db2set DB2_COMPATIBILITY_VECTOR=MYS

  51. OVER

  52. The Problem OVER (SQL:2011) Direct access of other rows of the same window is not possible. (E.g., calculate the di ff erence to the previous rows) WITH numbered_t AS (SELECT * FROM t ) SELECT curr.* curr , curr.balance balance … rn - COALESCE(prev.balance,0) 50 … 1 FROM numbered_t curr 90 … 2 LEFT JOIN numbered_t prev 70 … 3 ON (curr.rn = prev.rn+1) 30 … 4

  53. The Problem OVER (SQL:2011) Direct access of other rows of the same window is not possible. (E.g., calculate the di ff erence to the previous rows) WITH numbered_t AS (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn ) FROM t SELECT curr.* curr , curr.balance balance … rn - COALESCE(prev.balance,0) 50 … 1 FROM numbered_t curr 90 … 2 LEFT JOIN numbered_t prev 70 … 3 ON (curr.rn = prev.rn+1) 30 … 4

  54. The Problem OVER (SQL:2011) Direct access of other rows of the same window is not possible. (E.g., calculate the di ff erence to the previous rows) WITH numbered_t AS (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn ) FROM t SELECT curr.* curr , curr.balance balance … rn - COALESCE(prev.balance,0) 50 … 1 FROM numbered_t curr 90 … 2 LEFT JOIN numbered_t prev 70 … 3 ON (curr.rn = prev.rn+1) 30 … 4

  55. The Problem OVER (SQL:2011) Direct access of other rows of the same window is not possible. (E.g., calculate the di ff erence to the previous rows) WITH numbered_t AS (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn ) FROM t SELECT curr.* curr prev , curr.balance balance … rn balance … rn - COALESCE(prev.balance,0) 50 … 1 50 … 1 FROM numbered_t curr 90 … 2 90 … 2 LEFT JOIN numbered_t prev 70 … 3 70 … 3 ON (curr.rn = prev.rn+1) 30 … 4 30 … 4

  56. The Problem OVER (SQL:2011) Direct access of other rows of the same window is not possible. (E.g., calculate the di ff erence to the previous rows) WITH numbered_t AS (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn ) FROM t SELECT curr.* curr prev , curr.balance balance … rn balance … rn - COALESCE(prev.balance,0) 50 … 1 FROM numbered_t curr 50 … 1 90 … 2 LEFT JOIN numbered_t prev 70 … 3 90 … 2 ON (curr.rn = prev.rn+1) 30 … 4 70 … 3 30 … 4

  57. The Problem OVER (SQL:2011) Direct access of other rows of the same window is not possible. (E.g., calculate the di ff erence to the previous rows) WITH numbered_t AS (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn ) FROM t SELECT curr.* curr prev , curr.balance balance … rn balance … rn - COALESCE(prev.balance,0) +50 50 … 1 FROM numbered_t curr 50 … 1 +40 90 … 2 LEFT JOIN numbered_t prev 70 … 3 90 … 2 -20 ON (curr.rn = prev.rn+1) 30 … 4 70 … 3 -40 30 … 4

  58. Since SQL:2011 OVER (SQL:2011) SQL:2011 introduced LEAD , LAG , NTH_VALUE , … for that: SELECT *, balance 
 - COALESCE( LAG(balance) 
 OVER(ORDER BY x) 
 , 0) 
 FROM t Available functions: LEAD / LAG 
 FIRST_VALUE / LAST_VALUE 
 NTH_VALUE(col, n) FROM FIRST/LAST 
 RESPECT/IGNORE NULLS

  59. Since SQL:2011 OVER (LEAD, LAG, …) 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB 5.1 10.2 [0] [0] MySQL 8.0 PostgreSQL 8.4 [0] SQLite DB2 LUW 9.5 11.1 [1] Oracle 8i 11gR2 [1] SQL Server 2012 [1] [0] No IGNORE NULLS and FROM LAST [1] No NTH_VALUE

Recommend


More recommend