more than a query language sql in the 21 st century
play

More Than a Query Language: SQL in the 21 st Century @MarkusWinand - PowerPoint PPT Presentation

More Than a Query Language: SQL in the 21 st Century @MarkusWinand @ModernSQL http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf 1974 1992 SQL-92 Tied to the Relational Idea SQL-92 Tied to the Relational Idea


  1. SQL:2003 — Schemaless & Analytical Schemaless Analytical ‣ Introduced XML ‣ Introduced 
 ‣ Non-uniform 
 window functions ‣ Accessing other rows 
 documents in 
 a single column of the current result Later: Later: ‣ JSON added with SQL:2016 ‣ Extended in SQL:2011 ‣ Proprietary JSON support: ‣ Popular among “New SQLs” ‣ 2012: PostgreSQL ‣ 2013: BigQuery, Hive ‣ 2014: Oracle ‣ 2014: Impala ‣ 2015: MySQL ‣ 2015: Spark SQL ‣ 2016: SQL Server ‣ 2016: NuoDB, MemSQL, Cockroach DB, VoltDB

  2. SQL:2003 — Analytical id value SELECT id, value 1 +10 2 +20 3 -10 4 +50 5 -30 6 -20 FROM t

  3. SQL:2003 — Analytical id value bal SELECT id, value 1 +10 2 +20 3 -10 4 +50 5 -30 6 -20 FROM t

  4. SQL:2003 — Analytical id value bal SELECT id, value 1 +10 +10 2 +20 3 -10 4 +50 5 -30 6 -20 FROM t

  5. SQL:2003 — Analytical id value bal SELECT id, value 1 +10 +10 2 +20 +30 3 -10 +20 4 +50 +70 5 -30 +40 6 -20 +20 FROM t

  6. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 3 -10 +20 4 +50 +70 5 -30 +40 6 -20 +20 ) bal FROM t

  7. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 4 +50 +70 5 -30 +40 6 -20 +20 ) bal FROM t

  8. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 4 +50 +70 5 -30 +40 6 -20 +20 ) bal FROM t

  9. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 6 -20 +20 ) bal FROM t

  10. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 6 -20 +20 ) bal FROM t

  11. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 AND CURRENT ROW 6 -20 +20 ) bal FROM t

  12. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 AND CURRENT ROW 6 -20 +20 ) bal FROM t

  13. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 AND CURRENT ROW 6 -20 +20 ) bal FROM t

  14. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 AND CURRENT ROW 6 -20 +20 ) bal FROM t

  15. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 AND CURRENT ROW 6 -20 +20 ) bal FROM t

  16. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 AND CURRENT ROW 6 -20 +20 ) bal FROM t

  17. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 AND CURRENT ROW 6 -20 +20 ) bal FROM t

  18. SQL:2003 — Analytical id value bal SELECT id, value , SUM(value) 1 +10 +10 OVER ( 2 +20 +30 ORDER BY id 3 -10 +20 ROWS BETWEEN 4 +50 +70 UNBOUNDED PRECEDING 5 -30 +40 AND CURRENT ROW 6 -20 +20 ) bal FROM t

  19. SQL:2003 — Analytical 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB 5.1 10.2 8.0 MySQL PostgreSQL 8.4 SQLite 1.0 3.25.0 DB2 LUW 7.0 Oracle 8i SQL Server 2005 [0] 2012 [0] Without framing

  20. SQL:2003 — Analytical Impala Spark BigQuery Hive NuoDB 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB 5.1 10.2 8.0 MySQL PostgreSQL 8.4 SQLite 1.0 3.25.0 DB2 LUW 7.0 Oracle 8i SQL Server 2005 [0] 2012 [0] Without framing

  21. Inverse Distribution Functions (percentiles)

  22. SQL:2003 — Analytical (Median)

  23. SQL:2003 — Analytical (Median) SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = (SELECT FLOOR(COUNT(*)/2) FROM data d3)

  24. SQL:2003 — Analytical (Median) Number rows SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = (SELECT FLOOR(COUNT(*)/2) FROM data d3)

  25. SQL:2003 — Analytical (Median) Number rows SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = Pick middle one (SELECT FLOOR(COUNT(*)/2) FROM data d3)

  26. SQL:2003 — Analytical (Median) Number rows SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = Pick middle one (SELECT FLOOR(COUNT(*)/2) FROM data d3)

  27. SQL:2003 — Analytical (Median) Number rows SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = Pick middle one (SELECT FLOOR(COUNT(*)/2) FROM data d3)

  28. SQL:2003 — Analytical (Median) Notice Number rows SELECT d1.val FROM data d1 All employees must JOIN data d2 wash hands ON (d1.val < d2.val after using OR (d1.val=d2.val AND d1.id<d2.id)) self-joins GROUP BY d1.val HAVING count(*) = Pick middle one (SELECT FLOOR(COUNT(*)/2) FROM data d3)

  29. SQL:2003 — Analytical (Median) Grab stickers 
 Notice and coasters! Number rows SELECT d1.val FROM data d1 All employees must JOIN data d2 wash hands ON (d1.val < d2.val after using OR (d1.val=d2.val AND d1.id<d2.id)) self-joins GROUP BY d1.val HAVING count(*) = Pick middle one (SELECT FLOOR(COUNT(*)/2) FROM data d3)

  30. SQL:2003 — Analytical (Median)

  31. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data

  32. SQL:2003 — Analytical (Median) Median SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data

  33. SQL:2003 — Analytical (Median) Median SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) Which value? FROM data

  34. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data

  35. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Two variants: ‣ for discrete values 
 (categories) ‣ for continuous values 
 (linear interpolation)

  36. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 1 ‣ for discrete values 
 2 2 (categories) 3 3 ‣ for continuous values 
 (linear interpolation) 4 4

  37. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0 0.25 0.25 0.5 0.5 0.75 0.75 1 1 Two variants: 1 1 1 PERCENTILE_DISC ‣ for discrete values 
 2 2 2 (categories) 3 3 3 ‣ for continuous values 
 (linear interpolation) 4 4 4

  38. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0 0 0.25 0.25 0.25 0.5 0.5 0.5 0.75 0.75 0.75 1 1 1 Two variants: 1 1 1 1 PERCENTILE_DISC PERCENTILE_DISC ‣ for discrete values 
 2 2 2 2 (categories) 3 3 3 3 ‣ for continuous values 
 (linear interpolation) 4 4 4 4

  39. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0 0 0 0.25 0.25 0.25 0.25 0.5 0.5 0.5 0.5 0.75 0.75 0.75 0.75 1 1 1 1 Two variants: 1 1 1 1 1 PERCENTILE_DISC PERCENTILE_DISC PERCENTILE_DISC(0.5) ‣ for discrete values 
 2 2 2 2 2 (categories) 3 3 3 3 3 ‣ for continuous values 
 (linear interpolation) 4 4 4 4 4

  40. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0 0 0 0 0.25 0.25 0.25 0.25 0.25 0.5 0.5 0.5 0.5 0.5 0.75 0.75 0.75 0.75 0.75 1 1 1 1 1 Two variants: 1 1 1 1 1 1 PERCENTILE_DISC(0.5) PERCENTILE_DISC PERCENTILE_DISC(0.5) PERCENTILE_DISC ‣ for discrete values 
 2 2 2 2 2 2 PERCENTILE_CONT (categories) 3 3 3 3 3 3 ‣ for continuous values 
 (linear interpolation) 4 4 4 4 4 4

  41. SQL:2003 — Analytical (Median) SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0 0 0 0 0 0.25 0.25 0.25 0.25 0.25 0.25 0.5 0.5 0.5 0.5 0.5 0.5 0.75 0.75 0.75 0.75 0.75 0.75 1 1 1 1 1 1 Two variants: 1 1 1 1 1 1 1 PERCENTILE_DISC(0.5) PERCENTILE_DISC(0.5) PERCENTILE_DISC PERCENTILE_DISC PERCENTILE_DISC(0.5) ‣ for discrete values 
 2 2 2 2 2 2 2 PERCENTILE_CONT(0.5) PERCENTILE_CONT (categories) 3 3 3 3 3 3 3 ‣ for continuous values 
 (linear interpolation) 4 4 4 4 4 4 4

  42. SQL:2003 — PERCENTILE_DISC 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB 5.1 10.3.7 [0] MySQL PostgreSQL 9.4 SQLite 1.0 DB2 LUW Oracle 9iR1 SQL Server 2012 [0] [0] Only as window function ( OVER required).

  43. SQL:2003 — PERCENTILE_DISC Hive-16255 RESOLVED BigQuery Fix Version: 4.0.0 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB 5.1 10.3.7 [0] MySQL PostgreSQL 9.4 SQLite 1.0 DB2 LUW Oracle 9iR1 SQL Server 2012 [0] [0] Only as window function ( OVER required).

  44. 2003 2016

  45. SQL:2016 — JSON http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip

  46. SQL:2016 — JSON [ { "id": 42, "a1": "foo" }, { "id": 43, "a1": "bar" } ] http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip

  47. SQL:2016 — JSON [ { "id": 42, id a1 "a1": "foo" }, 42 foo { 43 bar "id": 43, "a1": "bar" } ] http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip

  48. SQL:2016 — JSON_TABLE [ SELECT * { "id": 42, FROM JSON_TABLE "a1": "foo" }, ( ? { "id": 43, , '$[*]' "a1": "bar" COLUMNS } ] ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar

  49. SQL:2016 — JSON_TABLE [ SELECT * { "id": 42, FROM JSON_TABLE "a1": "foo" }, ( ? { Bind "id": 43, , '$[*]' Parameter "a1": "bar" COLUMNS } ] ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar

  50. SQL:2016 — JSON_TABLE SQL/JSON Path [ SELECT * { ‣ Query language to "id": 42, select elements from FROM JSON_TABLE "a1": "foo" a JSON document }, ( ? { ‣ Defined in the 
 Bind "id": 43, , '$[*]' SQL standard Parameter "a1": "bar" COLUMNS } ] ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar

  51. SQL:2016 — JSON_TABLE SQL/JSON Path [ SELECT * { ‣ Query language to "id": 42, select elements from FROM JSON_TABLE "a1": "foo" a JSON document }, ( ? { ‣ Defined in the 
 Bind "id": 43, , '$[*]' SQL standard Parameter "a1": "bar" COLUMNS } ] ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar

  52. SQL:2016 — JSON_TABLE SQL/JSON Path [ SELECT * { ‣ Query language to "id": 42, select elements from FROM JSON_TABLE "a1": "foo" a JSON document }, ( ? { ‣ Defined in the 
 Bind "id": 43, , '$[*]' SQL standard Parameter "a1": "bar" COLUMNS } ] ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar

Recommend


More recommend