formal semantics of sql
play

Formal Semantics of SQL (and Cypher) Paolo Guagliardo SQL - PowerPoint PPT Presentation

Formal Semantics of SQL (and Cypher) Paolo Guagliardo SQL Standard query language for relational databases $30B/year business Implemented in all major RDBMSs (free and commercial) First standardized in 1986 (ANSI) and 1987 (ISO)


  1. Formal Semantics of SQL (and Cypher) Paolo Guagliardo

  2. SQL • Standard query language for relational databases • $30B/year business • Implemented in all major RDBMSs (free and commercial) • First standardized in 1986 (ANSI) and 1987 (ISO) • Several revision afterwards (SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011, SQL:2016) “The nice thing about standards is that you have so many to choose from” — Andrew S. Tanenbaum

  3. How standard is SQL? SELECT * FROM ( SELECT R.A, R.A FROM R ) S PostgreSQL outputs a table with two columns named “ A ” Oracle throws an ERROR : reference to column “ A ” is ambiguous SELECT * FROM R WHERE EXISTS ( SELECT * FROM ( SELECT R.A, R.A FROM R ) S ) Both PostgreSQL and Oracle output R

  4. Who is right? Let’s have a look at the standard! A. If the <select list> * is simply contained in a <subquery> that is immediately contained in an <exists predicate> , then the <select list> is equivalent to a <value expression> that is an arbitrary <literal> . B. Otherwise, the <select list> * is equivalent to a <value expression> sequence in which each <value expression> is a column reference that references a column of T and each column of T is referenced exactly once . The columns are referenced in the ascending sequence of their ordinal position within T.

  5. … which means SELECT * SELECT S.A, S.A ≡ FROM ( SELECT R.A, R.A FROM ( SELECT R.A, R.A FROM R ) S FROM R ) S SELECT * FROM R SELECT R.A FROM R WHERE EXISTS ( WHERE EXISTS ( ≡ SELECT * SELECT 1 FROM ( SELECT R.A, R.A FROM ( SELECT R.A, R.A FROM R ) S FROM R ) S ) )

  6. The Need for a Formal Semantics • Avoid ambiguity of natural language • Clearly defined and not subject to interpretation • Easy to understand and implement Previous attempts • Many simplifying assumptions: no bags , no nulls • No justification of correctness

  7. A A R S 1 NULL Answer NULL SELECT R.A FROM R A EXCEPT 1 SELECT S.A FROM S SELECT R.A FROM R A WHERE R.A NOT IN ( SELECT S.A FROM S) SELECT R.A FROM R A WHERE NOT EXISTS ( 1 SELECT S.A FROM S NULL WHERE S.A=R.A )

  8. Core SQL fragment ⌧ := ( T 1 , . . . , T k ) , � := ( N 1 , . . . , N k ) , k > 0 ↵ := ( A 1 , . . . , A m ) , � 0 := ( N 0 1 , . . . , N 0 m ) , m > 0 Queries: Q := SELECT [ DISTINCT ] ( ↵ : � 0 | * ) FROM ⌧ : � WHERE ✓ | Q ( UNION | INTERSECT | EXCEPT ) [ ALL ] Q Conditions: ✓ := TRUE | ¯ t (= | 6 =) ¯ t | t IS [ NOT ] NULL | ¯ t [ NOT ] IN Q | EXISTS Q | ✓ AND ✓ | ✓ OR ✓ | NOT ✓ Essentially SQL without arithmetic, grouping and aggregation

  9. Formal Semantics: Challenges Data model • Base relations / query outputs / intermediate results • Primitive data manipulation operations Attribute references • Binding rules in subqueries • Environment collects and propagates bindings

  10. Proposed Semantics J R K D, η = R D J ⌧ : � K D, η = J ( T 1 , . . . , T k ) : ( N 1 , . . . , N k ) K D, η = N 1 . J T 1 K D, η ⇥ · · · ⇥ N k . J T k K D, η s { ⌧ : � FROM � = ¯ a 2 J ⌧ : � K D, η | J ✓ K D, η ; η ¯ a = t ✓ WHERE D, η t SELECT | s { ⇤ ⌧ : � FROM : � � 1 ⌧ : � = FROM ✓ WHERE ✓ WHERE D, η D, η t SELECT | • Fits in one page s { ↵ : � 0 ! ⌧ : � FROM ⌧ : � = ⇡ α : � 0 FROM ✓ WHERE ✓ WHERE D, η D, η t SELECT DISTINCT | t SELECT | ↵ : � 0 | ⇤ ↵ : � 0 | ⇤ 0 1 ⌧ : � ⌧ : � = " FROM FROM @ A ✓ ✓ • Non-ambiguous WHERE WHERE D, η D, η J TRUE K D, η = t ⇢ ⌘ ( A ) if t = A J t K D, η = t if t 2 C or t = NULL • Easy to understand 8 t if J t 1 K D, η = J t 2 K D, η and J t 1 K D, η 6 = NULL and J t 2 K D, η 6 = NULL < J t 1 = t 2 K D, η = f if J t 1 K D, η 6 = J t 2 K D, η and J t 1 K D, η 6 = NULL and J t 2 K D, η 6 = NULL : if J t 1 K D, η = NULL or J t 2 K D, η = NULL u ⇢ t if J t K D, η = NULL J t IS NULL K D, η = if J t K D, η 6 = NULL f • Easy to implement J t IS NOT NULL K D, η = ¬ J t IS NULL K D, η n n ^ _ J ( t 1 , . . . t n ) = ( t 0 1 , . . . , t 0 J t i = t 0 J ( t 1 , . . . t n ) 6 = ( t 0 1 , . . . , t 0 J t i 6 = t 0 n ) K D, η = i K D, η n ) K D, η = i K D, η i =1 i =1 r 2 J Q K D, η : J ¯ 8 t if 9 ¯ t = ⌫ (¯ r ) K D, η = t • Easy to modify < J ¯ r 2 J Q K D, η : J ¯ t IN Q K D, η = f if 8 ¯ t = ⌫ (¯ r ) K D, η = f r 2 J Q K D, η : J ¯ r 2 J Q K D, η : J ¯ if @ ¯ : u t = ⌫ (¯ r ) K D, η = t and 9 ¯ t = ⌫ (¯ r ) K D, η 6 = f J ¯ t NOT IN Q K D, η = ¬ J ¯ t IN Q K D, η ⇢ t if J Q K D, η 6 = ? J EXISTS Q K D, η = if J Q K D, η = ? f J ✓ 1 AND ✓ 2 K D, η = J ✓ 1 K D, η ^ J ✓ 2 K D, η J ✓ 1 OR ✓ 2 K D, η = J ✓ 1 K D, η _ J ✓ 2 K D, η J NOT ✓ K D, η = ¬ J ✓ K D, η J Q 1 UNION ALL Q 2 K D, η = J Q 1 K D, η [ J Q 2 K D, η : ` ( J Q 1 K ) J Q 1 INTERSECT ALL Q 2 K D, η = J Q 1 K D, η \ J Q 2 K D, η : ` ( J Q 1 K ) J Q 1 EXCEPT ALL Q 2 K D, η = J Q 1 K D, η � J Q 2 K D, η : ` ( J Q 1 K ) J Q 1 ? Q 2 K D, η = " � � J Q 1 ? ALL Q 2 K D, η ? 2 { UNION , INTERSECT } , J Q 1 EXCEPT Q 2 K D, η = " ( J Q 1 K D, η ) � J Q 2 K D, η : ` ( J Q 1 K )

  11. Formal Semantics: Validation • Cannot prove that semantics is correct • Provide sufficient experimental evidence • Implemented in Python • Validated on 100000+ random SQL queries

  12. Formal Semantics of Cypher • Collaboration between Neo Technology and the University of Edinburgh • Preliminary meeting in December • Legal agreements finalized recently • Neo Technology sponsors a researcher ( Nadime Francis )

  13. Challenges • Getting the (abstract) data model right • Intermediate representation (QUIL?) • Identify core fragment • Language constantly evolving • Follow the footsteps of SQL? (nulls)

Recommend


More recommend