what is sql
play

What is SQL? Declarative Say what to do rather than how to do it - PowerPoint PPT Presentation

2 What is SQL? Declarative Say what to do rather than how to do it Introduction to SQL Avoid data manipulation details needed by procedural languages Database engine figures out best way to execute query


  1. 2 What is SQL? • Declarative – Say “what to do” rather than “how to do it” Introduction to SQL • Avoid data ‐ manipulation details needed by procedural languages – Database engine figures out “best” way to execute query • Called “query optimization” Introduction to databases • Crucial for performance: “best” can be a million times faster than “worst” CSCC43 Winter 2011 • Data independent Ryan Johnson – Decoupled from underlying data organization • Views (= precomputed queries) increase decoupling even further • Correctness always assured… performance not so much – SQL is standard and (nearly) identical among vendors • Differences often shallow, syntactical Thanks to Arnold Rosenbloom and Renee Miller for material in these slides Fairly thin wrapper around relational algebra 3 4 What does SQL look like? What does SQL *really* look like?    • Query syntax ORDER BY SELECT <desired attributes>  SELECT FROM <one or more tables>  WHERE <predicate holds for selected tuple> HAVING GROUP BY <key columns, aggregations>  GROUP BY HAVING <predicate holds for selected group> data flow  ORDER BY <columns to sort> WHERE FROM R S That’s not so bad, is it? 1

  2. 5 6 Other aspects of SQL ‘FROM’ clause • Updates, transactions • Identifies the tables (relations) to query – Insert, delete, update rows – Comma ‐ separated list – Transaction management • Optional: specify joins – Consistency levels – … but often use WHERE clause instead • “Active” logic • Optional: rename table (“tuple variable”) – Triggers and constraints – Using the same table twice (else they’re ambiguous) – User ‐ defined functions, stored procedures – Nested queries (else they’re unnamed) • Data definition (sub)language (“DDL”) – Manipulate database schema – Specify, alter physical data layout We’ll come back to these later in the course 7 8 ‘FROM’ clause – examples ‘FROM’ clause – examples (cont) • Employees [AS] E • Employees NATURAL JOIN Sales => Table alias (most systems don’t require “AS” keyword) => Natural join (bug ‐ prone, use equijoin instead) • Employees, Sales • Employees E LEFT JOIN Sales S ON E.EID=S.EID => Cartesian product • Employees E JOIN Sales S => Left join • Employees E1 => Cartesian product ( no join condition given! ) JOIN Employees E2 ON E1.EID < E2.EID • Employees E JOIN Sales S ON E.EID=S.EID => Theta self ‐ join ( what does it return? ) => Equijoin 2

  3. 9 10 Gotcha: natural join in practice Gotcha: join selectivity • Uses *all* same ‐ named attributes • Consider tables R, S, T with T=Ø and this query: – May be too many (self ‐ join => interesection => no ‐ op) SELECT R.x FROM R,S,T WHERE R.x=S.x OR R.x=T.x – May be too few (almost ‐ same names => Cartesian product) • Result contains no rows! • Implicit nature reduces readability – Selection operates on pre ‐ joined tuples – Better to list explicitly all join conditions – R  S  T = R  S  Ø = Ø • Fragile under schema changes => No tuples for WHERE clause to work with! – Alternative: for loops assigning tuples to variables R, S, T – Nasty interaction of above two cases.. => Empty relation => zero iterations => empty result • Workaround? – Two coming up later Moral of the story: WHERE cannot create tuples 11 12 Explicit join ordering Scalar expressions in SQL • Literals, attributes, single ‐ valued relations • Use parentheses to group joins • Boolean expressions – e.g. (A join B) join (C join D) – Boolean T/F coerce to 1/0 in arithmetic expressions • Special ‐ purpose feature – Zero/non ‐ zero coerce to F/T in boolean expressions – Helps some (inferior) systems optimize better • Logical connectors: AND, OR, NOT – Helps align schemas for natural join • Conditionals • Recommendation: avoid = != < > <= >= <> – People are notoriously bad at optimizing things BETWEEN, [NOT] LIKE, IS [NOT] NULL, … – Optimizer usually does what it wants anyway • Operators: + ‐ * / % & | ^ … but sometimes treats explicit ordering as a constraint • Functions: math, string, date/time, etc. (more later) Similar to expressions in C, python, etc. 3

  4. 13 14 ‘SELECT’ clause ‘SELECT’ clause – examples • E.name • Identifies which attribute(s) query returns => Vanilla projection – Comma ‐ separated list • name => Determines schema of query result => Implicit relation (error if R.name and S.name exist) • Optional: extended projection • E.name [AS] ‘Employee name’ – Compute arbitrary expressions => Prettified for output (like table renaming, ‘AS’ usually not required) – Usually based on selected attributes, but not always • sum(S.value) • Optional: rename attributes => Grouping (compute sum) – “Prettify” column names for output • sum(S.value)*0.13 ‘HST’ – Disambiguate (E1.name vs. E2.name) => Computed value based on aggregate • Optional: specify groupings • 123 ‘Magic number’ – More on this later => Filler column • Optional: duplicate elimination • *, E.* – SELECT DISTINCT … => Select all attributes, all attributes from E (no projection) 15 16 ‘WHERE’ clause ‘WHERE’ clause – examples • Conditions which all returned tuples must meet • S.date > ‘01 ‐ Jan ‐ 2010’ – Arbitrary boolean expression => Simple tuple ‐ literal condition – Combine multiple expressions with AND/OR • E.EID = S.EID • Zero in on data of interest => Simple tuple ‐ tuple condition (equijoin) – Specific people, dates, places, quantities • E.EID = S.EID AND S.PID = P.PID – Things which do (or do not) correlate with other data => Conjunctive tuple ‐ tuple condition (three ‐ way equijoin) • Often used instead of JOIN • S.value < 10 OR S.value > 10000 – SELECT tables (Cartesian product, e.g. A, B) => Disjunctive tuple ‐ literal condition – Specify join condition (e.g. A.ID=B.ID) – Optimizers (usually) understand and do the right thing 4

  5. 17 18 Pattern matching Pattern matching – examples • Compare a string to a pattern • phone LIKE ‘%268 ‐ _ _ _ _’ – <attribute> LIKE <pattern> – phone numbers with exchange 268 – <attribute> NOT LIKE <pattern> – WARNING: spaces only shown for clarity • Pattern is a quoted string • last_name LIKE ‘Jo%’ % => “any string” – Jobs, Jones, Johnson, Jorgensen, etc. _ => “any character” • Dictionary.entry NOT LIKE ‘%est’ • To escape ‘%’ or ‘_’: – Ignore ‘biggest’, ‘tallest’, ‘fastest’, ‘rest’, … – LIKE ‘%x_%’ ESCAPE ‘x’ (replace ‘x’ with character of choice) => matches strings containing ‘_’ DBMS increasingly allow regular expressions 19 20 ‘ORDER BY’ clause ‘ORDER BY’ clause – examples • Each query can sort by one or more attributes • E.name – Refer to attributes by name or position in SELECT => Defaults to ascending order – Ascending (default) or descending (reverse) order • E.name ASC Equivalent to relational operator  – => Explicitly ascending order • Definition of ‘sorted’ depends on data type • E.name DESC – Numbers use natural ordering – Date/time uses earlier ‐ first ordering => Explicitly descending order – NULL values are not comparable, cluster at end or beginning • CarCount DESC, CarName ASC • Strings are more complicated => Matches our car lot example from previous lecture – Intuitively, sort in “alphabetical order” – Problem: which alphabet? case sensitive? • SELECT E.name … ORDER BY 1 – Answer: user ‐ specified “collation order” => Specify attribute’s position instead of its name – Default collation: case ‐ sensitive latin (ASCII) alphabet String collation not covered in this class 5

  6. 21 22 NULL values in SQL Effect of NULL in expressions • Values allowed to be NULL • Consider x having value NULL – Explicitly stored in relations • Arithmetic: NaN Ternary logic tricks: – Result of outer joins – x*0 NULL TRUE  = 1 • Possible meanings • Logic: “unknown” FALSE  = 0 – Not present (homeless man’s address) – x OR FALSE NULL NULL  = ½ – Unknown (Julian Assange’s address) – x OR TRUE TRUE AND = min(…) • Effect: “poison” – x AND TRUE NULL OR = max(…) – Arithmetic: unknown value takes over expression – x AND FALSE FALSE NOT = 1 ‐ x – Conditionals: ternary logic (TRUE, FALSE, UNKNOWN) – NOT x NULL – Grouping: “not present” Gotcha: x OR NOT x is unknown (why?) 23 24 Nested queries Nested queries – uses • Scary ‐ looking syntax, simple concept • Explicit join ordering – Treat one query’s output as input to another query – FROM (A join B) is a (very simple) query to run first – Inner schema determined by inner SELECT clause • Target of relation set operation • Consider the expression tree  – Union, intersect, difference  • One of several input relations for a larger query  – Appears in FROM clause – Usually joined with other tables (or other nested queries)   R vs. => FROM A, (SELECT …) B WHERE …  => Explicit join ordering is a degenerate case R S S T 6

Recommend


More recommend