2 What is SQL? • Declarative – Say “what to do” rather than “how to do it” Introduction to SQL • Avoid data ‐ manipulation details needed by procedural languages – Database engine figures out “best” way to execute query • Called “query optimization” Introduction to databases • Crucial for performance: “best” can be a million times faster than “worst” CSCC43 Winter 2011 • Data independent Ryan Johnson – Decoupled from underlying data organization • Views (= precomputed queries) increase decoupling even further • Correctness always assured… performance not so much – SQL is standard and (nearly) identical among vendors • Differences often shallow, syntactical Thanks to Arnold Rosenbloom and Renee Miller for material in these slides Fairly thin wrapper around relational algebra 3 4 What does SQL look like? What does SQL *really* look like? • Query syntax ORDER BY SELECT <desired attributes> SELECT FROM <one or more tables> WHERE <predicate holds for selected tuple> HAVING GROUP BY <key columns, aggregations> GROUP BY HAVING <predicate holds for selected group> data flow ORDER BY <columns to sort> WHERE FROM R S That’s not so bad, is it? 1
5 6 Other aspects of SQL ‘FROM’ clause • Updates, transactions • Identifies the tables (relations) to query – Insert, delete, update rows – Comma ‐ separated list – Transaction management • Optional: specify joins – Consistency levels – … but often use WHERE clause instead • “Active” logic • Optional: rename table (“tuple variable”) – Triggers and constraints – Using the same table twice (else they’re ambiguous) – User ‐ defined functions, stored procedures – Nested queries (else they’re unnamed) • Data definition (sub)language (“DDL”) – Manipulate database schema – Specify, alter physical data layout We’ll come back to these later in the course 7 8 ‘FROM’ clause – examples ‘FROM’ clause – examples (cont) • Employees [AS] E • Employees NATURAL JOIN Sales => Table alias (most systems don’t require “AS” keyword) => Natural join (bug ‐ prone, use equijoin instead) • Employees, Sales • Employees E LEFT JOIN Sales S ON E.EID=S.EID => Cartesian product • Employees E JOIN Sales S => Left join • Employees E1 => Cartesian product ( no join condition given! ) JOIN Employees E2 ON E1.EID < E2.EID • Employees E JOIN Sales S ON E.EID=S.EID => Theta self ‐ join ( what does it return? ) => Equijoin 2
9 10 Gotcha: natural join in practice Gotcha: join selectivity • Uses *all* same ‐ named attributes • Consider tables R, S, T with T=Ø and this query: – May be too many (self ‐ join => interesection => no ‐ op) SELECT R.x FROM R,S,T WHERE R.x=S.x OR R.x=T.x – May be too few (almost ‐ same names => Cartesian product) • Result contains no rows! • Implicit nature reduces readability – Selection operates on pre ‐ joined tuples – Better to list explicitly all join conditions – R S T = R S Ø = Ø • Fragile under schema changes => No tuples for WHERE clause to work with! – Alternative: for loops assigning tuples to variables R, S, T – Nasty interaction of above two cases.. => Empty relation => zero iterations => empty result • Workaround? – Two coming up later Moral of the story: WHERE cannot create tuples 11 12 Explicit join ordering Scalar expressions in SQL • Literals, attributes, single ‐ valued relations • Use parentheses to group joins • Boolean expressions – e.g. (A join B) join (C join D) – Boolean T/F coerce to 1/0 in arithmetic expressions • Special ‐ purpose feature – Zero/non ‐ zero coerce to F/T in boolean expressions – Helps some (inferior) systems optimize better • Logical connectors: AND, OR, NOT – Helps align schemas for natural join • Conditionals • Recommendation: avoid = != < > <= >= <> – People are notoriously bad at optimizing things BETWEEN, [NOT] LIKE, IS [NOT] NULL, … – Optimizer usually does what it wants anyway • Operators: + ‐ * / % & | ^ … but sometimes treats explicit ordering as a constraint • Functions: math, string, date/time, etc. (more later) Similar to expressions in C, python, etc. 3
13 14 ‘SELECT’ clause ‘SELECT’ clause – examples • E.name • Identifies which attribute(s) query returns => Vanilla projection – Comma ‐ separated list • name => Determines schema of query result => Implicit relation (error if R.name and S.name exist) • Optional: extended projection • E.name [AS] ‘Employee name’ – Compute arbitrary expressions => Prettified for output (like table renaming, ‘AS’ usually not required) – Usually based on selected attributes, but not always • sum(S.value) • Optional: rename attributes => Grouping (compute sum) – “Prettify” column names for output • sum(S.value)*0.13 ‘HST’ – Disambiguate (E1.name vs. E2.name) => Computed value based on aggregate • Optional: specify groupings • 123 ‘Magic number’ – More on this later => Filler column • Optional: duplicate elimination • *, E.* – SELECT DISTINCT … => Select all attributes, all attributes from E (no projection) 15 16 ‘WHERE’ clause ‘WHERE’ clause – examples • Conditions which all returned tuples must meet • S.date > ‘01 ‐ Jan ‐ 2010’ – Arbitrary boolean expression => Simple tuple ‐ literal condition – Combine multiple expressions with AND/OR • E.EID = S.EID • Zero in on data of interest => Simple tuple ‐ tuple condition (equijoin) – Specific people, dates, places, quantities • E.EID = S.EID AND S.PID = P.PID – Things which do (or do not) correlate with other data => Conjunctive tuple ‐ tuple condition (three ‐ way equijoin) • Often used instead of JOIN • S.value < 10 OR S.value > 10000 – SELECT tables (Cartesian product, e.g. A, B) => Disjunctive tuple ‐ literal condition – Specify join condition (e.g. A.ID=B.ID) – Optimizers (usually) understand and do the right thing 4
17 18 Pattern matching Pattern matching – examples • Compare a string to a pattern • phone LIKE ‘%268 ‐ _ _ _ _’ – <attribute> LIKE <pattern> – phone numbers with exchange 268 – <attribute> NOT LIKE <pattern> – WARNING: spaces only shown for clarity • Pattern is a quoted string • last_name LIKE ‘Jo%’ % => “any string” – Jobs, Jones, Johnson, Jorgensen, etc. _ => “any character” • Dictionary.entry NOT LIKE ‘%est’ • To escape ‘%’ or ‘_’: – Ignore ‘biggest’, ‘tallest’, ‘fastest’, ‘rest’, … – LIKE ‘%x_%’ ESCAPE ‘x’ (replace ‘x’ with character of choice) => matches strings containing ‘_’ DBMS increasingly allow regular expressions 19 20 ‘ORDER BY’ clause ‘ORDER BY’ clause – examples • Each query can sort by one or more attributes • E.name – Refer to attributes by name or position in SELECT => Defaults to ascending order – Ascending (default) or descending (reverse) order • E.name ASC Equivalent to relational operator – => Explicitly ascending order • Definition of ‘sorted’ depends on data type • E.name DESC – Numbers use natural ordering – Date/time uses earlier ‐ first ordering => Explicitly descending order – NULL values are not comparable, cluster at end or beginning • CarCount DESC, CarName ASC • Strings are more complicated => Matches our car lot example from previous lecture – Intuitively, sort in “alphabetical order” – Problem: which alphabet? case sensitive? • SELECT E.name … ORDER BY 1 – Answer: user ‐ specified “collation order” => Specify attribute’s position instead of its name – Default collation: case ‐ sensitive latin (ASCII) alphabet String collation not covered in this class 5
21 22 NULL values in SQL Effect of NULL in expressions • Values allowed to be NULL • Consider x having value NULL – Explicitly stored in relations • Arithmetic: NaN Ternary logic tricks: – Result of outer joins – x*0 NULL TRUE = 1 • Possible meanings • Logic: “unknown” FALSE = 0 – Not present (homeless man’s address) – x OR FALSE NULL NULL = ½ – Unknown (Julian Assange’s address) – x OR TRUE TRUE AND = min(…) • Effect: “poison” – x AND TRUE NULL OR = max(…) – Arithmetic: unknown value takes over expression – x AND FALSE FALSE NOT = 1 ‐ x – Conditionals: ternary logic (TRUE, FALSE, UNKNOWN) – NOT x NULL – Grouping: “not present” Gotcha: x OR NOT x is unknown (why?) 23 24 Nested queries Nested queries – uses • Scary ‐ looking syntax, simple concept • Explicit join ordering – Treat one query’s output as input to another query – FROM (A join B) is a (very simple) query to run first – Inner schema determined by inner SELECT clause • Target of relation set operation • Consider the expression tree – Union, intersect, difference • One of several input relations for a larger query – Appears in FROM clause – Usually joined with other tables (or other nested queries) R vs. => FROM A, (SELECT …) B WHERE … => Explicit join ordering is a degenerate case R S S T 6
Recommend
More recommend