SQL: The Query Language, Part I COMS4037 Databases Dmitry Shkatov - PowerPoint PPT Presentation

SQL: The Query Language, Part I COMS4037 Databases Dmitry Shkatov Slides courtesy of Prof. Joe Hellerstein (UC Berkeley) R & G, 5.1-5.6

SQL: “Intergalactic Dataspeak” • Developed @IBM Research in the 1970s – System R project – Vs. Berkeley’s Quel language (Ingres project) • Commercialized/Popularized in the 1980s – IBM beaten to market by a startup called Oracle • Questioned repeatedly – 90’s: OO-DBMS (OQL, etc.) – 2000’s: XML (Xquery, Xpath, XSLT) – 2010’s: NoSQL & MapReduce • SQL keeps re-emerging as the standard – Even Hadoop, Spark etc. see lots of SQL – May not be perfect, but it is useful

SQL Pros and Cons • Declarative – Say what you want, not how to get it • Implemented widely – With varying levels of effjciency, completeness • Constrained – Not a T uring-complete language • General-purpose and feature-rich – many years of added features – extensible: callouts to other languages, data sources

Relational T erminology • Database : Set of Relations • Relation ( Table ): – Schema (description) – Instance (data satisfying the schema) • Attribute ( Column ) • Tuple ( Record, Row ) • Also: schema of database is set of schemas of its relations

Relational T ables • Schema is fjxed: – attribute names, atomic types – students(name text, gpa float, dept text) • Instance can change – a multi set of “rows” (“tuples”) • {(‘Bob Snob’, 3.3,’CS’), (‘Bob Snob’, 3.3,’CS’), (‘Mary Contrary’, 3.8, ‘CS’)}

SQL Language • T wo sublanguages: – DDL – Data Defjnition Language • Defjne and modify schema – DML – Data Manipulation Language • Queries can be written intuitively. • RDBMS responsible for effjcient evaluation. – Choose and run algorithms for declarative queries • Choice of algorithm must not afgect query answer.

Example Database I sailors boats sid sname rating age bid bname color 1 Nina red 1 Fred 7 22 2 Pinta blue 2 Jim 2 39 3 Santa red 3 Nancy 8 27 Maria reserves sid bid day 1 2 12/9/2015 2 2 13/9/2015

The SQL DDL sid sname rating age CREATE TABLE sailors ( sid SERIAL, 1 Fred 7 22 sname CHAR(20), 2 Jim 2 39 rating INTEGER, 3 Nancy 8 27 age REAL, PRIMARY KEY (sid)); CREATE TABLE boats ( bid bname color bid INTEGER, 1 Nina red bname CHAR (20), 2 Pinta blue color CHAR(10), PRIMARY KEY (bid)); 3 Santa Maria red CREATE TABLE reserves ( sid INTEGER, bid INTEGER, sid bid day day DATE, 1 2 12/9/2015 PRIMARY KEY (sid, bid, day), FOREIGN KEY (sid) REFERENCES sailors, 2 2 13/9/2015 FOREIGN KEY (bid) REFERENCES boats));

Example Database II students sid name gpa age dept gender 1 Sergey Brin 4 40 CS M 2 Danah Boyd 4 35 CS M 3 Bill Gates 1 60 CS M 4 Hillary Mason 4 35 DATASCI F 5 Mike Olson 4 50 CS M Mark Zuckerberg 4 6 30 CS M Cheryl Sandberg 4 7 47 BUSINES F S

The SQL DML sid sname rating age sailors 1 Fred 7 22 2 Jim 2 39 3 Nancy 8 27 • Find all 27-year-old sailors: SELECT * FROM sailors S WHERE S.age=27; • T o fjnd just names and ratings, replace the fjrst line: SELECT S.sname, S.rating

SQL DML 1: Basic Single-T able Queries • SELECT [DISTINCT] <column expression list> FROM <single table> [WHERE <predicate> ] [GROUP BY <column list> [HAVING <predicate> ] ] [ORDER BY <column list> ];

Basic Single-T able Queries • SELECT [DISTINCT] <column expression list> FROM <single table> [WHERE <predicate> ] [GROUP BY <column list> [HAVING <predicate> ] ] [ORDER BY <column list> ] ; • Simplest version is straightforward – Produce all tuples in the table that satisfy the predicate – Output the expressions in the SELECT list – Expression can be a column reference, or an arithmetic expression over column refs

Basic Single-T able Queries • SELECT S.name, S.gpa FROM students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate> ] ] [ORDER BY <column list> ] ; • Simplest version is straightforward – Produce all tuples in the table that satisfy the predicate – Output the expressions in the SELECT list – Expression can be a column reference, or an arithmetic expression over column refs

SELECT DISTINCT • SELECT DISTINCT S.gpa FROM students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate> ] ] [ORDER BY <column list> ] ; • DISTINCT fmag specifjes removal of duplicates before output

ORDER BY SELECT [DISTINCT] S.name, S.gpa, S.age*2 AS a2 FROM students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate> ] ] ORDER BY S.gpa, a2; • ORDER BY clause specifjes output to be sorted – Lexicographic ordering • Obviously must refer to columns in the output – Note the AS clause for naming output columns!

ORDER BY SELECT [DISTINCT] S.name, S.gpa FROM students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate> ] ] ORDER BY S.gpa DESC, S.name ASC; • Ascending order by default, but can be overriden – DESC fmag for descending, ASC for ascending – Can mix and match, lexicographically

Aggregates SELECT [DISTINCT] AVG(S.gpa) FROM students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate> ] ] [ORDER BY <column list> ] ; • Before producing output, compute a summary (a.k.a. an aggregate ) of some arithmetic expression • Produces 1 row of output – with one column in this case • Other aggregates: SUM, COUNT, MAX, MIN • Note: can use DISTINCT inside the agg function – SELECT COUNT(DISTINCT S.name) FROM Students S – vs. SELECT DISTINCT COUNT (S.name) FROM Students S;

GROUP BY SELECT [DISTINCT] S.dept, AVG(S.gpa) FROM students S [WHERE <predicate> ] GROUP BY S.dept [HAVING <predicate> ] [ORDER BY <column list> ] ; • Partition table into groups with same GROUP BY column values – Can group by a list of columns • Produce an aggregate result per group – Cardinality of output = # of distinct group values • Note: can put grouping columns in SELECT list – For aggregate queries, SELECT list can contain aggs and GROUP BY columns only! – What would it mean if we said SELECT S.name, AVG(S.gpa) above??

HAVING SELECT [DISTINCT] S.dept, AVG(S.gpa) FROM students S [WHERE <predicate> ] GROUP BY S.dept HAVING COUNT(*) > 3 [ORDER BY <column list> ] ; • The HAVING predicate is applied after grouping and aggregation – Hence can contain anything that could go in the SELECT list – I.e. aggs or GROUP BY columns • HAVING can only be used in aggregate queries • It’s an optional clause

Putting it all together • SELECT S.dept, AVG(S.gpa), COUNT(*) FROM students S WHERE S.gender = 'F' GROUP BY S.dept HAVING COUNT(*) > 2 ORDER BY S.dept;

Querying Multiple Relations SELECT S.sname FROM sailors AS S, reserves AS R WHERE S.sid=R.sid AND R.bid=2 sailors boats sid sname rating age bid bname color 1 Nina red 1 Popeye 10 22 2 Pinta blue 2 OliveOyl 11 39 reserves 3 Garfjeld 1 27 4 Bob 5 19 sid bid day 1 2 12/9/15 2 2 13/9/15 1 1 01/10/15

Join Queries • SELECT [DISTINCT] <column expression list> FROM <table1 [AS t1], ... , tableN [AS tn]> [WHERE <predicate> ] [GROUP BY <column list> [HAVING <predicate> ] ] [ORDER BY <column list> ];

Query Semantics SELECT [DISTINCT] target-list FROM relation-list WHERE qualifjcation 1. WHERE : Check conditions, discard tuples that fail. 2. SELECT : Specify desired fjelds in output. 3. DISTINCT (optional) : eliminate duplicate rows. 4. FROM : compute cross product of tables. • Note: likely a terribly ineffjcient strategy! – Query optimizer will fjnd more effjcient plans.

Conceptual SQL Evaluation SELECT [DISTINCT] target-list FROM relation-list WHERE qualifjcation GROUP BY grouping-list HAVING group-qualifjcation Eliminate Form groups GROUP By [DISTINCT] duplicates & aggregate Project away Apply selections WHERE SELECT columns (eliminate rows) Relation Eliminate FROM HAVING cross-product groups

Find names of sailors who’ve reserved at least one boat SELECT S.sname FROM sailors AS S, reserves AS R WHERE S.sid = R.sid • Would DISTINCT make a difgerence here?

Range Variables (aliases) • Needed when ambiguity could arise. – e.g., same table used multiple times in FROM (“self-join”) SELECT x.sname, x.age, y.sname, y.age FROM sailors AS x, sailors AS y WHERE x.age > y.age sailors sid sname rating age 1 Popeye 10 22 2 OliveOyl 11 39 3 Garfjeld 1 27 4 Bob 5 19

Arithmetic Expressions SELECT S.age, S.age-5 AS age1, 2*S.age AS age2 FROM sailors AS S WHERE S.sname = 'Popeye' SELECT S1.sname AS name1, S2.sname AS name2 FROM sailors AS S1, sailors AS S2 WHERE 2*S1.rating = S2.rating - 1

String Comparisons SELECT S.sname FROM sailors S WHERE S.sname LIKE 'B_% ' ‘_’ stands for any one character and ‘%’ stands for 0 or more arbitrary characters. Most DBMSs now support standard regex as well (incl. PostgreSQL)

SQL: The Query Language, Part I COMS4037 Databases Dmitry Shkatov - PowerPoint PPT Presentation

SQL: The Query Language, Part I COMS4037 Databases Dmitry Shkatov Slides courtesy of Prof. Joe Hellerstein (UC Berkeley) R & G, 5.1-5.6 SQL: Intergalactic Dataspeak Developed @IBM Research in the 1970s System R project

SQL SQL SQL = Structured Query Language Standard query language for relational

Database Systems SQL Based on slides by Feifei Li, University of Utah The SQL Query Language n

What is SQL? SQL stands for Structured Query Language SQL lets you access and manipulate

SQL , the Structured Query Language Overview Introduction DDL Commands DML Commands SQL

Introduction to SQL Chapter 3: Introduction to SQL Overview of the SQL Query Language

BASIC SQL CHAPTER 4 (6/E) CHAPTER 8 (5/E) 1 CHAPTER 4 OUTLINE SQL Data Definition and

Intermezzo: A typical database architecture 136 A typical database architecture SQL SQL SQL

This Lecture SQL The SQL language SQL, the relational model, and E/R diagrams SQL Data

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

A1 (Part 2): Injection SQL Injection SQL injection is prevalent SQL injection is impactful Why a

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Query Execuon Declarave Query (SQL) We start from

More Than a Query Language: SQL in the 21st Century Markus Winand, winand.at SQL Renaissance

Relational Algebra and SQL Chapter 5 1 Relational Query Languages Languages for describing

SQL Structured Query Language (SQL) CS 235: The language of databases Based on

Simple SQL Queries (2) Review SQL the structured query language for relational databases

Class 8: SQL, The Query Language Part I Instructor: Manos Athanassoulis

ECE 242 Data Structures Lecture 5 Queues September 18, 2009 ECE242 L5: Queues Overview

Virtual Memory - I Tevfik Ko ar University at Buffalo October 29th, 2013 1 Roadmap

MODULE 8: REPORTS IDIS Online for CDBG Entitlement Communities 1 Key IDIS Reports General

Valley Clean Energy Board Meeting November 12, 2020 Via Teleconference Item 15 2020 Local

DM550 / DM857 Introduction to Programming Peter Schneider-Kamp petersk@imada.sdu.dk

Investor Community Conference Call Q2 2008 2008 2008 2008 Risk Review Tom Flynn Executive

Efficient First-Order Algorithms for Adaptive Signal Denoising Dmitrii Ostrovskii * Zaid Harchaoui

SQL: The Query Language, Part I COMS4037 Databases Dmitry Shkatov - PowerPoint PPT Presentation

SQL: The Query Language, Part I COMS4037 Databases Dmitry Shkatov Slides courtesy of Prof. Joe Hellerstein (UC Berkeley) R & G, 5.1-5.6 SQL: Intergalactic Dataspeak Developed @IBM Research in the 1970s System R project

SQL SQL SQL = Structured Query Language Standard query language for relational

Database Systems SQL Based on slides by Feifei Li, University of Utah The SQL Query Language n

What is SQL? SQL stands for Structured Query Language SQL lets you access and manipulate

SQL , the Structured Query Language Overview Introduction DDL Commands DML Commands SQL

Introduction to SQL Chapter 3: Introduction to SQL Overview of the SQL Query Language

BASIC SQL CHAPTER 4 (6/E) CHAPTER 8 (5/E) 1 CHAPTER 4 OUTLINE SQL Data Definition and

Intermezzo: A typical database architecture 136 A typical database architecture SQL SQL SQL

This Lecture SQL The SQL language SQL, the relational model, and E/R diagrams SQL Data

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

A1 (Part 2): Injection SQL Injection SQL injection is prevalent SQL injection is impactful Why a

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Query Execu*on Declara*ve Query (SQL) We start from

More Than a Query Language: SQL in the 21st Century Markus Winand, winand.at SQL Renaissance

Relational Algebra and SQL Chapter 5 1 Relational Query Languages Languages for describing

SQL Structured Query Language (SQL) CS 235: The language of databases Based on

Simple SQL Queries (2) Review SQL the structured query language for relational databases

Class 8: SQL, The Query Language Part I Instructor: Manos Athanassoulis

ECE 242 Data Structures Lecture 5 Queues September 18, 2009 ECE242 L5: Queues Overview

Virtual Memory - I Tevfik Ko ar University at Buffalo October 29th, 2013 1 Roadmap

MODULE 8: REPORTS IDIS Online for CDBG Entitlement Communities 1 Key IDIS Reports General

Valley Clean Energy Board Meeting November 12, 2020 Via Teleconference Item 15 2020 Local

DM550 / DM857 Introduction to Programming Peter Schneider-Kamp petersk@imada.sdu.dk

Investor Community Conference Call Q2 2008 2008 2008 2008 Risk Review Tom Flynn Executive

Efficient First-Order Algorithms for Adaptive Signal Denoising Dmitrii Ostrovskii * Zaid Harchaoui

Query Execuon Declarave Query (SQL) We start from