introduction to sql and the relational model
play

Introduction to SQL and the Relational Model Data Boot Camp! May - PowerPoint PPT Presentation

Introduction to SQL and the Relational Model Data Boot Camp! May 20, 2014 Michael Cafarella 5/20/14 1 Relational Databases The most common kind is a relational database The software is called a Relational Database Management System


  1. Introduction to SQL and the Relational Model Data Boot Camp! May 20, 2014 Michael Cafarella 5/20/14 1

  2. Relational Databases • The most common kind is a relational database • The software is called a Relational Database Management System (RDBMS) • Oracle, IBM’s DB2, Microsoft’s SQLServer, MySQL, SQLite, etc • Your dataset is “a database”, managed by an RDBMS AID Name Country Sport 1 Mary Lou Retton USA Gymnastics 2 Jackie Joyner-Kersee USA Track 3 Michael Phelps USA Swimming 5/20/14 2

  3. Relational Databases • A relational database is a set of “relations” (aka tables) • Each relation has two parts: • Instance (a table, with rows (aka tuples, records), and columns (aka fields, attributes)) • # Rows = cardinality • # Columns = degree / arity • Schema • Relation name • Name and type for each column • E.g., Student (sid int, name varchar(128), gpa real) 5/20/14 3

  4. Instance of Athlete Relation AID Name Country Sport 1 Mary Lou Retton USA Gymnastics 2 Jackie Joyner-Kersee USA Track 3 Michael Phelps USA Swimming (aid: integer, name: string, What is the schema? country: string, sport:string) Cardinality & Degree? Cardinality = 3, Degree = 4 5/20/14 4

  5. Relational Query Languages • RDBMS do lots of things, but mainly: • Keeps data safe • Gives you a powerful query language • Queries written declaratively • In contrast to procedural methods • RDBMS is responsible for efficient evaluation • System can optimize for efficient query execution, and still ensure that the answer does not change • Most popular query language is SQL 5/20/14 5

  6. Creating Relations in SQL CREATE TABLE Athlete • Create the Athlete (aid INTEGER, relation name CHAR(30), country CHAR(20), • Type constraint enforced sport CHAR(20)) when tuples added or modified CREATE TABLE Olympics • Create the Olympics (oid INTEGER, relation year INTEGER, city CHAR(20)) CREATE TABLE Compete • Create the Compete (aid INTEGER, relation oid INTEGER) 5/20/14 6

  7. The SQL Query Language • Find all athletes from USA: AID Name Country Sport SELECT * 1 Mary Lou Retton USA Gymnastics FROM Athlete A 2 Jackie Joyner-Kersee USA Track WHERE A.country = ‘ USA ’ 3 Michael Phelps USA Swimming • Print only the names and sports: Name Sport SELECT A.name, A.sport Mary Lou Retton Gymnastics FROM Athlete A Jackie Joyner-Kersee Track WHERE A.country = ‘ USA ’ Michael Phelps Swimming 5/20/14 7

  8. Querying Multiple Relations • What does the following query compute? SELECT O.year FROM Athletes A, Olympics O, Compete C WHERE A.aid = C.aid AND O.oid = C.oid AND A.name = ‘ Michael Phelps ’ Find the years when Michael Phelps competed in the Olympics 5/20/14 8

  9. Adding & Deleting Tuples • Can insert a single tuple using: INSERT INTO Athlete (aid, name, country, sport) VALUES (4, ‘ Johann Koss ’ , ‘ Norway ’ , ‘ Speedskating ’ ) • Can delete all tuples satisfying some condition (e.g., name = Smith): DELETE FROM Athlete A WHERE A.name = ‘ Smith ’ 5/20/14 9

  10. Destroying & Altering Relations DROP TABLE Olympics Destroys the relation Olympics. (Schema information and tuples are deleted) 5/20/14 10

  11. Hands-On #1 • Go to sqlfiddle.com • In another window, go to web.eecs.umich.edu/~michjc/players.txt • Copy the text into the left-hand window and click “Build Schema” • Schema: • playerID, year, gameNum, gameID, teamID, lgID, GP, startingPos • ('ortizda01', 2012, 0, 'ALS201207100', 'BOS', 'AL', 1, 0), 5/20/14 Data Boot Camp! 11

  12. Hands-On #1 • Write queries to find: • Names of all the players in the database • All info for all players from Detroit • Names and teams of the first basemen • SELECT playerID FROM Allstars • SELECT * FROM Allstars WHERE teamID = “DET” • SELECT playerID, teamID FROM Allstars WHERE startingPos = 1 5/20/14 Data Boot Camp! 12

  13. Attributes from Basic SQL Query input relations Optional List of relations SELECT [DISTINCT] attr-list FROM relation-list Attr1 op Attr2 OPS: <, >, =, <=, >=, <> WHERE qualification Combine using AND, OR, NOT (Conceptual) Evaluation: 1. Take cross-product of relation-list 2. Select rows satisfying qualification 3. Project columns in attr-list (eliminate duplicates only if DISTINCT) 5/20/14 13

  14. Example of Basic Query • Schema: • Sailors (sid, sname, rating, age) • Boats (bid, bname, color) • Reserves (sid, bid, day) • Find the names of sailors who have reserved boat #103 SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid AND R.bid = 103 5/20/14 14

  15. Example of Basic Query Reserves Sailors sid bid day sid sname rating age 22 101 10/10 22 dustin 7 45 58 103 11/12 58 rusty 10 35 31 lubber 8 55 Reserves x Sailors sid bid day sid sname rating age 22 101 10/10 22 dustin 7 45 22 101 10/10 58 rusty 10 35 22 101 10/10 31 lubber 8 55 58 103 11/12 22 dustin 7 45 58 103 11/12 58 rusty 10 35 58 103 11/12 31 lubber 8 55 5/20/14 15

  16. Example of Basic Query SELECT DISTINCT sname FROM Sailors S, Reserves R WHERE S.sid = R.sid AND R.bid = 103 What ’ s the effect of adding DISTINCT? 5/20/14 16

  17. Another Example • Schema: • Sailors (sid, sname, rating, age) • Boats (bid, bname, color) • Reserves (sid, bid, day) • Find the colors of boats reserved by a sailor named rusty SELECT B.color FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND S.sname = ‘ rusty ’ 5/20/14 17

  18. Note on Range Variables • Needed when same relation appears twice in FROM clause SELECT S1.sname, S2.sname What does this FROM Sailors S1, Sailors S2 Query compute? WHERE S1.age > S2.age Good style to always use range variables anyway… 5/20/14 18

  19. Hands-On #2 • Go back to sqlfiddle.com; clear to restart • In another window, go to web.eecs.umich.edu/~michjc/teams.txt • Copy the text, Build Schema, etc • In addition to Allstars table, Teams table: • yearID, lgID, teamID, franchID, name, park, attendance, BPF, PPF, teamIDBR, teamIDlahman45, teamIDretro 5/20/14 Data Boot Camp! 19

  20. Hands-On #2 • Write queries to find: • Team names for all teams with attendance more than 2,000,000 • Player ID and home stadium for all Allstars • TeamID, attendance for teams that had an all- star player • SELECT name FROM Teams WHERE attendance > 2000000 • SELECT playerID, park FROM Allstars, Teams WHERE Allstars.teamID = Teams.teamID 5/20/14 Data Boot Camp! 20

  21. Hands-On #2 • Last one: • TeamID, attendance values for teams that had an all-star player • One answer: • SELECT Allstars.teamID, attendance FROM Teams, Allstars WHERE Teams.teamID = Allstars.teamID • A better answer: • SELECT DISTINCT Allstars.teamID, attendance FROM Teams, Allstars WHERE Teams.teamID = Allstars.teamID 5/20/14 Data Boot Camp! 21

  22. ORDER BY clause • Most of the time, results are unordered • You can change this with the ORDER BY clause Attribute(s) in ORDER BY clause must be in SELECT list. Find the names and ages Find the names and ages of all sailors, in increasing of all sailors, in decreasing order of age order of age SELECT S.sname, S.age SELECT S.sname, S.age FROM Sailors S FROM Sailors S ORDER BY S.age [ASC] ORDER BY S.age DESC 5/20/14 22

  23. ORDER BY clause SELECT S.sname, S.age, S.rating FROM Sailors S ORDER BY S.age ASC, S.rating DESC What does this query compute? Find the names, ages, and rankings of all sailors. Sort the result in increasing order of age. If there is a tie, sort those tuples in decreasing order of rating. 5/20/14 23

  24. Hands-On #3 • A twist: • TeamID, attendance values for teams that had an all-star player ORDERED BY ATTENDANCE • A good answer: • SELECT DISTINCT Allstars.teamID, attendance FROM Teams, Allstars WHERE Teams.teamID = Allstars.teamID ORDER BY attendance DESC 5/20/14 Data Boot Camp! 24

  25. COUNT (*) Aggregate Operators COUNT ( [ DISTINCT ] A) SUM ( [ DISTINCT ] A) SELECT COUNT (*) FROM Sailors S AVG ( [ DISTINCT ] A) MAX (A) Can use Distinct SELECT COUNT ( DISTINCT S.name) MIN (A) Can use Distinct FROM Sailors S single column SELECT AVG (S.age) SELECT AVG ( DISTINCT S.age) FROM Sailors S FROM Sailors S WHERE S.rating=10 WHERE S.rating=10 SELECT S.sname FROM Sailors S WHERE S.rating= ( SELECT MAX (S2.rating) FROM Sailors S2) 5/20/14 25

  26. Hands-On #4 • Another twist: • Average attendance for all teams AND Average attendance among teams that had an all-star player • SELECT AVG(attendance) FROM Teams • SELECT AVG(DISTINCT attendance) FROM Teams, Allstars WHERE Teams.teamID = Allstars.teamID 5/20/14 Data Boot Camp! 26

  27. GROUP BY • Conceptual evaluation • Partition data into groups according to some criterion • Evaluate the aggregate for each group Example: For each rating level, find the age of the youngest sailor SELECT MIN (S.age), S.rating How many tuples FROM Sailors S in the result? GROUP BY S.rating 5/20/14 27

Recommend


More recommend