staff introduction to database systems
play

Staff Introduction to Database Systems Instructor: Dan Suciu CSE - PDF document

Staff Introduction to Database Systems Instructor: Dan Suciu CSE 444 Sieg, Room 318, suciu@cs.washington.edu Office hours: Monday, 11:30-12:30 (or by appointment) Lecture #1 TA: Yana Kadiyska September 30, 2002


  1. Staff Introduction to Database Systems • Instructor: Dan Suciu CSE 444 – Sieg, Room 318, suciu@cs.washington.edu – Office hours: Monday, 11:30-12:30 – (or by appointment) Lecture #1 • TA: Yana Kadiyska September 30, 2002 – yana@cs.washington.edu – Office hours: TBA (check mailing list) 1 2 Communications Textbook(s) • Web page: Main textbook, available at the bookstore: http://www.cs.washington.edu/444/ • Database Systems: The Complete Book , Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom • Mailing list: send email to Almost identical, and also available at the bookstore: • A First Course in Database Systems , Jeff Ullman majordomo@cs and Jennifer Widom saying: • Database Implementation , Hector Garcia-Molina, subscribe cse444 Jeff Ullman and Jennifer Widom 3 4 Other Texts Other Required Readings On reserve at the Engineering Library: There will be reading assignments from the Web: • Database Management Systems , Ramakrishnan • SQL for Web Nerds, by Philip Greenspun, – very comprehensive http://philip.greenspun.com/sql/ • Fundamentals of Database Systems , Elmasri, Navathe – very widely used • Others, especially for XML • Foundations of Databases , Abiteboul, Hull, Vianu – Mostly theory of databases For SQL, a good source of information is the • Data on the Web, Abiteboul, Buneman, Suciu MSDN library (on your Windows machine) – XML and other new/advanced stuff 5 6 1

  2. What Is a Relational Database Outline for Today’s Lecture Management System ? • Overview of database systems Database Management System = DBMS – Reading assignment for next lecture Relational DBMS = RDBMS (Wednesday): from SQL for Web Nerds, by Philip Greenspun, Introduction • A collection of files that store the data http://philip.greenspun.com/sql/ • Course Outline • A big C program written by someone else • Structure of the course that accesses and updates those files for you 7 8 Example of a Traditional Where are RDBMS used ? Database Application • Backend for traditional “database” Suppose we are building a system applications to store the information about: • Backend for large Websites • students • Backend for Web services • courses • professors • who takes what, who teaches what 9 10 Can we do it without a DBMS ? Doing it without a DBMS... Sure we can! Start by storing the data in files: • Enroll “Mary Johnson” in “CSE444”: Write a C program to do the following: students.txt courses.txt professors.txt Read ‘students.txt’ Read ‘students.txt’ Read ‘courses.txt’ Read ‘courses.txt’ Find&update the record “Mary Johnson” Find&update the record “Mary Johnson” Find&update the record “CSE444” Find&update the record “CSE444” Write “students.txt” Write “students.txt” Write “courses.txt” Write “courses.txt” Now write C or Java programs to implement specific tasks 11 12 2

  3. Problems without an DBMS... Enters a DMBS “Two tier database system” • System crashes: Read ‘students.txt’ Read ‘students.txt’ Read ‘courses.txt’ Read ‘courses.txt’ CRASH ! Find&update the record “Mary Johnson” Find&update the record “Mary Johnson” Find&update the record “CSE444” Find&update the record “CSE444” Write “students.txt” Write “students.txt” Write “courses.txt” Write “courses.txt” – What is the problem ? connection • Large data sets (say 50GB) (ODBC, JDBC) – What is the problem ? • Simultaneous access by many users – Need locks: we know them from OS, but now data on disk; and is there any fun to re-implement them ? Database server (someone else’s Applications Data files C program) 13 14 Functionality of a DBMS Functionality of a DBMS The programmer sees SQL, which has two components: Two things to remember: • Data Definition Language - DDL • Data Manipulation Language - DML • Client-server architecture – query language – Slow, cumbersome connection – But good for the data Behind the scenes the DBMS has: • It is just someone else’s C program • Query optimizer – In the beginning we may be impressed by its speed – But later we discover that it can be frustratingly slow • Query engine – We can do any particular task faster outside the DBMS • Storage management – But the DBMS is general and convenient • Transaction Management (concurrency, recovery) 15 16 How the Programmer Sees the How the Programmer Sees the DBMS DBMS • Tables: • Start with DDL to create tables : Students: Takes: ��� ���� �������� ��� ��� CREATE TABLE Students ( ����������� ������� ��������� ����������� �� ��� CREATE TABLE Students ( Name CHAR(30) ����������� ��� ���� ����������� �� ��� Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, � � SSN CHAR(9) PRIMARY KEY NOT NULL, ����������� �� ��� Category CHAR(20) Category CHAR(20) � Courses: ) . . . ) . . . ���! ����! "������! �� ���! ����#����! $���! • Continue with DML to populate tables: �� ���! %&����'��!�������! ('����! ! • Still implemented as files, but behind the scenes can INSERT INTO Students INSERT INTO Students be quite complex VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) . . . . . . . . “ data independence ” = separate logical view from physical implementation 17 18 3

  4. Transactions Transactions • A transaction = sequence of statements that • Enroll “Mary Johnson” in “CSE444”: either all succeed, or all fail BEGIN TRANSACTION; BEGIN TRANSACTION; • Transactions have the ACID properties: INSERT INTO Takes INSERT INTO Takes SELECT Students.SSN, Courses.CID SELECT Students.SSN, Courses.CID A = atomicity FROM Students, Courses FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and WHERE Students.name = ‘Mary Johnson’ and C = consistency Courses.name = ‘CSE444’ Courses.name = ‘CSE444’ I = independence -- More updates here.... -- More updates here.... D = durability IF everything-went-OK IF everything-went-OK THEN COMMIT; THEN COMMIT; ELSE ROLLBACK ELSE ROLLBACK 19 20 If system crashes, the transaction is still either committed or aborted Queries, behind the scene Queries Declarative SQL query Imperative query execution plan: • Find all courses that “Mary” takes sname SELECT C.name SELECT C.name FROM Students S, Takes T, Courses C FROM Students S, Takes T, Courses C SELECT C.name SELECT C.name WHERE S.name=“Mary” and cid=cid WHERE S.name=“Mary” and FROM Students S, Takes T, Courses C FROM Students S, Takes T, Courses C S.ssn = T.ssn and T.cid = C.cid S.ssn = T.ssn and T.cid = C.cid WHERE S.name=“Mary” and WHERE S.name=“Mary” and sid=sid S.ssn = T.ssn and T.cid = C.cid S.ssn = T.ssn and T.cid = C.cid name=“Mary” • What happens behind the scene ? Courses Students Takes – Query processor figures out how to answer the query efficiently. The optimizer chooses the best execution plan for a query 21 22 Database Systems New Trends in Databases • The big commercial database vendors: • Object-relational databases – Oracle • Main memory database systems – IBM (with DB2) bought Informix recently • XML XML XML ! – Microsoft (SQL Server) – Relational databases with XML support – Sybase – Middleware between XML and relational databases • Some free database systems (Unix) : – Native XML database systems – Postgres – Lots of research here at UW on XML and databases – Mysql – Predator • Peer to peer, stream data management – still research • In CSE444 we use SQL Server. You may use something else, but you are on your own. 23 24 4

Recommend


More recommend