database systems ii
play

Database Systems II Introduction CMPT 454, Simon Fraser University, - PDF document

Database Systems II Introduction CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 1 Database Systems I Recap A Database Management System (DBMS) is a software package designed to store, manage and retrieve databases. A Database


  1. Database Systems II Introduction CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 1 Database Systems I Recap A Database Management System (DBMS) is a software package designed to store, manage and retrieve databases. A Database System (DBS) consists of two components: the DBMS the database. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 2

  2. Database Systems I Recap Why use a DBS? - Logical data independence. - Physical data independence. - Efficient access. - Reduced application development time. - Data integrity and security. - Concurrent access / concurrency control. - Recovery from crashes. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 3 Database Systems I Recap A data model is a collection of concepts for describing data (a formal language !). A schema is a description of a particular collection of data (database), using the given data model. The relational data model is the most widely used model today. Main concept: relation , basically a table with rows and columns. Every relation has a schema , which describes the columns, or fields. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 4

  3. Database Systems I Recap The conceptual schema defines the logical structure of the whole database. View 1 View 2 View 3 An external schema ( view ) describes how some user Conceptual Schema sees the data (restricted Physical Schema access, derived data). The physical schema describes the storage and index structures of the database. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 5 Database Systems I Recap Relational database: a set of relations Relation: made up of 2 parts: Instance : a table , with rows and columns. #Rows = cardinality , #attributes = degree / arity. Schema : specifies name of relation, plus name and type of each attribute. e.g. Students( sid : string, name : string, login : string, age : integer, gpa : real). Can think of a relation as a set of rows or tuples (i.e., all rows are distinct). CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 6

  4. Database Systems I Recap Relational algebra: mathematical query language which forms the basis for “real” languages (e.g. SQL), and for implementation. Five basic operations: union, set-difference, selection, projection, cartesian product. Shortcuts for common operations: join, division. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 7 Database Systems I Recap SQL: the standard practical query language for relational databases. Schema modifications: create, alter, delete table. Instance modifications: insert, delete, update tuples of a table. Queries to retrieve a specified set of tuples ( what ). Queries are descriptive, which allows the DBS to find the most efficient way how to process a query. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 8

  5. Database Systems I Recap SELECT [DISTINCT] target-list relation-list FROM WHERE qualification relation-list A list of relation names (possibly with a range-variable after each name). target-list A list of attributes of relations in relation-list. qualification Comparisons (“Attr op const” or “Attr1 op Attr2”, where op is one of ) combined using AND, OR and NOT. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 9 Database Systems I Recap Semantics of an SQL query defined in terms of the following conceptual evaluation strategy. Compute the cross-product of relation-list . Selection of the tuples satisfying qualifications . Projection onto the attributes that are in target- list . If DISTINCT is specified, eliminate duplicate rows. A query optimizer will find more efficient strategies to compute the same answers . CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 10

  6. A Simple DBS Implementation Relations SQL Results Statements A B C D E A D A D CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 11 A Simple DBS Implementation Relations stored in files (ASCII) e.g., relation R is in /usr/db/R.txt Smith # 123 # CS Jones # 522 # EE . . Schema file (ASCII) in /usr/db/schema.txt R1 # A # INT # B # STR … R2 # C # STR # A # INT … CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 12

  7. A Simple DBS Implementation Sample query & select * from R # Relation R A B C SMITH 123 CS & CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 13 A Simple DBS Implementation Sample session & select * from R | LPR # & Query result sent to printer CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 14

  8. A Simple DBS Implementation Creating a new relation T & select * from R where R.A < 100 | T # & CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 15 A Simple DBS Implementation Processing single table queries To process “ select * from R where condition ”: (1) Read dictionary to get R attributes (2) Read R file. For each line: (a) Check condition (b) If OK, display CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 16

  9. A Simple DBS Implementation Processing single table queries creating a new table To process “ select * from R where condition | T ”: (1) Process select as before (2) Write results to new file T (3) Append new line to dictionary CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 17 A Simple DBS Implementation Processing multi-table queries To process “ select A,B from R,S where condition ”: (1) Read dictionary to get R,S attributes (2) Read R file, for each line: (a) Read S file, for each line: (i) Create join tuple A,B from R,S (ii) Check condition (iii) Display if OK CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 18

  10. What’s wrong with this Implementation? Tuple layout on disk e.g., - Change string from „Cat‟ to „Cats‟ and we have to rewrite the entire file - ASCII storage is expensive wastes a factor of ~256/10 of space for integers - Deletions are expensive CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 19 What’s wrong with this Implementation? Search very expensive e.g., - Cannot find tuple with given key quickly - Always have to read full relation CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 20

  11. What’s wrong with this Implementation? Inefficient query processing e.g., select * from R,S where R.A = S.A and S.B > 1000 Simple implementation has quadratic runtime complexity - Do selection first? - More efficient join? CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 21 What’s wrong with this Implementation? No buffer manager In particular, need caching No concurrency control No concept of transactions Need to enforce ACID properties No API No interaction with other DBS CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 22

  12. DBS Architecture Strategy Selector Query Parser User User Transaction Transaction Manager Concurrency Control Buffer Manager Recovery Manager Lock Table File Manager M.M. Buffer Log Statistical Indexes Data User Data System Data CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 23 Outline Database Systems II Secondary storage management disks, records and files, . . . Index structures B-trees, hash tables, multi-dimensional indexes Query execution one-pass algorithms, two-pass algorithms, index-based algorithms Query compiler parsing and preprocessing, query optimization, cost estimation CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 24

  13. Outline Database Systems II Crash recovery disk failures, stable storage, logging,… Concurrency Control correctness, locks, scheduling, … Transaction Processing logs, deadlocks, serializability,… Data Mining knowledge discovery in databases, association rules CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 25 Marking Scheme Assignments 40% paper and pencil, no programming Midterm exam 15% covering all material up to and including query optimization Final exam 45% covering all the material No alternative marking scheme CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 26

  14. Tentative Schedule October 21 other instructor or class canceled October 28 midterm exam December 2 last class December 16 final exam CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 27 References Textbook - Database Systems: The Complete Book, Garcia-Molina, Ullman, and Widom, Prentice Hall, 2008: 2nd edition - relevant sections listed in schedule on class website, study these sections in advance! Recommended book Database Management Systems, Ramakrishnan and Gehrke, McGraw Hill, 2003: 3rd edition Lecture slides - based on slides by Hector Garcia-Molina and Martin Theobald, - posted on the class website. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 28

Recommend


More recommend