Faculty of Computer Science Technion – Israel Institute of Technology Database Management Systems Course 236363 Lecture 1: Introduction
DATABASES 2
3 Modeling The real world: • Objects - unique • Relationships between objects – at different complexity levels • Operations: Changing aspects of objects and relationships Example: Bank • Objects- customers, branches, employees • Relationships- “customer-owns-account”, “acount- managed_at- branch • Operations: add a customer, add an account, update an account, deposit, withdrawal, balance, prediction A DBMS is charged with faithful and efficient modeling of aspects of the real world for a specific operation
4 So, What is a Database? • Simply, a persistent (cross session) repository of data – Models differ in what “data” means, especially how to separate content from structure; Tables? Graphs? Trees? Objects? Maps? • Data Base Management System (DBMS): A software system for creating, maintaining, updating, and querying the database – General purpose—not for any specific application – Interacts with a user (e.g., DBA) or an application • Challenges: – Modeling (data, languages, consistency, security) – Engineering – Efficiency & scalability
6 What Services do Databases Provide? • Centralized management of information at the conceptual and physical levels. • A “smart” query processor • Transaction Processing, ACID • Centralized access control • Centralized level of operational recovery • A language for data access • Accessing data from various platforms • A high degree of logical-physical independence
8 The IMDb Application
9 Steps in Database Setup • Requirement analysis – What information needs to be stored? How will it be used? • Conceptual database design – Define/describe/discuss the semantic modeling of data in the application • Logical database design – Translate conceptual design into a database schema • Physical database design – Translate the database schema into a a physical storage plan on available hardware (done by DBMS)
10 Faculty Example • Design a database for the faculty’s administrative assistants • Several types of entities – Student: student name, id, address – Course: name, catalogue number, lecturer – Lecturer? Faculty? Building? Academic track? • Depending on the application needs • Various relationships – Student took course (and got a grade)
11 Data Modeling Numbe ID Address Name r Lecturer Phone Student Took Course Name An Entity-Relationship Diagram (ERD)
12 More Detailed? Year Numbe Phone Name r ID Student Took Course Address Name Semester Gave Year Semester emp# Lecturer Phone ID Name Address
13 Type Inheritance? Year Numbe Name r Course Student Took Gave Year Semester Semester emp# Lecturer ISA ISA Person Phone ID Name Address
14 Relational Design Option 1: Single Table StudentCourseRegistry sID sName sAddr cNum cName cLecturer grade 1234 Avia Haifa 363 DB Anna 95 1234 Avia Haifa 319 PL Barak 82 2345 Boris Nesher 319 PL Barak 73 Advantages? Drawbacks: • Cost & Redundancy: Why should the student’s address be stored in each course she takes? • Incompleteness: What about students that do not take any courses? Course w/o students? • Harder to maintain: If a student changes address, need to update all records of relevant tuples; risk inconsistency or require more expensive controls • Harder to maintain: If we wish to add a semester column, every app will need to update its schema assumption
15 Relational Design Option 1: Single Table StudentCourseRegistry sID sName sAddr cNum cName cLecturer grade 1234 Avia Haifa 363 DB Anna 95 1234 Avia Haifa 319 PL Barak 82 2345 Boris Nesher 319 PL Barak 73 Option 2: Multiple Tables Student Course Took ID name addr number name lecturer sID cNum grade 1234 Avia Haifa 363 DB Anna 1234 363 95 2345 Boris Nesher 319 PL Barak 1234 319 82 Drawback: join required more often... 2345 319 73 How can we formalize what “goodness” means? Need to understand the connection between sID and sName, etc.
16 Integrity Constraints • Schema-level (data-independent) specifications on how records should behave beyond the relational structure – (e.g., students with the same ID have the same name, take the same courses, etc.) • DBMS guarantees that constraints are always satisfied, by disabling actions that cause violations
17 Why Schema-Level Constraints? • Maintenance: consistency assured w/o custom code • Development complexity: no reliance on consistency tests – But exceptions need to be handled • Optimization: operations may be optimized if we know that some constraints hold – (e.g., once a sought student ID is found, you can stop; you won’t find it again)
18 Which Constraints Should Hold Here? A student cannot get two grades for the same course Grade must be > 53 (check constraint) Student Course Took ID name addr number name lecturer sID cNum grade 1234 Avia Haifa 363 DB Anna 1234 363 95 2345 Boris Nesher 319 PL Barak 2345 319 73 No two tuples have Courses with the same sID is a Student.ID; the same ID (key number have the same cNum is a constraint) name (functional Course.number dependency) (referential constraint)
19 Querying: Which Courses Avia Took? S C T ID name addr number name lecturer sID cNum grade 1234 Avia Haifa 363 DB Anna 1234 363 95 2345 Boris Nesher 319 PL Barak 1234 319 82 2345 319 73 Assembly . . . QL SQL m ov $1, % r ax m ov $1, % r di SELECT C. nam e m ov $m es s a ge , % r s i FROM S, C, T m ov $13, W HERE S. nam e = ‘ Avi a’ AND S. I D = T. s I D % r dx AND T. c Num = C. num be r s ys c a l l m ov $60, % r a x Logic (RC) xor % r di , % r di { ⟨ x ⟩ | ∃ y, n, z , l , g . . . [ S( y, ' Avi a ‘ , n) ∧ C( z , x, l ) ∧ T( y, z , g) ] } Python f or s i n S: Logic Programming (Datalog) f or c i n C: Q( x) S( y, ‘ Avi a ’ , n) , C( z , x, l ) , T( y, z, g) f or t i n T: i f s . s Nam e ==‘ Avi a ’ a nd Algebra (RA) s . I D==t . s I D a nd t . c Num == c . num be r : e ( σ S. na m , I D=s I D ( S ⨉ C ⨉ T) ) ) π C. na m e =‘ Avi a ’ , num be r =c Num pr i nt c. na m e
20 What is a Query Language? • A language for specifying how desired information is retrieved/derived/inserted/deleted from the database • Usually, does not change the database schema – At least not the user-defined tables • Specialized to the database model – As opposed to a general programming language • In contrast, a Data Definition Language ( DDL ) is a language for manipulating (creating / updating / deleting) schemas
21 “Goodness” of a Query Language • Simple – Users: easier to use – DBMS: easier to implement, easier to optimize • High-level – Declare what , not program how – Users: easier, less control – DBMS: more flexibility, more responsibility • Expressive – NOT: predefined queries; YES: ops w/ composition – Users: better – DBMS: harder to implement/optimize
22 Other Data Models: XML <students> students <student id="100026"> <name>Joe Average</name> <age>21</age> student student <major>Biology</major> <results> results id name <result course="Math 101" grade="C-"/> <result course="Biology 101" grade="C+"/> <result course="Statistics 101" grade="D"/> 100026 Joe Average result result </results> </student> <student id="100078"> grade course <name>Jack Doe</name> <age>18</age> Math 101 C <major>Physics</major> <major>XML Science</major> <results> <result course="Math 101" grade="A"/> <result course="XML 101" grade="A-"/> <result course="Physics 101" grade="B+"/> <result course="XML 102" grade="A"/> </results> </student> </students>
Account Data in XML 23 <Customer CustomerNum =" 100 " Accounts=" 12 15" > <CustomerName> MacNiel, John </CustomerName> <CustomerAddress>1234 Huron Street </CustomerAddress> </Customer> <Account AccountNum ="12" CustomerIn =" 100 370"> <Balance > 12000 </Balance> </Account >
24 Bank DTD ("Schema") in XML <?xml version="1.0" encoding="UTF-8"?> <!ELEMENT CustomerName (#PCDATA)> <!ELEMENT CustomerAddress (#PCDATA)> <!ELEMENT Customer (CustomerName, CustomerAddress)> <!ATTLIST Customer CustomerNum ID #REQUIRED Accounts IDREFS #IMPLIED> <!ELEMENT Customers (Customer)*> <!ELEMENT Balance (#PCDATA)> <!ELEMENT Account (Balance)> <!ATTLIST Account AccountNum ID #REQUIRED CustomerIn IDREFS #IMPLIED> <!ELEMENT Accounts (Account)*> XML Schema – a more sophisticated mechanism, separates tags and the data types they contain
Recommend
More recommend