CS411 Database Systems
01: Introduction
Kazuhiro Minami

Welcome to CS411
• Web site: http://www.cs.illinois.edu/class/cs411
• Announcements, syllabus, policies, schedule, lectures…
• Please read the class syllabus, policies, and lecture schedule; ask if you have questions.

What makes this course particularly cool
• We learn a new data-centric way of thinking about information, typically much more abstract than before.
• People come from all over campus
• More fun than most other CS courses
  – Can build a cool DB application without being miserable

Teaching Staff: The Front End
• Hengzhi (Hanna) Zhong
• Yun Hee Lee
• Both are PhD students from the Database and Information Systems (DAIS) group

CS411 presents two perspectives on DBs
• User perspective: externals (1/2)
  – how to use a database system?
  – conceptual data modeling, relational and other data models, database schema design, relational algebra, and the SQL query language
• System perspective: internals (1/2)
  – how to design and implement a database system?
  – data representation, indexing, query optimization and processing, transaction processing, concurrency control, and crash recovery

Teaching Staff: The Back End
Kazuhiro Minami
• Research interests:
  – Information security in distributed database systems
  – Privacy in ubiquitous computing
• The past 10 years
  – PhD in CS from Dartmouth
  – 3 years at UIUC (visiting lecturer & postdoc)
  – Taught CS411 in Fall 2007 and Fall 2008

Textbook
• Textbook: Database Systems: The Complete Book (2nd edition), by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom
• Good references:
  – Database Management Systems, by Raghu Ramakrishnan and Johannes Gehrke, McGraw-Hill
  – Database System Concepts, by Abraham Silberschatz, Henry F. Korth, and S. Sudarshan, McGraw-Hill (easiest)
  – Fundamentals of Database Systems, by Ramez Elmasri and Shamkant Navathe, Addison Wesley
  – An Introduction to Database Systems, by C. J. Date, Addison Wesley

Prerequisites
• Must have data structures and algorithms background
  – CS 225 or 400 equivalent
• Good at C++ or Java
  – project will require a lot of programming
  – need C++ or Java to do a good job at talking with databases
  – you or your project group pick the language
• Knowing only C will require more work
  – more difficult to talk to databases in C

Course Format
• For all students
  – two 75-min lectures / week
  – 4 homeworks
  – project
  – a midterm and a final exam
• Graduate students do an extra project
  – survey
  – or research-oriented projects. Discuss with TA.

Lectures
• Lecture slides will be posted shortly before or after the lecture
• Lectures are important for guiding your reading of textbook (and will be covered in exams and homeworks)
  – You can ask questions too :-)
  – I plan to give in-class exercises

Homeworks
• Mostly paper-based, some may involve light programming
• Due at the beginning of class on the due date
• No late homework will be accepted

Project
• DBMS application
  – select an application that needs a database
  – build a database application from start to finish
  – build its user interface (e.g., Web interface)
• Significant amount of programming
• Will be done in stages
  – you will submit some work at the end of each stage
• Will show a demo at semester end

Project Groups
• Project will be done in groups of 3-4 students
  – learn how to work in a group: valuable skills
  – also use project group as study partners
  – people from other departments especially valuable
• Try to form groups as soon as possible
  – can start by posting requests on the class newsgroup
• There will be a deadline soon for forming groups
  – if you have not formed groups by then, we will help assign you to groups
• Grading:
  – all members receive the same project grade
  – if someone drops the course, the rest go on

Exams
• Midterm and final
• There will be a brief review before each exam
• Check dates and make sure no conflict!
  – generally no makeup exams except in exceptional cases

Tentative Grading Breakdown
• Homework: 35%
• Project: 25%
• Midterm: 15%
• Final: 25%
• Extra-project: 20% (The overall scores will be scaled proportionally.)

Office Hours
• Often the best way for asking questions and clarifications
• Will have office hours every day Monday-Friday
• See course web site for schedule

Communications
• http://www.cs.illinois.edu/class/cs411
  – “Announcements” page
• Newsgroup: class.cs411
  – check it regularly
  – announcements will appear here and at the course web site
• If you have a question/problem
  1. talk to people in your group first
  2. post your question on the newsgroup
  3. email TA
  4. go to office hours to talk to TA or instructor
Let me know if you are having trouble getting questions answered on newsgroups/email

Newsgroup: class.cs411
• Designed for you and your peers
  – to communicate and help one another
  – for questions/clarifications
  – please do not post solutions/admin-requests to the newsgroup
• TAs will monitor and try their best to help with your questions
• But not always the best way to get answers
  – TAs may not be able to answer all questions quickly
  – not good for more complex questions
  – can come to office hours or email TA

Don’t be afraid to come talk to us
So you are getting a C and don’t want to bother the TA/professor with your questions?
Do you think Marc Andreessen got all As?
Do you think Tom Siebel got all As? (maybe in CS311)
Do you think our “most successful” alums got all As?

Data Management Evolution
Jim Gray: Evolution of Data Management. IEEE Computer 29(10): 38-46 (1996).
• Manual processing: -1900
• Mechanical punched-cards: 1900-1955
• Stored-program computer: sequential record processing: 1955-1970
• Online navigational network DBs: 1965-1980
  – many applications still run today!
• Relational DBs: 1980-present
• Post-relational and the Internet: 1995-

What is a database management system (DBMS)?
System for providing efficient, convenient, and safe multi-user storage of and access to massive amounts of persistent data
Red words = key characteristics

DBMS Examples
• Most familiar use: many Web sites rely heavily on DBMS's
  – Examples?
• And many non-Web examples

Example: Banking system
• Data = information on accounts, customers, balances, current interest rates, transaction histories, etc.
• Massive: many gigabytes at a minimum for big banks, more if keep history of all transactions, even more if keep images of checks -> Far too big for memory
• Persistent: data outlives programs that operate on it

Why is multi-user access hard?
Multi-user: many people/programs accessing same db, or even same data, simultaneously -> need careful controls
Alice @ ATM1: withdraw $100 from account #002
    get balance from database;
    if balance >= 100 then balance := balance - 100;
    dispense cash;
    put new balance into database;
Bob @ ATM2: withdraw $50 from account #002
    get balance from database;
    if balance >= 50 then balance := balance - 50;
    dispense cash;
    put new balance into database;
Initial balance = 200. Final balance = ??
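
The two ATM withdrawals above can be replayed step by step to see how the final balance depends on the interleaving. This is a minimal Python sketch, not part of the original slides: the names `withdraw_steps` and `run`, the dictionary standing in for the database, and the particular schedules are all assumptions made for illustration.

```python
def withdraw_steps(db, account, amount):
    """The slide's pseudocode as a generator.

    Each `yield` marks a point where the other ATM may run.
    (Hypothetical helper; `db` is a plain dict standing in for the database.)
    """
    balance = db[account]            # get balance from database
    yield "read"
    if balance >= amount:
        balance -= amount            # balance := balance - amount
    yield "dispense"                 # dispense cash
    db[account] = balance            # put new balance into database
    yield "write"


def run(schedule):
    """Run Alice's and Bob's withdrawals in the given step order."""
    db = {"#002": 200}               # initial balance = 200
    atms = {
        "alice": withdraw_steps(db, "#002", 100),
        "bob": withdraw_steps(db, "#002", 50),
    }
    for name in schedule:
        next(atms[name])             # advance that ATM by one step
    return db["#002"]


# One after the other: both withdrawals are reflected.
serial = run(["alice"] * 3 + ["bob"] * 3)

# Both ATMs read 200 before either writes; the later write wins.
bob_writes_last = run(["alice", "bob", "alice", "bob", "alice", "bob"])
alice_writes_last = run(["alice", "bob", "alice", "bob", "bob", "alice"])

print(serial, bob_writes_last, alice_writes_last)
```

In this sketch the serial schedule leaves 50, while the two interleaved schedules leave 150 or 100 even though $150 in cash was dispensed in every case. One of the two updates is silently lost, which is why a DBMS needs careful controls on concurrent access.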

Example execution
[Figure: Alice's ATM and Bob's ATM both read the balance $200 of account #002 from the banking database. Alice's ATM dispenses $100 and writes $100; Bob's ATM dispenses $50 and writes $150. Annotation: "$100 lost!"]

How can we implement a db?
• Why don’t we just put all the data in an ordinary file, and access it via an ordinary program?
  – size limited by disk or address space
  – when system crashes we may lose data
  – file-based authorization is insufficient
• Query/update:
  – need to write a new C++/Java program for every new query
  – need to worry about performance
• Concurrency: limited protection
  – need to worry about interfering with other users
  – need to offer different views to different users (e.g., BANNER and registrar, students, professors)
• Schema change:
  – entails changing file formats
  – need to rewrite virtually all applications
DBMSs were invented to solve all these problems!

Back to the red words
• Safe:
  – from system failures
  – from malicious users
• Convenient:
  – simple commands to debit account, get balance, write statement, transfer funds, etc.
  – also unexpected queries should be easy
• Efficient:
  – don't scan the entire file to get balance of one account, get all accounts with low balances, get large transactions, etc.
  – massive data! -> DBMS's carefully tuned for performance
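
To make "simple commands" and "unexpected queries should be easy" concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. It is not from the original slides: the `accounts` table, its columns, the sample rows, and the `withdraw` helper are assumptions chosen to mirror the banking example.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("#001", 20), ("#002", 200), ("#003", 5000)],
)
conn.commit()


def withdraw(conn, account, amount):
    """Debit only if funds suffice.

    The balance check and the update are a single SQL statement, so there is
    no gap between "get balance" and "put new balance" for another user to
    slip into.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account, amount),
        )
    return cur.rowcount == 1  # True if exactly one row was debited


ok_alice = withdraw(conn, "#002", 100)
ok_bob = withdraw(conn, "#002", 50)
too_much = withdraw(conn, "#001", 500)

# "Get balance" is a one-line query; the primary key lets the DBMS look up
# a single account without the application scanning every record itself.
balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = ?", ("#002",)
).fetchone()[0]

# An "unexpected" query needs no new program, only a new SQL statement.
low = [row[0] for row in conn.execute(
    "SELECT id FROM accounts WHERE balance < 100 ORDER BY id"
)]

print(ok_alice, ok_bob, too_much, balance, low)
```

SQLite is only one small DBMS and this single-connection sketch does not exercise real concurrency, but it shows the division of labor the slides argue for: the application states what it wants, and the system handles storage, lookup, and atomic updates.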