What is a database system?



  1. Course introduction
     Introduction to databases
     CSCC43 Winter 2013
     Ryan Johnson
     Thanks to Arnold Rosenbloom and Renee Miller for material in these slides

     What is a database system?
     • Database: a large, integrated collection of data.
     • Models a real-world enterprise
       – Entities (teams, games)
       – Relationships (Orhan Pamuk received the Nobel Prize)
       – Constraints (at least one doctor on duty during off-hours)
       – More recently, active components (“business logic”)
     • Database Management System (DBMS): a software system designed to store, manage, and facilitate access to databases.

     In the beginning…
     • There was The Mainframe
       – Cost: millions
       – Watts: millions
       – Size: acres
       – Speed: 40kHz
       – Memory: 2kB
       – Storage: 3.5MB (tape)
     UNIVAC (1951)
     Few organizations could afford two!

     Early computing challenges
     • Time sharing
       – ~100 terminals per mainframe
       – Users share hardware
       – Want to share data, too
       SAGE (1954), SABRE (1960)
       => “The Database”
     • Bare hardware
       – No OS
       – No device drivers
       – No file system
       => File Management System
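The entities, relationships, and constraints listed under "What is a database system?" can be made concrete with a small sketch. It uses Python's built-in sqlite3 module; the table names, column names, and team names are illustrative assumptions, not part of the course material. It shows only that a DBMS can reject data that violates a declared constraint.

```python
import sqlite3

# Hypothetical schema for the "teams, games" example; every table and
# column name here is an assumption made for illustration.
SCHEMA = """
CREATE TABLE teams (
    team_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE games (
    game_id INTEGER PRIMARY KEY,
    home_id INTEGER NOT NULL REFERENCES teams(team_id),  -- relationship
    away_id INTEGER NOT NULL REFERENCES teams(team_id),  -- relationship
    CHECK (home_id <> away_id)                           -- constraint
);
"""


def make_db():
    """Create an in-memory database with two made-up teams."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO teams (team_id, name) VALUES (?, ?)",
        [(1, "Team One"), (2, "Team Two")],
    )
    conn.commit()
    return conn


def try_insert_game(conn, home_id, away_id):
    """Return True if the DBMS accepts the game, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO games (home_id, away_id) VALUES (?, ?)",
                (home_id, away_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, `try_insert_game(conn, 1, 2)` is accepted, while a team playing itself or a game against a nonexistent team is rejected by the database rather than by application code.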

  2. “The Database”
     • Abstract concept dating back to the 1950’s
       – Centralized repository for all the enterprise’s data
       – Realtime updates from many sources
       – Concurrent access by many users
       – Interactive (ad-hoc) exploration and reporting
     • Semi Automatic Ground Environment (SAGE)
       – Computer-aided tracking and interception of aircraft
       – Dozens of SAGE installations (big one in North Bay)
       – Hundreds of radar stations throughout North America
       – Thousands of operators
     Goal: all relevant information at your fingertips

     File management systems (FMS)
     • File management ca. 1935
       – File: box of punchcards
       – Metadata: label on the box
       – Ad-hoc report: no big deal
       – Hardware change: no big deal
     • File management ca. 1955
       – File: several km of magnetic tape
       – Metadata: embedded in application logic
       – Ad-hoc report: hire a couple programmers
       – Hardware change: hire a dozen programmers…
     Huge need for portability, abstraction

     Database Management System
     • File management systems meet The Database
       – Protect users from each other (isolation, consistency)
       – Protect application from data changes (at logical level)
       – Protect data from hardware changes (at physical level)
     • Split personality remains to this day
       – Theory/applications (declarative access to changing data)
       – Systems (make it run fast on ever-changing hardware)
     • Why so important?
       – Rate of change of DB applications is incredibly slow
       – d(app)/dt << d(platform)/dt
     This semester: the theory/application side

     Why study databases??
     • Shift from computation to information
       – always true for corporate computing
       – Web made this point for personal computing
       – more and more true for scientific computing
     • Need for DBMS has exploded
       – Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc.
       – Scientific: digital libraries, Human Genome project, Sloan Digital Sky Survey, physical sensors, grid physics network
     • A practical discipline spanning much of CS
       – OS, languages, theory, AI, multimedia, logic
       – Yet with a focus on real-world apps

  3. What’s the intellectual content?
     • Representing information
       – data modeling
     • Languages and systems for querying data
       – complex queries with real semantics*
       – over massive data sets
     • Concurrency control for data manipulation
       – controlling concurrent access
       – ensuring transactional semantics
     • Reliable data storage
       – maintain data semantics even if the lights go out
     * semantics: the meaning or relationship of meanings of a sign or set of signs

     Is the WWW a DBMS?
     • Fairly sophisticated search available
       – Crawler indexes pages on the web
       – Keyword-based search for pages
     • But…
       – Data is mostly unstructured and untyped
       – Search only (can’t modify, summarize, analyze, correlate, …)
       – Few (zero) guarantees of freshness, accuracy, durability, consistency
       – DBMS lurking behind most Web sites provides these functions
     • The picture is changing
       – New standards like XML can help data modeling
       – The WWW/DB boundary is blurry!

     “Search” vs. Query
     • What if you wanted to find out which actors donated to Stephen Harper’s campaign?
     • Try “actors donate to harper campaign” in your favorite search engine.
     • Stephen Harper (politician) or Hill Harper (actor)?
     • Did Harper give or receive the donation?
     • Year? Comparison with other donations?

     Is my file system a DBMS?
     • Strong shared heritage
       – Direct descendant of file management system
       – Excellent insulator against hardware changes
     • But…
       – Data is mostly unstructured and untyped
       – No concept of constraints, relationships
       – Minimal support for atomicity, isolation, consistency
     • The picture is changing
       – File systems adopting database concepts (logging, transactions)
       – Object-oriented file systems provide finer grain data model
       – The FS/DBMS boundary is blurry!
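To see what a query adds over keyword search in the "Search" vs. Query example, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the placeholder people ("Actor A", "Politician H", and so on), and the amounts are all invented for illustration. What matters is that the query states who is an actor, who gave, and who received, which a bag of keywords cannot express.

```python
import sqlite3


def build_demo_db():
    """Build a tiny in-memory database. Schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE people (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        occupation TEXT NOT NULL
    );
    CREATE TABLE donations (
        donor_id     INTEGER NOT NULL REFERENCES people(person_id),
        recipient_id INTEGER NOT NULL REFERENCES people(person_id),
        year         INTEGER NOT NULL,
        amount       INTEGER NOT NULL
    );
    """)
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", [
        (1, "Politician H", "politician"),
        (2, "Actor A", "actor"),
        (3, "Actor B", "actor"),
        (4, "Teacher C", "teacher"),
    ])
    conn.executemany("INSERT INTO donations VALUES (?, ?, ?, ?)", [
        (2, 1, 2008, 100),  # Actor A gave to Politician H
        (2, 1, 2011, 50),   # Actor A gave again
        (4, 1, 2011, 75),   # a non-actor gave to Politician H
        (1, 3, 2011, 20),   # Politician H gave to Actor B (opposite direction)
    ])
    conn.commit()
    return conn


def actors_who_donated_to(conn, recipient_name):
    """Return (actor name, total donated) pairs for one named recipient."""
    query = """
    SELECT donor.name, SUM(d.amount) AS total
    FROM donations AS d
    JOIN people AS donor     ON donor.person_id = d.donor_id
    JOIN people AS recipient ON recipient.person_id = d.recipient_id
    WHERE donor.occupation = 'actor' AND recipient.name = ?
    GROUP BY donor.person_id, donor.name
    ORDER BY donor.name
    """
    return conn.execute(query, (recipient_name,)).fetchall()
```

Calling `actors_who_donated_to(build_demo_db(), "Politician H")` returns only Actor A with a summed total. The teacher is filtered out by occupation, and Actor B is excluded because that donation went the other way. Adding a year filter or a comparison with other recipients would be a small change to the same query.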

  4. Database vs. file system
     • Thought experiment #1
       – You and your project partner are editing the same file.
       – You both save it at the same time.
       – Whose changes survive?
         A) Yours B) Partner’s C) Both D) Neither E) Who knows
     • Thought experiment #2
       – You’re updating a file when the lights go out
       – Which of your changes survive?
         A) All B) None C) All since last save D) Who knows
     • How to code against “who knows” ???
       – Very, very carefully…

     OS support for data management
     • Again, strong shared heritage
       – Another direct descendant of file management system
       – Powerful API abstractions
       – Bring your favorite programming language
       – Enforces protections on files, objects
     • But…
       – Scheduling, resource management inadequate for big data
       – Error handling: “program terminated with SIGSEGV”
       – Ad-hoc query? Hire a programmer…
       – Concurrency? Write code very, very carefully…

     DBMS vs. {OS, FS, WWW}
     • Key services missing from some or all
       – Recovery, isolation, consistency
       – Support for ad-hoc queries
       – Effective concurrency control
       – Preserve semantics across crashes, outages
     • SMOP? Simple matter of programming?
       – Not really (we’ll see this semester)
       – In fact, OS/FS often get in the way (next semester)
       – Analogy: Memory management in C++ vs. Java
     • Misquoting Greenspun’s tenth rule:
       Any sufficiently complex data processing system resembles a buggy, half-implemented, and poorly performing DBMS

     Concept: transaction
     • “Business transaction”
       – Old idea: withdraw money, reserve seats, escrow, etc.
       – Atomic: I deliver and you pay, or neither
       – Consistent: Sell each seat to only one person
       – Isolated: Doctor doesn’t talk about the patient next door
       – Durable: Sales receipt, confirmation number, etc.
     • Database transaction
       – Sequence of reads and writes to underlying data
       – Writes [appear to] take effect atomically
       – Each transaction moves the system between consistent states**
       – Transactions can’t see (or interfere with) each other
       – Once the system returns success it will not lose the data
     ** user responsible to write sane transactions
     Formalized into an entire programming model
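The "atomic" and "consistent" properties of a database transaction can be illustrated with a short sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the `transfer` helper are assumptions made up for this example. The transfer deliberately credits the destination before debiting the source, so that a failed debit shows the earlier write being undone.

```python
import sqlite3


def make_bank():
    """In-memory bank with two hypothetical accounts and a no-overdraft rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

An overdraft attempt such as `transfer(conn, "alice", "bob", 150)` returns False and leaves both balances untouched, even though the credit to bob had already been executed inside the transaction. The CHECK constraint stands in for the "consistent state" the user is responsible for defining.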

  5. Concept: concurrency control
     • Concurrent execution: key to high performance.
       – Disk accesses frequent, pretty slow
       – Keep the CPU working on several programs concurrently
     • Interleaving two programs’ actions: trouble!
       – Print statements during active account transfer
       – He and She both withdraw the last $100 from the ATM
     • DBMS ensures “anomalies” don’t arise
       – Give users/programmers illusion of a single-user system
       – Thank goodness! Don’t have to program “very, very carefully”.

     Concept: data models
     • Data model: a collection of concepts for describing data.
     • Schema: a description of a particular collection of data, using a given data model.
     • Many possible data models
       – Network, hierarchical, relational, object-oriented, …
       – The relational model is the most widely used today
     A good data model is key to data independence

     Concept: data independence
     • FMS (1950’s)
       – File, metadata management
       – Hardware abstraction layer
     • CODASYL/DBTG (1965)
       – Decouple application from schema
       – Decouple schema from physical data layout
     • Edgar Codd (1970)
       – Relational algebra
       – Move from procedural to declarative
     • Charles Bachman (1973)
       – Programmer navigates data instead of (merely) writing code
       – Move from machine-centric to data-centric programming
     • Fast forward to today
       – SQL, ODBC/JDBC, federation, web services, …
       – Data integration, cleaning, performance tuning, …
     Big Deal™… but still a work in progress

     Advantages of a DBMS
     • Data independence
     • Efficient data access
     • Data integrity & security
     • Data administration
     • Concurrent access, crash recovery
     • Reduced application development time
     • So why not use them always?
       – Expensive/complicated to set up & maintain
       – Cost & complexity must be offset by need
       – General-purpose, not suited for special-purpose tasks (e.g. text search!)
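The "He and She both withdraw the last $100" anomaly from the concurrency control slide can be replayed deterministically. The sketch below uses Python's built-in sqlite3 module; the table, the account name, and both helper functions are hypothetical. It is not real concurrency: `naive_interleaving` replays one bad schedule by hand on a single connection, and `atomic_withdraw` shows one possible fix, letting the DBMS perform the check and the write as a single statement. A production system would also rely on the DBMS's locking and isolation machinery.

```python
import sqlite3


def make_atm():
    """In-memory database holding one hypothetical joint account with $100."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts VALUES ('joint', 100)")
    conn.commit()
    return conn


def read_balance(conn):
    """Return the joint account's current balance."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE name = 'joint'"
    ).fetchone()
    return row[0]


def naive_interleaving(conn, amount):
    """Replay a bad schedule: both users read before either one writes."""
    seen_by_he = read_balance(conn)
    seen_by_she = read_balance(conn)
    dispensed = 0
    for seen in (seen_by_he, seen_by_she):
        if seen >= amount:  # each decision is based on a stale read
            conn.execute(
                "UPDATE accounts SET balance = ? WHERE name = 'joint'",
                (seen - amount,),
            )
            dispensed += amount
    conn.commit()
    return dispensed


def atomic_withdraw(conn, amount):
    """Check and debit in one statement; return True if cash may be dispensed."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE name = 'joint' AND balance >= ?",
            (amount, amount),
        )
    return cur.rowcount == 1
```

In the naive replay the machine dispenses twice the account's balance, because both decisions were made on the same stale read. With `atomic_withdraw`, the first call succeeds and the second is refused, which is the single-user illusion the slide describes.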
