Overview of Database Systems
CS3860 - Jay Urbain, PhD
Introduction to Database Systems
UFR – Uniform Financial Report (I think)
What is a database?
Collection of data
  – Files, notes, to-do lists, patient records, playlists, etc.
What is a Database Management System? DBMS?
My family room
Database Management System
DBMS: A Database Management System (DBMS) is a software package designed to store and manage databases.
• A database is typically a large, integrated collection of data
• Models a real-world enterprise
  – Entities (e.g., students, courses, professors)
  – Relationships (e.g., Bono is enrolled in CS3860; Urbain is teaching CS3860)
Files vs. DBMS
Why not just use files?
Why use a DBMS?
• Data independence
• Efficient access
• Reduced application development time
• Data integrity
• Data security
• Uniform data administration
• Concurrent access, recovery from crashes
• Distributability, scalability
• Aggregate functions
Why Study Databases?
• Shift from computation to information
  – At the low end: scramble to web-space, big data
  – At the high end: scientific applications, data analytics, collective intelligence, eScience, machine learning
• Shift to functional computing on large datasets
• Datasets increasing in diversity and volume
• Digital libraries, interactive video, Human Genome, the web …
• Integration of structured and unstructured data
• Polyglot persistence
Why Study Databases?
• DBMSs encompass most of CS
  – OS, languages, theory, computational complexity, data structures, algorithms, AI, multimedia, logic
IceCube Neutrino Observatory. Big science is data driven.
All of society is online.
New York DA use MEMEX Data for all trafficking investigations this year.
Data analysis in the fight against human trafficking.
Increasingly many companies see themselves as data driven.
EVEN MORE “TRADITIONAL” COMPANIES …
https://www.youtube.com/watch?v=OvfU1NpCJQQ
https://www.youtube.com/watch?v=3xGoBlI_fdg
https://www.youtube.com/watch?v=OpDIEJrog3s
THE WORLD IS INCREASINGLY DRIVEN BY DATA …
This class teaches the basics of how to use & manage data.
Big Data Landscape … Infrastructure is Changing
New tech. Same Principles.
http://www.bigdatalandscape.com/
Why should you study databases?
• Mercenary - make more $$$:
  – Startups need DB talent right away = low employee #
  – Massive industry …
• Intellectual:
  – Science: data poor to data rich
    • No idea how to handle the data!
  – Fundamental ideas to/from all of CS:
    • Systems, theory, AI, logic, stats, analysis …
Many great computer systems ideas started in DB.
Data Models
• A data model is a collection of concepts for describing data.
• The relational data model is one of the most popular models.
• Main concept: relation, basically a table with rows (records) and columns (attributes).
• Every relation has a schema, which describes the relation name, the names of the columns (or fields), and the field types.
• A schema is a description of a particular collection of data, using the given data model.
• A semantic data model is a more abstract, high-level representation that makes it easier to come up with an initial description.
Levels of Abstraction
• Many views (external schema)
  – Describe how users see the data
• Single conceptual (logical) schema
  – Defines logical structure
• Single physical schema
  – Describes the files (pages, blocks) and indexes used
Example: University Database
• Conceptual schema:
  – Students(sid: string, name: string, login: string, age: integer, gpa: real);
  – Courses(cid: string, cname: string, credit: integer);
  – Enroll(sid: string, cid: string, grade: string);
• Physical schema:
  – Relations stored as ordered/unordered files
  – Index on first column of Students
• External schema (View):
  – Course_Info(cid: string, enrollment: integer);
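The three schema levels above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the index name `students_sid_idx`, the definition of `Course_Info` as a per-course enrollment count, and all sample rows (including the `CS0000` placeholder course and the credit values) are hypothetical and not part of the slides.

```python
import sqlite3

def build_university_db():
    """Create the example schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conceptual schema: the three relations from the slide.
        CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
        CREATE TABLE Courses  (cid TEXT, cname TEXT, credit INTEGER);
        CREATE TABLE Enroll   (sid TEXT, cid TEXT, grade TEXT);

        -- Physical schema: an index on the first column of Students.
        CREATE INDEX students_sid_idx ON Students (sid);

        -- External schema: a view (assumed here to count enrollments per course).
        CREATE VIEW Course_Info AS
            SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
            FROM Courses c LEFT JOIN Enroll e ON e.cid = c.cid
            GROUP BY c.cid;

        -- Hypothetical sample rows, for illustration only.
        INSERT INTO Students VALUES ('s1', 'Bono', 'bono', 20, 3.5);
        INSERT INTO Students VALUES ('s2', 'Edge', 'edge', 21, 3.8);
        INSERT INTO Courses  VALUES ('CS3860', 'Database Systems', 4);
        INSERT INTO Courses  VALUES ('CS0000', 'Placeholder Course', 3);
        INSERT INTO Enroll   VALUES ('s1', 'CS3860', 'A');
        INSERT INTO Enroll   VALUES ('s2', 'CS3860', 'B');
    """)
    return conn

def course_info(conn):
    """Read the external view the way an end-user application would."""
    return conn.execute(
        "SELECT cid, enrollment FROM Course_Info ORDER BY cid"
    ).fetchall()
```

A caller of `course_info` sees only `cid` and `enrollment`; it never touches the base tables or the index.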
Data Independence
Applications insulated from how data is structured and stored.
• Logical data independence: Protection from changes in the logical structure (conceptual schema) of data.
• Physical data independence: Protection from changes in the physical structure of data.
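A minimal sketch of both kinds of independence, using `sqlite3`. The view name `Student_Public`, the split into `Student_Core` and `Student_Stats`, and the sample rows are assumptions made for illustration. The application query never changes while the physical and logical structure underneath it does.

```python
import sqlite3

# The only SQL the "application" knows about.
APP_QUERY = "SELECT sid, name FROM Student_Public ORDER BY sid"

def demo_data_independence():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
        INSERT INTO Students VALUES ('s1', 'Bono', 'bono', 20, 3.5);
        INSERT INTO Students VALUES ('s2', 'Edge', 'edge', 21, 3.8);
        CREATE VIEW Student_Public AS SELECT sid, name FROM Students;
    """)
    before = conn.execute(APP_QUERY).fetchall()

    # Physical change: add an index. The query text is unchanged.
    conn.execute("CREATE INDEX students_sid_idx ON Students (sid)")
    after_physical = conn.execute(APP_QUERY).fetchall()

    # Logical change: split Students into two relations and
    # redefine the view on top of the new conceptual schema.
    conn.executescript("""
        DROP VIEW Student_Public;
        CREATE TABLE Student_Core  AS SELECT sid, name, login FROM Students;
        CREATE TABLE Student_Stats AS SELECT sid, age, gpa FROM Students;
        DROP TABLE Students;
        CREATE VIEW Student_Public AS SELECT sid, name FROM Student_Core;
    """)
    after_logical = conn.execute(APP_QUERY).fetchall()

    conn.close()
    return before, after_physical, after_logical
```

All three results are the same list of rows, which is the point: the application is insulated from both changes.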
Concurrency Control
• Concurrent execution (threads) of user programs is required for good DBMS performance.
• Since disk access is frequent and slow, it is important to keep the CPU humming by working on several user programs concurrently.
• Interleaving actions of different user programs can lead to inconsistency, e.g., a check is cleared while the account balance is still being computed.
• The DBMS ensures such problems do not happen: users can pretend they are using a single-user system.
• Alternatively, other database models allow eventual consistency.
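The inconsistency that interleaving can cause is easy to reproduce in a toy model. The sketch below is not how a DBMS schedules work; it is a hypothetical, deterministic simulation in plain Python in which two programs, `T1` and `T2`, each read an account balance and then write back the balance minus a withdrawal. The schedule format and all names are assumptions.

```python
def run_schedule(schedule, initial_balance):
    """Replay (txn, action, amount) steps against a one-item 'database'."""
    db = {"balance": initial_balance}
    local = {}  # each transaction's private copy of the balance it read
    for txn, action, amount in schedule:
        if action == "read":
            local[txn] = db["balance"]
        elif action == "write":
            # Write back the value computed from this transaction's own read.
            db["balance"] = local[txn] - amount
    return db["balance"]

# T1 withdraws 30 and T2 withdraws 50 from a balance of 100.
SERIAL = [
    ("T1", "read", 0), ("T1", "write", 30),
    ("T2", "read", 0), ("T2", "write", 50),
]
INTERLEAVED = [
    ("T1", "read", 0), ("T2", "read", 0),   # both read the same old balance
    ("T1", "write", 30), ("T2", "write", 50),  # T2 overwrites T1's update
]
```

Run serially, the final balance is 20. With the interleaved schedule it is 50, because T1's withdrawal is lost. Concurrency control exists to rule out schedules like the second one.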
Transaction
• A transaction is an execution of a DB program.
• Key concept of a transaction: an atomic sequence of database actions (reads/writes).
• Each transaction, executed completely, must leave the DB in a consistent state (provided the DB was consistent when the transaction began).
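Atomicity can be sketched with `sqlite3`, where using the connection as a context manager commits on success and rolls back if an exception is raised. The `accounts` table, the helper names, and the "no negative balance" rule are assumptions chosen for illustration.

```python
import sqlite3

def make_bank():
    """Create a hypothetical two-account bank in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    with conn:
        conn.execute("INSERT INTO accounts VALUES ('a', 100)")
        conn.execute("INSERT INTO accounts VALUES ('b', 50)")
    return conn

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

def transfer(conn, src, dst, amount):
    """Move money as one atomic unit: both updates happen or neither does."""
    try:
        with conn:  # commits on normal exit, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (src_balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if src_balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False
```

A failed transfer leaves both balances exactly as they were, even though both `UPDATE` statements ran before the check failed.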
Transactions
• Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
• The DBMS does not really understand the semantics (meaning) of the data.
  – E.g., how interest is calculated on student overdue accounts.
• So, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!
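As a sketch of what "simple integrity constraints" can look like in practice, the `sqlite3` example below declares `NOT NULL`, `PRIMARY KEY`, and `CHECK` constraints and lets the DBMS reject rows that violate them. The 0.0 to 4.0 GPA range and the helper name `try_insert` are assumptions. Note what the constraints cannot express: whether a stored GPA was computed correctly is still up to the program that wrote it.

```python
import sqlite3

def make_students_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Students (
            sid  TEXT PRIMARY KEY,                      -- no duplicate ids
            name TEXT NOT NULL,                         -- a name is required
            gpa  REAL CHECK (gpa BETWEEN 0.0 AND 4.0)   -- assumed 4.0 scale
        )
    """)
    return conn

def try_insert(conn, sid, name, gpa):
    """Return True if the DBMS accepted the row, False if it rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Students VALUES (?, ?, ?)", (sid, name, gpa))
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, inserting a student with a GPA of 5.0, a missing name, or a duplicate `sid` is refused, while a well-formed row is accepted.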
Polyglot Persistence
• Using different data storage technologies to handle varying data storage needs.
• An application that talks to different databases, using each for what it is best at, to achieve an end goal.
• http://www.dummies.com/programming/big-data/engineering/big-data-and-polyglot-persistence/
• http://www.informit.com/articles/article.aspx?p=1930511
• https://martinfowler.com/bliki/PolyglotPersistence.html
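A toy sketch of the idea, assuming a hypothetical `Shop` application: short-lived shopping carts go into a key-value store (a plain Python dict stands in for one here), while completed orders go into a relational store (`sqlite3`) where they can be written transactionally and queried later. Every name and field in the example is an assumption.

```python
import json
import sqlite3

class Shop:
    """One application, two stores, each used for what it is good at."""

    def __init__(self):
        # Relational store for durable, queryable order records.
        self.orders = sqlite3.connect(":memory:")
        self.orders.execute(
            "CREATE TABLE orders ("
            "order_id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)")
        # Stand-in for a key-value store holding short-lived session data.
        self.sessions = {}

    def save_cart(self, session_id, items):
        """Store the cart as an opaque JSON value under the session key."""
        self.sessions[session_id] = json.dumps(items)

    def checkout(self, session_id, customer):
        """Move a cart out of the key-value store and record an order."""
        items = json.loads(self.sessions.pop(session_id))
        total = sum(item["price_cents"] * item["qty"] for item in items)
        with self.orders:  # commit the order as one transaction
            cur = self.orders.execute(
                "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
                (customer, total))
        return cur.lastrowid, total
```

In a real deployment the dict would be replaced by an actual key-value service; the point of the sketch is only that the application code decides which store fits which kind of data.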
A Note on DBMSs: there are many