(R)DBMSs today DMBS Approx. market share (Microsoft Access) - PowerPoint PPT Presentation

3/3/2014 (R)DBMSs ”today” DMBS Approx. market share (Microsoft Access) Database Systems Oracle 40-50% IBM DB2 20-30% Microsoft SQL Server ~15% RDBMSs Sybase 3% The NoSQL movement (…) MySQL 1% PostgreSQL 0.5% 1 2 Oracle DB IBM DB2 • First (in-house) SQL database • First commercially sold database engine – 1974, DB2 released commercially 1982. – V2 release 1979. “At the time IBM didn't believe in the potential of Codd's ideas, leaving the implementation to a group of programmers not under Codd's supervision, who violated several fundamentals of • HUGE market share historically (~80%) Codd's relational model; the result was Structured English QUEry Language or SEQUEL.” – Still clear market leader. - Wikipedia article on DB2 • Near-monopoly on ”mainframes” I believe this • Standard incompliant explains a lot… • Towards Oracle-compliant(!) – Might makes right? – IBM and Oracle cooperate on e.g. tools. 3 4 1

3/3/2014 Microsoft SQL Server MySQL • Open Source – but owned by Oracle since 2010 • First release 1989. • Historically: fast but feature-poor – Subqueries(!) added in ”recent” release – FK constraints only with Oracle’s (InnoDB) backend • Market leading on Windows application – Not optimized for joins • Still missing features (but getting closer) platforms. – No CHECK constraints (including assertions) – No sequencing (WITH) – No INSTEAD OF triggers on views. • Code base originally by Sybase, aquired by • Big on the web: used by Wikipedia, Facebook, … Microsoft 1991. – Early support in PHP helped boost 5 6 PostgreSQL SQLite • Open Source – community development • Small-ish C library • Historically: full-featured but (relatively) slow – Embedded into application means no communication • Much faster today – and optimized for complex tasks overhead – Efficient support for joins • Stores all data as a single file on host platform. • Almost standard-compliant – Not as fast as dedicated systems. – Full constraint support – except assertions! • Not full-featured. – Full ACID transactions – Sequencing (WITH) • Prominent users: Firefox, Chrome, Opera, Skype • Prominent users: Yahoo, MySpace, Skype, … 7 8 2

3/3/2014 Feature Comparison - Queries Feature Comparison - DDL OUTER JOIN INTERSECT/EXCEPT WITH FK REF CHECK ASSERTION TRIGGER PROC/FUN Oracle Yes Yes (MINUS) Yes Oracle Yes Yes No Yes Yes IBM DB2 Yes Yes Yes IBM DB2 Yes Yes No Yes Yes MS SQL Server Yes Yes Yes MS SQL Server Yes Yes No Yes Yes MySQL Yes No No MySQL InnoDB No No Yes Yes PostgreSQL Yes Yes Yes PostgreSQL Yes Yes No Yes Yes SQLite LEFT Yes No SQLite Yes No No Yes No Assertions (schema-level constraints) are incompatible with most existing research on database optimization. No existing implementation supports them (except by using triggers to mimic). 9 10 Feature Comparison - Misc Beyond SQL? OS Support Programming support • SQL was first developed around 1970 Oracle Win, Mac, Unix, z/OS PL/SQL – Contemporaries: Forth, PL/I, Pascal, C, SmallTalk, ML, … IBM DB2 Win, Mac, Unix, z/OS, iOS SQL/PSM MS SQL Server Windows T-SQL MySQL Win, Mac, Unix, z/OS, Symbian SQL/PSM • With one prominent exception – C – these have all PostgreSQL Win, Mac, Unix, Android PL/pgSQL, SQL/PSM, lots more! been succeded by newer, cooler languages. SQLite Any N/A • Has the time finally come for SQL as well? PostgreSQL allows procedures to be written in a wide variety of languages, including Java, Python, Tcl, Perl, PHP… SQLite is a C library, so can be used with any language that has C FFI support. 11 12 3

3/3/2014 SQL injection attacks The world is a-changin’ • ”The Claremont Report”, 2008 – Big Data is coming big-time • Social networks, e-science, Web 2.0, streaming media, … – Data IS the business • Efficient data management is no longer just a cost reduction – it’s a selling point! – Rapidly expanding user base http://xkcd.com/327/ • … means rapidly expanding user needs – Architectural shifts in computing The possibility for SQL injection attacks has lead development away from literal • Storage hardware is still improving exponentially SQL, towards higher-level interfaces, tools and libraries. • Data networks, cloud computing, … 13 14 Examples of database sizes RDBMS weakness • Digg: 3 TB – just to store the up/down votes • RDBMSs typically handle ”massive” amounts of data in complex domains, with frequent small read/writes. • Facebook: 50 TB – for the private messaging – The archetypical RDBMS serves a bank. feature • Data-intensive applications don’t fit this pattern: – MASSIVE+++ amounts of data (e.g. eBay) • eBay: 2 PB data overall (2 000 000 GB) – Super-fast indexing of documents (e.g. Google) – Serving pages on high-traffic websites (e.g. Facebook) – Streaming media (e.g. Spotify) 15 16 4

3/3/2014 Column-oriented databases The ”NoSQL” movement • Typical RDBMSs are row-oriented SELECT fun, profit FROM Real_World – All data associated with a particular key is stored close together. WHERE relational = FALSE; – Allows horizontal partitioning (sharding) • NoSQL (”Not only SQL”) • Different rows stored on different distributed nodes. • Column-oriented (R)DBMSs – Originally the name of an RDBMS with a different interface language. – All data in a given column is stored close together. – Allows vertical partitioning (normalization) – Nowadays a term that encompasses a wide variety • Different columns stored on different nodes. of non-relational DBMSs (”NoRel”?). – Fast computation of aggregations, e.g. data mining. 17 18 Non-relational databases MapReduce • MapReduce framework • No data model – all data stored in files – Google originally; Hadoop (Apache), … • Operations supplied by user: • Key-Value stores – Reader :: file → [input record] – BigTable (Google), Cassandra (Apache), … – Map :: input record → <key, value> • Document stores – Reduce :: <key, [value]> → [output record] – Writer :: [output record] → file – CouchDB, MongoDB, SimpleDB, … • Everything else done behind the scenes: • Graph databases – Consistency, atomicity, distribution and parallelism, ”glue” – Neo4j, FlockDB, … • Optimized for broad data analytics • Semi-structured databases – Running simple queries over all data at once – (Native) XML databases, … 19 20 5

3/3/2014 MapReduce implementations Key-Value Stores • The ”secret” behind Google’s success • Key-Value stores is a fancy name for persistant maps (associative arrays): – Still going strong. • Hadoop (Apache) void Put(string key, byte[] data); – Open Source implementation of the MapReduce byte[] Get(string key); framework void Remove(string key); – Used by Ebay, Amazon, Last.fm, LinkedIn, Twitter, • Extremely simple interface – extremely Yahoo, Facebook internal logs (~15PB), … complex implementations. 21 22 Key-Value Stores Key-Value store implementations • Built for extreme efficiency, scalability and fault • BigTable (Google) tolerance – Sparse, distributed, multi-dimensional sorted map – Records distributed to different nodes based on key – Proprietary – used in Google’s internals: Google Reader, Google Maps, YouTube, Blogger, … – Replication of data to several nodes • Sacrifice consistency and querying power • Cassandra (Apache) – Single-record transactions – Not good for ”joins” – Originally Facebook’s PM database – now Open Source (Apache top-level project) – ”Eventual consistency” between nodes – Used by Netflix, Digg, Reddit, Spotify, … • AKA: ”why does Facebook tell me I have a notification when there isn’t anything new when I click the icon??!?” 23 24 6

3/3/2014 Semi-structured data (SSD) Document stores • More flexible than the relational model. • Roughly: Key-Value stores where the values – The type of each ”entity” is its own business. are ”documents” – Labels indicate meanings of substructures. – XML, JSON, mixed semistructured data sets • Semi-structured: it is structured, but not everything is structured the same way! • Typically incorporate a query language for the • Support for XML and XQuery in e.g. Oracle, DB2, SQL Server. document type. – See previous lecture for discussion on XML • Special case: Document databases querying. 25 26 Document store implementations Graph Databases • MongoDB • Data modeled in a graph structure – Nodes = ”entities” – Name short for ”Hu mongo us” – Properties = ”tags”, attribute values – Open source – owned by 10gen – Edges connect • Nodes to nodes (relationships) – JSON(-like) semi-structured storage • Nodes to properties (attributes) – JavaScript query language • Fast access to associative data sets – All entities that share a common property – Supports MapReduce for aggregations – Computing association paths • Apache CouchDB • Used by Twitter (FlockDB) to store user relations, … 27 28 7

(R)DBMSs today DMBS Approx. market share (Microsoft Access) - PowerPoint PPT Presentation

3/3/2014 (R)DBMSs today DMBS Approx. market share (Microsoft Access) Database Systems Oracle 40-50% IBM DB2 20-30% Microsoft SQL Server ~15% RDBMSs Sybase 3% The NoSQL movement () MySQL 1% PostgreSQL 0.5% 1 2 Oracle

MapReduce and Parallel DBMSs: Friends or Foes? Presented by Guozhang Wang DB Lunch, May 3 rd ,

(R)DBMSs today DMBS Approx. market share (Microsoft Access) Database Systems Oracle

COMP9315 18s2 DBMS Implementation ( Data structures and algorithms inside relational DBMSs )

Summary 1 Things you should know now: Basic ideas about databases and DBMSs What is a data

Analytical Query Processing Marco Serafini COMPSCI 532 Lecture 7 Announcement Midterm date

@andy_pavlo Part #1 Background Part # 2 Engineering Part # 3 Oracle Rant 3 AUTONOMOUS DBMSs

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #2: IN-MEMORY

dbms overview 1 Top-level Goals of DBMSs Provide solutions to data processing

CS 764: Topics in Database Management Systems Lecture 12: Parallel DBMSs Xiangyao Yu 10/14/2020

Database Systems Indexes DBMSs and NoSQL 1 Quiz! How costly is this operation (naive

CS 764: Topics in Database Management Systems Lecture 13: Distributed DBMSs Xiangyao Yu

@andy_pavlo Part #1 Background Part # 2 Engineering Part # 3 Oracle Rant 3 AUTONOMOUS DBMSs

What is the League Today 1 1/23/2017 What is the League Today What is the League Today 2

Social/Network/Analysis mohamed.bouguessa@uqo.ca/ 1 Web/today 2

Lecture 15 Logistics HW4 is due today HW5 posted today HW5 posted today Exam

WIEMANN LAMPHERE ARCHITECTS MONTPELIER TODAY MONTPELIER TODAY PARKING! VEHICLES ARE

Chapter 9 Cardinality Estimation How Many Rows Does a Query Yield? Cardinality Estimation

Take your Wor ord tem emplates s to o the he ne next xt le level! l! Doc ocuments

CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun

1 Automating Machine Learning and Deep Learning Workflows 2 Information Name: Mourad

Trivariate Density Revisited Steven E. Shreve Carnegie Mellon University Conference in

CLIC CLient-Informed Caching for Storage Servers Xin Liu Ashraf Aboulnaga Ken Salem Xuhui Li

Dr. Pat Selinger IBM Fellow and VP of Data Management Architecture and Technology IBM, USA DB2

SHRiMPS Status of soft interactions in SHERPA Holger Schulz (IPPP Durham) November 23, 2015

Sambuz

Useful Links

Newsletter

Mail Us