flickr.com/96dpi flickr.com/96dpi I Hate Your Database I Hate Your Database Andrew Godwin Andrew Godwin @andrewgodwin @andrewgodwin
flickr.com/parkerblohm flickr.com/parkerblohm Hate? Databases? Hate? Databases?
· Misuse · Misuse · Ignorance · Ignorance · Lies · Lies
flickr.com/maistora flickr.com/maistora Different databases, Different databases, different occasions different occasions
Graph Graph Relational Relational Object / Hierarchial Object / Hierarchial Document Document Spatial Spatial Key-Value Time-series / RRD Key-Value Time-series / RRD Search Search
PostgreSQL PostgreSQL PostgreSQL PostgreSQL Relational Relational MySQL MySQL MySQL MySQL SQLite SQLite SQLite SQLite MongoDB MongoDB Document Document CouchDB CouchDB Redis Redis Key-value Key-value Cassandra Cassandra Riak Riak
flickr.com/iamdabe flickr.com/iamdabe Some Quick Theory Some Quick Theory
Atomicity tomicity A Consistency onsistency C Isolation solation I Durability urability D
Consistency onsistency C Availability vailability A Partition Tolerance artition Tolerance P
flickr.com/maistora flickr.com/maistora Relational Databases Relational Databases
Common Pitfalls Common Pitfalls SELECT item1, item2, item3 FROM basket; SELECT item1, item2, item3 FROM basket; INSERT INTO groups (name, people) INSERT INTO groups (name, people) VALUES ('friends', 'aaron,betty,chris,damien') VALUES ('friends', 'aaron,betty,chris,damien') SELECT file_bytes FROM pictures WHERE filename = SELECT file_bytes FROM pictures WHERE filename = 'foo.jpg' 'foo.jpg'
names = set() names = set() for book in Book.objects.filter(year=2012): for book in Book.objects.filter(year=2012): names.add(book.author.name) names.add(book.author.name)
names = Author.objects.filter( names = Author.objects.filter( books__year=2012 books__year=2012 ).values_list('name', flat=True).distinct() ).values_list('name', flat=True).distinct()
MySQL MySQL No transactional DDL No transactional DDL Poor query optimiser Poor query optimiser MyISAM: Full-table locking, no transactions MyISAM: Full-table locking, no transactions Oracle Oracle Very fast for some operations Very fast for some operations
SQLite SQLite Little integrity checking (slowly being fixed) Little integrity checking (slowly being fixed) Impossible to do some table alterations Impossible to do some table alterations No concurrent access No concurrent access Very low overhead Very low overhead Very portable Very portable
PostgreSQL PostgreSQL Slow default configuration Slow default configuration Can be a little harder to learn / less familiar Can be a little harder to learn / less familiar Almost too many features Almost too many features Incredibly reliable Incredibly reliable
flickr.com/babyowls flickr.com/babyowls Document Databases Document Databases
db.insert({ db.insert({ 'name': 'Sally', 'name': 'Sally', 'tags': ['django', 'python', 'search'], 'tags': ['django', 'python', 'search'], 'addresses': [ 'addresses': [ {'type': 'jabber', 'jid': 'sally@eg.com'}, {'type': 'jabber', 'jid': 'sally@eg.com'}, {'type': 'phone', 'number': '011899981199'}, {'type': 'phone', 'number': '011899981199'}, ], ], }) }) db.find({ db.find({ 'tags': 'python', 'tags': 'python', 'addresses.type': 'jabber', 'addresses.type': 'jabber', }) })
Advantages Advantages No fixed schema No fixed schema Low barrier to entry Low barrier to entry Closer to Python datatypes Closer to Python datatypes
Problems Problems Immature (but improving) Immature (but improving) No transactions No transactions No integrity checking No integrity checking
flickr.com/zebble flickr.com/zebble Key-value stores Key-value stores
db.set('foo', 'bar) db.set('foo', 'bar) x = db.get('foo') x = db.get('foo') db.sadd('names', 'andrew') db.sadd('names', 'andrew') db.sadd('names', 'brian') db.sadd('names', 'brian') y = db.scard('names') y = db.scard('names')
Traits Traits Horizontal scaling (but with drawbacks) Horizontal scaling (but with drawbacks) Extremely fast Extremely fast Can only fetch objects by key Can only fetch objects by key Batch/map-reduce queries Batch/map-reduce queries Transactions not possible Transactions not possible
flickr.com/tusnelda flickr.com/tusnelda Other database types Other database types
Spatial Spatial Knowledge of projections useful Knowledge of projections useful Spatial indexes really speed up some problems Spatial indexes really speed up some problems Generally add-on to existing DB Generally add-on to existing DB
Filesystems Filesystems Hierarchial key-value store Hierarchial key-value store Allows multiple writers for appends Allows multiple writers for appends Supports very large files Supports very large files
Graph Databases Graph Databases Allow efficient neighbour queries Allow efficient neighbour queries Generally not much use for anything else Generally not much use for anything else
Round-Robin Database Round-Robin Database Deliberately loses old data Deliberately loses old data Useful for logging or statistics Useful for logging or statistics
flickr.com/pagedooley flickr.com/pagedooley Final Thoughts Final Thoughts
It's unlikely your data all It's unlikely your data all fits in one paradigm. fits in one paradigm.
Just buying bigger servers Just buying bigger servers goes a long way goes a long way
If it sounds too good to be true, If it sounds too good to be true, it probably is. it probably is.
flickr.com/oimax flickr.com/oimax Fin. Fin. Andrew Godwin / @andrewgodwin Andrew Godwin / @andrewgodwin
Recommend
More recommend