Distributed Storage Systems
John Leach
john@brightbox.com
twitter @johnleach
Brightbox Cloud
http://brightbox.com

Our requirements
● Brightbox has multiple zones (data centres)
● Should tolerate a zone failure
● Scale smoothly as data size grows
● Should use exciting unproven technology
● Libre software license

Brief history of file access

Scaling NFS: One disk
[Diagram: Clients, Filesystem, Server, disk]

Scaling NFS: RAID
[Diagram: Clients, Filesystem, Server, Redundant Array of Inexpensive Disks]

Scaling NFS: SAN
[Diagram: Clients, Filesystem, Server, Redundant Array of Inexpensive Disks in a NRSES (not redundant singular expensive SAN)]

Scaling NFS: Shared disk fs
[Diagram: Clients, Filesystem and Server (six of each), GFS or OCFS or ..., Redundant Array of Inexpensive Disks in a NRSES (not redundant singular expensive SAN)]

Shared disk fs: Replication
[Diagram: Clients, Filesystem and Server (six of each), GFS or OCFS or ..., Clustered LVM with mirroring, two Redundant Arrays of Inexpensive Disks, each in a not redundant singular expensive SAN]

Shared disk fs: Replication
[Diagram: Clients, Filesystem and Server (six of each), GFS or OCFS or ..., Redundant Array of Inexpensive Disks in a not redundant singular more expensive SAN]

Shared disk fs: Replication
[Diagram: Clients, Filesystem and Server (six of each), GFS or OCFS or ..., Clustered LVM with mirroring, two Redundant Arrays of Inexpensive Disks, each in a not redundant singular more expensive SAN]

Old techniques
● Hot or warm standby servers
● Expensive SAN hardware
● Shared block devices
● Moving IP addresses
● Server side replication
● Scales mostly vertically
● Manual partitioning to scale horizontally

New techniques
● Shared nothing
● Clever clients
● Automatic partitioning
● Automatic replication
● Clever stuff: DHT, Vector clocks, Paxos, MapReduce, Merkle trees, Unicorn hooves
● POSIX
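
Note: "automatic partitioning" and "automatic replication" are often built on a DHT-style consistent-hash ring. The sketch below is a minimal illustration under that assumption, not the implementation of any system in this talk; the `HashRing` class, the node names and the `vnodes` parameter are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a 128-bit integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several points on the ring, which
        # spreads keys more evenly than one point per node would.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def nodes_for(self, key: str, replicas: int = 2):
        """Walk clockwise from the key's point, collecting distinct nodes."""
        replicas = min(replicas, self._node_count)
        if replicas <= 0:
            return []
        start = bisect.bisect(self._points, _hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == replicas:
                    break
        return chosen


ring = HashRing(["node-a", "node-b", "node-c"])
# Any client holding the same node list computes the same answer,
# so no central lookup service is needed to locate the data.
print(ring.nodes_for("photos/cat.jpg", replicas=2))
```

Adding a node to such a ring only reassigns the keys that now fall on the new node's points; every other key stays where it was, which is what lets the cluster grow without a full reshuffle.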

New Problems
● Locating your data
● Ensuring consistency
● Something has to give

Brewer's CAP theorem
● Consistency
● Availability
● Partition tolerance

GlusterFS
[Diagram: Clients, Storage Cluster]

Hadoop File System
[Diagram: Clients, Name node, Storage Cluster]

Hadoop File System
● Hot failover patches in Feb
● Batch processing, not interactive
● High throughput, not low latency
● MapReduce
● Namenode SPOF
● Multi-data centre
● Consistent

MongoDB
● Document store, dynamic schema
● Async replication
● Primary server for writes
● Automatic sharding
● MapReduce
● GridFS for large files
● Multi-data centre, but not partition tolerant
● Mostly consistent

MongoDB

OpenStack Swift
[Diagram: Clients, proxies, Storage Cluster]

OpenStack Swift

Cassandra
● P2P, DHT, Gossip, Hinted Handoff
● Column orientated. Data ordered.
● Design schema for types of queries
● Very fast highly available writing
● Per request consistency. Multi-data centre
● Thrift API
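
Note: "per request consistency" means each read or write chooses how many replicas must respond. A read is guaranteed to overlap the latest acknowledged write when R + W > N, where N is the replication factor. The sketch below works through that arithmetic; the function names are hypothetical, and only the single-data-centre levels ONE, QUORUM and ALL are modelled.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[level]


def read_sees_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when the read set and write set must share at least one replica."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


# With three replicas: QUORUM/QUORUM overlaps (2 + 2 > 3),
# ONE/ONE does not (1 + 1 <= 3), ONE read after ALL write does (1 + 3 > 3).
for read_level, write_level in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "ALL")]:
    print(read_level, write_level,
          read_sees_latest_write(read_level, write_level, 3))
```

Lower levels trade that overlap guarantee for latency and availability, which is the "something has to give" choice made per request rather than per cluster.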

Riak
● Key value store.
● DHT, Gossip, Vector Clocks
● MapReduce
● Luwak for large files
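
Note: vector clocks let a store tell whether one version of a value supersedes another or whether the two were written concurrently and need reconciling. The sketch below is a generic illustration of the idea, not Riak's actual data structure; the function names and actor IDs are hypothetical.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with the actor's counter bumped by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify the relationship between two versions."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"  # siblings: neither version supersedes the other


def merge(a: dict, b: dict) -> dict:
    """Clock for a reconciled value: element-wise maximum of both clocks."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


base = increment({}, "client-x")
left = increment(base, "client-y")    # one client updates the value
right = increment(base, "client-z")   # another updates it without seeing left
print(compare(left, base))    # left supersedes base
print(compare(left, right))   # conflicting siblings
print(merge(left, right))
```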

ZooKeeper
● Paxos-like consensus protocol
● Read scales up with more servers
● Writes slow down with more servers
● Always consistent
● In-memory
● Strict ordering
● Small data
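
Note: one way to see why writes slow down as servers are added is that a majority-based consensus protocol needs a majority of the ensemble to acknowledge each write, while a read can be served by any single server. The sketch below is just that arithmetic; the function names are hypothetical.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority can still be formed."""
    return ensemble_size - majority(ensemble_size)


for size in (3, 4, 5, 7):
    print(size, "servers:",
          majority(size), "acks per write,",
          tolerated_failures(size), "failures tolerated")
```

Going from three to four servers raises the acknowledgements needed per write without tolerating any extra failure, which is why such ensembles are usually sized with odd numbers.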

Ceph
● Object store
● Full POSIX file system on top
● Paxos for cluster state
● CRUSH rather than DHT
● Multi-data centre.
● Strongly consistent, not partition tolerant
● RBD, S3-alike, plus POSIX
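
Note: the point of "CRUSH rather than DHT" is that clients compute where an object lives from the cluster description instead of looking it up. The sketch below is NOT CRUSH, which also accounts for device weights and failure-domain hierarchy. It uses rendezvous (highest-random-weight) hashing as a much simpler stand-in that shares the "compute, don't look up" property. The function names and node names are hypothetical.

```python
import hashlib


def score(obj: str, node: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{node}:{obj}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def place(obj: str, nodes, replicas: int = 2):
    """Pick the nodes with the highest scores for this object."""
    ranked = sorted(nodes, key=lambda node: score(obj, node), reverse=True)
    return ranked[:replicas]


nodes = ["osd-0", "osd-1", "osd-2", "osd-3"]
# Every client with the same node list derives the same placement.
print(place("volume-42/block-7", nodes))
```

Removing a node only affects objects that were placed on it; all other placements are unchanged because the relative ranking of the remaining nodes does not move.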

Ceph
[Diagram: Monitor Cluster, Metadata Cluster, Clients, Storage Cluster]