Distributed Storage Systems



  1. Distributed Storage Systems
     John Leach, john@brightbox.com, twitter @johnleach
     Brightbox Cloud, http://brightbox.com

  2. Our requirements
     ● Brightbox has multiple zones (data centres)
     ● Should tolerate a zone failure
     ● Scale smoothly as data size grows
     ● Should use exciting unproven technology
     ● Libre software license

  3. Brief history of file access

  4. Scaling NFS: One disk
     [Diagram: Clients; Filesystem; Server; disk]

  5. Scaling NFS: RAID
     [Diagram: Clients; Filesystem; Server; Redundant Array of Inexpensive Disks]

  6. Scaling NFS: SAN
     [Diagram: Clients; Filesystem; Server; Redundant Array of Inexpensive Disks in a NRSES (not redundant singular expensive SAN)]

  7. Scaling NFS: Shared disk fs
     [Diagram: Clients; Filesystem (x6); Server (x6); GFS or OCFS or ...; Redundant Array of Inexpensive Disks in a NRSES (not redundant singular expensive SAN)]

  8. Shared disk fs: Replication
     [Diagram: Clients; Filesystem (x6); Server (x6); GFS or OCFS or ...; Clustered LVM with mirroring; two copies of: Redundant Array of Inexpensive Disks in a not redundant singular expensive SAN]

  9. Shared disk fs: Replication
     [Diagram: Clients; Filesystem (x6); Server (x6); GFS or OCFS or ...; Redundant Array of Inexpensive Disks in a not redundant singular more expensive SAN]

  10. Shared disk fs: Replication
      [Diagram: Clients; Filesystem (x6); Server (x6); GFS or OCFS or ...; Clustered LVM with mirroring; two copies of: Redundant Array of Inexpensive Disks in a not redundant singular more expensive SAN]

  11. Old techniques
      ● Hot or warm standby servers
      ● Expensive SAN hardware
      ● Shared block devices
      ● Moving IP addresses
      ● Server side replication
      ● Scales mostly vertically
      ● Manual partitioning to scale horizontally
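Manual partitioning, the last item on this slide, is often done by hashing a key modulo a fixed server count. The sketch below is only an illustration under that assumption; `server_for`, `moved_fraction` and the key names are hypothetical. It shows why the approach resists horizontal scaling: changing the server count remaps most keys.

```python
import hashlib


def server_for(key: str, server_count: int) -> int:
    """Pick a server index by hashing the key modulo the server count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % server_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose server changes when the server count changes."""
    moved = sum(
        1 for k in keys if server_for(k, old_count) != server_for(k, new_count)
    )
    return moved / len(keys)


# Made-up keys; any reasonably large set behaves similarly.
keys = [f"file-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys move when growing from 4 to 5 servers")
```

A key keeps its server only when its hash gives the same remainder modulo 4 and modulo 5, which happens for roughly one key in five, so around 80% of the data would have to be moved.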

  12. New techniques
      ● Shared nothing
      ● Clever clients
      ● Automatic partitioning
      ● Automatic replication
      ● Clever stuff: DHT, Vector clocks, PAXOS, Mapreduce, Merkle trees, Unicorn hooves
      ● POSIX
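Automatic partitioning in DHT-style systems is commonly built on consistent hashing. The following is a minimal sketch of that idea, not the implementation of any system named in these slides; the `HashRing` class, the node names and the virtual-node count of 100 are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = []  # sorted list of (point, node) tuples
        for node in nodes:
            self.add(node, vnodes)

    def add(self, node: str, vnodes: int = 100) -> None:
        # Each node owns many points so load spreads more evenly.
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def lookup(self, key: str) -> str:
        # First point at or after the key's hash, wrapping at the end.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"object-{i}" for i in range(5000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-e")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
print("all moved keys went to node-e:", all(after[k] == "node-e" for k in moved))
```

Adding a fifth node moves only the keys that the new node takes over, roughly a fifth of them, and no key moves between the existing nodes.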

  13. New problems
      ● Locating your data
      ● Ensuring consistency
      ● Something has to give

  14. Brewer's CAP theorem
      ● Consistency
      ● Availability
      ● Partition tolerance

  15. GlusterFS
      [Diagram: Clients; Storage Cluster]

  16. Hadoop File System
      [Diagram: Clients; Name node; Storage Cluster]

  17. Hadoop File System
      ● Hot failover patches in Feb
      ● Batch processing, not interactive
      ● High throughput, not low latency
      ● Map Reduce
      ● Namenode is a single point of failure (SPOF)
      ● Multi-data centre
      ● Consistent
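The slide mentions Map Reduce without detail. As a rough, single-process illustration of the programming model only (this is not Hadoop's API, and every function name here is hypothetical), a word count can be split into map, shuffle and reduce steps:

```python
from collections import defaultdict


def map_phase(document: str):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
print(counts["the"], counts["fox"])  # prints: 3 2
```

In a real cluster the map and reduce calls would run in parallel on the machines that hold the data, which fits the batch-oriented, high-throughput profile listed above.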

  18. MongoDB
      ● Document store, dynamic schema
      ● Async replication
      ● Primary server for writes
      ● Automatic sharding
      ● Map Reduce
      ● GridFS for large files
      ● Multi-datacentre, but not partition tolerant
      ● Mostly consistent
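GridFS handles large files by storing them as a sequence of smaller chunk documents. The sketch below illustrates only that chunking idea in plain Python, without a database or driver; the field names are modelled loosely on GridFS chunk documents, and the helper names and the tiny chunk size are assumptions made for the example.

```python
def split_into_chunks(file_id: str, data: bytes, chunk_size: int):
    """Split a payload into numbered chunk documents."""
    return [
        {"files_id": file_id, "n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks) -> bytes:
    """Rebuild the payload by concatenating chunks in sequence order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(range(256)) * 10          # 2560 bytes of sample data
chunks = split_into_chunks("example-file", payload, chunk_size=1000)
print(len(chunks), [len(c["data"]) for c in chunks])  # prints: 3 [1000, 1000, 560]
assert reassemble(reversed(chunks)) == payload
```

Because each chunk carries its own sequence number, chunks can be stored and fetched independently and still be put back together in order.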

  19. MongoDB

  20. OpenStack Swift
      [Diagram: Clients; proxies; Storage Cluster]

  21. OpenStack Swift

  22. Cassandra
      ● P2P, DHT, Gossip, Hinted Handoff
      ● Column orientated. Data ordered.
      ● Design schema for types of queries
      ● Very fast highly available writing
      ● Per request consistency. Multi-data centre
      ● Thrift API
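Per-request consistency means each read or write chooses how many replicas must respond. A minimal sketch of the usual quorum arithmetic follows; the helper names are hypothetical, and only the level names ONE, QUORUM and ALL are taken from Cassandra. A read is certain to see the latest acknowledged write when the read and write replica sets must overlap, that is when R + W > N.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str, n: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    w = replicas_required(write_level, n)
    r = replicas_required(read_level, n)
    return r + w > n


n = 3  # assumed replication factor
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(write_level, read_level, read_sees_latest_write(write_level, read_level, n))
```

Writing and reading at ONE is the fastest and most available combination but gives up the overlap guarantee; QUORUM on both sides restores it at the cost of waiting for more replicas.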

  23. Riak
      ● Key value store.
      ● DHT, Gossip, Vector Clocks
      ● Map reduce
      ● Luwak for large files
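Vector clocks let a store tell whether one version of a value supersedes another or whether the two were written concurrently and need reconciling. The following is a generic sketch of the technique rather than Riak's own code; the function names and actor names are hypothetical.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with the actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify the relationship between two clocks."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"


def merge(a: dict, b: dict) -> dict:
    """Clock for a value that reconciles both versions."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


v1 = increment({}, "client-x")        # first write
v2 = increment(v1, "client-y")        # y updates v1
v3 = increment(v1, "client-z")        # z also updates v1, unaware of v2
print(compare(v2, v1))                # prints: after
print(compare(v2, v3))                # prints: concurrent
```

When two versions compare as concurrent, neither can be safely discarded; the application or the store has to reconcile them and write back a value carrying the merged clock.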

  24. Zookeeper
      ● PAXOS like consensus protocol
      ● Read scales up with more servers
      ● Writes slow down with more servers
      ● Always consistent
      ● In-memory
      ● Strict ordering
      ● Small data
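One way to see why writes slow down as servers are added: a majority-based consensus protocol has to collect acknowledgements from more than half of the ensemble before a write commits. The sketch below shows only that arithmetic; the helper names are hypothetical and nothing here talks to a real ensemble.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


for size in (3, 4, 5, 7):
    print(size, majority(size), tolerated_failures(size))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
# 7 4 3
```

Going from 3 to 4 servers raises the number of acknowledgements needed without tolerating any extra failure, which is why odd ensemble sizes are the usual choice.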

  25. Ceph
      ● Object store
      ● Full POSIX file system on top
      ● PAXOS for cluster state
      ● CRUSH rather than DHT
      ● Multi-datacenter.
      ● Strongly consistent, not partition tolerant
      ● RBD, S3-alike, plus POSIX
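"CRUSH rather than DHT" means clients compute where an object lives from the object name and a description of the cluster, instead of looking it up in a table. The sketch below is not CRUSH, which also takes device weights and a hierarchy of failure domains into account. It uses rendezvous (highest random weight) hashing as a much simpler stand-in to show the computed-placement idea; `score`, `place` and the device names are made up.

```python
import hashlib


def score(obj: str, device: str) -> int:
    """Deterministic pseudo-random score for an (object, device) pair."""
    return int(hashlib.sha256(f"{obj}:{device}".encode("utf-8")).hexdigest(), 16)


def place(obj: str, devices, replicas: int = 3):
    """Choose the devices with the highest scores for this object."""
    ranked = sorted(devices, key=lambda device: score(obj, device), reverse=True)
    return ranked[:replicas]


devices = [f"device-{i}" for i in range(8)]
placement = place("volume-42", devices)
print(placement)

# Losing a device that holds a replica leaves the other two replicas in place.
survivors = place("volume-42", [d for d in devices if d != placement[0]])
print(placement[1:] == survivors[:2])  # prints: True
```

Any client with the same device list computes the same placement, so no central lookup service sits on the data path.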

  26. Ceph
      [Diagram: Monitor Cluster; Metadata Cluster; Clients; Storage Cluster]

  27. Distributed Storage Systems
      John Leach, john@brightbox.com, twitter @johnleach
      Brightbox Cloud, http://brightbox.com
