  10. PRESLAV LE • At Dropbox since 2013 • Projects: Magic Pocket, Infrastructure Performance, Traffic team

  13. 2012 [Architecture diagram. Regions: Dropbox’s datacenters, AWS. Components: clients, LB, nginx, metaserver, blockserver, notification server, async processing, Memcached, DB, S3]

  14. BLOCK DATA IN S3 [Same architecture diagram, highlighting AWS]

  15. METADATA IN MYSQL [Same architecture diagram, highlighting Dropbox’s datacenters]

  16. 1. FETCH METADATA [Same architecture diagram, highlighting clients, LB, metaserver, Memcached, DB]

  17. 2. DOWNLOAD BLOCKS [Same architecture diagram, highlighting clients, LB, blockserver, S3]

  18. 3. WAIT FOR NOTIFICATIONS [Same architecture diagram, highlighting clients, metaserver, notification server]
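
The three client steps on slides 16 to 18 (fetch metadata, download blocks, wait for notifications) can be read as a loop. The following is a minimal sketch of that loop, not Dropbox's client code: `FakeBackend`, `sync_once`, and `run_client` are hypothetical names, and a single in-memory object stands in for the metaserver, blockserver, and notification server.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBackend:
    """Hypothetical in-memory stand-in for the three server roles."""
    metadata: dict = field(default_factory=dict)  # path -> list of block hashes
    blocks: dict = field(default_factory=dict)    # block hash -> bytes
    pending: list = field(default_factory=list)   # queued change notifications

    def fetch_metadata(self):
        # Step 1: the metaserver role returns file -> block list.
        return dict(self.metadata)

    def download_block(self, block_hash):
        # Step 2: the blockserver role returns the bytes of one block.
        return self.blocks[block_hash]

    def wait_for_notification(self):
        # Step 3: a real client would long-poll; here we pop a queued event
        # and return None when nothing is pending.
        return self.pending.pop(0) if self.pending else None


def sync_once(backend, local_files):
    """Fetch metadata, then download and assemble the blocks of each file."""
    for path, block_hashes in backend.fetch_metadata().items():
        local_files[path] = b"".join(
            backend.download_block(h) for h in block_hashes
        )


def run_client(backend, local_files):
    """Sync, then re-sync each time a notification arrives."""
    sync_once(backend, local_files)
    while backend.wait_for_notification() is not None:
        sync_once(backend, local_files)
```

In this sketch the loop ends when no notification is queued, which keeps the example terminating; an actual client would keep waiting.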

  19. PYTHON EVERYWHERE [Same architecture diagram]

  20. CLUSTER ISOLATION [Diagram: separate meta-client, meta-web, meta-api, and meta-mobile clusters in Dropbox’s datacenters]

  21. • Scaling Databases • Scaling as Organization • Scaling Software • Managing Complexity

  22. SCALING DATABASES [Diagram: metaserver, Memcached, and MySQL shards shard0 … shardN, each with a master and replicas]

  23. HORIZONTAL SCALING [Diagram: many metaservers and shards shard0 … shardN, each with a master and replicas]

  24. CONNECTIONS [Same diagram: many metaservers and shards shard0 … shardN, each with a master and replicas]

  25. SQL PROXY [Diagram: metaservers, a SQL Proxy tier, and shards shard0 … shardN, each with a master and replicas]
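
Slides 24 and 25 place a SQL Proxy tier between the metaservers and the shard masters. Presumably the point is connection fan-in: the following back-of-the-envelope sketch compares connections per master with and without a proxy. The topology (every client holds a fixed number of connections to every master) and all numbers are assumptions for illustration; the slides give no figures.

```python
def connections_per_master(metaservers, proxies=0, conns_per_client=1):
    """Connections one shard master sees.

    Assumption: whoever talks to the master directly (metaservers, or
    proxies when a proxy tier exists) holds `conns_per_client`
    connections to each master.
    """
    direct_clients = proxies if proxies else metaservers
    return direct_clients * conns_per_client


# Hypothetical fleet sizes, chosen only to show the shape of the effect.
without_proxy = connections_per_master(metaservers=1000)
with_proxy = connections_per_master(metaservers=1000, proxies=10)
```

Under these assumptions the per-master connection count depends on the size of the proxy tier rather than on the number of metaservers, so adding metaservers no longer adds load on each master's connection table.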

  28. AVAILABILITY ISSUES

  31. PLAYBOOK 1. Check for ongoing deployments or newly enabled features 2. Check for recently started background jobs 3. DBA oncall, please help!

  32. Dropbox grew from 100 to 500 employees

  36. • Slow queries would adversely impact performance across the board • More features => managing more independent MySQL databases • Reactively (re)sharding individual databases as they hit capacity • Impacted developer productivity

  37. SCALABLE METADATA STORE DESIGNED FOR MULTI-TENANCY (2013 to present)

  38. SHARDING AND CACHING BEHIND THE SCENES

  39. ENTITIES AND ASSOCIATIONS

  40. FIRST GO SERVICE

  41. • Scaling Databases • Scaling as Organization • Scaling Software • Managing Complexity

  42. PERFECT STORM

  43. SHARDING

  44. PHOTO ALBUMS

  45. TEAM ADMIN CONSOLE

  46. REQUEST FANOUT [Diagram label: request]

  47. GLOBAL ID [Layout: Colocation ID (8 bytes) | Counter (8 bytes)] • Colocation ID: Identifies a shard • Counter: Unique ID within the shard
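
The 16-byte layout on slide 47 can be illustrated with a small pack/unpack sketch. The field order and sizes come from the slide; the big-endian byte order, the function names, and the modulo mapping from colocation ID to shard are assumptions made for illustration only.

```python
import struct

# Two unsigned 64-bit integers: colocation ID, then counter (16 bytes total).
# Big-endian is an assumption; the slide does not specify byte order.
_GLOBAL_ID_FORMAT = ">QQ"


def pack_global_id(colocation_id, counter):
    """Encode (colocation_id, counter) as a 16-byte global ID."""
    return struct.pack(_GLOBAL_ID_FORMAT, colocation_id, counter)


def unpack_global_id(global_id):
    """Decode a 16-byte global ID into (colocation_id, counter)."""
    return struct.unpack(_GLOBAL_ID_FORMAT, global_id)


def shard_for(global_id, num_shards):
    """Hypothetical routing rule: the colocation ID alone picks the shard."""
    colocation_id, _counter = unpack_global_id(global_id)
    return colocation_id % num_shards


photo = pack_global_id(colocation_id=7, counter=1)
album = pack_global_id(colocation_id=7, counter=2)
```

Because the shard is derived from the colocation ID only, `photo` and `album` above land on the same shard, so a request touching both does not need to fan out.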

  48. Lack of colocation also hurts performance

  49. NEW SERVICE: FILE JOURNAL [Diagram: metaservers, a File Journal tier, and shards shard0 … shardN, each with a master and replicas]

  50. SHARD FAILURE [Same File Journal diagram, highlighting the shard1 master]

  51. SHARDING (PART II)

  52. LONG TIMEOUTS [Same File Journal diagram, highlighting the shard1 master]

  53. RUN OUT OF WORKERS [Same File Journal diagram, highlighting the shard1 master and the File Journal tier]

  54. CASCADING FAILURE [Same File Journal diagram, highlighting the shard1 master, the File Journal tier, and the metaservers]

  55. SHARD ISOLATION • Limit resources dedicated to processing a single shard
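
One common way to limit the resources spent on a single shard is a per-shard concurrency cap, sometimes called a bulkhead. The sketch below is an illustration of that general technique, not File Journal's implementation: `ShardLimiter`, `ShardBusy`, and the fail-fast policy are hypothetical choices.

```python
import threading


class ShardBusy(Exception):
    """Raised when a shard already has its maximum number of requests in flight."""


class ShardLimiter:
    """Caps in-flight work per shard so one slow shard cannot use every worker."""

    def __init__(self, max_in_flight_per_shard):
        self._max = max_in_flight_per_shard
        self._lock = threading.Lock()
        self._semaphores = {}  # shard name -> BoundedSemaphore

    def _semaphore(self, shard):
        # Create each shard's semaphore lazily, guarded by a lock.
        with self._lock:
            if shard not in self._semaphores:
                self._semaphores[shard] = threading.BoundedSemaphore(self._max)
            return self._semaphores[shard]

    def run(self, shard, query):
        """Run `query()` against `shard`, or fail fast if the shard is saturated."""
        sem = self._semaphore(shard)
        if not sem.acquire(blocking=False):
            raise ShardBusy(shard)
        try:
            return query()
        finally:
            sem.release()
```

With a cap like this, requests for a failing shard are rejected quickly once its quota is used up, while workers remain available for healthy shards, which is the opposite of the long-timeout, worker-exhaustion sequence on slides 52 to 54.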

  56. • Scaling Databases • Scaling as Organization • Scaling Software • Managing Complexity

  57. MAGIC POCKET: BLOCK STORAGE SYSTEM • 500PB+ user block data • 3+ geographic regions • 500+ million users

  58. [Diagram: put and get traffic across Zone (west), Zone (east), and Zone (central)]

  59. [Diagram: most components labeled "complicated!" ☹, one labeled "simple" ☺]

  62. 2016 [Architecture diagram. Frontends: meta-client, meta-web, meta-api, meta-mobile, blockserver. Service labels: Magic Pocket, File Journal, Auth, Cape, Edgestore, Blockservice, Block, Search, Routing, Riviera, Thumbnail service, Presence & Notifications]


