Optimized for change: Architecture @ Etsy (Kellan Elliott-McCrea)


  1. Optimized for change: Architecture @ Etsy
     Kellan Elliott-McCrea (@kellan), CTO, Etsy
     Monday, June 18, 2012

  3. Launched June 18, 2005
     * 875,000 active sellers
     * 33.5MM items for sale
     * $65.9MM in sales, in May
     * 1.4B page views, in May
     * 102 engineers
     * 32 releases, last Friday

  4. LAMP. Any questions? (8BitLit, http://www.etsy.com/listing/90066890/)

  5. Why?

  6. 3 inevitabilities we design for:
     1. Things break, unexpectedly
     2. What we're building changes
     3. We don't get to start over

  7. 2 years of change.

  8. Architectural Principles
     * Don't bet against the future.
     * Our customers are humans.
     * Simplicity always wins, in the end.
     * Favor global vs local optimization.
     * Ambiguity kills momentum.
     * Make failure cheap.
     * Technical debt is an inevitable by-product of shipping code.
     * Optimize for change.

  9. Cleverness (Ckrickett, http://www.etsy.com/listing/90611466)

  10. Complex systems and change
      1. Distributed systems are inherently complex.
      2. The outcome of change in complex systems is hard to predict.
      3. The outcomes of small, frequent, measurable changes are easier to predict and easier to recover from, and they promote learning.

  11. Continuous deployment, Metrics Driven Development, Blameless Post-Mortems

  12. Continuous deployment: Small, frequent changes to production

  13. Continuous Deployment: No branching.
      “All existing revision control systems were built by people who build installed software” - Paul Hammond, Always Ship Trunk, Velocity 2010

  14. Continuous Deployment: feature flags

          if ($cfg['awesome_new_search']) {
              # new hotness
              $rsp = do_solr();
          } else {
              # boring old stuff
              $rsp = do_grep();
          }

  15. Continuous Deployment: Ramp-ups (on top of feature flags)
      1. Launch to staff only
      2. Launch to 1% of all users
      3. Launch to members of a beta group
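One way the ramp-up stages above could sit on top of a feature flag is sketched below in Python. This is an illustration, not Etsy's implementation: the config keys (`enabled`, `staff`, `percent`, `beta_group`), the user fields, and the SHA-256 bucketing are all assumptions. The idea is that each user lands in a stable bucket, so raising the percentage only ever adds users.

```python
import hashlib


def bucket(user_id: str, feature: str) -> float:
    """Map a (feature, user) pair to a stable number in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000 * 100


def is_enabled(feature_cfg: dict, feature: str, user: dict) -> bool:
    """Decide whether a user sees a feature (hypothetical config layout)."""
    if not feature_cfg.get("enabled", False):
        return False
    # Stage 1: staff only.
    if feature_cfg.get("staff") and user.get("is_staff"):
        return True
    # Stage 3: explicit beta group membership.
    if user["id"] in feature_cfg.get("beta_group", ()):
        return True
    # Stage 2: a percentage of all users, chosen deterministically.
    return bucket(user["id"], feature) < feature_cfg.get("percent", 0)
```

Because the bucket depends only on the feature name and the user id, a user who sees the feature at 1% keeps seeing it at 5% or 50%.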

  16. Continuous Deployment: any engineer can launch a feature to 1% of users

  17. Continuous Deployment: ~200 experiments live right now

  18. Metrics driven development: introspection isn’t optional. Measure everything, log everything.

  19. Metrics driven development: Metrics happen when you make it easy. And visible.

  20. Metrics driven development: Teach the computer to read graphs: holtWintersConfidence(Upper|Lower)
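To make "teaching the computer to read graphs" concrete, here is a simplified Python sketch of an additive Holt-Winters style forecast with a deviation-based confidence band. It does not reproduce whatever produces the `holtWintersConfidence(Upper|Lower)` series on the slide; the function name, the smoothing parameters, and the band width `delta` are assumptions chosen for illustration.

```python
def holt_winters_bands(series, season_len, alpha=0.5, beta=0.1, gamma=0.1, delta=3.0):
    """Return (lower, upper) one-step-ahead bounds for each point in series."""
    level = series[0]
    trend = 0.0
    seasonal = [0.0] * season_len   # additive seasonal offsets
    deviation = [0.0] * season_len  # smoothed absolute forecast error per slot
    lower, upper = [], []
    for i, actual in enumerate(series):
        slot = i % season_len
        predicted = level + trend + seasonal[slot]
        lower.append(predicted - delta * deviation[slot])
        upper.append(predicted + delta * deviation[slot])
        # Update the model with the observed value.
        last_level = level
        level = alpha * (actual - seasonal[slot]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[slot] = gamma * (actual - level) + (1 - gamma) * seasonal[slot]
        deviation[slot] = gamma * abs(actual - predicted) + (1 - gamma) * deviation[slot]
    return lower, upper


def anomalies(series, season_len):
    """Indexes of points that fall outside the confidence band."""
    lower, upper = holt_winters_bands(series, season_len)
    return [i for i, v in enumerate(series) if v < lower[i] or v > upper[i]]
```

On a perfectly flat series the band collapses to the forecast itself, so a real deployment would need a warm-up period and some noise before the band is meaningful.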

  21. Metrics driven development: More info: http://www.slideshare.net/mikebrittain/metricsdriven-engineering

  22. Optimize for MTTR, not MTBF

  23. How?

  24. Etsy

  25. Etsy EMR/S3 PCI BCP, Cold

  26. [Architecture diagram] Inbound request: CDNs (diversified at the DNS level), Internet providers (diversified at borders). Other labels recoverable from the diagram: AWS, EMR, S3, Etsy network appliances, etsy.com/, api.etsy.com, bcn.etsy.com, etsystatic.com/, /atlas, photos, imstor, Squid, apache, PHP, apache logs, application logs, logrotate, memcache, async http, StatsD, sqlite, gearman, search, Thrift, Jetty, Solr master, Solr slaves, mail out, SMTP, MySQL, dbindex, dbshards, dbaux, dbdata, sharded MySQL, PCI (no privileged access), via jsonp, analytics, JRuby/Cascading, HDFS, HBase, NFS, datasets, X-Yarnblaster, server/OS hardware.

  27. CDNs: Put a slider on it. Just works via weighted DNS.
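The "slider" can be thought of as a set of weights that decide how often each CDN is handed out. The Python sketch below only illustrates that idea with a weighted random pick; the CDN names and weights are hypothetical, and real weighted DNS is configured at the DNS provider rather than in application code.

```python
import random


def pick_cdn(weights: dict, rng: random.Random) -> str:
    """Pick one CDN name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Moving the slider is just a matter of changing the weights.
slider = {"cdn_a": 75, "cdn_b": 25}  # hypothetical CDN names
print(pick_cdn(slider, random.Random()))
```

Setting a weight to 0 takes that CDN out of rotation without touching anything else.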

  28. Apache
      * Well known
      * PHP is native
      * apache_note
      * fast start time
      * cheap in place replacement
      * .htaccess
      * Challenge: memory usage

  29. Apache: apache_note
      Introspection through the lifecycle. Additive! Insanely useful!

          apache_note('etsy_uaid', $id);

  30. Apache: log format

          LogFormat "%{X-Forwarded-For}i %{True-Client-IP}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined

  31. Etsy: the App
      * 487,000 lines of PHP
      * 214,000 lines of Javascript
      * Monolithic codebase
      * 3 front ends: Etsy.com, API, Atlas

  32. Etsy: the App
      * routing handled by Apache
      * scripts fronting OO PHP5
      * PHP, fast by default
      * opcode caching
      * Challenge: liveliness when calling services

  33. Etsy: coding patterns
      * light weight, home rolled “framework”
      * ORM handles DAO across backends
      * config and feature flags systems used everywhere
      * small slow moving datasets stored as PHP arrays
      * A/B tests
      * Smarty
      * StatsD
      * Concurrency
      * memcache

  34. Etsy: A/B tests
      * beaconed
      * inserted into logs via apache_note
      * conditionalized on feature flags
      * nightly reports on conversion, bounce rate, etc
      * nightly reports on page speed, memory usage, etc

  35. Etsy: Smarty
      * pre-compiled
      * pre-compiled per language

  36. Etsy: StatsD

          StatsD::increment("logins.success");
          StatsD::timing("gearman.time", $msec);

      * 340,000 application metrics
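Part of why metrics are easy to add is that the StatsD wire format is tiny: a counter is `name:value|c` and a timer is `name:value|ms`, sent as a UDP datagram (port 8125 by default). The Python sketch below shows what calls like the two above might put on the wire. The function names and the localhost target are assumptions, not Etsy's PHP client.

```python
import socket


def format_counter(name: str, value: int = 1) -> str:
    """StatsD counter line, e.g. 'logins.success:1|c'."""
    return f"{name}:{value}|c"


def format_timing(name: str, millis: int) -> str:
    """StatsD timer line, e.g. 'gearman.time:320|ms'."""
    return f"{name}:{millis}|ms"


def send(payload: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; a lost packet never blocks the request."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))
```

Using UDP means instrumentation cannot slow down or break the page that emits it, which keeps the cost of "measure everything" low.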

  37. Etsy: Concurrency
      * no native concurrency in PHP
      * asynchronous HTTP calls
      * Gearman

  38. Etsy: Async HTTP calls
      * curl_multi_exec
      * non-blocking, per request time outs
      * used for optional aspects of a page
      * curl against http://localhost to avoid network overhead
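The pattern on this slide, fetching optional page components in parallel and dropping whatever does not arrive in time, can be sketched in Python with a thread pool. This is an analogy to `curl_multi_exec`, not a port of it: the fetchers are plain callables, and a single overall deadline stands in for the per-request timeouts mentioned on the slide.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fetch_optional(fetchers: dict, timeout_s: float) -> dict:
    """Run all fetchers concurrently; keep only results ready by the deadline."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    futures = {pool.submit(fn): name for name, fn in fetchers.items()}
    done, _not_done = wait(futures, timeout=timeout_s)
    results = {}
    for fut in done:
        # A failed optional component is simply left off the page.
        if fut.exception() is None:
            results[futures[fut]] = fut.result()
    pool.shutdown(wait=False)  # do not block the page on stragglers
    return results


def slow_sidebar():
    time.sleep(1.0)
    return "late"


parts = fetch_optional({"header": lambda: "ok", "sidebar": slow_sidebar}, timeout_s=0.2)
print(parts)
```

The page renders with whatever came back; slow or broken components degrade the page instead of stalling it.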

  39. Etsy: Gearman
      * language agnostic job server
      * don’t use an MQ when you want a job server
      * 150 job types
      * persistent jobs flushed to MySQL, read from memory
      * non-persistent jobs just stored in memory
      * NP queue is wicked fast.

  40. Etsy: Gearman
      * scaling CPU of cron jobs
      * denormalizing data
      * pushing to 3rd party services

  41. Etsy: Challenges
      * Apache memory usage
      * liveliness talking to services: no concurrency, blocking by default

  42. Etsy: graph of distributed failure

  43. Etsy: Challenges
      * Apache memory usage
      * liveliness talking to services: no concurrency, blocking by default
      Enforce liveliness with a judicious application of force

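  44. Etsy: judicious application of force

          // /proc/self/statm reports page counts: total, resident, shared, ...
          list($v, $res, $shar) = explode(' ', @file_get_contents('/proc/self/statm'));
          $mine = $res - $shar;  // resident pages not shared with other processes
          if ($mine > $cfg['sizelimit']) {
              $pid = getmypid();
              @exec("kill -USR1 $pid");  // signal this process with USR1
          }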
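The memory check above relies on the layout of `/proc/self/statm` on Linux, where the second and third fields are resident and shared pages. A small Python sketch of the same arithmetic, converted to bytes, is shown here for clarity. The function names and the byte-based limit are assumptions; the slide's `$cfg['sizelimit']` may well be expressed in pages.

```python
import os


def private_resident_bytes(statm_text: str, page_size: int) -> int:
    """Resident minus shared pages from a statm line, converted to bytes."""
    fields = statm_text.split()
    resident, shared = int(fields[1]), int(fields[2])
    return (resident - shared) * page_size


def over_limit(limit_bytes: int, statm_path: str = "/proc/self/statm") -> bool:
    """True when this process's private resident memory exceeds the limit (Linux only)."""
    with open(statm_path) as handle:
        text = handle.read()
    return private_resident_bytes(text, os.sysconf("SC_PAGE_SIZE")) > limit_bytes
```

Subtracting shared pages matters under a preforking server, where much of each child's resident memory is shared with its siblings.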

  45. Etsy: judicious application of force
      Bowhunter
      * Find long running PHP processes
      * Try to avoid those mid-post

          open(APACHE, "/usr/bin/curl -s http://localhost/server-status|") || die "$!";

  46. Etsy: judicious application of force
      Query_killer
      * Same idea, long running queries
      * MySQL “SHOW PROCESSLIST;”
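A query killer built on `SHOW PROCESSLIST` boils down to filtering rows by their `Command` and `Time` columns and issuing `KILL` for the offenders. The Python sketch below covers only that selection step, on rows represented as dictionaries. The function name and threshold are hypothetical, and this is not the actual Query_killer tool.

```python
def kill_statements(processlist: list, max_seconds: int) -> list:
    """Build KILL statements for queries running longer than max_seconds."""
    statements = []
    for row in processlist:
        # Idle connections show Command == "Sleep"; only running queries count.
        if row["Command"] == "Query" and row["Time"] > max_seconds:
            statements.append(f"KILL QUERY {row['Id']}")
    return statements


rows = [
    {"Id": 7, "Command": "Query", "Time": 45, "Info": "SELECT ..."},
    {"Id": 8, "Command": "Sleep", "Time": 900, "Info": None},
    {"Id": 9, "Command": "Query", "Time": 0, "Info": "SHOW PROCESSLIST"},
]
print(kill_statements(rows, max_seconds=30))
```

`KILL QUERY` stops the statement but leaves the connection open, which is the gentler of the two `KILL` forms MySQL offers.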

  47. Memcache
      * Caching, obviously
      * Cache invalidation is hard
      * Write buffering
      * multi_get
      * rate limits

  48. Memcache
      * atomic INCR is awesome
      * slice your time windows to reduce risk of cache eviction
      * we’ve been unlucky, lots of segfaults :(
      * multi_get slows down the more boxes in the pool
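The combination of atomic INCR, sliced time windows, and multi_get suggests a rate limiter along the following lines. This Python sketch uses an in-memory stand-in for memcache, so `FakeCache`, the key scheme, and the slice sizes are all assumptions. The point is that each slice has its own counter, so an evicted key loses one slice of history rather than the whole window.

```python
class FakeCache:
    """In-memory stand-in with memcached-like add/incr/get_multi semantics."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def incr(self, key, delta=1):
        if key not in self.data:
            return None  # like memcached, incr does not create keys
        self.data[key] += delta
        return self.data[key]

    def get_multi(self, keys):
        return {k: self.data[k] for k in keys if k in self.data}


def allow(cache, user, now, limit, slice_s=60, slices=5):
    """Count this hit, then allow it if the windowed total is within limit."""
    current = int(now // slice_s)
    key = f"rl:{user}:{current}"
    cache.add(key, 0)   # create the slice counter if it is missing
    cache.incr(key)     # atomic on a real memcached server
    keys = [f"rl:{user}:{current - i}" for i in range(slices)]
    return sum(cache.get_multi(keys).values()) <= limit
```

Old slices simply fall out of the key list as time advances, so nothing needs to be reset or cleaned up explicitly.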

  49. MySQL: By the numbers
      * 25K+ queries/sec avg
      * 3TB InnoDB buffer pool
      * 15TB+ data stored
      * 50 servers
      * 99.99% queries under 1ms

  50. MySQL: a NotMuchSQL server
      * no joins
      * no foreign keys
      * no transactions or locks
      * no sub-selects
      * store data like you want to read it.
      * also: no auto_increment

  51. MySQL: a NotMuchSQL server
      “Normalization is for sissies.” - Cal Henderson, Flickr

  52. MySQL: scale horizontally
      * objects sharded by key
      * lookups maintained in dbindex (MySQL is a FAST key-value store)
      * avoid key hashing, range partitions, and partitioning functions
      more: http://www.slideshare.net/jgoulah/the-etsy-shard-architecture-starts-with-s-and-ends-with-hard
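Keeping shard lookups in an index, rather than computing them with a hash or range function, means an object's location is data that can be changed. The Python sketch below illustrates that property with a dictionary standing in for dbindex. The class, the round-robin placement of new keys, and the shard names are assumptions for illustration only.

```python
class ShardIndex:
    """Explicit key -> shard lookup, standing in for an index database."""

    def __init__(self, shards):
        self.shards = list(shards)
        self.lookup = {}
        self._next = 0

    def shard_for(self, key):
        """Return the key's shard, assigning one on first sight."""
        if key not in self.lookup:
            # Placement policy is arbitrary here (round robin).
            self.lookup[key] = self.shards[self._next % len(self.shards)]
            self._next += 1
        return self.lookup[key]

    def move(self, key, shard):
        """Relocate one key; no other key is affected."""
        self.lookup[key] = shard


index = ShardIndex(["shard1", "shard2"])
print(index.shard_for(1001), index.shard_for(1002))
```

With a hash or modulo scheme, adding a shard remaps many existing keys at once; with a lookup, existing keys stay put until they are deliberately moved.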

  53. MySQL: Master-Master
      * objects hashed to a side, avoid split brain
      * allows in place schema upgrades without slave promotion
      * simplified capacity planning
      more: http://codeascraft.etsy.com/2012/04/20/two-sides-for-salvation/
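"Objects hashed to a side" can be read as: every object has one home master in the pair, so two masters never accept conflicting writes for the same object while both are healthy. The Python sketch below is one possible reading, using id parity as the hash and falling over to the other side only when the home side is out of service. The names and the parity rule are assumptions, not the scheme described in the linked post.

```python
SIDES = ("A", "B")


def home_side(object_id: int) -> str:
    """Deterministically assign an object to one side of the master pair."""
    return SIDES[object_id % 2]


def write_target(object_id: int, healthy: set) -> str:
    """Prefer the object's home side; use the other only if home is out."""
    side = home_side(object_id)
    if side in healthy:
        return side
    other = SIDES[1 - SIDES.index(side)]
    if other in healthy:
        return other
    raise RuntimeError("no healthy side available")
```

Taking one side out of `healthy` on purpose is also how an in-place schema change could proceed: all traffic moves to the other side while the first is altered.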
