You build it, you run it
Operating SoundCloud's microservice architecture
Matthias Rampke, SoundCloud
GOTO Berlin 2016
Intro: me
Who I am and where I work
Engineer in Production Engineering (platform, monitoring, availability)
Previously in Systems Engineering (the catch-all remnant of the former ops team)
Intro: SoundCloud
A cloud full of sounds
135M tracks, 12M artists, 175M listeners
300+ employees
No ops team
Intro: Agenda
Where we came from
Where we are today
Why we did it
How you can do it
How does this compare to …?
⋁ Where we came from ⋁
In the beginning …
The early days
One team
One table
One codebase
2009/2010
Growing pains
20-50 engineers
Hired an ops team, 24/7 on-call
App team deploys the monolith
First separate "micro"services
2011/2012
The fork in the road
More microservices
Deployment platform
SRE/platforms team
Multiple on-call rotations
2013-2015
Maturing
Cambrian explosion of microservices
Feature teams and collectives
Client-specific APIs
Shared components & libraries
Continuous delivery
⋁ Where we are today ⋁
Org chart simplified
Ownership
You own it, you run it
Every feature • service • codebase is owned by a team
On call
Owners are on call for what they own
Groups of teams work together to reduce the load: remove alerts • write documentation
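The rule that owners are on call for what they own can be made concrete as a lookup: an alerting service maps to its owning team, and that team maps to its own rotation. The snippet below is a minimal Python sketch that assumes a hypothetical in-memory ownership registry. `Team`, `register`, `route_alert`, `OWNERS` and all service and rotation names are illustrative only. They do not describe SoundCloud's actual tooling.

```python
from dataclasses import dataclass
from typing import Dict


# Hypothetical model: every service is owned by exactly one team,
# and each team has its own on-call rotation (illustrative names only).
@dataclass(frozen=True)
class Team:
    name: str
    oncall_rotation: str  # hypothetical pager rotation identifier


# Hypothetical registry mapping service name -> owning team.
OWNERS: Dict[str, Team] = {}


def register(service: str, team: Team) -> None:
    """Record that `team` owns `service`; refuse to silently change owners."""
    existing = OWNERS.get(service)
    if existing is not None and existing != team:
        raise ValueError(f"{service} is already owned by {existing.name}")
    OWNERS[service] = team


def route_alert(service: str, message: str) -> str:
    """Return the page for the rotation of the team that owns `service`."""
    team = OWNERS.get(service)
    if team is None:
        # No central ops team to fall back on: an unowned service is an error.
        raise LookupError(f"no owner registered for {service}")
    return f"page {team.oncall_rotation}: [{service}] {message}"


if __name__ == "__main__":
    search = Team("search", "search-primary")
    register("search-api", search)
    print(route_alert("search-api", "p99 latency above threshold"))
```

Running it prints `page search-primary: [search-api] p99 latency above threshold`. The sketch treats a missing owner as a hard error instead of falling back to a default rotation, because with no ops team there is nobody else to page.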
Shared components
Avoid shared infrastructure
Be flexible
Don't duplicate work
Production Engineering
Run the systems that run systems
Monitoring & availability
Internal consulting
⋁ Why we did it ⋁
Delivery
Get more done, consistently
Autonomy
Predictability
Velocity
Personal growth
Learn something new every day
No pure specialists
Internal mobility
Better systems
Simple
Resilient
Operable
⋁ How you can do it ⋁
Prerequisites
Basic automation
Openness
Pride
Trust
Expanding ownership
Testing & deployment
On-call
Provisioning
Dependencies
Checks & balances
Internal moves
Escalation paths
Documentation
Tooling
Postmortems
Learn
Improve
Commiserate
⋁ How does this compare to …? ⋁
Site Reliability Engineering
As Google describes it
No assignment to SWE teams
No on-call handoff
No deploy blocks
Radical agility
As Zalando describes it
More shared code
More communication
Infrastructure & core teams
DevOps
As described by Etsy
No Ops team
Less shared infrastructure
Less standardization
Deploys spread in a different dimension
Slides: https://bit.ly/gotober16-sc
Please rate!
soundcloud.com
Berlin • New York • San Francisco • London