Scaling Slack Infrastructure Julia Grace Senior Director of - PowerPoint PPT Presentation

Scaling Slack Infrastructure 🚁 Julia Grace, Senior Director of Engineering @jewelia

Phase 0: 2015

~2.5M Daily Active Users

Phase 1: 2016

~4M Daily Active Users

Phase 1: 2016 Slack was originally designed for teams of < 150 people. You make very different architectural decisions when you’re building for a team of 100 people vs 500,000. Before August 2016 we had no Infra team. The original infrastructure built for Glitch worked very well in 2014/2015. ~150 engineers total. Infrastructure investments would come secondary to feature work.

Things were starting to break in strange, unusual ways.

Phase 1: 2016 Example: User Presence. Green dot indicating online/away/offline. Very few people notice it, unless it’s broken (people expect it to “just work”). Apps and bots are always online.

Phase 1: 2016 User Presence. Initially broadcast every presence change (e.g. “Julia Grace is away”) to the whole workspace: O(n^2). Presence was ~80% of all web socket traffic. Peak volume in late 2016: 16 million messages/minute over web socket. Presence messages: 13 million/minute. Rapidly transitioned from broadcast to publish/subscribe.
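To make the O(n^2) point concrete, here is a minimal sketch in Python. It is not Slack's implementation: `broadcast_fanout` and `PresenceHub` are hypothetical names invented for illustration, and the in-memory hub stands in for whatever subscription mechanism the real system used. It counts the messages produced by whole-workspace broadcast, then shows a publish/subscribe hub that delivers a presence change only to users who subscribed to that person.

```python
from __future__ import annotations

from collections import defaultdict


def broadcast_fanout(workspace_size: int) -> int:
    """Messages sent if every member changes presence once and each change
    is broadcast to every other member: n * (n - 1), i.e. O(n^2)."""
    return workspace_size * (workspace_size - 1)


class PresenceHub:
    """Hypothetical in-memory publish/subscribe hub for presence changes."""

    def __init__(self) -> None:
        # Maps a watched user to the set of users subscribed to them.
        self._subscribers = defaultdict(set)

    def subscribe(self, watcher: str, target: str) -> None:
        self._subscribers[target].add(watcher)

    def unsubscribe(self, watcher: str, target: str) -> None:
        self._subscribers[target].discard(watcher)

    def publish(self, user: str, status: str) -> list[tuple[str, str, str]]:
        """Return (recipient, user, status) messages, one per subscriber."""
        watchers = self._subscribers.get(user, ())
        return [(watcher, user, status) for watcher in sorted(watchers)]


hub = PresenceHub()
hub.subscribe("alice", "julia")
hub.subscribe("bob", "julia")
messages = hub.publish("julia", "away")  # delivered to alice and bob only
```

With broadcast, message volume grows with the square of workspace size; with subscriptions, it grows with the number of people actually watching each user. The traffic figures on the slide are also self-consistent: 13 million of 16 million messages per minute is about 81%, in line with the ~80% share.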

There were many organizational challenges as well.

Phase 1: 2016 How to build an engineering-led org in a product-led company? Would we be able to get headcount, budget? How to communicate the value of what we are doing to non-technical audiences? How do we interface with sales? Infrastructure as a competitive advantage.

https://www.flickr.com/photos/pocheco/14833391966

Phase 1: 2016 Start internal evangelism on day #1. I went on an internal PR campaign: why our work was important, why we needed to continually invest in infrastructure. Make work very visible to execs in other functions. Followed existing company process. We did planning, status reporting, etc. at the same cadence and in the same meetings as product engineering. Don’t try to start a new group and invent new process. Identify an executive sponsor.

Phase 2: 2017 Technology landscape. Hack/PHP monolith on backend, JavaScript with no libraries on frontend. 1 service: presence and real-time messaging. Building a second service: Go caching service. These bespoke services each had to handle rate limiting, traffic management, deployment.

Phase 2: 2017 It was time to change our DB sharding strategy. MySQL sharded by team/workspace to Vitess sharded by various keys. Worked great! Until we hit scaling limits, significant hotspots.
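The hotspot problem can be illustrated with a toy sketch in Python. It assumes the slide means that the workspace-sharded layout is what developed hotspots. Everything here is hypothetical: the shard count, the row data, and the `shard_for` and `load_by_shard` helpers are invented for illustration, and the hashing is not how Vitess assigns rows to shards. The sketch only shows why one very large workspace overloads a single shard when the workspace is the sharding key, and why a finer-grained key spreads the same rows out.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_by_shard(rows, key_fn, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many rows land on each shard for a given sharding key."""
    load = Counter({shard: 0 for shard in range(num_shards)})
    for row in rows:
        load[shard_for(key_fn(row), num_shards)] += 1
    return load


# Made-up data: one very large workspace with 1000 channels,
# plus ten small workspaces with one channel each.
rows = [("big-workspace", f"channel-{i}") for i in range(1000)]
rows += [(f"small-{i}", f"small-{i}-general") for i in range(10)]

by_workspace = load_by_shard(rows, key_fn=lambda row: row[0])
by_channel = load_by_shard(rows, key_fn=lambda row: row[1])

print("sharded by workspace:", dict(by_workspace))
print("sharded by channel:  ", dict(by_channel))
```

Sharded by workspace, all 1000 rows of the large workspace share one shard no matter how many shards exist. Sharded by channel, the same rows are spread across all shards, at the cost of queries that may now have to touch more than one shard.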

Diagram: Monolith, Service A, Service B.

Diagram: Monolith, Service A, Service B. Who owns this?

Communication Risk: The more technically complex, nuanced a problem is…

Communication Risk: The more technically complex, nuanced a problem is… the higher the communication risk.

Phase 2: 2017 Immense pressure to hire engineers. Many human SPOFs (single points of failure) because the team was so small. Everyone was overextended and overcommitted. We had to figure out how to hire Infra engineers. All our hiring processes were optimized around hiring generalists: frontend, backend, iOS, Android, Ops. What skills do we need and value? How do we test for those skills?

Phase 2: 2017 Decided to hire Infra engineering generalists. Created a take-home coding exercise designed to test: 1. An understanding of servers, networking, and protocols. 2. An understanding of concurrency, performance, and resource constraints, and an ability to anticipate future issues and implement solutions. 3. An ability to write clear, easy to understand code, communicate your approach, and reason about tradeoffs that you have made.

Phase 2: 2017 I wore so many hats. Too many hats. Similar to my days as a startup CTO! I was the Engineering Director (forming strategy, hiring managers and ICs, evangelizing the org), Product Manager (internal interface to Product Engineering/PMs building features, external interface to customers with questions about the integrity of our infrastructure), and Program Manager (running cross-functional initiatives).

Phase 3: 2018 “0 to 1” was over. Now time for “1 to ∞”. Reactive to Proactive. Transition from a few teams to an org in 3 offices. Team of nearly 100 engineers by end of year. Now included Data, Machine Learning, Search Infrastructure. Many orders of magnitude better performance. Things were not breaking all the time.

Phase 3: 2018 Services model matured significantly. SLAs for services, consistent deployment processes, etc. Mature incident response process. Dividing into sub-teams made sense: Data Stores & Cache Infra, Service Mesh & Web Serving, Distributed Messaging.

Phase 3: 2018 Hired Director Specialists… and Product Managers… and did an acquisition. Had to quickly learn how to hire senior leaders whose jobs you haven’t done before. How to do this well: talk to a lot of people who currently do the job you’re trying to hire for, deeply understand the talent market.

Phase 3: 2018 Challenge: coherency across a large organization. Example: overlap between Machine Learning and Frontend Infra was NULL. Difficult to have a unified vision. Stakeholders were different for each part of the org; the Data Infra organization worked closely with G&A (finance), Search Infra did not. I should have done more re-orgs!

2016:

2016: 2017:

2016: 2017: 2018:

Today: Infra has been around for ~3 years. 400M async jobs processed/day to 2.5B. 3M DAU (daily active users) to 10M DAU. 1M simultaneously connected users to 7.5M. 10 to ~100 engineers in SF, NYC, YVR. Generalists (ICs, Managers) to specialists. 1 amazing team.

Thank You!
