<u>Best Practices Building Resilient Systems</u> Pablo Jensen, CTO

Who is Pablo Jensen?
• Danish, but born in Argentina, where “Paul” was not on the whitelist of permitted names, so my parents had to call me Pablo
• Computer Science degree from Copenhagen University, and an MBA from Henley
• Several years at Thomson Reuters in Scandinavia, London and Switzerland
• Joined Sportradar as CTO in 2013, when the business had 500 employees with 150 in IT; now 2,000 employees with 400 in IT
• Industrial advisor for EQT
• Running, wine, cars, Brøndby IF

Who is Sportradar?
Operating at the intersection of sports, media and entertainment.
• Global leader in live sports data solutions for digital sport entertainment
• 8,000+ staff and contractors globally
• 30+ global offices
• Deep coverage of more than 40 sports and 600,000 live events per year
• 9,000 data points updated every second
• 1 second delay from a live stadium event until the data reaches our customers
• Platform handling 200,000 requests a second, serving users with up to 4 Gbit/s in total traffic
• 9,000 requests/second on average
• 800+ clients and partners

Serving More Than 800 Global Customers
• Betting
• Sports Media
• Integrity
• Rights Holders

Sportradar in a Nutshell
• Data Collection
• Data Processing
• Data Analytics
• Data Monitoring
• Data Marketing
• Digital Sports Solutions

Sports Media: Live Score

Sports Media: AV & OTT

Sports Media: Widgets & Cards

Betting: Life Cycle of Odds

Betting: Live Odds

Betting: Virtual Games

Betting: Integrity

Data Feeds & Development Services MDP – Mobile eSports Service Live Odds Development Platform

What could possibly go wrong?
Top incident reasons:
1. Third-party provider issue (Physical Footprint)
2. Limit exceeded: table, storage, traffic (Technology)
3. Coding error (Technology)
4. Not following agreed procedures (Process)
Be prepared: there will always be something wrong.
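The "limit exceeded" incident class can often be caught before it becomes an incident by checking headroom on known limits. The following is a minimal sketch only: the `Limit` type, the sample figures and the 80% warning threshold are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Limit:
    """A hypothetical capacity limit (a table, a storage volume, a link)."""
    name: str
    used: float
    capacity: float  # assumed to be greater than zero

    @property
    def utilisation(self) -> float:
        return self.used / self.capacity


def limits_at_risk(limits: Iterable[Limit], threshold: float = 0.8) -> List[str]:
    """Return the names of limits whose utilisation is at or above threshold."""
    return [limit.name for limit in limits if limit.utilisation >= threshold]


if __name__ == "__main__":
    # Illustrative numbers only.
    sample = [
        Limit("events_table_rows", used=1_900_000, capacity=2_000_000),
        Limit("archive_volume_gb", used=300, capacity=1_000),
        Limit("uplink_mbit", used=850, capacity=1_000),
    ]
    print(limits_at_risk(sample))
```

The threshold is a judgment call: it should leave enough lead time to add capacity before the limit is actually reached.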

IT Organisation
400+ employees in 10+ IT locations:
• 40+ Dedicated teams
• 300+ Developers
• 35 Tech Leads
• 40+ System Administrators
• 40+ Project Managers
• 30+ QA
• 20+ Mobile Developers
Tech Stack:
• Web: HTML5/CSS3, React, JavaScript, API driven, Nginx, NodeJS, Varnish, Tomcat, Jetty
• Mobile: iOS, Android
• Backend: Java, PHP, Scala, JRuby, Go, C++, Memcache, Redis, MySQL, Cassandra, MongoDB
• Sys Admin: Ganeti, OpenStack, Zabbix, Puppet, MCollective, Debian Linux, AWS, Ceph, Kubernetes
• Source code system: Git (GitLab)
• Open Source scanning: WhiteSource
• Build management: Jenkins, GitLab CI
• BI & Analytics: S3, ORC, NiFi, Redshift, Athena, Spark, Qlik
• Communication Tools: Slack, Outlook, own-built tools for Incident and Maintenance Management, but looking at migrating to 3rd-party services (StatusPage.io)

IT Organisation: Best practices for building resiliency
• Strictly defined tech stack: new technologies are architecture driven, not developer driven
• Key technical IT gate points to be followed:
  • Fitness for Development
  • Fitness for Launch
  • “30% Rule”
  • Secure Development Guidelines
  • Maintenance Procedure
  • Incident Procedure
  • On Duty Procedure

Sportradar Hosting Locations
Map legend:
• Own regionally based data center locations in Europe
• AWS/Amazon hosting locations used by Sportradar

Sportradar Hosting Locations: Best practices for building resiliency (Physical Footprint)
• Identical, regionally located physical core data centers running live-live, treated as a single redundant data center
• Multiple options for client access:
  • Strategically located POPs
  • Direct connect
  • Open Internet
Diagram labels: Conceptual Cluster; Physical Cluster; Data Center A; Data Center B; nodes A, B, C
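Having several client access options only helps if a client can fall back from one to the next. A minimal sketch of that idea, assuming a hypothetical preference order and a caller-supplied health probe (the path names and the `is_up` callback are illustrative, not Sportradar's actual client logic):

```python
from typing import Callable, Iterable, Optional


def first_reachable(paths: Iterable[str],
                    is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first access path, in preference order, that the probe
    reports as reachable, or None if every path is down."""
    for path in paths:
        if is_up(path):
            return path
    return None


if __name__ == "__main__":
    # Hypothetical preference order: dedicated links first, open Internet last.
    preferred = ["direct-connect", "pop", "open-internet"]

    # Stand-in probe that pretends the direct connection is down.
    def probe(path: str) -> bool:
        return path != "direct-connect"

    print(first_reachable(preferred, probe))
```

In practice the probe would be a real health check with a timeout, and the chosen path would be re-evaluated periodically rather than once.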

Sportradar’s Global Data Production
Operations setup is physically redundant, so we can shift operations between locations.
Sportradar Production, with more than 900 employees globally
Locations shown: Germany, US, Estonia, Philippines, Austria, Uruguay
Key facts:
• Worldwide accepted data quality, unmatched in its combination of speed and accuracy
• Redundant production setup
• Key positions manned with branch expertise from all business segments
• State-of-the-art data entry tools, developed in-house, enhanced based on the needs of operations
• Operations approved and well-rehearsed, permanently reviewed and improved/adjusted
• >900 operators across 7 locations
• >6,000 scouts globally

Sportradar’s Global Data Production: Best practices for building resiliency (Physical Footprint)
• Identical production locations
• Tasks can move from one location to another

Providers
All service elements (e.g. ISP, CDN, DDoS protection, cloud hosting, physical hosting, DNS, physical production locations, POPs, fixed line connections) are understood and categorized, with full risk understanding and acceptance.

Providers: Best practices for building resiliency (Physical Footprint)
Understand and accept:
• Service elements that are ‘multi-vendor’
• Service elements that are ‘multi-regional’
• Service elements that are ‘single’ served
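The three categories on this slide can be kept as a small inventory so that the single-served elements, which carry the accepted risk, are always explicit. A minimal sketch follows; the classification rule, the vendor names and the region names are all assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Sourcing(Enum):
    MULTI_VENDOR = "multi-vendor"
    MULTI_REGIONAL = "multi-regional"
    SINGLE = "single"


def classify(vendors: Set[str], regions: Set[str]) -> Sourcing:
    """Assumed rule: more than one vendor wins; otherwise more than one
    region of the same vendor; otherwise the element is single served."""
    if len(vendors) > 1:
        return Sourcing.MULTI_VENDOR
    if len(regions) > 1:
        return Sourcing.MULTI_REGIONAL
    return Sourcing.SINGLE


def single_served(inventory: Dict[str, Tuple[Set[str], Set[str]]]) -> List[str]:
    """Names of service elements that have neither a second vendor nor a
    second region, sorted for stable output."""
    return sorted(
        name for name, (vendors, regions) in inventory.items()
        if classify(vendors, regions) is Sourcing.SINGLE
    )


if __name__ == "__main__":
    # Hypothetical inventory: element -> (vendors, regions).
    inventory = {
        "CDN": ({"vendor-a", "vendor-b"}, {"eu", "us"}),
        "cloud hosting": ({"vendor-c"}, {"eu", "us"}),
        "DDoS protection": ({"vendor-d"}, {"eu"}),
    }
    for name, (vendors, regions) in inventory.items():
        print(name, "->", classify(vendors, regions).value)
    print("accepted single-served risks:", single_served(inventory))
```

Keeping the list of single-served elements short and reviewed is the point: those are the risks that have been accepted rather than engineered away.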

Separate technology stacks
Diagram labels:
• Closed extranet environment for Business Area A: DC Closed Stack (own hardware, firewall, routers)
• Open internet environment for Business Area B: DC Open Stack (own hardware, firewall, routers)
• Client regions: US, Asia, EU
• Access points: Amazon AWS, City A POP, City B POP, DDoS protection
• Connections: leased/fixed line; open Internet during normal operation; open Internet during DDoS mitigation; gateway for clients from the open Internet

Separate technology stacks: Best practices for building resiliency (Technology)
• Business areas are served via separate technology stacks; one stack can have issues without impacting other stacks
• Technology stacks are hosted on independent redundant services
In this version of the diagram, the DDoS protection is labelled Prolexic and the gateway serves B2B clients from the open Internet during a DDoS attack.

Architecture Deployment Model
One of our Backend Core Systems
• 3 availability zones: running on 3 dedicated physical servers in 3 different physical locations
• Separate cluster per sub-system: composed of many sub-systems, each running as an independent cluster
• Java services are either stateless, or stateful while keeping data in a distributed mem-grid
• Active/Active: clustered active-active setup of RabbitMQ, ZooKeeper, HAProxy, Mongo replica sets, Cassandra
• Active/Passive: master-slave active-passive setup of MySQL, MySQL Fabric and Redis instances
• Recovery: Mongo point-in-time incremental backup; MySQL/Redis/ZK daily backups; recovery mechanisms (e.g. a subsystem is able to recover its state based on reference data)
• Async Design: async service design (message passing, streaming)
• Circuit-breakers, request throttling, fail-fast approach (Hystrix)
• Decoupling: decoupling of operational and archive/warehouse databases; decoupling and different types of disk volumes to reduce I/O contention (e.g. Mongo, MySQL, Backup, VMs)
• Lots of attention to low-latency implementation and design
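The circuit-breaker and fail-fast approach mentioned on this slide can be illustrated in a few lines. This is a minimal sketch of the general pattern only; it is not Hystrix's API (Hystrix is a Java library), and the class name, thresholds and injectable clock are assumptions made for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open
        self._clock = clock               # injectable so tests need not sleep
        self._failures = 0
        self._opened_at = None            # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open: fail fast instead of waiting on a broken dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_after=5.0)

    def flaky():
        raise ConnectionError("dependency down")

    for attempt in range(3):
        try:
            breaker.call(flaky)
        except CircuitOpenError as exc:
            print("rejected:", exc)
        except ConnectionError as exc:
            print("failed:", exc)
```

This sketch is single-threaded; a production breaker would also need locking, timeouts on the wrapped call, and metrics, which is what libraries in this space provide.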
