Honey, I shrunk the database! Resilience and recoverability in Cloud Native services
Jeffrey Farber, Sidney Shek
Cloud infrastructure = reliable services, right?
SUPER RESILIENT CLOUD-BASED ARCHITECTURE
Canary Progressive Rollouts
Distributed Blue-Green Deployments
Multi-Region Cassandra Database
5-minute backups
WITH LOTS OF DEPENDENCIES
Reliability Promise = 99.999%
Recovery Time Objective = 1 hour
Recovery Point Objective = 5 minutes
Until…
BET YOU DIDN’T SEE THIS COMING (THIS IS A TRUE STORY)
WE RESTORE… a 2-hour-old snapshot
WE RESTORE… BUT WHAT ABOUT OUR DEPENDENCIES?
Lost data?
Wrong (?) data?
May have correct data, but how to sync?
10x normal load?
Big 💪 will happen. Accept and plan for it.
Emergent behaviour: our systems are complex and unpredictable.
Broad-spectrum solutions: incorporate general recovery methods to handle the unexpected.
PATTERNS FOR GUARDING AGAINST THE IMPOSSIBLE
1. Event sourcing: minimize data loss after a restore.
2. Easily recoverable replication: get downstream systems back in sync.
3. Local and distributed redundancy: having fallbacks for fallbacks for fallbacks…
Event sourcing Minimize data loss after a restore
Let’s add recovery here
Event Sourcing: Goals
Minimize data loss of a DB restore (RPO): critical applications can't afford hours of data loss.
Recover from bugs ruining data: databases can replicate your data across regions. They also replicate your bugs.
Accuracy & time (RTO): we need confidence in our restored data, and we need it quickly!
Event Sourcing: Write Events
Before: the service reads and writes the main database directly (INSERT INTO table, DELETE FROM table, SELECT FROM table).
With event sourcing, writes split from reads: each write is INSERTed/APPENDed as a command into a historical command store, and the events are then replayed/applied to the main database; reads still SELECT FROM the main database.
MAKE SURE THE COMMAND STORE IS AN INDEPENDENT STORE!
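One way to wire this write path, sketched in Python with in-memory stand-ins for the two stores; the function names, sequence scheme, and command shapes are illustrative assumptions, not the service's actual code.

import time

# In-memory stand-ins. In a real deployment the command store MUST be
# independent infrastructure from the main database, so a bug or outage
# that ruins one does not ruin the other.
command_store: list[dict] = []            # append-only historical command store
main_db: dict[tuple, dict] = {}           # main datastore keyed by (user, resource)

def write_event(stream: str, command_type: str, params: dict, actor: str) -> dict:
    """Append the command to the command store first, then apply it to the main DB."""
    command = {
        "sequence": int(time.time() * 1000),   # placeholder; see the sequence options a few slides later
        "stream": stream,
        "commandType": command_type,
        "params": params,
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "actor": actor,
    }
    command_store.append(command)          # INSERT/APPEND into the command store
    apply_command(command)                 # project the event onto the main database
    return command

def apply_command(command: dict) -> None:
    """Apply one command to the main datastore (the read model)."""
    params = command["params"]
    key = (params["user"], params["resource"])
    if command["commandType"] == "grant_permission":
        main_db.setdefault(key, {"permissions": []})["permissions"].append(params["permission"])
    elif command["commandType"] == "revoke_permission":
        row = main_db.get(key)
        if row and params["permission"] in row["permissions"]:
            row["permissions"].remove(params["permission"])

# Example write, mirroring the command record shown on the next slide:
write_event("user123", "grant_permission",
            {"user": "user123", "resource": "issueABC", "permission": "view"},
            "jira_share_service")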
Event Sourcing: Write Events, an example command record
[
  {
    "sequence": 20,
    "stream": "user123",
    "commandType": "grant_permission",
    "params": {
      "user": "user123",
      "resource": "issueABC",
      "permission": "view"
    },
    "timestamp": "2019-07-24 3:30 PM",
    "actor": "jira_share_service"
  }
]
1. "sequence": order all writes, so we can replay them in the same order.
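For readers following along in code, a rough Python typing of the record above; the field names come straight from the JSON, while the types are inferred assumptions.

from typing import TypedDict

class Params(TypedDict):
    user: str
    resource: str
    permission: str

class Command(TypedDict):
    sequence: int          # orders all writes so replay preserves order (point 1)
    stream: str            # partitions ordering, e.g. per user (point 2, later slides)
    commandType: str       # e.g. "grant_permission"
    params: Params
    timestamp: str         # e.g. "2019-07-24 3:30 PM"
    actor: str             # e.g. "jira_share_service"

example: Command = {
    "sequence": 20,
    "stream": "user123",
    "commandType": "grant_permission",
    "params": {"user": "user123", "resource": "issueABC", "permission": "view"},
    "timestamp": "2019-07-24 3:30 PM",
    "actor": "jira_share_service",
}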
"sequence": 20, Order all writes, so we can replay in same order 1 Event Sourcing S TRICTLY ORDERED Goals Write Events Stream Sequence user123 19 Generating SET sequence = sequence + 1 WHERE sequence = 19 Recovery sequences -> 19, 20, 21...
"sequence": 20, Order all writes, so we can replay in same order 1 Event Sourcing M OSTLY ORDERED Goals sequence = {timestamp}{unique_node_id} Write Events sequences -> 1565312340, 1565323450, ... Generating Recovery
"sequence": 20, Order all writes, so we can replay in same order 1 Event Sourcing M OSTLY ORDERED Goals sequence = {timestamp}{unique_node_id} Write Events sequences -> 1565312340, 1565323450, ... Generating No SPOF (database) Only certain writes need strict ordering Clock skew window is small (< 1 sec) Recovery Don’t know previous sequence
"sequence": 20, Order all writes, so we can replay in same order 1 Event Sourcing M OSTLY ORDERED + CUSTOMER - DICTATED STRICT ORDERING Goals write2 write1 ?after={timestamp1} Write Events /create /delete Generating {sequence/timestamp1} {sequence/timestamp2} Recovery timestamp2 > timestamp1
2. "stream": streams guarantee order; parallelize across streams.
stream user123: {sequence: 20, commandType: "grant", permission: "view"}, {sequence: 21, commandType: "revoke", permission: "view"}
stream user456: {sequence: 74, commandType: "grant", permission: "edit"}, {sequence: 75, commandType: "revoke", permission: "edit"}
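A small sketch of that partitioning, using the two streams from the slide: ordering is enforced only within a stream, so independent streams can be processed side by side.

from collections import defaultdict

events = [
    {"stream": "user123", "sequence": 20, "commandType": "grant", "permission": "view"},
    {"stream": "user123", "sequence": 21, "commandType": "revoke", "permission": "view"},
    {"stream": "user456", "sequence": 74, "commandType": "grant", "permission": "edit"},
    {"stream": "user456", "sequence": 75, "commandType": "revoke", "permission": "edit"},
]

# Partition by stream, then sort each stream's commands by sequence.
by_stream: dict[str, list[dict]] = defaultdict(list)
for event in events:
    by_stream[event["stream"]].append(event)
for commands in by_stream.values():
    commands.sort(key=lambda c: c["sequence"])

# user123's grant/revoke must apply in order 20, 21; user456 (74, 75) can be
# processed at the same time because the streams are independent.
print({stream: [c["sequence"] for c in commands] for stream, commands in by_stream.items()})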
Event Sourcing: Recovery
1. Restore the snapshot.
2. Recover streams in parallel, replaying the events newer than the snapshot:
   user123: snapshot at 19, replay 20, 21
   user456: snapshot at 73, replay 74, 75, 76, ...
Bonus: process all of a stream's events in-memory.
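A hedged sketch of this recovery flow, assuming the restored snapshot records the last applied sequence per stream and that newer commands survive in the independent command store; apply_command stands for the same projection logic as in the earlier write-path sketch.

from concurrent.futures import ThreadPoolExecutor

def recover(snapshot_sequences: dict[str, int],
            command_store: list[dict],
            apply_command) -> None:
    """Replay commands newer than the restored snapshot, one stream at a time.

    Within a stream, events are applied strictly in sequence order; across
    streams, replay is parallelised because streams are independent.
    """
    def replay_stream(stream: str) -> None:
        last_applied = snapshot_sequences.get(stream, 0)
        pending = sorted(
            (c for c in command_store
             if c["stream"] == stream and c["sequence"] > last_applied),
            key=lambda c: c["sequence"],
        )
        for command in pending:   # e.g. user123: 20, 21 after a snapshot at 19
            apply_command(command)

    streams = {c["stream"] for c in command_store}
    with ThreadPoolExecutor() as pool:
        list(pool.map(replay_stream, streams))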
Event Sourcing: Recovery, replaying stream "user123" into the main datastore
Events for stream "user123" in the command store: sequence 20 (grant_permission: user "user123", resource "issueABC", permission "view") and sequence 21 (grant_permission, truncated on the slide).
Main datastore before replay: Stream/Sequence table has user123 at sequence 19; User/Resource/Permissions table has user123, issueABC, [].
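To make the diagram concrete, a sketch of replaying those events onto the restored rows; the table shapes are assumptions, and because the slide truncates the second event its params below are placeholders.

# Restored snapshot state for stream "user123" (last applied sequence: 19).
stream_sequences = {"user123": 19}
permissions_table = {("user123", "issueABC"): []}   # User / Resource / Permissions

events = [
    {"stream": "user123", "sequence": 20, "commandType": "grant_permission",
     "params": {"user": "user123", "resource": "issueABC", "permission": "view"}},
    {"stream": "user123", "sequence": 21, "commandType": "grant_permission",
     "params": {"user": "user123", "resource": "issueABC", "permission": "edit"}},  # placeholder params; slide truncates this event
]

for event in sorted(events, key=lambda e: e["sequence"]):
    if event["sequence"] <= stream_sequences[event["stream"]]:
        continue   # already reflected in the restored snapshot
    params = event["params"]
    if event["commandType"] == "grant_permission":
        permissions_table[(params["user"], params["resource"])].append(params["permission"])
    stream_sequences[event["stream"]] = event["sequence"]

print(permissions_table)   # {('user123', 'issueABC'): ['view', 'edit']}
print(stream_sequences)    # {'user123': 21}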