About me A data engineering challenge - PowerPoint PPT Presentation

About me

A data engineering challenge

Transaction Data store responsible for
○ Billing
○ Internal debugging
○ Downstream services
  ■ Reporting
  ■ Analytics Warehouse

● OLTP (Online Transactional Processing)
  ■ Every write to DB = $$ exchanging hands
  ■ No downtime, low latency writes
  ■ Accuracy is crucial
● OLAP (Online Analytical Processing)
  ■ Monthly financial CSV exports & list endpoints
  ■ Easy aggregation
  ■ Slice and dice over arbitrary set of columns

Mistakes we made

[Figure: what CX sees, and what he sees 2 days later]

CSV Exports
[Figure: CSV file downloaded on Jan 1 vs. export re-pulled on Jan 5]

Our solution

1. Immutable - Records are never changed, only inserted

Why Immutable?
● Biggest pain point
● Able to track changes over time (data lineage)
● Financial data should never be mutable
  ○ useful for auditing
  ○ state is reproducible at any point in time
  ○ allows for correction in next accounting period
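
A minimal sketch of the idea, not the speaker's implementation. The `Event` and `EventLog` names, the field names, and the 2017 dates are assumptions made up for illustration. It shows that when a correction is appended instead of overwriting a row, the state as of any earlier date can still be rebuilt.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)  # frozen: a record cannot be modified once created
class Event:
    order_id: str
    recorded_on: date
    commission: Decimal  # full commission value as of this event


class EventLog:
    """Append-only log: corrections are new rows, never updates."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def state_as_of(self, day: date) -> dict[str, Decimal]:
        """Rebuild per-order state as it looked at the end of `day`.

        Assumes events were appended in chronological order.
        """
        state: dict[str, Decimal] = {}
        for event in self._events:
            if event.recorded_on <= day:
                state[event.order_id] = event.commission
        return state


log = EventLog()
log.append(Event("order-1", date(2017, 7, 1), Decimal("10.00")))
log.append(Event("order-1", date(2017, 7, 3), Decimal("8.00")))  # a correction

print(log.state_as_of(date(2017, 7, 1)))  # state before the correction
print(log.state_as_of(date(2017, 7, 3)))  # state after the correction
```

Both answers stay available forever, so an export pulled on one day can be explained against a later one.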

Immutable event log
What CX observed was no fluke!
[Figure: event log entries for July 1st and July 3rd]

1. Immutable
2. Deltas for Easy Aggregation - represent amounts in “deltas”
Digiday, 2017

See total commissions by day
[Figure: Before vs. After]

Benefit of Delta
● Easy aggregation
● A single service responsible for computing deltas
● “Atomic” - self contained description of the change
● Events can arrive out of order, and end state will be eventually consistent
With Latest State
● Greater tolerance for missing events, later states will overwrite incorrect earlier states
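
The trade-off above can be sketched in a few lines. The event tuples, amounts, and the `total` helper are hypothetical, chosen only to show that summing deltas does not depend on arrival order, while a latest-state record survives a lost event.

```python
from decimal import Decimal
from itertools import permutations

# Hypothetical delta events for one order: (seqn, change in commission).
deltas = [
    (1, Decimal("10.00")),  # order placed
    (2, Decimal("-2.00")),  # partial refund
    (3, Decimal("0.50")),   # adjustment
]


def total(events):
    """Aggregation is a plain sum of the deltas."""
    return sum((amount for _, amount in events), Decimal("0"))


# Addition is commutative, so every arrival order reaches the same end state.
totals = {total(order) for order in permutations(deltas)}
print(totals)

# Latest-state events carry the full value instead; the highest seqn wins,
# so losing the event with seqn 2 does not corrupt the final value.
states = [(1, Decimal("10.00")), (3, Decimal("8.50"))]
latest = max(states, key=lambda s: s[0])[1]
print(latest)
```

With deltas, a lost event would leave the sum wrong until it is replayed, which is the cost paid for the easy aggregation.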

1. Immutable
2. Deltas for Easy Aggregation - represent amounts in “deltas”
3. Denormalized - few tables, lots of dimensions

Why Denormalized?
More OLAP use cases than OLTP.
OLAP use cases - large # of records
● Marketing - Campaign analysis
● Finance - Billing Exports & Invoices
● Data team - Analytics
● Partners - API for historical data
OLTP use cases - single record
● Customer Support - Debugging individual orders
● Inserting events

Hybrid Performance Approach
● Use Postgres DB
● Denormalized Data
Hybrid in the sense that the data format is optimized for querying over historical time ranges, yet the DB is a traditional OLTP database.
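
A rough sketch of what a denormalized event table buys you. The table name, columns, and rows are invented for illustration, and Python's built-in `sqlite3` is used only as a stand-in so the example runs anywhere; the talk's store is Postgres. Because the dimensions sit on each row, a per-day total needs no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Postgres DB
conn.execute(
    """
    CREATE TABLE order_events (
        event_id               TEXT PRIMARY KEY,
        order_id               TEXT NOT NULL,
        event_day              TEXT NOT NULL,  -- ISO date
        campaign               TEXT NOT NULL,  -- dimension stored on the row
        partner                TEXT NOT NULL,  -- dimension stored on the row
        commission_delta_cents INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO order_events VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("e1", "o1", "2017-07-01", "summer", "acme", 1000),
        ("e2", "o2", "2017-07-01", "summer", "globex", 500),
        ("e3", "o1", "2017-07-03", "summer", "acme", -200),
    ],
)

# Total commissions by day: one scan, one GROUP BY, no joins.
rows = conn.execute(
    """
    SELECT event_day, SUM(commission_delta_cents)
    FROM order_events
    GROUP BY event_day
    ORDER BY event_day
    """
).fetchall()
print(rows)
```

Swapping `event_day` for `campaign` or `partner` in the `GROUP BY` gives the other slices without touching the schema.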

For faster performance with CSV Exports and aggregations
[Diagram: Previous Financial Data Store vs. New Data Store (denormalized)]

1. Immutable
2. Deltas for Easy Aggregation - represent amounts in “deltas”
3. Denormalized - few tables, lots of dimensions
4. Separate record keeping for billing

Why keep separate records for billing?
● Need stable tracking of which events fit into each invoice
● Enable later adjustments
● Allow changes in billing logic
  ○ may bill on events vs orders
  ○ may bill per customer vs per order
  ○ may bill weekly vs monthly
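
One possible shape for those billing records, as a sketch under assumptions: `Event`, `InvoiceLine`, `build_invoice`, the invoice ID format, and the monthly period are all hypothetical. The point is that an invoice stores its own list of event IDs, so which events it covered stays fixed even if the billing logic changes later.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    event_id: str
    customer: str
    day: date
    amount_cents: int


@dataclass(frozen=True)
class InvoiceLine:
    invoice_id: str
    event_id: str      # stable link back to the immutable event log
    amount_cents: int


def build_invoice(invoice_id, events, start, end):
    """Record which events belong to this invoice (start inclusive, end exclusive)."""
    return [
        InvoiceLine(invoice_id, e.event_id, e.amount_cents)
        for e in events
        if start <= e.day < end
    ]


events = [
    Event("e1", "acme", date(2017, 6, 28), 1000),
    Event("e2", "acme", date(2017, 7, 2), 500),
]
june = build_invoice("inv-2017-06", events, date(2017, 6, 1), date(2017, 7, 1))
print(june)
print(sum(line.amount_cents for line in june))
```

Moving from monthly to weekly billing would only change the `start` and `end` passed in; the event log and past invoice lines stay as they are.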

[Diagram: Immutable Event Log, Product/Service rendered, Invoicing]

1. Immutable
2. Deltas for easy aggregation
3. Denormalized
4. Separate record keeping for billing
5. Self Heal - programmatic detection & adjustment

Self-Heal - programmatic detection & adjustment
● Immutable data helps with this
● So does having separate records for billing
● Limiting points of failure
Example:
● Orders that were processed “late” and didn’t make it into the last billing cycle should be automatically added to the next cycle
● Automatic checks of billing records (immutable) against order event records (also immutable)
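
The example above could look roughly like this. The function names, the invoice IDs, and the dict-of-lists layout are assumptions for illustration, not the production check. The sketch compares event IDs in the log against event IDs already billed and rolls any leftovers into the next cycle.

```python
def find_unbilled(event_ids, billed_event_ids):
    """Events present in the event log but absent from every invoice."""
    return sorted(set(event_ids) - set(billed_event_ids))


def self_heal(event_ids, invoices, next_invoice_id):
    """Add any unbilled ("late") events to the next billing cycle.

    `invoices` maps invoice ID -> list of event IDs. A new mapping is
    returned; the lists of existing invoices are copied, not modified.
    """
    billed = [eid for ids in invoices.values() for eid in ids]
    late = find_unbilled(event_ids, billed)
    healed = {inv: list(ids) for inv, ids in invoices.items()}
    healed.setdefault(next_invoice_id, []).extend(late)
    return healed, late


event_log_ids = ["e1", "e2", "e3"]        # e3 was processed late
invoices = {"inv-2017-06": ["e1", "e2"]}  # closed before e3 arrived

healed, late = self_heal(event_log_ids, invoices, "inv-2017-07")
print(late)
print(healed)
```

Because both inputs are immutable records, the check can be rerun at any time and gives the same answer.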

Use stable ID & ordering throughout your processing pipeline
● Ordering (seqn) and Event ID should be set as far upstream as possible in the order pipeline, and carried all the way downstream.
● Good for debugging
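
A small sketch of why upstream IDs help downstream. The `Event` fields besides `seqn` and the event ID, and the `normalize` helper, are hypothetical. A consumer that receives events out of order or more than once can restore the original sequence using only the two upstream-assigned values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # assigned once, upstream
    seqn: int      # assigned once, upstream
    payload: str


def normalize(received):
    """Drop redelivered events by event_id, then restore upstream order by seqn."""
    unique = {e.event_id: e for e in received}
    return sorted(unique.values(), key=lambda e: e.seqn)


received = [
    Event("evt-b", 2, "refund"),
    Event("evt-a", 1, "order placed"),
    Event("evt-b", 2, "refund"),  # duplicate delivery
]
print([e.event_id for e in normalize(received)])
```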

● Dates really matter.
● Avoid floats
● Double-Entry doesn’t matter
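
On avoiding floats, a short illustration. The amounts are made up; the two alternatives shown, Python's `decimal.Decimal` and integer cents, are common options and not necessarily what the speaker's system uses.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_ok = (0.1 + 0.2 == 0.3)
print(float_ok)  # False

# Decimal built from strings keeps exact decimal arithmetic.
decimal_ok = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))
print(decimal_ok)  # True

# Integer cents sidestep fractions entirely.
total_cents = 10 + 20
print(total_cents == 30)  # True
```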
