Introducing the Bidder-as-a-Service
Applying Design to Solve Scaling Problems and Evolve an Architecture
DataEngConf, NYC, Oct. 30, 2017
Mark Weiss, Senior Software Engineer
mark@beeswax.com | @marksweiss
What is Beeswax?
We Built a Better Bidder

About Beeswax
● Beeswax is a 3-year-old ad tech startup based in NYC
● Founded by three ex-Googlers; the CEO has deep roots in ad tech
● 40 employees in NYC and London

Why We Are Different
● Customers get the benefits of a custom bidder stack without the development and operating cost and risk
● Customers get access to all of their data
● API-driven: customers use APIs to customize bidding strategy
● SaaS model and pricing: customers pay to use the platform
RTB: Real-Time Bidding (AKA "Please Let Us Do This")
Step 1: Publisher sends the ad request & user ID to the Ad Exchange
Step 2: Exchange broadcasts the bid request to bidders (Beeswax Bidder)
Step 3: Bidder submits its bid & ad markup to the exchange's auction
Step 4: The winning ad is shown to the user
Bidder responsibilities: target campaigns, target user profiles, optimize for ROI, customize
Scale: 1M QPS, < 200 ms response window, 20 ms p99 latency
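A rough sketch of the bidder's role in the flow above: receive a bid request, decide whether to bid, and return a price and ad markup. The request/response shape loosely follows OpenRTB; the campaign lookup and pricing are hypothetical placeholders, not Beeswax's actual logic.

```python
# Illustrative only: a bare-bones bid request handler for the RTB flow above.
# Field names loosely follow OpenRTB; select_campaign is a hypothetical stand-in
# for real targeting and ROI optimization.

def select_campaign(user_id, imp):
    """Hypothetical targeting/optimization step: pick a campaign, a CPM bid
    price, and creative markup for this user and placement, or None to pass."""
    return {"id": "campaign-123", "cpm": 2.50, "adm": "<html>...</html>"}


def handle_bid_request(bid_request: dict) -> dict:
    """Given an exchange bid request, return a bid response (or a no-bid)."""
    imp = bid_request["imp"][0]                       # first impression opportunity
    user_id = bid_request.get("user", {}).get("id")

    campaign = select_campaign(user_id, imp)
    if campaign is None:
        return {"id": bid_request["id"], "seatbid": []}   # no-bid

    return {
        "id": bid_request["id"],
        "seatbid": [{
            "bid": [{
                "impid": imp["id"],
                "price": campaign["cpm"],
                "adm": campaign["adm"],               # ad markup shown if the bid wins
            }]
        }],
    }


if __name__ == "__main__":
    request = {"id": "req-1", "imp": [{"id": "1"}], "user": {"id": "user-42"}}
    print(handle_bid_request(request))
```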
What is the Beeswax Data Platform?
Beeswax Data Platform
Bid Data and Impression, Click, and other Event Data → Event Ingestion → Raw Event Data (S3) → Event Processing → Normalized Log Data → Event Join, Normalize, Aggregate → Redshift → Customer Data and Customer Reports
Beeswax Data Platform: Event Stream
Event Ingestion stage of the pipeline above: a Python web app, Input: HTTP/JSON, Output: Protobuf, written to Kinesis
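A minimal sketch of this ingestion tier, assuming a Flask app and a Kinesis stream named "event-stream" (both hypothetical): accept JSON over HTTP and forward each event to Kinesis. In the real system the payload is re-encoded as protobuf via a generated message class; JSON serialization stands in for that step below.

```python
# Minimal ingestion sketch: HTTP/JSON in, Kinesis out. Stream name, route, and
# field names are hypothetical; json.dumps stands in for the real protobuf encoding.

import json

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
kinesis = boto3.client("kinesis")
STREAM_NAME = "event-stream"          # hypothetical stream name


@app.route("/event", methods=["POST"])
def ingest_event():
    event = request.get_json(force=True)

    # Stand-in for protobuf serialization of the validated event.
    payload = json.dumps(event).encode("utf-8")

    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=payload,
        PartitionKey=str(event.get("user_id", "unknown")),
    )
    return jsonify({"status": "ok"}), 202


if __name__ == "__main__":
    app.run(port=8080)
```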
Beeswax Data Platform: Event Processing
Event Processing stage: a custom Java KCL app, Input: Protobuf, Output: CSV
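The production consumer here is a Java KCL application; the sketch below is only a Python illustration of the transformation it performs: take a decoded event and flatten it into a fixed-order CSV row ready for loading into Redshift. Column names are hypothetical.

```python
# Python illustration (not the real Java KCL consumer) of the protobuf-to-CSV
# flattening step: emit one fixed-order CSV row per decoded event. Columns are
# hypothetical.

import csv
import io

COLUMNS = ["event_time", "event_type", "auction_id", "user_id", "price_micros"]


def event_to_csv_row(event: dict) -> str:
    """Flatten one decoded event into a CSV line in the agreed column order."""
    buf = io.StringIO()
    csv.writer(buf).writerow([event.get(col, "") for col in COLUMNS])
    return buf.getvalue()


if __name__ == "__main__":
    event = {
        "event_time": "2017-10-30T12:00:00Z",
        "event_type": "impression",
        "auction_id": "a-1",
        "user_id": "user-42",
        "price_micros": 2500000,
    }
    print(event_to_csv_row(event), end="")
```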
Beeswax Data Platform: Event Processing
Event Join, Normalize, Aggregate stage: AWS Data Pipeline, AWS Redshift/SQL, custom Python libs, Python Activities
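A sketch of the kind of Python activity this stage runs against Redshift, assuming psycopg2 for connectivity; the table names, columns, S3 path, and IAM role are hypothetical, and the SQL is representative rather than Beeswax's actual queries.

```python
# Representative join/aggregate activity: COPY normalized logs from S3, then
# roll them up into a fact table. All names and the connection string are hypothetical.

import psycopg2

COPY_SQL = """
COPY normalized_events
FROM 's3://example-bucket/normalized/2017-10-30/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
FORMAT AS CSV;
"""

AGGREGATE_SQL = """
INSERT INTO impression_facts (hour, campaign_id, impressions, spend_micros)
SELECT date_trunc('hour', event_time), campaign_id, count(*), sum(price_micros)
FROM normalized_events
WHERE event_type = 'impression'
GROUP BY 1, 2;
"""


def run_activity(dsn: str) -> None:
    """Run the COPY and the aggregation as one transaction."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)
            cur.execute(AGGREGATE_SQL)


if __name__ == "__main__":
    run_activity("host=example.redshift.amazonaws.com dbname=dw user=etl password=...")
```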
What Was the State of the System?
Event Join and Aggregation ("Everything Looks Good …")
Bids, Impressions, Clicks, Conversions → Honeycomb (Joining and Aggregation) → Fact Table: Impression Details
… until a new input, Other Impression Data, also needs to land in the same fact table
Pipeline Problems: Monolithic and Inflexible
Two independent monolithic pipelines, each Step 1 → Step 2 → Step 3, writing into the same target table
We were a lucky startup with a bunch of "good problems to have"
System Goals for Architectural Evolution
● Support separate pipelines writing to the same target tables
● Support any pipeline depending on the data from any other
● Centralize job-level state management and job control
● Continue to use the existing platform technologies … for now
From Goals to Principles to Patterns to Design
Goals to Principles: Remove Contention
Goal: Multiple asynchronous pipelines with no write contention → Principle: Jobs always write to new versioned instances of target tables
Goal: Multiple pipelines land data in the same master fact table → Principle: One job per master target table reads from multiple sources and writes into the target table sequentially
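A minimal sketch of the first principle, assuming psycopg2 and a pre-existing template table (both hypothetical): each run creates its own uniquely named instance of the staging table, so concurrent pipelines never contend on the same write target.

```python
# Versioned-table-instance sketch: every run writes into a fresh, timestamped
# copy of its staging table. Table names and the template convention are hypothetical.

import time

import psycopg2


def create_versioned_staging_table(cur, base_name: str) -> str:
    """Create a fresh, uniquely named instance of a staging table and return its name."""
    version = int(time.time())
    table = f"{base_name}_v{version}"
    cur.execute(f"CREATE TABLE {table} (LIKE {base_name}_template);")
    return table


if __name__ == "__main__":
    with psycopg2.connect("host=example.redshift.amazonaws.com dbname=dw user=etl") as conn:
        with conn.cursor() as cur:
            staging = create_versioned_staging_table(cur, "staging_table_a")
            print("this run writes into", staging)
```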
Principles to Patterns: Remove Contention
Input Data Set A → Data Pipeline Job A → Staging Table A
Input Data Set B → Data Pipeline Job B → Staging Table B
Staging Tables A and B → Gather Data Pipeline Job → Target Fact Table
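A hedged sketch of the gather step in this pattern: one job per master fact table appends each pipeline's staging table into the target sequentially, so the fact table always has a single writer. Table names are hypothetical.

```python
# Gather step sketch: the single writer for the master fact table appends each
# staging table in turn. Names are hypothetical.

import psycopg2


def gather(cur, target_table: str, staging_tables: list) -> None:
    """Append each staging table into the master fact table, one at a time."""
    for staging in staging_tables:
        cur.execute(f"INSERT INTO {target_table} SELECT * FROM {staging};")


if __name__ == "__main__":
    with psycopg2.connect("host=example.redshift.amazonaws.com dbname=dw user=etl") as conn:
        with conn.cursor() as cur:
            gather(cur, "impression_facts",
                   ["staging_table_a_v1509372000", "staging_table_b_v1509372000"])
```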
Goals to Principles: Job Composition and Job State
Goal: Any job can depend on any other job → Principle: Jobs record completion of uniquely identifiable, timestamped data sets into one source of truth for all jobs
Goal: Jobs always consume the most recent source data available → Principle: Jobs can query one source of truth to discover the most recent data sets available upon which they depend
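A small sketch of the "one source of truth" these principles describe, using an in-memory SQLite table as a stand-in for the real job-state store (which later slides place in RDS): producing jobs record each completed, timestamped data set, and consuming jobs query for the most recent data set of the type they depend on. The schema is hypothetical.

```python
# Job-state sketch: record completed data sets, and discover the most recent one
# of a given type. SQLite stands in for the real store; schema is hypothetical.

import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS job_state (
    data_set_type      TEXT NOT NULL,
    data_set_timestamp TEXT NOT NULL,
    processing_window  TEXT NOT NULL,
    table_name         TEXT NOT NULL
);
"""


def record_data_set(conn, data_set_type, timestamp, window, table_name):
    """A producing job records a completed, uniquely identifiable data set."""
    conn.execute("INSERT INTO job_state VALUES (?, ?, ?, ?)",
                 (data_set_type, timestamp, window, table_name))


def latest_data_set(conn, data_set_type):
    """A consuming job discovers the most recent data set of a given type."""
    return conn.execute(
        "SELECT table_name, data_set_timestamp FROM job_state "
        "WHERE data_set_type = ? ORDER BY data_set_timestamp DESC LIMIT 1",
        (data_set_type,),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    record_data_set(conn, "A", "2017-10-30T01:00:00Z", "hourly", "staging_table_a_v1")
    record_data_set(conn, "A", "2017-10-30T02:00:00Z", "hourly", "staging_table_a_v2")
    print(latest_data_set(conn, "A"))   # -> ('staging_table_a_v2', '2017-10-30T02:00:00Z')
```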
Principles to Patterns: Job Composition and Job State
Scatter Job → Staging Table A (Data Set Type A, Version 1, Time 1)
Global Job State records: (Data Set Type A, Time 1)
Principles to Patterns: Job Composition and Job State
Scatter Job → Staging Table A (Data Set Type A, Version 1, Time 1)
Scatter Job → Staging Table A (Data Set Type A, Version 2, Time 2)
Global Job State records: (Data Set Type A, Time 1), (Data Set Type A, Time 2)
Principles to Patterns: Job Composition and Job State
Scatter Job → Staging Table A (Data Set Type A, Version 1, Time 1)
Scatter Job → Staging Table A (Data Set Type A, Version 2, Time 2)
Global Job State records: (Data Set Type A, Time 1), (Data Set Type A, Time 2)
Gather Data Job consumes the most recent data (Version 2, Time 2)
Garbage Collection Job DROPs less recent data (Version 1, Time 1)
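A sketch of the garbage-collection job in the diagram, reusing the hypothetical job_state schema from the sketch above: once a newer version of a data set type exists, every older versioned staging table is DROPped and its record removed.

```python
# Garbage-collection sketch: keep the newest version of each data set type,
# DROP the rest. Reuses the hypothetical job_state schema from the earlier sketch.

def stale_tables(state_conn, data_set_type):
    """Every versioned table for this type except the most recent one."""
    rows = state_conn.execute(
        "SELECT table_name FROM job_state WHERE data_set_type = ? "
        "ORDER BY data_set_timestamp DESC",
        (data_set_type,),
    ).fetchall()
    return [name for (name,) in rows[1:]]        # keep the newest, drop the rest


def garbage_collect(state_conn, warehouse_cursor, data_set_type):
    """DROP stale versioned tables and delete their job-state records."""
    for table in stale_tables(state_conn, data_set_type):
        warehouse_cursor.execute(f"DROP TABLE IF EXISTS {table};")
        state_conn.execute("DELETE FROM job_state WHERE table_name = ?", (table,))
```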
Patterns to Design: Job Composition and Job State
Scatter Job A records (Data Set Type A, timestamp 1, processing_window) and (Data Set Type A, timestamp 2, processing_window)
Scatter Job B records (Data Set Type B, timestamp 1, processing_window)
Global Job State holds all of these records; the Gather Data Pipeline Job reads them to find the most recent data sets it depends on
Implementing the Design with What We Have on Hand
Implementing the Design
● Global Job State: tables in RDS (MySQL)
● Data Pipeline Jobs drive the work through a Python API and Redshift DDL
● Built with: AWS Data Pipeline, Python, Redshift SQL
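A hedged sketch of how these pieces fit together at runtime: an AWS Data Pipeline activity invokes a Python entry point, which looks up the most recent upstream data set in the Global Job State tables in RDS (MySQL) and then runs its Redshift SQL against that versioned table. Hosts, credentials, and table names are hypothetical, and pymysql/psycopg2 stand in for whatever client libraries the real Python API uses.

```python
# Glue sketch: read the Global Job State from RDS (MySQL), then run this job's
# Redshift SQL against the most recent versioned source table. All connection
# details and names are hypothetical.

import pymysql
import psycopg2


def run_pipeline_activity(data_set_type: str) -> None:
    # 1. Discover the most recent upstream data set from the Global Job State in RDS.
    state_db = pymysql.connect(host="jobstate.example.rds.amazonaws.com",
                               user="jobs", password="...", database="job_state")
    with state_db.cursor() as cur:
        cur.execute(
            "SELECT table_name FROM job_state WHERE data_set_type = %s "
            "ORDER BY data_set_timestamp DESC LIMIT 1",
            (data_set_type,),
        )
        (source_table,) = cur.fetchone()

    # 2. Run this job's Redshift SQL against that versioned source table.
    with psycopg2.connect("host=example.redshift.amazonaws.com dbname=dw user=etl") as dw:
        with dw.cursor() as cur:
            cur.execute(f"INSERT INTO impression_facts SELECT * FROM {source_table};")


if __name__ == "__main__":
    run_pipeline_activity("A")
```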
Conclusions
● You can evolve a data architecture without adopting new technology
● Carefully chosen invariants define a design that solves present problems and supports future flexibility
● Invariants are system Goals
● Identifying Goals suggests Principles
● Patterns embody Principles
● Design applies Patterns
Introducing the Bidder-as-a-Service: Questions?
We have a great team! We have lots of fun problems to solve! We have LaCroix and Kind Bars! We're hiring!
Mark Weiss, Senior Software Engineer
mark@beeswax.com | @marksweiss
https://www.beeswax.com/careers/