Databases and Stream Processing: A Future of Consolidation
Ben Stopford, Office of the CTO, Confluent
Marc Andreessen: Software is Eating the World
Weak form: companies are using more software. Strong form: companies are becoming software.
Loan Application Using Software
[Diagram: six numbered steps between borrower, application form, credit officer, risk officer and loan officer, ending in approve or deny]
Loan Application in Software
[Diagram: three numbered steps between borrower, loan app UI, credit service, risk service and CRM service, ending in approve or deny]
Using Software: Classic Three-Tier Architecture
[Diagram: user -> UI -> service -> database]
Becoming Software: Services Talking To Each Other With APIs
[Diagram: four services calling one another]
Requesting a Ride
[Diagram labels: customer, driver, business events, geospatial matching, route re-planning]
Evolution of software systems
[Diagram: monolith -> distributed monolith -> microservices -> event-driven microservices (Kafka); from user centric to software centric, with increasing complexity]
THE USER OF THE SOFTWARE IS MORE SOFTWARE
What does this mean for databases?
We have hundreds of databases...
FUNDAMENTAL ASSUMPTION: DATA IS PASSIVE
Databases are designed to help you!
Unless there is a user and UI waiting, why should it be synchronous?
The Alternative: Event Streams
Stream Processors are built for Asynchronicity
Stream Processors have a different interaction model
Traditional database (DB table): active query, passive data. SELECT * FROM DB_TABLE
Event stream processing (event stream): active data, passive query. CREATE TABLE AS SELECT * FROM EVENT_STREAM
Streams or Tables?
An event records the fact that something happened: a good was sold, an invoice was issued, a payment was made, a new customer registered.
Events are state changes; they carry intent.
State: Bob works at Google. Event: Bob moved from Google to Amazon.
Streams record exactly what happened; tables represent current state.
Where you have been vs. where you are now. Payments you made vs. your account balance.
Streams: a sequence of moves (1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. d3 Nf6 5. Nbd2). Tables: the position of each piece.
Streams = INSERT only (immutable, append-only). Tables = INSERT, UPDATE, DELETE (mutable, primary key).
A stream can be considered as an immutable, append-only table
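A minimal sketch of that idea, assuming a hypothetical payments stream of (account, amount) events. The stream is an append-only sequence, and the table is whatever you get by folding the events by key. The names `payments` and `materialize` are illustrative only and do not come from any library.

```python
from collections import defaultdict

# Stream: an immutable, append-only sequence of events (INSERT only).
payments = (
    ("bob", 50),
    ("jill", 10),
    ("bob", 60),
)


def materialize(events):
    """Fold a stream of (key, amount) events into a table keyed by account."""
    table = defaultdict(int)
    for key, amount in events:
        table[key] += amount  # each event updates the row for its key
    return dict(table)


balances = materialize(payments)
print(balances)  # {'bob': 110, 'jill': 10}
```

The stream keeps every payment; the table keeps one row per key, which is the "payments you made vs. your account balance" distinction.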
Stream Processors Communicate Through Streams
[Diagram: input streams -> stream processor (SP) -> output streams]
But internally they use tables
CREATE TABLE credit_scores AS SELECT user, updateScore(p.amount)…
[Diagram: payments stream -> credit score table -> credit score stream]
Duality: tables represent state; streams record history.
Stream -> table: projection (GROUP BY key, SUM, COUNT). Table -> stream: table changes.
*See Streams and Tables: Two Sides of the Same Coin, M. Sax et al., BIRTE ’18
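The table-to-stream direction of the duality can be sketched as follows. This is an illustration under assumptions, not how any particular stream processor implements it: `ChangeLoggedTable` and `replay` are hypothetical names, and `None` is used here as a deletion marker.

```python
class ChangeLoggedTable:
    """A mutable table that records every change as an event."""

    def __init__(self):
        self.rows = {}
        self.changelog = []  # the table's changes, viewed as a stream

    def upsert(self, key, value):
        self.rows[key] = value
        self.changelog.append((key, value))

    def delete(self, key):
        self.rows.pop(key, None)
        self.changelog.append((key, None))  # None marks a deletion


def replay(changelog):
    """Rebuild the table by applying the change stream in order."""
    rows = {}
    for key, value in changelog:
        if value is None:
            rows.pop(key, None)
        else:
            rows[key] = value
    return rows


table = ChangeLoggedTable()
table.upsert("bob", "Google")
table.upsert("jill", "Acme")   # hypothetical second row
table.upsert("bob", "Amazon")  # Bob moved from Google to Amazon
table.delete("jill")

rebuilt = replay(table.changelog)
print(rebuilt)  # {'bob': 'Amazon'}
```

Replaying the change stream yields the same rows as the live table, which is the sense in which the two are sides of the same coin.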
Similar to a materialized view in a database
Stream processor (active): payments stream -> credit score table -> credit score stream. Asynchronous, push query.
Database: payments table -> credit score table. Synchronous, pull query.
Joins
Joining a stream with a table
[Diagram: each event in the Orders stream looks up its customer in the Customers table (with primary key)]
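A stream-table join can be sketched as a per-event lookup by primary key. The customer ids, names and the `join_stream_with_table` helper below are made up for illustration. The sketch uses inner-join behaviour, so orders without a matching customer are dropped.

```python
# Table of customers, keyed by primary key (hypothetical ids and names).
customers = {101: "Bob", 200: "Jill"}

# Stream of orders as (customer_id, item) events.
orders = [(101, "Boots"), (200, "Hat"), (105, "Pants")]


def join_stream_with_table(stream, table):
    """Enrich each stream event with the table row that shares its key."""
    for key, value in stream:
        row = table.get(key)
        if row is not None:  # inner join: skip events with no matching row
            yield key, value, row


enriched = list(join_stream_with_table(orders, customers))
print(enriched)  # [(101, 'Boots', 'Bob'), (200, 'Hat', 'Jill')]
```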
Joining two streams: orders.join(payments)
[Animation: Bob's and Jill's orders arrive on the Orders stream and their payments on the Payments stream, in differing order. Events are held in a key-value store so that each order can be paired with its payment whichever arrives first.]
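The buffering shown in the animation can be sketched as a symmetric join: each side stores its events by key and probes the other side's store on arrival. This is a simplified illustration with hypothetical names (`StreamStreamJoin`, `on_left`, `on_right`), using in-memory dicts in place of a real key-value store and with no windowing or expiry.

```python
from collections import defaultdict


class StreamStreamJoin:
    """Join two streams on key, buffering each side in its own store."""

    def __init__(self):
        self.left_store = defaultdict(list)   # e.g. orders seen so far
        self.right_store = defaultdict(list)  # e.g. payments seen so far

    def on_left(self, key, value):
        self.left_store[key].append(value)
        # Pair the new left event with every buffered right event.
        return [(key, value, other) for other in self.right_store[key]]

    def on_right(self, key, value):
        self.right_store[key].append(value)
        # Pair the new right event with every buffered left event.
        return [(key, other, value) for other in self.left_store[key]]


join = StreamStreamJoin()
output = []
output += join.on_left("bob", "Bob's order")       # no payment yet
output += join.on_right("jill", "Jill's payment")  # no order yet
output += join.on_left("jill", "Jill's order")     # matches Jill's payment
output += join.on_right("bob", "Bob's payment")    # matches Bob's order
print(output)
```

Neither stream has to arrive first; whichever event comes second finds its partner in the store and emits the joined result.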
Streams represent history -> Cartesian product

Orders stream    Payments stream
101  Boots       101  $50
200  Hat         200  $10
101  Boots2      105  $3
105  Pants       200  $12
200  Hat2        101  $60

Join output (stream)
Joining Streams to Streams: use a time window
[Diagram: the Orders stream (101 Boots, 200 Hat, 101 Boots2, 105 Pants, 200 Hat2) and the Payments stream (101 $50, 200 $10, 105 $3, 200 $12, 101 $60), joined only within a time window; the join output is a stream]
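As a worked example, the sketch below joins the orders and payments from the slide, first without a window and then with one. The timestamps (1 to 5, in arrival order) and the window size of 1 are assumptions added for illustration, since the slide does not show event times. The nested loop is a batch stand-in for what a stream processor would do incrementally.

```python
# (timestamp, key, value); timestamps are assumed, not from the slide.
orders = [(1, 101, "Boots"), (2, 200, "Hat"), (3, 101, "Boots2"),
          (4, 105, "Pants"), (5, 200, "Hat2")]
payments = [(1, 101, 50), (2, 200, 10), (3, 105, 3),
            (4, 200, 12), (5, 101, 60)]


def join(orders, payments, window=None):
    """Join on key; if window is set, also require |t_order - t_payment| <= window."""
    out = []
    for o_ts, o_key, item in orders:
        for p_ts, p_key, amount in payments:
            if o_key != p_key:
                continue
            if window is not None and abs(o_ts - p_ts) > window:
                continue
            out.append((o_key, item, amount))
    return out


unbounded = join(orders, payments)
windowed = join(orders, payments, window=1)
print(len(unbounded), len(windowed))  # 9 4
```

Without a window, every order for a key pairs with every payment for that key (2x2 for 101, 2x2 for 200, 1x1 for 105), so the output grows with history. The window keeps only pairs that are close in time.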
Tools for correlating recent events in time
More advanced temporal functions
[Diagram: Orders and Page Visits streams joined per session; the join output is a stream]
Late and out-of-order data
[Diagram: Orders and Page Visits streams divided into Window 1 and Window 2; the join output is a stream]
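One common way to handle late and out-of-order events is to assign each event to a window by its event time and keep windows open for a grace period. The sketch below illustrates that general technique; the window size, grace value and `WindowedCounter` class are hypothetical and not taken from the talk.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # hypothetical window length, in event-time units
GRACE = 5         # hypothetical extra time a window stays open for late events


class WindowedCounter:
    """Count events per tumbling window, tolerating bounded lateness."""

    def __init__(self):
        self.counts = defaultdict(int)  # window start -> event count
        self.stream_time = 0            # highest event time seen so far
        self.dropped = 0

    def process(self, event_time):
        self.stream_time = max(self.stream_time, event_time)
        window_start = (event_time // WINDOW_SIZE) * WINDOW_SIZE
        window_end = window_start + WINDOW_SIZE
        if self.stream_time >= window_end + GRACE:
            self.dropped += 1  # window already closed: event is too late
            return
        self.counts[window_start] += 1


counter = WindowedCounter()
for ts in [1, 4, 12, 8, 17, 3]:  # 8 and 3 arrive out of order
    counter.process(ts)

print(dict(counter.counts), counter.dropped)  # {0: 3, 10: 2} 1
```

Event 8 arrives after event 12 but within the grace period, so it still lands in the first window. Event 3 arrives after stream time has reached 17 and is dropped.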
Stream processors provide tools that handle asynchronicity, leverage time and focus on ‘now’
Data Placement
Layered storage model
[Diagram: storage (Kafka) below, with 'caching' in the streaming layer above; the stream processor reads via the network from the stream's P2 and from the table's P2]
Partitioned Data (Fact-Fact joins)
KTable / TABLE over partitioned storage (Kafka): SP 1 holds P1 (2 GB), SP 2 holds P2 (3 GB), SP 3 holds P3 (5 GB), SP 4 holds P4 (2 GB).
Broadcast Data (Fact-Dimension Joins)
GlobalKTable: 2 + 3 + 5 + 2 = 12 GB. Each of Stream Tasks 1 to 4 reads its own partition (P1 to P4) of the stream but holds the full 12 GB table.
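The storage arithmetic behind the two placements can be worked through with the partition sizes from the slides. The `partition_for` helper is a stand-in added for illustration: it uses CRC32, whereas Kafka's default partitioner uses a different hash.

```python
import zlib

# Partition sizes in GB, taken from the slides.
partition_sizes_gb = {"P1": 2, "P2": 3, "P3": 5, "P4": 2}

# Partitioned table: each processor holds only its own partition.
partitioned = dict(partition_sizes_gb)

# Broadcast table: each task holds a full copy of every partition.
full_copy_gb = sum(partition_sizes_gb.values())
broadcast = {task: full_copy_gb for task in partition_sizes_gb}

print(sum(partitioned.values()))  # 12 GB held across the cluster
print(sum(broadcast.values()))    # 48 GB held across the cluster


def partition_for(key: str, num_partitions: int) -> int:
    """Illustrative hash partitioner (not Kafka's actual hash function)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Partitioned joins need both inputs keyed and partitioned the same way,
# so that the same key from either input lands on the same processor.
order_partition = partition_for("bob", 4)
payment_partition = partition_for("bob", 4)
```

Partitioning keeps the footprint at 12 GB in total but requires both join inputs to be partitioned on the join key. Broadcasting costs 12 GB per task and in exchange lets each task join on any key locally.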
Architecturally there are parallels, e.g. data warehousing
[Diagram: ETL -> facts and dimensions -> reporting]
Interaction Model
Stream Processors Continuously Process Input to Output
[Diagram: input streams -> stream processor (SP) -> output streams]
Traditional database (DB table): active query, passive data. SELECT * FROM DB_TABLE
Event stream processing (event stream): active data, passive query. CREATE TABLE AS SELECT * FROM EVENT_STREAM
Stream processors are push queries: as payments arrive, the app is sent each new result ("Ben’s credit score is 670", "Ben’s credit score is 710", "Ben’s credit score is 695", ...).
Databases are pull queries: the app asks "What is Ben’s credit score now?" and receives 695.
Hybrid stream processors provide both interaction models
[Diagram: a payments stream feeds ksqlDB, which summarizes and materializes credit scores; one app queries the credit scores, another app receives the credit scores stream]
Unified Model For:
1. The Asynchronous and the Synchronous
2. Interaction with Active or Passive Data
Unified interaction model (timeline: The Past | Now | The Future)
● Standard database query: earliest to now
● Standard stream processing query: now to forever
● ‘Dashboard query’: earliest to forever
PUSH:
    SELECT user, credit_score FROM orders WHERE ROWKEY = 'bob' EMIT CHANGES;
PULL:
    SELECT user, credit_score FROM orders WHERE ROWKEY = 'bob';
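The difference between the two query styles can be sketched over a single materialized table. This is a conceptual illustration only and not the ksqlDB API: `MaterializedTable`, `pull_query`, `push_query` and `apply` are hypothetical names, and the scores are the ones used on the earlier slide.

```python
class MaterializedTable:
    """A keyed table that supports both pull and push style access."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []  # (key, callback) pairs for push queries

    def pull_query(self, key):
        """Pull: return the current value once, then finish."""
        return self.rows.get(key)

    def push_query(self, key, callback):
        """Push: register interest; callback fires on every later change."""
        self.subscribers.append((key, callback))

    def apply(self, key, value):
        """Apply a change from the input stream and notify subscribers."""
        self.rows[key] = value
        for sub_key, callback in self.subscribers:
            if sub_key == key:
                callback(value)


scores = MaterializedTable()
pushed = []
scores.push_query("ben", pushed.append)  # analogous to EMIT CHANGES

for score in (670, 710, 695):
    scores.apply("ben", score)

current = scores.pull_query("ben")
print(pushed, current)  # [670, 710, 695] 695
```

The push query sees every change from the moment it subscribes; the pull query sees only the state at the moment it is asked.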
Asynchronous => Pipelines
[Diagram: APP -> SQL -> SQL -> SQL -> APP, labeled "Transactions" and "Joins/aggregation/time-handling"]
Other important variants
● Stream processors are often programming frameworks today
  ○ Storm
  ○ Flink
  ○ Kafka Streams
● Today we have active databases that include change streams:
  ○ Mongo
  ○ Couchbase
  ○ RethinkDB
As Software Eats the World
THE USER OF THE SOFTWARE IS MORE SOFTWARE
We need: Asynchronous + Synchronous, Active + Passive
We still need all of these
So is the traditional perception of “a database” enough?
Ben Stopford Confluent @benstopford ben@confluent.io