CS 839: Design the Next-Generation Database Lecture 23: Serverless

CS 839: Design the Next-Generation Database
Lecture 23: Serverless
Xiangyao Yu
4/14/2020

Announcements
• Please sign up for the presentation slots following the email

Discussion Highlights
How far away is Snowflake from the “optimal design”?
• Auto-scaling
• Better optimized storage layer (like Aurora)
• Security and reliability
• Code compilation
• Caching can be improved (e.g., workload specific)
• Data sharing across virtual warehouses
• Opportunities to extend into providing HTAP solutions
• Cloud service layer might be a bottleneck
Combine data warehousing and OLTP in cloud?
• Master and slave nodes within a VW to support writes as well
• Build snapshot isolation into storage (concurrency control)
• Transaction log -> (intermediate storage) -> S3 -> data warehouse every Y hours
• VW per transaction?

Today’s Paper
• Starling: A Scalable Query Engine on Cloud Functions (SIGMOD 2020)

What is Serverless Computing?
Serverless computing is a cloud computing execution model in which the cloud provider runs the server and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity.
Core of serverless today, according to a Berkeley TechReport [1]:
Serverless computing = FaaS + BaaS
• FaaS: Function-as-a-Service
• BaaS: Backend-as-a-Service
[1] E. Jonas, et al. Cloud Programming Simplified: A Berkeley View on Serverless Computing, Berkeley TR 2019

Function-as-a-Service
FaaS offerings
• AWS Lambda
• Google Cloud Functions
• Microsoft Azure Functions
• IBM/Apache's OpenWhisk (open source)
• Oracle Cloud Fn (open source)

AWS Lambda
Features
• A function starts executing (inside a container) in under a second
• Charged at 100 ms granularity for the time the container runs
• Can run thousands or millions of small invocations in parallel
Limitations
• Limited runtime: 15 min
• Limited resources: 1 core, 3 GB main memory
• No direct communication between functions
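The 100 ms billing granularity means a short invocation is rounded up to the next increment. A minimal sketch of that arithmetic follows; it assumes cost scales with configured memory times billed duration, and `price_per_gb_second` is a hypothetical input rather than a quoted AWS price.

```python
import math


def billed_ms(duration_ms: float, granularity_ms: int = 100) -> int:
    """Round a run time up to the next billing increment."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def invocation_cost(duration_ms: float, memory_gb: float,
                    price_per_gb_second: float) -> float:
    """Cost of one invocation.

    price_per_gb_second is a hypothetical parameter supplied by the caller,
    not a real price from any provider's price list.
    """
    return billed_ms(duration_ms) / 1000 * memory_gb * price_per_gb_second
```

Under this model a 101 ms invocation is billed as 200 ms, so many very short tasks can cost noticeably more than their raw run time suggests.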

Opinion from a CIDR’19 Paper [2]
• Cloud storage is 1 to 2 orders of magnitude slower than SSD
• No inter-function communication
• Paper gave suggestions for future work
[2] Hellerstein, Joseph M., et al. "Serverless computing: One step forward, two steps back." arXiv preprint arXiv:1812.03651 (2018).

Opinion from Berkeley Report [1]
“However in our final example, Serverless SQLite, we identify a use case that maps so poorly to FaaS that we conclude that databases and other state-heavy applications will remain as BaaS”
[1] E. Jonas, et al. Cloud Programming Simplified: A Berkeley View on Serverless Computing, Berkeley TR 2019

Database: FaaS or BaaS?
• FaaS: Today’s paper
• BaaS: Athena, Snowflake, Aurora, etc.

Cloud Analytics Databases

Starling Architecture
Coordinator
• Query compilation
• Initiate workers
Workers
• Query execution
Storage
• Input data
• Communication

Example Query Execution (TPC-H Q12)
[Figure: three-step execution plan; inputs are Lineitem (S3) and Orders (S3)]
• Step 1: Filter, projection, and partition (λ x200 and λ x800 scan tasks); output: shuffle partitions (S3)
• Step 2: Join, filtering, and partial aggregate (λ x200); output: partial aggregates (S3)
• Step 3: Final group-by aggregate (λ x1); output: final aggregate (S3)
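The three steps can be sketched in a few lines. This is a simplified illustration rather than Starling's implementation: a plain dict stands in for S3, tasks run sequentially instead of as parallel Lambda invocations, Q12's date predicates are omitted, and all function names and key layouts are hypothetical. Lineitem rows are `(orderkey, shipmode)` and Orders rows are `(orderkey, priority)`.

```python
from collections import Counter, defaultdict


def scan_task(store, table, task_id, rows, keep, n_parts):
    """Step 1: filter rows, then hash-partition survivors by order key."""
    parts = defaultdict(list)
    for row in rows:
        if keep(row):
            parts[row[0] % n_parts].append(row)
    for p in range(n_parts):
        store[f"shuffle/{table}/{task_id}/{p}"] = parts[p]


def join_task(store, p, n_line_tasks, n_order_tasks):
    """Step 2: join one partition and write a partial aggregate."""
    priority = {}
    for t in range(n_order_tasks):
        for orderkey, prio in store[f"shuffle/orders/{t}/{p}"]:
            priority[orderkey] = prio
    counts = Counter()
    for t in range(n_line_tasks):
        for orderkey, shipmode in store[f"shuffle/lineitem/{t}/{p}"]:
            if orderkey in priority:
                high = priority[orderkey] in ("1-URGENT", "2-HIGH")
                counts[(shipmode, "high" if high else "low")] += 1
    store[f"partial/{p}"] = counts


def final_task(store, n_parts):
    """Step 3: merge the partial aggregates into the final result."""
    total = Counter()
    for p in range(n_parts):
        total.update(store[f"partial/{p}"])
    store["final"] = total
    return total


def run_q12_sketch(lineitem_chunks, orders_chunks, n_parts=2):
    store = {}  # stand-in for S3: object key -> list of rows or Counter
    for t, rows in enumerate(lineitem_chunks):
        scan_task(store, "lineitem", t, rows,
                  lambda r: r[1] in ("MAIL", "SHIP"), n_parts)
    for t, rows in enumerate(orders_chunks):
        scan_task(store, "orders", t, rows, lambda r: True, n_parts)
    for p in range(n_parts):
        join_task(store, p, len(lineitem_chunks), len(orders_chunks))
    return final_task(store, n_parts)
```

Because every stage reads and writes only through the store, the tasks never need to talk to each other directly, which matches the Lambda limitation on inter-function communication.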

Optimizations
• Parallel reads

Optimizations
• Parallel reads
• Read straggler mitigation (RSM): if a read request times out, send a duplicate request
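Read straggler mitigation can be sketched as a hedged request: wait for the first read up to a timeout, then issue a duplicate and take whichever finishes first. The sketch below is an illustration only; `fetch`, `timeout_s`, and the thread pool are assumptions, and how the timeout is chosen is not covered here.

```python
import concurrent.futures as cf


def read_with_rsm(fetch, key, timeout_s, pool):
    """Read `key`; if the first request is slow, race it against a duplicate.

    `fetch` is a hypothetical callable (key -> bytes) standing in for a
    storage GET; `pool` is any concurrent.futures executor.
    """
    first = pool.submit(fetch, key)
    done, _ = cf.wait([first], timeout=timeout_s)
    if done:
        return first.result()
    # The first request is a straggler: send a duplicate and take the winner.
    backup = pool.submit(fetch, key)
    done, _ = cf.wait([first, backup], return_when=cf.FIRST_COMPLETED)
    return next(iter(done)).result()
```

The slow request is not cancelled in this sketch; its result is simply ignored once the duplicate returns.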

Optimizations
• Parallel reads
• Read straggler mitigation (RSM)
• Write straggler mitigation (WSM): if a write request times out, send a duplicate request
  • Single timer: allow only a single timeout

Optimizations
• Parallel reads
• Read straggler mitigation (RSM)
• Write straggler mitigation (WSM)
• Doublewrite: the producer writes two copies of an object; the consumer reads whichever copy is ready first
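A minimal sketch of doublewrite, assuming an in-memory store with simulated write latency in place of S3. The `ObjectStore` class, the `.0`/`.1` key suffixes, and the polling interval are all hypothetical choices made for illustration.

```python
import threading
import time


class ObjectStore:
    """In-memory stand-in for an object store with slow writes."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value, delay_s=0.0):
        time.sleep(delay_s)  # simulated write latency
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def doublewrite(store, key, value, delays=(0.0, 0.0)):
    """Producer side: write two copies of the object concurrently."""
    threads = [
        threading.Thread(target=store.put, args=(f"{key}.{i}", value, d))
        for i, d in enumerate(delays)
    ]
    for t in threads:
        t.start()
    return threads


def read_first_ready(store, key, poll_s=0.01, timeout_s=5.0):
    """Consumer side: poll both copies and return whichever appears first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        for i in (0, 1):
            value = store.get(f"{key}.{i}")
            if value is not None:
                return value
        time.sleep(poll_s)
    raise TimeoutError(key)
```

The consumer is delayed only if both writes are slow, at the price of doubling the number of write requests.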

Optimizations
• Parallel reads
• Read straggler mitigation (RSM)
• Write straggler mitigation (WSM)
• Doublewrite
• Pipelining: start the following stage before the previous stage finishes
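The idea behind pipelining can be illustrated with threads: the consumer handles each producer output as soon as it is available instead of waiting for the whole producer stage. This is a local sketch under that assumption, not how Lambda workers are actually scheduled; `producer` and `consumer` are hypothetical callables.

```python
import concurrent.futures as cf


def run_pipelined(producer, consumer, inputs, workers=4):
    """Feed each producer output to the consumer as soon as it is ready."""
    results = []
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(producer, x) for x in inputs]
        # as_completed yields futures in completion order, so consumption
        # overlaps with producers that are still running.
        for fut in cf.as_completed(futures):
            results.append(consumer(fut.result()))
    return results
```

With a barrier between stages, the consumer would instead sit idle until the slowest producer finished.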

Optimizations
• Parallel reads
• Read straggler mitigation (RSM)
• Write straggler mitigation (WSM)
• Doublewrite
• Pipelining
• Combining to reduce cost of shuffle
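Why combining helps can be seen by counting read requests under a simplified model, which is an assumption of this sketch and not necessarily the paper's exact scheme. Each producer writes one object holding all partitions, and a reader fetches its slice with one request. Producers are split into `producer_groups` and partitions into `partition_groups`; one combiner per (producer group, partition group) pair merges its inputs into a single object.

```python
def direct_shuffle_reads(producers: int, consumers: int) -> int:
    """Every consumer reads its partition from every producer."""
    return producers * consumers


def combined_shuffle_reads(producers: int, consumers: int,
                           producer_groups: int, partition_groups: int) -> int:
    """Read count with one intermediate combining stage (simplified model)."""
    assert producers % producer_groups == 0, "groups must divide evenly"
    combiners = producer_groups * partition_groups
    # Each combiner issues one read per producer in its producer group.
    combiner_reads = combiners * (producers // producer_groups)
    # Each consumer reads from the combiners covering its partition group,
    # which is one combiner per producer group.
    consumer_reads = consumers * producer_groups
    return combiner_reads + consumer_reads
```

With illustrative inputs of 512 producers and 128 consumers, the direct shuffle issues 65,536 reads, while 4 producer groups and 4 partition groups bring the total to 2,560 in this model. Since object storage is billed per request, fewer reads translate directly into lower cost.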

Evaluation
• Starling can be faster than other S3-based cloud data warehouses
• Starling can be cheaper than other cloud data warehouses

Evaluation (TPC-H Q12)
• Performance is easy to tune by changing the number of tasks

Starling vs. Snowflake
• Control layer vs. Coordinator
• Compute layer vs. Workers
• Storage layer

Future of Serverless Computing
Opinion from Berkeley Report [1]
• Challenges: abstraction, system, networking, security, architecture
• Predictions: new BaaS, heterogeneous hardware, easy to program securely, cheaper, DB in BaaS, serverless replacing serverful
Opinion from a CIDR’19 Paper [2]
• Fluid code and data placement
• Heterogeneous hardware support
• Long-running, addressable virtual agents
• Disorderly programming
• Flexible programming, common IR
• Service-level objectives and guarantees
• Security concerns
[1] E. Jonas, et al. Cloud Programming Simplified: A Berkeley View on Serverless Computing, Berkeley TR 2019
[2] Hellerstein, Joseph M., et al. "Serverless computing: One step forward, two steps back." arXiv preprint arXiv:1812.03651 (2018).

Serverless – Q/A
• Replace S3 with another storage system?
• What about sorting?
• Is doublewrite an optimization?
• Is poor tail latency a common problem in distributed systems?
• OLTP on serverless?
• Lambda + Starling vs. Hadoop?
• Is Starling Bank based on Starling?
• Starling relies on AWS specifics (e.g., S3, pricing model, etc.)
• Does the cloud foster the growth of small-scale data analytics needs?
• Indexing?

Group Discussion
• Starling and Snowflake represent the FaaS and BaaS approaches of implementing a database, respectively. What are the relative advantages and disadvantages of both approaches?
• What ideas can a BaaS implementation like Snowflake borrow from FaaS?
• How can OLTP benefit from serverless computing? Are there major limiting factors in today’s cloud?
