The DataPath System: A Data-Centric Analytic Processing Engine for Large Data Warehouses
Subi Arumugam¹, Alin Dobra¹, Christopher M. Jermaine², Niketan Pansare², Luis Perez²
¹ University of Florida   ² Rice University
June 9, 2010
Motivation
• Storage is cheap: a 1TB disk costs $80-100
• Disks have high throughput
  • a $100 1TB disk does 150MB/s sequential reads/writes
  • a $4,000 1TB SSD (OCZ p88) reads at 1.4GB/s
• Processors are fast: 6 GFLOPS/core, 24 GFLOPS for $100
• TPC-H Q1 (at 1TB scale factor)
  • 8 aggregates over 95-97% of lineitem
  • needs to read about 160-700GB: two p88 SSDs scan that in 60-250s
  • needs about 30 FLOPs × 6·10⁹ tuples = 180 GFLOP of compute; at 24 GFLOPS that is roughly 8s
  • Q1 should be I/O bound; we should be able to run 8 such queries in parallel
• Best non-clustered TPC-H performer: 142s for $1.7M
  • 64 cores, 512GB memory, 576 disks
Large Scale Analytics Goals
• Deal with analytical queries on large data (1-10TB)
• Get closer to theoretical CPU performance
  • the gap stands at 100-1000× for most databases
• Sub-$100,000 system with minute response times (1TB)
  • stay I/O bound even with fast disks and multiple queries
• No or little tuning: no indexing, no tunable partitioning

DataPath
• A system designed from the ground up to meet these goals.
Benchmark System
Old System (2008) – $60,000
• 8 processors, 32 cores
• 128GB DDR2 RAM (16 bays)
• 2 Averatec RAID controllers, 4 12-disk enclosures
• 47 VelociRaptor disks, 8 Barracuda disks
• Maximum aggregate throughput: 2.2GB/s
New System (2010) – $20,000
• 4 processors, 48 cores
• 128GB DDR3 memory
• 2 OCZ Z-Drive 1TB PCIe SSDs
• Maximum aggregate throughput: 2.8GB/s
Data-centric Computation
DataPath Execution Model
• Tuple-oriented execution model
• Tuples shared by queries in the system
• Chunks of tuples pushed into waypoints for processing
• Waypoints implement operations for multiple queries
• Tuple processing loops run at full CPU speed:

  for (int i = 0; i < numTuples; i++) {
      if (tuple[i].BelongsTo(Q1)) Q1.Process(tuple[i]);
      if (tuple[i].BelongsTo(Q2)) Q2.Process(tuple[i]);
      if (tuple[i].BelongsTo(Q3)) Q3.Process(tuple[i]);
  }
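A minimal C++ sketch of this idea, assuming hypothetical names (Tuple, Chunk, queryBits) rather than DataPath's actual interfaces: each tuple carries a bitmap of the queries it still belongs to, and a single pass over a shared chunk serves every interested query.

  #include <cstdint>
  #include <vector>

  struct Tuple {
      double   l_quantity;
      double   l_extendedprice;
      uint32_t queryBits;          // bit i set => tuple still alive for query i
  };

  struct Chunk {
      std::vector<Tuple> tuples;   // a chunk holds a large batch of tuples
  };

  // One pass over a shared chunk updates the aggregates of two queries at once,
  // e.g. Q1: SUM(l_quantity) and Q2: SUM(l_extendedprice).
  void ProcessChunk(const Chunk& c, double& sumQ1, double& sumQ2) {
      for (const Tuple& t : c.tuples) {
          if (t.queryBits & 0x1) sumQ1 += t.l_quantity;
          if (t.queryBits & 0x2) sumQ2 += t.l_extendedprice;
      }
  }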
Query Execution – Example
Q1: SELECT SUM(l_quantity) FROM lineitem WHERE l_shipdate > '1-1-06';
[Plan: lineitem → σ (Q1: l_shipdate > '1-1-06') → Σ (Q1: SUM(l_quantity)) → out]

Query Execution – Example (adding Q2)
Q2: SELECT SUM(l_extendedprice) FROM lineitem, orders WHERE l_shipmode <> 'rail' AND o_orderdate < '1-1-08' AND l_orderkey = o_orderkey;
[Plan: Q2 shares the lineitem scan with Q1; the selection waypoint now evaluates both Q1: l_shipdate > '1-1-06' and Q2: l_shipmode <> 'rail'; Q2 adds σ (Q2: o_orderdate < '1-1-08') over orders, a join waypoint ⋈ (Q2: l_orderkey = o_orderkey), and Σ (Q2: SUM(l_extendedprice)) → out]

Query Execution – Example (adding Q3)
Q3: SELECT AVG(l_discount) FROM lineitem, orders WHERE o_custkey = 1234 AND l_orderkey = o_orderkey;
[Plan: the shared plan over lineitem and orders shown above, with Q3 arriving as a new query]
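To illustrate how one selection waypoint can serve both Q1 and Q2 over the shared lineitem tuples, here is a minimal sketch; the names (Tuple, queryBits, SelectionWaypoint) and the integer date encoding are assumptions, not DataPath's actual code.

  #include <cstdint>
  #include <string>
  #include <vector>

  struct Tuple {
      int         l_shipdate;     // hypothetical encoding: days since some epoch
      std::string l_shipmode;
      uint32_t    queryBits;      // bit 0 = Q1, bit 1 = Q2
  };

  // One shared selection waypoint: a single scan of the chunk evaluates
  // each query's predicate and clears the corresponding membership bit.
  void SelectionWaypoint(std::vector<Tuple>& chunk, int cutoff_1_1_06) {
      for (Tuple& t : chunk) {
          if ((t.queryBits & 0x1) && t.l_shipdate <= cutoff_1_1_06)
              t.queryBits &= ~0x1u;             // fails Q1: l_shipdate > '1-1-06'
          if ((t.queryBits & 0x2) && t.l_shipmode == "rail")
              t.queryBits &= ~0x2u;             // fails Q2: l_shipmode <> 'rail'
      }
  }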
Tuple Processing Loop
Usual problems:
• branch misprediction
• instruction cache misses
• per-tuple overhead
DataPath solution – use a C++ meta-compiler:
• generate new tuple processing loops for each waypoint when new queries are added
• the generated code is human-readable (it even has comments)
• compiled as a library with -O3 -msse4.1
• everything is hardcoded
• the meta-compiler exploits sharing, avoids branch mispredictions, and uses SSE
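As a rough illustration (not DataPath's actual generated code), a hardcoded waypoint loop for the Q1/Q2 example might look like the following: columns arrive as flat arrays, predicate constants and types are baked in, and query membership is updated branch-free. All names and the integer column encodings are assumptions.

  #include <cstddef>
  #include <cstdint>

  // Hypothetical output of the meta-compiler for one waypoint serving Q1 and Q2.
  void WaypointLoop(const int* l_shipdate, const int* l_shipmode,
                    const double* l_quantity, uint32_t* queryBits,
                    size_t n, int cutoffDate, int railCode, double* sumQ1) {
      double acc = 0.0;
      for (size_t i = 0; i < n; ++i) {
          uint32_t q = queryBits[i];
          // Branch-free predicate evaluation: clear a query's bit when its
          // predicate fails, instead of taking a data-dependent branch.
          q &= ~static_cast<uint32_t>(l_shipdate[i] <= cutoffDate);        // bit 0: Q1
          q &= ~(static_cast<uint32_t>(l_shipmode[i] == railCode) << 1);   // bit 1: Q2
          queryBits[i] = q;
          // Q1's running aggregate, again without a branch.
          acc += (q & 0x1u) * l_quantity[i];
      }
      *sumQ1 = acc;
  }

Because the constants, offsets, and types are compiled in, an optimizing compiler can keep this loop free of function calls and amenable to SSE vectorization.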
File Scanner
[Animation: the staging area holds four in-flight chunks (Chunk 1-4); each chunk's blocks (numbered 1-20 in the example) are striped across Disks 1-5 and arrive in whatever order the disks return them. A chunk is marked Finished and handed off as soon as all of its blocks have been read (Chunk 1 completes first and its slot is reused for Chunk 5), so chunks complete out of order while all disks stay busy.]
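A minimal sketch of that bookkeeping, under the assumption (not taken from the DataPath code) that the staging area simply tracks how many blocks of each chunk are still outstanding:

  #include <map>
  #include <vector>

  struct StagingArea {
      std::map<int, int> blocksOutstanding;   // chunkId -> blocks still on disk
      std::vector<int>   finished;            // chunkIds ready for the engine

      void RegisterChunk(int chunkId, int numBlocks) {
          blocksOutstanding[chunkId] = numBlocks;
      }

      // Called from a disk reader when one block of a chunk has arrived.
      void BlockArrived(int chunkId) {
          if (--blocksOutstanding[chunkId] == 0) {
              blocksOutstanding.erase(chunkId);
              finished.push_back(chunkId);     // e.g. Chunk 1 completes first
          }
      }
  };

A real scanner would also cap the number of in-flight chunks (four slots in the animation) and recycle a finished slot for the next chunk to be read.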