ADVANCED DATABASE SYSTEMS Query Execution & Processing @ - PowerPoint PPT Presentation

Lect ure # 13 ADVANCED DATABASE SYSTEMS Query Execution & Processing @ Andy_Pavlo // 15- 721 // Spring 2020

2 ARCHITECTURE OVERVIEW Networking Layer SQL Query SQL Parser Planner Binder Rewriter Optimizer / Cost Models Compiler Scheduling / Placement We Are Here Execution Engine Concurrency Control Operator Execution Indexes Storage Manager Storage Models Logging / Checkpoints 15-721 (Spring 2020)

3 EXECUTIO N OPTIM IZATION We are now going to start discussing ways to improve the DBMS's query execution performance for data sets that fit entirely in memory. There are other bottlenecks to target when we remove the disk. 15-721 (Spring 2020)

4 OPTIM IZATIO N GOALS Approach #1: Reduce Instruction Count → Use fewer instructions to do the same amount of work. Approach #2: Reduce Cycles per Instruction → Execute more CPU instructions in fewer cycles. Approach #3: Parallelize Execution → Use multiple threads to compute each query in parallel. 15-721 (Spring 2020)

5 ACCESS PATH SELECTIO N One major decision in query planning is whether to perform a sequential scan or index scan to retrieve data from table. This decision depends on the selectivity of predicates as well as hardware performance and concurrency. ACCESS PATH SELECTION IN MAIN- MEMORY OPTIMIZED DATA SYSTEMS: SHOULD I SCAN OR SHOULD I PROBE? SIGMOD 2017 15-721 (Spring 2020)

6 OPERATO R EXECUTIO N Query Plan Processing Scan Sharing Materialized Views Query Compilation Vectorized Operators Parallel Algorithms Application Logic Execution (UDFs) 15-721 (Spring 2020)

8 MonetDB/X100 Analysis Processing Models Parallel Execution 15-721 (Spring 2020)

9 M ONETDB/ X10 0 (20 0 5) Low-level analysis of execution bottlenecks for in- memory DBMSs on OLAP workloads. → Show how DBMS are designed incorrectly for modern CPU architectures. Based on these findings, they proposed a new DBMS called MonetDB/X100. → Renamed to Vectorwise and acquired by Actian in 2010. → Rebranded as Vector and Avalanche. MONETDB/X100: HYPER- PIPELINING QUERY EXECUTION CIDR 2005 15-721 (Spring 2020)

10 CPU OVERVIEW CPUs organize instructions into pipeline stages . The goal is to keep all parts of the processor busy at each cycle by masking delays from instructions that cannot complete in a single cycle. Super-scalar CPUs support multiple pipelines. → Execute multiple instructions in parallel in a single cycle if they are independent ( out-of-order execution). Everything is fast until there is a mistake… 15-721 (Spring 2020)

11 DBM S / CPU PROBLEM S Problem #1: Dependencies → If one instruction depends on another instruction, then it cannot be pushed immediately into the same pipeline. Problem #2: Branch Prediction → The CPU tries to predict what branch the program will take and fill in the pipeline with its instructions. → If it gets it wrong, it must throw away any speculative work and flush the pipeline. 15-721 (Spring 2020)

12 BRANCH M ISPREDICTIO N Because of long pipelines, CPUs will speculatively execute branches. This potentially hides the long stalls between dependent instructions. The most executed branching code in a DBMS is the filter operation during a sequential scan. But this is (nearly) impossible to predict correctly. 15-721 (Spring 2020)

13 SELECTIO N SCANS SELECT * FROM table WHERE key >= $(low) AND key <= $(high) Source: Bogdan Raducanu 15-721 (Spring 2020)

13 SELECTIO N SCANS Scalar (Branching) Scalar (Branchless) i = 0 i = 0 for t in table : for t in table : key = t.key copy (t, output[i]) if ( key≥ low ) && ( key≤ high ): key = t.key copy (t, output[i]) m = ( key≥ low ? 1 : 0) && ⮱ ( key≤ high ? 1 : 0) i = i + 1 i = i + m Source: Bogdan Raducanu 15-721 (Spring 2020)

14 SELECTIO N SCANS Source: Bogdan Raducanu 15-721 (Spring 2020)

15 EXCESSIVE INSTRUCTIO NS The DBMS needs to support different data types, so it must check a values type before it performs any operation on that value. → This is usually implemented as giant switch statements. → Also creates more branches that can be difficult for the CPU to predict reliably. Example: Postgres' addition for NUMERIC types. 15-721 (Spring 2020)

16 PROCESSIN G M ODEL A DBMS's processing model defines how the system executes a query plan. → Different trade-offs for workloads (OLTP vs. OLAP). Approach #1: Iterator Model Approach #2: Materialization Model Approach #3: Vectorized / Batch Model 15-721 (Spring 2020)

17 ITERATO R M ODEL Each query plan operator implements a next function. → On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. → The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Also called Volcano or Pipeline Model. 15-721 (Spring 2020)

18 ITERATO R M ODEL SELECT R.id, S.cdate Next() for t in child.Next() : FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 Next() for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id Next() for t in child.Next() : s if evalPred (t): emit (t) value>100 Next() Next() for t in R: for t in S: R S emit (t) emit (t) 15-721 (Spring 2020)

18 ITERATO R M ODEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : Single Tuple s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 R S emit (t) emit (t) 15-721 (Spring 2020)

18 ITERATO R M ODEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 5 R S emit (t) emit (t) 15-721 (Spring 2020)

19 ITERATO R M ODEL This is used in almost every DBMS. Allows for tuple pipelining . Some operators must block until their children emit all their tuples. → Joins, Subqueries, Order By Output control works easily with this approach. 15-721 (Spring 2020)

20 M ATERIALIZATIO N M ODEL Each operator processes its input all at once and then emits its output all at once. → The operator "materializes" it output as a single result. → The DBMS can push down hints into to avoid scanning too many tuples. → Can send either a materialized row or a single column. The output can be either whole tuples (NSM) or subsets of columns (DSM) 15-721 (Spring 2020)

21 M ATERIALIZATIO N M ODEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s if evalPred (t): out.add (t) All Tuples value>100 return out R S out = [ ] out = [ ] 3 for t in R: for t in S: out.add (t) out.add (t) return out return out 15-721 (Spring 2020)

21 M ATERIALIZATIO N M ODEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s 4 if evalPred (t): out.add (t) value>100 return out R S out = [ ] out = [ ] 3 5 for t in R: for t in S: out.add (t) out.add (t) return out return out 15-721 (Spring 2020)

21 M ATERIALIZATIO N M ODEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in S : s if evalPred (t): out.add (t) value>100 return out R S out = [ ] 3 for t in R: out.add (t) return out 15-721 (Spring 2020)

ADVANCED DATABASE SYSTEMS Query Execution & Processing @ - PowerPoint PPT Presentation

Lect ure # 13 ADVANCED DATABASE SYSTEMS Query Execution & Processing @ Andy_Pavlo // 15- 721 // Spring 2020 2 ARCHITECTURE OVERVIEW Networking Layer SQL Query SQL Parser Planner Binder Rewriter Optimizer / Cost Models Compiler

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

National Address Database National Address Database What is a National Address Database?

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

NOT FOR USE Presented by: Instructor Name WELCOME! DO NOT DUPLICATE Registered Representative

CHECKLISTS FOR FINANCIAL AND COMPLIANCE AUDIT Helena Lopes Helena Fernandes Checklists for

Social communication with Covid DSA review Virtual vs. physical communication New

Bridge Design Prof. Dr. W. Kaufmann Spring Semester 2020 13.02.2020 ETH Zurich | Chair of

LHCb Online System VELO/BGV Slice Detector VELO ECal HCal Muon ST OT RICH L0 Trigger

CSC 6991: Using Hardware Isolated Execu;on Environments for

Service Mesh sh Interface Brendan Burns QCon New York 2019 This Photoby Unknown Author is

Multi-sided Bzier surfaces over concave polygonal domains Pter Salvi, Tams Vrady

ADVANCED DATABASE SYSTEMS Query Execution & Processing @ - PowerPoint PPT Presentation

Lect ure # 13 ADVANCED DATABASE SYSTEMS Query Execution & Processing @ Andy_Pavlo // 15- 721 // Spring 2020 2 ARCHITECTURE OVERVIEW Networking Layer SQL Query SQL Parser Planner Binder Rewriter Optimizer / Cost Models Compiler

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

National Address Database National Address Database What is a National Address Database?

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction &amp; History of Database Systems

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

NOT FOR USE Presented by: Instructor Name WELCOME! DO NOT DUPLICATE Registered Representative

CHECKLISTS FOR FINANCIAL AND COMPLIANCE AUDIT Helena Lopes Helena Fernandes Checklists for

Social communication with Covid DSA review Virtual vs. physical communication New

Bridge Design Prof. Dr. W. Kaufmann Spring Semester 2020 13.02.2020 ETH Zurich | Chair of

LHCb Online System VELO/BGV Slice Detector VELO ECal HCal Muon ST OT RICH L0 Trigger

CSC 6991: Using Hardware Isolated Execu;on Environments for

Service Mesh sh Interface Brendan Burns QCon New York 2019 This Photoby Unknown Author is

Multi-sided Bzier surfaces over concave polygonal domains Pter Salvi, Tams Vrady

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems