37 COMPILER HINTS
The restrict keyword (standard in C99; most C++ compilers accept it as __restrict) tells the compiler that the arrays are distinct locations in memory, so loads and stores cannot alias.

void add(int *restrict X,
         int *restrict Y,
         int *restrict Z) {
  for (int i = 0; i < MAX; i++) {
    Z[i] = X[i] + Y[i];
  }
}
38 COMPILER HINTS
This pragma tells the compiler to ignore loop dependencies for the vectors. It's up to you to make sure that this is correct.

void add(int *X, int *Y, int *Z) {
  #pragma ivdep
  for (int i = 0; i < MAX; i++) {
    Z[i] = X[i] + Y[i];
  }
}
39 EXPLICIT VECTORIZATION Use CPU intrinsics to manually marshal data between SIMD registers and execute vectorized instructions. Potentially not portable.
40 EXPLICIT VECTORIZATION
Store the vectors in 128-bit SIMD registers, then invoke the intrinsic to add together the vectors and write them to the output location.

void add(int *X, int *Y, int *Z) {
  __m128i *vecX = (__m128i*)X;
  __m128i *vecY = (__m128i*)Y;
  __m128i *vecZ = (__m128i*)Z;
  for (int i = 0; i < MAX/4; i++) {
    _mm_store_si128(vecZ++, _mm_add_epi32(*vecX++, *vecY++));
  }
}
41 VECTORIZATION DIRECTION
Approach #1: Horizontal
→ Perform operation on all elements together within a single vector.
→ Example: a SIMD add over the single vector [0 1 2 3] produces the scalar 6.
Approach #2: Vertical
→ Perform operation in an elementwise manner on elements of each vector.
→ Example: a SIMD add of [0 1 2 3] and [1 1 1 1] produces [1 2 3 4].
Source: Przemysław Karpiński
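The two directions above can be sketched in scalar C. This is a minimal emulation assuming 4-lane integer vectors; the function names are illustrative, not real intrinsics (the comments name the kind of hardware instruction each loop stands in for):

```c
#define LANES 4

/* Horizontal: reduce all lanes of a single vector to one value
   (the kind of result horizontal-add instructions build up to). */
int horizontal_add(const int v[LANES]) {
    int sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += v[i];
    return sum;
}

/* Vertical: elementwise add across two vectors, one result per lane
   (what an instruction like _mm_add_epi32 does in a single step). */
void vertical_add(const int a[LANES], const int b[LANES], int out[LANES]) {
    for (int i = 0; i < LANES; i++)
        out[i] = a[i] + b[i];
}
```

With a = [0 1 2 3] and b = [1 1 1 1], horizontal_add(a) yields 6 and vertical_add(a, b, out) yields [1 2 3 4], matching the diagrams on the slide.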
44 EXPLICIT VECTORIZATION
Linear Access Operators
→ Predicate evaluation
→ Compression
Ad-hoc Vectorization
→ Sorting
→ Merging
Composable Operations
→ Multi-way trees
→ Bucketized hash tables
Source: Orestis Polychroniou
45 VECTORIZED DBMS ALGORITHMS
Principles for efficient vectorization by using fundamental vector operations to construct more advanced functionality.
→ Favor vertical vectorization by processing different input data per lane.
→ Maximize lane utilization by executing different things per lane subset.
RETHINKING SIMD VECTORIZATION FOR IN-MEMORY DATABASES (SIGMOD 2015)
46 FUNDAMENTAL OPERATIONS
→ Selective Load
→ Selective Store
→ Selective Gather
→ Selective Scatter
47 FUNDAMENTAL VECTOR OPERATIONS
Selective Load
→ Given Vector [A B C D], Mask [0 1 0 1], and Memory [U V W X Y Z …], consecutive values from memory are loaded into the lanes where the mask is set, yielding [A U C V].
55 FUNDAMENTAL VECTOR OPERATIONS
Selective Store
→ The mirror of a selective load: given Vector [A B C D] and Mask [0 1 0 1], the set lanes (B and D) are written to consecutive memory locations.
61 FUNDAMENTAL VECTOR OPERATIONS
Selective Gather
→ Given Index Vector [2 1 5 3] and Memory [U V W X Y Z] (offsets 0–5), each lane reads from memory at its own index, yielding Value Vector [W V Z X].
67 FUNDAMENTAL VECTOR OPERATIONS
Selective Scatter
→ The mirror of a gather: given Value Vector [A B C D] and Index Vector [2 1 5 3], each lane is written to memory at its own index (memory[2]=A, memory[1]=B, memory[5]=C, memory[3]=D).
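The semantics of the four operations can be sketched as a scalar emulation in C. This is a sketch assuming 4-lane vectors represented as plain int arrays, not the actual masked/compressed instructions a CPU provides:

```c
#define LANES 4

/* Selective load: fill the masked lanes of the vector with
   consecutive values read from memory. */
void selective_load(int vec[LANES], const int mask[LANES], const int *mem) {
    for (int i = 0; i < LANES; i++)
        if (mask[i]) vec[i] = *mem++;
}

/* Selective store: write the masked lanes of the vector to
   consecutive memory locations. */
void selective_store(int *mem, const int mask[LANES], const int vec[LANES]) {
    for (int i = 0; i < LANES; i++)
        if (mask[i]) *mem++ = vec[i];
}

/* Gather: each lane reads from memory at its own index. */
void gather(int vec[LANES], const int idx[LANES], const int *mem) {
    for (int i = 0; i < LANES; i++)
        vec[i] = mem[idx[i]];
}

/* Scatter: each lane writes to memory at its own index. */
void scatter(int *mem, const int idx[LANES], const int vec[LANES]) {
    for (int i = 0; i < LANES; i++)
        mem[idx[i]] = vec[i];
}
```

With the slide's data — vector [A B C D], mask [0 1 0 1], memory [U V W X Y Z] — selective_load yields [A U C V], and gather with indices [2 1 5 3] yields [W V Z X].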
72 ISSUES
→ Gathers and scatters are not really executed in parallel because the L1 cache only allows one or two distinct accesses per cycle.
→ Gathers are only supported in newer CPUs.
→ Selective loads and stores are also emulated in Xeon CPUs using vector permutations.
73 VECTORIZED OPERATORS
→ Selection Scans
→ Hash Tables
→ Partitioning
Paper provides additional info:
→ Joins, Sorting, Bloom filters.
RETHINKING SIMD VECTORIZATION FOR IN-MEMORY DATABASES (SIGMOD 2015)
74 SELECTION SCANS SELECT * FROM table WHERE key >= $(low) AND key <= $(high)
75 SELECTION SCANS
Scalar (Branching)

i = 0
for t in table:
  key = t.key
  if (key ≥ low) && (key ≤ high):
    copy(t, output[i])
    i = i + 1

Scalar (Branchless)

i = 0
for t in table:
  copy(t, output[i])
  key = t.key
  m = (key ≥ low ? 1 : 0) &&
      (key ≤ high ? 1 : 0)
  i = i + m

Source: Bogdan Raducanu
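The branchless variant can be sketched concretely in C. This is a minimal sketch over an int key column (names like select_scan are illustrative); every tuple is copied to the output cursor unconditionally, and only the cursor increment depends on the predicate, so there is no hard-to-predict branch in the loop body:

```c
/* Branchless selection scan: out[i] is written every iteration and
   mismatches are simply overwritten by the next tuple, so out must
   have capacity for n entries even though only the matches survive.
   Returns the number of matching keys. */
int select_scan(const int *keys, int n, int low, int high, int *out) {
    int i = 0;
    for (int t = 0; t < n; t++) {
        out[i] = keys[t];                             /* copy(t, output[i]) */
        int m = (keys[t] >= low) & (keys[t] <= high); /* predicate as 0/1 */
        i += m;                                       /* data dependency, not control */
    }
    return i;
}
```

Note the use of bitwise & rather than &&: both comparisons already yield 0 or 1, and & avoids reintroducing a short-circuit branch.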
80 SELECTION SCANS
Vectorized

i = 0
for vt in table:
  simdLoad(vt.key, vk)
  vm = (vk ≥ low ? 1 : 0) &&
       (vk ≤ high ? 1 : 0)
  simdStore(vt, vm, output[i])
  i = i + |vm ≠ false|

Example: SELECT * FROM table WHERE key >= "O" AND key <= "U"
Table keys (IDs 1–6): J, O, Y, S, U, X
→ Key Vector [J O Y S U X]
→ SIMD Compare produces Mask [0 1 0 1 1 0]
→ SIMD Store compresses All Offsets [0 1 2 3 4 5] through the mask into Matched Offsets [1 3 4]
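The vectorized loop above can be emulated lane-by-lane in scalar C. This is a sketch assuming 4-lane vectors and an input length that is a multiple of the lane count; a real implementation would replace the two inner loops with a SIMD compare and a selective store of offsets:

```c
#define LANES 4

/* Emulated vectorized selection scan: a full-width compare produces a
   per-lane mask, then the matching lane offsets are compressed (a
   selective store) into the output, and the cursor advances by the
   number of set mask bits. n must be a multiple of LANES. Returns the
   number of matched offsets written to out. */
int select_scan_simd(const int *keys, int n, int low, int high, int *out) {
    int i = 0;
    for (int base = 0; base < n; base += LANES) {
        int mask[LANES];
        for (int l = 0; l < LANES; l++)   /* SIMD compare */
            mask[l] = (keys[base + l] >= low) & (keys[base + l] <= high);
        for (int l = 0; l < LANES; l++)   /* selective store of offsets */
            if (mask[l])
                out[i++] = base + l;      /* i += |mask != false| overall */
    }
    return i;
}
```

On the slide's example — keys [J O Y S U X] with the range "O" to "U" — this produces the matched offsets [1 3 4].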
92 SELECTION SCANS
[Figure: Throughput (billion tuples/sec) vs. selectivity (0–100%) for Scalar (Branching), Scalar (Branchless), Vectorized (Early Mat), and Vectorized (Late Mat) on MIC (Xeon Phi 7120P – 61 Cores + 4×HT, up to ~48 billion tuples/sec) and Multi-Core (Xeon E3-1275v3 – 4 Cores + 2×HT, up to ~6 billion tuples/sec). The vectorized variants run close to the memory-bandwidth limit on both platforms.]
98 HASH TABLES – PROBING
Scalar probing of a linear-probing hash table (each slot holds a KEY and a PAYLOAD):
→ Hash the input key (k1) to get a hash index (h1) into the table.
→ Compare the input key against the key stored at that slot (k1 = k9?); on a mismatch, advance to the next slot and repeat until a match or an empty slot.
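The scalar probe can be sketched in C. This is a minimal sketch with a toy modulo hash standing in for hash(key) on the slide, and a sentinel key marking empty slots (names like slot_t and probe are illustrative):

```c
#define TABLE_SIZE 8
#define EMPTY_KEY (-1)

typedef struct { int key; int payload; } slot_t;

/* Toy hash function; TABLE_SIZE is a power of two so & works as mod. */
static int hash_key(int key) { return key & (TABLE_SIZE - 1); }

/* Scalar linear probing: start at hash(key), compare keys at
   successive slots (wrapping around) until the key matches or an
   empty slot ends the search. Returns the payload, or -1 if absent.
   Assumes the table is never completely full. */
int probe(const slot_t table[TABLE_SIZE], int key) {
    int h = hash_key(key);
    while (table[h].key != EMPTY_KEY) {
        if (table[h].key == key)
            return table[h].payload;
        h = (h + 1) & (TABLE_SIZE - 1);  /* advance to next slot */
    }
    return -1;  /* hit an empty slot: key not present */
}
```

A collision plays out exactly as on the slide: if key 9 already occupies slot hash(1) = hash(9) = 1, probing for key 1 compares against 9 first, then finds 1 in the next slot.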