In-Place Data Structures: Which Complexity Measures Do Matter?

Jyrki Katajainen (1, 2), Jingsen Chen (3), Stefan Edelkamp (4), Amr Elmasry (5), Max Stenmark (2)

1 Københavns Universitet
2 Jyrki Katajainen and Company
3 Luleå Tekniska Universitet
4 Universität Bremen
5 Alexandria University

© Performance Engineering Laboratory, ARCO meeting at ITU, fall 2012

Model of computation

Available
• An infinite array a suitable for storing elements
• O(1) other memory locations for storing elements
• O(1) other variables (counters, indices, bit strings of length ⌈lg(1 + n)⌉)

[Figure: array a with n = 8 elements at indices 0 to 7, followed by unused workspace]

Requirement
• If the data structure stores n elements, these elements must be kept in the first n locations of a.

Coverage

In-place data structures
• Binary heaps
• Static search trees

Complexity measures
• Space utilization
• # Element comparisons
• # Element moves
• # Cache misses
• # Branch mispredictions
• Running time

What is important? Aha! The whole cycle

[Figure: the cycle of design, analysis, implementation, and experimentation]

Binary heaps

[Figure: binary min-heap with n = 8; nodes 0 to 7 hold 8, 10, 26, 75, 12, 46, 75, 80, stored in this order in a[0..7]]

construct()
    for (i = parent(n − 1); i ≥ 0; −−i)
        siftdown(i)

minimum()
    return a[0]

insert(x)
    a[n] = x
    siftup(n)
    n += 1

extract-min()
    min = a[0]
    n −= 1
    a[0] = a[n]
    siftdown(0)
    return min

left-child(i)
    return 2i + 1

right-child(i)
    return 2i + 2

parent(i)
    return ⌊(i − 1) / 2⌋
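The pseudocode above can be sketched in C++ roughly as follows. This is a minimal illustration, not the authors' code: the struct name binary_heap and the use of std::vector<int> as a stand-in for the "infinite array" are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Zero-based binary min-heap; the n elements occupy a[0..n-1].
// Names are illustrative; std::vector stands in for the infinite array.
struct binary_heap {
  std::vector<int> a;

  static std::size_t left_child(std::size_t i) { return 2 * i + 1; }
  static std::size_t parent(std::size_t i) { return (i - 1) / 2; }

  // Move a[i] up until its parent is not larger.
  void siftup(std::size_t i) {
    int const x = a[i];
    while (i > 0 && x < a[parent(i)]) {
      a[i] = a[parent(i)];
      i = parent(i);
    }
    a[i] = x;
  }

  // Move a[i] down until no child is smaller.
  void siftdown(std::size_t i) {
    std::size_t const n = a.size();
    int const x = a[i];
    while (left_child(i) < n) {
      std::size_t j = left_child(i);
      if (j + 1 < n && a[j + 1] < a[j]) ++j;  // pick the smaller child
      if (!(a[j] < x)) break;
      a[i] = a[j];
      i = j;
    }
    a[i] = x;
  }

  // Heapify bottom-up, from the last inner node down to the root.
  void construct() {
    if (a.size() < 2) return;
    for (std::size_t i = parent(a.size() - 1) + 1; i-- > 0;) siftdown(i);
  }

  int minimum() const {
    assert(!a.empty());
    return a[0];
  }

  void insert(int x) {
    a.push_back(x);
    siftup(a.size() - 1);
  }

  int extract_min() {
    assert(!a.empty());
    int const min = a[0];
    a[0] = a.back();
    a.pop_back();
    if (!a.empty()) siftdown(0);
    return min;
  }
};
```

Apart from the array itself, each operation uses only a constant number of indices and one element copy, which matches the workspace allowed by the model of computation.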

Experimental setup

Standard benchmark
• construct a heap of size n

Input data
All elements are of type int

Repetitions
Repeat each experiment r times, r = 2^26 / n

Reported value
Measurement result divided by r × n

Processor
Intel® Core™ i5-2520M CPU @ 2.50GHz × 4

Memory system
L3 cache: 3 MB, 12-way-associative
cache lines: 64 B
main memory: 3.8 GB

Operating system
Ubuntu 12.04 (Linux kernel 3.2.0-29-generic)

Compiler
g++ compiler (gcc version 4.6.3) with optimization -O3
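The reporting rule (repeat r = 2^26 / n times, divide the total by r × n) might be implemented along these lines. This is only a sketch under assumptions: the slides do not say how inputs were generated or how time was measured, so the pseudo-random input, the fixed seed, std::chrono::steady_clock, and the function names are all hypothetical. std::make_heap stands in for the construction routine under test.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>
#include <vector>

// r = 2^26 / n, so every experiment processes 2^26 elements in total.
std::uint64_t repetitions(std::uint64_t n) {
  assert(n > 0 && n <= (std::uint64_t{1} << 26));
  return (std::uint64_t{1} << 26) / n;
}

// Average construction time per element in nanoseconds.
// Assumption: random int input; only the construction itself is timed.
double ns_per_element(std::uint64_t n) {
  std::uint64_t const r = repetitions(n);
  std::mt19937 gen(12345);  // fixed seed, hypothetical choice
  std::vector<int> input(n);
  for (int& x : input) x = static_cast<int>(gen() >> 1);
  std::vector<int> work(n);
  std::chrono::nanoseconds total(0);
  for (std::uint64_t k = 0; k < r; ++k) {
    work = input;  // restore the unsorted input outside the timed region
    auto const start = std::chrono::steady_clock::now();
    std::make_heap(work.begin(), work.end());
    auto const stop = std::chrono::steady_clock::now();
    total += std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
  }
  return static_cast<double>(total.count()) /
         (static_cast<double>(r) * static_cast<double>(n));
}
```

For example, n = 2^20 gives r = 64 repetitions, and n = 2^10 gives r = 65536.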

Reduce # element comparisons

Inventor          construct    insert       extract-min        Extra space
Williams/Floyd    2n           ∼ lg n       ∼ 2 lg n           O(1) words
Gonnet & Munro    1.625n                                       Θ(n) words
Gonnet & Munro                 ∼ lg lg n    ∼ lg n + log* n    O(1) words
Lower bounds      ∼ 1.37n      Ω(1)         ∼ lg n             Ω(1) words

construct: Use a binomial tree in the construction
insert: Binary search on the siftup path
extract-min: lg n − lg lg n levels down along the siftdown path, siftup or recur further down
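The idea behind "binary search on the siftup path" can be sketched as follows. The values on the path from the root to the new leaf are already sorted, so the final level of the new element can be found with about lg lg n comparisons; the element moves still cost up to lg n. The function name, the 1-based layout with an unused slot 0, and the use of std::vector are assumptions for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insert x into a 1-based min-heap h[1..n]; h[0] is an unused placeholder.
// With 1-based indices, the ancestor of j at level d is j >> (depth - d).
void insert_binsearch(std::vector<int>& h, int x) {
  assert(!h.empty());  // slot 0 must exist
  h.push_back(x);
  std::size_t const j = h.size() - 1;  // position of the new leaf
  int depth = 0;                       // level of j, i.e. floor(lg j)
  for (std::size_t t = j; t > 1; t >>= 1) ++depth;

  // Values on the root-to-leaf path are nondecreasing, so the predicate
  // "x < ancestor at level d" is false for a prefix and true afterwards.
  int lo = 0, hi = depth;
  while (lo < hi) {
    int const mid = (lo + hi) / 2;
    if (x < h[j >> (depth - mid)]) hi = mid; else lo = mid + 1;
  }

  // Shift the ancestors on levels lo..depth-1 one level down, then place x.
  for (int d = depth; d > lo; --d)
    h[j >> (depth - d)] = h[j >> (depth - d + 1)];
  h[j >> (depth - lo)] = x;
}
```

The binary search performs at most ⌈lg(depth + 1)⌉ element comparisons, compared with up to depth comparisons for a plain siftup.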

Floyd's heap-construction program

    #include <iterator>

    // Sift a[i] down in the 1-based max-heap a[1..n].
    template <typename position, typename index, typename comparator>
    void siftdown(position a, index i, index n, comparator less) {
      typedef typename std::iterator_traits<position>::value_type element;
      element copy = a[i];
    loop:
      index j = 2 * i;
      if (j <= n) {
        if (j < n)
          if (less(a[j], a[j + 1]))
            j = j + 1;  // pick the larger child
        if (less(copy, a[j])) {
          a[i] = a[j];
          i = j;
          goto loop;
        }
      }
      a[i] = copy;
    }

    template <typename position, typename comparator>
    void make_heap(position first, position beyond, comparator less) {
      typedef typename std::iterator_traits<position>::difference_type index;
      position const a = first - 1;  // shift so that indices run from 1 to n
      index const n = beyond - first;
      for (index i = n / 2; i > 0; --i)
        siftdown(a, i, n, less);
    }

[Floyd 1964]

[Figure: the example heap with n = 8; nodes 0 to 7 hold 8, 10, 26, 75, 12, 46, 75, 80, stored in a[0..7]]

Remove an easy-to-predict if

opt 1: Make sure that siftdown is always called with an odd n

    if (j < n) ...

    for (index i = n / 2; i > 0; --i)
      siftdown(a, i, n, less);

→

    template <typename position, typename index, typename comparator>
    void siftup(position a, index j, comparator less) {
      ...
    }

    index const m = (n & 1) ? n : n - 1;
    for (index i = m / 2; i > 0; --i)
      siftdown(a, i, m, less);
    siftup(a, n, less);

Construction time [ns]
n       F      F1
2^10    7.5    7.1
2^15    7.4    7.0
2^20    8.2    7.9
2^25    8.9    8.4
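A possible rendering of opt 1 is sketched below. With 1-based indices and an odd heap size, every inner node has two children, so the test j < n inside siftdown can be dropped; a trailing element, if n is even, is added with siftup. The slide elides the body of siftup, so the version here is an assumption, as are the function names and the std::vector layout with an unused slot 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sift h[i] down in the 1-based max-heap h[1..n], n odd.
// Because n is odd, j <= n implies j + 1 <= n, so "if (j < n)" is gone.
void siftdown_odd(std::vector<int>& h, std::size_t i, std::size_t n) {
  assert(n % 2 == 1);
  int const copy = h[i];
  for (std::size_t j = 2 * i; j <= n; j = 2 * i) {
    if (h[j] < h[j + 1]) j = j + 1;  // pick the larger child
    if (!(copy < h[j])) break;
    h[i] = h[j];
    i = j;
  }
  h[i] = copy;
}

// Sift h[j] up towards the root (hypothetical body; elided on the slide).
void siftup(std::vector<int>& h, std::size_t j) {
  int const copy = h[j];
  while (j > 1 && h[j / 2] < copy) {
    h[j] = h[j / 2];
    j /= 2;
  }
  h[j] = copy;
}

// Build a max-heap in h[1..n]; h[0] is an unused placeholder.
void make_heap_f1(std::vector<int>& h) {
  assert(!h.empty());
  std::size_t const n = h.size() - 1;
  if (n == 0) return;
  std::size_t const m = (n & 1) ? n : n - 1;  // largest odd size <= n
  for (std::size_t i = m / 2; i > 0; --i) siftdown_odd(h, i, m);
  siftup(h, n);  // inserts the last element when n is even
}
```

When n is already odd, the final siftup leaves the heap unchanged because h[n] is not larger than its parent.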

Remove a hard-to-predict if

opt 2: Interpret the result of a comparison as an integer and use this value in normal index arithmetic

    if (condition) {
      j = j + 1;
    }

→

    j = j + condition;

Construction time [ns]
n       F1     F12
2^10    7.1    4.8
2^15    7.0    4.9
2^20    7.9    6.3
2^25    8.4    7.2
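As a small illustration of opt 2, the two functions below select the larger child of a node, once with a conditional and once by adding the comparison result to the index. In C++ a bool converts to 0 or 1, which is what makes the rewrite valid. The function names and the 1-based std::vector layout are assumptions, and whether the compiler actually emits branch-free machine code depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the larger child of node i in a 1-based max-heap layout.
// Both children of i must exist.
std::size_t larger_child_branching(const std::vector<int>& h, std::size_t i) {
  assert(2 * i + 1 < h.size());
  std::size_t j = 2 * i;
  if (h[j] < h[j + 1]) {
    j = j + 1;
  }
  return j;
}

// Same result, but the comparison outcome (0 or 1) is used as an integer.
std::size_t larger_child_branchfree(const std::vector<int>& h, std::size_t i) {
  assert(2 * i + 1 < h.size());
  std::size_t j = 2 * i;
  j = j + static_cast<std::size_t>(h[j] < h[j + 1]);
  return j;
}
```

The outcome of this comparison is close to random for random input, which is why the slide calls the corresponding if hard to predict.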

Commercial break

Lean programs
• A program has a constant number of unnested loops.
• Each loop is branch-free, except the final conditional branch at the end.
• A branch predictor is static: forward branches are not taken and backward branches are taken.
• Each such program induces O(1) branch mispredictions in this model.

Theorem. Let P be a program of length κ, measured in the number of assembly-language instructions. Assume that the running time of P is t(n) for an input of size n. There exists a program Q of length O(κ) that is equivalent to P, runs in O(κ t(n)) time for the same input as P, and induces O(1) branch mispredictions. [Elmasry, Katajainen 2012]

Reduce # element moves

opt 3: Do not make any element moves when the element at the root stays in its original location

    element copy = a[i];

→

    element copy;
    index k = 2 * i;
    k = k + less(a[k], a[k + 1]);
    if (less(a[i], a[k])) {
      copy = a[i];
      a[i] = a[k];
    }
    else {
      return;
    }
    i = k;

Aha! Loop unrolling

Construction time [ns]
n       F12    F123
2^10    4.8    4.3
2^15    4.9    4.6
2^20    6.3    5.9
2^25    7.2    6.9

Element moves
n       F       F123
2^10    1.73    1.52
2^15    1.74    1.53
2^20    1.74    1.53
2^25    1.74    1.52
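Putting opt 1 to opt 3 together, a complete siftdown might look like the sketch below. The first iteration is unrolled: if the root is not smaller than its larger child, the function returns without copying or moving anything. The function name and the 1-based std::vector layout are assumptions, and the sketch requires an odd n and a node i that has children, as arranged by opt 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sift h[i] down in the 1-based max-heap h[1..n]; n odd, i an inner node.
void siftdown_f123(std::vector<int>& h, std::size_t i, std::size_t n) {
  assert(n % 2 == 1 && 2 * i < n);
  // Unrolled first iteration: no element is touched if the root stays put.
  std::size_t k = 2 * i;
  k = k + static_cast<std::size_t>(h[k] < h[k + 1]);
  if (!(h[i] < h[k])) return;  // zero element moves
  int const copy = h[i];
  h[i] = h[k];
  i = k;
  // Remaining iterations; for odd n, k < n is equivalent to k <= n.
  for (k = 2 * i; k < n; k = 2 * i) {
    k = k + static_cast<std::size_t>(h[k] < h[k + 1]);
    if (!(copy < h[k])) break;
    h[i] = h[k];
    i = k;
  }
  h[i] = copy;
}
```

Calling it for i = n / 2 down to 1 builds the heap; on an array that is already a heap, no element is moved at all.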

Reduce # cache misses

opt 4: Visit the nodes in reverse depth-first order instead of reverse breadth-first order [Bojesen et al. 2000]

    for (index i = n / 2; i > 0; --i)
      siftdown(a, i, n, less);

→

    index j = n / 2;
    index const i = j / 2;
    while (j > i) {
      siftdown(a, j, n, less);
      index z = j;
      while ((z & 1) == 0) {
        z /= 2;
        siftdown(a, z, n, less);
      }
      --j;
    }

Construction time [ns]
n       F      F123    F1-4
2^10    7.4    4.3     5.2
2^15    7.4    4.6     5.1
2^20    8.2    5.9     5.2
2^25    8.7    6.9     5.1
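The traversal of opt 4 can be tried out with the sketch below, which also records the order in which nodes are sifted. As transcribed, the loop appears to rely on the tree being perfect (n + 1 a power of two), which is the case after opt 1 when the input size is a power of two; for other n the root of a subtree could be processed before its right child, so the sketch asserts this condition. The function names, the optional trace parameter, and the std::vector layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain siftdown in the 1-based max-heap h[1..n].
void siftdown(std::vector<int>& h, std::size_t i, std::size_t n) {
  int const copy = h[i];
  for (std::size_t j = 2 * i; j <= n; j = 2 * i) {
    if (j < n && h[j] < h[j + 1]) j = j + 1;
    if (!(copy < h[j])) break;
    h[i] = h[j];
    i = j;
  }
  h[i] = copy;
}

// Heap construction visiting the nodes in reverse depth-first order.
// Assumption: n = 2^k - 1. If trace is non-null, the visit order is logged.
void make_heap_dfs(std::vector<int>& h,
                   std::vector<std::size_t>* trace = nullptr) {
  assert(h.size() >= 2);
  std::size_t const n = h.size() - 1;
  assert(((n + 1) & n) == 0);  // perfect tree
  std::size_t j = n / 2;
  std::size_t const i = j / 2;
  while (j > i) {
    siftdown(h, j, n);
    if (trace) trace->push_back(j);
    std::size_t z = j;
    while ((z & 1) == 0) {  // z is a left child: its parent is ready now
      z /= 2;
      siftdown(h, z, n);
      if (trace) trace->push_back(z);
    }
    --j;
  }
}
```

For n = 15 the nodes are sifted in the order 7, 6, 3, 5, 4, 2, 1, so each subtree is finished while it is still likely to be in cache, instead of sweeping whole levels.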

Making the GM algorithm in-place

[Figure: a top tree of size ∼ n/lg n above bottom trees of size ∼ lg n each]

1. Improve GM: O(n) words → O(n) bits
2. Apply the improved algorithm for all bottom trees; keep the bits needed compactly in a word
3. Use F's siftdown approach for the top tree.

Element comparisons: ∼ 2n → ∼ 1.625n
Element moves: ∼ 2n → ∼ 2.125n
Cache misses: ∼ (n lg B)/B → ∼ n/B, assuming that B lg n << M (B block size; M memory size)

Construction time [ns]
n       F      GM
2^10    7.4    8.0
2^15    7.4    7.7
2^20    8.2    7.7
2^25    8.7    7.7

Heap construction: Summary

Construction time [ns]
n       std     F      F123    F1-4    GM
2^10    10.7    7.4    4.3     5.2     8.0
2^15    10.4    7.4    4.6     5.1     7.7
2^20    11.0    8.2    5.9     5.2     7.7
2^25    11.5    8.7    6.9     5.1     7.7

Instructions
n       std     F       F123    F1-4    GM
2^20    35.5    20.8    13.4    16.2    42.9
(rows for the other sizes are not recoverable from the transcript)

Element comparisons
n       std/F    GM
2^10    1.98     1.80
2^15    1.99     1.66
2^20    1.99     1.63
2^25    2        1.63

Branches | mispredictions
n       std            F              F123           F1-4
2^10    5.39 | 0.96    4.53 | 0.81    2.17 | 0.27    2.42 | 0.47
2^15    5.40 | 0.89    2.43 | 0.78    2.18 | 0.24    2.43 | 0.47
2^20    5.41 | 0.89    4.57 | 0.78    2.18 | 0.24    2.43 | 0.47
2^25    5.41 | 0.89    4.56 | 0.78    2.18 | 0.24    2.43 | 0.47
GM: 3.60 | 0.66 and 2.39 | 0.38 (row labels lost in the transcript)

Element moves
n       std     F       GM
2^10    3.99    1.99    2.15
2^15    3.99    1.99    2.39
2^20    4       1.99    2.38
2^25    4       2       2.38

I/Os | misses (per n/B)
n       std/F          F1-4           GM
2^10    1.00 | 1.00    1.00 | 1.00    0.95 | 0.95
2^15    5.66 | 1.00    1.03 | 1.00    1.03 | 1.00
2^20    5.87 | 4.94    1.04 | 1.00    – | –
2^25    5.87 | 5.84    1.04 | 0.99    – | –
(the alignment of the GM cells is uncertain in the transcript)

Static search trees

[Figure: search tree over the sorted array a = 8 10 12 26 46 75 75 80 with n = 8; the root is index 4 (46), its children are index 2 (12) and index 6 (75), below them are indices 1 (10), 3 (26), 5 (75), 7 (80), and index 0 (8) lies below index 1]

construct()
    sort(a, a + n)

is-member(x)
    i = 0
    k = n
    while i ≠ k
        if x < a[i]
            k = i
            i = left-child(i)
        else if a[i] < x
            i = right-child(i)
        else
            return yes
    return no

left-child(i)
    return ...

right-child(i)
    return ...
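Since the slide elides the bodies of left-child and right-child, the sketch below does not try to reproduce them. It swaps in the standard implicit search tree over a sorted array (plain binary search with a three-way comparison), where the middle of the current range plays the role of the subtree root. The function names and the use of std::vector are assumptions for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// construct(): sorting the array in place is the whole construction.
void construct(std::vector<int>& a) {
  std::sort(a.begin(), a.end());
}

// is-member(x): search the implicit tree over the sorted range a[lo, hi).
bool is_member(const std::vector<int>& a, int x) {
  std::size_t lo = 0;
  std::size_t hi = a.size();
  while (lo != hi) {
    std::size_t const i = lo + (hi - lo) / 2;  // root of the current subtree
    assert(i < a.size());
    if (x < a[i]) {
      hi = i;      // continue in the left subtree
    } else if (a[i] < x) {
      lo = i + 1;  // continue in the right subtree
    } else {
      return true;
    }
  }
  return false;
}
```

For the example array with n = 8 the first probe is index 4 (the value 46), which agrees with the root shown in the figure; no space beyond the array and two indices is needed.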
