Energy-efficient calculation of simple functions
Advanced Seminar Computer Engineering
Abdulhamid Han, 19.01.2016
Energy efficiency also depends on the algorithm
For example: bubble sort O(n²) ↔ quicksort O(n∙log n)
n = 10^6 → relative deviation ≈ 10^5 (the operation counts differ by a factor of about n / log2(n) ≈ 5∙10^4)
Content
• Fast inverse square root
• Finding the median without sorting
• Bit counting: 28 = 11100 → 3
Fast inverse square root
Fast inverse square root
– Single-precision floating-point numbers are stored as 32-bit numbers
IEEE 754 single-precision format: bit 31 = sign bit, bits 30–23 = 8 exponent bits, bits 22–0 = 23 mantissa bits
→ x = (-1)^sign ∙ (1 + Mantissa) ∙ 2^(Exponent-127)
π ≈ 0 10000000 10010010000111111011011
≈ (-1)^0 ∙ (1.5707963705062866) ∙ 2^(128-127) ≈ 3.1415927
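The field layout above can be checked by splitting a float into its three parts. This is a minimal sketch; the type `float_fields` and the function `decode_float` are hypothetical names introduced for illustration, and it assumes `float` is an IEEE 754 single-precision value.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper type: the three IEEE 754 single-precision fields.
typedef struct {
    uint32_t sign;      // bit 31
    uint32_t exponent;  // bits 30..23, stored with a bias of 127
    uint32_t mantissa;  // bits 22..0
} float_fields;

// Split a float into its fields. memcpy copies the raw bits without
// relying on pointer casts between unrelated types.
float_fields decode_float(float x)
{
    uint32_t bits;
    float_fields f;

    memcpy(&bits, &x, sizeof bits);
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For the float closest to π, `decode_float(3.14159274f)` yields sign 0, exponent 128 and mantissa 0x490FDB, which matches the bit pattern on the slide.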
Fast inverse square root
– In video games the inverse square root is needed for vector normalization
– Speed is often more important than accuracy, and an error of 1 % is acceptable
– The main goal is to get a good approximation in one calculation step
How can you calculate the inverse square root without division and square root?
Fast inverse square root
Integer: 26 = 11010, 26 >> 1 = 01101 = 13 = 26/2
Float: π ≈ 0 10000000 10010010000111111011011
π >> 1 ≈ 0 01000000 01001001000011111101101 → ≈ 1.39∙10^-19
Now calculate 0x5f3759df - (π >> 1) (bitwise calculation!)
0x5f3759df = 0 10111110 01101110101100111011111
→ ≈ 0.5735 as the raw result; one Newton step refines it to 0.563957 ≈ 1/√π
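The bitwise calculation for π can be reproduced step by step. A minimal sketch, assuming 32-bit IEEE 754 floats; the helper names `float_to_bits`, `bits_to_float` and `raw_guess` are illustrative and not from the slides.

```c
#include <stdint.h>
#include <string.h>

// Reinterpret the bits of a float as a 32-bit unsigned integer.
uint32_t float_to_bits(float x)
{
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

// Reinterpret a 32-bit pattern as a float.
float bits_to_float(uint32_t b)
{
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}

// Raw first guess for 1/sqrt(x): magic constant minus the halved bit
// pattern, without any Newton refinement.
float raw_guess(float x)
{
    return bits_to_float(0x5f3759dfu - (float_to_bits(x) >> 1));
}
```

For π the halved bit pattern is 0x202487ED, and `raw_guess(3.14159274f)` has the bit pattern 0x3F12D1F2, roughly 0.5735, compared with 1/√π ≈ 0.5642.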
Result with 0x5f3759df
Relative error: |1/√x − InvSqrt(x)| / (1/√x) < 4 %
0x5f3759df vs 0x5f34ff59
Relative error |1/√x − InvSqrt(x)| / (1/√x): < 4 % with 0x5f3759df, < 5 % with 0x5f34ff59
Newton’s method
https://en.wikipedia.org/wiki/Newton%27s_method#/media/File:NewtonIteration_Ani.gif
Fast inverse square root
[Plot: f(y) for x = 50; axis labels x and y]
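The refinement line in the code can be derived from Newton's method. This sketch assumes the plotted function is f(y) = 1/y² − x, whose positive root is y = 1/√x; the slides do not state f explicitly.

```latex
\begin{align*}
f(y) &= \frac{1}{y^{2}} - x, \qquad f'(y) = -\frac{2}{y^{3}} \\
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}
         = y_n + \frac{y_n^{3}}{2}\left(\frac{1}{y_n^{2}} - x\right)
         = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^{2}\right)
\end{align*}
```

With x2 = x/2 this is the expression y * (threehalfs - (x2 * y * y)), so one iteration needs only multiplications and one subtraction.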
Fast inverse square root
    #include <stdint.h>
    #include <string.h>

    float InvSqrt(float number)
    {
        int32_t i;                            // 32-bit integer (long can be 64 bits wide)
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y = number;
        memcpy(&i, &y, sizeof i);             // store floating-point bits in i
        i = 0x5f3759df - (i >> 1);            // initial guess for Newton's method
        memcpy(&y, &i, sizeof y);             // convert new bits into float
        y = y * (threehalfs - (x2 * y * y));  // 1st iteration
        return y;
    }
Source: http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
Example input: π ≈ 0 10000000 10010010000111111011011
Result with 0x5f3759df (1 Newton step)
Relative error: |1/√x − InvSqrt(x)| / (1/√x) < 2∙10^-3
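The error bounds on the result slides can be checked numerically. This is a minimal sketch: `inv_sqrt` and `max_rel_error` are illustrative names, and the sample grid is an arbitrary choice. The inputs are generated as x = t², so the exact value 1/t is known (up to float rounding) without calling a square root. `samples` must be at least 2.

```c
#include <stdint.h>
#include <string.h>

// Fast inverse square root; newton_steps = 0 returns the raw guess.
float inv_sqrt(float x, int newton_steps)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    memcpy(&y, &i, sizeof y);
    while (newton_steps-- > 0)
        y = y * (1.5f - 0.5f * x * y * y);
    return y;
}

// Largest relative error over samples of x = t*t with t in [t_lo, t_hi].
double max_rel_error(double t_lo, double t_hi, int samples, int newton_steps)
{
    double worst = 0.0;

    for (int k = 0; k < samples; ++k) {
        double t = t_lo + (t_hi - t_lo) * k / (samples - 1);
        float x = (float)(t * t);   // 1/sqrt(x) equals 1/t up to rounding
        double exact = 1.0 / t;
        double err = (exact - inv_sqrt(x, newton_steps)) / exact;
        if (err < 0.0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Calling `max_rel_error(0.1, 100.0, 100000, 0)` and `max_rel_error(0.1, 100.0, 100000, 1)` gives the worst sampled error without and with one Newton step, which can be compared against the 4 % and 2∙10^-3 bounds.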
Magic numbers for other exponents
http://h14s.p5r.org/2012/09/0x5f3759df.html
For p = 1/3: relative error < 3 % without a Newton step
Finding the median without sorting
Finding the median without sorting
Definition: The median is the middle value of a sorted array
– It is easy to find the median in a sorted array
2 5 3 12 20 1 99 7 8 → Median?
Sorting, O(n∙log(n)): 1 2 3 5 7 8 12 20 99 → median 7
– In an unsorted array one can find the median without sorting in O(n)
A simple algorithm to find the median
2 5 3 12 20 1 99 7 8
• Choose an arbitrary element x (here x = 12)
• Partition into 3 sections: 2 5 3 1 7 8 | 12 | 20 99 (a[0] … a[8])
• Rank of the median: k = ⌊n/2⌋ = ⌊9/2⌋ = 4
• k lies in the first section → continue with 2 5 3 1 7 8 (a[0] … a[5])
• Again choose an arbitrary element x and partition into 3 sections
A simple algorithm to find the median
Input: array a[0], a[1], …, a[n-1] with length n
Output: median = element with rank m = ⌊n/2⌋
1. If n = 1 return a[0], else
2. Choose an arbitrary element x
3. Partition the array into three sections
   1. a[0], …, a[q-1] with elements less than x
   2. a[q], …, a[g-1] with elements equal to x
   3. a[g], …, a[n-1] with elements greater than x
4. If m < q, search for a[m] in the first section
   else if m < g, return x
   else search for a[m] in the third section
Best case: 3 sections of equal length → O(n)
Worst case: the section searched next is always smaller by only 1 → O(n²)
A simple algorithm to find the median
Worst case: 5 3 1 7 8 → 5 3 1 7 → 3 1 5 (each step removes only one element)
→ x should be selected carefully!
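The simple algorithm can be sketched in C as follows. This is an illustrative version, not the author's code: it works in place on an `int` array, picks the middle element of the current section as the "arbitrary" element x, and uses a loop instead of recursion. The names `select_rank` and `median` are hypothetical; the caller must ensure n > 0 and m < n.

```c
#include <stddef.h>

static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

// Return the element of rank m (0-based) in a[0..n-1]; reorders a in place.
int select_rank(int *a, size_t n, size_t m)
{
    size_t lo = 0, hi = n;                // current section is a[lo..hi-1]

    for (;;) {
        if (hi - lo == 1)
            return a[lo];
        int x = a[lo + (hi - lo) / 2];    // arbitrary choice: middle element
        size_t q = lo, i = lo, g = hi;
        // Invariant: a[lo..q-1] < x, a[q..i-1] == x, a[g..hi-1] > x
        while (i < g) {
            if (a[i] < x)
                swap_int(&a[i++], &a[q++]);
            else if (a[i] > x)
                swap_int(&a[i], &a[--g]);
            else
                ++i;
        }
        if (m < q)
            hi = q;                       // continue in the "less" section
        else if (m < g)
            return x;                     // rank lies in the "equal" section
        else
            lo = g;                       // continue in the "greater" section
    }
}

// Median as defined on the slides: the element with rank n/2.
int median(int *a, size_t n)
{
    return select_rank(a, n, n / 2);
}
```

For the example array 2 5 3 12 20 1 99 7 8, `median` returns 7. Because x is not chosen carefully, the O(n²) worst case remains possible.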
Improved version
Input: array a[0], a[1], …, a[n-1] with length n
Output: median = element with rank m = ⌊n/2⌋
1. If n < 15, sort the array and return the median, else
2. Partition the array into ⌊n/5⌋ sections with 5 elements each and calculate their medians
3. Recursively calculate the median m’ of these medians
4. Partition the array into three sections
   1. a[0], …, a[q-1] with elements less than m’
   2. a[q], …, a[g-1] with elements equal to m’
   3. a[g], …, a[n-1] with elements greater than m’
5. If m < q, search for a[m] in the first section
   else if m < g, return m’
   else search for a[m] in the third section
Improved version
[Figure: median of medians; labels: sorted, medians]
Up to 4 additional elements, if n is not divisible by 5
Improved version
Example: 25 elements in 5 groups (columns), middle row = medians, median of medians m’ = 51
 4   1  14   5   3
 7   5  24  14  45
22  26  51  55  78
33  52 100  79  66
42  62 342 124  82
Elements above and to the left of 51 are < 51; elements below and to the right of 51 are > 51
Improved version
Input: array a[0], a[1], …, a[n-1] with length n
Output: median = element with rank m = ⌊n/2⌋
1. If n < 15, sort the array and return the median, else: O(1)
2. Partition the array into ⌊n/5⌋ sections with 5 elements each and calculate their medians: O(n)
3. Recursively calculate the median m’ of these medians: T(n/5)
4. Partition the array into three sections: O(n)
   1. a[0], …, a[q-1] with elements less than m’
   2. a[q], …, a[g-1] with elements equal to m’
   3. a[g], …, a[n-1] with elements greater than m’
5. If m < q, search for a[m] in the first section: ≤ T(3n/4)
   else if m < g, return m’
   else search for a[m] in the third section: ≤ T(3n/4)
Total: T(n) ≤ T(n/5) + T(3n/4) + O(n) → O(n), because 1/5 + 3/4 < 1
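The improved version can be sketched along the numbered steps. This is an illustrative implementation under stated assumptions, not the author's code: it works in place on `int` arrays, sorts short sections with insertion sort, and collects the group medians at the front of the array so that the recursion needs no extra memory. The name `mom_select` is hypothetical; the caller must ensure m < n.

```c
#include <stddef.h>

static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

// Insertion sort, used only for short sections.
static void sort_small(int *a, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        for (size_t j = i; j > 0 && a[j - 1] > a[j]; --j)
            swap_int(&a[j - 1], &a[j]);
}

// Return the element of rank m (0-based) in a[0..n-1]; reorders a in place.
int mom_select(int *a, size_t n, size_t m)
{
    if (n < 15) {                          // step 1: small input, sort directly
        sort_small(a, n);
        return a[m];
    }
    // Step 2: median of each full group of 5, moved to the array front.
    // Up to 4 leftover elements are skipped here but partitioned in step 4.
    size_t groups = n / 5;
    for (size_t k = 0; k < groups; ++k) {
        sort_small(a + 5 * k, 5);
        swap_int(&a[k], &a[5 * k + 2]);
    }
    // Step 3: median m' of the group medians, found recursively.
    int pivot = mom_select(a, groups, groups / 2);
    // Step 4: three-way partition around m'.
    size_t q = 0, i = 0, g = n;
    while (i < g) {
        if (a[i] < pivot)
            swap_int(&a[i++], &a[q++]);
        else if (a[i] > pivot)
            swap_int(&a[i], &a[--g]);
        else
            ++i;
    }
    // Step 5: continue in the section that contains rank m.
    if (m < q)
        return mom_select(a, q, m);
    if (m < g)
        return pivot;
    return mom_select(a + g, n - g, m - g);
}
```

Calling `mom_select(a, n, n / 2)` returns the median as defined on the slides. Passing the third section as `a + g` shifts the rank to m - g within that section.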
Bit counting
28 = 11100 → 3
Bit counting
Simple solution
    // v: 32-bit input value
    unsigned int c = 0;
    for (unsigned int mask = 0x1; mask; mask <<= 1) {  // 32 loops! Repeat until mask == 0
        if (v & mask)
            c++;
    }
Disadvantage: always 32 loop iterations
Bit counting
First improvement
    unsigned int c;
    for (c = 0; v; v >>= 1) {  // shift while v != 0
        c += v & 1;            // increase counter
    }
Disadvantage: as many loop iterations as the position of the highest set bit
→ 1 loop for v = 0x1
→ 32 loops for v = 0x80000000
Bit counting
Second improvement
    unsigned int c;
    for (c = 0; v; c++) {  // repeat until v == 0
        v &= v - 1;        // delete lowest set bit
    }
v = …xyz10…0, v - 1 = …xyz01…1 → v & (v - 1) = …xyz00…0
Advantage: as many loop iterations as the number of ones
But still not fast enough if the number of ones is large
An elegant method
v = ab | ab | … | ab | ab (16 times 2 bits)
c = number of ones in ab
a b | c  | ab - 0a
0 0 | 00 | 00
0 1 | 01 | 01
1 0 | 01 | 01
1 1 | 10 | 10
ab - 0a can be calculated with v - ((v >> 1) & 0x55555555)
An elegant method
Now add every two neighboring 2-bit groups into one 4-bit group
v = ab* | ab* | … | ab* | ab* (16 times 2 bits)
v = ab’+ab’’ | … | ab’+ab’’ (8 times 4 bits)
• No carry!
It can be calculated with:
    (v & 0x33333333) + ((v >> 2) & 0x33333333);
An elegant method
Now sum up every two neighboring 4-bit groups into one 8-bit group:
1. v = (v + (v >> 4));
2. v &= 0x0F0F0F0F; // delete useless bits
• Still no carry!
v now contains 4 times 8 bits (v = ABCD)
v * 0x01010101 = D000 + CD00 + BCD0 + ABCD; the top byte holds A+B+C+D, and >> 24 delivers it
The result is: c = (v * 0x01010101) >> 24;
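The multiplication step can be tried in isolation. A minimal sketch with the hypothetical name `byte_sum`; it assumes the four byte values add up to less than 256, which always holds for the bit counts (at most 32).

```c
#include <stdint.h>

// Sum the four bytes A, B, C, D of v with a single multiplication.
// v * 0x01010101 equals v + (v << 8) + (v << 16) + (v << 24) modulo 2^32,
// so the top byte of the product collects A + B + C + D.
uint32_t byte_sum(uint32_t v)
{
    return (v * 0x01010101u) >> 24;
}
```

For example, `byte_sum(0x01020304u)` gives 1 + 2 + 3 + 4 = 10.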
An elegant method
    v = v - ((v >> 1) & 0x55555555);                 // count bits in groups of two
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);  // add 2-bit groups -> 4-bit groups
    v = (v + (v >> 4));                              // add 4-bit groups -> 8-bit groups
    v &= 0x0F0F0F0F;                                 // delete useless bits
    c = (v * 0x01010101) >> 24;                      // add the four 8-bit groups
Advantage: counts bits in constant time
Disadvantage: not optimal when only a few bits are set
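To gain confidence in the five lines above, they can be wrapped in a function and compared with the loop from the second improvement. A minimal sketch; the names `popcount_parallel` and `popcount_loop` are illustrative, and the input is assumed to be a 32-bit unsigned value.

```c
#include <stdint.h>

// Constant-time bit count, following the five steps on the slide.
uint32_t popcount_parallel(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                  // 2-bit counts
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  // 4-bit counts
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  // 8-bit counts
    return (v * 0x01010101u) >> 24;                    // sum of the four bytes
}

// Reference: one loop iteration per set bit (second improvement).
uint32_t popcount_loop(uint32_t v)
{
    uint32_t c;

    for (c = 0; v; ++c)
        v &= v - 1;  // delete lowest set bit
    return c;
}
```

Both functions return 3 for the running example 28 = 11100. Comparing them over a set of inputs is a cheap consistency check.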
Results
[Chart: second improvement vs. elegant method, CPU cycles]
http://bits.stephan-brumme.com/countBits.html
Conclusion
– Fast inverse square root
  • One can calculate the inverse square root 4 times faster with an error of < 1 %
– Finding the median without sorting
  • One can find the median without sorting
  • The complexity is O(n)
– Bit counting
  • It is possible to count set bits in constant time, independent of the input value