energy efficient calculation of simple functions


Energy efficient calculation of simple functions. Advanced Seminar Computer Engineering, Abdulhamid Han, 19.01.2016. Energy efficiency also depends on the algorithm, for example bubble sort with O(n²) versus quicksort with O(n∙log n) for $n = 10^6$.


  1. Energy efficient calculation of simple functions. Advanced Seminar Computer Engineering. Abdulhamid Han, 19.01.2016

  2. Energy efficiency also depends on the algorithm. For example: bubble sort O(n²) ↔ quicksort O(n∙log n). For $n = 10^6$ the relative deviation is $\approx 10^5$.

  3. Content
     • Fast inverse square root
     • Finding the median without sorting
     • Bit counting: 28 = 11100 → 3

  4. Fast inverse square root

  5. Fast inverse square root
     – Single-precision floating-point numbers are stored as 32-bit numbers (IEEE 754 single precision format):
       bit 31: sign bit | bits 30–23: 8 exponent bits | bits 22–0: 23 mantissa bits
     → $x = (-1)^{\text{sign}} \cdot (1 + \text{Mantissa}) \cdot 2^{\text{Exponent} - 127}$
     Example: π ≈ 0 10000000 10010010000111111011011
              ≈ $(-1)^{0} \cdot 1.5707963705062866 \cdot 2^{128-127}$ ≈ 3.1415927
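
The field layout above can be checked with a few lines of C. This is a minimal sketch, assuming `float` is an IEEE 754 single-precision value on the target platform; the type `float_fields` and the function `decode_float` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a 32-bit float. */
typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30..23, biased by 127 */
    uint32_t mantissa;  /* bits 22..0 */
} float_fields;

/* Split a float into its raw fields (assumes IEEE 754 single precision). */
float_fields decode_float(float x)
{
    uint32_t bits;
    float_fields f;

    memcpy(&bits, &x, sizeof bits);   /* copy the bit pattern, no conversion */
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For π the fields come out as sign 0, exponent 128 and mantissa 0x490FDB, which matches the bit pattern on the slide.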

  6. Fast inverse square root
     – In video games the inverse square root is needed for vector normalization
     – Often speed is more important than accuracy, and an accuracy of 1% is acceptable
     – The main goal is to get a good approximate value in one calculation step
     How can you calculate the inverse square root without division and without a square root?

  7. Fast inverse square root
     Integer: 26 = 11010, 26 >> 1 = 01101 = 13 = 26/2
     Float: π      ≈ 0 10000000 10010010000111111011011
            π >> 1 ≈ 0 01000000 01001001000011111101101 → ≈ 1.39∙10^-19
     Now calculate 0x5f3759df − (π >> 1) (bitwise calculation!), with
            0x5f3759df = 0 10111110 01101110101100111011111
     → ≈ 0.5735 as a first estimate of $1/\sqrt{\pi} \approx 0.5642$; one subsequent Newton step gives 0.563957
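
The shift-and-subtract step can be tried in isolation, without the Newton iteration. This is a minimal sketch, assuming IEEE 754 single precision and positive, normal inputs; the function name `inv_sqrt_guess` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Initial guess for 1/sqrt(x) from the bit pattern alone (no Newton step).
   Assumes x > 0 and an IEEE 754 single-precision float. */
float inv_sqrt_guess(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);              /* reinterpret float bits as integer */
    i = UINT32_C(0x5f3759df) - (i >> 1);   /* magic constant minus shifted bits */
    memcpy(&y, &i, sizeof y);              /* reinterpret integer bits as float */
    return y;
}
```

For x = π the integer result is 0x3F12D1F2, which reads back as roughly 0.5735, a few percent away from the exact value.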

  8. Result with 0x5f3759df
     [Figure: relative error] $\left| \dfrac{\frac{1}{\sqrt{x}} - \mathrm{InvSqrt}(x)}{\frac{1}{\sqrt{x}}} \right| < 4\,\%$

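  9. 0x5f3759df vs 0x5f34ff59
     [Figure: relative error] $\left| \dfrac{\frac{1}{\sqrt{x}} - \mathrm{InvSqrt}(x)}{\frac{1}{\sqrt{x}}} \right|$: < 4 % with 0x5f3759df, < 5 % with 0x5f34ff59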

  10. Newton’s method
      [Figure: https://en.wikipedia.org/wiki/Newton%27s_method#/media/File:NewtonIteration_Ani.gif]

  11. Fast inverse square root
      [Figure: plot of f(y) for x = 50]
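
The slide does not spell out f(y). A common choice, assumed here, is the function whose positive root is the inverse square root of x; Newton's method applied to it gives an update that needs no division:

```latex
f(y) = \frac{1}{y^{2}} - x, \qquad f'(y) = -\frac{2}{y^{3}}

y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}
        = y_n + \frac{y_n^{3}}{2}\left(\frac{1}{y_n^{2}} - x\right)
        = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^{2}\right)
```

With x/2 precomputed, one step costs only a few multiplications and one subtraction.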

  12. Fast inverse square root
          #include <stdint.h>
          #include <string.h>

          float InvSqrt(float number)
          {
              int32_t i;                    // exactly 32 bits (a long may be 64 bits wide)
              float x2, y;
              const float threehalfs = 1.5F;

              x2 = number * 0.5F;
              y  = number;
              memcpy(&i, &y, sizeof i);     // store floating-point bits in i
              i  = 0x5f3759df - (i >> 1);   // initial guess for Newton's method
              memcpy(&y, &i, sizeof y);     // convert new bits into float
              y  = y * (threehalfs - (x2 * y * y));   // 1st iteration
              return y;
          }
      Source: http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
      Example input: π ≈ 0 10000000 10010010000111111011011
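
Since the deck motivates the routine with vector normalization, here is a minimal sketch of that use. The type `vec3` and the functions `inv_sqrt_fast` and `normalize_fast` are hypothetical names for this illustration. The code assumes IEEE 754 single precision and a non-zero input vector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z; } vec3;   /* hypothetical 3D vector type */

/* Fast inverse square root with one Newton step (same scheme as the slide). */
static float inv_sqrt_fast(float number)
{
    uint32_t i;
    float y = number;
    const float x2 = number * 0.5f;

    memcpy(&i, &y, sizeof i);
    i = UINT32_C(0x5f3759df) - (i >> 1);   /* initial guess */
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - x2 * y * y);        /* one Newton iteration */
}

/* Scale v to approximately unit length without division or sqrt. */
vec3 normalize_fast(vec3 v)
{
    float len2 = v.x * v.x + v.y * v.y + v.z * v.z;
    assert(len2 > 0.0f);                   /* zero vector cannot be normalized */
    float s = inv_sqrt_fast(len2);
    vec3 r = { v.x * s, v.y * s, v.z * s };
    return r;
}
```

The result is slightly shorter than unit length, because the Newton step approaches the exact value from below.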

  13. Result with 0x5f3759df (1 Newton step)
      [Figure: relative error] $\left| \dfrac{\frac{1}{\sqrt{x}} - \mathrm{InvSqrt}(x)}{\frac{1}{\sqrt{x}}} \right| < 2 \cdot 10^{-3}$

  14. Magic numbers for other exponents
      http://h14s.p5r.org/2012/09/0x5f3759df.html
      [Figure: relative error for $p = \frac{1}{3}$] < 3 % without Newton step

  15. Finding the median without sorting

  16. Finding the median without sorting
      Definition: the median is the middle value in a sorted array
      – It’s easy to find the median in a sorted array, but sorting costs O(n∙log(n)):
        2 5 3 12 20 1 99 7 8 → 1 2 3 5 7 8 12 20 99 (median: 7)
      – In an unsorted array one can find the median without sorting, in O(n)

  17. A simple algorithm to find the median
      2 5 3 12 20 1 99 7 8
      • Choose an arbitrary element x
      • Partition into 3 sections: 2 5 3 1 7 8 | 12 | 20 99 (a0 … a8, here with x = 12)
      • Rank of median: $k = \lfloor n/2 \rfloor = \lfloor 9/2 \rfloor = 4$
      • → continue with the section 2 5 3 1 7 8 (a0 … a5), which contains rank 4
      • Choose an arbitrary element x again and partition into 3 sections

  18. A simple algorithm to find the median
      Input: array a_0, a_1, …, a_{n-1} with length n
      Output: median = element with rank $m = \lfloor n/2 \rfloor$
      1. If n = 1 return a_0, else
      2. Choose an arbitrary element x
      3. Partition the array into three sections
         1. a_0, …, a_{q-1} with elements less than x
         2. a_q, …, a_{g-1} with elements equal to x
         3. a_g, …, a_{n-1} with elements greater than x
      4. If m < q, search recursively for rank m in the first section;
         else if m < g, return x;
         else search recursively for rank m − g in the third section
      Best case: 3 sections of equal length → O(n)
      Worst case: the returned section is always smaller by only 1 → O(n²)
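
The steps above can be sketched in C as an iterative selection with a three-way partition. The function name `select_rank_simple` and the choice of the middle element as the "arbitrary" pivot are assumptions made for this illustration. The array is reordered in place.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Returns the element of rank m (0-based) in a[0..n-1]; reorders a. */
int select_rank_simple(int *a, size_t n, size_t m)
{
    assert(n > 0 && m < n);
    while (n > 1) {
        int x = a[n / 2];              /* arbitrary pivot: the middle element */
        size_t q = 0, i = 0, g = n;    /* a[0..q) < x, a[q..i) == x, a[g..n) > x */
        while (i < g) {
            if (a[i] < x) {
                swap_int(&a[q], &a[i]);
                q++;
                i++;
            } else if (a[i] > x) {
                g--;
                swap_int(&a[g], &a[i]);
            } else {
                i++;
            }
        }
        if (m < q) {
            n = q;                     /* continue in the first section */
        } else if (m < g) {
            return x;                  /* rank m falls into the "equal" section */
        } else {
            a += g;                    /* continue in the third section */
            m -= g;
            n -= g;
        }
    }
    return a[0];
}
```

For the example array 2 5 3 12 20 1 99 7 8, calling it with n = 9 and m = 4 yields the median 7.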

  19. A simple algorithm to find the median
      Worst case: each step removes only the chosen element, e.g. 5 3 1 7 8 → 5 3 1 7 → 3 1 5
      → x should be selected carefully!

  20. Improved version
      Input: array a_0, a_1, …, a_{n-1} with length n
      Output: median = element with rank $m = \lfloor n/2 \rfloor$
      1. If n < 15, sort the array and return the median, else
      2. Partition the array into $n/5$ sections with 5 elements each and calculate their medians
      3. Recursively calculate the median m’ of these medians
      4. Partition the array into three sections
         1. a_0, …, a_{q-1} with elements less than m’
         2. a_q, …, a_{g-1} with elements equal to m’
         3. a_g, …, a_{n-1} with elements greater than m’
      5. If m < q, search recursively for rank m in the first section;
         else if m < g, return m’;
         else search recursively for rank m − g in the third section
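
A compact C sketch of the improved version follows. It is one possible reading of the steps above, not the author's implementation. The name `select_rank_mom` is hypothetical, leftover elements form a shorter last group, and group medians are collected at the front of the array so that the recursive call can work in place.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

static void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) { a[j] = a[j - 1]; j--; }
        a[j] = key;
    }
}

/* Returns the element of rank k (0-based) in a[0..n-1]; reorders a. */
int select_rank_mom(int *a, size_t n, size_t k)
{
    assert(n > 0 && k < n);
    for (;;) {
        if (n < 15) {                       /* step 1: small input, just sort */
            insertion_sort(a, n);
            return a[k];
        }
        size_t groups = 0;                  /* step 2: medians of groups of 5 */
        for (size_t i = 0; i < n; i += 5) {
            size_t len = (n - i < 5) ? n - i : 5;
            insertion_sort(a + i, len);
            swap_int(&a[groups], &a[i + len / 2]);  /* move median to front */
            groups++;
        }
        /* step 3: median of the medians, found recursively */
        int pivot = select_rank_mom(a, groups, groups / 2);

        size_t q = 0, i = 0, g = n;         /* step 4: three-way partition */
        while (i < g) {
            if (a[i] < pivot)      { swap_int(&a[q], &a[i]); q++; i++; }
            else if (a[i] > pivot) { g--; swap_int(&a[g], &a[i]); }
            else                   { i++; }
        }
        if (k < q)      { n = q; }          /* step 5: pick the right section */
        else if (k < g) { return pivot; }
        else            { a += g; k -= g; n -= g; }
    }
}
```

Moving each group median to the front only permutes the array, so the selection result is unaffected.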

  21. Improved version: median of medians
      [Figure: the groups of 5 are sorted and their medians marked]
      Up to 4 additional elements, if n is not divisible by 5

  22. Improved version (example)
      [Figure: 25 elements in groups of 5; the middle row holds the group medians]
           4    1   14    5    3
           7    5   24   14   45
          22   26   51   55   78
          33   52  100   79   66
          42   62  342  124   82
      Median of medians: 51. Upper-left block (4 1 14 7 5 24 22 26) < 51; lower-right block (55 78 100 79 66 342 124 82) > 51

  23. Improved version (complexity)
      Input: array a_0, a_1, …, a_{n-1} with length n
      Output: median = element with rank $m = \lfloor n/2 \rfloor$
      1. If n < 15, sort the array and return the median, else                                   O(1)
      2. Partition the array into $n/5$ sections with 5 elements each and calculate their medians   O(n)
      3. Recursively calculate the median m’ of these medians                                    O(n)
      4. Partition the array into three sections                                                 O(n)
         1. a_0, …, a_{q-1} with elements less than m’
         2. a_q, …, a_{g-1} with elements equal to m’
         3. a_g, …, a_{n-1} with elements greater than m’
      5. If m < q, search recursively for rank m in the first section;                           ≤ T(3n/4)
         else if m < g, return m’;
         else search recursively for rank m − g in the third section
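
The annotations can be combined into a recurrence. The sketch below assumes that the recursive call in step 3 runs on the n/5 group medians, so it costs T(n/5). This is the standard reading and is not spelled out on the slide. The constant c stands for the linear work of steps 2 and 4.

```latex
T(n) \le T\!\left(\frac{n}{5}\right) + T\!\left(\frac{3n}{4}\right) + c\,n

\text{Claim: } T(n) \le 20\,c\,n. \quad
\text{Induction step: }
T(n) \le 20c\cdot\frac{n}{5} + 20c\cdot\frac{3n}{4} + c\,n
      = 4cn + 15cn + cn = 20\,c\,n
```

The argument works because the two recursive fractions sum to 19/20 < 1, which gives T(n) = O(n).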

  24. Bit counting: 28 = 11100 → 3

  25. Bit counting: simple solution
          unsigned int c = 0;                                  // v: 32-bit input value
          for (unsigned int mask = 0x1; mask; mask <<= 1) {    // 32 loops! Repeat until mask == 0
              if (v & mask)
                  c++;
          }
      Disadvantage: always 32 loops

  26. Bit counting: first improvement
          unsigned int c;
          for (c = 0; v; v >>= 1) {    // shift while v != 0
              c += v & 1;              // increase counter
          }
      Disadvantage: as many loops as the position of the highest set bit
      → v = 0x1: 1 loop
      → v = 0x80000000: 32 loops

  27. Bit counting: second improvement
          unsigned int c;
          for (c = 0; v; c++) {    // repeat until v == 0
              v &= v - 1;          // delete lowest set bit
          }
      v           = …xyz10…0
      v - 1       = …xyz01…1
      → v & (v - 1) = …xyz00…0
      Advantage: as many loops as the number of ones
      But still not fast enough if the number of ones is large

  28. An elegant method
      v = ab | ab | … | ab | ab (16 times 2 bits), c = number of ones in a 2-bit group
          a  b  |  c   |  ab − 0a
          0  0  |  00  |  00
          0  1  |  01  |  01
          1  0  |  01  |  01
          1  1  |  10  |  10
      ab − 0a can be calculated for all groups at once with v - ((v >> 1) & 0x55555555)
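
The first step can be tried on its own. This is a minimal sketch with a hypothetical function name, `count_pairs`, that returns the word in which every 2-bit group holds the number of ones of the corresponding input group.

```c
#include <assert.h>
#include <stdint.h>

/* Replace every 2-bit group "ab" of v by its bit count (ab - 0a). */
uint32_t count_pairs(uint32_t v)
{
    return v - ((v >> 1) & UINT32_C(0x55555555));
}
```

For v = 28 (binary 01 11 00) the result is 24 (binary 01 10 00), so the groups hold the counts 1, 2 and 0, which sum to 3.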

  29. An elegant method
      Now add each two neighboring 2-bit groups into one 4-bit group
      v = ab* | ab* | … | ab* | ab* (16 times 2 bits)
      v = ab’+ab’’ | … | ab’+ab’’ (8 times 4 bits)
      • No carry!
      It can be calculated with: (v & 0x33333333) + ((v >> 2) & 0x33333333);

  30. An elegant method
      Now sum up each two neighboring 4-bit groups into one 8-bit group:
      1. v = (v + (v >> 4));
      2. v &= 0x0F0F0F0F;   // delete useless bits
      • Still no carry!
      v now contains 4 times 8 bits (v = ABCD)
      v * 0x01010101 = D000 + CD00 + BCD0 + ABCD, so the top byte holds A+B+C+D and >> 24 delivers it
      The result is: c = (v * 0x01010101) >> 24;
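
The multiplication trick is the least obvious step, so here it is in isolation. This is a minimal sketch with a hypothetical function name, `sum_bytes`. It assumes each byte of the input is small enough that the four byte values sum to less than 256, which holds for bit counts of at most 8 per byte.

```c
#include <assert.h>
#include <stdint.h>

/* Sum the four bytes A, B, C, D of v; valid while A+B+C+D < 256. */
uint32_t sum_bytes(uint32_t v)
{
    /* v * 0x01010101 adds v shifted by 0, 8, 16 and 24 bits (mod 2^32),
       so the top byte ends up holding A + B + C + D. */
    return (uint32_t)(v * UINT32_C(0x01010101)) >> 24;
}
```

For v = 0x01020304 the result is 1 + 2 + 3 + 4 = 10.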

  31. An elegant method
          v = v - ((v >> 1) & 0x55555555);                  // count bits in 2-bit groups
          v = (v & 0x33333333) + ((v >> 2) & 0x33333333);   // add 2-bit groups -> 4-bit groups
          v = (v + (v >> 4));                               // add 4-bit groups -> 8-bit groups
          v &= 0x0F0F0F0F;                                  // delete useless bits
          c = (v * 0x01010101) >> 24;                       // add the four 8-bit groups
      Advantage: counts bits in constant time
      Disadvantage: not optimal if only a few bits are set
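
As a cross-check, the constant-time method can be wrapped in a function and compared with the loop that clears the lowest set bit. This is a minimal sketch and the function names `popcount_parallel` and `popcount_loop` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Constant-time bit count, following the five steps on the slide. */
unsigned popcount_parallel(uint32_t v)
{
    v = v - ((v >> 1) & UINT32_C(0x55555555));
    v = (v & UINT32_C(0x33333333)) + ((v >> 2) & UINT32_C(0x33333333));
    v = (v + (v >> 4)) & UINT32_C(0x0F0F0F0F);
    return (unsigned)((uint32_t)(v * UINT32_C(0x01010101)) >> 24);
}

/* Reference: one loop iteration per set bit. */
unsigned popcount_loop(uint32_t v)
{
    unsigned c;
    for (c = 0; v; c++)
        v &= v - 1;          /* delete lowest set bit */
    return c;
}
```

Both functions return 3 for the input 28 and 32 for 0xFFFFFFFF.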

  32. Results: second improvement vs. elegant method
      [Figure: CPU cycles] http://bits.stephan-brumme.com/countBits.html

  33. Conclusion
      – Fast inverse square root
        • One can calculate the inverse square root 4 times faster with an accuracy of < 1%
      – Finding the median without sorting
        • One can find the median without sorting
        • The complexity is O(n)
      – Bit counting
        • It’s possible to count set bits in constant time, independent of the input value
