Cache-Oblivious Algorithms
Paper Reading Group
Authors: Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran
Presented by: Maksym Planeta, 03.09.2015
Table of Contents
◮ Introduction
◮ Cache-oblivious algorithms
  ◮ Matrix multiplication
  ◮ Matrix transposition
  ◮ Fast Fourier Transform
  ◮ Sorting
◮ Relieved system model
◮ Experimental evaluation
◮ Conclusion
Matrix multiplication
ORD-MULT(A, B, C)
  for i ← 1 to m
    for j ← 1 to p
      for k ← 1 to n
        C_ij ← C_ij + A_ik × B_kj
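The triple loop above can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the name `ord_mult` and the list-of-lists representation are assumptions, and the indices are 0-based rather than 1-based.

```python
def ord_mult(A, B, C):
    """Accumulate the product of A (m x n) and B (n x p) into C (m x p).

    Matrices are lists of rows (an assumption made for this sketch).
    C is updated in place and also returned.
    """
    m, n, p = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(p):
            for k in range(n):
                # Same update as the pseudocode: C_ij <- C_ij + A_ik * B_kj
                C[i][j] += A[i][k] * B[k][j]
    return C
```

Note that the innermost loop walks along a row of A but down a column of B, so the memory layout of B decides how cache friendly this loop is.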
Matrix layout
Like in C . . .
(a) Figure: Row major order
Or like in Fortran . . .
(b) Figure: Column major order
[Figure: an 8 × 8 matrix whose cells are numbered 0 to 63 by memory position. In (a) the first row holds 0, 1, ..., 7; in (b) the first row holds 0, 8, 16, ..., 56.]
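The two layouts differ only in how an element (i, j) maps to a memory offset. A small sketch of that mapping follows; the function names are hypothetical and indices are 0-based.

```python
def row_major_offset(i, j, rows, cols):
    """Offset of element (i, j) when rows are stored one after another (C style)."""
    return i * cols + j


def col_major_offset(i, j, rows, cols):
    """Offset of element (i, j) when columns are stored one after another (Fortran style)."""
    return j * rows + i
```

For the 8 × 8 matrix in the figure, moving one step down a column changes the offset by 8 in row major order but only by 1 in column major order.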
Cache friendly algorithm
BLOCK-MULT(A, B, C, n)
  for i ← 1 to n/s
    for j ← 1 to n/s
      for k ← 1 to n/s
        ORD-MULT(A_ik, B_kj, C_ij, s)
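A possible Python rendering of the blocked algorithm is sketched below. It assumes square n × n matrices stored as lists of rows and a tile size s that divides n exactly; the name `block_mult` and the explicit `s` parameter are choices made for this sketch. The three inner loops play the role of ORD-MULT on one s × s tile.

```python
def block_mult(A, B, C, n, s):
    """Accumulate the n x n product A * B into C, one s x s tile at a time.

    Assumes s divides n; the case n mod s != 0 is not handled here.
    """
    if n % s != 0:
        raise ValueError("this sketch requires s to divide n")
    for i0 in range(0, n, s):
        for j0 in range(0, n, s):
            for k0 in range(0, n, s):
                # ORD-MULT on the tiles A[i0.., k0..], B[k0.., j0..], C[i0.., j0..]
                for i in range(i0, i0 + s):
                    for j in range(j0, j0 + s):
                        for k in range(k0, k0 + s):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

The tile size s is the tuning parameter: it is meant to be picked so that the three tiles being worked on fit in the cache together.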
BLOCK-MULT issues
Being cache aware is hard:
◮ Cumbersome structure
◮ Complicated choice of s
◮ A poorly chosen s is expensive
◮ Problematic if n mod s ≠ 0
Motivation
◮ Keeping the algorithm simple is nice.
◮ But cache efficiency is a must.
System model
◮ Two-level memory
◮ Fully associative
◮ Strictly optimal replacement
◮ Automatic replacement
◮ Tall cache: Z = Ω(L²)
where:
Z – number of words in the cache
L – number of words in a cache line
Figure 1: The ideal-cache model
[Figure: a CPU performing W work, a cache of Z/L cache lines of length L organized by an optimal replacement strategy, and main memory; Q cache misses occur between cache and main memory.]
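To make the parameters Z, L and Q concrete, here is a small simulator of a fully associative cache that counts misses for a sequence of word addresses. It is only a sketch: LRU replacement is used as a stand-in for the optimal replacement strategy of the model, and the names `count_misses`, `row_scan` and `col_scan` are hypothetical.

```python
from collections import OrderedDict


def count_misses(addresses, Z, L):
    """Count cache misses Q for a sequence of word addresses.

    The cache holds Z words in Z // L lines of L words each and is fully
    associative. Assumption: LRU replacement instead of optimal replacement.
    """
    num_lines = Z // L
    cache = OrderedDict()  # keys are line numbers, least recently used first
    misses = 0
    for addr in addresses:
        line = addr // L
        if line in cache:
            cache.move_to_end(line)  # mark the line as most recently used
        else:
            misses += 1
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = None
    return misses


# An n x n matrix stored in row major order, scanned in two different orders.
n, Z, L = 64, 256, 8  # tall cache in the sense Z >= L * L
row_scan = [i * n + j for i in range(n) for j in range(n)]
col_scan = [i * n + j for j in range(n) for i in range(n)]
```

Scanning along rows touches each cache line once, so `count_misses(row_scan, Z, L)` equals n²/L. Scanning the same data along columns can be compared with `count_misses(col_scan, Z, L)`.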