
Stories, Not Words: Abstract Datatype Instruction Sets (Martha Kim)



  1. Stories, Not Words: Abstract Datatype Instruction Sets. Martha Kim, Columbia University. Workshop on New Directions in Computer Architecture, June 5, 2011.

  2. The Utilization Wall • Exponential decrease in the percentage of transistors that can be operated at full frequency: Moore's Law governs the manufacturable transistors, while the power budget limits the operable transistors. • In a 45nm TSMC process, 7% of a 300mm² die can operate at full frequency; in 32nm, only 3.5%. (Goulding et al., "Conservation cores: Reducing the energy of mature computations," ASPLOS 2010, pp. 205–218.)

  3. Specialization Is a Promising Approach. R. Hameed et al., "Understanding sources of inefficiency in general-purpose chips," ISCA '10; G. Venkatesh et al., "Conservation cores: reducing the energy of mature computations," ASPLOS '10; J. Kelm, D. Johnson, W. Tuohy, S. Lumetta, and S. Patel, "Cohesion: a hybrid memory model for accelerators," ISCA '10; H. Franke et al., "Introduction to the wire-speed processor and architecture," IBM Journal of Research and Development, vol. 54, no. 1, pp. 3:1–3:11, 2010; V. Govindaraju, C. Ho, and K. Sankaralingam, "Dynamically Specialized Datapaths for energy efficient computing," HPCA '11; M. Lyons, M. Hempstead, G. Wei, and D. Brooks, "The Accelerator Store framework for high-performance, low-power accelerator-based systems," Computer Architecture Letters, vol. 9, no. 2, pp. 53–56, 2010; C. Cascaval, S. Chatterjee, H. Franke, K. Gildea, and P. Pattnaik, "A taxonomy of accelerator architectures and their programming models," IBM Journal of Research and Development, vol. 54, no. 5, p. 5, 2010; R. Hou et al., "Efficient data streaming with on-chip accelerators: Opportunities and challenges," HPCA '11; N. Goulding et al., "GreenDroid: A Mobile Application Processor for Silicon's Dark Future," Hot Chips '10.

  4. An Ideal Accelerator System: high performance, low energy, easy to program, software portability.

  5.–10. Accelerator Design Processes [diagram build contrasting accelerator design flows in terms of the Application, Architecture (Arch.), and Microarchitecture (Microarch.) layers]

  11.–15. Extending Software Abstractions to Hardware [diagram: the hardware/software stack of Application, Libraries, Machine Code, Micro-ops, Execution core, Caches, and Memory, annotated in three steps: raise the HW/SW interface; extend interfaces from libraries to hardware; exploit those interfaces with specialized hardware]

  16.–19. Abstract Datatype Processing [diagram, built up across three layers: SW — class HashTable exposing put(k,v) and v = get(k); Arch — ADT instructions put $h, $k, $v and get $h, $k, $v; UArch — a Hash Table Processor]
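
To make the layering concrete, below is a minimal C++ sketch of the software-level HashTable interface the deck names (the std::unordered_map backing and 64-bit key/value types are illustrative assumptions, not the deck's). In the proposed scheme, calls to put and get would be lowered to the put $h, $k, $v and get $h, $k, $v ADT instructions rather than to an inlined software hash function.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Software-level view of the abstract datatype. In the proposed ISA the
// compiler would lower put()/get() directly to the ADT instructions
//   put $h, $k, $v
//   get $h, $k, $v
// where $h names the table instance. This portable version uses an ordinary
// library container instead. (Illustrative sketch only.)
class HashTable {
public:
    void put(std::uint64_t k, std::uint64_t v) { table_[k] = v; }

    std::optional<std::uint64_t> get(std::uint64_t k) const {
        auto it = table_.find(k);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> table_;  // representation hidden from callers
};
```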

  20. Compilation & Execution [diagram: a Sequence Labeling application using the SparseVec and HashTable types; compiled code is dispatched among general-purpose (GP), sparse-vector (SV), and hash-table (HT) units]

  21. The Software Fallback [diagram: dispatch between general-purpose (GP) and sparse-vector (SV) execution paths, illustrating how operations fall back to software when an accelerator is unavailable]
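
One way to read the fallback is as a runtime dispatch: a typed operation goes to the matching accelerator when the machine has one, and to an equivalent library routine otherwise, so the same binary runs on both. A minimal sketch of that dispatch, assuming a hypothetical feature-test function (the names and mechanism are mine, not the deck's):

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical feature test: does this machine implement the hash-table
// ADT instructions? (Name and mechanism are assumptions for illustration.)
static bool has_hashtable_unit() { return false; }

// Software fallback: a plain library hash table, keyed by table handle.
static std::unordered_map<std::uint64_t,
                          std::unordered_map<std::uint64_t, std::uint64_t>> sw_tables;

// Dispatch: the same program works with or without the accelerator,
// which is what gives the scheme its portability.
void ht_put(std::uint64_t h, std::uint64_t k, std::uint64_t v) {
    if (has_hashtable_unit()) {
        // Accelerated path: a real toolchain would emit `put $h, $k, $v` here.
    } else {
        sw_tables[h][k] = v;  // general-purpose (GP) software fallback
    }
}
```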

  22. An Ideal Accelerator System: high performance; low energy; ease of use, by aligning hardware interfaces with those that software already uses; portability, via a software fallback plan.

  23.–26. Enforcing Data Encapsulation. Sparse-vector ADT instructions: set $v, $i, $x; get $v, $i, $x; dot $v1, $v2, $p. [diagram build: CPU alongside the Sparse Vector Accelerator; the vectors' index/value data resides with the accelerator and is reached only through these typed instructions]
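
The encapsulation argument mirrors ordinary software practice: because callers can only reach a sparse vector through set/get/dot, the implementation is free to keep the representation wherever it likes, including inside the accelerator. A software analogue of that interface, with a map-based representation assumed purely for illustration:

```cpp
#include <cstddef>
#include <map>

// Software analogue of the sparse-vector ADT behind set/get/dot. Callers
// never see the representation, which is exactly the property that lets a
// hardware implementation keep the data inside the accelerator.
class SparseVector {
public:
    void set(std::size_t i, double x) {                 // set $v, $i, $x
        if (x == 0.0) elems_.erase(i);
        else          elems_[i] = x;
    }

    double get(std::size_t i) const {                   // get $v, $i, $x
        auto it = elems_.find(i);
        return it == elems_.end() ? 0.0 : it->second;
    }

    double dot(const SparseVector& other) const {       // dot $v1, $v2, $p
        double p = 0.0;
        for (const auto& [i, x] : elems_) p += x * other.get(i);
        return p;
    }

private:
    std::map<std::size_t, double> elems_;  // hidden; could be index/value arrays in hardware
};
```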

  27. Specialized Caching for Sparse Vectors [chart: hit rate (0–100%) vs. storage capacity (128–2048 B), comparing a standard cache against the specialized VecStore]

  28.–31. Key Reuse in Hash Tables [chart: percentage of hash operations vs. number of keys for LZW Compress and Parser; annotations: one workload has a 386-entry table in which 26% of the table receives 99% of dynamic accesses, another a 94K-entry table in which 0.1% of the table receives 75% of dynamic accesses]

  32.–33. Exploiting Key Reuse [diagram: the Hash Table Accelerator (HTX), with an HTX-C structure in front of the HTX-M entry store, serving put $h, $k, $v and get $h, $k, $v; chart: reduction in HTX-M entry-store accesses for Compress and Parser as cache capacity grows from 1 to 1000]
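
The skew in key reuse is what makes a small structure of hot entries effective in front of the full entry store. The sketch below is a software analogue of that idea, assuming a direct-mapped cache of recently used key/value pairs; the sizing and replacement policy are illustrative and not taken from the deck:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Software analogue of exploiting key reuse: a small direct-mapped cache of
// hot key/value pairs (cf. HTX-C) filters lookups before they reach the
// larger backing table (cf. the HTX-M entry store).
class CachedTable {
public:
    explicit CachedTable(std::size_t cache_entries) : cache_(cache_entries) {}  // assume > 0

    void put(std::uint64_t k, std::uint64_t v) {
        backing_[k] = v;                                    // entry store
        cache_[k % cache_.size()] = std::make_pair(k, v);   // keep the hot copy
    }

    std::optional<std::uint64_t> get(std::uint64_t k) {
        const auto& slot = cache_[k % cache_.size()];
        if (slot && slot->first == k) return slot->second;  // hit: no entry-store access
        auto it = backing_.find(k);
        if (it == backing_.end()) return std::nullopt;
        cache_[k % cache_.size()] = std::make_pair(k, it->second);
        return it->second;
    }

private:
    std::vector<std::optional<std::pair<std::uint64_t, std::uint64_t>>> cache_;
    std::unordered_map<std::uint64_t, std::uint64_t> backing_;
};
```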

  34. Summary: extend software's encapsulated datatypes into hardware accelerators; natural alignment with standard software engineering; accelerator utility for all applications that use a particular type; a software fallback that ensures portability; aggressive optimization of computation and data movement.

  35. Research Challenges: What are the appropriate types to target? What is the lower bound on complexity? Is there a maximum number of types a hardware system can support? How do I implement polymorphism efficiently (e.g., a priority queue with arbitrary types and a user-defined sort function)? How do I optimize enforcement of data encapsulation (copy-on-read is conservative)? Can the execution model support parallel execution? What is type-specific coherence like: simpler, or uglier? What is the appropriate system-level resource allocation between general and specialized hardware, and between different types?
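
The polymorphism question can be made concrete with the deck's own example: in software, a priority queue is parameterized over both the element type and a user-supplied ordering, as in the sketch below (the Task type and comparator are hypothetical). A fixed-function unit would have to either accept arbitrary element layouts and comparison functions or restrict them.

```cpp
#include <queue>
#include <string>
#include <vector>

// The software ADT is polymorphic in both element type and ordering.
struct Task {
    std::string name;
    int deadline;
};

struct EarlierDeadlineFirst {
    bool operator()(const Task& a, const Task& b) const {
        return a.deadline > b.deadline;  // std::priority_queue pops the "largest" element
    }
};

int main() {
    // A hardware priority-queue ADT would need to handle arbitrary element
    // layouts and an arbitrary comparison function, or constrain both.
    std::priority_queue<Task, std::vector<Task>, EarlierDeadlineFirst> q;
    q.push({"flush", 10});
    q.push({"commit", 3});
    // q.top().name == "commit": the user-defined ordering decided the winner.
    return 0;
}
```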

  36. Thank You
