CUstom Built HEterogeneous Multi-Core ArCHitectures (CUBEMACH): PowerPoint Presentation

  1. CUstom Built HEterogeneous Multi-Core ArCHitectures (CUBEMACH): Breaking the Conventions
    Nagarajan Venkateswaran, Director, Waran Research Foundation
    Karthikeyan Palavedu Saravanan, Nachiappan Chidambaram Nachiappan: Research Trainees (2008-2010), Waran Research Foundation
    Aravind Vasudevan, Balaji Subramaniam, Ravindhiran Mukundarajan: Former Research Trainees (2007-2009), Waran Research Foundation
    Chennai, India

  2. Motivation: Heterogeneity Redefined
    • Cost-effective, high-performance, custom-built heterogeneous multi-core node design for a wider class of applications
      – Inter-core and intra-core heterogeneity
    • Breaking the conventions
      – Multiple users, multiple applications without space-time sharing in a cluster: cost sharing across users
      – Single user, multiple applications without space-time sharing (non-multiprogramming): cost sharing across applications

  3. Overview
    • Custom Built Heterogeneous Multi-Core Architectures (CUBEMACH)
    • Design Space
      – Architectural Space
      – Optimization Space
      – Customer-Vendor Interaction
      – Simulation Space
    • CUBEMACH Design and Simulation Tool Framework
    • Conclusion

  4. Custom Built Heterogeneous Multi-Core Architectures (CUBEMACH)
    • CUBEMACH promises:
      – Increased resource utilization
      – Architectures flavored for multiple applications
      – Elimination of space-time sharing at the quantum level during multiple-application execution
      – Reduced manufacturing and operational costs

  6. CUBEMACH Design Paradigm

  7. Architectural Design Space: CUBEMACH
    [Figure: CUBEMACH Architectural Space, comprising ALFU, ALISA, Compiler-On-Silicon (PCOS, SCOS), ONNET, and Memory (SRAM, DRAM)]

  8. Architectural Space
    • Why ALUs? Why not ALFUs (Algorithm Level Functional Units)?
      – Hardwired units
      – Design: homogeneously structured
      – Fewer instructions generated and fetched, by employing a higher-level ISA
      – Reduced interaction between memory and functional units
      – Helps execute multiple applications without space and time sharing
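As a rough illustration of why a higher-level ISA reduces instruction generation and fetches, the sketch below counts instructions for an n x n matrix multiply under an assumed cost model. On one side is a simple load/store scalar ALU that issues one instruction per load, multiply, add, and store, with loop overhead ignored. On the other is a single hypothetical algorithm-level MATMUL instruction sent to an ALFU. The function names and the cost model are illustrative assumptions. They are not the instruction-generation results reported in this presentation.

```python
def scalar_alu_instruction_count(n: int) -> int:
    """Instructions for C = A x B (n x n) on a hypothetical load/store scalar ALU.

    Assumed cost model: each of the n*n outputs needs n loads of A, n loads
    of B, n multiplies and n adds, followed by one store. Loop and branch
    instructions are ignored.
    """
    per_output = 4 * n + 1
    return n * n * per_output


def alfu_instruction_count(n: int) -> int:
    """Hypothetical ALFU case: one algorithm-level MATMUL instruction for any n."""
    return 1


for n in (4, 16, 64):
    print(f"n={n}: scalar ALU ~{scalar_alu_instruction_count(n)} instructions, "
          f"ALFU {alfu_instruction_count(n)} instruction")
```

Under this model the scalar count grows as O(n^3) while the algorithm-level count stays constant. A real ALFU would still need operand setup and data movement, which this sketch does not model.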

  9. Algorithm Level Functional Unit (ALFU)
    [Figure: ALFU and HLFU design parameters, including delay requirements, centralized/decentralized control, ALFU types, algorithm class, scalar unit size, class of memory units, number of input bits, grain size, and the characteristics, number, and architecture type of MIP processor cells]

  10. ALU vs ALFU Instruction Generation Results

  11. Architectural Space (contd.): Sample Algorithm Level Functional Units
    • Scalar Units
      – Scalar Adder / Subtractor
      – Scalar Multiplier
      – Scalar Divider
      – Comparator
      – Multiple Operand Adder
      – Min / Max Finder
    • Vector Units
      – Inner Product
    • Matrix Centric Units
      – Matmul
      – Matadd
      – Chain Matadd
      – Sorter
    • Graph Theoretic Units
      – Graph Traversal Unit: BFS, DFS
      – KL Graph Partitioning
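The KL Graph Partitioning unit listed above computes a balanced two-way partition that minimizes the total weight of cut edges. The sketch below is a plain software version of the classic Kernighan-Lin bisection heuristic and is meant only to show what such a unit computes. The adjacency-dict graph format and the function names are assumptions, and the sketch says nothing about how the hardware unit is built.

```python
def cut_cost(graph, part_a, part_b):
    """Total weight of edges crossing from part_a to part_b."""
    return sum(w for u in part_a for v, w in graph[u].items() if v in part_b)


def kernighan_lin(graph, part_a, part_b, max_passes=10):
    """Improve an equal-size bisection (A, B) of an undirected weighted graph.

    graph: dict node -> dict neighbour -> edge weight (must be symmetric).
    """
    a, b = set(part_a), set(part_b)

    def w(u, v):
        return graph[u].get(v, 0)

    for _ in range(max_passes):
        # D(x) = external cost - internal cost under the current partition.
        d = {}
        for x in a | b:
            own, other = (a, b) if x in a else (b, a)
            d[x] = (sum(wt for v, wt in graph[x].items() if v in other)
                    - sum(wt for v, wt in graph[x].items() if v in own))
        free_a, free_b = set(a), set(b)
        swaps, gains = [], []
        while free_a and free_b:
            # Pick the unlocked pair whose swap reduces the cut the most.
            x, y = max(((p, q) for p in free_a for q in free_b),
                       key=lambda pq: d[pq[0]] + d[pq[1]] - 2 * w(*pq))
            gains.append(d[x] + d[y] - 2 * w(x, y))
            swaps.append((x, y))
            free_a.remove(x)
            free_b.remove(y)
            # Update D for the remaining unlocked nodes as if x and y had swapped.
            for u in free_a:
                d[u] += 2 * w(u, x) - 2 * w(u, y)
            for v in free_b:
                d[v] += 2 * w(v, y) - 2 * w(v, x)
        # Apply the prefix of swaps with the largest positive cumulative gain.
        best_k, best_gain, running = 0, 0, 0
        for k, g in enumerate(gains, start=1):
            running += g
            if running > best_gain:
                best_k, best_gain = k, running
        if best_k == 0:
            break  # no improving prefix: local optimum reached
        for x, y in swaps[:best_k]:
            a.remove(x)
            b.remove(y)
            a.add(y)
            b.add(x)
    return a, b


# Example: two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
graph = {node: {} for node in range(6)}
for u, v in edges:
    graph[u][v] = graph[v][u] = 1

part_a, part_b = kernighan_lin(graph, {0, 1, 3}, {2, 4, 5})
print(sorted(part_a), sorted(part_b), cut_cost(graph, part_a, part_b))
```

Starting from a poor split with five cut edges, one pass swaps nodes 2 and 3 and recovers the two triangles, leaving only the single bridging edge cut.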

  12. Architectural Space (contd.): ALISA (Algorithm Level Instruction Set Architecture)
    • Algorithm-level instructions
    • Each instruction triggers ALFUs
    • ALISA: multiple VLIWs
    • ALISA for heterogeneous multi-cores
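To illustrate how an algorithm-level instruction triggers an ALFU, here is a minimal dispatch sketch. The opcodes (MATMUL, MATADD, SORT, MINMAX), the AlisaInstruction record, and the issue() function are hypothetical. The slides do not give the real ALISA encoding (fields, VLIW packing), and the Python functions only stand in for hardwired units.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


# Software stand-ins for hardwired ALFUs (hypothetical).
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def sorter(values):
    return sorted(values)


def min_max(values):
    return (min(values), max(values))


# Hypothetical opcode -> ALFU table: one instruction triggers one whole unit.
ALFU_TABLE: Dict[str, Callable] = {
    "MATMUL": matmul,
    "MATADD": matadd,
    "SORT": sorter,
    "MINMAX": min_max,
}


@dataclass(frozen=True)
class AlisaInstruction:
    opcode: str
    operands: Tuple


def issue(instr: AlisaInstruction):
    """Route an algorithm-level instruction to the ALFU that implements it."""
    try:
        unit = ALFU_TABLE[instr.opcode]
    except KeyError:
        raise ValueError(f"no ALFU for opcode {instr.opcode!r}") from None
    return unit(*instr.operands)


program = [
    AlisaInstruction("MATMUL", ([[1, 2], [3, 4]], [[5, 6], [7, 8]])),
    AlisaInstruction("SORT", ([3, 1, 2],)),
    AlisaInstruction("MINMAX", ([4, -1, 9],)),
]
results = [issue(instr) for instr in program]
print(results)
```

The point of the design is that the instruction names a whole algorithm rather than a scalar step, so one fetch keeps an entire functional unit busy.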

  13. Architectural Space (contd.): Hierarchical Compilation Scheme
    • Level 1: PCOS partitions a problem (application) into sub-problems (sub-applications)
    • Level 2: SCOS partitions the sub-problems into ALFU-level instructions
    • Flow: Application → PCOS → Sub-Application → SCOS → Instruction
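The two-level scheme can be sketched for one concrete application, a blocked matrix multiply C = A x B. In this assumed example, the PCOS step splits the output into tiles (the sub-problems). The SCOS step then lowers each tile into ALFU-level instructions: one MATMUL per partial product, followed by a CHAIN_MATADD (named after the Chain Matadd unit) that sums them. The tiling strategy, function names, and instruction tuples are hypothetical illustrations, not the actual PCOS/SCOS algorithms.

```python
from typing import Dict, List, Tuple

SubProblem = Tuple[int, int]   # (tile row, tile column) of the output matrix C
Instruction = Tuple[str, ...]  # hypothetical ALFU-level instruction tuple


def pcos_partition(n: int, block: int) -> List[SubProblem]:
    """Level 1 (assumed PCOS behaviour): split an n x n C = A x B into output tiles."""
    if n % block:
        raise ValueError("n must be a multiple of block")
    tiles = n // block
    return [(i, j) for i in range(tiles) for j in range(tiles)]


def scos_lower(sub: SubProblem, tiles: int) -> List[Instruction]:
    """Level 2 (assumed SCOS behaviour): lower one output tile to ALFU instructions."""
    i, j = sub
    code: List[Instruction] = []
    partials = []
    for k in range(tiles):
        dst = f"P[{i},{j},{k}]"
        code.append(("MATMUL", dst, f"A[{i},{k}]", f"B[{k},{j}]"))
        partials.append(dst)
    # Sum all partial products for this tile with one chained matrix add.
    code.append(("CHAIN_MATADD", f"C[{i},{j}]", *partials))
    return code


def compile_matmul(n: int, block: int) -> Dict[SubProblem, List[Instruction]]:
    """Run both levels: sub-problems from PCOS, instructions from SCOS."""
    subs = pcos_partition(n, block)
    tiles = n // block
    return {sub: scos_lower(sub, tiles) for sub in subs}


program = compile_matmul(4, 2)
for sub, code in program.items():
    print(sub, code)
```

Each sub-problem's instruction list is independent of the others, which is what lets the second level run in parallel across cores.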

  14. ALISA & Compiler-On-Silicon
    [Figure: design parameters of ALISA and the Compiler-On-Silicon (PCOS, SCOS), including number of ALISA fields, field length, ALISA length, types of instructions, number of instructions per ALISA, generation rate, decoding/encoding logic, BISA, scheduler output and processing rates, number of parallel units, and number of I/O ports]

  15. Architectural Space (contd.): ON-Node-Network (ONNET) Architecture
    [Figure: 2D-torus ONNET showing global routers, local routers, sub-local routers, the ALFU population, and a core]
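The ONNET figure includes a 2D torus. As a generic illustration of that topology, the sketch below computes minimal hop counts and a dimension-order route on a rows x cols torus with wraparound links. The slides do not state which router level (global, local, or sub-local) uses the torus, or which routing algorithm ONNET applies. Dimension-order routing here is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (row, column) position of a router on the torus


def torus_hops(src: Node, dst: Node, rows: int, cols: int) -> int:
    """Minimal hop count on a 2D torus: take the shorter way round each ring."""
    dr = abs(src[0] - dst[0])
    dc = abs(src[1] - dst[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def xy_route(src: Node, dst: Node, rows: int, cols: int) -> List[Node]:
    """Dimension-order route: move along the column ring first, then the row ring."""

    def step(cur: int, target: int, size: int) -> int:
        forward = (target - cur) % size
        return 1 if forward <= size - forward else -1

    r, c = src
    path = [src]
    while c != dst[1]:
        c = (c + step(c, dst[1], cols)) % cols
        path.append((r, c))
    while r != dst[0]:
        r = (r + step(r, dst[0], rows)) % rows
        path.append((r, c))
    return path


print(torus_hops((0, 0), (3, 3), 4, 4))
print(xy_route((0, 0), (3, 3), 4, 4))
```

On a 4 x 4 torus, the corner-to-corner trip uses the wraparound links and takes 2 hops instead of the 6 a plain mesh would need.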

  16. Architectural Space (contd.): ON-Node-Network Architecture
    [Figure: H-tree topology with global, local, and sub-local routers]

  17. Comparison of Conventional NOCs with ONNET
                          ONNET                       Conventional NOCs
    Type of switch        MIN                         Crossbar
    Number of routers     N·log2(N)                   N²
    Hierarchy             Yes                         No
    Switching latency     log2(N) × switch delay      N × switch delay
    (N = number of inputs; MIN = multistage interconnection network)
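Plugging numbers into the formulas from the comparison above makes the scaling difference concrete. The sketch assumes the same N counts both routers and inputs, and it uses a hypothetical switch delay of 1 ns. Only the formulas come from the slide.

```python
import math


def onnet_cost(n_inputs: int, switch_delay_ns: float):
    """Router count and switching latency for the MIN-based ONNET (slide formulas)."""
    stages = math.log2(n_inputs)
    return n_inputs * stages, stages * switch_delay_ns


def crossbar_cost(n_inputs: int, switch_delay_ns: float):
    """Router count and switching latency for a conventional crossbar NOC (slide formulas)."""
    return n_inputs ** 2, n_inputs * switch_delay_ns


SWITCH_DELAY_NS = 1.0  # hypothetical per-switch delay
for n in (16, 64, 256):
    routers_m, lat_m = onnet_cost(n, SWITCH_DELAY_NS)
    routers_x, lat_x = crossbar_cost(n, SWITCH_DELAY_NS)
    print(f"N={n}: ONNET {routers_m:.0f} routers, {lat_m:.0f} ns; "
          f"crossbar {routers_x} routers, {lat_x:.0f} ns")
```

For N = 64, this gives 384 routers and 6 ns for ONNET versus 4096 routers and 64 ns for the crossbar.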

  18. On-Node-Network Architecture
    [Figure: ONNET design parameters, including router location, grouping, and count; decoders, decoding logic, and decoding rate; data rate; I/O ports; number and size of buffers; stack size and organization; HLFU address length; input traffic; MIN type and switching; packetization, packet size, and input/output data size; word length; destination router ID and destination HLFU ID; path; and latency]

  20. Optimization Space
    [Figure: CUBEMACH Optimization Space flow linking multiple applications, input initial architecture parameters, candidate core formation, game theory, simulated annealing, selected parameters, the CUBEMACH power and performance models, the desired and calculated power-to-performance ratios, the CUBEMACH Architectural and Simulation Spaces, and the final CUBEMACH]

  21. Optimization Space
    • Generates an optimized CUBEMACH for input specifications such as:
      – Power
      – Performance
      – Cost
      – Initial architecture
    • Power and performance models
    • Uses game theory (GT) and simulated annealing (SA) to optimize power and performance
    • Uses Kernighan-Lin (KL) partitioning for core grouping
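The simulated-annealing half of the optimizer can be sketched on a toy problem. The task is to choose how many units of each ALFU type to instantiate so that the power-to-performance ratio is minimized under an area budget. Every number here is a hypothetical stand-in for the CUBEMACH power and performance models: per-unit power and throughput, workload, and budget. The bottleneck performance model is also assumed, and the game-theoretic step is not shown.

```python
import math
import random

# Hypothetical per-unit (power in W, throughput in ops/s) for each ALFU type.
UNIT_SPECS = {"MATMUL": (4.0, 10.0), "MATADD": (1.0, 3.0), "SORTER": (1.5, 2.0)}
# Hypothetical work (ops) each ALFU type must perform per application run.
WORKLOAD = {"MATMUL": 100.0, "MATADD": 30.0, "SORTER": 10.0}
MAX_UNITS = 20  # hypothetical area budget: total number of units


def power(config):
    return sum(n * UNIT_SPECS[t][0] for t, n in config.items())


def exec_time(config):
    # Assumed bottleneck model: the slowest unit type sets the run time.
    return max(WORKLOAD[t] / (n * UNIT_SPECS[t][1]) for t, n in config.items())


def power_perf_ratio(config):
    # performance = 1 / exec_time, so power / performance = power * exec_time
    return power(config) * exec_time(config)


def neighbour(config, rng):
    """Add or remove one unit of a random type, keeping >= 1 unit and the budget."""
    new = dict(config)
    t = rng.choice(sorted(new))
    new[t] = max(1, new[t] + rng.choice((-1, 1)))
    return new if sum(new.values()) <= MAX_UNITS else config


def anneal(initial, steps=5000, t0=10.0, alpha=0.999, seed=0):
    rng = random.Random(seed)
    current = best = dict(initial)
    cur_cost = best_cost = power_perf_ratio(current)
    temp = t0
    for _ in range(steps):
        cand = neighbour(current, rng)
        cost = power_perf_ratio(cand)
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / temp), which shrinks as the temperature cools.
        if cost < cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
        temp *= alpha  # geometric cooling schedule
    return best, best_cost


best, best_cost = anneal({"MATMUL": 1, "MATADD": 1, "SORTER": 1})
print(best, round(best_cost, 3))
```

In a fuller flow, the cost function would call the simulator instead of closed-form models, and the loop could stop early once the calculated ratio meets the desired one.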

  22. Sample CUBEMACH Architecture

  23. CUBEMACH Design Implementation: Supercomputer On Chip (SCOC) IP Cores

  24. SCOC IP Cores
    • ALFUs designed as SCOC IP cores
    • Soft IP cores
    • Coarse-grained, reusable soft IP cores
    • Scalable IP cores

  25. Optimization Space (contd.): Customer-Vendor Interaction
    [Figure: CUBEMACH customers supply applications (App 1 to App 4) and node requirements (power and performance) to manufacturers/system vendors; workload generation feeds the CUBEMACH design space, which yields an initial candidate architecture; the CUBEMACH simulator and optimizer iterate through intermediate architectures to a final optimized heterogeneous multi-core architecture, built from SCOC IP cores through a layout intermediate format for simultaneous fabrication of multiple IP cores]

  27. CUBEMACH Simulator
    • pthread-based simulator
    • Evaluates candidate CUBEMACH architectures
    • Feeds results to the CUBEMACH Optimizer
    • The CUBEMACH Optimization Engine (COE) produces the optimized architecture
    • Simulation and optimization form an iterative process
    • Consists of:
      – ALFU Sub-Simulator
      – COS Sub-Simulator
      – ONNET Sub-Simulator
      – Memory Sub-Simulator
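The simulator is described as pthread-based, with four sub-simulators. The sketch below shows that structure using Python's threading module in place of POSIX threads. Each sub-simulator runs in its own thread, and a barrier ensures that no component starts cycle c+1 until all have finished cycle c. The lockstep-per-cycle synchronization and the run_simulation() name are assumptions, and the per-cycle component models are left as placeholders. Python threads show the structure but not parallel speedup, because of CPython's global interpreter lock.

```python
import threading

SUB_SIMULATORS = ("ALFU", "COS", "ONNET", "Memory")


def run_simulation(cycles: int):
    """Run every sub-simulator in its own thread, in lockstep, for `cycles` cycles."""
    barrier = threading.Barrier(len(SUB_SIMULATORS))
    log_lock = threading.Lock()
    events = []  # (cycle, sub-simulator name) records, in completion order

    def sub_simulator(name: str):
        for cycle in range(cycles):
            # Placeholder for one cycle of this component's model.
            with log_lock:
                events.append((cycle, name))
            # Wait until every sub-simulator has finished this cycle.
            barrier.wait()

    threads = [threading.Thread(target=sub_simulator, args=(name,))
               for name in SUB_SIMULATORS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_simulation(3)
print(events)
```

Because every thread appends its record before reaching the barrier, all cycle-c records appear before any cycle-(c+1) record. The order within a cycle depends on thread scheduling.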

  28. CUBEMACH Simulator

  29. What We Have Seen: the Integrated CUBEMACH Design Paradigm
