CUstom Built hEterogeneous Multi-core ArCHitecture design paradigm based simulator: Towards integrated design automation of supercomputing clusters
WAran Research FoundaTion
Introducing the User Guide
Part I: Custom Built Heterogeneous Multi-core Architecture design paradigm
CUBEMACH Design Paradigm
• Simultaneous execution of multiple applications without space-time sharing
  – for increased resource utilization
• Inter-core heterogeneity and intra-core heterogeneity
CUBEMACH Design Space
[Figure: the architecture space, made up of Memory Architecture, On Core Network, pseudo Compiler On Silicon, Algorithm Level ISA and Algorithm Level Functional Units]
Algorithm Level Functional Units (ALFU) / Algorithm Level Instruction Set Architecture (ALISA)
• ALFUs – hardwired functional units that execute algorithms (KL graph partitioning, Crout’s algorithm, etc.) of small problem size
• ALISA – algorithm-level instructions, each triggering the execution of an ALFU
• Advantages of ALFU / ALISA:
  – Reduced instruction fetches
  – Reduced control signal generation
  – Reduced cache misses
  – Reduced compilation time
  – Increased performance
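As a rough illustration of one algorithm-level instruction triggering a whole functional unit, the sketch below models an ALFU as a Python function and an ALISA instruction as an opcode plus operands. The opcode name `CROUT`, the `ALFU_TABLE` dictionary and the `execute` helper are hypothetical names chosen for this example; they are not taken from the CUBEMACH simulator.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def crout_lu(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Crout's LU decomposition: A = L * U, with a unit diagonal on U."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        # Column j of L.
        for i in range(j, n):
            lower[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(j))
        # Row j of U (no pivoting, so lower[j][j] must be non-zero).
        for i in range(j + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(j))
            upper[j][i] = (a[j][i] - partial) / lower[j][j]
    return lower, upper


# Hypothetical opcode table: one algorithm-level instruction per ALFU.
ALFU_TABLE: Dict[str, Callable] = {"CROUT": crout_lu}


def execute(opcode: str, *operands):
    """A single instruction selects and runs the complete algorithm."""
    return ALFU_TABLE[opcode](*operands)


lower, upper = execute("CROUT", [[4.0, 3.0], [6.0, 3.0]])
```

In this model the whole decomposition is requested by one opcode instead of a long stream of scalar instructions, which is the intuition behind the reduced instruction fetches listed above.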
pseudo Compiler On Silicon: Dynamic code generator-cum-scheduler for simultaneous multiple application execution
• Hardware code generator and scheduler
  – to cope with the high instruction generation and issue rate
• Table-based code generator
  – customizable with respect to the architecture
pseudo Compiler On Silicon: Hierarchical Compiler
• Primary COS (PCOS) – converts the application, in the form of libraries, to sub-libraries
• Secondary COS (SCOS) – converts sub-libraries to instructions
[Figure: Application (libraries) → PCOS → sub-libraries → SCOS units → instructions]
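The two-level, table-based translation can be pictured in software as two lookups in sequence. This is only a sketch: the table contents, the library and instruction names, and the function names `primary_cos`, `secondary_cos` and `generate` are all invented for illustration, and the hardware scheduler is not modelled.

```python
from typing import Dict, List

# Hypothetical lookup tables; real tables would depend on the target architecture.
PCOS_TABLE: Dict[str, List[str]] = {
    "solve_linear_system": ["lu_decompose", "forward_substitute", "back_substitute"],
}
SCOS_TABLE: Dict[str, List[str]] = {
    "lu_decompose": ["CROUT"],
    "forward_substitute": ["FWD_SUB"],
    "back_substitute": ["BACK_SUB"],
}


def primary_cos(libraries: List[str]) -> List[str]:
    """Primary stage: expand each library into its sub-libraries."""
    return [sub for lib in libraries for sub in PCOS_TABLE[lib]]


def secondary_cos(sub_libraries: List[str]) -> List[str]:
    """Secondary stage: expand each sub-library into instructions."""
    return [ins for sub in sub_libraries for ins in SCOS_TABLE[sub]]


def generate(libraries: List[str]) -> List[str]:
    """Run both stages in order: libraries -> sub-libraries -> instructions."""
    return secondary_cos(primary_cos(libraries))


instructions = generate(["solve_linear_system"])
```

Because both stages are plain table lookups, retargeting the generator to a different architecture in this sketch means replacing the tables, not the code.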
On Core Network: High Bandwidth NoC architecture
• Designed for the varying and high bandwidth requirements of Algorithm Level Functional Units
• Uses a hierarchical and scalable Multistage Interconnect Network (MIN) for reduced hardware complexity and power consumption
• Self-routing techniques employed to reduce power consumption
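The slides do not say which MIN topology or self-routing scheme is used. As one textbook example of self-routing, the sketch below applies destination-tag routing to an omega network: each 2x2 switch picks its output from a single bit of the destination address, so no central routing table is needed. The choice of an omega network and the function name `omega_route` are assumptions made for this illustration.

```python
from typing import List


def omega_route(source: int, dest: int, stages: int) -> List[int]:
    """Return the line a packet occupies after each stage of an omega network.

    The network has 2**stages ports. Before each stage the lines are
    perfect-shuffled (address bits rotated left by one). The switch then
    selects its upper or lower output from one destination bit, taken
    most significant bit first.
    """
    size = 1 << stages
    if not (0 <= source < size and 0 <= dest < size):
        raise ValueError("source and dest must be valid port numbers")
    position = source
    path = []
    for stage in range(stages):
        # Perfect shuffle: rotate the address left by one bit.
        position = ((position << 1) | (position >> (stages - 1))) & (size - 1)
        # Self-routing: the destination bit sets the switch output.
        dest_bit = (dest >> (stages - 1 - stage)) & 1
        position = (position & ~1) | dest_bit
        path.append(position)
    return path


path = omega_route(source=5, dest=2, stages=3)
```

After the last stage the packet sits on the destination port for every source and destination pair, since each stage shifts one source bit out and one destination bit in.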
Cache organization, mapping and replacement for simultaneous multiple application execution
• Increased data mapping required
  – for simultaneous execution of multiple applications without space-time sharing
• Scheduler – Memory integrated mapping scheme adopted
• Advanced replacement strategy adopted for the CUBEMACH design paradigm
[Figure: Sample CUBEMACH Architecture]
Optimization of heterogeneous multi-core architecture parameters
• Exhaustive Design Space Exploration not possible
  – due to the very large design space
• Optimizer employs
  – Simulated Annealing to find architectures matching the input specification
  – Game Theory to choose the parameters to be perturbed
  – KL graph partitioning to group highly communicating Functional Units
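A minimal simulated annealing loop over a small discrete parameter space is sketched below. The parameter names (`cores`, `cache_kb`), the target values and the cost function are made up for the example. The parameter to perturb is picked uniformly at random here, which is only a placeholder: the optimizer described above uses Game Theory for that choice, and that part is not modelled.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, int]


def anneal(
    initial: Params,
    choices: Dict[str, List[int]],
    cost: Callable[[Params], float],
    steps: int = 2000,
    start_temp: float = 10.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Return the lowest-cost parameter set seen during the search."""
    rng = random.Random(seed)
    current, current_cost = dict(initial), cost(initial)
    best, best_cost = dict(current), current_cost
    temp = start_temp
    for _ in range(steps):
        # Placeholder selection: uniform random choice of the parameter to perturb.
        name = rng.choice(sorted(choices))
        candidate = dict(current)
        candidate[name] = rng.choice(choices[name])
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Accept improvements always, worse moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


def spec_mismatch(p: Params) -> float:
    """Hypothetical cost: distance from a target of 8 cores and 256 KB of cache."""
    return abs(p["cores"] - 8) + abs(p["cache_kb"] - 256) / 64


choices = {"cores": [2, 4, 8, 16], "cache_kb": [64, 128, 256, 512]}
start = {"cores": 2, "cache_kb": 64}
best, best_cost = anneal(start, choices, spec_mismatch)
```

The toy space has only 16 points and could be enumerated directly; annealing pays off when the number of parameter combinations is too large for that, which is the situation the first bullet describes.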
BENSIM (BENchmark SIMulator): Application Cloning and benchmarking CUBEMACH based architecture
• Application modeled as a graph
  – Algorithms form the nodes of the graph
  – Edges form the communication across algorithms
• By choosing suitable algorithms for the nodes, any application can be cloned based on its communication and computation pattern
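One way to hold such an application clone in code is a small graph class with algorithms as nodes and communication as directed, weighted edges. The class name `BensimGraph`, its methods, the algorithm names and the use of an integer "volume" per edge are assumptions for this sketch; the slides do not specify how BENSIM stores the graph.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BensimGraph:
    """Application clone: algorithms as nodes, communication as weighted edges."""

    algorithms: Dict[int, str] = field(default_factory=dict)
    traffic: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def add_algorithm(self, node_id: int, name: str) -> None:
        self.algorithms[node_id] = name

    def add_communication(self, src: int, dst: int, volume: int) -> None:
        if src not in self.algorithms or dst not in self.algorithms:
            raise KeyError("both endpoints must be added as algorithms first")
        self.traffic[(src, dst)] = volume

    def outgoing_volume(self, node_id: int) -> int:
        """Total communication volume leaving one algorithm node."""
        return sum(v for (src, _), v in self.traffic.items() if src == node_id)


# A three-node clone; the same algorithm may appear on more than one node.
clone = BensimGraph()
clone.add_algorithm(0, "crout_lu")
clone.add_algorithm(1, "kl_partition")
clone.add_algorithm(2, "crout_lu")
clone.add_communication(0, 1, volume=5)
clone.add_communication(0, 2, volume=3)
clone.add_communication(1, 2, volume=2)
```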
Input to CUBEMACH simulator
[Flowchart: an ALISA based workload and user heuristics produce the application clone. To be added in future: an application written in the pseudo CUBEMACH language passes through the pseudo language compiler to give the application in terms of ALISA. The CUBEMACH simulator input is followed by SAGT based optimization and a "specification met?" decision: No loops back, Yes yields the optimized architecture.]
SAGT – Simulated Annealing + Game Theory
Input to CUBEMACH simulator
• ALISA based workloads (algorithms) are provided to the user
• The user models the application as BENSIM graphs using the given workloads
• In future, provision will be made for users to code their applications in the pseudo CUBEMACH language
Comparison of CUBEMACH simulator with other simulators
Part II: Custom Built Heterogeneous Multi-core Architecture design paradigm based simulator
CUBEMACH Simulator Architecture
[Block diagram: parameter values of the heterogeneous architecture feed architectural structure generation. Components shown: COS sub-simulator, ALFU sub-simulator, ONNET sub-simulator, MEMORY sub-simulator, clock generator and event handler. Outputs: simulation log and results dump.]
Parameter Value Selection
• A GUI is provided for the user to enter the architecture parameter values, grouped into various tabs
• To understand what these parameters mean, refer to the user guide
Application Input Selection – User generated workload
• The application clone (BENSIM graph) developed is given as an adjacency matrix
• In the row and the column labels, enter the IDs corresponding to the algorithms
• Enter the adjacency matrix in the text area
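The exact text format the simulator accepts is not given in these slides. Purely as an illustration, the sketch below assumes a whitespace-separated layout: the first line lists the column labels (algorithm IDs), and each following line starts with a row label followed by that row's entries. The function name `parse_adjacency` and this layout are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_adjacency(text: str) -> Tuple[List[int], Dict[Tuple[int, int], int]]:
    """Parse a labelled adjacency matrix into algorithm IDs and directed edges."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    column_ids = [int(tok) for tok in rows[0]]
    row_ids: List[int] = []
    edges: Dict[Tuple[int, int], int] = {}
    for tokens in rows[1:]:
        row_id = int(tokens[0])
        values = [int(tok) for tok in tokens[1:]]
        if len(values) != len(column_ids):
            raise ValueError(
                f"row {row_id} has {len(values)} entries, expected {len(column_ids)}"
            )
        row_ids.append(row_id)
        for col_id, value in zip(column_ids, values):
            if value:  # a non-zero entry is an edge from row_id to col_id
                edges[(row_id, col_id)] = value
    if row_ids != column_ids:
        raise ValueError("row and column labels must list the same IDs in the same order")
    return column_ids, edges


# Assumed layout: header of column IDs, then one labelled row per algorithm.
SAMPLE = """
    3 7 9
3   0 1 0
7   0 0 1
9   1 0 0
"""

ids, edges = parse_adjacency(SAMPLE)
```

Checking that the row and column labels agree catches the most likely entry mistake, a matrix whose rows were typed in a different order from its columns.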
Running the CUBEMACH simulator
• The simulator allows the user to check the validity of the architecture parameter values
• After the checks are completed, close the UI to start the simulation