Motivation & Utility Motivation: Do we really need a Multimedia - PowerPoint PPT Presentation

SMIPS Multimedia Extension
Group 2
Myron King
Asif Khan

Motivation & Utility

Motivation: Do we really need a Multimedia Extension at all?
- Intel’s success with MMX and SSE
- The entire GPU industry (ATI, Nvidia, Intel)
- The nascent PPU industry (Ageia, Sony)
- MIPS MDMX from SGI
- Sony’s in-house GPU (PSP, PS3)
- Only barrier to ubiquity is how to compile to them!

Utility: What does a Multimedia Extension look like and what does it do?
- Expose vector primitives (vector registers replace scalar ones)
- Expose DWORD primitives within each vector
- Add opcodes which are useful for target applications
- Make claims about memory interaction
- Convince others it’s actually useful!

Getting Started

Nothing new under the sun: why reinvent the wheel?
- Interesting work; lots of infrastructure already in place
- Until you implement something, you don’t fully “grok” it
- Still an active area in research, both industrial and academic
- Cross-pollination which took place in exploration could lead to interesting projects in the future

Coming up with the specifics:
- DirectX Shader Language (vertex shaders especially)
- MMX and SSE for instruction set extension
- Discussions with Chris Batten (exploration)
- Arvind’s insistence on specifying the micro-protocol details early on led us to an implementation which would ensure sequential consistency (SC) but with minimal interlocking (for greater efficiency)

Changing smipsv2

Adding the Coprocessor:
- At first everything was in one module, but onerous compile times as well as good design practice forced us to modularize our design
- Definition of interfaces for transfer of data (and state) from control processor to coprocessor
- Once we gained adequate Bluespec skills, this came quite naturally (getting over the learning curve, easier said than done)
- Asif is a tenacious Bluespec hacker and does the heavy lifting!

Implementing the Instructions:
- Determining which instructions run on which processor (some on both) was the first step
- Some Cop2 instructions must be run on the control processor as well (SC follows naturally if done correctly)
- Restrictions on Cop2 instructions allow for easier implementation (no control-flow instructions and no non-aligned loads and stores)
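The first step described above, deciding which processor each instruction runs on, can be sketched in a few lines. This is only an illustration, not the group's Bluespec dispatch rule: it uses the standard MIPS32 primary opcodes for COP2, LWC2 and SWC2, and it assumes (hypothetically) that coprocessor loads, stores and register moves run on both processors while every other COP2 operation runs on the coprocessor alone. The names `Target` and `dispatch_target` are made up for this sketch.

```python
from enum import Enum

# Standard MIPS32 primary opcodes (bits 31..26 of the instruction word).
OP_COP2 = 0x12  # coprocessor-2 instructions (moves such as CTC2, and operations)
OP_LWC2 = 0x32  # load word to coprocessor 2
OP_SWC2 = 0x3A  # store word from coprocessor 2


class Target(Enum):
    CONTROL = "control"
    COPROCESSOR = "coprocessor"
    BOTH = "both"


def dispatch_target(instr: int) -> Target:
    """Classify a 32-bit instruction word by where it should execute.

    Assumptions (not taken from the slides):
    - LWC2/SWC2 involve the control processor (address, memory request)
      and the coprocessor (vector register), so they go to both.
    - COP2 instructions with bit 25 clear are register moves such as
      CTC2 and also go to both.
    - COP2 instructions with bit 25 set are pure coprocessor operations.
    """
    opcode = (instr >> 26) & 0x3F
    if opcode in (OP_LWC2, OP_SWC2):
        return Target.BOTH
    if opcode == OP_COP2:
        is_operation = (instr >> 25) & 0x1
        return Target.COPROCESSOR if is_operation else Target.BOTH
    return Target.CONTROL
```

For example, `dispatch_target(0xC8000000)` (an LWC2 word) yields `Target.BOTH`, while an ordinary scalar instruction such as `0x00221820` stays on the control processor.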

Changing smipsv2

What did we do to SMIPSv2:
- Add a coprocessor module with some new opcodes
- Add a new rule “dispatch” between “pcGen” and “exec”
- Change the memory caches: enlarge cache lines to support 128-bit loads and stores
- Add more control logic for the interaction with the control processor
- Add some cop2 instructions to the control processor execution (those which need both)

What’s in the Coprocessor:
- Only execution and write-back stages
- Cache interface needed to be changed to route responses
- Lots of gotchas!

Getting everything up and running:
- Add pre-asm.pl to tool path
- Write tests and benchmarks (hand-writing assembly code is no fun!)

Microarchitecture

[Figure: microarchitecture diagram]

[Figure: IPC's for Various Benchmarks on smipsv2. Benchmarks: median, multiply, qsort, towers, vvadd. Series: without branch predictor, with branch predictor.]

[Figure: Runtime Comparison between smipsv2 and Baseline Implementation. Benchmarks: vvadd, multiply. Series: smipsv2, baseline implementation.]

Branch prediction works: but you already knew that!

Exploration 1: 16-DWORD Vectors

- 16-dword vectors but still 4 lanes in the coprocessor
- Register File enlarged to 24 4-dword registers from 8 4-dword registers
- Semantics of the control processor instructions and the data transfer instructions remain unaltered
- Exec rule changed in the control processor to execute LWC2 and SWC2 in 4 cycles
- Exec rule changed in the coprocessor to execute all instructions other than the data transfer instructions in 4 cycles
- Writeback rules in both the control processor and the coprocessor remain unaltered

Discarding Mispredicted Branches

- Single epoch register scheme from smipsv2 falls apart
- Coprocessor takes multiple cycles to execute each instruction, allowing the control processor to run ahead
- Another epoch register added which is incremented every time a branch instruction is dispatched
- All the coprocessor instructions are dual-tagged
- Extra checks in the exec rule of the coprocessor to make sure that all instructions which were dispatched before the branch instruction get executed

[Figure: Runtime Comparison between Baseline and Exploration 1. Benchmarks: vvaddv, multiplyv, geometry. Series: Baseline Implementation, 16-dword Vectors Implementation.]

Exploration 2: Variable Length Vectors

- A control register is added which allows the programmer to set the length of the vector registers using the CTC2 instruction
- Length has to be a multiple of 4 and maximum length restricted to 32-dwords
- Register File further enlarged to 32 4-dword registers
- Mask bits increased to 32
- Changes to the exec rules in the control processor and the coprocessor similar to Exploration 1
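The length rule for variable-length vectors (a multiple of 4, at most 32 dwords) and its link to execution time can be illustrated with a small sketch. The 4-lane width and the 32-dword limit come from the slides; the function names, and the assumption that a vector of any legal length takes length / 4 cycles (as the 16-dword case takes 4 cycles), are illustrative guesses rather than the actual hardware behaviour.

```python
LANES = 4            # from the slides: 4 lanes in the coprocessor
MAX_VECTOR_LEN = 32  # from the slides: maximum length is 32 dwords


def validate_vector_length(length: int) -> int:
    """Return the length if it is legal for the vector-length control register.

    A legal length is a positive multiple of 4 that does not exceed 32 dwords.
    """
    if length <= 0 or length > MAX_VECTOR_LEN or length % LANES != 0:
        raise ValueError(f"illegal vector length: {length}")
    return length


def exec_cycles(length: int) -> int:
    """Cycles to push one vector through the lanes.

    Assumption: one group of 4 dwords is processed per cycle, so a
    16-dword vector takes 4 cycles, matching Exploration 1.
    """
    return validate_vector_length(length) // LANES
```

Under these assumptions a 16-dword vector needs `exec_cycles(16) == 4` cycles and the longest legal vector needs 8, while a request such as `validate_vector_length(10)` is rejected.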

[Figure: Runtimes for Custom Benchmarks (ns). Series: vvaddv, multiplyv, geometry. Categories: Baseline Implementation; Variable Length Vectors Implementation at 8, 12, 16, 20, 24, 28 and 32 dwords.]

[Figure: Number of Instructions for Custom Benchmarks. Series: vvaddv, multiplyv, geometry. Categories: Baseline Implementation; Variable Length Vectors Implementation at 8, 12, 16, 20, 24, 28 and 32 dwords.]

Exploration 3: ALU changes for clock speed improvement

- The dot4 instruction creates the longest combinational path
- dot4 instruction broken down into mulv and addh instructions
- Register File size is the same as that for the baseline implementation
- Minor changes in design to accommodate the added addh instruction

Timing and Area Comparison

Total Area and Effective Clock Period of Different Implementations from the Synthesis Tool

Implementation                            Area (units)    Effective Clock Period (ns)
smipsv2 Implementation                       28,837.25    4.26
Baseline Implementation                      87,413.50    5.50
16-dword Vectors Implementation             147,652.25    5.50
Variable Length Vectors Implementation      172,251.00    5.84
Alternate ALU Implementation                104,651.75    5.00

Total Area and Effective Clock Period of Different Implementations from Encounter

Implementation                            Area (sq micron)    Effective Clock Period (ns)
smipsv2 Implementation                        464,849.30      7.174
Baseline Implementation                     1,415,025.90      9.453
16-dword Vectors Implementation             2,466,711.70      14.520
Variable Length Vectors Implementation      2,799,400.60      14.889
Alternate ALU Implementation                1,708,809.50      10.782
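The Exploration 3 change, replacing dot4 with a mulv followed by an addh, can be sketched as follows. The slides do not define these instructions, so the semantics here are assumptions inferred from the names: mulv as a lane-wise multiply of two 4-dword vectors, addh as a horizontal add across the lanes, and dot4 as a 4-element dot product.

```python
from typing import List

LANES = 4  # from the slides: 4 lanes in the coprocessor


def mulv(a: List[int], b: List[int]) -> List[int]:
    """Assumed semantics: lane-wise multiply of two 4-dword vectors."""
    assert len(a) == len(b) == LANES
    return [x * y for x, y in zip(a, b)]


def addh(v: List[int]) -> int:
    """Assumed semantics: horizontal add, summing all lanes of one vector."""
    assert len(v) == LANES
    return sum(v)


def dot4(a: List[int], b: List[int]) -> int:
    """Assumed semantics: multiply and reduce in a single instruction,
    which puts the multipliers and the adder tree on one combinational path."""
    assert len(a) == len(b) == LANES
    return sum(x * y for x, y in zip(a, b))


# The two-instruction sequence computes the same value as the fused form.
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert addh(mulv(a, b)) == dot4(a, b) == 70
```

Splitting the work this way trades one extra instruction for a shorter critical path, which is consistent with the lower clock period the synthesis table reports for the Alternate ALU implementation.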

Conclusion

- The baseline implementation is a win!
- Explorations have not proven very fruitful
- Memory bottleneck with lengthened vectors
- Not changing the register file size increases register pressure on benchmarks
- Needed more time for floor planning to get better timing and area from Encounter
- A few more benchmarks perhaps
- We’re happy with what we’ve accomplished

Thank you



