Carnegie Mellon

“Replenishing the Microarchitecture Treasure Chest”

Prof. John Paul Shen
Electrical and Computer Engineering Department
Carnegie Mellon University

UT Austin -- Distinguished Lecture Series on Computer Architecture -- April 26, 1999

CMuART Members

Current Ph.D. Students:
1. Bryan Black
2. Yuan Chou
3. Alex Dean
4. Ryan Rakvic
5. Bob Rychlik

Current M.S. Students:
1. Candice Bechem
2. Jonathan Combs
3. Jeffrey Heid
4. Kyle Oppenheim

CMuART (PhD) Alumni:
1. Ron Bianchini (FORE & CMU)
2. Mauricio Breternitz (Moto)
3. Trung Diep (Intel)
4. F. Joel Ferguson (UCSC)
5. Andrew Huang (Moto)
6. Mikko Lipasti (IBM & UW)
7. Chris Newburn (Intel)
8. Derek Noonburg (S3)
9. Scott Robinson (Intel)
10. Mike Schuette (Moto)
11. Kent Wilken (UC-Davis)
12. Andy Wolfe (S3)
Microprocessor Performance

Unmatched by Any Other Industry! [John Crawford, Intel, 1993]

Doubling every 18 months (1982-1996): total of 800X
- Cars travel at 44,000 MPH; get 16,000 miles/gal.
- Air travel: L.A. to N.Y. in 22 seconds (MACH 800)
- Wheat yield: 80,000 bushels per acre

Doubling every 24 months (1971-1996): total of 9,000X
- Cars travel at 600,000 MPH; get 150,000 miles/gal.
- Air travel: L.A. to N.Y. in 2 seconds (MACH 9,000)
- Wheat yield: 900,000 bushels per acre

Leveraging the Treasure Chest

All originally invented in the 1960’s
- Pipelining
- Cache Memories
- Multiple Instruction Issue
- Out of Order Execution
- Dataflow Machines
- Vector Machines
- Virtual Memory
- Optimizing Compilers
- Operating Systems
Iron Law of Processor Performance

\[
\text{Processor Performance} = \frac{\text{Time}}{\text{Program}}
= \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{(code size)}}
\times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{(CPI)}}
\times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{(cycle time)}}
\]

Architecture --> Implementation --> Realization
Compiler Designer, Processor Designer, Chip Designer

Evolution of Microprocessors

                     1970-1979    1980-1989    1990-1999    by 2009
Transistor Count     10K-100K     100K-1M      1M-30M       1,000M
Clock Frequency      0.2-2 MHz    2-20 MHz     20-600 MHz   10 GHz
Instructions/cycle   << 0.1       0.1-0.8      0.8-2.4      10 (?)
MIPS/MFLOPS          << 1         1-20         20-1,400     100,000
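The Iron Law and the MIPS row of the table above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names and the example workload (instruction count, CPI, clock rate) are assumptions made for this example and do not come from the slides.

```python
def execution_time(instructions, cpi, cycle_time_s):
    """Iron Law: time/program = instructions/program * cycles/instruction * time/cycle."""
    return instructions * cpi * cycle_time_s


def mips(ipc, clock_hz):
    """Millions of instructions per second, given sustained IPC and clock frequency."""
    return ipc * clock_hz / 1e6


# Hypothetical workload: 1e9 dynamic instructions, CPI of 1.25, 500 MHz clock.
seconds = execution_time(1_000_000_000, 1.25, 1 / 500e6)

# Upper ends of the table's 1990-1999 and "by 2009" columns.
mips_1990s = mips(2.4, 600e6)   # 2.4 IPC at 600 MHz
mips_2009 = mips(10, 10e9)      # 10 IPC at 10 GHz

print(f"{seconds:.2f} s, {mips_1990s:.0f} MIPS, {mips_2009:.0f} MIPS")
```

Under these inputs the 1990-1999 column multiplies out to about 1,440 MIPS, close to the table's rounded 1,400, and the 2009 projection multiplies out to the table's 100,000 MIPS.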
Strong Diminishing Returns on IPC

[Figure: four panels (m88ksim, ijpeg, li, perl), each plotting IPC (0.0-9.0) against Issue Width (4, 8, 16, 32, 64).]

Looking for A Paradigm Shift

Revisit previously-assumed limits and try to go beyond these limits.

1970’s - “Flynn’s Bottleneck” ..... Branch Prediction
1990’s - “Dataflow Limit” ..... Value Prediction
Once Upon a Time...Fall 1995

Load Value Locality:

[Figure: two bar charts, Alpha AXP and PowerPC, plotting Value Locality (%) from 0 to 100 for History=1 and History=16. Benchmarks: cc1-271, cjpeg, compress, doduc, eqntott, gawk, gperf, grep, hydro2d, mpeg, perl, quick, swm256, sc, tomcatv, xlisp; the PowerPC chart also includes cc1.]

Concept of “Value Locality”

[Figure: two bar charts. Load Value Locality (%), 0.0-100.0, for History=1 and History=16; Register Value Locality (%), 0.0-100.0, for History=1 and History=4. Benchmarks: cc1-271, cjpeg, compress, eqntott, gawk, gperf, grep, mpeg, perl, quick, sc, xlisp.]
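The value-locality percentages charted above can be illustrated with a small trace-driven measurement. This is a minimal sketch, assuming the common definition in which a dynamic load counts as a hit when its value matches one of the last N distinct values produced by the same static load; the slides do not spell the definition out. The function name, the (pc, value) trace format, and the toy trace are all hypothetical.

```python
from collections import defaultdict, deque


def value_locality(trace, history_depth):
    """Fraction of dynamic loads whose value is among the last
    `history_depth` distinct values seen by the same static load (keyed by PC)."""
    history = defaultdict(lambda: deque(maxlen=history_depth))
    hits = 0
    total = 0
    for pc, value in trace:
        seen = history[pc]
        total += 1
        if value in seen:
            hits += 1
            seen.remove(value)  # re-inserted below as the most recent value
        seen.append(value)      # a full deque silently drops its oldest value
    return hits / total if total else 0.0


# Hypothetical trace of (load PC, loaded value) pairs.
trace = [
    (0x100, 7), (0x100, 7), (0x100, 9), (0x100, 7),
    (0x200, 1), (0x200, 2), (0x200, 1), (0x200, 2),
]

locality_depth_1 = value_locality(trace, history_depth=1)
locality_depth_16 = value_locality(trace, history_depth=16)
print(locality_depth_1, locality_depth_16)
```

A deeper history can only raise the measured locality, which matches the History=16 bars sitting at or above the History=1 bars in the charts.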
Dynamic “Pipeline Contraction”

[Figure: pipeline diagrams for Branch Prediction and Value Prediction, built from the stages Fetch, Dispatch, Rename, Op Read, Execute, Commit.]

“Superspeculation” Techniques

[Figure: four pipeline columns, one per technique: Dependence Prediction, Reg. Value Prediction, Load Value Prediction, Mem. Alias Prediction. Each column starts with FETCH and DECODE/DISPATCH; the later stages shown are ADDR., EXEC., TLB, MEM., COMPL.]
SPECint95 Performance (16 wide)

[Figure: Sustained IPC (0.0-12.0) per benchmark: m88ksim, gcc, compress, li, ijpeg, perl, vortex, go, HM. Legend: Baseline, +GAg TC, +2-Phase, +VP, +LVP, +AP, Perfect. Cache configurations: 64K/2 ports, 64K/4 ports, 64K/8 ports, 64K/Infinite.]

SPECfp95 Performance (16 wide)

[Figure: Sustained IPC (0.0-16.0) per benchmark: applu, apsi, fpppp, swim, mgrid, tomcatv, HM. Legend: Baseline, +GAg TC, +2-Phase, +VP, +LVP, +AP, Perfect. Cache configurations: 64K/2 ports, 64K/4 ports, 64K/8 ports, 64K/Infinite.]
A Possible New Paradigm

[Figure: four panels (m88ksim, ijpeg, li, perl), each plotting IPC (0.0-9.0) against Issue Width (4, 8, 16, 32, 64).]

Hybrid Value Predictors

[Figure: "Prediction Rates for VP1 and VP2", stacked bars from 0% to 100% for VP1 and VP2. Legend: Not Predicted, Incorrect, Both Correct, Stride + Unique, FCM Unique. Benchmarks: compress, gcc, go, ijpeg, li, m88ksim, perl, vortex, applu, fpppp, mgrid, swim, tomcatv, turb3d, plus SPECint, SPECfp, and SPEC95 averages.]
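The figure legend above suggests that the hybrid combines a stride component with an FCM (finite context method) component. The sketch below shows one way such a hybrid could be put together. It is not the design evaluated in the talk: the class names, the unbounded dictionary tables, the order-2 context, and the per-PC 2-bit confidence chooser are all assumptions made for illustration.

```python
class StridePredictor:
    """Predicts last value + last observed stride for each static instruction."""

    def __init__(self):
        self.table = {}  # pc -> (last_value, stride)

    def predict(self, pc):
        if pc not in self.table:
            return None
        last, stride = self.table[pc]
        return last + stride

    def update(self, pc, value):
        if pc in self.table:
            last, _ = self.table[pc]
            self.table[pc] = (value, value - last)
        else:
            self.table[pc] = (value, 0)


class FCMPredictor:
    """Predicts the value that last followed the current context of recent values."""

    def __init__(self, order=2):
        self.order = order
        self.history = {}     # pc -> tuple of the last `order` values
        self.next_value = {}  # (pc, context) -> value that followed it

    def predict(self, pc):
        ctx = self.history.get(pc, ())
        if len(ctx) < self.order:
            return None
        return self.next_value.get((pc, ctx))

    def update(self, pc, value):
        ctx = self.history.get(pc, ())
        if len(ctx) == self.order:
            self.next_value[(pc, ctx)] = value
        self.history[pc] = (ctx + (value,))[-self.order:]


class HybridPredictor:
    """Chooses between components using per-PC saturating confidence counters (assumed policy)."""

    def __init__(self):
        self.components = [StridePredictor(), FCMPredictor()]
        self.confidence = {}  # pc -> [stride score, fcm score], each in 0..3

    def predict(self, pc):
        scores = self.confidence.get(pc, [0, 0])
        guesses = [c.predict(pc) for c in self.components]
        # Try the more confident component first; ties favor the stride component.
        for i in sorted(range(len(guesses)), key=lambda i: -scores[i]):
            if guesses[i] is not None:
                return guesses[i]
        return None

    def update(self, pc, value):
        scores = self.confidence.setdefault(pc, [0, 0])
        for i, component in enumerate(self.components):
            guess = component.predict(pc)
            if guess is not None:
                if guess == value:
                    scores[i] = min(3, scores[i] + 1)
                else:
                    scores[i] = max(0, scores[i] - 1)
            component.update(pc, value)


# Hypothetical value streams for two static instructions.
hybrid = HybridPredictor()
for v in [10, 20, 30]:           # constant stride: the stride component should win
    hybrid.update(0x100, v)
for v in [1, 5, 2, 1, 5, 2]:     # repeating pattern: the FCM component should win
    hybrid.update(0x200, v)

print(hybrid.predict(0x100), hybrid.predict(0x200))
```

The two streams are chosen so that each component is uniquely correct on one of them, mirroring the "unique" categories in the legend. A hardware predictor would use fixed-size, hashed tables rather than Python dictionaries.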
Prediction “Usefulness”

[Figure: two charts. "Usefulness Tracking Impact on Prediction Rate": stacked bars from 0% to 100% (Correct, Incorrect, Not Predicted) for All, SPECfp, and SPECint, Without Tracking and With Tracking. "Usefulness Tracking Impact on IPC": IPC axis from 2.00 to 2.30 for All, SPECfp, and SPECint, Without Tracking and With Tracking; bar labels 2.287, 2.288, 2.149, 2.143, 2.073, 2.064.]

Current Value Prediction Landscape

Increasing Sophistication
- Hybrid Predictors
- Adaptive Predictors

Increasing Efficiency
- Selective Prediction
- Register-based Prediction
- Compiler Assistance

Extensions Into Other Domains
- VLIW-based Value Prediction
- Dynamic Instruction Reuse
- Aggressive Partial Evaluation
Instruction Supply Problem

[Figure: superscalar pipeline block diagram. Instruction Flow: I-cache, T-cache, Branch Prediction, FETCH, Instruction Buffer, DECODE/. Execution units: Integer, Memory, Floating-point, Media. Register Data Flow: Reorder Buffer, COMPLETE. Memory Data Flow: Store Buffer, D-cache.]

Wide-Machine Instruction Fetch

Three Major Challenges:
1. Multiple-Branch Prediction
2. Multiple Fetch Groups
3. Alignment and Collapsing

[Figure: fetch path showing I-cache, Branch Prediction, FETCH, Instruction Buffer, DECODE/DISPATCH.]