The Cray-1
Time line
• 1969 -- CDC introduces the 7600, designed by Seymour Cray
• 1972 -- Design of the 8600 stalls due to complexity; CDC can't afford the redesign Cray wants, so he leaves to start Cray Research
• 1975 -- CRI announces the Cray-1
• 1976 -- First Cray-1 ships
Vital Statistics
• 80 MHz clock
• A very compact machine -- fast!
• 5 tonnes
• 115 kW -- Freon cooled
• Just four kinds of chips
• 5/4 NAND gates, registers, memory, and ???
Vital Statistics
• 12 functional units
• >4 KB of registers
• 8 MB of main memory
• In 16 banks
• With ECC
• Instruction fetch -- 16 insts/cycle
Key Feature: Registers
• Lots of registers
• T -- 64 x 64-bit scalar registers
• B -- 64 x 24-bit address registers
• B and T are essentially a SW-managed L0 cache
• V -- 8 x 64 x 64-bit vector registers
Key Feature: Vector Ops
• This is a scientific machine
• Lots of vector arithmetic
• Support it in hardware
Cray Vectors
• Dense instruction encoding -- 1 inst -> 64 operations
• Amortized instruction decode
• Access to lots of fast storage -- the V registers are 4 KB
• Fast initiation
• Vectors of length 3 break even; length 5 wins
• No parallelism within one vector op!
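The break-even claim can be sketched with a toy timing model. The cycle costs below are assumptions chosen for illustration, not measured Cray-1 latencies:

```python
# Toy cost model (assumed numbers): a scalar op costs 2 cycles per
# element; a vector op pays a fixed start-up, then 1 cycle per element.
def scalar_cycles(n, per_op=2):
    return n * per_op

def vector_cycles(n, startup=3, per_elem=1):
    return startup + n * per_elem

# With these costs, length 3 breaks even and length 5 wins.
for n in (1, 3, 5):
    print(n, scalar_cycles(n), vector_cycles(n))
```

The exact break-even point depends entirely on the assumed start-up cost; the point is only that a fixed initiation overhead is amortized quickly.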
Vector Parallelism: Chaining
Source code:
  for i in 1..64
    a[i] = b[i] + c[i] * d[i]
Naive hardware:
  for i in 1..64
    t[i] = c[i] * d[i]
  for i in 1..64
    a[i] = t[i] + b[i]
Cray hardware ('t' is a wire; the two ops run in lock step):
  for i in 1..64
    t = c[i] * d[i]
    a[i] = t + b[i]
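The two schedules compute the same result; chaining just forwards each multiplier output straight into the adder instead of parking the whole vector in a temporary register first. A small Python sketch of the contrast:

```python
b = [1.0] * 64
c = [2.0] * 64
d = [3.0] * 64

# Naive hardware: run the multiply over the whole vector, park the
# result in a temporary vector register t, then run the add.
t = [c[i] * d[i] for i in range(64)]
a_naive = [t[i] + b[i] for i in range(64)]

# Chained hardware: each element leaves the multiplier and enters the
# adder in the same beat -- 't' is a wire, not a register.
a_chained = []
for i in range(64):
    t_wire = c[i] * d[i]
    a_chained.append(t_wire + b[i])

assert a_naive == a_chained
```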
Vector Tricks
ABS(A):
  V1 = A
  V2 = 0 - V1
  VM = V1 < 0
  V2 = V2 merge V1
Sort pairs in A and B:
  V1 = A
  V2 = B
  V3 = A - B
  VM = V3 < 0
  V3 = V1 merge V2
  VM = V3 > 0
  V1 = V1 merge V2
No branches!
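The ABS trick can be sketched in Python. The `merge` helper below is a hypothetical stand-in for the Cray vector-merge operation, assumed here to pick from the first operand where the mask bit is set and from the second otherwise:

```python
def merge(mask, x, y):
    # Hypothetical stand-in for a vector merge under mask VM:
    # take x[i] where the mask bit is set, else y[i].
    return [xi if m else yi for m, xi, yi in zip(mask, x, y)]

def vabs(A):
    V1 = A
    V2 = [0 - v for v in V1]     # V2 = 0 - V1
    VM = [v < 0 for v in V1]     # VM = V1 < 0
    return merge(VM, V2, V1)     # V2 merge V1: pick -a where a < 0

print(vabs([-3, 1, -2, 0]))  # [3, 1, 2, 0]
```

Every element goes through the same instruction sequence; the mask does the selection, so there is no branch per element.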
Vector Parallelism: OOO Execution
• Just like other instructions, vector ops can execute out of order / in parallel
• The scheduling algorithm is not clear
• I can't find it described anywhere
• Probably similar to the 6600
Tarantula: A Recent Vector Machine
• Vector extensions to the Alpha 21464 (EV8) -- never built
• Basic argument: too much control logic per FU (partly due to wire length)
• Vectors require less control
Tarantula Architecture
• 32 vector registers
• 128 x 64-bit values each
• Tight integration with the OOO core
• Vector unit organized as 16 "lanes"
• Two FUs per lane
• 32 parallel operations
• 2-issue vector scheduler
Amdahl's Rule
• 1 byte of I/O per FLOP
• Where do you get the bandwidth and capacity needed for vector ops?
• The L2!
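A back-of-envelope check of what the rule of thumb demands, using the 80 GFLOPS peak quoted later in the talk:

```python
# Amdahl's rule of thumb: 1 byte of memory bandwidth per FLOP.
# At Tarantula's 80 GFLOPS peak, that implies 80 GB/s of bandwidth,
# which is why the (big, highly banked) L2 is the natural supplier.
peak_flops = 80e9          # 80 GFLOPS peak
bytes_per_flop = 1         # the rule of thumb
bw = peak_flops * bytes_per_flop
print(bw / 1e9, "GB/s")    # 80.0 GB/s
```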
Vector Memory Accesses
• Only worry about unit stride -- easy, and covers about 80% of cases
• However... large non-unit strides account for about 10% of accesses
• Bad for cache lines
• Stride-2 is about 4%
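Why non-unit strides are bad for cache lines: a strided load pays for whole lines but uses only some of the words in each. A sketch, assuming (hypothetically) 64-byte lines and 8-byte words:

```python
# Fraction of each touched cache line actually used by a stride-s
# vector load, assuming 64-byte lines and 8-byte words (assumed sizes).
LINE_BYTES, WORD_BYTES = 64, 8
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES   # 8

def line_utilization(stride):
    # A stride-s pattern uses one word out of every s; once the stride
    # reaches the line size, each touched line supplies a single word.
    return 1.0 / min(stride, WORDS_PER_LINE)

print(line_utilization(1))   # 1.0   -- unit stride uses whole lines
print(line_utilization(2))   # 0.5
print(line_utilization(16))  # 0.125 -- one word per line
```

This is the arithmetic behind option 2 on the next slide: with single-word lines, utilization is 1.0 at every stride.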
Vector Caching Options
• L1 or L2?
• L1 is too small and too tightly engineered
• L2 is big and highly banked already
• Non-unit strides don't play well with cache lines
• Option 1: Just worry about unit stride
• Option 2: Use single-word cache lines (Cray SV1)
Other Problems
• Vector/scalar consistency
• The vector processor accesses the L2 directly -- extra bits in the L2 cache lines
• Scalar stores may write data that is then read by vector loads -- a special instruction flushes the store queue
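The store-queue hazard can be illustrated with a toy model (this is an illustration of the ordering problem, not Tarantula's actual microarchitecture): scalar stores sit in a store queue, while vector loads read the L2 directly and can miss data still queued.

```python
# Toy model: vector loads bypass the store queue, so a scalar store
# is invisible to them until the queue is explicitly flushed.
l2 = {0x100: 1}
store_queue = []

def scalar_store(addr, val):
    store_queue.append((addr, val))   # buffered, not yet in the L2

def flush_store_queue():
    while store_queue:                # the "special instruction"
        addr, val = store_queue.pop(0)
        l2[addr] = val

def vector_load(addr):
    return l2[addr]                   # reads the L2 directly

scalar_store(0x100, 42)
stale = vector_load(0x100)   # 1  -- the store is still queued
flush_store_queue()
fresh = vector_load(0x100)   # 42 -- now visible in the L2
```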
Tarantula Impact
• 14% more area
• 11% more power
• 4x peak GFLOPS (20 vs. 80)
• 3.4x GFLOPS/W