RAMP-White / FAST-MP
Hari Angepat and Derek Chiou
Electrical and Computer Engineering, University of Texas at Austin
Supported in part by DOE, NSF, SRC, Bluespec, Intel, Xilinx, IBM, and Freescale

RAMP-White Overview
• Use existing FPGA processor implementations to build scalable, flexible, coherent shared memory platforms that run standard operating systems
• A standard ISA/OS enables more complex applications, such as software emulators (QEMU), when desired

RAMP White Architecture
• Classic shared memory machine design
[Figure: block diagram; labels: Processor, Intersection Unit, Router, IO/MEM]

RAMP White Architecture
[Figure: block diagram; labels: Processor, Proc shim, Intersection Unit, Ring, NIU, Router, Periph shim, Peripheral bus, IO, DRAM, Device]

RAMP White Architecture
• Model CMP/SMP targets
  • Coherent shared memory platform
  • Single image OS
• RAMP scalability (1K cores) via spatial and temporal replication

RAMP White Architecture
• Ability to use commodity cores:
  • SparcV8: Leon3 soft-core
  • PowerPC: PPC405 hard-core
  • Configurable coherence protocol, engines

RAMP White Architecture
• Configurable modules:
  • NIC, network, coherence engine, intersection unit
• Modules connected by Connectors:
  • Point-to-point FIFOs that can model target time if required
• Shim adapters
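
The Connector idea above can be illustrated with a minimal sketch. This is not the RAMP-White implementation: the class name, the `latency` and `capacity` parameters, and the method names are all assumptions made for illustration. It shows a bounded point-to-point FIFO in which a message only becomes visible to the receiver a fixed number of target cycles after it was sent.

```python
from collections import deque


class Connector:
    """Hypothetical point-to-point FIFO that can model target time.

    A message sent at target cycle `now` becomes receivable at
    `now + latency`. With latency=0 it behaves as a plain bounded FIFO.
    """

    def __init__(self, latency=0, capacity=4):
        self.latency = latency      # target cycles between send and visibility
        self.capacity = capacity    # maximum number of in-flight messages
        self.queue = deque()        # entries are (ready_cycle, message)

    def can_send(self):
        return len(self.queue) < self.capacity

    def send(self, message, now):
        """Enqueue a message; return False if the FIFO is full."""
        if not self.can_send():
            return False
        self.queue.append((now + self.latency, message))
        return True

    def receive(self, now):
        """Return the oldest message if it is ready at `now`, else None."""
        if self.queue and self.queue[0][0] <= now:
            return self.queue.popleft()[1]
        return None
```

In this sketch a sender that gets `False` from `send` simply retries on a later cycle, which is one simple way for a full Connector to apply backpressure between modules.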

RAMP-White Status
• Working:
  • Multiprocessor Leon
  • Soft-fp kernel and userspace as initramfs
  • Standard pthread Splash benchmarks
• Still debugging:
  • Multichip crossing with scalable interrupt components
  • Integration with parametrizable FAST cache model
• See me during retreat if interested in Alpha release

Prototype (See at Demo)
• Hardware
  • Sparc V8 32-bit soft-core processor (Leon3)
  • 50 MHz core clock, soft-FP, 16 KB Icache, Dcache bypassed
  • GRLIB components {serial, ethernet, ddr, jtag}
• Software
  • Linux SMP 2.6.21 for Leon3
  • Pthread-based Splash2 benchmarks
  • RAM disk rootfs with simple userspace apps
• Platform
  • BEE2 control FPGA with JTAG-based programming
  • Ethernet for kernel loading/debugging

FAST-MP

FAST-MP: High Level Goal
• Multi-resolution coherent shared memory target emulation
  • Predict performance/power for a wide range of micro-architectures at accuracies ranging from cycle-accurate to functional-only
  • Capable of running real ISAs aided by binary translation (x86, Sparc, PowerPC, etc.), operating systems (unmodified Windows, Linux), compilers, and applications (SQLServer, Apache, etc.)
  • Extensible/flexible (new instructions, different micro-architectures)

Performance Modeling on RAMP-White
• RAMP-White host predicts RAMP-White target performance perfectly
  • Predicting performance of arbitrary micro-architectures requires additional support
• FAST (FPGA Accelerated Simulation Techniques) uses a timing model to predict performance of an arbitrary micro-architecture
  • Special-purpose structure designed to predict time
  • Very small (complex model in a fraction of an FPGA)
  • Uses the same functional model for any micro-architecture
• White as a scalable functional model for FAST-MP

FAST (FPGA Accelerated Simulation)
• Speculative functional model (FM) with checkpoint/rollback of the FM when the FM and timing model (TM) paths diverge
  • Example: branch mispredict/resolve

FAST-MP Approach
• Multicore functional model executes as it wishes
  • Functional instruction stream generated (per core) and sent to timing model
  • Rollback when functional model execution differs from timing model
  • Branch mispredictions, address speculation, etc.
• Possible for functional model to access memory in a different order than the target
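
A toy sketch of the checkpoint/rollback mechanism described above, assuming a single functional core with a trivially simple instruction (increment a register, advance the PC). The class, its fields, and the per-instruction checkpointing policy are all hypothetical; the slides do not specify how checkpoints are taken or stored.

```python
import copy


class FunctionalModel:
    """Hypothetical functional model that checkpoints before every instruction."""

    def __init__(self):
        self.state = {"pc": 0, "regs": [0, 0, 0, 0]}
        self.checkpoints = {}   # instruction number -> state before it ran
        self.count = 0          # number of instructions executed so far

    def step(self):
        """Run one toy instruction and return its trace entry (number, pc)."""
        self.checkpoints[self.count] = copy.deepcopy(self.state)
        entry = (self.count, self.state["pc"])
        self.state["regs"][0] += 1   # toy side effect on architectural state
        self.state["pc"] += 1        # toy fall-through control flow
        self.count += 1
        return entry

    def rollback(self, instr_no, forced_pc):
        """Restore the state from before `instr_no` and redirect fetch.

        This is what a timing model would request when its path diverges
        from the functional path, e.g. on a branch misprediction.
        """
        self.state = copy.deepcopy(self.checkpoints[instr_no])
        self.state["pc"] = forced_pc
        # Discard checkpoints belonging to the squashed instructions.
        for n in [n for n in self.checkpoints if n >= instr_no]:
            del self.checkpoints[n]
        self.count = instr_no
```

For example, after three calls to `step()`, a call to `rollback(1, forced_pc=10)` undoes instructions 1 and 2, and the next `step()` emits instruction 1 again, this time fetched from PC 10.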

FAST-MP Memory Reordering
• All memory references tagged with a version number
• FM passes a version number in trace to TM
  • Essentially a precondition on the validity of the given trace
• If TM version != FM version:
  • Freeze timing models (to avoid corrupting TM)
  • Roll back functional models to restore correct memory/architectural state
  • Use TM-directed order to re-execute
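
The version check above can be sketched as follows. The slides do not say how versions are assigned, so this sketch assumes one counter per address that is incremented by every store, and assumes each trace entry carries the version the FM observed just before the access. The function name and the trace tuple layout are hypothetical.

```python
def find_first_mismatch(tm_ordered_trace):
    """Return the index of the first trace entry the TM must reject, or None.

    `tm_ordered_trace` is a list of (core, op, addr, fm_version) tuples in
    the order the timing model commits them, where `op` is "load" or
    "store" and `fm_version` is the version of `addr` that the functional
    model saw before the access (assumed scheme: per-address counters
    starting at 0, incremented by each store).
    """
    tm_version = {}  # addr -> version according to the timing model's order
    for index, (core, op, addr, fm_version) in enumerate(tm_ordered_trace):
        current = tm_version.get(addr, 0)
        if fm_version != current:
            # Precondition violated: the FM accessed memory in a different
            # order than the target. The caller would freeze the timing
            # models and roll the functional models back to this point.
            return index
        if op == "store":
            tm_version[addr] = current + 1
    return None
```

If the FM ran a store on core 0 before a load on core 1, but the TM commits the load first, the load carries version 1 while the TM still holds version 0, so the mismatch is reported at the load.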

White
[Figure: block diagram; labels: Processor, Intersection Unit, Router, IO/MEM]

White + Timing Model
• PowerPC/Sparc ISA with arbitrary timing model
[Figure: block diagram; labels: Processor, Timing Model, Net model, Intersection Unit, Router, IO/MEM]

White + VM + Timing Model
• Sparc ISA with QEMU to emulate any ISA
• Requires trace/rollback:
  • Hardware
  • Software (QEMU); can also be hardware accelerated
[Figure: block diagram; labels: QEMU x86 VCPU, SMP OS, X86 Timing Model, Processor, Timing Model, Net model, Intersection Unit, Router, IO/MEM]

Probability of Reordered Memory Ops
• Functionally-driven speculation in an MP is costly if timing-ordered memory references conflict
  • Preliminary study of atomic operations in x86 applications
  • Use the Pin dynamic instrumentation tool to monitor every atomic operation while running a multi-threaded app
  • Analyze inter-atomic distance for existing shared memory workloads (Splash2, Parsec)
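
One way to compute the inter-atomic distance mentioned above is sketched below. The slides do not give the exact definition, so this assumes that the interprocessor reuse distance of an atomic operation is the number of cycles since the most recent atomic operation on the same address, counted only when that earlier operation came from a different processor. The trace format (cycle, cpu, address) is also an assumption and is not the output format of Pin.

```python
def interprocessor_reuse_distances(trace):
    """Compute interprocessor reuse distances from an atomic-operation trace.

    `trace` is a list of (cycle, cpu, addr) tuples sorted by cycle
    (hypothetical format). Returns the distance in cycles for every atomic
    operation whose previous atomic access to the same address was issued
    by a different cpu.
    """
    last_access = {}   # addr -> (cycle, cpu) of the most recent atomic op
    distances = []
    for cycle, cpu, addr in trace:
        previous = last_access.get(addr)
        if previous is not None and previous[1] != cpu:
            distances.append(cycle - previous[0])
        last_access[addr] = (cycle, cpu)
    return distances
```

Short distances indicate atomic operations from different processors that land close together in time, which is the case where a functional model running ahead of the timing model is most likely to have taken them in the wrong order.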

Interprocessor Atomic Reuse Distance
[Chart: percent of atomic operations (0% to 35%) versus interprocessor reuse distance in cycles (0 to 90000); series: FFT, LU, Ocean, Radix, BlackScholes, BodyTrack, FaceSim, Ferret, FluidAnimate, FreqMine, Swaptions]

Task Size Scaling on Intel CMPs
[Chart: speedup normalized to serial implementation (0.00 to 4.00) versus task size in cycles (0 to 8000); series: Xeon5140 and XeonX3230, each with 1, 2, and 4 threads]

FAST-MP Can Be Less Than Accurate!
• Nearly accurate
  • Functional model backpressured by timing model
  • Don't want to overflow buffers
  • Each functional core roughly at correct instruction relative to other cores
  • Do not roll back to reorder memory operations
  • Still correct, just locks taken in different order
  • Eliminate rollback overheads, probably quite accurate
  • Model RAMP-White on FAST-MP to check accuracy
• Functional + cache
  • Run with just cache simulators
• Etc.

QEMU on White-Leon3
• QEMU 0.9.1 with patches
  • Some issues remaining with Dyngen for V8 ISA with Leon3 cross compiler
• For initial Linux boot:
  • x86 instructions: 1
  • QEMU uOPs: ~3.1
  • Sparc instructions: ~22.5
• High overheads involved in address computation, segmentation checks, software TLB, etc.
• Can modify/replace Leon3 to improve efficiency
  • MicroOP-based processor
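
The expansion ratios above imply a rough emulation cost, worked through in the short sketch below. The ratios come from the slide. The host rate of 50 million Sparc instructions per second is an assumption: it is an optimistic upper bound that takes the prototype's 50 MHz clock and supposes one instruction per cycle, which the slides do not claim.

```python
# Ratios reported for the initial Linux boot (from the slide).
QEMU_UOPS_PER_X86 = 3.1
SPARC_INSTRS_PER_X86 = 22.5

# Derived: host Sparc instructions needed per QEMU micro-op (about 7.26).
SPARC_INSTRS_PER_UOP = SPARC_INSTRS_PER_X86 / QEMU_UOPS_PER_X86


def emulated_x86_mips(host_sparc_mips):
    """x86 instruction rate (MIPS) for a given host Sparc instruction rate."""
    return host_sparc_mips / SPARC_INSTRS_PER_X86


# Assumed upper bound: 50 MHz Leon3 at one instruction per cycle.
ASSUMED_HOST_MIPS = 50.0
print(f"Sparc instructions per uOP: {SPARC_INSTRS_PER_UOP:.2f}")
print(f"Emulated x86 MIPS (upper bound): {emulated_x86_mips(ASSUMED_HOST_MIPS):.2f}")
```

Under that assumption the emulated x86 rate is bounded by roughly 2.2 MIPS, which gives a sense of why a MicroOP-based replacement for Leon3 is attractive: most of the cost sits in the roughly 7 Sparc instructions spent per micro-op.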

Conclusions
• Initial RAMP-White Alpha design functional
• FAST-MP
  • Provide various ISAs
  • Cycle-accuracy to purely functional
  • Developing power models
• FAST-MP will run on top of RAMP-White as well as a standard multicore system

Questions…
