
Accelerate Cycle-Level Multi-Core RISC-V Simulation with Binary Translation
Xuan Guo, Robert Mullins
Department of Computer Science and Technology
Both the paper and the slides are made available under CC BY 4.0

Motivation
• We want to evaluate processor designs with meaningful workloads
  • Not just microbenchmarks
• Existing simulators are too slow for the task
• Last year we looked at TLB simulation:
  • Fast TLB Simulation for RISC-V Systems @ CARRV 2019
  • We built that work on top of QEMU
  • For TLB design, we don't really need cycle accuracy
  • That assumption does not hold for cache simulation!

Design Goals
• Full-system capable
  • In the presence of an operating system
• Cycle-level simulation
• Ability to model multicore interaction
  • Including cache coherency and shared caches
• Fast!

R2VM
• Rust RISC-V Virtual Machine

Design

Prior Art
• Igor Böhm, Björn Franke, and Nigel Topham. 2010. Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator.

From Single-Core to Multi-Core
• We have an accurate single-core cycle-level simulator
• We instantiate multiple copies of it in parallel
• Assume each single-core simulator is already thread-safe
• What could go wrong?
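To make the naive scheme concrete, here is a minimal Rust sketch. `CoreSim`, `step` and `run_unsynchronized` are hypothetical names standing in for a real single-core simulator; nothing here is taken from R2VM. Each simulated core gets its own host thread and runs with no synchronization at all.

```rust
use std::thread;

/// Hypothetical stand-in for one single-core cycle-level simulator.
struct CoreSim {
    cycle: u64,
}

impl CoreSim {
    /// Placeholder for simulating one cycle of this core.
    fn step(&mut self) {
        self.cycle += 1;
    }
}

/// Naive scheme: one host thread per simulated core, no synchronization.
/// Each core's final cycle count is correct, but how far core A has got
/// when core B reaches a given cycle is decided by the host OS scheduler.
fn run_unsynchronized(num_cores: usize, cycles: u64) -> Vec<u64> {
    let handles: Vec<_> = (0..num_cores)
        .map(|_| {
            thread::spawn(move || {
                let mut core = CoreSim { cycle: 0 };
                for _ in 0..cycles {
                    core.step();
                }
                core.cycle
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Every core reaches the right final cycle count, but nothing ties core 0's cycle N to core 1's cycle N in host time. That relationship is left to the host, which is the distortion described on the Multi-Core Interaction slide.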

Multi-Core Interaction
• Prone to distortion from the host:
  • OS scheduler
  • Length of JITed code
  • Multithreading
• Cannot model interaction within the guest:
  • Single-writer-multiple-reader cache coherency
  • Micro-contention
  • etc.

Lockstep Execution
• Need to keep simulated cores in sync
• So we need to have them run in lockstep
• Hard with binary translation

A Failed Attempt
[Diagram: Thread 0, Thread 1, …, Thread N each simulate one core (Core 0, Core 1, …, Core N). All cores execute Inst 1, then every thread waits at a thread barrier; all cores then execute Inst 2 and wait at the barrier again; then Inst 3, and so on.]
• Thread barrier using std::sync::Barrier: 100k/s
• Spinning thread barrier: 1M/s
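The barrier scheme in the diagram can be sketched with `std::sync::Barrier`, assuming one host thread per simulated core. The `progress` array and the violation count are hypothetical additions used only to check the lockstep property; instruction execution itself is a placeholder.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Runs `num_cores` simulated cores in lockstep, one host thread per core.
/// Returns how many times a core saw another core lagging behind after a
/// barrier (0 means lockstep held for every instruction).
fn run_lockstep(num_cores: usize, steps: usize) -> usize {
    let barrier = Arc::new(Barrier::new(num_cores));
    // progress[i] = number of instructions core i has executed so far.
    let progress: Arc<Vec<AtomicUsize>> =
        Arc::new((0..num_cores).map(|_| AtomicUsize::new(0)).collect());

    let handles: Vec<_> = (0..num_cores)
        .map(|core_id| {
            let barrier = Arc::clone(&barrier);
            let progress = Arc::clone(&progress);
            thread::spawn(move || {
                let mut violations = 0;
                for inst in 1..=steps {
                    // Placeholder for executing one guest instruction.
                    progress[core_id].store(inst, Ordering::SeqCst);
                    // Block until every core has executed instruction `inst`.
                    barrier.wait();
                    // After the barrier, no core may still be behind `inst`.
                    violations += progress
                        .iter()
                        .filter(|p| p.load(Ordering::SeqCst) < inst)
                        .count();
                }
                violations
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Every simulated instruction costs a full barrier round. `std::sync::Barrier` blocks waiting threads (it is built on a mutex and a condition variable), so each step may involve putting threads to sleep and waking them again, which is consistent with the low 100k/s figure on the slide.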

Lockstep Execution
• Need to keep simulated cores in sync
• So we need to have them run in lockstep
• Hard with binary translation
• Thread barriers are slow and do not scale
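Assuming the "Spinning" measurement on the failed-attempt slide refers to a busy-waiting barrier, a common design is the centralized sense-reversing barrier sketched below. It avoids sleeping, but every thread still hammers one shared counter and one shared flag on each step, so cross-core cache-line traffic grows with the number of cores. The spin-then-yield fallback is an added assumption so the sketch also behaves on oversubscribed hosts; `SpinBarrier` and `run_lockstep_spin` are hypothetical names.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Centralized sense-reversing barrier: waiting threads busy-wait on a
/// shared flag instead of sleeping on a condition variable.
struct SpinBarrier {
    total: usize,
    arrived: AtomicUsize,
    sense: AtomicBool,
}

impl SpinBarrier {
    fn new(total: usize) -> Self {
        SpinBarrier { total, arrived: AtomicUsize::new(0), sense: AtomicBool::new(false) }
    }

    /// `local_sense` is per-thread state and must start as `false`.
    fn wait(&self, local_sense: &mut bool) {
        *local_sense = !*local_sense;
        if self.arrived.fetch_add(1, Ordering::AcqRel) + 1 == self.total {
            // Last thread to arrive: reset the count, then release the others.
            self.arrived.store(0, Ordering::Relaxed);
            self.sense.store(*local_sense, Ordering::Release);
        } else {
            let mut spins = 0u32;
            while self.sense.load(Ordering::Acquire) != *local_sense {
                if spins < 1 << 10 {
                    hint::spin_loop();
                    spins += 1;
                } else {
                    // Guard for oversubscribed hosts: eventually give up the CPU.
                    thread::yield_now();
                }
            }
        }
    }
}

/// Drives `num_cores` threads in lockstep through the spin barrier and
/// returns how many times a core saw another core lagging after a barrier.
fn run_lockstep_spin(num_cores: usize, steps: usize) -> usize {
    let barrier = Arc::new(SpinBarrier::new(num_cores));
    let progress: Arc<Vec<AtomicUsize>> =
        Arc::new((0..num_cores).map(|_| AtomicUsize::new(0)).collect());

    let handles: Vec<_> = (0..num_cores)
        .map(|core_id| {
            let barrier = Arc::clone(&barrier);
            let progress = Arc::clone(&progress);
            thread::spawn(move || {
                let mut local_sense = false;
                let mut violations = 0;
                for inst in 1..=steps {
                    // Placeholder for executing one guest instruction.
                    progress[core_id].store(inst, Ordering::SeqCst);
                    barrier.wait(&mut local_sense);
                    violations += progress
                        .iter()
                        .filter(|p| p.load(Ordering::SeqCst) < inst)
                        .count();
                }
                violations
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```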

Fiber/Coroutine
• Yield control from within a function
• We use stackful fibers
  • Boost.Coroutine is stackful
  • Goroutines are stackful
  • Most modern languages use stackless coroutines
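The point of coroutines here is that, assuming the simulated cores are multiplexed on one host thread, a scheduler can resume each core in turn so that lockstep falls out of the scheduling order, with no barrier at all. Rust has no stable stackful coroutines, so this sketch substitutes a stackless stand-in: each hypothetical `Core` is an explicit state machine whose `resume` runs one simulated cycle and returns. Stackful fibers avoid that restriction, because code can yield from any call depth without being rewritten as a state machine.

```rust
/// One simulated core written as an explicit state machine: each call to
/// `resume` runs exactly one simulated cycle and then "yields" by returning.
struct Core {
    id: usize,
    cycle: u64,
}

impl Core {
    /// Placeholder for one cycle of work; records (cycle, core id).
    fn resume(&mut self, trace: &mut Vec<(u64, usize)>) {
        trace.push((self.cycle, self.id));
        self.cycle += 1;
    }
}

/// Round-robin scheduler on a single host thread. Every core is resumed once
/// per round, so no core can run more than one cycle ahead of another.
fn run_round_robin(num_cores: usize, cycles: u64) -> Vec<(u64, usize)> {
    let mut cores: Vec<Core> = (0..num_cores).map(|id| Core { id, cycle: 0 }).collect();
    let mut trace = Vec::new();
    for _ in 0..cycles {
        for core in cores.iter_mut() {
            core.resume(&mut trace);
        }
    }
    trace
}
```

For example, `run_round_robin(2, 2)` produces the trace `[(0, 0), (0, 1), (1, 0), (1, 1)]`: both cores finish cycle 0 before either starts cycle 1.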

Fiber
• How it is implemented (traditional approach):
  • Get the current fiber from TLS
  • Save the registers of the current fiber
  • Switch to the next fiber and update TLS
  • Switch to the new fiber's stack
  • Restore the registers of the new fiber
  • Resume execution
• 50M yields/second
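The traditional steps can be modelled in safe Rust to show the bookkeeping a yield involves. This is a hypothetical model, not a working context switch: `Regs`, `FiberRing` and `CURRENT` are illustrative names, registers are plain struct fields, and no stack is actually switched (that requires assembly, such as `fiber_yield_raw` on the next slide).

```rust
use std::cell::Cell;

/// Stand-in for the registers a real fiber switch saves and restores.
/// A real implementation does this in assembly; plain fields are used here
/// so the bookkeeping is visible.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Regs {
    sp: usize,
    pc: usize,
}

thread_local! {
    /// Index of the fiber currently running on this host thread.
    static CURRENT: Cell<usize> = Cell::new(0);
}

/// Saved contexts of all fibers, scheduled as a ring.
struct FiberRing {
    saved: Vec<Regs>,
}

impl FiberRing {
    /// Models the traditional yield: takes the live registers of the running
    /// fiber and returns the registers the next fiber should resume with.
    fn yield_now(&mut self, live: Regs) -> Regs {
        // Get the current fiber from TLS.
        let cur = CURRENT.with(|c| c.get());
        // Save the registers of the current fiber.
        self.saved[cur] = live;
        // Switch to the next fiber and update TLS.
        let next = (cur + 1) % self.saved.len();
        CURRENT.with(|c| c.set(next));
        // A real switch would now load the new stack pointer and registers
        // and resume at `pc`; here they are simply handed back.
        self.saved[next]
    }
}
```

Each yield pays for a thread-local lookup plus saving and restoring a register set; those are the costs the optimized routine on the next slide avoids.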

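Fiber
• fiber_yield_raw:
      ; rbp holds a pointer to the current fiber's context
      mov [rbp - 32], rsp   ; Save current stack pointer
      mov rbp, [rbp - 16]   ; Move to next fiber
      mov rsp, [rbp - 32]   ; Restore stack pointer
      ret                   ; Return into the next fiber where it last yielded
• 80-90M yields/second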
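Converting the quoted throughputs into average time per yield, with $r$ the yield rate, makes the saving concrete:

```latex
t_{\text{yield}} = \frac{1}{r}, \qquad
\frac{1}{50 \times 10^{6}\,\mathrm{s^{-1}}} = 20\,\mathrm{ns}, \qquad
\frac{1}{80 \times 10^{6}\,\mathrm{s^{-1}}} = 12.5\,\mathrm{ns}, \qquad
\frac{1}{90 \times 10^{6}\,\mathrm{s^{-1}}} \approx 11.1\,\mathrm{ns}
```

So the optimized routine saves roughly 7.5 to 9 ns per yield compared with the traditional approach.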

Memory Simulation

Memory Access Flow

Performance

Open Source
• https://github.com/nbdd0121/r2vm
• MIT/Apache-2.0 dual licensed
• Not GPL!
