Pydgin for RISC-V: A Fast and Productive Instruction-Set Simulator

Berkin Ilbeyi
In collaboration with Derek Lockhart (Google) and Christopher Batten
3rd RISC-V Workshop, Jan 2016
Computer Systems Laboratory, Cornell University
Motivation

Productivity:
- Develop
- Extend
- Instrument

Performance:
- Interpretive: 1-10 MIPS (1-10 days)
- Typical DBT: 100s MIPS (1-3 hours)
- QEMU DBT: 1000 MIPS (0.5 hours)

[Figure labels: RISC-V Foundation; RISC-V Software; Specialized RISC-V; Instruction-Set Simulator; Specialized RISC-V Hardware]
Productivity vs. Performance

- Architectural Description Language; Instruction Set Interpreter in C with DBT [SimIt-ARM2006] [Wagstaff2013]
- Dynamic Language Interpreter in C with JIT Compiler

Key Insight: Similar productivity-performance challenges for building high-performance interpreters of dynamic languages (e.g. JavaScript, Python).
- Dynamic-Language Interpreter in RPython -> RPython Translation Toolchain -> Dynamic Language Interpreter in C with JIT Compiler

Meta-Tracing JIT: makes JIT generation generic across languages.
- Pydgin: Architectural Description Language -> RPython Translation Toolchain -> Instruction Set Interpreter in C with DBT

JIT ≈ DBT
Pydgin Architecture Description Language

- Architectural State
- Instruction Encoding
- Instruction Semantics
Pydgin Architecture Description Language: Architectural State

    class State( object ):
        def __init__( self, memory, reset_addr=0x400 ):
            self.pc  = reset_addr
            self.rf  = RiscVRegisterFile()
            self.mem = memory

            # optional state if floating point is enabled
            if ENABLE_FP:
                self.fp   = RiscVFPRegisterFile()
                self.fcsr = 0
Pydgin Architecture Description Language: Instruction Encoding

    encodings = [
        # ...
        ['xori', 'xxxxxxxxxxxxxxxxx100xxxxx0010011'],
        ['ori',  'xxxxxxxxxxxxxxxxx110xxxxx0010011'],
        ['andi', 'xxxxxxxxxxxxxxxxx111xxxxx0010011'],
        ['slli', '000000xxxxxxxxxxx001xxxxx0010011'],
        ['srli', '000000xxxxxxxxxxx101xxxxx0010011'],
        ['srai', '010000xxxxxxxxxxx101xxxxx0010011'],
        ['add',  '0000000xxxxxxxxxx000xxxxx0110011'],
        ['sub',  '0100000xxxxxxxxxx000xxxxx0110011'],
        ['sll',  '0000000xxxxxxxxxx001xxxxx0110011'],
        # ...
    ]
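One way an encoding table of this shape could drive decoding is to compile each bit pattern into a (mask, match) pair and test instruction words against it. The following is a minimal sketch under that assumption, not Pydgin's actual decoder; `compile_pattern` and `decode_name` are hypothetical names, and the patterns are built by concatenation only to make their 32-bit length easy to check.

```python
# Hypothetical sketch: compile 'x'/'0'/'1' patterns into (mask, match) pairs.
# The three patterns mirror the xori, add and sub rows of the table above.
encodings = [
    ['xori', 'x' * 17 + '100' + 'x' * 5 + '0010011'],
    ['add',  '0000000' + 'x' * 10 + '000' + 'x' * 5 + '0110011'],
    ['sub',  '0100000' + 'x' * 10 + '000' + 'x' * 5 + '0110011'],
]

def compile_pattern(pattern):
    """Return (mask, match): mask has 1s wherever the pattern fixes a bit."""
    assert len(pattern) == 32
    mask = match = 0
    for ch in pattern:               # most significant bit first
        mask <<= 1
        match <<= 1
        if ch != 'x':
            mask |= 1
            match |= int(ch)
    return mask, match

table = [(name, compile_pattern(bits)) for name, bits in encodings]

def decode_name(inst_bits):
    """Linear search for the first encoding whose fixed bits all agree."""
    for name, (mask, match) in table:
        if (inst_bits & mask) == match:
            return name
    raise ValueError('illegal instruction: 0x%08x' % inst_bits)

print(decode_name(0x002081b3))   # add  x3, x1, x2
print(decode_name(0x402081b3))   # sub  x3, x1, x2
print(decode_name(0x00514093))   # xori x1, x2, 5
```

A linear scan is the simplest choice for a sketch; a real decoder would more likely generate a decision tree or switch on the opcode field first.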
Pydgin Architecture Description Language: Instruction Semantics

    def execute_addi( s, inst ):
        s.rf[inst.rd] = s.rf[inst.rs1] + inst.i_imm
        s.pc += 4

    def execute_sw( s, inst ):
        addr = trim_xlen( s.rf[inst.rs1] + inst.s_imm )
        s.mem.write( addr, 4, trim_32( s.rf[inst.rs2] ) )
        s.pc += 4

    def execute_beq( s, inst ):
        if s.rf[inst.rs1] == s.rf[inst.rs2]:
            s.pc = trim_xlen( s.pc + inst.sb_imm )
        else:
            s.pc += 4
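The semantics above rely on `trim_xlen` and `trim_32`, which the slides do not define. A plausible reading is that they wrap Python's unbounded integers to a fixed width. The sketch below assumes a 64-bit XLEN and adds a hypothetical `SketchRegisterFile` that trims on write and hardwires x0 to zero; none of this is taken from the Pydgin sources.

```python
XLEN = 64  # assumption: a 64-bit target; the slides do not state XLEN

def trim_xlen(value):
    """Wrap a Python integer to an unsigned XLEN-bit value."""
    return value & ((1 << XLEN) - 1)

def trim_32(value):
    """Keep only the low 32 bits, e.g. the data written by a word store."""
    return value & 0xFFFFFFFF

class SketchRegisterFile(object):
    """Hypothetical register file: 32 registers, x0 hardwired to zero."""
    def __init__(self):
        self.regs = [0] * 32

    def __getitem__(self, idx):
        return self.regs[idx]

    def __setitem__(self, idx, value):
        if idx != 0:                      # writes to x0 are discarded
            self.regs[idx] = trim_xlen(value)

rf = SketchRegisterFile()
rf[1] = 0x1000
rf[2] = rf[1] + (-4)   # addi-style update with a negative immediate
rf[3] = 0 + (-1)       # wraps to the all-ones XLEN-bit pattern
rf[0] = 123            # ignored
```

Trimming inside the register file would explain why `execute_addi` can add a signed immediate without an explicit `trim_xlen`, but that is an inference rather than something the slides state.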
Pydgin Framework: Interpreter Loop

    def instruction_set_interpreter( memory ):
        state = State( memory )
        while True:
            pc      = state.fetch_pc()
            inst    = memory[ pc ]      # fetch
            execute = decode( inst )    # decode
            execute( state, inst )      # execute
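The fetch-decode-execute loop can be tried out on a toy program. The sketch below is an illustration only: instructions are pre-decoded tuples rather than RISC-V bit patterns, and the `Inst` tuple, the `halt` pseudo-instruction, the `running` flag and the `DISPATCH` table are all hypothetical additions so that the loop can terminate.

```python
from collections import namedtuple

# Hypothetical pre-decoded instruction format for the toy program.
Inst = namedtuple('Inst', 'name rd rs1 rs2 imm')

class State(object):
    def __init__(self, memory, reset_addr=0x400):
        self.pc = reset_addr
        self.rf = [0] * 32
        self.mem = memory
        self.running = True      # assumption: lets the loop stop on 'halt'

def execute_addi(s, inst):
    s.rf[inst.rd] = s.rf[inst.rs1] + inst.imm
    s.pc += 4

def execute_bne(s, inst):
    if s.rf[inst.rs1] != s.rf[inst.rs2]:
        s.pc += inst.imm
    else:
        s.pc += 4

def execute_halt(s, inst):
    s.running = False

DISPATCH = {'addi': execute_addi, 'bne': execute_bne, 'halt': execute_halt}

def decode(inst):
    return DISPATCH[inst.name]

def instruction_set_interpreter(memory, reset_addr=0x400):
    state = State(memory, reset_addr)
    while state.running:
        inst = memory[state.pc]       # fetch
        execute = decode(inst)        # decode
        execute(state, inst)          # execute
    return state

program = {
    0x400: Inst('addi', 1, 0, 0, 3),    # x1 = 3
    0x404: Inst('addi', 2, 2, 0, 5),    # x2 += 5
    0x408: Inst('addi', 1, 1, 0, -1),   # x1 -= 1
    0x40c: Inst('bne',  0, 1, 0, -8),   # back to 0x404 while x1 != 0
    0x410: Inst('halt', 0, 0, 0, 0),
}
final = instruction_set_interpreter(program)
print(final.rf[2])   # the loop body runs three times
```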
Debug on Python interpreter: 100 KIPS
The RPython Translation Toolchain

RPython Source -> Type Inference -> Optimization -> Code Generation -> Compilation -> Compiled Interpreter
Compiled Interpreter: Pydgin Interpretive Simulator, 10 MIPS
With the toolchain's JIT Generator: Compiled Interpreter with JIT
Compiled Interpreter with JIT: Pydgin DBT Simulator, <10 MIPS
JIT Annotations and Optimizations

Additional RPython JIT hints (23X improvement over no annotations):
+ Elidable Instruction Fetch
+ Elidable Decode
+ Constant Promotion of PC and Memory
+ Word-Based Target Memory
+ Loop Unrolling in Instruction Semantics
+ Virtualizable PC and Statistics
+ Increased Trace Limit

Please see our ISPASS paper for more details!

Performance (SPECINT2006 on ARM):
- Debug on Python interpreter: 100 KIPS
- Pydgin Interpretive Simulator: 10 MIPS
- Pydgin DBT Simulator: 100+ MIPS
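To give a feel for where such hints sit in an interpreter loop, here is a rough sketch. RPython's `rpython.rlib.jit` module provides `JitDriver`, `elidable` and `promote`; the versions below are plain-Python stand-ins so that the example runs without RPython, and the toy accumulator machine, the choice of green and red variables, and the placement of the hints are assumptions rather than Pydgin's actual annotations.

```python
# Stand-ins so the sketch runs on plain Python. In RPython the real versions
# come from: from rpython.rlib.jit import JitDriver, elidable, promote
def elidable(func):
    return func          # real hint: result depends only on the arguments

def promote(value):
    return value         # real hint: treat value as a trace-time constant

class JitDriver(object):
    def __init__(self, greens, reds):
        self.greens, self.reds = greens, reds

    def jit_merge_point(self, **live_vars):
        pass             # real hint: marks the top of the interpreter loop

jitdriver = JitDriver(greens=['pc'], reds=['state'])

class State(object):
    """Toy accumulator machine (hypothetical, not RISC-V)."""
    def __init__(self, program):
        self.pc = 0
        self.acc = 0
        self.program = program
        self.running = True

@elidable
def fetch(program, pc):
    # Elidable fetch assumes the program is never modified while running.
    return program[pc]

def execute_add(state, operand):
    state.acc += operand
    state.pc += 1

def execute_halt(state, operand):
    state.running = False

@elidable
def decode(inst):
    # Elidable decode: the same instruction always maps to the same handler.
    return {'add': execute_add, 'halt': execute_halt}[inst[0]]

def run(program):
    state = State(program)
    while state.running:
        jitdriver.jit_merge_point(pc=state.pc, state=state)
        pc = promote(state.pc)                    # constant promotion of PC
        inst = fetch(promote(state.program), pc)  # ... and of memory
        execute = decode(inst)
        execute(state, inst[1])
    return state

result = run([('add', 2), ('add', 40), ('halt', 0)])
print(result.acc)
```

With the real hints, a tracing JIT can fold the fetch and decode of a promoted PC into constants, so that a compiled trace contains mostly the instruction semantics.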
Pydgin Performance

Spike is an interpretive simulator with some advanced DBT features:
- Caching decoded instructions
- PC-indexed dispatch

The RISC-V QEMU port was out-of-date at the time of our development.
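The idea of caching decoded instructions by PC can be sketched in a few lines. This is a Python illustration of the general technique, not Spike's implementation; `DecodeCache`, its fields and the stand-in `slow_decode` function are hypothetical.

```python
class DecodeCache(object):
    """Hypothetical PC-indexed cache of decoded instruction handlers."""
    def __init__(self, decode_fn):
        self.decode_fn = decode_fn
        self.cache = {}          # pc -> (inst_bits, handler)
        self.misses = 0

    def lookup(self, pc, inst_bits):
        entry = self.cache.get(pc)
        # Re-check the instruction bits so a changed instruction at the
        # same PC is decoded again instead of reusing a stale handler.
        if entry is not None and entry[0] == inst_bits:
            return entry[1]
        self.misses += 1
        handler = self.decode_fn(inst_bits)
        self.cache[pc] = (inst_bits, handler)
        return handler

def slow_decode(inst_bits):
    # Stand-in for a full decoder: classify by the 7-bit opcode field.
    return 'opcode_%02x' % (inst_bits & 0x7f)

dcache = DecodeCache(slow_decode)
first = dcache.lookup(0x400, 0x002081b3)    # miss: decodes and fills
second = dcache.lookup(0x400, 0x002081b3)   # hit: no decode
third = dcache.lookup(0x400, 0x00514093)    # bits changed: decodes again
```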
Pydgin Productivity

RISC-V encourages ISA extensions.
- Productive Development
- Productive Extensibility
- Productive Instrumentation
Pydgin RISC-V Development

100+ MIPS simulator after 9 days of development!
Pydgin Extensibility

    encodings = [
        # ...
        ['andi', 'xxxxxxxxxxxxxxxxx111xxxxx0010011'],
        ['slli', '000000xxxxxxxxxxx001xxxxx0010011'],
        ['srli', '000000xxxxxxxxxxx101xxxxx0010011'],
        ['srai', '010000xxxxxxxxxxx101xxxxx0010011'],
        ['add',  '0000000xxxxxxxxxx000xxxxx0110011'],
        # ...
        ['gcd',  'xxxxxxxxxxxxxxxxx000xxxxx1011011'],
        # ...
    ]

    # greatest common divisor semantics
    def execute_gcd( s, inst ):
        a, b = s.rf[inst.rs1], s.rf[inst.rs2]
        while b:
            a, b = b, a % b
        s.rf[inst.rd] = a
        s.pc += 4
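The `gcd` semantics can be exercised directly on the Python interpreter without any simulator around them. The sketch below assumes a bare-bones `State` with a list-backed register file and an `Inst` tuple holding only register indices; both are hypothetical stand-ins for Pydgin's own classes.

```python
from collections import namedtuple

# Hypothetical stand-ins for Pydgin's instruction and state objects.
Inst = namedtuple('Inst', 'rd rs1 rs2')

class State(object):
    def __init__(self, reset_addr=0x400):
        self.pc = reset_addr
        self.rf = [0] * 32

# Same semantics as on the slide: Euclid's algorithm on two registers.
def execute_gcd(s, inst):
    a, b = s.rf[inst.rs1], s.rf[inst.rs2]
    while b:
        a, b = b, a % b
    s.rf[inst.rd] = a
    s.pc += 4

s = State()
s.rf[1], s.rf[2] = 48, 18
execute_gcd(s, Inst(rd=3, rs1=1, rs2=2))
print(s.rf[3], hex(s.pc))
```

When the second operand is zero the loop body never runs, so the result is simply the first operand.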
Pydgin Instrumentation

    # count number of adds
    def execute_addi( s, inst ):
        s.rf[inst.rd] = s.rf[inst.rs1] + inst.i_imm
        s.num_adds += 1
        s.pc += 4

    # count misaligned stores
    def execute_sw( s, inst ):
        addr = trim_xlen( s.rf[inst.rs1] + inst.s_imm )
        if addr % 4 != 0:
            s.num_misaligned += 1
        s.mem.write( addr, 4, trim_32( s.rf[inst.rs2] ) )
        s.pc += 4

    # record and count all executed loops
    def execute_beq( s, inst ):
        if s.rf[inst.rs1] == s.rf[inst.rs2]:
            old_pc = s.pc
            s.pc = trim_xlen( s.pc + inst.sb_imm )
            if s.pc <= old_pc:
                s.loops[(s.pc, old_pc)] += 1
        else:
            s.pc += 4
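The loop-recording instrumentation writes `s.loops[(s.pc, old_pc)] += 1` without showing how `s.loops` is set up. On plain Python, one way to make that line work is a `collections.defaultdict(int)`. The sketch below uses that assumption together with a hypothetical minimal `State` and `Inst`, and a 64-bit `trim_xlen`; whether the same container is usable under RPython translation is not something the slides address.

```python
from collections import defaultdict, namedtuple

Inst = namedtuple('Inst', 'rs1 rs2 sb_imm')   # hypothetical branch fields

XLEN = 64                                      # assumption: 64-bit target

def trim_xlen(value):
    return value & ((1 << XLEN) - 1)

class State(object):
    def __init__(self, pc):
        self.pc = pc
        self.rf = [0] * 32
        self.loops = defaultdict(int)   # (target_pc, branch_pc) -> count

# Same semantics as on the slide: count taken backward branches.
def execute_beq(s, inst):
    if s.rf[inst.rs1] == s.rf[inst.rs2]:
        old_pc = s.pc
        s.pc = trim_xlen(s.pc + inst.sb_imm)
        if s.pc <= old_pc:
            s.loops[(s.pc, old_pc)] += 1
    else:
        s.pc += 4

s = State(pc=0x410)
for _ in range(3):                       # same backward branch, three times
    s.pc = 0x410
    execute_beq(s, Inst(rs1=0, rs2=0, sb_imm=-16))

s.pc = 0x400
execute_beq(s, Inst(rs1=0, rs2=0, sb_imm=8))   # taken forward: not a loop
forward_pc = s.pc

s.rf[1] = 1
s.pc = 0x410
execute_beq(s, Inst(rs1=1, rs2=0, sb_imm=-16)) # not taken: falls through
```

Each key pairs the branch target with the branch address, so distinct loops are counted separately.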
Pydgin in Our Research Group

• Statistics for software-defined regions
• Data-structure specialization experimentation
• Control- and memory-divergence for SIMD
• Basic Block Vector generation for SimPoint
• Analysis of JIT-enabled dynamic language interpreters