Improving Ibex Performance
Greg Chadwick RISC-V Devroom FOSDEM 1st February 2020
Ibex
○ Microcontroller class CPU with two stage pipeline
○ 32-bit RISC-V IMC/EMC with M-Mode, U-Mode and PMP
○ Written in SystemVerilog
○ Initially ... by ETH Zurich
silicon root of trust
improve
that isn’t benchmarks
○ Choose a smaller/simpler Ibex or a faster one
○ Look at signals indicating top-level stall
○ Choose a few points to examine why stall is occurring
what kinds of things are slowing down execution
bne t2,s5,100404
○ ALU checks branch condition
○ ALU calculates branch target
○ Branch Taken
lw t3,12(sp)
○ Load requested
○ Data returned
effect stall conditions from informal survey have on performance
% of total cycles spent calculating branch target
% of total cycles spent waiting for memory response
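Both percentages above are ratios of stall cycles to total cycles. A minimal Python sketch of that calculation follows; the function name, the counter names and the counts are all hypothetical and are not measurements from the talk.

```python
def stall_percentages(total_cycles, stall_cycles):
    """Return each stall cause as a percentage of total cycles."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return {cause: 100.0 * count / total_cycles
            for cause, count in stall_cycles.items()}


# Hypothetical counts for illustration only; not Ibex measurements.
example = stall_percentages(
    total_cycles=1000,
    stall_cycles={"branch_target": 50, "memory_response": 125},
)
print(example)  # {'branch_target': 5.0, 'memory_response': 12.5}
```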
branch targets
branch condition in parallel
performance gain
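The idea can be sketched as a toy cycle count. The earlier slides show the main ALU first checking the branch condition and then calculating the branch target; computing the target in parallel removes that second step. The Python model below counts only these ALU steps and ignores instruction fetch and the rest of the pipeline. The function name and the trace are hypothetical, and the assumption that a not-taken branch needs no target calculation is part of the model rather than a statement from the slides.

```python
def branch_alu_cycles(taken, dedicated_target_alu):
    """Cycles of ALU work for one conditional branch in this toy model."""
    condition_cycles = 1  # ALU checks the branch condition
    if not taken:
        return condition_cycles  # model assumption: no target needed
    if dedicated_target_alu:
        return condition_cycles  # target computed in parallel with the condition
    return condition_cycles + 1  # the single ALU is reused to compute the target


# Hypothetical trace of branch outcomes (True = taken), for illustration only.
trace = [True, True, False, True]
shared = sum(branch_alu_cycles(t, False) for t in trace)
parallel = sum(branch_alu_cycles(t, True) for t in trace)
print(shared, parallel)  # 7 4
```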
Analysis via OpenSTA
OpenROAD repository
library
○ Flow used to see relative changes and areas of timing pressure
                Base          Branch Target ALU   % change
Coremark/MHz    2.40          2.51                +4.5 %
Area            27,345 μm²    27,666 μm²          +1.2 %
Fmax            269 MHz       234 MHz
Coremark        645.6         587.3
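The Coremark row follows from the two rows above it: Coremark = Coremark/MHz × Fmax. A minimal Python sketch that rechecks this arithmetic from the table figures; the function names are illustrative and not from the talk, and the Fmax change it prints is derived here from the table rather than quoted from the slides.

```python
def coremark(coremark_per_mhz, fmax_mhz):
    """Absolute Coremark score at the maximum clock frequency."""
    return coremark_per_mhz * fmax_mhz


def percent_change(base, new):
    """Relative change from base to new, in percent."""
    return 100.0 * (new - base) / base


base_score = coremark(2.40, 269)
bt_alu_score = coremark(2.51, 234)
print(round(base_score, 1), round(bt_alu_score, 1))  # 645.6 587.3
print(round(percent_change(27345, 27666), 1))        # 1.2 (area)
print(round(percent_change(269, 234), 1))            # -13.0 (Fmax, derived here)
```

The higher Coremark/MHz does not translate into a higher absolute score here because the drop in Fmax is larger than the per-cycle gain.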
decision was stored in a flop after being computed by the main ALU
straight in the PC Mux select
to feed into PC selection mux (as it computed the target), which was the worst path
logic into the select
longer
decoder
like
decoder and we can solve the problem
○ Meaning it feeds its data to many different gates
connects to
from duplicated register
○ Yosys/ABC doesn’t take IO timing constraints into account
○ So doesn’t optimise worst path properly
○ May not want to run at Fmax anyway
                Base          Branch Target ALU   % change
Coremark/MHz    2.40          2.51                +4.5 %
Area            27,345 μm²    27,579 μm²          +0.9 %
Fmax            269 MHz       250 MHz
Coremark        645.6         627.5
writeback which holds the value to be written to the register file
direct to the register file
stores as response only needed the cycle after ID/EX
○ Significant new stalling and hazard logic needed
○ Outweighed by performance gains
○ Worst case path from BT ALU change still dominates
                Base          Writeback + BT ALU   % change
Coremark/MHz    2.40          2.88                 +20.0 %
Area            27,345 μm²    29,212 μm²           +6.8 %
Fmax            269 MHz       253 MHz
Coremark        645.60        728.64               +12.9 %
                      Coremark/MHz   Speedup
Base                  2.40
BT ALU                2.51           4.5%
Writeback + BT ALU    2.88           20%

                      Geomean Speedup
BT ALU                4.42%
Writeback + BT ALU    21.3%
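A geometric mean speedup is the n-th root of the product of per-benchmark speedup ratios. The text here does not say which benchmarks the geomean covers, so the ratios in the Python sketch below are hypothetical and serve only to show the calculation; the function name is also illustrative.

```python
import math


def geomean(ratios):
    """Geometric mean of positive speedup ratios (new score / base score)."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark ratios; not results from the talk.
ratios = [1.10, 1.25, 1.30]
speedup_percent = (geomean(ratios) - 1.0) * 100.0
print(f"{speedup_percent:.1f}%")
```

The geometric mean is commonly preferred over the arithmetic mean for ratios because it gives the same ranking regardless of which configuration is used as the baseline.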
www.github.com/lowRISC/ibex
main repository
○ See my ‘ibex_fosdem’ branch at www.github.com/GregAC/ibex to take a look
○ Now recruiting!