Liquid Architecture: Microarchitecture Optimization for Embedded Systems



  1. Liquid Architecture: Microarchitecture Optimization for Embedded Systems
     D. Schuehler, B. Brodie, R. Chamberlain, R. Cytron, S. Friedman, J. Fritts, P. Jones, P. Krishnamurthy, J. Lockwood, S. Padmanabhan, and H. Zhang
     Dept. of Computer Science and Engineering, Washington University in St. Louis
     Supported by NSF ITR-0313203

  2. Liquid Architecture
     • Configurable architecture that can adapt to the needs of a particular application
     • E.g., within an FPGA
       – Soft-core processors
     • E.g., as an embedded processor
       – Tensilica supports configuration at fab time
       – Stretch supports configuration at run time
     • Today’s discussion is on performance analysis and configuration choice

  3. Block Diagram
     [Figure: block diagram of the FPX platform FPGA. Labeled blocks: Event Bus, LEON SPARC-compatible processor, Statistics Module, External Memory, Memory Controller, I-Cache, D-Cache, AHB, APB, LED and UART adapters, Boot ROM, Network Interface, Control Packet Processor, Layered Internet Protocol Wrappers]

  4. Microarchitecture Configurability
     • Instruction set
     • Memory subsystem
       – Cache size (I and D)
       – Associativity
       – Cache line size
     • Co-processor(s)
     • Instruction pipeline
     • Full HDL source is available
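The memory-subsystem knobs listed above (cache size, associativity, line size) multiply into a design space quickly. The following is a minimal sketch of how such a space could be enumerated; the `CacheConfig` type, the `enumerate_configs` helper, and every option value below are hypothetical illustrations, not the set of configurations the LEON core actually supports.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CacheConfig:
    """One candidate cache configuration (hypothetical illustration type)."""
    size_bytes: int   # total cache capacity
    ways: int         # associativity (1 = direct-mapped)
    line_bytes: int   # cache line size

    @property
    def sets(self) -> int:
        # Number of sets = capacity / (ways * line size)
        return self.size_bytes // (self.ways * self.line_bytes)


def enumerate_configs(sizes, ways_options, line_sizes):
    """Return every combination whose capacity divides evenly into sets."""
    configs = []
    for size, ways, line in product(sizes, ways_options, line_sizes):
        if size % (ways * line) == 0:
            configs.append(CacheConfig(size, ways, line))
    return configs


# Assumed option values, chosen only to show how the space grows.
d_cache_space = enumerate_configs(
    sizes=[1024, 4096, 16384, 32768],
    ways_options=[1, 2],
    line_sizes=[16, 32],
)
print(len(d_cache_space))                        # 4 * 2 * 2 = 16 candidates
print(d_cache_space[0], d_cache_space[0].sets)
```

Even with three knobs and a handful of values each, the D-cache alone yields 16 candidates here; adding the I-cache, pipeline, and co-processor choices multiplies that further.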

  5. Design Flow
     [Figure: design-flow diagram with four boxes, connected through the Internet]
     • Write and compile embedded SPARC application with GCC
     • Identify configuration for candidate architecture
     • Reconfigure FPX hardware via Internet and upload system software
     • Execute program on FPX Platform and measure run-time performance

  6. Cycle-accurate profiling
     • Choose methods to profile from the user interface
     [Figure: user-interface table with columns Method and Time / Cycles; rows: .text, main, addQuery, findMatch, computeKey, computeBase, computeStep, fillQuery, Rnd]

  7. [Figure: the same method table with an Address Range column; the selected entry shows Lo = 0x4000027C, Hi = 0x400003EF]

  8. [Figure: the Event Bus delivers PC and CLK to the Statistics Module; current PC = 0x4000035A; selected range Lo = 0x4000027C, Hi = 0x400003EF]

  9. [Figure: inside the Statistics Module, a comparator checks Lo ≤ PC ≤ Hi (0x4000027C ≤ 0x4000035A ≤ 0x400003EF) and increments (INCR) a Counter; the table column is now labeled Function]

  10. [Figure: a second comparator and Counter are added for another range, Lo = 0x400005D8, Hi = 0x4000061F; both comparators see the same PC (0x4000035A) on each CLK]

  11. [Figure: the Statistics Module alone: PC and CLK arrive on the Event Bus, each range (0x4000027C to 0x400003EF, 0x400005D8 to 0x4000061F) feeds its own Counter, and the counter values are sent To User]
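The comparator-and-counter scheme on slides 7 to 11 can be modeled in software as follows. This is only a behavioral sketch, not the HDL of the Statistics Module: the `RangeCounter` class, the `profile` function, the labels `range_1` and `range_2`, and the PC trace are all hypothetical (the slides do not make clear which method each address range belongs to). Only the four address bounds and the first PC value come from the slides.

```python
from dataclasses import dataclass


@dataclass
class RangeCounter:
    """One comparator/counter pair: counts cycles while lo <= PC <= hi."""
    name: str
    lo: int
    hi: int
    cycles: int = 0

    def clock(self, pc: int) -> None:
        # Mirrors the "Lo <= PC <= Hi, then INCR" check drawn on the slides.
        if self.lo <= pc <= self.hi:
            self.cycles += 1


def profile(pc_trace, counters):
    """Present one PC value per clock cycle to every counter in parallel."""
    for pc in pc_trace:
        for counter in counters:
            counter.clock(pc)
    return {c.name: c.cycles for c in counters}


counters = [
    RangeCounter("range_1", 0x4000027C, 0x400003EF),  # bounds from the slides
    RangeCounter("range_2", 0x400005D8, 0x4000061F),  # bounds from the slides
]

# Hypothetical per-cycle PC trace; only 0x4000035A appears on the slides.
trace = [0x4000035A, 0x4000035E, 0x400005DC, 0x40000100]
result = profile(trace, counters)
print(result)  # {'range_1': 2, 'range_2': 1}
```

Because every counter sees every PC sample, the cost of profiling more methods is additional comparator/counter pairs rather than additional runs.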

  12. Where is time spent?
      [Figure: stacked bar chart for the BLASTN biosequence search application. Y-axis: % of total runtime (0% to 100%). X-axis: size of hash table (bytes), 128K and 32K. Series: Rest, coreLoop, findMatch]

  13. Expand to measure cache hits/misses
      [Figure: user-interface table with columns Function, Time / Cycles, and Cache Hits / Misses (Read, Write); rows: .text, main, addQuery, findMatch, computeKey, computeBase, computeStep, fillQuery, Rnd]

  14. Measure Several Configurations

  15. Impact of D-cache Configuration
      [Figure: chart for the BLASTN biosequence search application. Y-axis: hit rate (%), 86 to 100. X-axis: size of hash table, D-cache configuration: "128K, 1Kx1", "128K, 32Kx1", "128K, 16Kx2", "32K, 1Kx1", "32K, 32Kx1", "32K, 16Kx2". Series: Total, findMatch, coreLoop]
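To illustrate why associativity can change the hit rate at equal capacity, here is a small software cache model. It is not how the slides obtained their data (those are hardware measurements), and several things are assumptions: the `hit_rate` function itself, LRU replacement, the 32-byte line size, and the synthetic address trace.

```python
from collections import OrderedDict


def hit_rate(addresses, size_bytes, ways, line_bytes=32):
    """Fraction of accesses that hit in a set-associative cache with LRU.

    Assumed model: LRU replacement and 32-byte lines; the real core's
    policy and line size may differ.
    """
    num_sets = size_bytes // (ways * line_bytes)
    # One OrderedDict per set, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets
        tag = line // num_sets
        entries = sets[index]
        if tag in entries:
            hits += 1
            entries.move_to_end(tag)         # mark as most recently used
        else:
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict least recently used
            entries[tag] = True
    return hits / len(addresses)


# Synthetic trace: two addresses that map to the same set, accessed alternately.
trace = [0, 1024] * 4
print(hit_rate(trace, size_bytes=1024, ways=1))  # 0.0: the lines evict each other
print(hit_rate(trace, size_bytes=1024, ways=2))  # 0.75: both lines fit in one set
```

The same capacity gives very different hit rates on this trace, which is the kind of effect the per-configuration measurements are meant to expose for a real workload.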

  16. Impact of I-cache Configuration
      [Figure: bar chart for the BLASTN biosequence search application. Y-axis: run time (secs), 0 to 35. X-axis: BLASTN hash table sizes (bytes), 128K and 32K. Series: 1KB I-Cache, 4KB I-Cache]

  17. [Figure: user-interface table with columns Function, Time / Cycles, Cache Hits / Misses (Read, Write), Pipeline Stalls, and Branch Predict; rows: .text, main, addQuery, findMatch, computeKey, computeBase, computeStep, fillQuery, Rnd]

  18. Time for Single Run
      • Almost 2 orders of magnitude faster than simulation
      [Figure: bar chart, log-scale Y-axis: Time (sec), 1 to 100000. Bars: SimpleScalar 3.0 and LEON, with data labels 80000 and 1800]
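The size of the gap can be checked with a line of arithmetic. The sketch below assumes the data label 80000 belongs to the SimpleScalar 3.0 bar and 1800 to the LEON bar; the extracted slide text does not state that pairing explicitly, and the variable names are illustrative.

```python
import math

simulation_seconds = 80000  # assumed: SimpleScalar 3.0 bar label
direct_seconds = 1800       # assumed: LEON (direct execution) bar label

speedup = simulation_seconds / direct_seconds
orders = math.log10(speedup)
print(f"speedup: {speedup:.1f}x, orders of magnitude: {orders:.2f}")
```

Under that reading, direct execution is roughly 44 times faster, or about 1.6 orders of magnitude.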

  19. Implications of Slow Simulation
      • Focus has historically been on measuring the performance of a single thread of a single application
      • Real apps are often executed in a multitasking environment
        – Impacts cache behavior
        – Ignores OS (system call) performance
      • Liquid architecture system enables direct measurement, including OS

  20. OS Boot Sequence

  21. Summary
      • Run-time reconfigurable processors will be available sooner rather than later
      • Determining desired configuration is a difficult design task
        – Large search space
        – Depends on accurate performance data
      • Liquid architecture system enables direct measurement of performance properties

  22. Current and Future Work
      • Evaluation of several architecture design ideas
      • Automated search of the design space
      • Characterizing performance analysis methods
        – Analytic models
        – Simulation models
        – Direct execution models
      • Usable as is for evaluating soft-core processors
      • Would like to extend to higher-speed processors
