


  1. Development of a Functional Prototype of the Quad-Core NGMP Space Processor
     DASIA 2012, May 14th, 2012
     www.aeroflex.com/gaisler

  2. Contents
     • NGMP project overview
     • NGMP architecture
     • NGFP functional prototype overview
     • NGFP usage
     • Current status
     • Conclusions and remaining work

  3. NGMP Project Overview
     • NGMP is an ESA activity developing a multi-processor system with higher performance than earlier generations of European space processors.
     • Part of the ESA roadmap for standard microprocessor components.
     • Aeroflex Gaisler's assignment consists of the specification, the architectural (VHDL) design, and verification by simulation and on FPGA. The goal of this work is to produce a verified gate-level netlist for a suitable technology.
     • As an additional step in the development of the NGMP, a functional prototype ASIC, "NGFP", is being developed, also under ESA contract. It is presented here.

  4. NGMP Architecture Overview (1/2)
     [Figure: four LEON4FT cores, each with a GRFPC and an L1 cache, sharing two GRFPUs. The cores and an IOMMU for the DMA masters connect to a 128-bit CPU bus, which also leads to the low-speed slaves. The Level 2 cache connects the CPU bus to a 128-bit memory bus with a scrubber and an EDAC memory controller, whose mem_ifsel input selects either DDR2 SDRAM or PC133 SDRAM.]

  5. NGMP Architecture Overview (2/2)
     [Figure: the same architecture diagram, annotated with its main building blocks:]
     • Quad-core LEON4
     • GRFPU, pairwise shared
     • L1 and L2 caches
     • Memory controller (DDR2 or PC133 SDRAM)

  6. NGMP Overview – LEON4FT and GRFPU
     • LEON4FT
       – IEEE-1754 SPARC V8 compliant 32-bit processor
       – 7-stage pipeline, multi-processor support
       – 64- or 128-bit AHB bus interface
       – Compare-and-swap (CASA) instruction support
       – 1.7 DMIPS/MHz, 0.6 Whetstone MFLOPS/MHz
       – Estimated 0.35 SPECINT/MHz, 0.25 SPECFP/MHz
       – 2.1 CoreMark/MHz (comparable to ARM11)
     • GRFPU
       – High-performance FPU integrated into the LEON4 pipeline
       – Hardware DIV and SQRT
       – The floating-point controller (FPC) decouples FP operations from the pipeline, allowing the FPU and CPU to work in parallel
       – Each FPU is arbitrated between two FPCs, saving significant area at the cost of a few percent in performance (no reduction at all if only one CPU uses the FPU)

  7. NGMP Overview – Caches (1/2)
     • Level 1 cache
       – Separate L1 caches integrated into each LEON4 core
       – Multi-set (set-associative) with configurable LRU, LRR or RND replacement policy
       – Write-through operation
       – Bus snooping and physical tags to maintain coherency
     • Level 2 cache
       – Designed as a bridge in the bus topology
       – Highly configurable caching behavior
       – Supports copy-back operation
       – Locked ways, allowing part or all of the cache to be used as on-chip RAM

  8. NGMP Overview – Caches (2/2)
     [Figure: the architecture diagram, annotated as follows:]
     • L1 is write-through, with bus snooping to maintain coherency.
     • L2 can be copy-back, since no bus masters are located behind it (between the L2 and memory). This prevents the repeated writes caused by the L1 write-through policy from producing unnecessary memory accesses.
     • Burst lengths are matched between the caches and memory.

  9. NGMP Overview – Memory Controller
     • Memory controller
       – DDR2 or SDRAM, selected with a bootstrap signal
       – Same package pins used for the DDR2 and SDRAM interfaces
       – Full-width 64-bit or half-width 32-bit external data bus, selected with a bootstrap signal
       – Powerful interleaved 16/32+8-bit ECC giving 32 or 16 checkbits (selected by software, can be switched on the fly)
     • Scrubber
       – Fast initialization of memory and checkbits at boot-up
       – Background scrubbing
       – Error reporting to the CPU and statistics collection
     • Memory error handling (memory controller, scrubber and CPU together)
       – Rapid regeneration of contents after a SEFI (single-event functional interrupt)
       – Graceful degradation of a failed byte lane, regaining SEU tolerance
       – Some example code already in our RTEMS repository

  10. NGMP Overview – I/O Interfaces
     • Large number of I/O interfaces:
       – SpaceWire router
       – PCI master/target with DMA
       – Gbit Ethernet
       – MIL-STD-1553B
       – UART, SPI, GPIO
     • Debug interfaces:
       – Ethernet, USB
       – SpaceWire (RMAP)
       – JTAG, serial

  11. NGMP Overview – Fault Tolerance
     • LEON4FT
       – 4-bit parity on L1 cache
       – Protected register files (both CPU and FPU)
     • L2 cache
       – BCH-protected memories
       – Built-in scrubber
     • General
       – Block RAM contents in IP cores protected by ECC
       – Rad-hard flip-flops and logic by process, library or TMR on the netlist

  12. NGMP Overview – Interfaces
     • Resource partitioning
       – The architecture has been designed to support SMP, AMP and mixtures of both (example: three CPUs running Linux SMP and one running RTEMS)
       – The L2 cache can be set to a one-way-per-CPU mode
       – IRQs can be masked and routed separately to each CPU, allowing different schemes
       – The register interfaces of the I/O IP cores are located on separate 4 KiB pages, so that the MMU can restrict user-level software from accessing the wrong IP core in case of a software malfunction

  13. NGMP Block Diagram
     [Figure: NGMP block diagram (M = master interface, S = slave interface, X = snoop interface). Buses shown: 128-bit processor AHB bus and 128-bit memory AHB bus, 32-bit master and slave I/O AHB buses and 32-bit APB buses, all @ 400 MHz, plus a debug bus. Blocks shown: four LEON4FT cores with caches and MMU, two shared FPUs, L2 cache, memory scrubber, 64-bit DDR2-800 / SDR-PC100 SDRAM controllers, IOMMU, AHB/AHB and AHB/APB bridges, SpaceWire, HSSL, Ethernet, PCI (master, target, DMA, arbiter), USB, JTAG, RMAP, UARTs, GPIO, timers, interrupt controllers (IRQMP, IRQCTRL1-4, IRQSTAMP), PROM & I/O controller (8/16-bit), statistics unit, clock gating, DSU and AHB/PCI trace buffers.]

  14. NGFP Overview (1/4)
     • NGFP is a functional (not rad-hard) prototype of the NGMP in a commercial technology.
     • Purposes:
       – Keep NGMP development going even though the target technology library is not yet available
       – Create a more representative and complete prototype than the earlier FPGA prototypes, for user evaluation and benchmarking
       – Learn and gain experience
       – Reduce technical risk for the final ASIC design

  15. NGFP Overview (2/4)
     • The technology chosen was eASIC's Nextreme2 structured ASIC.
       – Fixed sea of LUTs and RAM blocks, similar to an FPGA, but programmed by customizing a few metal layers
       – ASIC-like tool flow (DC synthesis, custom back-end tools)
       – Competitive cost
       – Short lead time (8 weeks)
       – Devices are factory tested (unlike MPW)

  16. NGFP Overview (3/4)
     • The full NGMP architecture is included in the prototype, with the following exceptions:
       – Non-FT versions of LEON4 and the L2 cache, to save resources
         • The L2 cache is configured with the same timing, to give representative benchmark results
       – DDR2 and SDRAM on different pins (multi-I/O pads not available)
         • Still, only one of them can be used at a time
       – Reduced L2 cache size
       – Lower clock rates

  17. NGFP Overview (4/4)
     • Clock speed limitations
       – Main CPU clock limited to 150 MHz in the worst case (slow process corner)
       – DDR2 clock limited to 300 MHz
     • One contributing factor is that the RAM blocks used for the L2 cache are distributed all over the chip. This will not happen in a regular ASIC process, where RAMs can be placed freely.
     • Some potential for improvement on the prototype:
       – Timing in the typical process corner is significantly better
       – We will also use overclocking techniques, such as raising the core voltage, to reach as high a frequency as possible

  18. NGFP Status
     • Currently finishing the tape-out process (May 18th); 8 weeks lead time expected for devices
     • PCB design finalized, to be manufactured
     • After receiving the devices, PCB assembly and board bring-up will follow
     • ...then verification and benchmarking will continue

  19. NGFP Evaluation Board
     • Evaluation board providing the interfaces of the NGFP device
     • 6U CPCI form factor

  20. NGFP Conclusions
     • A functional prototype of the NGMP architecture has been designed and taped out.
     • Worst-case process timing did not turn out as good as we had hoped, but mitigation by overclocking should be possible.
     • Some parts of this development have been very time-consuming; one issue in particular has been interfacing to the native DDR2 PHY resources of the technology.
     • We have gained valuable experience that will definitely speed up further development once the target technology has been fixed.
     • Thank you for listening!
