Programming Modern FPGAs. Ivo Bolsens, Xilinx. MPSOC, August 2006

  1. Programming Modern FPGAs: Ivo Bolsens, Xilinx; MPSOC, August 2006

  2. Outline
     • Modern FPGA
     • FPGA programmable platform
     • Programming the FPGA
     • Conclusions

  3. Modern FPGA
     • 65-nm technology, 40-nm gate length (poly)
     • 1.6-nm oxide thickness (16 Å): ~5 atomic layers
     • Triple-oxide technology: 3 oxide thicknesses for optimum power and performance
     • 1.0 V Vcc core: lower dynamic power
     • 12-layer copper
     • Strained-silicon transistor: maximum performance at lowest AC power
     • Over 1 billion transistors
     [Figure: 65-nm transistor cross section]
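The "1.0 V core: lower dynamic power" bullet follows from the first-order CMOS switching-power model P = α·C·V²·f, in which dynamic power scales with the square of the supply voltage. The minimal Python sketch below applies that model; the 1.2 V comparison voltage is an assumption chosen for illustration, not a figure from the slides.

```python
def dynamic_power(alpha: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = alpha * C * Vdd^2 * f (watts)."""
    return alpha * capacitance_f * vdd_v ** 2 * freq_hz


def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Dynamic power at v_new relative to v_old, with activity, C and f held fixed."""
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    # Assumption for illustration only: compare a 1.0 V core against a 1.2 V core.
    ratio = dynamic_power_ratio(1.0, 1.2)
    print(f"1.0 V core: {ratio:.0%} of the dynamic power at 1.2 V")
```

Under that assumption the 1.0 V core needs about 69% of the dynamic power, all other factors being equal.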

  4. FPGA Roadmap
     • New process technology drives down cost; FPGAs can take advantage of a new process faster than ASICs and ASSPs
     • Process nodes: 180 nm, 150 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm
     • Milestones called out along the roadmap: 300-mm wafers and 90 nm (low cost); triple oxide (low power); 12-layer copper and a 1.0 V core
     • FPGA in 2010: 32 nm, 5 billion transistors
     • IC development costs keep increasing, so customers want to buy reconfigurable, programmable platforms instead of developing their own
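Why a process shrink drives down cost can be seen from idealized scaling: area per transistor goes roughly with the square of the feature size, so the same die holds more logic. The Python sketch below applies that first-order rule to the slide's numbers; it ignores real-world effects such as layout rules that do not scale ideally, and the `die_area_factor` parameter is an assumption of this sketch.

```python
def ideal_density_gain(old_node_nm: float, new_node_nm: float) -> float:
    """Idealized shrink: area per transistor scales with feature size squared."""
    return (old_node_nm / new_node_nm) ** 2


def projected_transistors(count: float, old_node_nm: float, new_node_nm: float,
                          die_area_factor: float = 1.0) -> float:
    """Project a transistor count to a new node; die_area_factor is an assumed die-size change."""
    return count * ideal_density_gain(old_node_nm, new_node_nm) * die_area_factor


if __name__ == "__main__":
    # Start from the ~1 billion transistors quoted for the 65 nm device (slide 3).
    same_die = projected_transistors(1e9, 65, 32)
    print(f"65 nm -> 32 nm, same die area: {same_die / 1e9:.2f} billion transistors")
    # Die-area growth the idealized model would need to reach the 5-billion projection.
    print(f"Implied die-area factor for 5 billion: {5e9 / same_die:.2f}")
```

The idealized model gives about 4.1 billion transistors on the same die area, so reaching the 5-billion projection would, under this model alone, take roughly 21% more die area.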

  5. FPGA Fabric
     • High-performance logic fabric
     • 36-Kbit dual-port Block RAM / FIFO with ECC, many configuration options
     • DSP slice with 25x18 multiplier and integrated ALU
     • General I/O with ChipSync + XCITE DCI
     • Gigabit serial transceivers
     • 550 MHz clock management tile: DCM + PLL
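The DSP slice's 25x18 multiplier feeding an integrated ALU is, at its core, a multiply-accumulate unit. Below is a bit-accurate Python sketch of that datapath, assuming signed two's-complement operands and a 48-bit wrapping accumulator (the accumulator width is not stated on the slide). It models the arithmetic only, not the slice's pipeline registers or its other ALU modes; all names are hypothetical.

```python
A_BITS = 25    # multiplier A-port width (from the slide: 25x18)
B_BITS = 18    # multiplier B-port width
ACC_BITS = 48  # accumulator width: an assumption for this sketch


def wrap_signed(value: int, bits: int) -> int:
    """Reduce value to a `bits`-wide two's-complement integer (hardware-style wraparound)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def check_signed(value: int, bits: int, name: str) -> None:
    """Reject operands that do not fit in a signed `bits`-wide port."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} does not fit in {bits} signed bits")


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: acc + a*b, wrapped to ACC_BITS."""
    check_signed(a, A_BITS, "a")
    check_signed(b, B_BITS, "b")
    return wrap_signed(acc + a * b, ACC_BITS)


def dot_product(coeffs, samples) -> int:
    """FIR-style dot product computed with repeated MAC steps."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc = mac(acc, c, x)
    return acc


if __name__ == "__main__":
    print(dot_product([3, -5, 7], [100, 200, -300]))  # 300 - 1000 - 2100 = -2800
```

A signed 25x18 product needs at most 43 bits, so a wider accumulator leaves headroom for many accumulation steps before wraparound.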

  6. Logic Architecture
     • True 6-input lookup table (LUT6) with dual 5-input LUT option
     • Each LUT6 is paired with a register/latch
     • M-LUTs (about half of all LUTs) can also be used as:
       – 64-bit RAM (RAM64)
       – 32-bit or 2 × 16-bit shift register (SRL32)
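A LUT6 is a 64-entry truth table addressed by its six inputs, which is why it can implement any six-input Boolean function. The Python sketch below models that behavior plus the dual 5-input option. How the two 5-input functions map onto the 64 table bits (lower half and upper half here) is an assumption of this sketch, as are the helper names.

```python
from typing import Callable, Sequence


def table_index(bits: Sequence[int]) -> int:
    """Pack input bits into a truth-table index; bits[0] is the least-significant bit."""
    return sum((b & 1) << i for i, b in enumerate(bits))


def make_init(fn: Callable[..., int], n_inputs: int) -> int:
    """Build a truth-table word: bit k holds fn() evaluated on input pattern k."""
    init = 0
    for k in range(1 << n_inputs):
        bits = [(k >> i) & 1 for i in range(n_inputs)]
        init |= (fn(*bits) & 1) << k
    return init


class Lut6:
    """Behavioral model of a 6-input LUT: a 64-bit truth table indexed by its inputs."""

    def __init__(self, init: int) -> None:
        if not 0 <= init < (1 << 64):
            raise ValueError("INIT must fit in 64 bits")
        self.init = init

    def eval6(self, bits: Sequence[int]) -> int:
        """Evaluate the stored function of six inputs."""
        if len(bits) != 6:
            raise ValueError("need 6 inputs")
        return (self.init >> table_index(bits)) & 1

    def eval_dual5(self, bits: Sequence[int]) -> tuple:
        """Dual 5-input mode (assumed layout): two functions of the same five
        inputs, read from the lower and upper 32 bits of the table."""
        if len(bits) != 5:
            raise ValueError("need 5 inputs")
        k = table_index(bits)
        return (self.init >> k) & 1, (self.init >> (32 + k)) & 1

    @classmethod
    def dual(cls, f: Callable[..., int], g: Callable[..., int]) -> "Lut6":
        """Pack two 5-input functions into one LUT6 table."""
        return cls(make_init(f, 5) | (make_init(g, 5) << 32))


if __name__ == "__main__":
    parity = Lut6(make_init(lambda *b: sum(b) % 2, 6))
    print(parity.eval6([1, 0, 1, 1, 0, 0]))  # three ones -> 1
    # Full-adder sum and carry share inputs a, b, cin (the two extra inputs are unused).
    adder = Lut6.dual(lambda a, b, c, _d, _e: a ^ b ^ c,
                      lambda a, b, c, _d, _e: (a & b) | (a & c) | (b & c))
    print(adder.eval_dual5([1, 1, 0, 0, 0]))  # (sum, carry) -> (0, 1)
```

The full-adder example shows the point of the dual-output option: two related functions of shared inputs fit in one LUT instead of two.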

  7. Routing: Virtex-4 vs. Virtex-5
     [Figure: CLBs reachable in 1, 2 and 3 hops with Virtex-4 routing and with Virtex-5 routing]
     • Fast, symmetric connection pattern between CLBs
     • Same pattern for all outputs

  8. General-Purpose I/O (SelectIO)
     • All I/O pins are "created equal"
     • Compatible with more than 40 different standards (Vcc, output drive, input threshold, single-ended/differential, etc.)
     • Each I/O pin has dedicated circuitry for:
       – On-chip transmission-line termination (serial or parallel)
       – Serial-to-parallel conversion on the input (ChipSync)
       – Parallel-to-serial conversion on the output (ChipSync)
       – Clock division and high-speed "regional" clock distribution
     • Ideal for source-synchronous I/O up to 1 Gbps
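Functionally, the per-pin ChipSync converters trade line rate for word width: the serial side runs fast, and the fabric sees wider words at a proportionally lower clock. Below is a behavioral Python sketch of that conversion. The MSB-first bit order and the 1:8 ratio are assumptions for illustration, and the function names are hypothetical, not a Xilinx API.

```python
from typing import List, Sequence


def serialize(words: Sequence[int], width: int) -> List[int]:
    """Parallel-to-serial: shift each word out MSB first (bit order is an assumption)."""
    bits: List[int] = []
    for w in words:
        if not 0 <= w < (1 << width):
            raise ValueError(f"word {w} does not fit in {width} bits")
        bits.extend((w >> i) & 1 for i in range(width - 1, -1, -1))
    return bits


def deserialize(bits: Sequence[int], width: int) -> List[int]:
    """Serial-to-parallel: regroup the bit stream into width-bit words, MSB first.
    A trailing partial word is dropped."""
    words: List[int] = []
    for start in range(0, len(bits) - width + 1, width):
        w = 0
        for b in bits[start:start + width]:
            w = (w << 1) | (b & 1)
        words.append(w)
    return words


def parallel_clock_hz(line_rate_bps: float, width: int) -> float:
    """Clock rate of the parallel side for a given serial line rate and ratio."""
    return line_rate_bps / width


if __name__ == "__main__":
    data = [0xA5, 0x3C]
    assert deserialize(serialize(data, 8), 8) == data
    # At the slide's 1 Gbps with an assumed 1:8 ratio, fabric logic runs at 125 MHz.
    print(parallel_clock_hz(1e9, 8) / 1e6, "MHz")
```

With a 1:8 ratio, a 1 Gbps pin only asks the fabric for 125 MHz, which is what makes gigabit source-synchronous I/O practical in programmable logic.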

  9. Platform FPGAs
     [Figure: digital system design compared with simplified system design on a platform FPGA]
     • Concerns that remain with the system designer: RTL, HW/SW partitioning, 0, 1 and delay, high-level synthesis, timing, standards and interfaces
     • Concerns absorbed by the platform FPGA: IR drop, embedded IP, DFM, termination, clock generation, noise margin, transmission lines, repeaters, ATPG, startup initialization, crosstalk, clock distribution

  10. Xilinx Strategic Directions
     • Applications, growing in integration over time from existing to new: glue logic, algorithmic logic, DSP, Gb transceivers, hard IP, embedded processor, system tools
     • Existing markets: network infrastructure, computing infrastructure, industrial and medical, military
     • New markets: consumer electronics, automotive, portable
     • Key factors: cost, power, quality

  11. Domain-Optimized Platforms
     • One family, multiple platforms:
       – LX (logic domain): highest logic density
       – SX (DSP domain): highest DSP performance
       – FX (connectivity domain): embedded processors, high-speed serial I/O
     • Column-based features enable a "dial-in" mix of hard IP: logic, DSP, BRAM, I/O, MGT, DCM, PowerPC
     • Enabled by flip-chip packaging: I/O columns distributed throughout the device

  12. The FPGA System
     [Figure: block diagram of a complete system on one FPGA]
     • PowerPC µP with internal memory and an external memory port
     • DSP accelerator
     • Custom logic and emulation logic
     • Communication port (MGTs) and I/Os

  13. 8 MPEG-4 Decoders
     • Compressed video in; eight MPEG-4 decoders in parallel; decompressed 720p video out
     • Frame memories held in off-chip RAM, accessed through eight ports of memory controller
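A rough sizing of the off-chip frame memory traffic shows why eight decoders need a multi-port memory controller. The Python arithmetic below counts only the bytes written for decoded output frames. The 30 frames/s rate and 8-bit 4:2:0 format (1.5 bytes per pixel) are assumptions, and real decoders also read reference frames for motion compensation, so actual traffic is a multiple of this figure.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: float) -> float:
    """Size of one decoded frame in bytes."""
    return width * height * bytes_per_pixel


def decoded_write_bandwidth(streams: int, width: int, height: int,
                            bytes_per_pixel: float, fps: float) -> float:
    """Bytes/s needed just to write decoded frames for all streams into frame memory."""
    return streams * frame_bytes(width, height, bytes_per_pixel) * fps


if __name__ == "__main__":
    # Assumptions: 1280x720 frames, 8-bit 4:2:0 sampling (1.5 bytes/pixel), 30 frames/s.
    bw = decoded_write_bandwidth(8, 1280, 720, 1.5, 30)
    print(f"{bw / 1e6:.1f} MB/s for eight 720p streams (writes only)")
```

Even this lower bound is about 332 MB/s, before any reference-frame reads or display readout are added.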

  14. Application Example: MicroBlaze 5.0
     [Figure: pipeline block diagram: program counter, instruction buffer, instruction decode, 32 × 32-bit register file, add/sub, shift/logical and multiply units; instruction-side (ILMB, IOPB) and data-side (DLMB, DOPB) bus interfaces]
     • 1400 LUT6s
     • 230 Dhrystone MIPS
     • More than 200 fit in a Virtex-5

  15. Future-Proof Architecture
     [Figure: regular array of arithmetic/logic and memory elements]
     • Parallelism: performance and power
     • Distributed memory: avoids the data-transfer bottleneck
     • Regular: manufacturability, redundancy
     • Scalable: future-proof
     • "If FPGAs didn't exist today, people would have to invent them…"
     • 2010: 5 cents per 32-bit MicroBlaze; $2 for 1 Mgates

  16. FPGA for Embedded Systems
     • An embedded system is a system that:
       – has complex concurrent behavior
       – is characterized by stringent timing requirements
       – has non-trivial communication among its components and with the rest of the world

  17. Outline
     • Modern FPGA
     • FPGA programmable platform
     • Programming the FPGA
     • Conclusions

  18. FPGA Memory Options: Choose the Right Memory for the Application
     Ordered from finest granularity to largest capacity:
     • Distributed RAM / SRL32 (in the logic): very granular, localized memory; minimal impact on logic routing; great for small FIFOs
     • On-chip BRAM/FIFO (Virtex-5): efficient on-chip blocks; flexible, with optional FIFO logic; ideal for mid-sized FIFOs/buffers
     • Fast memory interfaces: cost-effective bulk storage through memory controller cores, for large memory requirements
       – DRAM: SDRAM, DDR SDRAM, FCRAM, RLDRAM
       – SRAM: sync SRAM, DDR SRAM, ZBT, QDR
       – FLASH, EEPROM
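Why SRL32s are "great for small FIFOs": a shift register with a selectable output tap already stores entries in arrival order, so a FIFO needs only a counter that points at the oldest entry. Below is a behavioral Python sketch of that idea, assuming a single 32-deep SRL and ignoring clocking; the class names are hypothetical.

```python
class Srl32:
    """Behavioral model of a 32-deep addressable shift register (SRL32).
    shift() pushes a new value in at tap 0; read(addr) returns the value
    that was shifted in `addr` shifts ago."""

    DEPTH = 32

    def __init__(self) -> None:
        self.taps = [0] * self.DEPTH

    def shift(self, value: int) -> None:
        self.taps = [value] + self.taps[:-1]

    def read(self, addr: int) -> int:
        if not 0 <= addr < self.DEPTH:
            raise IndexError("address out of range")
        return self.taps[addr]


class SrlFifo:
    """Small FIFO built on one SRL32: writes shift data in, and a counter
    selects the oldest entry (at tap count-1) as the read address."""

    def __init__(self) -> None:
        self.srl = Srl32()
        self.count = 0

    def push(self, value: int) -> None:
        if self.count == Srl32.DEPTH:
            raise OverflowError("FIFO full")
        self.srl.shift(value)
        self.count += 1

    def pop(self) -> int:
        if self.count == 0:
            raise IndexError("FIFO empty")
        self.count -= 1
        return self.srl.read(self.count)


if __name__ == "__main__":
    fifo = SrlFifo()
    for v in (10, 20, 30):
        fifo.push(v)
    print(fifo.pop(), fifo.pop())  # 10 20 (first in, first out)
```

No data ever moves on a read; only the counter changes, which is why such a FIFO costs little beyond the LUT itself.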

  19. Memory Bandwidth Envelope
     [Figure: memory capacity (KB) vs. bandwidth (Tbps) for registers, LUT RAM and BRAM; data points for a 4VLX200, a 2V6000 and a 3.5 GHz P5; sources: Intel, Xilinx]
     • Bandwidth to registers: 500× that of a processor register file
     • Bandwidth to LUT RAMs: 50× that of a processor's L1 cache
     • Bandwidth to BRAMs: 5× that of a processor's L1-to-L2 cache path
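The bandwidth gap on this slide comes from parallelism: aggregate on-chip bandwidth is roughly the number of independent storage elements times their port count, port width and clock rate, while a processor funnels traffic through a few fixed ports. A back-of-the-envelope Python sketch of that formula follows; the block count, port width, register count and clock frequency are illustrative values, not data for any specific device.

```python
def aggregate_bandwidth_bps(units: int, ports_per_unit: int,
                            bits_per_port: int, clock_hz: float) -> float:
    """Peak on-chip bandwidth if every port of every unit transfers each cycle."""
    return units * ports_per_unit * bits_per_port * clock_hz


if __name__ == "__main__":
    # Illustrative parameters only, not device specifications.
    bram = aggregate_bandwidth_bps(units=336, ports_per_unit=2, bits_per_port=36, clock_hz=500e6)
    regs = aggregate_bandwidth_bps(units=200_000, ports_per_unit=1, bits_per_port=1, clock_hz=500e6)
    print(f"BRAM: {bram / 1e12:.1f} Tbps, registers: {regs / 1e12:.1f} Tbps")
```

With these assumed numbers the registers reach about 100 Tbps and the BRAMs about 12 Tbps, reproducing the ordering in the figure: the finer-grained the storage, the more ports there are to read in parallel.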

  20. Programmable Interconnect
     • Can connect compute elements, registers, small memories and larger memories arbitrarily
     • Accounts for about 80% of FPGA resources, yet is often overlooked as the key differentiator
     • Contrast with processors: four pre-specified architectural (von Neumann) bottlenecks along the path ALUs ↔ REGs ↔ L1 ↔ L2 ↔ Mem

  21. FPGA vs. Microprocessor
                                        Microprocessor                        FPGA
                                        Itanium 2                             Virtex-II Pro 2VP100
     Technology                         0.13 µm                               0.13 µm
     Clock speed                        1.6 GHz                               180 MHz
     Internal memory bandwidth          102 GB/s                              7.5 TB/s
     Processing units                   5 FPU (2 MACs + 1 FPU) + 6 MMU        212 FPU, or 300+ integer units, or …
                                        + 6 integer units
     Power consumption                  130 W                                 15 W
     Peak performance                   8 GFLOPS                              38 GFLOPS
     Sustained performance              ~2 GFLOPS                             ~19 GFLOPS
     I/O / external memory bandwidth    6.4 GB/s                              67 GB/s
     Source: courtesy of Nallatech
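Normalizing the slide 21 figures makes the comparison sharper. The Python sketch below computes the sustained-to-peak ratio and sustained GFLOPS per watt from the quoted numbers only (treating the "~" sustained values as exact), so it inherits whatever assumptions sit behind the original Nallatech comparison.

```python
# Figures quoted on slide 21 (Nallatech comparison); sustained values are approximate.
SPECS = {
    "Itanium 2": {"peak_gflops": 8.0, "sustained_gflops": 2.0, "watts": 130.0},
    "Virtex-II Pro 2VP100": {"peak_gflops": 38.0, "sustained_gflops": 19.0, "watts": 15.0},
}


def sustained_fraction(spec: dict) -> float:
    """Share of peak performance achieved in sustained operation."""
    return spec["sustained_gflops"] / spec["peak_gflops"]


def sustained_gflops_per_watt(spec: dict) -> float:
    """Sustained performance per watt of power consumption."""
    return spec["sustained_gflops"] / spec["watts"]


if __name__ == "__main__":
    for name, spec in SPECS.items():
        print(f"{name}: {sustained_fraction(spec):.0%} of peak, "
              f"{sustained_gflops_per_watt(spec):.3f} GFLOPS/W")
```

By these figures the FPGA sustains 50% of its peak against 25% for the processor, and delivers roughly 80 times more sustained GFLOPS per watt.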

  22. High-Performance Compute
     [Figure: bar chart comparing Pentium, Virtex-II Pro and Virtex-4 on computation (GOPS), memory bandwidth (Gbps) and I/O bandwidth (GB/s)]

  23. Processor Use Models
     (1) State machine: lowest cost; no peripherals, no RTOS and no bus structures. Example: VGA and LCD controllers. Low/high performance.
     (2) Microcontroller: medium cost; some peripherals, possible bus structure. Example: control and instrumentation. Moderate performance.
     (3) Custom embedded: highest integration; extensive peripherals, RTOS and bus structures. Example: networking and wireless. High performance.
