Intro to SKARAB for programmers (and how to use HMC!)
Jason Manley
2017 CASPER workshop
Hardware
● Virtex 7, 690T FPGA
● 4 Mezzanine sites per SKARAB
  ○ 2 in front, 2 in back
  ○ 16 SERDES links per site
  ○ Designed to early PowerMX standard.
● Fans over-provisioned, normally run around 20% - 30% rated speed.
Hardware
Mezzanine cards allow trading off of memory vs IO capacity. Four cards per SKARAB.
● Only one type of off-chip memory currently available on SKARAB: HMC.
  ○ HMC replaces QDR/SRAM and also DRAM found on previous CASPER boards.
● 40G mezzanine card offers 4x40G QSFP Ethernet ports, can drive optics or copper.
  ○ No more complicated, flaky PHY chips that need firmware loaded to function properly.
● An ADC is now also available, with other cards to follow.
Hardware: HMC Mezzanine card
● 1x HMC device per card
● HMC is 2GiB or 4GiB
● Two independent interfaces per card: 2x “half-width” (8 lane) links at 10Gbps per lane.
● Each link is bi-directional.
● Up to 160Gbps throughput per card.
Hardware: QSFP 40G mezzanine card
● Quad 40G QSFP Ethernet card.
● PHY-less (purely passive).
● Does have a small microprocessor for SFP management (power, temp etc).
● Able to drive optics directly.
● Tested with up to 7m passive cables.
● Recommend AOC (Active Optical Cables) for anything 5m and over.
● Does not currently work in “breakout” mode with spider/octopus cables (turning one 40G port into 4x10G ports).
Compared to existing CASPER hardware

                         iBOB      ROACH     ROACH-2    SKARAB
Logic cells              53K       94K       476K       693K
DSP slices               232       640       2016       3600
BRAM capacity            4.2Mb     8.8Mb     38Mb       53Mb
SRAM capacity            2x18Mb    2x36Mb    4x144Mb    HMC (*)
SRAM bandwidth           9Gbps     43Gbps    200Gbps    HMC (*)
DDR capacity (max)       -         1x8Gb     1x16Gb     HMC (*)
DDR bandwidth (total)    -         38Gbps    50Gbps     HMC (*)
Ethernet ports           2x10G     4x10G     8x10G      < 16x40G

(*) SKARAB uses HMC in place of both SRAM and DDR: < 8x 32Gib, 8x 30Gbps R+W.
Hardware
Uses the JASPER flow, not the traditional CASPER flow.
Python now forms the backend for managing:
● busses
● ‘Yellowblocks’
Backend tool is Xilinx VIVADO, not ISE (hard break at Virtex-6/ROACH-2; no overlapping tool support). (Recall Wesley’s JASPER/VIVADO talk on Monday.)
SKARAB incorporates all the lessons learnt from SKA-SA’s sizable deployments of iBOB/BEE2, ROACH-1 and ROACH-2s.
After compiling a bitstream, interacting with a SKARAB from a network-attached control computer using any of the standard tools is the same as working with any previous CASPER hardware. But it is quite different under the hood...
Remotely controlling SKARABs
Previous CASPER boards (iBOBs, BEE2s, ROACH1s, ROACH2s) all had out-of-band management ports (separate 100Mbps or 1G Ethernet ports from the 10G data ports).
SKARAB can do everything in-band: data, management as well as (re)programming.
● Eventually over any network interface,
● But currently only over the 1G port or the first 40G port.
● Work in progress!
SKARAB does not have a separate management processor.
● It uses a lightweight on-FPGA ‘softcore’ MicroBlaze.
● The MicroBlaze is reloaded whenever the FPGA is reprogrammed.
● Process must be robust, and managed carefully, to avoid losing comms to boards.
Simpler setup and maintenance:
● Just need a power cable and network cable to each SKARAB.
● Network appliance: no need for managing boot servers, Linux filesystems etc.
● Entire platform can be managed remotely, including upgrading all firmware over the network.
● Designed for large-scale deployments (MeerKAT, with an eye on SKA).
SKARAB startup sequencing
● Onboard flash memory ships with two (space for up to four) bitstreams pre-loaded.
  ○ “Golden Image” and “Multiboot Image”
  ○ Exactly the same bitstream.
  ○ Tries to boot the multiboot image quickly. If that fails, falls back to the golden image more slowly.
  ○ You can load your own images here, if you want, but that’s not the idea…
● Most large CASPER deployments have a control computer on the network to configure the FPGA boards. SKARAB is designed to work in this environment.
● Host computer stores your various bitstreams.
● So, when SKARAB boots, it loads the flash image and requests a DHCP lease. The server then knows about the new SKARAB board on the network, and can load whichever DSP gateware image, configure registers and set it to work.
● Default is for DHCP on all network ports on startup.
  ○ (SKARAB wants a DHCP server. Hard-coding IP addresses in your bitstreams is no longer so easy.)
  ○ Hostname support, for example, skarab020394-01.
  ○ LLDP support (boards announce themselves to switches).
● MAC addresses are based on serial number and network port.
  ○ First 40G port has hostname skarab020302-01, with MAC 06:50:02:03:02:01
● After loading a DSP bitstream, network interfaces flap and a new DHCP transaction ensues. Depending on your DHCP server and network (switch), it can take a few seconds to bring the link back up.
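The hostname and MAC example above can be turned into a small lookup helper, handy when pre-populating a DHCP server. This is only a sketch: the pattern (fixed prefix 06:50, then the six serial digits as three octets, then the port number as the last octet, written in decimal) is inferred from the single example on the slide and is an assumption, and `skarab_identity` is a hypothetical helper, not part of casperfpga.

```python
from typing import Tuple


def skarab_identity(serial: str, port: int) -> Tuple[str, str]:
    """Return (hostname, mac) for one network port of a SKARAB.

    Assumption: the layout is inferred from the one example in the slides
    (serial 020302, port 01 -> skarab020302-01, 06:50:02:03:02:01).
    """
    if len(serial) != 6 or not serial.isdigit():
        raise ValueError("serial must be six decimal digits, e.g. '020302'")
    if not 0 <= port <= 99:
        raise ValueError("port must fit in two decimal digits")

    hostname = f"skarab{serial}-{port:02d}"
    # Prefix 06:50, then the serial split into pairs, then the port number.
    octets = ["06", "50", serial[0:2], serial[2:4], serial[4:6], f"{port:02d}"]
    return hostname, ":".join(octets)


if __name__ == "__main__":
    print(skarab_identity("020302", 1))
```

For the serial and port quoted on the slide this reproduces the hostname and MAC shown there; behaviour for other ports should be checked against a real board.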
What’s working?

Working                                              Not (yet) working
Basic JASPER toolflow                                Legacy CASPER toolflow (and never will)
Polling sensors (power, temp, fans etc)              Automatic fan speed control
HMC Mezzanine cards                                  Retrieval of logs for hardware errors
First 40G ethernet port                              Arbitrary combinations of Ethernet and HMC cards
1G ethernet port                                     Onboard USB JTAG bridge
Remote reprogramming and control                     Fast (~1 second) remote reloading of FPGA gateware
Remote updates (flash firmware)                      Large wishbone bus (timing implications; WIP)
DHCP, LLDP, ARP, PING and other network services     Comprehensive DRC during compile
Python casperfpga interfaces (mostly; WIP)
Tips for designs
● Keep to the UDP port compiled-in to your yellowblock for all your high-speed traffic.
  ○ Otherwise you can overwhelm the MicroBlaze with traffic; especially problematic while trying to reprogram.
  ○ Yellowblock default is to use 7148 (SPEAD default at SKA-SA).
  ○ Don’t ever use:
    ■ 7778 decimal (0x1e62); that’s for controlling the MicroBlaze.
    ■ 29000 decimal (0x7148); that’s used for reprogramming.
● In the event of a network failure at startup, SKARAB will try indefinitely to get a DHCP lease.
● LEDs on front panel indicate DHCP success on golden image (useful for basic/visual debugging).
● Check for updates regularly. Development’s very fluid at the moment, and nothing is “stable” yet.
● Current bus architecture limitations prevent very large numbers of attachments (~50 slaves ok).
● Good news is that V7 seems to have much better routing resources, especially when building large BRAMs. Timing is much easier for large FFTs and snapshot blocks than on V6.
● Large designs easily meet timing at 240MHz.
● You’ll get to play with all this stuff during Adam’s SKARAB tutorials.
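The port rules above are easy to get wrong because the reprogramming port, 0x7148 in hex, looks just like the decimal data port 7148. A minimal sketch of a pre-flight check follows; the port numbers come from the slide, while `check_data_port` and the constant names are hypothetical and not part of any CASPER tool.

```python
DEFAULT_DATA_PORT = 7148  # decimal; yellowblock default (SPEAD default at SKA-SA)

# Ports the slide says must never carry user traffic.
RESERVED_PORTS = {
    0x1E62: "MicroBlaze control",  # 7778 decimal
    0x7148: "reprogramming",       # 29000 decimal, NOT the data port 7148
}


def check_data_port(port: int) -> int:
    """Return the port unchanged if it is safe for high-speed data traffic."""
    if not 0 < port < 65536:
        raise ValueError(f"{port} is not a valid UDP port")
    if port in RESERVED_PORTS:
        raise ValueError(f"UDP port {port} is reserved for {RESERVED_PORTS[port]}")
    return port


if __name__ == "__main__":
    print(check_data_port(DEFAULT_DATA_PORT))
    for bad in (7778, 29000):
        try:
            check_data_port(bad)
        except ValueError as err:
            print(err)
```

Calling the check once when a design is configured, before any traffic is started, keeps a mistyped port from flooding the MicroBlaze.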
HMC memory
What is Hybrid Memory Cube?
● Stacked DRAM on a chip, with a built-in management layer.
● Designed and optimised for very high throughput, not low-latency. Perfect for RA instrumentation!
● HMC takes care of itself, including error detection on memory cells and IO operations.
  ○ Don’t have to deal with refreshes, bank management etc in FPGA controller anymore.
● HMC contains “smarts”... has buffers and a small ALU. (can build accumulator inside the memory!)
● External interface is high speed serial (“SERDES”) links.
● HMC supports up to 4 sets of bidirectional 16-lane links, with each lane operating up to 15Gbps...
  ○ That’s up to 1.9Tbps. It’s FAST!
● Micron already on 3rd generation HMC. SKARAB uses 2nd generation at lower speeds.
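The throughput figures quoted in these slides follow from links x lanes x lane rate. The sketch below reproduces them; note that, as an observation from the arithmetic rather than a statement in the slides, the 1.9Tbps figure only comes out when both directions are counted, whereas the 160Gbps per SKARAB card matches a single direction. `hmc_throughput_gbps` is a hypothetical helper.

```python
def hmc_throughput_gbps(links: int, lanes_per_link: int, gbps_per_lane: float,
                        both_directions: bool = False) -> float:
    """Raw SERDES throughput in Gbps, ignoring protocol (flit) overhead."""
    one_way = links * lanes_per_link * gbps_per_lane
    return 2 * one_way if both_directions else one_way


# HMC maximum from the slide: 4 full-width links, 16 lanes, 15Gbps per lane.
hmc_max = hmc_throughput_gbps(4, 16, 15, both_directions=True)

# SKARAB HMC card: 2 half-width links, 8 lanes, 10Gbps per lane.
skarab_card = hmc_throughput_gbps(2, 8, 10)

if __name__ == "__main__":
    print(f"HMC maximum : {hmc_max / 1000:.2f} Tbps (both directions)")
    print(f"SKARAB card : {skarab_card:.0f} Gbps (one direction)")
```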
Accessing HMC memory
● Yellowblock packages your instructions (read/write) into flits.
● A flit is a “packet” containing a header (instruction) and data (see HMC datasheet for details).
● Fortunately, all of this is abstracted away for the user;
  ○ Yellowblock makes HMC look like a conventional memory interface.
● Each HMC yellowblock offers two dual-ported interfaces.
● Simultaneous read and write operations are combined into a single flit.
● Memory is organised into Vaults, Banks and DRAMs.
● The controller allows you to arbitrarily map these into your address bits.
● By default, SKARAB’s implementation optimises for linear reads and writes.

  Address bit:  a26 ... a8 | a7 a6 a5 a4 | a3 a2 a1 a0
  Maps to:      D19 ... D0 | B3 B2 B1 B0 | V3 V2 V1 V0

● Yellowblock accesses 256 bits at a time, and presents a 256 bit bus.
  ○ One clock cycle per read and/or write request.
  ○ No need for burst reads or writes: truly random access possible.
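To see why the default mapping favours linear access, it helps to decode a few consecutive addresses. The sketch below assumes the layout shown on the slide (vault in the lowest four address bits, bank in the next four, DRAM address in the remaining upper bits) and a 27-bit address a26..a0; `decode_hmc_address` is a hypothetical helper for illustration, not the yellowblock's actual logic.

```python
VAULT_BITS = 4     # V3..V0 -> a3..a0
BANK_BITS = 4      # B3..B0 -> a7..a4
ADDRESS_BITS = 27  # a26..a0, as drawn on the slide (assumption)
WORD_BYTES = 32    # the yellowblock presents a 256-bit bus


def decode_hmc_address(addr: int):
    """Split a word address into (vault, bank, dram) under the default map."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address out of range")
    vault = addr & ((1 << VAULT_BITS) - 1)
    bank = (addr >> VAULT_BITS) & ((1 << BANK_BITS) - 1)
    dram = addr >> (VAULT_BITS + BANK_BITS)
    return vault, bank, dram


if __name__ == "__main__":
    # Consecutive addresses step through all 16 vaults before the bank changes.
    for addr in (0, 1, 2, 15, 16, 17, 255, 256):
        print(addr, decode_hmc_address(addr))
    # If each address selects one 256-bit word, 27 bits cover 4 GiB.
    print((1 << ADDRESS_BITS) * WORD_BYTES // 2**30, "GiB")
```

Under this reading, a linear sweep spreads back-to-back requests across vaults first and banks second, so neighbouring requests do not queue up behind one another in the same vault.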