Review • Number Formats and Simple Arithmetic • FPGA Structure (CLB, Routing, IO, Clocks) • Pipelining (Resources vs. Speed vs. Latency) • Memories and Waveform Generation • ADC and DAC Applications in DSP • Constraints (Timing and Placement) • More Complex Arithmetic (Series Expansion and the CORDIC algorithm for sin & cos) • DSP Resources (DSP48E2 Block) • Filtering: FIR and IIR Implementations • Serial multi-rate DSP (decimation and interpolation) and applications
Looking Forward • Multi-Rate, Parallel DSP (1 week) • FFTs (2 weeks) • Digital Compensation (1 week) • PLLs • AGCs • Complete DSP Chains & SDRs (2 weeks) • Miscellaneous (1 week) • Pseudo-Random Noise Generators and CRC checks • PWM and PDM (audio systems)
Parallel Processing • In some instances, the timing requirements cannot be met with a serial process even after a DSP function is fully pipelined. • Example: • In desktop computers, video processing requires the values of many pixels to be simultaneously computed within the refresh rate. • Since many of the operations are independent, GPUs are well suited to handle the computational load in a parallel fashion.
Parallel Processing • FPGAs are well suited to handle parallel tasks. • We need to understand what can be computed independently, or how to modify the DSP algorithm to work in a parallel fashion. • As with pipelining, there is a trade-off between resource usage and achievable clock rates. • Common Applications • FFTs • Video Processing • GSPS ADCs and DACs
Parallel Processing • Many FPGAs now have dedicated hardware components to facilitate the use of high-speed data converters that operate at rates exceeding the FPGA fabric clock rate. • Gigabit Transceivers • Serializers • Deserializers • RFSoC Integrated ADCs and DACs • Extreme care must be taken to understand clock rates and data formats. • ODDR processing of the DAC channels on the dev board.
GigaBit Transceiver
Zynq RFSoC • DATA_ADC0[127:0]: 8x 16-bit samples per converter; with up to 16 converters, 128 values per clock cycle. • DATA_ADC0[255:0]: 16x 16-bit samples per converter; with up to 16 converters, 256 values per clock cycle.
Zynq RFSoC I/Q Mixers, Decimation, Interpolation all implemented in dedicated hardware.
Zynq RFSoC
Serializer • Part of the IO Logic • Data_In D8 to D1 • Data_Out OQ • Achieve output data rates that are up to 14x fabric rate.
Detailed View • 4-to-1 • Signals • 2 clocks • Global Clock: Slower clock from FPGA fabric. • IO Clock: High-speed Input/Output clock. • SDR and DDR • IO Data • 4 input lines • 1 output line • Enables • Training Data
Detailed View • Structure • Registers • Two Columns • Parallel Load • Global Clock • Shift Regs, Serialized output. • IO Clock • Muxes • Shift data from parallel load to Shift Registers. • Use Training Data. • Width Expansion
Detailed View • Operation • Global Clock • Loads parallel data from D4 to D1. • Strobe • Not used on OSERDESE2. • Selects mux to shift parallel data into shift registers. • I/O Clock • When Strobe is high, loads shift registers with new data. • When Strobe is low, shifts out the serial data. • Train pin selects preset training data rather than D4 to D1.
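The load-then-shift behavior described on this slide can be sketched as a small behavioral model. This is only an illustration of the idea, not the vendor primitive: the module name, the TRAIN_PATTERN parameter, and the choice that D1 is shifted out first are assumptions, and the model ignores the phase alignment between the two clocks that real hardware requires.

```verilog
// Behavioral sketch of a 4:1 serializer (hypothetical model, not a primitive).
// Assumes ioclk runs at 4x gclk (SDR) and strobe is high for one ioclk
// cycle per 4-bit word.
module ser4to1_model (
    input  wire       gclk,    // slow global clock from the fabric
    input  wire       ioclk,   // high-speed I/O clock
    input  wire       strobe,  // high: load shift register, low: shift
    input  wire       train,   // high: load training data instead of D4..D1
    input  wire [3:0] d,       // d[0] = D1 ... d[3] = D4
    output wire       oq       // serialized output
);
    // Hypothetical training pattern; the real preset is device specific.
    parameter [3:0] TRAIN_PATTERN = 4'b0101;

    reg [3:0] par_reg;    // parallel-load column (global clock domain)
    reg [3:0] shift_reg;  // shift-register column (I/O clock domain)

    // Global clock: capture the parallel data D4..D1.
    always @(posedge gclk)
        par_reg <= d;

    // I/O clock: reload on strobe, otherwise shift one bit toward the output.
    always @(posedge ioclk)
        if (strobe)
            shift_reg <= train ? TRAIN_PATTERN : par_reg;
        else
            shift_reg <= {1'b0, shift_reg[3:1]};

    // Assumption: D1 (bit 0) leaves the serializer first.
    assign oq = shift_reg[0];
endmodule
```

In this sketch the mux in front of the shift register is the `train ? ... : ...` expression, and the strobe plays the role of the mux select between "load" and "shift".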
Example
Example: 2.5 GSPS DAC • Think about the required clock and data rates. • Device requires two deinterleaved DDR data paths. • Data Rate (per path): 2.5 GSPS/2 = 1.25 GSPS • Clock Freq (per path): 2.5 GHz/4 = 625 MHz (DDR: two samples per clock cycle) • FPGA OSERDES IO Clock operating in DDR mode. • We will drive each data path with 8:1 OSERDESE2 serializers. • This choice is based on the available FPGA (Host Processor in the figure).
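The rate arithmetic for this example can be written out step by step. The symbol names below are introduced here for illustration only; the numbers come from the slide, and the last two lines assume the 8:1 serializers chosen above.

```latex
\begin{align*}
R_{\text{path}} &= \frac{2.5\ \text{GSPS}}{2\ \text{paths}} = 1.25\ \text{GSPS} \\
f_{\text{IOCLK}} &= \frac{R_{\text{path}}}{2\ \text{(DDR)}} = 625\ \text{MHz}
  \quad \left(= \frac{2.5\ \text{GHz}}{4}\right) \\
f_{\text{GCLK}} &= \frac{R_{\text{path}}}{8} = 156.25\ \text{MHz} \\
N_{\text{samples per GCLK cycle}} &= 2\ \text{paths} \times 8 = 16
\end{align*}
```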
Example: 2.5 GSPS DAC • FPGA Requirements • Global Clock (GCLK) • Not DDR • 1.25 GSPS/8 = 156.25 MHz • Every clock cycle must update 16x 14-bit data samples.
[Figure: two banks of 8:1 SerDes blocks clocked by GCLK and IOCLK. The 14-bit bus DB0 carries the even samples D0, D2, ..., D14; the 14-bit bus DB1 carries the odd samples D1, D3, ..., D15.]
    reg [13:0] D [15:0];
    always @(posedge GCLK)
    begin
        D[15] <= ?;
        D[14] <= ?;
        ...
        D[0] <= ?;
    end
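One way to picture the wiring in the figure is a module that regroups the 16 samples into per-bit serializer inputs: even samples to DB0, odd samples to DB1. The sketch below is pure wiring and makes several assumptions: the module and port names are hypothetical, the samples are passed as one flattened vector (Verilog-2001 does not allow array ports), and slot 0 of each group is the one transmitted first.

```verilog
// Hypothetical packing stage between the sample registers and the SerDes.
// samples[14*n +: 14] holds sample D[n], n = 0..15.
// db0_par[8*b +: 8] holds the 8 parallel inputs of the SerDes for bit b of DB0;
// db1_par[8*b +: 8] does the same for DB1.
module dac_sample_pack (
    input  wire [16*14-1:0] samples,
    output wire [14*8-1:0]  db0_par,
    output wire [14*8-1:0]  db1_par
);
    genvar b, k;
    generate
        for (b = 0; b < 14; b = b + 1) begin : g_bit
            for (k = 0; k < 8; k = k + 1) begin : g_slot
                // Slot k of DB0 takes bit b of even sample D[2k].
                assign db0_par[8*b + k] = samples[14*(2*k)     + b];
                // Slot k of DB1 takes bit b of odd sample D[2k+1].
                assign db1_par[8*b + k] = samples[14*(2*k + 1) + b];
            end
        end
    endgenerate
endmodule
```

Under the stated assumption, slot 0 of each 8-bit group would connect to the serializer input that is shifted out first, so the DAC receives D0, D2, ..., D14 on DB0 while DB1 carries D1, D3, ..., D15.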
Clock Gen. and Dist. • Informational Resources • IOSERDES: SelectIO User Guide • BUFR & BUFIO: Clocking User Guide • Instantiation: Libraries Guide
Example: 2.5 GSPS DAC • FPGA Requirements, with clocking • Global Clock (GCLK) • Not DDR • 1.25 GSPS/8 = 156.25 MHz • Every clock cycle must update 16x 14-bit data samples.
[Figure: the SerDes banks for DB0 (even samples D0 to D14) and DB1 (odd samples D1 to D15) from the previous slide, plus an additional 8:1 SerDes loaded with the alternating pattern 0, 1, 0, 1, ... to generate DCI. A BUFIO drives IOCLK and a BUFR (DIV) drives GCLK.]
Waveform Generation • Waveform Generation for a serializer becomes more complicated. • Start Simple: Using the previous example, how would a linear ramp be generated?
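A possible starting point for the ramp question, assuming the 16-samples-per-GCLK format of the previous example: every clock cycle, sample n is the running base value plus n steps, and the base then advances by 16 steps. The module name, the STEP parameter, the synchronous reset, and the flattened output vector are assumptions made for this sketch; the ramp wraps modulo 2^14.

```verilog
// Sketch of a 16-sample-per-clock linear ramp generator (hypothetical names).
// samples[14*n +: 14] holds sample D[n]; D[0] is the earliest sample in time.
module ramp16 #(
    parameter [13:0] STEP = 14'd1   // ramp increment per DAC sample
) (
    input  wire             gclk,
    input  wire             rst,    // synchronous reset, active high
    output reg  [16*14-1:0] samples
);
    reg [13:0] base;  // value of D[0] for the next clock cycle
    integer n;

    always @(posedge gclk) begin
        if (rst) begin
            base    <= 14'd0;
            samples <= {16*14{1'b0}};
        end else begin
            // All 16 samples are computed in parallel from the same base.
            for (n = 0; n < 16; n = n + 1)
                samples[14*n +: 14] <= base + n * STEP;
            // Advance by one full block of 16 samples (wraps at 14 bits).
            base <= base + 16 * STEP;
        end
    end
endmodule
```

With STEP = 1, the first cycle after reset produces D[0..15] = 0, 1, ..., 15 and the next cycle produces 16, 17, ..., 31. The key difference from a serial design is that the accumulator advances by 16 steps per clock while the 16 offsets are added in parallel.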