
FFTs (EECS 360 Notes)



  1. FFTs
     • Overview
     • EECS 360 Notes
     • Methods descriptions
     • Hardware Implementations
     • Direct Implementation
     • Goertzel
     • Re-indexing
     • Chirp-z
     • Rader

  2. Fourier Methods
     Columns: Method (Tables); Time Domain (continuous/discrete, periodicity); Frequency Domain (continuous/discrete, periodicity); Transform; Transfer Function (s or z translation).
     • CTFS (6.1, 6.2)
       Time domain: continuous (t), periodic (T = 1/Δf).
       Frequency domain: discrete (k, Δf = 1/T, f = k∙Δf), aperiodic.
       Transform: $x(t) = \sum_{k=-\infty}^{\infty} c_x[k]\, e^{j2\pi k t/T}$, $c_x[k] = \frac{1}{T}\int_{0}^{T} x(t)\, e^{-j2\pi k t/T}\, dt$
       Translation: s = j∙2π∙k/T or j∙2π∙k∙Δf
     • CTFT (6.3-6.6)
       Time domain: continuous (t), aperiodic.
       Frequency domain: continuous (f), aperiodic.
       Transform: $x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j2\pi f t}\, df$, $X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi f t}\, dt$
       Translation: s = j∙2π∙f
     • DTFS
       Time domain: discrete (n, Δt = 1/BW, t = n∙Δt), periodic (N, T = N∙Δt).
       Frequency domain: discrete (k, Δf = 1/T, f = k∙Δf), periodic (N, BW = N∙Δf).
       Transform: $x[n] = \sum_{k=0}^{N-1} c_x[k]\, e^{j2\pi kn/N}$, $c_x[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
       Translation: z = e^(j∙2π∙k/T) or e^(j∙2π∙k∙Δf)
     • DTFT
       Time domain: discrete (n, t = n∙Δt), aperiodic.
       Frequency domain: continuous (f), periodic (BW = 1/Δt).
       Transform: $x[n] = \int_{0}^{BW} X(f)\, e^{j2\pi f\, n\Delta t}\, df$, $X(f) = \frac{1}{BW}\sum_{n=-\infty}^{\infty} x[n]\, e^{-j2\pi f\, n\Delta t}$
       Translation: z = e^(j∙2π∙f)
     *Unless noted otherwise, Δt is assumed to be 1.

  3. Fourier Methods: DTFS variation (The DFT or FFT)
     Columns: Method (Tables); Time Domain (continuous/discrete, periodicity); Frequency Domain (continuous/discrete, periodicity); Transform; Transfer Function (s or z translation).
     • DTFS
       Time domain: discrete (n, Δt = 1/BW, t = n∙Δt), periodic (N, T = N∙Δt).
       Frequency domain: discrete (k, Δf = 1/T, f = k∙Δf), periodic (N, BW = N∙Δf).
       Transform: $x[n] = \sum_{k=0}^{N-1} c_x[k]\, e^{j2\pi kn/N}$, $c_x[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
       Translation: z = e^(j∙2π∙k/T) or e^(j∙2π∙k∙Δf)
     • DFT (MATLAB: FFT and IFFT)
       Time domain: discrete (n, Δt = 1/BW, t = n∙Δt), periodic (N, T = N∙Δt).
       Frequency domain: discrete (k, Δf = 1/T, f = k∙Δf), periodic (N, BW = N∙Δf).
       IFFT: $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\, e^{j2\pi kn/N}$
       FFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
       Translation: z = e^(j∙2π∙k/T) or e^(j∙2π∙k∙Δf)
     *Unless noted otherwise, Δt is assumed to be 1.

  4. DFT equation
     DFT (MATLAB: FFT and IFFT)
     IFFT: $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\, e^{j2\pi kn/N}$
     FFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
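As a quick check of the FFT/IFFT pair above, the two sums can be evaluated literally. This is a minimal sketch in Python rather than MATLAB; the function names `dft` and `idft` and the test vector are illustrative choices, not from the slides, and the O(N²) loops are for clarity only.

```python
import cmath

def dft(x):
    """X[k] = sum_{n=0}^{N-1} x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x[n] = (1/N) * sum_{k=0}^{N-1} X[k] * exp(+j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Illustrative real-valued input (assumed data, not from the slides).
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
X = dft(x)
x_back = idft(X)   # round trip recovers x up to floating-point error
```

For a real input, X[0] is the plain sum of the samples and X[N-k] is the complex conjugate of X[k], which is the symmetry the later 8-point slide relies on.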

  5. Goertzel Algorithm
     FFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
     • Expand the sum, with $W_N = e^{-j2\pi/N}$:
       $X[k] = W_N^{0k}x[0] + W_N^{1k}x[1] + W_N^{2k}x[2] + \ldots + W_N^{(N-2)k}x[N-2] + W_N^{(N-1)k}x[N-1]$
     • Multiply by $W_N^{-Nk}$ (which equals 1):
       $X[k] = (W_N^{-Nk})(W_N^{0k}x[0] + W_N^{1k}x[1] + W_N^{2k}x[2] + \ldots + W_N^{(N-2)k}x[N-2] + W_N^{(N-1)k}x[N-1])$
       $X[k] = W_N^{-Nk}x[0] + W_N^{-(N-1)k}x[1] + W_N^{-(N-2)k}x[2] + \ldots + W_N^{-2k}x[N-2] + W_N^{-1k}x[N-1]$
     • Factor out $W_N^{-k}$ repeatedly:
       $X[k] = (W_N^{-(N-1)k}x[0] + W_N^{-(N-2)k}x[1] + W_N^{-(N-3)k}x[2] + \ldots + W_N^{-1k}x[N-2] + x[N-1])\,W_N^{-k}$
       $X[k] = (\ldots((W_N^{-2k}x[0] + W_N^{-1k}x[1] + x[2])W_N^{-k} + \ldots + x[N-2])W_N^{-k} + x[N-1])W_N^{-k}$
       $X[k] = (\ldots(((W_N^{-1k}x[0] + x[1])W_N^{-k} + x[2])W_N^{-k} + \ldots + x[N-2])W_N^{-k} + x[N-1])W_N^{-k}$
       $X[k] = ((\ldots(((x[0])W_N^{-k} + x[1])W_N^{-k} + x[2])W_N^{-k} + \ldots + x[N-2])W_N^{-k} + x[N-1])W_N^{-k}$
     • Integrator multiplied by $W_N^{-k}$ every iteration.

  6. DFT equation
     $X[k] = ((\ldots(((x[0])W_N^{-k} + x[1])W_N^{-k} + x[2])W_N^{-k} + \ldots + x[N-2])W_N^{-k} + x[N-1])W_N^{-k}$
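The nested form above maps directly onto a loop: add the next sample to the running value, then multiply by $W_N^{-k}$. The following is a minimal Python sketch of that recurrence for a single bin; `dft_bin_nested` and the sample data are hypothetical names and values used only for illustration.

```python
import cmath

def dft_bin_nested(x, k):
    """Evaluate X[k] with the nested (Horner-style) form:
    acc <- (acc + x[n]) * W_N^{-k}, for n = 0 .. N-1."""
    N = len(x)
    w = cmath.exp(2j * cmath.pi * k / N)   # W_N^{-k}, since W_N = exp(-j*2*pi/N)
    acc = 0j
    for sample in x:
        acc = (acc + sample) * w           # one add and one complex multiply per sample
    return acc

def dft_bin_direct(x, k):
    """Reference: the plain DFT sum for one bin."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))

# Illustrative input (assumed data, not from the slides).
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
nested = [dft_bin_nested(x, k) for k in range(len(x))]
direct = [dft_bin_direct(x, k) for k in range(len(x))]
```

After N iterations x[0] has been multiplied by $W_N^{-Nk} = 1$ and x[n] by $W_N^{-(N-n)k} = W_N^{nk}$, so the two lists agree to floating-point error.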

  7. Remember the Integrator Filter
     • Sample Domain Equation
     • 1st order IIR filter with a0 = 1: y[n] = x[n] + y[n-1]
     • Z domain: $H(z) = 1/(1 - z^{-1})$
     • Pole at z = 1 (Critically Stable)
     [Figure: integrator block diagram with a $z^{-1}$ delay.]
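A short Python sketch of the integrator difference equation y[n] = x[n] + y[n-1], assuming a zero initial state; the function name `integrator` is illustrative.

```python
def integrator(x):
    """First-order IIR integrator: y[n] = x[n] + y[n-1], with y[-1] = 0."""
    y = []
    state = 0.0                 # contents of the z^-1 delay register
    for sample in x:
        state = sample + state  # adder output is fed back through the delay
        y.append(state)
    return y

impulse_response = integrator([1.0, 0.0, 0.0, 0.0, 0.0])  # [1.0, 1.0, 1.0, 1.0, 1.0]
step_response = integrator([1.0, 1.0, 1.0, 1.0, 1.0])     # [1.0, 2.0, 3.0, 4.0, 5.0]
```

The impulse response never decays and the step response grows without bound, which is what the pole sitting exactly at z = 1 implies.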

  8. DFT equation
     $X[k] = ((\ldots(((x[0])W_N^{-k} + x[1])W_N^{-k} + x[2])W_N^{-k} + \ldots + x[N-2])W_N^{-k} + x[N-1])W_N^{-k}$
     [Figure: integrator with a $W_N^{-k}$ multiplier and a $z^{-1}$ delay.]

  9. DFT equation
     $X[k] = ((\ldots(((x[0])W_N^{-k} + x[1])W_N^{-k} + x[2])W_N^{-k} + \ldots + x[N-2])W_N^{-k} + x[N-1])W_N^{-k}$
     [Figure: integrator with a $W_N^{-k}$ multiplier and a $z^{-1}$ delay. Labels: n-counter, wrap, rst.]
     Adders: 1+2 = 3. Multipliers: 4.

  10. 8-Point DFT
      [Figure: five accumulators, each with a $z^{-1}$ delay. Multiplier values and outputs:]
      • 1: X[0]
      • 0.7071+j0.7071: X[1], conj(X[7])
      • j1: X[2], conj(X[6])
      • -0.7071+j0.7071: X[3], conj(X[5])
      • -1: X[4]
      Adders: 11. Multipliers: 14. Delays: 5.

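  11. N-Point DFT (even)
      [Figure: N/2+1 accumulators in parallel, each with a $z^{-1}$ delay. Multiplier labels and outputs as drawn (output labels carried over from the 8-point case):]
      • 1: X[0]
      • $W_N^{1}$: X[1], conj(X[7])
      • $W_N^{2}$: X[2], conj(X[6])
      • $W_N^{N/2-1}$: X[3], conj(X[5])
      • -1: X[N/2]
      In Parallel. Adders: 2+(N/2-1)*3. Multipliers: 2+(N/2-1)*4. Registers: N. Latency: N.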
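The resource formulas above can be checked against the 8-point case (11 adders, 14 multipliers), and the 8-point multiplier values are just $W_8^{-k} = e^{j2\pi k/8}$ for k = 0..4. A small Python sketch, with hypothetical helper names:

```python
import cmath

def parallel_real_dft_resources(N):
    """Adder and multiplier counts quoted for the parallel real-input DFT (even N)."""
    if N % 2:
        raise ValueError("N must be even")
    inner = N // 2 - 1                      # bins 1 .. N/2-1 need a full complex multiply
    return {"adders": 2 + inner * 3, "multipliers": 2 + inner * 4}

def feedback_coefficients(N):
    """W_N^{-k} = exp(+j*2*pi*k/N) for k = 0 .. N/2."""
    return [cmath.exp(2j * cmath.pi * k / N) for k in range(N // 2 + 1)]

counts_8 = parallel_real_dft_resources(8)   # {'adders': 11, 'multipliers': 14}
coeffs_8 = feedback_coefficients(8)         # 1, 0.7071+0.7071j, 1j, -0.7071+0.7071j, -1 (to rounding)
```

Only bins 0..N/2 are computed because, for real input, the remaining bins are the complex conjugates of these.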

  12. N-Point Complex DFT (even)
      [Figure: N accumulators in parallel. Labels: n-counter, Data In Memory, $W_N^{0}$, $W_N^{1}$, ..., $W_N^{N-1}$ (from ROM), rst, ena, $z^{-1}$, X[0], X[1], ..., X[N-1], Data Out Memory. The memory connections are marked "???".]
      In Parallel. Adders: N*4. Multipliers: N*4. Registers: N. Latency: N.

  13. N-Point DFT (even)
      [Figure: one accumulator used in series. Labels: done, n-counter, k-counter, CORDIC or ROM ($W_N^{-k}$), Data In Memory, concat I&Q, Dual Port Data Out Memory, rst, $z^{-1}$.]
      In Series. Adders: 3. Multipliers: 4. Registers: 2. Latency: N*N/2. Excludes CORDIC and storage.
      At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.

  14. N-Point complex DFT (even)
      [Figure: one complex accumulator used in series. Labels: done, n-counter, k-counter, CORDIC or Cos/Sin ROM ($W_N^{-k}$), concat I&Q, Data In Memory, Data Out Memory, addr, ena, rst, $z^{-1}$.]
      In Series. Adders: 4. Multipliers: 4. Registers: 2. Latency: N*N. Excludes CORDIC and storage.
      At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms.

  15. Trade Offs
      • In Series: Adders: 4. Multipliers: 4. Registers: 2. Latency: N*N. Excludes CORDIC and storage. At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms.
      • In Parallel: Adders: N*4. Multipliers: N*4. Registers: N. Latency: N.
      Direct Trade: 1024x Resources, 1024x Faster.

  16. N-Point complex DFT (even)
      [Figure: two complex accumulators sharing one input stream, fed by $W_N^{-k}$ and $W_N^{-(k+1)}$. Labels: done, n-counter, k-counter, CORDIC or Cos/Sin ROM, concat I&Q, Data In Memory, two Data Out Memories, addr, ena, rst, $z^{-1}$.]
      Partially Parallel. Adders: 4*2. Multipliers: 4*2. Registers: 2*2. Latency: N*N/2. Excludes CORDIC and storage.
      At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.

  17. Trade Offs
      • In Series: Adders: 4. Multipliers: 4. Registers: 2. Latency: N*N. Excludes CORDIC and storage. At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms.
      • Partially Parallel: Adders: 4*2. Multipliers: 4*2. Registers: 2*2. Latency: N*N/2. Excludes CORDIC and storage. At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.
      • In Parallel: Adders: N*4. Multipliers: N*4. Registers: N. Latency: N. Direct Trade: 1024x Resources, 1024x Faster.
      What level of parallelization should we use? Depends on:
      1. Number of resources.
      2. Types of resources (memory access is typically serial; this is the "???" problem on the parallel complex DFT slide).
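The latency figures in this trade-off follow from cycles = N*N / (number of bins computed in parallel) and time = cycles / clock rate. The sketch below evaluates those expressions exactly for N = 1024 at 100 MHz; `dft_latency` and its parameter names are hypothetical, and the model ignores CORDIC and storage just as the slides do.

```python
def dft_latency(N, parallel_bins, f_clk_hz):
    """Return (clock cycles, seconds) for an N-point complex DFT that
    computes `parallel_bins` output bins at a time, one sample per cycle."""
    cycles = N * N // parallel_bins
    return cycles, cycles / f_clk_hz

F_CLK = 100e6  # 100 MHz clock, as quoted on the slides

series = dft_latency(1024, 1, F_CLK)        # 1,048,576 cycles, about 10.49 ms
partial = dft_latency(1024, 2, F_CLK)       #   524,288 cycles, about 5.24 ms
parallel = dft_latency(1024, 1024, F_CLK)   #     1,024 cycles, about 10.24 us
```

The fully parallel design is exactly 1024 times faster than the serial one, in exchange for 1024 times the adders and multipliers.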

  18. Memory Resources (Series 7)
      • Dual Port 36 Kb.
        • Can't access more than 2 address values per cycle.
      • 2x Single Port 18 Kb.
        • Smallest Memory Segment.
      • Number Formats (single addr).
        • Concatenated Real and Imag.
        • 18-bit real and 18-bit imag #s.
        • Results in 2x 512 Complex Values.
      • For N-Value DFT.
        • Parallelization by N/512.
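The "2x 512 complex values" and "parallelization by N/512" figures can be reproduced with a few lines of arithmetic. This sketch assumes 1 Kb = 1024 bits and one 36-bit word (18-bit real concatenated with 18-bit imaginary) per address; rounding N/512 up for sizes that are not a multiple of 512 is an added assumption, and the names are illustrative.

```python
BRAM_18K_BITS = 18 * 1024   # one 18 Kb memory segment
WORD_BITS = 18 + 18         # concatenated 18-bit real and 18-bit imaginary parts

values_per_18k = BRAM_18K_BITS // WORD_BITS   # 512 complex values per segment

def parallel_factor(N, values_per_segment=values_per_18k):
    """Number of 512-value segments (and parallel accumulators) for an N-point DFT.
    Uses ceiling division; the slides only state N/512."""
    return -(-N // values_per_segment)

p_1024 = parallel_factor(1024)   # 2
p_2048 = parallel_factor(2048)   # 4
```

These factors match the two accumulators in the 1024-point design and the four in the 2048-point design.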

  19. 1024-Point 18r:18i complex DFT
      [Figure: two complex accumulators fed by $W_N^{-k}$ and $W_N^{-(k+1)}$. Labels: done, n-counter, k-counter, rst, $z^{-1}$, addr with wea, addr+1 with web.]
      Partially Parallel. Adders: 4*2. Multipliers: 4*2. Registers: 2*2. Latency: N*N/2. Excludes CORDIC and storage.
      At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.

  20. 2048-Point 18r:18i complex DFT
      [Figure: four complex accumulators fed by $W_N^{-k}$, $W_N^{-(k+1)}$, $W_N^{-(k+2)}$ and $W_N^{-(k+3)}$. Labels: done, n-counter (0 to 2048), k-counter (0 to 512).]
      Partially Parallel. Adders: 4*4. Multipliers: 4*4. Registers: 2*4. Latency: N*N/4. Excludes CORDIC and storage.
      At 100 MHz, 2048 pt DFT in 2048*512/100e6 ≈ 10.49 ms.

  21. Implementing the accum and mult.
      [Figure: one complex accumulator used in series. Labels: done, n-counter, k-counter, CORDIC or Cos/Sin ROM ($W_N^{-k}$), concat I&Q, Data In Memory, Data Out Memory, addr, ena, rst, $z^{-1}$.]
      In Series. Adders: 4. Multipliers: 4. Registers: 2. Latency: N*N. Excludes CORDIC and storage.
      At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms.

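  22. Implementing the accum and mult.
      [Figure: real and imaginary accumulators, each with a $z^{-1}$ delay. Multiplier coefficients: real($W_N^{-k}$), -imag($W_N^{-k}$), imag($W_N^{-k}$), real($W_N^{-k}$).]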
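The four coefficients real($W_N^{-k}$), -imag($W_N^{-k}$), imag($W_N^{-k}$), real($W_N^{-k}$) are the four real multiplies of one complex multiply. The sketch below writes the accumulate-and-multiply step using only real arithmetic (4 multiplies, 4 adds per sample) and checks it against the direct DFT sum. It is an illustrative floating-point model, not the hardware design; the function names and test data are assumptions.

```python
import cmath

def accum_mult_step(acc_re, acc_im, x_re, x_im, w_re, w_im):
    """One iteration of acc <- (acc + x) * W using real operations only."""
    s_re = acc_re + x_re                 # adder 1
    s_im = acc_im + x_im                 # adder 2
    new_re = s_re * w_re - s_im * w_im   # real(W) and -imag(W) products, adder 3
    new_im = s_re * w_im + s_im * w_re   # imag(W) and real(W) products, adder 4
    return new_re, new_im

def dft_bin_real_arith(x, k):
    """X[k] for a complex input list, via the nested form with W = W_N^{-k}."""
    N = len(x)
    w = cmath.exp(2j * cmath.pi * k / N)
    acc_re = acc_im = 0.0
    for sample in x:
        acc_re, acc_im = accum_mult_step(acc_re, acc_im,
                                         sample.real, sample.imag, w.real, w.imag)
    return complex(acc_re, acc_im)

# Illustrative complex input (assumed data, not from the slides).
x = [complex(1, -1), complex(0.5, 2), complex(-3, 0), complex(0, 1)]
result = [dft_bin_real_arith(x, k) for k in range(4)]
reference = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4))
             for k in range(4)]
```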

  23. The DSP48E1

  24. Complex Multiply with DSP48E1
      [Figure: DSP48E1 multiply stage with a $z^{-1}$ register.]
      We need two of these for real and imaginary parts. 4x DSP Slices.

  25. Implementing the accum and mult. We need two of these for real and imaginary parts. 2x DSP Slices

  26. Implementing the accum and mult.
      Keeping track of the binary point (BP). B-input is 18 bits (use BP=17).
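With an 18-bit input and the binary point at bit 17, a coefficient is stored as an integer code equal to the value times 2^17. The sketch below assumes a signed two's-complement word, round-to-nearest, and saturation at the ends of the range; none of these choices are stated on the slides, and `to_fixed` and `to_float` are hypothetical helper names.

```python
FRAC_BITS = 17   # binary point position (BP = 17)
WORD_BITS = 18   # width of the B input

def to_fixed(value, frac_bits=FRAC_BITS, word_bits=WORD_BITS):
    """Quantize a real value to a signed fixed-point integer code (assumed format)."""
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (word_bits - 1))          # -131072 represents -1.0
    hi = (1 << (word_bits - 1)) - 1       #  131071 represents 1 - 2**-17
    return max(lo, min(hi, scaled))       # saturate instead of wrapping

def to_float(code, frac_bits=FRAC_BITS):
    """Convert an integer code back to its real value."""
    return code / (1 << frac_bits)

plus_one = to_fixed(1.0)     # 131071: +1.0 is just out of range and saturates
minus_one = to_fixed(-1.0)   # -131072: -1.0 is exactly representable
twiddle = to_fixed(0.7071067811865476)
twiddle_error = abs(to_float(twiddle) - 0.7071067811865476)
```

Under these assumptions the representable range is [-1, 1 - 2^-17], so a coefficient of exactly +1 (k = 0) either saturates with a small error or is handled as a special case.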

  27. Implementation with 6 DSP Slices.
      [Figure: multiplier coefficients real($W_N^{-k}$), -imag($W_N^{-k}$), imag($W_N^{-k}$), real($W_N^{-k}$).]

  28. Implementing the accum and mult. 48-bit accum is a bit excessive. Can be configured as 2x 24-bit adders using inputs (A:B) and C.

  29. Implementation with 5 DSP slices.
      [Figure: multiplier coefficients real($W_N^{-k}$), -imag($W_N^{-k}$), imag($W_N^{-k}$), real($W_N^{-k}$).]

  30. 1024-Point 18r:18i complex DFT
      [Figure: two complex accumulators fed by $W_N^{-k}$ and $W_N^{-(k+1)}$. Labels: done, n-counter, k-counter, addr with wea, addr+1 with web.]
      Partially Parallel. 10 DSP Blocks. 3x 36 Kb Block RAMs. 2 counters. Latency: N*N/2.
      At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.
