Welcome
Hardware-accelerated CCD readout smear correction for Fast Solar Polarimeter
Stefan Tabel, Walter Stechele and Korbinian Weikl
Chair for Integrated Systems, Technical University of Munich, Munich, Germany
Semiconductor Laboratory of the Max Planck Society, Munich, Germany
IEEE ASAP 2017, Monday, July 10th, Session 3: Image Processing
IEEE ASAP 2017 Stefan Tabel, MPG Semiconductor Laboratory

Related projects
Fast Solar Polarimeter (FSP)
  Full custom camera
  Solar ground-based observations
1 m solar telescope SUNRISE
  On a stratosphere balloon
  Same image quality as satellites
  Lower costs
Can we install FSP on SUNRISE?
No, readout smear will hinder the post-facto correction of image jitter.
An online correction can solve this problem…

Readout smear models for the FSP camera
Symbols:
  δ: relative transfer time
  α: relative switching time
  S: smeared column
  Y: unsmeared column
  k: time index
Camera:
  2 x 1024 half-columns
  512 pixels per half-column
  400 images/second
  1 hour burst length
Ground-based observations
  Constant scene
  4 polarization states
  Circularly appearing images
For a jittered balloon flight
  Not constant scene
  General solution for corrected image column
  How to compute? Accumulation and inversion

Optimization of the algorithm
1) Quadratic complexity
   Undefined length of series
2) Convergent
   Correction via n successors only
3) Approximation with fake assumption of periodicity
   Matrix becomes circulant
4) The inverse of a circulant matrix is circulant
   Matrix-vector multiplication with a circulant matrix is a convolution
5) A block of a circulant matrix is of Toeplitz type
   Each Toeplitz matrix can be extended to a circulant matrix
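
The circulant and Toeplitz properties listed in steps 3) to 5) can be checked numerically. The following is a minimal NumPy sketch, not the FPGA implementation: the helper names (`circulant`, `circulant_matvec`, `circulant_solve`, `toeplitz_matvec`) and the toy kernel values are assumptions made for illustration and do not come from the slides.

```python
import numpy as np

def circulant(c):
    """Dense circulant matrix whose first column is c (C[i, j] = c[(i - j) mod n])."""
    c = np.asarray(c)
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def circulant_matvec(c, x):
    """C @ x computed as a circular convolution via FFT (real inputs assumed)."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

def circulant_solve(c, b):
    """Solve C y = b by division in the Fourier domain (requires fft(c) != 0)."""
    return np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)).real

def toeplitz_matvec(col, row, x):
    """T @ x for a Toeplitz T (first column col, first row row, row[0] == col[0]).

    T is embedded as the top-left block of a 2n x 2n circulant matrix.
    """
    col, row, x = map(np.asarray, (col, row, x))
    n = len(x)
    # First column of the embedding circulant: col, one padding zero,
    # then the first row reversed without its first element.
    c = np.concatenate([col, [0.0], row[:0:-1]])
    x_padded = np.concatenate([x, np.zeros(n)])
    return circulant_matvec(c, x_padded)[:n]

# Hypothetical toy kernel: an image plus two attenuated copies (illustrative values only).
kernel = np.zeros(8)
kernel[:3] = [1.0, 0.1, 0.01]
signal = np.arange(8, dtype=float)
smeared = circulant_matvec(kernel, signal)
recovered = circulant_solve(kernel, smeared)
```

With this toy kernel, `recovered` matches `signal` up to floating-point rounding, and both FFT-based routines cost O(n log n) instead of the O(n²) of a dense matrix-vector product.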

Design space exploration 1
Study and single unit => no ASIC
FPGA instead of CPU / GPGPU:
  Power dissipation in the stratosphere
  10G Ethernet peripherals on-chip
  No need for hosts
=> Focus on Xilinx
FFT cores:
  The correction needs to be done in single precision floating point
  Choose a mixed model with n ∈ [4:6]
  Uint16 image data should be transformed using a 31 bit fixed-point transform
  Twiddle factor width is 24 bit

Design space exploration 2
NetFPGA SUME offers QDR II+ SRAM
6.7 Gbps Ethernet stream
209 M samples per second per hemisphere
Requirements:
  Rotation of the image
  Parallelization
  FFT, multiplication, IFFT
Degrees of freedom:
  DDR3 vs. QDR II+ => simple design for feasibility study targeting a single unit camera
  Sequential vs. parallel algorithm => parallel version is always fast, slightly more expensive in logic, can be built in before the RAM, and can be easily configured to different depths of correction
  Order of RAM and FFT => FFT before RAM would increase memory costs
Tasks:
  Use one RAM module per hemisphere, rotate image during write access
  Readout of parallel image data
  Parallel fixed-point FFT
  Cast to single precision floating point, multiply with constants, cast to fixed-point, IFFT
  Interface 10Gig Ethernet
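
The stream and sample rates on this slide follow from the camera parameters on the readout smear slide (2 x 1024 half-columns, 512 pixels per half-column, 400 images/second). A short worked calculation, assuming 16 bit per pixel (Uint16, no protocol overhead) and that each hemisphere carries half of the pixels; the variable names are illustrative:

```python
# Camera parameters as quoted on the readout smear slide.
half_columns = 2 * 1024
pixels_per_half_column = 512
frames_per_second = 400

# Assumption: Uint16 samples and no Ethernet/protocol overhead.
bits_per_pixel = 16

pixels_per_frame = half_columns * pixels_per_half_column   # 1,048,576 pixels
pixel_rate = pixels_per_frame * frames_per_second          # pixels per second, whole sensor
rate_per_hemisphere = pixel_rate // 2                      # assumption: two equal hemispheres
stream_gbps = pixel_rate * bits_per_pixel / 1e9

print(f"{rate_per_hemisphere / 1e6:.1f} M samples/s per hemisphere")  # 209.7
print(f"{stream_gbps:.2f} Gbps raw stream")                           # 6.71
```

Under these assumptions the numbers agree with the 209 M samples per hemisphere and the 6.7 Gbps stream quoted above.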

Memory and logic design
209 M pix./sec. @ 225 MHz write
Write single pixel for image rotation
Row index @ LSB for column access during read burst
Each word serves as ring buffer for image bursts
A crossbar is necessary at read side
n times higher throughput @ read
n parallel and synchronous inputs
Correction values are constant (ROM)
Synchronous calculations
Higher throughput than in stream
FFT modules are extended with typecasts
One FFT module transforms 2 signals
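
One common way for a single complex FFT to transform two real signals at once is to pack them as real and imaginary part and separate the spectra afterwards using conjugate symmetry. Whether the FPGA design uses exactly this scheme is an assumption; the sketch below only illustrates the general technique in NumPy, and `fft_two_real` is a hypothetical helper name.

```python
import numpy as np

def fft_two_real(a, b):
    """Return (FFT(a), FFT(b)) for two real signals using one complex FFT."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    z = np.fft.fft(a + 1j * b)                # one complex transform for both signals
    # conj(Z[(-k) mod N]): reverse the spectrum, rotate so index 0 stays in place.
    z_rev = np.conj(np.roll(z[::-1], 1))
    fa = 0.5 * (z + z_rev)                    # part of Z that is conjugate-symmetric
    fb = -0.5j * (z - z_rev)                  # part of Z that is conjugate-antisymmetric
    return fa, fb

# Illustrative data only: two short real sequences.
spec_a, spec_b = fft_two_real([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, -1.0])
```

The separation costs only additions and a halving per bin, which is why the packing roughly doubles the throughput of one FFT core for real-valued data.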

Parallelization
1 sensor, 2 RAMs, 4 pipelines, 1 stream
Throughput and capacity require one RAM per hemisphere
Parallel algorithm forces temporal multiplexing on two logic pipelines per RAM (zero insertion)
Sequential variant can be built with lower logic resources at the cost of RAM
Twice the clock rate at 2 pipelines did not meet timing constraints
No buffers at the memory interfaces, straightforward stream

Results and tests
Implementation: n = 4
SRAM not included in table
Correction for n = 4 => max. error = 2 (Uint16)
Correction for n = 6 => max. error = 1 (Uint16)
Cutoff due to noise => 3 bit in Uint16
Model-based, co-design with camera
Separate throughput test, later testing
Readout smear is a convolution
Stepwise correction removes copies of the image
The FPGA module allows the FSP camera to be used on the SUNRISE balloon mission

That's it!
Thank you very much for your interest!
Your questions, please.