
SRI-SURF: A Better SURF Powered by Scaled-RAM Interpolator on FPGA
Xijie Jia 1, Kaiyuan Guo 1, Wenqiang Wang 3, Yu Wang 1,2 and Huazhong Yang 1
1 E.E. Dept., TNLIST, Tsinghua University, Beijing, China
2 yu-wang@mail.tsinghua.edu.cn
3 Microsoft Research Asia, Beijing, China
Nano-scale Integrated Circuit and System Lab, Department of Electronic Engineering, Tsinghua University

Outline
• Introduction
• Methods
• Experiments
• Conclusion

Outline
• Introduction
  – Background
  – Related Work
  – SURF Algorithm
  – Contributions
• Methods
• Experiments
• Conclusion

Background – Local Feature Extraction
• Main goal:
  – Find representative regions of an image
  – Find a robust representation for each of them
• What is a “robust” feature:
  – Invariant to affine transformations, ambient lighting, etc.
• Algorithms:
  – SIFT (Scale-Invariant Feature Transform) [IJCV04]
  – PCA-SIFT (Principal Component Analysis SIFT) [CVPR04]
  – GLOH (Gradient Location-Orientation Histogram) [PAMI05]
  – SURF (Speeded-Up Robust Features) [ECCV06]

Background - Applications
• Image mosaic [ICISE09]
• Object recognition [SMC09]
• 3D reconstruction [ICIP12]
• Crowd counting [TCEC14]
• Requirements
  – Real-time processing
  – High matching precision at high resolution

Background - Performance Evaluation
• Frames Per Second (FPS)
• Feature Points Per Frame (PPF)
  – Related to image resolution and texture complexity
• Feature Points Per Second (PPS)
  – MAX-PPS: represents the calculation capacity of the system
  – ACT-PPS: represents the requirements of the application
[Figure: timeline from 0 s to 1 s showing Frame 0 ... Frame N with PPF 0 ... PPF N, labelled with FPS and PPS.]

Related Work – SURF Acceleration
Platform   Type       Example               Advantage
CPU        Serial     OpenSURF [2009]       Good portability
GPU        Parallel   clSURF [GPGPU2011]    Easy to realize from CPU
ASIC       Parallel   SURFEX [CICC2013]     Best energy efficiency
FPGA       Parallel   SURF [FPT2013]        Good energy efficiency
Drawbacks listed across the platforms: low energy efficiency, long development cycle, low performance, low flexibility.

Related Work – SURF Acceleration
Version        Clock    Resolution   FPS   PPF    PPS     Octave   Chip                 Function
[GPGPU11]      1.4GHz   791x704      40    800    32K     NA       GTX480               FD+OG+DG
[ReConFig11]   100MHz   640x480      ~2    ~49    0.1K    8        Virtex 5 + PowerPC   FD+OG+DG
[BEC12]        25MHz    640x480      60    100    6.0K    6        3x Virtex 4          FD+OG+DG
[TENCON13]     200MHz   300x300      42    250    10.5K   4        Zynq 7               FD+OG+DG
[FPT13]        156MHz   640x480      356   100    35K     6        Virtex 6             FD+OG+DG
[ReConfig14]   25MHz    640x480      131   1614   211K    6        Zynq 7               FD+OG
[CICC13]       200MHz   1920x1080    57    5000   285K    12       ASIC                 FD+OG+DG
FD: Feature Detection; OG: Orientation Generation; DG: Descriptor Generation
• Early work on GPU: high performance from a powerful chip
• Works on FPGA: performance was still insufficient
  – Simplification -> precision problem
  – Low computation capacity
  – High resource occupation
• Work on ASIC: high performance from a dedicated device
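The PPS column in the comparison above appears to follow PPS = FPS × PPF. The slides do not state this relation explicitly, so the sketch below treats it as an assumption and checks it against the table rows that give exact FPS and PPF values (the `ROWS` list and the helper names are illustrative only).

```python
# Rows copied from the comparison table: (version, FPS, PPF, reported PPS).
# The ReConFig11 row is left out because its FPS and PPF are only approximate.
ROWS = [
    ("GPGPU11", 40, 800, 32_000),
    ("BEC12", 60, 100, 6_000),
    ("TENCON13", 42, 250, 10_500),
    ("FPT13", 356, 100, 35_000),
    ("ReConfig14", 131, 1614, 211_000),
    ("CICC13", 57, 5000, 285_000),
]


def points_per_second(fps, ppf):
    """Assumed relation: feature points per second = frames/s * points/frame."""
    return fps * ppf


def relative_error(fps, ppf, reported_pps):
    """Relative gap between the computed PPS and the value printed in the table."""
    return abs(points_per_second(fps, ppf) - reported_pps) / reported_pps


for version, fps, ppf, reported in ROWS:
    computed = points_per_second(fps, ppf)
    print(f"{version:<11} computed={computed:>7} reported={reported:>7}")
```

Under this reading, every listed row matches the reported PPS to within the rounding used in the table.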

Introduction to SURF - Algorithm
• Feature Detection
  – Calculate the integral image: base data
  – Calculate the normalized $\det(\mathcal{H}_{approx})$: locate within each interval
  – Find local maxima: locate among neighboring intervals
  – Up-sampling interpolation: sub-pixel correction
• Orientation Generation
  – Calculate Haar wavelets: base data
  – Sliding-window summation: locate the orientation
• Descriptor Generation
  – Calculate Haar wavelets: base data
  – Sum over neighboring sub-regions: generate 4x4x4 descriptors
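Every stage listed above starts from the integral image, so a small sketch of that base data may help. It builds an integral image, reads a box sum with four look-ups, and forms a Haar wavelet response in x as the difference of two adjacent boxes. The function names and the zero-padded layout are illustrative choices, not taken from the slides.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over rows 0..y-1, columns 0..x-1.

    One extra row and column of zeros keeps the box-sum formula free of
    special cases at the image border.
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of the pixels in columns x0..x1-1 and rows y0..y1-1 (four reads)."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def haar_x(ii, x, y, s):
    """Haar wavelet response in x around (x, y): right half minus left half."""
    left = box_sum(ii, x - s, y - s, x, y + s)
    right = box_sum(ii, x, y - s, x + s, y + s)
    return right - left


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
print(box_sum(ii, 0, 0, 3, 3))  # whole image
print(box_sum(ii, 1, 1, 3, 3))  # lower-right 2x2 block
```

The cost of a box sum is independent of the box size, which is why the detector, orientation and descriptor stages can all work at different scales from the same buffer.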

Introduction to SURF - Algorithm
[Figure: processing flow: image -> integral image -> scale images -> feature points (x, y, s) -> feature points' orientation -> feature points' descriptor.]

Introduction to SURF - Complexity
Op.            Determinant   Find localMax /   Orientation     Descriptor      Total
                             UpSamp-Intp
               Resolution    Candidate Point   Feature Point   Feature Point
               640x480       520               520             500
Read RAM       9,059,904     -                 453,440         2,304,000       11,817,344
Plus           7,361,172     6,480             1,152,320       4,864,000       13,383,972
Minus          3,963,708     4,860             340,080         1,728,000       6,036,648
Multiply       566,244       -                 165,360         1,296,000       2,027,604
Square         283,122       -                 37,440          -               320,562
Divide         283,122       -                 -               -               283,122
Compare        -             14,040            18,720          -               32,760
Equation Set   -             540               -               -               540
Rotate         -             -                 56,680          576,000         632,680
ATAN           -             -                 520             -               520
• High computation complexity
• Good parallelism
• Bottleneck of serial processing: points are computed serially, so the bottleneck is single-point processing

Introduction to SURF - approximation
• Feature points come from different scales
• Feature points have non-integer coordinates
• How to use the integral image?
• In OpenSURF, all the integral image data are read from integer coordinates
• How about interpolation?
[Figure: the index deviation caused by rounding error, shown for Orientation (R = 6s vs. R_r = 6s_r) and Descriptor (θ vs. θ_r). FP(x, y, s): original feature point; FP_r(x_r, y_r, s_r): rounded-coordinates-and-scale feature point.]

Contribution
• Interpolation of Integral Image (I³)
  – For better matching precision
• Compromise of Interpolation of Integral Image (CI³)
  – Halves the memory accesses at the cost of slightly lower accuracy
  – For higher processing speed
• Multi-Scaled RAM (MSR)
  – For lower storage occupation

Outline
• Introduction
• Methods
  – Interpolation of Integral Image (I³)
  – Compromise of Interpolation of Integral Image (CI³)
  – Multi-Scaled RAM (MSR)
  – Implementation
• Experiments
• Conclusion

Interpolation of Integral Image
Quantization error of the imaging system:
• Continuous image -> Acquisition -> Pixels: loss of image detail
• Decimal coordinates -> Truncation -> Integer: index deviation
• Cumulative error is enlarged step by step
[Figure: the index deviation caused by rounding error, shown for Orientation (R = 6s vs. R_r = 6s_r) and Descriptor (θ vs. θ_r), with labels "0.5" and "4x Up". FP(x, y, s): original feature point; FP_r(x_r, y_r, s_r): rounded-coordinates-and-scale feature point.]

Interpolation of Integral Image
              Haar wavelet - math            OpenSURF
Coordinate    decimal                        integer
Distance      decimal                        integer
Data source   Theoretical situation;         Directly read from integral image
              approximate by interpolation
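To make the difference between the two access styles concrete, the sketch below reads an integral image at a decimal coordinate in two ways: by rounding to the nearest integer index, and by interpolating between the four surrounding entries. The slides only say "approximate by interpolation"; bilinear interpolation is an assumption made here, and all function names are illustrative.

```python
import math


def integral_image(img):
    """ii[y][x] = sum of img over rows 0..y-1 and columns 0..x-1 (zero-padded)."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def read_rounded(ii, x, y):
    """Integer-coordinate access: round the decimal index, then read once."""
    return ii[math.floor(y + 0.5)][math.floor(x + 0.5)]


def read_interpolated(ii, x, y):
    """Assumed bilinear interpolation between the four neighbouring entries."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    x1 = min(x0 + 1, len(ii[0]) - 1)  # clamp at the right border
    y1 = min(y0 + 1, len(ii) - 1)     # clamp at the bottom border
    top = (1 - fx) * ii[y0][x0] + fx * ii[y0][x1]
    bottom = (1 - fx) * ii[y1][x0] + fx * ii[y1][x1]
    return (1 - fy) * top + fy * bottom


# On a constant image of ones the integral image is ii[y][x] = x * y,
# so the exact value at (2.5, 1.5) is 2.5 * 1.5 = 3.75.
ones = [[1] * 4 for _ in range(4)]
ii = integral_image(ones)
print(read_interpolated(ii, 2.5, 1.5))  # 3.75
print(read_rounded(ii, 2.5, 1.5))       # 6
```

In this toy case the interpolated read recovers the exact area, while the rounded read overshoots it, which is the index deviation the slides attribute to rounding.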

Compromise of Interpolation of Integral Image (CI³)
                Haar wavelet - math                     A trade-off version
Coordinate      decimal                                 decimal
Distance        decimal                                 integer
Reads           Needs 32 numbers from integral image    Needs 32 numbers from integral image
Interpolation   Different interpolation parameters      Same interpolation parameter

Compromise of Interpolation of Integral Image (CI³)
• Haar wavelet - math
  – Needs 32 numbers from the integral image
  – Hard to fetch in parallel
• Proposed
  – Pre-compute the Haar wavelets on integer coordinates
  – Needs 4 pre-computed numbers
[Figure: Haar wavelet layouts labelled "integer coordinate", "decimal coordinate", "decimal distance" and "integer distance".]
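The counts of 32 and 4 are consistent with bilinear interpolation: a Haar response built from two boxes touches 8 corners, and interpolating each corner needs 4 integral-image entries (8 × 4 = 32). When the distance is an integer, every corner shares the same fractional offset, so the interpolation can instead be applied to 4 Haar responses pre-computed at integer coordinates. The sketch below checks that equivalence numerically. Bilinear interpolation, the synthetic image and all names are assumptions made for illustration, not details given in the slides.

```python
import math


def integral_image(img):
    """ii[y][x] = sum of img over rows 0..y-1 and columns 0..x-1 (zero-padded)."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def haar_x_int(ii, x, y, s):
    """Haar-x at integer (x, y) with integer half-size s: right box minus left box."""
    right = ii[y + s][x + s] - ii[y - s][x + s] - ii[y + s][x] + ii[y - s][x]
    left = ii[y + s][x] - ii[y - s][x] - ii[y + s][x - s] + ii[y - s][x - s]
    return right - left


def bilinear(lookup, x, y):
    """Bilinear interpolation at decimal (x, y); calls lookup exactly 4 times."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * lookup(x0, y0) + fx * lookup(x0 + 1, y0)
    bottom = (1 - fx) * lookup(x0, y0 + 1) + fx * lookup(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def haar_x_32_reads(ii, x, y, s):
    """Interpolate each of the 8 box corners: 8 * 4 = 32 integral-image reads."""
    def corner(cx, cy):
        return bilinear(lambda ix, iy: ii[iy][ix], cx, cy)
    right = corner(x + s, y + s) - corner(x + s, y - s) - corner(x, y + s) + corner(x, y - s)
    left = corner(x, y + s) - corner(x, y - s) - corner(x - s, y + s) + corner(x - s, y - s)
    return right - left


def haar_x_4_reads(haar, x, y):
    """Interpolate 4 Haar responses pre-computed at integer coordinates."""
    return bilinear(lambda ix, iy: haar[(ix, iy)], x, y)


# Synthetic 12x12 test image (hypothetical data, not from the slides).
SIZE, S = 12, 2
img = [[(3 * x + 5 * y * y + x * y) % 17 for x in range(SIZE)] for y in range(SIZE)]
ii = integral_image(img)
# Pre-compute Haar-x at every integer position where both boxes fit.
haar = {(x, y): haar_x_int(ii, x, y, S)
        for y in range(S, SIZE - S + 1) for x in range(S, SIZE - S + 1)}

a = haar_x_32_reads(ii, 5.25, 6.5, S)
b = haar_x_4_reads(haar, 5.25, 6.5)
print(a, b)
```

Both routes give the same value because the Haar response is a linear combination of integral-image entries, so a shared interpolation weight can be factored out. The price is an extra buffer for the pre-computed responses.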

Compromise of Interpolation of Integral Image (CI³)
• Advantage:
  – Uses interpolation to improve accuracy
  – Keeps the data access pattern predictable
• Weakness:
  – RAM occupation is doubled by the pre-computed Haar wavelets
  – Not exactly equal to the mathematical solution
Version    Point Type   Coord. Type       Coords. Level   Index Deviation
Trad.      All          Rounded Integer   Pixel           Large
Proposed   FP           Fixed Decimal     Sub-Pixel       Small
Proposed   NP           Fixed Decimal     Sub-Pixel       Small
Proposed   IP           As Trad.          As Trad.        As Trad.

RAM Occupation Problem
Comparison of FP Distribution and Buffer Utilization
s_0   Distribution of   Rows     Utilization at row width
      extracted FPs     needed   320      640      1280    1920
2     54%               71       20.28%   10.14%   5.07%   3.38%
3     29%               105      13.71%   6.86%    3.43%   2.29%
4     11%               140      10.29%   5.14%    2.57%   1.71%
5     5%                175      8.23%    4.11%    2.06%   1.37%
• A large number of rows are required: $\mathrm{span}_{IP,\max} = \sqrt{2}\,(23 s_0 + 1) + 2 s_0$
• Only a small part of the data is used: 24x24x8 = 4608
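The "rows needed" and utilization figures can be reproduced with a few lines of arithmetic. The sketch below reads the row formula as the ceiling of √2·(23·s_0 + 1) + 2·s_0, and the utilization as the 4608 used values divided by the buffer size (rows × row width). Both readings are assumptions inferred from the table values, since the formula is garbled in the source, and the function names are illustrative.

```python
import math

USED_VALUES = 24 * 24 * 8  # 4608 values actually read per feature point


def rows_needed(s0):
    """Assumed reading of the span formula, rounded up to whole rows."""
    return math.ceil(math.sqrt(2) * (23 * s0 + 1) + 2 * s0)


def utilization(s0, row_width):
    """Fraction of the buffered values that is actually used."""
    return USED_VALUES / (rows_needed(s0) * row_width)


for s0 in (2, 3, 4, 5):
    cells = "  ".join(f"{100 * utilization(s0, w):6.2f}%" for w in (320, 640, 1280, 1920))
    print(f"s0={s0}  rows={rows_needed(s0):3d}  {cells}")
```

Under these assumptions the script gives 71, 105, 140 and 175 rows for s_0 = 2 to 5, and utilization falls as the row width grows, which is the problem the multi-scaled RAM is meant to address.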

Multi-Scaled RAM (MSR)
• Scaled Integral Image -> Multi-Scaled RAM
• Haar results of NP are processed on the corresponding scaled RAM
• Normalized scale -> uniform RAM access pattern
• Adjust utilization:
  – 39%, 26%, 19.5%, 15.5%
• Reject redundant data -> save RAM
  – $16 + 34 \times 2 \times \left(\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5}\right) = 108$
  – RAM saved: $1 - \frac{108}{175} = 38\%$
[Figure: across the image width, the original integral image buffer (175 rows) is compared with the multi-scaled layout: 16 rows of integral image, 34 rows of HaarX results and 34 rows of HaarY results, stored at scales 1/2, 1/3, 1/4 and 1/5.]
