A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video

  1. A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video. Amrita Mazumdar, Armin Alaghi, Jonathan T. Barron, David Gallup, Luis Ceze, Mark Oskin, Steven M. Seitz (University of Washington; Google)

  2. virtual reality video with omnidirectional stereo (ODS)

  3. the Google Jump camera rig can capture ODS video easily: 16 GoPros x 4K camera feed; 3.6 GB/s raw video

  4. the Google Jump camera rig can capture ODS video easily (Anderson et al., SIGGRAPH Asia 2016)

  6. processing video from Google Jump is slow: 1 hour of video takes 10 hours on 1000 cores (Anderson et al., SIGGRAPH Asia 2016)

  7. Google Jump pipeline breakdown. Diagram labels: sensor, pre-processing, alignment, optical flow, compositing, download to viewer (Anderson et al., SIGGRAPH Asia 2016)

  8. Google Jump pipeline breakdown: the bilateral solver dominates processing time. Diagram labels: sensor, pre-processing, alignment, optical flow, compositing, download to viewer. Shares of processing time shown: 12%, 2%, 69%, 17% (Anderson et al., SIGGRAPH Asia 2016)

  9. The bilateral solver produces an image that is smooth and accurate. Diagram labels: input pair (from two cameras); noisy flow field; upsample into blocky flow field; transform to bilateral grid and solve; output result: smooth flow field (Anderson et al., SIGGRAPH Asia 2016)
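
The grid round trip named on this slide (transform into a bilateral grid, then map the result back to pixels) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the Jump or HFBS implementation: it assumes a dense grayscale (y, x, luma) grid, nearest-vertex assignment, and a plain per-cell average standing in for the solve step. The names `splat`, `slice_grid`, `s_xy`, and `s_l` are hypothetical.

```python
import numpy as np

def splat(reference, values, s_xy=8, s_l=16):
    """Accumulate per-pixel values into a dense (y, x, luma) grid.

    reference: 2-D uint8 grayscale guide image.
    values:    2-D float array (e.g. one flow component), same shape.
    s_xy, s_l: hypothetical cell sizes in pixels and in luma levels.
    """
    h, w = reference.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = (ys // s_xy, xs // s_xy, reference.astype(np.int64) // s_l)
    shape = tuple(int(c.max()) + 1 for c in coords)
    total = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(total, coords, values)   # sum of values per grid vertex
    np.add.at(count, coords, 1.0)      # number of pixels per grid vertex
    return total, count, coords

def slice_grid(grid, coords):
    """Read the grid back at every pixel's vertex (nearest-vertex slice)."""
    return grid[coords]

# Toy input: dark left half, bright right half, noisy values near 1.0 and 5.0.
rng = np.random.default_rng(0)
reference = np.zeros((32, 32), dtype=np.uint8)
reference[:, 16:] = 255
clean = np.where(reference > 0, 5.0, 1.0)
noisy = clean + rng.normal(0.0, 0.5, size=clean.shape)

total, count, coords = splat(reference, noisy)
averaged = total / np.maximum(count, 1.0)   # stand-in for the solve step
smoothed = slice_grid(averaged, coords)
```

Because the luma axis separates the dark and bright halves into different grid cells, the averaging reduces noise inside each half without mixing values across the edge.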

  10. this work: a hardware-friendly bilateral solver (HFBS)

  11. The bilateral solver is hard to parallelize. Second-order global optimization: global communication prevents aggressive parallelization. High-dimensional, sparse matrices: sparsity results in significant divergence on GPUs. Why not a dense grid? Too large to store on-chip.

  12. HFBS is easier to parallelize (detailed formulation in paper)

      | Barron Poole 2016 | HFBS (our work) |
      | --- | --- |
      | ✅ includes color | grayscale only |
      | dense matrix too big to fit in memory | ✅ dense matrix fits in memory |
      | global communication required | ✅ local communication only |
      | iterative bistochastization before solving | ✅ partial, non-iterative bistochastization |
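
To illustrate what "local communication only" can mean on a dense grid, here is a minimal Python sketch of a neighbour-only blur. It is an assumption-laden illustration, not the HFBS update itself (the slide defers the formulation to the paper): the [1, 2, 1] kernel, the edge replication, and the names `blur_axis` and `local_blur` are all hypothetical choices.

```python
import numpy as np

def blur_axis(grid, axis):
    """Apply a [1, 2, 1] / 4 blur along one axis, replicating edge values."""
    pad = [(1, 1) if a == axis else (0, 0) for a in range(grid.ndim)]
    padded = np.pad(grid, pad, mode="edge")
    n = grid.shape[axis]
    lo = np.take(padded, np.arange(0, n), axis=axis)       # neighbour at -1
    mid = np.take(padded, np.arange(1, n + 1), axis=axis)  # the vertex itself
    hi = np.take(padded, np.arange(2, n + 2), axis=axis)   # neighbour at +1
    return (lo + 2.0 * mid + hi) / 4.0

def local_blur(grid):
    """Blur a dense grid along every axis; each output vertex depends
    only on vertices at most one step away along each axis."""
    out = grid
    for axis in range(grid.ndim):
        out = blur_axis(out, axis)
    return out

# An impulse spreads only to its immediate neighbourhood in one pass.
impulse = np.zeros((5, 5, 5))
impulse[2, 2, 2] = 1.0
spread = local_blur(impulse)
```

Since no vertex reads data more than one step away, independent workers could in principle process different regions of the grid and exchange only boundary values.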

  13. HFBS demonstrates imperceptible accuracy loss. Panels: input image; noisy depth map; Barron Poole 2016; HFBS (this work). Task: Ferstl et al., ICCV 2013; data: Middlebury stereo dataset.

  14. algorithm optimizations make it easier to implement the bilateral solver in parallel hardware

  15. algorithm optimizations make it easier to implement the bilateral solver in parallel hardware. Plan: exploit this parallelism with a custom hardware accelerator.

  16. Mapping HFBS to hardware. Pipeline diagram labels: sensor, pre-processing, alignment, optical flow, compositing, download to viewer.

  17. Mapping HFBS to hardware. Pipeline diagram labels: sensor, pre-processing, alignment, optical flow, compositing, download to viewer. Steps split between CPU and FPGA: load video pair; construct bilateral grid per pair; perform hardware-friendly bilateral solver; slice out solution into output images.

  18. HFBS microarchitecture. Block diagram labels (as recoverable from the slide): CPU; main memory; AXI memory interface; memory bank selector; z-axis memory controller; z-axis memory bank; bilateral filter worker. Annotations: custom memory access layout; fixed-point datapath.

  19. Floating-point resource requirements limit hardware parallelism

      | | float64 | 32-bit fixed | 64-bit fixed | 47-bit fixed |
      | --- | --- | --- | --- | --- |
      | DSPs per worker | 18 | 1 | 16 | 4 |
      | Maximum # workers | 379 | 6840 | 427 | 1710 |
      | Error (MSE) | - | 8.3 x 10^-4 | 7.16 x 10^-13 | 6.69 x 10^-7 |
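
The worker counts on this slide look like a DSP budget divided by the per-worker cost. The Python sketch below checks that reading; it assumes a total of 6840 DSPs (implied by the 6840 workers at 1 DSP each) and reads the per-worker costs as 18, 1, 16, and 4 DSPs for float64, 32-bit, 64-bit, and 47-bit fixed point. The name `max_workers` is hypothetical.

```python
TOTAL_DSPS = 6840  # assumption: implied by 6840 workers at 1 DSP per worker

# DSPs per worker, as read from the slide.
DSPS_PER_WORKER = {
    "float64": 18,
    "32-bit fixed": 1,
    "64-bit fixed": 16,
    "47-bit fixed": 4,
}

def max_workers(dsps_per_worker, total_dsps=TOTAL_DSPS):
    """Upper bound on workers if DSPs are the only limiting resource."""
    return total_dsps // dsps_per_worker

budget = {name: max_workers(cost) for name, cost in DSPS_PER_WORKER.items()}
```

Plain division reproduces the slide's 6840, 427, and 1710. For float64 it gives 380 rather than the slide's 379, so the deck's accounting presumably differs slightly (for example, by reserving some DSPs for other logic).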

  20. Fixed-point datapath conversion. [Plot: error (MSE relative to float64), log scale from 1E-12 to 1E-02, versus decimal precision (fraction of bitwidth), 40% to 90%; one series per bitwidth: 32, 47, 64; a "Max Error" level is marked.]
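
The trade-off this plot explores can be sketched by emulating fixed-point rounding in Python. This is a rough model under stated assumptions, not the deck's measurement procedure: it assumes signed fixed point with saturation, interprets "decimal precision" as the share of bits given to the fractional part, and uses hypothetical names `to_fixed` and `mse_vs_float64`.

```python
import numpy as np

def to_fixed(x, bitwidth, frac_share):
    """Round x to signed fixed point and return the result as float64.

    frac_share is the fraction of the bitwidth used for fractional bits.
    The emulation relies on float64, so it is limited to 52 bits or fewer.
    """
    if bitwidth > 52:
        raise ValueError("float64 emulation is limited to 52 bits")
    frac_bits = int(round(bitwidth * frac_share))
    scale = 2.0 ** frac_bits
    lo = -float(2 ** (bitwidth - 1))
    hi = float(2 ** (bitwidth - 1) - 1)
    q = np.clip(np.round(np.asarray(x, dtype=np.float64) * scale), lo, hi)
    return q / scale

def mse_vs_float64(x, bitwidth, frac_share):
    """Mean squared error of the fixed-point version against float64."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.mean((to_fixed(x, bitwidth, frac_share) - x) ** 2))

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=10_000)
mse_half = mse_vs_float64(samples, 32, 0.50)            # 16 fractional bits
mse_three_quarters = mse_vs_float64(samples, 32, 0.75)  # 24 fractional bits
saturated = to_fixed([1000.0], 16, 0.75)                # too large: clips
```

More fractional bits shrink the rounding error for small values, but they leave fewer integer bits, so large values saturate. That tension is one plausible reason the error depends on the split as well as on the bitwidth.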

  21. z-axis slicing for bilateral grid memory layout. Example entries in the z = 0 slice: x:0,y:0,r:255,g:172,b:0; x:0,y:1,r:255,g:172,b:0; ...; x:100,y:100,r:255,g:172,b:0
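
One way to picture a z-sliced layout is as an address calculation that sends each z plane to its own memory bank. The Python sketch below is hypothetical: the slide only shows entries of the z = 0 plane listed with y varying fastest, so the bank-per-slice mapping, the offset formula, and the name `z_sliced_address` are assumptions.

```python
def z_sliced_address(x, y, z, width, height):
    """Map grid vertex (x, y, z) to a (bank, offset) pair.

    Assumed layout: one bank per z slice; inside a slice, vertices are
    stored with y varying fastest, matching the order listed on the slide.
    """
    if not (0 <= x < width and 0 <= y < height and z >= 0):
        raise ValueError("vertex is outside the grid")
    bank = z
    offset = x * height + y
    return bank, offset

# The slide's example plane runs from x:0,y:0 to x:100,y:100 (101 x 101).
first = z_sliced_address(0, 0, 0, width=101, height=101)
second = z_sliced_address(0, 1, 0, width=101, height=101)
last = z_sliced_address(100, 100, 0, width=101, height=101)
above = z_sliced_address(0, 0, 1, width=101, height=101)
```

Under this assumed mapping, the neighbours of a vertex along z sit at the same offset in adjacent banks, so they could be fetched in parallel.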

  22. Evaluation

  23. Evaluation. Does HFBS improve runtime? How does parallelization affect power? Pipeline diagram labels: sensor, pre-processing, alignment, optical flow, compositing, download to viewer. Shares of processing time shown: 12%, 2%, 69%, 17%.

  24. Experimental Setup. CPU: Intel Xeon E5-2620. GPU: NVIDIA GTX 1080 Ti. FPGA: Xilinx Virtex Ultrascale+. Baseline: Barron Poole et al. 2016 (CPU only). 256 iterations of optimization. Varied bilateral grid vertices count ⇒ 4 KB to 1.8 GB grid sizes.

  25. HFBS is faster and more scalable than prior work. [Plot: log runtime (ms), 0.01 to 10000, versus log bilateral grid vertices, 1,000 to 10,000,000; series: Prior Work (CPU), CPU, GPU, FPGA.]

  26. HFBS is faster and more scalable than prior work. [Plot: log runtime (ms), 0.01 to 10000, versus log bilateral grid vertices, 1,000 to 10,000,000; series: Prior Work (CPU), CPU, GPU, FPGA; a region is marked "30 FPS and better".]

  27. HFBS-FPGA is more power-efficient than other platforms. [Bar chart: Ops / Watt improvement, power-efficiency relative to prior work; bars: Prior Work, CPU, GPU, FPGA; values shown: 30.72x, 2.12x, 1.00x, 0.45x.]

  28. building a VR video camera rig with HFBS

  29. Diagram labels: this work; full system

  30. HFBS-FPGA consumes much less power than a GPU for the same task (full system): 16 GPUs = 4,560 W; 16 FPGAs = 400 W
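
The per-device figures behind these totals follow from simple division. The short Python check below uses only the numbers on the slide; the variable names are hypothetical.

```python
UNITS = 16             # devices per system, from the slide
GPU_TOTAL_W = 4560.0   # 16 GPUs, from the slide
FPGA_TOTAL_W = 400.0   # 16 FPGAs, from the slide

watts_per_gpu = GPU_TOTAL_W / UNITS        # 285 W per GPU
watts_per_fpga = FPGA_TOTAL_W / UNITS      # 25 W per FPGA
power_ratio = GPU_TOTAL_W / FPGA_TOTAL_W   # about 11.4x less power with FPGAs
```

This ratio compares power draw only. It is a different quantity from the Ops / Watt improvement reported earlier in the deck, which also accounts for throughput.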

  31. HFBS makes real-time VR video more feasible: with FPGAs on-node, or with FPGAs offloaded to cloud. Pipeline diagram labels: sensor, pre-processing, alignment, optical flow, compositing, download to viewer.

  32. to conclude: fast, parallel implementation of bilateral solving with little accuracy loss; fixed-point datatypes and a custom bilateral-grid memory layout for improved FPGA performance; hardware-software codesign to reduce latency and improve quality for future VR applications

  33. A Hardware-Friendly Bilateral Solver for Real-Time Virtual Reality Video: parallel algorithm for bilateral solving; FPGA architecture; 50x faster, 30x more power-efficient
