Exploring Computation-Communication Tradeoffs in Camera Systems

Exploring Computation-Communication Tradeoffs in Camera Systems
Amrita Mazumdar, Armin Alaghi, Thierry Moreau, Luis Ceze, Sung Kim, Mark Oskin, Meghan Cowan, Visvesh Sathe
IISWC 2017

Camera applications are a prominent workload with tight constraints:
- Augmented reality glasses: low power, light weight, real-time processing
- Energy-harvesting camera: low power, light weight
- Video surveillance cameras: large data size, real-time processing
- 3D-360 virtual reality camera rig: large data size, real-time processing

Hardware implementations compound the camera system design space: a camera system such as DogChat™ can be implemented on an ASIC, FPGA, GPU, DSP, or CPU, under constraints on power, bandwidth, time, and size.

We can represent camera applications as camera processing pipelines to clarify design space exploration: sensor → block 1 → block 2 → block 3 → block 4, where each block is a function in the application.

We can represent camera applications as camera processing pipelines to clarify design space exploration. For DogChat™: sensor → image processing → face detection → feature tracking → image rendering.

Developers can trade off between computation and communication costs: one option is to offload the processing blocks of the DogChat™ pipeline (sensor → image processing → face detection → feature tracking → image rendering) to the cloud.

Developers can trade off between computation and communication costs: DogChat™ can run some blocks with in-camera processing and offload the rest of the pipeline (sensor → image processing → face detection → feature tracking → image rendering) to the cloud.
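As a rough sketch of this tradeoff, assume each block is characterized only by how many bytes it emits per frame. The sizes below are hypothetical placeholders, not measurements from the talk, and `Block` and `offload_options` are illustrative names. Choosing a cut point decides which blocks run in-camera and how much data must cross the link to the cloud.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    name: str
    output_bytes: int  # HYPOTHETICAL bytes this block emits per frame


def offload_options(pipeline: List[Block]) -> List[Tuple[List[str], int]]:
    """Enumerate cut points: blocks up to and including the cut run
    in-camera, and the last in-camera block's output crosses the link."""
    options = []
    for k, block in enumerate(pipeline, start=1):
        in_camera = [b.name for b in pipeline[:k]]
        options.append((in_camera, block.output_bytes))
    return options


FRAME = 1920 * 1080 * 3  # HYPOTHETICAL 1080p RGB frame, 8 bits per channel
DOGCHAT = [
    Block("sensor", FRAME),
    Block("image processing", FRAME),
    Block("face detection", 128 * 128 * 3),  # HYPOTHETICAL cropped face
    Block("feature tracking", 512),          # HYPOTHETICAL feature points
    Block("image rendering", FRAME),
]

for in_camera, sent in offload_options(DOGCHAT):
    print(f"in-camera through {in_camera[-1]!r}: {sent} bytes/frame")
```

With these made-up sizes, the cheapest cut for communication is after feature tracking, while running every block in-camera sends as much data as streaming raw frames: doing more computation on the camera does not monotonically reduce communication.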

Optional and required blocks in camera pipelines introduce more tradeoffs. Required blocks: sensor, image processing, face detection, feature tracking, image rendering. Optional blocks: motion detection, edge detection, motion tracking.

Custom hardware platforms explode the camera system design space: each required and optional block can be mapped to a different platform (CPU, GPU, DSP, FPGA, or ASIC).

Custom hardware platforms explode the camera system design space: each required and optional block can be mapped to a different platform (CPU, GPU, DSP, FPGA, or ASIC). In-camera processing pipelines can help us evaluate these tradeoffs!
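To get a feel for how quickly the space grows, the sketch below counts configurations for the DogChat™ example under simplifying assumptions that are ours, not the talk's: the sensor is fixed hardware, every other required block and every chosen optional block is independently mapped to one of five platforms, and both the ordering of optional blocks and the in-camera/cloud split are ignored. `configurations` and `count_configurations` are hypothetical helpers.

```python
from itertools import combinations, product
from math import comb

# ASSUMPTION: the sensor is fixed hardware and is not mapped to a platform.
REQUIRED = ["image processing", "face detection", "feature tracking", "image rendering"]
OPTIONAL = ["edge detection", "motion detection", "motion tracking"]
PLATFORMS = ["CPU", "GPU", "DSP", "FPGA", "ASIC"]


def configurations(required, optional, platforms):
    """Yield one block-to-platform mapping per (optional subset, assignment)."""
    for r in range(len(optional) + 1):
        for extra in combinations(optional, r):
            blocks = list(required) + list(extra)
            for choice in product(platforms, repeat=len(blocks)):
                yield dict(zip(blocks, choice))


def count_configurations(n_required, n_optional, n_platforms):
    """p^n_req * sum_r C(n_opt, r) * p^r, which equals p^n_req * (1 + p)^n_opt."""
    return sum(comb(n_optional, r) * n_platforms ** (n_required + r)
               for r in range(n_optional + 1))


print(count_configurations(len(REQUIRED), len(OPTIONAL), len(PLATFORMS)))  # 135000
```

Even this stripped-down model yields 135,000 mappings; adding offload points or block orderings multiplies it further, which is why an evaluation framework is needed rather than exhaustive hand analysis.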

Challenges for modern camera systems
- Low power: face authentication for energy-harvesting cameras with ASIC design (motion detection → face detection → neural network)
- Low latency: real-time virtual reality for multi-camera rigs with FPGA acceleration (prep → align → depth → stitch)

Face authentication with energy-harvesting cameras: WISPCam is an energy-harvesting camera powered by RF. It captures 1 frame per second and has ~1 mW for processing per frame.

Face authentication with energy-harvesting cameras: Is this Armin? ✅

CPU-based face authentication neural networks can exceed WISPCam power budgets: sensor → neural network (on-chip CPU) → other application functions (cloud).

CPU-based face authentication neural networks can exceed WISPCam power budgets. Adding optional blocks can reduce power consumption for a neural network circuit: sensor → motion detection → face detection → neural network (on-chip ASIC hardware) → other application functions (cloud).

Exploring design tradeoffs in ASIC accelerators
[Figure: block diagrams of the streaming Viola-Jones (VJ) face detection accelerator (pixels in, integral image accumulator, window buffer SRAM, and a classifier unit with feature, stage, and threshold units producing a yes/no decision) and the SNNAP neural network accelerator (a processing unit of PEs with multiply and add units, a scheduler, a sigmoid unit, a bus, and a DMA master). Many more details in the paper!]
Streaming face detection: explored the classifier and other algorithm parameters to optimize energy efficiency.
Neural network: evaluated the impact of NN topology and the hardware accelerator on energy and accuracy; selected a 400-8-1 network topology and used 8-bit datapaths for the optimal energy/accuracy point.
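The integral image accumulator in the VJ diagram builds each integral row from the incoming pixel row and the previous integral row. The following is a software sketch of that streaming computation, plus the four-lookup rectangle sum that Viola-Jones Haar-like features rely on; it mirrors the dataflow in the figure, not the actual RTL, and the function names are illustrative.

```python
from typing import Iterable, Iterator, List


def integral_image_rows(rows: Iterable[List[int]]) -> Iterator[List[int]]:
    """Stream an image row by row and yield its integral-image rows.

    Each output row is the running sum of the current input row added
    element-wise to the previous integral row, so only one previous row
    has to be buffered."""
    prev = None
    for row in rows:
        running = 0
        out = []
        for x, pixel in enumerate(row):
            running += pixel
            out.append(running + (prev[x] if prev is not None else 0))
        prev = out
        yield out


def rect_sum(ii: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Sum of pixels in rows top..bottom and columns left..right (inclusive),
    computed from at most four integral-image lookups."""
    d = ii[bottom][right]
    b = ii[top - 1][right] if top > 0 else 0
    c = ii[bottom][left - 1] if left > 0 else 0
    a = ii[top - 1][left - 1] if top > 0 and left > 0 else 0
    return d - b - c + a


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = list(integral_image_rows(image))
print(ii)                        # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
print(rect_sum(ii, 1, 1, 2, 2))  # 28 (5 + 6 + 8 + 9)
```

Because each rectangle costs a constant number of lookups regardless of its size, the classifier can evaluate many features per window cheaply, which is what makes a streaming hardware implementation attractive.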

Evaluation: Which pipeline achieves the lowest overall power?
- Synthesized the ASIC accelerators with Synopsys tools
- Constructed a simulator to evaluate power consumption on real-world video input
- Computed the power for computation and for transfer of the resulting data for each pipeline configuration
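The evaluation splits each configuration's power into computation and transfer. A minimal model of that split, assuming hypothetical per-stage energies, prefilter pass rates, and a per-byte radio cost (none of these values come from the paper), shows why a cheap prefilter can pay for itself: downstream stages and the radio only see the frames it forwards. `Stage` and `pipeline_power_uw` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    name: str
    compute_uj: float  # HYPOTHETICAL energy per processed frame, in µJ
    pass_rate: float   # fraction of frames this stage forwards downstream
    output_bytes: int  # bytes per forwarded frame


def pipeline_power_uw(stages: List[Stage], fps: float, sensor_bytes: int,
                      tx_uj_per_byte: float) -> Tuple[float, float]:
    """Return (compute, transfer) power in µW for the on-camera stages.

    A stage only spends energy on frames that earlier prefilters let
    through; whatever survives the last stage is sent over the radio."""
    reach = 1.0  # fraction of captured frames reaching the current stage
    out_bytes = sensor_bytes
    compute_uj = 0.0
    for stage in stages:
        compute_uj += reach * stage.compute_uj
        reach *= stage.pass_rate
        out_bytes = stage.output_bytes
    transfer_uj = reach * out_bytes * tx_uj_per_byte
    # µJ per frame times frames per second gives µW.
    return fps * compute_uj, fps * transfer_uj


motion = Stage("motion detection", compute_uj=50.0, pass_rate=0.1,
               output_bytes=10_000)
print(pipeline_power_uw([], fps=1, sensor_bytes=10_000, tx_uj_per_byte=1.0))        # (0.0, 10000.0)
print(pipeline_power_uw([motion], fps=1, sensor_bytes=10_000, tx_uj_per_byte=1.0))  # (50.0, 1000.0)
```

In this toy setting, a motion detector that forwards 10% of frames adds 50 µW of compute but removes 9,000 µW of transfer; the same structure explains why the sensor-only configuration is transfer-dominated while configurations ending in on-chip detection are compute-dominated.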

Which pipeline achieves the lowest power consumption? (compute/transfer ratios; power plotted on a log scale from 1 to 1,000,000 µW)

Power (µW)  Platform configuration             Compute  Transfer
11,340      sensor                             <1%      >99%
3,731       sensor, motion                     <1%      >99%
374         sensor, face detect                10%      90%
782,090     sensor, NN                         16%      84%
132         sensor, motion, face detect        >99%     <1%
257,236     sensor, motion, NN                 >99%     <1%
419         sensor, face detect, NN            >99%     <1%
160         sensor, motion, face detect, NN    >99%     <1%

Which pipeline achieves the lowest power consumption? Prefilters reduce overall power: sensor, motion, face detect draws 132 µW, compared with 374 µW for sensor and face detect alone.

Which pipeline achieves the lowest power consumption? Just using the NN (sensor, NN) draws 782,090 µW; every configuration that adds prefilters before the NN uses less power (257,236 µW with motion detection, 419 µW with face detection, and 160 µW with both).

Which pipeline achieves the lowest power consumption? The most power-efficient configuration is sensor, motion, face detect (132 µW); the most power-efficient configuration with an on-chip NN is sensor, motion, face detect, NN (160 µW).
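The annotations can be checked directly against the reported numbers. The snippet transcribes the power values from the table and picks the minima. The one added assumption, labeled in the code, is reading the WISPCam slide's ~1 mW figure as a power budget in order to see which configurations fit under it.

```python
# Reported power (µW) per on-camera configuration, transcribed from the table.
REPORTED_UW = {
    ("sensor",): 11_340,
    ("sensor", "motion"): 3_731,
    ("sensor", "face detect"): 374,
    ("sensor", "NN"): 782_090,
    ("sensor", "motion", "face detect"): 132,
    ("sensor", "motion", "NN"): 257_236,
    ("sensor", "face detect", "NN"): 419,
    ("sensor", "motion", "face detect", "NN"): 160,
}
BUDGET_UW = 1_000  # ASSUMPTION: treating WISPCam's ~1 mW as the power budget

best = min(REPORTED_UW, key=REPORTED_UW.get)
best_nn = min((c for c in REPORTED_UW if "NN" in c), key=REPORTED_UW.get)
savings = REPORTED_UW[("sensor", "NN")] / REPORTED_UW[best_nn]
within_budget = sorted(c for c, p in REPORTED_UW.items() if p <= BUDGET_UW)

print(best, REPORTED_UW[best])        # ('sensor', 'motion', 'face detect') 132
print(best_nn, REPORTED_UW[best_nn])  # ('sensor', 'motion', 'face detect', 'NN') 160
print(round(savings))                 # 4888
print(len(within_budget))             # 4
```

Prefiltering before the NN cuts power by roughly 4,888x relative to running the NN alone, and under the assumed ~1 mW budget only the four configurations that end in on-chip face detection or an on-chip NN with a prefilter would fit.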

In-camera processing for face authentication (motion detection → face detection → neural network)
- In isolation, even well-designed hardware can show sub-optimal performance.
- Optional blocks can improve the overall cost if they balance compute and communication better than the original design.

Producing real-time VR video from a camera rig
Goal: 30 fps 3D-360 stereo video (1.8 GB/s output) from 16 GoPro cameras recording 4K at 30 fps (3.6 GB/s raw video).

Producing real-time VR video from a camera rig: for the same goal (30 fps 3D-360 stereo video from 16 GoPro cameras producing 3.6 GB/s of raw video), cloud processing prevents real-time video.

The VR pipeline is usually offloaded to the cloud to perform the heavy computation: sensor → image prep (5% of processing time) → align (20%) → depth from flow (70%) → image stitch (5%) → stream to viewer. We need to accelerate depth from flow to achieve high performance.
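Amdahl's law (a standard result, not a claim made on the slides) quantifies why depth from flow is the target and what accelerating it alone can buy. With depth from flow at 70% of processing time, the remaining 30% (prep, align, stitch) bounds the overall gain.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of runtime is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


DEPTH_FROM_FLOW = 0.70  # share of processing time, from the slide

for factor in (2, 10, 100):
    print(factor, round(amdahl_speedup(DEPTH_FROM_FLOW, factor), 2))
# 2 1.54
# 10 2.7
# 100 3.26
print("upper bound:", round(1.0 / (1.0 - DEPTH_FROM_FLOW), 2))  # upper bound: 3.33
```

Accelerating depth from flow is necessary, but past roughly a 10x speedup of that stage the untouched 30% dominates, so the other blocks also matter for the end-to-end frame rate.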

Offloading before the costly step doesn't avoid compute-communication tradeoffs
[Figure: video frame size (MB, 0 to 600) after each pipeline stage: sensor, image prep, align, depth from flow, image stitch, stream to viewer]
The image alignment step produces significant intermediate data, and offloading early on is still 2x the final output size.
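A back-of-the-envelope check using the rig's rates (3.6 GB/s raw, 1.8 GB/s output, 30 fps) and the 2 GB/s network link assumed in the evaluation. We additionally assume 1 GB = 1000 MB; `link_limited_fps` is an illustrative helper. Offloading straight from the sensor sends 120 MB per rig frame, twice the 60 MB final output frame, which is consistent with the 2x figure, and the link cannot carry it at 30 fps.

```python
RAW_GB_PER_S = 3.6     # 16 GoPro cameras, 4K at 30 fps (from the slides)
OUTPUT_GB_PER_S = 1.8  # 30 fps 3D-360 stereo output (from the slides)
LINK_GB_PER_S = 2.0    # network link assumed in the evaluation
FPS = 30


def link_limited_fps(frame_mb: float, link_gb_per_s: float) -> float:
    """Highest frame rate at which frames of `frame_mb` fit through the link
    (ASSUMPTION: 1 GB = 1000 MB, no protocol overhead)."""
    return link_gb_per_s * 1000 / frame_mb


raw_frame_mb = RAW_GB_PER_S * 1000 / FPS     # 120 MB per rig frame
out_frame_mb = OUTPUT_GB_PER_S * 1000 / FPS  # 60 MB per output frame

print(round(link_limited_fps(raw_frame_mb, LINK_GB_PER_S), 1))  # 16.7
print(round(link_limited_fps(out_frame_mb, LINK_GB_PER_S), 1))  # 33.3
print(raw_frame_mb / out_frame_mb)                              # 2.0
```

Raw offload caps the rig at about 16.7 fps, while sending only the final output would fit within 30 fps; intermediate stages such as alignment, which produce even larger frames, fare worse than raw.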

Evaluation: Which pipeline achieves the highest frame rate?
- Designed a simple parallel accelerator for the Xilinx Zynq SoC, simulated for Virtex UltraScale+ (implementation details in the paper)
- Evaluated against CPU and GPU implementations in Halide
- Assumed a 2 GB/s network link for communication
