Silicon Retina Technology
Tobi Delbruck
Institute of Neuroinformatics, University of Zurich and ETH Zurich
Sensors Group: sensors.ini.uzh.ch
Sponsors: Swiss National Science Foundation NCCR Robotics project, EU projects SEEBETTER and VISUALISE, Samsung, DARPA
sensors.ini.uzh.ch | inilabs.com
Sponsors: Swiss National Science Foundation NCCR Robotics, EU projects CAVIAR, SEEBETTER and VISUALISE, Samsung, DARPA, University of Zurich and ETH Zurich
Conventional cameras (static vision sensors) output a stroboscopic sequence of frames
Muybridge 1878, nearly 140 years ago (image: golf-guides.blogspot.com)
Good:
- Compatible with 50+ years of machine vision
- Allows small pixels (1 um for consumer, 3-5 um for machine vision)
Bad:
- Redundant output
- Temporal aliasing
- Limited dynamic range (~60 dB)
- Fundamental "latency vs. power" trade-off
The human eye as a digital camera
- 100M photoreceptors
- 1M output fibers carrying max ~100 Hz spike rates
- 180 dB (10^9) operating range
- >20 different "eyes"
- Many GOPS of computing
- 3 mW power consumption
- Output is a sparse, asynchronous stream of digital spike events
This talk has 4 parts:
- Dynamic Vision Sensor silicon retinas
- Simple object tracking by algorithmic processing of events
- Using probabilistic methods for state estimation
- "Data-driven" deep inference with CNNs
DVS (Dynamic Vision Sensor) pixel
The signal chain mimics the retina: photoreceptor (log I response) -> change amplifier (bipolar cells) -> ON/OFF event comparators (ganglion cells). An ON or OFF event is emitted, and the comparator is reset, whenever the change in log intensity |Δ log I| exceeds the event threshold θ.
Lichtsteiner et al., ISSCC 2007, JSSC 2009. Retina schematic from Rodieck 1998.
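To make the pixel's principle of operation concrete, here is a minimal Python sketch of an idealized DVS pixel model. It is illustrative only: the function name, sample intensities, and the 11% threshold are assumptions, and the real pixel does all of this asynchronously in analog circuitry.

```python
import math

def dvs_pixel_events(intensities, theta=0.11):
    """Idealized DVS pixel model: emit +1 (ON) / -1 (OFF) events whenever
    the log intensity changes by more than the contrast threshold theta
    (~11% per the matching measurement later in this talk)."""
    events = []
    mem = math.log(intensities[0])  # memorized log intensity after last reset
    for t, i in enumerate(intensities[1:], start=1):
        diff = math.log(i) - mem
        while abs(diff) >= theta:          # one event per threshold crossing
            polarity = 1 if diff > 0 else -1
            events.append((t, polarity))
            mem += polarity * theta        # comparator reset re-centers on new level
            diff = math.log(i) - mem
    return events

# Example: a brightness step of factor 2 yields ~ln(2)/0.11 ≈ 6 ON events,
# and the step back down yields the matching OFF events.
print(dvs_pixel_events([1.0, 2.0, 2.0, 1.0]))
```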
DVS pixel has wide dynamic range
ON events recorded while viewing an Edmund 0.1-density chart at 780 lux and at 5.8 lux (illumination ratio 135:1).
ISSCC 2007
Using DVS for high-speed (low data rate) imaging
Data rate <1 MB/s; "frame rate" equivalent to 10 kHz but with ~100x less data (a 10 kHz image sensor with 16k pixels would produce 160 MB/s).
ISSCC 2007
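The data-rate comparison can be worked through in a few lines; a sketch assuming the 16k pixels are the 128x128 array of the original DVS and 1 byte per pixel (sizes approximate):

```python
# Worked version of the data-rate comparison above.
pixels = 128 * 128             # ~16k pixels on the first DVS
frame_rate = 10_000            # Hz, the equivalent "frame rate"
frame_camera_rate = pixels * frame_rate * 1  # bytes/s at 1 byte/pixel
dvs_rate = 1_000_000           # <1 MB/s event stream from the DVS
print(f"{frame_camera_rate/1e6:.0f} MB/s vs {dvs_rate/1e6:.0f} MB/s "
      f"-> ~{frame_camera_rate/dvs_rate:.0f}x less data")
```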
DAVIS (Dynamic and Active Pixel Vision Sensor) pixel
Adds a conventional intensity readout to the DVS pixel: the same photoreceptor (log I) -> change amplifier (bipolar cells) -> ON/OFF event comparator (ganglion cells) path generates events when |Δ log I| exceeds the threshold θ, while an intensity value can additionally be read out and reset like an active pixel sensor.
Brandli et al., Symp. VLSI 2013, JSSC 2014. Retina schematic from Rodieck 1998.
DVS/DAVIS + IMU demo
Brandli, Berner, Delbruck et al., Symp. VLSI 2013, JSSC 2014, ISCAS 2015
DAVIS346 chip
180 nm CIS process, 8 mm die, 346x260 pixel array with 18.5 um pixels. Peripherals: AER asynchronous DVS event readout, bias generator, and DAVIS APS column-parallel ADCs and scanner.
Important layout considerations
1. Post-layout simulations to minimize parasitic coupling
2. Shielding parasitic photodiodes
Event threshold matching measurement
Experiment: apply a slow triangle-wave LED stimulus to the entire array and measure the number of events each pixel generates.
Conclusion: pixels generate 11±3 events per factor-3.3 contrast step. Since ln(3.3)=1.19 and 1.19/11=0.11, the contrast threshold is 11%±4%.
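The same threshold arithmetic, worked in code (values taken from the measurement above):

```python
import math

# Pixels emit 11 ± 3 events for a contrast step of factor 3.3,
# so each event corresponds to ln(3.3)/11 natural-log units of contrast.
events_per_step = 11.0
log_contrast = math.log(3.3)            # ≈ 1.19 natural-log units
theta = log_contrast / events_per_step  # ≈ 0.108
print(f"per-event contrast threshold ≈ {theta:.0%}")
```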
Measuring DVS pixel latency
Experiment: stimulate a small area of the sensor with a flashing LED spot and measure the response latencies, and their jitter, from the recorded event stream.
Conclusion: pixels can reach a minimum latency of about 12 us under bright illumination, but "real world" latencies are more like 100 us to 1 ms.
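A sketch of how latency and jitter fall out of such a recording; the flash and event timestamps here are hypothetical, chosen only to be near the ~12 us figure:

```python
import numpy as np

# Given LED flash onset times and the first DVS event after each flash,
# latency is the mean delay and jitter its standard deviation.
flash_times = np.array([0.0, 1.0, 2.0, 3.0])                 # seconds
first_event_times = flash_times + 1e-6 * np.array([14, 11, 13, 12])
delays = first_event_times - flash_times
print(f"latency = {delays.mean()*1e6:.1f} us, "
      f"jitter = {delays.std()*1e6:.1f} us")
```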
DVS pixel has built-in temperature compensation
The event threshold scales as U_T ln(I_on/I_d) and the photoreceptor output as V_p ∝ U_T ln(I_p), where U_T = kT/q is the thermal voltage. Since photoreceptor gain and threshold voltage both scale with absolute temperature T, the temperature dependence cancels out of the contrast threshold.
Nozaki, Delbruck 2017 (unpublished)
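The cancellation argument can be checked symbolically; a sketch using SymPy, with illustrative symbol names (I_on and I_d are the currents above, U_T = kT/q the thermal voltage):

```python
import sympy as sp

# Both the event threshold and the photoreceptor gain are proportional
# to the thermal voltage U_T = kT/q, so their ratio -- the contrast
# threshold -- has no remaining temperature dependence.
T, k, q, I_on, I_d = sp.symbols('T k q I_on I_d', positive=True)
U_T = k * T / q
threshold_voltage = U_T * sp.log(I_on / I_d)  # event threshold ∝ U_T
photoreceptor_gain = U_T                      # volts per e-fold of intensity
contrast_threshold = sp.simplify(threshold_voltage / photoreceptor_gain)
print(contrast_threshold)  # log(I_on/I_d): T has cancelled out
```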
Integrated bias generator and circuit design enable operation over an extended temperature range
Nozaki, Delbruck 2017 (submitted)
DVS pixel size trend
Pixel pitch has shrunk across process generations (350 nm -> 350/180 nm -> 180 nm -> 90 nm), compared here against global-shutter APS and rolling-shutter consumer APS pixels.
Data: https://docs.google.com/spreadsheets/d/1pJfybCL7i_wgH3qF8zsj1JoWMtL0zHKr9eygikBdElY/edit#gid=0
Event camera silicon retina developments (DVS/DAVIS, ATIS/CCAM, CeleX)
Commercial entities:
- iniLabs (Zurich): R&D prototypes
- Insightness (Zurich): drones and augmented reality
- Samsung (S. Korea): consumer electronics
- Pixium Vision (Paris): retinal implants
- iniVation (Zurich): industrial applications, automotive
- Chronocam (Paris): automotive
- Hillhouse (Singapore): automotive
www.iniLabs.com
Run as a not-for-profit; founded 2009. Neuromorphic sensor R&D prototypes, open-source software, user guides, app notes, and sample data. Has shipped devices based on multi-project wafer silicon to 100+ organizations.
This talk has 4 parts:
- Dynamic Vision Sensor silicon retinas
- Simple object tracking by algorithmic processing of events
- Using probabilistic methods for state estimation
- "Data-driven" deep inference with CNNs
Tracking objects from DVS events using spatio-temporal coherence
1. For each event, find the nearest cluster:
   - If the event falls within a cluster, move that cluster toward it
   - If not, seed a new cluster
2. Periodically prune starved clusters, merge overlapping clusters, etc. (lifetime management)
Advantages (see the sketch below):
1. Low computational cost (e.g. <5% CPU)
2. No frame memory (~100 bytes/object)
3. No frame correspondence problem
Litzenberger 2007
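A minimal Python sketch of this event-driven cluster tracker. It is an illustrative reimplementation, not Litzenberger's actual code; the cluster radius and mixing factor are made-up parameters.

```python
import numpy as np

def track_event(clusters, xy, radius=10.0, mix=0.1):
    """One step of the event-driven cluster tracker: each DVS event
    either moves its nearest cluster or seeds a new one.
    clusters: list of dicts with 'pos' (np.array) and 'count'."""
    for c in clusters:
        if np.linalg.norm(c['pos'] - xy) < radius:
            # Event supports this cluster: move it toward the event.
            c['pos'] = (1 - mix) * c['pos'] + mix * xy
            c['count'] += 1
            return clusters
    # No cluster nearby: seed a new one at the event location.
    clusters.append({'pos': np.asarray(xy, dtype=float), 'count': 1})
    return clusters

# Each event costs one nearest-cluster search plus a tiny update, so
# there is no frame memory -- only a few values of state per object.
# (Pruning of starved clusters would run periodically, not shown.)
clusters = []
for xy in [(5, 5), (6, 5), (40, 40), (7, 6)]:
    clusters = track_event(clusters, np.array(xy, dtype=float))
print([(c['pos'].round(1).tolist(), c['count']) for c in clusters])
```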
Robo Goalie
Delbruck et al., ISCAS 2007, Frontiers 2013
Using the DVS allows a 2 ms reaction time at 4% processor load over a USB connection
This talk has 4 parts:
- Dynamic Vision Sensor silicon retinas
- Simple object tracking by algorithmic processing of events
- Using probabilistic methods for state estimation
- "Data-driven" deep inference with CNNs
Simultaneous Mosaicing and Tracking with DVS
Hanme Kim, A. Handa, … Andy J. Davison, BMVC 2014.
Goal: event-based, semi-dense visual odometry
- We want to estimate the state vector x (camera pose, visual scene spatial brightness gradients, and sensor event thresholds) from the events e using Bayesian filtering: p(x | e)
- The sensor likelihood p(e | x) is modeled as a mixture of an inlier Gaussian distribution and an outlier uniform distribution
- A tractable posterior q(x | e) ≈ p(x | e) is found by minimizing the Kullback-Leibler (KL) divergence
- This leads to closed-form update equations in the form of a classical Kalman filter, and is thus computationally efficient (unlike particle filtering); see the sketch below
E. Mueggler, G. Gallego, D. Scaramuzza (submitted to PAMI, 2016)
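The closed-form update the authors arrive at has the shape of a classical Kalman measurement update. Below is a generic sketch of that update, not the paper's exact filter; the state, measurement model, and noise values are illustrative.

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """Generic Kalman measurement update: x is the state mean, P its
    covariance, and z a measurement with linear model z = H x + noise(R).
    The KL-projected posterior above reduces to updates of this form."""
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ (z - H @ x)           # corrected state mean
    P_new = (np.eye(len(x)) - K @ H) @ P  # corrected covariance
    return x_new, P_new

# Toy usage: 2-state system, scalar event-derived measurement.
x, P = np.zeros(2), np.eye(2)
H, R = np.array([[1.0, 0.0]]), np.array([[0.1]])
x, P = kalman_update(x, P, z=np.array([0.5]), H=H, R=R)
print(x, P)
```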
Towards event-based, semi-dense SLAM: 6-DOF pose estimation
G. Gallego et al., PAMI (submitted 2016).
This talk has 4 parts:
- Dynamic Vision Sensor silicon retinas
- Simple object tracking by algorithmic processing of events
- Using probabilistic methods for state estimation
- "Data-driven" deep inference with CNNs
Demo: RoShamBo
RoShamBo CNN architecture
Conventional 5-layer LeNet with ReLU/MaxPool stages and one FC layer before the output (see the sketch below).
Input: 240x180 DVS events accumulated into a 64x64 2D rectified histogram "frame" of 2k events (0.1 Hz to 1 kHz frame rate).
Layers: Conv 5x5 -> 16x60x60 -> MaxPool 2x2 -> 16x30x30 -> Conv 3x3 -> 32x28x28 -> MaxPool 2x2 -> 32x14x14 -> Conv 3x3 -> 64x12x12 -> MaxPool 2x2 -> 64x6x6 -> Conv 3x3 -> 128x4x4 -> MaxPool 2x2 -> 128x2x2 -> Conv 1x1 + MaxPool 2x2 -> 128x1x1 -> outputs: Paper, Scissors, Rock, Background.
Total 18 MOp (~9M MAC). Compute times: 2 ms on a 150 W Core i7 PC in Caffe; 8 ms on a 1 W CNN accelerator on FPGA.
I.-A. Lungu, F. Corradi, and T. Delbruck, "Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing RoShamBo," in 2017 IEEE Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017.
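A sketch of this architecture in PyTorch, following the layer sizes on the slide. This is an assumed reimplementation: the original network was trained in Caffe, and any detail beyond the slide (e.g. placement of the final pool) is a guess.

```python
import torch
import torch.nn as nn

class RoShamBoNet(nn.Module):
    """LeNet-style network matching the slide's feature-map sizes."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5),    # 1x64x64  -> 16x60x60
            nn.ReLU(),
            nn.MaxPool2d(2),        #          -> 16x30x30
            nn.Conv2d(16, 32, 3),   #          -> 32x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),        #          -> 32x14x14
            nn.Conv2d(32, 64, 3),   #          -> 64x12x12
            nn.ReLU(),
            nn.MaxPool2d(2),        #          -> 64x6x6
            nn.Conv2d(64, 128, 3),  #          -> 128x4x4
            nn.ReLU(),
            nn.MaxPool2d(2),        #          -> 128x2x2
            nn.Conv2d(128, 128, 1), #          -> 128x2x2
            nn.ReLU(),
            nn.MaxPool2d(2),        #          -> 128x1x1
        )
        # FC layer producing Paper/Scissors/Rock/Background scores.
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

frame = torch.zeros(1, 1, 64, 64)  # 2D histogram of ~2k DVS events
logits = RoShamBoNet()(frame)
print(logits.shape)  # torch.Size([1, 4])
```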
RoShamBo training images
I.-A. Lungu, F. Corradi, and T. Delbruck, "Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing RoShamBo," in 2017 IEEE Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017.
A. Aimar, E. Calabrese, H. Mostafa, A. Rios-Navarro, R. Tapiador, I.-A. Lungu, A. Jimenez-Fernandez, F. Corradi, S.-C. Liu, A. Linares-Barranco, and T. Delbruck, "NullHop: Flexibly efficient FPGA CNN accelerator driven by DAVIS neuromorphic vision sensor," in NIPS 2016, Barcelona, 2016.
Conclusions
1. The DVS was developed by following a neuromorphic approach of emulating key properties of biological retinas.
2. The wide dynamic range and sparse, quick output make these sensors useful in real-time, uncontrolled conditions.
3. Applications could include vision prosthetics, surveillance, robotics, and consumer electronics.
4. The precise timing could improve learning and inference.
5. The main challenges are to reduce pixel size and to develop effective algorithms. Only industry can do the first, but academia has plenty of room to play for the second.
6. Event sensors can nicely drive deep inference. There is a lot of room for improvement of deep inference power efficiency at the system level!