event-based motion segmentation by motion compensation Timo Stoffregen, Guillermo Gallego, Tom Drummond, Lindsay Kleeman, Davide Scaramuzza, ICCV 2019 presented by Ondrej Holesovsky, ondrej.holesovsky@cvut.cz The 5th of November 2019 Czech Technical University in Prague, CIIRC
outline 1. Background: event camera intro. 2. Addressed problem: motion segmentation. 3. Related work. 4. Proposed method. 5. Experimental findings, discussion. 1
event camera intro
event camera sensor principle Patrick Lichtsteiner et al., A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor, IEEE Journal of Solid-State Circuits, 2008. • Each pixel is independent, no global or rolling shutter. • A pixel responds by events to changes in log light intensity. • Level-crossing sampling. 3
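As a rough illustration of level-crossing sampling, a single pixel can be simulated as below. This is a toy sketch, not the sensor's circuit model; the threshold value and the input signal are made up.

```python
import numpy as np

def level_crossing_events(times, log_intensity, threshold=0.2):
    """Toy single-pixel event generator: emit an event whenever the log
    intensity moves by `threshold` away from the level of the last event."""
    events = []                      # list of (timestamp, polarity)
    ref = log_intensity[0]           # reference level of the last event
    for t, level in zip(times, log_intensity):
        while level - ref >= threshold:
            ref += threshold
            events.append((t, +1))   # brightness increased -> ON event
        while ref - level >= threshold:
            ref -= threshold
            events.append((t, -1))   # brightness decreased -> OFF event
    return events

# Example: a pixel seeing a smooth brightness ramp emits a sparse event train.
t = np.linspace(0.0, 1.0, 1000)
events = level_crossing_events(t, np.log1p(2.0 * t))
print(len(events), "events")
```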
an event A contrast change detection event k: $e_k = [x_k, t_k, s_k]$ • $x_k$ - pixel coordinates • $t_k$ - timestamp in seconds, microsecond resolution • $s_k$ - polarity, $-1$ or $+1$ This sensory representation usually requires 're-inventing' computer vision approaches. Alternative way: render videos from events. 4
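One simple way to hold such events in code is a NumPy structured array with one record per event. This is a sketch; the field names and dtype choices are mine, not from the paper.

```python
import numpy as np

# One record per event e_k = [x_k, t_k, s_k]: pixel coordinates,
# timestamp in seconds, and polarity (-1 or +1).
event_dtype = np.dtype([("x", np.uint16), ("y", np.uint16),
                        ("t", np.float64), ("s", np.int8)])

# A tiny hand-made packet (values are illustrative only).
events = np.array([(10, 12, 1e-6, +1),
                   (11, 12, 4e-6, -1),
                   (10, 13, 7e-6, +1)], dtype=event_dtype)

print(events["t"])   # per-event timestamps, microsecond resolution
```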
a sample event sequence A ball rolling on the floor. 10 ms of events shown in an image plane view. 5
a sample event sequence A rolling ball captured in an XYT view, 10 ms of events. 6
a sample event sequence A rolling ball captured in an XYT view, 300 ms of events. 7
the problem: motion segmentation
event motion segmentation • Classify events into $N_l$ clusters, each representing a coherent motion with parameters $\theta_j$. • Clusters are three-dimensional (space-time coordinates). • Two objects sharing the same motion are segmented together. • Assume motion constancy: events processed in temporally short packets. • Chicken-and-egg: estimate motion of clusters, cluster events by motion. 9
related work
traditional cameras - sparse method Xun Xu et al., Motion Segmentation by Exploiting Complementary Geometric Models, CVPR 2018. • Assuming known keypoint correspondences (SIFT, corners...). • Geometric models: affine, homography, fundamental. • Spectral clustering at the core: similarly moving tracked points should belong to the same partition of an affinity graph relating motion hypotheses to features. 11
traditional cameras - dense method Brox and Malik, Object Segmentation by Long Term Analysis of Point Trajectories, ECCV 2010. • Intensity constancy assumption. • Sparse translational point trajectories (3% of pixels): optical flow -> point tracking -> trajectory affinities -> spectral clustering. • Dense segmentation: variational label approximation (Potts model optimisation) on sparse trajectories and pixel colour. VGA at 1 FPS on a GPU. 12
event-based vs. traditional cameras and approaches • The presented approach is semi-dense - more than keypoints. • Assumptions: constant contrast vs. constant intensity. (Both invalid in general.) • Event-based could benefit from higher data efficiency. • High-speed, high dynamic range (HDR), low power. • Real-time: still difficult for both. 13
event-driven ball detection and gaze fixation in clutter A. Glover and C. Bartolozzi, IROS 2016. • Detecting and tracking a ball from a moving event camera. • Locally estimate normal flow directions by fitting planes to events. • Flow direction points to or from the circle centre, which directs the Hough transform. • Any motion but only circular objects. 14
independent motion detection with event-driven cameras V. Vasco et al., ICAR 2017. • An iCub robot head and camera move. • Detect and track corners among events and estimate their velocity. • Learn a model relating head joint velocities (from encoders) to corner velocities. • Independent corners are inconsistent (Mahalanobis distance) with the head joint velocities. • Any objects but need to know egomotion. 15
iwe - image of warped events Or motion-compensated event image. G. Gallego and D. Scaramuzza, Accurate Angular Velocity Estimation With an Event Camera, RAL 2016. • A rotating camera. Look at 2D event cloud projections. • Project along the motion trajectories -> edge structure revealed. • Events of a trajectory: same edge, same polarity*. • Consider the sum of polarities along a trajectory. • Number of trajectories = number of pixels... 16
iwe - method description 1a (simplified) An event image sums polarities along a trajectory. Discrete image coordinates $x$, continuous event coordinates $x_k$: $I(x) = \sum_{k=0}^{N-1} s_k \, f(x, x_k)$. • $N$ - number of events in the cloud, within a small time interval. • $f$ - bilinear interpolation function, $(x, x_k) \mapsto [0, 1]$. • Each event contributes to its four neighbouring pixels. • $I(x)$ - sum of neighbourhood-weighted polarities of events firing at pixel location $x$. 17
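A minimal NumPy sketch of this accumulation, assuming plain arrays of (possibly sub-pixel) event coordinates and polarities; the function name and boundary handling are mine, not the authors' implementation.

```python
import numpy as np

def accumulate_iwe(xs, ys, polarities, height, width):
    """Sum event polarities into an image, splitting each event bilinearly
    over its four neighbouring pixels (the weights play the role of f)."""
    iwe = np.zeros((height, width), dtype=np.float64)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    dx, dy = xs - x0, ys - y0
    corners = [(0, 0, (1 - dx) * (1 - dy)), (1, 0, dx * (1 - dy)),
               (0, 1, (1 - dx) * dy),       (1, 1, dx * dy)]
    for ox, oy, w in corners:
        xi, yi = x0 + ox, y0 + oy
        ok = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
        np.add.at(iwe, (yi[ok], xi[ok]), polarities[ok] * w[ok])
    return iwe
```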
iwe - method description 1b (paper notation) An event image sums polarities along a trajectory, continuous image coordinates $x$, continuous event coordinates $x_k$: $I(x) = \sum_{k=0}^{N-1} s_k \, \delta(x - x_k)$. • $\delta$ - Dirac delta. • Need to integrate the image for meaningful values. • Naive pixelwise sums along the time axis -> motion blur! 18
iwe - method description 2 Idea: Maximise IWE sharpness by transforming the events to compensate for the motion. Iterative motion parameter optimisation: • Sharpness metric: variance of the IWE pixel values. • IWE variance and its derivatives w.r.t. motion parameters. • Update the motion parameters. Transform the event cloud. • A new IWE from the transformed event cloud. Repeat. 19
iwe - translational motion model example Event cloud transform equations with motion parameters $v_x$, $v_y$: $x'_k = x_k + t_k v_x$, $y'_k = y_k + t_k v_y$. It transforms all events to their spatial location at $t'_k = 0$. 20
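A sketch tying the warp and the sharpness objective together. Here a brute-force grid search over velocities stands in for the gradient ascent described above; the velocity range, grid resolution, and nearest-pixel IWE are simplifications of mine.

```python
import numpy as np

def iwe_nearest(xw, yw, height, width):
    """Event count image at the warped locations (nearest-pixel voting,
    polarity ignored for brevity)."""
    img = np.zeros((height, width))
    xi = np.clip(np.round(xw).astype(int), 0, width - 1)
    yi = np.clip(np.round(yw).astype(int), 0, height - 1)
    np.add.at(img, (yi, xi), 1.0)
    return img

def best_translation(xs, ys, ts, height, width, v_range=None):
    """Pick the (vx, vy) that maximises IWE variance, i.e. the sharpest
    motion-compensated image, using the slide's warp x' = x + t*vx."""
    if v_range is None:
        v_range = np.linspace(-50.0, 50.0, 41)   # candidate velocities, px/s
    best = (-np.inf, 0.0, 0.0)
    for vx in v_range:
        for vy in v_range:
            img = iwe_nearest(xs + ts * vx, ys + ts * vy, height, width)
            var = img.var()
            if var > best[0]:
                best = (var, vx, vy)
    return best[1:]       # (vx, vy) giving the highest contrast
```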
simultaneous optical flow and segmentation (sofas) using a Dynamic Vision Sensor, by Timo Stoffregen and Lindsay Kleeman, ACRA 2017. • Not easy to read. Rough method description: greedy sequential model fitting. • The number of local maxima of the contrast objective ideally matches the number of structures with distinct optical flow velocities. 21
the proposed method
solution summary • One IWE per motion cluster. Each with a different motion model. • Table of event-cluster associations. • Sharpness of the IWEs guides event segmentation. • Joint identification of motion models and associations. 23
event clusters • Event-cluster association $p_{kj} = P(e_k \in l_j)$ of event k being in cluster j. • $P \equiv (p_{kj})$ is an $N_e \times N_l$ matrix with all event-cluster associations. Non-negative, rows add up to one. • Association-weighted IWE for cluster j: $I_j(x) = \sum_{k=1}^{N_e} p_{kj} \, \delta(x - x'_{kj})$. $x'_{kj}$ is the warped event location. Note: ignoring polarity. 24
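A sketch of the association-weighted cluster IWEs, assuming the warped coordinates have already been computed per cluster (arrays of shape (N_e, N_l)) and using nearest-pixel voting in place of the Dirac/bilinear formulation.

```python
import numpy as np

def cluster_iwes(xw, yw, P, height, width):
    """xw, yw: (N_e, N_l) warped event coordinates, one column per cluster.
    P: (N_e, N_l) event-cluster associations (rows sum to one).
    Returns one association-weighted IWE per cluster (polarity ignored)."""
    n_events, n_clusters = P.shape
    iwes = np.zeros((n_clusters, height, width))
    for j in range(n_clusters):
        xi = np.clip(np.round(xw[:, j]).astype(int), 0, width - 1)
        yi = np.clip(np.round(yw[:, j]).astype(int), 0, height - 1)
        np.add.at(iwes[j], (yi, xi), P[:, j])
    return iwes
```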
single objective to optimise Event alignment within cluster j measured by image contrast, such as the variance, $\mathrm{Var}(I_j) = \frac{1}{|\Omega|} \int_\Omega (I_j(x) - \mu_{I_j})^2 \, dx$, where $\mu_{I_j}$ is the mean of the IWE for cluster j over the image plane $\Omega$. Find the motion parameters $\theta$ and the event-cluster associations $P$, such that the total contrast of all cluster IWEs is maximised: $(\theta^*, P^*) = \arg\max_{(\theta, P)} \sum_{j=1}^{N_l} \mathrm{Var}(I_j)$. 25
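In code, the objective is then just the sum of per-cluster IWE variances over the images produced above; a direct sketch:

```python
import numpy as np

def total_contrast(iwes):
    """Sum over clusters of the variance of each cluster's IWE,
    i.e. the quantity maximised jointly over (theta, P)."""
    return float(sum(np.var(iwe) for iwe in iwes))
```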
the solution - alternating optimisation Update the motion parameters of each event cluster, associations fixed: $\theta \leftarrow \theta + \mu \nabla_\theta \big( \sum_{j=1}^{N_l} \mathrm{Var}(I_j) \big)$, where $\mu \geq 0$ is the step size; a single gradient ascent step. Recompute event-cluster associations, motion parameters fixed: $p_{kj} = \frac{c_j(x'_k(\theta_j))}{\sum_{i=1}^{N_l} c_i(x'_k(\theta_i))}$, where $c_j(x) \neq 0$ is the local sharpness of cluster j at pixel $x$, $c_j(x) \doteq I_j(x)$. 26
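A sketch of the association update, taking $c_j$ directly as the cluster IWE (as on the slide) sampled at the nearest pixel; the epsilon guard against division by zero is mine.

```python
import numpy as np

def update_associations(iwes, xw, yw, eps=1e-9):
    """p_kj proportional to cluster j's local sharpness c_j = I_j at event k's
    warped location, normalised over clusters so each row sums to one."""
    n_clusters, height, width = iwes.shape
    n_events = xw.shape[0]
    scores = np.empty((n_events, n_clusters))
    for j in range(n_clusters):
        xi = np.clip(np.round(xw[:, j]).astype(int), 0, width - 1)
        yi = np.clip(np.round(yw[:, j]).astype(int), 0, height - 1)
        scores[:, j] = iwes[j, yi, xi]
    scores = np.maximum(scores, eps)                 # avoid division by zero
    return scores / scores.sum(axis=1, keepdims=True)
```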
initialisation Greedy. Not crystal clear. • Start with equal associations. • Optimise the first cluster motion parameters. • Gradient $g_{kj}$ of the local contrast of each event w.r.t. the motion parameters. • $g_{kj}$ negative -> event k likely belongs to cluster j; $p_{kj}$ set high, low for the other clusters. • Such an event becomes blurred when moving away from the optimised parameters. • Repeat for the remaining clusters. 27
experimental findings
occlusion Mitrokhin’s 2018 Extreme Event Dataset (EED), ball behind a net. 29
low light, strobe light EED, lighting variation. 30
accuracy - bounding boxes Mitrokhin’s dataset: 31
accuracy - per event Using a photorealistic simulator. Textured pebbles, different relative velocities. Roughly 4 pixels of relative displacement are needed to achieve 90% accuracy (true for any velocity). 32
throughput Complexity linear in the number of clusters $N_l$, events $N_e$, IWE pixels $N_p$, and iterations $N_{it}$: $O((N_e + N_p) N_l N_{it})$. Optical flow warps, CPU 2.4 GHz, GPU GeForce 1080: Fast-moving drone sequence: ca. 370 kevents/s. Ball behind net: ca. 1000 kevents/s. 33
different motion models Fan blades spinning at 1800 rpm and a falling coin. 34
street, facing the sun 35
non-rigid objects 36
number of clusters If set too large, the unneeded clusters end up empty. Configurations shown: 5 × OF, 10 × OF, 5 × OF + 5 × Rotation. 37