Filters and other potions. P. Perona (Caltech). MIT, 21 November 2013
what? where?
Architectures
Architecture 1: the vision black box [Marr ’82]. Image(s) → feature extraction (edges, texture, color, contrast, stereo disparity, motion flow) → perceptual organization / grouping (edgels, image regions) → 2.5D sketch (surface shape and depth, boundaries, junctions, foreground/background, spatial relationships) → recognition and cognition (objects, verbs, categories, …). From image processing to regions and surfaces. (Figure examples: building, train, marble torso, ripe bananas.)
features? Le Corbusier, Villa Savoye http://flickr.com/photos/ikura/1398271367/
edges Le Corbusier, Villa Savoye http://www.iit.edu/~stawraf/perspx.jpg
Architecture 2 [Fukushima ‘80]
[DeValois ’85]
Column
Hypercolumn
Dense sampling
translation, rotation invariance [LeCun et al. 1998]
scale invariance [Lowe 2004]
translation, rotation, scale invariance [Hinton et al. ’12]
96 filters: 6 orientations, 2 center-surround, 14 scale samples over 2.2 binary octaves
Detection Performance Caltech pedestrians: 1M frames, 250K hand-annotated
Detection Performance (methods compared: Viola & Jones ’01, Dalal & Triggs ’05, Dollar et al. ’08, Dollar et al. ’10, Walk et al. ’10)
filter technology
Scale, orientation, elongation, …: lots of CPU cycles
how do we make computations efficient?
Separability

Direct 2-D convolution:

$$R(i,j) = \sum_{h=1:M,\;k=1:N} k(h,k)\, I(i-h,\, j-k) \qquad \text{Cost} = m \times n$$

If the kernel separates as $k(h,k) = k(h)\,k'(k)$:

$$R(i,j) = \sum_{h=1:M} k(h) \sum_{k=1:N} k'(k)\, I(i-h,\, j-k) \qquad \text{Cost} = m + n$$

[Adelson & Bergen, ’85]
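A minimal sketch of the cost argument, assuming numpy (the particular kernels are illustrative): a kernel that is an outer product of two 1-D kernels can be applied as a row pass followed by a column pass, matching the full 2-D convolution with m + n instead of m × n operations per pixel.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))

kx = np.array([1.0, 2.0, 1.0]) / 4.0    # 1-D smoothing kernel (n = 3)
ky = np.array([1.0, 0.0, -1.0]) / 2.0   # 1-D derivative kernel (m = 3)
k2d = np.outer(ky, kx)                  # full 2-D kernel: m*n multiplies/pixel

def conv2d_direct(img, k):
    """Brute-force 'valid' 2-D convolution: m*n operations per output pixel."""
    m, n = k.shape
    kf = k[::-1, ::-1]                  # flip kernel for true convolution
    H, W = img.shape
    out = np.zeros((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + m, j:j + n] * kf)
    return out

def conv2d_separable(img, ky, kx):
    """Row pass then column pass: m + n operations per output pixel."""
    rows = np.apply_along_axis(np.convolve, 1, img, kx, mode='valid')
    return np.apply_along_axis(np.convolve, 0, rows, ky, mode='valid')

direct = conv2d_direct(image, k2d)
sep = conv2d_separable(image, ky, kx)   # same result, far fewer operations
```

The two outputs agree to floating-point precision; for an m × n kernel the separable path does the same work a hardware or SIMD implementation would exploit.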
Separability and decomposition [Adelson & Bergen, ’85]
Steerability [Freeman & Adelson, ’91]
General decomposition

Steerable: $k(x, \theta) = \sum_{i=1}^{D} b_i(\theta)\, f_i(x)$

Separable: $k(x, y) = \sum_{i=1}^{D} f_i(x)\, g_i(y)$

Steerable and separable: $k(x, y; \theta) = \sum_{i=1}^{D} b_i(\theta)\, f_i(x)\, g_i(y)$
Design?
Sample $k(x; \theta)$ into a matrix $A$ (rows indexed by $\theta$, columns by $x$); the SVD $A = USV^T$ yields the steering functions $b_i(\theta)$ (columns of $U$), the weights $\sigma_{i,i}$, and the basis filters $f_i(x)$ (rows of $V^T$).
Approximation

$$K(x, y; \theta) = \sum_{i=1}^{D} b_i(\theta)\, f_i(x, y)$$

$$K(x, y; \theta) \approx \sum_{i=1}^{R} b_i(\theta)\, f_i(x, y), \qquad R \ll D$$
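A sketch of the SVD design step in numpy (the filter family and sizes are illustrative): stack rotated copies of a kernel as rows of a matrix, take the SVD, and truncate to rank R. Rotated first derivatives of a Gaussian are an instructive case because they are exactly steerable with two basis filters, so R = 2 reconstructs the whole bank.

```python
import numpy as np

# Build a bank of oriented filters: first derivative of a Gaussian,
# rotated to D_theta angles (a classic steerable-filter family).
n, sigma = 15, 2.0
xs = np.arange(n) - n // 2
X, Y = np.meshgrid(xs, xs)
G = np.exp(-(X**2 + Y**2) / (2 * sigma**2))

thetas = np.linspace(0, np.pi, 16, endpoint=False)
bank = np.stack([(np.cos(t) * X + np.sin(t) * Y) * G for t in thetas])

# One row per orientation theta, one column per pixel (x, y).
A = bank.reshape(len(thetas), -1)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Truncate to rank R << D: b_i(theta) from U*S, basis filters f_i from Vt.
R = 2
A_R = (U[:, :R] * S[:R]) @ Vt[:R]
rel_err = np.linalg.norm(A - A_R) / np.linalg.norm(A)
```

Here `rel_err` is at machine precision because the rotated derivative is a linear combination of only two filters, `X*G` and `Y*G`; for less structured kernel families the singular values decay gradually and R trades accuracy for speed.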
[Perona ’95]
Tensor Factorization

$$k(x, y; \theta) = \sum_{i=1}^{D} b_i(\theta)\, f_i(x)\, g_i(y)$$

• Not a convex problem
• Gradient descent

[Shy, Perona ’96]
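A minimal numpy sketch of the idea (sizes, learning rate, and the synthetic target are illustrative, not the paper's setup): fit the rank-R factors of a 3-way kernel tensor by gradient descent on the squared error. As the slide notes, the problem is not convex, so convergence to the global optimum is not guaranteed.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, R = 8, 9, 2
# Synthetic target of known CP rank R: K[t,x,y] = sum_i b[t,i] f[x,i] g[y,i].
bt, ft, gt = (rng.standard_normal((d, R)) for d in (T, N, N))
K = np.einsum('ti,xi,yi->txy', bt, ft, gt)

# Small random initialization of the three factor matrices.
b, f, g = (0.1 * rng.standard_normal((d, R)) for d in (T, N, N))
lr = 1e-3
loss0 = 0.5 * np.sum((np.einsum('ti,xi,yi->txy', b, f, g) - K) ** 2)
for _ in range(3000):
    E = np.einsum('ti,xi,yi->txy', b, f, g) - K        # residual tensor
    # Gradients of 0.5*||E||^2 with respect to each factor matrix.
    b = b - lr * np.einsum('txy,xi,yi->ti', E, f, g)
    f = f - lr * np.einsum('txy,ti,yi->xi', E, b, g)
    g = g - lr * np.einsum('txy,ti,xi->yi', E, b, f)
loss = 0.5 * np.sum((np.einsum('ti,xi,yi->txy', b, f, g) - K) ** 2)
```

Once fitted, applying the kernel at every θ reduces to separable 1-D convolutions with the R pairs (f_i, g_i), steered by the b_i(θ) coefficients.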
Including scale by resampling
[Manduchi et al. ’98] [cfr. Simoncelli et al]
Exploiting Image Statistics
Sampling the gradient: original vs. upsampled
[Dollar et al. 2013]
Gradient histograms [Dollar et al. 2013]
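A minimal sketch of gradient-histogram channel features in numpy (the cell size and bin count are illustrative choices, not the exact parameters of Dollar et al.): gradient magnitudes are pooled into orientation bins over small cells.

```python
import numpy as np

def gradient_histograms(img, n_bins=6, cell=4):
    """Pool gradient magnitude into n_bins unsigned-orientation bins per cell."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned orientation
    bins = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)
    H, W = img.shape
    Hc, Wc = H // cell, W // cell
    hist = np.zeros((Hc, Wc, n_bins))
    for b in range(n_bins):
        # Magnitude masked to this orientation bin, cropped to whole cells.
        m = np.where(bins == b, mag, 0.0)[:Hc * cell, :Wc * cell]
        # Sum over each cell x cell block.
        hist[:, :, b] = m.reshape(Hc, cell, Wc, cell).sum(axis=(1, 3))
    return hist

img = np.random.default_rng(0).standard_normal((32, 32))
h = gradient_histograms(img)                             # shape (8, 8, 6)
```

Summing `h` over its bins recovers the per-cell total gradient magnitude, which is itself another channel in this family of features.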
Power law feature scaling
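A hedged numpy sketch of the power-law idea: the mean of a channel feature computed on an image resampled by factor s follows approximately f(s) = f(1) · s^(−λ), so responses at nearby scales can be extrapolated instead of recomputed. White noise is used here only because it makes λ predictable (≈ 1 for gradient magnitude); for real images Dollar et al. estimate λ empirically per channel.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((512, 512))

def downsample(img, s):
    """Resample by integer factor s via block averaging."""
    H, W = (img.shape[0] // s) * s, (img.shape[1] // s) * s
    return img[:H, :W].reshape(H // s, s, W // s, s).mean(axis=(1, 3))

def mean_grad_mag(img):
    """Mean gradient magnitude: the channel feature being tracked."""
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy).mean()

scales = np.array([1, 2, 4, 8])
f = np.array([mean_grad_mag(downsample(image, s)) for s in scales])

# Fit the exponent on a log-log plot: log f(s) = log f(1) - lambda * log s.
lam = -np.polyfit(np.log(scales), np.log(f), 1)[0]
predicted_f2 = f[0] * 2.0 ** (-lam)   # extrapolated response at scale 2
```

The extrapolated value at scale 2 matches the directly computed one closely, which is what lets a detector compute features at a few scales and interpolate the rest of the pyramid.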
Individual images [Dollar et al. 2013]
Fast computations [Dollar et al. 2013]
Performance [Dollar et al. 2013]
Conclusions
• Filtering front-end
• Need fine sampling of scale, orientation, …
• Scalable, separable and steerable approximations
• Exploiting image statistics to extrapolate
• Fast and accurate detection