Discriminant Hypothesis for Visual Saliency and its Applications in Computer Vision
Dashan Gao
Joint work with Nuno Vasconcelos
Statistical Visual Computing Laboratory, Department of Electrical and Computer Engineering, University of California, San Diego
What is visual saliency?
• certain image features that attract visual attention (Yarbus, 1967; Treisman & Gormican, 1988)
What is known about saliency?
• Bottom-up (BU) saliency
  – stimulus-driven mechanism, fast
  – goal independent
  – e.g., traffic signs
• Top-down (TD) saliency
  – goal-driven mechanism, slower
  – provides locations that are informative for the specific task
[Figure: Yarbus (1967) eye movements under three tasks — 1. free viewing; 2. estimate the economic level of the people; 3. judge their age]
In computer vision
• saliency is widely used in visual recognition systems as a pre-processing stage
  – provides a sparse image representation
  – reduces computation
  – example: weakly supervised learning of object categories (Sivic et al. 2003; Fergus et al. 2003)
However, in computer vision
• most saliency definitions are universal, divorced from the recognition problem
  – repeatability (stability) [Forstner (94), Harris-Stephens (88), Shi-Tomasi (94), Mikolajczyk (01, 04)]
  – continuity (curvature) [Sha'ashua-Ullman (88), Asada-Brady (86)]
  – complexity (information content) [Kadir-Brady (01), Sebe-Lew (03)]
  – rarity (low probability) [Walker et al. (98), Oliva et al. (03), Bruce-Tsotsos (05)]
• as a result,
  – the detected salient locations may not be very informative for recognition
[Figure: example of Harris detection]
Discriminant saliency
• Hypothesis: saliency is a discriminant process
• it requires a stimulus of interest, and a null hypothesis of stimuli that are not salient
  – context dependent: salient attributes depend on the object of interest and on the context (the null hypothesis)
• Definition: salient features are those that best distinguish the given visual concept from the null hypothesis (NIPS 2004)
Infomax feature selection
• solution: select the features that maximize the mutual information between the features (X) and the class label (Y)
• under a constraint of computational parsimony
• we use marginal mutual information, X = {X_1, ..., X_n} (written out below; more on this later)
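Written out as a sketch (only X, X_k, and Y appear on the slide; the subset notation is an assumption), the infomax criterion and its marginal approximation are:

```latex
% Infomax feature selection: choose the feature subset with maximal mutual
% information about the class label Y; the marginal approximation replaces
% the joint term by a sum of per-feature terms (computational parsimony).
\mathcal{X}^{*} = \arg\max_{\mathcal{X}\subset\{X_1,\dots,X_n\}} I(\mathcal{X};Y)
\;\approx\; \arg\max_{\mathcal{X}} \sum_{X_k\in\mathcal{X}} I(X_k;Y),
\qquad
I(X_k;Y) = \sum_{y}\int p_{X_k,Y}(x,y)\,\log\frac{p_{X_k,Y}(x,y)}{p_{X_k}(x)\,p_Y(y)}\,dx .
```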
Top-down discriminant saliency model
[Diagram: training — original feature set and face vs. background examples → discriminant feature selection → salient features; testing — salient features → scale selection (W_j) → saliency map → WTA, built on the Malik-Perona pre-attentive perception model (1990)]
Qualitative evaluation
• discriminant saliency correlates better with the target objects
[Figure rows: original images; saliency maps by discriminant saliency; salient locations by discriminant saliency; scale saliency detector [K-B 01]; Harris saliency detector [H-S 88]]
Visual classification
• how informative are the salient points?
  – evaluated by measuring actual recognition rates
  – classifier: the histogram of saliency values is fed to an SVM for a presence/absence test (see the sketch below)
  – compared with two standard saliency detectors and two benchmarks
[Bar chart: classification accuracy (60–100%) on Faces, Motorbikes, and Airplanes for DiscSal-DCT, Scale Saliency, Harris Saliency, Image Pixel, and Constellation]
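A minimal sketch of the classifier described on this slide, under stated assumptions: the saliency maps, the number of histogram bins, and the SVM settings are placeholders, not the setup actually used in the experiments.

```python
import numpy as np
from sklearn.svm import SVC

def saliency_histogram(saliency_map, n_bins=32):
    """Summarize an image by the normalized histogram of its saliency values."""
    hist, _ = np.histogram(saliency_map.ravel(), bins=n_bins,
                           range=(0.0, 1.0), density=True)
    return hist

def train_presence_classifier(saliency_maps, labels):
    """Fit an SVM that decides object presence/absence from saliency histograms.
    `saliency_maps` is a list of 2-D arrays (values assumed scaled to [0, 1]);
    `labels` is a list of 0/1 presence flags -- both are hypothetical inputs."""
    X = np.stack([saliency_histogram(s) for s in saliency_maps])
    clf = SVC(kernel="rbf", C=1.0)      # kernel and C are placeholder choices
    clf.fit(X, labels)
    return clf
```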
Repeatability test
• robustness to various transformations
[Plots: repeatability (%) vs. increasing scale+rotation, blur, JPEG compression, and viewpoint angle, comparing DSD with SSD (Kadir-Brady 2001), HarrLap/HesLap (Mikolajczyk 2004), and MSER (Matas et al. 2004)]
Other research work based on top-down discriminant saliency
• a hierarchical model that learns complex features and detectors for classification (CVPR 2005)
[Diagram blocks: initial feature set (faces vs. background); discriminant feature selection; feature complexity control; complex feature generation; new complex feature set; salient (complex) features; object detection; saliency map & salient locations]
Bayesian integration
• Advantage
  – achieves a trade-off between high selectivity (from TD) and high accuracy (from BU)
• Probabilistic formulation of salient locations
  – saliency output: a probability distribution of salient locations over the image plane
• Saliency as Bayesian inference (see the sketch below)
  – BU saliency -> prior, TD saliency -> likelihood
  – inference from the two observations combines accuracy and selectivity -> a posterior
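One way to write the integration described above (a hedged reading of the slide; the conditioning variable "target" is an assumed name):

```latex
% Bottom-up saliency supplies the prior over locations l, top-down saliency
% the likelihood, and the integrated saliency map is the posterior.
P(l \mid \text{target}) \;\propto\;
\underbrace{P(\text{target} \mid l)}_{\text{top-down (likelihood)}}\;
\underbrace{P(l)}_{\text{bottom-up (prior)}} .
```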
Results
• better locations
[Figure: salient locations from the Bayesian integration vs. BU alone vs. TD alone]
• better selectivity (classification accuracy)
Applications
• Automated gathering of training examples
Applications
• Region-of-interest (ROI) based image compression
[Figure: normal JPEG vs. ROI-based compression at a low bit rate]
Joint work with Sunhyoung Han
Discriminant saliency hypothesis
• these results are encouraging, but how do we evaluate the hypothesis as a whole?
• two fundamental questions
  – can it explain biological saliency?
  – can it drive both bottom-up and top-down saliency?
• bottom-up saliency is particularly interesting
  – the bottom-up visual pathway is much better understood than its top-down counterpart
• this motivated us to study bottom-up discriminant saliency
Bottom-up discriminant saliency
• Recall: bottom-up saliency
  – stimulus-driven mechanism
  – saliency detection on a single image
• Center-surround mechanism (formalized in the sketch below)
[Diagram: at each location l, feature responses X are collected in a center window W_l^0 with distribution P(x | center) and a surround window W_l^1 with distribution P(x | surround); X: features, Y: {center, surround}]
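A sketch of the center-surround discriminant measure implied by the diagram; the KL form follows from the standard identity for mutual information, and the window notation is taken from the diagram:

```latex
% Bottom-up discriminant saliency at location l: mutual information between
% the feature responses X observed in the center window W_l^0 and surround
% window W_l^1, and the binary window label Y.
S(l) = I_l(X;Y)
     = \sum_{c\in\{0,1\}} P_Y(c)\,
       \mathrm{KL}\!\left[\,p_{X\mid Y}(x\mid c)\,\big\|\,p_X(x)\,\right],
\qquad Y = 0 \ (\text{center}),\; Y = 1 \ (\text{surround}).
```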
Joint feature distribution
• band-pass features exhibit regular patterns of response to natural images
  – bow-tie shaped conditional distributions (Buccigrossi & Simoncelli, 1999; Huang & Mumford, 1999)
  – although the fine details of feature dependencies vary from scene to scene, their coarse structure follows a universal law across image classes
  – feature dependencies are therefore not informative about image classes
[Figure: top, three images; bottom, histogram of the conditional distribution P(x_i | x_j) of the same coefficient, conditioned on the value of its parent]
Joint feature distribution
• this enables the approximation of the mutual information by a sum of marginal mutual informations (Vasconcelos & Vasconcelos, 2004); the decomposition below separates the discriminant information of the individual features from that of the feature dependencies
  – all of the complexity is encoded in the second (dependency) term
• from a computational standpoint, the resulting criterion is extremely simple to compute (computational parsimony)
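The decomposition the slide refers to can be reconstructed from the chain rule of mutual information (following Vasconcelos & Vasconcelos, 2004; the explicit indexing is an assumption):

```latex
% Exact decomposition: the first term is the discriminant information of the
% individual features, the second that of the feature dependencies.
I(X;Y) = \underbrace{\sum_{k} I(X_k;Y)}_{\text{individual features}}
       + \underbrace{\sum_{k}\Big[I\big(X_k;X_{1,\dots,k-1}\mid Y\big)
                                 - I\big(X_k;X_{1,\dots,k-1}\big)\Big]}_{\text{feature dependencies}} .
% When the dependencies carry no information about Y (the universal law above),
% the second term vanishes and I(X;Y) reduces to the sum of marginal terms.
```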
Generalized Gaussian density (GGD)
• the marginal distributions of natural image features follow a generalized Gaussian density (GGD)
• β = 1 gives a Laplacian distribution, and β = 2 a Gaussian (the density is written out below)
[Figure: examples of the GGD fit to the histograms of the responses of two Gabor filters]
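For reference, the GGD with scale α and shape β, in its standard form (consistent with the Laplacian/Gaussian special cases quoted above):

```latex
p_X(x) = \frac{\beta}{2\,\alpha\,\Gamma(1/\beta)}
         \exp\!\left\{-\left(\frac{|x|}{\alpha}\right)^{\beta}\right\},
\qquad \beta = 1 \Rightarrow \text{Laplacian},\quad
       \beta = 2 \Rightarrow \text{Gaussian}.
```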
In summary
• combining the principles of
  – infomax organization
  – computational parsimony
  – neural tuning to stimulus statistics
• leads to a very simple saliency operator (sketched below)
• which is approximately optimal in the minimum probability of error sense
[Diagram: feature decomposition into orientation, intensity, and color (R/G, B/Y) channels → feature maps → discriminant measure → feature saliency maps → Σ]
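A minimal sketch of this pipeline, under stated assumptions: Laplacian-of-Gaussian responses stand in for the orientation/intensity/color channels, the discriminant measure is estimated with histogram-based mutual information rather than the GGD closed form, and all window sizes and bin counts are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def feature_maps(image):
    """Toy feature decomposition: band-pass (Laplacian-of-Gaussian) responses
    at a few scales, standing in for the slide's feature channels."""
    img = image.astype(float)
    return [gaussian_laplace(img, sigma=s) for s in (1.0, 2.0, 4.0)]

def center_surround_mi(center_vals, surround_vals, n_bins=16):
    """Histogram estimate of I(X;Y) with Y the binary center/surround label."""
    lo = min(center_vals.min(), surround_vals.min())
    hi = max(center_vals.max(), surround_vals.max()) + 1e-9
    edges = np.linspace(lo, hi, n_bins + 1)
    pc, _ = np.histogram(center_vals, bins=edges)
    ps, _ = np.histogram(surround_vals, bins=edges)
    pc = pc / pc.sum()                      # P(X | center)
    ps = ps / ps.sum()                      # P(X | surround)
    py = np.array([center_vals.size, surround_vals.size], dtype=float)
    py /= py.sum()                          # P(Y)
    px = py[0] * pc + py[1] * ps            # P(X)
    mi = 0.0
    for p_xgy, p_y in ((pc, py[0]), (ps, py[1])):
        nz = p_xgy > 0
        mi += p_y * np.sum(p_xgy[nz] * np.log(p_xgy[nz] / px[nz]))
    return mi

def discriminant_saliency(image, center=8, surround=24, step=8):
    """Sum over channels of the center-surround mutual information maps.
    (The surround window here contains the center -- a simplification.)"""
    H, W = image.shape
    sal = np.zeros((H, W))
    for fmap in feature_maps(image):
        for i in range(surround, H - surround, step):
            for j in range(surround, W - surround, step):
                c = fmap[i - center:i + center, j - center:j + center].ravel()
                s = fmap[i - surround:i + surround, j - surround:j + surround].ravel()
                sal[i, j] += center_surround_mi(c, s)
    return gaussian_filter(sal, sigma=step)  # smooth the coarse grid of samples
```

The resulting map can be rescaled to [0, 1] and fed to the histogram/SVM classifier sketched earlier.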
Biological plausibility
• discriminant saliency can be implemented with a three-layer neural network
[Diagram: layer 1, simple cells (x_j - g[x_j]); layer 2, columns of complex cortical cells pooling rectified responses |.|^β over the center (W_l^0) and surround (W_l^1) windows; layer 3, a differential stage combining I_l(X_k, Y) and H(Y) into the saliency value S(l); inset: the nonlinearities φ(x) and φ'(x)]
Single vs. conjunctive feature search
• task: find a bar different from all the others
[Figure: displays for single-feature and conjunctive searches, with the corresponding discriminant saliency predictions]
Asymmetries in visual search
• Time(find a "Q" among "O"s) < Time(find an "O" among "Q"s)
  – presence of a feature vs. absence of a feature
[Plot: search time and saliency prediction vs. number of distractors]
Distractor heterogeneity (relevant dimension)
• saliency perception is nonlinear, and is affected by heterogeneous distractors when the heterogeneity lies in the same (relevant) dimension
[Plot: relative saliency vs. orientation contrast (0–90 deg) for background orientations bg = 0°, 10°, 20°; discriminant saliency compared with Nothdurft (1993)]
Prediction of human eye fixations on natural images (qualitative)
Applications in motion saliency
• discriminant saliency is combined with the motion field
  – optical flow
  – removes the effect of camera motion
Joint work with Vijay Mahadevan (NIPS 2007)