Lecture 2: AdaBoost and Cascade Structure (with a case study on face detection)
Lin ZHANG, PhD, School of Software Engineering, Tongji University, Fall 2020
Any faces contained in the image? Who are they?
Overview • Face recognition problem – Given a still image or video of a scene, identify or verify one or more persons in this scene using a stored database of facial images
Overview • Face identification
Overview • Face verification
Overview • Applications of face detection & recognition Intelligent surveillance
Overview • Applications of face detection & recognition Hong Kong—Luohu, border control E-channel
Overview • Applications of face detection & recognition National Stadium, Beijing Olympic Games, 2008
Overview • Applications of face detection & recognition Check on work attendance
Overview • Applications of face detection & recognition Smile detection: embedded in most modern cameras
Overview • Why is face recognition so difficult? • Intra-class variance and inter-class similarity Images of the same person
Overview • Why is face recognition so difficult? • Intra-class variance and inter-class similarity Images of twins
Overview Who are they?
Overview – General Architecture
Introduction • Identify and locate human faces in an image regardless of their • position • scale • orientation • pose (out-of-plane rotation) • illumination
Introduction Where are the faces, if any?
Introduction • Why is face detection so difficult?
Introduction • Appearance-based methods • Train a classifier using positive (and usually negative) examples of faces • Representation: different appearance-based methods may use different representation schemes • Most of the state-of-the-art methods belong to this category The most successful one: the Viola-Jones method! VJ is based on the AdaBoost classifier
AdaBoost (Adaptive Boosting) • It is a machine learning algorithm [1] • AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers • The classifiers it uses can be weak, but as long as their performance is slightly better than random they will improve the final model [1] Y. Freund and R.E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting", Journal of Computer and System Sciences, 1995
AdaBoost (Adaptive Boosting) • AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination of simple weak classifiers:
f(x) = Σ_{t=1..T} α_t h_t(x)
• Terminology • h_t(x) is a weak or basis classifier • H(x) = sgn(f(x)) is the final strong classifier
AdaBoost (Adaptive Boosting) • AdaBoost is an iterative training algorithm; the stopping criterion depends on the concrete application • For each iteration t – A new weak classifier h_t(x) is added based on the current training set – Modify the weight for each training sample: the weight of a sample correctly classified by h_t(x) is reduced, while the weight of a sample misclassified by h_t(x) is increased
AdaBoost (algorithm for binary classification)
• Given: training set (x_1, y_1), (x_2, y_2), ..., (x_m, y_m), where y_i ∈ {−1, +1}
Initialize weights for samples: D_1(i) = 1/m
For t = 1 : T
  Train weak classifiers based on the training set and D_t, and find the best weak classifier h_t with error ε_t = Σ_{i=1..m} D_t(i) [h_t(x_i) ≠ y_i]
  If ε_t ≥ 0.5, stop
  Set α_t = 0.5 ln((1 − ε_t) / ε_t)
  Update weights for samples: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Denom, where Denom normalizes D_{t+1} to sum to 1
Output the final classifier: H(x) = sgn(Σ_{t=1..T} α_t h_t(x))
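The loop above can be sketched directly in code. This is a minimal illustration, not the VJ implementation: the weak-learner pool is assumed to be a fixed list of callables (e.g. decision stumps), and the function names are made up for this sketch.

```python
import numpy as np

def adaboost_train(X, y, T, stump_candidates):
    """Train AdaBoost for T rounds; weak learners are picked from a fixed
    candidate pool of callables mapping samples to labels in {-1, +1}."""
    m = len(y)
    D = np.full(m, 1.0 / m)                      # D_1(i) = 1/m
    classifiers = []
    for _ in range(T):
        # find the candidate with the smallest weighted error eps_t
        h = min(stump_candidates, key=lambda c: np.sum(D * (c(X) != y)))
        eps = np.sum(D * (h(X) != y))
        if eps == 0:                             # perfect weak learner: use it alone
            classifiers.append((1.0, h))
            break
        if eps >= 0.5:                           # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        D = D * np.exp(-alpha * y * h(X))        # raise weights of mistakes, lower the rest
        D = D / D.sum()                          # normalize (the "Denom" in the update rule)
        classifiers.append((alpha, h))
    return classifiers

def adaboost_predict(classifiers, X):
    # H(x) = sgn(sum_t alpha_t h_t(x))
    return np.sign(sum(a * h(X) for a, h in classifiers))
```

Note the extra eps == 0 guard: the slide's update rule divides by ε_t, so a perfect weak learner must be handled separately in code.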
AdaBoost—An Example
10 training samples, each with initial weight D_1(i) = 0.1, i = 1 ~ 10
Weak classifiers: vertical or horizontal lines
Three iterations
AdaBoost—An Example
After iteration 1: get the weak classifier h_1(x), with ε_1 = 0.3 and α_1 = (1/2) ln((1 − ε_1)/ε_1) = 0.4236
Update weights (D_2): the three samples misclassified by h_1(x) rise from 0.1 to 0.1667 each; the seven correctly classified samples fall to 0.0714 each
AdaBoost—An Example
After iteration 2: get the weak classifier h_2(x), with ε_2 = 0.2142 and α_2 = 0.6499
Update weights (D_3): the three samples misclassified by h_2(x) rise to 0.1667 each; the correctly classified samples fall to 0.1060 or 0.0454
AdaBoost—An Example
After iteration 3: get the weak classifier h_3(x), with ε_3 = 0.1362 and α_3 = 0.9236
The final strong classifier is H(x) = sgn(0.4236 h_1(x) + 0.6499 h_2(x) + 0.9236 h_3(x))
Now try to classify the 10 samples using H(x)
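The first round of this toy example can be checked numerically. A small sketch (the helper name is made up; it assumes, as in the example, that 3 of the 10 uniformly weighted samples are misclassified):

```python
import numpy as np

def update_weights(D, alpha, mis):
    """One AdaBoost reweighting step: multiply by exp(+alpha) on misclassified
    samples, exp(-alpha) on correct ones, then renormalize to sum to 1."""
    D = D * np.where(mis, np.exp(alpha), np.exp(-alpha))
    return D / D.sum()

D1 = np.full(10, 0.1)                         # initial uniform weights
eps1 = 0.3                                    # 3 of 10 samples misclassified by h_1
alpha1 = 0.5 * np.log((1 - eps1) / eps1)      # = 0.4236
mis1 = np.array([True] * 3 + [False] * 7)     # which samples h_1 gets wrong
D2 = update_weights(D1, alpha1, mis1)
# misclassified samples rise to 0.1667; the rest fall to 0.0714
```

Running this reproduces the D_2 values shown on the slide.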
Viola-Jones face detection • The VJ face detector [1] • Haar-like features are proposed and computed based on the integral image; they act as "weak" classifiers • Strong classifiers are composed of "weak" classifiers by using AdaBoost • Many strong classifiers are combined in a cascade structure, which dramatically increases the detection speed [1] P. Viola and M.J. Jones, "Robust real-time face detection", IJCV, 2004
Haar features • Compute the difference between the sums of pixels within two (or more) rectangular regions Example Haar features shown relative to the enclosing face detection window
Haar features • Integral image • The integral image at location (x, y) contains the sum of all the pixels above and to the left of (x, y), inclusive:
ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)
where i(x, y) is the original image
• By the following recurrences, the integral image can be computed in one pass over the original image:
s(x, y) = s(x, y − 1) + i(x, y)
ii(x, y) = ii(x − 1, y) + s(x, y)
where s(x, y) is the cumulative row sum, s(x, −1) = 0, and ii(−1, y) = 0
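The one-pass recurrences above can be sketched as follows (a minimal illustration, assuming img[r, c] indexing with r = row, c = column; the result equals a double cumulative sum):

```python
import numpy as np

def integral_image(img):
    """One pass over the image, following the slide's recurrences:
    s accumulates down each column, ii accumulates s along each row."""
    h, w = img.shape
    s = np.zeros((h, w), dtype=np.int64)   # cumulative sum down each column
    ii = np.zeros((h, w), dtype=np.int64)  # the integral image
    for r in range(h):
        for c in range(w):
            s[r, c] = (s[r - 1, c] if r > 0 else 0) + img[r, c]
            ii[r, c] = (ii[r, c - 1] if c > 0 else 0) + s[r, c]
    return ii
```

In NumPy the same result is img.cumsum(axis=0).cumsum(axis=1); the explicit loop mirrors the single-pass recurrence on the slide.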
Haar features • A Haar feature can be efficiently computed by using the integral image
Consider four rectangular regions A, B, C, D of the original image i(x, y), with x_1, x_2, x_3, x_4 the bottom-right corners of A, A∪B, A∪C, and the whole block; in the integral image ii(x, y):
ii(x_1) = A
ii(x_2) = A + B
ii(x_3) = A + C
ii(x_4) = A + B + C + D
so D = ii(x_4) + ii(x_1) − ii(x_2) − ii(x_3)
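Any rectangle sum therefore costs four array references, independent of rectangle size. A sketch of this lookup (the helper name is made up; it assumes an integral image indexed as ii[row, col]):

```python
def rect_sum(ii, top, left, bottom, right):
    """Sum of the original pixels in rows top..bottom, cols left..right
    (inclusive), via D = ii(x4) + ii(x1) - ii(x2) - ii(x3)."""
    total = ii[bottom, right]              # ii(x4) = A + B + C + D
    if top > 0:
        total -= ii[top - 1, right]        # subtract ii(x2) = A + B
    if left > 0:
        total -= ii[bottom, left - 1]      # subtract ii(x3) = A + C
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]     # add back ii(x1) = A
    return total
```

A two-rectangle Haar feature is then just rect_sum of one region minus rect_sum of the other.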
Haar features • A Haar feature can be efficiently computed by using the integral image
With two rectangles A and B and corner points x_1 ~ x_6 marked in both the original image i(x, y) and the integral image ii(x, y): how to calculate A − B using the integral image?
Haar features • Given a detection window, tens of thousands of Haar features can be computed • One Haar feature is a weak classifier deciding whether the underlying detection window contains a face:
h(x, f, p, θ) = 1 if p f(x) < p θ, and −1 otherwise
where x is the detection window, f defines how to compute the Haar feature on window x, p is 1 or −1 to make the inequality have a unified direction, and θ is a threshold
• f can be determined in advance; by contrast, p and θ are determined by training, such that the minimum number of examples are misclassified
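Fitting p and θ for one feature can be sketched by exhaustive search over candidate thresholds. This is a simple illustration with a made-up helper name, not the optimized sorting-based search of the VJ paper; it also accepts AdaBoost sample weights so the weak learner plugs into the boosting loop:

```python
import numpy as np

def train_stump(feature_vals, labels, weights):
    """Choose polarity p and threshold theta minimizing the weighted error of
    h(x) = 1 if p*f(x) < p*theta, else -1.
    feature_vals: f(x) per training window; labels in {-1, +1};
    weights: current AdaBoost sample weights."""
    vals = np.unique(feature_vals)         # sorted distinct feature values
    # candidates: below all values, midpoints between neighbors, above all values
    thresholds = np.concatenate(([vals[0] - 1], (vals[:-1] + vals[1:]) / 2, [vals[-1] + 1]))
    best_p, best_theta, best_err = 1, thresholds[0], float("inf")
    for theta in thresholds:
        for p in (1, -1):
            pred = np.where(p * feature_vals < p * theta, 1, -1)
            err = np.sum(weights * (pred != labels))
            if err < best_err:
                best_p, best_theta, best_err = p, theta, err
    return best_p, best_theta, best_err
```

With m training windows and k distinct feature values this costs O(mk); the paper's trick of sorting the feature values once brings it down to O(m log m).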
Haar features The first and second best Haar features. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks. The feature capitalizes on the observation that the eye region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose.
From weak learner to strong learner • Any single Haar feature (a thresholded single feature) is quite weak at deciding whether the underlying detection window contains a face or not • Many Haar features (weak learners) can be combined into a strong learner by using AdaBoost • However, the most straightforward technique for improving detection performance, adding more features to the classifier, directly increases computation cost Construct a cascade classifier
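The cascade idea can be sketched as follows: stages are strong classifiers ordered cheapest-first, and a window is rejected the moment any stage votes non-face, so most background windows never reach the expensive later stages (the stage callables below are hypothetical placeholders for trained AdaBoost stages):

```python
def cascade_classify(window, stages):
    """stages: list of strong classifiers, cheapest first; each returns +1
    ("maybe a face") or -1 ("not a face"). Returns (label, stages_evaluated)."""
    for k, stage in enumerate(stages):
        if stage(window) < 0:
            return -1, k          # early rejection at stage k
    return 1, len(stages)         # survived every stage: report a face
```

Because each stage is tuned for a very high detection rate (letting many false positives through), true faces survive the whole chain while the per-window average cost stays close to the cost of the first stage.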