PATTERN RECOGNITION FROM ONE EXAMPLE BY CHOPPING
François Fleuret, EPFL – LCN / CVLAB
Gilles Blanchard, Fraunhofer – FIRST
RECOGNITION FROM ONE EXAMPLE
Given a single training example, find the same object in the test images. If we average the test over a large number of trials, an equivalent formulation is: given two images I_1 and I_2, are they showing the same object?
➊ Learning invariance with a large number of objects
➋ Recognizing from one example
No object is common to ➊ and ➋
Remarks
➊ Non-generative approach, no explicit model of the space of deformations
➋ Proof of concept
DATABASES
➊ The COIL-100 database (100 objects, 72 images of each)
➋ Our LaTeX symbol database (150 symbols, 1,000 images of each)
BOOLEAN FEATURES
We denote by I the image space and by f_1, ..., f_K a set of binary features f_k : I → {0, 1}. Each one is a disjunction of simple edge detectors of orientation d over a rectangular area (x_0, y_0, x_1, y_1).
[Figure: the eight edge orientations d = 0, ..., 7 around a pixel (x, y)]
No invariance to 3D transformations, moderate invariance to scaling, rotation and translation.
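The slides do not spell out the edge detectors themselves; the Python sketch below only illustrates the idea of a boolean feature as a disjunction of oriented edge responses over a rectangle. The gradient-based oriented_edges detector, its threshold, and the function names are assumptions, not the paper's actual detectors.

import numpy as np

def oriented_edges(image, d):
    # Crude stand-in for an oriented edge detector: mark pixels whose gradient
    # magnitude is large and whose gradient direction falls in the 45-degree
    # sector indexed by d in {0, ..., 7}.
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    sector = (ang // (np.pi / 4)).astype(int) % 8
    return (mag > 0.1 * mag.max()) & (sector == d)

def boolean_feature(image, d, x0, y0, x1, y1):
    # f_k(I) = 1 iff at least one edge of orientation d lies inside the
    # rectangle (x0, y0, x1, y1): a disjunction over the area.
    return bool(oriented_edges(image, d)[y0:y1, x0:x1].any())

# Usage on a random "image"
I = np.random.rand(64, 64)
print(boolean_feature(I, d=3, x0=10, y0=10, x1=30, y1=30))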
SPLITS
We denote by X an image (a random variable on I) and by C its class (a random variable on {1, ..., M}). We call a split a mapping ψ : I → {0, 1} which splits the set of objects into two equilibrated halves:
• P(ψ(X) = 0) = 1/2
• P(ψ(X) = 0 | C) is 0 or 1
Let C_1 and C_2 denote the classes of two images X_1 and X_2, with an equilibrated prior P(C_1 = C_2) = 1/2.
• P(C_1 = C_2 | ψ(X_1) = ψ(X_2)) ≃ 1/2
• P(C_1 = C_2 | ψ(X_1) ≠ ψ(X_2)) = 0
With several independent splits, we could do a very good job.
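A short calculation makes the last point concrete. It assumes N ideal splits that always agree on two images of the same object and, on two different objects, agree independently with probability 1/2 each (an idealisation, not a claim from the slides):

\[
P(C_1 = C_2 \mid \text{all } N \text{ splits agree})
  = \frac{\tfrac12 \cdot 1}{\tfrac12 \cdot 1 + \tfrac12 \cdot 2^{-N}}
  = \frac{1}{1 + 2^{-N}} \;\xrightarrow[N \to \infty]{}\; 1,
\]

while a single disagreement already implies $C_1 \neq C_2$.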
CHOPPING PRINCIPLE
We can easily build independent splits on the training objects, and we can extend them to the whole set I with machine-learning methods.
CHOPPING
We consider arbitrary splits S_1, ..., S_N of the training object set, and extend them to I by training predictors L_1, ..., L_N:
∀n, L_n : I → R
These learners are feature selection followed by a linear perceptron without threshold.
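A minimal Python sketch of this step: a balanced random split of the training classes, and a thresholdless perceptron trained on boolean feature vectors to extend the split to new images. The helper names, the toy data, and the training loop are assumptions; the feature-selection stage of the paper is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

def make_split(classes, rng):
    # Assign each training class to one half at random, keeping the two halves
    # balanced (one of the equilibrated splits S_1, ..., S_N).
    perm = rng.permutation(classes)
    half = set(int(c) for c in perm[: len(perm) // 2])
    return {int(c): int(int(c) in half) for c in classes}

def train_split_predictor(F, y, epochs=20, lr=0.1):
    # Linear perceptron without threshold on the boolean feature matrix F
    # (n_samples x K): a stand-in for the learners L_n of the slides.
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        for x, t in zip(F, y):
            if (1 if x @ w > 0 else 0) != t:   # update only on mistakes
                w += lr * (2 * t - 1) * x
    return w                                    # L_n(I) = w . f(I), a real-valued score

# Usage with toy data: 200 images of 10 classes, K = 50 class-dependent boolean features
classes = list(range(10))
proto = rng.random((10, 50))                    # one Bernoulli feature profile per class
labels = rng.integers(0, 10, size=200)
F = (rng.random((200, 50)) < proto[labels]).astype(float)
split = make_split(classes, rng)
y = np.array([split[int(c)] for c in labels])
w = train_split_predictor(F, y)
print("training accuracy:", np.mean(((F @ w) > 0).astype(int) == y))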
[Figure: a first split S_1 of the training objects and the decision boundary L_1 = 0 of its predictor]
[Figure: a second, independent split S_2 and the boundary L_2 = 0]
COMBINING SPLITS
To predict whether two images show the same object, we estimate how many splits keep them together. The algorithm relies on the split predictors and takes into account their estimated reliability.
SPLIT PREDICTOR RELIABILITY
Since we have a lot of images of the training objects, we can use a validation set to estimate P(L_n | S_n).
[Figure: histograms of the predictor responses on the negative and positive classes of a split, for two different predictors]
It makes sense to model P(L_n | S_n = s) as a Gaussian.
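A minimal sketch of that reliability model, assuming one Gaussian fit per side of the split on validation responses and a uniform prior over the two sides; the function names are illustrative.

import numpy as np

def fit_response_model(responses, split_labels):
    # Gaussian model of P(L_n | S_n = s), s in {0, 1}, estimated on a
    # validation set of the training objects.
    params = {}
    for s in (0, 1):
        r = responses[split_labels == s]
        params[s] = (r.mean(), r.std() + 1e-9)
    return params

def posterior_side(l, params):
    # alpha = P(S_n = 1 | L_n = l) under the two fitted Gaussians and a
    # uniform prior over the sides of the split.
    def gauss(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p0, p1 = gauss(l, *params[0]), gauss(l, *params[1])
    return p1 / (p0 + p1)

# Usage: synthetic validation responses of one predictor and their true sides
rng = np.random.default_rng(1)
resp = np.concatenate([rng.normal(-1500, 800, 500), rng.normal(1500, 800, 500)])
side = np.repeat([0, 1], 500)
params = fit_response_model(resp, side)
print(posterior_side(0.0, params))   # roughly 0.5 at the decision boundary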
PREDICTION WITH ONE SPLIT
[Figure: left, the class-conditional response densities P(L_n | S_n = 0) and P(L_n | S_n = 1); right, the resulting posterior P(C_1 = C_2 | L_n^1, L_n^2) as a function of the two responses]
The rule is similar with several splits, under reasonable assumptions of conditional independence:
[Figure: graphical model in which the class C_j of image j (j = 1, 2) determines its split memberships S_1^j, ..., S_N^j, each of which drives the corresponding predictor response L_n^j]
FINAL RULE
We have

  log [ P(C_1 = C_2 | L^1, L^2) / P(C_1 ≠ C_2 | L^1, L^2) ]
    = log [ P(L^1, L^2 | C_1 = C_2) / P(L^1, L^2 | C_1 ≠ C_2) ] + log [ P(C_1 = C_2) / P(C_1 ≠ C_2) ]

If we denote by α_i^j = P(S_i^j = 1 | L_i^j), we end up with the following expression:

  log [ P(C_1 = C_2 | L^1, L^2) / P(C_1 ≠ C_2 | L^1, L^2) ] = Σ_i [ log( α_i^1 α_i^2 + (1 − α_i^1)(1 − α_i^2) ) + ρ_i ]
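A direct transcription of this final rule into Python, assuming the α values have already been computed for each split and each image; treating ρ as a constant added per split is an assumption.

import numpy as np

def chopping_score(alpha1, alpha2, rho=0.0):
    # Log-odds that the two images show the same object, given
    # alpha_i^j = P(S_i^j = 1 | L_i^j) for every split i and image j.
    alpha1, alpha2 = np.asarray(alpha1), np.asarray(alpha2)
    agree = alpha1 * alpha2 + (1 - alpha1) * (1 - alpha2)   # probability the split keeps them together
    return float(np.sum(np.log(agree) + rho))

# Usage with N = 4 splits: consistent alphas give a much higher score than inconsistent ones
print(chopping_score([0.9, 0.1, 0.8, 0.95], [0.85, 0.2, 0.9, 0.9]))
print(chopping_score([0.9, 0.1, 0.8, 0.95], [0.1, 0.9, 0.2, 0.1]))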
REMARKS
➊ Splits correctly learnt are balanced, thus optimally informative
➋ Splits which are "unlearnable" are naturally ignored in the Bayesian formulation, since P(S = 1 | L = l) does not depend on l
SMART CHOPPING
An arbitrary split can label very similar objects differently. We can improve performance by discarding objects that are difficult to learn and re-building the predictor.
RESULTS
We compare:
➊ Chopping with one example and several numbers of splits
➋ Smart chopping with one example and several numbers of splits
➌ Classical learning with several numbers of positive examples
➍ Direct learning of the similarity with a perceptron
[Figure: test error on the LaTeX symbol database as a function of the number of splits (1 to 1,024) for chopping and smart chopping, and of the number of positive samples (1 to 32) for multi-example learning and the directly-learnt similarity]
[Figure: same comparison on COIL-100: test error as a function of the number of splits for chopping and smart chopping, and of the number of positive samples for multi-example learning and the directly-learnt similarity]
WHY DOES IT WORK?
We are inferring functionals which are somewhat arbitrary on the training examples. However, we can expect the training objects to provide an exhaustive dictionary of invariant parts, even though they do not provide an exhaustive dictionary of the combined parts. Note that since splits are built independently, we avoid over-fitting when their number increases.
RELATION WITH ANNS
The Chopping structure can be seen as a one-hidden-layer ANN with shared weights and an ad hoc output layer. If we define ∆(α, β) = log( αβ + (1 − α)(1 − β) ), we have:
[Figure: network diagram in which the two images' features feed shared-weight Σ units (the split predictors), corresponding pairs of units are combined by ∆ nodes, and a final Σ node sums the ∆ outputs]
Can we globally learn the shared weights?
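As a tiny sketch of this view, the ∆ node of the diagram can be written directly; the α and β inputs stand for the per-image split posteriors computed earlier, and the final node just sums the ∆ values over splits. This is only an illustration of the diagram, not the paper's training procedure.

import numpy as np

def delta(alpha, beta):
    # Output-layer node of the ANN view: Delta(alpha, beta) = log(alpha*beta + (1 - alpha)*(1 - beta))
    return np.log(alpha * beta + (1 - alpha) * (1 - beta))

# One Delta unit per split; the final Sigma node sums them into the same-object score
alphas = np.array([0.9, 0.2, 0.7])   # split posteriors for image 1 (illustrative values)
betas  = np.array([0.8, 0.3, 0.6])   # split posteriors for image 2
print(delta(alphas, betas).sum())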
François Fleuret
EPFL – LCN / CVLAB
francois.fleuret@epfl.ch
http://cvlab.epfl.ch/~fleuret

Pattern Recognition from One Example by Chopping
François Fleuret and Gilles Blanchard
NIPS 2005