9.54 class 4: Supervised learning. Shimon Ullman + Tomaso Poggio, Danny Harari + Daniel Zysman + Darren Seibert. 9.54, fall semester 2014
Intro
An old and simple model of supervised learning: associate $b$ to $a$ and store $\phi_{b,a}(x) = (b \ast a)(x) = \int b(\xi)\, a(x - \xi)\, d\xi$. Retrieve output $b$ from input $a'$: if $a' \approx a$, then $(a' \star \phi_{b,a})(x) = \int a'(\tau)\, \phi_{b,a}(\tau + x)\, d\tau \approx b(x)$.
An old and simple model of supervised learning: when $\phi(x) = \sum_i (b_i \ast a_i)(x)$ and $a_j \star a_i \approx \delta_{i,j}$, retrieval from input $a_j$ gives $a_j \star \phi \approx b_j$. It is a special case…
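A minimal sketch of this store-by-convolution, retrieve-by-correlation memory, assuming circular convolution and random keys whose correlations approximate $\delta_{i,j}$; the function names, dimensions, and FFT-based implementation are illustrative, not from the slides. Retrieval is noisy because of crosstalk from the other stored pairs, but the correct $b_j$ is easily identified:

```python
import numpy as np

def cconv(b, a):
    """Circular convolution b * a (the storage operation)."""
    return np.fft.ifft(np.fft.fft(b) * np.fft.fft(a)).real

def ccorr(a, phi):
    """Circular correlation a (star) phi (the retrieval operation)."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(phi)).real

rng = np.random.default_rng(0)
n, K = 2048, 5
A = rng.normal(0, 1 / np.sqrt(n), (K, n))   # keys a_i: correlations approx. delta_ij
B = rng.normal(0, 1, (K, n))                # stored patterns b_i

phi = sum(cconv(B[i], A[i]) for i in range(K))   # phi = sum_i b_i * a_i

j = 2
b_hat = ccorr(A[j], phi)                    # a_j (star) phi ~ b_j plus crosstalk
sims = [np.corrcoef(b_hat, B[i])[0, 1] for i in range(K)]
print(np.round(sims, 2))                    # clearly highest at index j
```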
Linear
“Linear” learning: suppose $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, $i = 1, \cdots, N$. Define $(x_1, \cdots, x_N) = X$ and $(y_1, \cdots, y_N) = Y$. Find a linear operator (e.g., a matrix) $M$ such that $MX = Y$.
“Linear” learning: if $X^{-1}$ exists, then $MX = Y \Rightarrow M = Y X^{-1}$. If $X^{-1}$ does not exist, then $MX = Y \Rightarrow M = Y X^{\dagger}$, where the pseudoinverse gives the solution of $\min_M \|MX - Y\|_F$, with $\|A\|_F = \big(\sum_{i,j} |a_{i,j}|^2\big)^{1/2}$, and $X^{\dagger} = (X^T X)^{-1} X^T$ if $X$ is full column rank.
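A small numerical sketch of the pseudoinverse solution $M = Y X^{\dagger}$; the shapes, seed, and variable names are illustrative assumptions. Here $Y$ is generated exactly by a linear map, so the map is recovered; with noisy data the same formula gives the least-squares fit instead:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 4, 3, 50                      # input dim, output dim, number of examples
X = rng.normal(size=(n, N))             # columns are the inputs x_i
M_true = rng.normal(size=(m, n))
Y = M_true @ X                          # columns are the outputs y_i

M = Y @ np.linalg.pinv(X)               # M = Y X^+, minimizes ||M X - Y||_F
print(np.allclose(M, M_true))           # True: the linear map is recovered
print(np.linalg.norm(M @ X - Y))        # ~0
```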
“Linear” learning is linear regression: if e.g. the output $y$ is scalar, then $m = 1$ and $Mx = y$ becomes $y = m^T x = \sum_i m_i x_i$, with $M = Y X^{-1}$.
Nonlinear
Nonlinear learning: suppose $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, $i = 1, \cdots, N$. Define $(x_1, \cdots, x_N) = X$ and $(y_1, \cdots, y_N) = Y$. Find an operator $N$ such that $N(X) = Y$. In general impossible, but… assume $N$ is in the class of polynomial mappings of degree $k$ in the vector space $V$ (over the real field)… e.g., $N$ has a convergent Taylor series expansion. The Weierstrass theorem ensures approximation of any continuous function.
Nonlinear learning: $Y = L_0 + L_1(X) + L_2(X, X) + \cdots + L_k(X, \ldots, X)$. $f(x)$ is a polynomial with all monomials, as in this 2D example: $y = a_1 x_1 + a_2 x_2 + b_1 x_1^2 + b_{12} x_1 x_2 + \cdots$
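A sketch of the reduction this implies: expand the input into all monomials up to degree $k$ and the problem is linear regression again. The degree, the synthetic data, and the helper name are illustrative assumptions:

```python
import numpy as np

def monomials_2d(x1, x2):
    """All monomials up to degree 2 in two variables."""
    return np.stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=(2, 200))
y = 1.5 * x1 - 2.0 * x2 + 0.5 * x1**2 + 3.0 * x1 * x2   # a polynomial target

Phi = monomials_2d(x1, x2)              # feature matrix, shape (6, N)
coeffs = y @ np.linalg.pinv(Phi)        # linear regression in monomial features
print(np.round(coeffs, 2))              # recovers [0, 1.5, -2, 0.5, 3, 0]
```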
Classification and Regression
$y = \mathrm{sign}(Mx)$
In our language: is $L_1$ enough?
XOR function: $y = \mathrm{sign}(L_1 x + L_2(x, x)) = \mathrm{sign}(a_1 u_1 + a_2 u_2 + b\, u_1 u_2) = \mathrm{sign}(u_1 u_2)$ is in fact enough. This corresponds to a universal, one-hidden-layer network: input variables, a hidden layer of all monomials, an output layer.
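A tiny check of this claim, under the assumed $\pm 1$ encoding of the XOR inputs: the single degree-2 monomial $u_1 u_2$ already separates the two classes (up to an overall sign convention for the labels), so no linear term is needed:

```python
import numpy as np

# XOR on +/-1 inputs: label +1 when the inputs differ, -1 when they agree
U = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])
y = np.array([-1, +1, +1, -1])

# no linear function of (u1, u2) alone separates these four points,
# but the degree-2 monomial u1*u2 does (up to an overall sign)
m = U[:, 0] * U[:, 1]
print(np.all(np.sign(-m) == y))   # True: y = sign(-u1*u2)
```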
A few non-standard remarks • Regression is king, Gauss knew everything… • Perhaps no need of multiple layers… are 2 layers universal? • An interesting junction here: RBFs vs. MLPs
Radial Basis Functions
Nonlinear learning: later we will see that RBF expansions are a good approximation of functions in high dimensions: $\sum_{k=1}^{N} c_k\, e^{-\|x_k - x\|^2}$. • RBF can be written as a 1-hidden-layer network. • RBF is a rewriting of our polynomial (infinite radius of convergence): $e^{\|\hat{x}_k - x\|^2} = \sum_{n=0}^{\infty} \frac{\|\hat{x}_k - x\|^{2n}}{n!}$
Memory-based computation: $f(x) = \sum_i c_i\, G(x, x_i) = \sum_i c_i\, e^{-\|x - x_i\|^2 / 2\sigma^2}$. The training set is $(x_1, \cdots, x_N) = X$ and $(y_1, \cdots, y_N) = Y$. Suppose now that $e^{-\|x - x_i\|^2 / 2\sigma^2} \to \delta(x - x_i)$: then it is a memory, a lookup table, $f(x) = y_i$ if $x = x_i$ and $f(x) = 0$ if $x \neq x_i$.
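A minimal sketch of this Gaussian-RBF interpolant; the 1-D data and the σ values are illustrative assumptions. The coefficients $c$ are chosen so that $f(x_i) = y_i$, and as $\sigma \to 0$ the expansion degenerates into a lookup table: $y_i$ at the stored points, near zero elsewhere:

```python
import numpy as np

def rbf_fit(X, y, sigma):
    """Solve G c = y, with G_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    G = np.exp(-(X[:, None] - X[None, :])**2 / (2 * sigma**2))
    return np.linalg.solve(G, y)

def rbf_eval(x, X, c, sigma):
    K = np.exp(-(x[:, None] - X[None, :])**2 / (2 * sigma**2))
    return K @ c

X = np.array([0.0, 1.0, 2.0, 3.0])       # stored inputs
y = np.array([1.0, -1.0, 2.0, 0.5])      # stored outputs

for sigma in (1.0, 0.05):
    c = rbf_fit(X, y, sigma)
    print(sigma,
          np.round(rbf_eval(X, X, c, sigma), 3),              # equals y at the x_i
          np.round(rbf_eval(np.array([1.5]), X, c, sigma), 3))
    # sigma = 1.0: smooth interpolation between the stored points;
    # sigma = 0.05: lookup-table behavior, ~0 away from the x_i
```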
Memory-based computation: of course, learning is much more than memory, but in this model the difference is between a Gaussian and a delta function.
$f(x) = \sum_i c_i\, G(x, x_i) = \sum_i c_i\, e^{-\|x - x_i\|^2 / 2\sigma^2}$. From Learning-from-Examples to View-based Networks for Object Recognition. [Figure: view-based network with a summing output unit; unit responses plotted against view angle.] Poggio, Edelman, Nature, 1990.
Recording Sites in Anterior IT Logothetis, Pauls, and Poggio, 1995
Garfield
Image Analysis ⇒ Bear (0° view) ⇒ Bear (45° view)
Image Synthesis (unconventional graphics): Θ = 0° view ⇒ Θ = 45° view
HyperBF
Cartoon male
A toy problem: Gender Classification
Brunelli, Poggio ’91 (IRST, MIT)
An example: HyperBF and gender classification. Some of the geometrical features (white) used in the gender classification experiments
HyperBF and gender classification Typical stimuli used in the (informal!) psychophysical experiments of gender classification (about 90% correct)
Figure 3: Feature weights for gender classification as computed by the HyperBF networks
Radial Basis Functions and MLPs
Sigmoidal units are radial basis functions (for normalized inputs): consider the MLP units $\sigma(x \cdot w - \theta) = \frac{1}{1 + e^{-(x \cdot w - \theta)}}$. Since $\|x - w\|^2 = \|x\|^2 + \|w\|^2 - 2(x \cdot w)$, if $\|x\| = 1$ then $(x \cdot w) = \frac{1 + \|w\|^2 - \|x - w\|^2}{2}$, and thus $\sigma(w \cdot x + b)$ is a radial function.
Sigmoidal units are radial basis functions (for normalized inputs): the corresponding radial function is $\sigma\!\left(\frac{1 + \|w\|^2 - \|x - w\|^2}{2} - \theta\right)$, i.e., a sigmoid that depends on $x$ only through the distance $\|x - w\|$.
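A quick numerical check of this identity; the dimensions, threshold, and random draws are illustrative assumptions. For unit-norm inputs, the sigmoidal unit computed from the dot product agrees with the same sigmoid written as a function of the distance $\|x - w\|$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
d, theta = 10, 0.3
w = rng.normal(size=d)
x = rng.normal(size=(100, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)   # normalized inputs, ||x|| = 1

mlp_unit = sigmoid(x @ w - theta)               # the usual sigmoidal (MLP) unit
r2 = np.sum((x - w)**2, axis=1)                 # squared distance ||x - w||^2
radial = sigmoid((1 + w @ w - r2) / 2 - theta)  # the same unit, written radially

print(np.allclose(mlp_unit, radial))            # True
```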