SLIDE 4 21/05/12
Combining individual distances
• This approach computes a distance for each individual attribute and then
combines them
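• A combination formula, proposed by Gower, is

\[
\mathrm{dist}(\mathbf{x}_i, \mathbf{x}_j) = \frac{\sum_{f=1}^{r} \delta_{ij}^{f}\, d_{ij}^{f}}{\sum_{f=1}^{r} \delta_{ij}^{f}} \qquad (4)
\]

  ◦ The distance $\mathrm{dist}(\mathbf{x}_i, \mathbf{x}_j)$ is between 0 and 1
  ◦ $r$ is the number of attributes
  ◦ $d_{ij}^{f}$ is the distance contributed by attribute $f$, in the range [0,1]
  ◦ $\delta_{ij}^{f}$ indicates whether attribute $f$ takes part in the comparison:

\[
\delta_{ij}^{f} =
\begin{cases}
0 & \text{if } x_{if} \text{ or } x_{jf} \text{ is missing} \\
0 & \text{if attribute } f \text{ is asymmetric and } x_{if} \text{ and } x_{jf} \text{ are both } 0 \\
1 & \text{otherwise (} x_{if} \text{ and } x_{jf} \text{ are not missing)}
\end{cases}
\]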
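As a rough illustration of formula (4), the sketch below computes the indicator and the weighted average for one pair of objects. The names `indicator` and `gower_combine` are hypothetical, missing values are assumed to be encoded as `None`, and returning `None` when no attribute is comparable is an assumption (the slide does not say what to do when the denominator is 0).

```python
from typing import Optional, Sequence


def indicator(x_if, x_jf, asymmetric: bool = False) -> int:
    """Indicator delta for one attribute f (hypothetical helper).

    Assumption: a missing value is represented by None.
    """
    if x_if is None or x_jf is None:
        return 0  # either value is missing
    if asymmetric and x_if == 0 and x_jf == 0:
        return 0  # asymmetric attribute, both values are 0
    return 1


def gower_combine(d: Sequence[float], delta: Sequence[int]) -> Optional[float]:
    """Combine per-attribute distances d (each in [0, 1]) as in formula (4)."""
    if len(d) != len(delta):
        raise ValueError("d and delta must have the same length")
    denominator = sum(delta)
    if denominator == 0:
        # Assumption: no comparable attribute, so the distance is undefined.
        return None
    numerator = sum(w * dist for w, dist in zip(delta, d))
    return numerator / denominator


# Three attributes; the third is excluded because its indicator is 0.
example = gower_combine([1.0, 0.0, 0.5], [1, 1, 0])
```

Because every $d_{ij}^{f}$ lies in [0,1] and the result is an average over the attributes with indicator 1, the combined value also stays in [0,1].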
Combining individual distances (cont …)
• If f is a binary or nominal attribute

\[
d_{ij}^{f} =
\begin{cases}
1 & \text{if } x_{if} \neq x_{jf} \\
0 & \text{otherwise}
\end{cases}
\]

  ◦ distance (4) reduces to
      - equation (3) of Lecture 10 if all attributes are nominal
      - the simple matching distance (1) of Lecture 10 if all attributes are symmetric binary
      - the Jaccard distance (2) of Lecture 10 if all attributes are asymmetric binary
• If f is interval-scaled

\[
d_{ij}^{f} = \frac{|x_{if} - x_{jf}|}{R_f}, \qquad R_f = \max(f) - \min(f)
\]

  ◦ $R_f$ is the value range of f
  ◦ If all the attributes are interval-scaled, distance (4) reduces to
    Manhattan distance
      - Assuming that all attribute values are standardized
• Ordinal and ratio-scaled attributes are converted to
  interval-scaled attributes and handled in the same way
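The special cases above can be checked numerically with a minimal sketch. The function names, the attribute-kind labels (`"nominal"`, `"symmetric"`, `"asymmetric"`, `"interval"`) and the use of `None` for missing values are assumptions made for illustration only. Ordinal and ratio-scaled attributes are assumed to have been converted to interval-scaled values beforehand.

```python
def attribute_distance(a, b, kind, value_range=None):
    """Per-attribute distance d for one attribute f (hypothetical helper)."""
    if kind in ("nominal", "symmetric", "asymmetric"):
        return 0.0 if a == b else 1.0
    if kind == "interval":
        # value_range is R_f = max(f) - min(f), taken over the whole data set
        return abs(a - b) / value_range
    raise ValueError(f"unknown attribute kind: {kind}")


def mixed_distance(x, y, kinds, ranges):
    """Distance (4) for two objects with mixed attribute types."""
    numerator, denominator = 0.0, 0
    for a, b, kind, value_range in zip(x, y, kinds, ranges):
        if a is None or b is None:
            continue  # indicator is 0: a value is missing
        if kind == "asymmetric" and a == 0 and b == 0:
            continue  # indicator is 0: asymmetric attribute, both values 0
        numerator += attribute_distance(a, b, kind, value_range)
        denominator += 1
    # Assumption: return None when no attribute is comparable.
    return numerator / denominator if denominator else None


x = [1, 0, 1, 0, 0]
y = [1, 1, 0, 0, 0]
no_ranges = [None] * 5

# All symmetric binary: 2 mismatches out of 5 attributes (simple matching).
smc = mixed_distance(x, y, ["symmetric"] * 5, no_ranges)

# All asymmetric binary: the two 0/0 attributes are ignored (Jaccard).
jaccard = mixed_distance(x, y, ["asymmetric"] * 5, no_ranges)

# All interval-scaled: average of range-normalised absolute differences.
interval = mixed_distance([2.0, 10.0], [4.0, 30.0],
                          ["interval"] * 2, [4.0, 40.0])
```

In the interval-scaled case the result is the Manhattan distance over range-normalised values divided by the number of attributes r, so it matches the Manhattan distance up to that constant factor.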