Distance in data space: Notion of distance (metrics) in data space (PowerPoint PPT presentation)




SLIDE 1

Introduction and the most basic concepts

Fundamentals of AI

Distance in data space

SLIDE 2

Notion of distance (metrics) in data space

Who is my closest neighbor?

SLIDE 3

Euclidean distance

Shape of the 2D sphere, R=1

SLIDE 4

Euclidean distance

Euclidean distance is the most fundamental distance because the physical world is locally Euclidean (with a rather large locality radius!). The data space, however, is not obliged to be a Euclidean metric space. There are duality connections between Euclidean distance and the Normal (Gaussian) distribution, and between Euclidean distance and linear regression / principal components. Euclidean distance is sometimes denoted as the L2-norm or L2-metric.
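The L2 (Euclidean) distance can be sketched in a few lines of Python (an illustrative implementation, not taken from the slides):

```python
import math

def euclidean(q, r):
    """L2 (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((qj - rj) ** 2 for qj, rj in zip(q, r)))

print(euclidean([0, 0], [3, 4]))  # 5.0 (the classic 3-4-5 triangle)
```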

SLIDE 5

Metric axioms
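The slide names the metric axioms without listing them; the standard four are non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. They can be checked empirically on a finite point sample (a hypothetical helper, `satisfies_metric_axioms`, for illustration; a failure disproves metricity, passing is only evidence):

```python
import itertools, math

def satisfies_metric_axioms(dist, points):
    """Empirically check the four metric axioms on a finite point sample."""
    for a, b in itertools.product(points, repeat=2):
        if dist(a, b) < 0:                  return False  # non-negativity
        if (a == b) != (dist(a, b) == 0):   return False  # identity of indiscernibles
        if dist(a, b) != dist(b, a):        return False  # symmetry
    for a, b, c in itertools.product(points, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + 1e-12:
            return False                                   # triangle inequality
    return True

l2 = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
frac = lambda a, b: sum(abs(x - y) ** 0.5 for x, y in zip(a, b)) ** 2  # "p = 0.5"

pts = [(0, 0), (1, 0), (0, 1), (2, 3)]
print(satisfies_metric_axioms(l2, pts))    # True
print(satisfies_metric_axioms(frac, pts))  # False: triangle axiom fails
```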

SLIDE 6

L1-distance

Shape of the 2D sphere, R=1

E(\mathbf{q}, \mathbf{r}) = \sum_{j=1}^{l} |q_j - r_j|

SLIDE 7

L1-distance

L1-distance is not rotationally invariant!

Shape of the 2D sphere, R=1

E(\mathbf{q}, \mathbf{r}) = \sum_{j=1}^{l} |q_j - r_j|


SLIDE 8

Lp-distance

  • p = 2, Euclidean distance
  • p = 1, L1-distance
  • p = ∞, max-distance
  • p < 1, fractional (pseudo)metrics: violates the triangle axiom!

If a distance axiom is not satisfied, better use the word "dissimilarity" instead of "distance" or "metric"!

Shape of the spheres

E(\mathbf{q}, \mathbf{r}) = \left( \sum_{j=1}^{l} |q_j - r_j|^p \right)^{1/p}

SLIDE 9

Correlation dissimilarity ***

*** not to be confused with distance correlation, dCor!

Definition of Pearson coefficient: -1 <= Corr <= 1

Correlation dissimilarity = (1 - Corr(X,Y))/2 >= 0

also: Absolute correlation dissimilarity = 1 - |Corr(X,Y)| >= 0
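Both dissimilarities follow directly from the Pearson coefficient (an illustrative stdlib-only sketch; the helper names are assumptions):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient, -1 <= corr <= 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def corr_dissimilarity(x, y):
    return (1 - pearson(x, y)) / 2      # 0 for corr = +1, 1 for corr = -1

def abs_corr_dissimilarity(x, y):
    return 1 - abs(pearson(x, y))       # 0 whenever |corr| = 1

x = [1, 2, 3, 4]
print(corr_dissimilarity(x, [2, 4, 6, 8]))      # ~0.0 (perfectly correlated)
print(corr_dissimilarity(x, [8, 6, 4, 2]))      # ~1.0 (anti-correlated)
print(abs_corr_dissimilarity(x, [8, 6, 4, 2]))  # ~0.0 (|corr| = 1)
```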

SLIDE 10

Cosine similarity and Angular distance

CosSim(A,B)
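Cosine similarity and the angular distance derived from it can be sketched as follows (an illustrative implementation; note that angular distance, unlike plain cosine dissimilarity, satisfies the triangle inequality):

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two vectors: dot product over the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def angular_distance(a, b):
    """Angular distance in [0, 1]: the angle between vectors, normalized by pi."""
    return math.acos(max(-1.0, min(1.0, cos_sim(a, b)))) / math.pi

print(cos_sim([1, 0], [0, 1]))           # 0.0 (orthogonal vectors)
print(angular_distance([1, 0], [0, 1]))  # 0.5 (a 90-degree angle)
```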

SLIDE 11

Distance matrix

  • Non-negative, symmetric
  • Convenient for searching neighbours
  • Inconvenient to store because the number of elements grows quadratically:

100000 * 100000 * 2 bytes (float16 size) = 20 GB of RAM
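A toy distance matrix and the quadratic memory estimate from the slide (an illustrative sketch; `distance_matrix` is a hypothetical helper):

```python
import math

def distance_matrix(points, dist):
    """Full n x n distance matrix: non-negative, symmetric, zero diagonal."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

l2 = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
D = distance_matrix([(0, 0), (3, 4), (6, 8)], l2)
print(D[0][1], D[1][0])  # 5.0 5.0 (symmetric), D[i][i] = 0

# Quadratic growth: 100000 points in float16 already need 20 GB
n = 100_000
print(n * n * 2 / 1e9)   # 20.0 (GB)
```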

SLIDE 12

k Nearest Neighbor (kNN) graph

SLIDE 13

k Nearest Neighbor (kNN) graph is directed!

Asymmetry

In higher-dimensional spaces, the asymmetry of kNN graphs increases. This can lead to hubness (points which are neighbours of many (>>k) other points).

Hubness might be detrimental for machine learning methods based on kNN graphs.
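The directedness and the resulting in-degree imbalance can be seen even on a tiny 1D example (an illustrative sketch; `knn_graph` is a hypothetical helper):

```python
import math

def knn_graph(points, k, dist):
    """Directed kNN graph: edge i -> j iff j is among i's k nearest neighbours."""
    n = len(points)
    edges = {}
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))
        edges[i] = order[:k]
    return edges

l2 = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
pts = [(0, 0), (1, 0), (3, 0)]
g = knn_graph(pts, 1, l2)
print(g)  # {0: [1], 1: [0], 2: [1]}: edge 2 -> 1 is not reciprocated

# In-degree reveals the beginnings of hubness: point 1 collects two in-edges
indeg = {j: sum(j in nbrs for nbrs in g.values()) for j in g}
print(indeg)  # {0: 1, 1: 2, 2: 0}
```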

SLIDE 14

Mutual Nearest Neighbours (MNN) graph

Mutually Nearest Neighbours
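The MNN graph keeps only the reciprocated edges of a directed kNN graph (an illustrative sketch; the input dictionary is a hypothetical toy kNN graph):

```python
def mutual_nn(edges):
    """Undirected MNN edges: pair (i, j) kept iff i -> j and j -> i both exist."""
    return {(i, j) for i, nbrs in edges.items()
            for j in nbrs
            if i in edges.get(j, []) and i < j}

# Hypothetical directed kNN edges (k = 1) for three 1D points at 0, 1, 3
g = {0: [1], 1: [0], 2: [1]}
print(mutual_nn(g))  # {(0, 1)}: the unreciprocated edge 2 -> 1 is dropped
```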

SLIDE 15

Mutual Nearest Neighbours (MNN) graph

Matching objects in two datasets

SLIDE 16

Mutual Nearest Neighbours (MNN) graph

Matching objects in two datasets

(Figure: examples of a match and a mismatch between the two datasets)

SLIDE 17

Metric learning

  • Example: learn the distance function from labeled data

By choosing distance: Make red lines closer! Make blue lines more distant!

(Figure legend: labels Green, Blue, Orange)
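The idea of "choosing the distance" can be illustrated with the simplest possible learned metric, a weighted Euclidean distance (a hypothetical toy example, not the method on the slide): up-weighting discriminative features pulls same-label pairs together and pushes different-label pairs apart.

```python
import math

def wdist(a, b, w):
    """Weighted Euclidean distance: the 'learned metric' is the weight vector w."""
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))

# Hypothetical labeled data: classes differ only in feature 0; feature 1 is noise
A = [(0.0, 0.0), (0.1, 1.0)]   # label "green"
B = [(1.0, 0.0), (1.1, 1.0)]   # label "blue"

for w in [(1.0, 1.0), (1.0, 0.0)]:   # unweighted vs "learned" weights
    same = wdist(A[0], A[1], w)      # within-class ("red line")
    diff = wdist(A[0], B[0], w)      # between-class ("blue line")
    print(w, same, diff)
# With w = (1, 0) the noisy feature is ignored: same-label pairs become closer
```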

SLIDE 18

Dimensionality curse, measure concentration

Point neighborhood in multidimensional space of radius e*D, e << 1, where D = mean distance between points

Low-dimensional case vs. high-dimensional case

  • When the number of features >> the number of objects
  • When the intrinsic dimension of the data > log2(number of objects)
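Measure concentration is easy to observe numerically: for random Gaussian points, the relative spread of distances from the origin shrinks as the dimension grows (an illustrative stdlib-only experiment; `dist_ratio` and its parameters are assumptions):

```python
import math, random

def dist_ratio(dim, n_points=200, seed=0):
    """Relative spread (max - min) / min of distances from the origin
    for n_points standard Gaussian samples in the given dimension.
    As dimension grows, distances concentrate and the ratio shrinks."""
    rng = random.Random(seed)
    pts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_points)]
    d = [math.sqrt(sum(x * x for x in p)) for p in pts]
    return (max(d) - min(d)) / min(d)

print(dist_ratio(2))     # large relative spread in low dimension
print(dist_ratio(1000))  # much smaller: nearest and farthest look alike
```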