Distance in data space: Notion of distance (metrics) in data space

May 09, 2023

Fundamentals of AI: Introduction and the most basic concepts. Distance in data space
Notion of distance (metrics) in data space: Who is my closest neighbor?
Euclidean distance: shape of the 2D sphere, R=1
Euclidean distance
• Euclidean distance is the most fundamental distance because the physical world is locally Euclidean (with a rather large locality radius!)
• A data space is not obliged to be a Euclidean metric space
• Duality connections between Euclidean distance and the Normal (Gaussian) distribution
• Duality connections between Euclidean distance and linear regression, principal components
• Euclidean distance is sometimes denoted as the L2-norm or L2-metric
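The slide's own formula did not survive extraction. As a reminder, the standard definition of the Euclidean distance between two points q and r with l coordinates is given below. The symbol names E, q, r, j and l are chosen here for illustration.

```latex
E(\mathbf{q}, \mathbf{r}) = \sqrt{\sum_{j=1}^{l} (q_j - r_j)^2}
```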
Metric axioms
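The content of this slide was not captured in the transcript. For reference, the textbook axioms that a function d must satisfy to be called a metric can be written as follows. Here x, y, z denote arbitrary points of the data space, and the notation is an assumption rather than the slide's own.

```latex
\begin{align*}
d(x, y) &\ge 0 && \text{(non-negativity)} \\
d(x, y) &= 0 \iff x = y && \text{(identity of indiscernibles)} \\
d(x, y) &= d(y, x) && \text{(symmetry)} \\
d(x, z) &\le d(x, y) + d(y, z) && \text{(triangle inequality)}
\end{align*}
```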
L1-distance: shape of the 2D sphere, R=1
$$E(\mathbf{q}, \mathbf{r}) = \sum_{j=1}^{l} |q_j - r_j|$$
L1-distance is not rotationally invariant!
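A small numerical check of the rotational-invariance claim might look like this. It is a minimal sketch using only the Python standard library; the helper names `l1`, `l2` and `rotate2d` are hypothetical and introduced purely for illustration.

```python
import math

def l1(a, b):
    """L1 (Manhattan) distance between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """Euclidean (L2) distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rotate2d(point, angle):
    """Rotate a 2D point counter-clockwise around the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

origin = (0.0, 0.0)
p = (1.0, 0.0)
p_rot = rotate2d(p, math.pi / 4)  # the same point after a 45-degree rotation

# Euclidean distance to the origin is unchanged by the rotation (up to rounding).
print(l2(origin, p), l2(origin, p_rot))
# L1 distance to the origin grows from 1 to about 1.414 (the square root of 2).
print(l1(origin, p), l1(origin, p_rot))
```

The rotated point lies on the Euclidean unit circle but outside the L1 unit "sphere", which is a diamond in 2D.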
Lp-distance: shape of the spheres
$$E(\mathbf{q}, \mathbf{r}) = \left( \sum_{j=1}^{l} |q_j - r_j|^p \right)^{1/p}$$
• p = 2: Euclidean distance
• p = 1: L1-distance
• p = ∞: max-distance
• p < 1: fractional (pseudo)metrics, which violate the triangle axiom!
If a distance axiom is not satisfied, it is better to use the word dissimilarity instead of distance or metric!
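The Lp family and the triangle-inequality violation for p < 1 can be illustrated with a short sketch. The function name `lp_distance` and the three test points are assumptions made for this example.

```python
import math

def lp_distance(a, b, p):
    """L_p dissimilarity between equal-length sequences; p may be math.inf."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == math.inf:
        return max(diffs)  # max-distance: the largest coordinate difference
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Three corners of the unit square: going a -> c directly, or via b.
a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)

for p in (2, 1, math.inf, 0.5):
    direct = lp_distance(a, c, p)
    detour = lp_distance(a, b, p) + lp_distance(b, c, p)
    # The triangle inequality requires direct <= detour.
    print(p, direct, detour, direct <= detour)
```

For p = 0.5 the direct dissimilarity is 4 while the detour through b costs only 2, so the triangle inequality fails. For p = 2, 1 and ∞ it holds on this example.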
Correlation dissimilarity ***
Definition of Pearson coefficient: -1 <= Corr <= 1
Correlation dissimilarity = (1 - Corr(X,Y))/2 >= 0
Absolute correlation dissimilarity = 1 - |Corr(X,Y)| >= 0
*** Do not confuse with distance correlation, dCor!
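Both correlation-based dissimilarities follow directly from the Pearson coefficient. The sketch below uses plain Python; the function names and sample vectors are hypothetical, and inputs with zero variance are not handled (they would cause a division by zero).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_dissimilarity(x, y):
    """(1 - Corr) / 2: 0 for perfect correlation, 1 for perfect anti-correlation."""
    return (1.0 - pearson(x, y)) / 2.0

def abs_correlation_dissimilarity(x, y):
    """1 - |Corr|: 0 for perfect correlation of either sign."""
    return 1.0 - abs(pearson(x, y))

x = [1.0, 2.0, 3.0, 4.0]
y_neg = [8.0, 6.0, 4.0, 2.0]  # perfectly anti-correlated with x

print(correlation_dissimilarity(x, y_neg))      # close to 1
print(abs_correlation_dissimilarity(x, y_neg))  # close to 0
```

The two variants disagree on anti-correlated data: the first treats it as maximally dissimilar, the second as maximally similar.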
Cosine similarity and Angular distance CosSim( A , B )
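The formula on this slide is cut off in the transcript. The sketch below therefore assumes the commonly used definitions: cosine similarity as the dot product divided by the product of the norms, and angular distance as the arccosine of that value divided by π. The function names are hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|). Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def angular_distance(a, b):
    """Angle between a and b, scaled to [0, 1] (assumed normalisation by pi)."""
    c = max(-1.0, min(1.0, cos_sim(a, b)))  # clamp against rounding errors
    return math.acos(c) / math.pi

print(cos_sim((1.0, 0.0), (0.0, 1.0)))            # orthogonal vectors
print(angular_distance((1.0, 0.0), (0.0, 1.0)))   # a quarter turn
print(angular_distance((1.0, 0.0), (2.0, 0.0)))   # same direction, different length
```

Cosine similarity ignores vector length, so (1, 0) and (2, 0) are at angular distance 0 even though their Euclidean distance is 1.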
Distance matrix
• Non-negative, symmetric
• Convenient for searching neighbours
• Inconvenient to store because the number of elements grows quadratically: 100000 * 100000 * 2 bytes (float16 size) = 20 GB of RAM
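The storage estimate can be reproduced with a few lines of arithmetic. The second function is an addition not taken from the slide: because the matrix is symmetric with a zero diagonal, storing only the upper triangle roughly halves the cost. Both function names are hypothetical.

```python
def full_matrix_bytes(n_points, bytes_per_value=2):
    """Bytes needed for a dense n x n distance matrix (2 bytes per value for float16)."""
    return n_points * n_points * bytes_per_value

def condensed_matrix_bytes(n_points, bytes_per_value=2):
    """Bytes needed if only the n*(n-1)/2 entries above the diagonal are stored."""
    return n_points * (n_points - 1) // 2 * bytes_per_value

n = 100_000
print(full_matrix_bytes(n) / 1e9)       # 20.0 (GB), matching the figure above
print(condensed_matrix_bytes(n) / 1e9)  # just under 10 GB
```

Even the condensed form still grows quadratically, so doubling the number of points quadruples the memory.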
k Nearest Neighbor (kNN) graph
• The kNN graph is directed!
• In higher-dimensional spaces, the asymmetry of kNN graphs increases
• This can lead to hubness (points which are neighbours of many (>>k) other points)
• Hubness might be detrimental for machine learning methods based on kNN graphs
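The directedness of the kNN graph already shows up on a three-point toy example. The following is a brute-force sketch, not an efficient implementation; the function names and the sample points are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math

def knn_graph(points, k):
    """Return {i: [indices of the k nearest other points]} using Euclidean distance."""
    graph = {}
    for i, p in enumerate(points):
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        others.sort()  # nearest first
        graph[i] = [j for _, j in others[:k]]
    return graph

def in_degrees(graph):
    """Count how many points list each point among their k nearest neighbours."""
    counts = {i: 0 for i in graph}
    for neighbours in graph.values():
        for j in neighbours:
            counts[j] += 1
    return counts

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
g = knn_graph(points, k=1)
print(g)              # {0: [1], 1: [0], 2: [1]}
print(in_degrees(g))  # {0: 1, 1: 2, 2: 0}
```

Point 2 names point 1 as its nearest neighbour, but point 1 names point 0, so the edge 2 → 1 has no reverse edge. Point 1 has in-degree 2 with k = 1, a miniature version of a hub.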
Mutual Nearest Neighbours (MNN) graph: mutually nearest neighbours
Mutual Nearest Neighbours (MNN) graph: matching objects in two datasets (figure labels: Match, Mismatch)
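Matching objects across two datasets by mutual nearest neighbours can be sketched as follows. This is a brute-force illustration with 1 nearest neighbour in each direction; the function names and the toy datasets are hypothetical.

```python
import math

def nearest_index(point, candidates):
    """Index of the candidate closest to `point` in Euclidean distance."""
    return min(range(len(candidates)), key=lambda j: math.dist(point, candidates[j]))

def mutual_nearest_neighbours(set_a, set_b):
    """Pairs (i, j) where set_b[j] is nearest to set_a[i] and vice versa."""
    a_to_b = [nearest_index(a, set_b) for a in set_a]
    b_to_a = [nearest_index(b, set_a) for b in set_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

set_a = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.0)]
set_b = [(0.1, 0.0), (5.2, 5.0)]

print(mutual_nearest_neighbours(set_a, set_b))  # [(0, 0), (1, 1)]
```

The third point of `set_a` picks `set_b[0]` as its nearest neighbour, but `set_b[0]` prefers `set_a[0]`, so that pair is a mismatch and is dropped.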
Metric learning
• Example: learn the distance function from labeled data (figure labels: Orange, Green, Blue)
• By choosing the distance: make red lines closer! Make blue lines more distant!
Dimensionality curse, measure concentration
Point neighborhood in multidimensional space of radius ε*D, ε << 1, where D = mean distance between points (figure panels: high-dimensional case, low-dimensional case)
• When number of features >> number of objects
• When the intrinsic dimension of the data > log2(number of objects)
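Measure concentration can be observed numerically: in high dimension, the distances from a point to all other points become nearly equal, so a neighbourhood of radius ε*D around it is typically empty. The sketch below is an illustrative experiment on uniformly random points, which is an assumption and not the slide's data; the function name `relative_contrast` is hypothetical.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one point to all the others."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, p) for p in points[1:]]
    return (max(dists) - min(dists)) / min(dists)

low = relative_contrast(dim=2)
high = relative_contrast(dim=1000)
print(low, high)  # the contrast is large in 2D and small in 1000D
```

A small relative contrast means that the nearest and the farthest neighbour are almost equally far away, which is why nearest-neighbour search loses meaning in such spaces.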
