
On the Role and Impact of the Metaparameters in t-distributed Stochastic Neighbor Embedding
John A. Lee and Michel Verleysen
Machine Learning Group, Université catholique de Louvain, Louvain-la-Neuve, Belgium
michel.verleysen@uclouvain.be

Motivation
Motivation for nonlinear dimensionality reduction
• High-dimensional data are
  – difficult to represent
  – difficult to understand
  – difficult to analyze
• Motivation #1:
  – To visualize data living in a d-dimensional space (d > 3)
• Motivation #2:
  – Models (regression, classification, clustering) based on high-dimensional data suffer from the curse of dimensionality
  – Need to reduce the dimension of the data while keeping their information content!
Compstat 2010: On the role and impact of the metaparameters in t-distributed SNE

Motivation
Visualization
• These are data
• It is difficult to see anything…
[Figure: raw data with the variables annual increase (%), infant mortality (‰), illiteracy ratio (%), school attendance (%), GIP, annual GIP increase (%)]

Motivation
Visualization
• These are the same data
• under different visualization paradigms
• It becomes possible to see groups, relations, outliers, …

Motivation
Not all NLDR methods perform equally!
[Figure panels: Geodesic NLM, CDA, Isomap]

Motivation
Stochastic Neighbor Embedding
• SNE and t-SNE are nowadays considered ‘good’ methods for NLDR
• Examples
[Figure panels: t-SNE, MDS. From: L. Van der Maaten & G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579-2605]


Outline
• NLDR: a historical perspective
  – stress function
  – intrusions and extrusions
  – geodesic distances
• SNE and t-SNE
  – algorithm
  – gradient
  – transformed distances
• Experiments
  – with Euclidean distances
  – with geodesic distances
• Conclusions

NLDR: a historical perspective → Stress function
From MDS to more general cost functions
• MDS follows the idea of
  $$\min_X \sum_{i<j} \left(\delta_{ij}^2 - d_{ij}^2\right)^2 \quad \text{where } \delta_{ij} = \|y_i - y_j\|, \; d_{ij} = \|x_i - x_j\|$$
• Extension (Breakthrough #1):
  $$\min_X \sum_{i<j} w_{ij} \left(\delta_{ij}^2 - d_{ij}^2\right)^2$$
  where the weights $w_{ij}$ give more importance to
  – small distances
  – close data
  – …
• Traditional « stress » function:
  $$\min_X \sum_{i<j} w_{ij} \left(\delta_{ij} - d_{ij}\right)^2$$
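The traditional stress function on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy points, the function names and the weight `1/δ` used in the last call are hypothetical choices, not taken from the slides.

```python
import math

def pairwise_distances(points):
    """Full matrix of Euclidean distances between all points."""
    return [[math.dist(p, q) for q in points] for p in points]

def weighted_stress(hd_points, ld_points, weight=lambda delta: 1.0):
    """Sum over i<j of w_ij * (delta_ij - d_ij)**2.

    delta_ij: distances in the original (HD) space, d_ij: distances in the
    embedding. The weight is assumed here to depend on delta_ij only.
    """
    delta = pairwise_distances(hd_points)
    d = pairwise_distances(ld_points)
    n = len(hd_points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += weight(delta[i][j]) * (delta[i][j] - d[i][j]) ** 2
    return total

# Hypothetical toy data: three points forming a right angle in 3-D.
hd = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
ld_exact = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # preserves all distances
ld_line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # stretches the largest one

stress_exact = weighted_stress(hd, ld_exact)
stress_line = weighted_stress(hd, ld_line)
# Down-weighting large original distances (illustrative choice w = 1/delta).
stress_line_local = weighted_stress(hd, ld_line, weight=lambda delta: 1.0 / delta)
```

With the `1/δ` weight, the error on the largest original distance counts less, which is one way of giving more importance to small distances.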


NLDR: a historical perspective → Intrusions and extrusions
Limitations of linear projections
• Even simple manifolds can be poorly projected
• Points originally far from each other are projected close together: this is an intrusion


NLDR: a historical perspective → Intrusions and extrusions
Nonlinear projections
• Goal: to unfold, rather than to project (linearly)
• Intrusions can hopefully be reduced, but extrusions may appear

NLDR: a historical perspective → Intrusions and extrusions
The user’s point of view
• Favouring intrusions or extrusions depends on the application (user’s point of view)
• General way of handling the compromise (Breakthrough #2):
  $$w_{ij} = \lambda\, f\!\left(\frac{\delta_{ij}}{\sigma}\right) + (1 - \lambda)\, f\!\left(\frac{d_{ij}}{\sigma}\right)$$
  – the first term, $f(\delta_{ij}/\sigma)$, allows intrusions
  – the second term, $f(d_{ij}/\sigma)$, allows extrusions
• Nowadays, few methods acknowledge this need for a trade-off!
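The effect of λ in this weighting can be checked numerically. The sketch below assumes a decreasing function f(u) = exp(−u); the slide does not specify f, so this choice, the function name and the example distances are all hypothetical.

```python
import math

def tradeoff_weight(delta_ij, d_ij, lam, sigma, f=lambda u: math.exp(-u)):
    """w_ij = lam * f(delta_ij / sigma) + (1 - lam) * f(d_ij / sigma).

    delta_ij: distance in the original space, d_ij: distance in the embedding.
    f is ASSUMED to be a decreasing function; exp(-u) is an illustrative choice.
    """
    return lam * f(delta_ij / sigma) + (1.0 - lam) * f(d_ij / sigma)

# Intrusion: points far apart in the original space end up close in the embedding.
intrusion = dict(delta_ij=5.0, d_ij=0.1)
# Extrusion: points close in the original space end up far apart in the embedding.
extrusion = dict(delta_ij=0.1, d_ij=5.0)

# lam = 1 keeps only the f(delta/sigma) term, lam = 0 only the f(d/sigma) term.
w_intrusion_lam1 = tradeoff_weight(**intrusion, lam=1.0, sigma=1.0)
w_intrusion_lam0 = tradeoff_weight(**intrusion, lam=0.0, sigma=1.0)
w_extrusion_lam1 = tradeoff_weight(**extrusion, lam=1.0, sigma=1.0)
w_extrusion_lam0 = tradeoff_weight(**extrusion, lam=0.0, sigma=1.0)
```

Under this assumption, λ = 1 gives the intruding pair a small weight (its error is barely penalized, so intrusions are tolerated), while λ = 0 gives the extruding pair a small weight.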

NLDR: a historical perspective → Geodesic distances
Geodesic distances
• Goal: to measure distances along the manifold (Breakthrough #3)
• Such distances are more easily preserved

NLDR: a historical perspective → Geodesic distances
Geodesic and graph distances
[Figure panels: 2-d data, approximation of the geodesic distance]
• Geodesic distances: finding the shortest path between data points along the manifold
• Problem: the manifold is unknown → approximate it by a graph
• Efficient algorithms exist for finding shortest paths
• The graph can be built by connecting data in a k-neighborhood, or in an ε-ball
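The graph approximation described above can be sketched as a k-neighborhood graph followed by Dijkstra's shortest-path algorithm. This is a minimal illustration; the half-circle data, the value k = 2 and the function names are assumptions made for the example.

```python
import heapq
import math

def knn_graph(points, k):
    """Connect every point to its k nearest neighbours (symmetrised)."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for dist, j in nearest[:k]:
            graph[i][j] = dist
            graph[j][i] = dist  # make the edge usable in both directions
    return graph

def graph_distances(graph, source):
    """Dijkstra: shortest path lengths from source to every node."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Hypothetical manifold: 21 points on a half circle of radius 1.
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)]
graph = knn_graph(points, k=2)
geodesic = graph_distances(graph, 0)[20]       # along the graph, close to pi
euclidean = math.dist(points[0], points[20])   # straight through, equal to 2
```

The graph distance between the two end points approaches the arc length π, whereas the Euclidean distance cuts across the half circle.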


NLDR: a historical perspective
Distance preservation methods
Each cost function exists in two variants, with Euclidean or with geodesic distances in the HD space:
• Metric MDS (Euclidean) / Isomap (geodesic): computational load ↓, performances ↓
  $$E = \sum_{i,j=1}^{N} \left(d_y(i,j) - d_x(i,j)\right)^2$$
• Sammon NLM (Euclidean) / Geodesic NLM (geodesic): favors intrusions
  $$E_{NLM} = \sum_{i<j}^{N} \frac{\left(d_y(i,j) - d_x(i,j)\right)^2}{d_y(i,j)}$$
• CCA (Euclidean) / CDA (geodesic): favors extrusions; computational load ↑, performances ↑
  $$E_{CCA} = \sum_{i<j}^{N} \left(d_y(i,j) - d_x(i,j)\right)^2 F_\lambda\!\left(d_x(i,j)\right)$$
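A two-point example makes the "favors intrusions" and "favors extrusions" labels concrete. In the sketch below, d_y holds distances in the HD space and d_x distances in the embedding; the choice F_λ(d) = exp(−d/λ) is an assumption for illustration only (the slide does not define F_λ), as are the function names and the numbers.

```python
import math

def mds_cost(d_y, d_x):
    """Metric MDS: sum over all i, j of (d_y - d_x)**2."""
    n = len(d_y)
    return sum((d_y[i][j] - d_x[i][j]) ** 2 for i in range(n) for j in range(n))

def nlm_cost(d_y, d_x):
    """Sammon NLM: squared errors divided by the HD distance d_y."""
    n = len(d_y)
    return sum(
        (d_y[i][j] - d_x[i][j]) ** 2 / d_y[i][j]
        for i in range(n) for j in range(i + 1, n)
    )

def cca_cost(d_y, d_x, lam, F=lambda d, lam: math.exp(-d / lam)):
    """CCA: squared errors weighted by F_lambda of the embedding distance d_x.

    F is ASSUMED to be a decreasing function of d_x; exp(-d/lam) is illustrative.
    """
    n = len(d_y)
    return sum(
        (d_y[i][j] - d_x[i][j]) ** 2 * F(d_x[i][j], lam)
        for i in range(n) for j in range(i + 1, n)
    )

far, close = [[0.0, 4.0], [4.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]
# Intrusion: far in HD, close in the embedding. Extrusion: the reverse.
intrusion = dict(d_y=far, d_x=close)
extrusion = dict(d_y=close, d_x=far)

mds_i, mds_e = mds_cost(**intrusion), mds_cost(**extrusion)
nlm_i, nlm_e = nlm_cost(**intrusion), nlm_cost(**extrusion)
cca_i, cca_e = cca_cost(**intrusion, lam=1.0), cca_cost(**extrusion, lam=1.0)
```

Metric MDS charges both errors equally, NLM charges the intrusion less, and CCA (under the assumed F_λ) charges the extrusion less.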



SNE and t-SNE → Algorithm
SNE and t-SNE
• In the original space, the similarity between $y_i$ and $y_j$ is defined as
  $$p_{j|i}(\lambda_i) = \begin{cases} 0 & \text{if } i = j \\ \dfrac{g(\delta_{ij}/\lambda_i)}{\sum_{k \neq i} g(\delta_{ik}/\lambda_i)} & \text{otherwise} \end{cases} \qquad \text{with } g(u) = \exp\left(-\frac{u^2}{2}\right)$$
• Similarities are not symmetric (individual widths)!
• $p_{j|i}$ is the empirical probability of $y_j$ to be a neighbor of $y_i$
• Individual widths $\lambda_i$: set (individually) through a global « perplexity » parameter
  $$PPXT = 2^{H(p_{j|i})}$$
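Setting one width λ_i from a global perplexity can be sketched as a one-dimensional search. The version below assumes a Gaussian g(u) = exp(−u²/2), an entropy H measured in bits, and a bisection on a logarithmic scale; the search bounds, the toy distances and the function names are hypothetical, and the slides do not say which search procedure the authors use.

```python
import math

def conditional_probabilities(deltas, lam):
    """p_{j|i} for one point i, given its distances to all other points."""
    exponents = [-((d / lam) ** 2) / 2.0 for d in deltas]
    shift = max(exponents)  # shifting avoids a zero denominator for tiny lam
    g = [math.exp(e - shift) for e in exponents]
    total = sum(g)
    return [v / total for v in g]

def perplexity(p):
    """2 ** H(p), with the entropy H in bits."""
    entropy = -sum(v * math.log2(v) for v in p if v > 0.0)
    return 2.0 ** entropy

def calibrate_width(deltas, target, lo=1e-3, hi=1e3, iterations=100):
    """Bisection on a log scale: perplexity grows with the width lam."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probabilities(deltas, mid)) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical distances from point i to its five companions.
deltas = [1.0, 1.5, 2.0, 4.0, 8.0]
lam = calibrate_width(deltas, target=3.0)
p = conditional_probabilities(deltas, lam)
ppxt = perplexity(p)
```

The perplexity can be read as an effective number of neighbors: here it ranges from 1 (very small λ_i) to 5 (very large λ_i), so a target of 3 has a unique solution.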

SNE and t-SNE → Algorithm
SNE and t-SNE
• In the embedding space, the similarity between $x_i$ and $x_j$ is defined as
  $$q_{ij}(n) = \begin{cases} 0 & \text{if } i = j \\ \dfrac{t(d_{ij}, n)}{\sum_{k \neq l} t(d_{kl}, n)} & \text{otherwise} \end{cases} \qquad \text{with } t(u, n) = \left(1 + \frac{u^2}{n}\right)^{-\frac{n+1}{2}}$$
• Similarities are symmetric
• $t(u, n)$ is proportional to a Student t with $n$ degrees of freedom ($n$ controls the thickness of the tail)
• SNE: $n \to \infty$
• t-SNE: $n = 1$
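The role of n can be illustrated by computing the embedding similarities for a heavy tail (n = 1) and for a nearly Gaussian tail (large n). The three embedded points and the function names below are hypothetical; the kernel follows the definition of t(u, n) on this slide.

```python
import math

def t_kernel(u, n):
    """t(u, n) = (1 + u**2 / n) ** (-(n + 1) / 2)."""
    return (1.0 + u * u / n) ** (-(n + 1) / 2.0)

def embedding_similarities(points, n):
    """q_ij(n): kernel values normalised over all pairs k != l."""
    size = len(points)
    raw = [
        [0.0 if k == l else t_kernel(math.dist(points[k], points[l]), n)
         for l in range(size)]
        for k in range(size)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]

# Hypothetical embedding: two close points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
q_heavy = embedding_similarities(points, n=1)     # t-SNE setting
q_light = embedding_similarities(points, n=1000)  # close to the Gaussian limit

# Similarity of the distant pair relative to the close pair.
ratio_heavy = q_heavy[0][2] / q_heavy[0][1]
ratio_light = q_light[0][2] / q_light[0][1]
```

With n = 1 the distant pair keeps a noticeable share of the similarity mass, whereas for large n its share becomes negligible, which is what "thickness of the tail" refers to.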
