Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2020f/
Principal Components Analysis (PCA)
Prof. Mike Hughes
Many ideas/slides attributable to: Liping Liu (Tufts), Emily Fox (UW), Matt Gormley (CMU)
What will we learn?
(course overview diagram)
• Supervised Learning: data examples { x_n } for n = 1, …, N; a performance measure; a task
• Unsupervised Learning: a summary of data x
• Reinforcement Learning
Task: Embedding
(figure: data points in x_1, x_2 space mapped to a low-dimensional embedding; embedding is an unsupervised learning task)
Dim. Reduction/Embedding Unit Objectives
• Goals of dimensionality reduction
  • Reduce feature vector size (keep signal, discard noise)
  • “Interpret” features: visualize/explore/understand
• Common approaches
  • Principal Component Analysis (PCA)
  • word2vec and other neural embeddings
• Evaluation metrics
  • Storage size
  • Reconstruction error
  • “Interpretability”
Example: 2D viz. of movies
Example: Genes vs. geography (Nature, 2008)
Centering the Data
Goal: each feature’s mean = 0.0
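A minimal numpy sketch of this centering step (the toy data and variable names are illustrative, not from the slides):

```python
import numpy as np

# Toy dataset: N=4 examples, F=2 features
x_NF = np.asarray([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0],
                   [4.0, 40.0]])

# Subtract each feature's (column's) mean so every column averages to 0.0
m_F = np.mean(x_NF, axis=0)      # per-feature mean, shape (F,)
xcentered_NF = x_NF - m_F        # broadcasting subtracts column-wise

print(np.allclose(np.mean(xcentered_NF, axis=0), 0.0))  # True
```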
Constant Reconstruction Model
x̂_i = m
Parameters: m, an F-dimensional vector.
Training problem: minimize reconstruction error
min_{m ∈ R^F} Σ_{n=1}^N (x_n − m)^T (x_n − m)
This is the squared error between two vectors.
Optimal parameters: m* = mean(x_1, …, x_N)
Think of the mean vector as the optimal “reconstruction” of a dataset if you must use a single vector.
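A quick numerical check of this claim, as a sketch on hypothetical toy data: the per-feature mean never does worse than any perturbed constant vector under the squared-error objective above.

```python
import numpy as np

x_NF = np.asarray([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0],
                   [4.0, 40.0]])

def recon_error(m_F, x_NF):
    """Sum over examples n of (x_n - m)^T (x_n - m)."""
    diff_NF = x_NF - m_F
    return float(np.sum(diff_NF * diff_NF))

m_star_F = np.mean(x_NF, axis=0)   # claimed optimal constant reconstruction

# Randomly perturbed candidates never beat the mean
rng = np.random.default_rng(0)
for _ in range(100):
    m_F = m_star_F + rng.normal(size=x_NF.shape[1])
    assert recon_error(m_star_F, x_NF) <= recon_error(m_F, x_NF)
print("mean minimizes reconstruction error on this toy data")
```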
Mean Reconstruction
(figure: original examples vs. their reconstructions using only the mean vector)
Linear Reconstruction and Principal Component Analysis
Linear Projection to 1D
Reconstruction from 1D to 2D
2D Orthogonal Basis
If we project onto 2 dimensions (the same number as F), we can reconstruct perfectly.
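A sketch of this perfect-reconstruction property, assuming an arbitrary orthonormal 2D basis (here a rotation matrix; the specific angle and data values are illustrative): projecting onto all F = 2 orthonormal directions and reconstructing recovers x exactly.

```python
import numpy as np

x_F = np.asarray([3.0, -1.5])   # one example, F=2
m_F = np.asarray([1.0, 2.0])    # mean vector

# An orthonormal 2D basis: rotation by 30 degrees (any orthonormal basis works)
theta = np.pi / 6
W_FK = np.asarray([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

z_K = W_FK.T @ (x_F - m_F)   # project centered x onto both basis vectors
xhat_F = m_F + W_FK @ z_K    # reconstruct from all K = F = 2 scores

print(np.allclose(xhat_F, x_F))  # True: no information lost
```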
Which 1D projection is best?
Idea: Minimize reconstruction error
Linear Reconstruction Model with 1 component
x̂_i = w z_i + m
(F×1) = (F×1)(1×1) + (F×1)
x̂_i: high-dimensional data; w: weights vector; z_i: low-dimensional embedding or “score”; m: “mean” vector
Linear Reconstruction Model with 1 component
x̂_i = w z_i + m
Problem: “over-parameterized”. Too many possible solutions!
Suppose we have an alternate model with weights w′ and embedding z′. We would get equivalent reconstructions if we set:
• w′ = w · 2
• z′ = z / 2
Solution: constrain the magnitude of w:
Σ_{f=1}^F w_f² = 1
w is a unit vector (a point on the unit circle, so its magnitude is always 1). We care about direction, not scale.
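A small sketch of this over-parameterization and its fix (all numbers are made up for illustration): doubling w while halving z leaves the reconstruction unchanged, so we pin down the scale by normalizing w to unit length.

```python
import numpy as np

m_F = np.asarray([0.5, -0.5])
w_F = np.asarray([3.0, 4.0])
z = 2.0

xhat_F = m_F + w_F * z

# Rescaled weights with inversely rescaled score: identical reconstruction
w2_F = w_F * 2.0
z2 = z / 2.0
assert np.allclose(m_F + w2_F * z2, xhat_F)

# Remove the ambiguity by constraining w to the unit circle: sum_f w_f^2 = 1
w_unit_F = w_F / np.linalg.norm(w_F)
print(np.isclose(np.sum(w_unit_F ** 2), 1.0))  # True
```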
Linear Reconstruction Model with 1 component
x̂_i = w z_i + m
(F×1) = (F×1)(1×1) + (F×1), where w is a unit vector (magnitude always 1).
Given fixed weights w and a specific x, what is the optimal scalar z value? Minimize reconstruction error!
min_{z ∈ R} (x − (w z + m))^T (x − (w z + m))
Exact analytical solution (take the gradient, set it to zero, solve for z) gives:
z = w^T (x − m)
This is the projection of the feature vector x onto the vector w after “centering” (removing the mean).
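A numerical sanity check of this closed form, as a sketch on hypothetical values: the projection z = wᵀ(x − m) does at least as well as every candidate z on a dense grid.

```python
import numpy as np

x_F = np.asarray([2.0, 1.0])
m_F = np.asarray([1.0, 0.0])
w_F = np.asarray([0.6, 0.8])   # unit vector: 0.36 + 0.64 = 1

# Closed-form optimal score: project the centered x onto w
z_star = float(w_F @ (x_F - m_F))

def recon_error(z):
    """Squared reconstruction error (x - (w z + m))^T (x - (w z + m))."""
    diff_F = x_F - (w_F * z + m_F)
    return float(diff_F @ diff_F)

# No candidate z on a dense grid beats the closed-form solution
for z in np.linspace(-5.0, 5.0, 1001):
    assert recon_error(z_star) <= recon_error(z) + 1e-12
print("optimal score z* =", z_star)
```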