Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2020f/
Principal Components Analysis (PCA)
Prof. Mike Hughes
Many ideas/slides attributable to: Liping Liu (Tufts), Emily Fox (UW), Matt Gormley (CMU)
What will we learn?
(course overview diagram)
• Supervised Learning: data examples { x_n } for n = 1, …, N; a performance measure; a task
• Unsupervised Learning: a summary of data x
• Reinforcement Learning
Task: Embedding
(figure: data points in x_1, x_2 space mapped to a low-dimensional embedding; embedding is an unsupervised learning task)
Dim. Reduction/Embedding Unit Objectives
• Goals of dimensionality reduction
  • Reduce feature vector size (keep signal, discard noise)
  • “Interpret” features: visualize/explore/understand
• Common approaches
  • Principal Component Analysis (PCA)
  • word2vec and other neural embeddings
• Evaluation metrics
  • Storage size
  • Reconstruction error
  • “Interpretability”
Example: 2D viz. of movies
Example: Genes vs. geography (Nature, 2008)
Centering the Data
Goal: each feature’s mean = 0.0
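A minimal numpy sketch of this centering step (the toy data and variable names are illustrative, not from the slides):

```python
import numpy as np

# Toy dataset: N=4 examples, F=2 features
x_NF = np.asarray([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0],
                   [4.0, 40.0]])

# Subtract each feature's (column's) mean so every column averages to 0.0
m_F = np.mean(x_NF, axis=0)      # per-feature mean, shape (F,)
xcentered_NF = x_NF - m_F        # broadcasting subtracts column-wise

print(np.allclose(np.mean(xcentered_NF, axis=0), 0.0))  # True
```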
Constant Reconstruction Model
x̂_i = m
Parameters: m, an F-dimensional vector.
Training problem: minimize reconstruction error
min_{m ∈ R^F} Σ_{n=1}^N (x_n − m)^T (x_n − m)
This is the squared error between two vectors.
Optimal parameters: m* = mean(x_1, …, x_N)
Think of the mean vector as the optimal “reconstruction” of a dataset if you must use a single vector.
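A quick numerical check of this claim, as a sketch on hypothetical toy data: the per-feature mean never does worse than any perturbed constant vector under the squared-error objective above.

```python
import numpy as np

x_NF = np.asarray([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0],
                   [4.0, 40.0]])

def recon_error(m_F, x_NF):
    """Sum over examples n of (x_n - m)^T (x_n - m)."""
    diff_NF = x_NF - m_F
    return float(np.sum(diff_NF * diff_NF))

m_star_F = np.mean(x_NF, axis=0)   # claimed optimal constant reconstruction

# Randomly perturbed candidates never beat the mean
rng = np.random.default_rng(0)
for _ in range(100):
    m_F = m_star_F + rng.normal(size=x_NF.shape[1])
    assert recon_error(m_star_F, x_NF) <= recon_error(m_F, x_NF)
print("mean minimizes reconstruction error on this toy data")
```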
Mean Reconstruction
(figure: original examples vs. their reconstructions using only the mean vector)
Linear Reconstruction and Principal Component Analysis
Linear Projection to 1D
Reconstruction from 1D to 2D
2D Orthogonal Basis
If we project onto 2 dimensions (the same number as F), we can reconstruct perfectly.
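A sketch of this perfect-reconstruction property, assuming an arbitrary orthonormal 2D basis (here a rotation matrix; the specific angle and data values are illustrative): projecting onto all F = 2 orthonormal directions and reconstructing recovers x exactly.

```python
import numpy as np

x_F = np.asarray([3.0, -1.5])   # one example, F=2
m_F = np.asarray([1.0, 2.0])    # mean vector

# An orthonormal 2D basis: rotation by 30 degrees (any orthonormal basis works)
theta = np.pi / 6
W_FK = np.asarray([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

z_K = W_FK.T @ (x_F - m_F)   # project centered x onto both basis vectors
xhat_F = m_F + W_FK @ z_K    # reconstruct from all K = F = 2 scores

print(np.allclose(xhat_F, x_F))  # True: no information lost
```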
Which 1D projection is best?
Idea: Minimize reconstruction error
Linear Reconstruction Model with 1 component
x̂_i = w z_i + m
(F×1) = (F×1)(1×1) + (F×1)
x̂_i: high-dimensional data; w: weights vector; z_i: low-dimensional embedding or “score”; m: “mean” vector
Linear Reconstruction Model with 1 component
x̂_i = w z_i + m
Problem: “over-parameterized”. Too many possible solutions!
Suppose we have an alternate model with weights w′ and embedding z′. We would get equivalent reconstructions if we set:
• w′ = w · 2
• z′ = z / 2
Solution: constrain the magnitude of w:
Σ_{f=1}^F w_f² = 1
w is a unit vector (a point on the unit circle, so its magnitude is always 1). We care about direction, not scale.
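A small sketch of this over-parameterization and its fix (all numbers are made up for illustration): doubling w while halving z leaves the reconstruction unchanged, so we pin down the scale by normalizing w to unit length.

```python
import numpy as np

m_F = np.asarray([0.5, -0.5])
w_F = np.asarray([3.0, 4.0])
z = 2.0

xhat_F = m_F + w_F * z

# Rescaled weights with inversely rescaled score: identical reconstruction
w2_F = w_F * 2.0
z2 = z / 2.0
assert np.allclose(m_F + w2_F * z2, xhat_F)

# Remove the ambiguity by constraining w to the unit circle: sum_f w_f^2 = 1
w_unit_F = w_F / np.linalg.norm(w_F)
print(np.isclose(np.sum(w_unit_F ** 2), 1.0))  # True
```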
Linear Reconstruction Model with 1 component
x̂_i = w z_i + m
(F×1) = (F×1)(1×1) + (F×1), where w is a unit vector (magnitude always 1).
Given fixed weights w and a specific x, what is the optimal scalar z value? Minimize reconstruction error!
min_{z ∈ R} (x − (w z + m))^T (x − (w z + m))
Exact analytical solution (take the gradient, set it to zero, solve for z) gives:
z = w^T (x − m)
This is the projection of the feature vector x onto the vector w after “centering” (removing the mean).
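A numerical sanity check of this closed form, as a sketch on hypothetical values: the projection z = wᵀ(x − m) does at least as well as every candidate z on a dense grid.

```python
import numpy as np

x_F = np.asarray([2.0, 1.0])
m_F = np.asarray([1.0, 0.0])
w_F = np.asarray([0.6, 0.8])   # unit vector: 0.36 + 0.64 = 1

# Closed-form optimal score: project the centered x onto w
z_star = float(w_F @ (x_F - m_F))

def recon_error(z):
    """Squared reconstruction error (x - (w z + m))^T (x - (w z + m))."""
    diff_F = x_F - (w_F * z + m_F)
    return float(diff_F @ diff_F)

# No candidate z on a dense grid beats the closed-form solution
for z in np.linspace(-5.0, 5.0, 1001):
    assert recon_error(z_star) <= recon_error(z) + 1e-12
print("optimal score z* =", z_star)
```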