Principal Components Analysis
David Benjamin, Broad DSDE Methods
February 10, 2016
What is PCA?
PCA turns high-dimensional data into low-dimensional data by throwing out directions with low variance. Keep the high-variance y direction, throw out the low-variance x direction. Assumption: the noise is smaller than the signal.
What about correlations?
PCA turns high-dimensional data into low-dimensional data by throwing out directions with low variance. Find the pink and green axes. Throw out the pink component. The resulting low-dimensional data is the projection onto the green axis.
Covariance matrix
Σ_ij = (1/N) ∑_n (x_ni − µ_i)(x_nj − µ_j), which is ≠ 0 if x_i and x_j are correlated.
Figure: uncorrelated axes, Σ = [ Σ_xx 0 ; 0 Σ_yy ] (diagonal). Figure: correlated axes, Σ = [ Σ_xx Σ_xy ; Σ_xy Σ_yy ] with Σ_xy > 0.
We want coordinates that make Σ diagonal.
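To make the correlated-axes picture concrete, here is a minimal numpy sketch (synthetic data and variable names of my choosing, not from the slides): correlated coordinates give a nonzero off-diagonal entry in Σ, and rotating into the eigenvector basis makes Σ diagonal.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic correlated 2-D data: the second coordinate mostly follows the first.
    x = rng.normal(size=1000)
    y = 0.8 * x + 0.2 * rng.normal(size=1000)
    data = np.column_stack([x, y])            # shape (N, 2)

    sigma = np.cov(data, rowvar=False)        # off-diagonal entry is clearly nonzero
    print(sigma)

    # Rotate into the eigenvector basis (the "green" and "pink" axes): Σ becomes diagonal.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    rotated = (data - data.mean(axis=0)) @ eigvecs
    print(np.cov(rotated, rowvar=False))      # off-diagonal entries ≈ 0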
PCA recipe
Coordinates (principal components) that make Σ diagonal are the eigenvectors of Σ.
PCA recipe:
Calculate the covariance matrix Σ.
Find eigenvectors v_k and eigenvalues λ_k such that Σ v_k = λ_k v_k.
λ_k is the variance in the v_k direction.
Use a heuristic to choose K eigenvectors to keep.
Data is now K-dimensional: x ≈ µ + ∑_{k=1}^{K} c_k v_k, where c_k = (x − µ) · v_k.
Generative model: x = µ + ∑_{k=1}^{K} c_k v_k + noise.
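A minimal numpy sketch of the recipe above (the function and variable names are mine, not from the slides): compute Σ, keep the top-K eigenvectors, and represent each sample by its coefficients c_k.

    import numpy as np

    def pca(X, K):
        """X: (N, D) data matrix. Returns the mean, top-K eigenvectors, their variances, and coefficients."""
        mu = X.mean(axis=0)
        sigma = np.cov(X, rowvar=False)              # D x D covariance matrix
        eigvals, eigvecs = np.linalg.eigh(sigma)     # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][:K]        # indices of the K largest eigenvalues
        V = eigvecs[:, order]                        # columns are the principal components v_k
        C = (X - mu) @ V                             # c_k = (x - mu) . v_k for every sample
        return mu, V, eigvals[order], C

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # correlated 10-D data
    mu, V, variances, C = pca(X, K=3)
    X_approx = mu + C @ V.T                          # x ≈ mu + sum_k c_k v_k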
Eigenfaces
Pixel images are very high-dimensional vectors. Run PCA and look at the principal components...
Clockwise from top left: full head of hair, sunken eyes, war paint, your interpretation goes here.
Not strictly “eigenfaces,” but eigen-variation in faces relative to average face.
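The eigenfaces picture is just the recipe applied to images flattened into vectors. Below is a rough sketch with random arrays standing in for a real face dataset (the shapes and names are assumptions, not the data behind the figure); each top eigenvector reshapes back into an image showing one mode of variation around the mean face.

    import numpy as np

    rng = np.random.default_rng(0)
    height, width = 32, 32
    faces = rng.random(size=(200, height * width))        # stand-in for 200 flattened face images

    mu = faces.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(faces, rowvar=False))

    top4 = eigvecs[:, ::-1][:, :4]                        # the four largest-variance components
    eigenface_images = top4.T.reshape(4, height, width)   # each PC viewed as an image
    mean_face = mu.reshape(height, width)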
PCA map of Europe
Data: x_ni = genotype (0, 1, 2) of SNP i in person n.
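A sketch of the same computation for genotype data (simulated genotypes, not the actual study data). Since there are far more SNPs than people, it is cheaper to get the PCs from an SVD of the centered matrix than to form the full SNP-by-SNP covariance matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_snps = 300, 5000
    genotypes = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)   # x_ni in {0, 1, 2}

    centered = genotypes - genotypes.mean(axis=0)

    # Right singular vectors of the centered matrix are the eigenvectors of Σ.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    person_coords = centered @ Vt[:2].T      # each person's position on PC1 and PC2 (the "map")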
PCA map of Europe
Applications:
Classification / genealogy
Population stratification in GWAS (regress against PCs; see the sketch below)
Do the PCs correspond to the map suspiciously well? Why do the genes of a population migrating north keep going straight along the first PC? Why is Hungary - Austria parallel to Switzerland - France?
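For the stratification application, "regress against PCs" means including each person's PC coordinates as covariates so that ancestry does not masquerade as a SNP-phenotype association. A rough least-squares sketch under assumed, hypothetical variable names (phenotype, snp, person_coords):

    import numpy as np

    def snp_effect_adjusted_for_ancestry(phenotype, snp, person_coords):
        """Fit phenotype ~ intercept + SNP genotype + top PCs; return the SNP coefficient."""
        design = np.column_stack([np.ones_like(snp), snp, person_coords])
        beta, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
        return beta[1]      # SNP effect after adjusting for population structure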
Copy number variation from exome capture
Crash course in exome capture:
Get DNA.
Exon DNA hybridizes to baits; throw out the remaining DNA.
Sequence the exon DNA.
Copy number variation:
Align sequenced DNA to the reference genome.
Count the number of reads from each exon.
More (fewer) reads implies duplication (deletion).
Copy number variation from exome capture
x = µ + ∑_k (v_k^⊤ x) v_k + copy number signal
⇒ copy number signal = x − µ − ∑_k (v_k^⊤ x) v_k
The PCs v_k come from non-tumor samples with no CNVs!
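A hedged numpy sketch of that subtraction (names and shapes are mine): learn the PCs from a panel of CNV-free normal samples, remove the part of a case sample's per-exon coverage that those PCs explain, and what remains is the copy number signal. Here the projection uses the centered profile x − µ, a common variant of the formula above.

    import numpy as np

    def copy_number_signal(case_coverage, normal_coverage, K):
        """case_coverage: (D,) per-exon coverage of one case sample.
           normal_coverage: (N, D) coverage of a panel of non-tumor, CNV-free samples."""
        mu = normal_coverage.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(normal_coverage, rowvar=False))
        V = eigvecs[:, np.argsort(eigvals)[::-1][:K]]     # top-K PCs of the normal panel

        centered = case_coverage - mu
        systematic = V @ (V.T @ centered)                 # the part explained by the normals
        return centered - systematic                      # what is left: the CNV signal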
Pitfalls
PCs might not be good for classification.
Low-dimensional space might be non-linear.
Non-issue: Σ is a big matrix. (Use iterative PCA, FastPCA, flashpca...)
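On the "big Σ" point: iterative methods find the top few PCs from matrix-vector products alone, without ever forming or storing the D x D covariance matrix. A minimal power-iteration sketch for the leading PC (an illustration of the idea, not what FastPCA or flashpca actually implement):

    import numpy as np

    def leading_pc(X, n_iter=100, seed=0):
        """Power iteration for the top eigenvector of cov(X) using only products with X."""
        centered = X - X.mean(axis=0)
        v = np.random.default_rng(seed).normal(size=X.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            # Σ v = (1/N) centered.T @ (centered @ v), computed without forming Σ.
            v = centered.T @ (centered @ v) / len(centered)
            v /= np.linalg.norm(v)
        return v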
Generalizations
x = µ + ∑_k c_k v_k + noise is part of a larger model: probabilistic PCA (see the sketch below).
Don’t like heuristics for choosing the number of PCs to use: Bayesian PCA.
Data are not linear: nonlinear dimensionality reduction (t-SNE, autoencoders, GPLVM, Isomap, SOM...)
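For probabilistic PCA, the maximum-likelihood fit has a closed form in terms of the same eigendecomposition. A sketch in the style of Tipping and Bishop's solution, meant as an illustration rather than a reference implementation: the noise variance is the average discarded eigenvalue, and the loading matrix rescales the kept eigenvectors.

    import numpy as np

    def ppca_ml(X, K):
        """Closed-form maximum-likelihood probabilistic PCA: x = mu + W z + noise, z ~ N(0, I)."""
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]        # sort by decreasing variance

        noise_var = eigvals[K:].mean()                            # average of the discarded eigenvalues
        W = eigvecs[:, :K] * np.sqrt(np.maximum(eigvals[:K] - noise_var, 0.0))
        return mu, W, noise_var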
Equations
Find the direction (unit vector) v of greatest variance. The projection of x onto v is x^⊤ v.
σ² = (1/N) ∑_n (x_n^⊤ v − µ^⊤ v)² = (1/N) ∑_n ((x_n − µ)^⊤ v)²
   = v^⊤ [ (1/N) ∑_n (x_n − µ)(x_n − µ)^⊤ ] v = v^⊤ Σ v
Set ∇_v = 0 with a Lagrange multiplier enforcing v^⊤ v = 1:
∇_v [ v^⊤ Σ v + λ (1 − v^⊤ v) ] = 0 ⇒ Σ v = λ v
Dotting with v^⊤ gives λ = λ v^⊤ v = v^⊤ Σ v = σ².
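A quick numerical check of this result (synthetic data, names mine): the leading eigenvector attains a larger v^⊤ Σ v than any random unit vector.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))
    sigma = np.cov(X, rowvar=False)

    eigvals, eigvecs = np.linalg.eigh(sigma)
    v_top = eigvecs[:, -1]                              # eigenvector with the largest eigenvalue

    dirs = rng.normal(size=(10000, 5))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)          # random unit vectors
    variances = np.einsum('nd,de,ne->n', dirs, sigma, dirs)      # v^T Σ v for each direction

    assert v_top @ sigma @ v_top >= variances.max()              # λ_max is the largest variance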