Class Averaging and Symmetry Detection in Cryo-EM Amit Singer Princeton University Department of Mathematics and Program in Applied and Computational Mathematics July 25, 2014 Amit Singer (Princeton University) July 2014 1 / 29
Image denoising by vector diffusion maps
References: S, Zhao, Shkolnisky, Hadani (SIIMS 2011); Hadani, S (FoCM 2011); S, Wu (Comm. Pure Appl. Math. 2012); Zhao, S (J. Struct. Biol. 2014).
Generalization of Laplacian Eigenmaps (Belkin, Niyogi 2003) and Diffusion Maps (Coifman, Lafon 2006).
Introduced the graph Connection Laplacian.
Figure: experimental images (70S), courtesy of Dr. Joachim Frank (Columbia), alongside class averages by vector diffusion maps (averaging with 20 nearest neighbors).
Class Averaging in Cryo-EM: Improve SNR
Clustering method (Penczek, Zhu, Frank 1996)
Projection images $P_1, P_2, \ldots, P_n$ with unknown rotations $R_1, R_2, \ldots, R_n \in SO(3)$.
Rotationally Invariant Distances (RID):
$$d_{RID}(i,j) = \min_{O \in SO(2)} \| P_i - O P_j \|$$
Cluster the images using K-means.
Images are not centered; it is also possible to include translations and to optimize over the special Euclidean group.
Problem with this approach: outliers. At low SNR, images with completely different viewing directions may have a relatively small $d_{RID}$ (the noise aligns well, instead of the underlying signal).
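The minimization over $SO(2)$ can be sketched numerically. The example below is a minimal illustration, not the method used in the talk: it assumes each image has already been resampled onto a polar grid (rows are radii, columns are equally spaced angles), so that an in-plane rotation becomes a cyclic shift of the columns and the minimum is taken over the discrete set of shifts. The function name `rid_polar` and the grid sizes are hypothetical, and radial quadrature weights are ignored.

```python
import numpy as np

def rid_polar(p_i, p_j):
    """Discrete rotationally invariant distance on a polar grid.

    Assumption: p_i and p_j are (n_r, n_theta) arrays, so rotating an image
    in-plane corresponds to np.roll along axis 1.
    Returns (distance, best_shift), where best_shift is the roll applied to p_j.
    """
    # ||P_i - O P_j||^2 = ||P_i||^2 + ||P_j||^2 - 2 <P_i, O P_j>,
    # so minimizing the distance means maximizing the correlation.
    f_i = np.fft.fft(p_i, axis=1)
    f_j = np.fft.fft(p_j, axis=1)
    # corr[s] = <p_i, np.roll(p_j, s, axis=1)>, computed for all s at once.
    corr = np.fft.ifft(f_i * np.conj(f_j), axis=1).real.sum(axis=0)
    s = int(np.argmax(corr))
    d2 = np.sum(p_i**2) + np.sum(p_j**2) - 2.0 * corr[s]
    return np.sqrt(max(d2, 0.0)), s

rng = np.random.default_rng(0)
p = rng.standard_normal((8, 36))   # 8 radii, 36 angular samples
q = np.roll(p, 5, axis=1)          # the same image, rotated by 5 angular samples
dist, shift = rid_polar(q, p)
```

Using the FFT evaluates all candidate rotations in one pass, which is one common way to avoid an explicit loop over angles.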
Outliers: Small World Graph on $S^2$
Define the graph $G = (V, E)$ by $\{i,j\} \in E \iff d_{RID}(i,j) \le \varepsilon$.
(Figure with labels $R_i^1$, $R_i^2$, $R_i^3$.)
Optimal rotation angles:
$$O_{ij} = \operatorname*{argmin}_{O \in SO(2)} \| P_i - O P_j \|, \quad i,j = 1,\ldots,n.$$
Triplet consistency relation (good triangles): $O_{ij} O_{jk} O_{ki} \approx I_{2\times 2}$.
How to use the information in the optimal rotations in a systematic way? Vector Diffusion Maps: "non-local means with rotations".
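The triplet consistency relation can be checked directly on $2 \times 2$ rotation matrices. The sketch below is illustrative only: the helper names `rot2` and `triplet_inconsistency` and the sample angles are assumptions, and the "bad" triangle is made by replacing one rotation with an arbitrary one.

```python
import numpy as np

def rot2(theta):
    """2x2 rotation matrix for the in-plane angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def triplet_inconsistency(O_ij, O_jk, O_ki):
    """Frobenius distance of the cycle product O_ij O_jk O_ki from the identity."""
    return np.linalg.norm(O_ij @ O_jk @ O_ki - np.eye(2), "fro")

# Consistent rotations arise from per-image angles a_i via O_ij = R(a_i - a_j).
angles = np.array([0.3, 1.1, 2.0])

def O(i, j):
    return rot2(angles[i] - angles[j])

good = triplet_inconsistency(O(0, 1), O(1, 2), O(2, 0))
# Replace one edge by an unrelated rotation to mimic an outlier edge.
bad = triplet_inconsistency(O(0, 1), O(1, 2), rot2(1.0))
```

A small value indicates a good triangle; an edge created by noise alignment typically breaks the cycle.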
Vector Diffusion Maps: Setup
(Figure: nodes $i$ and $j$ joined by an edge labeled with the weight $w_{ij}$ and the transformation $O_{ij}$.)
In VDM, the relationships between data points (e.g., cryo-EM images) are represented as a weighted graph, where the weights $w_{ij}$ describing affinities between data points are accompanied by linear orthogonal transformations $O_{ij}$.
Manifold Learning: Point cloud in $\mathbb{R}^p$
$x_1, x_2, \ldots, x_n \in \mathbb{R}^p$.
Manifold assumption: $x_1, \ldots, x_n \in \mathcal{M}^d$, with $d \ll p$.
Local Principal Component Analysis (PCA) gives an approximate orthonormal basis $O_i$ for the tangent space $T_{x_i}\mathcal{M}$.
$O_i$ is a $p \times d$ matrix with orthonormal columns: $O_i^T O_i = I_{d \times d}$.
Alignment:
$$O_{ij} = \operatorname*{argmin}_{O \in O(d)} \| O - O_i^T O_j \|_{HS}$$
(computed through the singular value decomposition of $O_i^T O_j$).
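The alignment step has a closed form: if $O_i^T O_j = U \Sigma V^T$ is the singular value decomposition, the closest orthogonal matrix in the Hilbert-Schmidt (Frobenius) norm is $U V^T$. A minimal sketch follows; the function name `align_bases` and the synthetic bases are assumptions made for illustration.

```python
import numpy as np

def align_bases(O_i, O_j):
    """Closest orthogonal d x d matrix to O_i^T O_j in the Frobenius norm."""
    U, _, Vt = np.linalg.svd(O_i.T @ O_j)
    return U @ Vt

rng = np.random.default_rng(1)
p, d = 5, 2
# Synthetic case: both bases span the same d-dimensional subspace of R^p
# and differ only by a rotation Q of the basis vectors.
B, _ = np.linalg.qr(rng.standard_normal((p, d)))   # p x d, orthonormal columns
theta = 0.4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
O_i, O_j = B, B @ Q
O_ij = align_bases(O_i, O_j)   # recovers Q, since O_i^T O_j = Q here
```

When the two tangent planes differ, $O_i^T O_j$ is no longer orthogonal and the SVD step projects it back onto $O(d)$.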
Parallel Transport
$O_{ij}$ approximates the parallel transport operator $P_{x_i, x_j} : T_{x_j}\mathcal{M} \to T_{x_i}\mathcal{M}$.
Laplacian Eigenmap (Belkin and Niyogi 2003) and Diffusion Map (Coifman and Lafon 2006)
Symmetric $n \times n$ matrix $W_0$:
$$W_0(i,j) = \begin{cases} w_{ij} & (i,j) \in E, \\ 0 & (i,j) \notin E. \end{cases}$$
Diagonal matrix $D_0$ of the same size:
$$D_0(i,i) = \deg(i) = \sum_{j : (i,j) \in E} w_{ij}.$$
Graph Laplacian, normalized graph Laplacian and the random walk matrix:
$$L_0 = D_0 - W_0, \quad \mathcal{L}_0 = I - D_0^{-1/2} W_0 D_0^{-1/2}, \quad A_0 = D_0^{-1} W_0.$$
The diffusion map $\Phi_t$ is defined in terms of the eigenvectors of $A_0$:
$$A_0 \phi_l = \lambda_l \phi_l, \quad l = 1, \ldots, n, \qquad \Phi_t : i \mapsto \left( \lambda_l^t \phi_l(i) \right)_{l=1}^{n}.$$
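As a rough numerical companion to these definitions, the sketch below computes a truncated diffusion map with NumPy. It is an illustration under stated assumptions: the Gaussian kernel, the point cloud, the function name `diffusion_map` and the truncation to `k` coordinates are not from the slides, and `t` is taken to be a non-negative integer. The eigenvectors of $A_0$ are obtained from the symmetric matrix $D_0^{-1/2} W_0 D_0^{-1/2}$, which is similar to $A_0$.

```python
import numpy as np

def diffusion_map(W0, t, k):
    """Top-k diffusion map coordinates from a symmetric affinity matrix W0."""
    deg = W0.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # S = D0^{-1/2} W0 D0^{-1/2} is symmetric and similar to A0 = D0^{-1} W0.
    S = d_inv_sqrt[:, None] * W0 * d_inv_sqrt[None, :]
    lam, u = np.linalg.eigh(S)
    order = np.argsort(-np.abs(lam))[:k]
    lam, u = lam[order], u[:, order]
    # If S u = lam u, then phi = D0^{-1/2} u satisfies A0 phi = lam phi.
    phi = d_inv_sqrt[:, None] * u
    return lam, phi, (lam ** t) * phi

rng = np.random.default_rng(2)
x = rng.standard_normal((30, 2))                       # hypothetical point cloud
sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
W0 = np.exp(-sq_dists / 2.0)                           # assumed Gaussian kernel
lam, phi, embedding = diffusion_map(W0, t=2, k=3)
A0 = W0 / W0.sum(axis=1, keepdims=True)                # random walk matrix
```

Working with the symmetric matrix allows `np.linalg.eigh`, which returns real eigenvalues and orthonormal eigenvectors.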
Vector diffusion mapping: $W_1$ and $D_1$
Symmetric $nd \times nd$ matrix $W_1$:
$$W_1(i,j) = \begin{cases} w_{ij} O_{ij} & (i,j) \in E, \\ 0_{d \times d} & (i,j) \notin E. \end{cases}$$
$W_1$ consists of $n \times n$ blocks, each of which is of size $d \times d$.
Diagonal matrix $D_1$ of the same size, where the diagonal $d \times d$ blocks are scalar matrices with the weighted degrees:
$$D_1(i,i) = \deg(i)\, I_{d \times d}, \quad \text{and} \quad \deg(i) = \sum_{j : (i,j) \in E} w_{ij}.$$
$A_1 = D_1^{-1} W_1$ is an averaging operator for vector fields
The matrix $A_1$ can be applied to vectors $v$ of length $nd$, which we regard as $n$ vectors of length $d$, such that $v(i)$ is a vector in $\mathbb{R}^d$ viewed as a vector in $T_{x_i}\mathcal{M}$.
The matrix $A_1 = D_1^{-1} W_1$ is an averaging operator for vector fields, since
$$(A_1 v)(i) = \frac{1}{\deg(i)} \sum_{j : (i,j) \in E} w_{ij} O_{ij} v(j).$$
This implies that the operator $A_1$ transports vectors from the nearby tangent spaces $T_{x_j}\mathcal{M}$ to $T_{x_i}\mathcal{M}$ and then averages the transported vectors in $T_{x_i}\mathcal{M}$.
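The block structure of $W_1$ and the averaging action of $A_1$ can be made concrete on a tiny graph. The sketch below is a hypothetical example with $d = 2$ and three nodes; the helper names `build_W1` and `apply_A1`, the edge weights and the angles are all assumptions. The rotations are chosen to be mutually consistent, $O_{ij} = R(a_i - a_j)$, so the vector field $v(i) = (\cos a_i, \sin a_i)$ is transported onto itself and is left unchanged by the averaging.

```python
import numpy as np

def rot2(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def build_W1(n, d, edges):
    """edges maps (i, j) with i < j to (w_ij, O_ij); O_ji is set to O_ij^T."""
    W1 = np.zeros((n * d, n * d))
    deg = np.zeros(n)
    for (i, j), (w, O) in edges.items():
        W1[i*d:(i+1)*d, j*d:(j+1)*d] = w * O
        W1[j*d:(j+1)*d, i*d:(i+1)*d] = w * O.T
        deg[i] += w
        deg[j] += w
    return W1, deg

def apply_A1(W1, deg, v, d):
    """(A1 v)(i) = (1/deg(i)) * sum_j w_ij O_ij v(j), with D1 = deg(i) I_d blocks."""
    return (W1 @ v) / np.repeat(deg, d)

n, d = 3, 2
a = np.array([0.3, 1.1, 2.0])                 # hypothetical per-node angles
edges = {(0, 1): (1.0, rot2(a[0] - a[1])),
         (0, 2): (0.5, rot2(a[0] - a[2])),
         (1, 2): (2.0, rot2(a[1] - a[2]))}
W1, deg = build_W1(n, d, edges)
v = np.concatenate([[np.cos(ai), np.sin(ai)] for ai in a])
Av = apply_A1(W1, deg, v, d)
```

With inconsistent rotations the transported vectors partially cancel, and the averaged field shrinks in norm.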
Affinity between nodes based on consistency of transformations
In the VDM framework, we define the affinity between $i$ and $j$ by considering all paths of length $t$ connecting them, but instead of just summing the weights of all paths, we sum the transformations.
Every path from $j$ to $i$ may result in a different transformation (like parallel transport due to curvature).
When adding transformations of different paths, cancellations may happen.
We define the affinity between $i$ and $j$ as the consistency between these transformations.
$A_1 = D_1^{-1} W_1$ is similar to the symmetric matrix $\tilde{W}_1$:
$$\tilde{W}_1 = D_1^{-1/2} W_1 D_1^{-1/2}.$$
Formally, the affinity between $i$ and $j$ is
$$\| \tilde{W}_1^{2t}(i,j) \|_{HS}^2 = \frac{\deg(i)}{\deg(j)} \| (D_1^{-1} W_1)^{2t}(i,j) \|_{HS}^2.$$
Embedding into a Hilbert Space
Since $\tilde{W}_1$ is symmetric, it has a complete set of eigenvectors $\{v_l\}_{l=1}^{nd}$ and eigenvalues $\{\lambda_l\}_{l=1}^{nd}$ (ordered as $|\lambda_1| \ge |\lambda_2| \ge \ldots \ge |\lambda_{nd}|$).
Spectral decompositions of $\tilde{W}_1$ and $\tilde{W}_1^{2t}$:
$$\tilde{W}_1(i,j) = \sum_{l=1}^{nd} \lambda_l v_l(i) v_l(j)^T, \quad \text{and} \quad \tilde{W}_1^{2t}(i,j) = \sum_{l=1}^{nd} \lambda_l^{2t} v_l(i) v_l(j)^T,$$
where $v_l(i) \in \mathbb{R}^d$ for $i = 1, \ldots, n$ and $l = 1, \ldots, nd$.
The HS norm of $\tilde{W}_1^{2t}(i,j)$ is calculated using the trace:
$$\| \tilde{W}_1^{2t}(i,j) \|_{HS}^2 = \sum_{l,r=1}^{nd} (\lambda_l \lambda_r)^{2t} \langle v_l(i), v_r(i) \rangle \langle v_l(j), v_r(j) \rangle.$$
The affinity $\| \tilde{W}_1^{2t}(i,j) \|_{HS}^2 = \langle V_t(i), V_t(j) \rangle$ is an inner product for the finite-dimensional Hilbert space $\mathbb{R}^{(nd)^2}$ via the mapping $V_t$:
$$V_t : i \mapsto \left( (\lambda_l \lambda_r)^t \langle v_l(i), v_r(i) \rangle \right)_{l,r=1}^{nd}.$$
Vector Diffusion Distance
The vector diffusion mapping is defined as
$$V_t : i \mapsto \left( (\lambda_l \lambda_r)^t \langle v_l(i), v_r(i) \rangle \right)_{l,r=1}^{nd}.$$
The vector diffusion distance between nodes $i$ and $j$ is denoted $d_{VDM,t}(i,j)$ and is defined as
$$d_{VDM,t}^2(i,j) = \langle V_t(i), V_t(i) \rangle + \langle V_t(j), V_t(j) \rangle - 2 \langle V_t(i), V_t(j) \rangle.$$
Other normalizations of the matrix $W_1$ are possible and lead to slightly different embeddings and distances (similar to diffusion maps).
The matrices $I - \tilde{W}_1$ and $I + \tilde{W}_1$ are positive semidefinite, because
$$v^T \left( I \pm D_1^{-1/2} W_1 D_1^{-1/2} \right) v = \sum_{(i,j) \in E} \left\| \frac{v(i)}{\sqrt{\deg(i)}} \pm \frac{w_{ij} O_{ij} v(j)}{\sqrt{\deg(j)}} \right\|^2 \ge 0$$
for any $v \in \mathbb{R}^{nd}$. Therefore, $\lambda_l \in [-1, 1]$.
As a result, the vector diffusion mapping and distances can be well approximated by using only the few largest eigenvalues and their corresponding eigenvectors.
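The mapping $V_t$ and the resulting distance can be sketched in a few lines of NumPy. This is a small illustrative example, not the implementation behind the reported results: the function name `vdm_embedding`, the random complete graph with $d = 2$, and the truncation parameter `m` (the number of eigenpairs kept) are assumptions, and `t` is taken to be a non-negative integer. With `m = n*d` nothing is truncated, so the inner products reproduce $\| \tilde{W}_1^{2t}(i,j) \|_{HS}^2$ up to rounding.

```python
import numpy as np

def vdm_embedding(W1, deg, d, t, m):
    """Rows are V_t(i), built from the m eigenpairs of largest |lambda|."""
    n = len(deg)
    s = 1.0 / np.sqrt(np.repeat(deg, d))
    W_tilde = s[:, None] * W1 * s[None, :]        # D1^{-1/2} W1 D1^{-1/2}
    lam, v = np.linalg.eigh(W_tilde)
    order = np.argsort(-np.abs(lam))[:m]
    lam, v = lam[order], v[:, order]
    blocks = v.reshape(n, d, m)                   # blocks[i][:, l] is v_l(i)
    gram = np.einsum("idl,idr->ilr", blocks, blocks)   # <v_l(i), v_r(i)>
    return (gram * np.outer(lam, lam) ** t).reshape(n, m * m)

# Hypothetical test graph: complete graph, random weights and rotations.
rng = np.random.default_rng(3)
n, d, t = 5, 2, 1
W1 = np.zeros((n * d, n * d))
deg = np.zeros(n)
for i in range(n):
    for j in range(i + 1, n):
        w, th = rng.uniform(0.5, 1.0), rng.uniform(0.0, 2.0 * np.pi)
        O = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
        W1[i*d:(i+1)*d, j*d:(j+1)*d] = w * O
        W1[j*d:(j+1)*d, i*d:(i+1)*d] = w * O.T
        deg[i] += w
        deg[j] += w

V = vdm_embedding(W1, deg, d, t, m=n * d)
affinity = V @ V.T                                # <V_t(i), V_t(j)>
diag = np.diag(affinity)
dist2 = diag[:, None] + diag[None, :] - 2.0 * affinity   # d_VDM,t squared
```

In practice `m` would be much smaller than `n*d`, relying on the decay of $|\lambda_l|^{2t}$ noted above.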
Application to the class averaging problem in Cryo-EM (S, Zhao, Shkolnisky, Hadani 2011)
Figure: SNR = 1/64. Histogram of the angles (x-axis, in degrees) between the viewing directions of each image (out of 40000) and its 40 neighboring images. (a) Left: neighbors are identified using the original rotationally invariant distances $d_{RID}$. (b) Right: neighbors are identified using the vector diffusion distances $d_{VDM,t=2}$.
Computational Aspects
Zhao, S (J. Struct. Biol. 2014)
Naïve implementation requires $O(n^2)$ rotational alignments of images.
Rotationally invariant representation of images: "bispectrum".
Dimensionality reduction using a randomized algorithm for PCA (Rokhlin, Liberty, Tygert, Martinsson, Halko, Tropp, Szlam, ...).
Randomized approximate nearest neighbors search in nearly linear time (Jones, Osipov, Rokhlin 2011).
Rotational Invariance: Bispectrum
Bispectrum for a 1D periodic signal $f$:
$$b_f(k_1, k_2) = \hat{f}(k_1)\, \hat{f}(k_2)\, \hat{f}(-(k_1 + k_2))$$
Bispectrum is shift invariant, complete and unbiased.
Phase information is preserved (unlike the power spectrum).
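Shift invariance of the bispectrum is easy to confirm numerically for a discrete periodic signal. The sketch below is a minimal check, assuming the signal is a length-$n$ real array and using the discrete Fourier transform in place of $\hat{f}$; the function name `bispectrum_1d` is hypothetical. A cyclic shift multiplies $\hat{f}(k)$ by a phase that is linear in $k$, and the three phases in the product cancel because the frequencies sum to zero.

```python
import numpy as np

def bispectrum_1d(f):
    """b_f(k1, k2) = F(k1) F(k2) F(-(k1 + k2)) for all DFT frequency pairs."""
    F = np.fft.fft(f)
    n = len(f)
    k = np.arange(n)
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    return F[k1] * F[k2] * F[(-(k1 + k2)) % n]   # indices taken modulo n

rng = np.random.default_rng(4)
f = rng.standard_normal(16)
g = np.roll(f, 3)                  # cyclically shifted copy of f
b_f = bispectrum_1d(f)
b_g = bispectrum_1d(g)             # equal to b_f up to rounding
```

For images on a polar grid, an in-plane rotation is a cyclic shift in the angular coordinate, which is why a shift-invariant representation gives rotational invariance.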