Three-dimensional structure determination of molecules without crystallization: from electron microscopy to semidefinite programming

Amit Singer
Princeton University, Department of Mathematics and PACM
February 13, 2014
Single Particle Cryo-Electron Microscopy

[Figure: drawing of the imaging process.]
Single Particle Cryo-Electron Microscopy: Model

[Figure: electron source, molecule $\phi$, and the projection images $I_i$.]

Each image $I_i$ corresponds to an unknown rotation $R_i \in SO(3)$ with columns

$$R_i = \begin{pmatrix} | & | & | \\ R_i^1 & R_i^2 & R_i^3 \\ | & | & | \end{pmatrix}.$$

Projection images:

$$I_i(x, y) = \int_{-\infty}^{\infty} \phi\left(x R_i^1 + y R_i^2 + z R_i^3\right) dz + \text{``noise''}.$$

$\phi : \mathbb{R}^3 \to \mathbb{R}$ is the electric potential of the molecule.

Cryo-EM problem: find $\phi$ and $R_1, \ldots, R_n$ given $I_1, \ldots, I_n$.
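To make the image formation model concrete, here is a minimal numerical sketch of the projection operator, assuming a voxelized potential with crude nearest-neighbor sampling; the function `project` and all names are illustrative, not from any cryo-EM package:

```python
import numpy as np

def project(phi, R):
    """Integrate the volume phi along the rotated z-axis.

    phi : (N, N, N) array, electric potential sampled on a centered cube
    R   : (3, 3) rotation matrix with columns R^1, R^2, R^3
    Returns the (N, N) image I(x, y) = sum_z phi(x R^1 + y R^2 + z R^3).
    """
    N = phi.shape[0]
    grid = np.arange(N) - N // 2
    x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
    # Points of the rotated grid, expressed in the molecule's frame.
    pts = (x[..., None] * R[:, 0] + y[..., None] * R[:, 1]
           + z[..., None] * R[:, 2]) + N // 2
    # Nearest-neighbor lookup with clipping at the box boundary (crude).
    idx = np.clip(np.round(pts).astype(int), 0, N - 1)
    vals = phi[idx[..., 0], idx[..., 1], idx[..., 2]]
    return vals.sum(axis=2)  # integrate along the z direction

rng = np.random.default_rng(0)
phi = rng.random((32, 32, 32))            # stand-in for a molecule
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R) < 0:                  # force det = +1 so R is in SO(3)
    R[:, 2] *= -1
image = project(phi, R) + rng.standard_normal((32, 32))  # + "noise"
```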
Toy Example

[Figure.]
E. coli 50S ribosomal subunit: sample images

[Sample images courtesy of Fred Sigworth, Yale Medical School; movie by Lanhui Wang and Zhizhen (Jane) Zhao.]
Algorithmic Pipeline

- Particle Picking: manual, automatic, or experimental image segmentation.
- Class Averaging: classify images with similar viewing directions, then register and average them to improve their signal-to-noise ratio (SNR).
- Orientation Estimation: S, Shkolnisky, SIIMS 2011; Bandeira, Charikar, S, Zhu, ITCS 2014.
- Three-dimensional Reconstruction: a 3D volume is generated by a tomographic inversion algorithm.
- Iterative Refinement.

Assumptions for today's talk:

- Trivial point-group symmetry.
- Homogeneity: no structural variability.
Orientation Estimation: Fourier projection-slice theorem

[Figure: the Fourier transforms $\hat I_i$ and $\hat I_j$ of two projections are central slices of the 3D Fourier transform of $\phi$, and the two slices intersect along a common line.]

In image $i$ the common line has direction $c_{ij} = (x_{ij}, y_{ij}, 0)^T$, and in image $j$ it has direction $c_{ji} = (x_{ji}, y_{ji}, 0)^T$. In 3D Fourier space both map to the same line:

$$R_i c_{ij} = R_j c_{ji}.$$
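As a sanity check of this relation, one can draw two random rotations and verify that both images' common-line directions map to the same 3D line; a sketch, where the direction $u$ is the cross product of the viewing directions $R_i^3$ and $R_j^3$ (normals of the two Fourier slices):

```python
import numpy as np

def random_rotation(rng):
    R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(R) < 0:
        R[:, 2] *= -1        # force det = +1
    return R

rng = np.random.default_rng(1)
Ri, Rj = random_rotation(rng), random_rotation(rng)

# The Fourier plane of image i has normal R_i^3 (third column of R_i);
# the two planes intersect along u = R_i^3 x R_j^3.
u = np.cross(Ri[:, 2], Rj[:, 2])
u /= np.linalg.norm(u)

c_ij = Ri.T @ u   # common line in image i's coordinates: (x_ij, y_ij, 0)
c_ji = Rj.T @ u   # common line in image j's coordinates: (x_ji, y_ji, 0)

assert abs(c_ij[2]) < 1e-12 and abs(c_ji[2]) < 1e-12
assert np.allclose(Ri @ c_ij, Rj @ c_ji)   # both map back to u
```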
Angular Reconstitution (van Heel 1987; Vainshtein and Goncharov 1986)

[Figure.]
Experiments with simulated noisy projections

Each projection is 129 × 129 pixels, and

$$\mathrm{SNR} = \frac{\mathrm{Var(Signal)}}{\mathrm{Var(Noise)}}.$$

[Figure: the same projection at decreasing SNR; panels (a) Clean, (b) SNR $= 2^{0}$, (c) $2^{-1}$, (d) $2^{-2}$, (e) $2^{-3}$, (f) $2^{-4}$, (g) $2^{-5}$, (h) $2^{-6}$, (i) $2^{-7}$, (j) $2^{-8}$.]
Fraction of correctly identified common lines and the SNR

A common line is defined as correctly identified if both radial lines deviate by no more than 10° from the true directions. The fraction p of correctly identified common lines as a function of the SNR:

    log2(SNR)      p
        20       0.997
         0       0.980
        -1       0.956
        -2       0.890
        -3       0.764
        -4       0.575
        -5       0.345
        -6       0.157
        -7       0.064
        -8       0.028
        -9       0.019
Least Squares Approach

Consider the common-line directions as three-dimensional unit vectors:

$$c_{ij} = (x_{ij}, y_{ij}, 0)^T, \qquad c_{ji} = (x_{ji}, y_{ji}, 0)^T.$$

Being the common line of intersection, the mapping of $c_{ij}$ by $R_i$ must coincide with the mapping of $c_{ji}$ by $R_j$:

$$R_i c_{ij} = R_j c_{ji}, \quad \text{for } 1 \leq i < j \leq n \quad (R_i, R_j \in SO(3)).$$

Least squares:

$$\min_{R_1, R_2, \ldots, R_n \in SO(3)} \sum_{i \neq j} \| R_i c_{ij} - R_j c_{ji} \|^2$$

The search space is exponentially large and non-convex.
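For reference, a direct brute-force evaluation of this objective for candidate rotations might look like the following sketch, with `c[i][j]` holding $c_{ij}$ as a 3-vector (names are illustrative):

```python
import numpy as np

def ls_objective(rotations, c):
    """Sum of ||R_i c_ij - R_j c_ji||^2 over all ordered pairs i != j."""
    total = 0.0
    n = len(rotations)
    for i in range(n):
        for j in range(n):
            if i != j:
                r = rotations[i] @ c[i][j] - rotations[j] @ c[j][i]
                total += r @ r
    return total
```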
Quadratic Optimization Under Orthogonality Constraints

Quadratic cost: $\sum_{i \neq j} \| R_i c_{ij} - R_j c_{ji} \|^2$

Quadratic constraints: $R_i^T R_i = I_{3 \times 3}$ (the constraint $\det(R_i) = +1$ is ignored)

We approximate the solution using SDP and rounding. Related to:

- Goemans-Williamson (1995) SDP relaxation for MAX-CUT
- PhaseLift (Candes et al 2012)
- Generalized Orthogonal Procrustes Problem (Nemirovski 2007)
- Non-commutative Grothendieck Problem (Naor et al 2013)

"Robust" version, least unsquared deviations (Wang, S, Wen 2013):

$$\min_{R_1, R_2, \ldots, R_n \in SO(3)} \sum_{i \neq j} \| R_i c_{ij} - R_j c_{ji} \|$$
SDP Relaxation for the Common-Lines Problem

Least squares is equivalent to maximizing the sum of inner products:

$$\min_{R_1, \ldots, R_n \in SO(3)} \sum_{i \neq j} \| R_i c_{ij} - R_j c_{ji} \|^2 \iff \max_{R_1, \ldots, R_n \in SO(3)} \sum_{i \neq j} \langle R_i c_{ij}, R_j c_{ji} \rangle$$

$$\iff \max_{R_1, \ldots, R_n \in SO(3)} \sum_{i \neq j} \operatorname{Tr}(c_{ji} c_{ij}^T R_i^T R_j) \iff \max_{R_1, \ldots, R_n \in SO(3)} \operatorname{Tr}(CG)$$

$C$ is the $2n \times 2n$ matrix ("the common lines matrix") with $2 \times 2$ blocks, where $\tilde c_{ij} = (x_{ij}, y_{ij})^T$:

$$C_{ij} = \tilde c_{ji} \tilde c_{ij}^T = \begin{pmatrix} x_{ji} \\ y_{ji} \end{pmatrix} \begin{pmatrix} x_{ij} & y_{ij} \end{pmatrix} = \begin{pmatrix} x_{ji} x_{ij} & x_{ji} y_{ij} \\ y_{ji} x_{ij} & y_{ji} y_{ij} \end{pmatrix}, \qquad C_{ii} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

With $\tilde R_i$ the $3 \times 2$ matrix of the first two columns of $R_i$ and $\tilde R = (\tilde R_1 \; \tilde R_2 \; \cdots \; \tilde R_n)$, $G$ is the $2n \times 2n$ Gram matrix $G = \tilde R^T \tilde R$ with blocks $G_{ij} = \tilde R_i^T \tilde R_j$:

$$G = \begin{pmatrix} I_{2 \times 2} & \tilde R_1^T \tilde R_2 & \cdots & \tilde R_1^T \tilde R_n \\ \tilde R_2^T \tilde R_1 & I_{2 \times 2} & \cdots & \tilde R_2^T \tilde R_n \\ \vdots & \vdots & \ddots & \vdots \\ \tilde R_n^T \tilde R_1 & \tilde R_n^T \tilde R_2 & \cdots & I_{2 \times 2} \end{pmatrix}$$
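Assembling $C$ from detected common lines is straightforward; a sketch, with `c[i][j]` the 2-vector $\tilde c_{ij} = (x_{ij}, y_{ij})$ (names are illustrative):

```python
import numpy as np

def common_lines_matrix(c):
    """Build the 2n x 2n matrix C with blocks C_ij = c~_ji c~_ij^T, C_ii = 0."""
    n = len(c)
    C = np.zeros((2 * n, 2 * n))
    for i in range(n):
        for j in range(n):
            if i != j:
                C[2*i:2*i+2, 2*j:2*j+2] = np.outer(c[j][i], c[i][j])
    return C
```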
SDP Relaxation and Rounding

$$\max_{R_1, \ldots, R_n \in SO(3)} \operatorname{Tr}(CG)$$

SDP relaxation:

$$\max_{G \in \mathbb{R}^{2n \times 2n}} \operatorname{Tr}(CG) \quad \text{s.t.} \quad G \succeq 0, \; G_{ii} = I_{2 \times 2}, \; i = 1, 2, \ldots, n.$$

Missing is the non-convex constraint $\operatorname{rank}(G) = 3$. Rounding (see the sketch below):

- Randomize a $2n \times 3$ orthogonal matrix $Q$ using (careful) QR factorization of a $2n \times 3$ matrix with i.i.d. standard Gaussian entries.
- Compute the Cholesky decomposition $G = YY^T$.
- Round using SVD: $(YQ)_i = U_i \Sigma_i V_i^T \Rightarrow \tilde R_i^T = U_i V_i^T$.
- Use the cross product to recover the third column $R_i^3$.

Loss of handedness.
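A hedged sketch of the relaxation and rounding, using the generic SDP modeling package cvxpy. A factor of $G$ is taken here via an eigendecomposition rather than Cholesky, which is equivalent for this purpose and tolerates the slight numerical indefiniteness of solver output; all function names are illustrative:

```python
import numpy as np
import cvxpy as cp

def solve_sdp(C):
    """Solve max Tr(CG) s.t. G PSD, G_ii = I_{2x2}."""
    n2 = C.shape[0]                              # n2 = 2n
    G = cp.Variable((n2, n2), PSD=True)
    cons = []
    for i in range(0, n2, 2):                    # diagonal 2x2 blocks = I
        cons += [G[i, i] == 1, G[i + 1, i + 1] == 1, G[i, i + 1] == 0]
    cp.Problem(cp.Maximize(cp.trace(C @ G)), cons).solve()
    return G.value

def round_rotations(G, rng):
    """Round the SDP solution to rotations (up to a global rotation and
    the loss of handedness noted on the slide)."""
    n2 = G.shape[0]
    # Random 2n x 3 matrix with orthonormal columns, via QR of a Gaussian.
    Q, _ = np.linalg.qr(rng.standard_normal((n2, 3)))
    # Factor G ~ Y Y^T (eigendecomposition in place of Cholesky).
    w, V = np.linalg.eigh((G + G.T) / 2)
    Y = V * np.sqrt(np.clip(w, 0, None))
    rotations = []
    for i in range(0, n2, 2):
        U, _, Vt = np.linalg.svd((Y @ Q)[i:i + 2], full_matrices=False)
        Rt = U @ Vt                     # 2x3: rounded (R_i^1, R_i^2)^T
        third = np.cross(Rt[0], Rt[1])  # recover R_i^3 by the cross product
        rotations.append(np.column_stack([Rt[0], Rt[1], third]))
    return rotations
```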
Spectral Relaxation for Uniformly Distributed Rotations

Write the first two columns of each rotation as

$$\begin{pmatrix} | & | \\ R_i^1 & R_i^2 \\ | & | \end{pmatrix} = \begin{pmatrix} x_i^1 & x_i^2 \\ y_i^1 & y_i^2 \\ z_i^1 & z_i^2 \end{pmatrix}, \qquad i = 1, \ldots, n.$$

Define 3 vectors of length $2n$:

$$x = \left(x_1^1 \; x_1^2 \; x_2^1 \; x_2^2 \; \cdots \; x_n^1 \; x_n^2\right)^T$$
$$y = \left(y_1^1 \; y_1^2 \; y_2^1 \; y_2^2 \; \cdots \; y_n^1 \; y_n^2\right)^T$$
$$z = \left(z_1^1 \; z_1^2 \; z_2^1 \; z_2^2 \; \cdots \; z_n^1 \; z_n^2\right)^T$$

Rewrite the least squares objective function as

$$\max_{R_1, \ldots, R_n \in SO(3)} \sum_{i \neq j} \langle R_i c_{ij}, R_j c_{ji} \rangle = \max_{R_1, \ldots, R_n \in SO(3)} x^T C x + y^T C y + z^T C z$$

By symmetry, if the rotations are uniformly distributed over $SO(3)$, then the top eigenvalue of $C$ has multiplicity 3 and the corresponding eigenvectors are $x, y, z$, from which we recover $R_1, R_2, \ldots, R_n$!
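In code, the spectral method reduces to a single eigendecomposition; a sketch (the per-block SVD snaps each $3 \times 2$ block to the nearest frame with orthonormal columns, and the result is recovered only up to a global rotation/reflection, consistent with the loss of handedness noted earlier):

```python
import numpy as np

def spectral_rotations(C):
    """Recover rotations from the top three eigenvectors of C."""
    w, V = np.linalg.eigh(C)         # eigenvalues in ascending order
    top3 = V[:, -3:]                 # eigenvectors of the 3 largest eigenvalues
    rotations = []
    for i in range(0, C.shape[0], 2):
        A = top3[i:i + 2].T          # 3x2 block ~ (R_i^1, R_i^2), up to a
                                     # common orthogonal transform for all i
        U, _, Vt = np.linalg.svd(A, full_matrices=False)
        R12 = U @ Vt                 # nearest 3x2 with orthonormal columns
        R = np.column_stack([R12[:, 0], R12[:, 1],
                             np.cross(R12[:, 0], R12[:, 1])])
        rotations.append(R)
    return rotations
```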
Spectrum of C

Numerical simulation with $n = 1000$ rotations sampled from the Haar measure; no noise.

[Figure: bar plots of the positive (left) and negative (right) eigenvalues of $C$.]

Eigenvalues and multiplicities:

$$\lambda_l \approx \frac{(-1)^{l+1}}{l(l+1)}\, n, \quad l = 1, 2, 3, \ldots \quad \left(\tfrac{1}{2}, -\tfrac{1}{6}, \tfrac{1}{12}, \ldots\right), \qquad \text{multiplicities } 2l + 1.$$

Two basic questions:

1. Why this spectrum? Answer: representation theory of SO(3) (Hadani, S, 2011).
2. Is it stable to noise? Answer: yes, due to random matrix theory.
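A small simulation reproduces these predictions (a sketch, with $n$ reduced from 1000 to keep the dense eigendecomposition cheap); the Haar sampling uses the standard sign-corrected QR trick:

```python
import numpy as np

def haar_so3(rng):
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q * np.sign(np.diag(R))   # sign fix -> Haar measure on O(3)
    if np.linalg.det(Q) < 0:
        Q[:, 2] *= -1             # flip one column to land in SO(3)
    return Q

rng = np.random.default_rng(0)
n = 300
rots = [haar_so3(rng) for _ in range(n)]

# Clean common-lines matrix built from the true rotations.
C = np.zeros((2 * n, 2 * n))
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        u = np.cross(rots[i][:, 2], rots[j][:, 2])   # common-line direction
        u /= np.linalg.norm(u)
        cij, cji = (rots[i].T @ u)[:2], (rots[j].T @ u)[:2]
        C[2 * i:2 * i + 2, 2 * j:2 * j + 2] = np.outer(cji, cij)

w = np.sort(np.linalg.eigvalsh(C))
print(w[::-1][:4] / n)   # ~ [1/2, 1/2, 1/2, 1/12]: top eigenvalue n/2, mult. 3
print(w[:5] / n)         # ~ -1/6 with multiplicity 5 at the bottom
```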
Probabilistic Model and Wigner's Semi-Circle Law

Simplistic model: every common line is detected correctly with probability $p$, independently of all other common lines; with probability $1 - p$ the common lines are falsely detected and are uniformly distributed over the unit circle.

Let $C_{\text{clean}}$ be the matrix $C$ when all common lines are detected correctly ($p = 1$). The expected value of the noisy matrix $C$ is

$$\mathbb{E}[C] = p\, C_{\text{clean}},$$

as the contribution of the falsely detected common lines to the expected value vanishes.

Decompose $C$ as $C = p\, C_{\text{clean}} + W$, where $W$ is a $2n \times 2n$ zero-mean random matrix. The eigenvalues of $W$ are distributed according to Wigner's semi-circle law whose support, up to $O(p)$ terms and finite-sample fluctuations, is $[-\sqrt{2n}, \sqrt{2n}]$.
Threshold probability

Sufficient condition for the top three eigenvalues to be pushed away from the semi-circle with no other eigenvalue crossings (rank-1 and finite-rank deformed Wigner matrices; Füredi and Komlós 1981, Féral and Péché 2007, ...):

$$p\, \Delta(C_{\text{clean}}) > \frac{1}{2} \lambda_1(W)$$

The spectral gap $\Delta(C_{\text{clean}})$ and the spectral norm $\lambda_1(W)$ are given by

$$\Delta(C_{\text{clean}}) \approx \left(\frac{1}{2} - \frac{1}{12}\right) n \qquad \text{and} \qquad \lambda_1(W) \approx \sqrt{2n}.$$

Threshold probability:

$$p_c = \frac{6\sqrt{2}}{5\sqrt{n}}.$$
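Plugging in numbers, assuming the constants as reconstructed above (a back-of-the-envelope sketch):

```python
import numpy as np

# Threshold p_c = 6*sqrt(2) / (5*sqrt(n)) for a few problem sizes.
for n in (100, 1000, 10000):
    p_c = 6 * np.sqrt(2) / (5 * np.sqrt(n))
    print(f"n = {n:5d}: p_c ~ {p_c:.3f}")
# n = 1000 gives p_c ~ 0.054; by the earlier detection-rate table this
# falls between SNR = 2^-7 (p = 0.064) and SNR = 2^-8 (p = 0.028).
```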
Numerical Spectra of C, n = 1000

[Figure: eigenvalue histograms of $C$ for SNR $= 2^{0}, 2^{-1}, \ldots, 2^{-8}$, panels (a)-(i).]