CAPSID: Computational Algorithms for Protein Structures and Interactions (PowerPoint presentation)

SLIDE 1

CAPSID Computational Algorithms for Protein Structures and Interactions

David Ritchie and Isaure Chauvot de Beauchêne, Inria Nancy – Grand Est

SLIDE 2

Structural Bioinformatics Tools and Techniques

In-House Software

Hex – protein docking by spherical polar FFT
Sam – spherical polar FFT docking of symmetrical complexes
gEMfitter – cryo-EM protein density fitting by FFT on GPU
KBDOCK – database of 3D domain-domain interactions
Kpax – multiple flexible protein structure alignment

External Tools

Molecular dynamics simulation & modeling: NAMD, Modeller, ...


SLIDE 3

“Hex” – Spherical Polar Fourier Protein Docking

SPF approach => analytic translational + rotational correlations
Shape-based scoring function (surface skin overlap volume)
Can cover the 6D search space using 1D, 3D, or 5D rotational FFTs...
“Easy” to accelerate the 1D FFTs on highly parallel GPUs...
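The idea behind a skin-overlap shape score can be illustrated on a voxel grid. This is a minimal sketch, assuming a Katchalski-Katzir-style scheme: each protein has an interior and a surrounding surface "skin", skin/interior contacts are rewarded and interior/interior overlap is penalised. The spheres, the skin thickness and the penalty weight `q` are hypothetical, and Hex evaluates its overlaps analytically from SPF coefficients rather than on a grid.

```python
import numpy as np

def sphere_masks(shape, center, radius, skin):
    """Boolean (interior, skin) masks for a sphere on a voxel grid.

    Interior: distance <= radius; skin: radius < distance <= radius + skin.
    """
    grid = np.indices(shape).astype(float)
    c = np.asarray(center, dtype=float).reshape(3, 1, 1, 1)
    dist = np.sqrt(((grid - c) ** 2).sum(axis=0))
    interior = dist <= radius
    skin_layer = (dist > radius) & (dist <= radius + skin)
    return interior, skin_layer

def skin_overlap_score(int_a, skin_a, int_b, skin_b, q=11.0):
    """Reward skin/interior contacts, penalise interior/interior overlap (q is hypothetical)."""
    contact = np.sum(skin_a & int_b) + np.sum(int_a & skin_b)
    clash = np.sum(int_a & int_b)
    return float(contact - q * clash)

if __name__ == "__main__":
    shape = (64, 64, 64)
    int_a, skin_a = sphere_masks(shape, (16, 32, 32), radius=6, skin=3)
    for d in (0, 12, 40):  # fully overlapping, touching, far apart
        int_b, skin_b = sphere_masks(shape, (16 + d, 32, 32), radius=6, skin=3)
        print(d, skin_overlap_score(int_a, skin_a, int_b, skin_b))
```

Touching spheres score positive, interpenetrating spheres negative, and distant spheres zero, which is the qualitative behaviour a shape-complementarity score needs.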


SLIDE 4

Sam/Hex: Spherical Polar Fourier Basis Functions

Represent protein shape as a 3D shape-density function...

$$\tau(\mathbf{r}) = \sum_{nlm}^{N} a^{\tau}_{nlm} R_{nl}(r)\, y_{lm}(\theta,\phi)$$

...using spherical harmonic, $y_{lm}(\theta,\phi)$, and radial, $R_{nl}(r)$, basis functions

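Figure: protein shape representations
A: Gaussians
B: SPF expansion to order N = 16 (1,496 coefficients)
C: SPF expansion to order N = 25 (5,525 coefficients)
D: SPF expansion to order N = 30 (9,455 coefficients)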
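The coefficient counts in the figure can be reproduced by assuming the index ranges 1 ≤ n ≤ N, 0 ≤ l < n and −l ≤ m ≤ l. Each (n, l) pair then contributes 2l + 1 values of m, so order n contributes n² coefficients and the total is N(N+1)(2N+1)/6. A short sketch (function names are hypothetical):

```python
def spf_coefficient_count(order_n):
    """Count SPF coefficients a_nlm for expansion order N.

    Assumes 1 <= n <= N, 0 <= l < n and -l <= m <= l.
    """
    count = 0
    for n in range(1, order_n + 1):
        for l in range(n):
            count += 2 * l + 1  # number of m values for this l
    return count

def spf_coefficient_count_closed_form(order_n):
    # sum_{n=1}^{N} n^2 = N(N+1)(2N+1)/6
    return order_n * (order_n + 1) * (2 * order_n + 1) // 6

for order in (16, 25, 30):
    print(order, spf_coefficient_count(order), spf_coefficient_count_closed_form(order))
# 16 1496 1496
# 25 5525 5525
# 30 9455 9455
```

The cubic growth in N is why higher-order expansions, which resolve finer shape detail, quickly become expensive.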

SLIDE 5

Coordinate Operators and Docking Equations

Polar Fourier basis is “natural” for rotational search problems

Describe search space using operators

Rotation: $\hat R(\alpha,\beta,\gamma) = \hat R_z(\alpha)\,\hat R_y(\beta)\,\hat R_z(\gamma)$
Translation: $\hat T_z(R)$

Describe interaction as an “equation”

$$\hat R(0,\beta_A,\gamma_A)\,A(\mathbf{r}) \longleftrightarrow \hat T_z(R)\,\hat R(\alpha_B,\beta_B,\gamma_B)\,B(\mathbf{r})$$

Can re-write this in many ways, e.g.

$$\hat R(\alpha_B,\beta_B,0)^{-1}\,\hat T_z(R)^{-1}\,\hat R(0,\beta_A,\gamma_A)\,A(\mathbf{r}) \longleftrightarrow \hat R_z(\gamma_B)\,B(\mathbf{r})$$

Ultimately, operators transform coefficients in “simple” ways, e.g. the score:

$$S_{AB}(\gamma_B) = \sum_{nlmp} A'_{nlm}\, B^{*}_{nlp}\, e^{-ip\gamma_B}$$
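Grouping the terms of this score by p gives a 1D Fourier series, $S_{AB}(\gamma_B) = \sum_p c_p e^{-ip\gamma_B}$ with $c_p = \sum_{nlm} A'_{nlm} B^{*}_{nlp}$, so all M equally spaced values of γ_B can be scored with one FFT instead of M separate sums. A minimal sketch, assuming the c_p have already been computed; the function names are hypothetical:

```python
import numpy as np

def scan_gamma_fft(coeffs, num_samples):
    """Evaluate S(gamma) = sum_p c_p exp(-i p gamma) at gamma_k = 2*pi*k/M with one FFT.

    coeffs: dict mapping integer p (may be negative) to complex c_p.
    num_samples: M, must exceed 2*max|p| so that distinct p do not alias.
    """
    max_p = max(abs(p) for p in coeffs)
    if num_samples <= 2 * max_p:
        raise ValueError("num_samples must be > 2*max|p| to avoid aliasing")
    x = np.zeros(num_samples, dtype=complex)
    for p, c in coeffs.items():
        x[p % num_samples] += c  # negative p wraps to the end of the array
    # numpy's forward FFT computes sum_n x_n exp(-2*pi*i*k*n/M), i.e. exp(-i p gamma_k)
    return np.fft.fft(x)

def scan_gamma_direct(coeffs, num_samples):
    """Reference O(M * P) evaluation of the same series."""
    gammas = 2 * np.pi * np.arange(num_samples) / num_samples
    return np.array([sum(c * np.exp(-1j * p * g) for p, c in coeffs.items())
                     for g in gammas])
```

The best rotation is the sample with the highest (real) score, e.g. `np.argmax(scan_gamma_fft(c, 64).real)`. Batches of such independent 1D FFTs are what map well onto GPUs.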


SLIDE 6

The Docking Equation for Cyclic Symmetries (Cn)

Cn systems are planar, with symmetry operator $\hat R_y(\omega = 2\pi/n)$

Figure: C2 and C3 axes (coordinate axes x, y, z; angle ω between subunits)

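$$\hat R_y(\omega)\,\hat T_z(D)\,\hat R(\alpha,\beta,\gamma)\,A(\mathbf{r}) \longleftrightarrow \hat T_z(D)\,\hat R(\alpha,\beta,\gamma)\,A(\mathbf{r})$$

After some working, we get a Fourier series in α:

$$S_{AB}(\alpha) = \sum_{nlmp} A_{nlm}(\beta,\gamma)\, A_{nlp}(D,\beta,\gamma)^{*}\, d^{(l)}_{mp}(\omega)\, e^{-i(p-m)\alpha}$$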
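The geometry behind this equation can be checked with explicit coordinates. In this sketch, copy k of a monomer is $\hat R_y(k\omega)\hat T_z(D)\hat R(\alpha,\beta,\gamma)A$, so applying $\hat R_y(\omega)$ to one copy gives the next and all subunit centres lie in the xz-plane at distance D from the axis. It assumes the monomer is centred on the origin; the function names are hypothetical, and Sam scores such arrangements with the SPF Fourier series above, not with coordinates.

```python
import numpy as np

def rot_y(angle):
    """Right-handed rotation matrix about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(angle):
    """Right-handed rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def euler_zyz(alpha, beta, gamma):
    """R(alpha, beta, gamma) = Rz(alpha) Ry(beta) Rz(gamma), as on the slides."""
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

def build_cn_assembly(monomer, n, dist, alpha, beta, gamma):
    """Copy k = Ry(k*omega) Tz(dist) R(alpha, beta, gamma) applied to the monomer.

    monomer: (N, 3) coordinates centred on the origin. Returns an (n, N, 3) array.
    """
    omega = 2.0 * np.pi / n
    oriented = monomer @ euler_zyz(alpha, beta, gamma).T  # rotate each row vector
    placed = oriented + np.array([0.0, 0.0, dist])        # translate along z by D
    return np.stack([placed @ rot_y(k * omega).T for k in range(n)])
```

Because $\hat R_y(\omega)^n$ is the identity, the n interfaces are all equivalent, which is why a single interface equation is enough to score the whole ring.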


SLIDE 7

Sam Results – Examples of Each Symmetry Type

All except two solutions are rank-1, with RMSD < 3 Å with respect to the crystal structure
Main limitation is the size of the monomer (approx. 500-residue limit)


Ritchie and Grudinin (2016), J. Appl. Cryst., 49, 158–167

SLIDE 8

“gEMfitter” – GPU-Accelerated Cryo-EM Density Fitting

Representation: 3D shape-density on a Cartesian grid
Search: brute-force search with FFT acceleration
Scoring: normalised cross-correlation with Laplacian filter

Calculates 3D translations using Cartesian FFT
Calculates 3D rotations in GPU texture memory
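The translational part of the search can be sketched with numpy: filter both grids with a discrete Laplacian, then obtain the correlation score for every grid shift at once from one forward and one inverse 3D FFT. This is a simplified sketch, assuming periodic boundaries and only a global normalisation (zero mean, unit norm) instead of gEMfitter's normalised cross-correlation; rotations are not covered and the function names are hypothetical.

```python
import numpy as np

def laplacian_filter(grid):
    """6-neighbour discrete Laplacian with periodic boundaries."""
    lap = -6.0 * grid
    for axis in range(3):
        lap += np.roll(grid, 1, axis=axis) + np.roll(grid, -1, axis=axis)
    return lap

def fft_translation_scan(target, probe):
    """Circular cross-correlation c[t] = sum_x target[x + t] * probe[x] for all shifts t.

    Both grids are Laplacian-filtered and scaled to zero mean and unit norm first
    (a global normalisation, not a local normalised cross-correlation).
    """
    def prepare(g):
        g = laplacian_filter(np.asarray(g, dtype=float))
        g = g - g.mean()
        return g / np.linalg.norm(g)

    a, b = prepare(target), prepare(probe)
    return np.fft.ifftn(np.fft.fftn(a) * np.conj(np.fft.fftn(b))).real

def best_shift(target, probe):
    """Grid shift (i, j, k) with the highest correlation score."""
    corr = fft_translation_scan(target, probe)
    return np.unravel_index(np.argmax(corr), corr.shape)
```

For example, if `target = np.roll(probe, (3, 5, 7), axis=(0, 1, 2))`, then `best_shift(target, probe)` recovers (3, 5, 7). The Laplacian emphasises edges and surfaces, so the score favours matching contours rather than bulk density.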


SLIDE 9

Kpax – Protein Structure Alignments

For the first time: exploit the tetrahedral geometry of Cα atoms to superpose pairs of residues without doing least-squares fitting

Score the similarity of the local environments of residues (i, j) as a product of 3D Gaussians between upstream and downstream Cα pairs:

$$K_{i,j} = \prod_{k=-n}^{n} e^{-\beta_k R^2_{i+k,\,j+k}/(4\sigma_k^2)}$$

Gives a very fast way to score local 3D similarity of all residue pairs
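A sketch of the local score: express the neighbouring Cα atoms of residue i (protein A) and residue j (protein B) in a local frame attached to each residue, then multiply the Gaussians of the distances $R_{i+k,j+k}$. The frame built from three consecutive Cα atoms is a hypothetical stand-in for Kpax's tetrahedral construction, and the β_k and σ_k values are placeholders; the point is that no least-squares fit is needed.

```python
import numpy as np

def local_frame(ca, i):
    """Orthonormal frame at residue i from Ca(i-1), Ca(i), Ca(i+1) (hypothetical construction).

    e1 points to Ca(i+1), e2 is the part of the Ca(i-1) direction orthogonal to e1,
    e3 = e1 x e2. Returns a 3x3 matrix whose columns are e1, e2, e3.
    """
    u = ca[i + 1] - ca[i]
    v = ca[i - 1] - ca[i]
    e1 = u / np.linalg.norm(u)
    w = v - np.dot(v, e1) * e1
    e2 = w / np.linalg.norm(w)
    e3 = np.cross(e1, e2)
    return np.stack([e1, e2, e3], axis=1)

def local_score(ca_a, ca_b, i, j, n, beta, sigma):
    """K_ij = prod_{k=-n..n} exp(-beta_k R^2_{i+k,j+k} / (4 sigma_k^2)).

    R is the distance between Ca(i+k) of A and Ca(j+k) of B once both are expressed
    in the local frames of residues i and j. beta, sigma: length 2n+1, indexed by k+n.
    Requires n >= 1 and n <= i, j < len - n.
    """
    fa, fb = local_frame(ca_a, i), local_frame(ca_b, j)
    log_k = 0.0
    for k in range(-n, n + 1):
        pa = fa.T @ (ca_a[i + k] - ca_a[i])  # local coordinates in A's frame
        pb = fb.T @ (ca_b[j + k] - ca_b[j])  # local coordinates in B's frame
        r2 = float(np.sum((pa - pb) ** 2))
        log_k -= beta[k + n] * r2 / (4.0 * sigma[k + n] ** 2)
    return float(np.exp(log_k))
```

The score is invariant to any rigid motion of either protein (identical local environments give K = 1), and evaluating it costs O(n) per residue pair, which is what makes scoring all pairs fast.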


Ritchie et al. (2012), Bioinformatics, 28, 3274–3281

SLIDE 10

Results – Comparing Rigid and Flexible Alignments

Example: methanol dehydrogenase / galactose oxidase

PDB codes: 4AAH (572 AA; green/orange) and 1GOF (388 AA; blue/red); all red/orange regions are structurally aligned
Left: rigid; 267 pairs, 3.3 Å RMSD (20 identities)
Right: flexible; 308 pairs, 2.2 Å RMSD (23 identities)

Compare with TM-Align (rigid only): 366 pairs, 5.4 Å RMSD (19 identities)
Δ(TM-Align, Kpax): 11.6 Å RMSD


SLIDE 11

Applications – PDB-Wide Structure Comparison

KBDOCK [1] – database of 3D domain-domain interactions

Allows us to identify “Domain Family Binding Sites” (DFBSs)

QsBio [2] – identifying biologically relevant quaternary structures

Allows us to predict quaternary structure (QS) by homology and to correct wrong annotations in the PDB


[1] Ghoorah et al. (2014), Nucleic Acids Research, 42, D389–D395
[2] Dey, Ritchie, Levy (2017), in press

SLIDE 12

Thank You!

http://capsid.loria.fr/
http://hex.loria.fr/
http://sam.loria.fr/
http://gem.loria.fr/
http://kpax.loria.fr/
http://kbdock.loria.fr/


SLIDE 13

ssRNA ab initio docking

Figure: 1B23, unbound vs bound ssRNA

Fragment-based ssRNA docking

SLIDE 14

Docking: RNA sequence (A U G G U) => fragment library (e.g. U G G) => protein structure => energy => combinatorial assembly

~3000 conformations per sequence; ~500,000 poses

Search for a path with:
Low total energy
High connectivity
No clashes

Fragment-based ssRNA docking
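The first two steps of the assembly can be sketched as follows: cut the sequence into overlapping fragments and decide which docked poses of consecutive fragments can be chained. This sketch assumes trinucleotide fragments (as in the U G G example) that share two nucleotides with their neighbours, a coarse model with one bead per nucleotide, and a hypothetical RMSD cutoff; none of these parameters comes from the slides.

```python
import numpy as np

def split_into_fragments(sequence, frag_len=3):
    """Overlapping fragments, e.g. 'AUGGU' -> ['AUG', 'UGG', 'GGU'] for frag_len=3."""
    return [sequence[s:s + frag_len] for s in range(len(sequence) - frag_len + 1)]

def can_connect(pose_a, pose_b, overlap=2, cutoff=1.0):
    """True if the last `overlap` beads of pose_a superimpose on the first `overlap` of pose_b.

    Poses are (frag_len, 3) arrays with one hypothetical bead per nucleotide;
    `cutoff` is a hypothetical RMSD threshold in angstroms.
    """
    diff = pose_a[-overlap:] - pose_b[:overlap]
    rmsd = np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
    return bool(rmsd < cutoff)

def connectivity(poses_k, poses_k1, **kwargs):
    """neighbors[i] = indices of fragment k+1 poses that connect to pose i of fragment k."""
    return [[ip for ip, pb in enumerate(poses_k1) if can_connect(pa, pb, **kwargs)]
            for pa in poses_k]
```

With ~500,000 poses the pairwise test dominates the cost, so a real implementation would likely prune candidates spatially before comparing overlaps; this sketch compares all pairs.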

SLIDE 15

Fragment-based ssRNA docking

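Figure: graph of docked poses (frag k, pose i)

$$N_{fwd}(k,i) = \sum_{i' \in \mathrm{neighbors}(i)} N_{fwd}(k+1,i')$$

$$N_{bwd}(k,i) = \sum_{i' \,:\, i \in \mathrm{neighbors}(i')} N_{bwd}(k-1,i')$$

$$N_{tot}(k,i) = N_{fwd}(k,i) \times N_{bwd}(k,i)$$

Connection propensity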
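The recursions above are a forward/backward count over a layered graph. A minimal sketch: neighbors(i) is taken to be the poses of fragment k+1 that connect to pose i of fragment k, and the boundary values $N_{fwd}(\text{last}, i) = 1$ and $N_{bwd}(0, i) = 1$ are assumptions, since the slides do not show them. The function name is hypothetical.

```python
def count_chains(neighbors, sizes):
    """Forward/backward chain counts on a layered graph of fragment poses.

    neighbors[k][i]: poses i' of fragment k+1 that connect to pose i of fragment k.
    sizes[k]: number of poses of fragment k.
    Returns (n_fwd, n_bwd, n_tot) as lists of lists indexed [k][i].
    """
    last = len(sizes) - 1
    # N_fwd(k, i): partial chains from pose i of fragment k to the last fragment
    n_fwd = [[0] * s for s in sizes]
    n_fwd[last] = [1] * sizes[last]
    for k in range(last - 1, -1, -1):
        for i in range(sizes[k]):
            n_fwd[k][i] = sum(n_fwd[k + 1][ip] for ip in neighbors[k][i])
    # N_bwd(k, i): partial chains from the first fragment to pose i of fragment k
    n_bwd = [[0] * s for s in sizes]
    n_bwd[0] = [1] * sizes[0]
    for k in range(1, last + 1):
        for ip in range(sizes[k - 1]):
            for i in neighbors[k - 1][ip]:  # all i' such that i is in neighbors(i')
                n_bwd[k][i] += n_bwd[k - 1][ip]
    # N_tot(k, i): complete chains passing through pose i of fragment k
    n_tot = [[f * b for f, b in zip(fk, bk)] for fk, bk in zip(n_fwd, n_bwd)]
    return n_fwd, n_bwd, n_tot

sizes = [2, 2, 1]
neighbors = [[[0, 1], [1]], [[0], [0]]]
print(count_chains(neighbors, sizes)[2])  # [[2, 1], [1, 2], [3]]
```

In the example there are three complete chains, so every layer of N_tot sums to 3; a pose with a high N_tot lies on many connectable chains, which is one reading of its connection propensity.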

SLIDE 16

Figure: worked example of the counts for each fragment k and pose i

Fragment-based ssRNA docking


SLIDE 17


Fragment-based ssRNA docking


SLIDE 18


Fragment-based ssRNA docking


SLIDE 19


Fragment-based ssRNA docking


SLIDE 20

Stochastic backtracking => enumerate chains

Fragment-based ssRNA docking

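One way to realise stochastic backtracking with these counts: pick a first pose with probability proportional to $N_{fwd}(0,i)$, then repeatedly step to a neighbour i' with probability $N_{fwd}(k+1,i')/N_{fwd}(k,i)$. The products telescope, so every complete chain is drawn with equal probability. This is a sketch under that assumption (uniform sampling over chains; energy-weighted variants are possible), with hypothetical function names.

```python
import random

def forward_counts(neighbors, sizes):
    """N_fwd(k, i): number of chains from pose i of fragment k to the last fragment."""
    last = len(sizes) - 1
    n_fwd = [[0] * s for s in sizes]
    n_fwd[last] = [1] * sizes[last]
    for k in range(last - 1, -1, -1):
        for i in range(sizes[k]):
            n_fwd[k][i] = sum(n_fwd[k + 1][ip] for ip in neighbors[k][i])
    return n_fwd

def sample_chain(neighbors, sizes, n_fwd, rng):
    """Draw one complete chain uniformly at random among all complete chains.

    Raises ValueError if no complete chain exists (all N_fwd(0, i) are zero).
    """
    i = rng.choices(range(sizes[0]), weights=n_fwd[0])[0]
    chain = [i]
    for k in range(len(sizes) - 1):
        options = neighbors[k][i]
        i = rng.choices(options, weights=[n_fwd[k + 1][ip] for ip in options])[0]
        chain.append(i)
    return chain

sizes = [2, 2, 1]
neighbors = [[[0, 1], [1]], [[0], [0]]]
rng = random.Random(0)
print(sample_chain(neighbors, sizes, forward_counts(neighbors, sizes), rng))
```

Sampling avoids listing every chain explicitly, which matters when the number of complete chains grows combinatorially with the number of poses per fragment.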

SLIDE 21


Fragment-based ssRNA docking


SLIDE 22


Fragment-based ssRNA docking


SLIDE 23

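$$Z_{fwd}(k,i) = \sum_{j \,:\, \mathrm{connect}(j,i)} \exp\!\left(-\frac{E(i,j)}{RT}\right) \times Z_{fwd}(k-1,j)$$

Weighting by the Boltzmann factor

Figure: example graph with nodes labelled $Z_{fwd}(0,0)$, $Z_{fwd}(0,1)$, $Z_{fwd}(1,0)$ and $Z_{bwd}(1,0)$

$$P(k,i) = \frac{Z_{fwd}(k,i) \times Z_{bwd}(k,i)}{\sum_j \cdots}$$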
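Replacing counts with Boltzmann weights turns the forward/backward recursion into partition functions. This sketch makes several assumptions that are not given on the slide: $Z_{fwd}(0,i) = 1$ and $Z_{bwd}(\text{last},i) = 1$, $Z_{bwd}$ is the mirror-image recursion, E is an energy per connection, RT ≈ 0.593 kcal/mol (298 K), and P(k, i) is normalised over the poses of fragment k (the slide's denominator is truncated). The function name and the energy callback are hypothetical.

```python
import math

def boltzmann_marginals(neighbors, sizes, energy, rt=0.593):
    """Forward/backward partition functions and per-pose probabilities.

    neighbors[k][j]: poses i of fragment k+1 connected to pose j of fragment k.
    energy(k, j, i): hypothetical connection energy E(i, j) between pose j of
    fragment k and pose i of fragment k+1, in the same units as rt.
    """
    last = len(sizes) - 1
    z_fwd = [[0.0] * s for s in sizes]
    z_fwd[0] = [1.0] * sizes[0]
    for k in range(1, last + 1):
        for j in range(sizes[k - 1]):
            for i in neighbors[k - 1][j]:  # every j such that connect(j, i)
                w = math.exp(-energy(k - 1, j, i) / rt)
                z_fwd[k][i] += w * z_fwd[k - 1][j]
    z_bwd = [[0.0] * s for s in sizes]
    z_bwd[last] = [1.0] * sizes[last]
    for k in range(last - 1, -1, -1):
        for j in range(sizes[k]):
            for i in neighbors[k][j]:
                w = math.exp(-energy(k, j, i) / rt)
                z_bwd[k][j] += w * z_bwd[k + 1][i]
    probs = []
    for zf, zb in zip(z_fwd, z_bwd):
        weights = [f * b for f, b in zip(zf, zb)]
        total = sum(weights)  # assumed normalisation over poses of fragment k
        probs.append([w / total for w in weights])
    return z_fwd, z_bwd, probs
```

With all energies set to zero every weight is 1 and the partition functions reduce to plain chain counts; a favourable (negative) connection energy shifts probability towards the poses on that connection. For long chains, working with log-weights would avoid overflow.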