Outline Day 1 & 2 Introduction: The protein structure knowledge - PDF document

Swiss Institute of Bioinformatics SIB Doctoral School in Bioinformatics Advanced Course Protein Structure: Prediction and Analysis Day 2: Protein Structure Modeling Lausanne, September 1-5, 2008 Torsten Schw ede Biozentrum - Universität Basel Swiss Institute of Bioinformatics Klingelbergstr 50-70 CH - 4056 Basel, Switzerland Tel: + 41-61 267 15 81 Outline Day 1 & 2 Introduction: The protein structure knowledge gap � Recap: Basic principles of proteins and their 3-dimensional structures � � Protein Structure modeling and prediction Comparative protein structure modeling � What happened to fold recognition? � De novo prediction � � Evaluation and Assessment of Protein Structure Model Quality Practicals & Tutorials: Tutorial: Structure Visualization with DeepView (Nicolas Guex, SIB Lausanne) � Practical: Examples of comparative modeling and model evaluation � Exam / credit points: � Student presentations on one of the evaluation examples

The number of distinct protein domains in nature is limited. � Chothia (1992) Proteins. One thousand families for the molecular biologist. Nature. 3 5 7 : 543-4. � Idea: determine structures of representative proteins and then derive most other structures by homology modeling. � Structural Genomics (Protein Structure Initiative, Riken in Japan, SPINE in Europe) Fold Classification Fold Classification � Fold classification is an important to systematically study protein structure evolution � Multi-domain proteins have to be divided into domains prior to classification � There is no consensus on how to delineate the domains. � Three main protein structure classification databases are commonly used: � SCOP: manual classification based on evolutionary information � CATH: semi-automatic classification based on geometric criteria � FSSP: automatic classification based on direct structural similarity

Fold Classification Databases � The CATH database is a hierarchical domain classification of protein structures in the Brookhaven protein databank. � UCL, Janet Thornton & Christine Orengo � clusters proteins semi-automated at four major levels: � Class(C) � Architecture(A) � Topology(T) � Homologous superfamily (H) [ http:/ / w w w .biochem .ucl.ac.uk/ bsm / cath_ new / ] Protein Structure Classification � Class( C) derived from secondary structure content is assigned automatically � Architecture( A) describes the gross orientation of secondary structures, independent of connectivity. � Topology( T) clusters structures according to their topological connections and numbers of secondary structures � Hom ologous Superfam ily ( H) This level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous. [ http: / / www.cathdb.info ]

Fold Classification Databases � Top of the hierarchy: Example: 1EWF

Example: 1EWF Fold Classification Databases � Structural Classification of Proteins: SCOP � MRC Cambridge (UK), Alexey Murzin, Brenner S. E., Hubbard T., Chothia C. � hierarchical classification of protein domain structures � created by manual inspection � comprehensive description of the structural and evolutionary relationships � organized as a hierarchical structure • Class • Fold • Superfamily • Family • Species [ http:/ / scop.m rc-lm b.cam .ac.uk/ scop/ ]

Fold Classification Databases The different m ajor levels in the hierarchy are: � Fold : Major structural similarity Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. � Superfam ily : Probable common evolutionary origin Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable are placed together in superfamilies. � Fam ily : Clear evolutionarily relationship Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pair wise residue identities between the proteins are 30% and greater. Fold Classification Databases

Fold Classification Databases Fold Classification Databases

qCOPS and Fold Space Navigator � Quantitative classification of protein structures, navigation through fold space and visualization of pairwise structure similarities. http: / / www.came.sbg.ac.at Protein Structure / Fold Databases PDB: http: / / www.pdb.org � � EBI-MSD http: / / www.ebi.ac.uk/ msd/ SCOP http: / / scop.mrc-lmb.cam.ac.uk/ scop/ � � CATH http: / / www.biochem.ucl.ac.uk/ bsm/ cath_new/ � FSSP: http: / / ekhidna.biocenter.helsinki.fi/ dali/ start

Comparing Protein Structures � Why do we want to compare protein structures? � Classify structures � Identify structural movements (induced fit, NMR, etc.) � Analyze evolutionary relationships � Identify recurring structural motifs � Assess quality of predicted models � … Comparing Protein Structures What do we need to compare structures? � � Protein sequences can be treated as linear strings of letters . For a given similarity matrix, sequences can be aligned optimally using dynamic programming. Protein structures are 3-dimensional objects. We need � to find algorithms (analogue to DP for sequences) which find an optimal match for two shapes – given a certain similarity measure.

Comparing Protein Structures � What do we need to compare structures? 1. Structural feature description 2. Comparison / superposition algorithms 3. Distance / similarity measure 1. Description A Description B 2. Similarity / Distance 3. Measure Comparing Protein Structures � Local or global comparison? � Global: n = 5 � Local: n = 4

Comparing Protein Structures � Distance measure: Root mean square deviation � Comparing two sets of points (= atoms in structures) A = { a 1 … a n } and B = { b 1 … b n } with a i Position vector of atom i in structure A n Number of equivalent atoms � We need to define a 1: 1 correspondence for atoms in A and B � Root mean square distance is calculated from the squared Euclidian distances between corresponding points: n ∑ − 2 ( ) a b i i = = . . . . 0 i r m s d n Comparing Protein Structures Distances in Euclidian Space � For two points x = (x 1 ,x 2 ,x 3 ,… ,x n ) and y = (x 1 ,x 2 ,x 3 ,… ,x n ), a p- norm distance is defined as: n ∑ = − x y i i � 1-norm distance = 1 i 1 / 2 ⎛ 2 ⎟ ⎞ = ∑ n ⎜ − x y � 2-norm distance i i ⎝ ⎠ = 1 i 1 / p ⎛ ⎞ = ∑ n � p-norm distance ⎜ − p ⎟ x y i i ⎝ ⎠ = 1 i � In Euclidian space R n , distances are normally given as Euclidian distance (= 2-norm distance), which is a generalization of the Pythagorean theorem. � p need not be integer, but can not be less than 1.

Comparing Protein Structures Distances � Mathematically, “distance” is a function which meets the following criteria: � One can find the distance between any two points. � That distance is a distinct real number. � It is positive definite. d( x , y ) ≥ 0, and d( x , y ) = 0 if and only if x = y . � It is symmetric. d( x , y ) = d( y , x ). � It satisfies the triangle inequality, d( x , z ) ≤ d( x , y ) + d( y , z ). � Such a function is known as a metric. � Geometrically, the right-hand part of the triangle inequality states that the sum of the lengths of any two sides of a triangle is greater than the length of the remaining side: Y Z X Comparing Protein Structures � Protein Structure Superposition � Assume that we know the correspondence set between A and B (e.g. NMR ensembles, induced fit) � Task: Find a rigid transformation RT which best superposes A = { a 1 … a n } and B = { b 1 … b n } � Many solutions in image analysis, mechanical engineering, … � Find a rigid transformation RT which minimizes the error function E: n ∑ 2 = − min ( ) E RT a b RT i i = 1 i

Comparing Protein Structures � Protein Structure Superposition � The rigid transformation RT has a translational component T and a rotational component R: RT(a) = R(X) + T � The error function to be minimized becomes: n ∑ 2 = − + min ( ) E R a b T RT i i = 1 i Comparing Protein Structures � 1. The translational component � The error function is as its minimum with respect to the translation when: ∂ n E ( ) ∑ = − + = 2 ( ) 0 R a b T ∂ i i T = 1 i ⎛ ⎞ n n ∑ ∑ = − ⎜ ⎟ + T R a b i i ⎝ ⎠ = = 1 1 i i � If both structures A and B are centered on the coordinate origin, Σ a i and Σ b i become 0, and then also T = 0. � In the first step, we translate the centers of structures A and B to the origin of the coordinate system.

Outline Day 1 & 2 Introduction: The protein structure knowledge - PDF document

Swiss Institute of Bioinformatics SIB Doctoral School in Bioinformatics Advanced Course Protein Structure: Prediction and Analysis Day 2: Protein Structure Modeling Lausanne, September 1-5, 2008 Torsten Schw ede Biozentrum - Universitt

At Creation Common Holy Day 1 Day 2 Day 8 Day 9 Day 3 Day 4 Day 5 Day 6 Day 7 7 Days The

Science with a Little Altitude | QS18 Fah Sathirapongsasuti, PhD EBC Everest Day 1 Day 2 Day

ENGLAND | APRIL 12 20, 2020 8 DAY TOU R SUGGE STE D ITI N E R ARY* DAY 0 DAY 1 DAY 2

Day 1 Day 1 Staging area Buses & Ambulances In Use Day 1 Day 2 Days 2 & 3 Day 4

Introduction to R Day 4: Functions October 10, 2019 Agenda Day 1: Figures Day 2: Selecting,

Module 4 AFA CyberCamp Format Day T wo Day Three Day Four Day Five Day One Windows

Workflow 6 Touchpoints After First Visit Day 0 - Sunday Day 2 - Tuesday Day 6 -

Summer School Overview Day 0: R bootcamp Day 1: Workflow, Google App Engine Day 2:

Ins Domingues Breast Cancer Workshop April 7th 2015 Outline Outline Outline Outline

2014 Investor Day DECEMBER 10, 2014 5 | 2014 INVESTOR DAY | 2014 INVESTOR DAY Welcome MARK

BJC BJC BJC BJC Opportunity Day Opportunity Day 4Q09 Opportunity Day Opportunity Day

CACTM Patricia Arizmendi Garcia LauraCalleja Diez Fernando Carmona Mateos Andrea Magn

Europe 2014 A Million Heartbeats by rt Ahlin 2 Who am I? 3 What do I do? 4 Why QS? 5

2020 Effective Mentoring Program Combined Program (School and Early Childhood) Day 2 1 2020 SB

King, Jr. Day Remember! Celebrate! Act! A Day On, Not A Day Off! January 18, 2016 Dr. Martin

KINDERGARTEN EXPERIENCE How will the full-day kindergarten differ from the half-day

CSE182-L8 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding

Analysing variants with the EBI is an Outstation of the European Molecular Biology Laboratory.

Similarity Searches on Sequence Databases Lorenza Bordoli Swiss Institute of Bioinformatics

Use of SAXS in ubiquitin conjugation research Ubiquitin conjugation is a signalling system Like

Marketed Medical Products Monique L. Anderson, MD, MHS Assistant Professor of Medicine Division

EVALUATION OF THE IMPACT OF TETRAHYDROPYRIDO[2,1-B][1,3,5]THIADIAZINE DERIVATIVES ON LEPODOVA

Pharmacy 483 MUE Cost Effective Medication Utilization Quality Cost Improvement Management

1 Welcome to the Reserve Bank of New Zealand AML/CFT Workshop 7th October 2019 Cable Room,

Outline Day 1 & 2 Introduction: The protein structure knowledge - PDF document

Swiss Institute of Bioinformatics SIB Doctoral School in Bioinformatics Advanced Course Protein Structure: Prediction and Analysis Day 2: Protein Structure Modeling Lausanne, September 1-5, 2008 Torsten Schw ede Biozentrum - Universitt

At Creation Common Holy Day 1 Day 2 Day 8 Day 9 Day 3 Day 4 Day 5 Day 6 Day 7 7 Days The

Science with a Little Altitude | QS18 Fah Sathirapongsasuti, PhD EBC Everest Day 1 Day 2 Day

ENGLAND | APRIL 12 20, 2020 8 DAY TOU R SUGGE STE D ITI N E R ARY* DAY 0 DAY 1 DAY 2

Day 1 Day 1 Staging area Buses &amp; Ambulances In Use Day 1 Day 2 Days 2 &amp; 3 Day 4

Introduction to R Day 4: Functions October 10, 2019 Agenda Day 1: Figures Day 2: Selecting,

Module 4 AFA CyberCamp Format Day T wo Day Three Day Four Day Five Day One Windows

Workflow 6 Touchpoints After First Visit Day 0 - Sunday Day 2 - Tuesday Day 6 -

Summer School Overview Day 0: R bootcamp Day 1: Workflow, Google App Engine Day 2:

Ins Domingues Breast Cancer Workshop April 7th 2015 Outline Outline Outline Outline

2014 Investor Day DECEMBER 10, 2014 5 | 2014 INVESTOR DAY | 2014 INVESTOR DAY Welcome MARK

BJC BJC BJC BJC Opportunity Day Opportunity Day 4Q09 Opportunity Day Opportunity Day

CACTM Patricia Arizmendi Garcia LauraCalleja Diez Fernando Carmona Mateos Andrea Magn

Europe 2014 A Million Heartbeats by rt Ahlin 2 Who am I? 3 What do I do? 4 Why QS? 5

2020 Effective Mentoring Program Combined Program (School and Early Childhood) Day 2 1 2020 SB

King, Jr. Day Remember! Celebrate! Act! A Day On, Not A Day Off! January 18, 2016 Dr. Martin

KINDERGARTEN EXPERIENCE How will the full-day kindergarten differ from the half-day

CSE182-L8 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding

Analysing variants with the EBI is an Outstation of the European Molecular Biology Laboratory.

Similarity Searches on Sequence Databases Lorenza Bordoli Swiss Institute of Bioinformatics

Use of SAXS in ubiquitin conjugation research Ubiquitin conjugation is a signalling system Like

Marketed Medical Products Monique L. Anderson, MD, MHS Assistant Professor of Medicine Division

EVALUATION OF THE IMPACT OF TETRAHYDROPYRIDO[2,1-B][1,3,5]THIADIAZINE DERIVATIVES ON LEPODOVA

Pharmacy 483 MUE Cost Effective Medication Utilization Quality Cost Improvement Management

1 Welcome to the Reserve Bank of New Zealand AML/CFT Workshop 7th October 2019 Cable Room,

Day 1 Day 1 Staging area Buses & Ambulances In Use Day 1 Day 2 Days 2 & 3 Day 4