A simple tool from a complex system - PowerPoint PPT Presentation

A simple tool from a complex system: high-throughput, unsupervised generation of Protein Families from the Protein Homology Network. Duccio Medini, Cellular Microbiology and Bioinformatics Unit, Novartis Vaccines, Siena (I)

What is the Protein Homology Network (PHN)?

PHN: definitions
251 complete genomes → 761260 predicted proteins
Nodes → Proteins
Links → Blast alignments with E-score < ε (cut-off), i.e. homology relations
Connected Component → group of proteins connected by a path (figure labels: Component A, Component B)
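The definitions above can be sketched in a few lines of Python. This is only an illustration, not the code used for the talk: the `(protein_a, protein_b, e_value)` tuple layout and the function names `build_phn` and `connected_components` are assumptions made for the example.

```python
def build_phn(hits, epsilon):
    """Build an undirected homology network from pairwise BLAST hits.

    hits: iterable of (protein_a, protein_b, e_value) tuples (assumed layout).
    A link is kept only if its E-value is below the cut-off epsilon.
    Returns a dict mapping each protein to the set of its neighbours.
    """
    adj = {}
    for a, b, e_value in hits:
        # Register both proteins as nodes even if the link is rejected.
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b and e_value < epsilon:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Return the connected components as a list of sets of proteins."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components
```

Lowering `epsilon` removes weak links and splits components; raising it merges them, which is the behaviour the following slides explore.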

PHN: snapshot of a small portion (1/20). Full: 760,000 proteins and $7 \times 10^{7}$ links (at $\varepsilon = 10^{-5}$)

The structure of the PHN depends on the homology cut-off ε. For $\varepsilon$ from $10^{-200}$ to $10^{-100}$: several relationships missed

The structure of the PHN depends on the homology cut-off ε. For $\varepsilon$ from $10^{-80}$ to $10^{-40}$: several relationships missed + "strange" connections!

The structure of the PHN depends on the homology cut-off ε. For $\varepsilon$ from $10^{-30}$ to $10^{-10}$: some relationships still missed + several inter-family connections

The structure of the PHN depends on the homology cut-off ε. For $\varepsilon = 10^{-5}$: the "giant component" dominates the network

PHN: the giant connected component. Plot: fraction of nodes included in the largest connected component as a function of ε. At $\varepsilon = 10^{-5}$, 63% of the proteins are in the giant component
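The dependence of the largest component on the cut-off can be explored with a small union-find sketch. The function name `giant_component_fraction` and the `(protein_a, protein_b, e_value)` tuple layout are assumptions for illustration; the numbers on the slide come from the full network, not from this toy code.

```python
def giant_component_fraction(proteins, hits, epsilon):
    """Fraction of proteins that fall in the largest connected component.

    proteins: list of all protein identifiers (the nodes).
    hits: iterable of (protein_a, protein_b, e_value) tuples (assumed layout).
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, e_value in hits:
        if e_value < epsilon:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two components

    sizes = {}
    for p in proteins:
        root = find(p)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(proteins)
```

Calling the function over a range of cut-offs (for example `1e-200` up to `1e-5`) would give a curve of the kind the slide describes.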

PHN topology. Proximity of a node: clustering index $C_i = \frac{2E_i}{k_i(k_i - 1)}$, $C = \langle C_i \rangle_i$ (Albert R, Barabasi AL (2002) Reviews of Modern Physics 74: 47-97). Connected components: compactness index $\eta_i = \frac{k_i}{M - 1}$, $\eta = \langle \eta_i \rangle_i$
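A minimal sketch of the two indices follows. It assumes that $E_i$ is the number of links among the $k_i$ neighbours of node $i$ (the usual clustering-coefficient definition) and that $M$ is the number of nodes in the connected component containing $i$; the second reading is an assumption, since the slide does not spell it out. The function names are hypothetical.

```python
def clustering_index(adj, i):
    """C_i = 2 * E_i / (k_i * (k_i - 1)) for node i.

    adj: dict mapping each node to the set of its neighbours.
    E_i is taken as the number of links among the neighbours of i.
    """
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for k < 2; 0.0 is a convention chosen here
    # Each link among neighbours is seen from both ends, hence the // 2.
    links = sum(1 for u in neighbours for v in adj[u] if v in neighbours) // 2
    return 2 * links / (k * (k - 1))


def compactness_index(adj, i, component):
    """eta_i = k_i / (M - 1), with M assumed to be the size of i's component."""
    m = len(component)
    if m < 2:
        return 0.0  # isolated node; convention chosen here
    return len(adj[i]) / (m - 1)
```

Under these assumptions $\eta_i = 1$ means node $i$ is linked to every other member of its component, so a fully connected component has $\eta = 1$.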

How do we identify Protein Families? (figure labels: Family "A", Family "B")

Overlap measure: neighborhood similarity. We define the overlap $\theta_{ij}$ of two nodes i, j as the normalized fraction of nearest neighbors that they have in common: $\theta_{ij} = \frac{n_{ij}}{\max(k_i, k_j)}$. Examples from the figure: a pair with $\theta_{ij} = 0$; a pair with $k_i = 10$, $k_j = 8$, $n_{ij} = 3$, giving $\theta_{ij} = 0.3$; a pair with $\theta_{jk} \approx 1$. θ is designed to identify pairs of nodes sharing a large fraction of their nearest neighbors.
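The overlap formula translates directly into code. The sketch below assumes that $n_{ij}$ is the size of the intersection of the two neighbour sets and that a node is not counted as its own neighbour; the slide does not state either detail, and `overlap` is a hypothetical name.

```python
def overlap(adj, i, j):
    """theta_ij = n_ij / max(k_i, k_j).

    adj: dict mapping each node to the set of its neighbours.
    n_ij is the number of neighbours that i and j have in common.
    """
    k_max = max(len(adj[i]), len(adj[j]))
    if k_max == 0:
        return 0.0  # two isolated nodes share nothing
    return len(adj[i] & adj[j]) / k_max


# Toy network reproducing the slide's example: k_i = 10, k_j = 8, n_ij = 3.
shared = {"s1", "s2", "s3"}
example = {
    "i": shared | {f"x{n}" for n in range(7)},  # 3 shared + 7 private = 10
    "j": shared | {f"y{n}" for n in range(5)},  # 3 shared + 5 private = 8
}
theta = overlap(example, "i", "j")  # 3 / max(10, 8)
```

Dividing by the larger degree keeps θ low when a small family member is attached to a much larger hub, which is what lets the measure separate multi-domain bridges from genuine family members.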

The modularity measure Q: correspondence of a network partitioning to the network modular structure (Newman MEJ, Girvan M (2004) Physical Review E 69: 26113-26127). $Q = \sum_i \left( b_i - a_i^2 \right)$, where $a_i$ = fraction of edges with at least one end in the i-th component and $b_i$ = fraction of edges with both ends in the i-th component. PHN-Families: connected components for θ = 0.5.
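As a rough illustration, Q can be computed for any partition with the definitions of $a_i$ and $b_i$ exactly as worded on the slide. Note that this wording differs slightly from the more common formulation based on edge ends, so values may not match those of standard library implementations. The function name `modularity` and the `membership` mapping are assumptions for the example.

```python
def modularity(edges, membership):
    """Q = sum_i (b_i - a_i**2), using the slide's definitions.

    edges: iterable of (u, v) pairs.
    membership: dict mapping each node to the label of its group.
    a_i: fraction of edges with at least one end in group i.
    b_i: fraction of edges with both ends in group i.
    """
    edges = list(edges)
    total = len(edges)
    if total == 0:
        return 0.0
    touching = {}  # group -> number of edges with at least one end inside
    internal = {}  # group -> number of edges with both ends inside
    for u, v in edges:
        group_u, group_v = membership[u], membership[v]
        for group in {group_u, group_v}:
            touching[group] = touching.get(group, 0) + 1
        if group_u == group_v:
            internal[group_u] = internal.get(group_u, 0) + 1
    return sum(
        internal.get(group, 0) / total - (touching[group] / total) ** 2
        for group in touching
    )
```

For two triangles joined by a single bridging edge and split into the two triangles, each group has $b_i = 3/7$ and $a_i = 4/7$, so $Q = 2\,(3/7 - 16/49) = 10/49$; putting all nodes in one group gives $Q = 0$.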

Comparison to PFAM (~75% testable)
Added links (98.5% confirmed):
  Protein classification      Fraction   ⟨θ_ij⟩
  share a domain              98.5%      0.68
  do not share a domain       1.5%       0.58
Removed links (76.4% confirmed):
  Protein classification      Fraction   ⟨ε_ij⟩
  do not share a domain       8.1%       10^-10
  one or two multi-domains    68.3%      10^-87
  single domain, shared       23.6%      10^-10

Summary: the PHN-Families Algorithm
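One plausible reading of the algorithm, pieced together from the preceding slides, is: compute the overlap θ for every pair of proteins that share at least one neighbour, link the pairs with θ at or above 0.5 (which both adds and removes links relative to the BLAST network), and take the connected components of the rewired network as families. The sketch below follows that reading; the inclusive threshold, the open-neighbourhood convention, and the names `overlap` and `phn_families` are all assumptions, and the published method may differ in detail.

```python
from itertools import combinations


def overlap(adj, i, j):
    """theta_ij = n_ij / max(k_i, k_j), with n_ij the shared neighbours."""
    k_max = max(len(adj[i]), len(adj[j]))
    return len(adj[i] & adj[j]) / k_max if k_max else 0.0


def phn_families(adj, theta_min=0.5):
    """Rewire the network by overlap and return its connected components.

    adj: dict mapping each protein to the set of its BLAST neighbours.
    """
    rewired = {node: set() for node in adj}
    checked = set()
    # Any pair with non-zero overlap shares a neighbour ("hub"), so it is
    # enough to enumerate pairs inside each neighbourhood.
    for hub in adj:
        for i, j in combinations(sorted(adj[hub]), 2):
            if (i, j) in checked:
                continue
            checked.add((i, j))
            if overlap(adj, i, j) >= theta_min:
                rewired[i].add(j)
                rewired[j].add(i)

    families, seen = [], set()
    for start in rewired:
        if start in seen:
            continue
        seen.add(start)
        stack, family = [start], set()
        while stack:
            node = stack.pop()
            family.add(node)
            for neighbour in rewired[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        families.append(family)
    return families
```

On two dense groups joined by a single bridging link, the bridge has zero overlap and is dropped, so the groups come out as separate families. With open neighbourhoods, two proteins linked only to each other also get θ = 0, which is one reason the neighbourhood convention matters.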

Result: PHN-Families (figure panels: before partitioning, after partitioning). 28,226 PHN-Families (giant component disconnected into 14,443 PHN-Families + 26,000 isolated proteins)

How can we use Protein Families? 1. Enhanced annotation of new genomic sequences 2. Whole genome profiling and comparison 3. Identification and study of bacterial organelles

How can we use Protein Families? 1. Enhanced annotation of new genomic sequences 2. Whole genome profiling and comparison 3. Identification and study of bacterial organelles

Protein Families as discrete characters: the genomic matrix. (figure axes: Microorganisms; Protein Families, i.e. functions)
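A genomic matrix of this kind can be sketched as a presence/absence table. The example assumes binary characters (1 if a genome contains at least one member of a family), rows for microorganisms and columns for families; the slide does not confirm these choices, and the name `genomic_matrix` and both input mappings are hypothetical.

```python
def genomic_matrix(protein_genome, protein_family):
    """Build a presence/absence matrix of families across genomes.

    protein_genome: dict mapping protein id -> genome (microorganism) name.
    protein_family: dict mapping protein id -> family label.
    Returns (genomes, families, matrix) where matrix[r][c] is 1 if genome r
    contains at least one protein of family c, and 0 otherwise.
    """
    genomes = sorted(set(protein_genome.values()))
    families = sorted(set(protein_family.values()))
    present = {
        (protein_genome[p], protein_family[p])
        for p in protein_family
        if p in protein_genome
    }
    matrix = [
        [1 if (genome, family) in present else 0 for family in families]
        for genome in genomes
    ]
    return genomes, families, matrix
```

Each row is then a discrete profile of one organism, so rows can be compared directly, for instance by counting the families in which two genomes differ.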

PHN-Family profiles: genomic signatures. (figure labels: Bacillales, Archaea)

How can we use Protein Families? 1. Enhanced annotation of new genomic sequences 2. Whole genome profiling and comparison 3. Identification and study of bacterial organelles

From PHN-Families to bacterial organelles. A classification of proteins into families makes it possible to recognize the similarities between complex structures, even if some individual components are missing, different, or placed in an unexpected position.

Can we group all the building blocks of Type IV Secretion Systems?
  Functional Class   PHN-Families   Proteins (per family)
  VirB1              2              42, 4
  VirB2              4              18, 9, 7, 1
  VirB3              3              19, 13, 1
  VirB4              1              228
  VirB5              2              46, 7
  VirB6              2              117, 3
  VirB7              6              7, 7, 5, 3, 1, 1
  VirB8              2              69, 2
  VirB9              2              127, 2
  VirB10             1              119
  VirB11             1              724
  VirD4              1              174
Covacci et al., Science (1999) 284, 1328-33. Selected 12 major structural components from 6 reference T4SS belonging to A. tumefaciens, IncN R46, B. suis, B. pertussis, and H. pylori, which provide a good sampling of the diversity of known TTSSs.

Evolutionary diversification of Type IV SS. (figure labels: Variable set, Conserved core; A, B, C, D: groups of probably co-evolved Type IV SS)

PHN-Families are coherent with molecular phylogeny. (figure labels: 180 point accepted mutations, 230 point accepted mutations)

Conclusions
• The complex system: the Protein Homology Network is formed by interconnected clusters (families of homologous proteins).
• The simple tool: we have developed a computational method to identify these groups of proteins, the PHN-Families, an unsupervised classification of quality comparable to collections curated by human experts.
• The huge amount of genomic data produced can be classified before expert curation, to study whole genomes, organelles, and specific families.
• Integration with Pfam and other databases will connect PHN-Fams to experimental data.

Acknowledgements: Claudio Donati, Antonello Covacci, The BioInformatic Unit (NV&D), The Pfam group (The WT Sanger Institute), Alessandro Muzzi, Nicola Pacchiani, Robert Finn, Roberto Palmas, Riccardo Beltrami.
Medini D, Covacci A, Donati C, Protein Homology Network Families Reveal Step-Wise Diversification of Type III and Type IV Secretion Systems, PLoS Computational Biology Vol. 2, No. 12, e173
