a system for automated data analysis and interpretation


A system for automated data analysis and interpretation for biological solution SAXS. Maxim Petoukhov, EMBL, Hamburg Outstation. Outline: Introduction; Concept of the integrated system; Input & output; Examples; Summary


  1. A system for automated data analysis and interpretation for biological solution SAXS • Maxim Petoukhov, EMBL, Hamburg Outstation

  2. Outline • Introduction • Concept of the integrated system • Input & output • Examples • Summary & Outlook

  3. SAXS: State of the Art • Brilliant sources for rapid data collection and novel methods for data analysis • A rapid increase in the biological user community; active training • [Figure: Proposals/Groups, X33, EMBL Hamburg; total number of proposals, biomolecular solution proposals and number of groups per year, 1999 to 2009] • Automation, remote access, high-throughput data reduction • Active use in multidisciplinary projects • IT: on-line services, pipelines, databases • Hardware-independent analysis block

  4. Automated SAXS pipeline • Data acquisition robot, data normalization, reduction and XML log file generation • Data processing, computation of overall parameters • Database search, ab initio model building and XML summary file generation • Advanced 3D modelling? • [Diagram label: Hardware-independent analysis block]

  5. Data Analysis Expert System for Small Angles: DANESSA • Bioinformatics Web Services • Software Tools for SAS Data Analysis • Set of Processed (and Optionally Ranked) Scattering Curves • Facilitates routine tasks and enables high-throughput SAXS studies

  6. Employed external services • EMBL-EBI – Sequence alignment – Annotation by structure – Macromolecular interfaces • Protein Data Bank – Primary sequences of macromolecules and their atomic coordinates

  7. Integrated ATSAS software components • DATPOROD – automated calculation of the excluded volume and molecular mass estimate • DAMMIF – ab initio shape determination by simulated annealing using a bead model • DAMCLUST – clustering of multiple 3D models (assessment of multimodality) based on discrepancies between the models; the optimal threshold is a compromise between the number of clusters and the averaged spread within a cluster

  8. Integrated ATSAS software components • CRYSOL – Evaluation of X-ray Solution Scattering Curves from Atomic Models • SASREF – Rigid body modelling of multi-component particles against solution scattering data • BUNCH – Modelling of multidomain proteins and their deletion mutants • OLIGOMER – Quantitative analysis of equilibrium mixtures: $I(s) = \sum_k v_k I_k(s)$ • [Plot: lg I (relative) versus s, nm^-1]
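The mixture relation quoted for OLIGOMER, $I(s) = \sum_k v_k I_k(s)$, can be illustrated with a small weighted least-squares fit for two components. This is a minimal sketch in plain Python, not the OLIGOMER implementation: the function name `fit_two_components` and the synthetic curves are assumptions made for illustration, and a real analysis would also need to keep the fractions non-negative.

```python
def fit_two_components(i_exp, sigma, i1, i2):
    """Weighted least-squares fit of i_exp ~ v1*i1 + v2*i2.

    Returns (v1, v2, chi2), where chi2 is the reduced chi-square.
    Hypothetical helper for illustration only.
    """
    # Accumulate the 2x2 normal equations with weights 1/sigma^2.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for y, e, f1, f2 in zip(i_exp, sigma, i1, i2):
        w = 1.0 / (e * e)
        a11 += w * f1 * f1
        a12 += w * f1 * f2
        a22 += w * f2 * f2
        b1 += w * f1 * y
        b2 += w * f2 * y
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("component curves are linearly dependent")
    # Solve the 2x2 system by Cramer's rule.
    v1 = (b1 * a22 - a12 * b2) / det
    v2 = (a11 * b2 - a12 * b1) / det
    residuals = sum(
        ((y - v1 * f1 - v2 * f2) / e) ** 2
        for y, e, f1, f2 in zip(i_exp, sigma, i1, i2)
    )
    chi2 = residuals / (len(i_exp) - 2)
    return v1, v2, chi2


# Synthetic example: a 30:70 mixture of two made-up component curves.
comp1 = [10.0, 8.0, 5.0, 2.0]
comp2 = [4.0, 4.0, 3.0, 2.5]
mixture = [0.3 * a + 0.7 * b for a, b in zip(comp1, comp2)]
errors = [0.1] * len(mixture)
v1, v2, chi2 = fit_two_components(mixture, errors, comp1, comp2)
```

With noise-free synthetic data the fit recovers the fractions used to build the mixture; with measured data the reduced chi-square indicates how well the two components explain the curve.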

  9. Concepts • Object – individual sequence contributing to one or several samples – generally ≠ sample, ≠ individual atomic model • Project – contains a number of curves (samples) – and the set of corresponding objects • Generic project – Samples: A, B, A1, A+B, A1+B, A+2B, A1+2B – Objects: A, B, A1 • [Diagram labels: A, A1, A2, B]

  10. Minimalistic Input
      • List of Objects (sequences): Sequence A, …, Sequence K. Sequence lines shown on the slide:
          GSGVPSRVIH IRKLPIDVTE GEVISLGLPF GKVTNLLMLK
          GKNQAFIEMN TEEAANTMV YYTSVTPVLR GQPIYIQFSN
          HKELKTDSSPNQARAQAALQ AVNSVQSGNL ALAASAAAVD
      • List of Scattering Profiles: Curve 1, …, Curve N. Data lines shown on the slide:
          4.138455E-02  5.904029  1.555333E-01
          4.371607E-02  5.652469  1.527037E-01
          4.604759E-02  5.533381  1.521723E-01
          4.837912E-02  5.547052  1.474577E-01
          5.071064E-02  5.296281  1.436712E-01
          …
      • Cross-correlation table with molar ratios:
                   Object A  Object B  Object C  Object D  Object E
          Curve 1      1         1         1         0         0
          Curve 2      0         0         0         1         1
          Curve 3      1         1         1         2         2
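The cross-correlation table of molar ratios is the piece of input that links curves to objects. A minimal sketch of how it might be held in memory is given below; the names `OBJECTS`, `RATIOS`, `composition` and `curves_containing` are hypothetical and not part of DANESSA, while the numbers are the ones shown on the slide.

```python
# Hypothetical in-memory form of the cross-correlation table:
# rows are curves, columns are objects, entries are molar ratios.
OBJECTS = ["A", "B", "C", "D", "E"]
RATIOS = {
    "Curve 1": [1, 1, 1, 0, 0],
    "Curve 2": [0, 0, 0, 1, 1],
    "Curve 3": [1, 1, 1, 2, 2],
}


def composition(curve):
    """Return {object: copies} for the objects present in a curve."""
    return {obj: n for obj, n in zip(OBJECTS, RATIOS[curve]) if n > 0}


def curves_containing(obj):
    """Return the names of all curves in which an object is present."""
    idx = OBJECTS.index(obj)
    return [curve for curve, row in RATIOS.items() if row[idx] > 0]
```

For example, `composition("Curve 3")` lists all five objects with D and E present in two copies each, and `curves_containing("D")` returns Curve 2 and Curve 3.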

  11. Case-independent actions: bioinformatics analysis • Sequences shown on the slide:
          MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELG
          EAIAACGDVPEIMVIGGGRVYEQFLPKAQKLYLTHIDAEVEGDTHFPDYEPDDWESVFSEFHDADAQNSHSYCFEILERR
          MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGD
          HIDAEVEGDTHFPDYEPDDWESVFSEFHDADAQNSHSYCFEILERR
          KIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELG

  12. Case-independent actions: oligomerization assessment • [Figure: sequences of objects A and B; A:B 2:1; MW expected versus MW experimental]
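The comparison of expected and experimental molecular weight can be sketched as follows. This is an illustration under stated assumptions, not the DANESSA procedure: the expected MW is approximated with a rough average of 110 Da per residue, and the function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rough average per residue; an assumption


def sequence_mw(seq):
    """Approximate molecular weight (Da) of a protein sequence."""
    return AVERAGE_RESIDUE_MASS_DA * len(seq)


def expected_mw(sequences, ratios):
    """Expected MW of one stoichiometric unit, e.g. A:B = 2:1.

    sequences: {object name: sequence}; ratios: {object name: copies}.
    """
    return sum(
        ratios.get(name, 0) * sequence_mw(seq)
        for name, seq in sequences.items()
    )


def oligomer_number(mw_experimental, mw_expected):
    """Nearest whole number of stoichiometric units in the particle."""
    return max(1, round(mw_experimental / mw_expected))


# Made-up example: object A has 100 residues, object B has 200 residues.
sequences = {"A": "M" * 100, "B": "M" * 200}
unit_mw = expected_mw(sequences, {"A": 2, "B": 1})
state = oligomer_number(85000.0, unit_mw)
```

Here an experimental MW of 85 kDa against an expected 44 kDa for one A:B 2:1 unit suggests two such units in the particle.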

  13. Case-independent actions: ab initio modelling • Multiple bead modelling runs for each sample in P1 • For non-monomeric states additional reconstructions with appropriate symmetries (e.g. P222 and P4 for tetramers) • Clustering of independent reconstructions • Averaged volumes are determined
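The choice of symmetries for the additional reconstructions can be sketched as a small lookup. The slide only gives P222 and P4 for tetramers; the generalisation below (cyclic symmetry for any oligomer, dihedral symmetry for even-numbered ones, written in the Pn2 style) is an assumption, and `candidate_symmetries` is a hypothetical helper.

```python
def candidate_symmetries(n_subunits):
    """Suggest point-group symmetries to try for an n-mer.

    P1 is always included; a cyclic group is added for oligomers,
    and a dihedral group for even-numbered oligomers of four or more.
    """
    if n_subunits < 1:
        raise ValueError("number of subunits must be at least 1")
    symmetries = ["P1"]
    if n_subunits > 1:
        symmetries.append(f"P{n_subunits}")  # cyclic, e.g. P4
        half = n_subunits // 2
        if n_subunits % 2 == 0 and half >= 2:
            # Dihedral: P222 for tetramers, otherwise Pn2 (assumed naming).
            symmetries.append("P222" if half == 2 else f"P{half}2")
    return symmetries
```

For a tetramer this yields P1, P4 and P222, matching the example on the slide.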

  14. Selecting Scenario • Bound vs dissociated? • Modular protein vs assembly with no flexible parts • Deletion mutants vs single curve fitting • [Plot: lg I (relative) versus s, nm^-1]

  15. Scenario-based modelling • Dissociation – Composition analysis with OLIGOMER – Other curves or PDB files to evaluate form factors • Proteins with linkers – Modelling with BUNCH – Combining the curves from the same family (simultaneous fitting of deletion mutants) • Single domain (possibly in various conditions) – Validation/identification of biologically active oligomers by CRYSOL – Modelling of quaternary structure by SASREF with symmetry restraints • Multisubunit complex(es) / multidomain proteins with no gaps – Global rigid body modelling with SASREF – Accounting for assembly predictions of individual subunits (PISA) – Combining multiple curves where applicable • Switching / mixing of scenarios possible
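The mapping from scenario to tools listed on this slide can be written down as a simple decision function. This is a sketch only: the three input flags and the function name `select_tools` are hypothetical, and the actual selection logic of the system is not described in the slides.

```python
def select_tools(dissociated, has_linkers, n_objects):
    """Return the tools suggested by the slide for a given scenario.

    dissociated: the sample is a mixture of dissociation products
    has_linkers: the protein is modular with flexible linkers
    n_objects:   number of distinct sequences in the sample
    """
    tools = []
    if dissociated:
        tools.append("OLIGOMER")  # composition analysis
    if has_linkers:
        tools.append("BUNCH")  # modelling with linkers, deletion mutants
    elif n_objects == 1:
        # Single domain: validate oligomers, then model quaternary structure.
        tools += ["CRYSOL", "SASREF"]
    else:
        # Multisubunit complex: rigid body modelling plus interface predictions.
        tools += ["SASREF", "PISA"]
    return tools
```

Because the slide allows switching and mixing of scenarios, the function returns a list rather than a single tool, so a dissociating modular protein yields both OLIGOMER and BUNCH.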

  16. Results output • All 3D modelling attempts are performed multiple times • Non-uniqueness for each type of reconstruction is assessed by clustering • The results are stored in an SQLite database for easy retrieval
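Storing modelling results in SQLite and retrieving the best fit per data file might look like the sketch below. The schema is hypothetical (the slides do not show the real one); the column names are borrowed from the results table on slide 23, and the three example rows are taken from that table.

```python
import sqlite3


def store_results(rows, path=":memory:"):
    """Create a results table (hypothetical schema) and insert the rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models ("
        "identifier TEXT PRIMARY KEY, data TEXT, tool TEXT, "
        "multifit TEXT, symmetry TEXT, chi REAL)"
    )
    conn.executemany("INSERT INTO models VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def best_model(conn, data):
    """Return (identifier, chi) of the lowest-chi model for a data file."""
    return conn.execute(
        "SELECT identifier, chi FROM models WHERE data = ? "
        "ORDER BY chi LIMIT 1",
        (data,),
    ).fetchone()


rows = [
    ("ptb123/bunch_01_P1-123m/bun-01", "ptb123c.dat", "bunch", "single", "P1", 0.99),
    ("ptb123/bunch_02_P2-123m/bun-02", "ptb123c.dat", "bunch", "single", "P2", 1.08),
    ("ptb123-multi/bunch_01_P1-123m/bun-08", "ptb123c.dat", "bunch", "multi", "P1", 1.05),
]
conn = store_results(rows)
best = best_model(conn, "ptb123c.dat")
```

An in-memory database is used here so the example leaves no file behind; passing a file path to `store_results` would persist the results for later retrieval.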

  17. Case studies • Binary complex • Dissociation • Oligomeric equilibrium • Modular protein • Quaternary structure of multimer

  18. Examples: binary complex • Internalin (Listeria monocytogenes) / E-cadherin (human) • Collaboration: H. Niemann (Braunschweig)

  19. Examples: two-component mixture • Internalin + Met receptor (semaphorin domain): mixture • Fit by linear combination of two experimental profiles • Fit by two scattering intensities from atomic models • Collaboration: H. Niemann (Braunschweig) and E. Gherardi (Cambridge)

  20. Examples: oligomeric equilibrium • H(C) fragment of Tetanus toxin: monomer, dimer, mixture • O. Qazi, B. Bolgiano, D. Crane, D.I. Svergun, P.V. Konarev, Z.-P. Yao, C.V. Robinson, K.A. Brown and N. Fairweather (2007). JMB 365, 123-34.

  21. Examples: modular protein & deletion mutants • Polypyrimidine tract binding protein (PTB) • Overlap of the typical ab initio and rigid body models • Petoukhov, M. V., Monie, T. P., Allain, F. H., Matthews, S., Curry, S., and Svergun, D. I. (2006). Structure 14, 1021-1027.

  22. Examples: modular protein & deletion mutants Single vs multiple curves fitting

  23. Examples: modular protein & deletion mutants
          Identifier                                  Data              Tool   Multifit  Symmetry  PISA?  Chi
          ptb_ab_a31/bunch_01_P1-12/bun-10            ptb_ab_a31c.dat   bunch  single    P1        0      1.27
          ptb_bc_a54/bunch_01_P1-23/bun-08            ptb_bc_a54c.dat   bunch  single    P1        0      1.02
          ptb_cd_a18/bunch_01_P1-34/bun-08            ptb_cd_a18c.dat   bunch  single    P1        0      0.99
          ptb123/bunch_01_P1-123m/bun-01              ptb123c.dat       bunch  single    P1        0      0.99
          ptb123/bunch_02_P2-123m/bun-02              ptb123c.dat       bunch  single    P2        0      1.08
          ptb_bcd_a13/bunch_01_P1-234/bun-10          ptb_bcd_a13c.dat  bunch  single    P1        0      1.02
          ptb_del_a39/bunch_01_P1-delm/bun-02         ptb_del_a39c.dat  bunch  single    P1        0      1.1
          ptb123-multi/bunch_01_P1-123m/bun-08        ptb123c.dat       bunch  multi     P1        0      1.05
          ptb_bcd_a13-multi/bunch_01_P1-234/bun-04    ptb_bcd_a13c.dat  bunch  multi     P1        0      1.07
          ptb_del_a39-multi/bunch_01_P1-delm/bun-02   ptb_del_a39c.dat  bunch  multi     P1        0      1.25

  24. Examples: multimer • Tetrameric glucose isomerase • E. Mylonas, EMBL-HH

  25. Conclusions • A working prototype of the integrated system for automated SAXS data analysis and 3D model building has been created • An ATSAS-online Web portal for remote access is provided • Minimal information is required from the user • Up-to-date programs from the ATSAS package and Web-based bioinformatics tools are employed • DANESSA frees the user from routine work but not (yet) from the need to think

  26. Acknowledgements • Course Organizers • BioSAXS Group @EMBL-Hamburg • Collaborations: • S. Curry (Imperial College, London, UK) • E. Gherardi (Medical Research Council Centre, Cambridge, UK) • H. Niemann (GBF, Braunschweig, Germany) • K. Brown (Imperial College, London)
