Developing and Using Special Purpose Hidden Markov Model Databases

Martin Gollery, Associate Director of Bioinformatics, University of Nevada, Reno (Mgollery@unr.edu)

Today’s Tutorial
• Instructor: Martin Gollery
• Associate Director of Bioinformatics, University of Nevada, Reno
• Consultant to several organizations
• Formerly with TimeLogic
• Developed several HMM databases

Hidden Markov Models
• What HMMs are
• Which HMM programs are commonly used
• What HMM databases are available
• Why you would use one DB over another
• Integrated resources: InterPro and more
• How you can build your own HMM DB
• Problems with building your own
• Live demonstration

Hidden Markov Models: What are they, anyway?
• Statistical description of a protein family's consensus sequence
• Conserved regions receive the highest scores
• Can be seen as a finite state machine

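Representation of Family Members
• yciH     KDGII
• ZyciH    KDGVI
• VCA0570  KDGDI
• HI1225   KNGII
• sll0546  KEDCV
Residue frequency at each alignment position (. = not observed, i.e. frequency 0):
Pos  C    D    E    G    I    K    N    V
1    .    .    .    .    .    1.0  .    .
2    .    0.6  0.2  .    .    .    0.2  .
3    .    0.2  .    0.8  .    .    .    .
4    0.2  0.2  .    .    0.4  .    .    0.2
5    .    .    .    .    0.8  .    .    0.2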

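Representation of Gaps in Family Members
• yciH     KDGII
• ZyciH    KDGVI
• VCA0570  KDGDI
• HI1225   KNGII
• sll0546  KED-V
Residue and gap frequency at each alignment position (. = not observed, i.e. frequency 0):
Pos  C    D    E    G    I    K    N    V    -
1    .    .    .    .    .    1.0  .    .    .
2    .    0.6  0.2  .    .    .    0.2  .    .
3    .    0.2  .    0.8  .    .    .    .    .
4    .    0.2  .    .    0.4  .    .    0.2  0.2
5    .    .    .    .    0.8  .    .    0.2  .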
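The table above can be reproduced by counting symbols column by column. The following is a minimal sketch, assuming the five short sequences from the slide; the function name column_frequencies is made up for illustration and is not part of any HMM package.

```python
from collections import Counter

# The five aligned family members from the slide (gapped version).
ALIGNMENT = ["KDGII", "KDGVI", "KDGDI", "KNGII", "KED-V"]


def column_frequencies(alignment):
    """Return one {symbol: frequency} dict per alignment column."""
    n_seqs = len(alignment)
    freqs = []
    # zip(*alignment) walks the alignment column by column.
    for column in zip(*alignment):
        counts = Counter(column)
        freqs.append({sym: count / n_seqs for sym, count in counts.items()})
    return freqs


if __name__ == "__main__":
    for pos, col in enumerate(column_frequencies(ALIGNMENT), start=1):
        print(pos, dict(sorted(col.items())))
```

Symbols that never occur in a column are simply absent from that column's dictionary, which corresponds to the empty cells in the table.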

For Maximum Sensitivity
Pos  C    D    E    G    I    K    N    V    -
1    .    .    .    .    .    1.0  .    .    .
2    .    0.6  0.2  .    .    .    0.2  .    .
3    .    0.2  .    0.8  .    .    .    .    .
4    .    0.2  .    .    0.4  .    .    0.2  0.2
5    .    .    .    .    0.8  .    .    0.2  .
No residue at any position should have a zero probability, even if it was not seen in the training data.
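One simple way to avoid zero probabilities is to add a pseudocount to every residue before normalizing. The sketch below uses add-one (Laplace) smoothing purely for illustration; it is not what HMMER does (HMMER uses Dirichlet mixture priors), and the function name and the choice to ignore gap characters are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def smoothed_column(column, pseudocount=1.0):
    """Add a pseudocount to all 20 amino acids; gap symbols are ignored."""
    residues = [r for r in column if r in AMINO_ACIDS]
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return {
        aa: (residues.count(aa) + pseudocount) / total
        for aa in AMINO_ACIDS
    }


# Column 4 of the gapped alignment: I, V, D, I and one gap.
probs = smoothed_column("IVDI-")

if __name__ == "__main__":
    for aa in "IVDW":
        print(aa, round(probs[aa], 4))
```

After smoothing, a residue that never appeared in the column (W, for example) keeps a small but non-zero probability, so a single unexpected residue can no longer drive a sequence's likelihood to zero.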

Start with an MSA…
CLUSTAL W (1.7) multiple sequence alignment
yciH     KDGVIEIQGDKRDLLKSLLEAKGMKVKLAGG
ZyciH    KDGVIEIQGDKRDLLKSLLEAKGMKVKLAGG
VCA0570  KDGDIEIQGDVRDQLKTLLESKGHKVKLAGG
HI1225   KNGIIEIQGEKRDLLKQLLEQKGFKVKLSGG
sll0546  KEDCVEIQGDQREKILAYLLKQGYKAKISGG
PA4840   KDGVVEIQGEHVELLIDELLKRGFKAKKSGG
AF0914   KNGVIELQGNHVNRVKELLIKKGFNPERIKT
         *:. :*:**: : : * :* : :

Hidden Markov Models
HMMER2.0
NAME  example2
DESC  Small example for demonstration purposes
LENG  31
ALPH  Amino
COM   hmmbuild example2 example2.aln
NSEQ  7
DATE  Wed Jan 08 13:33:06 2003
HMM        A      C      D      E      F      G      H      I      K  …
1      -3217  -3413  -3082  -2664  -4291  -3257  -2104  -4231   3883  …
2      -1938  -3859   2747   1592  -4024  -1857  -1206  -3953  -1455  …
3      -2160  -3144   1834   -953  -4284   3247  -2013  -4362  -2365  …
4      -1255   2750    436  -2789  -1273  -2972  -2049   1510  -2543  …
5      -2035  -1558  -4660  -4320  -2085  -4409  -4229   3081  -4224  …
6      -3264  -3765  -1447   3822  -4535  -2948  -2636  -4814  -2810  …
7      -2423  -1951  -4843  -4395  -1156  -4544  -3680   3291  -4151  …
8      -3220  -3396  -2530  -2667  -3851  -3171  -2735  -4442  -2277  …
9      -3196  -3194  -3915  -4259  -4867   3789  -4005  -5414  -4591  …
10     -1923  -3837   2743   2134  -4005  -1854  -1196  -3929  -1434  …
11      -999  -2164   -952   -353  -2483  -1909   3321  -2139   1730  …
12     -1629  -1909  -2827  -2102  -2279  -2588  -1442  -1012   -488  …
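The integers in the model file are scaled log-odds scores, not raw probabilities. The sketch below assumes the HMMER2 convention as I understand it, score = round(1000 * log2(prob / null_prob)); the uniform null frequency of 1/20 is an illustrative assumption only, since a real file carries its own null model.

```python
import math

INTSCALE = 1000  # assumed HMMER2 scaling factor for saved scores


def score_to_prob(score, null_prob):
    """Invert score = INTSCALE * log2(prob / null_prob)."""
    return null_prob * 2 ** (score / INTSCALE)


def prob_to_score(prob, null_prob):
    """Scaled, rounded log-odds score for a probability."""
    return int(math.floor(0.5 + INTSCALE * math.log2(prob / null_prob)))


# Illustrative assumption: uniform background over 20 amino acids.
UNIFORM_NULL = 1.0 / 20.0

if __name__ == "__main__":
    # Position 1 of the example model: K scores 3883, A scores -3217.
    print("K:", score_to_prob(3883, UNIFORM_NULL))
    print("A:", score_to_prob(-3217, UNIFORM_NULL))
```

A positive score means the residue is more likely at that position than under the background model, and a negative score means it is less likely. That is why K, which is invariant in column 1 of the alignment, is the only strongly positive entry in row 1.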

Emission Probabilities
• What is the likelihood that sequence X was emitted by HMM Y?
• The likelihood is the product of the emission probability of each residue at each position and each of the transition probabilities; with log-scaled scores, this becomes a sum
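To make the scoring idea concrete, here is a deliberately simplified sketch that scores an ungapped sequence against match-state emissions only. It ignores transition probabilities and insert/delete states, uses add-one smoothing and a uniform background as stand-ins, and every name in it is hypothetical; a real Plan7 search is considerably more involved.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FAMILY = ["KDGII", "KDGVI", "KDGDI", "KNGII", "KEDCV"]


def build_profile(alignment, pseudocount=1.0):
    """Per-column emission probabilities with add-one smoothing."""
    profile = []
    for column in zip(*alignment):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: (column.count(aa) + pseudocount) / total
            for aa in AMINO_ACIDS
        })
    return profile


def log_odds_score(sequence, profile, null_prob=1.0 / len(AMINO_ACIDS)):
    """Sum of log2(emission / background) over positions (match states only)."""
    if len(sequence) != len(profile):
        raise ValueError("this sketch only scores ungapped, full-length sequences")
    return sum(
        math.log2(column[residue] / null_prob)
        for residue, column in zip(sequence, profile)
    )


if __name__ == "__main__":
    profile = build_profile(FAMILY)
    for seq in ("KDGII", "KEDCV", "WWWWW"):
        print(seq, round(log_odds_score(seq, profile), 2))
```

Summing log-odds is equivalent to multiplying the per-position probability ratios, which is why scores can be accumulated position by position.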

Plan7 from Outer Space (Well, from St. Louis, anyway!)

HMMs vs BLAST
• Position-specific scoring vs. general matrix
• Example: dDGVIvIddDKRDLLKSLiEAKkMKVKLAGG
– Compared with KDGVIEIQGDKRDLLKSLLEAKGMKVKLAGG, it has 80% BLAST similarity, but misses highly conserved regions
• Scoring emphasizes important locations
• Clearer score cutoffs
• However, it is MUCH slower!
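The example can be checked directly against the seven-sequence alignment shown earlier. The sketch below computes plain percent identity (a simple stand-in for a BLAST similarity figure, not BLAST itself) and lists which alignment columns are invariant across the family; the helper names are made up for illustration.

```python
MSA = [
    "KDGVIEIQGDKRDLLKSLLEAKGMKVKLAGG",  # yciH
    "KDGVIEIQGDKRDLLKSLLEAKGMKVKLAGG",  # ZyciH
    "KDGDIEIQGDVRDQLKTLLESKGHKVKLAGG",  # VCA0570
    "KNGIIEIQGEKRDLLKQLLEQKGFKVKLSGG",  # HI1225
    "KEDCVEIQGDQREKILAYLLKQGYKAKISGG",  # sll0546
    "KDGVVEIQGEHVELLIDELLKRGFKAKKSGG",  # PA4840
    "KNGVIELQGNHVNRVKELLIKKGFNPERIKT",  # AF0914
]
# Lower-case letters on the slide mark the altered residues.
QUERY = "dDGVIvIddDKRDLLKSLiEAKkMKVKLAGG".upper()


def mismatch_positions(a, b):
    """1-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def invariant_columns(msa):
    """1-based columns where every sequence has the same residue."""
    return [i for i, col in enumerate(zip(*msa), start=1) if len(set(col)) == 1]


mismatches = mismatch_positions(QUERY, MSA[0])
identity = 100.0 * (len(QUERY) - len(mismatches)) / len(QUERY)
invariant = invariant_columns(MSA)

if __name__ == "__main__":
    print("percent identity:", round(identity, 1))
    print("mismatch positions:", mismatches)
    print("invariant columns:", invariant)
```

Every one of the six substitutions falls on a column that is identical in all seven family members. A general substitution matrix weights those positions like any other, whereas a position-specific model penalizes them heavily.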

HMM programs
• HMMer - Sean Eddy, Wash U
• SAM - Haussler, UCSC
• Wise tools - Birney, EBI
• SledgeHMMer - Subramaniam, SDSC
• Meta-MEME - Noble & Bailey
• PSI-BLAST - NCBI
• SPSpfam - Southwest Parallel Software
• Ldhmmer - Logical Depth
• DeCypherHMM - TimeLogic

What exactly do you want?
• Are you searching thousands of sequences with one or a few models?
– Use hmmsearch
• Searching a few sequences with thousands of models?
– Use hmmpfam
• Thousands of sequences vs. thousands of models?
– Use an accelerator, if you do it very often
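The rule of thumb above can be written down as a small helper. This is only a sketch: the cutoff of 1000 for "thousands" is an arbitrary assumption, the file names are hypothetical, and the command layout assumes the HMMER2 convention of passing the HMM file first and the sequence file second.

```python
def choose_search_tool(n_sequences, n_models, many=1000):
    """Pick a search strategy from workload size (thresholds are assumptions)."""
    many_sequences = n_sequences >= many
    many_models = n_models >= many
    if many_sequences and many_models:
        return "accelerator"
    if many_models:
        return "hmmpfam"
    return "hmmsearch"


def build_command(tool, hmm_file, seq_file):
    """Argument list for a HMMER2-style run: <program> <hmm file> <sequence file>."""
    if tool not in ("hmmsearch", "hmmpfam"):
        raise ValueError("no command line is sketched for: " + tool)
    return [tool, hmm_file, seq_file]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    tool = choose_search_tool(n_sequences=50000, n_models=3)
    print(build_command(tool, "kinase.hmm", "proteome.fasta"))
    tool = choose_search_tool(n_sequences=5, n_models=8000)
    print(build_command(tool, "Pfam_ls", "queries.fasta"))
```

The resulting list can be handed to subprocess.run once HMMER is installed; the accelerator case is left to whatever vendor interface is in use.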

HMM databases
• PFAM
• TIGRFAM
• Superfamily
• SMART
• Panther
• PRED-GPCR

HMM databases at the CFB
• COGfam
• KinFam
• HydroHMMer
• NVfam-pro
• NVfam-arc
• NVfam-fun
• NVfam-pln

PFAM
• From Sanger, WashU, KI, INRA
• Version 17 has 7868 families
• Most widely used HMM database
• Good annotation team

PFAM
• PFAM-A is hand curated
– From high-quality multiple alignments
• PFAM-B is built automatically from ProDom
– Generated using the Domainer algorithm
– ProDom is built from SP/TREMBL

PFAM
• Pfam-ls = global alignments with respect to the model
• Pfam-fs = local alignments, so that matches may include only part of the model
• Both the -ls and -fs versions are local with respect to the sequence

PFAM
• Note ‘type’ annotation
• Labeled TP
– Family
– Domain
– Repeat
– Motif

TIGRFAMs
• Available at www.tigr.org/TIGRFAMs/
• Organized by functional role
• Equivalogs: a set of homologous proteins that are conserved with respect to function since their last common ancestor
• Equivalog domains: domains of conserved function

TIGRFAMs
• 2453 models in release 4.1
• Complementary to PFAM, so run both
• Part of the Comprehensive Microbial Resource (CMR)

TIGRFAMs
[Figure: TIGRfam and PFAM alignments for pyruvate carboxylase. The thin line represents the sequence. The bars represent hit regions.]

SuperFamily
• By Julian Gough, formerly MRC, now Riken GSC
• www.supfam.org
• Provides structural (and hence implied functional) assignments to protein sequences at the superfamily level
• Built from the SCOP (Structural Classification of Proteins) database, which is built from PDB
• Available in HMMer, SAM, and PSI-BLAST formats

SuperFamily
• 1447 SCOP superfamilies
• Each represented by a group of HMMs
• Over 8500 models total
• Table provides comparison to GO, InterPro, PFAM

SMART
• Simple Modular Architecture Research Tool
• Version 3.4 contains 654 HMMs
• Emphasis on mobile eukaryotic domains
• smart.embl-heidelberg.de
• Annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues

SMART
• Use for signaling domains or extracellular domains
• Normal and Genomic mode

PRED-GPCR
• Papasaikas et al., U of Athens
• 265 HMMs in 67 GPCR families
• Based on TiPS pharmacological classification
• Filters with CAST
• Signatures regularly updated
• Entire system redone each year

PRED-GPCR webserver

Panther
• Protein ANalysis THrough Evolutionary Relationships
• Family and subfamily: families are evolutionarily related proteins; subfamilies are related proteins with the same function
• Molecular function: the function of the protein by itself or with directly interacting proteins at a biochemical level, e.g. a protein kinase
• Biological process: the function of the protein in the context of a larger network of proteins that interact to accomplish a process at the level of the cell or organism, e.g. mitosis
• Pathway: similar to biological process, but a pathway also explicitly specifies the relationships between the interacting molecules

Panther
• (Thomas et al., Genome Research 2003; Mi et al., NAR 2005)
• 6683 protein families
• 31,705 functionally distinct protein subfamilies

Panther
• Due to the size, searches could be slow
• First, BLAST against consensus seqs
• Then, search against models represented by those hits
• With an accelerator, you don’t have to do that…
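The two-stage strategy above (a cheap screen first, then the expensive HMM search only on the survivors) can be sketched generically. Nothing here calls BLAST or HMMER: prefilter and hmm_score are placeholder callables, and the toy data and threshold are invented so that the example runs on its own.

```python
def two_stage_search(query, prefilter, hmm_score, threshold):
    """Score a query only against models that pass a cheap first-stage screen.

    prefilter(query) -> iterable of candidate model ids (stage 1)
    hmm_score(query, model_id) -> numeric score (stage 2)
    """
    candidates = set(prefilter(query))
    scored = [(model, hmm_score(query, model)) for model in candidates]
    hits = [(model, score) for model, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy stand-ins; a real pipeline would run BLAST against the consensus
# sequences and then an HMM search against the selected models.
TOY_PREFILTER = {"seqA": ["famA", "famB"]}
TOY_SCORES = {("seqA", "famA"): 42.0, ("seqA", "famB"): 3.5}

result = two_stage_search(
    "seqA",
    prefilter=lambda q: TOY_PREFILTER.get(q, []),
    hmm_score=lambda q, m: TOY_SCORES[(q, m)],
    threshold=10.0,
)

if __name__ == "__main__":
    print(result)
```

The trade-off is that any family the first stage misses is never scored by its HMM, which is one reason searching all models directly on an accelerator can be preferable.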

Panther
• So, how does it perform?
• I took 3451 Arabidopsis proteins with no hit to PFAM, Superfamily, SMART or TIGRfam
• Ran them against Panther
• Found 160 significant hits!

COG-HMMs
• Clusters of Orthologous Groups of proteins
• www.ncbi.nlm.nih.gov/cog/
• Each COG is from at least 3 lineages
• Ancient conserved domain
• 4873 alignments available
• Alignments from NCBI, HMMs from me at mgollery@unr.edu

CDD
• Conserved Domain Database (NCBI)
• PSI-BLAST profiles are similar to HMMs
• 10991 PSSMs: SMART + COG + KOG + Pfam + CD
• Runs with RPS-BLAST
• Much faster searches
