Sequence Motifs: Highly Predictive Features for Protein Function - PowerPoint PPT Presentation

Dec 05, 2023 •433 likes •547 views


Sequence Motifs: Highly Predictive Features for Protein Function Prediction
Asa Ben-Hur and Douglas Brutlag
Department of Biochemistry, Stanford

Background
• Proteins participate in most of the biochemical processes in the cell
• SwissProt: protein sequence database; contains ~140K sequences
• Enzymes: facilitate chemical reactions
• Enzyme Commission (EC) numbers: n1.n2.n3.n4
• SwissProt contains 35K enzymes, which belong to ~750 EC classes

Similarity / Representation
• Similarity:
  • Weighted edit distance: Smith-Waterman and BLAST methods
  • Model-based, e.g. HMM (Haussler et al.)
  • Fisher kernels (Jaakkola et al.)
• Vector-space representation:
  • Extract a set of properties (amino acid counts etc.)
  • Represent a sequence in the space of all 20^k k-mers (spectrum and mismatch kernels, Leslie et al.)
  • Motif composition
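
As a rough illustration of the k-mer vector-space representation listed above, the sketch below counts k-mers and takes the inner product of two count vectors, which is the basic idea behind a spectrum kernel. The function names and the toy sequences are assumptions made for this example; they do not come from the slides, and the mismatch variant is not covered.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def spectrum_features(sequence, k):
    """Count every length-k substring (k-mer) of the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k):
    """Inner product of the two sparse k-mer count vectors."""
    counts_a = spectrum_features(seq_a, k)
    counts_b = spectrum_features(seq_b, k)
    # Counter returns 0 for k-mers that are absent from seq_b.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


# Dimension of the full feature space for k = 3 (20^3).
num_dimensions = len(AMINO_ACIDS) ** 3
```

Only the k-mers that actually occur are stored, so the 20^k-dimensional vector never has to be materialized.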

Protein Sequence Motifs
[Figure: snippet of a multiple sequence alignment]
• Evolutionarily conserved sequence elements
• Represented as regular expressions or as position-specific scoring matrices
• Known to be part of protein functional sites:
  • Catalytic sites
  • Binding sites
• Syntax: k[ilmv]…hq (figure labels: amino acid, substitution group, wildcards)
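
Because discrete motifs of this kind are regular expressions, matching one against a sequence can be sketched with Python's `re` module. The notation assumed here (a letter is a fixed amino acid, `[...]` is a substitution group, an ASCII `.` is a single-residue wildcard) is an interpretation of the slide's syntax example, and the helper names are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Compile a discrete motif into a case-insensitive regular expression.

    Assumed notation: a letter is a fixed amino acid, [..] is a substitution
    group, and '.' is a single-residue wildcard. All three are already valid
    regex syntax, so the motif only needs validating and compiling.
    """
    if not re.fullmatch(r"(?:[a-z]|\.|\[[a-z]+\])+", motif.lower()):
        raise ValueError(f"unsupported motif syntax: {motif!r}")
    return re.compile(motif, re.IGNORECASE)


def count_occurrences(motif, sequence):
    """Count possibly overlapping occurrences of the motif in the sequence."""
    pattern = motif_to_regex(motif)
    # A zero-width lookahead lets matches that overlap be counted separately.
    lookahead = re.compile(f"(?=(?:{pattern.pattern}))", re.IGNORECASE)
    return sum(1 for _ in lookahead.finditer(sequence))
```

For example, `count_occurrences("k[ilmv]...hq", "AAKLGGGHQAA")` finds the single occurrence starting at the lysine.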

Computing Motif Composition
• Represent the motif database in a trie, with motifs in the leaf nodes
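
One way such a trie could look is sketched below; it is not the authors' implementation. Each edge is one motif element, motifs that share a prefix share a path, and a sequence is scanned once per start position. The class name, the tokenizer, and the use of ASCII `.` as the wildcard are assumptions. Motifs are stored at the node where they end, which is a leaf unless one motif is a prefix of another.

```python
import re

WILDCARD = "."


def tokenize(motif):
    """Split a motif into elements: a residue, a '[...]' group, or '.'."""
    return re.findall(r"\[[a-z]+\]|[a-z.]", motif.lower())


class MotifTrie:
    def __init__(self):
        self.children = {}  # motif element -> child MotifTrie
        self.motifs = []    # motifs that end at this node

    def insert(self, motif):
        node = self
        for element in tokenize(motif):
            node = node.children.setdefault(element, MotifTrie())
        node.motifs.append(motif)

    def _matches(self, sequence, pos):
        """Yield every stored motif that matches the sequence starting at pos."""
        yield from self.motifs
        if pos == len(sequence):
            return
        residue = sequence[pos]
        for element, child in self.children.items():
            # A group such as "[ilmv]" matches if it contains the residue.
            if element == WILDCARD or residue in element:
                yield from child._matches(sequence, pos + 1)

    def composition(self, sequence):
        """Return {motif: count} over all start positions of the sequence."""
        sequence = sequence.lower()
        counts = {}
        for start in range(len(sequence)):
            for motif in self._matches(sequence, start):
                counts[motif] = counts.get(motif, 0) + 1
        return counts
```

The benefit over testing each motif separately is that a shared prefix is checked only once per start position, which matters when the database holds a very large number of motifs.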

The Motif Representation
• A "bag of motifs" representation of a protein sequence
[Figure: motif database mapped to motif counts]
• A high-dimensional feature vector: the motif database can contain several hundred thousand motifs
• The motif kernel is a linear kernel that essentially counts the number of motifs two sequences have in common
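
A minimal sketch of the motif kernel as a linear kernel over sparse count vectors follows. The dictionaries stand in for the bag-of-motifs vectors, and the motif strings and counts are made up for illustration.

```python
def motif_kernel(counts_a, counts_b):
    """Linear kernel: dot product of two sparse motif-count vectors."""
    # Iterate over the smaller dictionary; the result is symmetric.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b.get(motif, 0) for motif, n in counts_a.items())


def shared_motifs(counts_a, counts_b):
    """Motifs present in both sequences."""
    return set(counts_a) & set(counts_b)


# Hypothetical bag-of-motifs vectors for two sequences.
seq_a = {"k[ilmv]...hq": 1, "g.g": 2}
seq_b = {"g.g": 3, "c..c": 1}
kernel_value = motif_kernel(seq_a, seq_b)
```

When every count is 0 or 1, the kernel value is exactly the number of shared motifs, which matches the slide's description.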

Assessing Motifs as Features
• For each class of enzymes we compute a statistic for each feature:

Feature Selection Results
• Feature selection using the L0 (multiplicative update) method of Weston et al., compared with an SVM trained on all features
[Figure: number of features for each class; balanced success rate]

Classification Results
• KNN works very well:
  • Success rate on all data: 0.94 (same as SVM)
• One-against-rest comparison with SVM
[Figure: area under ROC50 curve; balanced success rate]
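
The slides do not define the balanced success rate. The sketch below assumes the common definition, the mean of the per-class success rates (for two classes, the average of sensitivity and specificity), and uses made-up labels to show why it differs from the plain success rate on unbalanced one-against-rest problems.

```python
def success_rate(y_true, y_pred):
    """Fraction of all examples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_success_rate(y_true, y_pred):
    """Mean of per-class success rates (assumed definition, see lead-in)."""
    per_class = []
    for label in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == label]
        correct = sum(1 for i in members if y_pred[i] == label)
        per_class.append(correct / len(members))
    return sum(per_class) / len(per_class)


# Hypothetical one-against-rest labels: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
plain = success_rate(y_true, y_pred)
balanced = balanced_success_rate(y_true, y_pred)
```

Here the plain rate is high because the negatives dominate, while the balanced rate exposes that half of the positives were missed.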

Conclusion
• Motifs: highly discriminative features for predicting the function of a protein
• Can provide low-dimensional, interpretable classifiers
• Domain knowledge required

Things I haven't mentioned:
• Discrete motifs vs. scoring matrices
• Custom motif databases for enzyme classification
