Regulatory Motif Prediction in DNA
• Introduction: toward transcription regulatory networks
• Ab initio discovery of motifs by over-representation of regular expressions
• The weight matrix representation of regulatory motifs
• Ab initio discovery with weight matrices: MEME and the Gibbs Sampler
• Discovery of regulatory modules in higher eukaryotes
• Ab initio regulatory motif discovery in phylogenetically related sequences: PhyloGibbs
Erik van Nimwegen, Division of Bioinformatics, Biozentrum, Universität Basel, Swiss Institute of Bioinformatics
E. van Nimwegen, EMBnet Geneve, Feb 2006.

Transcription Regulation Networks
[Figure: regulators (transcription factors) bind to binding sites in the promoters of genes (each gene drawn with its ATG start); together these interactions form the regulatory network]
To reconstruct the network we need to identify all binding sites genome-wide and the factor(s) that bind at each site.

Transcription Regulation Networks
[Figure: curves labeled metabolic genes, transcription factors, and cell cycle related genes]
• The number of transcription regulators increases roughly quadratically with the size of the genome.
• The number of regulators per gene thus increases linearly with the size of the genome.
From: E. van Nimwegen, Trends in Genetics 19, 479-484 (2003)
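The step from the first bullet to the second can be written out in one line. The symbols are my own and not from the slide: N stands for genome size (assumed here to be measured as the number of genes) and R(N) for the number of transcription regulators.

```latex
R(N) \propto N^{2}
\quad\Longrightarrow\quad
\frac{R(N)}{N} \propto N
```

Dividing the quadratic count of regulators by the number of genes leaves one power of N, which is the linear growth in regulators per gene.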

Transcription Regulation Networks
Knowledge from direct experimentation:
E. coli:
• Almost 200,000 papers in PubMed. Over 17,000 on transcription.
• About 300 TFs.
• Fewer than 100 TFs with at least 1 known binding site.
• About 750 known sites in total (of 2,500-8,000?).
S. cerevisiae:
• Almost 60,000 papers in PubMed. Over 10,000 on transcription.
• About 350 TFs.
• About 65 TFs with at least 1 known binding site.
• About 450 known sites in total (of > 10,000?).
Even in intensely studied model organisms the majority of regulatory sites are not known.

Ab initio discovery of regulatory sites
General Approaches:
1. Collect sets of (intergenic) sequences that are thought to contain binding sites for a common regulatory factor, then search for over-represented short sequence motifs among them. Examples of such sets:
• Upstream regions of co-regulated genes.
• Sequence fragments pulled down with ChIP.
[Figure: microarray experiments (gene expression), binding experiments (ChIP-on-chip), and other external biological knowledge each yield sets of sequences containing sites for a common regulatory factor]

Representation by consensus sequence or regular expression
The experimentally known binding sites of MBP1 (yeast TF):
ACGCGT ACGCGT ACGCGA ACGCGT ACGCGA CCGCGT TCGCGA ACGCGT ACGCGT ACGCGT ACGCGT ACGCGT
So-called IUPAC symbols are used to represent sets of nucleotides. For instance: W = {A,T} and H = {A,C,T}.
Consensus sequence (take the majority base in each column): ACGCGT
Regular expression (take the IUPAC symbol for the set of bases occurring in each column): HCGCGW
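The two column-wise rules on this slide are easy to check by hand, but a short sketch makes them explicit. The function names below are my own, and the site list is transcribed from the slide; the IUPAC table is the standard nucleotide code.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases each symbol denotes.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

# The MBP1 sites as listed on the slide.
MBP1_SITES = [
    "ACGCGT", "ACGCGT", "ACGCGA", "ACGCGT", "ACGCGA", "CCGCGT",
    "TCGCGA", "ACGCGT", "ACGCGT", "ACGCGT", "ACGCGT", "ACGCGT",
]

def consensus(sites):
    """Majority base in each alignment column (ties go to the base seen first)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))

def iupac_pattern(sites):
    """IUPAC symbol for the set of bases observed in each alignment column."""
    return "".join(IUPAC[frozenset(col)] for col in zip(*sites))

print(consensus(MBP1_SITES))      # ACGCGT
print(iupac_pattern(MBP1_SITES))  # HCGCGW
```

Note how the regular expression keeps the rare C and T in the first column that the consensus discards, at the cost of treating all three bases as equally acceptable.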

Scan for over-represented patterns
[Figure: promoter regions of Genes A-E, each ending at the gene's ATG, with the occurrences AGCTCG, TGCTCG, TGCTCG, AGCACG, and TGCACG marked]
• Exhaustively go through all possible consensus sequences (or regular expressions) s up to some length L.
• For a given motif, say s = WGCWCG, find all occurrences.
• Determine the significance of the motif. Roughly speaking, the significance is given by the probability of getting that many occurrences in random sequences, e.g. P(WGCWCG) = 0.034.
• Rank all motifs by significance and report the motifs with the highest significance.
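The counting and significance steps can be sketched as follows. This is a minimal illustration under assumptions that are mine, not the slide's: a uniform background (each base with probability 1/4), independent positions, a single strand, and a simple binomial tail as the significance measure. Tools such as YMF and Weeder use more careful statistics, and the toy sequences are hypothetical.

```python
import re
from math import comb

# Standard IUPAC codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
    "V": "ACG", "N": "ACGT",
}

def count_occurrences(motif, sequences):
    """Count (possibly overlapping) matches of an IUPAC motif on the given strand."""
    # A lookahead matches at every start position without consuming characters.
    pattern = re.compile("(?=" + "".join(f"[{IUPAC[c]}]" for c in motif) + ")")
    return sum(len(pattern.findall(seq)) for seq in sequences)

def match_probability(motif):
    """Probability that a random position matches, assuming uniform background."""
    p = 1.0
    for c in motif:
        p *= len(IUPAC[c]) / 4
    return p

def binomial_tail(k, n, p):
    """P(at least k successes in n independent trials with success probability p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def significance(motif, sequences):
    """Return (occurrences, probability of at least that many in random sequence)."""
    k = count_occurrences(motif, sequences)
    n = sum(max(len(s) - len(motif) + 1, 0) for s in sequences)
    return k, binomial_tail(k, n, match_probability(motif))

# Hypothetical toy promoters containing the occurrences shown in the figure.
toy = ["TTAGCTCGAATGCTCGCC", "GGATGCACGTTAGCACGA"]
k, p = significance("WGCWCG", toy)
print(f"{k} occurrences, P = {p:.3g}")
```

Ranking then amounts to computing this value for every candidate motif and sorting. Because many motifs are tested, the raw probabilities would need a multiple-testing correction before being read as evidence.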

Over-representation of consensus and regular expression patterns
Example algorithms:
• YMF (Sinha and Tompa)
• Weeder (Pavesi et al.)
Advantages:
• The search is exhaustive. If a significant motif exists it is guaranteed to be found.
Disadvantages:
• Consensus sequences and regular expressions are not necessarily a good representation of binding sites. (next slides)
• The significant motifs are often partially redundant. For example: ATTACTAT, WWACTWTTA, AATTAC, ATTACGG. Now which motif is the “correct” motif?

The weight matrix representation of regulatory motifs
Alignment of known fruR binding sites:
CTGAATCGATTTTAT
CTGAATCGTTTCAAT
CTGAATTGATTCAGG
CTGAAACCATTCAAG
GTGAATCGATACTTT
CTGAAACGCTTCAGC
CTGAAACGTTTTTGC
TTGAAACGTTTCAGC
GTGAATCGTTCAAGC
CTGAATCGGTTAACT
GTTAAGCGATTCAGC
cTGAAtCG*TTcAg*
$w^i_\alpha$ = probability of finding base $\alpha$ at position $i$.
For instance: $w^1_A = 0.07$, $w^1_C = 0.53$, $w^1_G = 0.27$, $w^1_T = 0.13$.
Probability that a site for the TF represented by $w$ will have sequence $s$:
$$P(s \mid w) = \prod_{i=1}^{l} w^i_{s_i}$$
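As a rough sketch of how such a matrix can be estimated from the alignment: the slide does not say how the entries were computed, so the pseudocount of 1 per base below is my assumption. It does reproduce the slide's first-column values (0.07, 0.53, 0.27, 0.13) from the observed counts A=0, C=7, G=3, T=1 over 11 sites. Function names are hypothetical.

```python
# fruR sites as listed on the slide (11 sites, 15 bases each).
FRUR_SITES = [
    "CTGAATCGATTTTAT", "CTGAATCGTTTCAAT", "CTGAATTGATTCAGG", "CTGAAACCATTCAAG",
    "GTGAATCGATACTTT", "CTGAAACGCTTCAGC", "CTGAAACGTTTTTGC", "TTGAAACGTTTCAGC",
    "GTGAATCGTTCAAGC", "CTGAATCGGTTAACT", "GTTAAGCGATTCAGC",
]

def weight_matrix(sites, pseudocount=1.0):
    """One dict per column: base -> (count + pseudocount) / (n + 4 * pseudocount)."""
    n = len(sites)
    return [
        {b: (col.count(b) + pseudocount) / (n + 4 * pseudocount) for b in "ACGT"}
        for col in zip(*sites)
    ]

def site_probability(s, wm):
    """P(s | w): product over positions i of the entry for base s[i] in column i."""
    p = 1.0
    for i, base in enumerate(s):
        p *= wm[i][base]
    return p

wm = weight_matrix(FRUR_SITES)
print({b: round(v, 2) for b, v in wm[0].items()})
print(site_probability("CTGAATCGATTCAGC", wm))
```

Unlike a regular expression, the matrix assigns every 15-mer a nonzero probability, so sites that deviate from the consensus at one position are penalized rather than excluded.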

The weight matrix representation of regulatory motifs
Alignment of known fruR binding sites (the same alignment as on the previous slide): cTGAAtCG*TTcAg*
The quality of an alignment of putative sites can be measured by the information score $I$:
$$f^i_\alpha = \frac{n^i_\alpha}{n}, \qquad b_\alpha = \text{background frequency of base } \alpha, \qquad I = \sum_{i,\alpha} f^i_\alpha \log\!\left(\frac{f^i_\alpha}{b_\alpha}\right)$$
where $n^i_\alpha$ is the number of sites with base $\alpha$ at position $i$ and $n$ is the number of sites.
$$\frac{P(\text{sites from a WM})}{P(\text{sites from bg})} \approx e^{nI}$$
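A minimal sketch of the information score, assuming natural logarithms (consistent with the $e^{nI}$ on the slide), a uniform background unless one is supplied, and the usual convention that a base with zero frequency contributes nothing to the sum. The function name and the toy alignments are hypothetical.

```python
from math import log

def information_score(sites, background=None):
    """I = sum over columns i and bases a of f[i][a] * log(f[i][a] / b[a])."""
    if background is None:
        background = {b: 0.25 for b in "ACGT"}  # assumed uniform background
    n = len(sites)
    total = 0.0
    for col in zip(*sites):
        for base in "ACGT":
            f = col.count(base) / n
            if f > 0:  # 0 * log(0) is taken as 0
                total += f * log(f / background[base])
    return total

# A perfectly conserved alignment scores log(4) per column.
print(information_score(["ACGT", "ACGT", "ACGT"]))
# A column that matches the background exactly scores 0.
print(information_score(["A", "C", "G", "T"]))
```

The score is a sum of per-column relative entropies, so it grows with both the width of the motif and the conservation of each column.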

Ab initio motif discovery with weight matrices
Assume the input set of ‘co-regulated’ sequences is a mixture of “random” background sequence plus a number of samples from a weight matrix.
Unknowns:
1. The weight matrix
2. The number of sites
3. The positions of the sites
MEME approach: Search the space of WMs for the WM that maximizes the likelihood of the data (summing over all possible binding site configurations for each WM). The likelihood is maximized using “Expectation Maximization”.
Gibbs Sampler approach: Search the space of binding site configurations for the configuration that maximizes the likelihood of all sites deriving from a common WM (integrating over all possible WMs) and all other sequence deriving from background. The space of configurations is searched through “Gibbs Sampling”.
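To make the Gibbs sampling idea concrete, here is a deliberately simplified sketch, not the algorithm of any particular tool. The simplifications are mine: exactly one site per sequence, a fixed motif width, a uniform background, a single strand, and at least two input sequences. The pseudocount-based column probabilities correspond to integrating over weight matrices under a uniform Dirichlet prior when one site is resampled given the others. All names are hypothetical.

```python
import random

def gibbs_site_sampler(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Sample one site start per sequence; returns the final list of positions."""
    rng = random.Random(seed)
    # Start from a random configuration of sites.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Column counts from the current sites in all other sequences.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, positions)) if j != i]
            n = len(others)
            wm = [{b: (col.count(b) + pseudocount) / (n + 4 * pseudocount)
                   for b in "ACGT"} for col in zip(*others)]
            # Likelihood ratio (WM vs. uniform background) for each candidate start.
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for k in range(width):
                    ratio *= wm[k][seq[start + k]] / 0.25
                weights.append(ratio)
            # Resample the site in sequence i in proportion to those ratios.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions

# Hypothetical toy input: three short sequences, motif width 6.
toy = ["ACGTACGTAC", "TTGACGCGTA", "CCACGCGTTT"]
print(gibbs_site_sampler(toy, width=6, iterations=50))
```

Because the sampler is stochastic, a single run can settle in a poor configuration; in practice one would run it from several seeds and compare the resulting alignments, for example by their information score. Handling an unknown number of sites, as in the slide's list of unknowns, requires a richer configuration space than this sketch uses.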
