Motif discovery Morgane Thomas-Chollier Computational systems - PowerPoint PPT Presentation

Motif discovery
Morgane Thomas-Chollier
Computational systems biology - IBENS
mthomas@biologie.ens.fr
M2 - Computational analysis of cis-regulatory sequences 2015/2016
Denis Thieffry, Jacques van Helden and Carl Herrmann kindly shared some of their slides.

Co-expressed genes
Clusters of co-expressed genes during oxidative stress in yeast.
Are they co-regulated? If so, what is the TF?

Aim of the course: motif discovery
1 - Understand what a motif discovery problem is
2 - Motif discovery approaches
  - Word counting
  - Gibbs sampling
3 - Important parameters
[Figure: Gene 1 to Gene 6]

Co-expressed genes
Knowing that a set of genes is co-regulated, one can expect that their upstream regions contain some regulatory signal.
[Figure: Gene 1 to Gene 6]

A motif discovery problem
Upstream sequences of the co-expressed genes (TF: unknown):
5'- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT …HIS7
5'- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG …ARO4
5'- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT …ILV6
5'- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC …THR4
5'- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA …ARO1
5'- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA …HOM2
5'- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA …PRO3
Problem: if there is a common regulating factor, can we discover its motif (some signal) on the basis of these sequences ONLY?
- We have a set of sequences.
- We suspect that they share some functional signal.
- We do not know which transcription factors are involved in this regulation.
- We do not know the cis-acting elements.

Typical motif discovery problems
[Figure: diagram of typical inputs to motif discovery. Recoverable labels: ChIP regions / binding regions; complete genome / whole set of upstream regions; microarray, RNA-seq / clusters of co-expressed genes; gene fusion analysis, phylogenetic profiles, synteny / clusters of evolutionarily related genes; comparative genomics / clusters of orthologous genes; coding region, upstream region; predicted regulatory elements in (non-coding) regions; predicted transcription factors.]

Principle: detect unexpected patterns
TF → target gene. Spaces set off the sites highlighted on the slide:
5'- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAG AAAAGAGTCA GACATCGAAACATACAT …HIS7
5'- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCG AAATGACTCA ACG …ARO4
5'- CACATCCAACGAATCACCTCACCGTTATCG TGACTCACTT TCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT …ILV6
5'- TGCGAAC AAAAGAGTCA TTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC …THR4
5'- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATA TGACTCATCC CGAACATGAAA …ARO1
5'- ATTGAT TGACTCATTT TCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA …HOM2
5'- GGCGCCACAGTCCGCGTTTGGTTATCCGGC TGACTCATTCTGACTCTTTT TTGGAAAGTGTGGCATGTGCTTCACACA …PRO3
- Binding sites are represented as "words" = "strings" = "k-mers"
  - e.g. acgtga is a 6-mer
- The signal is likely to be more frequent in the upstream regions of the co-regulated genes than in a random selection of genes.
- We will thus detect over-represented words.

Motif discovery using word counting
Idea: motifs corresponding to binding sites are generally repeated in the dataset → capture this statistical signal.
Algorithm
- Count occurrences of all k-mers in a set of related sequences (promoters of co-expressed genes, ChIP bound regions, ...)
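The counting step can be sketched in a few lines of Python. This is only an illustration, not the implementation used in the course: the function name `count_kmers` and the two toy sequences are assumptions, and only the given strand is scanned.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a list of DNA sequences.

    Hypothetical helper for illustration; scans the given strand only.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Toy input (made up for this example, not taken from the slides).
toy_sequences = ["ATGACTCAT", "GGTGACTCA"]
toy_counts = count_kmers(toy_sequences, 6)

# Print each 6-mer with its number of occurrences, most frequent first.
for word, occurrences in toy_counts.most_common():
    print(word, occurrences)
```

Raw counts alone are not a significance measure. They are only the input to whatever scoring criterion is applied afterwards.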

A more relevant criterion for over-representation
- The most frequent patterns do not reveal the motifs specifically bound by specific transcription factors.
- They merely reflect the compositional biases of upstream sequences.
- A more relevant criterion for over-representation is to detect patterns which are more frequent in the upstream sequences of the selected (co-regulated) genes than the random expectation.
- The random expectation is calculated by counting the frequency of each pattern in the complete set of upstream sequences (all genes of the genome). => "Background"
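One way to turn this criterion into code is to compare observed counts in the selected sequences with the counts expected from background frequencies. The sketch below rests on several assumptions that the slides do not specify: the function names, the pseudocount that keeps unseen words from having zero expected frequency, and the use of a plain observed/expected ratio as the score (tools typically use a proper statistical test instead).

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count overlapping k-mers on the given strand (hypothetical helper)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def over_representation(selected, background, k, pseudo=1.0):
    """Return {word: (observed, expected, ratio)} for words seen in `selected`.

    Expected counts come from word frequencies in `background`.
    `pseudo` is an assumed pseudocount, added so that words absent from
    the background still get a non-zero expected frequency.
    """
    observed = kmer_counts(selected, k)
    bg = kmer_counts(background, k)
    n_observed = sum(observed.values())   # total word positions in selected set
    n_bg = sum(bg.values())               # total word positions in background
    n_words = 4 ** k                      # number of possible DNA k-mers
    results = {}
    for word, occ in observed.items():
        bg_freq = (bg[word] + pseudo) / (n_bg + pseudo * n_words)
        expected = bg_freq * n_observed
        results[word] = (occ, expected, occ / expected)
    return results


# Toy data, invented for the example: the background is rich in poly-T.
selected = ["TGACTCA", "TGACTCA", "TTTTTTT"]
background = ["TTTTTTTTTTTT", "TTTTTTTTTTTT", "ACGTACGTACGT"]
scores = over_representation(selected, background, 6)

# Rank words by observed/expected ratio, highest first.
for word, (occ, exp, ratio) in sorted(scores.items(), key=lambda kv: -kv[1][2]):
    print(f"{word}\tobs={occ}\texp={exp:.4f}\tratio={ratio:.1f}")
```

In this toy input, TTTTTT is as frequent as TGACTC in the selected set, yet it scores far lower because it is also common in the background. This is the compositional-bias effect described above.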
