Gene Regulation Bioinformatics: Wyeth W. Wasserman, University of British Columbia (PowerPoint presentation)

  1. Gene Regulation Bioinformatics. Wyeth W. Wasserman, University of British Columbia, www.cisreg.ca

  2. The Grand Challenge: Reliably Define the Cis-Regulatory Mechanisms of Regulons. (Diagram labels: clustering expression data; sequence analysis.) Lake Barkley 2006

  3. Inferring Gene Regulation from Expression Profiling Data

  4. Regulatory Pathway Inference from Co-Expressed Genes • What is the appeal? • Understand how signals perceived at the cell surface result in downstream changes in cell phenotype • TFs occasionally serve as therapeutically relevant targets: PPARγ, estrogen receptor, glucocorticoid receptor • Builds on data from powerful profiling technologies: expression profiling; ChIP-chip

  5. Bioinformatics and Promoter Analysis: What can we do?

  6. What can we do? • Predict transcription factor binding sites (TFBS)

  7. Representing Binding Sites for a TF
     • A single site: AAGTTAATGA
     • A set of sites represented as a consensus: VDRTWRWWSHD (IUPAC degenerate DNA)
     • A matrix describing a set of sites:

           A  14 16  4  0  1 19 20  1  4 13  4  4 13 12  3
           C   3  0  0  0  0  0  0  0  7  3  1  0  3  1 12
           G   4  3 17  0  0  2  0  0  9  1  3  0  5  2  2
           T   0  2  0 21 20  0  1 20  1  4 13 17  0  6  4

     • Logo: a graphical representation of the frequency matrix. The y-axis is information content, which reflects the strength of the pattern in each column of the matrix.
     (Figure: a column of aligned 10 bp binding sites, e.g. AAGTTAATGA, CAGTTAATAA, GAGTTAAACA, AGATTAAAGA, shown beside the sequence logo.)
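A minimal Python sketch of how a count matrix and the logo's per-column information content can be computed from aligned sites. It uses a subset of the sites listed on this slide, so its counts differ from the 15-column matrix shown; the uniform 0.25 background in `information_content` is an assumption (the slide does not state one), and the function names are illustrative.

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """Position frequency matrix (raw counts per base) from aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return {b: [sum(1 for s in sites if s[i] == b) for i in range(length)]
            for b in BASES}


def information_content(pfm):
    """Per-column information content in bits, assuming a uniform 0.25 background:
    IC = 2 - H, where H is the Shannon entropy of the column's base frequencies."""
    ic = []
    for i in range(len(pfm["A"])):
        n = sum(pfm[b][i] for b in BASES)
        entropy = 0.0
        for b in BASES:
            p = pfm[b][i] / n
            if p > 0:  # 0 * log(0) is taken as 0
                entropy -= p * math.log2(p)
        ic.append(2.0 - entropy)
    return ic


# A subset of the binding sites shown on the slide.
sites = ["AAGTTAATGA", "CAGTTAATAA", "GAGTTAAACA", "CAGTTAATTA",
         "GAGTTAATAA", "CAGTTATTCA", "CAGTTAATCA", "AGATTAAAGA",
         "AAGTTAACGA", "AGGTTAACGA", "ATGTTGATGA"]
pfm = count_matrix(sites)
for b in BASES:
    print(b, pfm[b])
print([round(x, 2) for x in information_content(pfm)])
```

Columns where every site agrees (the central TT and the final A in this subset) reach the maximum of 2 bits, which is why those stacks are the tallest in a logo.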

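  8. Conversion of PFM to Position Specific Scoring Matrix (PSSM)
     Add the following features to the matrix profile: (1) correct for nucleotide frequencies in the genome; (2) weight for the confidence (depth) in the pattern; (3) convert to log-scale probability for easy arithmetic. Each PSSM entry is computed as

         $\log\left(\frac{f(b,i) + s(n)}{p(b)}\right)$

     where f(b,i) is the frequency of base b at position i, s(n) is a pseudocount that depends on the number of sites n, and p(b) is the background frequency of base b.

         PFM                    PSSM
         A  5  0  1  0  0       A   1.6 -1.7 -0.2 -1.7 -1.7
         C  0  2  2  4  0       C  -1.7  0.5  0.5  1.3 -1.7
         G  0  3  1  0  4       G  -1.7  1.0 -0.2 -1.7  1.3
         T  0  0  1  1  1       T  -1.7 -1.7 -0.2 -0.2 -0.2

     Example: TGCTG = -1.7 + 1.0 + 0.5 - 0.2 + 1.3 = 0.9 (the PSSM entry for each base at its position, summed).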
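The conversion can be sketched in Python as below. The exact weighting is an assumption, since the slide gives only the schematic form: this sketch uses a total pseudocount s(n) = √n split across bases in proportion to p(b), normalizes by n + s(n), and takes log base 2. With a uniform background p(b) = 0.25 this reproduces the PSSM values and the TGCTG = 0.9 score on the slide. `scan` is a hypothetical helper that scores every window of a longer sequence; it expects uppercase A/C/G/T only.

```python
import math

BASES = "ACGT"


def pfm_to_pssm(pfm, background=None):
    """Log2-odds PSSM from a count matrix (assumed weighting, see lead-in):
    W(b, i) = log2( ((f(b,i) + sqrt(n) * p(b)) / (n + sqrt(n))) / p(b) )
    where n is the number of sites counted in column i."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pssm = {b: [] for b in BASES}
    for i in range(len(pfm["A"])):
        n = sum(pfm[b][i] for b in BASES)
        s = math.sqrt(n)  # total pseudocount s(n)
        for b in BASES:
            prob = (pfm[b][i] + s * background[b]) / (n + s)
            pssm[b].append(math.log2(prob / background[b]))
    return pssm


def score(pssm, seq):
    """Sum the per-position weights for a sequence as long as the matrix."""
    return sum(pssm[base][i] for i, base in enumerate(seq))


def scan(pssm, seq):
    """Score every window of a longer sequence; returns (start, score) pairs."""
    width = len(pssm["A"])
    return [(i, score(pssm, seq[i:i + width]))
            for i in range(len(seq) - width + 1)]


# The count matrix from the slide (5 sites, 5 columns).
pfm = {"A": [5, 0, 1, 0, 0],
       "C": [0, 2, 2, 4, 0],
       "G": [0, 3, 1, 0, 4],
       "T": [0, 0, 1, 1, 1]}
pssm = pfm_to_pssm(pfm)
for b in BASES:
    print(b, " ".join(f"{w:5.1f}" for w in pssm[b]))
print(f"TGCTG = {score(pssm, 'TGCTG'):.1f}")
```

Scoring in log space is what makes the arithmetic "easy": a site's score is a plain sum over positions, and `scan` simply repeats that sum at each offset.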

  9. What can we do? • Predict TFBS • Predict Cis-Regulatory Modules (CRMs)

  10. Combinatorial interactions between TFs

  11. CRM Models • Trained models take as input a set of TF binding profiles and return significant clusters of TFBS. (Plot: model score, from -0.2 to 1, along sequence positions 100 to 5840.)
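The slide refers to trained models; as a much simpler stand-in, the hypothetical sketch below flags stretches dense in predicted sites from several different TFs. It is a density heuristic, not the trained CRM model, and the window size, thresholds, and example positions are all illustrative.

```python
def tfbs_clusters(hits, window, min_sites, min_factors):
    """Report stretches where predicted sites cluster (heuristic, not a trained model).

    hits: (position, tf_name) tuples for predicted TFBS on one sequence.
    A window [p, p + window) starting at a site p qualifies if it holds at least
    `min_sites` sites from at least `min_factors` different TFs. Qualifying
    windows that share a site are merged; each cluster is reported as
    (first_site_position, last_site_position).
    """
    hits = sorted(hits)
    clusters = []
    for i, (start, _) in enumerate(hits):
        in_window = [(p, tf) for p, tf in hits[i:] if p < start + window]
        factors = {tf for _, tf in in_window}
        if len(in_window) >= min_sites and len(factors) >= min_factors:
            last = in_window[-1][0]
            if clusters and start <= clusters[-1][1]:
                # This window begins on a site already inside the previous cluster.
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], last))
            else:
                clusters.append((start, last))
    return clusters


# Illustrative predicted sites (positions are made up).
hits = [(120, "MEF2"), (150, "SRF"), (190, "Myf"), (1500, "SRF"),
        (3000, "MEF2"), (3040, "MEF2"), (3070, "TEF-1")]
print(tfbs_clusters(hits, window=100, min_sites=3, min_factors=2))
```

Requiring several distinct factors, not just many sites, is a crude way to reflect the combinatorial interactions between TFs that CRM models aim to capture.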

  12. What can we do? • Predict TFBS • Predict CRMs • Phylogenetic Footprinting

  13. Phylogenetic Footprinting (Figure: % identity in 200 bp windows, plotted against window start position in the human sequence; actin gene compared between human and mouse.)
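A minimal sketch of the percent-identity profile behind such a plot, assuming a pairwise alignment is already available as two gapped strings of equal length (producing the alignment is out of scope here). Windows are laid out in ungapped reference (e.g. human) coordinates to match the plot's x-axis; the function name and the 200 bp default are illustrative.

```python
def percent_identity_windows(ref_aln, other_aln, window=200, step=1):
    """Percent identity in sliding windows over a pairwise alignment.

    ref_aln, other_aln: aligned strings of equal length, '-' marking gaps.
    Columns where the reference has a gap are skipped, so window starts are
    0-based positions in the ungapped reference. A gap in the other sequence
    counts as a mismatch. Returns (start, percent_identity) pairs.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    matches = [1 if o.upper() == r.upper() else 0
               for r, o in zip(ref_aln, other_aln) if r != "-"]
    # Prefix sums give each window's match count in O(1).
    prefix = [0]
    for m in matches:
        prefix.append(prefix[-1] + m)
    return [(start, 100.0 * (prefix[start + window] - prefix[start]) / window)
            for start in range(0, len(matches) - window + 1, step)]


print(percent_identity_windows("ACGTACGTAC", "ACGAACG-AC", window=5, step=5))
```

Peaks in this profile are the conserved blocks that footprinting treats as candidate regulatory regions.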

  14. What can we do? • Predict TFBS • Predict CRMs • Phylogenetic Footprinting • Motif Over-Representation

  15. Deciphering Regulation of Co-Expressed Genes (Figure labels: Co-Expressed; Controls.)

  16. oPOSSUM Procedure: set of co-expressed or co-precipitated genes → automated sequence retrieval from EnsEMBL → phylogenetic footprinting (ORCA) → detection of transcription factor binding sites (ORCA) → statistical significance of binding sites → putative mediating transcription factors
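The footprinting and site-detection steps can be connected roughly as in this sketch: windows whose identity passes a threshold are merged into conserved regions, and predicted sites that do not fall entirely inside one are discarded. This is an assumed, simplified filter, not ORCA's actual implementation; the threshold, window size, and example coordinates are hypothetical.

```python
def conserved_regions(window_scores, window, threshold):
    """Merge windows with percent identity >= threshold into half-open
    (start, end) regions. window_scores: (start, identity) pairs."""
    regions = []
    for start, identity in sorted(window_scores):
        if identity < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlapping or touching the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def hits_in_conserved(hits, regions):
    """Keep predicted sites (start, end, tf) lying wholly inside a conserved region."""
    return [(s, e, tf) for s, e, tf in hits
            if any(r_start <= s and e <= r_end for r_start, r_end in regions)]


# Hypothetical window identities (200 bp windows) and predicted sites.
regions = conserved_regions([(0, 55.0), (200, 82.0), (400, 40.0), (600, 90.0)],
                            window=200, threshold=70.0)
hits = [(150, 160, "SRF"), (250, 262, "MEF2"), (395, 405, "Myf"), (700, 710, "TEF-1")]
print(regions)
print(hits_in_conserved(hits, regions))
```

Only the surviving sites would then be counted in the over-representation statistics.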

  17. Statistical Methods for Identifying Over-represented TFBS • Z-scores: based on the number of occurrences of the TFBS relative to background; normalized for sequence length; simple binomial distribution model • Fisher exact probability scores: based on the number of genes containing the TFBS relative to background; hypergeometric probability distribution
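Both statistics can be sketched with the standard library (Python 3.8+ for `math.comb`). The exact oPOSSUM formulas may differ in detail, for example in how site width or a continuity correction is handled. Here the Z-score compares the target's site count with a per-base-pair rate estimated from the background, using the normal approximation to the binomial, and the Fisher score is a one-sided hypergeometric tail over genes. Function names and example numbers are illustrative.

```python
import math


def binomial_z(target_hits, target_bp, bg_hits, bg_bp):
    """Z-score for site occurrences, normalized for sequence length:
    the background gives a per-bp rate, the target length gives the
    expected count, and a binomial variance gives the spread."""
    rate = bg_hits / bg_bp
    expected = target_bp * rate
    sd = math.sqrt(target_bp * rate * (1.0 - rate))
    return (target_hits - expected) / sd


def fisher_upper_tail(k, n, K, M):
    """One-sided hypergeometric P(X >= k): chance that at least k of n drawn
    genes contain the site, when K of the M genes in total contain it."""
    total = math.comb(M, n)
    # math.comb returns 0 when asked to choose more items than exist.
    return sum(math.comb(K, i) * math.comb(M - K, n - i)
               for i in range(k, min(n, K) + 1)) / total


# Illustrative: 30 sites in 10 kb of target vs 100 sites in 100 kb of background.
print(round(binomial_z(30, 10_000, 100, 100_000), 2))
# Illustrative: all 5 target genes have the site; 5 of 10 genes overall do.
print(fisher_upper_tail(5, 5, 5, 10))
```

The two scores answer different questions: the Z-score can be driven by many sites in a few genes, while the Fisher score rewards a site that is present in many of the co-expressed genes.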

  18. Validation using Reference Gene Sets. Panel A: muscle-specific (23 input; 16 analyzed). Panel B: liver-specific (20 input; 12 analyzed). Columns in each panel: Rank, Z-score, Fisher. 8.83e-08 SRF 1 21.41 1.18e-02 HNF-1 1 38.21 9.50e-03 MEF2 2 18.12 8.05e-04 HLF 2 11.00 1.22e-01 c-MYB_1 3 14.41 1.25e-03 Sox-5 3 9.822 1.60e-01 Myf 4 13.54 3.83e-03 FREAC-4 4 7.101 4.66e-02 TEF-1 5 11.22 2.87e-03 HNF-3beta 5 4.494 4.20e-01 deltaEF1 6 10.88 1.09e-02 SOX17 6 4.229 Yin-Yang 7 4.070 1.16e-01 S8 7 5.874 2.93e-01 1.61e-02 Irf-1 8 5.245 2.63e-01 S8 8 3.821 Irf-1 9 3.477 1.69e-01 Thing1-E47 9 4.485 4.97e-02 COUP-TF 10 3.286 2.97e-01 HNF-1 10 3.353 2.93e-01 (Legend: TFs with experimentally verified sites in the reference sets.)

  19. Empirical Selection of Parameters based on Reference Studies (Figure: Z-score, y-axis from -20 to 40, plotted against Fisher p-value, log-scale x-axis from 1.0E-09 to 1.0E-01; series labeled Muscle, Liver and NF-κB; labeled points include SRF, MEF2, TEF-1, Myf, HNF-1, HNF-3β, FREAC-2, cEBP, SP1, p65, p50, c-Rel and NF-κB; lines mark the Z-score cutoff and the Fisher cutoff.)
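One way to apply two empirically chosen cutoffs is to keep only TFs that pass both, as sketched below. The cutoff values (Z ≥ 10, Fisher p ≤ 0.01) are hypothetical, not the values selected in the study, and requiring both criteria is an assumption. The input rows are the c-Fos results reported later in this deck.

```python
def passes_cutoffs(results, z_cutoff, p_cutoff):
    """Keep (tf, z_score, fisher_p) rows with z_score >= z_cutoff AND
    fisher_p <= p_cutoff, ordered by decreasing Z-score."""
    kept = [r for r in results if r[1] >= z_cutoff and r[2] <= p_cutoff]
    return sorted(kept, key=lambda r: r[1], reverse=True)


# (TF, Z-score, Fisher p) from the c-Fos results table.
cfos = [("c-FOS", 17.53, 2.60e-05), ("RREB-1", 8.899, 1.41e-01),
        ("PPARgamma-RXRal", 3.991, 2.98e-01), ("CREB", 3.626, 1.25e-01),
        ("E2F", 2.965, 7.67e-02)]
# Hypothetical cutoffs, for illustration only.
print([tf for tf, _, _ in passes_cutoffs(cfos, z_cutoff=10.0, p_cutoff=0.01)])
```

Calibrating the cutoffs on reference sets with known regulators, as the slide describes, is what gives these thresholds meaning; the code only applies them.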

  20. c-Myc SAGE Data • The c-Myc transcription factor dimerizes with the Max protein • Key regulator of cell proliferation, differentiation and apoptosis • Menssen and Hermeking identified 216 different SAGE tags, corresponding to unique mRNAs, that were induced after adenoviral expression of c-Myc in HUVEC cells • They then confirmed the induction of 53 genes using microarray analysis and RT-PCR

  21. Induced Genes after Ectopic Expression of c-Myc (SAGE) (53 input; 36 analyzed)

      TF        Class             Rank  Z-score  Fisher    No. Genes
      Myc-Max   bHLH-ZIP             1    21.68  5.35e-03          7
      Staf      ZN-FINGER, C2H2      2    20.17  1.70e-02          2
      Max       bHLH-ZIP             3    18.32  2.16e-02         12
      SAP-1     ETS                  4    13.23  1.61e-04         13
      USF       bHLH-ZIP             5    11.90  1.84e-01         16
      SP1       ZN-FINGER, C2H2      6    11.68  4.40e-02         12
      n-MYC     bHLH-ZIP             7    11.11  1.55e-01         20
      ARNT      bHLH                 8    11.11  1.55e-01         20
      Elk-1     ETS                  9    10.92  3.88e-03         19
      Ahr-ARNT  bHLH                10    10.17  1.11e-01         25

  22. c-Fos Microarray Experiment • In a study examining the role of transcriptional repression in oncogenesis, Ordway et al. compared the gene expression profiles of fibroblasts transformed by c-fos with those of the parental 208F rat fibroblast cell line • We mapped the list of 252 induced Affymetrix Rat Genome U34A GeneChip sequences to 136 human orthologs

  23. Induced Genes after Ectopic Expression of c-Fos (Affymetrix) (136 input; 86 analyzed)

      TF               Class             Rank  Z-score  Fisher    No. Genes
      c-FOS            bZIP                 1   17.53   2.60e-05         45
      RREB-1           ZN-FINGER, C2H2      2    8.899  1.41e-01          1
      PPARgamma-RXRal  NUCLEAR RECEPTOR     3    3.991  2.98e-01          1
      CREB             bZIP                 4    3.626  1.25e-01         10
      E2F              Unknown              5    2.965  7.67e-02         15

  24. NF-κB inhibition microarray study

  25. Genes significantly down-regulated by the NF-κB pathway inhibitor (326 input; 179 analyzed)

      TF          Class         Rank  Z-score  Fisher    No. Genes
      p65         REL              1    36.57  5.66e-12         62
      NF-kappaB   REL              2    32.58  5.82e-11         61
      c-REL       REL              3    26.02  8.59e-08         63
      Irf-2       TRP-CLUSTER      4    20.39  5.74e-04          6
      SPI-B       ETS              5    16.59  1.23e-03        135
      Irf-1       TRP-CLUSTER      6    15.4   9.55e-04         23
      Sox-5       HMG              7    15.38  2.56e-02        126
      p50         REL              8    14.72  2.23e-03         19
      Nkx         HOMEO            9    13.66  2.29e-03        111
      Bsap        PAIRED          10    13.2   9.92e-02          1
      FREAC-4     FORKHEAD        11    12.05  1.66e-03         92

  26. Identifying over-represented pairs of TFBSs in co-expressed genes (Diagram: background and target gene sets, with pairs of sites separated by a distance d.) • Calculate a Fisher exact probability that the pair of sites is over-represented • Correct for multiple testing
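A hypothetical sketch of the pair analysis: for each pair of TFs, count the genes in which a site for one lies within d bp of a site for the other, compare target and background counts with a one-sided Fisher (hypergeometric) test, and apply a Bonferroni correction across the pairs tested. Treating the background as disjoint from the target, excluding homotypic pairs, and choosing Bonferroni over other corrections are assumptions; gene names, positions, and d = 50 are made up for illustration.

```python
import math
from itertools import combinations


def has_pair(hits, tf1, tf2, d):
    """True if some tf1 site and some tf2 site lie within d bp of each other.
    hits: list of (position, tf_name) for one gene."""
    pos1 = [p for p, tf in hits if tf == tf1]
    pos2 = [p for p, tf in hits if tf == tf2]
    return any(abs(a - b) <= d for a in pos1 for b in pos2)


def fisher_upper_tail(k, n, K, M):
    """One-sided hypergeometric P(X >= k) for n draws from M genes, K of them 'positive'."""
    return sum(math.comb(K, i) * math.comb(M - K, n - i)
               for i in range(k, min(n, K) + 1)) / math.comb(M, n)


def pair_enrichment(target, background, tfs, d):
    """target, background: dict gene -> hits. Returns {(tf1, tf2): Bonferroni-corrected p}."""
    pairs = list(combinations(sorted(tfs), 2))
    n, M = len(target), len(target) + len(background)
    results = {}
    for tf1, tf2 in pairs:
        k = sum(has_pair(h, tf1, tf2, d) for h in target.values())
        K = k + sum(has_pair(h, tf1, tf2, d) for h in background.values())
        p = fisher_upper_tail(k, n, K, M)
        results[(tf1, tf2)] = min(1.0, p * len(pairs))  # Bonferroni
    return results


target = {"g1": [(100, "SRF"), (130, "MEF2")],
          "g2": [(500, "MEF2"), (540, "SRF")],
          "g3": [(10, "SRF"), (55, "MEF2"), (900, "Myf")]}
background = {f"b{i}": [(100, "SRF"), (400, "MEF2")] for i in range(7)}
print(pair_enrichment(target, background, ["SRF", "MEF2", "Myf"], d=50))
```

The number of pairs grows quadratically with the number of TF profiles, which is why the correction for multiple testing matters here more than for single sites.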
