Retroviruses integrate into a shared, non-palindromic motif (Paul Kirk, MASAMB 2016, Cambridge, October 4, 2016)

  1. Retroviruses integrate into a shared, non-palindromic motif. Paul Kirk, MASAMB 2016, Cambridge, October 4, 2016

  3. Central dogma of molecular biology (Crick, 1956). General transfers of biological sequential information: DNA replication (DNA → DNA), transcription (DNA → RNA) and translation (RNA → protein). There are also special transfers of sequential information. 1 of 22 MRC | Medical Research Council

  6. For example: retroviruses. A retrovirus carries viral RNA together with the enzymes reverse transcriptase, integrase and protease. Retroviruses are obligate parasites: they require a host cell to complete their “life” cycle. Examples: HIV, HTLV-1, ...

  8. For example: retroviruses. [Diagram: a retrovirus infects a host cell, which contains the host DNA.]

  11. For example: retroviruses. [Diagram: inside the host cell, reverse transcriptase copies the viral RNA into viral DNA, and integrase cuts the host DNA ("SNIP!").]

  12. For example: retroviruses. [Diagram: integrase inserts the viral DNA into the host DNA, where it becomes a provirus.]

  15. Characterising retroviral integration sites. [Diagram: the retrovirus DNA intermediate (TGAC...CGT) next to the host DNA (...ATCCCGCTTA...), which is cut ("CUT!").]

  17. Characterising retroviral integration sites. [Diagram: the provirus (TGAC...CGT) is pasted into the host DNA (...ATCCCGCTTA...) ("PASTE!").] We would like to characterise the target integration site, i.e. the regions flanking the provirus. Is there a motif?
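The cut-and-paste cartoon can be mimicked with plain string operations. This is a minimal sketch under stated assumptions: host DNA and provirus are plain strings, the integration point is a known 0-based offset, and the function names `integrate` and `flanking_regions` (and the toy provirus string) are hypothetical. It deliberately ignores the biochemistry of real integration.

```python
def integrate(host: str, provirus: str, site: int) -> str:
    """Toy 'cut and paste': cut the host DNA at 0-based offset `site` and insert the provirus."""
    if not 0 <= site <= len(host):
        raise ValueError("site must lie within the host sequence")
    return host[:site] + provirus + host[site:]


def flanking_regions(genome: str, site: int, provirus_len: int, width: int) -> tuple[str, str]:
    """Return (left, right): up to `width` host bases on each side of the inserted provirus."""
    left = genome[max(0, site - width):site]
    right_start = site + provirus_len
    right = genome[right_start:right_start + width]
    return left, right


host = "ATCCCGCTTA"   # host DNA shown on the slide
provirus = "TGACCGT"  # hypothetical toy stand-in, NOT a real proviral sequence
genome = integrate(host, provirus, 5)  # cutting at the midpoint is an assumption
print(genome)                                         # ATCCCTGACCGTGCTTA
print(flanking_regions(genome, 5, len(provirus), 5))  # ('ATCCC', 'GCTTA')
```

The flanks recovered this way are exactly the "target integration site" the slide asks about.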

  18. Aligning integration sites. Given a collection of integration sites, we can align them according to the position of the provirus...

      ...ATCCCGCTTA...  TGAC...CGT  INTEGRATION SITE 1
      ...TTAGAGGGTA...  TGAC...CGT  INTEGRATION SITE 2
      ...AACGAACTTC...  TGAC...CGT  INTEGRATION SITE 3
      ...TTCTCCCGGA...  TGAC...CGT  INTEGRATION SITE 4
      ...AGCTTCCTGC...  TGAC...CGT  INTEGRATION SITE 5

  19. Aligning integration sites. Given a collection of integration sites, we can align them according to the position of the provirus, and then ignore/remove/mask the provirus sequence, so that we just look at the target sites:

      ...ATCCCGCTTA...  INTEGRATION SITE 1
      ...TTAGAGGGTA...  INTEGRATION SITE 2
      ...AACGAACTTC...  INTEGRATION SITE 3
      ...TTCTCCCGGA...  INTEGRATION SITE 4
      ...AGCTTCCTGC...  INTEGRATION SITE 5
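Aligning on the provirus and then masking it amounts to cutting a fixed-width window of host sequence centred on each integration point. A minimal sketch, assuming integration points are known as 0-based offsets into a single host strand (strand handling is ignored); `target_sites` and the toy host string are hypothetical.

```python
def target_sites(host: str, sites: list[int], half_width: int) -> list[str]:
    """Extract 2*half_width host bases centred on each integration point.

    Because the windows are cut from the host sequence itself, the provirus is
    never included: this mirrors aligning on the provirus position and then
    masking it. Sites closer than half_width to either end are skipped.
    """
    windows = []
    for site in sites:
        start, end = site - half_width, site + half_width
        if start >= 0 and end <= len(host):
            windows.append(host[start:end])
    return windows


host = "GGATCCCGCTTAGG"  # hypothetical toy host sequence
print(target_sites(host, [7, 5, 1, 12], half_width=5))
# ['ATCCCGCTTA', 'GGATCCCGCT']  (sites 1 and 12 are too close to the ends)
```

Every window has the same length and the same offset relative to the integration point, which is what makes the position-by-position summaries that follow meaningful.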

  20. Summarising a collection of target sites

      Example (5 sequences):

      Sequences   Complements   Reverse complements
      ...ATC...   ...TAG...     ...GAT...
      ...TTA...   ...AAT...     ...TAA...
      ...AAC...   ...TTG...     ...GTT...
      ...TTC...   ...AAG...     ...GAA...
      ...AGC...   ...TCG...     ...GCT...

      Consensus sequence: just take the most frequent letter at each position: ...ATC...

      Position probability matrix (PPM), $P$: estimate the probability of each letter at each position:

      $$
      P = \begin{array}{c|ccccc}
      \mathrm{A} & \cdots & 3/5 & 1/5 & 1/5 & \cdots \\
      \mathrm{T} & \cdots & 2/5 & 3/5 & 0 & \cdots \\
      \mathrm{C} & \cdots & 0 & 0 & 4/5 & \cdots \\
      \mathrm{G} & \cdots & 0 & 1/5 & 0 & \cdots
      \end{array}
      $$
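Both summaries on this slide are simple per-position counts. A minimal sketch, assuming the sequences are already aligned to a common length; the function names are hypothetical, and `Fraction` is used only so the output matches the fractions on the slide.

```python
from collections import Counter
from fractions import Fraction

ROWS = "ATCG"  # row order used on the slide


def position_probability_matrix(seqs: list[str]) -> dict[str, list[Fraction]]:
    """P[b][j] = fraction of the aligned sequences with letter b at position j."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to a common length")
    return {
        b: [Fraction(sum(s[j] == b for s in seqs), len(seqs)) for j in range(length)]
        for b in ROWS
    }


def consensus(seqs: list[str]) -> str:
    """Most frequent letter at each position (ties go to the letter seen first)."""
    return "".join(
        Counter(s[j] for s in seqs).most_common(1)[0][0] for j in range(len(seqs[0]))
    )


example = ["ATC", "TTA", "AAC", "TTC", "AGC"]
print(consensus(example))  # ATC
for b, row in position_probability_matrix(example).items():
    print(b, " ".join(str(x) for x in row))
# A 3/5 1/5 1/5
# T 2/5 3/5 0
# C 0 0 4/5
# G 0 1/5 0
```

The consensus keeps only the top letter per column, whereas the PPM keeps the whole distribution, which is why the later symmetry checks are done on $P$.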

  21. Summarising a collection of target sites (same five example sequences, complements and reverse complements as in slide 20).

      Reverse-complement PPM, $P^{(\mathrm{RC})}$: the PPM for the reverse-complement sequences:

      $$
      P^{(\mathrm{RC})} = \begin{array}{c|ccccc}
      \mathrm{A} & \cdots & 0 & 3/5 & 2/5 & \cdots \\
      \mathrm{T} & \cdots & 1/5 & 1/5 & 3/5 & \cdots \\
      \mathrm{C} & \cdots & 0 & 1/5 & 0 & \cdots \\
      \mathrm{G} & \cdots & 4/5 & 0 & 0 & \cdots
      \end{array}
      $$

      Note: we can get $P^{(\mathrm{RC})}$ from $P$ (and vice versa) by swapping the rows A ↔ T and C ↔ G, and reversing the order of the columns.
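The row-swap-and-reverse shortcut can be checked directly against the PPM of the reverse-complement sequences. A minimal sketch; `ppm`, `rc_ppm` and `reverse_complement` are hypothetical helper names.

```python
from fractions import Fraction

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
ROWS = "ATCG"


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def ppm(seqs: list[str]) -> dict[str, list[Fraction]]:
    n, length = len(seqs), len(seqs[0])
    return {b: [Fraction(sum(s[j] == b for s in seqs), n) for j in range(length)] for b in ROWS}


def rc_ppm(p: dict[str, list[Fraction]]) -> dict[str, list[Fraction]]:
    """Swap rows A<->T and C<->G, then reverse the column order."""
    return {b: p[COMPLEMENT[b]][::-1] for b in ROWS}


example = ["ATC", "TTA", "AAC", "TTC", "AGC"]
direct = ppm([reverse_complement(s) for s in example])  # PPM of the reverse complements
shortcut = rc_ppm(ppm(example))                          # row swap + column reversal
print(direct == shortcut)                # True
print(rc_ppm(shortcut) == ppm(example))  # True: applying it twice gives P back
```

The second check is the "(and vice versa)" in the note: the operation is its own inverse.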

  25. Palindromic consensus sequences for HTLV-1 and HIV-1 target integration sites. From 4,521 HTLV-1 target integration sites, we find the consensus AAGTGGATATCCACTT (read backwards: TTCACCTATAGGTGAA). From 13,442 HIV-1 target integration sites, we find the consensus: TTT GG T AA CC AAA AAA CC A T T TTT GG. The target integration sites are palindromic (as already known!)
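"Palindromic" here means that the sequence equals its own reverse complement: reading it backwards and complementing each base gives the original back. A minimal check on the HTLV-1 consensus; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then read the result backwards."""
    return seq.translate(COMPLEMENT)[::-1]


def is_rc_palindrome(seq: str) -> bool:
    """True when the sequence is identical to its own reverse complement."""
    return seq == reverse_complement(seq)


htlv1 = "AAGTGGATATCCACTT"  # HTLV-1 consensus from the slide
print(htlv1[::-1])                # TTCACCTATAGGTGAA (the sequence read backwards)
print(reverse_complement(htlv1))  # AAGTGGATATCCACTT
print(is_rc_palindrome(htlv1))    # True
```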

  26. Palindromic PPMs for HTLV-1 and HIV-1 target integration sites. For both HTLV-1 and HIV-1, we have $P^{(\mathrm{RC})} \approx P$.

      [Figure: one panel each for HTLV-1 and HIV-1, plotting the entries of the reverse-complement PPM, $P^{(\mathrm{RC})}$, against the entries of the PPM, $P$ (both axes 0 to 0.6), with the line $P^{(\mathrm{RC})} = P$ and a 95% credible region.]
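The points in such a plot are just matched entries of $P$ and $P^{(\mathrm{RC})}$. A minimal sketch that builds those pairs and reports the largest gap as a crude summary; it does not reproduce the slide's 95% credible region, and the function names and toy sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
ROWS = "ATCG"


def ppm(seqs: list[str]) -> dict[str, list[float]]:
    n, length = len(seqs), len(seqs[0])
    return {b: [sum(s[j] == b for s in seqs) / n for j in range(length)] for b in ROWS}


def symmetry_pairs(p: dict[str, list[float]]) -> tuple[list[tuple[float, float]], float]:
    """Pair every entry of P with the matching entry of P(RC).

    P(RC)[b][j] is P[c(b)][L-1-j]: swap A<->T and C<->G, reverse the columns.
    Returns the (P, P(RC)) pairs and the largest absolute gap between them;
    for a perfectly palindromic PPM every pair lies on the line P(RC) = P.
    """
    length = len(p["A"])
    pairs = [(p[b][j], p[COMPLEMENT[b]][length - 1 - j]) for b in ROWS for j in range(length)]
    return pairs, max(abs(x - y) for x, y in pairs)


# Each of these toy sequences equals its own reverse complement.
palindromic = ["ATAT", "GGCC", "ACGT"]
print(symmetry_pairs(ppm(palindromic))[1])  # 0.0

example = ["ATC", "TTA", "AAC", "TTC", "AGC"]
print(symmetry_pairs(ppm(example))[1])  # 0.8
```

A set made of individually palindromic sequences gives a perfectly symmetric PPM, while the five-sequence example from slide 20 is far from the diagonal.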

  27. Palindromic sequence logos

      [Figure: sequence logos (vertical axis in bits, 0.0 to 0.4) for HTLV-1 target sites at positions -13 to -1 and 1 to 13, and for HIV-1 target sites at positions -12 to 12.]
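The slides do not say exactly how the logos were drawn. A common convention, assumed here without any small-sample correction, sets each stack height to $\log_2 4 = 2$ bits minus the Shannon entropy of that PPM column and splits it among letters in proportion to their probabilities. The function names are hypothetical.

```python
import math


def information_content(column: dict[str, float]) -> float:
    """Bits at one position: log2(4) = 2 minus the Shannon entropy of the column."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


def letter_heights(column: dict[str, float]) -> dict[str, float]:
    """Height of each letter in the logo stack: its probability times the column's bits."""
    bits = information_content(column)
    return {b: p * bits for b, p in column.items()}


uniform = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}
print(information_content(uniform))  # 0.0 (no information)
print(information_content({"A": 1.0, "T": 0.0, "C": 0.0, "G": 0.0}))  # 2.0 (fully conserved)
print(letter_heights({"A": 0.2, "T": 0.0, "C": 0.8, "G": 0.0}))
```

Under this convention a column can carry at most 2 bits, so stacks of a few tenths of a bit indicate weak but non-zero preferences at each position.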

  30. An attack of aibohphobia
      • There is an almost unbelievable amount of symmetry (!)
      • Is this “real”? Do we see evidence of the symmetry within individual sequences, or just at the level of these summaries?
      • We introduce a palindrome index to quantify “how palindromic” each sequence is

  33. The palindrome index. For the sequence AAGTGGATATCCACTT, write

      $$ S = s_{-8}\, s_{-7}\, s_{-6}\, s_{-5}\, s_{-4}\, s_{-3}\, s_{-2}\, s_{-1}\, s_{1}\, s_{2}\, s_{3}\, s_{4}\, s_{5}\, s_{6}\, s_{7}\, s_{8}. $$

      Define

      $$ \rho(S) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}\bigl(s_i = c(s_{-i})\bigr), $$

      where $2n$ is the sequence length, $\mathbb{I}$ is the indicator function, and $c(x)$ is the complement of $x$ (e.g. $c(\mathrm{T}) = \mathrm{A}$).
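A direct transcription of $\rho(S)$, assuming the sequence is stored as a 0-based string of even length $2n$, so that $s_{-i}$ is `seq[n - i]` and $s_i$ is `seq[n - 1 + i]`. The function name is hypothetical; the second example is integration site 1 from slide 18 (length 10, so $n = 5$).

```python
from fractions import Fraction

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def palindrome_index(seq: str) -> Fraction:
    """rho(S) = (1/n) * #{i in 1..n : s_i = c(s_{-i})} for a sequence of even length 2n."""
    if len(seq) % 2:
        raise ValueError("the palindrome index is defined for even-length sequences")
    n = len(seq) // 2
    # 0-based indexing: s_{-i} is seq[n - i] and s_i is seq[n - 1 + i]
    matches = sum(seq[n - 1 + i] == COMPLEMENT[seq[n - i]] for i in range(1, n + 1))
    return Fraction(matches, n)


print(palindrome_index("AAGTGGATATCCACTT"))  # 1
print(palindrome_index("ATCCCGCTTA"))        # 1/5
```

$\rho(S) = 1$ exactly when $S$ equals its own reverse complement, and for a sequence drawn uniformly at random each of the $n$ comparisons matches with probability 1/4.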