Retroviruses integrate into a shared, non-palindromic motif
Paul Kirk
MASAMB 2016, Cambridge, October 4, 2016
Central dogma of molecular biology (Crick, 1956)
General transfers of biological sequential information: DNA replication (DNA → DNA), RNA transcription (DNA → RNA) and protein translation (RNA → protein).
There are also special transfers of sequential information.
1 of 22 MRC | Medical Research Council
For example: retroviruses
[Figure: a retrovirus, labelled with its viral RNA and the enzymes reverse transcriptase, integrase and protease.]
Retroviruses are obligate parasites: they require a host cell to complete their “life”-cycle.
Examples: HIV, HTLV-1, ...
For example: retroviruses
[Figure: the retroviral life cycle inside a host cell.]
• Infection: the virus enters the host cell and its viral RNA is released.
• Reverse transcriptase copies the viral RNA into viral DNA.
• Integrase cuts (“SNIP!”) the host DNA.
• The viral DNA is inserted into the host DNA; the integrated viral DNA is the provirus.
Characterising retroviral integration sites
[Figure: the host DNA (...ATCCCGCTTA...) is cut (“CUT!”) and the retrovirus DNA intermediate (TGAC...CGT) is pasted in (“PASTE!”), giving a provirus flanked by host DNA.]
We would like to characterise the target integration site, i.e. the regions flanking the provirus:
• Is there a motif?
Aligning integration sites
Given a collection of integration sites, we can align them according to the position of the provirus (TGAC...CGT), and then ignore/remove/mask the provirus sequence, so that we just look at the target sites:
Integration site 1: ...ATCCCGCTTA...
Integration site 2: ...TTAGAGGGTA...
Integration site 3: ...AACGAACTTC...
Integration site 4: ...TTCTCCCGGA...
Integration site 5: ...AGCTTCCTGC...
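A minimal sketch of the masking step, assuming each read contains the full provirus as an exact substring with enough host DNA on both sides. The helper `mask_provirus`, the stand-in provirus string and the flank width are hypothetical illustrations (not the pipeline used for these data), and the split point in the example read is arbitrary.

```python
def mask_provirus(read: str, provirus: str, flank: int) -> str:
    """Return the target site: `flank` host bases either side of the provirus.

    Hypothetical helper: assumes the provirus occurs once in `read` as an
    exact substring (no sequencing errors, no partial matches).
    """
    start = read.find(provirus)
    if start == -1:
        raise ValueError("provirus not found in read")
    end = start + len(provirus)
    if start < flank or len(read) - end < flank:
        raise ValueError("not enough host DNA flanking the provirus")
    # Keep the host DNA on each side and drop (mask) the provirus itself.
    return read[start - flank:start] + read[end:end + flank]


# Hypothetical stand-in for the provirus "TGAC ... CGT".
PROVIRUS = "TGACAAACGT"
read = "ATCCC" + PROVIRUS + "GCTTA"
print(mask_provirus(read, PROVIRUS, flank=5))  # ATCCCGCTTA
```

Applying the same helper to every read, with a fixed flank width, gives target sites that are aligned on the provirus position by construction.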
Summarising a collection of target sites
Example (5 sequences):

Sequences    Complements    Reverse complements
...ATC...    ...TAG...      ...GAT...
...TTA...    ...AAT...      ...TAA...
...AAC...    ...TTG...      ...GTT...
...TTC...    ...AAG...      ...GAA...
...AGC...    ...TCG...      ...GCT...

Consensus sequence: just take the most frequent letter at each position: ...ATC...
Position probability matrix (PPM), P: estimate the probability of each letter at each position (rows A, T, C, G; columns are positions):
$$
P = \begin{array}{c|ccccc}
A & \cdots & 3/5 & 1/5 & 1/5 & \cdots \\
T & \cdots & 2/5 & 3/5 & 0 & \cdots \\
C & \cdots & 0 & 0 & 4/5 & \cdots \\
G & \cdots & 0 & 1/5 & 0 & \cdots
\end{array}
$$
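A small sketch of the consensus and PPM calculations for the five example sequences. Exact fractions are used so the entries can be compared directly with the matrix; the function names and the tie-breaking rule (earlier letter in A, T, C, G order wins) are illustrative choices, not taken from the talk.

```python
from collections import Counter
from fractions import Fraction

ALPHABET = "ATCG"  # row order used for P on the slide


def ppm(seqs):
    """Position probability matrix: P[b][j] = fraction of sequences with letter b at position j."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    counts = [Counter(s[j] for s in seqs) for j in range(length)]
    return {b: [Fraction(c[b], len(seqs)) for c in counts] for b in ALPHABET}


def consensus(p):
    """Most frequent letter at each position; ties go to the earlier letter in ALPHABET."""
    length = len(p[ALPHABET[0]])
    return "".join(max(ALPHABET, key=lambda b: p[b][j]) for j in range(length))


sites = ["ATC", "TTA", "AAC", "TTC", "AGC"]  # the five example sequences
P = ppm(sites)
print(consensus(P))  # ATC
for b in ALPHABET:
    print(b, [str(x) for x in P[b]])
```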
Summarising a collection of target sites
Reverse complement PPM, P^(RC): the PPM for the reverse complements of the same five example sequences (...GAT..., ...TAA..., ...GTT..., ...GAA..., ...GCT...):
$$
P^{(RC)} = \begin{array}{c|ccccc}
A & \cdots & 0 & 3/5 & 2/5 & \cdots \\
T & \cdots & 1/5 & 1/5 & 3/5 & \cdots \\
C & \cdots & 0 & 1/5 & 0 & \cdots \\
G & \cdots & 4/5 & 0 & 0 & \cdots
\end{array}
$$
Note: we can get P^(RC) from P (and vice versa) by swapping the rows A ↔ T and C ↔ G, and reversing the order of the columns.
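The row-swap-and-reverse rule can be checked directly: a minimal sketch (helper names are illustrative) that builds P^(RC) from P and confirms it equals the PPM computed from the reverse-complement sequences.

```python
from fractions import Fraction

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
ALPHABET = "ATCG"


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def ppm(seqs):
    """P[b][j] = fraction of sequences with letter b at position j."""
    n, length = len(seqs), len(seqs[0])
    return {b: [Fraction(sum(s[j] == b for s in seqs), n) for j in range(length)]
            for b in ALPHABET}


def rc_ppm(p):
    """P^(RC) from P: swap rows A<->T and C<->G, then reverse the column order."""
    return {b: p[COMPLEMENT[b]][::-1] for b in ALPHABET}


sites = ["ATC", "TTA", "AAC", "TTC", "AGC"]
P_rc = rc_ppm(ppm(sites))
# Same result as building the PPM directly from the reverse complements.
assert P_rc == ppm([reverse_complement(s) for s in sites])
print([str(x) for x in P_rc["A"]])  # ['0', '3/5', '2/5']
```

Applying `rc_ppm` twice returns the original P, which is the “and vice versa” in the note.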
Palindromic consensus sequences for HTLV-1 and HIV-1 target integration sites
From 4,521 HTLV-1 target integration sites, we find the consensus:
AAGTGGATATCCACTT
TTCACCTATAGGTGAA (complementary strand)
From 13,442 HIV-1 target integration sites, we find the consensus:
TTTGGTAACCAAA
AAACCATTGGTTT (complementary strand)
Reading the complementary strand from right to left gives back the consensus (for the odd-length HIV-1 consensus, everywhere except the central base): the target integration sites are palindromic (as already known!)
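A quick sketch for checking the two consensus sequences against their own reverse complements. The helper names are illustrative; `mismatches` reports 0-based positions, and for the 13-letter HIV-1 consensus the only mismatch is the central base, which cannot equal its own complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then read right to left."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(seq: str) -> list[int]:
    """0-based positions where seq differs from its own reverse complement."""
    rc = reverse_complement(seq)
    return [i for i, (a, b) in enumerate(zip(seq, rc)) if a != b]


def is_palindromic(seq: str) -> bool:
    return not mismatches(seq)


htlv1 = "AAGTGGATATCCACTT"
hiv1 = "TTTGGTAACCAAA"
print(reverse_complement(htlv1), is_palindromic(htlv1))  # AAGTGGATATCCACTT True
print(reverse_complement(hiv1), mismatches(hiv1))        # TTTGGTTACCAAA [6]
```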
Palindromic PPMs for HTLV-1 and HIV-1 target integration sites
For both HTLV-1 and HIV-1, we have P^(RC) ≈ P.
[Figure: two panels (HTLV-1, HIV-1) plotting the entries of the reverse-complement PPM, P^(RC), against the entries of the PPM, P (both axes 0 to 0.6), with the line P^(RC) = P and a 95% credible region.]
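A sketch of the point comparison behind this plot, assuming the PPMs are plain empirical frequencies: each point pairs an entry of P with the matching entry of P^(RC), and the largest gap gives a crude single-number summary. The helper names are illustrative, and the 95% credible region on the slide comes from a statistical model that is not reproduced here.

```python
from fractions import Fraction

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
ALPHABET = "ATCG"


def ppm(seqs):
    n, length = len(seqs), len(seqs[0])
    return {b: [Fraction(sum(s[j] == b for s in seqs), n) for j in range(length)]
            for b in ALPHABET}


def rc_ppm(p):
    # Row b of P^(RC) is row c(b) of P with the columns reversed.
    return {b: p[COMPLEMENT[b]][::-1] for b in ALPHABET}


def scatter_points(p):
    """(P entry, P^(RC) entry) pairs, i.e. the points of a P-vs-P^(RC) plot."""
    q = rc_ppm(p)
    return [(p[b][j], q[b][j]) for b in ALPHABET for j in range(len(p[b]))]


def max_deviation(p):
    """Largest |P - P^(RC)| over all entries; 0 means the PPM is exactly palindromic."""
    return max(abs(x - y) for x, y in scatter_points(p))


print(max_deviation(ppm(["AT", "GC"])))                        # 0
print(max_deviation(ppm(["ATC", "TTA", "AAC", "TTC", "AGC"])))  # 4/5
```

The five-sequence toy example is far from palindromic, whereas for the real HTLV-1 and HIV-1 PPMs the points lie close to the line P^(RC) = P.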
Palindromic sequence logos
[Figure: sequence logos (information content in bits, axis 0 to 0.4) for the HTLV-1 target sites (positions -13 to -1 and 1 to 13) and the HIV-1 target sites (positions -12 to 12).]
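Logo heights are derived from the PPM columns. A minimal sketch using the standard uncorrected formula (information = log2 of the alphabet size minus the Shannon entropy of the column; each letter's height = its probability times that information). Whether the logos shown apply a small-sample correction is not stated, so treat this as an assumption.

```python
import math


def information_content(column):
    """Information (bits) in one PPM column: log2(alphabet size) minus Shannon entropy.

    No small-sample correction is applied (an assumption; logo tools often add one).
    """
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return math.log2(len(column)) - entropy


def letter_heights(column, alphabet="ATCG"):
    """Stack height of each letter in a logo: probability times column information."""
    ic = information_content(column)
    return {b: p * ic for b, p in zip(alphabet, column)}


# First column of the five-sequence example (A 3/5, T 2/5, C 0, G 0).
print(round(information_content([0.6, 0.4, 0.0, 0.0]), 3))  # 1.029
print(information_content([0.25, 0.25, 0.25, 0.25]))         # 0.0
print(letter_heights([0.6, 0.4, 0.0, 0.0]))
```

A fully uniform column carries 0 bits and a fully conserved one carries 2 bits, so columns peaking at roughly 0.4 bits indicate a weak motif.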
An attack of aibohphobia
• There is an almost unbelievable amount of symmetry (!)
• Is this “real”? Do we see evidence of the symmetry within individual sequences, or just at the level of these summaries?
• We introduce a palindrome index to quantify “how palindromic” each sequence is
The palindrome index
Example: AAGTGGATATCCACTT, written as
$$S = s_{-8}\, s_{-7}\, s_{-6}\, s_{-5}\, s_{-4}\, s_{-3}\, s_{-2}\, s_{-1}\, s_{1}\, s_{2}\, s_{3}\, s_{4}\, s_{5}\, s_{6}\, s_{7}\, s_{8}$$
Define
$$\rho(S) = \frac{1}{n} \sum_{i=1}^{n} I\big(s_i = c(s_{-i})\big),$$
where 2n is the sequence length, I is the indicator function, and c(x) is the complement of x (e.g. c(T) = A).
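A direct transcription of ρ(S) as a sketch (the function name is illustrative). With 0-based string indexing, s_{-i} is `seq[n - i]` and s_i is `seq[n + i - 1]`, so each base in the right half is compared with the complement of its mirror image in the left half.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def palindrome_index(seq: str) -> float:
    """rho(S) = (1/n) * sum_{i=1..n} I(s_i == c(s_{-i})) for a sequence of length 2n."""
    if len(seq) % 2:
        raise ValueError("sequence length must be even (2n)")
    n = len(seq) // 2
    # s_{-i} is seq[n - i] and s_i is seq[n + i - 1] with 0-based indexing.
    matches = sum(seq[n + i - 1] == COMPLEMENT[seq[n - i]] for i in range(1, n + 1))
    return matches / n


print(palindrome_index("AAGTGGATATCCACTT"))  # 1.0 (the HTLV-1 consensus)
print(palindrome_index("ATCCCGCTTA"))        # 0.2 (integration site 1)
```

ρ(S) = 1 for a perfect palindrome, and only the innermost pair of integration site 1 (C, G) is complementary, giving 1/5.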