Discovery of Genomic Structural Variations with Next-Generation Sequencing Data
Advanced Topics in Computational Genomics
Slides from Marcel H. Schulz, Tobias Rausch (EMBL), and Kai Ye (Leiden University)
Computational Methods
Detecting Genomic Rearrangements: signals in reads aligned to the reference
• Mate-pair or paired-end mapping abnormalities
• Split-read alignments
• Local assembly of unmapped or single-anchored reads
• Read-depth signals
courtesy of Tobias Rausch (EMBL)
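The paired-end signal can be illustrated with a minimal sketch. It assumes a forward/reverse (FR) library with a known insert-size mean and standard deviation. The names `ReadPair` and `classify_pair`, the 3-standard-deviation cutoff, and the orientation rules are illustrative assumptions, not the logic of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    # Leftmost mapping position and strand ('+' or '-') of each mate on one chromosome
    pos1: int
    strand1: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, read_len: int, mean_insert: float,
                  sd_insert: float, k: float = 3.0) -> str:
    """Label one read pair; hypothetical helper assuming an FR library."""
    # Order the mates so that (lpos, lstrand) is the leftmost one
    (lpos, lstrand), (rpos, rstrand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if lstrand == rstrand:
        return "inversion"            # one mate flipped: both on the same strand
    if lstrand == "-":
        return "tandem_duplication"   # mates point away from each other (RF)
    insert = rpos + read_len - lpos   # approximate fragment length
    if insert > mean_insert + k * sd_insert:
        return "deletion"             # mates map too far apart
    if insert < mean_insert - k * sd_insert:
        return "insertion"            # mates map too close together
    return "concordant"


print(classify_pair(ReadPair(1000, "+", 3000, "-"), 50, 400, 30))  # deletion
```

Real callers cluster many such discordant pairs before calling an event; a single pair is only evidence.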
[Figure panels: Insertions, Deletions]
[Figure from Lee et al. (2009) and Korbel et al. (2007)]
[Figure: copy-number segments labelled 1 copy, 1 copy, 0 copies, 2 copies, 2 copies] Chiang et al. (2009)
• Down syndrome: partial trisomy 21. Xie et al. (2009)
Human cancer cell lines compared with normal cell lines (SegSeq algorithm: no fixed window size, detection of multiple change points). Chiang et al. (2009)
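A generic read-depth segmentation can be sketched as below. This is plain binary segmentation on per-bin log2 tumor/normal depth ratios, not the SegSeq algorithm. Unlike SegSeq, it assumes fixed bins, and the function names and the `min_gain` threshold are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def log2_depth_ratios(tumor: List[int], normal: List[int], pseudo: float = 0.5) -> List[float]:
    """Per-bin log2 ratio of library-size-normalized tumor vs. normal read counts."""
    t_total, n_total = sum(tumor), sum(normal)
    return [math.log2(((t + pseudo) / t_total) / ((n + pseudo) / n_total))
            for t, n in zip(tumor, normal)]


def best_split(x: List[float], lo: int, hi: int) -> Tuple[Optional[int], float]:
    """Split x[lo:hi] where the reduction in squared error is largest."""
    n, total = hi - lo, sum(x[lo:hi])
    best_i, best_gain, left = None, 0.0, 0.0
    for i in range(lo + 1, hi):                  # segments x[lo:i] and x[i:hi]
        left += x[i - 1]
        nl, nr = i - lo, hi - i
        gain = nl * nr / n * (left / nl - (total - left) / nr) ** 2
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain


def binary_segmentation(x: List[float], min_gain: float, lo: int = 0,
                        hi: Optional[int] = None) -> List[int]:
    """Recursively split x; returns sorted change-point indices."""
    hi = len(x) if hi is None else hi
    if hi - lo < 2:
        return []
    i, gain = best_split(x, lo, hi)
    if i is None or gain < min_gain:
        return []
    return (binary_segmentation(x, min_gain, lo, i) + [i]
            + binary_segmentation(x, min_gain, i, hi))


ratios = log2_depth_ratios([10, 10, 30, 30], [20, 20, 20, 20], pseudo=0)
print(binary_segmentation(ratios, min_gain=0.5))  # [2]
```

Each accepted split adds one change point, so a region with several copy-number states yields several breakpoints.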
With reads of length 40–100 bp, can we find the exact breakpoint of a structural variation? Yes, using split-read mapping. [Figure: split read, Donor vs. Reference] Example for a read of length 40: how many random matches do we expect for a 12 bp read prefix in the human genome (about 3·10^9 bp, with 4^12 possible 12-mers)? $\frac{3 \cdot 10^{9}}{4^{12}} \approx 179$
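The estimate is the genome length divided by the number of distinct 12-mers. A quick check, assuming a uniformly random sequence of 3·10^9 bases and counting one strand only (real genomes are repetitive, so actual counts differ). The function names are hypothetical.

```python
def expected_random_matches(genome_length: float, k: int) -> float:
    """Expected occurrences of a fixed k-mer in a uniformly random sequence (one strand)."""
    # Each of ~genome_length start positions matches with probability 4**-k
    return genome_length / 4 ** k


def min_unique_length(genome_length: float) -> int:
    """Smallest k for which fewer than one random match is expected."""
    k = 1
    while expected_random_matches(genome_length, k) >= 1:
        k += 1
    return k


G = 3e9
print(round(expected_random_matches(G, 12)))  # 179
print(min_unique_length(G))                   # 16
```

So a short split fragment alone is far from unique genome-wide, which motivates restricting where it is searched.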
With reads of length 40–100 bp, can we find the exact breakpoint of a structural variation? Yes, using anchored split-read mapping: the mappable read mate provides an anchor that narrows down the search space. [Figure: Donor vs. Reference] Medvedev et al. (2009)
The Pindel algorithm (Deletions): how does it work? Ye et al. (2009)
The Pindel algorithm (Deletions) ① Use the 3′ end of the left (mapped) read as the anchor point ② Use pattern growth to search for the minimum and maximum unique substrings from the 3′ end of the unmapped read (search range ≤ 2× insert size) Ye et al. (2009)
String matching by pattern growth [Figure: pattern ATGCA, text ATCAAGTATGCTTAGC] courtesy of Kai Ye (Leiden U.)
String matching by pattern growth (continued): Minimum unique substring: ATG; Maximum unique substring: ATGC
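The pattern-growth idea can be written as a short function. It is a minimal sketch, assuming exact matching and growth from the left (5′) end of the pattern. The function name `pattern_growth` is hypothetical. It keeps the list of text positions where the current prefix still matches and extends all of them by one base per step, so no position is re-scanned from scratch.

```python
from typing import List, Optional, Tuple


def pattern_growth(pattern: str, text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (minimum unique substring, maximum unique substring) of pattern's prefixes.

    Minimum: shortest prefix occurring exactly once in text.
    Maximum: longest prefix still occurring exactly once.
    """
    if not pattern:
        return None, None
    # Start positions where the length-1 prefix matches
    positions: List[int] = [i for i, c in enumerate(text) if c == pattern[0]]
    min_unique = max_unique = None
    length = 1
    while positions:
        if len(positions) == 1:
            if min_unique is None:
                min_unique = pattern[:length]
            max_unique = pattern[:length]
        if length == len(pattern):
            break
        # Grow: keep only matches whose next text base equals the next pattern base
        nxt = pattern[length]
        positions = [p for p in positions
                     if p + length < len(text) and text[p + length] == nxt]
        length += 1
    return min_unique, max_unique


print(pattern_growth("ATGCA", "ATCAAGTATGCTTAGC"))  # ('ATG', 'ATGC')
```

Growing from the 3′ end instead, as in Pindel's step ②, amounts to running the same function on the reversed pattern and the reversed text.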
The Pindel algorithm (Deletions) ① Use the 3′ end of the left (mapped) read as the anchor point ② Use pattern growth to search for the minimum and maximum unique substrings from the 3′ end of the unmapped read (search range ≤ 2× insert size) ③ Use pattern growth to search for the minimum and maximum unique substrings from the 5′ end of the unmapped read (search range: read length + Max_D, the maximum deletion size), starting from the match end found in step ② ④ Check whether the complete unmapped read can be reconstructed by combining the 3′-end and 5′-end substring matches Ye et al. (2009)
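The four steps can be sketched for a single unmapped read, assuming exact matches, a read already converted to reference orientation, and the anchor given only by the 3′ end position of the mapped mate. All names are hypothetical. As a simplification, every uniquely matching suffix length is tried rather than only the range between the minimum and maximum unique substrings.

```python
from typing import Dict, List, Tuple


def unique_suffix_hits(read: str, ref: str, lo: int, hi: int) -> Dict[int, int]:
    """Pattern growth from the read's 3' end inside ref[lo:hi]: {suffix length: unique start}."""
    lo, hi = max(lo, 0), min(hi, len(ref))
    pos = [i for i in range(lo, hi) if ref[i] == read[-1]]
    hits: Dict[int, int] = {}
    for length in range(1, len(read) + 1):
        if length > 1:  # extend every surviving match one base to the left
            base = read[-length]
            pos = [p - 1 for p in pos if p - 1 >= lo and ref[p - 1] == base]
        if not pos:
            break
        if len(pos) == 1:
            hits[length] = pos[0]
    return hits


def unique_prefix_hits(read: str, ref: str, lo: int, hi: int) -> Dict[int, int]:
    """Pattern growth from the read's 5' end inside ref[lo:hi]: {prefix length: unique start}."""
    lo, hi = max(lo, 0), min(hi, len(ref))
    pos = [i for i in range(lo, hi) if ref[i] == read[0]]
    hits: Dict[int, int] = {}
    for length in range(1, len(read) + 1):
        if length > 1:  # extend every surviving match one base to the right
            base = read[length - 1]
            pos = [p for p in pos if p + length - 1 < hi and ref[p + length - 1] == base]
        if not pos:
            break
        if len(pos) == 1:
            hits[length] = pos[0]
    return hits


def detect_deletions(ref: str, read: str, anchor_end: int,
                     insert_size: int, max_d: int) -> List[Tuple[int, int]]:
    """(first deleted base, deletion length) pairs supported by one unmapped read."""
    calls = set()
    # Step 2: 3' end of the unmapped read, within 2x insert size of the anchor (step 1)
    suffix_hits = unique_suffix_hits(read, ref, anchor_end, anchor_end + 2 * insert_size)
    for s_len, s_start in suffix_hits.items():
        # Step 3: 5' end, within read length + max_d upstream of the 3' match
        prefix_hits = unique_prefix_hits(read, ref, s_start - (len(read) + max_d), s_start)
        # Step 4: the 5' and 3' parts must together cover the whole read
        p_len = len(read) - s_len
        if p_len in prefix_hits:
            del_start = prefix_hits[p_len] + p_len
            del_len = s_start - del_start
            if 0 < del_len <= max_d:
                calls.add((del_start, del_len))
    return sorted(calls)


ref = "AAAA" + "CGTCC" + "AAAAAAAA" + "TGGCT" + "AAAA"
read = "CGTCC" + "TGGCT"  # the 8 A's between the two parts are deleted
print(detect_deletions(ref, read, anchor_end=0, insert_size=30, max_d=20))  # [(9, 8)]
```

With microhomology at the breakpoint, several equivalent (start, length) pairs can be reported for the same deletion.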
The Pindel algorithm (Insertions) ① Use the 3′ end of the left (mapped) read as the anchor point ② Use pattern growth to search for the minimum and maximum unique substrings from the 3′ end of the unmapped read (search range ≤ 2× insert size) ③ Use pattern growth to search for the minimum and maximum unique substrings from the 5′ end of the unmapped read (search range: read length − 1), starting from the match end found in step ② ④ Check whether the complete unmapped read can be reconstructed by combining the 3′-end and 5′-end substring matches • In the initial Pindel version, exact matches to the reference were required Ye et al. (2009)
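For insertions the 5′ part must end exactly where the 3′ part begins in the reference, and whatever lies between them in the read is taken as the inserted sequence. This is a minimal sketch under the same assumptions as for deletions: exact matches as in the initial version, read in reference orientation, and hypothetical names. The two growth helpers are repeated so the block runs on its own.

```python
from typing import Dict, List, Tuple


def unique_suffix_hits(read: str, ref: str, lo: int, hi: int) -> Dict[int, int]:
    """Pattern growth from the read's 3' end inside ref[lo:hi]: {suffix length: unique start}."""
    lo, hi = max(lo, 0), min(hi, len(ref))
    pos = [i for i in range(lo, hi) if ref[i] == read[-1]]
    hits: Dict[int, int] = {}
    for length in range(1, len(read) + 1):
        if length > 1:  # extend every surviving match one base to the left
            pos = [p - 1 for p in pos if p - 1 >= lo and ref[p - 1] == read[-length]]
        if not pos:
            break
        if len(pos) == 1:
            hits[length] = pos[0]
    return hits


def unique_prefix_hits(read: str, ref: str, lo: int, hi: int) -> Dict[int, int]:
    """Pattern growth from the read's 5' end inside ref[lo:hi]: {prefix length: unique start}."""
    lo, hi = max(lo, 0), min(hi, len(ref))
    pos = [i for i in range(lo, hi) if ref[i] == read[0]]
    hits: Dict[int, int] = {}
    for length in range(1, len(read) + 1):
        if length > 1:  # extend every surviving match one base to the right
            pos = [p for p in pos
                   if p + length - 1 < hi and ref[p + length - 1] == read[length - 1]]
        if not pos:
            break
        if len(pos) == 1:
            hits[length] = pos[0]
    return hits


def detect_insertions(ref: str, read: str, anchor_end: int,
                      insert_size: int) -> List[Tuple[int, str]]:
    """(insertion position, inserted sequence) pairs supported by one unmapped read."""
    calls = set()
    suffix_hits = unique_suffix_hits(read, ref, anchor_end, anchor_end + 2 * insert_size)
    for s_len, s_start in suffix_hits.items():
        # 5' end searched within read length - 1 upstream of the 3' match
        prefix_hits = unique_prefix_hits(read, ref, s_start - (len(read) - 1), s_start)
        for p_len, p_start in prefix_hits.items():
            ins_len = len(read) - p_len - s_len
            # Parts must be adjacent in the reference; the rest of the read is new sequence
            if p_start + p_len == s_start and ins_len > 0:
                calls.add((s_start, read[p_len:p_len + ins_len]))
    return sorted(calls)


ref = "AAAA" + "CGTCC" + "TGGCT" + "AAAA"
read = "CGTCC" + "GA" + "TGGCT"  # "GA" is inserted between the two parts
print(detect_insertions(ref, read, anchor_end=0, insert_size=30))  # [(9, 'GA')]
```

Because the inserted bases must fit inside the read, this approach is limited to insertions shorter than the read length.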
The Pindel algorithm (Real Data) Ye et al. (2009)
The Pindel algorithm for complex variants: a) large deletion, b) tandem duplication, c) inversion, d–f) same as a–c with non-template sequence (yellow in the figure). Source: Ye et al., Pindel manual
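To see what a split read has to explain in each case, here is a minimal sketch that builds the donor sequence for the listed event types, with optional non-template sequence `nt`. The helper names are hypothetical. Placing `nt` at a single breakpoint (the left breakpoint for inversions, between the copies for tandem duplications) is a simplifying assumption, since real events may carry non-template sequence elsewhere.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def make_donor(ref: str, start: int, end: int, kind: str, nt: str = "") -> str:
    """Donor sequence for an event on ref[start:end], optionally with non-template sequence nt."""
    seg = ref[start:end]
    if kind == "deletion":            # segment removed, nt fills the gap
        return ref[:start] + nt + ref[end:]
    if kind == "tandem_duplication":  # second copy follows the first, nt between the copies
        return ref[:end] + nt + seg + ref[end:]
    if kind == "inversion":           # segment reverse-complemented, nt at the left breakpoint
        return ref[:start] + nt + revcomp(seg) + ref[end:]
    raise ValueError(f"unknown event type: {kind}")


ref = "AAACCCGGGTTT"
print(make_donor(ref, 3, 6, "tandem_duplication"))  # AAACCCCCCGGGTTT
print(make_donor(ref, 3, 6, "deletion", nt="TT"))   # AAATTGGGTTT
```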
Acknowledgements
• Tobias Rausch (EMBL)
• Kai Ye (Leiden University Medical Center)
• Anne-Katrin Emde (Freie Universität Berlin)
References
• Kai Ye, Marcel H. Schulz, Quan Long, Rolf Apweiler, and Zemin Ning. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics (2009) 25(21): 2865–2871.
• Pindel homepage: https://trac.nbic.nl/pindel/
• SplazerS homepage: http://www.seqan.de/projects/splazers.html