ALLPATHS: de novo assembly of whole genome micro-reads by Butler et al.

Dec 28, 2023 •405 likes •641 views

ALLPATHS: de novo assembly of whole genome micro-reads by Butler et al.
Presented by Tim Smith
CSC2431, 2008/03/12
NGS data presents new challenges and opportunities
“Find all overlaps” is not adequate for NGS data
Figure: mean number of false placements of K-mers
ALLPATHS finds all paths across read pairs
● Gaps in read pairs are “walked” from one read to the other by filling in the gap with overlapping reads
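The gap-walking idea can be illustrated with a toy sketch. This is not the ALLPATHS implementation: the function name `closures`, the exact-overlap rule (each read must overlap the growing sequence by exactly `k` bases), and the example genome are all assumptions made for illustration.

```python
from collections import defaultdict

def closures(left, right, reads, k, max_len):
    """Return every sequence that starts with `left`, ends with `right`,
    is at most `max_len` bases long, and is built by chaining reads that
    overlap the growing sequence by exactly k bases (a toy rule)."""
    by_prefix = defaultdict(list)
    for r in set(reads) | {right}:
        if len(r) > k:                      # each extension must add >= 1 base
            by_prefix[r[:k]].append(r)

    found = set()
    stack = [left]
    while stack:
        seq = stack.pop()
        if seq != left and seq.endswith(right):
            found.add(seq)                  # the gap has been closed
        for r in by_prefix[seq[-k:]]:
            ext = seq + r[k:]
            if len(ext) <= max_len:         # bound walk by fragment length
                stack.append(ext)
    return sorted(found)

# Hypothetical 13-base "genome" with no repeated 3-mers, tiled by 5-base reads.
genome = "ACGTTGCAAGGCT"
reads = [genome[i:i + 5] for i in range(len(genome) - 4)]
walked = closures("ACGTT", "AGGCT", reads, k=3, max_len=13)
```

With a repeat-free toy genome there is a single closure; repeats in a real genome are what make several closures possible for one read pair.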
ALLPATHS introduces the concept of unipath graphs
● Figure: sequence graph of C. jejuni with K = 6000 bases
● Two valid paths: ABCDBCEFCEG and ABCEFCDBCEG
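A small sketch shows why exactly two paths are valid. The successor relation and the copy numbers below are assumptions inferred from the two paths quoted on the slide (the figure itself is not reproduced here), and `all_paths` is a hypothetical helper, not part of ALLPATHS.

```python
# Which edge may follow which, read off the two paths on the slide (assumed).
successors = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D", "E"],
    "D": ["B"],
    "E": ["F", "G"],
    "F": ["C"],
    "G": [],
}
# Copy number of each edge = how often it occurs in either path (assumed).
copies = {"A": 1, "B": 2, "C": 3, "D": 1, "E": 2, "F": 1, "G": 1}

def all_paths(start, end, successors, copies):
    """Enumerate walks from start to end that use every edge exactly
    as many times as its copy number."""
    results = []

    def walk(path, remaining):
        last = path[-1]
        if last == end and not any(remaining.values()):
            results.append("".join(path))
            return
        for nxt in successors[last]:
            if remaining[nxt] > 0:
                remaining[nxt] -= 1
                walk(path + [nxt], remaining)
                remaining[nxt] += 1         # backtrack

    remaining = dict(copies)
    remaining[start] -= 1
    walk([start], remaining)
    return sorted(results)

paths = all_paths("A", "G", successors, copies)
# paths == ["ABCDBCEFCEG", "ABCEFCDBCEG"]
```

Under these assumptions the graph admits only the two orderings listed on the slide; the graph alone cannot say which one is the true genome.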
ALLPATHS finds approximate unipaths between read pairs
Unipaths with low copy number become seeds
● Ideally, seeds are long and unique
● Copy number is inferred from read coverage of unipath components
● Read pairing is used to optimize seed selection
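As a rough illustration of inferring copy number from coverage, the sketch below divides a unipath's read coverage by the genome-wide average and rounds. This is a simplification assumed for illustration: the names `estimate_copy_number` and `select_seeds`, the length threshold, and the coverage figures are hypothetical, and the pairing-based optimization mentioned on the slide is not modeled.

```python
def estimate_copy_number(unipath_coverage, genome_coverage):
    """Crude estimate: coverage ratio rounded to the nearest integer, at least 1."""
    return max(1, round(unipath_coverage / genome_coverage))

def select_seeds(unipaths, genome_coverage, min_length):
    """Return names of unipaths that look single-copy and are long enough,
    longest first. `unipaths` holds (name, length, coverage) tuples."""
    candidates = [
        (length, name)
        for name, length, coverage in unipaths
        if length >= min_length
        and estimate_copy_number(coverage, genome_coverage) == 1
    ]
    return [name for length, name in sorted(candidates, reverse=True)]

# Hypothetical unipaths at roughly 80X average coverage.
unipaths = [
    ("u1", 5000, 82.0),    # ~1 copy, long
    ("u2", 300, 161.0),    # ~2 copies
    ("u3", 1200, 79.0),    # ~1 copy
    ("u4", 900, 243.0),    # ~3 copies
]
seeds = select_seeds(unipaths, genome_coverage=80.0, min_length=1000)
```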
“Neighborhoods” are built around seeds
● Unipaths assigned coordinates relative to the seed
● Read “partners” added to primary cloud
● Repetitive read pairs are placed in the secondary cloud
All paths between merged short-fragment pairs are found
● Paths between merged short-fragment pairs are computed
● Resulting set of paths covers neighborhood
● Paths are then used as reads to walk mid-length (~5 kb) read pairs from the primary read cloud
Local assemblies are glued together
(a) Sequences around bubble match
(b) Common path identified
(c) Edges “zipped up”
The global assembly is glued together
The global assembly is edited
Evaluation was performed using “simulated short reads”
● Ten reference genomes (2-39 Mb)
● 10 Mb segment of reference human genome
● Segmented into 30-base “reads”
– 1X coverage from long fragments (~50 kb)
– 39.5X from medium fragments (~6 kb)
– 39.5X from short fragments (~500 bases)
– Total of 80X coverage
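The coverage figures can be checked with a little arithmetic. The sketch below assumes the usual relation coverage = (number of reads × read length) / genome length; the helper `reads_needed` and the `libraries` table are illustrative names, and the read counts are derived here, not taken from the slides.

```python
import math

def reads_needed(genome_len, coverage, read_len):
    """Number of reads of length read_len needed to reach the given coverage."""
    return math.ceil(genome_len * coverage / read_len)

# Fragment libraries from the slide: (label, coverage).
libraries = [
    ("long fragments (~50 kb)", 1.0),
    ("medium fragments (~6 kb)", 39.5),
    ("short fragments (~500 bases)", 39.5),
]

total_coverage = sum(cov for _, cov in libraries)   # 1 + 39.5 + 39.5

# 30-base reads over the 10 Mb human segment.
genome_len, read_len = 10_000_000, 30
per_library = {label: reads_needed(genome_len, cov, read_len)
               for label, cov in libraries}
total_reads = reads_needed(genome_len, total_coverage, read_len)
```

At 80X, the 10 Mb segment corresponds to roughly 26.7 million 30-base reads under this relation.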
The results were promising
ALLPATHS accuracy is still unknown
● Comparisons were against “reference” genomes
● No “coverage bias” in simulated reads
● Is ALLPATHS actually accurate, or just biased in the same way as Sanger?
Evaluation was also performed with “artificially paired” Solexa reads
● 36-base E. coli Solexa reads mapped to reference genome
● Reads paired in same 80X coverage distribution as above
● Simulated error took the form of error in fragment length
Performance with real data was slightly worse
● ALLPATHS produced assembly of 58 components, with 99.1% coverage
● Components were ordered and oriented using read pair information to produce a single contiguous sequence
● Assembled sequence matches reference except in 12 locations
The performance on real paired read data is unknown
● Same problems with “simulated data” evaluation
● Bias in fragment size “error”?
● Lack of read error information
Variance in fragment size can cause “closure explosion”
Figure: number of read pair closures in E. coli using 30-base reads and K = 20
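A toy model hints at why a looser fragment-size constraint admits more closures. The graph, edge lengths, and the function `count_closures` below are hypothetical: a read pair is separated by a region with two repeat variants that can be traversed any number of times, and a closure is any walk whose length falls within the allowed fragment-size window.

```python
def count_closures(graph, source, sink, nominal, tolerance):
    """Count walks from source to sink whose total length lies in
    [nominal - tolerance, nominal + tolerance]. Edge lengths must be positive."""
    lo, hi = nominal - tolerance, nominal + tolerance
    count = 0
    stack = [(source, 0)]
    while stack:
        node, length = stack.pop()
        if node == sink and lo <= length <= hi:
            count += 1
        for nxt, edge_len in graph.get(node, []):
            if length + edge_len <= hi:     # prune walks that are already too long
                stack.append((nxt, length + edge_len))
    return count

# Hypothetical graph: S -> R, two repeat loops on R (50 and 60 bases), R -> T.
graph = {
    "S": [("R", 100)],
    "R": [("R", 50), ("R", 60), ("T", 100)],
}

exact = count_closures(graph, "S", "T", nominal=500, tolerance=0)
loose = count_closures(graph, "S", "T", nominal=500, tolerance=10)
looser = count_closures(graph, "S", "T", nominal=500, tolerance=50)
```

In this toy case the count rises from 2 closures with no tolerance to 13 with a tolerance of 10 bases, and keeps growing as the window widens.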
Unipath graphs offer a compact and informative representation of sequence components
Questions?
