STAR: Spliced Transcripts Alignment & Reconstruction using high throughput RNA-seq data

Alexander Dobin, Philippe Batut, Sudipto Chakrabortty, Carrie Davis, Delphine Fagegaltier, Sonali Jha, Wei Lin, Felix Schlesinger, Chenghai Xue, Christopher Zaleski, Thomas Gingeras
CSHL

STAR: spliced transcript alignment and reconstruction
• 'Ab initio' splice junctions: un-annotated, non-canonical, distal exons, chimeric ...
• Unique and multiple mappers
• Any read length, any number of SJs per read
• Any (reasonable) number of mismatches and indels
• Alignment scoring utilizing read quality scores
• "Auto" trimming of poor quality ends
• Poly-A tail detection
• Very fast: human 75-mer reads: 60 million reads per hour
• Memory: RAM ~ 9*(genome length) bytes: 25GB for human
II. Algorithm

Splitting the reads
• Split the read at poor quality bases (QS<10) and at 'N' bases
• Map each good piece separately
• Detect mismatches caused by poor SNR
• Avoid erroneous mapping caused by sequencing errors: just 1 SNP can cause mis-mapping from paralog to paralog
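The splitting step can be illustrated with a minimal sketch. It assumes the read arrives as a base string plus a list of integer quality scores; the function name `split_read` and the return format are hypothetical and not taken from STAR itself.

```python
def split_read(seq, quals, min_qs=10):
    """Return (start, piece) tuples for maximal runs of good bases.

    A base is treated as "good" when it is not 'N' and its quality
    score is at least min_qs (the slide's QS<10 cutoff).
    """
    pieces = []
    start = None  # start index of the good piece currently being grown
    for i, (base, q) in enumerate(zip(seq, quals)):
        good = base != "N" and q >= min_qs
        if good and start is None:
            start = i
        elif not good and start is not None:
            pieces.append((start, seq[start:i]))
            start = None
    if start is not None:
        pieces.append((start, seq[start:]))
    return pieces


read = "ACGTNACGTAC"
quals = [30, 30, 30, 30, 2, 30, 30, 5, 30, 30, 30]
print(split_read(read, quals))  # [(0, 'ACGT'), (5, 'AC'), (8, 'TAC')]
```

Each good piece keeps its offset in the read, so pieces mapped separately can later be placed back in read order.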

Suffix array based search
For each good piece:
• find the maximum exactly mappable length (could be a multiple mapper)
• if a long portion of the good piece is still unmapped, repeat
• repeat this procedure backwards (from 3' to 5' of a good piece)
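A toy sketch of the forward search described above, assuming an uncompressed suffix array over a short genome string. The construction here is naive and the function names (`build_suffix_array`, `max_mappable_prefix`, `seed_pieces`) are hypothetical; this illustrates the idea and is not STAR's implementation. The backward (3' to 5') pass is omitted.

```python
def build_suffix_array(genome):
    # Naive construction by sorting suffixes; suitable for toy inputs only.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def _char(genome, sa, idx, k):
    # k-th character of the suffix at SA index idx; "" sorts before any base.
    pos = sa[idx] + k
    return genome[pos] if pos < len(genome) else ""


def max_mappable_prefix(read, genome, sa):
    """Return (length, positions) of the longest exactly matching read prefix."""
    lo, hi = 0, len(sa)  # SA interval [lo, hi) of suffixes matching read[:k]
    k = 0
    while k < len(read):
        c = read[k]
        a, b = lo, hi
        while a < b:  # lower bound: first suffix whose k-th char is >= c
            m = (a + b) // 2
            if _char(genome, sa, m, k) < c:
                a = m + 1
            else:
                b = m
        new_lo = a
        b = hi
        while a < b:  # upper bound: first suffix whose k-th char is > c
            m = (a + b) // 2
            if _char(genome, sa, m, k) <= c:
                a = m + 1
            else:
                b = m
        if new_lo == a:  # no suffix extends the match by one more base
            break
        lo, hi = new_lo, a
        k += 1
    return k, (sorted(sa[lo:hi]) if k > 0 else [])


def seed_pieces(read, genome, sa):
    """Repeat the search on the still-unmapped remainder of the read."""
    pieces, offset = [], 0
    while offset < len(read):
        length, positions = max_mappable_prefix(read[offset:], genome, sa)
        if length == 0:
            offset += 1  # base absent from the genome: skip it
            continue
        pieces.append((offset, length, positions))
        offset += length
    return pieces


genome = "ACGTACGGTTACGT"
sa = build_suffix_array(genome)
print(max_mappable_prefix("ACGTT", genome, sa))  # (4, [0, 10]): a multiple mapper
print(seed_pieces("ACGTACGTACGT", genome, sa))   # [(0, 7, [0]), (7, 5, [9])]
```

Because the search narrows one suffix-array interval base by base, the point where the interval becomes empty gives the maximum mappable length directly, with no separate pass needed.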

Maximum mappable length
• Typical short read aligner: does the read map entirely, i.e. at full length?
• What is the maximum mappable length?
  – can detect many mismatches (diagram: Map, Extend)
  – can precisely "trim" poor quality tails (diagram: Map)
  – can detect splice junctions (diagram: Map, Map again)
• With suffix arrays we find the maximum mappable length in no extra time

Scoring with quality scores
• Similar to local alignment scoring, but penalties have probabilistic meaning
• Illumina quality score: $QS = -10 \cdot \log_{10}(P_{\text{base-error}})$
• +QS for matches; -QS for mismatches
• Penalty for gap opening: $P_{\text{gap}} = -10 \cdot \log_{10}(P_{SJ})$
• Total score: $S = \sum_{i=\text{match}} Q_i - \sum_{i=\text{mismatch}} Q_i - \sum_{i=\text{gap}} P_{\text{gap}}$
• A more elaborate iterative penalty system is being developed
  – gap penalty is calculated from mapped gap length distribution
  – mismatch penalties vs QS scores are re-calibrated after mapping
• Choose the alignment(s) with highest score
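The scoring scheme on this slide can be written out as a short sketch. It assumes per-base quality scores are already split into matched and mismatched positions, and that every gap carries the same opening penalty derived from one probability `p_sj`; the function names and the example numbers are illustrative assumptions, not values from the presentation.

```python
import math


def phred(p_error):
    # Quality score from a base-error probability: QS = -10 * log10(P)
    return -10 * math.log10(p_error)


def alignment_score(match_qs, mismatch_qs, n_gaps, p_sj):
    """Total score: +QS per match, -QS per mismatch, -P_gap per gap opening."""
    gap_penalty = -10 * math.log10(p_sj)  # P_gap = -10 * log10(P_SJ)
    return sum(match_qs) - sum(mismatch_qs) - n_gaps * gap_penalty


# Hypothetical alignment: 10 matches at QS 30, 1 mismatch at QS 20,
# and 1 gap opened with an assumed junction probability of 0.001.
score = alignment_score([30] * 10, [20], n_gaps=1, p_sj=0.001)
print(round(score, 6))
```

Under this scheme a mismatch at a low-quality base costs little, while a mismatch at a high-quality base costs as much as a confident match earns.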

Stitch and extend mapped pieces
• Select anchors and alignment windows
• Collect all mapped pieces within an alignment window
• Consider all collinear transcripts of mapped pieces within a window
• Stitch all pieces together
• Extend the transcripts through the un-mapped 5' and 3' ends
(diagram labels: Stitch, Extend, Extend)
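A simplified sketch of the collinearity test behind stitching. It assumes each mapped piece is a `(read_start, genome_start, length)` tuple with a single genomic position, and it builds one chain greedily, whereas the slide describes considering all collinear combinations within a window. The names `stitch` and `junction_gaps` and the window size are hypothetical.

```python
def stitch(pieces, window):
    """Greedily chain pieces that are collinear in both read and genome."""
    chain = []
    for r, g, n in sorted(pieces):  # process pieces in read order
        if not chain:
            chain.append((r, g, n))
            continue
        pr, pg, pn = chain[-1]
        after_in_read = r >= pr + pn      # no overlap in the read
        after_in_genome = g >= pg + pn    # same order in the genome
        in_window = g - chain[0][1] <= window
        if after_in_read and after_in_genome and in_window:
            chain.append((r, g, n))
    return chain


def junction_gaps(chain):
    """Genomic bases skipped between consecutive pieces beyond the read gap."""
    gaps = []
    for (r1, g1, n1), (r2, g2, _) in zip(chain, chain[1:]):
        read_gap = r2 - (r1 + n1)
        genome_gap = g2 - (g1 + n1)
        gaps.append(genome_gap - read_gap)
    return gaps


pieces = [(0, 0, 7), (3, 500, 4), (7, 9, 5), (12, 5000, 3)]
chain = stitch(pieces, window=100)
print(chain)                 # [(0, 0, 7), (7, 9, 5)]
print(junction_gaps(chain))  # [2]
```

A positive gap between stitched pieces marks genomic sequence absent from the read, which is how a splice junction or deletion would appear in this representation.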

Comparison with exhaustive search
Fly embryo 76mer RNA-seq, 1 Illumina lane: 8,930,945 total reads, good quality

         Exhaustively mapped   Only in STAR   Missed by STAR
Exact              5,125,614              0            2,425
1MM                1,353,709             94            3,217
2MM                  417,225             23            4,172

• Multiple mappers by exhaustive search, <0.002% of all reads
• STAR maps 99.8% of all exhaustively mapped reads
• Poor quality reads which did not have a single unique "anchor"
III. Application
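The 99.8% figure can be cross-checked from the counts on this slide. The sketch below assumes the "Exhaustively mapped" column counts every read found by exhaustive search, including the reads STAR missed; that interpretation is an assumption, not something the slide states.

```python
# Counts copied from the slide, keyed by mismatch class.
exhaustive = {"Exact": 5_125_614, "1MM": 1_353_709, "2MM": 417_225}
missed_by_star = {"Exact": 2_425, "1MM": 3_217, "2MM": 4_172}

total_exhaustive = sum(exhaustive.values())
total_missed = sum(missed_by_star.values())

# Fraction of exhaustively mapped reads that STAR also mapped.
recovered = 1 - total_missed / total_exhaustive

print(total_exhaustive, total_missed)
print(f"{recovered:.2%}")
```

Under that assumption the recovered fraction comes out slightly above 99.8%, consistent with the slide's rounded value.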

Reads mapped by STAR
• 1.5% multi-mappers
• 8.5% STAR splice junctions
• 1.8% not mapped by STAR
• 0.2% STAR InDels
• 11% STAR >2MM or shorter length
• 77% overlap with exhaustive search

STAR alignments
~1,000,000 alignments found by STAR and not by exhaustive search
[Plots: distribution of mapped lengths; distribution of mismatches. Annotations: mean length = 72, spliced portions, poor quality tails]

Benchmarks
Single thread benchmarks, 75-mer reads
• Bowtie (-v2 -k1) only reports non-spliced alignments with 0-2 MM, 1 or 2 alignments per read
• BLAT and STAR report >2MM and spliced alignments, and all the multiple alignments

Millions of reads aligned per hour:
         BLAT   Bowtie   STAR
Fly        13       19     91
Human       1       13     58

Human total RNA
• K562 human cell line
• Quality scores vs cycle: poor quality tails! [plot: quality score percentiles vs cycle]
270M reads (76-mer)

                      Uniquely mapped reads   Mean mapped length (bases)
0-2 MM                                  51M                           76
0-2MM & trim to 50                      72M                           50
STAR                                   106M                         64.8

Splice junctions
All spliced reads: 3.75M
• Canonical GT/AG: 3.6M (96%)
• Non-Canonical: 150k (4%)
Splice junctions with 3 or more reads per junction: 87k
• Annotated Canonical: 78k (90%)
• Un-Annotated Canonical: 6k (7%)
• Non-Canonical: 2.8k (3%)
~0.5% of mapped reads are chimeric: inter-chromosome or inter-strand

Summary
• STAR: ab initio splice junction detection
• Maximum mappable length search with suffix arrays
• Alignment scoring uses quality scores of the reads
• Very fast: 60M/hour for 75-mer reads in human; requires a large amount of RAM (~25GB for human)
