
Expression quantification II, Helene Kretzmer, 15.05.2012



  1. Expression quantification II Helene Kretzmer 15.05.2012

  2. Pipeline: (i) RNA isolation from sample; (ii) reverse transcription of RNA to cDNA and fragmentation; (iii) sequencing; (iv) mapping reads to reference genome; (v) using read counts for expression level estimation. Mapping problems: unknown isoforms; sequencing non-uniformity; read mapping uncertainty. IZBI Introduction

  3. Read Mapping Uncertainty. Causes: paralogous genes; low-complexity regions; high sequence similarity; reference sequence errors; sequencing errors ⇒ multireads (gene multireads, isoform multireads).

  4. Mapping Strategies: (a) discard multireads; (b) rescue multireads; (c) EM, a statistical model.
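
Strategies (a) and (b) can be contrasted on a toy example. The sketch below is only one possible reading: the slide does not say how multireads are rescued, so it assumes that each multiread is split among its candidate genes in proportion to their unique-read counts, and it ignores gene length. The function names and the input layout (one list of candidate genes per read) are hypothetical.

```python
def count_discard(alignments, genes):
    """Strategy (a): count only reads that map to exactly one gene."""
    counts = {g: 0.0 for g in genes}
    for hits in alignments:
        if len(hits) == 1:
            counts[hits[0]] += 1.0
    return counts


def count_rescue(alignments, genes):
    """Strategy (b), assumed variant: split each multiread among its
    candidate genes in proportion to their unique-read counts."""
    unique = count_discard(alignments, genes)
    counts = dict(unique)
    for hits in alignments:
        if len(hits) > 1:
            total = sum(unique[g] for g in hits)
            for g in hits:
                # Fall back to an even split if no candidate has unique reads.
                share = unique[g] / total if total > 0 else 1.0 / len(hits)
                counts[g] += share
    return counts


# Two reads unique to gene A, one unique to gene B, one multiread.
alignments = [["A"], ["A"], ["B"], ["A", "B"]]
discarded = count_discard(alignments, ["A", "B"])
rescued = count_rescue(alignments, ["A", "B"])
```

With discarding, the multiread is lost entirely; with this rescue variant it contributes 2/3 to A and 1/3 to B.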

  5. Measures of Expression, isoform i: $\tau_i$ .. fraction of transcripts (percentage of isoform i of all transcripts in the sample); $\nu_i$ .. fraction of nucleotides (percentage of isoform i of all nucleotides in the sample); $\ell_i$ .. length of isoform i in nucleotides; $\tau_i = \mathrm{RPKM}_i \cdot 10^{-9} \sum_j \tau_j \ell_j$. IZBI EM Model

  6. Measures of Expression, isoform i: $\tau_i$ .. fraction of transcripts, $\tau_i = \frac{\nu_i}{\ell_i} \left( \sum_j \frac{\nu_j}{\ell_j} \right)^{-1}$; $\nu_i$ .. fraction of nucleotides, $\nu_i = \frac{\tau_i \ell_i}{\sum_j \tau_j \ell_j}$; $\ell_i$ .. length of isoform i in nucleotides; $\tau_i = \mathrm{RPKM}_i \cdot 10^{-9} \sum_j \tau_j \ell_j$.
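
The relations between the fraction of nucleotides, the fraction of transcripts and RPKM can be checked numerically. The following is a minimal sketch on plain Python lists; the function names are illustrative and not from the slides, and the RPKM expression RPKM_i = 10^9 · ν_i / ℓ_i is obtained here by combining the two equations above rather than quoted from the deck.

```python
def tau_to_nu(tau, lengths):
    """Fraction of transcripts -> fraction of nucleotides."""
    weighted = [t * l for t, l in zip(tau, lengths)]
    total = sum(weighted)
    return [w / total for w in weighted]


def nu_to_tau(nu, lengths):
    """Fraction of nucleotides -> fraction of transcripts."""
    per_length = [n / l for n, l in zip(nu, lengths)]
    total = sum(per_length)
    return [p / total for p in per_length]


def rpkm_from_nu(nu, lengths):
    """RPKM_i = 1e9 * nu_i / l_i (derived from the two relations above)."""
    return [1e9 * n / l for n, l in zip(nu, lengths)]


def tau_from_rpkm(rpkm, tau, lengths):
    """tau_i = RPKM_i * 1e-9 * sum_j tau_j * l_j, used as a consistency check."""
    mean_length = sum(t * l for t, l in zip(tau, lengths))
    return [r * 1e-9 * mean_length for r in rpkm]


# Two isoforms with equal transcript fractions but different lengths.
lengths = [1000, 2000]
tau = [0.5, 0.5]
nu = tau_to_nu(tau, lengths)
rpkm = rpkm_from_nu(nu, lengths)
```

In this example the longer isoform contributes twice as many nucleotides although both isoforms have the same number of transcripts, which is why ν and τ differ.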

  7. EM-Model, Generative Model: N reads, all of length L. Assumptions: M isoforms; isoform sequence is known; additional noise isoform; uniformly distributed reads: # reads of isoform i $\to \nu_i N$.

  20. $R_n$ .. sequence of read n; $G_n$ .. isoform of read n; $S_n$ .. start position of read n; $O_n$ .. orientation (strand) of read n; $\theta = [\theta_0, \ldots, \theta_M]$ .. expression levels of the isoforms 0, ..., M. [Figure: graphical model over $\theta$, $G_n$, $S_n$, $O_n$, $R_n$, with a plate over the N reads] $P(g, s, o, r \mid \theta) = \prod_{n=1}^{N} P(g_n \mid \theta)\, P(s_n \mid g_n)\, P(o_n \mid g_n)\, P(r_n \mid g_n, s_n, o_n)$
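
The factorisation can be read as a recipe for generating one read: draw the isoform, then the start position and orientation, then the read sequence. The sketch below follows that order under several simplifying assumptions that are not on the slides: a fixed per-base substitution error rate, a uniform background for the noise isoform, and start positions restricted to those where the whole read fits inside the isoform. All names and default values are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def sample_read(theta, isoforms, read_len, rng, p_same_strand=1.0, error_rate=0.01):
    """Draw one (g, s, o, r) following P(g|theta) P(s|g) P(o|g) P(r|g,s,o)."""
    # G_n ~ theta; index 0 is the noise isoform.
    g = rng.choices(range(len(theta)), weights=theta)[0]
    if g == 0:
        # Noise isoform: bases from a uniform background (assumption);
        # start and orientation are placeholders here.
        return g, 0, 0, "".join(rng.choice("ACGT") for _ in range(read_len))
    seq = isoforms[g - 1]
    # S_n | G_n: uniform over 1-based starts where the read fits (assumption).
    s = rng.randint(1, len(seq) - read_len + 1)
    # O_n | G_n: 0 = same orientation as template, 1 = reverse complement.
    o = 0 if rng.random() < p_same_strand else 1
    fragment = seq[s - 1 : s - 1 + read_len]
    if o == 1:
        fragment = reverse_complement(fragment)
    # R_n | G_n, S_n, O_n: copy with independent substitution errors.
    read = "".join(
        rng.choice([b for b in "ACGT" if b != base])
        if rng.random() < error_rate
        else base
        for base in fragment
    )
    return g, s, o, read


rng = random.Random(0)
example = sample_read([0.1, 0.6, 0.3], ["ACGTACGTAC", "TTGACCAGTA"], 4, rng)
```

Setting `p_same_strand=1.0` corresponds to a strand-specific protocol and `0.5` to a protocol that is not strand specific.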

  21. Summary: $P(G_n = i \mid \theta)$ .. probability that read n maps to isoform i given the expression levels $\theta_0, \ldots, \theta_M$; $P(O_n = 0 \mid G_n \neq 0)$ .. probability that read n has the same orientation as its template given that it is not from the noise isoform; $P(S_n = j \mid G_n = i)$ .. probability that read n starts at position j given that it is from isoform i; $P(R_n = \rho \mid G_n = i, S_n = j, O_n = 0)$ .. probability that read n has sequence $\rho$ given that it is from isoform i, starts at position j and has the same orientation as its template.

  22. Isoform $G_n$: $P(g, s, o, r \mid \theta) = \prod_{n=1}^{N} P(g_n \mid \theta)\, P(o_n \mid g_n)\, P(s_n \mid g_n)\, P(r_n \mid g_n, s_n, o_n)$. Term $P(G_n = i \mid \theta)$: $G_n \in [0, M]$, where 0 is the noise isoform and 1, ..., M are the known isoforms; $P(G_n = i \mid \theta) = \theta_i$ and $\sum_i \theta_i = 1$.

  23. Orientation $O_n$: $P(g, s, o, r \mid \theta) = \prod_{n=1}^{N} P(g_n \mid \theta)\, P(o_n \mid g_n)\, P(s_n \mid g_n)\, P(r_n \mid g_n, s_n, o_n)$. Term $P(O_n = 0 \mid G_n \neq 0)$: $O_n = \begin{cases} 1, & \text{reverse complement} \\ 0, & \text{same orientation as its template} \end{cases}$; $P(O_n = 0 \mid G_n \neq 0) = \begin{cases} 1, & \text{strand specific sequencing} \\ 0.5, & \text{not strand specific sequencing} \end{cases}$

  24. Start position $S_n$: $P(g, s, o, r \mid \theta) = \prod_{n=1}^{N} P(g_n \mid \theta)\, P(o_n \mid g_n)\, P(s_n \mid g_n)\, P(r_n \mid g_n, s_n, o_n)$. Term $P(S_n = j \mid G_n = i)$: $S_n \in [1, \ldots, \max_i \ell_i]$; $\ell_i$ .. length of isoform i; $P(S_n = j \mid G_n = i) = \begin{cases} \frac{1}{\ell_i}, & \text{uniform read start distribution} \\ f\!\left(\frac{j}{\ell_i}\right) - f\!\left(\frac{j-1}{\ell_i}\right), & \text{non-uniform read start distribution} \end{cases}$; f .. empirical cumulative density function over [0, 1]. [Figure: probability density function vs. fractional position along transcript]
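
Both cases of the start position distribution can be computed from a cumulative function f on [0, 1]. A small sketch follows; the quadratic f used in the example is a made-up stand-in for an empirical curve, and the function names are illustrative.

```python
def start_probs_uniform(length):
    """P(S = j | G = i) = 1 / l_i for j = 1..l_i."""
    return [1.0 / length] * length


def start_probs_from_cdf(length, cdf):
    """P(S = j | G = i) = f(j / l_i) - f((j - 1) / l_i) for j = 1..l_i."""
    return [cdf(j / length) - cdf((j - 1) / length) for j in range(1, length + 1)]


# Hypothetical cumulative function that favours starts near the 3' end.
biased = start_probs_from_cdf(10, lambda x: x ** 2)

# With f(x) = x the non-uniform formula reduces to the uniform case.
flat = start_probs_from_cdf(10, lambda x: x)
```

Because the differences telescope, the probabilities sum to f(1) - f(0), which is 1 for any cumulative function on [0, 1].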

  26. Sequence $R_n$: $P(g, s, o, r \mid \theta) = \prod_{n=1}^{N} P(g_n \mid \theta)\, P(o_n \mid g_n)\, P(s_n \mid g_n)\, P(r_n \mid g_n, s_n, o_n)$. Term $P(R_n = \rho \mid G_n = i, S_n = j, O_n = k)$. Strand specific protocol, known isoforms: $P(R_n = \rho \mid G_n = i, S_n = j, O_n = 0) = \prod_{t=1}^{L} \omega_t(\rho_t, \gamma^i_{j+t-1})$, with $\omega_t(a, b) = P(\mathrm{read}[t] = a \mid \mathrm{isoform}[j+t-1] = b)$ and $\gamma^i$ .. sequence of isoform i. Alignment of read and isoform: read C G A T against isoform A T C C G A A T C G gives $P(R_n = \rho \mid G_n = i, S_n = j, O_n = 0) = \omega_1(C, C)\, \omega_2(G, G)\, \omega_3(A, A)\, \omega_4(T, A)$.

  27. Sequence $R_n$: $P(g, s, o, r \mid \theta) = \prod_{n=1}^{N} P(g_n \mid \theta)\, P(o_n \mid g_n)\, P(s_n \mid g_n)\, P(r_n \mid g_n, s_n, o_n)$. Term $P(R_n = \rho \mid G_n = i, S_n = j, O_n = k)$. Strand specific protocol, known isoforms: $P(R_n = \rho \mid G_n = i, S_n = j, O_n = 0) = \prod_{t=1}^{L} \omega_t(\rho_t, \gamma^i_{j+t-1})$, with $\omega_t(a, b) = P(\mathrm{read}[t] = a \mid \mathrm{isoform}[j+t-1] = b)$ and $\gamma^i$ .. sequence of isoform i. Strand specific protocol, noise isoform 0: $P(R_n = \rho \mid G_n = 0, S_n = j, O_n = 0) = \prod_{t=1}^{L} \beta(\rho_t)$, with $\beta$ .. background distribution.
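
The two products over read positions can be evaluated directly. The sketch below uses a made-up, position-independent error model for ω (probability 1 - ε for a match, ε/3 for each mismatch) and a uniform background β; neither choice comes from the slides. The worked example takes the read CGAT and the isoform ATCCGAATCG from the alignment slide of this deck, with start position j = 4 inferred from the ω factors shown there.

```python
def read_prob_known(read, isoform_seq, start, omega):
    """prod_t omega(t, read[t], isoform[start + t - 1]) with 1-based positions."""
    p = 1.0
    for t, base in enumerate(read, start=1):
        # 1-based isoform position start + t - 1 is 0-based index start + t - 2.
        p *= omega(t, base, isoform_seq[start + t - 2])
    return p


def read_prob_noise(read, beta):
    """prod_t beta(read[t]) for the noise isoform 0."""
    p = 1.0
    for base in read:
        p *= beta[base]
    return p


EPS = 0.03  # assumed per-base error rate, for illustration only


def simple_omega(t, a, b):
    """Position-independent stand-in for omega_t(a, b)."""
    return 1.0 - EPS if a == b else EPS / 3.0


uniform_beta = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Three matches (C, G, A) and one mismatch (T read against A).
p_known = read_prob_known("CGAT", "ATCCGAATCG", 4, simple_omega)
p_noise = read_prob_noise("CGAT", uniform_beta)
```

A position-dependent ω, as the index t in the slide suggests, would only change `simple_omega`; the product itself stays the same.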

  28. Estimation of Expression Levels. Given: N reads of length L and M known isoforms. Assumption: reads are uniformly sampled from the transcriptome. EM algorithm: find $\theta = [\theta_0, \ldots, \theta_M]$ that maximizes $P(r \mid \theta)$, where $P(r \mid \theta) = \prod_{n=1}^{N} \sum_{i=0}^{M} \frac{\theta_i}{\ell_i} \sum_{j} P(r_n \mid g_n = i, s_n = j)$ and $\nu_i \approx \frac{\theta_i}{1 - \theta_0}$.

  29. Estimation of Expression Levels. Given: N reads of length L and M known isoforms. Assumption: reads are uniformly sampled from the transcriptome. EM algorithm: find $\theta = [\theta_0, \ldots, \theta_M]$ that maximizes $P(r \mid \theta)$, where $P(r \mid \theta) = \prod_{n=1}^{N} \sum_{i=0}^{M} \frac{\theta_i}{\ell_i} \sum_{j} P(r_n \mid g_n = i, s_n = j)$ and $\nu_i \approx \frac{\theta_i}{1 - \theta_0}$. The EM algorithm optimizes $\theta$ iteratively; latent variables: $G_n$, $S_n$, $O_n$. E-step: $E[G_n = i, S_n = j, O_n = k] = P(G_n = i, S_n = j, O_n = k \mid r, \theta^{t})$. M-step: $\theta^{t+1} = \arg\max_{\theta} E[\log P(r, g_n, o_n, s_n \mid \theta) \mid r, \theta^{t}]$.
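
The E-step and M-step can be sketched in a reduced form. The version below is a simplification, not the deck's implementation: it assumes the start position and orientation have already been summed out, so that each read n comes with precomputed weights w[n][i] = (1/ℓ_i) Σ_j P(r_n | g_n = i, s_n = j) for its candidate isoforms. The E-step then computes the posterior of each isoform for each read, and the M-step sets θ to the normalised expected read counts. All names are hypothetical.

```python
def em_expression(weights, num_isoforms, iterations=100):
    """Estimate theta[0..M] from per-read weights.

    weights[n] maps isoform index i (0 = noise isoform) to
    (1 / l_i) * sum_j P(r_n | g_n = i, s_n = j).
    """
    m = num_isoforms + 1  # known isoforms plus the noise isoform
    theta = [1.0 / m] * m
    for _ in range(iterations):
        expected = [0.0] * m
        for w in weights:
            # E-step: posterior P(G_n = i | r_n, theta) for this read.
            joint = {i: theta[i] * wi for i, wi in w.items()}
            total = sum(joint.values())
            if total == 0.0:
                continue  # read cannot be explained under the current theta
            for i, p in joint.items():
                expected[i] += p / total
        # M-step: theta is proportional to the expected read counts.
        assigned = sum(expected)
        if assigned == 0.0:
            break
        theta = [c / assigned for c in expected]
    return theta


def nu_from_theta(theta):
    """nu_i ~ theta_i / (1 - theta_0) for the known isoforms 1..M."""
    return [t / (1.0 - theta[0]) for t in theta[1:]]


# Two reads unique to isoform 1, one unique to isoform 2, one multiread.
toy_weights = [{1: 1.0}, {1: 1.0}, {2: 1.0}, {1: 1.0, 2: 1.0}]
theta_hat = em_expression(toy_weights, num_isoforms=2)
nu_hat = nu_from_theta(theta_hat)
```

In this toy input the multiread is gradually assigned to isoform 1 with weight 2/3, so the estimate settles at θ_1 = 2/3 and θ_2 = 1/3, with no reads attributed to the noise isoform.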

  31. [Figure, panels (a) and (b)] Gene expression estimates (y-axis) vs. sample values (x-axis) for the simulated mouse (a) and maize (b) RNA-Seq data sets. Comparisons are given for $\nu$. IZBI Results

  32. IZBI Refinements

  33. Thank you for your attention!
