
Scalable differential transcript usage analysis for single-cell applications



  1. Scalable differential transcript usage analysis for single-cell applications. Jeroen Gilis, EuroBioc2019 presentation. Promotor: Prof. Lieven Clement. Supervisor: Dr. Koen Van den Berge.

  2. Differential Transcript Usage (DTU)
  [Figure: DNA is transcribed into pre-mRNA, which alternative splicing turns into isoforms M1 and M2 before translation. The panels contrast normal metabolism with tumorigenesis and compare a gene-level analysis (expression level, cpm) with a transcript-level analysis (relative usage, %) of M1 and M2.]

  3. Method development
  • Our workflow unlocks edgeR for DTU analysis.
  • DGE: $Y_{gi} \sim \mathrm{NB}(\mu_{gi}, \phi_g)$, $\log(\mu_{gi}) = \eta_{gi}$, $\eta_{gi} = \beta_0 + \beta^{C}_{gc} + \log(S_i)$
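The log-linear mean model can be made concrete with a few lines of Python. This is a minimal sketch: the function names and all numbers are hypothetical. The variance $\mu + \phi\mu^2$ is the negative binomial mean/dispersion parameterization that edgeR uses for its dispersion $\phi$.

```python
import math

def nb_mean(beta0, beta_cond, size_factor):
    """Mean of the log-linear NB model: log(mu) = beta0 + beta_cond + log(S)."""
    eta = beta0 + beta_cond + math.log(size_factor)  # linear predictor including the offset log(S)
    return math.exp(eta)

def nb_variance(mu, phi):
    """NB variance in the mean/dispersion form: mu + phi * mu^2."""
    return mu + phi * mu ** 2

# Hypothetical values: baseline rate 1e-5 per unit of S, a 2-fold condition effect, S = 1e6.
mu = nb_mean(beta0=math.log(1e-5), beta_cond=math.log(2.0), size_factor=1e6)
var = nb_variance(mu, phi=0.1)
print(round(mu, 6), round(var, 6))
```

Because the offset enters additively on the log scale, doubling $S_i$ doubles $\mu_{gi}$ without changing any $\beta$. The DTU models on the following slides rely on exactly this property.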

  4. Method development
  • Our workflow unlocks edgeR for DTU analysis.
  • DTE: $Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t)$, $\log(\mu_{ti}) = \eta_{ti}$, $\eta_{ti} = \beta_0 + \beta^{C}_{tc} + \log(S_i)$

  5. Method development
  • Our workflow unlocks edgeR for DTU analysis.
  • DTU: $Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t)$, $\log(\mu_{ti}) = \eta_{ti}$, $\eta_{ti} = \beta_0 + \beta^{C}_{tc} + \log(T_{ti})$
  • Our workflow uses the gene-level counts (total counts, $T_{ti}$) as offsets in the GLM framework: edgeR-total.
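With the offset $\log(T_{ti})$, the model reads $\mu_{ti} = T_{ti}\exp(\beta_0 + \beta^{C}_{tc})$. So $\exp(\beta_0 + \beta^{C}_{tc})$ is the expected fraction of the gene's counts that transcript $t$ receives in condition $c$, that is, its usage.

The sketch below prepares these inputs from a transcript count table. It assumes a hypothetical `tx2gene` mapping and hypothetical toy counts. Only the offsets are computed here; the edgeR fit itself is not reproduced.

```python
import math
from collections import defaultdict

def gene_totals(tx_counts, tx2gene):
    """Sum transcript counts per gene and sample, giving the total counts T."""
    n_samples = len(next(iter(tx_counts.values())))
    totals = defaultdict(lambda: [0] * n_samples)
    for tx, counts in tx_counts.items():
        gene_sum = totals[tx2gene[tx]]
        for i, c in enumerate(counts):
            gene_sum[i] += c
    return dict(totals)

def total_offsets(tx_counts, tx2gene):
    """log(T_ti) for every transcript t and sample i; -inf where the gene has no counts."""
    totals = gene_totals(tx_counts, tx2gene)
    return {
        tx: [math.log(t) if t > 0 else float("-inf") for t in totals[tx2gene[tx]]]
        for tx in tx_counts
    }

def usage(tx_counts, tx2gene):
    """Observed usage Y_ti / T_ti, whose expectation is exp(beta_0 + beta_tc) under the offset model."""
    totals = gene_totals(tx_counts, tx2gene)
    return {
        tx: [y / t if t > 0 else float("nan") for y, t in zip(counts, totals[tx2gene[tx]])]
        for tx, counts in tx_counts.items()
    }

# Hypothetical toy data: one gene G with two transcripts, two samples.
tx_counts = {"Tx1": [30, 5], "Tx2": [10, 15]}
tx2gene = {"Tx1": "G", "Tx2": "G"}
print(gene_totals(tx_counts, tx2gene))
print(usage(tx_counts, tx2gene))
```

edgeR's GLM fitting functions (e.g. `glmFit`) accept an offset argument, which is where a matrix of such $\log(T_{ti})$ values would be supplied.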

  6. Method development
  • Our workflow unlocks edgeR for DTU analysis.
  • DTU: $Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t)$, $\log(\mu_{ti}) = \eta_{ti}$, $\eta_{ti} = \beta_0 + \beta^{C}_{tc} + \log(T_{ti})$
  • Our workflow uses the gene-level counts (total counts, $T_{ti}$) as offsets in the GLM framework: edgeR-total.
  • DEXSeq works with two matrices (transcripts Tx 1 … Tx n × samples Sample 1 … Sample m):
      Counts:          Tx 1 = 112 … 15,  Tx t = … ,  Tx n = 62 … 348
      'Other' counts:  Tx 1 = 25 … 3,    Tx t = … ,  Tx n = 88 … 212
  • Our second workflow uses the 'other' counts as offsets: edgeR-other.
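For each transcript, the 'other' counts are the summed counts of the remaining transcripts of the same gene in each sample, that is, the gene total minus the transcript's own count. The sketch below derives them from a transcript count table. It assumes a hypothetical `tx2gene` mapping and hypothetical toy counts.

For edgeR-other, these counts would presumably enter the GLM on the log scale, as $T_{ti}$ does for edgeR-total. The slide only states that they are used as offsets.

```python
from collections import defaultdict

def other_counts(tx_counts, tx2gene):
    """For each transcript, the per-sample sum of the counts of the other transcripts of its gene."""
    n_samples = len(next(iter(tx_counts.values())))
    totals = defaultdict(lambda: [0] * n_samples)
    for tx, counts in tx_counts.items():
        gene_sum = totals[tx2gene[tx]]
        for i, c in enumerate(counts):
            gene_sum[i] += c
    # 'other' = gene total minus the transcript's own count, sample by sample
    return {
        tx: [total - own for total, own in zip(totals[tx2gene[tx]], counts)]
        for tx, counts in tx_counts.items()
    }

# Hypothetical toy data: one gene G with three transcripts, two samples.
tx_counts = {"Tx1": [30, 5], "Tx2": [10, 15], "Tx3": [0, 4]}
tx2gene = {"Tx1": "G", "Tx2": "G", "Tx3": "G"}
print(other_counts(tx_counts, tx2gene))
```

A transcript that carries all of a gene's counts in some sample gets an 'other' count of zero there. A log-scale offset would then be $-\infty$, which may be one reason this variant reacts strongly to individual observations.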

  7. Performance evaluation on real bulk data. GTEx dataset, Nature Genetics 45, 580-585 (2013).
  [Figure: panels for the 5v5, 10v10 and 75v75 comparisons; methods DEXSeq, edgeR_total, edgeR_other, limma_diffsplice and DRIMSeq.]

  8. Scalability benchmark on real single-cell data
  • Our workflow performs a DTU analysis between two groups of 512 cells in ~20 minutes.
  • DEXSeq scales quadratically with the number of cells.

  9. Single-cell transcriptomics case study. Dataset from Buettner et al., Nature Biotechnology 33, 155-160 (2015).
  • Dataset: 288 mouse embryonic stem cells in different cell cycle stages (G1, S and G2M).
  • Runtime: < 2 minutes.
  • Significant enrichment in cell cycle processes.
  • Several DTU genes are:
    ✦ biologically relevant;
    ✦ not picked up in a gene-level analysis;
    ✦ clearly differentially used when visualised.
  [Figure: Ccdc86, per-cell proportions of Tx1, Tx2 and Tx3 by phase (G1, S), marked ***.] The size of each dot (one dot per cell) is weighted by the total expression of the gene in that cell.
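The per-cell proportions in the Ccdc86 plot, and the dot weighting described in the caption, can be computed as below. This is a minimal sketch with several assumptions:

- The input layout and the function name are hypothetical.
- The counts are made up.
- The dot size is taken to be simply proportional to the gene's total count in the cell, scaled to the largest cell. The slide does not say which scaling was used.

```python
def usage_plot_data(cells):
    """cells: {cell_id: {transcript: count}} for one gene.
    Returns (cell, transcript, proportion, weight) rows for a usage plot."""
    totals = {cell: sum(counts.values()) for cell, counts in cells.items()}
    max_total = max(totals.values())
    rows = []
    for cell, counts in cells.items():
        total = totals[cell]
        if total == 0:
            continue  # proportions are undefined for cells with no counts for this gene
        weight = total / max_total  # dot size proportional to the gene's total count in the cell
        for tx, c in counts.items():
            rows.append((cell, tx, c / total, weight))
    return rows

# Hypothetical counts for one gene in two cells.
cells = {"cell1": {"Tx1": 8, "Tx2": 2}, "cell2": {"Tx1": 1, "Tx2": 4}}
for row in usage_plot_data(cells):
    print(row)
```

Weighting by total expression keeps cells with very few counts for the gene, whose proportions are noisy, from dominating the visual impression.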

  10. Single-cell transcriptomics case study. Buettner dataset, Nature Biotechnology 33, 155-160 (2015).
  • Dataset: 288 mouse embryonic stem cells in different cell cycle stages (G1, S and G2M).
  • Runtime: < 2 minutes for the offset-based methods.
  • Significant enrichment in cell cycle processes.
  • Some DTU genes display clear DTU in visualisation and are biologically relevant.
  • edgeR_other yields a large number of (false) positive results; it is possibly sensitive to outliers.
  • Discrepancy between edgeR-total and limma diffsplice; to be assessed formally in a single-cell benchmark.
  [Figure: Eef1d, per-cell proportion of Tx8 by phase (G1, G2M), with results for limma diffsplice, edgeR-total and edgeR-other; marked ***.]

  11. Take-home messages
  We are developing a workflow for studying DTU that:
  1. performs similarly to DEXSeq;
  2. correctly controls the false discovery rate;
  3. scales to large transcriptomics datasets.
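FDR control in edgeR-based workflows usually rests on Benjamini-Hochberg adjusted p-values; for example, edgeR's `topTags` defaults to `adjust.method = "BH"`. The slides do not state which adjustment the workflow uses, so treat this as an assumption.

The sketch below is a plain-Python version of the BH step-up adjustment with made-up p-values. A transcript would be called DTU at a 5% FDR when its adjusted value is below 0.05.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four transcripts.
pvals = [0.01, 0.04, 0.03, 0.2]
adj = bh_adjust(pvals)
print([round(a, 4) for a in adj])
print([a < 0.05 for a in adj])  # which transcripts pass a 5% FDR threshold
```

The running minimum keeps the adjusted values monotone in the raw p-values, which is what makes the procedure a step-up rule.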


  13. Background - DTU

  14. Background - DEXSeq
  • Input: a matrix of transcript-level counts (e.g. from Salmon or kallisto), from which transcript-level counts and complementary counts are derived.
  • Statistical model: $Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t)$, $\log(\mu_{ti}) = \eta_{ti}$, $\eta_{ti} = \beta^{S}_{ti} + \beta^{T}_{t} + \beta^{TC}_{tc_i}$
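The three terms of the DEXSeq model map onto columns of a design matrix once a transcript's counts are stacked on top of its complementary counts:

- one intercept per sample ($\beta^{S}$);
- a transcript indicator ($\beta^{T}$);
- a transcript-by-condition term ($\beta^{TC}$).

The sketch below builds such a design for a single transcript. It is a simplified illustration under the assumption of two groups coded 0/1 and a hypothetical function name. DEXSeq constructs its own model matrix internally, and this is not its API.

```python
def dexseq_style_design(conditions):
    """Design for one transcript: rows 0..m-1 hold its own counts, rows m..2m-1 its
    complementary ('other') counts. Columns: one intercept per sample (beta^S),
    a transcript indicator (beta^T) and a transcript x condition term (beta^TC).
    conditions: list of 0/1 group labels, one per sample."""
    m = len(conditions)
    rows = []
    for is_tx in (1, 0):  # transcript rows first, then complementary-count rows
        for i, cond in enumerate(conditions):
            sample_cols = [1 if j == i else 0 for j in range(m)]
            rows.append(sample_cols + [is_tx, is_tx * cond])
    return rows

# Hypothetical 2v2 comparison.
for row in dexseq_style_design([0, 0, 1, 1]):
    print(row)
```

In this layout, the sample intercepts absorb each sample's overall expression of the gene. That is roughly the role the offset $\log(T_{ti})$ plays in edgeR-total, except that here it costs one extra parameter per sample.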

  15. Parametric bulk simulation study. Dataset from Love et al., F1000Research 7:952 (2018).
  [Figure: panels for the 3v3, 6v6 and 10v10 comparisons; methods DEXSeq, edgeR_total, edgeR_other, limma_diffsplice and DRIMSeq.]

  16. GTEx dataset, stringent filtering
  [Figure: methods DEXSeq, edgeR_total, edgeR_other, limma_diffsplice and DRIMSeq.]

  17. Love dataset, stringent filtering
  [Figure: methods DEXSeq, edgeR_total, edgeR_other, limma_diffsplice and DRIMSeq.]

  18. Other parametric bulk simulations and additional methods
  [Figure: panels Love 6v6, Van den Berge 5v5 (1) and Van den Berge 5v5 (2); methods DEXSeq, edgeR_total, edgeR_other, limma_diffsplice, DRIMSeq, NBSplice and edgeR_diffsplice.]

  19. Results - Scalability
  • Methods that require sample-level intercepts scale quadratically with the number of cells.
  • edgeR is one order of magnitude faster than DESeq2.
  • All methods scale linearly with the number of transcripts.
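A back-of-the-envelope view of why sample-level intercepts hurt scalability is to count design-matrix entries per transcript.

- A stacked design with one intercept per sample has 2n rows and n + 2 columns for n cells, so its size grows quadratically.
- An offset-based design has n rows and a fixed number of columns.

This sketch assumes the simplified two-group layouts described here and hypothetical function names. Actual runtimes also depend on the fitting algorithm, so it illustrates the growth pattern rather than reproducing the benchmark.

```python
def design_entries_dexseq(n_cells):
    """Entries in a stacked design: 2n rows x (n sample intercepts + transcript + transcript:condition)."""
    return 2 * n_cells * (n_cells + 2)

def design_entries_offset(n_cells, n_params=2):
    """Entries in an offset-based design: n rows x fixed columns (intercept + condition)."""
    return n_cells * n_params

for n in (64, 128, 256, 512):
    print(n, design_entries_dexseq(n), design_entries_offset(n))
```

Doubling the number of cells roughly quadruples the stacked design but only doubles the offset-based one. Since one such model is fitted per transcript, both approaches remain linear in the number of transcripts.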

