PSIchomics: Shiny application for the integrated analysis of alternative splicing from large transcriptomic datasets
Nuno Agostinho, Nuno Morais laboratory
EuroBioC 2016, 6 Dec. 2016
Alternative Splicing
[Figure: a gene with Exon 1, Exon 2 and Exon 3]
• Occurs in 95% of human multi-exon genes (Pan et al., 2009)
• Involved in the control of many cellular processes (Oltean & Bates, 2014)
• Alternative splicing deregulation is linked with cancer development (Oltean & Bates, 2014)
Studying alternative splicing changes may help to identify prognostic factors and therapeutic targets
Introduction Workflow Case Study Testing Conclusions
RNA Sequencing
1. Extract RNA
2. Divide in fragments
3. Convert to DNA
4. Sequence DNA
5. Obtain reads
6. Map reads to DNA of reference
[Figure: exonic reads and junction reads mapped to reference DNA containing Exon 1, Exon 2 and Exon 3]
Alternative Splicing Quantification
• Junction read counts
• Alternative splicing annotation
Percent Spliced-In (PSI) = inclusion reads / (inclusion reads + exclusion reads)
[Figure: distribution of PSI values for ACTN1 (exon 19) in two groups; medians 0.17 and 0.82, variances 0.06 and 0.05; Mann–Whitney U test's p-value (FDR): 2.28e-07]
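The PSI ratio can be sketched in a few lines of Python. This is only an illustration of the formula, not PSIchomics code: the function name `percent_spliced_in`, the `min_reads` coverage threshold and the read counts are all hypothetical.

```python
from typing import Optional


def percent_spliced_in(inclusion: int, exclusion: int,
                       min_reads: int = 10) -> Optional[float]:
    """Return PSI = inclusion / (inclusion + exclusion).

    Returns None when the total read count is zero or below min_reads
    (a hypothetical threshold used here for illustration), because the
    ratio is not meaningful with so little coverage.
    """
    if inclusion < 0 or exclusion < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion + exclusion
    if total == 0 or total < min_reads:
        return None
    return inclusion / total


# Hypothetical junction read counts for one skipped exon in two samples
print(percent_spliced_in(82, 18))  # 0.82
print(percent_spliced_in(3, 2))    # None (too few reads)
```

A PSI close to 1 means that most reads support inclusion of the exon; a PSI close to 0 means that most reads support its exclusion.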
Quantification and Analytical Tools
• Many programs quantify, analyse and visualise alternative splicing data: SpliceSeq, splicegear, AltAnalyze, rMATS, SeqGSEA, jSplice, TIN, JunctionSeq, VAST-TOOLS, DEXSeq, spliceR, SGSeq, SUPPA, FineSplice, Splicing Compass, Cufflinks, MISO, DRIMSeq, JuncBASE
• No standard pipeline
• Their problems range from…
  • Time-consuming quantification of alternative splicing
  • Over-simplistic analyses or focus on the quantification step
  • No user-friendly interfaces in most tools
  • No incorporation of clinical information
PSIchomics
• Quantify, analyse and visualise alternative splicing in cancer data
• Modular architecture to easily modify and extend the program
• Visual and command-line interfaces
• Incorporate clinical information
Choosing R as the language
R:
• Open-source, free and cross-platform functional language
• Ideal for statistics, data manipulation and graphical computation
• Used by the scientific community
Shiny:
• Web app framework using R, HTML, CSS and JavaScript
• Reactive programming model (Excel-like reactivity)
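"Excel-like reactivity" means that derived values update themselves when their inputs change. The toy Python sketch below illustrates that idea only; it is not Shiny's API, and the `Cell` and `Formula` classes are hypothetical names. It recomputes eagerly on every change, which is a simplification of what real reactive frameworks do.

```python
from typing import Callable, List


class Cell:
    """A value holder that notifies its dependents when it changes."""

    def __init__(self, value: float) -> None:
        self._value = value
        self._dependents: List["Formula"] = []

    def get(self) -> float:
        return self._value

    def set(self, value: float) -> None:
        self._value = value
        # Push the change to every formula that reads this cell
        for formula in self._dependents:
            formula.recompute()

    def subscribe(self, formula: "Formula") -> None:
        self._dependents.append(formula)


class Formula:
    """A derived value recomputed whenever one of its inputs changes."""

    def __init__(self, func: Callable[..., float], *inputs: Cell) -> None:
        self._func = func
        self._inputs = inputs
        for cell in inputs:
            cell.subscribe(self)
        self.value = 0.0
        self.recompute()

    def recompute(self) -> None:
        self.value = self._func(*(cell.get() for cell in self._inputs))


# Hypothetical inputs: like spreadsheet cells feeding a formula
inclusion = Cell(30.0)
exclusion = Cell(10.0)
psi = Formula(lambda inc, exc: inc / (inc + exc), inclusion, exclusion)
print(psi.value)  # 0.75
exclusion.set(30.0)
print(psi.value)  # 0.5
```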
Workflow
Alternative Splicing Quantification
• Junction read counts
• Alternative splicing annotation
Percent Spliced-In (PSI) = inclusion reads / (inclusion reads + exclusion reads)
Workflow steps: Data Retrieval, Alt. Splicing Quantification, Analyses and Visualisation
Data Retrieval
Clinical data and junction read counts:
• The Cancer Genome Atlas (data from human tumours)
• Data is downloaded in-app using the Firebrowse web API
Alternative splicing annotation:
• Human (hg19 assembly) annotation is available
• Custom annotation files may be created
Alt. Splicing Quantification
• Retrieve TCGA data (optional): junction read counts
• Alt. splicing annotation: provided or prepared by user
• Quantify alternative splicing: Percent Spliced-In (PSI) = inclusion reads / (inclusion reads + exclusion reads)
Analyses and Visualisation
• Principal component analysis
• Differential splicing analysis
• Survival analysis
• Gene, RNA and protein information
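Survival curves are commonly estimated with the Kaplan–Meier method. The slides do not state which estimator PSIchomics uses, so the sketch below is an assumption for illustration only; the function name `kaplan_meier` and the patient data are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float],
                 events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each observed event time.

    times  -- follow-up time of each patient
    events -- True if the event (e.g. death) was observed, False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            # Multiply by the fraction of at-risk patients surviving time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (days) and event indicators for six patients
times = [5, 8, 8, 12, 15, 20]
events = [True, True, False, True, False, True]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

In a splicing context, one might compute such curves separately for patients whose PSI for a given event lies above or below a chosen cutoff and then compare the two curves.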
Analyses and Visualisation
• Vast array of customisable and interactive plots
• Features tooltips, zooming and plot exporting
[Figures: density plots, survival curves]
Case Study
PCA: dimensionality reduction by selecting the main directions of variance
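To make "main directions of variance" concrete, here is a dependency-free Python sketch that estimates the first principal component of a small matrix by power iteration on its covariance matrix. Power iteration is chosen here only to keep the example self-contained; it is not a claim about how PSIchomics computes PCA. The function names and the PSI values are hypothetical.

```python
import math
from typing import List


def centre(rows: List[List[float]]) -> List[List[float]]:
    """Subtract each column's mean so every feature is centred on zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_principal_component(rows: List[List[float]],
                              iterations: int = 200) -> List[float]:
    """Estimate the unit vector along the direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two samples")
    centred = centre(rows)
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and normalise.
    # The start vector is arbitrary; it must not be orthogonal to the answer.
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        if norm == 0:
            raise ValueError("data has no variance along the start vector")
        vec = [v / norm for v in vec]
    return vec


def project(rows: List[List[float]], direction: List[float]) -> List[float]:
    """Score each sample along the given direction after centring."""
    return [sum(c * v for c, v in zip(row, direction)) for row in centre(rows)]


# Hypothetical PSI matrix: rows are samples, columns are splicing events
psi = [[0.10, 0.80, 0.50],
       [0.20, 0.70, 0.55],
       [0.80, 0.20, 0.45],
       [0.90, 0.10, 0.50]]
pc1 = first_principal_component(psi)
print([round(score, 3) for score in project(psi, pc1)])
```

Samples with similar splicing profiles receive similar scores along the first component, which is why a plot of such scores can reveal groups of samples.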
Testing
• Continuous and Unit Testing
• Usability Testing
• Performance Benchmarking
Performance Benchmarking
• Common tumour types were selected from TCGA
• Average time was collected over 10 consecutive runs
Running time (with default settings): Load data / Quantify AS (skipped exon) / Differential analyses (Normal vs Tumour)
• Breast cancer (1093 patients): 47s / 2m 39s / 2m 35s
• Pan-kidney cohort (889 patients): 22s / 2m 2s / 2m 34s
• Glioma cohort (676 patients): 33s / 1m 20s / 2m 37s
• Liver cancer (371 patients): 16s / 35s / 2m 20s
MacBook Pro 2011: i7 (8 cores), HDD and 8GB RAM
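Averaging over consecutive runs, as in the benchmark protocol, can be sketched in Python as follows. The helper names `average_runtime` and `format_duration` are hypothetical, and the timed task is a trivial stand-in for a real pipeline step.

```python
import time
from statistics import mean
from typing import Callable


def average_runtime(task: Callable[[], object], runs: int = 10) -> float:
    """Run task `runs` times in a row and return the mean wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def format_duration(seconds: float) -> str:
    """Format seconds in the style used on the slide, e.g. '2m 39s'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


# Hypothetical stand-in for a pipeline step such as loading data
avg = average_runtime(lambda: sum(range(100_000)), runs=10)
print(f"average over 10 runs: {avg:.4f} s")

# Total of the three breast cancer timings reported on the slide
print(format_duration(47 + 159 + 155))  # 6m 1s
```

The three breast cancer timings add up to 361 seconds, which is where the figure of about 6 minutes for the largest TCGA cohort comes from.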
Conclusions
Conclusions
• Quantify, analyse and visualise alternative splicing in cancer data
• Modular architecture to easily modify and extend the program
• 6 minutes using processed data from the TCGA cohort with the highest number of patients
• Command-line and easy-to-use graphical interface
• Incorporates clinical information
Future work
• Extend available data sources
• Quantify alternative splicing from raw data
  • Less dependence on processed data
• Deploy to the web
  • No installation required
  • Always up-to-date
PSIchomics (MIT license)
• GitHub: code hosting
• Bioconductor: biological R packages
Thanks to you all!
Nuno Morais Lab, André Falcão, Ana Rita Grosso