Microarray data analysis with Chipster
16.-17.4.2008
Jarno Tuimala
Eija Korpelainen

Program – an analysis workflow
Day 1.
- Basic functionality of Chipster (Eija)
- Data import (Eija)
- Quality control (Jarno)
- Normalization (Jarno)
  • Describing the experiment
- Filtering and missing value considerations (Jarno)
Day 2.
- Statistical testing (Jarno)
- Clustering and visualization (Jarno)
- Annotation (Eija)
- Promoter analysis (Eija)
- Experimental design (Jarno) – if time allows

Demo data
- Affymetrix
  • Kidney cancer
  • 8 controls, 9 cancer patients
- Agilent
  • Acute leukemia
  • 7 controls, 7 FLT mutated
- Illumina
  • Teratozoospermia
  • 5 controls, 8 affected

Introduction to microarrays

Research using microarrays
- Plan!
  • Experimental design
- Laboratory work
  • Extract, label, hybridize
- Computer work
  • Scanning, image analysis
  • Bioinformatics
- Laboratory work
  • Confirmation
- Publish
  • Submit data to public databases

Introduction to Chipster

Chipster
- Goal: Easy access to leading analysis tools, such as those developed in the R/Bioconductor project
- Features
  • Easy-to-use graphical user interface
  • Comprehensive selection of tools
  • Support for different array types (Affymetrix, Agilent, Illumina, cDNA)
  • Compatible with Windows, Linux and Mac OS X
  • Easy to install and update
  • Wizards and workflows
  • Interactive graphics
  • Transparency (as opposed to "black box")
  • Alternative annotations for Affymetrix arrays
  • Automatic tracking of performed analyses
- http://www.csc.fi/english/customers/university/useraccounts/scientificservices.pdf
- http://chipster.csc.fi

How does it work?
[Figure: Chipster architecture diagram. Labels: CSC internet desktop; Java Web Start installs and updates the client automatically; client; SSL security; front end; SOAP; analyser; Corona/Murska; international Web Services; VISUALISATION; ANALYSIS]

Acknowledgements
- Aleksi Kallio
- Jarno Tuimala
- Taavi Hupponen
- Mika Rissanen, Janne Käki, Mikko Koski, Petri Klemelä
- All the pilot users
- Department of computer science (HY)
- Dario Greco (HY)
- Prof. Olli Yli-Harja's group (TUT)
- GeneCruiser team (MIT Broad Institute)
- Tekes/SA SYSBIO-program

[Screenshot: Chipster user interface with the Tools, Data and Visualization panels labelled]

Phenodata – describing your experiment
- The phenodata file is created during normalization
- Fill in the group column with numbers describing your experimental setup
  • e.g. 1 = healthy control, 2 = cancer sample
  • necessary for the statistical tests to work
- If you bring in previously created normalized data and phenodata:
  • Choose "import directly" in the import tool
  • Right click on the normalized data, choose "Link to" phenodata and link type "Annotation"
- If you bring in normalized data and need to create phenodata for it:
  • Utilities/ Generate phenodata (fill in the chiptype parameter!)
  • Right click on the normalized data, choose "Link to" phenodata and link type "Annotation"
  • Fill in the group column

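The group coding can be pictured as a small table with one row per sample. The sketch below writes such a table as tab-delimited text; the column names `sample` and `group`, the sample file names and the function `write_phenodata` are assumptions for illustration, not Chipster's exact phenodata format.

```python
import csv
import io

# Hypothetical sample names; in Chipster they come from the imported data files.
samples = ["control1.cel", "control2.cel", "tumor1.cel", "tumor2.cel"]

# Group coding from the slide: 1 = healthy control, 2 = cancer sample.
groups = {"control1.cel": 1, "control2.cel": 1, "tumor1.cel": 2, "tumor2.cel": 2}


def write_phenodata(samples, groups, out):
    """Write one tab-delimited row per sample with its group number."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "group"])  # assumed column names
    for name in samples:
        writer.writerow([name, groups[name]])


buffer = io.StringIO()
write_phenodata(samples, groups, buffer)
print(buffer.getvalue())
```

Every sample needs a group number, because the statistical tests compare the groups defined by this column.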
Visualizing the data
- Data visualization panel
  • Maximize and redraw for better viewing
- Two types of visualizations
  1. Interactive visualizations produced by the client program
     • Select the visualization method from the pulldown menu of the data visualization panel
     • Save by right clicking on the image
  2. Static images produced by R/Bioconductor, Weeder, etc.
     • Select from Analysis tools/ Visualisation
     • View by double clicking on the image file
     • Save by right clicking on the file name and choosing "Export"

Interactive visualizations by the client
- Spreadsheet
- Histogram
- Scatterplot
- 3D scatterplot
- Expression profiles
- Clustered profiles
- Hierarchical clustering
- SOM clustering
- Array pseudo-image
Available actions:
- Change titles, colors etc.
- Zoom in/out
- Select and annotate genes using the MIT GeneCruiser

Static images produced by R/Bioconductor
- Volcano plot
- Box plot
- Histogram
- Heatmap
- Venn diagram
- Idiogram
- Chromosomal position
- Correlogram
- Dendrogram
- QC stats plot
- RNA degradation plot
- K-means clustering
- SOM clustering

Automatic tracking of analysis history

Running many analyses simultaneously
- You can have at most 5 analysis jobs running at the same time
- Use the Task manager to
  • view parameters, status, …
  • cancel jobs

Workflow – reusing your analysis pipeline
- Creates a "macro" that can be applied to another normalized dataset and phenodata
  • File/ Save workflow
  • File/ Load workflow
- Choose a dataset, and the workflow records the analysis steps that led to that dataset
- You can give the workflow a meaningful name (ending in .bsh), but it has to be located in the chipster-scripts folder
- You can run a workflow on another computer by making it visible to Chipster with "Reload workflows from disk"
- You can change parameters directly in the workflow file

Workspace – continue later/elsewhere
- Saving your workspace allows you to continue later
- Currently it is possible to have only one workspace saved at a time
- If you would like to continue your work on another computer, you need to transfer the workspace-snapshot folder to the corresponding location under nami-work-files
  • C:\Documents and Settings\ekorpela\nami-work-files\workspace-snapshot

Wizard – autopilot for analysis

Wizard for Affymetrix data
- Ready-made workflow to find differentially expressed genes
  • Normalization
  • Phenodata creation
  • Statistical test
  • Hierarchical clustering

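Moving a saved workspace to another computer amounts to copying one folder. The sketch below copies `workspace-snapshot` from one nami-work-files folder to another; the function name and the destination path are hypothetical, and replacing an existing snapshot is a design choice based on the slide's note that only one workspace can be saved at a time.

```python
import shutil
from pathlib import Path


def transfer_workspace(src_work_dir: Path, dst_work_dir: Path) -> Path:
    """Copy workspace-snapshot from one nami-work-files folder to another.

    Both arguments are assumed to point at a nami-work-files folder.
    Returns the path of the copied snapshot.
    """
    src = src_work_dir / "workspace-snapshot"
    dst = dst_work_dir / "workspace-snapshot"
    if not src.is_dir():
        raise FileNotFoundError(f"No saved workspace found at {src}")
    # Only one workspace can be saved at a time, so an older snapshot is replaced.
    if dst.exists():
        shutil.rmtree(dst)
    # copytree also creates any missing parent folders of dst.
    shutil.copytree(src, dst)
    return dst


# Hypothetical usage: the source path is the one given on the slide,
# the destination is wherever the other computer keeps nami-work-files.
# transfer_workspace(Path(r"C:\Documents and Settings\ekorpela\nami-work-files"),
#                    Path(r"E:\nami-work-files"))
```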
Importing files
- Affymetrix CEL files are imported to Chipster automatically
- Other files are imported using the Import tool

Import tool, step 1
- Define
  • Header
  • Footer
  • Title row
  • Delimiter

Import tool, step 2
- Define columns
- Modify flags

Importing Agilent files
- Sample (rMeanSignal)
- Sample background (rBGMedianSignal)
- Control (gMeanSignal)
- Control background (gBGMedianSignal)
- Identifier (ProbeUID)
- Annotation (ControlType)
- https://extras.csc.fi/biosciences/chipster-manual/data-formats.html

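The two Import tool steps can be mimicked in plain Python: drop header and footer lines, read the title row with the chosen delimiter, then keep only the columns that have been given a role. The function `read_table`, the role names and the toy file are hypothetical and much simpler than a real export; only the Agilent column names come from the list above.

```python
import csv

# Column roles for Agilent data, as listed on the slide.
AGILENT_COLUMNS = {
    "sample": "rMeanSignal",
    "sample_background": "rBGMedianSignal",
    "control": "gMeanSignal",
    "control_background": "gBGMedianSignal",
    "identifier": "ProbeUID",
    "annotation": "ControlType",
}


def read_table(text, header_lines=0, footer_lines=0, delimiter="\t", roles=AGILENT_COLUMNS):
    """Return one dict per data row, keyed by column role."""
    lines = text.splitlines()
    # Step 1: drop header and footer lines; the first remaining line is the title row.
    lines = lines[header_lines:len(lines) - footer_lines]
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Step 2: keep only the columns that have been assigned a role.
    return [{role: row[col] for role, col in roles.items()} for row in reader]


# Hypothetical file: one header line, a title row and two data rows.
example = (
    "Scan date: hypothetical\n"
    "ProbeUID\tControlType\trMeanSignal\trBGMedianSignal\tgMeanSignal\tgBGMedianSignal\n"
    "1\t0\t850.2\t40.1\t620.5\t38.7\n"
    "2\t1\t120.0\t41.3\t118.4\t39.2\n"
)
rows = read_table(example, header_lines=1)
print(rows[0]["sample"], rows[0]["control"])  # 850.2 620.5
```

Values stay as strings here; converting them to numbers would be the next step before any analysis.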
Exercise

Exercise I
1. Import the demo data of your favorite type into Chipster
   • Affymetrix
   • Agilent
2. Save the workspace
3. Have lunch (back at 13.00)

Quality control

Quality control tools
- Quality control tools
  • Affymetrix basic: RNA degradation + Affy QC
  • Affymetrix RLE & NUSE (might take a long time to run): fits a model to expression values
  • Agilent: MA-plot + density plot + boxplot
- Visualization – dendrogram
- Statistics – NMDS

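Two of the QC statistics listed above can be written out in a few lines. `ma_values` computes MA-plot coordinates from red and green intensities (for Agilent, the sample and control signals): M = log2(R/G) and A = (log2 R + log2 G)/2. `rle` computes relative log expression, assuming a matrix of log2 expression values is already available; the model fit that produces those values, and NUSE, which needs the model's standard errors, are not reproduced. Function names and data are illustrative.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return (M, A) pairs for two-colour spot intensities.

    M = log2(R / G) is the log ratio and A = (log2 R + log2 G) / 2 the average
    log intensity. Spots with a non-positive intensity are skipped, since their
    logarithm is undefined.
    """
    pairs = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            continue
        lr, lg = math.log2(r), math.log2(g)
        pairs.append((lr - lg, (lr + lg) / 2))
    return pairs


def rle(expression):
    """Relative log expression per array.

    `expression` holds one row per probeset with one log2 value per array.
    Each value is replaced by its difference from the probeset's median across
    arrays; the result is one list of RLE values per array.
    """
    n_arrays = len(expression[0])
    per_array = [[] for _ in range(n_arrays)]
    for row in expression:
        m = median(row)
        for j, value in enumerate(row):
            per_array[j].append(value - m)
    return per_array


# Hypothetical red and green intensities for three spots (the last one is skipped).
print(ma_values([800.0, 200.0, 50.0], [400.0, 200.0, 0.0]))

# Hypothetical log2 values, 4 probesets x 3 arrays; array 3 is systematically higher.
data = [
    [7.0, 7.1, 8.0],
    [5.0, 4.9, 6.1],
    [9.0, 9.2, 10.0],
    [6.0, 6.0, 7.2],
]
for j, values in enumerate(rle(data), start=1):
    print(f"array {j}: median RLE = {median(values):.2f}")
```

In an MA-plot of a well-behaved array the points scatter around M = 0, and RLE values are expected to centre near zero; array 3 in this toy data, with a median RLE of about 1, is the kind of array that would deserve a closer look.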
Affymetrix I
- Quality control tools are run on raw data (CEL files).
  • Dendrogram and NMDS are run on normalized data.

Affymetrix II

Agilent

General QC – dendrogram and NMDS

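A dendrogram of the arrays joins samples whose expression profiles look alike, so outlier arrays end up on their own branch. The sketch below uses 1 - Pearson correlation as the distance and average linkage; both are common choices but assumptions here, not necessarily what Chipster's dendrogram tool uses. The function names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(arrays):
    """Agglomerative clustering of named arrays; returns the merge history.

    `arrays` maps a sample name to its normalized expression values. Each merge
    is (cluster_a, cluster_b, distance), where clusters are tuples of names.
    """
    names = list(arrays)
    dist = {frozenset((a, b)): 1 - pearson(arrays[a], arrays[b])
            for a, b in combinations(names, 2)}

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between members of two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical normalized log2 values for two controls and two tumours.
expr = {
    "ctrl1": [5.0, 7.1, 9.0, 3.2],
    "ctrl2": [5.2, 7.0, 8.8, 3.1],
    "tum1": [8.9, 3.0, 5.1, 7.2],
    "tum2": [9.1, 3.2, 4.9, 7.0],
}
for a, b, d in average_linkage(expr):
    print(a, b, round(d, 3))
```

The merge history is what a dendrogram draws: the two control arrays and the two tumour arrays join first, and the two groups join last at a large distance.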
Scatterplots

Heatmaps (this took an hour to calculate)

QC-tools in Chipster
- Quality control
  • Affymetrix basic
  • Affymetrix RLE and NUSE
  • Agilent 2-color
- Visualization
  • Dendrogram
  • Heatmap
  • Correlogram
- Statistics
  • NMDS

Normalization

What is normalization?
- Normalization is the process of removing systematic variation from the data.
- Typically you would normalize your data so that all the chips become comparable.

Methods
- Affymetrix
  • Background correction + expression estimation + summarization
  • RMA (default) uses only PM probes, fits a model to them, and gives out expression values after quantile normalization and median polishing
- Agilent
  • Background correction + averaging duplicate spots + normalization
- After normalization the expression values are always expressed on log2 scale

Affymetrix
- Methods: MAS5, Plier, RMA, GCRMA, Li-Wong
  • MAS5 is the older Affymetrix method, Plier is a newer one
  • RMA is the default, and works rather nicely if you have more than a few chips
  • GCRMA is similar to RMA, but also takes the GC content of the probes into account
  • Li-Wong is the method implemented in dChip
- Variance stabilization makes the variance similar over all the chips
  • Works only with MAS5 and Plier, since all the other methods output log2-transformed data by default (and are thus already corrected for the same phenomenon)
- Custom chiptype
  • If you want to use reannotated probes (probes reassigned to the genes where they really belong), select one from this menu.

Agilent I
- Background correction
  • Background treatment: None, Subtract, Edwards, Normexp
  • Background offset: 0 or 50
- Normalize chips
  • None, median, loess
- Normalize genes (not typically used)
  • None, scale (to median), quantile
- Chiptype
  • A must setting!

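The last two RMA steps named above can be sketched as follows: quantile normalization gives every array the same intensity distribution, and median polish fits an additive probe + array model to one probeset and returns one summarized value per array. Background correction and PM probe extraction are omitted, ties in quantile normalization are not treated specially, and this is a simplified illustration rather than the Bioconductor implementation; the function names and data are hypothetical.

```python
from statistics import mean, median


def quantile_normalize(matrix):
    """Give every column (array) of a probes x arrays matrix the same distribution.

    Each value is replaced by the mean, across arrays, of the values sharing its rank.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    order = [sorted(range(n_rows), key=lambda i: col[i]) for col in columns]
    rank_means = [mean(columns[j][order[j][r]] for j in range(n_cols)) for r in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for r, i in enumerate(order[j]):
            result[i][j] = rank_means[r]
    return result


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish of one probeset (probes x arrays, log2 scale).

    Returns one value per array: overall effect + array (column) effect.
    """
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        # Remove probe (row) medians and move their common level into the overall effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        delta = median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Remove array (column) medians in the same way.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [v - col_med[j] for j, v in enumerate(resid[i])]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        delta = median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return [overall + c for c in col_eff]


# Hypothetical log2 PM intensities for one probeset: 3 probes x 3 arrays.
# In RMA the quantile step runs over all probes on the chips at once;
# a tiny matrix is used here only to keep the example short.
pm = [
    [6.0, 6.5, 7.0],
    [8.0, 8.4, 9.1],
    [5.0, 5.6, 6.0],
]
print(median_polish(quantile_normalize(pm)))
```

Median polish is used instead of a plain mean because medians are robust: a single misbehaving probe shifts the summarized value much less.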
Checking normalization

Agilent II
- Background treatment typically generates many negative values, which are coded as missing values after log2 transformation.
  • The usual subtract option does this
  • Using normexp + offset 50 generates no negative values and gives rather good estimates (reported as the best method)
- Loess removes curvature from the data (suggested)

Exercise

Exercise II
- Normalize your dataset
  • Use two different normalization schemes
- Describe the experiment (fill in the phenodata)
- Check the quality of your dataset
  • Is there a difference between the normalization schemes?
  • If there is, select the better one and continue with it
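Why plain subtraction produces missing values can be shown with three spots. The sketch subtracts the background, optionally adds an offset, and log2-transforms; anything that ends up zero or negative has no logarithm and becomes `None`. Normexp is a model-based correction (as in the Bioconductor limma package) and is not reproduced here; with plain subtraction, an offset only rescues spots whose deficit is smaller than the offset. The function name and the data are hypothetical.

```python
import math


def subtract_and_log(signal, background, offset=0.0):
    """Background-subtract, add an offset, and log2-transform.

    Values that end up zero or negative cannot be log-transformed and are
    returned as None, i.e. as missing values.
    """
    result = []
    for s, b in zip(signal, background):
        corrected = s - b + offset
        result.append(math.log2(corrected) if corrected > 0 else None)
    return result


# Hypothetical spots (e.g. rMeanSignal and rBGMedianSignal); the third is below its background.
signal = [850.0, 120.0, 35.0]
background = [40.0, 41.0, 48.0]
print(subtract_and_log(signal, background))             # third value is None
print(subtract_and_log(signal, background, offset=50))  # third value is now defined
```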