R package gcExplorer: graphical and inferential exploration of cluster solutions
Theresa Scharl (1, 2), Friedrich Leisch (3)
1 Institut für Statistik und Wahrscheinlichkeitstheorie, Technische Universität Wien
2 Department of Biotechnology, University of Natural Resources and Applied Life Sciences, Vienna
3 Institut für Statistik, Ludwig-Maximilians-Universität München
UseR! 2009, July 8th, Rennes
Outline
1 Motivation
2 Cluster Analysis
3 Neighborhood Graphs
4 Software
5 Inference
Motivation
Exploration and visualization of cluster solutions
• Interpretation of cluster results.
• Understanding of the cluster structure.
• Relationships between segments of a partition.
Inference for gene cluster graphs
• Explore the quality of a cluster solution.
• External validation of clustering.
• Association to a functional group.
E. coli data
Recombinant E. coli process
• Evaluate the influence of the induction level of Npro GFPmut3.1, an inclusion-body-forming protein, on host metabolism.
• The non-induced state was compared to samples taken after induction.
Oxygen data (Covert et al., 2004)
• Investigation of various mutants under oxygen deprivation.
• Targets the a priori most relevant part of the transcriptional network.
• Uses six strains with knockouts of key transcriptional regulators in the oxygen response.
Cluster algorithms
Partitioning cluster algorithms
Algorithms such as k-means, PAM, or others whose clusters can be represented by centroids (e.g., QT-Clust, Heyer et al., Genome Research, 1999).
R package flexclust
• Flexible toolbox to investigate the influence of distance measures and cluster algorithms.
• Extensible implementations of the generalized k-means and QT-Clust algorithms.
• Possibility to try out a variety of distance or similarity measures.
• Cluster algorithms are treated separately from distance measures.
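To illustrate what treating the cluster algorithm separately from the distance measure means, here is a minimal base-R sketch of a generalized k-centroids loop in which the distance function and the centroid function are plug-in arguments. The helper names (gen_kcentroids, dist_euclid, dist_manhattan) are hypothetical and this is not flexclust's implementation; in flexclust the same idea is exposed through kcca() and kccaFamily(). Euclidean distance with means gives k-means, Manhattan distance with coordinate-wise medians gives k-medians.

```r
# Hypothetical helpers for illustration only; not the flexclust code.

# Euclidean distances between rows of x (N x p) and rows of centers (K x p): N x K
dist_euclid <- function(x, centers) {
  apply(centers, 1, function(cc) sqrt(colSums((t(x) - cc)^2)))
}

# Manhattan (L1) distances, same N x K shape
dist_manhattan <- function(x, centers) {
  apply(centers, 1, function(cc) colSums(abs(t(x) - cc)))
}

# Generalized k-centroids: alternate between assigning each point to its
# closest centroid (under dist_fun) and recomputing centroids (with cent_fun).
gen_kcentroids <- function(x, k, dist_fun, cent_fun, iter_max = 50, seed = 1) {
  set.seed(seed)
  centers <- x[sample(nrow(x), k), , drop = FALSE]   # random data points as start
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter_max)) {
    d <- dist_fun(x, centers)
    new_cluster <- max.col(-d, ties.method = "first")  # index of closest centroid
    if (all(new_cluster == cluster)) break              # assignments are stable
    cluster <- new_cluster
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- cent_fun(members)  # empty: keep old
    }
  }
  list(cluster = cluster, centers = centers)
}

# Two well-separated groups of 20 points each
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))

# k-means: Euclidean distance + mean centroids
km <- gen_kcentroids(x, 2, dist_euclid, colMeans)
# k-medians: Manhattan distance + coordinate-wise median centroids
kmed <- gen_kcentroids(x, 2, dist_manhattan, function(m) apply(m, 2, median))
```

Only the two function arguments change between the two runs; the assignment and update loop stays the same.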
TRNs and silhouette plots
Topology-representing networks (Martinetz and Schulten, 1994)
• For each pair of centroids, count the number of data points for which they are the closest and second-closest centroid.
• Centroid pairs with a positive count are connected.
Silhouette plots (Rousseeuw, 1987)
• Compare the average distance from each point to the points in its own cluster with the average distance to the points in the second-closest cluster.
• The larger the silhouette values, the better a cluster is separated from the other clusters.
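Both ideas can be made concrete with a short base-R sketch: trn_counts() tallies, for each ordered pair of centroids, how many points have them as closest and second-closest centroid (pairs with a positive count are joined), and silhouette_width() follows Rousseeuw's definition. The helper names are hypothetical and Euclidean distance is an assumption; for real use, cluster::silhouette() is the established implementation.

```r
# Hypothetical helpers for illustration (Euclidean distance assumed).

# counts[i, j] = number of points whose closest centroid is i and whose
# second-closest centroid is j (requires K >= 2 centroids).
trn_counts <- function(x, centers) {
  d <- apply(centers, 1, function(cc) sqrt(colSums((t(x) - cc)^2)))  # N x K
  ord <- t(apply(d, 1, order))          # row n: centroid indices by distance
  K <- nrow(centers)
  counts <- matrix(0L, K, K)
  for (n in seq_len(nrow(x))) {
    counts[ord[n, 1], ord[n, 2]] <- counts[ord[n, 1], ord[n, 2]] + 1L
  }
  counts
}

# Silhouette width s(i) = (b - a) / max(a, b): a is the mean distance to the
# other points of the own cluster, b the smallest mean distance to the
# points of any other cluster.
silhouette_width <- function(x, cluster) {
  D <- as.matrix(dist(x))
  vapply(seq_len(nrow(x)), function(i) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) return(0)                 # singleton convention
    a <- sum(D[i, own]) / (sum(own) - 1)         # D[i, i] = 0 adds nothing
    others <- setdiff(unique(cluster), cluster[i])
    b <- min(vapply(others, function(k) mean(D[i, cluster == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Three groups at (0, 0), (10, 10) and (20, 0), 20 points each
set.seed(7)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2),
           cbind(rnorm(20, mean = 20), rnorm(20, mean = 0)))
grp <- rep(1:3, each = 20)
centers <- rbind(colMeans(x[grp == 1, ]), colMeans(x[grp == 2, ]),
                 colMeans(x[grp == 3, ]))

counts <- trn_counts(x, centers)
adjacent <- (counts + t(counts)) > 0   # TRN edges
s <- silhouette_width(x, grp)
```

In this layout the outer groups are each connected only to the middle group, which the adjacency matrix reflects.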
Neighborhood graphs (Leisch, 2006)
• Neighborhood graphs use mean relative distances as edge weights.
• Assume we are given a data set $X_N = \{x_1, \ldots, x_N\}$ and a set of centroids $C_K = \{c_1, \ldots, c_K\}$.
• The centroid closest to $x$ is denoted by $c(x) = \operatorname{argmin}_{c \in C_K} d(x, c)$.
• The second-closest centroid to $x$ is denoted by $\tilde{c}(x) = \operatorname{argmin}_{c \in C_K \setminus \{c(x)\}} d(x, c)$.
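Neighborhood graphs
• The set of all points where $c_i$ is the closest centroid and $c_j$ is the second-closest centroid is given by
$$A_{ij} = \{x_n \mid c(x_n) = c_i,\ \tilde{c}(x_n) = c_j\}.$$
• With $A_i = \{x_n \mid c(x_n) = c_i\}$ denoting the set of all points in cluster $i$, the edge weights are defined as
$$s_{ij} = \begin{cases} |A_i|^{-1} \sum_{x \in A_{ij}} \dfrac{2\, d(x, c(x))}{d(x, c(x)) + d(x, \tilde{c}(x))}, & A_{ij} \neq \emptyset, \\ 0, & A_{ij} = \emptyset. \end{cases}$$
• Each summand lies in $[0, 1]$ and equals 1 for a point equidistant from $c_i$ and $c_j$, so $s_{ij}$ grows with the number of points near the border between clusters $i$ and $j$.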
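The edge weight $s_{ij}$ can be computed directly from its definition. The base-R sketch below is a hypothetical helper, not the flexclust or gcExplorer code, and assumes Euclidean distance for $d(x, c)$. It finds $c(x)$ and $\tilde{c}(x)$ for every point, sums the relative distances over each $A_{ij}$, and divides row $i$ by $|A_i|$.

```r
# Hypothetical helper: neighborhood-graph edge weights s_ij,
# assuming Euclidean distance d(x, c).
neighborhood_weights <- function(x, centers) {
  d <- apply(centers, 1, function(cc) sqrt(colSums((t(x) - cc)^2)))  # N x K
  ord <- t(apply(d, 1, order))             # centroid indices sorted by distance
  idx <- seq_len(nrow(x))
  closest <- ord[, 1]                      # c(x)
  second  <- ord[, 2]                      # second-closest centroid
  d1 <- d[cbind(idx, closest)]
  d2 <- d[cbind(idx, second)]
  ratio <- 2 * d1 / (d1 + d2)              # summand, in [0, 1] since d1 <= d2
  K <- nrow(centers)
  size <- tabulate(closest, nbins = K)     # |A_i|
  s <- matrix(0, K, K)
  for (n in idx) {                         # sum over x in A_ij
    s[closest[n], second[n]] <- s[closest[n], second[n]] + ratio[n]
  }
  s / pmax(size, 1)                        # divide row i by |A_i|
}

# Three groups along the diagonal with centroids (0,0), (3,3), (6,6)
set.seed(3)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
centers <- rbind(c(0, 0), c(3, 3), c(6, 6))
s <- neighborhood_weights(x, centers)
```

Because the three centroids are collinear, no point closest to the first centroid has the third as its second-closest one, so $s_{13} = s_{31} = 0$ and the graph is a chain.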
Neighborhood graphs
[Figure: example neighborhood graph with cluster nodes k1 to k14]
R package gcExplorer
An interactive visualization toolbox for clusters (Scharl and Leisch, 2009)
• New visualization techniques to display cluster results of high-dimensional data.
• Nonlinear arrangements of the cluster centroids using the Bioconductor packages Rgraphviz and graph.
• Interactive exploration using arbitrary panel functions.
• Visualize properties of clusters using arbitrary node functions.
• Allow small glyphs for the representation of nodes.
• Inference for gene cluster graphs.
http://cran.r-project.org/package=gcExplorer
How to use gcExplorer
Cluster analysis
R> library("gcExplorer")
R> data("ps19")
R> set.seed(1111)
R> # QT-Clust as implemented in flexclust; save.data = TRUE keeps the data in the result
R> cl1 <- qtclust(ps19, radius = 2,
+                save.data = TRUE)
Interactive gcExplorer
R> gcExplorer(cl1, theme = "blue",
+            panel.function = gcProfile,
+            node.function = node.size)
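The radius argument limits the size of the clusters. As a rough illustration of the quality-threshold idea behind QT-Clust (Heyer et al., 1999), the base-R sketch below repeatedly takes the largest ball of the given radius around any remaining point as the next cluster. This is a simplification under stated assumptions: Euclidean distance, balls centred on data points, and a hypothetical min_size cut-off below which the remaining points stay unassigned. flexclust's qtclust() is randomized and has further controls, so its results will differ.

```r
# Hypothetical simplified QT-style clustering (not flexclust's qtclust()).
qt_sketch <- function(x, radius, min_size = 2) {
  D <- as.matrix(dist(x))                  # Euclidean distances
  remaining <- seq_len(nrow(x))
  cluster <- rep(NA_integer_, nrow(x))     # NA = left unassigned
  k <- 0L
  while (length(remaining) >= min_size) {
    Dr <- D[remaining, remaining, drop = FALSE]
    sizes <- rowSums(Dr <= radius)         # candidate cluster size per seed point
    best <- which.max(sizes)
    if (sizes[best] < min_size) break      # no candidate is large enough
    members <- remaining[Dr[best, ] <= radius]
    k <- k + 1L
    cluster[members] <- k
    remaining <- setdiff(remaining, members)
  }
  cluster
}

set.seed(1111)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
cl <- qt_sketch(x, radius = 2)
table(cl, useNA = "ifany")
```

A smaller radius produces more, tighter clusters and leaves more points unassigned; no cluster can span the two distant groups.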
Interactive gcExplorer
[Screenshots of an interactive gcExplorer session]
How to use gcExplorer
Panel function and node function
R> data("sigma")
R> gcExplorer(cl1, theme = "green",
+            panel.function = gcTable,
+            panel.args = list(links = links_ps19),
+            node.function = node.go,
+            node.args = list(gonr = "Sigma32",
+                             id = bn_ps19))
Panel and node function
[Screenshots: output of the panel function and the node function]
How to use gcExplorer
Use of matrix plot as node function
R> gcExplorer(cl1, node.function = gmatplot,
+            doViewPort = TRUE)
Use of pie plot as node function
R> gcExplorer(cl1, node.function = gpie,
+            doViewPort = TRUE)
Node function
[Figures: neighborhood graphs drawn with matrix-plot and pie-chart node functions; pie chart legend: F <= 20, F > 20]
R package symbols
• Based on grid, a very flexible graphics system for R.
• grid features viewports, i.e., rectangular regions that allow plotting regions to be created anywhere on the R graphics device.
• Implements several grid-based functions that can be used directly as node functions in gcExplorer.
• Plots barplots, boxplots, line plots, pie charts, stars and symbols.
http://r-forge.r-project.org/projects/symbols
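To illustrate how viewports make glyph-style node functions possible, the sketch below pushes one small viewport per node position and draws a miniature bar chart inside it using only grid primitives. The helper draw_bar_glyphs() and its arguments are hypothetical and not part of the symbols package; node positions are assumed to be given in npc units (0 to 1 on the device).

```r
library(grid)

# Hypothetical helper (not the symbols package API): draws one miniature
# bar chart per node inside its own viewport.
draw_bar_glyphs <- function(pos, values, size = 0.12) {
  grid.newpage()
  for (i in seq_len(nrow(pos))) {
    # viewport centred on the node, with a native y-scale for bar heights
    pushViewport(viewport(x = pos[i, 1], y = pos[i, 2],
                          width = size, height = size,
                          yscale = c(0, max(values))))
    grid.rect(gp = gpar(col = "grey50", fill = NA))   # glyph frame
    k <- ncol(values)
    grid.rect(x = (seq_len(k) - 0.5) / k, y = 0,
              width = 0.8 / k, height = unit(values[i, ], "native"),
              just = c("centre", "bottom"),
              gp = gpar(fill = "steelblue", col = NA))
    popViewport()                                     # back to the parent
  }
  invisible(nrow(pos))
}

pdf(NULL)                     # off-screen device; omit when plotting interactively
pos <- cbind(c(0.2, 0.5, 0.8), c(0.3, 0.7, 0.4))    # node positions (npc)
vals <- matrix(c(3, 1, 2,
                 1, 4, 2,
                 2, 2, 5), nrow = 3, byrow = TRUE)  # one row of values per node
n <- draw_bar_glyphs(pos, vals)
invisible(dev.off())
```

Because each glyph lives in its own viewport, the drawing code only needs local coordinates and never has to know where the node sits on the device.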
Neighborhood graph for general cluster functions
Cluster results from functions such as kmeans from package stats or pam from package cluster can be converted to objects of class kcca and visualized as a neighborhood graph:
Conversion
R> k1 <- kmeans(hsod, centers = 15)
R> k2 <- as.kcca(k1, data = hsod, save.data = TRUE)
R> gcExplorer(k2)
Functional relevance test
• Validation of a given clustering using a priori information about gene function.
• Let $\pi_1, \ldots, \pi_K$ be the proportions of genes in clusters $1, \ldots, K$ that are assigned to a given functional group.
• Test $H_0: d_{ij} = |\pi_i - \pi_j| = 0$.
• Use the neighborhood structure, i.e., only test for significant differences if two clusters are connected in the neighborhood graph.
• No significant difference in proportions → merge the clusters.
• The result is a set of separated subgraphs with common gene function within the neighborhood graph.
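The testing and merging steps can be sketched in base R. Several choices here are assumptions, not taken from the slides: each connected pair is compared with prop.test() (a two-sample test of equal proportions), no multiplicity correction is applied (p.adjust() could be added), and the helper functional_merge() is hypothetical. Clusters joined by a non-significant edge are merged with a small union-find structure, so the returned labels identify the separated subgraphs.

```r
# Hypothetical helper: merge connected clusters whose proportions of
# functional-group genes do not differ significantly.
#   hits  : number of genes of the functional group in each cluster
#   sizes : number of genes in each cluster
#   adj   : symmetric logical adjacency matrix of the neighborhood graph
functional_merge <- function(hits, sizes, adj, alpha = 0.05) {
  K <- length(sizes)
  parent <- seq_len(K)                       # union-find forest
  find <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  # test only pairs of clusters that are connected in the graph
  pairs <- which(adj & upper.tri(adj), arr.ind = TRUE)
  for (r in seq_len(nrow(pairs))) {
    ij <- pairs[r, ]
    p <- suppressWarnings(prop.test(hits[ij], sizes[ij])$p.value)
    # NaN arises for degenerate tables (e.g. both proportions 0): no difference
    if (is.na(p) || p > alpha) {             # H0 not rejected -> merge
      a <- find(ij[1]); b <- find(ij[2])
      if (a != b) parent[b] <- a
    }
  }
  roots <- vapply(seq_len(K), find, integer(1))
  match(roots, unique(roots))                # subgraph label per cluster
}

# Four clusters connected as a chain 1 - 2 - 3 - 4
adj <- matrix(FALSE, 4, 4)
adj[cbind(1:3, 2:4)] <- TRUE
adj <- adj | t(adj)
hits  <- c(40, 42, 5, 6)
sizes <- c(100, 100, 100, 100)
groups <- functional_merge(hits, sizes, adj)
```

Clusters 1 and 2 (40% vs. 42%) and clusters 3 and 4 (5% vs. 6%) are merged, while the significant difference between clusters 2 and 3 splits the chain into two subgraphs.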