Introduction to Microarray Data Analysis and Gene Networks Lecture - PowerPoint PPT Presentation

Introduction to Microarray Data Analysis and Gene Networks Lecture 5 Alvis Brazma European Bioinformatics Institute

Lecture 5
• Clustering
– Hierarchical
– K-means
• A few minutes about representing experimental designs
– Experiment design graphs, replicates
– Experimental factors
• A few minutes about supervised learning
• Practical

Supervised vs. unsupervised analysis - class discovery vs. clustering

What is a cluster?
• In a set of elements, a cluster is a subset of elements that are in some sense closer to each other than ‘average’
• Closeness can be defined by a distance measure
• Distance by itself is not sufficient
• How to measure distance between more than 2 points?
• What shape should a cluster have?
• Which thresholds of closeness decide whether elements belong to the same cluster or not?

What is a cluster?
Defining what a ‘cluster’ is turns out to be difficult. In practice it is defined by the algorithm that finds the clusters.

Clustering algorithms
• Hierarchical vs flat
– Hierarchical clustering builds a hierarchical tree (also called a dendrogram) showing the relationships among the elements
– Flat clustering partitions the set of elements into subsets (non-overlapping or overlapping)

Hierarchical clustering – how does it work?
[Figure: step-by-step merging of elements 1-5, beginning with the pair 1,2, with the distance matrix updated after each merge]

Different linkages
Keep joining together the two closest clusters, measuring the distance between clusters by the:
• Minimum distance => Single linkage
• Maximum distance => Complete linkage
• Average distance => Average linkage
Alternative – maintain a centroid in each cluster and use it for linking
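The three linkages can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the function names `cluster_distance` and `agglomerate` are made up here, Euclidean distance is used as the distance measure, and the quadratic search for the closest pair is chosen for readability rather than speed.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def cluster_distance(a, b, linkage):
    """Distance between two clusters (lists of points) under a given linkage."""
    pairwise = [dist(p, q) for p in a for q in b]
    if linkage == "single":      # minimum distance between members
        return min(pairwise)
    if linkage == "complete":    # maximum distance between members
        return max(pairwise)
    if linkage == "average":     # average distance between members
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, linkage="single"):
    """Keep joining the two closest clusters; return the merges in order.

    Each merge is recorded as (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]  # start with one cluster per element
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters that is closest under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], linkage
            ),
        )
        d = cluster_distance(clusters[i], clusters[j], linkage)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    points = [(0,), (1,), (3,), (7,)]  # hypothetical one-dimensional data
    for linkage in ("single", "complete", "average"):
        heights = [d for _, _, d in agglomerate(points, linkage)]
        print(linkage, heights)
```

The order of the merges gives the tree, and the recorded distances give the heights at which the branches join. On the same data the three linkages can join clusters at different heights, which is why the choice of linkage matters.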

Flat clusterings
[Figure labels: All genes, TFIID, SAGA]

Clustering genes and samples
• When does it make sense to cluster samples?

K-means clustering
• K stands for the number of clusters one wants to obtain
– K has to be guessed
• We need a notion of a gravity center
– In n-dimensional Euclidean space, the gravity center of a set of vectors (each of weight 1) is defined as the vector of mean coordinates, taken along each dimension separately

[Figure 4.2: points A, B and C plotted against Condition 1 and Condition 2]

Gravity center – worked example
[Figure: points A, B, C and their gravity center G in the x-y plane]
A = (2, 5)
B = (4, 2)
C = (3, -4)
X = (2 + 4 + 3)/3 = 3
Y = (5 + 2 - 4)/3 = 1
G = (3, 1)
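The same calculation can be written as a short Python sketch. The helper name `gravity_center` is an assumption made for this illustration, and C is taken as (3, -4), the value used in the Y calculation above.

```python
def gravity_center(vectors):
    """Mean coordinate along each dimension, every vector having weight 1."""
    n = len(vectors)
    # zip(*vectors) groups the coordinates dimension by dimension.
    return tuple(sum(coords) / n for coords in zip(*vectors))


A, B, C = (2, 5), (4, 2), (3, -4)
G = gravity_center([A, B, C])
print(G)  # (3.0, 1.0)
```

Nothing in the function is specific to two dimensions, so it applies unchanged to expression profiles measured under n conditions.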

K-means clustering
1. Select K points (vectors), called centers, somewhere in the space (at random, or more intelligently so that they are far away from each other)
2. For each vector in the universe that you want to cluster, calculate the distance between it and all K centers, and assign it to the closest center. In this way K clusters are defined.
3. In each cluster, define the new center as its gravity center
4. Repeat steps 2-3 until the gravity centers no longer move, or until some fixed number of steps has been reached
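The four steps translate fairly directly into code. The sketch below is one possible reading, not a reference implementation: the names `k_means`, `initial_centers`, `max_steps` and `seed` are assumptions, Euclidean distance is used, random initial centers are drawn from the data points themselves, and a cluster that becomes empty simply keeps its old center.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def gravity_center(vectors):
    """Mean coordinate along each dimension."""
    n = len(vectors)
    return tuple(sum(coords) / n for coords in zip(*vectors))


def k_means(vectors, k, initial_centers=None, max_steps=100, seed=0):
    """Return (centers, clusters) after running the four steps."""
    # Step 1: select K centers (here: K distinct data points chosen at random).
    if initial_centers is None:
        initial_centers = random.Random(seed).sample(vectors, k)
    centers = list(initial_centers)

    clusters = []
    for _ in range(max_steps):  # step 4: stop after a fixed number of steps...
        # Step 2: assign every vector to its closest center.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: dist(v, centers[i]))
            clusters[nearest].append(v)

        # Step 3: move each center to the gravity center of its cluster.
        # Assumption: an empty cluster keeps its previous center.
        new_centers = [
            gravity_center(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]

        # Step 4: ...or as soon as the gravity centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers

    # If max_steps ran out first, clusters reflect the last assignment made
    # before the final center update.
    return centers, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, clusters = k_means(data, k=2, initial_centers=[(0, 0), (10, 10)])
    print(centers)
    print(clusters)
```

Because the result depends on the initial centers and on the guessed K, it is common to run the procedure several times with different starting points and compare the resulting clusterings.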

[Figure: 1. Guess K centres; 2. Assign to clusters; 3. Move to gravity centres]

Other clustering methods
• Kohonen’s self-organising maps
• Self-organising trees (Dopazo)
• Probability distribution based clustering
• Two-way clustering
• Fuzzy clustering
• Cluster comparison
