CSCI1950-Z Computational Methods for Biology, Lecture 16
Ben Raphael, March 30, 2009
http://cs.brown.edu/courses/csci1950-z/

Gene Expression Analysis

Clustering
• Find groups of genes with similar expression profiles across samples/conditions (a minimal clustering sketch follows this slide).

Biclustering
• Find subsets of genes that may behave similarly under only a subset of conditions.
(Gene expression matrix figure: BMC Genomics 2006, 7:279)
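Not part of the original slides: a minimal sketch of the clustering setup, grouping rows of a synthetic genes × conditions matrix with average-linkage hierarchical clustering in SciPy. The matrix size, the correlation distance, and the choice of 5 clusters are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 12))            # synthetic genes x conditions matrix

# Correlation distance between expression profiles is a common choice.
dist = pdist(expr, metric="correlation")
tree = linkage(dist, method="average")       # average-linkage hierarchical clustering
labels = fcluster(tree, t=5, criterion="maxclust")   # cut the tree into 5 gene clusters
print(labels[:10])                           # cluster assignment of the first 10 genes
```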
Classification
• Use gene expression profiles to predict/classify samples into a number of predefined groups.
• Tissue of origin (heart, kidney, brain, etc.)
• Cancer vs. normal
• Cancer subtypes
(Gene expression matrix figure: BMC Genomics 2006, 7:279)

Cancer Subtype Classification
• Distinguish acute lymphoblastic leukemia (ALL) from acute myeloid leukemia (AML). (Golub et al. Science 1999)
Breast Cancer Prognosis
• 70-gene signature to predict breast cancer patients with metastasis within 5 years (van de Vijver et al. 2002, van't Veer et al. 2002).
• Now an FDA-approved test: MammaPrint.

Classification
Binary classification: given a set of examples (x_i, y_i), with y_i ∈ {-1, +1}, drawn from an unknown distribution D, design a function f: R^n → {-1, +1} that optimally assigns additional samples x to one of the two classes.
Supervised learning: the pairs (x_i, y_i) are the training data; x_i^(j) is a feature; R^n is the feature space.
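To make the notation concrete, here is a small sketch (mine, not from the slides) of the supervised setting: training pairs (x_i, y_i) with y_i ∈ {-1, +1}, and one very simple choice of f: R^n → {-1, +1}, a nearest-centroid rule on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Training data (x_i, y_i): 40 points in R^5 with labels y_i in {-1, +1} (synthetic).
X_train = np.vstack([rng.normal(-1, 1, size=(20, 5)),
                     rng.normal(+1, 1, size=(20, 5))])
y_train = np.array([-1] * 20 + [+1] * 20)

mu_neg = X_train[y_train == -1].mean(axis=0)   # centroid of the -1 class
mu_pos = X_train[y_train == +1].mean(axis=0)   # centroid of the +1 class

def f(x):
    """f: R^n -> {-1, +1}; assign x to the class with the nearer centroid."""
    return -1 if np.linalg.norm(x - mu_neg) < np.linalg.norm(x - mu_pos) else +1

print(f(rng.normal(+1, 1, size=5)))   # a new sample drawn near the +1 class; usually prints 1
```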
Classification in 2D
[figure]

k-Nearest Neighbors
[figure]
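A minimal k-nearest-neighbor sketch, assuming Euclidean distance, majority voting, odd k, and synthetic toy data; none of these choices come from the slides.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote over the k training points closest to x in Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)        # distance from x to every training point
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    return 1 if y_train[nearest].sum() > 0 else -1     # labels are in {-1, +1}; odd k avoids ties

rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(-1, 1, size=(20, 2)),   # 20 points of class -1
                     rng.normal(+1, 1, size=(20, 2))])  # 20 points of class +1
y_train = np.array([-1] * 20 + [+1] * 20)
print(knn_predict(X_train, y_train, np.array([0.8, 1.2]), k=3))   # expected: 1
```

With k = 1 this rule simply memorizes the training set, which is exactly the overfitting case discussed on the next slide.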
Decision Tree
• Partition the feature space R^n according to a series of decision rules.
• Internal nodes: decision rules of the form x^(j) > c (a single feature compared to a threshold).
• Leaves: the label assigned to the point.
• "popular among medical scientists, perhaps because it mimics the way that a doctor thinks." (Hastie, Tibshirani, Friedman, Elements of Statistical Learning)

Overfitting
• Many classification methods can fit training data with no mistakes:
  – k-NN with k = 1
  – a decision tree with large depth
• Such a classifier becomes tuned to "quirks" in the training data and generalizes poorly to other data sets.
• Evaluate the classifier on test data, not training data (see the sketch below).
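The 1-NN case makes the overfitting point directly: every training point is its own nearest neighbor, so training accuracy is always perfect, even though the synthetic, overlapping classes below cannot be separated perfectly on new data. This comparison of training versus test accuracy is my illustration, not the slides'.

```python
import numpy as np

def nn1_predict(X_ref, y_ref, x):
    """1-nearest-neighbor prediction: copy the label of the closest reference point."""
    return y_ref[np.argmin(np.linalg.norm(X_ref - x, axis=1))]

rng = np.random.default_rng(3)
def make_data(n):
    # Two overlapping Gaussian classes, so no classifier can be perfect on new data.
    X = np.vstack([rng.normal(-0.5, 1, size=(n, 2)), rng.normal(+0.5, 1, size=(n, 2))])
    y = np.array([-1] * n + [+1] * n)
    return X, y

X_tr, y_tr = make_data(50)
X_te, y_te = make_data(50)

train_acc = np.mean([nn1_predict(X_tr, y_tr, x) == y for x, y in zip(X_tr, y_tr)])
test_acc  = np.mean([nn1_predict(X_tr, y_tr, x) == y for x, y in zip(X_te, y_te)])
print(train_acc, test_acc)   # training accuracy is 1.0; test accuracy is noticeably lower
```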
Training and Testing
• Split the data D into training and test sets.
• Evaluate performance on the test set.
• The available data D is often limited, e.g. few gene expression samples.
• Splitting into training/test sets can leave too few data points.

Cross-Validation
Split D into K equally sized parts D_1, ..., D_K.
For k = 1, ..., K:
  Train on D' = D \ D_k.
  Test on D_k.
  Compute the performance quantity ρ_k.
Output the average of the ρ_k.
This is K-fold cross-validation. The special case K = n is leave-one-out cross-validation.
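A sketch of the K-fold procedure above, with an illustrative nearest-centroid classifier, K = 5, and accuracy standing in for the performance quantity ρ_k; the classifier and the synthetic data are assumptions made for the example.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Return the class centroids; assumes both labels -1 and +1 appear in y."""
    return X[y == -1].mean(axis=0), X[y == +1].mean(axis=0)

def nearest_centroid_predict(centroids, x):
    mu_neg, mu_pos = centroids
    return -1 if np.linalg.norm(x - mu_neg) < np.linalg.norm(x - mu_pos) else +1

def cross_validate(X, y, K=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)       # index sets D_1, ..., D_K
    scores = []
    for k in range(K):
        test_idx = folds[k]                                   # held-out part D_k
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])  # D \ D_k
        model = nearest_centroid_fit(X[train_idx], y[train_idx])
        preds = np.array([nearest_centroid_predict(model, x) for x in X[test_idx]])
        scores.append(np.mean(preds == y[test_idx]))          # performance rho_k on D_k
    return np.mean(scores)                                    # average of the rho_k

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1, size=(30, 4)), rng.normal(+1, 1, size=(30, 4))])
y = np.array([-1] * 30 + [+1] * 30)
print(cross_validate(X, y, K=5))
```

Setting K = len(y) in this sketch gives leave-one-out cross-validation, the special case noted on the slide.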
Feature Selection
• Selecting a subset of features (reducing the dimensionality) sometimes leads to better classification performance, etc.
• Gene expression: choose a subset of genes informative for classification (a per-gene scoring sketch follows this slide).

Results: Class Discovery with TNoM (Ben-Dor, Friedman, Yakhini, 2001)
• Find an optimal labeling L.
  – Solution: use heuristic search.
• Find multiple (suboptimal) labelings.
  – Solution: peeling, i.e. remove previously used genes from the set.
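For reference, TNoM (the threshold number of misclassifications) scores each gene by the smallest number of errors achievable by any threshold rule on that gene alone; low-scoring genes are the informative ones. The sketch below is a plain reimplementation of that idea, not the authors' code, and the example data are invented.

```python
import numpy as np

def tnom_score(expr, y):
    """expr: one gene's expression across samples; y: labels in {-1, +1}.
    Returns the minimum number of errors over all single-gene threshold rules."""
    y_sorted = y[np.argsort(expr)]
    best = len(y)
    # Every split point: samples left of the threshold get one label,
    # samples right of it the other, in both orientations.
    for split in range(len(y) + 1):
        left, right = y_sorted[:split], y_sorted[split:]
        errs_low_neg = np.sum(left == +1) + np.sum(right == -1)   # left -> -1, right -> +1
        errs_low_pos = np.sum(left == -1) + np.sum(right == +1)   # left -> +1, right -> -1
        best = min(best, errs_low_neg, errs_low_pos)
    return best

rng = np.random.default_rng(5)
y = np.array([-1] * 15 + [+1] * 15)
informative = np.concatenate([rng.normal(0, 1, 15), rng.normal(2, 1, 15)])  # shifted in class +1
noise = rng.normal(0, 1, 30)                                                # uninformative gene
print(tnom_score(informative, y), tnom_score(noise, y))   # informative gene scores lower
```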
Results: Class Discovery with TNoM (Ben-Dor, Friedman, Yakhini, 2001)
• Leukemia (Golub et al. 1999): 72 expression profiles (25 AML, 47 ALL), 7129 genes.
• Lymphoma (Alizadeh et al.): 96 expression profiles; 46 diffuse large B-cell lymphoma (DLBCL) and 50 from 8 different tissues.
• Lymphoma-DLBCL: the subset of 46 DLBCL profiles from above.

TNoM Results (Ben-Dor, Friedman, Yakhini, 2001)
[Survival plot: % survival vs. years; curves labeled "40 patients" and "24 patients with low clinical risk"]