Practical Bioinformatics
Mark Voorhies
4/6/2017


Loading and re-loading your functions

    # Use import the first time you load a module
    # (And keep using import until it loads
    # successfully)
    import my_module
    my_module.my_function(42)

    # Once a module has been loaded, use reload to
    # force Python to read your new code
    from importlib import reload
    reload(my_module)

Pearson distances

Pearson similarity

$$ s(x, y) = \frac{\sum_{i=1}^{N} (x_i - x_{\mathrm{offset}})(y_i - y_{\mathrm{offset}})}{\sqrt{\sum_{i=1}^{N} (x_i - x_{\mathrm{offset}})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - y_{\mathrm{offset}})^2}} $$

Pearson distance

$$ d(x, y) = 1 - s(x, y) $$

Euclidean distance

$$ \frac{\sum_{i=1}^{N} (x_i - y_i)^2}{N} $$
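The three formulas can be sketched in plain Python as below. The function names and the `mean` helper are illustrative, not taken from the course code. The offsets are left as parameters: an offset of 0 gives the uncentered form, and passing each profile's mean gives the usual centered Pearson correlation. The Euclidean distance follows the slide's formula, a mean of squared differences with no square root.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson_similarity(x, y, x_offset=0.0, y_offset=0.0):
    """Pearson similarity s(x, y) with explicit offsets.

    Offsets of 0 give the uncentered form; offsets equal to the
    means of x and y give the centered Pearson correlation.
    Raises ZeroDivisionError if either profile equals its offset everywhere.
    """
    numerator = sum((a - x_offset) * (b - y_offset) for a, b in zip(x, y))
    x_norm = sqrt(sum((a - x_offset) ** 2 for a in x))
    y_norm = sqrt(sum((b - y_offset) ** 2 for b in y))
    return numerator / (x_norm * y_norm)


def pearson_distance(x, y, x_offset=0.0, y_offset=0.0):
    """Pearson distance d(x, y) = 1 - s(x, y)."""
    return 1.0 - pearson_similarity(x, y, x_offset, y_offset)


def euclidean_distance(x, y):
    """Mean squared difference, as written on the slide (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Two toy expression profiles that are perfectly anti-correlated
profile_a = [1.0, 2.0, 3.0]
profile_b = [3.0, 2.0, 1.0]
centered = pearson_distance(profile_a, profile_b,
                            mean(profile_a), mean(profile_b))
uncentered = pearson_distance(profile_a, profile_b)
```

With these toy profiles the centered distance is close to 2 (the maximum), while the uncentered distance is smaller because both profiles lie on the same side of zero.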

Comparing all measurements for two genes
[Figure: scatter plot titled "Comparing two expression profiles (r = 0.97)"; x-axis: TLC1 log2 relative expression; y-axis: YFG1 log2 relative expression.]

Comparing all genes for two measurements
[Figure: scatter plot of all genes; x-axis: Array 1, log2 relative expression; y-axis: Array 2, log2 relative expression. The plot appears on three successive slides: first unannotated, then with the panel title "Euclidean Distance", then with the panel title "Uncentered Pearson".]

Measure all pairwise distances under distance metric
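Here is a minimal sketch of this step, assuming the expression matrix is a list of rows (one row per gene) and that the chosen metric is symmetric with zero self-distance. The names `pairwise_distances` and `mean_squared_difference` are illustrative and not from the course code.

```python
def mean_squared_difference(x, y):
    """Example metric: mean of squared differences between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def pairwise_distances(rows, metric):
    """Return the full n x n distance matrix for a list of profiles.

    Only the upper triangle is computed; the result is mirrored because
    the metric is assumed to be symmetric, and the diagonal is left at 0.
    """
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = metric(rows[i], rows[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist


# Three toy profiles; the first and third are identical
profiles = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
distance_matrix = pairwise_distances(profiles, mean_squared_difference)
```

Any function of two profiles can be passed as `metric`, so the same loop works for Pearson or Euclidean distances.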

Hierarchical Clustering
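The slide does not say which linkage rule is used, so the sketch below assumes average linkage as one common choice. It repeatedly merges the two closest clusters of a precomputed distance matrix and records each merge. The function name and return format are illustrative.

```python
def average_linkage(dist):
    """Agglomerative clustering from a symmetric distance matrix.

    Returns a list of merges (cluster_a, cluster_b, distance), where each
    cluster is a tuple of item indices. Average linkage is an assumption:
    the distance between clusters is the mean over all cross-cluster pairs.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j]
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Drop the two merged clusters and append their union
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Toy matrix: items 0 and 1 are close, items 2 and 3 are close
toy_dist = [
    [0.0, 1.0, 10.0, 10.0],
    [1.0, 0.0, 10.0, 10.0],
    [10.0, 10.0, 0.0, 2.0],
    [10.0, 10.0, 2.0, 0.0],
]
toy_merges = average_linkage(toy_dist)
```

The list of merges, read in order, is the tree: early merges are the tight leaf pairs and the last merge is the root.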

"It’s hard work at times, but you have to be realistic. If you have a large database with many variables and your goal is to get a good understanding of the interrelationships, then, unless you get lucky, this complex structure is bound to require some hard work to understand."
Bill Cleveland and Rick Becker
http://stat.bell-labs.com/project/trellis/interview.html

Using JavaTreeView

Adjust pixel settings for global view

Select annotation columns

Select URL for gene annotations

Activate and detach annotation window

Clustering exercises – Negative controls
Write functions to reproduce the shuffling controls in figure 3 of the Eisen paper (removing correlations among genes and/or arrays).

    def shuffleGenes(self, seed=None):
        """Shuffle expression matrix by row."""
        import random
        if seed is not None:
            random.seed(seed)
        # list() is needed in Python 3, where range() cannot be shuffled
        indices = list(range(len(self.geneName)))
        random.shuffle(indices)
        # Apply the same permutation to names, annotations, and data rows
        genes = [self.geneName[i] for i in indices]
        self.geneName = genes
        annotations = [self.geneAnn[i] for i in indices]
        self.geneAnn = annotations
        num = [self.num[i] for i in indices]
        self.num = num

Clustering exercises – Negative controls

    def shuffleRows(self, seed=None):
        """Permute ratio values within rows."""
        import random
        if seed is not None:
            random.seed(seed)
        for i in self.num:
            random.shuffle(i)

    def shuffleCols(self, seed=None):
        """Permute ratio values within columns."""
        import random
        if seed is not None:
            random.seed(seed)
        # Transpose the expression matrix
        cols = []
        for col in range(len(self.num[0])):
            cols.append([row[col] for row in self.num])
        # Shuffle
        for i in cols:
            random.shuffle(i)
        # Transpose back to original orientation
        self.num = []
        for row in range(len(cols[0])):
            self.num.append([col[row] for col in cols])
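The two permutation controls can also be tried outside the class on a small matrix. This stand-alone sketch assumes the expression matrix is a list of rows, as `self.num` appears to be. Unlike the methods on the slide, the functions return shuffled copies and use a local random generator. The function names are illustrative.

```python
import random


def shuffle_within_rows(matrix, seed=None):
    """Return a copy of matrix with the values of each row permuted."""
    rng = random.Random(seed)
    shuffled = []
    for row in matrix:
        new_row = list(row)
        rng.shuffle(new_row)
        shuffled.append(new_row)
    return shuffled


def shuffle_within_columns(matrix, seed=None):
    """Return a copy of matrix with the values of each column permuted."""
    rng = random.Random(seed)
    # zip(*matrix) transposes rows into columns
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back to the original orientation
    return [list(row) for row in zip(*columns)]


toy_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
by_rows = shuffle_within_rows(toy_matrix, seed=0)
by_cols = shuffle_within_columns(toy_matrix, seed=0)
```

Shuffling within rows keeps each gene's set of values but breaks the alignment between arrays. Shuffling within columns keeps each array's set of values but breaks the alignment between genes.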

Homework
1. Explore different clustering methods and/or distance methods
2. Try additional shufflings of the data: how do they affect your ability to cluster the data? C.f. figure 3 of the Eisen paper
   - Permute the columns
   - Independently permute the columns of each row
