Distance Metrics

Mark Voorhies, 4/5/2018


  1. Distance Metrics, Mark Voorhies, 4/5/2018

  2. List tricks
     Adding data to a list:
         mylist = []
         mylist.append(3)
         mylist += [4, 5, 6]

  3. List tricks
     Adding data to a list:
         mylist = []
         mylist.append(3)
         mylist += [4, 5, 6]
     Lists of lists:
         matrix = [[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]]
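The list operations above can be tried directly. The following is a minimal sketch; the indexing examples at the end (`second_row`, `element`, `third_column`) are illustrative additions, not part of the slides.

```python
# Build a list incrementally, as on the slide.
mylist = []
mylist.append(3)        # append adds one element to the end
mylist += [4, 5, 6]     # += extends the list with the elements of another list

# A list of lists can stand in for a matrix, indexed as matrix[row][column].
matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]

second_row = matrix[1]                      # the whole second row
element = matrix[1][2]                      # row 1, column 2 (zero-based)
third_column = [row[2] for row in matrix]   # one value from each row
```

Note that `append([4, 5, 6])` would instead add the whole list as a single nested element, which is how a list of lists is usually built row by row.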

  4. Anatomy of a Programming Language

  5. Anatomy of a Programming Language: functions
         def f(x, y):
             return x*y
         f(x)
         from math import sqrt

  6. Anatomy of a Programming Language: data structures
         1
         1.2
         "my string"
         ["my", "list"]
         ("my", "tuple")
         [["my", "multi"], ["dimensional", "list"]]
         my_file = open("my_file.txt")

  7. Anatomy of a Programming Language: control statements
         while (a != -1):
             a = x.find("ATG")
         for line in open("x"):
             L.append(line[:-1])
     [Flowchart: while (a != stop); a <- a|nextbase(); |a| < 3? (Yes / No); p <- p|translate(a); a <- ""]
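As a runnable illustration of the `while` pattern on the control statements slide, here is a minimal sketch that collects every position of a motif. The function name `find_all` and the example sequence are hypothetical. It relies on the optional start argument of `str.find`, since searching from the beginning each time would keep returning the same index.

```python
def find_all(sequence, motif="ATG"):
    """Return the start index of every occurrence of motif in sequence."""
    positions = []
    a = sequence.find(motif)
    while a != -1:                       # find returns -1 when there is no match
        positions.append(a)
        a = sequence.find(motif, a + 1)  # resume the search just past the last hit
    return positions

starts = find_all("GGGATGCATCATGA")
```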

  8. Anatomy of a Programming Language: objects
         "GGGATGCATCAT".find("ATG")
         L = [3, 4, 5]
         L.append(7)
         L += [6, 7]
         open("1.txt").readlines()
         f(x)
         g(x)

  9. Anatomy of a Programming Language

  10. The CDT file format
      Minimal CLUSTER input
      Cluster3 CDT output
      Tab delimited (\t)
      UNIX newlines (\n)
      Missing values → empty cells

  11. supp2data.cdt

  12. supp2data.cdt
          [["YBR166C", "YOR357C", "YLR292C", ...],
           ["TYR1...", "GRD19...", "SEC72...", ...],
           [[0.33, -0.17, 0.04, -0.07, -0.09, ...],
            [-0.64, -0.38, -0.32, -0.29, -0.22, ...],
            [-0.23, 0.19, -0.36, 0.14, -0.40, ...],
            ...]]
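One way to get from tab-delimited text to the nested list shown above is sketched below. The column layout is an assumption, not something the slides specify: a header row, then an ID column, a NAME column, and a GWEIGHT column before the data, with an optional EWEIGHT row that is skipped. The function name `parse_cdt` and the toy input are hypothetical.

```python
def parse_cdt(text, first_data_column=3):
    """Parse CDT-style text into [ids, names, data].

    Assumes (not confirmed by the slides): row 0 is a header, column 0 holds
    IDs, column 1 holds names, and data starts at first_data_column.
    """
    ids, names, data = [], [], []
    for line in text.split("\n")[1:]:       # skip the header row
        if not line:
            continue                        # ignore blank trailing lines
        fields = line.split("\t")
        if fields[0] == "EWEIGHT":
            continue                        # skip the experiment-weight row
        ids.append(fields[0])
        names.append(fields[1])
        # Missing values are empty cells; represent them as None.
        data.append([float(v) if v else None
                     for v in fields[first_data_column:]])
    return [ids, names, data]

# Toy input for illustration only (not taken from supp2data.cdt).
example = ("ORF\tNAME\tGWEIGHT\tcond1\tcond2\tcond3\n"
           "EWEIGHT\t\t\t1\t1\t1\n"
           "GENE1\tfirst gene\t1\t0.33\t-0.17\t0.04\n"
           "GENE2\tsecond gene\t1\t-0.64\t\t-0.32\n")
ids, names, data = parse_cdt(example)
```

For a real file, the text would come from something like `open("supp2data.cdt").read()`; any distance function applied afterwards needs a policy for the `None` entries.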

  13. Generators are like polymerases: iterable but not indexable
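A small sketch of what "iterable but not indexable" means in practice. The generator function `count_up` is a hypothetical example.

```python
def count_up(n):
    """Yield 0, 1, ..., n-1 one value at a time."""
    for i in range(n):
        yield i

gen = count_up(3)
first = next(gen)     # take one step along the generator
rest = list(gen)      # consume whatever is left

# Indexing a generator raises TypeError.
try:
    count_up(3)[0]
    indexable = True
except TypeError:
    indexable = False

# Like a polymerase, a generator only moves forward: once consumed, it is empty.
exhausted = list(gen)
```

If random access is needed, the usual approach is to materialize the generator with `list(...)` first, at the cost of holding everything in memory.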

  14. Fun with logarithms
      In log space, multiplication and division become addition and subtraction:
      $$\log(xy) = \log(x) + \log(y)$$
      $$\log(x/y) = \log(x) - \log(y)$$

  15. Fun with logarithms
      In log space, multiplication and division become addition and subtraction:
      $$\log(xy) = \log(x) + \log(y)$$
      $$\log(x/y) = \log(x) - \log(y)$$
      Therefore, exponentiation becomes multiplication:
      $$\log(x^y) = y \log(x)$$

  16. Fun with logarithms
      In log space, multiplication and division become addition and subtraction:
      $$\log(xy) = \log(x) + \log(y)$$
      $$\log(x/y) = \log(x) - \log(y)$$
      Therefore, exponentiation becomes multiplication:
      $$\log(x^y) = y \log(x)$$
      Also, we can change the base of a logarithm like so:
      $$\log_A(x) = \log(x) / \log(A)$$
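The identities above can be checked numerically. This is a minimal sketch with arbitrarily chosen values; `isclose` is used rather than `==` because floating-point logarithms are only approximately equal.

```python
from math import log, isclose

x, y, A = 8.0, 2.0, 2.0   # arbitrary example values

product_ok = isclose(log(x * y), log(x) + log(y))    # log(xy) = log(x) + log(y)
quotient_ok = isclose(log(x / y), log(x) - log(y))   # log(x/y) = log(x) - log(y)
power_ok = isclose(log(x ** y), y * log(x))          # log(x^y) = y log(x)

# Change of base: log base A of x, computed from natural logarithms.
log_base_A = log(x) / log(A)
```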

  17. Pearson distances
      Pearson similarity:
      $$s(x, y) = \frac{1}{N} \sum_{i}^{N} \left( \frac{x_i - x_{\mathrm{offset}}}{\phi_x} \right) \left( \frac{y_i - y_{\mathrm{offset}}}{\phi_y} \right)$$
      $$\phi_G = \sqrt{\frac{\sum_{i}^{N} (G_i - G_{\mathrm{offset}})^2}{N}}$$

  18. Pearson distances
      Pearson similarity:
      $$s(x, y) = \sum_{i}^{N} \left( \frac{x_i - x_{\mathrm{offset}}}{\phi_x} \right) \left( \frac{y_i - y_{\mathrm{offset}}}{\phi_y} \right)$$
      $$\phi_G = \sqrt{\sum_{i}^{N} (G_i - G_{\mathrm{offset}})^2}$$

  19. Pearson distances
      Pearson similarity:
      $$s(x, y) = \sum_{i}^{N} \left( \frac{x_i - x_{\mathrm{offset}}}{\sqrt{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})^2}} \right) \left( \frac{y_i - y_{\mathrm{offset}}}{\sqrt{\sum_{i}^{N} (y_i - y_{\mathrm{offset}})^2}} \right)$$

  20. Pearson distances
      Pearson similarity:
      $$s(x, y) = \frac{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})(y_i - y_{\mathrm{offset}})}{\sqrt{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})^2} \sqrt{\sum_{i}^{N} (y_i - y_{\mathrm{offset}})^2}}$$

  21. Pearson distances
      Pearson similarity:
      $$s(x, y) = \frac{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})(y_i - y_{\mathrm{offset}})}{\sqrt{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})^2} \sqrt{\sum_{i}^{N} (y_i - y_{\mathrm{offset}})^2}}$$
      Pearson distance:
      $$d(x, y) = 1 - s(x, y)$$

  22. Pearson distances
      Pearson similarity:
      $$s(x, y) = \frac{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})(y_i - y_{\mathrm{offset}})}{\sqrt{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})^2} \sqrt{\sum_{i}^{N} (y_i - y_{\mathrm{offset}})^2}}$$
      Pearson distance:
      $$d(x, y) = 1 - s(x, y)$$
      Euclidean distance:
      $$\frac{\sum_{i}^{N} (x_i - y_i)^2}{N}$$
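The formulas above translate fairly directly into plain Python. The sketch below makes assumptions the slides do not spell out: the offset is taken to be the mean of each profile for the centered case and 0 for the uncentered case, both profiles have the same length, there are no missing values, and neither profile is constant (which would give a zero denominator). The function names are hypothetical.

```python
from math import sqrt

def pearson_similarity(x, y, centered=True):
    """s(x, y) with offset = mean (centered) or 0 (uncentered); an assumption."""
    n = len(x)
    x_offset = sum(x) / n if centered else 0.0
    y_offset = sum(y) / n if centered else 0.0
    numerator = sum((xi - x_offset) * (yi - y_offset) for xi, yi in zip(x, y))
    denominator = (sqrt(sum((xi - x_offset) ** 2 for xi in x)) *
                   sqrt(sum((yi - y_offset) ** 2 for yi in y)))
    return numerator / denominator

def pearson_distance(x, y, centered=True):
    """d(x, y) = 1 - s(x, y)."""
    return 1.0 - pearson_similarity(x, y, centered)

def euclidean_distance(x, y):
    """Mean squared difference, following the slide's formula (no square root)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

# Two perfectly correlated toy profiles: similarity near 1, distance near 0.
s = pearson_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
d = pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])   # anti-correlated
e = euclidean_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The Pearson distance ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated), whereas the Euclidean form also responds to differences in magnitude.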

  23. Comparing all measurements for two genes
      [Figure: scatter plot, "Comparing two expression profiles (r = 0.97)". x-axis: TLC1 log2 relative expression; y-axis: YFG1 log2 relative expression.]

  24. Comparing all genes for two measurements
      [Figure: scatter plot. x-axis: Array 1, log2 relative expression; y-axis: Array 2, log2 relative expression.]

  25. Comparing all genes for two measurements: Euclidean Distance
      [Figure: scatter plot. x-axis: Array 1, log2 relative expression; y-axis: Array 2, log2 relative expression.]

  26. Comparing all genes for two measurements: Uncentered Pearson
      [Figure: scatter plot. x-axis: Array 1, log2 relative expression; y-axis: Array 2, log2 relative expression.]

  27. Measure all pairwise distances under distance metric
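Measuring all pairwise distances can be sketched as a nested loop that fills a square matrix. This assumes the metric is symmetric and gives zero for a profile compared with itself, so only the upper triangle is computed. The names `pairwise_distances` and `mean_squared_difference` are hypothetical, and any function of two profiles could be passed in as `distance`.

```python
def mean_squared_difference(x, y):
    """A simple stand-in metric for the example."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

def pairwise_distances(profiles, distance):
    """Return an n-by-n list of lists with distance(profiles[i], profiles[j])."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]   # diagonal stays 0.0 by assumption
    for i in range(n):
        for j in range(i + 1, n):            # upper triangle only
            d = distance(profiles[i], profiles[j])
            matrix[i][j] = d
            matrix[j][i] = d                 # mirror, assuming symmetry
    return matrix

profiles = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
distances = pairwise_distances(profiles, mean_squared_difference)
```

The loop calls `distance` n(n-1)/2 times, so the cost grows quadratically with the number of profiles.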

  28. Homework
      1. Install biopython via Canopy (or whatever you're using). If this doesn't work, install Cluster3.
      2. Write a function to calculate all pairwise Pearson correlations for the yeast expression profiles.
      3. Save the results of your pairwise correlation calculation in the CDT format described in the JavaTreeView manual.
      4. Read PNAS 95:14863.
      5. Try the first two problems, replacing the Pearson correlation with the distance metric from the PNAS paper or with one of the distance metrics from the Cluster3 manual.
