Distance Metrics Mark Voorhies 4/5/2018 Mark Voorhies Distance - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Distance Metrics

Mark Voorhies 4/5/2018

Mark Voorhies Distance Metrics


slide-3
SLIDE 3

List tricks

Adding data to a list:

    mylist = []
    mylist.append(3)
    mylist += [4, 5, 6]

Lists of lists:

    matrix = [[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]]
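The list operations above can be tried out directly. The sketch below reuses the slide's mylist and matrix; the other variable names (nested, second_row, element, third_column) are only illustrative and do not come from the slides.

```python
# Build a list incrementally, as on the slide.
mylist = []
mylist.append(3)       # append adds one item
mylist += [4, 5, 6]    # += extends the list with several items

# Note the difference: append with a list argument nests it.
nested = [3]
nested.append([4, 5, 6])

# A list of lists behaves like a small matrix: matrix[row][column].
matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]

second_row = matrix[1]                       # one whole row
element = matrix[1][2]                       # row 1, column 2
third_column = [row[2] for row in matrix]    # one column, via a comprehension

print(mylist)
print(nested)
print(second_row, element, third_column)
```

Rows are cheap to pull out because each row is itself a list; columns need a loop or a comprehension.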

slide-4
SLIDE 4

Anatomy of a Programming Language


slide-5
SLIDE 5

Anatomy of a Programming Language

functions: f(x)

    def f(x, y):
        return x * y

    from math import sqrt

slide-6
SLIDE 6

Anatomy of a Programming Language

data structures

    1
    1.2
    "my string"
    ["my", "list"]
    my_file = open("my_file.txt")
    ("my", "tuple")
    [["my", "multi"], ["dimensional", "list"]]

slide-7
SLIDE 7

Anatomy of a Programming Language

control statements

[Flowchart: while(a != stop); |a| < 3? (Yes! / No); a <- a|nextbase(); p <- p|translate(a); a <- ""]

    while (a != -1):
        a = x.find("ATG")

    for line in open("x"):
        L.append(line[:-1])
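As written on the slide, the while loop never advances past the first match, so it is best read as a syntax illustration. A runnable variant might look like the sketch below. The function name find_all and the second test sequence are made up for illustration; str.find with a start argument is standard Python. The second half shows why rstrip("\n") can be safer than line[:-1] when the last line of a file has no trailing newline (io.StringIO stands in for a real file here).

```python
import io

def find_all(sequence, motif):
    """Return the start index of every occurrence of motif in sequence."""
    positions = []
    start = sequence.find(motif)
    while start != -1:                 # find() returns -1 when nothing is left
        positions.append(start)
        start = sequence.find(motif, start + 1)   # resume just past the last hit
    return positions

slide_hits = find_all("GGGATGCATCAT", "ATG")      # sequence from the slides
more_hits = find_all("GGGATGCATCATGA", "ATG")     # made-up sequence with two hits

# Reading lines: line[:-1] assumes every line ends with a newline.
text = "ATG\nCAT"                                  # last line has no newline
chopped = [line[:-1] for line in io.StringIO(text)]
stripped = [line.rstrip("\n") for line in io.StringIO(text)]

print(slide_hits, more_hits)
print(chopped, stripped)
```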

slide-8
SLIDE 8

Anatomy of a Programming Language

objects: f(x), g(x)

    "GGGATGCATCAT".find("ATG")
    L = [3, 4, 5]
    L.append(7)
    L += [6, 7]
    open("1.txt").readlines()

slide-9
SLIDE 9

Anatomy of a Programming Language


slide-10
SLIDE 10

The CDT file format

  • Minimal CLUSTER input
  • Cluster3 CDT output
  • Tab delimited (\t)
  • UNIX newlines (\n)
  • Missing values → empty cells

slide-11
SLIDE 11

supp2data.cdt


slide-12
SLIDE 12

supp2data.cdt

    [["YBR166C", "YOR357C", "YLR292C", ...],
     ["TYR1...", "GRD19...", "SEC72...", ...],
     [[0.33, -0.17, 0.04, -0.07, -0.09, ...],
      [-0.64, -0.38, -0.32, -0.29, -0.22, ...],
      [-0.23, 0.19, -0.36, 0.14, -0.40, ...],
      ...]]
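One way to get from a tab-delimited file to the nested-list structure above is sketched here. It assumes a minimal layout that the slides do not spell out: one header row, gene IDs in the first column, annotations in the second, and numeric values in the remaining columns, with empty cells for missing values. The function name parse_cdt and the three-column sample are hypothetical; a real CDT file may carry extra columns or rows that this sketch does not handle.

```python
import io

def parse_cdt(handle):
    """Parse a minimal tab-delimited expression table (assumed layout).

    Returns (array_names, [ids, annotations, rows]); each row is a list
    of floats, with None standing in for an empty (missing) cell.
    """
    header = next(handle).rstrip("\n").split("\t")
    ids, annotations, rows = [], [], []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:            # skip blank lines
            continue
        ids.append(fields[0])
        annotations.append(fields[1])
        rows.append([float(v) if v.strip() else None for v in fields[2:]])
    return header[2:], [ids, annotations, rows]

# Small made-up sample; io.StringIO stands in for open("supp2data.cdt").
sample = (
    "ORF\tNAME\tarray1\tarray2\tarray3\n"
    "YBR166C\tTYR1...\t0.33\t-0.17\t0.04\n"
    "YOR357C\tGRD19...\t-0.64\t\t-0.32\n"
)
arrays, table = parse_cdt(io.StringIO(sample))
print(arrays)
print(table)
```

Keeping missing values as None (rather than 0.0) lets later distance calculations decide how to skip them.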

slide-13
SLIDE 13

Generators are like polymerases: iterable but not indexable

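A small sketch of what "iterable but not indexable" means in practice. The generator function read_bases is a made-up example; like a polymerase moving along a template, it hands out one item at a time, in order, and cannot jump to an arbitrary position or go back.

```python
def read_bases(sequence):
    """Yield one base at a time (hypothetical example generator)."""
    for base in sequence:
        yield base

bases = read_bases("ATGCAT")

first = next(bases)     # take the first item
rest = list(bases)      # consume everything that is left, in order

# Indexing a generator raises TypeError.
try:
    bases[0]
    indexable = True
except TypeError:
    indexable = False

# Once consumed, a generator is exhausted.
leftover = list(bases)

print(first, rest, indexable, leftover)
```

If random access is needed, the usual move is to materialize the generator once with list(...) and index the list instead.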


slide-16
SLIDE 16

Fun with logarithms

In log space, multiplication and division become addition and subtraction:

$$\log(xy) = \log(x) + \log(y)$$
$$\log(x/y) = \log(x) - \log(y)$$

Therefore, exponentiation becomes multiplication:

$$\log(x^y) = y \log(x)$$

Also, we can change the base of a logarithm like so:

$$\log_A(x) = \log(x) / \log(A)$$
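These identities are easy to check numerically. The sketch below uses Python's math module; the values of x, y, and A are arbitrary, and isclose is used because floating-point results agree only up to rounding.

```python
from math import log, log2, isclose

x, y, A = 8.0, 2.0, 2.0   # arbitrary positive test values

product_rule = isclose(log(x * y), log(x) + log(y))
quotient_rule = isclose(log(x / y), log(x) - log(y))
power_rule = isclose(log(x ** y), y * log(x))

# Change of base: log base A of x, computed from natural logs.
log_base_A = log(x) / log(A)
base_change = isclose(log_base_A, log2(x))   # A = 2 here, so compare with log2

print(product_rule, quotient_rule, power_rule, base_change)
```

The quotient rule is why expression ratios are usually reported as log2 values: a two-fold increase and a two-fold decrease become +1 and -1, symmetric around zero.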

slide-17
SLIDE 17

Pearson distances

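Pearson similarity

$$s(x, y) = \frac{1}{N} \sum_{i}^{N} \left( \frac{x_i - x_{\mathrm{offset}}}{\phi_x} \right) \left( \frac{y_i - y_{\mathrm{offset}}}{\phi_y} \right)$$

$$\phi_G = \sqrt{\sum_{i}^{N} \frac{(G_i - G_{\mathrm{offset}})^2}{N}}$$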

slide-18
SLIDE 18

Pearson distances

Pearson similarity

$$s(x, y) = \sum_{i}^{N} \left( \frac{x_i - x_{\mathrm{offset}}}{\phi_x} \right) \left( \frac{y_i - y_{\mathrm{offset}}}{\phi_y} \right)$$

$$\phi_G = \sqrt{\sum_{i}^{N} (G_i - G_{\mathrm{offset}})^2}$$

slide-19
SLIDE 19

Pearson distances

Pearson similarity

$$s(x, y) = \sum_{i}^{N} \left( \frac{x_i - x_{\mathrm{offset}}}{\sqrt{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})^2}} \right) \left( \frac{y_i - y_{\mathrm{offset}}}{\sqrt{\sum_{i}^{N} (y_i - y_{\mathrm{offset}})^2}} \right)$$


slide-22
SLIDE 22

Pearson distances

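Pearson similarity

$$s(x, y) = \frac{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})(y_i - y_{\mathrm{offset}})}{\sqrt{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})^2} \sqrt{\sum_{i}^{N} (y_i - y_{\mathrm{offset}})^2}}$$

Pearson distance

$$d(x, y) = 1 - s(x, y)$$

Euclidean distance

$$\frac{\sum_{i}^{N} (x_i - y_i)^2}{N}$$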
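The formulas on this slide translate fairly directly into Python. The sketch below is one possible reading, not a reference implementation. It assumes that the offset is the mean of each profile for the centered Pearson correlation and 0 for the uncentered variant, and it takes the Euclidean distance as the mean squared difference, following the N in the denominator on the slide. The function names and the centered flag are illustrative; missing values and constant profiles (zero denominator) are not handled.

```python
from math import sqrt

def pearson_similarity(x, y, centered=True):
    """Pearson similarity s(x, y) with a per-profile offset.

    Assumption: offset = mean (centered) or 0 (uncentered).
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    x_offset = sum(x) / len(x) if centered else 0.0
    y_offset = sum(y) / len(y) if centered else 0.0
    numerator = sum((xi - x_offset) * (yi - y_offset) for xi, yi in zip(x, y))
    denominator = (sqrt(sum((xi - x_offset) ** 2 for xi in x))
                   * sqrt(sum((yi - y_offset) ** 2 for yi in y)))
    return numerator / denominator

def pearson_distance(x, y, centered=True):
    """d(x, y) = 1 - s(x, y)."""
    return 1.0 - pearson_similarity(x, y, centered)

def euclidean_distance(x, y):
    """Mean squared difference, as read from the slide (no square root)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

up = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]
down = [3.0, 2.0, 1.0]

print(pearson_similarity(up, scaled))          # same shape, different scale
print(pearson_distance(up, down))              # opposite trends
print(pearson_similarity(up, down, centered=False))
print(euclidean_distance(up, down))
```

The example profiles show the main contrast: Pearson similarity ignores scale and responds to shape, while the Euclidean measure responds to the absolute differences.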

slide-23
SLIDE 23

Comparing all measurements for two genes

[Scatter plot: Comparing two expression profiles (r = 0.97). Axes: TLC1 log2 relative expression, YFG1 log2 relative expression; both range from −5 to 5.]

slide-24
SLIDE 24

Comparing all genes for two measurements

[Scatter plot. Axes: Array 1, log2 relative expression; Array 2, log2 relative expression.]

slide-25
SLIDE 25

Comparing all genes for two measurements

[Scatter plot annotated "Euclidean Distance". Axes: Array 1, log2 relative expression; Array 2, log2 relative expression.]

slide-26
SLIDE 26

Comparing all genes for two measurements

[Scatter plot annotated "Uncentered Pearson". Axes: Array 1, log2 relative expression; Array 2, log2 relative expression.]

slide-27
SLIDE 27

Measure all pairwise distances under distance metric

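Measuring all pairwise distances amounts to filling in a square matrix, one entry per pair of profiles. A minimal sketch follows, assuming the metric is symmetric and that the distance from a profile to itself is 0, so only the upper triangle is computed. The names pairwise_distances and mean_squared_difference are illustrative; any function taking two profiles and returning a number could be passed as the metric.

```python
def pairwise_distances(profiles, metric):
    """Return an n x n matrix (list of lists) of distances between profiles.

    Assumes metric(a, b) == metric(b, a) and leaves the diagonal at 0.0.
    """
    n = len(profiles)
    distances = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):          # upper triangle only
            d = metric(profiles[i], profiles[j])
            distances[i][j] = d
            distances[j][i] = d            # mirror into the lower triangle
    return distances

def mean_squared_difference(x, y):
    """Example metric: mean of squared differences between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

profiles = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]   # made-up toy data
matrix = pairwise_distances(profiles, mean_squared_difference)
for row in matrix:
    print(row)
```

For n profiles this makes n(n-1)/2 metric calls, which grows quickly; for a few thousand genes it is still manageable in plain Python but noticeably slow.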

slide-28
SLIDE 28

Homework

1. Install biopython via Canopy (or whatever you’re using). If this doesn’t work, install Cluster3.
2. Write a function to calculate all pairwise Pearson correlations for the yeast expression profiles.
3. Save the results of your pairwise correlation calculation in the CDT format described in the JavaTreeView manual.
4. Read PNAS 95:14863.
5. Try the first two problems, replacing the Pearson correlation with the distance metric from the PNAS paper or with one of the distance metrics from the Cluster3 manual.