
Biweight Correlation as a Measure of Distance between Genes on a Microarray



  1. Biweight Correlation as a Measure of Distance between Genes on a Microarray. Aya Mitani, Pitzer College ’06. Advisor: Professor Johanna Hardin, Pomona College. April 29, 2006

  2. About microarrays
     • Small chip
     • Contains thousands of probes
     • Measures mRNA activity in a particular cell type
     • Contains control and treatment samples
     • Expression level is measured from light intensity

  3. [Image-only slide; no text recovered.]

  4. [Image-only slide; no text recovered.]

  5. Problems with microarray data
     • Noisy data
     • Needs robust estimation of correlation
     • Pearson correlation is often used
       - One outlier can greatly affect the correlation
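The outlier problem can be seen in a few lines. The sketch below uses made-up expression values for two hypothetical genes (`gene_a`, `gene_b`); it only illustrates how a single corrupted measurement can change the Pearson correlation, and is not taken from the data in the talk.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up expression values for two genes that move together.
gene_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
gene_b = [0.12, 0.18, 0.33, 0.41, 0.48, 0.62, 0.69, 0.83]
clean_r = pearson(gene_a, gene_b)

# Corrupt a single measurement of gene_b and recompute.
noisy_b = gene_b[:-1] + [-3.0]
noisy_r = pearson(gene_a, noisy_b)

print(f"clean: {clean_r:.3f}, with one outlier: {noisy_r:.3f}")
```

With these values the clean correlation is close to 1, while the single outlier is enough to turn the Pearson correlation negative.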

  6. Last summer
     M-estimation: a weighted average, with points farther from the center given less weight
     \[ d_i = \sqrt{(x_i - \tilde{\mu})' \, \tilde{\Sigma}^{-1} (x_i - \tilde{\mu})} \tag{1} \]
     \[ \tilde{\mu} = \frac{\sum_i w(d_i)\, x_i}{\sum_i w(d_i)} \tag{2} \]
     \[ \tilde{\Sigma} = \frac{\sum_i w(d_i)\,(x_i - \tilde{\mu})(x_i - \tilde{\mu})'}{\sum_i w(d_i)} \tag{3} \]
     Tukey’s biweight
     \[ w(d_i) = \begin{cases} d_i \left(1 - (d_i/c)^2\right)^2 & d_i \le c \\ 0 & d_i > c \end{cases} \]
     Use the Minimum Covariance Determinant (MCD) for the initial estimates of $\mu$ and $\Sigma$
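To make the distance and the weight concrete, here is a minimal sketch for two-dimensional points (one gene pair). Several things are assumptions rather than details from the slides: the function names, the cutoff `c = 4.0`, the starting center and covariance, and the weight form `(1 - (d/c)^2)^2`, which is the common weight version of Tukey's biweight and gives full weight at the center. The slide's expression carries an extra leading factor `d_i`, so treat the form used here as one possible reading.

```python
import math

def mahalanobis_2d(point, mu, sigma):
    """Distance d_i of equation (1) for a 2-D point, given mu and a 2x2 sigma."""
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    (a, b), (_, d) = sigma          # sigma is symmetric: ((a, b), (b, d))
    det = a * d - b * b
    # (dx, dy) * inverse(sigma) * (dx, dy)', written out for the 2x2 case
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

def biweight_weight(d, c=4.0):
    """Assumed weight form: (1 - (d/c)^2)^2 up to the cutoff c, 0 beyond it."""
    if d > c:
        return 0.0
    return (1.0 - (d / c) ** 2) ** 2

mu = (0.0, 0.0)                      # hypothetical current center
sigma = ((1.0, 0.5), (0.5, 1.0))     # hypothetical current covariance
points = [(0.0, 0.0), (1.0, 1.0), (3.0, -3.0)]

weights = [biweight_weight(mahalanobis_2d(p, mu, sigma)) for p in points]
for p, w in zip(points, weights):
    print(p, round(w, 4))
```

The point at the center keeps weight 1, the moderately distant point is downweighted, and the point that runs against the correlation structure lies beyond the cutoff and gets weight 0. Equations (2) and (3) then average the data with these weights.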

  7. Plot of the biweight weight function (w). [Figure: weight, from 0.0 to 1.0, plotted against distance, from 0 to 4.]

  8. Biweight Correlation Coefficient
     \[ \mathrm{bwc}_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}\,\sigma_{kk}}} \]
     where $\sigma_{jk}$ is the biweight estimate of the covariance of gene j and gene k, and $\sigma_{jj}$ is the biweight estimate of the variance of gene j.
     Goal: measure the correlation (similarity or difference) between two genes.
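As a worked illustration with made-up numbers (not estimates from the data), suppose the biweight estimates for one gene pair were $\sigma_{jk} = 0.3$, $\sigma_{jj} = 0.25$ and $\sigma_{kk} = 1.0$. Dividing the covariance by the square root of the product of the two variances gives:

```latex
\[
\mathrm{bwc}_{jk}
  = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}\,\sigma_{kk}}}
  = \frac{0.3}{\sqrt{0.25 \times 1.0}}
  = \frac{0.3}{0.5}
  = 0.6
\]
```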

  9. [Figure: scatterplot of Pearson correlation against biweight correlation.]

  10. [Figure: two gene-pair scatterplots; axis labels Gene 86, Gene 11, Gene 14, Gene 26.]

  11. Further work to be done
      • Computational time
      • Biweight correlation on clean data

  12. This Spring
      • Matrix correlation vs. pair-by-pair correlation
      • One-step M-estimation
      • Median vs. MCD
      • Is biweight correlation good for clean data?

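  13. Instead of computing the correlation pair by pair, compute the whole correlation matrix simultaneously from the biweight covariance matrix
      \[ d_i = \sqrt{(x_i - \tilde{\mu})' \, \tilde{\Sigma}^{-1} (x_i - \tilde{\mu})} \tag{4} \]
      \[ \tilde{\mu} = \frac{\sum_i w(d_i)\, x_i}{\sum_i w(d_i)} \tag{5} \]
      \[ \tilde{\Sigma} = \frac{\sum_i w(d_i)\,(x_i - \tilde{\mu})(x_i - \tilde{\mu})'}{\sum_i w(d_i)} \tag{6} \]
      \[
      \begin{pmatrix}
      \mathrm{mat.bwc}_{11} & \cdots & \mathrm{mat.bwc}_{1n} \\
      \mathrm{mat.bwc}_{21} & \cdots & \mathrm{mat.bwc}_{2n} \\
      \vdots & \ddots & \vdots \\
      \mathrm{mat.bwc}_{n1} & \cdots & \mathrm{mat.bwc}_{nn}
      \end{pmatrix}
      =
      \begin{pmatrix}
      \sqrt{\sigma_{11}} & \cdots & 0 \\
      \vdots & \ddots & \vdots \\
      0 & \cdots & \sqrt{\sigma_{nn}}
      \end{pmatrix}^{-1}
      \begin{pmatrix}
      \sigma_{11} & \cdots & \sigma_{1n} \\
      \sigma_{21} & \cdots & \sigma_{2n} \\
      \vdots & \ddots & \vdots \\
      \sigma_{n1} & \cdots & \sigma_{nn}
      \end{pmatrix}
      \begin{pmatrix}
      \sqrt{\sigma_{11}} & \cdots & 0 \\
      \vdots & \ddots & \vdots \\
      0 & \cdots & \sqrt{\sigma_{nn}}
      \end{pmatrix}^{-1}
      \]
      Is $\mathrm{mat.bwc}_{jk} = \mathrm{bwc}_{jk}$ ???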
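The scaling step on this slide can be sketched as follows. The covariance matrix is made up and simply stands in for a biweight estimate, and `cov_to_corr` is a hypothetical helper name. Multiplying on both sides by the inverse of the diagonal matrix of standard deviations is the same as dividing entry (j, k) by the product of the two standard deviations.

```python
import math

def cov_to_corr(cov):
    """Scale a covariance matrix to a correlation matrix.

    Equivalent to D^{-1} * cov * D^{-1}, where D is the diagonal matrix
    holding the square roots of the variances."""
    n = len(cov)
    sd = [math.sqrt(cov[j][j]) for j in range(n)]
    return [[cov[j][k] / (sd[j] * sd[k]) for k in range(n)] for j in range(n)]

# Made-up 3x3 covariance matrix standing in for a biweight estimate.
cov = [[1.00, 0.60, 0.20],
       [0.60, 0.64, 0.10],
       [0.20, 0.10, 0.25]]

corr = cov_to_corr(cov)
for row in corr:
    print([round(v, 3) for v in row])
```

The scaling itself is entry-by-entry the pairwise formula, so any gap between the matrix and pair-by-pair results would have to come from the covariance estimate: when all genes are estimated jointly, the distances and therefore the weights are computed in the full dimension rather than for each pair separately.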

  14. [Figure, 10 genes: pair-by-pair correlation plotted against matrix correlation.]

  15. One-step M-estimation
      [Figure, 20 genes: converged estimate plotted against one-step estimate.]
      M-estimation run to convergence took 10 to 25 iterations on average (11 seconds to compute the 190 pairs of genes)
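The 190 pairs are the distinct pairs among 20 genes. The sketch below checks that count and, purely as a rough illustration, scales the 11-second timing to larger gene sets. The larger sizes are hypothetical, and the extrapolation assumes the cost grows in proportion to the number of pairs, which the slides do not state.

```python
from math import comb

pairs_20 = comb(20, 2)             # distinct gene pairs among 20 genes: 190
seconds_per_pair = 11 / pairs_20   # from the 11-second timing for 190 pairs

# Hypothetical gene counts; assumes time is proportional to the number of pairs.
for n_genes in (20, 100, 1000):
    n_pairs = comb(n_genes, 2)
    minutes = n_pairs * seconds_per_pair / 60
    print(f"{n_genes} genes: {n_pairs} pairs, about {minutes:.1f} minutes")
```

Because the number of pairs grows quadratically with the number of genes, shortening the per-pair cost matters for a chip with thousands of probes.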

  16. Few-step
      [Figure, 20 genes, two panels: converged estimate against 3-step estimate (3.5 seconds) and against 5-step estimate (5.5 seconds).]

  17. [Figure: scatterplot of Gene 18 against Gene 11.]

  18. 10-step
      [Figure, 20 genes: converged estimate against 10-step estimate (8 seconds).]

  19. Median instead of MCD
      • Median for $\tilde{\mu}$
      • Median absolute deviation (MAD) for $\tilde{\Sigma}$
      \[ \mathrm{MAD}(X) = \mathrm{median}\,\lvert x_i - \mathrm{median}(x_i) \rvert \]
      If run to convergence → no difference
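The MAD as defined on this slide takes only a few lines. This is a minimal sketch with made-up values; `mad` is a hypothetical helper name, and no consistency factor is applied because the slide does not mention one.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: median of |x_i - median(x)|."""
    center = median(values)
    return median([abs(v - center) for v in values])

# Made-up values with one gross outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
print(median(values), mad(values))
```

Both the median and the MAD ignore the size of the outlier here, which is what makes them usable as inexpensive robust starting values.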

  20. [Figure, 20 genes: MCD converged against median converged (7 seconds).]

  21. Few-step median
      [Figure, 20 genes, two panels: MCD converged against median 3-step (1.5 seconds) and against median 5-step (2.5 seconds).]

  22. [Figure: scatterplot of Gene 7 against Gene 17.]

  23. 10-step median
      [Figure, 20 genes: MCD converged against median 10-step (5 seconds).]

  24. 10-step median and 5-step MCD
      [Figure, 20 genes, two panels: MCD converged against median 10-step (5 seconds); converged against 5-step (5.5 seconds).]

  25. Biweight correlation on clean data
      How biased and how variable is it compared with Pearson correlation?
      [Figure, two panels: Pearson correlation and biweight correlation, each on a 0.5 to 1.0 scale. Values printed with the panels, in order: 0.7636, 0.8482, 0.7850, 0.7523, 0.8541, 0.7945.]

  26. What makes the difference?
      [Figure, multivariate normal data: Pearson correlation against biweight correlation, both on a 0.6 to 1.0 scale.]

  27. [Figure, four scatterplot panels titled bw−pearson=0.1166, bw−pearson=0.1108, bw−pearson=0.0523 and bw−pearson=0.0003; axis labels row11, row16, row2, row2 (top row) and row17, row15, row6, row5 (bottom row).]

  28. Concluding remarks
      • Biweight correlation is unbiased and about as variable as Pearson correlation
      • Initializing $\tilde{\mu}$ and $\tilde{\Sigma}$ with the median and median absolute deviation is as robust as initializing with the MCD estimators
      • Initializing $\tilde{\mu}$ and $\tilde{\Sigma}$ with the median and median absolute deviation is faster than using the MCD estimators
      • Depending on how robust the result needs to be, computation time can be shortened by limiting the number of iterations
        - Generally, 5 or more iterations are recommended
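The recipe in these remarks (median and MAD starting values, then a fixed number of reweighting steps) can be sketched for a single gene pair as below. This is an illustrative sketch, not the code used for the talk. The function names, the cutoff `c = 4.0`, the weight form `(1 - (d/c)^2)^2`, the diagonal starting covariance built from squared MADs, and the data are all assumptions.

```python
import math
from statistics import median

def biweight_weight(d, c):
    """Assumed weight form: (1 - (d/c)^2)^2 up to the cutoff c, 0 beyond it."""
    return (1.0 - (d / c) ** 2) ** 2 if d <= c else 0.0

def few_step_biweight_corr(x, y, steps=5, c=4.0):
    """Correlation of x and y after a fixed number of reweighting steps,
    started from the median and MAD instead of the MCD."""
    # Starting values: medians for the center, squared MADs on the diagonal.
    mx, my = median(x), median(y)
    sxx = median([abs(v - mx) for v in x]) ** 2
    syy = median([abs(v - my) for v in y]) ** 2
    sxy = 0.0
    for _ in range(steps):
        det = sxx * syy - sxy * sxy
        # Weight each point by its Mahalanobis distance under current estimates.
        w = []
        for a, b in zip(x, y):
            dx, dy = a - mx, b - my
            q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
            w.append(biweight_weight(math.sqrt(max(q, 0.0)), c))
        total = sum(w)
        # Weighted mean, then weighted covariance about the new mean.
        mx = sum(wi * a for wi, a in zip(w, x)) / total
        my = sum(wi * b for wi, b in zip(w, y)) / total
        sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, x)) / total
        syy = sum(wi * (b - my) ** 2 for wi, b in zip(w, y)) / total
        sxy = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, x, y)) / total
    return sxy / math.sqrt(sxx * syy)

# Made-up expression values; the last measurement of noisy_b is an outlier.
gene_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
noisy_b = [0.12, 0.18, 0.33, 0.41, 0.48, 0.62, 0.69, -3.0]

robust_r = few_step_biweight_corr(gene_a, noisy_b, steps=5)
print(round(robust_r, 3))
```

The `steps` argument is the speed and robustness trade-off described above: fewer steps run faster but may sit further from the converged estimate. The sketch has no safeguards for degenerate inputs, such as a MAD of zero or all weights falling to zero.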
