
Characterization, Modeling, and Simulation of Mouse Microarray Data - PowerPoint PPT Presentation

Characterization, Modeling, and Simulation of Mouse Microarray Data. David S. Lalush, Bioinformatics Research Center, North Carolina State University.


  1. Characterization, Modeling, and Simulation of Mouse Microarray Data. David S. Lalush, Bioinformatics Research Center, North Carolina State University

  2. Acknowledgments • Assistance from: – Jeff Tucker (NIEHS) – Pierre Bushel (NIEHS) – Bruce Weir (NCSU) • Funded by K01 HG02428, National Human Genome Research Institute

  3. Outline • Microarray Simulation Project • Characterization of Microarray Images • Results of Characterization • Simulations • Conclusion

  4. Outline • Microarray Simulation Project • Characterization of Microarray Images • Results of Characterization • Simulations • Conclusion

  5. Microarray in Diagnosis • Type I tumors → Microarray → Gene expression pattern • Type II tumors → Microarray → Gene expression pattern

  6. Microarray in Diagnosis • Unknown tumor → Microarray → Gene expression pattern • Type I or type II? • Probability of misclassification?

  7. Research Focus • Evaluating classification methods • Studying variability in microarray data Problems: • Many replications are required to evaluate error rates. • Microarray experiments are expensive. • True patterns are unknown in real data.

  8. Microarray Simulation • Creating a realistic simulation of microarray data • Accounting for various sources of variability in the system Advantages: • Generates many replications cheaply. • True patterns are known. • Can control sources of variability.

  9. Microarray System • Slide Printing • Sample Preparation • Hybridization • Scanning • Image Processing • Data Analysis

  10. Simulation Model (diagram): Sample, Slide, and Pin → Array Printing and Hybridization → Scanning

  11. Simulation Model: Sample • Gene expression variation modeled as multivariate normal • Global expression variations modeled as normal

  12. Simulation Model: Slide • Background level modeled as normal (dye-dependent) • Defects modeled as 2D causal Markov random field

  13. Simulation Model: Pin • Spot size, shape, and orientation modeled as normal • Spot defects modeled with 2D causal Markov random field

  14. Simulation Model: Array Printing and Hybridization • Instantiates spots based on properties from sample, slide, and pin

  15. Simulation Model: Scanning • Creates discretized image based on spots, SNR, gain, resolution, and blur parameters
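The scanning stage can be illustrated with a minimal sketch. It is not the presenter's implementation: the function name, the parameter names, the 3x3 mean filter used as blur, and the additive Gaussian noise with sigma = spot level / SNR are all assumptions made for illustration.

```python
import math
import random

def render_spot_image(size=32, center=(16.0, 16.0), radius=6.0,
                      spot_level=1000.0, background=100.0,
                      gain=1.0, snr=10.0, blur_passes=1, seed=0):
    """Render one circular spot on a size x size pixel grid (hypothetical parameters)."""
    rng = random.Random(seed)
    cy, cx = center
    # Ideal image: spot_level inside the circle, background level outside.
    img = [[spot_level if math.hypot(y - cy, x - cx) <= radius else background
            for x in range(size)] for y in range(size)]
    # Blur: a repeated 3x3 mean filter stands in for the scanner blur (assumption).
    for _ in range(blur_passes):
        out = [[0.0] * size for _ in range(size)]
        for y in range(size):
            for x in range(size):
                vals = [img[j][i]
                        for j in range(max(0, y - 1), min(size, y + 2))
                        for i in range(max(0, x - 1), min(size, x + 2))]
                out[y][x] = sum(vals) / len(vals)
        img = out
    # Additive Gaussian noise, then gain and quantization to non-negative integers.
    sigma = spot_level / snr
    return [[max(0, round(gain * (v + rng.gauss(0.0, sigma)))) for v in row]
            for row in img]
```

Calling `render_spot_image(snr=5.0)` gives a noisier image of the same spot; a fuller simulation would place many spots and draw their properties from the sample, slide, and pin models.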

  16. Characterization • Characterization of existing microarray images – Spot properties (size, shape, uniformity) – Pin properties (spot uniformity) – Slide properties (background, signal-to-noise) – Gene properties (mean, variance, covariance)

  17. Outline • Microarray Simulation Project • Characterization of Microarray Images • Results of Characterization • Simulations • Conclusion

  18. Characterization • Characterization of mouse kidney dataset – Six mice – Four slides each (2x2 fluor flip) – 24 slides in all – 5520 spots in 16 blocks, 4x4 block pattern

  19. Characterization of Spots • Step 1: Spot Detection

  20. Characterization of Spots • Step 2: Spot Morphology Measures – Cast rays from centroid – Radius – Area – Eccentricity
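A rough sketch of ray casting from the centroid on a binary spot mask follows. The slide does not define how eccentricity is computed, so the definition used here (an ellipse whose semi-axes are the longest and shortest rays) is an assumption, as are the function name, ray count, and step size.

```python
import math

def spot_morphology(mask, n_rays=36, step=0.25):
    """mask: 2D list of 0/1 marking spot pixels (assumed non-empty).
    Returns (mean_radius, area, eccentricity)."""
    pixels = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(pixels)
    cy = sum(y for y, _ in pixels) / area
    cx = sum(x for _, x in pixels) / area
    radii = []
    for k in range(n_rays):
        theta = 2.0 * math.pi * k / n_rays
        r = 0.0
        # March outward from the centroid until the ray leaves the spot.
        while True:
            y = int(round(cy + (r + step) * math.sin(theta)))
            x = int(round(cx + (r + step) * math.cos(theta)))
            inside = 0 <= y < len(mask) and 0 <= x < len(mask[0])
            if not inside or not mask[y][x]:
                break
            r += step
        radii.append(r)
    mean_radius = sum(radii) / n_rays
    # Assumed eccentricity: ellipse with semi-axes max(radii) and min(radii).
    a, b = max(radii), min(radii)
    ecc = math.sqrt(1.0 - (b / a) ** 2) if a > 0 else 0.0
    return mean_radius, area, ecc
```

For a roughly circular spot the eccentricity stays near zero, while a broken or elongated spot pushes it toward one.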

  21. Characterization of Spots • Step 3: Spot Intensity Measures – Mean and standard deviation of spot pixels – Mean and standard deviation of background pixels

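  22. Characterization of Spots • Step 4: Secondary Intensity Measures – Separability: $\text{Separability} = \dfrac{\overline{\text{signal}} - \overline{\text{background}}}{\sqrt{\sigma_{\text{signal}}^2 + \sigma_{\text{background}}^2}}$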

  23. Characterization of Spots • Step 4: Secondary Intensity Measures – Spot Uniformity: $\text{Uniformity} = \dfrac{\sigma_{\text{signal}}}{\overline{\text{signal}}}$
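The two secondary measures can be computed from the Step 3 statistics as sketched below. This assumes separability is the difference of the signal and background means divided by the square root of the summed variances, and that uniformity is the signal standard deviation divided by the signal mean; the function name and the use of population standard deviation are illustrative choices.

```python
import math
import statistics

def spot_intensity_measures(signal_pixels, background_pixels):
    """Return (separability, uniformity) for one spot in one channel."""
    mu_s = statistics.fmean(signal_pixels)
    mu_b = statistics.fmean(background_pixels)
    sd_s = statistics.pstdev(signal_pixels)
    sd_b = statistics.pstdev(background_pixels)
    # Assumed form: mean difference over the pooled standard deviation.
    separability = (mu_s - mu_b) / math.sqrt(sd_s ** 2 + sd_b ** 2)
    # Coefficient of variation of the spot pixels.
    uniformity = sd_s / mu_s
    return separability, uniformity
```

Under this reading, separability is dimensionless, which fits a fixed cutoff such as separability > 1.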

  24. Characterization of Spot Defects • Spots often exhibit characteristic nonuniformities – Low center – Spot breaks

  25. Characterization of Spot Defects Consider each spot to have two regions: • Normal region • Defect region

  26. Characterization of Spot Defects Each region acts as a hidden state. Each state has its own distribution of emitted intensities. • State 0: N (normal region) • State 1: D (defect region)

  27. Characterization of Spot Defects The probability of a pixel being in a given state depends on its neighbors. Example: for a pixel X with four neighbors in states N, N, D, D, the model uses P(X | N,N,D,D).

  28. Characterization of Spot Defects Region Model (2D causal MRF): • 16 parameters for state transition • 2 parameters for intensity of D region pixels relative to N region (mean, s.d.) State 0: N State 1: D
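Four binary neighbors give 2^4 = 16 neighbor patterns, which matches the 16 state-transition parameters. The sketch below samples a state map in raster order from such a table. The choice of causal neighborhood (left, upper-left, upper, upper-right) and the treatment of out-of-grid pixels as normal are assumptions; the slides only show that four neighbors are used.

```python
import random

def sample_causal_mrf(height, width, p_defect, seed=0):
    """p_defect: 16 probabilities of state D (1), indexed by the 4-bit neighbor pattern."""
    rng = random.Random(seed)
    state = [[0] * width for _ in range(height)]

    def get(y, x):
        # Pixels outside the grid are treated as normal (state 0).
        return state[y][x] if 0 <= y < height and 0 <= x < width else 0

    for y in range(height):
        for x in range(width):
            # Assumed causal neighborhood: left, upper-left, upper, upper-right.
            idx = ((get(y, x - 1) << 3) | (get(y - 1, x - 1) << 2)
                   | (get(y - 1, x) << 1) | get(y - 1, x + 1))
            state[y][x] = 1 if rng.random() < p_defect[idx] else 0
    return state
```

Because every neighbor has already been visited in raster order, one pass over the grid is enough; that is what makes a causal field cheap to simulate.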

  29. Characterization of Spot Defects Applying the Region Model Pixel is in D region if: • It is in the spot • It is below the spot average intensity in BOTH channels State 0: N State 1: D

  30. Characterization of Spot Defects Applying the Region Model • Smooth region boundary • Compute the 18 parameters for each spot State 0: N State 1: D
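As a sketch of how the region model might be applied to one spot, the code below labels D-region pixels with the two-channel rule and then estimates the 16 transition probabilities by counting. Boundary smoothing and the two relative-intensity parameters are omitted, and the neighbor ordering and function names are assumptions rather than the presenter's method.

```python
def defect_mask(ch1, ch2, spot_mask):
    """D region (1) = spot pixels below the spot mean in BOTH channels; otherwise N (0)."""
    inside = [(y, x) for y, row in enumerate(spot_mask) for x, v in enumerate(row) if v]
    m1 = sum(ch1[y][x] for y, x in inside) / len(inside)
    m2 = sum(ch2[y][x] for y, x in inside) / len(inside)
    return [[1 if spot_mask[y][x] and ch1[y][x] < m1 and ch2[y][x] < m2 else 0
             for x in range(len(row))] for y, row in enumerate(spot_mask)]

def estimate_transitions(state, spot_mask):
    """Estimate P(D | neighbor pattern) for the 16 patterns by counting within the spot."""
    h, w = len(state), len(state[0])

    def get(y, x):
        # Pixels outside the grid are treated as normal (state 0).
        return state[y][x] if 0 <= y < h and 0 <= x < w else 0

    hits, totals = [0] * 16, [0] * 16
    for y in range(h):
        for x in range(w):
            if not spot_mask[y][x]:
                continue
            # Assumed neighbor order: left, upper-left, upper, upper-right.
            idx = ((get(y, x - 1) << 3) | (get(y - 1, x - 1) << 2)
                   | (get(y - 1, x) << 1) | get(y - 1, x + 1))
            totals[idx] += 1
            hits[idx] += state[y][x]
    # Patterns that never occur are reported as None rather than guessed.
    return [hits[i] / totals[i] if totals[i] else None for i in range(16)]
```

Small spots leave many of the 16 patterns unobserved, so pooling counts across spots from the same pin may be needed in practice.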

  31. Characterization of Background • Base level and variation – Modeled as stationary across slide • Background defects – Marks, scratches, bright spots, other features – Modeled with 2D Markov random field

  32. Characterization of Background • Classify all background pixels as normal or defect – Defect is 2σ above background mean • Compute statistics on normal background • Apply 2D MRF to model defect state – Similar to region model – Intensities are modeled as beta distribution • Measures taken only by slide [Plot: Probability vs. Relative Defect Intensity]
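The first two background steps can be sketched as follows. The threshold is taken here as the mean plus two standard deviations computed over all background pixels; whether the original analysis computed the mean and σ this way is an assumption, and the function and key names are hypothetical.

```python
import statistics

def characterize_background(pixels, k=2.0):
    """Split background pixels into normal and defect, then summarize the normal ones."""
    mean = statistics.fmean(pixels)
    sd = statistics.pstdev(pixels)
    threshold = mean + k * sd  # defect = more than k sigma above the mean
    normal = [p for p in pixels if p <= threshold]
    defect = [p for p in pixels if p > threshold]
    return {
        "normal_mean": statistics.fmean(normal),
        "normal_sd": statistics.pstdev(normal),
        "defect_fraction": len(defect) / len(pixels),
    }
```

The resulting defect labels would then feed the same kind of neighbor-pattern counting used for the spot region model.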

  33. Characterization of Gene Expression • Multivariate normal distribution for each sample (test or reference) – Mean vector – Covariance matrix • Linear model to account for global effects from slide to slide and dye effects Sample = (mean gene expression) + slope * (slide perturbation) + (variable expression)
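A minimal sketch of drawing one sample from the linear model is shown below. It simplifies the multivariate normal term to independent per-gene normals (a diagonal covariance), and it assumes a per-gene slope and one scalar perturbation per slide; none of these details are stated on the slide.

```python
import random

def simulate_sample(mean_expr, slope, gene_sd, perturbation_sd=1.0, seed=0):
    """Sample = mean + slope * (slide perturbation) + (variable expression), per gene."""
    rng = random.Random(seed)
    # One global perturbation per slide (assumed scalar and normally distributed).
    slide_perturbation = rng.gauss(0.0, perturbation_sd)
    # Variable expression drawn independently per gene (diagonal-covariance simplification).
    return [m + s * slide_perturbation + rng.gauss(0.0, sd)
            for m, s, sd in zip(mean_expr, slope, gene_sd)]
```

Replacing the independent draws with correlated draws within gene clusters would bring the sketch closer to the multivariate normal model described here.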

  34. Characterization of Gene Expression • Problem: Covariance matrix is BIG (5200x5200) – In simulation, we will have to diagonalize it. • Model the most significant correlations – Compute correlations between each pair of genes on each slide – Cluster genes by correlation distance – Each gene in a cluster has greater than .48 absolute correlation with every other gene in the cluster
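The cluster property (every pair of genes in a cluster has absolute correlation above 0.48) corresponds to a complete-linkage criterion. The greedy pass below is one simple way to obtain clusters with that property; it is an illustrative stand-in, not necessarily the clustering procedure used in the study, and the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cluster_by_correlation(profiles, threshold=0.48):
    """profiles[g] = expression of gene g across slides. Returns lists of gene indices."""
    clusters = []
    for g, prof in enumerate(profiles):
        for members in clusters:
            # Join only if correlated above threshold with EVERY current member.
            if all(abs(pearson(prof, profiles[m])) > threshold for m in members):
                members.append(g)
                break
        else:
            clusters.append([g])
    return clusters
```

With clusters in hand, the covariance matrix becomes block diagonal, so each block can be handled separately instead of working with the full matrix.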

  35. Analyzing Characterization Data • Two-way ANOVA – By slide (fixed) – By pin (random) • Which properties varied more? – By slide – By pin – By spot
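To see how variation can be apportioned among slide, pin, and spot, the sketch below splits the total sum of squares of a balanced slide-by-pin layout into slide, pin, interaction, and within-cell (spot) parts. It reports plain sum-of-squares shares; a mixed model with pin as a random effect would estimate variance components differently, so treat this as an approximation with hypothetical names.

```python
def variance_shares(data):
    """data[slide][pin] = list of spot measurements (balanced design assumed)."""
    n_s, n_p, n_r = len(data), len(data[0]), len(data[0][0])
    grand = sum(v for s in data for cell in s for v in cell) / (n_s * n_p * n_r)
    slide_means = [sum(v for cell in s for v in cell) / (n_p * n_r) for s in data]
    pin_means = [sum(v for s in data for v in s[p]) / (n_s * n_r) for p in range(n_p)]
    cell_means = [[sum(cell) / n_r for cell in s] for s in data]
    ss_slide = n_p * n_r * sum((m - grand) ** 2 for m in slide_means)
    ss_pin = n_s * n_r * sum((m - grand) ** 2 for m in pin_means)
    ss_inter = n_r * sum(
        (cell_means[i][j] - slide_means[i] - pin_means[j] + grand) ** 2
        for i in range(n_s) for j in range(n_p))
    # Within-cell scatter is the spot-to-spot variation.
    ss_spot = sum((v - cell_means[i][j]) ** 2
                  for i in range(n_s) for j in range(n_p) for v in data[i][j])
    total = ss_slide + ss_pin + ss_inter + ss_spot
    return {"slide": ss_slide / total, "pin": ss_pin / total,
            "slide x pin": ss_inter / total, "spot": ss_spot / total}
```

A small slide-by-pin share in such a table is the kind of evidence that supports modeling slide and pin effects separately.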

  36. Analyzing Characterization Data • Spot morphology measures • Spot secondary intensity measures • Spot defect model parameters • Background defect model parameters (measured by slide only, so no ANOVA) Note: only spots with separability > 1 were used in the ANOVA.

  37. Outline • Microarray Simulation Project • Characterization of Microarray Images • Results of Characterization • Simulations • Conclusion

  38. Results Sometimes the images have their own story to tell.

  39. Results: Spot Morphology • Most variation (75% for size measures) was attributed to variation by spot • Pins behaved similarly (mostly) • Slides showed some differences in last eight slides (mice five and six)

  40. Results: Spot Morphology [Plot: Spot size vs. Pin Number; y-axis: Radius (pixels); x-axis: Pin Number]

  41. Results: Spot Morphology [Plot: Spot size vs. Slide Number; y-axis: Radius (pixels); x-axis: Slide Number, grouped by Mouse 1 through Mouse 6]

  42. Results: Spot Intensities • Most variation in separability (83-90%) was attributed to variation by spot • Spot uniformity varied considerably by slide, mostly due to last eight slides

  43. Results: Spot Intensities [Plot: Spot uniformity (532 nm) vs. Slide Number; y-axis: Uniformity (532 nm); x-axis: Slide Number, grouped by Mouse 1 through Mouse 6]

  44. Results: Spot Defect MRF • The 16 region transition probability parameters varied by pin – Model the MRF as a property of a pin, not a slide • The mean intensity of defect region was strongly dependent on the pin. • Mean intensity of defect region varied considerably by slide.

  45. Results: Spot Defect MRF [Plot: Defect region intensity vs. Slide Number; y-axis: Low region mean intensity relative to normal region mean; x-axis: Slide Number, grouped by Mouse 1 through Mouse 6]

  46. Results: Background MRF • Last eight slides had more intense background defects • Last eight also had higher probabilities of generating a defect

  47. Results: Background MRF [Plot: Background defect intensity vs. Slide Number; y-axis: Intensity of Background Defects Relative to Background Mean; x-axis: Slide Number, grouped by Mouse 1 through Mouse 6]

  48. Results: General • Slide-pin interactions were small (<5% of variance in all cases) • Therefore, modeling of slide and pin effects separately is justified.

  49. Results: Summary • Characterization shows differences in the properties of slides for mice five and six: – Spots were more likely to be broken. – Spot breaks were more severe. – Background defects were more numerous. – Background defects were more intense. Did this impact the estimated mouse-to-mouse variation?
