Quantification of cross hybridization on oligonucleotide microarrays
Li Zhang, Dept. of Biostatistics, UT MDACC, Houston, TX 77030
DNA/RNA duplex on oligonucleotide microarrays
The probe is a 25-mer DNA oligo:
  ATCAGCATACGA C AGAATGATGGAT

    ATCAGCATACGAGAGAATGATGGAT
    |||||||||||||||||||||||||
  AAUAGUCGUAUGCUCUCUUACUACCUAGC
cRNA fragment in solution, expressed from a targeted gene
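The pairing drawn on this slide can be checked mechanically. The sketch below slides a probe along the cRNA fragment and counts positions that do not form a Watson-Crick pair. It assumes the first sequence on the slide (with the isolated middle C) is the mismatch partner of the second; the names `PM`, `MM`, `mismatches`, and `best_alignment` are illustrative and not part of the original material.

```python
# Watson-Crick partner (RNA base) for each DNA probe base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def mismatches(probe: str, target: str) -> int:
    """Count positions where the RNA target does not pair with the DNA probe.

    Both strings are read position by position, as drawn on the slide.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target window must have equal length")
    return sum(DNA_TO_RNA[p] != t for p, t in zip(probe, target))

def best_alignment(probe: str, fragment: str) -> tuple[int, int]:
    """Slide the probe along the fragment; return (offset, mismatch count)
    for the window with the fewest mismatches (first one on ties)."""
    best = None
    for offset in range(len(fragment) - len(probe) + 1):
        window = fragment[offset:offset + len(probe)]
        count = mismatches(probe, window)
        if best is None or count < best[1]:
            best = (offset, count)
    return best

# Sequences copied from the slide; the PM/MM labels are an assumption.
PM = "ATCAGCATACGAGAGAATGATGGAT"
MM = "ATCAGCATACGACAGAATGATGGAT"
CRNA = "AAUAGUCGUAUGCUCUCUUACUACCUAGC"

pm_hit = best_alignment(PM, CRNA)
mm_hit = best_alignment(MM, CRNA)
print(pm_hit, mm_hit)
```

Under that assumption, the second sequence pairs with the fragment without any mismatch, while the first differs only at the middle base.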
Modes of binding on probes
1. Gene-specific binding (mismatches = 0)
2. Cross hybridization:
   (I) Non-specific binding (mismatches > 5)
   (II) Binding of related sequences (0 < mismatches < 5)
Binding energies
Binding energy = f(distance, interacting partners)
Gene-specific binding energy: $E = \sum_i \omega_i \, \varepsilon(b_i, b_{i+1})$
Non-specific binding energy: $E^* = \sum_i \omega_i^* \, \varepsilon^*(b_i, b_{i+1})$
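As a minimal sketch of the weighted nearest-neighbor sum above: each adjacent base pair (b_i, b_{i+1}) contributes a stacking energy scaled by a position weight. The weights and stacking energies below are made-up placeholders chosen only so the arithmetic is easy to follow; they are not fitted values from this work, and `binding_energy` is a hypothetical helper name.

```python
from itertools import product

BASES = "ACGT"

def binding_energy(probe, weights, stacking):
    """E = sum_i w_i * eps(b_i, b_{i+1}) over adjacent bases of the probe."""
    if len(weights) != len(probe) - 1:
        raise ValueError("need exactly one weight per adjacent base pair")
    return sum(w * stacking[probe[i:i + 2]] for i, w in enumerate(weights))

# Placeholder parameters (assumptions, for illustration only):
# every dinucleotide gets -1.0 except "GC", which gets -2.0.
stacking = {a + b: -1.0 for a, b in product(BASES, repeat=2)}
stacking["GC"] = -2.0

probe = "ATCAGCATACGAGAGAATGATGGAT"   # 25-mer from the duplex slide
weights = [1.0] * (len(probe) - 1)    # uniform position weights

energy = binding_energy(probe, weights, stacking)
print(energy)
```

The same function would serve for the non-specific energy E* by passing the starred weights and stacking energies instead.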
Thermodynamic model of binding on a probe
Probe signal: $I_{ij} = \frac{N_j}{1 + e^{E_{ij}}} + \frac{N^*}{1 + e^{E^*_{ij}}} + B$
Fitness: $T = \sum_{ij} \left( \ln I_{ij} - \ln I_{ij,\mathrm{obs}} \right)^2$
Constraints:
• N*, B are the same on a microarray;
• N_j is the same in a probe set.
Minimization of T:
• Energy parameters
• B, N*, N_j
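The signal model and the fitness function can be written down directly. This is a sketch under stated assumptions: energies are taken in units of k_B T, the function and argument names are illustrative, and the numbers in the usage lines are arbitrary. The actual minimization over the energy parameters, B, N*, and N_j is not shown.

```python
import math

def probe_signal(n_gene, e_gene, n_ns, e_ns, background):
    """I = N_j / (1 + e^E) + N* / (1 + e^E*) + B for a single probe."""
    gene_specific = n_gene / (1.0 + math.exp(e_gene))
    non_specific = n_ns / (1.0 + math.exp(e_ns))
    return gene_specific + non_specific + background

def fitness(predicted, observed):
    """T = sum of squared differences between log predicted and log observed."""
    return sum((math.log(p) - math.log(o)) ** 2
               for p, o in zip(predicted, observed))

# Arbitrary illustrative values (assumptions, not fitted parameters).
signal = probe_signal(n_gene=100.0, e_gene=0.0, n_ns=50.0, e_ns=0.0,
                      background=10.0)
perfect_fit = fitness([signal], [signal])
print(signal, perfect_fit)
```

Working in log space means that a probe predicted at twice its observed intensity is penalized the same whether the probe is bright or dim.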
Weight factors reflect dynamic properties of binding on the probes
[Figure: weight factors for non-specific binding (PM & MM), gene-specific binding (PM), and gene-specific binding (MM)]
Stacking energy of base-pairs
Fitting the model
[Figure: ln(signal) vs. probe index]
$I_{ij} = \frac{N_j}{1 + e^{E_{ij}}} + \frac{N^*}{1 + e^{E^*_{ij}}} + B$
The baseline of non-specific binding
[Figure: horizontal axis is the non-specific binding energy]
$I_{ij} = \frac{N_j}{1 + e^{E_{ij}}} + \frac{N^*}{1 + e^{E^*_{ij}}} + B$
The effect of mismatch depends on the nearest-neighbors
[Figure: <ln(PM/MM)> and E*(PM) - E*(MM), grouped by the middle 3 bases of the PM probe]
$I_{ij} = \frac{N_j}{1 + e^{E_{ij}}} + \frac{N^*}{1 + e^{E^*_{ij}}} + B$
Effect of cross hybridization on model fitting
Latin-square tests with ‘spiked-in’ genes
Rows: sample 1-14. Columns: gene 1-14.
Sample     1     2     3     4     5     6     7     8     9    10    11    12    13    14
1          0  0.25   0.5     1     2     4     8    16    32    64   128   256   512  1024
2       0.25   0.5     1     2     4     8    16    32    64   128   256   512  1024     0
3        0.5     1     2     4     8    16    32    64   128   256   512  1024     0  0.25
4          1     2     4     8    16    32    64   128   256   512  1024     0  0.25   0.5
5          2     4     8    16    32    64   128   256   512  1024     0  0.25   0.5     1
6          4     8    16    32    64   128   256   512  1024     0  0.25   0.5     1     2
7          8    16    32    64   128   256   512  1024     0  0.25   0.5     1     2     4
8         16    32    64   128   256   512  1024     0  0.25   0.5     1     2     4     8
9         32    64   128   256   512  1024     0  0.25   0.5     1     2     4     8    16
10        64   128   256   512  1024     0  0.25   0.5     1     2     4     8    16    32
11       128   256   512  1024     0  0.25   0.5     1     2     4     8    16    32    64
12       256   512  1024     0  0.25   0.5     1     2     4     8    16    32    64   128
13       512  1024     0  0.25   0.5     1     2     4     8    16    32    64   128   256
14      1024     0  0.25   0.5     1     2     4     8    16    32    64   128   256   512
Data source: Affymetrix Inc.
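The design on this slide is a cyclic Latin square: each sample shifts the list of spike-in levels by one position, so every gene sees every level exactly once across the 14 samples. A short sketch that regenerates it (the function name `latin_square` is illustrative):

```python
# Spike-in levels in the order they appear for sample 1 on the slide.
LEVELS = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

def latin_square(levels):
    """Row s (sample) is the level list rotated left by s positions."""
    n = len(levels)
    return [[levels[(sample + gene) % n] for gene in range(n)]
            for sample in range(n)]

design = latin_square(LEVELS)

# Print the first three samples as a quick visual check.
for sample_index, row in enumerate(design[:3], start=1):
    print(sample_index, row)
```

Because every row and every column contains each level once, fold changes between any two samples are known in advance for all 14 spiked-in genes.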
Accuracy of estimated gene expression levels
Reproducibility of estimated gene expression levels
Correlation of expression levels between batches of experiments
Spotting cross hybridizing probes
Cross hyb probes: unknown EST
Cross hyb probes: interleukin-8 receptor type B
Cross hyb source: salivary alpha-amylase
Cross hyb source: angiotensinogen serine (or cysteine) proteinase inhibitor
The missing 12th gene?
Is the missing gene an alternative splicing variant? Gene name: rTS beta protein
Conclusions
• Probe signals can be decomposed into two modes: gene-specific binding and non-specific binding.
• Sequence dependence of probe signals can be determined by a thermodynamic model.
• For the given data set, the amount of cross hybridization and its sources on the probes can be determined.
Acknowledgements
Ken D Aldape, Ken Hess, Keith A. Baggerly, James Mitchell, Norris Clift, Lianchun Xiao, Kevin R. Coombes
Website for downloading the program Perfect Match http://bioinformatics.mdanderson.org
Clustering crosshyb effects
The misfits happen in the same probe pairs
[Figure: plot of residuals, ln(PM fitted) - ln(PM) vs. ln(MM fitted) - ln(MM)]
Basic questions of microarrays
• How can we determine gene expression levels from probe signals?
• How does probe binding affinity depend on the probe’s sequence?
• How much binding on a probe is due to non-specific binding?
• Why is a mismatch probe signal sometimes stronger than the corresponding perfect match probe signal?
Why a good physical model for microarrays is important
• Recognize erratic probe signals
• Eliminate inefficient probes
• Extract accurate and reliable gene expression levels
Protocol of a microarray experiment
Oligonucleotide microarray technology
Affymetrix array ~ Affinity matrix
Basic features:
• Probe set -- Multiple probes for a gene target
• Probe pair -- Perfect match vs. mismatch
Effects of alternative splicing
Fitting the model with sub-divided probe sets:
Even probes: 2 4 6 8 10 12 14 16
Odd probes: 1 3 5 7 9 11 13 15
1st half probes: 1 2 3 4 5 6 7 8
2nd half probes: 9 10 11 12 13 14 15 16
Alternative splicing:
DNA:   ------------------------
mRNA1: ------------------------
mRNA2: -----------
mRNA3: ------------ ---------
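The four sub-divisions listed on this slide are easy to generate for a probe set of any size. The sketch below only builds the subsets; how they are then used is an assumption on our part: one plausible reading is that the model is fitted separately on each subset and the resulting expression estimates are compared. The function name `subdivide` and the dictionary keys are illustrative.

```python
def subdivide(probe_ids):
    """Return the four probe subsets named on the slide, keyed by label."""
    ids = sorted(probe_ids)
    half = len(ids) // 2
    return {
        "even": [p for p in ids if p % 2 == 0],
        "odd": [p for p in ids if p % 2 == 1],
        "first_half": ids[:half],
        "second_half": ids[half:],
    }

# A 16-probe set, numbered 1..16 as on the slide.
subsets = subdivide(range(1, 17))
for label, members in subsets.items():
    print(label, members)
```

Even and odd subsets interleave along the transcript, whereas the first and second halves cover different regions, so the two kinds of split respond differently to a transcript that lacks one region.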
Mechanism of non-specific binding on the probes
A. Non-specific binding energy is much higher than gene-specific binding energy (E* - E = 5 k_B T).
B. The source of non-specific binding is much larger than the source of gene-specific binding.
C. Non-specific binding is very loose, flexible, and contains many mismatches.
D. Non-specific binding depends on stacking energy, which in turn depends on the probe sequence.
Energetic aspect of probe design
[Figure: ln(observed signal) vs. gene-specific binding energy, showing total signal, gene-specific signal, and background]
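Boltzmann distribution
[Diagram: two energy levels, E_1 with population N_1 and E_2 with population N_2]
$\frac{N_1}{N_2} = e^{-\frac{E_1 - E_2}{k_B T}}$
Binding affinity: $\frac{N_1}{N_1 + N_2} = \frac{1}{1 + e^{\Delta E}}$, with $\Delta E = (E_1 - E_2)/(k_B T)$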
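The step from the population ratio to the binding affinity is a one-line rearrangement, N_1/(N_1 + N_2) = 1/(1 + N_2/N_1), and can be checked numerically. In this sketch `delta_e` stands for (E_1 - E_2) expressed in units of k_B T; the function names are illustrative.

```python
import math

def population_ratio(delta_e):
    """N1 / N2 = exp(-delta_e), with delta_e = (E1 - E2) / (k_B T)."""
    return math.exp(-delta_e)

def binding_affinity(delta_e):
    """N1 / (N1 + N2) = 1 / (1 + exp(delta_e))."""
    return 1.0 / (1.0 + math.exp(delta_e))

# Compare the two routes for an arbitrary energy difference.
delta_e = 1.2
ratio = population_ratio(delta_e)            # N1 relative to N2 = 1
fraction_from_ratio = ratio / (ratio + 1.0)  # N1 / (N1 + N2)
print(fraction_from_ratio, binding_affinity(delta_e))
```

At zero energy difference the two states are equally populated and the affinity is one half; a more negative energy pushes it toward one.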
Binding on microarrays
Probe’s response to gene expression:
• Some probes always give strong signals, even when the targeted gene is absent.
• Some probes always give weak signals, even when the targeted gene is abundant.
[Figure: probe signal vs. level of expression]