Estimating the Parameters of Infinite Scale Mixtures of Normals
Hasan Hamdan and John Nolan
36th Symposium on Interface: Computing Science and Statistics
May 26-29, Baltimore, Maryland
An Outline of the Presentation
• Motivation, Definitions, General Problem
• Variance Mixtures of Normals (VMN)
  1. Examples of Variance Mixtures in $\mathbb{R}$ and in $\mathbb{R}^n$
  2. Characterization Theorem
  3. Approximation Theorem
• Estimating the mixing measure
• Further Research
Motivation
• Identify and simplify infinite mixtures of normals, uniforms, and exponentials.
• Approximate infinite mixtures with finite mixtures.
  - Simpler forms.
  - Closed form.
  - Easier to study properties.
Variance Mixture of Normals
• A random variable $X$ is a variance mixture of normals if $X \overset{d}{=} AZ$, where $Z \sim N(0,1)$ and $A$ is a random scale, with $A$ and $Z$ independent. We assume $P(A = 0) = 0$.
• Equivalently, $X$ has pdf
$$f(x) = \int_0^\infty g(x \mid \sigma)\, \pi(d\sigma),$$
where $g(x \mid \sigma)$ is the $N(0, \sigma^2)$ density and the mixing measure $\pi$ is the distribution of $A$.
• Equivalently, the characteristic function $\phi_X(t)$ of $X$ can be written in the form
$$\phi_X(t) = \int_0^\infty \phi_{\sigma Z}(t)\, \pi(d\sigma),$$
where $\phi_{\sigma Z}(t)$ is the characteristic function of the random variable $\sigma Z \sim N(0, \sigma^2)$.
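The two equivalent descriptions can be tried out numerically. The sketch below assumes a hypothetical two-point mixing measure (it is not taken from the slides) and checks that the mixture density integrates to 1 and has second moment $E[A^2]$, as $X = AZ$ requires. The helper names are illustrative only.

```python
import math

def normal_pdf(x, sigma):
    """g(x | sigma): density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def vmn_pdf(x, sigmas, weights):
    """Density of X = A Z when P(A = sigmas[j]) = weights[j]."""
    return sum(w * normal_pdf(x, s) for s, w in zip(sigmas, weights))

def vmn_cf(t, sigmas, weights):
    """Characteristic function of X: mixture of exp(-sigma^2 t^2 / 2)."""
    return sum(w * math.exp(-0.5 * (s * t) ** 2) for s, w in zip(sigmas, weights))

def trapezoid(fn, lo, hi, steps):
    """Composite trapezoid rule for the integral of fn over [lo, hi]."""
    h = (hi - lo) / steps
    inner = sum(fn(lo + i * h) for i in range(1, steps))
    return h * (0.5 * (fn(lo) + fn(hi)) + inner)

# Hypothetical mixing measure (an assumption): P(A = 1) = 0.7, P(A = 3) = 0.3.
SIGMAS, WEIGHTS = [1.0, 3.0], [0.7, 0.3]

total_mass = trapezoid(lambda x: vmn_pdf(x, SIGMAS, WEIGHTS), -40.0, 40.0, 8000)
second_moment = trapezoid(lambda x: x * x * vmn_pdf(x, SIGMAS, WEIGHTS), -40.0, 40.0, 8000)

print(total_mass)      # close to 1
print(second_moment)   # close to E[A^2] = 0.7 * 1 + 0.3 * 9 = 3.4
```

A discrete mixing measure is used only because it makes the integral against $\pi$ a finite sum; a continuous $\pi$ would need a second quadrature.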
Examples in $\mathbb{R}$ and $\mathbb{R}^n$
1. Symmetric stable distributions
A stable random variable $X$ with index of stability $\alpha \in (0,2]$, scale parameter $\sigma \in (0,\infty)$, skewness parameter $\beta \in [-1,1]$ and location parameter $\mu \in (-\infty,\infty)$ is denoted by $S_\alpha(\sigma,\beta,\mu)$. The characteristic function is
$$\phi_X(u) = \begin{cases} \exp\left(-\sigma^\alpha |u|^\alpha \left[1 - i\beta \tan\left(\tfrac{\pi\alpha}{2}\right) s(u)\right] + i\mu u\right), & \alpha \neq 1, \\ \exp\left(-\sigma |u| \left[1 + i\beta \tfrac{2}{\pi}\, s(u) \ln(|\sigma u|)\right] + i\mu u\right), & \alpha = 1, \end{cases}$$
where $s(u) = \operatorname{sign}(u)$.
Suppose that $X \sim N(0, 2\sigma^2)$, $A$ is positive stable $S_{\alpha/2}\left((\cos(\pi\alpha/4))^{2/\alpha}, 1, 0\right)$, and $A$ and $X$ are independent. Then $W = A^{1/2} X$ is symmetric $\alpha$-stable (S$\alpha$S) with scale $\sigma$.
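For $\alpha = 1$ the construction can be simulated with normal draws only. The sketch below assumes the standard fact that $S_{1/2}(c, 1, 0)$ is the Lévy law, which has the same distribution as $c/Z_1^2$ for a standard normal $Z_1$. Here $c = \cos^2(\pi/4) = 1/2$. The function name, seed, and sample size are illustrative choices. If $W$ is Cauchy with scale $\sigma$, then $P(|W| \le \sigma) = 1/2$ and $P(|W| \le \sigma(\sqrt{2}-1)) = 1/4$, which the empirical fractions should approach.

```python
import math
import random

def sample_sas_alpha1(sigma, rng):
    """One draw of W = A^{1/2} X for alpha = 1 (the Cauchy case)."""
    z = 0.0
    while z == 0.0:                       # guard against division by zero
        z = rng.gauss(0.0, 1.0)
    a = 0.5 / (z * z)                     # assumed representation of S_{1/2}(1/2, 1, 0)
    x = rng.gauss(0.0, math.sqrt(2.0) * sigma)   # X ~ N(0, 2 sigma^2)
    return math.sqrt(a) * x

rng = random.Random(12345)
sigma = 2.0
draws = [sample_sas_alpha1(sigma, rng) for _ in range(40000)]

# Empirical versions of P(|W| <= sigma) and P(|W| <= sigma * tan(pi / 8)).
frac_half = sum(abs(w) <= sigma for w in draws) / len(draws)
frac_quarter = sum(abs(w) <= sigma * (math.sqrt(2.0) - 1.0) for w in draws) / len(draws)
print(frac_half, frac_quarter)
```

For other values of $\alpha$ a sampler for positive stable laws would be needed; none is assumed here.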
Sub-Gaussian random vectors
Choose $A \sim S_{\alpha/2}\left(\left(\cos\left(\tfrac{\pi\alpha}{4}\right)\right)^{2/\alpha}, 1, 0\right)$ with $\alpha < 2$.
Let $G' = (G_1, \ldots, G_n) \sim N(0, \Sigma)$ be independent of $A$.
Then $X' = (A^{1/2} G_1, \ldots, A^{1/2} G_n)$ is S$\alpha$S in $\mathbb{R}^n$ with
$$\phi_n(\theta) = \exp\left(-\left|\frac{\theta' \Sigma \theta}{2}\right|^{\alpha/2}\right).$$
For example, when $n = 2$, $\alpha = 1$, and $G_1, G_2$ are iid $N(0, 2\sigma^2)$,
$$\phi_2(\theta_1, \theta_2) = \exp\left(-\sigma\,(\theta_1^2 + \theta_2^2)^{1/2}\right)$$
and $f(x_1, x_2)$ is the spherically symmetric Cauchy density in $\mathbb{R}^2$.
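The two-dimensional example follows by direct substitution into the general formula. A short check, reading "iid $N(0, 2\sigma^2)$" as $\Sigma = 2\sigma^2 I_2$ (the identity-matrix notation $I_2$ is introduced here):

```latex
\Sigma = 2\sigma^2 I_2
\;\Longrightarrow\;
\frac{\theta'\Sigma\theta}{2} = \sigma^2(\theta_1^2 + \theta_2^2)
\;\Longrightarrow\;
\phi_2(\theta_1,\theta_2)
= \exp\!\left(-\bigl|\sigma^2(\theta_1^2 + \theta_2^2)\bigr|^{1/2}\right)
= \exp\!\left(-\sigma\,(\theta_1^2 + \theta_2^2)^{1/2}\right).
```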
2. Generalized t distributions
Suppose that $1/A^2$ has a Gamma$(\alpha, \beta)$ distribution. Equivalently,
$$f_A(\sigma) = \frac{2}{\beta^\alpha\, \Gamma(\alpha)}\, \frac{1}{\sigma^{2\alpha+1}} \exp\left(\frac{-1}{\beta \sigma^2}\right).$$
Set the scale parameter $\beta = 2/c$. Then the density function of $X = AZ$ is given by
$$f(x) = \frac{k}{(x^2 + c)^{\alpha + 1/2}}, \qquad -\infty < x < \infty, \tag{1}$$
where
$$k = \frac{2^\alpha\, \Gamma(\alpha + \tfrac{1}{2})}{\beta^\alpha\, \Gamma(\alpha)\, \pi^{1/2}}.$$
When $\alpha = n/2$ and $\beta = 2/n$, $f(x)$ is the $t$ density with $n$ degrees of freedom.
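Formula (1) can be checked numerically in two ways: against the textbook Student $t$ density, and against a direct quadrature of the mixture integral. The sketch below assumes Gamma$(\alpha, \beta)$ means shape $\alpha$ and scale $\beta$, as the stated $f_A$ implies. It integrates in the variable $y = 1/\sigma^2$; the function names, step count, and truncation point `y_max` are illustrative choices.

```python
import math

def generalized_t_pdf(x, alpha, beta):
    """Closed form (1): k / (x^2 + c)^(alpha + 1/2) with c = 2 / beta."""
    c = 2.0 / beta
    k = (2.0 ** alpha) * math.gamma(alpha + 0.5) / (
        (beta ** alpha) * math.gamma(alpha) * math.sqrt(math.pi))
    return k / (x * x + c) ** (alpha + 0.5)

def student_t_pdf(x, n):
    """Textbook Student t density with n degrees of freedom."""
    const = math.gamma((n + 1) / 2.0) / (math.sqrt(n * math.pi) * math.gamma(n / 2.0))
    return const * (1.0 + x * x / n) ** (-(n + 1) / 2.0)

def mixture_pdf(x, alpha, beta, steps=40000, y_max=80.0):
    """Midpoint rule for the mixture integral after substituting y = 1 / sigma^2.

    With Y = 1/A^2 ~ Gamma(alpha, scale beta), the normal density becomes
    g(x | sigma) = sqrt(y / (2 pi)) * exp(-x^2 y / 2).
    """
    h = y_max / steps
    norm = math.sqrt(2.0 * math.pi) * (beta ** alpha) * math.gamma(alpha)
    rate = 0.5 * x * x + 1.0 / beta
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y ** (alpha - 0.5) * math.exp(-rate * y)
    return h * total / norm

for n in (1, 5):
    for x in (0.0, 0.7, 2.0):
        print(n, x,
              generalized_t_pdf(x, n / 2.0, 2.0 / n),
              student_t_pdf(x, n),
              mixture_pdf(x, n / 2.0, 2.0 / n))
```

The three printed values in each row should agree to several decimal places; the quadrature column is limited by the midpoint rule, not by the formula.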
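Multivariate t
If the mixing density is given by
$$f_A(\sigma) = \frac{2}{\beta^\alpha\, \Gamma(\alpha)}\, \frac{1}{\sigma^{2\alpha+1}} \exp\left(\frac{-1}{\beta \sigma^2}\right)$$
and $Z \sim N(0, I)$ in $\mathbb{R}^n$, then $X = AZ$ has density
$$f_X(x) = \frac{k_1}{(k_2 + x'x)^{\alpha + n/2}},$$
where
$$k_1 = \left(\frac{2}{\beta}\right)^{\alpha} \frac{\Gamma(\alpha + \tfrac{n}{2})}{\pi^{n/2}\, \Gamma(\alpha)} \quad \text{and} \quad k_2 = \frac{2}{\beta}$$
are constants.
In particular, when $\alpha = \tfrac{1}{2}$, the exponent is $(n+1)/2$ and $f_X(x)$ is the multivariate spherically symmetric Cauchy density in $\mathbb{R}^n$.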
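The constants $k_1$ and $k_2$ can be sanity-checked against known densities. The sketch below evaluates $k_1 / (k_2 + x'x)^{\alpha + n/2}$ with $\alpha = 1/2$ and $\beta = 2$. It compares the result with the standard Cauchy density for $n = 1$ and with the spherically symmetric bivariate Cauchy density $1/(2\pi(1 + x'x)^{3/2})$ for $n = 2$. The function name is an illustrative choice.

```python
import math

def scale_mixture_pdf(x, alpha, beta):
    """k1 / (k2 + x'x)^(alpha + n/2) for a point x given as a list of length n."""
    n = len(x)
    k2 = 2.0 / beta
    k1 = (2.0 / beta) ** alpha * math.gamma(alpha + n / 2.0) / (
        math.pi ** (n / 2.0) * math.gamma(alpha))
    r2 = sum(xi * xi for xi in x)           # x'x
    return k1 / (k2 + r2) ** (alpha + n / 2.0)

# n = 1: standard Cauchy density 1 / (pi * (1 + x^2)).
print(scale_mixture_pdf([0.3], 0.5, 2.0), 1.0 / (math.pi * (1.0 + 0.3 ** 2)))

# n = 2: bivariate spherically symmetric Cauchy density 1 / (2 pi (1 + x'x)^(3/2)).
r2 = 0.3 ** 2 + 1.2 ** 2
print(scale_mixture_pdf([0.3, -1.2], 0.5, 2.0), 1.0 / (2.0 * math.pi * (1.0 + r2) ** 1.5))
```

For $n = 1$ the constants reduce to $k$ and $c$ of formula (1).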
Characterization Theorem
Definition. A function $h(x)$ on $(0, \infty)$ is completely monotone in $x$ if it is infinitely differentiable and
$$(-1)^m h^{(m)}(x) \ge 0 \quad \forall x \text{ and } \forall m = 1, 2, \ldots.$$
Examples are $\frac{1}{x}$, $\frac{1}{x+1}$, and $\exp(-x)$.
Theorem 1 (Schoenberg (1938)). $X$ with density $f(x)$ is a VMN iff $h(x) = f(x^{1/2})$ is a completely monotone function.
Equivalently, $X$ is a VMN iff $\phi_X$ is a real, even function such that $\phi_X(t^{1/2})$ is completely monotone on $(0, \infty)$.
Example: Exponential Power Family
The exponential power family consists of all distributions having densities of the form
$$f(x) = k \exp(-|x|^b), \quad x \in \mathbb{R} \text{ and } b > 0.$$
See West (1987) and Box and Tiao (1973).
A random variable $X$ with density $f(x)$ is a variance mixture of normals iff $0 < b \le 2$, because $h(x) = f(\sqrt{x}) = k \exp(-x^{b/2})$ is completely monotone iff $0 < b \le 2$.
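The "only if" direction of this claim can be checked by hand from the first two derivatives. A sketch, writing $p = b/2$ and dropping the positive constant $k$ (the symbol $p$ is introduced here for convenience):

```latex
\begin{aligned}
h(x)   &= \exp(-x^{p}), \qquad p = b/2, \\
h'(x)  &= -p\,x^{p-1}\exp(-x^{p}) \le 0, \\
h''(x) &= p\,x^{p-2}\exp(-x^{p})\,\bigl[\,p\,x^{p} - (p-1)\,\bigr].
\end{aligned}
```

For $b > 2$ we have $p > 1$, so the bracket is negative for $0 < x < ((p-1)/p)^{1/p}$. On that interval $(-1)^2 h''(x) < 0$, so $h$ is not completely monotone. For $0 < b \le 2$ the bracket is nonnegative, which is consistent with the claim. The full "if" direction needs all higher derivatives and is not shown here.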
Approximating Scale Mixtures
Case 1: $A \in [a, b]$ where $0 < a < b < \infty$.
$X$ with density $f(x)$ is a mixture of normals with known scale $A$ having distribution $\pi$.
If $f(x)$ is difficult to compute, then we can approximate it by a finite mixture of the form
$$f^*(x) = \sum_{j=1}^{M} g(x \mid \sigma_j)\, \pi_j,$$
where $\pi_1, \ldots, \pi_M$ are point masses concentrated on $\sigma_1, \ldots, \sigma_M$ in $[a, b]$.
Questions
• How many terms should we take to approximate $f(x)$ by $f^*(x)$ within $\epsilon$?
• What values of $\pi_j$ and $\sigma_j$ should we choose?
Figure 1: $|\partial g / \partial \sigma|$ at a fixed $\sigma$ as a function of $x$. [Plot of abs(dg/dsigma) against x, for x from -10 to 10.]
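Lemma 1. If $\sigma_1, \sigma_2 \in [a, \infty)$, then
$$|g(x \mid \sigma_1) - g(x \mid \sigma_2)| \le \frac{1}{(2\pi)^{1/2} a^2}\, |\sigma_1 - \sigma_2| \quad \forall x \in \mathbb{R},$$
where $g(x \mid \sigma)$ is the $N(0, \sigma^2)$ density.
Proof. Fixing $\sigma$,
$$\left|\frac{\partial g(x \mid \sigma)}{\partial \sigma}\right| = \left|\frac{x^2 - \sigma^2}{\sigma^3}\right| g(x \mid \sigma)$$
is maximized at $x = 0$, where it takes value $g(0 \mid \sigma)/\sigma = 1/((2\pi)^{1/2} \sigma^2)$. Over $\sigma \ge a$ this value is largest at $\sigma = a$. Hence, by the mean value theorem,
$$|g(x \mid \sigma_1) - g(x \mid \sigma_2)| \le \left(\max \left|\frac{\partial g}{\partial \sigma}\right|\right) |\sigma_1 - \sigma_2| = \frac{|\sigma_1 - \sigma_2|}{(2\pi)^{1/2} a^2}.$$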
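The bound in Lemma 1 is easy to probe numerically. The sketch below uses a hypothetical lower limit $a = 0.5$ and a few hand-picked pairs $(\sigma_1, \sigma_2)$, none of them from the slides. It reports the largest ratio of the actual difference to the bound over a grid of $x$ values. It also confirms that $|\partial g / \partial \sigma|$ peaks at $x = 0$ on that grid.

```python
import math

def g(x, sigma):
    """N(0, sigma^2) density at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def dg_dsigma(x, sigma):
    """Partial derivative of g in sigma: (x^2 - sigma^2) / sigma^3 * g(x | sigma)."""
    return (x * x - sigma * sigma) / sigma ** 3 * g(x, sigma)

def lemma1_bound(s1, s2, a):
    """Right-hand side of Lemma 1."""
    return abs(s1 - s2) / (math.sqrt(2.0 * math.pi) * a * a)

a = 0.5                                              # hypothetical lower limit
xs = [(i - 200) / 20.0 for i in range(401)]          # x in [-10, 10], step 0.05
pairs = [(0.5, 0.6), (0.5, 2.0), (1.0, 1.01), (3.0, 7.0)]

# Largest observed value of |g(x|s1) - g(x|s2)| divided by the Lemma 1 bound.
worst_ratio = max(abs(g(x, s1) - g(x, s2)) / lemma1_bound(s1, s2, a)
                  for s1, s2 in pairs for x in xs)

# Grid point where |dg/dsigma| is largest for sigma = a.
peak_x = max(xs, key=lambda x: abs(dg_dsigma(x, a)))

print(worst_ratio, peak_x)
```

A ratio at or below 1 for every pair is what the lemma predicts; a grid check like this illustrates the statement but does not prove it.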
Theorem 2. Suppose $X = AZ$, where $A$ is a positive random variable with distribution $\pi$ having support $[a, b]$. For any $\epsilon > 0$, there is a discrete distribution with at most $M = M(\epsilon, a, b)$ point masses $\pi_1, \ldots, \pi_M$ concentrated on $\sigma_1, \ldots, \sigma_M$ in $[a, b]$ which satisfies
$$\sup_{x \in \mathbb{R}} \left| f(x) - \sum_{j=1}^{M} g(x \mid \sigma_j)\, \pi_j \right| \le \epsilon.$$
Proof. We adapted Lemma 1 from Byczkowski, Nolan, and Rajput (1993).
• Fix any $\epsilon > 0$, and $0 < a < b < \infty$.
• Define recursively
$$a_0 = a, \qquad a_j = a_{j-1} + (2\pi)^{1/2}\, a_{j-1}^2\, \epsilon. \tag{2}$$
The distances between the $a_j$'s are strictly increasing, so there exists an $M = M(\epsilon, a, b)$ such that $a_{2M} \ge b$.
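• Define a disjoint cover of $[a, b]$: $I_1 = (a_0, a_2]$, $I_2 = (a_2, a_4]$, $\ldots$, $I_M = (a_{2M-2}, b]$.
• Set $\pi_j = \pi(I_j)$ and $\sigma_j = \min(a_{2j-1}, b)$, $j = 1, \ldots, M$.
• $g(x \mid \sigma_j)\, \pi_j = \int_{I_j} g(x \mid \sigma_j)\, \pi(d\sigma)$.
Then,
$$\begin{aligned}
\left| f(x) - \sum_{j=1}^{M} g(x \mid \sigma_j)\, \pi_j \right|
&= \left| \int_{[a,b]} g(x \mid \sigma)\, \pi(d\sigma) - \sum_{j=1}^{M} \int_{I_j} g(x \mid \sigma_j)\, \pi(d\sigma) \right| \\
&= \left| \sum_{j=1}^{M} \int_{I_j} \bigl( g(x \mid \sigma) - g(x \mid \sigma_j) \bigr)\, \pi(d\sigma) \right| \\
&\le \sum_{j=1}^{M} \int_{I_j} \bigl| g(x \mid \sigma) - g(x \mid \sigma_j) \bigr|\, \pi(d\sigma) \\
&\le \sum_{j=1}^{M} \int_{I_j} \epsilon\, \pi(d\sigma) = \epsilon.
\end{aligned}$$
The last inequality uses Lemma 1 together with (2): every $\sigma \in I_j$ lies within one grid step of $\sigma_j$, and each grid step is sized so that the Lemma 1 bound equals $\epsilon$.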
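The construction in the proof translates directly into a short routine. The sketch below builds the grid from recursion (2), forms the cover $I_1, \ldots, I_M$ and the points $\sigma_j$, and computes the point masses from a mixing CDF. As a hypothetical illustration (not an example from the slides) it takes $A$ uniform on $[1, 3]$ with $\epsilon = 0.05$ and compares $f^*$ with a quadrature of $f$. All function and variable names are illustrative.

```python
import math

def g(x, sigma):
    """N(0, sigma^2) density at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def build_cover(a, b, eps):
    """Breakpoints a_0, a_2, ..., a_{2M-2}, b and points sigma_1..sigma_M from (2)."""
    c = math.sqrt(2.0 * math.pi) * eps
    grid = [a]                                   # grid[j] = a_j
    # Extend until the last index is even (= 2M) and a_{2M} >= b.
    while len(grid) % 2 == 0 or grid[-1] < b:
        grid.append(grid[-1] + c * grid[-1] ** 2)
    m = (len(grid) - 1) // 2
    breaks = grid[0:2 * m:2] + [b]
    sigmas = [min(grid[2 * j - 1], b) for j in range(1, m + 1)]
    return breaks, sigmas

def point_masses(cdf, breaks):
    """pi_j = pi(I_j) for I_j = (breaks[j-1], breaks[j]], given the CDF of A."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(breaks, breaks[1:])]

# Hypothetical illustration: A uniform on [1, 3], eps = 0.05.
a, b, eps = 1.0, 3.0, 0.05
breaks, sigmas = build_cover(a, b, eps)
weights = point_masses(lambda s: (s - a) / (b - a), breaks)

def f_exact(x, steps=2000):
    """Trapezoid rule for f(x) = (1 / (b - a)) * integral of g(x | sigma) over [a, b]."""
    h = (b - a) / steps
    inner = sum(g(x, a + i * h) for i in range(1, steps))
    return h * (0.5 * (g(x, a) + g(x, b)) + inner) / (b - a)

def f_star(x):
    """Finite mixture approximation."""
    return sum(w * g(x, s) for s, w in zip(sigmas, weights))

xs = [(i - 100) / 10.0 for i in range(201)]      # x in [-10, 10]
max_err = max(abs(f_exact(x) - f_star(x)) for x in xs)
print(len(sigmas), max_err)                      # M and the observed sup error on the grid
```

Because the step size in (2) grows like $a_j^2$, most of the $M$ points sit near the lower end $a$, where $g(x \mid \sigma)$ changes fastest in $\sigma$.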
Case 2: $A \in (0, \infty)$.
We can write $f(x)$ as a sum of three integrals:
$$\int_0^\infty g(x \mid \sigma)\, \pi(d\sigma) = \int_0^a (\cdot) + \int_a^b (\cdot) + \int_b^\infty (\cdot). \tag{3}$$
The following lemma shows that in all cases where $f(0)$ is finite, there exist an $a$ and a $b$ such that the first and last integrals can be made arbitrarily small and the middle one can be approximated using Theorem 2.
Lemma 2. Let $X = AZ$ be a scale mixture of normals, and $\epsilon > 0$.
(a) If $f(0) < \infty$, then there exists an $a > 0$ such that $\int_0^a g(x \mid \sigma)\, \pi(d\sigma) < \epsilon$ for all $x \in \mathbb{R}$.
(b) There exists a $b > 0$ such that $\int_b^\infty g(x \mid \sigma)\, \pi(d\sigma) < \epsilon$ for all $x \in \mathbb{R}$.
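(a) If $f(0) < \infty$, then there exists an $a > 0$ such that $\int_0^a g(x \mid \sigma)\, \pi(d\sigma) < \epsilon$ for all $x \in \mathbb{R}$.
Proof. Since $g(x \mid \sigma) \le g(0 \mid \sigma) = k \sigma^{-1}$ with $k = (2\pi)^{-1/2}$,
$$f(x) = \int_0^\infty g(x \mid \sigma)\, \pi(d\sigma) \le k \int_0^\infty \sigma^{-1}\, \pi(d\sigma) = f(0) < \infty.$$
Let $h(a) = \int_0^a g(x \mid \sigma)\, \pi(d\sigma)$. Then,
$$h(a) \le k \int_0^a \sigma^{-1}\, \pi(d\sigma) = k \int_0^\infty 1_{(0,a)}(\sigma)\, \sigma^{-1}\, \pi(d\sigma).$$
Let $a_n$ be any sequence that converges to 0. Then $1_{(0,a_n)} \sigma^{-1} \to 0$ pointwise on $(0, \infty)$ and $1_{(0,a_n)} \sigma^{-1} \le \sigma^{-1} \in L^1(\pi)$. So, $h(a_n) \to 0$ by the Dominated Convergence Theorem.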
(b) There exists a $b > 0$ such that $\int_b^\infty g(x \mid \sigma)\, \pi(d\sigma) < \epsilon$ for all $x \in \mathbb{R}$.
Proof. Let $h(b) = \int_b^\infty g(x \mid \sigma)\, \pi(d\sigma)$. Then,
$$h(b) \le k \int_b^\infty \sigma^{-1}\, \pi(d\sigma) \le k \int_0^\infty 1_{(b,\infty)}(\sigma)\, \sigma^{-1}\, \pi(d\sigma).$$
Let $b_n$ be any sequence that converges to $\infty$. Then $1_{(b_n,\infty)} \sigma^{-1} \to 0$, and since the integrand in the last expression is dominated by $\frac{1}{b}$, the result holds by applying the Dominated Convergence Theorem.
Figure 2: Gamma and square root of Inverted Gamma with $\alpha = .5$ and $\beta = 2$. [Two panels, each plotting f(0.5, 2, x) against x, for x from 0 to 10.]
Approximating the Cauchy Density
When $\alpha = \frac{1}{2}$ and $\beta = 2$, the generalized t distribution is the standard Cauchy. $\pi$ is the distribution of the square root of an Inverted Gamma variable with parameters $\alpha$ and $\beta$. In this case, the corresponding Gamma has a vertical asymptote at 0 and it is decreasing on $\Theta = [a, b]$.
Example
A comparison between the finite and infinite mixture is made for different combinations of $a$, $b$, and $\epsilon$. The maximum difference between the actual density and the approximated density was found based on $a = .05$, $b = 50$, and $\epsilon = .03$ on a grid of 101 equally spaced points. The maximum value for the relative distance between $f$ and $f^*$ is around .028.
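A comparison of this kind can be sketched as follows, with $a = .05$, $b = 50$, and $\epsilon = .03$. The sketch assumes $1/A^2 \sim$ Gamma$(1/2, 2)$, so that $A = 1/|Z_1|$ for a standard normal $Z_1$ and $P(A \le s) = \operatorname{erfc}(1/(s\sqrt{2}))$. The grid range $[-4, 4]$ is taken from the axis of Figure 3 and is an assumption, as is the use of absolute rather than relative error, so the output is not expected to reproduce the reported .028. The two tail terms are the closed-form values of $k \int \sigma^{-1}\, \pi(d\sigma)$ over $(0, a)$ and $(b, \infty)$ for this particular $\pi$.

```python
import math

A_LO, B_HI, EPS = 0.05, 50.0, 0.03

def normal_pdf(x, sigma):
    """N(0, sigma^2) density at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def mixing_cdf(s):
    """P(A <= s) for A = 1/|Z|, i.e. 1/A^2 ~ Gamma(1/2, scale 2)."""
    return math.erfc(1.0 / (s * math.sqrt(2.0)))

def cauchy_pdf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))

# Grid from recursion (2), stopped at the first even index 2M with a_{2M} >= b.
c = math.sqrt(2.0 * math.pi) * EPS
grid = [A_LO]
while len(grid) % 2 == 0 or grid[-1] < B_HI:
    grid.append(grid[-1] + c * grid[-1] ** 2)
M = (len(grid) - 1) // 2
breaks = grid[0:2 * M:2] + [B_HI]                      # a_0, a_2, ..., a_{2M-2}, b
sigmas = [min(grid[2 * j - 1], B_HI) for j in range(1, M + 1)]
weights = [mixing_cdf(hi) - mixing_cdf(lo) for lo, hi in zip(breaks, breaks[1:])]

def f_star(x):
    """Finite mixture approximation of the Cauchy density."""
    return sum(w * normal_pdf(x, s) for s, w in zip(sigmas, weights))

# Assumption: 101 equally spaced points on [-4, 4].
xs = [-4.0 + 0.08 * i for i in range(101)]
max_abs_err = max(abs(cauchy_pdf(x) - f_star(x)) for x in xs)

# Contribution of the mass outside [a, b], bounded as in Lemma 2.
lower_tail = math.exp(-1.0 / (2.0 * A_LO ** 2)) / math.pi
upper_tail = -math.expm1(-1.0 / (2.0 * B_HI ** 2)) / math.pi

print(M, max_abs_err, EPS + lower_tail + upper_tail)
```

The printed error should not exceed the printed budget $\epsilon$ plus the two tail terms; how far below it falls depends on the grid and on the error measure chosen.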
Figure 3: $a = .05$, $b = 50$ and $\epsilon = .03$. [Plot of f(x) and f^(x), y against x, for x from -4 to 4.]