Estimating the Parameters of Infinite Scale Mixtures of Normals

Hasan Hamdan and John Nolan
36th Symposium on Interface: Computing Science and Statistics
May 26 -9, Baltimore, Maryland

An Outline of the Presentation
• Motivation, Definitions, General Problem
• Variance Mixtures of Normals (VMN)
  1. Examples of Variance Mixtures in $\mathbb{R}$ and in $\mathbb{R}^n$
  2. Characterization Theorem
  3. Approximation Theorem
• Estimating the mixing measure
• Further Research

Motivation
• Identify and simplify infinite mixtures of normals, uniforms, and exponentials.
• Approximate infinite mixtures with finite mixtures, which have simpler, closed forms whose properties are easier to study.

Variance Mixture of Normals
• A random variable $X$ is a variance mixture of normals if $X \stackrel{d}{=} AZ$, where $Z \sim N(0, 1)$, $A$ is a random scale, and $A$ and $Z$ are independent. We assume $P(A = 0) = 0$.
• Equivalently, $X$ has pdf
$$f(x) = \int_0^\infty g(x \mid \sigma)\, \pi(d\sigma),$$
where $g(x \mid \sigma)$ is the $N(0, \sigma^2)$ density and the mixing measure $\pi$ is the distribution of $A$.
• Equivalently, the characteristic function $\phi_X(t)$ of $X$ can be written in the form
$$\phi_X(t) = \int_0^\infty \phi_{\sigma Z}(t)\, \pi(d\sigma),$$
where $\phi_{\sigma Z}(t)$ is the characteristic function of the random variable $\sigma Z \sim N(0, \sigma^2)$.
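The definition above can be illustrated with a minimal sketch, assuming a hypothetical two-point mixing measure (A = 1 or A = 3 with equal probability) that is not taken from the slides. It draws X = AZ directly and evaluates the corresponding finite mixture density; the sample second moment should be close to E[A^2].

```python
import math
import random

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def mixture_pdf(x, sigmas, weights):
    """f(x) = sum_j g(x | sigma_j) * pi_j for a discrete mixing measure."""
    return sum(w * normal_pdf(x, s) for s, w in zip(sigmas, weights))

def sample_vmn(n, sigmas, weights, rng):
    """Draw n values of X = A * Z, with A discrete and independent of Z ~ N(0, 1)."""
    scales = rng.choices(sigmas, weights=weights, k=n)
    return [a * rng.gauss(0.0, 1.0) for a in scales]

# Hypothetical mixing measure (an assumption for illustration only).
sigmas, weights = [1.0, 3.0], [0.5, 0.5]
rng = random.Random(0)
xs = sample_vmn(200_000, sigmas, weights, rng)

sample_var = sum(x * x for x in xs) / len(xs)                  # estimates E[X^2]
theory_var = sum(w * s * s for s, w in zip(sigmas, weights))   # E[A^2] E[Z^2] = E[A^2]
print(sample_var, theory_var)
print(mixture_pdf(0.0, sigmas, weights))
```

Because A and Z are independent and E[Z^2] = 1, the variance of X equals E[A^2], which is what the comparison checks.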

Examples in $\mathbb{R}$ and $\mathbb{R}^n$

1. Symmetric stable distributions

A stable random variable $X$ with index of stability $\alpha \in (0, 2]$, scale parameter $\sigma \in (0, \infty)$, skewness parameter $\beta \in [-1, 1]$ and location parameter $\mu \in (-\infty, \infty)$ is denoted by $S_\alpha(\sigma, \beta, \mu)$. Its characteristic function is
$$\phi_X(u) = \begin{cases} \exp\left( -\sigma^\alpha |u|^\alpha \left[ 1 - i\beta \tan\left(\tfrac{\pi\alpha}{2}\right) s(u) \right] + i\mu u \right), & \alpha \neq 1, \\ \exp\left( -\sigma |u| \left[ 1 + i\beta \tfrac{2}{\pi}\, s(u) \ln(|\sigma u|) \right] + i\mu u \right), & \alpha = 1, \end{cases}$$
where $s(u) = \operatorname{sign}(u)$.

Suppose that $X \sim N(0, 2\sigma^2)$, $A$ is positive stable $S_{\alpha/2}\left( (\cos(\pi\alpha/4))^{2/\alpha}, 1, 0 \right)$, and $A$ and $X$ are independent. Then $W = A^{1/2} X$ is symmetric $\alpha$-stable ($S\alpha S$) with scale $\sigma$.
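For α = 1 the construction above can be checked by simulation. The following is a sketch only: it assumes the standard representation of the Lévy law S_{1/2}(γ, 1, 0) as γ / Z1^2 with Z1 ~ N(0, 1), which is not stated in the slides. With α = 1 the scale of A is (cos(π/4))^2 = 1/2, so W should be Cauchy with scale σ, whose absolute value has median σ.

```python
import math
import random

def sample_sas_alpha1(n, sigma, rng):
    """Draw W = A^{1/2} X for alpha = 1.

    Assumption: A ~ S_{1/2}(1/2, 1, 0) is sampled as (1/2) / Z1^2, Z1 ~ N(0, 1).
    X ~ N(0, 2 sigma^2) is independent of A.
    """
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        a = 0.5 / (z1 * z1)                                  # positive stable, index 1/2
        x = math.sqrt(2.0) * sigma * rng.gauss(0.0, 1.0)     # N(0, 2 sigma^2)
        out.append(math.sqrt(a) * x)
    return out

sigma = 2.0
rng = random.Random(1)
ws = sample_sas_alpha1(100_000, sigma, rng)

# For a Cauchy variable with scale sigma, the median of |W| equals sigma.
median_abs = sorted(abs(w) for w in ws)[len(ws) // 2]
print(median_abs)
```

Algebraically, W = σ Z2 / |Z1| here, a ratio of independent standard normals scaled by σ, which is the usual Cauchy representation.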

Sub-Gaussian random vectors

Choose $A \sim S_{\alpha/2}\left( \left( \cos\left( \tfrac{\pi\alpha}{4} \right) \right)^{2/\alpha}, 1, 0 \right)$ with $\alpha < 2$. Let $G' = (G_1, \ldots, G_n) \sim N(0, \Sigma)$ be independent of $A$. Then $X' = (A^{1/2} G_1, \ldots, A^{1/2} G_n)$ is $S\alpha S$ in $\mathbb{R}^n$ with
$$\phi_n(\theta) = \exp\left( - \left| \frac{\theta' \Sigma \theta}{2} \right|^{\alpha/2} \right).$$
For example, when $n = 2$, $\alpha = 1$, and $G_1, G_2$ are iid $N(0, 2\sigma^2)$,
$$\phi_2(\theta_1, \theta_2) = \exp\left( -\sigma \left( \theta_1^2 + \theta_2^2 \right)^{1/2} \right)$$
and $f(x_1, x_2)$ is the spherically symmetric Cauchy density in $\mathbb{R}^2$.

2. Generalized t distributions

Suppose that $1/A^2$ has a Gamma($\alpha$, $\beta$) distribution. Equivalently,
$$f_A(\sigma) = \frac{2}{\beta^\alpha \Gamma(\alpha)} \, \frac{1}{\sigma^{2\alpha + 1}} \exp\left( \frac{-1}{\beta \sigma^2} \right).$$
Set the scale parameter $\beta = 2/c$. Then the density function of $X = AZ$ is given by
$$f(x) = \frac{k}{(x^2 + c)^{\alpha + 1/2}}, \qquad -\infty < x < \infty, \tag{1}$$
where
$$k = \frac{\Gamma(\alpha + \tfrac{1}{2})\, 2^\alpha}{\beta^\alpha \Gamma(\alpha)\, \pi^{1/2}}.$$
When $\alpha = n/2$ and $\beta = 2/n$, $f(x)$ is the t density with $n$ degrees of freedom.
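As a numerical sanity check of equation (1), the sketch below integrates the mixture g(x | σ) f_A(σ) over σ with a simple midpoint rule (after the substitution σ = exp(u)) and compares the result with the closed form and with the textbook t density for n = 3. The integration limits and step count are arbitrary choices made for this illustration.

```python
import math

def mixing_density(s, alpha, beta):
    """f_A(s): density of A when 1/A^2 ~ Gamma(alpha, scale beta)."""
    return (2.0 * s ** (-(2.0 * alpha + 1.0)) * math.exp(-1.0 / (beta * s * s))
            / (beta ** alpha * math.gamma(alpha)))

def closed_form(x, alpha, beta):
    """Equation (1): k / (x^2 + c)^(alpha + 1/2) with c = 2 / beta."""
    c = 2.0 / beta
    k = (math.gamma(alpha + 0.5) * 2.0 ** alpha
         / (beta ** alpha * math.gamma(alpha) * math.sqrt(math.pi)))
    return k / (x * x + c) ** (alpha + 0.5)

def by_quadrature(x, alpha, beta, lo=-8.0, hi=10.0, steps=20000):
    """Midpoint rule for the mixture integral, using sigma = exp(u)."""
    du = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        s = math.exp(lo + (i + 0.5) * du)
        g = math.exp(-0.5 * (x / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)
        total += g * mixing_density(s, alpha, beta) * s * du   # d sigma = s du
    return total

def t_density(x, n):
    """Student t density with n degrees of freedom."""
    return (math.gamma((n + 1) / 2.0) / (math.sqrt(n * math.pi) * math.gamma(n / 2.0))
            * (1.0 + x * x / n) ** (-(n + 1) / 2.0))

n = 3
for x in (0.0, 1.0, 2.5):
    print(x, closed_form(x, n / 2.0, 2.0 / n), by_quadrature(x, n / 2.0, 2.0 / n), t_density(x, n))
```

The three columns should agree to many digits, which confirms the constant k and the exponent α + 1/2.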

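Multivariate t

If the mixing density is given by
$$f_A(\sigma) = \frac{2}{\beta^\alpha \Gamma(\alpha)} \, \frac{1}{\sigma^{2\alpha + 1}} \exp\left( \frac{-1}{\beta \sigma^2} \right)$$
and $X = AZ$ with $Z \sim N(0, I)$ in $\mathbb{R}^n$, then
$$f_X(x) = \frac{k_1}{(k_2 + x'x)^{\alpha + n/2}},$$
where
$$k_1 = \left( \frac{2}{\beta} \right)^{\alpha} \frac{\Gamma(\alpha + \tfrac{n}{2})}{\pi^{n/2}\, \Gamma(\alpha)} \quad \text{and} \quad k_2 = \frac{2}{\beta}$$
are constants. In particular, when $\alpha = \tfrac{1}{2}$ and $\beta = 2$ (so that the exponent is $(n+1)/2$ and $k_2 = 1$), $f_X(x)$ is the multivariate spherically symmetric Cauchy density in $\mathbb{R}^n$.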
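The constants k_1 and k_2 can be checked against two known cases. The sketch below is an illustration only, with function names chosen here: for n = 1 the multivariate formula should reduce to the univariate density (1), and for α = 1/2, β = 2, n = 2 it should give the bivariate spherically symmetric Cauchy density (1 / (2π)) (1 + x'x)^(-3/2).

```python
import math

def univariate_density(x, alpha, beta):
    """Equation (1): k / (x^2 + c)^(alpha + 1/2), c = 2 / beta."""
    k = (math.gamma(alpha + 0.5) * 2.0 ** alpha
         / (beta ** alpha * math.gamma(alpha) * math.sqrt(math.pi)))
    return k / (x * x + 2.0 / beta) ** (alpha + 0.5)

def multivariate_density(x, alpha, beta):
    """k1 / (k2 + x'x)^(alpha + n/2) for a point x given as a sequence of length n."""
    n = len(x)
    k1 = ((2.0 / beta) ** alpha * math.gamma(alpha + n / 2.0)
          / (math.pi ** (n / 2.0) * math.gamma(alpha)))
    k2 = 2.0 / beta
    return k1 / (k2 + sum(v * v for v in x)) ** (alpha + n / 2.0)

def bivariate_cauchy(x1, x2):
    """Spherically symmetric Cauchy density in the plane."""
    return (1.0 + x1 * x1 + x2 * x2) ** (-1.5) / (2.0 * math.pi)

print(multivariate_density([0.7], 1.5, 2.0 / 3.0), univariate_density(0.7, 1.5, 2.0 / 3.0))
print(multivariate_density([1.0, 2.0], 0.5, 2.0), bivariate_cauchy(1.0, 2.0))
```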

Characterization Theorem

Definition. A function $h(x)$ on $(0, \infty)$ is completely monotone in $x$ if it is infinitely differentiable and
$$(-1)^m h^{(m)}(x) \geq 0 \quad \forall x \text{ and } \forall m = 0, 1, 2, \ldots.$$
Examples are $\frac{1}{x}$, $\frac{1}{x+1}$, and $\exp(-x)$.

Theorem 1 (Schoenberg (1938)). $X$ with density $f(x)$ is a VMN iff $h(x) = f(x^{1/2})$ is a completely monotone function. Equivalently, $X$ is a VMN iff $\phi_X$ is a real, even function such that $\phi_X(t^{1/2})$ is completely monotone on $(0, \infty)$.

Example: Exponential Power Family

The exponential power family consists of all distributions having densities of the form
$$f(x) = k \exp(-|x|^b), \qquad x \in \mathbb{R} \text{ and } b > 0.$$
See West (1987) and Box and Tiao (1973). A random variable $X$ with density $f(x)$ is a variance mixture of normals iff $0 < b \leq 2$, because $h(x) = f(\sqrt{x}) = k \exp(-x^{b/2})$ is completely monotone iff $0 < b \leq 2$.
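A quick numerical illustration of this criterion, offered as a sketch rather than a proof: a completely monotone function must have forward differences of alternating sign, so a sign violation in a low-order difference rules out complete monotonicity. The grid, step, maximum order and tolerance below are arbitrary choices for this example, and passing the check is only a necessary condition.

```python
import math

def alternating_differences_ok(h, xs, step, max_order=4, tol=1e-12):
    """Check (-1)^m * (m-th forward difference of h) >= 0 for m = 0..max_order."""
    for x in xs:
        for m in range(max_order + 1):
            diff = sum((-1) ** (m - i) * math.comb(m, i) * h(x + i * step)
                       for i in range(m + 1))
            if (-1) ** m * diff < -tol:
                return False
    return True

def h_for(b):
    """h(x) = exp(-x^(b/2)), i.e. f(sqrt(x)) with the constant k dropped."""
    return lambda x: math.exp(-x ** (b / 2.0))

xs = [0.05 * i for i in range(1, 101)]   # points in (0, 5]
for b in (0.5, 1.0, 2.0, 3.0):
    print(b, alternating_differences_ok(h_for(b), xs, step=0.05))
```

For b = 3 the second difference is negative near zero, where exp(-x^1.5) is concave, so the check fails, in line with the condition 0 < b ≤ 2.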

Approximating Scale Mixtures

Case 1: $A \in [a, b]$ where $0 < a < b < \infty$.

Suppose $X$, with density $f(x)$, is a scale mixture of normals whose random scale $A$ has a known distribution $\pi$. If $f(x)$ is difficult to compute, then we can approximate it by a finite mixture of the form
$$f^*(x) = \sum_{j=1}^{M} g(x \mid \sigma_j)\, \pi_j,$$
where $\pi_1, \ldots, \pi_M$ are point masses concentrated on $\sigma_1, \ldots, \sigma_M$ in $[a, b]$.

Questions
• How many terms should we take to approximate $f(x)$ by $f^*(x)$ within $\epsilon$?
• What values of $\pi_j$ and $\sigma_j$ should we choose?

Figure 1: $\left| \partial g / \partial \sigma \right|$ at a fixed $\sigma$ as a function of $x$. [Plot of abs(dg/dsigma) against x for x from -10 to 10.]

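Lemma 1. If $\sigma_1, \sigma_2 \in [a, \infty)$, then
$$|g(x \mid \sigma_1) - g(x \mid \sigma_2)| \leq \frac{1}{(2\pi)^{1/2} a^2}\, |\sigma_1 - \sigma_2| \quad \forall x \in \mathbb{R},$$
where $g(x \mid \sigma)$ is the $N(0, \sigma^2)$ density.

Proof. Fixing $\sigma$,
$$\left| \frac{\partial g(x \mid \sigma)}{\partial \sigma} \right| = \left| \frac{x^2 - \sigma^2}{\sigma^3} \right| g(x \mid \sigma)$$
is maximized at $x = 0$, where it takes the value $g(0 \mid \sigma)/\sigma = 1/((2\pi)^{1/2} \sigma^2)$. This value is largest at $\sigma = a$ for $\sigma \in [a, \infty)$. Hence, by the mean value theorem,
$$|g(x \mid \sigma_1) - g(x \mid \sigma_2)| \leq \left( \max |\partial g / \partial \sigma| \right) |\sigma_1 - \sigma_2| = \frac{|\sigma_1 - \sigma_2|}{(2\pi)^{1/2} a^2}.$$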
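Both claims in the proof of Lemma 1 can be checked numerically. The sketch below, with an arbitrary σ, grid and random test points chosen for illustration, locates the maximizer of |∂g/∂σ| over x and then tests the Lipschitz bound on random pairs σ1, σ2 ≥ a.

```python
import math
import random

def g(x, sigma):
    """Density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def abs_dg_dsigma(x, sigma):
    """|dg/dsigma| = |x^2 - sigma^2| / sigma^3 * g(x | sigma)."""
    return abs(x * x - sigma * sigma) / sigma ** 3 * g(x, sigma)

# 1) For fixed sigma, the maximum over x is attained at x = 0.
sigma = 1.5
grid = [(i - 1000) / 100.0 for i in range(2001)]        # x in [-10, 10]
peak_x = max(grid, key=lambda x: abs_dg_dsigma(x, sigma))
peak_val = abs_dg_dsigma(peak_x, sigma)
print(peak_x, peak_val, 1.0 / (math.sqrt(2.0 * math.pi) * sigma ** 2))

# 2) Lipschitz bound of Lemma 1 on random pairs with sigma_1, sigma_2 >= a.
a = 0.5
rng = random.Random(2)
bound_holds = True
for _ in range(10_000):
    s1, s2 = a + 5.0 * rng.random(), a + 5.0 * rng.random()
    x = rng.uniform(-10.0, 10.0)
    lhs = abs(g(x, s1) - g(x, s2))
    rhs = abs(s1 - s2) / (math.sqrt(2.0 * math.pi) * a * a)
    bound_holds = bound_holds and lhs <= rhs + 1e-15
print(bound_holds)
```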

Theorem 2. Suppose $X = AZ$, where $A$ is a positive random variable with distribution $\pi$ having support $[a, b]$. For any $\epsilon > 0$, there is a discrete distribution with at most $M = M(\epsilon, a, b)$ point masses $\pi_1, \ldots, \pi_M$ concentrated on $\sigma_1, \ldots, \sigma_M$ in $[a, b]$ which satisfies
$$\sup_{x \in \mathbb{R}} \left| f(x) - \sum_{j=1}^{M} g(x \mid \sigma_j)\, \pi_j \right| \leq \epsilon.$$

Proof. We adapted Lemma 1 from Byczkowski, Nolan, and Rajput (1993).
• Fix any $\epsilon > 0$, and $0 < a < b < \infty$.
• Define recursively
$$a_0 = a, \qquad a_j = a_{j-1} + (2\pi)^{1/2} a_{j-1}^2\, \epsilon. \tag{2}$$
The distances between the $a_j$'s are strictly increasing, so there exists an $M = M(\epsilon, a, b)$ such that $a_{2M} \geq b$.

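• Define a disjoint cover of $[a, b]$: $I_1 = [a_0, a_2]$, $I_2 = (a_2, a_4]$, ..., $I_M = (a_{2M-2}, b]$.
• Set $\pi_j = \pi(I_j)$ and $\sigma_j = \min(a_{2j-1}, b)$, $j = 1, \ldots, M$.
• Note that $g(x \mid \sigma_j)\, \pi_j = \int_{I_j} g(x \mid \sigma_j)\, \pi(d\sigma)$.

Then,
$$\left| f(x) - \sum_{j=1}^{M} g(x \mid \sigma_j)\, \pi_j \right| = \left| \int_{[a,b]} g(x \mid \sigma)\, \pi(d\sigma) - \sum_{j=1}^{M} \int_{I_j} g(x \mid \sigma_j)\, \pi(d\sigma) \right|$$
$$= \left| \sum_{j=1}^{M} \int_{I_j} \left( g(x \mid \sigma) - g(x \mid \sigma_j) \right) \pi(d\sigma) \right|$$
$$\leq \sum_{j=1}^{M} \int_{I_j} \left| g(x \mid \sigma) - g(x \mid \sigma_j) \right| \pi(d\sigma)$$
$$\leq \sum_{j=1}^{M} \int_{I_j} \epsilon\, \pi(d\sigma) = \epsilon.$$
The last inequality follows from Lemma 1: every $\sigma \in I_j$ lies within one step of recursion (2) of $\sigma_j$, so $|g(x \mid \sigma) - g(x \mid \sigma_j)| \leq \epsilon$.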
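The construction in the proof translates into a short routine. The sketch below follows recursion (2) and the choice of I_j, σ_j and π_j given above; the uniform mixing measure on [1, 3], the value of ε, the evaluation grid and the quadrature used as a stand-in for the exact density are all assumptions made for this illustration.

```python
import math

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def build_grid(a, b, eps):
    """Points a_0, ..., a_{2M} from recursion (2), stopping once a_{2M} >= b."""
    pts = [a]
    while pts[-1] < b or len(pts) % 2 == 0:
        prev = pts[-1]
        pts.append(prev + math.sqrt(2.0 * math.pi) * prev ** 2 * eps)
    return pts

def finite_mixture(a, b, eps, cdf):
    """Return (sigmas, weights): sigma_j = min(a_{2j-1}, b), pi_j = pi(I_j)."""
    pts = build_grid(a, b, eps)
    m = (len(pts) - 1) // 2
    sigmas, weights = [], []
    for j in range(1, m + 1):
        lo, hi = pts[2 * j - 2], min(pts[2 * j], b)
        sigmas.append(min(pts[2 * j - 1], b))
        weights.append(cdf(hi) - cdf(lo))
    return sigmas, weights

def f_star(x, sigmas, weights):
    """Finite mixture approximation f*(x)."""
    return sum(w * normal_pdf(x, s) for s, w in zip(sigmas, weights))

# Hypothetical mixing measure: A uniform on [a, b] (continuous, so endpoints carry no mass).
a, b, eps = 1.0, 3.0, 0.01
uniform_cdf = lambda s: min(max((s - a) / (b - a), 0.0), 1.0)

def f_reference(x, steps=4000):
    """Midpoint-rule value of the integral of g(x | sigma) over [a, b], divided by (b - a)."""
    d = (b - a) / steps
    return sum(normal_pdf(x, a + (i + 0.5) * d) for i in range(steps)) * d / (b - a)

sigmas, weights = finite_mixture(a, b, eps, uniform_cdf)
xs = [-6.0 + 0.3 * i for i in range(41)]
max_err = max(abs(f_reference(x) - f_star(x, sigmas, weights)) for x in xs)
print(len(sigmas), sum(weights), max_err)
```

The number of components is driven by ε and, through the a_{j-1}^2 factor in (2), mostly by how close a is to zero: steps are tiny near small a and grow quickly as σ increases.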

Case 2: $A \in (0, \infty)$.

We can write $f(x)$ as a sum of three integrals:
$$\int_0^\infty g(x \mid \sigma)\, \pi(d\sigma) = \int_0^a (\cdot) + \int_a^b (\cdot) + \int_b^\infty (\cdot). \tag{3}$$
The following lemma shows that in all cases where $f(0)$ is bounded, there exist an $a$ and a $b$ such that the first and last integrals can be made arbitrarily small, and the middle integral can be approximated using Theorem 2.

Lemma 2. Let $X = AZ$ be a scale mixture of normals, and $\epsilon > 0$.
(a) If $f(0) < \infty$, then there exists an $a > 0$ such that $\int_0^a g(x \mid \sigma)\, \pi(d\sigma) < \epsilon$ for all $x \in \mathbb{R}$.
(b) There exists a $b > 0$ such that $\int_b^\infty g(x \mid \sigma)\, \pi(d\sigma) < \epsilon$ for all $x \in \mathbb{R}$.

Proof of (a). Since $g(x \mid \sigma) \leq g(0 \mid \sigma) = k \sigma^{-1}$ with $k = (2\pi)^{-1/2}$,
$$f(x) = \int_0^\infty g(x \mid \sigma)\, \pi(d\sigma) \leq k \int_0^\infty \sigma^{-1}\, \pi(d\sigma) = f(0) < \infty.$$
Let $h(a) = \int_0^a g(x \mid \sigma)\, \pi(d\sigma)$. Then,
$$h(a) \leq k \int_0^a \sigma^{-1}\, \pi(d\sigma) = k \int_0^\infty 1_{(0,a)}(\sigma)\, \sigma^{-1}\, \pi(d\sigma).$$
Let $a_n$ be any sequence that converges to 0. Then $1_{(0,a_n)}(\sigma)\, \sigma^{-1} \to 0$ pointwise on $(0, \infty)$ and $1_{(0,a_n)}(\sigma)\, \sigma^{-1} \leq \sigma^{-1} \in L^1(\pi)$. So, $h(a_n) \to 0$ by the Dominated Convergence Theorem. The bound does not depend on $x$, so the convergence is uniform in $x$.

Proof of (b). Let $h(b) = \int_b^\infty g(x \mid \sigma)\, \pi(d\sigma)$. Then,
$$h(b) \leq k \int_b^\infty \sigma^{-1}\, \pi(d\sigma) = k \int_0^\infty 1_{(b,\infty)}(\sigma)\, \sigma^{-1}\, \pi(d\sigma).$$
Let $b_n$ be any sequence that converges to $\infty$. Then $1_{(b_n,\infty)}(\sigma)\, \sigma^{-1} \to 0$ pointwise, and since the integrand is dominated by $1/b_n$, which is bounded and therefore integrable with respect to the probability measure $\pi$, the result holds by applying the Dominated Convergence Theorem.

Figure 2: Gamma and square root of Inverted Gamma with $\alpha = .5$ and $\beta = 2$. [Two panels, each plotting f(0.5, 2, x) against x for x from 0 to 10.]

Approximating the Cauchy Density

When $\alpha = \tfrac{1}{2}$ and $\beta = 2$, the generalized t distribution is the standard Cauchy. $\pi$ is the square root of Inverted Gamma with parameters $\alpha$ and $\beta$. In this case, the corresponding Gamma has a vertical asymptote at 0 and it is decreasing on $\Theta = [a, b]$.
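For this Cauchy case the tail bounds used in the proofs of Lemma 2 have closed forms, which gives a concrete way to pick a and b. The sketch below is a worked example under the mixing density f_A with α = 1/2 and β = 2: integrating k σ^(-1) f_A(σ) gives exp(-1/(2a^2)) / π on (0, a) and (1 - exp(-1/(2b^2))) / π on (b, ∞). The function names are chosen here for illustration.

```python
import math

def lower_tail_bound(a):
    """k * integral over (0, a) of sigma^{-1} pi(d sigma) = exp(-1/(2 a^2)) / pi."""
    return math.exp(-1.0 / (2.0 * a * a)) / math.pi

def upper_tail_bound(b):
    """k * integral over (b, inf) of sigma^{-1} pi(d sigma) = (1 - exp(-1/(2 b^2))) / pi."""
    return -math.expm1(-1.0 / (2.0 * b * b)) / math.pi

# The two pieces split f(0) = 1/pi, the standard Cauchy density at 0.
print(lower_tail_bound(1.0) + upper_tail_bound(1.0), 1.0 / math.pi)

# Truncation points a = 0.05 and b = 50 leave very little mass outside [a, b].
print(lower_tail_bound(0.05), upper_tail_bound(50.0))
```

Both bounds hold uniformly in x, since they come from replacing g(x | σ) by its maximum k / σ.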

Example

A comparison between the finite and infinite mixture is made for different combinations of $a$, $b$, and $\epsilon$. The maximum difference between the actual density and the approximated density was found based on $a = .05$, $b = 50$, and $\epsilon = .03$ on a grid of 101 equally spaced points. The maximum value for the relative distance between $f$ and $f^*$ is around .028.
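A comparison of this kind can be sketched as follows. This is not the authors' code: it assumes the grid of 101 points spans [-4, 4] (the range shown in Figure 3), uses the fact that 1/A^2 ~ Gamma(1/2, 2) is chi-square with one degree of freedom so that P(A ≤ s) = erfc(1 / (s √2)), and reports the maximum absolute difference, which need not match the figure quoted above.

```python
import math

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def mixing_cdf(s):
    """P(A <= s) when 1/A^2 is chi-square with 1 degree of freedom."""
    return math.erfc(1.0 / (s * math.sqrt(2.0)))

def finite_mixture(a, b, eps, cdf):
    """Grid from recursion (2); sigma_j = min(a_{2j-1}, b), pi_j = pi(I_j)."""
    pts = [a]
    while pts[-1] < b or len(pts) % 2 == 0:
        pts.append(pts[-1] + math.sqrt(2.0 * math.pi) * pts[-1] ** 2 * eps)
    sigmas, weights = [], []
    for j in range(1, (len(pts) - 1) // 2 + 1):
        lo, hi = pts[2 * j - 2], min(pts[2 * j], b)
        sigmas.append(min(pts[2 * j - 1], b))
        weights.append(cdf(hi) - cdf(lo))
    return sigmas, weights

def f_star(x, sigmas, weights):
    """Finite mixture approximation of the density."""
    return sum(w * normal_pdf(x, s) for s, w in zip(sigmas, weights))

def cauchy_pdf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))

a, b, eps = 0.05, 50.0, 0.03
sigmas, weights = finite_mixture(a, b, eps, mixing_cdf)
xs = [-4.0 + 8.0 * i / 100 for i in range(101)]     # assumed evaluation grid
max_abs_err = max(abs(cauchy_pdf(x) - f_star(x, sigmas, weights)) for x in xs)
print(len(sigmas), sum(weights), max_abs_err)
```

The weights sum to slightly less than 1 because the mass of π outside [a, b] is dropped; by Theorem 2 and Lemma 2 the total error is at most ε plus the two small tail terms.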

Figure 3: $a = .05$, $b = 50$ and $\epsilon = .03$. [Plot of f(x) and f^(x) against x for x from -4 to 4.]
