Exponential Families
Leila Wehbe
March 19, 2013
Exponential Families
Functions of the form
$p(x, \theta) = \exp(\langle \phi(x), \theta \rangle - g(\theta))$
where $\phi(x)$ is a sufficient statistic: "no other statistic which can be calculated from the same sample provides any additional information as to the value of the parameter".
Exponential Families
$p(x, \theta) = \exp(\langle \phi(x), \theta \rangle - g(\theta))$
$g(\theta) = \log \sum_x \exp(\langle \phi(x), \theta \rangle)$ is the log partition function.
$g(\theta) = \log \int \exp(\langle \phi(x), \theta \rangle) \, dx$ if $x$ is continuous.
This follows from requiring that the density integrate to 1:
$\int p(x, \theta) \, dx = \int \exp(\langle \phi(x), \theta \rangle - g(\theta)) \, dx = \frac{1}{\exp(g(\theta))} \int \exp(\langle \phi(x), \theta \rangle) \, dx = 1$
$\exp(g(\theta)) = \int \exp(\langle \phi(x), \theta \rangle) \, dx$
$g(\theta) = \log \int \exp(\langle \phi(x), \theta \rangle) \, dx$
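The normalization argument can be checked numerically on a small discrete example. The sketch below assumes a scalar statistic $\phi(x) = x$ on a hypothetical support {0, 1, 2, 3} and an arbitrary value of $\theta$; the function names are illustrative, not part of the slides.

```python
import math

def log_partition(phi_values, theta):
    """g(theta) = log sum_x exp(phi(x) * theta), for scalar phi and theta."""
    return math.log(sum(math.exp(phi * theta) for phi in phi_values))

def density(phi, theta, g):
    """p(x, theta) = exp(phi(x) * theta - g(theta))."""
    return math.exp(phi * theta - g)

# Hypothetical example: x in {0, 1, 2, 3}, phi(x) = x, theta = 0.7.
support = [0, 1, 2, 3]
theta = 0.7
g = log_partition(support, theta)

# Subtracting g(theta) in the exponent makes the probabilities sum to 1
# (up to floating-point rounding).
total = sum(density(x, theta, g) for x in support)
print(total)
```

Any other finite support or value of $\theta$ should give the same total, since $g(\theta)$ is defined precisely to make this hold.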
Example: Bernoulli distribution
$p(x) = p^x (1 - p)^{(1 - x)}$
$\phi(x) = x$
$\theta = \log\left(\frac{p}{1 - p}\right)$
$g(\theta) = \log(1 + e^\theta)$
$p(x = 1) = \frac{e^\theta}{1 + e^\theta} = \frac{1}{e^{-\theta} + 1}$ and $p(x = 0) = \frac{1}{1 + e^\theta}$
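As a quick check of the Bernoulli parameterization, the following sketch converts a success probability to the natural parameter $\theta = \log(p / (1 - p))$ and evaluates $\exp(x\theta - g(\theta))$ with $g(\theta) = \log(1 + e^\theta)$. The value p = 0.3 and the helper names are assumptions chosen for illustration.

```python
import math

def bernoulli_natural(p):
    """Natural parameter theta = log(p / (1 - p)); requires 0 < p < 1."""
    return math.log(p / (1.0 - p))

def bernoulli_log_partition(theta):
    """g(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

def bernoulli_pmf(x, theta):
    """p(x) = exp(x * theta - g(theta)) for x in {0, 1}."""
    return math.exp(x * theta - bernoulli_log_partition(theta))

p = 0.3  # hypothetical success probability
theta = bernoulli_natural(p)
prob_one = bernoulli_pmf(1, theta)   # should recover p
prob_zero = bernoulli_pmf(0, theta)  # should recover 1 - p
print(prob_one, prob_zero)
```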
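Example: Normal distribution
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
$\phi(x) = [x, x^2]$
$\theta = \left[\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right]$
$g(\theta) = -\frac{\theta_1^2}{4\theta_2} - \frac{1}{2}\log(-2\theta_2)$
(The constant factor $\frac{1}{\sqrt{2\pi}}$ does not depend on $\theta$ and is left out of $g(\theta)$.)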
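The normal example can be verified by evaluating the density in natural-parameter form and comparing it with the usual formula. This sketch assumes $\theta = (\mu / \sigma^2, -1 / (2\sigma^2))$ and treats the $1/\sqrt{2\pi}$ factor as a constant kept outside $g(\theta)$; the function names and test values are hypothetical.

```python
import math

def normal_natural(mu, sigma2):
    """Natural parameters (theta1, theta2) = (mu / sigma^2, -1 / (2 sigma^2))."""
    return mu / sigma2, -1.0 / (2.0 * sigma2)

def normal_log_partition(theta1, theta2):
    """g(theta) = -theta1^2 / (4 theta2) - 0.5 * log(-2 theta2); needs theta2 < 0."""
    return -theta1 ** 2 / (4.0 * theta2) - 0.5 * math.log(-2.0 * theta2)

def normal_pdf_natural(x, theta1, theta2):
    """Density via <phi(x), theta> - g(theta) with phi(x) = (x, x^2)."""
    log_base = -0.5 * math.log(2.0 * math.pi)  # constant 1 / sqrt(2 pi)
    g = normal_log_partition(theta1, theta2)
    return math.exp(log_base + x * theta1 + x * x * theta2 - g)

def normal_pdf_direct(x, mu, sigma2):
    """Textbook normal density, used only for comparison."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

mu, sigma2 = 1.5, 0.8  # hypothetical mean and variance
t1, t2 = normal_natural(mu, sigma2)
pdf_nat = normal_pdf_natural(0.4, t1, t2)
pdf_dir = normal_pdf_direct(0.4, mu, sigma2)
print(pdf_nat, pdf_dir)
```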
Log Partition Function generates cumulants
$\partial_\theta g(\theta) = \partial_\theta \log \int \exp\langle \phi(x), \theta \rangle \, dx$
$= \frac{\int \phi(x) \exp\langle \phi(x), \theta \rangle \, dx}{\int \exp\langle \phi(x), \theta \rangle \, dx}$
$= \int \phi(x) \exp(\langle \phi(x), \theta \rangle - g(\theta)) \, dx$
$= E[\phi(x)]$
Log Partition Function generates cumulants
$\partial^2_\theta g(\theta) = \partial_\theta \int \phi(x) \exp(\langle \phi(x), \theta \rangle - g(\theta)) \, dx$
$= \int \phi(x) \left[\phi(x)^\top - \partial_\theta g(\theta)^\top\right] \exp(\langle \phi(x), \theta \rangle - g(\theta)) \, dx$
$= E[\phi(x) \phi(x)^\top] - E[\phi(x)] E[\phi(x)]^\top$
Example: Bernoulli distribution
$g(\theta) = \log(1 + e^\theta)$
$E[x] = \partial_\theta g(\theta) = \frac{e^\theta}{1 + e^\theta} = p(x = 1)$
$Var[x] = \partial^2_\theta g(\theta) = \frac{e^\theta}{[1 + e^\theta]^2} = p(x = 1) \, p(x = 0)$
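The claim that derivatives of the log partition function give the mean and variance can be checked with finite differences. The sketch below does this for the Bernoulli case $g(\theta) = \log(1 + e^\theta)$; the step sizes and the value of $\theta$ are assumptions, and central differences are only one of several ways to approximate the derivatives.

```python
import math

def g(theta):
    """Bernoulli log partition function g(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

def first_derivative(f, t, h=1e-5):
    """Central-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def second_derivative(f, t, h=1e-4):
    """Central-difference approximation of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

theta = 0.4  # hypothetical natural parameter
p1 = math.exp(theta) / (1.0 + math.exp(theta))  # p(x = 1)

mean_fd = first_derivative(g, theta)   # approximates E[x] = p(x = 1)
var_fd = second_derivative(g, theta)   # approximates Var[x] = p(x = 1) p(x = 0)
print(mean_fd, p1)
print(var_fd, p1 * (1.0 - p1))
```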
Example: Poisson distribution
$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
$\phi(x) = x$
$e^\theta = \lambda$
$g(\theta) = e^\theta = \lambda$
$p(x) = \frac{1}{x!} \exp(x\theta - e^\theta) = \frac{[e^\theta]^x e^{-e^\theta}}{x!} = \frac{\lambda^x e^{-\lambda}}{x!}$
$E[x] = \partial_\theta g(\theta) = e^\theta = \lambda$
$Var[x] = \partial^2_\theta g(\theta) = e^\theta = \lambda$
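For the Poisson case, the mean and variance can also be checked by summing over the support directly. This sketch assumes $\theta = \log \lambda$ and $g(\theta) = e^\theta$, truncates the infinite sum at 100 terms (an assumption that is safe only for small $\lambda$), and uses `math.lgamma(x + 1)` for $\log x!$.

```python
import math

def poisson_pmf_natural(x, theta):
    """p(x) = exp(x * theta - exp(theta)) / x!, written in natural-parameter form."""
    return math.exp(x * theta - math.exp(theta) - math.lgamma(x + 1))

lam = 3.5              # hypothetical rate
theta = math.log(lam)  # natural parameter, so that exp(theta) = lambda
xs = range(100)        # truncated support; the tail is negligible for this lambda

mean = sum(x * poisson_pmf_natural(x, theta) for x in xs)
second_moment = sum(x * x * poisson_pmf_natural(x, theta) for x in xs)
variance = second_moment - mean ** 2

# Both should be close to lambda, matching the first and second
# derivatives of g(theta) = exp(theta).
print(mean, variance)
```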
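MLE
Write down the log-likelihood of $n$ independent samples:
$\log p(X | \theta) = \log \prod_{i=1}^n p(x_i | \theta) = \sum_{i=1}^n \left[\langle \phi(x_i), \theta \rangle - g(\theta)\right] = \left\langle \sum_{i=1}^n \phi(x_i), \theta \right\rangle - n \, g(\theta)$
Differentiate:
$\partial_\theta \log p(X; \theta) = n \left[\frac{1}{n} \sum_{i=1}^n \phi(x_i) - E[\phi(x)]\right]$
$\frac{1}{n} \sum_{i=1}^n \phi(x_i)$ is the sample average. Setting the derivative to zero, the MLE is the $\theta$ for which $E[\phi(x)]$ equals the sample average.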
MLE
For a Bernoulli distribution: $\phi(x) = x$ and $g(\theta) = \log(1 + e^\theta)$
$E[x] = \frac{1}{1 + e^{-\theta}} = \frac{1}{n} \sum_i x_i$, where $n$ is the number of samples.
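The moment-matching condition for the Bernoulli MLE can be solved in closed form: inverting $1 / (1 + e^{-\theta})$ at the sample mean gives the log-odds of the sample mean. The sketch below uses a small hypothetical data set and assumes the sample mean lies strictly between 0 and 1 (otherwise the MLE of $\theta$ is not finite).

```python
import math

def bernoulli_mle_theta(samples):
    """Solve 1 / (1 + exp(-theta)) = sample mean for theta.

    Assumes the samples contain at least one 0 and at least one 1.
    """
    mean = sum(samples) / len(samples)
    return math.log(mean / (1.0 - mean))

samples = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical observations
theta_hat = bernoulli_mle_theta(samples)

# At the MLE, the model expectation E[x] matches the sample average.
model_mean = 1.0 / (1.0 + math.exp(-theta_hat))
sample_mean = sum(samples) / len(samples)
print(theta_hat, model_mean, sample_mean)
```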
Conjugate Priors
Incorporating a prior is similar to adding fake data:
$p(\theta) \propto p(X_{fake} | \theta)$
$p(\theta | X) \propto p(X | \theta) \, p(X_{fake} | \theta) = p(X \cup X_{fake} | \theta)$
With $m$ observations and a prior with parameters $\mu_0$ and $m_0$:
$p(\theta | \mu_0, m_0, X) \propto p(\theta | \mu_0, m_0) \, p(X | \theta)$
$\propto \exp\left(\left\langle m_0 \mu_0 + \sum_{i=1}^m \phi(x_i), \theta \right\rangle - (m_0 + m) \, g(\theta)\right)$
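The "fake data" reading of the prior can be illustrated with a Bernoulli example. Maximizing the posterior exponent above over $\theta$ sets $E[\phi(x)]$ to $(m_0 \mu_0 + \sum_i \phi(x_i)) / (m_0 + m)$, which is the same number obtained by averaging the real data together with $m_0$ fake observations whose mean is $\mu_0$. The data, the prior values, and the function name in this sketch are hypothetical.

```python
def smoothed_mean(phi_values, mu0, m0):
    """Posterior-mode estimate of E[phi(x)]: (m0 * mu0 + sum phi(x_i)) / (m0 + m)."""
    m = len(phi_values)
    return (m0 * mu0 + sum(phi_values)) / (m0 + m)

data = [1, 1, 1, 0, 1]  # hypothetical observations, phi(x) = x
fake = [0, 1]           # m0 = 2 fake observations with mean mu0 = 0.5

# Estimate using the prior parameters directly.
with_prior = smoothed_mean(data, mu0=0.5, m0=2)

# Plain sample average after appending the fake observations to the data.
combined = data + fake
with_fake_data = sum(combined) / len(combined)

print(with_prior, with_fake_data)
```

The two numbers agree because the prior enters the posterior only through the pseudo-count $m_0$ and the pseudo-sum $m_0 \mu_0$.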
Conjugate Priors
The prior is also in the exponential family:
$p(\theta | \mu_0, m_0) = \exp(m_0 \langle \mu_0, \theta \rangle - m_0 \, g(\theta) - h(m_0 \mu_0, m_0))$
$= \exp(\langle \phi(\theta), \rho \rangle - h(\rho))$
where $\phi(\theta) = (\theta, -g(\theta))$ and $\rho = (m_0 \mu_0, m_0)$.
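MAP with generalized Laplace smoothing
With $n$ observations and a prior carrying $m$ pseudo-observations with mean $\mu_0$, the estimate of $E[\phi(x)]$ becomes
$\frac{1}{n + m} \sum_{i=1}^n \phi(x_i) + \frac{m}{n + m} \mu_0$
For the normal distribution:
MLE: $\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i$ and $\sigma^2 = \frac{1}{n} \sum_{i=1}^n x_i^2 - \hat{\mu}^2$
MAP: $\hat{\mu} = \frac{1}{n + n_0} \sum_{i=1}^n x_i + \frac{n_0}{n + n_0} \mu_0$ and $\sigma^2 = \frac{1}{n + n_0} \sum_{i=1}^n x_i^2 + \frac{n_0}{n + n_0} I - \hat{\mu}^2$
where $n_0$ is the prior pseudo-count (written $m$ in the general formula).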
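A minimal sketch of the smoothing rule for the normal case, assuming $\phi(x) = [x, x^2]$ and a prior expressed as $m$ pseudo-observations with hypothetical first and second moments `prior_moments = (0.0, 1.0)`. Setting `m = 0` recovers the MLE. The function name, the prior values, and the data are illustrative assumptions rather than anything fixed by the slides.

```python
def smoothed_normal_estimate(xs, prior_moments=(0.0, 1.0), m=0.0):
    """Smooth the sample moments of phi(x) = (x, x^2) toward prior moments.

    Each moment is (sum over data + m * prior moment) / (n + m).
    Returns (mean estimate, variance estimate = second moment - mean^2).
    """
    n = len(xs)
    first = (sum(xs) + m * prior_moments[0]) / (n + m)
    second = (sum(x * x for x in xs) + m * prior_moments[1]) / (n + m)
    return first, second - first ** 2

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical observations

# m = 0: no prior, so this is the maximum likelihood estimate.
mle_mu, mle_var = smoothed_normal_estimate(xs)

# m = 1: one pseudo-observation with first moment 0 and second moment 1.
map_mu, map_var = smoothed_normal_estimate(xs, prior_moments=(0.0, 1.0), m=1.0)

print(mle_mu, mle_var)
print(map_mu, map_var)
```

With more data the pseudo-observations carry less weight, so the smoothed estimate approaches the MLE as $n$ grows.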