Advanced Simulation - Lecture 3
Patrick Rebeschini
January 22nd, 2018
From a statistical problem to a sampling problem

From a statistical model you get a likelihood function and a prior on the parameters. Applying Bayes' rule, you are interested in
\[
\pi(\theta \mid \text{observations}) = \frac{L(\text{observations}; \theta)\, p(\theta)}{\int_\Theta L(\text{observations}; \theta)\, p(\theta)\, d\theta}.
\]
Inference $\equiv$ integral w.r.t. the posterior distribution. Integrals can be approximated by Monte Carlo. For Monte Carlo you need samples.

Today: inversion, transformation, composition, rejection.
Inversion Method

Consider a real-valued random variable $X$ and its associated cumulative distribution function (cdf) $F(x) = \mathbb{P}(X \leq x)$. The cdf $F : \mathbb{R} \to [0,1]$ is
- increasing; i.e. if $x \leq y$ then $F(x) \leq F(y)$,
- right continuous; i.e. $F(x + \varepsilon) \to F(x)$ as $\varepsilon \to 0^+$,
- such that $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to +\infty$.

We define the generalised inverse
\[
F^-(u) = \inf\{x \in \mathbb{R} : F(x) \geq u\},
\]
also known as the quantile function.
Inversion Method

Figure: Cumulative distribution function $F$ and representation of the inverse cumulative distribution function $F^-(u)$ at a level $u$.
Inversion Method

Proposition. Let $F$ be a cdf and $U \sim \mathcal{U}[0,1]$. Then $X = F^-(U)$ has cdf $F$.

In other words, to sample from a distribution with cdf $F$, we can sample $U \sim \mathcal{U}[0,1]$ and then return $F^-(U)$.

Proof. $F^-(u) \leq x \Leftrightarrow u \leq F(x)$, so for $U \sim \mathcal{U}[0,1]$ we have
\[
\mathbb{P}\left(F^-(U) \leq x\right) = \mathbb{P}(U \leq F(x)) = F(x).
\]
Examples

Exponential distribution. If $F(x) = 1 - e^{-\lambda x}$, then $F^-(u) = F^{-1}(u) = -\log(1-u)/\lambda$. Thus when $U \sim \mathcal{U}[0,1]$, $-\log(1-U)/\lambda \sim \mathcal{E}xp(\lambda)$ and $-\log(U)/\lambda \sim \mathcal{E}xp(\lambda)$.

Discrete distribution. Assume $X$ takes values $x_1 < x_2 < \cdots$ with probabilities $p_1, p_2, \ldots$, so
\[
F(x) = \sum_{k :\, x_k \leq x} p_k, \qquad
F^-(u) = x_k \text{ for } p_1 + \cdots + p_{k-1} < u \leq p_1 + \cdots + p_k.
\]
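A minimal sketch of both examples in Python (the function names and the use of NumPy's random generator are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(rate, size):
    """Inversion: F^-(u) = -log(1-u)/rate; -log(U)/rate works equally well
    since U and 1-U have the same distribution."""
    u = rng.uniform(size=size)
    return -np.log(1.0 - u) / rate

def sample_discrete(values, probs, size):
    """Inversion for a discrete distribution: return x_k such that
    p_1 + ... + p_{k-1} < U <= p_1 + ... + p_k."""
    cdf = np.cumsum(probs)
    u = rng.uniform(size=size)
    idx = np.searchsorted(cdf, u)      # first k with cdf[k] >= u
    return np.asarray(values)[idx]

print(sample_exponential(2.0, 5))
print(sample_discrete([1, 2, 3], [0.2, 0.5, 0.3], 5))
```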
Transformation Method

Let $Y \sim q$ be a $\mathcal{Y}$-valued random variable that we can simulate (e.g., by inversion), and let $X \sim \pi$ be the $\mathcal{X}$-valued random variable that we wish to simulate.

It may be that we can find a function $\varphi : \mathcal{Y} \to \mathcal{X}$ with the property that if we simulate $Y \sim q$ and then set $X = \varphi(Y)$, then we get $X \sim \pi$.

Inversion is a special case of this idea.
Transformation Method

Gamma distribution. Let $Y_i$, $i = 1, 2, \ldots, \alpha$, be i.i.d. with $Y_i \sim \mathcal{E}xp(1)$ and $X = \beta^{-1} \sum_{i=1}^{\alpha} Y_i$; then $X \sim \mathcal{G}a(\alpha, \beta)$.

Proof. The moment generating function of $X$ is
\[
\mathbb{E}\left(e^{tX}\right) = \prod_{i=1}^{\alpha} \mathbb{E}\left(e^{\beta^{-1} t Y_i}\right) = (1 - t/\beta)^{-\alpha},
\]
which is the MGF of the gamma density $\pi(x) \propto x^{\alpha-1}\exp(-\beta x)$ with parameters $\alpha, \beta$.

Beta distribution. See Lecture Notes.
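A short sketch of this construction (which only applies when $\alpha$ is a positive integer); the names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gamma(alpha, beta, size):
    """Ga(alpha, beta) for integer alpha, as beta^{-1} times a sum of alpha
    i.i.d. Exp(1) variables, each simulated by inversion."""
    u = rng.uniform(size=(size, alpha))
    y = -np.log(u)                    # Exp(1) draws
    return y.sum(axis=1) / beta

x = sample_gamma(alpha=3, beta=2.0, size=100_000)
print(x.mean(), 3 / 2.0)              # empirical vs theoretical mean alpha/beta
```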
Transformation Method - Box-Muller Algorithm

Gaussian distribution. Let $U_1 \sim \mathcal{U}[0,1]$ and $U_2 \sim \mathcal{U}[0,1]$ be independent and set
\[
R = \sqrt{-2\log(U_1)}, \qquad \vartheta = 2\pi U_2.
\]
We have
\[
X = R\cos\vartheta \sim \mathcal{N}(0,1), \qquad Y = R\sin\vartheta \sim \mathcal{N}(0,1).
\]
Indeed $R^2 \sim \mathcal{E}xp\left(\tfrac{1}{2}\right)$ and $\vartheta \sim \mathcal{U}[0, 2\pi]$, so
\[
q\left(r^2, \theta\right) = \frac{1}{2}\exp\left(-r^2/2\right)\,\frac{1}{2\pi}.
\]
Transformation Method - Box-Muller Algorithm

Bijection:
\[
(x, y) = \left(\sqrt{r^2}\cos\theta,\ \sqrt{r^2}\sin\theta\right)
\;\Leftrightarrow\;
\left(r^2, \theta\right) = \left(x^2 + y^2,\ \arctan(y/x)\right),
\]
so
\[
\pi(x, y) = q\left(r^2, \theta\right)\left|\det \frac{\partial\left(r^2, \theta\right)}{\partial(x, y)}\right|,
\]
where
\[
\left|\det \frac{\partial\left(r^2, \theta\right)}{\partial(x, y)}\right|
= \left|\det \begin{pmatrix} \dfrac{\cos\theta}{2r} & -r\sin\theta \\[4pt] \dfrac{\sin\theta}{2r} & r\cos\theta \end{pmatrix}\right|^{-1} = 2.
\]
Hence we have
\[
\pi(x, y) = \frac{1}{2\pi}\exp\left(-\left(x^2 + y^2\right)/2\right).
\]
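A small Box-Muller sketch, assuming the same NumPy conventions as the earlier snippets:

```python
import numpy as np

rng = np.random.default_rng(0)

def box_muller(size):
    """Box-Muller: two independent N(0,1) samples from two U[0,1] samples."""
    u1 = rng.uniform(size=size)
    u2 = rng.uniform(size=size)
    r = np.sqrt(-2.0 * np.log(u1))
    theta = 2.0 * np.pi * u2
    return r * np.cos(theta), r * np.sin(theta)

x, y = box_muller(100_000)
print(x.mean(), x.std(), y.mean(), y.std())   # close to 0, 1, 0, 1
```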
Transformation Method - Multivariate Normal

Let $Z = (Z_1, \ldots, Z_d)$ with $Z_i$ i.i.d. $\sim \mathcal{N}(0,1)$. Let $L$ be a real invertible $d \times d$ matrix satisfying $L L^{\mathsf{T}} = \Sigma$, and $X = LZ + \mu$. Then $X \sim \mathcal{N}(\mu, \Sigma)$.

We have indeed $q(z) = (2\pi)^{-d/2}\exp\left(-\tfrac{1}{2} z^{\mathsf{T}} z\right)$ and
\[
\pi(x) = q(z)\left|\det \partial z / \partial x\right|,
\]
where $\partial z / \partial x = L^{-1}$ and $\left|\det\left(L^{-1}\right)\right| = \det(\Sigma)^{-1/2}$. Additionally,
\[
z^{\mathsf{T}} z = (x - \mu)^{\mathsf{T}}\left(L^{-1}\right)^{\mathsf{T}} L^{-1}(x - \mu) = (x - \mu)^{\mathsf{T}}\,\Sigma^{-1}(x - \mu).
\]
In practice, use a Cholesky factorisation $\Sigma = L L^{\mathsf{T}}$ where $L$ is a lower triangular matrix.
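A sketch of the Cholesky-based construction; the covariance matrix below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mvn(mu, sigma, size):
    """X = L Z + mu with L the lower-triangular Cholesky factor of Sigma."""
    mu = np.asarray(mu)
    L = np.linalg.cholesky(sigma)          # Sigma = L L^T
    z = rng.standard_normal((size, len(mu)))
    return mu + z @ L.T                    # each row is one draw of X

sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = sample_mvn(mu=[1.0, -1.0], sigma=sigma, size=100_000)
print(np.cov(x, rowvar=False))             # close to sigma
```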
Sampling via Composition

Assume we have a joint pdf $\pi_{X,Y}$ with marginal $\pi$; i.e.
\[
\pi(x) = \int \pi_{X,Y}(x, y)\, dy,
\]
where $\pi_{X,Y}(x, y)$ can always be decomposed as
\[
\pi_{X,Y}(x, y) = \pi_Y(y)\,\pi_{X|Y}(x \mid y).
\]
It might be easy to sample from $\pi_{X,Y}(x, y)$ whereas it is difficult/impossible to compute $\pi(x)$.

In this case, it is sufficient to sample $Y \sim \pi_Y$ then $X \mid Y \sim \pi_{X|Y}(\cdot \mid Y)$, so $(X, Y) \sim \pi_{X,Y}$ and hence $X \sim \pi$.
Finite Mixture of Distributions

Assume one wants to sample from
\[
\pi(x) = \sum_{i=1}^{p} \alpha_i\,\pi_i(x),
\]
where $\alpha_i > 0$, $\sum_{i=1}^{p}\alpha_i = 1$ and $\pi_i(x) \geq 0$, $\int \pi_i(x)\, dx = 1$.

We can introduce $Y \in \{1, \ldots, p\}$ and
\[
\pi_{X,Y}(x, y) = \alpha_y \times \pi_y(x).
\]
To sample from $\pi(x)$, first sample $Y$ from a discrete distribution such that $\mathbb{P}(Y = k) = \alpha_k$, then $X \mid (Y = y) \sim \pi_y$.
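A sketch of sampling via composition for a mixture; the Gaussian components, weights and parameters are assumed here purely for illustration (the slide does not specify the $\pi_i$):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gaussian_mixture(weights, means, sds, size):
    """Composition: draw the component label Y with P(Y=k) = alpha_k,
    then draw X | Y=k from the k-th component (here a Gaussian)."""
    y = rng.choice(len(weights), p=weights, size=size)   # component labels
    return rng.normal(loc=np.asarray(means)[y], scale=np.asarray(sds)[y])

x = sample_gaussian_mixture(weights=[0.3, 0.7], means=[-2.0, 3.0],
                            sds=[1.0, 0.5], size=100_000)
print(x.mean())   # close to 0.3*(-2) + 0.7*3 = 1.5
```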
Rejection Sampling

Basic idea: Sample from a proposal $q$ different from the target $\pi$ and correct through a rejection step to obtain a sample from $\pi$.

Algorithm (Rejection Sampling). Given two densities $\pi, q$ with $\pi(x) \leq M\, q(x)$ for all $x$, we can generate a sample from $\pi$ as follows:
1. Draw $X \sim q$, draw $U \sim \mathcal{U}[0,1]$.
2. Accept $X = x$ as a sample from $\pi$ if
\[
U \leq \frac{\pi(x)}{M\, q(x)},
\]
otherwise go to step 1.
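A generic sketch of the algorithm in Python; the Beta(2,2) target and uniform proposal in the usage example are illustrative assumptions (with $\pi(x) = 6x(1-x) \leq 1.5$ on $[0,1]$, so $M = 1.5$ works):

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(target_pdf, proposal_pdf, proposal_sampler, M, size):
    """Generic rejection sampler: accept X ~ q when U <= pi(X) / (M q(X))."""
    samples = []
    while len(samples) < size:
        x = proposal_sampler()
        u = rng.uniform()
        if u <= target_pdf(x) / (M * proposal_pdf(x)):
            samples.append(x)
    return np.array(samples)

# Example (assumed, not from the slides): Beta(2,2) target, U[0,1] proposal.
beta22 = rejection_sample(lambda x: 6 * x * (1 - x),   # pi(x)
                          lambda x: 1.0,               # q(x) on [0,1]
                          lambda: rng.uniform(),
                          M=1.5, size=10_000)
print(beta22.mean())   # close to 0.5
```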
Rejection Sampling

Proposition. The distribution of the samples accepted by rejection sampling is $\pi$.

Proof. We have, for any (measurable) set $A$,
\[
\mathbb{P}(X \in A \mid X \text{ accepted}) = \frac{\mathbb{P}(X \in A,\, X \text{ accepted})}{\mathbb{P}(X \text{ accepted})},
\]
where
\[
\mathbb{P}(X \in A,\, X \text{ accepted})
= \int_{\mathcal{X}} \int_0^1 \mathbb{I}_A(x)\, \mathbb{I}\!\left(u \leq \frac{\pi(x)}{M\, q(x)}\right) q(x)\, du\, dx
= \int_{\mathcal{X}} \mathbb{I}_A(x)\, \frac{\pi(x)}{M\, q(x)}\, q(x)\, dx
= \int_{\mathcal{X}} \mathbb{I}_A(x)\, \frac{\pi(x)}{M}\, dx
= \frac{\pi(A)}{M}.
\]
Rejection Sampling

So
\[
\mathbb{P}(X \text{ accepted}) = \mathbb{P}(X \in \mathcal{X},\, X \text{ accepted}) = \frac{\pi(\mathcal{X})}{M} = \frac{1}{M}
\]
and
\[
\mathbb{P}(X \in A \mid X \text{ accepted}) = \pi(A).
\]
Rejection sampling produces samples from $\pi$. It requires being able to evaluate the density $\pi$ point-wise, and an upper bound $M$ on $\pi(x)/q(x)$.
Rejection Sampling

In most practical scenarios, we only know $\pi$ and $q$ up to some normalising constants; i.e.
\[
\pi = \tilde{\pi}/Z_\pi \quad \text{and} \quad q = \tilde{q}/Z_q,
\]
where $\tilde{\pi}, \tilde{q}$ are known but $Z_\pi = \int_{\mathcal{X}} \tilde{\pi}(x)\, dx$, $Z_q = \int_{\mathcal{X}} \tilde{q}(x)\, dx$ are unknown.

If $Z_\pi, Z_q$ are unknown but you can upper bound $\tilde{\pi}(x)/\tilde{q}(x) \leq \tilde{M}$, then using $\tilde{\pi}$, $\tilde{q}$ and $\tilde{M}$ in the algorithm is correct. Indeed we have
\[
\frac{\tilde{\pi}(x)}{\tilde{q}(x)} \leq \tilde{M} \;\Leftrightarrow\; \frac{\pi(x)}{q(x)} \leq \tilde{M}\,\frac{Z_q}{Z_\pi} = M.
\]
Rejection Sampling

Let $T$ denote the number of pairs $(X, U)$ that have to be generated until $X$ is accepted for the first time.

Lemma. $T$ is geometrically distributed with parameter $1/M$; in particular $\mathbb{E}(T) = M$.

In the unnormalised case, this yields
\[
\mathbb{P}(X \text{ accepted}) = \frac{1}{M} = \frac{Z_\pi}{\tilde{M}\, Z_q}, \qquad
\mathbb{E}(T) = M = \frac{\tilde{M}\, Z_q}{Z_\pi},
\]
and it can be used to provide unbiased estimates of $Z_\pi/Z_q$ and $Z_q/Z_\pi$.
Examples

Uniform density on a bounded subset of $\mathbb{R}^p$. Consider the problem of sampling uniformly over $B \subset \mathbb{R}^p$, a bounded subset of $\mathbb{R}^p$:
\[
\pi(x) \propto \mathbb{I}_B(x).
\]
Let $R$ be a rectangle with $B \subset R$ and $q(x) \propto \mathbb{I}_R(x)$. Then we can use $\tilde{M} = 1$, since
\[
\tilde{\pi}(x)/\tilde{q}(x) = \mathbb{I}_B(x) \leq 1.
\]
The probability of accepting a sample is then $Z_\pi/Z_q$, the ratio of the volumes of $B$ and $R$.
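A sketch for a concrete choice of $B$ (the unit disk, assumed here for illustration) with $R = [-1,1]^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_uniform_disk(size):
    """Uniform samples on the unit disk B = {x^2 + y^2 <= 1}, by rejection
    from the enclosing square R = [-1, 1]^2 with tilde-M = 1."""
    samples = []
    while len(samples) < size:
        x, y = rng.uniform(-1.0, 1.0, size=2)
        if x * x + y * y <= 1.0:        # accept iff the proposal falls in B
            samples.append((x, y))
    return np.array(samples)

pts = sample_uniform_disk(10_000)
# Acceptance probability is vol(B)/vol(R) = pi/4, i.e. roughly 79% of proposals.
```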
Examples

Normal density. Let $\tilde{\pi}(x) = \exp\left(-\tfrac{1}{2}x^2\right)$ and $\tilde{q}(x) = 1/\left(1 + x^2\right)$. We have
\[
\frac{\tilde{\pi}(x)}{\tilde{q}(x)} = \exp\left(-\tfrac{1}{2}x^2\right)\left(1 + x^2\right) \leq 2/\sqrt{e} = \tilde{M},
\]
which is attained at $\pm 1$. The acceptance probability is
\[
\mathbb{P}\left(U \leq \frac{\tilde{\pi}(X)}{\tilde{M}\,\tilde{q}(X)}\right) = \frac{Z_\pi}{\tilde{M}\, Z_q} = \frac{\sqrt{2\pi}}{\tilde{M}\,\pi} = \frac{\sqrt{e}\,\sqrt{2\pi}}{2\pi} \approx 0.66,
\]
and the mean number of trials to success is approximately $1/0.66 \approx 1.52$.
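A sketch of this example using only the unnormalised densities; the proposal is drawn with NumPy's standard Cauchy generator, whose density is proportional to $\tilde{q}$:

```python
import numpy as np

rng = np.random.default_rng(0)

M_tilde = 2.0 / np.sqrt(np.e)            # bound on pi_tilde / q_tilde

def sample_normal_via_cauchy(size):
    """N(0,1) by rejection from a standard Cauchy proposal, using the
    unnormalised densities pi_tilde(x) = exp(-x^2/2), q_tilde(x) = 1/(1+x^2)."""
    samples, trials = [], 0
    while len(samples) < size:
        x = rng.standard_cauchy()
        trials += 1
        if rng.uniform() <= np.exp(-0.5 * x * x) * (1.0 + x * x) / M_tilde:
            samples.append(x)
    return np.array(samples), trials

x, trials = sample_normal_via_cauchy(100_000)
print(x.std(), 100_000 / trials)          # std close to 1, acceptance close to 0.66
```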
Examples: Genetic linkage model

We observe
\[
(Y_1, Y_2, Y_3, Y_4) \sim \mathcal{M}\left(n;\ \tfrac{1}{2} + \tfrac{\theta}{4},\ \tfrac{1}{4}(1 - \theta),\ \tfrac{1}{4}(1 - \theta),\ \tfrac{\theta}{4}\right),
\]
where $\mathcal{M}$ is the multinomial distribution and $\theta \in (0, 1)$. The likelihood of the observations is thus
\[
p(y_1, \ldots, y_4; \theta)
= \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left(\tfrac{1}{2} + \tfrac{\theta}{4}\right)^{y_1} \left(\tfrac{1}{4}(1 - \theta)\right)^{y_2 + y_3} \left(\tfrac{\theta}{4}\right)^{y_4}
\propto (2 + \theta)^{y_1} (1 - \theta)^{y_2 + y_3}\, \theta^{y_4}.
\]
Bayesian approach where we select $p(\theta) = \mathbb{I}_{[0,1]}(\theta)$ and are interested in
\[
p(\theta \mid y_1, \ldots, y_4) \propto (2 + \theta)^{y_1} (1 - \theta)^{y_2 + y_3}\, \theta^{y_4}\, \mathbb{I}_{[0,1]}(\theta).
\]
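A possible rejection-sampling sketch for this posterior with a uniform proposal on $[0,1]$; the counts $(y_1, \ldots, y_4)$ below are hypothetical values chosen only to make the snippet runnable, and the bound $\tilde{M}$ is found numerically on a grid rather than analytically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for illustration (not given on this slide).
y1, y2, y3, y4 = 125, 18, 20, 34

def post_tilde(theta):
    """Unnormalised posterior (2+theta)^y1 (1-theta)^(y2+y3) theta^y4 on [0,1]."""
    return (2 + theta) ** y1 * (1 - theta) ** (y2 + y3) * theta ** y4

# Proposal q = U[0,1]; numerical bound tilde-M with a small safety margin.
grid = np.linspace(0.0, 1.0, 10_001)
M_tilde = post_tilde(grid).max() * 1.01

def sample_posterior(size):
    samples = []
    while len(samples) < size:
        theta = rng.uniform()
        if rng.uniform() <= post_tilde(theta) / M_tilde:
            samples.append(theta)
    return np.array(samples)

theta = sample_posterior(10_000)
print(theta.mean())                       # approximate posterior mean of theta
```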