Rare-event probability estimation and convex programming
Z. I. Botev
Problem formulation

We consider the problem of estimating rare-event probabilities of the form

    ℓ = P(S(X) > γ),   X = (X_1, ..., X_d),

where S : R^d → R and X_1, ..., X_d are random variables with joint density f(x).

Almost any high-dimensional integration problem can be cast into this framework: option pricing, insurance value at risk, approximate Bayesian computation (ABC), the Ising model in physics, queueing models, and counting problems in operations research and computer science.

Dichotomy: frequently one can easily simulate the rare event, but it is not clear how to use the simulated data to estimate ℓ.
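To make the difficulty concrete, here is a minimal crude Monte Carlo sketch for a hypothetical toy model (the choice S(x) = x_1 + ... + x_d with standard normal X is an illustrative assumption, not from the slides): the naive estimator needs on the order of 1/ℓ samples before it observes a single exceedance.

```python
import numpy as np
from scipy.stats import norm

# Toy model (illustrative assumption): S(x) = x_1 + ... + x_d with X ~ N(0, I_d),
# so S(X) ~ N(0, d) and the true probability P(S(X) > gamma) is known exactly.
rng = np.random.default_rng(0)
d, gamma, n = 10, 20.0, 10**6

X = rng.standard_normal((n, d))        # X_1, ..., X_n drawn from the original law f
S = X.sum(axis=1)                      # S(X_j)
ell_crude = np.mean(S > gamma)         # (1/n) * sum_j I{S(X_j) > gamma}

print(ell_crude, norm.sf(gamma / np.sqrt(d)))   # crude MC essentially always prints 0.0 here
```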
Bird's eye view of existing methods

- Importance sampling: state-dependent and state-independent; no need to simulate the rare event under the original probability law in order to estimate the probability of its occurrence efficiently.
- Adaptive importance sampling: the cross-entropy method.
- Conditioning.
- Splitting: the splitting method typically yields exact or approximate realizations of the rare event.

With the exception of splitting, all of these treat the problem of simulating the rare event and estimating the rare-event probability as essentially separate problems.
Standard importance sampling

We propose to estimate the quantity ℓ using maximum likelihood methods, where the data upon which the likelihood function depends is generated from computer simulation of the rare event under the original probability law.

To introduce the idea it is convenient to think of ℓ as the normalizing constant ℓ_s of the conditional density

    f_s(x) = f(x) I{S(x) > γ} / ℓ_s = w_s(x) / ℓ_s.
The typical importance sampling scheme

Let f_1(x) = w_1(x)/ℓ_1 be another density whose normalizing constant ℓ_1 is known and for which {x : f_1(x) > 0} ⊇ {x : f_s(x) > 0}. Then the natural estimator is

    ℓ*_s = (1/n) Σ_{j=1}^n f(X_j) I{S(X_j) > γ} / f_1(X_j),   X_1, ..., X_n ~iid f_1.

For good performance we need the tails of f_1 to be at least as heavy as the tails of f_s. Frequently, the choice of f_1 is dictated by asymptotic analysis of ℓ(γ) as γ ↑ ∞.
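A hedged sketch of this estimator for the same toy model as before: the proposal f_1 below is a simple mean-shifted normal, an assumption made for illustration only (the slide does not prescribe a particular f_1), with its mass pushed toward the rare-event region.

```python
import numpy as np
from scipy.stats import norm

# Standard importance sampling for the toy model: S(x) = sum(x), f = N(0, I_d).
# The proposal f_1 is N(mu, I_d) with mu = gamma/d in every coordinate -- a simple
# mean-shift choice assumed here purely for illustration.
rng = np.random.default_rng(1)
d, gamma, n = 10, 20.0, 10**5
mu = gamma / d

X = rng.standard_normal((n, d)) + mu                                     # X_1, ..., X_n ~ f_1
log_w = norm.logpdf(X).sum(axis=1) - norm.logpdf(X, loc=mu).sum(axis=1)  # log f(X_j)/f_1(X_j)
hit = X.sum(axis=1) > gamma                                              # I{S(X_j) > gamma}
ell_star = np.mean(np.exp(log_w) * hit)                                  # the estimator on this slide

print(ell_star, norm.sf(gamma / np.sqrt(d)))                             # close to the exact value
```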
The typical importance sampling scheme

It is well known that the zero-variance importance sampling density for estimating ℓ_s is the conditional pdf f_s. So we may try to use f_s itself as an importance sampling density. We consider the estimator

    ℓ̂_s = (1/n) Σ_{j=1}^n w_s(X_j) / ( λ_1 f_1(X_j) + λ_s w_s(X_j)/ℓ̂_s ),   X_1, ..., X_n ~iid f̄,

where w_s(X_j)/ℓ̂_s ≈ f_s(X_j), so the denominator approximates the mixture f̄ = λ_1 f_1 + λ_s f_s; the unknown normalizing constant ℓ_s on the right has been replaced with ℓ̂_s, giving rise to a nonlinear equation for ℓ̂_s.
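A minimal sketch of the resulting fixed-point computation, assuming draws from f_s are available (for example produced by splitting or MCMC; how they are obtained is outside this snippet). The iteration simply re-substitutes the current value of ℓ̂_s into the right-hand side.

```python
import numpy as np

def solve_the_equation(x1, xs, f1, ws, n_iter=200):
    """Fixed-point iteration for the nonlinear equation
        l = (1/n) * sum_j  w_s(X_j) / (lam1 * f1(X_j) + lams * w_s(X_j) / l),
    where the pooled X_1, ..., X_n combine draws x1 from f_1 and xs from f_s.
    f1 and ws are callables returning f_1(x) and w_s(x) = f(x) * I{S(x) > gamma}."""
    X = np.concatenate([x1, xs])               # pooled sample from the mixture f-bar
    lam1 = len(x1) / len(X)
    lams = len(xs) / len(X)
    w, d1 = ws(X), f1(X)
    ell = 1.0                                  # any positive starting value
    for _ in range(n_iter):
        ell = np.mean(w / (lam1 * d1 + lams * w / ell))
    return ell
```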
Comparison of estimators

Compare now the traditional importance sampling estimator with the proposed solve-the-equation type. For the solve-the-equation estimator, the support and tail restrictions on f_1 are no longer necessary for good performance. The only requirement is that

    {x : f_1(x) × f_s(x) > 0} ≠ ∅,

that is, the supports of f_1 and f_s overlap.

This is an example of using computer-simulated data of the rare event to obtain an M-type estimator of ℓ. Next, we try to generalize.
Suppose we are given the sequence of densities

    f_t(x) = w_t(x) / ℓ_t = f(x) H_t(x) / ℓ_t,   t = 1, ..., s,

where f is a known density, {H_t} are known functions, and the ℓ_t are probabilities acting as normalizing constants for the {w_t}. We are interested in estimating ℓ_{k'} = E_f H_{k'}(X) for some k'.

We assume that for at least one f_t, say f_1, the corresponding normalizing constant ℓ_1 is known and, without loss of generality, equal to unity. We call f_1 a reference density.
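One natural instantiation (an illustrative assumption, not stated on this slide) is H_t(x) = I{S(x) > γ_t} for nested levels −∞ = γ_1 < γ_2 < ... < γ_s = γ, so that f_1 = f (hence ℓ_1 = 1) and ℓ_t = P(S(X) > γ_t):

```python
import numpy as np

# Hypothetical instantiation of the {H_t} framework with nested levels
# gamma_1 = -inf < gamma_2 < ... < gamma_s, so f_1 = f and l_t = P(S(X) > gamma_t).
gammas = [-np.inf, 5.0, 10.0, 15.0, 20.0]      # illustrative levels

def make_H(g, S=lambda x: x.sum(axis=-1)):     # S(x) = sum(x) assumed, as before
    return lambda x: (S(x) > g).astype(float)

H = [make_H(g) for g in gammas]                # H_1, ..., H_s
```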
Connectivity

To proceed, suppose we are given a graph with s nodes and an edge between nodes i and j if and only if

    E_f I{H_i(X) > 0} × I{H_j(X) > 0} > 0.   (1)

We assume that there exists a path between any two nodes. We call the condition (1) on the supports of {f_t} Vardi's connectivity condition.

Assume we have the iid samples

    X_{t,1}, ..., X_{t,n_t} ~iid f_t(x),   t = 1, ..., s.
Connectivity

[Figure: an example connectivity graph on four nodes A, B, C, D.]
Mixture model

Conceptually, this is the same as sampling n = n_1 + ... + n_s variables with stratification from the mixture

    f̄(x) def= (1/n) Σ_{t=1}^s n_t f_t(x) = Σ_{t=1}^s λ_t f_t(x),   λ_t = n_t / n.

Let the pooled sample be denoted by X_1, ..., X_n, where the first n_1 samples are outcomes from f_1, the next n_2 are samples from f_2, and so on. Define the vector of parameters

    z = (z_1, ..., z_s) = ( −ln(1/λ_1), −ln(ℓ_2/λ_2), ..., −ln(ℓ_s/λ_s) ).
Empirical Likelihood Estimator

Now consider the likelihood of the observed data X_{t,1}, ..., X_{t,n_t}, t = 1, ..., s, as a function of z:

    Π_{k=1}^s Π_{j=1}^{n_k} f_k(X_{k,j}) = Π_{k=1}^s Π_{j=1}^{n_k} w_k(X_{k,j}) / ( λ_k e^{−z_k} )

    = Π_{k=1}^s Π_{j=1}^{n_k} [ w_k(X_{k,j}) / ( λ_k e^{−z_k} f̄(X_{k,j}) ) ] × Π_{k=1}^s Π_{j=1}^{n_k} f̄(X_{k,j}).
Empirical Likelihood Estimator

Taking the logarithm of the first factor on the previous slide yields the partial log-likelihood as a function of z:

    Σ_{k=1}^s Σ_{j=1}^{n_k} ln[ w_k(X_{k,j}) / ( λ_k e^{−z_k} f̄(X_{k,j}) ) ]
    = Σ_{k=1}^s Σ_{j=1}^{n_k} [ ln( w_k(X_{k,j}) / λ_k ) + z_k − ln( f̄(X_{k,j}) ) ]
    = const. − Σ_{j=1}^n ln( f̄(X_j) ) + Σ_{k=1}^s n_k z_k
    = const. − Σ_{j=1}^n ln( Σ_{k=1}^s w_k(X_j) e^{z_k} ) + n Σ_{k=1}^s λ_k z_k,

where the last line uses f̄(x) = Σ_k λ_k w_k(x)/ℓ_k = Σ_k w_k(x) e^{z_k}.
Empirical Likelihood Estimator

Under Vardi's connectivity condition and the iid assumption on the sample, it can be shown that the maximizer of the partial log-likelihood is the same as the maximizer of the complete likelihood. In other words, the unique nonparametric maximum likelihood estimate of z (and hence of ℓ) solves the almost surely strictly convex optimization program (with z_1 = ln λ_1 fixed)

    ẑ = argmin_z D̂(z),   D̂(z) def= Σ_{j=1}^n ln( Σ_{k=1}^s w_k(X_j) e^{z_k} ) − Σ_{k=1}^s n_k z_k.   (2)
Almost surely??

Define the matrix

    G_{i,j} = E_f I{H_i(X) > 0} × I{H_j(X) > 0}

and its empirical counterpart

    Ĝ_{i,j} = (1/n) Σ_{k=1}^n I{H_i(X_k) > 0} × I{H_j(X_k) > 0}.

Under the connectivity condition the matrix G is irreducible, that is, for any pair (i, j) we have [G^k]_{i,j} > 0 for some k = 1, 2, ..., and since Ĝ → G almost surely, with probability one the function D̂(z) is strictly convex and the maximum likelihood program has a unique solution.
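A sketch of how one might form Ĝ from the pooled sample and check irreducibility numerically (the matrix-power connectivity test is an implementation choice made here, not taken from the slides):

```python
import numpy as np

def empirical_G(H, X):
    """G_hat[i, j] = (1/n) sum_k I{H_i(X_k) > 0} * I{H_j(X_k) > 0}
    for the pooled sample X (shape n x d) and a list of callables H."""
    B = np.array([h(X) > 0 for h in H]).astype(float)    # s x n indicator matrix
    return B @ B.T / X.shape[0]

def is_irreducible(G):
    """Check that some power of the adjacency pattern of G is entrywise positive,
    i.e. the support-overlap graph is connected."""
    A = (G > 0).astype(float)
    s = A.shape[0]
    P = np.linalg.matrix_power(A + np.eye(s), s)          # counts paths of length <= s
    return bool(np.all(P > 0))
```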
M-estimator & Method of Moments

To execute the optimization, we compute the (s − 1) × 1 gradient ∇D̂(z), which is a vector with components

    [∇D̂]_t(z) = Σ_{j=1}^n w_t(X_j) e^{z_t} / Σ_{k=1}^s w_k(X_j) e^{z_k} − n_t,   t = 2, ..., s.

Using the prior information that ℓ_1 = 1, or z_1 = ln(λ_1), these estimating equations are equivalent to solving the (s − 1)-dimensional system for the unknowns z_2, ..., z_s:

    (1/n) Σ_{j=1}^n w_t(X_j) e^{ẑ_t} / Σ_{k=1}^s w_k(X_j) e^{ẑ_k} = λ_t,   t = 2, ..., s.
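A hedged sketch of the direct optimization route: minimize D̂ over z_2, ..., z_s with z_1 = ln λ_1 held fixed, supplying the gradient above. The use of scipy.optimize.minimize with BFGS and logsumexp for numerical stability is a choice made here, not prescribed by the slides.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def fit_ell(logw, n_k):
    """Minimize D(z) = sum_j log sum_k w_k(X_j) e^{z_k} - sum_k n_k z_k
    over z_2, ..., z_s with z_1 = log(lambda_1) held fixed.
    logw: s x n array with logw[k, j] = log w_k(X_j) (-inf where w_k = 0).
    n_k:  length-s array of per-density sample sizes."""
    n = logw.shape[1]
    lam = n_k / n
    z1 = np.log(lam[0])

    def D_and_grad(z_rest):
        z = np.concatenate(([z1], z_rest))
        a = logw + z[:, None]                      # log( w_k(X_j) e^{z_k} )
        lse = logsumexp(a, axis=0)                 # log sum_k w_k(X_j) e^{z_k}
        p = np.exp(a - lse)                        # responsibilities, s x n
        grad = p.sum(axis=1) - n_k                 # [grad D]_k, as on this slide
        return lse.sum() - n_k @ z, grad[1:]       # drop the fixed z_1 component

    res = minimize(D_and_grad, np.zeros(len(n_k) - 1), jac=True, method="BFGS")
    z = np.concatenate(([z1], res.x))
    return lam * np.exp(-z)                        # l_k = lambda_k e^{-z_k}
```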
Moment-matching master equation

The last system is equivalent to

    ℓ̂_t = Σ_{j=1}^n A_{t,j} / ( Σ_{k=1}^s A_{k,j} n_k / ℓ̂_k ),   A_{t,j} = H_t(X_j),   t = 2, ..., s.   (3)

It is sometimes easier to directly minimize D̂ instead of solving the nonlinear system. The system can be solved using a Jacobi/Gauss-Seidel type iteration. The process is similar to finding eigenvalues via power iteration.
Jacobi/Gauss-Seidel iteration algorithm

Require: matrix A and initial starting point ℓ = (ℓ_1, ..., ℓ_s) = (1, ..., 1)
Set ε ← ∞ and ℓ* ← ℓ
while ε > 10^{-10} do
    for i = 2, ..., s do
        ℓ_i ← Σ_{j=1}^n A_{i,j} / ( Σ_{k=1}^s A_{k,j} n_k / ℓ*_k )
    ε ← max_i |ℓ_i − ℓ*_i| / ℓ_i
    Set ℓ* ← ℓ
return the vector of estimated probabilities ℓ̂ ← ℓ
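A direct translation of the algorithm above into Python, as a sketch: it assumes the pooled-sample matrix A with A[t, j] = H_t(X_j) and the per-density sample sizes n_k are already available, and the fully vectorized Jacobi-style update is an implementation choice.

```python
import numpy as np

def jacobi_estimate(A, n_k, tol=1e-10, max_iter=10**5):
    """Fixed-point iteration for the moment-matching equations (3).
    A[t, j] = H_t(X_j) over the pooled sample, n_k[t] = sample size from f_t;
    l_1 is known to equal 1 and is kept fixed throughout."""
    s, n = A.shape
    ell = np.ones(s)
    for _ in range(max_iter):
        ell_old = ell.copy()
        denom = (n_k / ell_old) @ A                           # sum_k A[k, j] n_k / l*_k, length n
        ell = np.concatenate(([1.0], (A[1:] / denom).sum(axis=1)))
        if np.max(np.abs(ell - ell_old) / ell) < tol:         # eps = max_i |l_i - l*_i| / l_i
            break
    return ell
```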