VARIANCE REDUCTION FOR CHEMICAL REACTION NETWORKS WITH ARRAY-RQMC Florian Puchhammer Université de Montréal, Canada MCM 2019, Sydney July 12, 2019 Joint work with Amal Ben Abdellah and Pierre L’Ecuyer
Chemical reaction networks Consider chemical species S 1 , S 2 , . . . , S ℓ that react via reactions R 1 , R 2 , . . . , R K with reaction rates c 1 , c 2 , . . . , c K . c k R k : α 1 ,k S 1 + · · · + α ℓ,k S ℓ − → β 1 ,k S 1 + · · · + β ℓ,k S ℓ , α i,k , β i,k ∈ N Goal: Simulate system over time t ∈ [0 , T ] and observe (a function g of) copy numbers X t = ( X 1 ,t , X 2 ,t , . . . , X ℓ,t ) of the species. 1/20
The τ -leap algorithm c k α 1 ,k S 1 + · · · + α ℓ,k S ℓ → β 1 ,k S 1 + · · · + β ℓ,k S ℓ − Simulate with fixed step τ -leap algorithm, Gillespie ’01. Summarizes several reactions into one time step of length τ � faster, but introduces bias (negative copy numbers). � Poisson variables p k model how often reaction R k fires in [ t, t + τ ) . � Mean parameters of p k are a k ( X t ) τ , where a k are the propensity functions. � a k ’s are polynomials in X t of degree α 1 ,k + α 2 ,k + · · · + α ℓ,k . In each step, generate p 1 , . . . , p K and update copy numbers through K � X i,t + τ = X i,t + p k · ( β i,k − α i,k ) . k =1 This is a discrete time Markov chain (DTMC), with ℓ -dimensional state space using K random variates to advance the chain by one step. 2/20
Simulation Goal: Estimate µ = E [ g ( X T )] . MC: Generate n points U i ∼ U (0 , 1) K · T/τ . � Each U i simulates one chain � X i,T . � Estimate n − 1 µ ≈ 1 � g ( X i,T ) . n i =0 RQMC: Generate n points U i ∈ (0 , 1) K · T/τ such that: � Each U i is uniformly distributed in (0 , 1) K · T/τ . � The point set { U 0 , U 1 , . . . , U n − 1 } has low discrepancy. � Then, estimate as above. 3/20
Previous work Beentjes, Baker ’18: Traditional RQMC for τ -leaping. Found that, using n RQMC points, the convergence rate is not better than for MC, n − 1 , even when ℓ, K, T/τ were not too big. Possible problems: � Discontinuities because of integer values. � High dimensional point sets needed, dim = K · T/τ (actually effective dimension important). Possible solution: array-RQMC. � Can work well even when discontinuities appear. � Reduces the dimensionality of the points to ℓ + K . 4/20
Array-RQMC Simulation of n chains with n RQMC points. Traditional approach: One point for the trajectory of one chain. Simulates chains sequentially � high dimension K · T/τ . Array-RQMC: One point to advance one chain by one step. Advances all n chains in parallel at each step � lower dimension ℓ + K . � Idea: the distance between empirical and theoretical distribution of states is small at each step. � achieved by matching the states to first ℓ coordinates of point set via (multivariate) sort after each step. � empirical mean of a n evaluations of function g at the final states is unbiased estimator for the mean of g ( X T ) . � empirical variance of m independent repetitions is unbiased estimator for the variance of the mean estimator. 5/20
Multivariate sorting algorithms If ℓ = 1 sort chains by size. If ℓ > 1 : Split sort: Split states into 2 packets by size of first coordinate. Split each packet into 2 by size of second coordinate. . . Batch sort: Split states into n 1 packets by size of first coordinate. Split each packet into n 2 by size of second coordinate. . . Hilbert curve sort: Map states to [0 , 1] ℓ . Sort the states in the order they are visited by (discrete version of) hilbert curve. Here, we use sample data to estimate mean and variance, normalize each state, and use the CDF of standard normal distribution to map states to [0 , 1] ℓ . Customized sorts: Ideally, find importance function h j : R ℓ → R for step j , then sort by size of h j ( X jτ ) . Might increase efficiency! 6/20
Experimental setting In every experiment, we observe copy number of one specific species. Point sets: � MC: Plain monte carlo � Lat+Baker: Lattice rule with random shift mod 1 and baker transformation. � Sob+LMS: Sobol’ points with a left matrix scramble and a digital shift. � Sob+NUS: Sobol’ points with Owen’s nested uniform scramble. Observed statistics: We replicate each experiment m = 100 times independently. � VRF20: Variance reduction factor in comparison to MC for n = 2 20 points. � ˆ β : Variance convergence rate n − ˆ β estimated by regression n = 2 13 , 2 14 , . . . , 2 20 . Goal: Show that � array-RQMC can beat MC-variance rate ˆ β > 1 , � customized sorts can improve your results (proof of concept). 7/20
The Schlögl system Model with 3 species S 1 , S 2 , S 3 and K = 4 reactions. We observe S 1 . c 1 2 S 1 + S 2 − → 3 S 1 , c 2 3 S 1 − → 2 S 1 + S 2 . c 3 S 3 − → S 1 c 4 S 1 − → S 3 Note: X 1 ,t + X 2 ,t + X 3 ,t = const., therefore ℓ = 2 . Expect standard sorts to work well. Simulation: X 0 = { 250 , 10 5 , 2 · 10 5 } , c 1 = 3 · 10 − 7 , c 2 = 10 − 4 , c 3 = 10 − 3 , c 4 = 3 . 5 , T = 4 , τ = T/ 15 . Array-RQMC reduces points’ dimension from KT/τ = 60 to ℓ + K = 6 or 5. 8/20
Customized sorts revisited Observe that ℓ = 2 , so we can plot final copy number of S 1 vs. current state at time jτ � importance function h j . � Simulate 2 20 sample chains with MC. � Fit polynomial to this data. � To reduce terms: take propensity functions and current copy number of observed species. Here ( N 0 is total number of molecules): a 1 ( x, y ) = 1 a 2 ( x, y ) = 1 2 c 1 x ( x − 1) y, 6 c 2 x ( x − 1)( x − 2) , a 3 ( x, y ) = c 3 ( N 0 − x − y ) , a 4 ( x, y ) = c 4 x. Take h j ( x, y ) = a j, 0 + a j, 1 x + a j, 2 y + a j, 3 x 2 + a j, 4 xy + a j, 5 x 3 + a j, 6 x 2 y. 9/20
The sample data at step jτ and fitted h j , j = 5 , 10 , 14 . 600 600 500 X 1 ( T ) X 1 ( T ) 400 400 200 200 0 0 0 9 . 95 9 . 8 9 . 9 600 9 . 8 400 400 9 . 6 400 · 10 4 · 10 4 9 . 85 200 200 · 10 4 200 X 2 (10 τ ) X 2 (14 τ ) X 2 (5 τ ) X 1 (5 τ ) X 1 (10 τ ) X 1 (14 τ ) 10/20
For customized sorts we can follow two strategies: Multi step: Fit h j in every step. � Expected to be more representative. Last step: Fit h j = h to last step only. Use this h in every step. � Maybe less impact of randomness on data. 11/20
Schlögl system – results c 1 c 3 − → − → 2 S 1 + S 2 ← c 2 3 S 1 , − S 3 ← c 4 S 1 . − ˆ Sort Sample β VRF20 Sob+NUS MC 1.00 1 0 Lat+Baker 1.30 4894 Last step Sob+LMS 1.36 7000 − 5 Sob+NUS 1.37 10481 log 2 (Var) Lat+Baker 1.01 178 Multi step Sob+LMS 1.07 206 − 10 Sob+NUS 0.99 196 MC Lat+Baker 1.39 4421 − 15 Last step Batch Sob+LMS 1.48 9309 Multi step Sob+NUS 1.56 11150 Batch Lattice+Baker 1.68 2493 Hilbert − 20 Hilbert curve Sobol+LMS 1.51 4464 13 14 15 16 17 18 19 20 Sobol+NUS 1.53 4562 log 2 ( n ) 12/20
Problems with multi step sort Look at mean copy numbers per step with MC (left) and copy numbers of S 1 for n = 30 paths (right). MC n = 30 paths 258 600 256 500 254 copy number S 1 400 252 average 250 300 248 200 246 100 244 0 242 0 2 4 6 8 10 12 14 16 0 2 4 6 8 10 12 14 16 step step 13/20
cAMP aktivation of PKA model More challenging example: ℓ = 6 species, K = 6 reactions (Koh, Blackwell (’12); Strehl, Ilie (’15)) : c 1 − → PKA + 2cAMP ← c 2 PKA - cAMP 2 , − c 3 − → PKA - cAMP 2 + 2cAMP ← c 4 PKA - cAMP 4 , − c 5 − → PKA - cAMP 4 ← c 6 PKAr + 2PKAc . − Initial values: PKA : 33 · 10 3 , cAMP : 33 . 03 · 10 3 , others: 1 . 1 · 10 3 . Parameters: c 1 = 8 . 696 · 10 − 5 , c 2 = 0 . 02 , c 3 = 1 . 154 · 10 − 4 , c 4 = 0 . 02 , c 5 = 0 . 016 and c 6 = 0 . 0017 ; T = 5 · 10 − 5 , τ = T/ 15 . Array-RQMC reduces points’ dimension from KT/τ = 90 to ℓ + K = 12 or 7. 14/20
Begin with PKA and use same heuristics (sample paths, propensity functions, and current copy number of PKA ) for last step and multi step. ˆ Sort Sample β VRF20 � Variance rate aligns with MC rate. MC 1.00 1 Lat+Baker 1.05 2593 � Best result for Batch with Sob+LMS. Last Last step Sob+LMS 1.05 3334 step makes up for it with computation Sob+NUS 1.04 3126 time. Lat+Baker 1.03 2492 Multi step Sob+LMS 1.04 3307 � Reducing noise in sample paths improved Sob+NUS 1.01 3099 Last step up to VRF20= 4100 (Sob+NUS). Lat+Baker 1.05 2575 Batch Sob+LMS 1.06 3611 � Select “optimal” sequence of last step Sob+NUS 1.01 2735 and multi step sorts w.r.t. variance Lattice+Baker 1.00 2327 Hilbert curve Sobol+LMS 1.03 2920 increment at each step improved up to Sobol+NUS 1.02 2847 VRF20= 4300 (Sob+LMS). 15/20
Dimensionality is NOT the key Same model, but now we observe PKAr : c 1 − → PKA + 2cAMP ← c 2 PKA - cAMP 2 , − c 3 − → PKA - cAMP 2 + 2cAMP ← c 4 PKA - cAMP 4 , − c 5 − → PKA - cAMP 4 ← c 6 PKAr + 2PKAc . − � Analyzed the data and found indicator, that only PKAc is important for development of PKAr . � Tried to simply sort by PKAc . 16/20
Recommend
More recommend