Foundations of Chemical Kinetics Lecture 25: The Gillespie stochastic simulation algorithm Marc R. Roussel Department of Chemistry and Biochemistry
The chemical master equation vs simulations ◮ As noted earlier, the chemical master equation (CME) is only solvable in some special cases. ◮ Probability distributions are a bit abstract for most people. ◮ Simulations can be carried out for any reaction, and they're often easier to understand. ◮ Tradeoff: You need to run a lot of simulations to generate statistical data (probability distributions, averages, standard deviations, etc.).
The basic ideas behind Gillespie stochastic simulations ◮ In a simulation, we follow one possible set of reactive events and the consequent changes in molecular populations. This is called a realization of a stochastic process. ◮ In the simulation, at any time $t$, we have a specific composition and therefore specific values of the propensities. ◮ The propensities therefore become reaction probabilities per unit time (not conditional probabilities, as in the CME). ◮ Generate random reactive events consistent with CME statistics: ◮ a random reaction time ◮ a randomly selected reaction ◮ Every time a reaction occurs, update the numbers of molecules. This affects the reaction propensities.
Selecting a random reaction time ◮ Each propensity is the probability per unit time that a specific reaction occurs. ◮ The probability per unit time that any reaction occurs is just the sum of the propensities: $a_0 \equiv \sum_r a_r$ ◮ The probability per unit time of a reaction occurring is constant until a reaction occurs. ◮ A constant probability per unit time implies exponential decay of the probability that a reaction has not occurred yet: $P_\mathrm{unreacted} = e^{-a_0(t - t_\mathrm{ref})}$, where $t_\mathrm{ref}$ is some reference time (e.g. the time at which the last reaction occurred).
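To fill in the step from a constant probability per unit time to the exponential form, here is a sketch of the standard argument, using only the quantities defined above: over any short interval, the probability of remaining unreacted decreases at rate $a_0$ times the current survival probability,
\[
\frac{dP_\mathrm{unreacted}}{dt} = -a_0\,P_\mathrm{unreacted}, \qquad P_\mathrm{unreacted}(t_\mathrm{ref}) = 1,
\]
whose solution is $P_\mathrm{unreacted}(t) = e^{-a_0(t - t_\mathrm{ref})}$.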
Selecting a random reaction time (continued) ◮ The cumulative distribution for the probability of reaction is therefore $P_\mathrm{reacted} = 1 - e^{-a_0(t - t_\mathrm{ref})}$ ◮ The distribution of reaction times is therefore $p(t) = \frac{dP_\mathrm{reacted}}{dt} = a_0\,e^{-a_0(t - t_\mathrm{ref})}$ ◮ Thus, we need to generate exponentially distributed reaction times.
Selecting a random reaction time (continued) ◮ Most programmable computer packages (from spreadsheets to full-blown programming languages) have a uniform random number generator that produces pseudo-random numbers in the range $[0, 1)$. ◮ There is a standard trick (inverse transform sampling) for converting a uniform random number $r_1$ into an exponentially distributed random number: $t_\mathrm{react} = \frac{1}{a_0}\ln\left(\frac{1}{r_1}\right)$
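A minimal Python sketch of this conversion, with an arbitrary illustrative value of $a_0$ and a large number of samples so the sample mean can be compared with the expected value $1/a_0$ (none of these numbers come from the lecture):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

a0 = 2.5           # illustrative total propensity (per unit time)
n_samples = 100_000

# Uniform pseudo-random numbers in [0, 1); nudge any exact zeros away from 0
# so that log(1/r1) stays finite.
r1 = rng.random(n_samples)
r1 = np.where(r1 == 0.0, np.finfo(float).tiny, r1)

# The standard trick: t_react = (1/a0) * ln(1/r1) is exponentially distributed.
t_react = (1.0 / a0) * np.log(1.0 / r1)

print("sample mean of t_react:", t_react.mean())
print("expected mean (1/a0)  :", 1.0 / a0)
```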
Selecting a random reaction ◮ Suppose that we have just two reactions. If reaction 1 has probability per unit time (propensity) $a_1$ and reaction 2 has probability per unit time $a_2$, then the probability that the next reaction to occur is reaction 1 is $a_1/(a_1 + a_2)$. ◮ In general, the probability that the next reaction is reaction $r$ is $a_r/a_0$. ◮ To pick a random reaction, ◮ "Line up" the fractions $a_r/a_0$ in the interval $[0, 1]$. ◮ Choose a random number, $r_2$, between 0 and 1. ◮ Figure out in which reaction interval $r_2$ falls. [Figure: the interval $[0, 1]$ divided into subintervals of width $a_1/a_0, a_2/a_0, a_3/a_0, \ldots, a_r/a_0, \ldots$, with the random number $r_2$ shown as a pointer into one of them.]
Selecting a random reaction [Figure, repeated from the previous slide: the interval $[0, 1]$ divided into subintervals of width $a_r/a_0$, with $r_2$ as a pointer.] ◮ Using $r_2$ as a "pointer" to determine which reaction occurs next is equivalent to finding the value of $\mu$ for which the following inequality is satisfied: $\sum_{r=1}^{\mu-1} a_r < r_2 a_0 \le \sum_{r=1}^{\mu} a_r$ ◮ Alternatively, we are looking for the smallest $\mu$ for which $\sum_{r=1}^{\mu} a_r > r_2 a_0$
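A minimal Python sketch of this selection step; the propensity values are purely illustrative, and the cumulative-sum search implements the "smallest $\mu$" criterion above:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

a = np.array([0.7, 1.2, 0.1, 2.0])   # illustrative propensities a_1, ..., a_4
a0 = a.sum()

r2 = rng.random()                    # uniform random number in [0, 1)

# Smallest mu (1-based) such that a_1 + ... + a_mu > r2 * a0:
# searchsorted with side="right" returns the first position in the cumulative
# sums that strictly exceeds r2 * a0.
idx = np.searchsorted(np.cumsum(a), r2 * a0, side="right")   # 0-based index
mu = idx + 1                                                 # 1-based reaction number

print("r2 * a0 =", r2 * a0)
print("next reaction: reaction", mu)
```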
The Gillespie algorithm 1. Setup: Store initial populations and rate constants, set $t = 0$, etc. 2. Calculate reaction propensities. 3. Generate two uniform random numbers, $r_1$ and $r_2$. 4. Calculate $t_\mathrm{react}$, the time to next reaction, using $r_1$. 5. Determine the next reaction using $r_2$. 6. Add $t_\mathrm{react}$ to $t$. 7. Update the populations based on the reaction chosen. 8. Go to step 2 until some chosen stopping criterion is reached (exhaustion of a chemical, target simulation time reached, ...).
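A minimal, self-contained Python sketch of these eight steps, using an assumed example system (reversible isomerization A ⇌ B with illustrative rate constants); the function name, parameter values, and stopping criteria are not from the lecture:

```python
import numpy as np

def gillespie_ab(n_A=100, n_B=0, k1=1.0, k2=0.5, t_max=10.0, seed=0):
    """Direct-method SSA for the assumed model A -> B (rate constant k1), B -> A (k2)."""
    rng = np.random.default_rng(seed)
    t = 0.0                                   # step 1: setup
    times, pops = [t], [(n_A, n_B)]

    while True:
        a = np.array([k1 * n_A, k2 * n_B])    # step 2: propensities
        a0 = a.sum()
        if a0 == 0.0:                         # stopping criterion: exhaustion
            break

        r1, r2 = rng.random(), rng.random()   # step 3: two uniform random numbers
        while r1 == 0.0:                      # guard against log(1/0); essentially never triggers
            r1 = rng.random()
        t_react = (1.0 / a0) * np.log(1.0 / r1)                    # step 4: time to next reaction
        mu = np.searchsorted(np.cumsum(a), r2 * a0, side="right")  # step 5: which reaction

        t += t_react                          # step 6: advance the clock
        if t > t_max:                         # stopping criterion: target time reached
            break

        if mu == 0:                           # step 7: update populations (A -> B)
            n_A, n_B = n_A - 1, n_B + 1
        else:                                 # (B -> A)
            n_A, n_B = n_A + 1, n_B - 1

        times.append(t)
        pops.append((n_A, n_B))               # step 8: loop back to step 2

    return np.array(times), np.array(pops)

# Example usage: one realization
times, pops = gillespie_ab()
print("final time:", times[-1], " final (n_A, n_B):", tuple(pops[-1]))
```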
Statistics ◮ If we want the kind of information we can get from the CME, we need to repeat the simulation many times (with different sequences of pseudo-random numbers each time), and then average across realizations. ◮ If we are interested in the stationary distribution, we can also run one simulation for a very long time and get a time average. ◮ The two procedures described above are equivalent if a stochastic system has the property of ergodicity. A Markov chain (Markov process on a set of discrete states) is ergodic if any state can be reached from any other state in some finite number of steps.
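As an illustration of the first procedure (averaging across realizations), here is a short sketch that reuses the gillespie_ab function from the sketch above; the observation time and ensemble size are arbitrary choices:

```python
import numpy as np

t_obs = 5.0            # arbitrary observation time
n_realizations = 1000  # arbitrary ensemble size

samples = []
for k in range(n_realizations):
    times, pops = gillespie_ab(t_max=t_obs, seed=k)    # a different random sequence each run
    # Populations are piecewise constant, so use the state after the last event before t_obs.
    i = np.searchsorted(times, t_obs, side="right") - 1
    samples.append(pops[i, 0])                         # n_A at t_obs

samples = np.array(samples)
print("ensemble mean of n_A:", samples.mean())
print("ensemble std of n_A :", samples.std(ddof=1))
```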
Statistics ◮ Whichever procedure we adopt, we need a lot of data to get reasonable estimates of system statistics: convergence of statistics goes as $N^{-1/2}$, where $N$ is the number of samples (time points or realizations). Adding a significant figure (decrease in uncertainty by a factor of 10) therefore requires that we increase the amount of data by a factor of 100.
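A quick numerical illustration of this scaling, as a sketch: sampling exponentially distributed reaction times with an arbitrarily chosen $a_0$, each 100-fold increase in $N$ shrinks the standard error of the sample mean by roughly a factor of 10.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
a0 = 2.5   # arbitrary rate for this illustration

for n in (100, 10_000, 1_000_000):
    t = rng.exponential(scale=1.0 / a0, size=n)
    std_err = t.std(ddof=1) / np.sqrt(n)    # standard error of the mean ~ N^(-1/2)
    print(f"N = {n:>9,d}   mean = {t.mean():.5f}   std. error = {std_err:.5f}")
```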