
Data-driven COVID-19 modeling

Cyprien Neverov, August 28th, 2020. Student at IMT Mines Ales and intern at FAU Erlangen-Nurnberg under the supervision of Prof. Enrique Zuazua.


  1. Data-driven COVID-19 modeling. Cyprien Neverov (student at IMT Mines Ales and intern at FAU Erlangen-Nurnberg under the supervision of Prof. Enrique Zuazua). August 28th, 2020.

  2. Table of Contents: 1. Data-driven system identification; 2. Modeling COVID-19; 3. Conclusion.

  3. 1. Data-driven system identification. Identifying the dynamics from data is becoming a key challenge because: data acquisition is getting cheaper; problems are getting more complex; computational power is cheaper.

  4. Sparse identification of nonlinear dynamical systems. Approach proposed by S. Brunton in [1]. It uses sparse regression, relies on a set of candidate functions, and expresses the dynamics as a function $f$ that is a linear combination of the candidate functions:
     $$\frac{dx(t)}{dt} = f(x(t))$$
     where $x$ is the state of the system.

  5. 1. Make two matrices. Let's say that we have observed the system at $t_1, t_2, \ldots, t_m$ and either observed or numerically computed its time derivative at those time points. Then we can construct the two following matrices:
     $$\dot{X} = \begin{bmatrix} \frac{dx}{dt}(t_1) \\ \frac{dx}{dt}(t_2) \\ \vdots \\ \frac{dx}{dt}(t_m) \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} x(t_1) \\ x(t_2) \\ \vdots \\ x(t_m) \end{bmatrix}$$
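
     As a minimal sketch of this step, assuming uniformly spaced samples and using finite differences for the derivative (the helper name `make_state_and_derivative` is hypothetical and is not the `make_targets` function used later in the slides):

```python
import numpy as np


def make_state_and_derivative(samples, dt):
    """Return (X_dot, X) for samples of shape (m, n) taken every dt time units.

    Hypothetical helper: the derivative is estimated numerically, which is
    only one of the two options mentioned on the slide.
    """
    X = np.asarray(samples, dtype=float)
    # np.gradient uses central differences in the interior and one-sided
    # differences at the two end points.
    X_dot = np.gradient(X, dt, axis=0)
    return X_dot, X


# Toy check on x(t) = t^2, whose exact derivative is 2t.
t = np.linspace(0.0, 1.0, 11)
X_dot, X = make_state_and_derivative((t ** 2)[:, np.newaxis], dt=t[1] - t[0])
```

     Each row of `X` is one observed state and the matching row of `X_dot` is its estimated time derivative, so both matrices have the same shape.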

  6. 2. Augment the state matrix. We can then augment the matrix $X$ with the candidate functions $f_1, f_2, \ldots, f_p$; this will yield $\theta(X)$:
     $$\theta(X) = \begin{bmatrix} f_1(x(t_1)) & f_2(x(t_1)) & \cdots & f_p(x(t_1)) \\ f_1(x(t_2)) & f_2(x(t_2)) & \cdots & f_p(x(t_2)) \\ \vdots & \vdots & \ddots & \vdots \\ f_1(x(t_m)) & f_2(x(t_m)) & \cdots & f_p(x(t_m)) \end{bmatrix}$$
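
     One possible way to build such a matrix when the candidate functions are polynomial terms is sketched below. The function name `polynomial_library` and the column naming scheme are assumptions made for illustration; they are not the `make_polynomials` helper that appears in the slides.

```python
import numpy as np
from itertools import combinations_with_replacement


def polynomial_library(X, max_degree):
    """Return (theta, names): all monomials of the columns of X up to max_degree."""
    m, n = X.shape
    columns, names = [np.ones(m)], ["1"]  # constant candidate function
    for degree in range(1, max_degree + 1):
        # Each index tuple picks the variables multiplied together,
        # e.g. (0, 1) is x0 * x1 and (1, 1) is x1 squared.
        for idx in combinations_with_replacement(range(n), degree):
            columns.append(np.prod(X[:, list(idx)], axis=1))
            names.append("*".join(f"x{i}" for i in idx))
    return np.column_stack(columns), names


X = np.array([[1.0, 2.0], [3.0, 4.0]])
theta, names = polynomial_library(X, max_degree=2)
```

     With two state variables and `max_degree=3` this construction produces ten columns, one per candidate function.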

  7. 3. Solve the linear least squares. Now we want to find a matrix $\xi$ that is a solution to:
     $$\dot{X} = \theta(X)\,\xi$$
     in the least squares sense. The sparsity is achieved by running the optimization several times and gradually zeroing out the values that are under a cut-off value.
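
     The repeated fit-and-threshold loop described above could look roughly like the following sketch. The function name `thresholded_least_squares`, the fixed number of iterations and the synthetic data are all assumptions; the slides do not show how their own `sparse_regression` is implemented.

```python
import numpy as np


def thresholded_least_squares(theta, targets, cutoff, n_iterations=10):
    """Fit theta @ xi ~ targets, zeroing coefficients whose magnitude is below cutoff."""
    xi = np.linalg.lstsq(theta, targets, rcond=None)[0]
    for _ in range(n_iterations):
        small = np.abs(xi) < cutoff
        xi[small] = 0.0
        # Refit each target column using only the candidate functions it kept.
        for j in range(targets.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(
                    theta[:, keep], targets[:, j], rcond=None
                )[0]
    return xi


# Synthetic check: only the second and fourth candidates are truly active.
rng = np.random.default_rng(0)
theta = rng.normal(size=(50, 4))
true_xi = np.array([[0.0], [2.0], [0.0], [-1.0]])
targets = theta @ true_xi + 1e-4 * rng.normal(size=(50, 1))
xi = thresholded_least_squares(theta, targets, cutoff=0.1)
```

     The cut-off acts as the main hyperparameter here: a larger value removes more candidate functions, a smaller one keeps more of them.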

  8. Discretized formulation. This algorithm also works in an iterative manner, when instead of $\dot{X}$ we have $X_2$:
     $$X_2 = \begin{bmatrix} x(t_2) \\ x(t_3) \\ \vdots \\ x(t_m) \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} x(t_1) \\ x(t_2) \\ \vdots \\ x(t_{m-1}) \end{bmatrix}$$
     and then we seek to solve
     $$X_2 = \theta(X)\,\xi$$
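
     A small sketch of the discretized setup, assuming the samples are stored row by row (the helper name `make_shifted_targets` and the toy sequence are illustrative only):

```python
import numpy as np


def make_shifted_targets(samples):
    """Return (X2, X): every state paired with the state observed one step later."""
    X = np.asarray(samples, dtype=float)
    return X[1:], X[:-1]


# Toy sequence following x_{k+1} = 0.9 * x_k.
samples = 0.9 ** np.arange(20)
X2, X = make_shifted_targets(samples[:, np.newaxis])

# Candidate functions 1 and x, followed by an ordinary least squares fit.
theta = np.column_stack([np.ones(len(X)), X[:, 0]])
xi = np.linalg.lstsq(theta, X2, rcond=None)[0]
```

     In this formulation the fitted weights describe the map from one step to the next rather than the time derivative, so for a slowly varying state one would expect the weight of $x$ itself to be close to 1.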

  9. Simple example. Oscillator with a cubic nonlinearity:
     $$\dot{x} = -0.1\,x^3 + 2\,y^3$$
     $$\dot{y} = -2\,x^3 - 0.1\,y^3$$
     From now on we will consider only polynomial terms of the variables as candidate functions: $1, x, y, x^2, xy, y^2, x^3, \ldots, y^p$.
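
     To get data for such an example, the oscillator can be simulated numerically. The sketch below uses a hand-written fourth-order Runge-Kutta scheme; the initial state, step size and number of steps are assumptions chosen for illustration and are not taken from the slides.

```python
import numpy as np


def cubic_oscillator_rhs(state):
    """Right-hand side of the cubic oscillator from the slide."""
    x, y = state
    return np.array([-0.1 * x**3 + 2.0 * y**3, -2.0 * x**3 - 0.1 * y**3])


def rk4_trajectory(rhs, initial_state, dt, n_steps):
    """Integrate d(state)/dt = rhs(state) with the classical RK4 scheme."""
    states = np.empty((n_steps + 1, len(initial_state)))
    states[0] = initial_state
    for k in range(n_steps):
        s = states[k]
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        states[k + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return states


# Assumed initial state (2, 0) and step size 0.01.
trajectory = rk4_trajectory(cubic_oscillator_rhs, [2.0, 0.0], dt=0.01, n_steps=2500)
energy = trajectory[:, 0] ** 4 + trajectory[:, 1] ** 4
```

     The quantity $x^4 + y^4$ decreases along exact solutions of this system, since its time derivative is $-0.4\,(x^6 + y^6)$, which gives a quick sanity check on the simulated trajectory.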

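  10. In [15]:
      import matplotlib.pyplot as plt

      # cubic_oscillator: simulated trajectory, one row per time step, columns (x, y)
      plt.figure(dpi=100)
      plt.plot(cubic_oscillator[:, 0], label='x')
      plt.plot(cubic_oscillator[:, 1], label='y')
      plt.legend()
      plt.xlabel('time')
      plt.ylabel('state')
      plt.title('Cubic oscillator')
      plt.show()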

  11. Simple example.
      In [27]:
      X_dot, X = make_targets(cubic_oscillator, derivative=(derivative := False))
      theta_X, _ = make_polynomials(X, max_degree=3)
      weights, _ = sparse_regression(theta_X, X_dot, cutoff=1e-3)
      if derivative:
          weights /= t_ode[1]
      show_weights(weights, derivative=derivative)

      function   x_{k+1}        y_{k+1}
      1          0              0
      x          1.00577        -0.0013371
      y          0              1.00572
      x^2        0              0
      xy         0              0
      y^2        0              0
      x^3        -0.00816922    -0.0645619
      x^2 y      0              -0.00932543
      x y^2      -0.00826491    0
      y^3        0.0654499      -0.00740647

  12. 2. Modeling COVID-19. The trajectories of cumulative cases:

  13. In [15]:
      # ds: the dataset helper defined earlier in the notebook
      countries_to_display = ['China', 'South Korea', 'France', 'Spain', 'Germany', 'Uruguay']
      plt.figure(dpi=130)
      for country in countries_to_display:
          values = ds.cumulative(country)
          plt.plot(values, label=country)
      plt.legend()
      plt.xlabel('days')
      plt.ylabel('cumulative cases')
      plt.show()

  14. Single country trajectory

  15. In [31]:
      country = 'France'
      data = ds.cumulative(country, rescaling=100000)[:100]
      X_dot, X = make_targets(data[..., np.newaxis], derivative=(derivative := True))
      theta_X, _ = make_polynomials(X, max_degree=5)
      weights, _ = sparse_regression(theta_X, X_dot, cutoff=1e-15)
      show_weights(weights, derivative=derivative)
      show_trajectory(data, weights)

      function   dx/dt
      1          -0.000900268
      x          0.372269
      x^2        -1.44715
      x^3        2.3851
      x^4        -1.59452
      x^5        0.36289

  16. Observations. A single trajectory can be described by only two or three parameters. The algorithm is not robust. We did not have the complete evolutions for most of the countries at the time of writing. How can we use the information from countries that are further along in the epidemic to make predictions for countries at an earlier stage?

  17. Multi-country model. What if the evolution of the number of cases in several countries could be governed by a single formula? This would require additional information: information about the countries and the cultures; information about the measures taken by the governments to tackle the spread of the disease. We would like to find a function $f$ such that for any country $c$, at any day $t$, we have:
      $$x_{t+1,c} = f(x_{t,c},\, i_{t,c})$$
      where $i$ is this additional information.
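
      One way such a shared model could be set up is to pool the one-step pairs of all countries into a single regression problem. The sketch below only shows this data layout; the helper name `stack_countries`, the dictionary format and the numbers are made up for illustration and do not come from the slides.

```python
import numpy as np


def stack_countries(cases_by_country, info_by_country):
    """Pool (x_{t,c}, i_{t,c}) -> x_{t+1,c} pairs over all countries.

    Both arguments are dicts keyed by country name; each value holds one
    entry per day (hypothetical format).
    """
    inputs, targets = [], []
    for country, cases in cases_by_country.items():
        cases = np.asarray(cases, dtype=float)
        info = np.asarray(info_by_country[country], dtype=float)
        # Day t supplies the inputs, day t + 1 supplies the target.
        inputs.append(np.column_stack([cases[:-1], info[:-1]]))
        targets.append(cases[1:])
    return np.vstack(inputs), np.concatenate(targets)


# Made-up toy values: cumulative cases and a 0/1 "measures in place" flag.
cases = {"A": [1, 2, 4, 8], "B": [3, 5, 9]}
info = {"A": [0, 0, 1, 1], "B": [1, 1, 1]}
inputs, targets = stack_countries(cases, info)
```

      The pooled `inputs` could then be expanded with candidate functions and fitted against `targets` in the same way as for a single trajectory.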

  18. SINDy vs ARIMA. ARIMA (https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average) seems to be the preferred tool of statisticians for COVID forecasting [6, 7]. We compared the forecasting performance of our techniques to that of an ARIMA model:

  19. SINDy vs ARIMA. But it doesn't always work as expected.

  20. SINDy vs ARIMA. [Figures: percentage of best guesses; error on the 1-week forecast horizon; error on the 2-week forecast horizon.]

  21. Other approaches/experiments. Rational basis functions (https://kipre.github.io/files/internship/reports/non-linear/nonlinear.html); time-dependent dynamics (https://kipre.github.io/files/internship/reports/covid_time/index.html); control measures (https://kipre.github.io/files/internship/reports/covid_control/report.html); N-Beats and NNs; SIR fitting and recovering; more preceding states ($x_{n-1}$, $x_{n-2}$), i.e. autoregression.
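
      For the last item, using more preceding states amounts to regressing each value on several lagged values. A minimal sketch under that reading (the helper name `make_lagged` and the toy sequence are assumptions, not code from the internship):

```python
import numpy as np


def make_lagged(series, n_lags):
    """Return (lagged, targets): row n holds (x_{n-1}, ..., x_{n-n_lags}) and targets x_n."""
    x = np.asarray(series, dtype=float)
    # Column j contains the value observed j + 1 steps before the target.
    lagged = np.column_stack(
        [x[n_lags - 1 - j : len(x) - 1 - j] for j in range(n_lags)]
    )
    return lagged, x[n_lags:]


# Toy sequence obeying x_n = x_{n-1} + x_{n-2}.
series = [1, 1, 2, 3, 5, 8, 13]
lagged, targets = make_lagged(series, n_lags=2)
coefficients = np.linalg.lstsq(lagged, targets, rcond=None)[0]
```

      The lagged columns can also be passed through a library of candidate functions before the fit, so that nonlinear dependence on earlier states is allowed.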

  22. Conclusion. System identification is challenging. In simple settings the SINDy algorithm learns well; in complex situations, not so much. It was still quite early to work on data-driven approaches. The algorithm is really sensitive to small hyperparameter changes and to small changes in the data, and it easily overfits. Random events had a big impact on the evolutions.

  23. Other tasks. net2mat (https://github.com/Kipre/net2mat): a small program, written in C++, to transform GASLIB .net files to MATLAB .mat files. System identification as a service (https://github.com/Kipre/files/tree/master/internship/siaas): using the implemented algorithms to deploy a service through a REST API for identifying systems from data.

  24. References
      [1] Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences 113 (15): 3932–7. https://doi.org/10.1073/pnas.1517384113.
      [2] Hale, Thomas, Sam Webster, Anna Petherick, Toby Phillips, and Beatriz Kira. 2020. “Oxford COVID-19 Government Response Tracker.” Blavatnik School of Government. https://github.com/OxCGRT/covid-policy-tracker/.
      [3] “Understanding the Coronavirus (COVID-19) Pandemic Through Data. World Bank.” n.d. http://datatopics.worldbank.org/universal-health-coverage/covid19/.
      [4] Kermack, William Ogilvy, A. G. McKendrick, and Gilbert Thomas Walker. 1927. “A Contribution to the Mathematical Theory of Epidemics.” Proc. R. Soc. Lond. A 115: 700–721. https://doi.org/10.1098/rspa.1927.0118.
      [5] Kergassner, Andreas, Christian Burkhardt, Dorothee Lippold, Sarah Nistler, Matthias Kergassner, Paul Steinmann, Dominik Budday, and Silvia Budday. 2020. “Meso-scale modeling of COVID-19 spatio-temporal outbreak dynamics in Germany.” medRxiv 2020.06.10.20126771. https://doi.org/10.1101/2020.06.10.20126771.

  25. [6] Ding, Guorong, Xinru Li, Yang Shen, and Jiao Fan. 2020. “Brief Analysis of the ARIMA model on the COVID-19 in Italy.” medRxiv 2020.04.08.20058636. https://doi.org/10.1101/2020.04.08.20058636.
      [7] Bayyurt, Lutfi, and Burcu Bayyurt. 2020. “Forecasting of COVID-19 Cases and Deaths Using ARIMA Models.” medRxiv 2020.04.17.20069237. https://doi.org/10.1101/2020.04.17.20069237.
      [8] Keimer, Alexander, and Lukas Pflug. 2020. “Modeling infectious diseases using integro-differential equations: Optimal control strategies for policy decisions and Applications in COVID-19.” 10.13140/RG.2.2.10845.44000. link (https://www.researchgate.net/publication/341265820_Modeling_infectious_diseases_usin differential_equations_Optimal_control_strategies_for_policy_decisions_and_Applications_in 19?channel=doi&linkId=5eb6577f299bf1287f77ed58&showFulltext=true)

  26. Appendix
