Time-delay reservoir computers: nonlinear stability of functional differential systems and optimal nonlinear information processing capacity. Applications to stochastic nonlinear time series forecasting.

Lyudmila Grigoryeva (1), Julie Henriques (2), Laurent Larger (2), Juan-Pablo Ortega (3,4)
(1) Universität Konstanz, Germany; (2) Université Bourgogne Franche-Comté, France; (3) Universität Sankt Gallen, Switzerland; (4) CNRS, France

DarrylFest, July 2017
Financial and Insurance Mathematics Seminar
Outline

1. Machine learning in a nutshell
   - Discrete vs continuous time
   - Deterministic vs stochastic
2. Static problems, neural networks, and approximation theorems
3. Dynamic problems and reservoir computing
4. Universality theorems
   - The control-theoretical approach
   - The filter/operators approach
5. Time-delay reservoir computers
   - Hardware realizations, scalability, and big data compatibility
   - Models and performance estimations
6. Application examples
Machine learning in a nutshell

We approach machine learning as an input/output problem.
- Input: denoted by the character z. It contains the information available for the solution of the problem (historical data, explanatory factors, features of the individuals that need to be classified).
- Output: denoted generically by y. It contains the solution of the problem (forecasted data, explained variables, classification results).
- This is a purely empirical approach, not based on first principles but on a training/testing routine.
- We distinguish between static/discrete-time and continuous-time setups, and between deterministic and stochastic situations, since they lead to very different levels of mathematical complexity.
Machine learning in a nutshell
Examples

Deterministic setup: an explicit functional relation (via a merely measurable function) is assumed between input and output.
- Static/discrete time: observables or diagnostic variables in complex physical or noiseless engineering systems (domotics), translators, memory tasks, games.
- Continuous time: integration or path continuation of (chaotic) differential equations: molecular dynamics, structural mechanics, vibration analysis, space mission design. Autopilot systems, robotics.

Stochastic setup: the input and the output are random variables or processes, and only probabilistic dependence is assumed between them.
- Static/discrete time: image classification, speech recognition, time series forecasting, volatility filtering, factor analysis.
- Continuous time: physiological time series classification, financial bubble detection.
Machine learning in a nutshell
Setups considered

z and y are R^n- and R^q-valued (or the corresponding process/functional analogues in the other setups).

Static/discrete time, deterministic:
- Characterization of ingredients: z ∈ R^n, y ∈ R^q
- Problem to be solved: y = f(z), f measurable
- Object to be trained: real/complex function
- Approach and source of universality: approximation theory, Stone-Weierstraß

Static/discrete time, stochastic:
- Characterization of ingredients: z ∈ (L^2(Ω, F, P))^n, y ∈ (L^2(Ω, F, P))^q
- Problem to be solved: E[y | z]
- Object to be trained: conditional expectation
- Approach and source of universality: (semi-)parametric statistics

Continuous time, deterministic:
- Characterization of ingredients: z ∈ C^∞([a, b], R^n), y ∈ C^∞([a, b], R^n)
- Problem to be solved: y(·) = F(z(·))
- Object to be trained: functional/operator (causal filter)
- Approach and source of universality: functional data analysis and control theory

Continuous time, stochastic:
- Characterization of ingredients: processes adapted with respect to a given filtration F
- Problem to be solved: E[y(·) | z(·)]
- Object to be trained: stochastic causal filter
- Approach and source of universality: stochastic control theory, Kalman filter theory
Static problems, neural networks, and approximation theorems
The deterministic case
Neural networks

[Figure: a feedforward neural network with an input layer (inputs z_1, ..., z_4), a hidden layer, and an output layer producing y; w^1 and w^2 denote the first- and second-layer weights.]

\[
y = \psi\left( \sum_{i=1}^{5} w^2_i \, \psi\left( \sum_{j=1}^{4} w^1_{ij}\, z_j \right) \right), \qquad \psi \text{ a sigmoid function.} \tag{1}
\]
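To make (1) concrete, here is a minimal sketch of the forward pass of this 4-input, 5-hidden-unit network in Python; the logistic choice of ψ and the random weight values are illustrative assumptions, not the ones used in the talk.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid used as the activation psi in (1) (illustrative choice)
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(z, W1, w2):
    # Hidden layer: 5 sigmoid units fed by the 4 inputs through W1 (shape 5 x 4)
    h = sigmoid(W1 @ z)
    # Output layer: a single sigmoid unit fed by the hidden activations through w2 (shape 5,)
    return sigmoid(w2 @ h)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 4))   # first-layer weights w^1_{ij} (placeholder values)
w2 = rng.normal(size=5)        # second-layer weights w^2_i (placeholder values)
z = np.array([0.2, -1.0, 0.5, 0.3])
print(feedforward(z, W1, w2))
```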
Static problems, neural networks, and approximation theorems
The deterministic case
Universality in neural networks and approximation theorems

Neural networks are implemented as a machine learning device by tuning the weights w^1, w^2 using a gradient descent algorithm (backpropagation) that minimizes the approximation error on a training set (see the sketch below). In the deterministic case, the objective is to recover an explicit functional relation between input and output. In the absence of noise there is no danger of overfitting.
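As an illustration of this training procedure, the following hedged sketch performs one backpropagation/gradient-descent step for the two-layer network of (1) under a squared-error loss; the loss, the learning rate, and the variable names are assumptions made for the example, not part of the original slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(z, y_target, W1, w2, lr=0.1):
    # Forward pass through the network (1)
    a1 = W1 @ z
    h = sigmoid(a1)
    a2 = w2 @ h
    y = sigmoid(a2)
    # Squared-error loss L = 0.5 * (y - y_target)^2 and its gradients (chain rule = backpropagation)
    err = y - y_target
    d_a2 = err * y * (1.0 - y)       # dL/da2
    grad_w2 = d_a2 * h               # dL/dw2
    d_h = d_a2 * w2                  # dL/dh
    d_a1 = d_h * h * (1.0 - h)       # dL/da1 (elementwise)
    grad_W1 = np.outer(d_a1, z)      # dL/dW1
    # Gradient-descent update of both weight layers
    return W1 - lr * grad_W1, w2 - lr * grad_w2
```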
Static problems, neural networks, and approximation theorems
The deterministic case

Universality problem: how large is the class of input-output functions that can be generated using feedforward neural networks as in (1)?

Hilbert's 13th problem on multivariate functions: can any continuous function of three variables be expressed as a composition of finitely many continuous functions of two variables? This question is a generalization of the original problem for algebraic functions, posed at the 1900 ICM in Paris and in [Hil27].
Static problems, neural networks, and approximation theorems
The deterministic case
The Kolmogorov-Arnold representation theorem and Kolmogorov-Sprecher networks

Theorem (Kolmogorov-Arnold [Kol56, Arn57]). There exist fixed continuous increasing functions ϕ_{pq}(x) on I = [0, 1] such that each continuous function f on I^n can be written as

\[
f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} g_q\!\left( \sum_{p=1}^{n} \varphi_{pq}(x_p) \right),
\]

where the g_q are properly chosen continuous functions of one variable.

- This amounts to saying that the only genuinely multivariate function is the sum!
- This is a representation and not an approximation theorem.
Static problems, neural networks, and approximation theorems
The deterministic case

Theorem (Sprecher [Spr65, Spr96, Spr97]). There exist constants λ_p and fixed continuous increasing functions ϕ_q(x) on I = [0, 1] such that each continuous function f on I^n can be written as

\[
f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} g_q\!\left( \sum_{p=1}^{n} \lambda_p\, \varphi_q(x_p) \right),
\]

where the g_q are properly chosen continuous functions of one variable.

- The g_q functions depend on f; the λ_p and ϕ_q do not.
- All the information contained in the multivariate continuous function f is contained in the single-variable continuous functions g_q.
- This is not ideal for machine learning applications because we would need to train the g_q functions. It can still be done (see the CMAC in [CG92]); a hedged sketch follows.
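To make the training issue concrete, here is a hedged sketch of a Sprecher-type evaluation in which the inner functions ϕ_q and constants λ_p are assumed given, and each outer function g_q is represented by a trainable piecewise-linear table, in the spirit of the CMAC of [CG92]. This is only an illustration of the idea, not Sprecher's actual construction; all names and parameter choices are hypothetical.

```python
import numpy as np

def sprecher_forward(x, lam, phis, knots, g_values):
    # x: input point in [0, 1]^n; lam: constants lambda_p, shape (n,)
    # phis: list of 2n+1 fixed increasing univariate functions (assumed vectorized)
    # knots: increasing grid covering the range of the inner sums
    # g_values: trainable table of shape (2n+1, len(knots)); row q defines g_q
    #           as the piecewise-linear interpolant of its values on the knots
    y = 0.0
    for q, phi_q in enumerate(phis):
        s = np.sum(lam * phi_q(x))              # inner sum: sum_p lambda_p phi_q(x_p)
        y += np.interp(s, knots, g_values[q])   # outer univariate function g_q
    return y
```

Only g_values would be fitted to data (for instance by least squares on the interpolation weights, or by gradient descent), which is exactly the training of the g_q functions that the slide refers to.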
Static problems, neural networks, and approximation theorems
The deterministic case

[Figure: the Kolmogorov-Sprecher network (taken from [CG92]).]
Static problems, neural networks, and approximation theorems
The deterministic case
The Cybenko and the Hornik et al. theorems

Definition. A squashing function is a non-decreasing map ψ : R → [0, 1] such that

\[
\lim_{\lambda \to -\infty} \psi(\lambda) = 0 \quad \text{and} \quad \lim_{\lambda \to \infty} \psi(\lambda) = 1.
\]
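For instance, the logistic function and the Heaviside step satisfy this definition; a quick, purely illustrative numerical check in Python:

```python
import numpy as np

# Two standard squashing functions: non-decreasing, tending to 0 at -infinity and to 1 at +infinity.
def logistic(lam):
    return 1.0 / (1.0 + np.exp(-lam))

def heaviside(lam):
    return np.heaviside(lam, 1.0)  # the value 1.0 is assigned at lam = 0

lam = np.array([-50.0, 0.0, 50.0])
print(logistic(lam))    # approximately [0.0, 0.5, 1.0]
print(heaviside(lam))   # [0.0, 1.0, 1.0]
```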