Model-Free Stochastic Perturbative Adaptation and Optimization



  1. Model-Free Stochastic Perturbative Adaptation and Optimization
     Gert Cauwenberghs, Johns Hopkins University, gert@jhu.edu
     520.776 Learning on Silicon, http://bach.ece.jhu.edu/gert/courses/776

  2. Outline
     • Model-Free Learning
       – Model Complexity
       – Compensation of Analog VLSI Mismatch
     • Stochastic Parallel Gradient Descent
       – Algorithmic Properties
       – Mixed-Signal Architecture
       – VLSI Implementation
     • Extensions
       – Learning of Continuous-Time Dynamics
       – Reinforcement Learning
     • Model-Free Adaptive Optics
       – AdOpt VLSI Controller
       – Adaptive Optics "Quality" Metrics
       – Applications to Laser Communication and Imaging

  3. The Analog Computing Paradigm
     • Local functions are efficiently implemented with minimal circuitry, exploiting the physics of the devices.
     • Excessive global interconnects are avoided:
       – Currents or charges are accumulated along a single wire.
       – Voltage is distributed along a single wire.
     Pros:
       – Massive Parallelism
       – Low Power Dissipation
       – Real-Time, Real-World Interface
       – Continuous-Time Dynamics
     Cons:
       – Limited Dynamic Range
       – Mismatches and Nonlinearities (WYDINWYG)

  4. Effect of Implementation Mismatches
     [Diagram: a system with inputs, outputs, parameters p_i, and a reference defining the performance error ε(p).]
     Associative element:
       – Mismatches can be properly compensated by adjusting the parameters p_i accordingly, provided sufficient degrees of freedom are available to do so.
     Adaptive element:
       – Does not require a precise implementation.
       – The accuracy of the polarity (rather than the amplitude) of the implemented parameter update increments Δp_i is the performance-limiting factor.

  5. Example: LMS Rule
     A linear perceptron under supervised learning,
       y_i(k) = Σ_j p_ij x_j(k)
       ε = ½ Σ_k Σ_i ( y_i^target(k) − y_i(k) )²
     with gradient descent,
       Δp_ij = −η ∂ε(k)/∂p_ij = η x_j(k) ( y_i^target(k) − y_i(k) )
     reduces to an incremental outer-product update rule, with scalable, modular implementation in analog VLSI.
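A minimal NumPy sketch of this LMS outer-product update. The network sizes, learning rate, and random training data are illustrative assumptions, not values from the slides; only the update form Δp_ij = η x_j (y_i^target − y_i) follows the slide.

```python
import numpy as np

# Minimal sketch of the LMS rule for a linear perceptron y = P x.
# Sizes, learning rate, and random training data are illustrative.
rng = np.random.default_rng(0)
n_in, n_out, eta = 8, 3, 0.05

P_true = rng.normal(size=(n_out, n_in))   # unknown target mapping
P = np.zeros((n_out, n_in))               # learned parameters p_ij

for k in range(2000):
    x = rng.normal(size=n_in)             # input pattern x_j(k)
    y_target = P_true @ x                 # supervised target
    y = P @ x                             # perceptron output y_i(k)
    e = y_target - y                      # output error
    P += eta * np.outer(e, x)             # incremental outer-product update

print("parameter error:", np.linalg.norm(P - P_true))
```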

  6. Incremental Outer-Product Learning in Neural Nets
     Multi-layer perceptron:
       x_i = f( Σ_j p_ij x_j )
     Outer-product learning update:
       Δp_ij = η x_j · e_i
     with the error term e_i determined by the learning rule:
       – Hebbian (Hebb, 1949): e_i = x_i
       – LMS rule (Widrow-Hoff, 1960): e_i = f′_i · ( x_i^target − x_i )
       – Backpropagation (Werbos, Rumelhart, LeCun): e_j = f′_j · Σ_i p_ij e_i
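A short sketch showing the same outer-product form Δp_ij = η x_j e_i applied in a two-layer perceptron, with the LMS-style error at the output layer and the backpropagated error at the hidden layer. The architecture, tanh nonlinearity, and toy regression target are illustrative assumptions.

```python
import numpy as np

# Generic outer-product update delta_W = eta * outer(e, x) in a two-layer
# perceptron: LMS-style error at the output, backpropagated error at the
# hidden layer.  Sizes, nonlinearity, and target are illustrative.
rng = np.random.default_rng(1)
eta = 0.1
W1 = rng.normal(scale=0.5, size=(4, 2))      # hidden-layer weights
W2 = rng.normal(scale=0.5, size=(1, 4))      # output-layer weights

def f(a):  return np.tanh(a)                 # neuron nonlinearity
def fp(a): return 1.0 - np.tanh(a) ** 2      # its derivative f'

for k in range(5000):
    x0 = rng.uniform(-1, 1, size=2)          # input pattern
    target = np.array([np.tanh(2.0 * x0[0] - x0[1])])   # toy regression target
    a1 = W1 @ x0; x1 = f(a1)                 # hidden activations
    a2 = W2 @ x1; x2 = f(a2)                 # output activations
    e2 = fp(a2) * (target - x2)              # LMS-style output error
    e1 = fp(a1) * (W2.T @ e2)                # backpropagated hidden error
    W2 += eta * np.outer(e2, x1)             # same outer-product form
    W1 += eta * np.outer(e1, x0)             # at every layer
```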

  7. Gradient Descent Learning
     Minimize ε(p) by iterating
       p_i(k+1) = p_i(k) − η ∂ε(k)/∂p_i
     from calculation of the gradient:
       ∂ε/∂p_i = Σ_l Σ_m (∂ε/∂y_l)(∂y_l/∂x_m)(∂x_m/∂p_i)
     Implementation problems:
       – Requires an explicit model of the internal network dynamics.
       – Sensitive to model mismatches and noise in the implemented network and learning system.
       – The amount of computation typically scales strongly with the number of parameters.

  8. Gradient-Free Approach to Error-Descent Learning
     Avoid the model sensitivity of gradient descent by observing the parameter dependence of the performance error on the network directly, rather than calculating gradient information from a pre-assumed model of the network.
     Stochastic approximation:
       – Multi-dimensional Kiefer-Wolfowitz (Kushner & Clark, 1978)
       – Function Smoothing Global Optimization (Styblinski & Tang, 1990)
       – Simultaneous Perturbation Stochastic Approximation (Spall, 1992)
     Hardware-related variants:
       – Model-Free Distributed Learning (Dembo & Kailath, 1990)
       – Noise Injection and Correlation (Anderson & Kerns; Kirk et al., 1992-93)
       – Stochastic Error Descent (Cauwenberghs, 1993)
       – Constant Perturbation, Random Sign (Alspector et al., 1993)
       – Summed Weight Neuron Perturbation (Flower & Jabri, 1993)

  9. Stochastic Error-Descent Learning
     Minimize ε(p) by iterating
       p(k+1) = p(k) − µ ε̂(k) π(k)
     from observation of the error difference in the direction of π(k):
       ε̂(k) = ½ [ ε( p(k) + π(k) ) − ε( p(k) − π(k) ) ]
     with random, uncorrelated binary components of the perturbation vector π(k):
       π_i(k) = ±σ ;  E( π_i(k) π_j(l) ) ≈ σ² δ_ij δ_kl
     Advantages:
       – No explicit model knowledge is required.
       – Robust in the presence of noise and model mismatches.
       – The computational load is significantly reduced.
       – Allows simple, modular, and scalable implementation.
       – Convergence properties are similar to those of exact gradient descent.
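A minimal sketch of this stochastic error-descent iteration: sample a random ±σ perturbation vector, observe the error difference at p ± π, and step along −µ ε̂ π. The quadratic test error and its dimension are illustrative stand-ins for a real network error surface; note that each update needs only two error evaluations, independent of the number of parameters.

```python
import numpy as np

# Minimal sketch of stochastic error descent on an illustrative quadratic
# error surface.  Two error evaluations per update, for any parameter count.
rng = np.random.default_rng(2)
dim, sigma, mu = 50, 0.1, 0.5
p_opt = rng.normal(size=dim)                     # unknown optimum

def eps(p):                                      # stand-in performance error
    return np.sum((p - p_opt) ** 2)

p = np.zeros(dim)
for k in range(2000):
    pi = sigma * rng.choice([-1.0, 1.0], size=dim)   # pi_i = +/- sigma
    eps_hat = 0.5 * (eps(p + pi) - eps(p - pi))      # directional error estimate
    p -= mu * eps_hat * pi                           # perturbative update

print("final error:", eps(p))
```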

  10. Stochastic Perturbative Learning Cell Architecture
      [Block diagram, partitioned into LOCAL and GLOBAL sections: each local cell stores p_i(t), adds the perturbation φ(t) π_i(t), and accumulates the globally broadcast update −η ε̂(t); the network evaluates ε( p(t) + φ(t) π(t) ), and a z⁻¹ delay separates the two perturbation phases in the global error computation.]
      p(k+1) = p(k) − µ ε̂(k) π(k)
      ε̂(k) = ½ [ ε( p(k) + π(k) ) − ε( p(k) − π(k) ) ]
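A behavioral sketch of the local/global split suggested by this cell architecture: each cell keeps its parameter and perturbation polarity locally, while only the scalar error difference is broadcast globally. The class and function names are illustrative and not taken from the chip; the modulation waveform φ(t) is reduced here to two explicit ±1 phases.

```python
import numpy as np

# Behavioral sketch of the perturbative learning cell array: parameters and
# perturbations live in the cells; only eps_hat is broadcast globally.
class PerturbativeCell:
    def __init__(self, p0, sigma):
        self.p = p0                 # locally stored parameter
        self.sigma = sigma          # perturbation amplitude
        self.pi = sigma             # current local perturbation

    def draw_perturbation(self, rng):
        self.pi = self.sigma * rng.choice([-1.0, 1.0])   # local random polarity
        return self.pi

    def perturbed(self, phase):     # phase = +1 or -1
        return self.p + phase * self.pi

    def update(self, mu, eps_hat):  # eps_hat arrives on a global broadcast
        self.p -= mu * eps_hat * self.pi                  # polarity-gated step

def iterate(cells, eps, mu, rng):
    """One learning iteration: two network evaluations, one global broadcast."""
    for c in cells:
        c.draw_perturbation(rng)
    e_plus = eps(np.array([c.perturbed(+1) for c in cells]))
    e_minus = eps(np.array([c.perturbed(-1) for c in cells]))
    eps_hat = 0.5 * (e_plus - e_minus)       # single global scalar
    for c in cells:
        c.update(mu, eps_hat)                # each cell updates locally

# Usage with a toy quadratic standing in for the network error:
rng = np.random.default_rng(6)
target = np.linspace(-1.0, 1.0, 32)
cells = [PerturbativeCell(0.0, 0.01) for _ in range(32)]
for k in range(2000):
    iterate(cells, lambda p: np.sum((p - target) ** 2), mu=20.0, rng=rng)
```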

  11. Stochastic Perturbative Learning Circuit Cell
      [Circuit schematic: the perturbation π_i, with levels Vσ+ and Vσ−, is coupled through capacitor C_perturb to form p_i(t) + φ(t) π_i(t); the parameter is held on storage capacitor C_store and incremented or decremented by a charge pump (enables EN_p, EN_n; bias voltages V_bp, V_bn) whose polarity POL is derived from sign(ε̂) and π_i.]

  12. Charge Pump Characteristics
      [Figure: (a) charge-pump circuit with enables EN_p and EN_n, gate bias voltages V_bp and V_bn, adaptation current I_adapt, charge increment ΔQ_adapt, and stored voltage V_stored on capacitor C; (b) measured voltage increments and decrements ΔV_stored (log scale, roughly 10⁻⁵ to 10⁰ V) versus gate voltage V_bn or V_bp (0 to 0.6 V), for pulse widths Δt = 0, 23 µs, 1 ms, and 40 ms.]

  13. Supervised Learning of Recurrent Neural Dynamics
      [Architecture diagram: a six-neuron continuous-time recurrent network, dx/dt = F(p, x, y) with output z = G(x), with weights W_11 … W_66, thresholds θ_1 … θ_6, offset weights W_off, and reference current I_ref; binary quantization Q(·) of the outputs, teacher forcing with target trajectories x_i^T(t), and perturbations π_1^H … π_6^H (plus π_0^H) applied through update activation and probe multiplexing.]
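A sketch of the same perturbative error descent applied to the weight matrix of a small continuous-time recurrent network, with the accumulated trajectory error as the observed cost. The network equation, Euler integration, and sinusoidal target trajectory are illustrative assumptions; teacher forcing and binary quantization from the slide are not modeled, and only the perturbative outer loop follows the learning rule of the earlier slides.

```python
import numpy as np

# Stochastic error descent on the weights of a small continuous-time
# recurrent network dx/dt = -x + tanh(W x + b); the trajectory error is
# the only quantity the learning rule observes.
rng = np.random.default_rng(3)
N, dt, T = 3, 0.05, 200
t = np.arange(T) * dt
x_target = np.stack([np.sin(2 * np.pi * 0.2 * t + ph)
                     for ph in (0.0, 2.1, 4.2)], axis=1)

def trajectory_error(W):
    """Accumulated squared error of the simulated trajectory."""
    x, err = np.zeros(N), 0.0
    for k in range(T):
        x = x + dt * (-x + np.tanh(W @ x + 0.1))   # Euler step of the dynamics
        err += np.sum((x - x_target[k]) ** 2)
    return err

W = rng.normal(scale=0.1, size=(N, N))
sigma, mu = 0.02, 0.05
print("initial trajectory error:", trajectory_error(W))
for it in range(2000):
    pi = sigma * rng.choice([-1.0, 1.0], size=(N, N))        # matrix perturbation
    eps_hat = 0.5 * (trajectory_error(W + pi) - trajectory_error(W - pi))
    W -= mu * eps_hat * pi                                    # perturbative update
print("final trajectory error:", trajectory_error(W))
```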

  14. The Credit Assignment Problem, or How to Learn from Delayed Rewards
      [Block diagram: a system with parameters p_i maps inputs to outputs; an adaptive critic converts the external reinforcement r(t) into an internal reinforcement r*(t).]
      – External, discontinuous reinforcement signal r(t).
      – Adaptive critics:
        • Discrimination Learning (Grossberg, 1975)
        • Heuristic Dynamic Programming (Werbos, 1977)
        • Reinforcement Learning (Sutton and Barto, 1983)
        • TD(λ) (Sutton, 1988)
        • Q-Learning (Watkins, 1989)

  15. Reinforcement Learning (Barto and Sutton, 1983)
      Locally tuned, address-encoded neurons:
        χ(t) ∈ {0, …, 2ⁿ − 1} : n-bit address encoding of the state space
        y(t) = y_χ(t) : classifier output
        q(t) = q_χ(t) : adaptive critic
      Adaptation of classifier and adaptive critic:
        y_k(t+1) = y_k(t) + α r̂(t) e_k(t) y_k(t)
        q_k(t+1) = q_k(t) + β r̂(t) e_k(t)
      – eligibilities: e_k(t+1) = λ e_k(t) + (1 − λ) δ_kχ(t)
      – internal reinforcement: r̂(t) = r(t) + γ q(t) − q(t−1)
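A minimal actor-critic sketch in the spirit of this slide, with address-encoded states, eligibility traces, and an internal reinforcement signal r̂ = r + γ q(t) − q(t−1). The 1-D random-walk task, the noisy action rule, and the simplified additive update forms are illustrative assumptions, not the exact rules implemented on the chip.

```python
import numpy as np

# Actor-critic sketch with address-encoded states: the active state chi
# selects one neuron, whose classifier output y_k and critic value q_k are
# adapted through eligibility traces and the internal reinforcement r_hat.
rng = np.random.default_rng(4)
n_states, alpha, beta, gamma, lam = 16, 0.2, 0.2, 0.95, 0.8
y = np.zeros(n_states)          # classifier outputs y_k (action preferences)
q = np.zeros(n_states)          # adaptive critic values q_k
e_y = np.zeros(n_states)        # actor eligibility trace
e_q = np.zeros(n_states)        # critic eligibility trace

state = n_states // 2
for t in range(20000):
    chi = state                                         # active neuron address
    u = 1 if y[chi] + 0.2 * rng.normal() >= 0 else -1   # binary control action
    new_state = int(np.clip(state + u, 0, n_states - 1))
    r = 1.0 if new_state == n_states - 1 else 0.0       # reward only at the goal
    onehot = (np.arange(n_states) == chi).astype(float)
    e_y = lam * e_y + (1 - lam) * u * onehot            # action-gated eligibility
    e_q = lam * e_q + (1 - lam) * onehot                # state eligibility
    r_hat = r + gamma * q[new_state] - q[chi]           # internal reinforcement
    y += alpha * r_hat * e_y                            # classifier (actor) update
    q += beta * r_hat * e_q                             # critic update
    if r > 0.0:                                         # goal reached: restart
        state, e_y[:], e_q[:] = n_states // 2, 0.0, 0.0
    else:
        state = new_state
```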

  16. Reinforcement Learning Classifier for Binary Control
      [Chip architecture: an array of 64 reinforcement learning neurons, each holding a classifier output y_k, adaptive critic value q_k, and state eligibility e_k, addressed by horizontal and vertical select lines (SEL_hor, SEL_vert) derived from the quantized state (x_1(t), x_2(t)); update (UPD), hysteresis (HYST), and lock (LOCK) circuitry with bias voltages V_bp, V_bn, V_αp, V_αn, V_δ; the action network produces the binary action u(t) from y(t) = ±1.]

  17. A Biological Adaptive Optics System
      [Figure: the human eye and brain, with the cornea, iris, lens, zonule fibers, retina, optic nerve, and brain labeled.]

  18. Wavefront Distortion and Adaptive Optics
      • Imaging: defocus, motion
      • Laser beam: beam wander/spread, intensity fluctuations

  19. Adaptive Optics: Conventional Approach
      – Performs phase conjugation
        • assumes the intensity is unaffected
      – Complex:
        • requires an accurate wavefront phase sensor (Shack-Hartmann, Zernike nonlinear filter, etc.)
        • computationally intensive control system

  20. Adaptive Optics: Model-Free Integrated Approach
      [Figure: an incoming wavefront reflected from a wavefront corrector with N elements u_1, …, u_n, …, u_N.]
      – Optimizes a direct measure J of optical performance (a "quality metric").
      – No (explicit) model information is required:
        • any type of quality metric J and wavefront corrector (MEMS, LC, …)
        • no need for a wavefront phase sensor
      – Tolerates imprecision in the implementation of the updates:
        • system-level precision is limited by the accuracy of the measured J
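A sketch of this model-free control loop: the corrector drive vector u is updated by parallel perturbative stochastic gradient ascent on the scalar quality metric J(u). The Strehl-like metric below, defined through a hidden random phase aberration, is an illustrative stand-in for a real photodetector measurement; no model of the optics enters the update.

```python
import numpy as np

# Model-free wavefront control: perturb all N corrector elements at once,
# read the quality metric twice, and ascend the measured metric.
rng = np.random.default_rng(5)
N, sigma, mu = 37, 0.02, 5.0                     # e.g. a 37-element corrector
phase_aberration = rng.normal(scale=0.2, size=N) # unknown to the controller

def J(u):                                        # measured quality metric
    residual = u - phase_aberration              # residual wavefront error
    return np.exp(-np.sum(residual ** 2))        # Strehl-like sharpness

u = np.zeros(N)
for k in range(5000):
    pi = sigma * rng.choice([-1.0, 1.0], size=N) # parallel random perturbation
    dJ = 0.5 * (J(u + pi) - J(u - pi))           # two metric readings
    u += mu * dJ * pi                            # ascend the measured metric

print("final quality metric:", J(u))
```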

  21. Adaptive Optics Controller Chip: Optimization by Parallel Perturbative Stochastic Gradient Descent
      [Block diagram: the wavefront corrector Φ(u) shapes the image; a performance-metric sensor measures J(u); the AdOpt VLSI wavefront controller closes the loop by updating the corrector drive vector u.]
