

  1. Harnessing Wake Vortices for Efficient Collective Swimming via Deep Reinforcement Learning
 Siddhartha Verma, with Guido Novati and Petros Koumoutsakos
 CSE lab, http://www.cse-lab.ethz.ch

  2. Collective Swimming
 • Hydrodynamic benefit of swimming in groups (image credit: Artbeats)
 • Are wake vortices exploited by fish for propulsion?
 • Theoretical work on schooling & formation swimming: Breder (1965), Weihs (1973, 1975), Shaw (1978)
 • Experiments: Abrahams & Colgan (1985), Herskin & Steffensen (1998), Svendsen (2003), Killen et al. (2011)
 • Simulations with pre-assigned, fixed formations: Hemelrijk et al. (2015), Daghooghi & Borazjani (2015), Maertens et al. (2017)
 • But schools evolve dynamically

  3. THIS TALK: Adaptive Collective Swimming
 • Autonomous decision-making capability, based on learning from experience
 • Goal: maximize energy efficiency, with no positional or formation constraints

  4. The Need for Control
 • Without control, a trailing fish may get ejected from the leader's wake
 • Coordinated swimming through an unsteady flow field requires the ability to observe the environment and the decision to react appropriately
 • The swimmers learn how to interact with the environment
 • Prior work @ CSE Lab: "vanilla" reinforcement learning, with the goal of following the leader (Novati et al., Bioinspir. Biomim. 2017)
 • HERE: deep reinforcement learning, with the goal of extracting energy from the vortex wake

  5. Reinforcement Learning
 • An agent learns the best action through trial-and-error interaction with its environment (credit: https://www.cs.utexas.edu/~eladlieb/RLRG.html)
 • Actions have consequences; the reward (feedback) is delayed
 • Goal: maximize the cumulative future reward; specify what to do, not how to do it
 • Credit assignment: the agent receives feedback, and the expected reward is updated in previously visited states
 • Q-learning: a POLICY for taking the best ACTION in a given STATE (see the tabular sketch after this slide)
   Q^π(s_t, a_t) = E[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … | a_k = π(s_k) ∀ k > t ]
   Q^π(s_t, a_t) = E[ r_{t+1} + γ Q^π(s_{t+1}, π(s_{t+1})) ]   (Bellman, 1957)
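 A minimal tabular Q-learning sketch of the update above (my own illustration, not the authors' code); the Gym-style `env` interface, `env.actions`, and the hyperparameters are assumptions for the example.

```python
# Tabular Q-learning: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
# The environment interface (env.reset/env.step/env.actions) is a hypothetical stand-in.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(lambda: defaultdict(float))  # Q[state][action]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore with probability epsilon, otherwise act greedily
            if random.random() < epsilon or not Q[s]:
                a = random.choice(env.actions)
            else:
                a = max(Q[s], key=Q[s].get)
            s_next, r, done = env.step(a)
            # Bellman backup towards the bootstrapped target
            target = r + gamma * max(Q[s_next].values(), default=0.0)
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```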

  6. Deep Reinforcement Learning
 • Stable algorithm for training NN surrogates of Q (V. Mnih et al., "Human-level control through deep reinforcement learning", Nature 2015)
 • Sample past transitions (experience replay): breaks correlations in the data and learns from all past policies
 • "Frozen" target Q-network to avoid oscillations
 • Acting, at each iteration:
   • the agent is in a state s
   • select action a: greedy (based on max_a Q(s, a; w)) or random (explore)
   • observe the new state s' and reward r
   • store the tuple {s, a, s', r} in memory
 • Learning, at each iteration:
   • sample a tuple {s, a, s', r} (or a batch) from memory
   • update w with the gradient ∂/∂w of ( r + γ max_{a'} Q(s', a'; w⁻) − Q(s, a; w) )², where w⁻ are the old (frozen) target weights
   • periodically update the fixed weights: w⁻ ← w
 • A sketch of this update is given below.
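 A hedged PyTorch sketch of the learning step, following Mnih et al. (2015) rather than the CSE-lab implementation; the network size, replay-memory format and hyperparameters are assumptions.

```python
# DQN learning step: experience replay breaks correlations, and a periodically frozen
# target network stabilises the regression target r + gamma * max_a' Q(s', a'; w^-).
import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_states, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_states, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))
    def forward(self, s):
        return self.net(s)

memory = deque(maxlen=100_000)  # replay memory of (s, a, s', r, done) tensors

def dqn_update(q, q_target, optimizer, batch_size=32, gamma=0.99):
    batch = random.sample(memory, batch_size)
    s, a, s_next, r, done = map(torch.stack, zip(*batch))
    with torch.no_grad():
        # target uses the frozen weights w^- (no gradient flows through it)
        target = r + gamma * (1 - done) * q_target(s_next).max(dim=1).values
    pred = q(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # periodically: q_target.load_state_dict(q.state_dict())   (w^- <- w)
```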

  7. Actions, States, Reward
 • Actions: turn and modulate velocity by controlling the body deformation (increase or decrease curvature)
 • States:
   • orientation relative to the leader: Δx, Δy, θ
   • time since the previous tail beat: Δt
   • current shape of the body (manoeuvre)
 • Reward: based on swimming efficiency (an illustrative encoding is sketched below)
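 An illustrative encoding of this state and action space; the names, types and the discrete curvature increments are my own choices mirroring the slide, not the authors' implementation.

```python
# Hypothetical state/action containers for the follower agent.
from dataclasses import dataclass

@dataclass
class SwimmerState:
    dx: float         # streamwise offset to the leader (delta x)
    dy: float         # lateral offset to the leader (delta y)
    theta: float      # orientation relative to the leader
    dt_beat: float    # time since the previous tail beat
    shape: tuple      # parameters describing the current body deformation (manoeuvre)

# Discrete actions: modulate the body deformation by changing the local curvature
# (magnitudes are illustrative assumptions).
ACTIONS = (+0.1, -0.1, 0.0)   # increase curvature, decrease curvature, keep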

  8. After training: Efficiency-maximizing 'Follower'
 • (Video: leader and smart follower)
 • The smart follower stays in line with the leader and decides the best strategy on its own, although it is free to swim outside the wake's influence
 • Energetics: the smart follower exploits the wake; its head is synchronised with the lateral flow velocity
 • Compared to a solitary swimmer with identical muscle movements: the presence/absence of the wake is the only difference

   Relative to solo swimmer:   η      Speed   CoT    P_def
   Smart follower              1.32   1.11    0.64   0.71
   Solitary swimmer            1      1       1      1

  9. After training: Efficiency-maximizing 'Follower'
 • (Video: leader and smart follower; histograms of the first 10,000 vs. the last 10,000 transitions)
 • The smart follower stays in line with the leader and decides the best strategy on its own, although it is free to swim outside the wake's influence
 • Energetics: the smart follower exploits the wake; its head is synchronised with the lateral flow velocity
 • How does the smart follower's behaviour evolve during training?
 • Why the peaks in the distribution?

  10. Sequence of events
 • Snapshot when η is maximum
 • A wake vortex (W1) lifts up the boundary layer on the swimmer's body (L1)
 • The lifted vortex generates a secondary vortex (S1)
 • The secondary vortex is a high-speed region => suction due to low pressure
 • The flow-induced force and the body deformation together determine P_def (muscle use); low P_def values are preferable

  11. Implementing the Learned Strategy in 3D
 • Target coordinates: maxima in the velocity correlation
 • PID controller (see the sketch below):
   • modulates the follower's undulations (curvature + amplitude)
   • maintains the specified target position
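 A minimal sketch of such a PID position controller; the gains, time step and the per-coordinate layout are illustrative assumptions, not the authors' values.

```python
# Generic PID controller: the output would be used to correct the follower's
# undulation (curvature + amplitude) so that it holds the target coordinates.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target, current):
        error = target - current
        self.integral += error * self.dt          # accumulated error (I term)
        derivative = (error - self.prev_error) / self.dt  # error rate (D term)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. one controller per coordinate of the target position
pid_x = PID(kp=1.0, ki=0.1, kd=0.05, dt=1e-3)
pid_y = PID(kp=1.0, ki=0.1, kd=0.05, dt=1e-3)
```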


  13. 3D Wake Interactions
 • Wake interactions benefit the follower: 11.6% increase in efficiency, 5.3% reduction in CoT
 • The oncoming wake-vortex ring is intercepted and generates a new 'lifted-vortex' ring (LR), similar to the 2D case
 • (Plot: efficiency η vs. time t for leader and follower, t ≈ 17.5 to 19.5)

  14. 11% increase in efficiency for each follower

  15. Summary
 • An autonomous swimmer learns to exploit unsteady fluctuations in the velocity field
 • It decides to interact with the wake, even when free to swim clear of it
 • Large energetic savings, without loss in speed (improvements of 30% in 2D and 11% in 3D)
 • Swimming via reinforcement learning: an effective and robust method for harnessing energy from unsteady flow
 • NEXT: energy-efficient swarms of drones?

  16. Backup

  17. Reacting to an erratic leader
 • Note: the reward allotted here has no connection to relative displacement
 • (Videos: two fish swimming together in Greece; two fish swimming together in the Swiss supercomputer)

  18. Robustness: Responds Effectively to Perturbations
 • The agent never experienced deviations in the leader's behaviour during training
 • But analogous situations were encountered during training (random actions during learning)
 • The agent reacts appropriately to maximise the cumulative reward

  19. Numerical methods
 • Remeshed vortex methods (2D): solve the vorticity form of the incompressible Navier-Stokes equations with Brinkman penalization,
   ∂ω/∂t + u·∇ω = ω·∇u + ν∇²ω + λ∇×(χ(u_s − u)),
   i.e. advection, stretching (ω·∇u = 0 in 2D), diffusion and penalization terms
 • Brinkman penalization accounts for the fluid-solid interaction (Angot et al., Numerische Mathematik 1999); a minimal sketch of the penalization update follows this slide
 • 2D: wavelet-based adaptive grid, cost-effective compared to uniform grids (Rossinelli et al., J. Comput. Phys. 2015)
 • 3D: finite differences with pressure projection (Chorin 1968); Rossinelli et al., SC'13 Proc. Int. Conf. High Perf. Comput., Denver, Colorado
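 A hedged sketch of the Brinkman penalization step as it is commonly implemented (not necessarily the authors' discretisation): inside the body (χ = 1) the fluid velocity is driven towards the prescribed body velocity u_s, and treating the term implicitly keeps the update stable for large λ. Array names, shapes and parameter values are illustrative.

```python
# Implicit penalization of du/dt = lambda * chi * (u_s - u):
#   u^{n+1} = (u^n + lambda*dt*chi*u_s) / (1 + lambda*dt*chi)
import numpy as np

def penalize(u, u_s, chi, lam, dt):
    """One implicit penalization update of a velocity component on the grid."""
    return (u + lam * dt * chi * u_s) / (1.0 + lam * dt * chi)

# Example: 2D velocity components on a grid; chi is the solid characteristic function
nx, ny = 256, 256
ux, uy = np.zeros((nx, ny)), np.zeros((nx, ny))
chi = np.zeros((nx, ny))                               # 1 inside the swimmer, 0 in the fluid
us_x, us_y = np.zeros((nx, ny)), np.zeros((nx, ny))    # body deformation velocity
ux = penalize(ux, us_x, chi, lam=1e4, dt=1e-4)
uy = penalize(uy, us_y, chi, lam=1e4, dt=1e-4)
```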

  20. Reinforcement Learning: Reward
 • Goal #1: learn to stay behind the leader
   • Reward based on vertical displacement: R_Δy = 1 − |Δy| / (0.5 L)   (R > 0 near the leader's centreline, R < 0 otherwise)
   • Failure condition (stray too far or collide with the leader): R_end = −1
 • Goal #2: learn to maximise swimming efficiency
   • Reward: efficiency, R_η = P_thrust / (P_thrust + max(P_def, 0)) = T|u_CM| / ( T|u_CM| + max( ∫_∂Ω F(x)·u_def(x) dx, 0 ) )
   • T|u_CM| is the thrust power; ∫_∂Ω F·u_def dx is the deformation power
 • A direct transcription of these rewards into code follows this slide.
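 The two reward definitions above transcribed into code; the surface integral is approximated with a simple panel sum, and the argument names are my own.

```python
import numpy as np

def reward_follow(dy, L):
    """Goal #1: stay behind the leader. R_dy = 1 - |dy| / (0.5 L)."""
    return 1.0 - abs(dy) / (0.5 * L)

R_END = -1.0  # failure: strayed too far or collided with the leader

def reward_efficiency(thrust, u_cm, F, u_def, dA):
    """Goal #2: R_eta = T|u_CM| / (T|u_CM| + max(int_dOmega F . u_def dx, 0)).

    F, u_def: arrays of shape (N, dim) on N surface panels; dA: panel areas (N,)."""
    p_thrust = thrust * np.linalg.norm(u_cm)
    p_def = np.sum(np.einsum('ij,ij->i', F, u_def) * dA)  # surface integral of F . u_def
    return p_thrust / (p_thrust + max(p_def, 0.0))
```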

  21. Reinforcement Learning: Basic idea
 • An agent learns the best action through trial-and-error interaction with its environment (credit: https://www.cs.utexas.edu/~eladlieb/RLRG.html)
 • Actions have long-term consequences; the reward (feedback) is delayed
 • Goal: maximize the cumulative future reward; specify what to do, not how to do it
 • Credit assignment: the agent receives feedback, and the expected reward is updated in previously visited states; now we have a policy
 • Example: maze solving
   • State: the agent's position (A)
   • Actions: go U, D, L, R
   • Reward: −1 per step taken, 0 at the terminal state
   • (Figure: maze annotated with the learned state values)
 • Q^π(s_t, a_t) = E[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … | a_k = π(s_k) ∀ k > t ]
   Q^π(s_t, a_t) = E[ r_{t+1} + γ Q^π(s_{t+1}, π(s_{t+1})) ]   (Bellman, 1957)
 • A small gridworld illustration of these values is sketched below.
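 A tiny gridworld version of the maze example (my own illustration, solved here with value iteration rather than Q-learning): with −1 per step and 0 at the terminal state, the learned values count the negative number of steps to the goal, and the greedy policy follows increasing values to the exit. Grid size and goal location are assumptions.

```python
# Value iteration on a 4x4 gridworld with reward -1 per step, 0 at the goal (gamma = 1).
import numpy as np

rows, cols = 4, 4
goal = (0, 3)
actions = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

V = np.zeros((rows, cols))
for _ in range(50):
    V_new = np.copy(V)
    for i in range(rows):
        for j in range(cols):
            if (i, j) == goal:
                continue
            best = -np.inf
            for di, dj in actions.values():
                # moves off the grid leave the agent in place (clamped)
                ni = min(max(i + di, 0), rows - 1)
                nj = min(max(j + dj, 0), cols - 1)
                best = max(best, -1.0 + V[ni, nj])   # -1 per step plus value of next cell
            V_new[i, j] = best
    V = V_new

print(V)  # e.g. V[3, 0] == -6: six steps from the far corner to the goal
```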

  22. Recurrent Neural Network
 • (Architecture diagram: the observation o_n is fed through three stacked LSTM layers, whose output maps to the Q-values q_n(a_1), …, q_n(a_5), one per action)
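 One possible PyTorch realisation of the architecture in the diagram; the layer width, observation dimension and interface are assumptions, not the authors' network.

```python
# Recurrent Q-network: observation sequence -> 3 LSTM layers -> one Q-value per action.
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim, n_actions=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, num_layers=3, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); the recurrent state carries the history
        out, state = self.lstm(obs_seq, state)
        q_values = self.head(out)            # (batch, time, n_actions)
        return q_values, state

q_net = RecurrentQNet(obs_dim=6)
q, _ = q_net(torch.zeros(1, 10, 6))          # Q-values for a 10-step observation sequence
```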

  23. A flexible maneuvering model
 • Modified midline kinematics preserves the travelling wave: a travelling wave plus a travelling spline
 • Each action prescribes a point of the spline, at phases c, c + ¼, c + ½, c + ¾, c + 1
 • (Figure: midline shapes for the "increase curvature" and "decrease curvature" actions)
 • A hedged sketch of such a model is given below.
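 A heavily hedged sketch of this kind of maneuvering model: the baseline body curvature is a travelling wave along the midline, and the actions set control points of an extra curvature spline evaluated in the same travelling frame, so the modification preserves the travelling-wave character. The amplitude envelope, wavelength and the exact way the control points enter are illustrative assumptions, not the authors' formulation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def midline_curvature(s, t, action_points, T=1.0, wavelength=1.0):
    """s: arclength in [0, 1]; t: time; action_points: spline values chosen by the
    actions at phases c, c+1/4, c+1/2, c+3/4, c+1 of the tail-beat cycle."""
    envelope = 0.1 + 0.3 * s**2                      # assumed amplitude growth towards the tail
    wave = np.sin(2 * np.pi * (t / T - s / wavelength))   # baseline travelling wave
    phases = np.linspace(0.0, 1.0, len(action_points))
    extra = CubicSpline(phases, action_points)       # travelling spline set by the actions
    phase = (t / T - s / wavelength) % 1.0           # evaluate the spline in the wave frame
    return envelope * wave + extra(phase)

s = np.linspace(0.0, 1.0, 200)
kappa = midline_curvature(s, t=0.3, action_points=[0.0, 0.05, 0.1, 0.05, 0.0])
```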

  24. Examples
 • The effect of an action depends on when the action is made
 • (Figures: increasing local curvature, reducing local curvature, a chain of actions)

  25. Simulation Cost (2D)
 • Wavelet-based adaptive grid: https://github.com/cselab/MRAG-I2D (Rossinelli et al., JCP 2015)
 • Production runs (Re = 5000):
   • domain: [0,1] x [0,1], resolution: 8192 x 8192
   • 1600 points along the fish midline
   • 10 tail-beat cycles: 27,000 time steps
   • running with 24 threads (12 hyper-threaded cores, Piz Daint)
   • approx. 96 core hours, i.e. about 1 second per step
 • Training simulations (lower resolution):
   • resolution: 2048 x 2048
   • 10 tail-beat cycles: 36 core hours
   • learning converges in about 150,000 tail-beats
   • 0.54 million core hours per learning episode (≈ 15,000 runs of 10 tail-beats × 36 core hours)
