Optimal decentralized control of coupled subsystems with control sharing Aditya Mahajan McGill University IEEE Conference on Decision and Control, 2011

Notation
Random variables: $X$; realizations: $x$; state spaces: $\mathcal{X}$.
$X^i_t$ means that variable $X$ belongs to subsystem $i$ at time $t$.
$X_{1:t} = (X_1, X_2, \dots, X_t)$ and $\mathbf{X} = (X^1, X^2, \dots, X^n)$.

System Model
Control-coupled subsystems: $X^i_{t+1} = f^i_t(X^i_t, \mathbf{U}_t, W^i_t)$.
Controller with control sharing: $U^i_t = g^i_t(X^i_{1:t}, \mathbf{U}_{1:t-1})$.
Objective: $\min_{\text{all policies } \mathbf{g}} \mathbb{E}\bigl[\sum_{t=1}^{T} c_t(\mathbf{X}_t, \mathbf{U}_t)\bigr]$.
[Diagram: subsystems $1, 2, \dots, n$, each with its own controller; every controller receives the previous control actions $\mathbf{U}_{t-1}$.]

Some applications
Feedback communication systems (physical layer): point-to-point real-time source coding, multi-terminal source coding with feedback, some classes of multiple access channels with feedback.
Queueing networks (media access layer): multi-access broadcast, some classes of decentralized scheduling and routing.
Cellular networks: paging and registration in cellular networks.

Conceptual difficulties
The system has a non-classical information structure: $U^i_t = g^i_t(X^i_{1:t}, \mathbf{U}_{1:t-1})$.
Data at each controller is increasing with time. Is part of this data redundant? Can part of this data be compressed to a sufficient statistic?
Multi-stage decision making. How does the current control action affect future estimation? What information does controller $i$ communicate to controller $j$ via its control action?

Literature Overview
Control sharing info-structure (Bismut, 1972; Sandell and Athans, 1974): considered the LQG version of the problem; exploited the fact that the action space is continuous and compact to embed the observations in the control; the problem then reduces to the one-step delayed sharing pattern.
Other non-classical info-structures with sharing:
Delayed (observation) sharing: Witsenhausen, 1971; Varaiya and Walrand, 1979; Nayyar, Mahajan, and Teneketzis, 2011.
Delayed state sharing: Aicardi, Davoli, and Minciardi, 1987.
Periodic sharing: Ooi, Verbout, Ludwig, and Wornell, 1997.
Belief sharing: Yüksel, 2009.
Partial history sharing: Mahajan, Nayyar, and Teneketzis, 2008.

Outline of the results
First structural result (based on person-by-person optimality): $X^i_{1:t-1}$ is redundant for optimal performance. Without loss of optimality (wlo), $U^i_t = g^i_t(X^i_t, \mathbf{U}_{1:t-1})$.
Second structural result (based on the common information approach of MNT 2008): define $\Pi^i_t(x) = \mathbb{P}(X^i_t = x \mid \mathbf{U}_{1:t-1})$ and $\boldsymbol{\Pi}_t = (\Pi^1_t, \dots, \Pi^n_t)$. Then $\boldsymbol{\Pi}_t$ is a sufficient statistic of $\mathbf{U}_{1:t-1}$ for optimal performance: wlo, $U^i_t = g^i_t(X^i_t, \boldsymbol{\Pi}_t)$.
Dynamic programming decomposition.

Structural result based on person-by-person optimality
Main lemma: the state processes are conditionally independent given the past control actions:
$\mathbb{P}(\mathbf{X}_{1:t} = \mathbf{x}_{1:t} \mid \mathbf{U}_{1:t}) = \prod_{i=1}^{n} \mathbb{P}(X^i_{1:t} = x^i_{1:t} \mid \mathbf{U}_{1:t})$.
Implications: fix $\mathbf{g}^{-i}$ and consider the optimal design of $g^i$. Let $R^i_t = (X^i_t, \mathbf{U}_{1:t-1})$. Then $\{R^i_t, t = 1, \dots\}$ is a controlled MDP with control action $U^i_t$:
$\mathbb{P}(r^i_{t+1} \mid r^i_{1:t}, u^i_{1:t}) = \mathbb{P}(r^i_{t+1} \mid r^i_t, u^i_t)$,
$\mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) \mid r^i_{1:t}, u^i_{1:t}] = \mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) \mid r^i_t, u^i_t]$.
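
The conditional independence stated in the main lemma can be checked by exact enumeration on a small instance. The Python sketch below is only an illustration under assumptions that are not part of the slides: two subsystems with three states each, binary actions, uniform and mutually independent initial states and noise, and the hypothetical policies `g1`, `g2` and dynamics `step`. It tests whether the joint law of the two state trajectories, conditioned on the shared control actions, factorizes across subsystems.

```python
from fractions import Fraction
from itertools import product

STATES, NOISE = [0, 1, 2], [0, 1]

# Hypothetical time-invariant policies: each action depends only on the local state.
def g1(x):
    return 1 if x == 2 else 0

def g2(x):
    return x % 2

# Hypothetical dynamics: the next local state depends on the joint action u.
def step(x, u, w):
    return (x + u[0] + u[1] + w) % 3

# Probability of one realization of the independent, uniform primitives.
p_primitive = Fraction(1, len(STATES)) ** 2 * Fraction(1, len(NOISE)) ** 2

# joint[(u_{1:2}, x^1_{1:2}, x^2_{1:2})] = probability of that outcome.
joint = {}
for x1, x2, w1, w2 in product(STATES, STATES, NOISE, NOISE):
    u_1 = (g1(x1), g2(x2))
    y1, y2 = step(x1, u_1, w1), step(x2, u_1, w2)
    u_2 = (g1(y1), g2(y2))
    key = ((u_1, u_2), (x1, y1), (x2, y2))
    joint[key] = joint.get(key, 0) + p_primitive

def marginal(keep):
    """Sum the joint law over all components except those listed in keep."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[k] for k in keep)
        out[sub] = out.get(sub, 0) + p
    return out

p_u, p_u_a, p_u_b = marginal([0]), marginal([0, 1]), marginal([0, 2])
trajectories = list(product(STATES, repeat=2))

# P(a, b | u) = P(a | u) P(b | u)  is equivalent to  P(u, a, b) P(u) = P(u, a) P(u, b).
factorizes = all(
    joint.get((u, a, b), 0) * p_u[(u,)]
    == p_u_a.get((u, a), 0) * p_u_b.get((u, b), 0)
    for (u,) in p_u for a in trajectories for b in trajectories
)
print(factorizes)
```

Exact rational arithmetic is used so that the equality test is not affected by rounding. The check covers every action history with positive probability, including trajectory pairs that lie outside the support.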

Structural result (cont.)
Original model: $U^i_t = g^i_t(X^i_{1:t}, \mathbf{U}_{1:t-1})$.
Implication of the person-by-person optimality argument: $U^i_t = g^i_t(R^i_t) = g^i_t(X^i_t, \mathbf{U}_{1:t-1})$.
Design difficulty: data at the controller is still increasing with time.

A coordinator based on common information
General idea proposed in (Mahajan, Nayyar, and Teneketzis 2008).
[Diagram: two subsystems with controllers $g^1$ and $g^2$; controller $i$ receives $(X^i_t, \mathbf{U}_{1:t-1})$.]

A coordinator based on common information (cont.)
[Diagram: a coordinator $h_t$ observes $\mathbf{U}_{1:t-1}$ and sends $(\gamma^1_t, \gamma^2_t)$ to the two controllers, where $\gamma^i_t(\cdot) = g^i_t(\cdot, \mathbf{U}_{1:t-1})$.]

A coordinator based on common information (cont.)
Solution approach:
The coordinated system is a POMDP.
Identify the structure of optimal coordination strategies for the coordinated system.
Show that the coordinated system is equivalent to the original model.
Translate the structure of optimal coordination strategies to the original model.

The coordinated system
State: $\mathbf{X}_t = (X^1_t, \dots, X^n_t)$.
Observations: $\mathbf{U}_{t-1} = (U^1_{t-1}, \dots, U^n_{t-1})$.
Control actions: $\boldsymbol{\gamma}_t = (\gamma^1_t, \dots, \gamma^n_t)$.
Coordination rule: $h_t : \bigl(\prod_{i=1}^{n} \mathcal{U}^i\bigr)^{t-1} \to \prod_{i=1}^{n} (\mathcal{X}^i \to \mathcal{U}^i)$, with $\boldsymbol{\gamma}_t = h_t(\mathbf{U}_{1:t-1})$.
Structure of optimal coordination strategy: define $\Pi_t = \mathbb{P}(\text{state} \mid \text{history of observations}) = \mathbb{P}(\mathbf{X}_t \mid \mathbf{U}_{1:t-1})$. Then, wlo, $\boldsymbol{\gamma}_t = h_t(\Pi_t)$.
[Diagram: the coordinator $h_t$ observes $\mathbf{U}_{1:t-1}$ and sends $(\gamma^1_t, \gamma^2_t)$ to the controllers.]
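
In the coordinated system the coordinator does not pick control actions directly. It picks, for every subsystem, a map from local states to local actions. A minimal Python sketch of this action space follows, assuming finite state and action sets. The toy sizes and the helper name `prescriptions` are hypothetical and serve only to show how quickly the set of maps grows.

```python
from itertools import product

def prescriptions(states, actions):
    """All maps from local states to local actions, each returned as a dict."""
    return [dict(zip(states, choice))
            for choice in product(actions, repeat=len(states))]

# Hypothetical toy sizes: two subsystems with binary states and binary actions.
X = {1: [0, 1], 2: [0, 1]}
U = {1: [0, 1], 2: [0, 1]}

per_subsystem = {i: prescriptions(X[i], U[i]) for i in X}

# One coordinator action is a tuple with one map per subsystem.
joint = list(product(per_subsystem[1], per_subsystem[2]))

print(len(per_subsystem[1]))  # |U|^|X| = 2^2 = 4 maps for subsystem 1
print(len(joint))             # 4 * 4 = 16 coordinator actions
```

Each subsystem contributes $|\mathcal{U}^i|^{|\mathcal{X}^i|}$ maps, so the coordinator's action set is the product of these counts.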

The coordinated system (cont.)
Dynamic programming decomposition: $V_t(\pi) = \min_{\boldsymbol{\gamma}} \mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) + V_{t+1}(\Pi_{t+1}) \mid \Pi_t = \pi]$.
Salient features: the optimization at each step is a functional optimization problem. (In our opinion) functional optimization at each step is the only way to circumvent the issue of signaling.

Translation of results back to the original system
Structural result: wlo, $U^i_t = \gamma^i_t(X^i_t) = h^i_t(\Pi_t)(X^i_t) = g^i_t(X^i_t, \Pi_t)$.
Dynamic programming decomposition: solve the DP for the coordinated system and choose $g^i_t(x^i_t, \pi_t) = h^i_t(\pi_t)(x^i_t)$.
[Diagram: the coordinator $h_t$ observes $\mathbf{U}_{1:t-1}$ and sends $(\gamma^1_t, \gamma^2_t)$ to the controllers.]

Further simplification of structural result
Recall the main lemma: the state processes are conditionally independent given the past control actions:
$\mathbb{P}(\mathbf{X}_{1:t} = \mathbf{x}_{1:t} \mid \mathbf{U}_{1:t}) = \prod_{i=1}^{n} \mathbb{P}(X^i_{1:t} = x^i_{1:t} \mid \mathbf{U}_{1:t})$.
Implication: $\Pi_t(\mathbf{x}) = \mathbb{P}(\mathbf{X}_t = \mathbf{x} \mid \mathbf{U}_{1:t-1}) = \prod_{i=1}^{n} \Pi^i_t(x^i)$.
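
Because the belief on the joint state factorizes, each marginal can be tracked on its own. The Python sketch below shows one way such a per-subsystem update could look. It is not taken from the slides: it assumes finite state spaces, a known local transition kernel, and that the map from local state to local action used at the current step is known. The names `update_marginal` and `kernel` are hypothetical. The update first conditions the current marginal on the observed local action and then pushes it through the dynamics.

```python
from fractions import Fraction

def update_marginal(pi, gamma, u, i, kernel):
    """Map P(X^i_t = . | u_{1:t-1}) to P(X^i_{t+1} = . | u_{1:t}).

    pi     : dict, local state -> probability
    gamma  : dict, local state -> local action (the map used at time t)
    u      : tuple, the joint action observed at time t
    i      : position of this subsystem inside u
    kernel : function (x_next, x, u) -> P(X^i_{t+1} = x_next | X^i_t = x, U_t = u)
    """
    # Step 1: keep only the local states consistent with the observed local action.
    consistent = {x: p for x, p in pi.items() if gamma[x] == u[i]}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("observed action has probability zero under pi and gamma")
    # Step 2: push the conditioned belief through the local dynamics.
    return {
        x_next: sum(kernel(x_next, x, u) * p / total for x, p in consistent.items())
        for x_next in pi
    }

# Hypothetical example: a binary state that stays put with probability 3/4.
def kernel(x_next, x, u):
    return Fraction(3, 4) if x_next == x else Fraction(1, 4)

prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
revealing = update_marginal(prior, {0: 0, 1: 1}, u=(1, 0), i=0, kernel=kernel)
uninformative = update_marginal(prior, {0: 0, 1: 0}, u=(0, 0), i=0, kernel=kernel)
print(revealing)
print(uninformative)
```

In the first call the action reveals the local state, so the updated belief puts probability 3/4 on state 1. In the second call the action is the same for both states, so the belief stays uniform. The contrast shows how the choice of map controls what the shared action signals to the other controllers.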

Further simplification of structural result (cont.)
Simplified structural result: wlo, $U^i_t = g^i_t(X^i_t, \Pi_t) = g^i_t(X^i_t, \boldsymbol{\Pi}_t)$.
Significant reduction in size: $\Pi_t \in \Delta(\mathcal{X}^1 \times \cdots \times \mathcal{X}^n)$ while $\boldsymbol{\Pi}_t \in \Delta(\mathcal{X}^1) \times \cdots \times \Delta(\mathcal{X}^n)$.
Simplified dynamic programming decomposition: $V_t(\boldsymbol{\pi}) = \min_{\boldsymbol{\gamma}} \mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) + V_{t+1}(\boldsymbol{\Pi}_{t+1}) \mid \boldsymbol{\Pi}_t = \boldsymbol{\pi}]$.
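
The size reduction can be quantified by counting free parameters. A distribution on a finite set with $k$ elements has $k - 1$ free parameters, so a belief on the joint state space needs the product of the local sizes minus one, while a tuple of marginals needs the sum of the local sizes minus the number of subsystems. The short Python sketch below evaluates both counts. The local sizes are a hypothetical example, not numbers from the talk.

```python
from math import prod

def joint_belief_dim(sizes):
    """Free parameters of a distribution on the product of the local state spaces."""
    return prod(sizes) - 1

def product_belief_dim(sizes):
    """Free parameters of a tuple of distributions, one per local state space."""
    return sum(s - 1 for s in sizes)

sizes = [4, 4, 4]  # hypothetical: three subsystems with four local states each
print(joint_belief_dim(sizes))    # 4 * 4 * 4 - 1 = 63
print(product_belief_dim(sizes))  # 3 * (4 - 1) = 9
```

The first count grows exponentially in the number of subsystems and the second grows linearly.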

Recap of structural results
Original: $U^i_t = g^i_t(X^i_{1:t}, \mathbf{U}_{1:t-1})$.
Using person-by-person approach: $U^i_t = g^i_t(X^i_t, \mathbf{U}_{1:t-1})$.
Using the common information approach of (NMT 2008, 2011): $U^i_t = g^i_t(X^i_t, \Pi_t)$, where $\Pi_t = \mathbb{P}(\mathbf{X}_t \mid \mathbf{U}_{1:t-1})$.
Using specific conditional independence due to the dynamics: $U^i_t = g^i_t(X^i_t, \boldsymbol{\Pi}_t)$, where $\Pi^i_t = \mathbb{P}(X^i_t \mid \mathbf{U}_{1:t-1})$.