A graphical model for sequential teams
Aditya Mahajan and Sekhar Tatikonda
Dept of Electrical Engineering, Yale University
Presented at: ConCom Workshop, June 27, 2009
A glimpse of the result
Structural results in sequential teams
◦ Example: MDP (Markov decision process)
  ⊲ Controlled MC: $\Pr(x_t \mid x_1, \dots, x_{t-1}, u_1, \dots, u_{t-1}) = \Pr(x_t \mid x_{t-1}, u_{t-1})$
  ⊲ Controller: $u_t = g_t(x_1, \dots, x_t, u_1, \dots, u_{t-1})$
  ⊲ Reward: $r_t = \rho_t(x_t, u_t)$
  ⊲ Objective: Maximize $E\left[\sum_{t=1}^{T} R_t\right]$
◦ Structural results
  ⊲ Without loss of optimality, $u_t = g_t(x_t)$
Graphically . . . original
[Figure: directed graph of the three-stage MDP with variable nodes $x_t$, $u_t$, $r_t$ and factor nodes $f_{t-1}$, $\rho_t$, $g_t$, for $t = 1, 2, 3$]
Graphically . . . structural results
[Figure: the same nodes, drawn for the structural result]
Structural results in sequential teams
◦ Example: real-time source coding
  Hans S. Witsenhausen, "On the structure of real-time source coders," Bell System Technical Journal, vol. 58, no. 6, pp. 1437-1451, July-August 1979
  ⊲ Source: First-order Markov source $\{x_t, t = 1, \dots\}$
  ⊲ Real-time source coder: $y_t = c_t(x_1, \dots, x_t, y_1, \dots, y_{t-1})$
  ⊲ Finite memory decoder: $\hat{x}_t = g_t(y_t, m_{t-1})$, $m_t = l_t(y_t, m_{t-1})$
  ⊲ Cost: $d_t = \rho_t(x_t, \hat{x}_t)$
◦ Structural results
  ⊲ Without loss of optimality, $y_t = c_t(x_t, m_{t-1})$
Graphically . . . original
[Figure: directed graph of the three-stage source coding problem with variable nodes $x_t$, $y_t$, $m_t$, $\hat{x}_t$, $d_t$ and factor nodes $f_t$, $c_t$, $g_t$, $l_t$, $\rho_t$]
Graphically . . . structural results
[Figure: the same nodes, drawn for the structural result]
The main idea
◦ Represent a sequential team as a directed graph
◦ Simplify the graph
Sequential teams – Salient features
◦ A team is sequential if and only if there exists a partial order between the system variables.
◦ There is no loss of optimality in restricting attention to non-randomizing decision makers.
◦ Data available at a DM can be ignored if it is independent of the future rewards conditioned on other data at the DM.
◦ Variables functionally determined from the data available at a DM can be assumed to be observed at the DM.
Graphical models – Salient features
◦ Any partial order gives rise to a DAG (Directed Acyclic Graph).
◦ A DAFG (Directed Acyclic Factor Graph) can be used to efficiently check for conditional independence using d-separation.
◦ A DAFG can be used to efficiently check for conditional independence with deterministic nodes using D-separation.
Match between features of sequential teams and graphical models
The rest is a matter of details . . .
The model
◦ Components of a sequential team
  ⊲ A set $N$ of indices of system variables $\{X_n, n \in N\}$. Finite sets $\{\mathcal{X}_n, n \in N\}$ of state spaces of $X_n$
    − $A \subset N$, variables generated by DM
    − $N \setminus A$, variables generated by nature
    − $R \subset N$, reward variables
  ⊲ Information sets $\{I_n, n \in N\}$, such that $I_n \subseteq \{1, \dots, n\}$. $\mathcal{I}_n = \prod_{i \in I_n} \mathcal{X}_i$
  ⊲ $F_{N \setminus A} = \{f_n, n \in N \setminus A\}$, where $f_n$ is a conditional PMF of $X_n$ given $I_n$
  ⊲ Design: $G_A = \{g_n, n \in A\}$, where $g_n$ is a decision rule from $\mathcal{I}_n$ to $\mathcal{X}_n$
The model
◦ Probability measure induced by a design
  $P^{G_A}(X_N) = \prod_{n \in N \setminus A} f_n(X_n \mid I_n) \prod_{n \in A} \mathbb{I}[X_n = g_n(I_n)]$
◦ Optimization problem
  Minimize $E\left[\sum_{n \in R} X_n\right]$, where the expectation is with respect to $P^{G_A}$.
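To make the induced measure concrete, here is a minimal Python sketch that evaluates the product formula by brute-force enumeration. The three-variable team and every name in it (`spaces`, `info`, `nature`, `joint`, `expected_total`) are assumptions made for illustration; they are not taken from the talk.

```python
from itertools import product

# Hypothetical toy team (an assumption for illustration):
# X1 is drawn by nature, X2 is chosen by a DM that observes X1,
# X3 is a cost generated by nature that equals 1 when X2 != X1.
spaces = {1: [0, 1], 2: [0, 1], 3: [0, 1]}   # state spaces
info = {1: [], 2: [1], 3: [1, 2]}            # information sets I_n
agents = {2}                                 # A
rewards = {3}                                # R

def f1(x, obs):
    return 0.5                               # X1 uniform on {0, 1}

def f3(x, obs):
    x1, x2 = obs
    return 1.0 if x == int(x1 != x2) else 0.0

nature = {1: f1, 3: f3}                      # F_{N \ A}

def joint(design, assignment):
    """Product of conditional PMFs and decision-rule indicators."""
    p = 1.0
    for n in sorted(spaces):
        obs = tuple(assignment[i] for i in info[n])
        if n in agents:
            p *= 1.0 if assignment[n] == design[n](obs) else 0.0
        else:
            p *= nature[n](assignment[n], obs)
    return p

def expected_total(design):
    """E[sum of reward variables] under the measure induced by the design."""
    order = sorted(spaces)
    total = 0.0
    for values in product(*(spaces[n] for n in order)):
        a = dict(zip(order, values))
        total += joint(design, a) * sum(a[n] for n in rewards)
    return total

copy_rule = {2: lambda obs: obs[0]}          # u = x1
constant_rule = {2: lambda obs: 0}           # u = 0 regardless of x1
print(expected_total(copy_rule))             # 0.0
print(expected_total(constant_rule))         # 0.5
```

Enumeration is exponential in the number of variables, so this is only useful for checking small examples by hand.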
Representation as a graphical model
◦ Directed Acyclic Factor Graph
◦ Nodes
  ⊲ Variable node $n$ ≡ system variable $X_n$
  ⊲ Factor node $\tilde{n}$ ≡ conditional PMF $f_n$ or decision rule $g_n$
◦ Edges
  ⊲ $(i, \tilde{n})$, for each $n \in N$ and $i \in I_n$
  ⊲ $(\tilde{n}, n)$, for each $n \in N$
◦ Acyclic graph
  ⊲ Sequential team ⇒ partial order on variable nodes ⇒ acyclic graph
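The construction above can be sketched in a few lines of Python. The tuple encoding of nodes (`("var", n)` and `("factor", n)`), the function names, and the two-stage example are assumptions of this sketch; the acyclicity check uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter, CycleError

def build_dafg(info_sets):
    """Edge list of the factor graph: (i, ~n) for i in I_n, and (~n, n)."""
    edges = []
    for n, observed in info_sets.items():
        factor = ("factor", n)
        edges.extend((("var", i), factor) for i in observed)  # (i, ~n)
        edges.append((factor, ("var", n)))                    # (~n, n)
    return edges

def is_acyclic(edges):
    """True if the directed graph given by the edge list has no cycle."""
    preds = {}
    for tail, head in edges:
        preds.setdefault(head, set()).add(tail)
        preds.setdefault(tail, set())
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False

# Hypothetical two-stage example with full-history controllers.
info_sets = {
    "x1": [],
    "u1": ["x1"],
    "x2": ["x1", "u1"],
    "u2": ["x1", "u1", "x2"],
}
edges = build_dafg(info_sets)
print(len(edges), is_acyclic(edges))
```

A pair of information sets that refer to each other, such as `{"a": ["b"], "b": ["a"]}`, produces a cycle, which is how a non-sequential specification would show up in this encoding.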
Graphical models – Terminology
◦ parents($n$)
  ⊲ $\{m : m \to n\}$
  ⊲ Parents of a control (factor) node = data observed by controller
◦ children($n$)
  ⊲ $\{m : n \to m\}$
  ⊲ Children of a control node = control action
◦ ancestors($n$)
  ⊲ $\{m : \exists \text{ directed path from } m \text{ to } n\}$
  ⊲ Ancestors of a control node = all nodes that affect the data observed
◦ descendants($n$)
  ⊲ $\{m : \exists \text{ directed path from } n \text{ to } m\}$
  ⊲ Descendants of a control node = all nodes affected by the control action
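These four relations are simple graph searches. The sketch below computes them on an edge list for the three-stage MDP used in the talk; the string node names (`"rho1"`, `"g2"`, and so on) and the helper `mdp_edges` are assumptions chosen for this illustration.

```python
def parents(edges, n):
    return {a for a, b in edges if b == n}

def children(edges, n):
    return {b for a, b in edges if a == n}

def _reach(start, step):
    """All nodes reachable from start by repeatedly applying step."""
    seen, stack = set(), [start]
    while stack:
        for m in step(stack.pop()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def ancestors(edges, n):
    return _reach(n, lambda v: parents(edges, v))

def descendants(edges, n):
    return _reach(n, lambda v: children(edges, v))

def mdp_edges(T):
    """Edges of a T-stage MDP graph where controller t sees the full history."""
    edges = [("f0", "x1")]
    for t in range(1, T + 1):
        history = [f"{v}{s}" for s in range(1, t) for v in ("x", "u")] + [f"x{t}"]
        edges += [(h, f"g{t}") for h in history] + [(f"g{t}", f"u{t}")]
        edges += [(f"x{t}", f"rho{t}"), (f"u{t}", f"rho{t}"), (f"rho{t}", f"r{t}")]
        if t < T:
            edges += [(f"x{t}", f"f{t}"), (f"u{t}", f"f{t}"), (f"f{t}", f"x{t + 1}")]
    return edges

edges = mdp_edges(3)
print(sorted(parents(edges, "g2")))      # data observed by controller 2
print(sorted(children(edges, "g2")))     # its control action
print(sorted(ancestors(edges, "g2")))
print(sorted(descendants(edges, "g2")))
```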
Graphical Models – Example
[Figure: DAFG of the three-stage MDP with variable nodes $x_t$, $u_t$, $r_t$ and factor nodes $f_{t-1}$, $\rho_t$, $g_t$, for $t = 1, 2, 3$]
Graphical Models – Variable nodes
[Figure: the same DAFG with the variable nodes highlighted. Legend: Reward nodes, Non-reward nodes]
Graphical Models – Factor nodes
[Figure: the same DAFG with the factor nodes highlighted. Legend: Control factors, Stochastic factors]
Graphical Models – Parents and Children
[Figure: the same DAFG with one control factor node marked. Legend: Parents, Children, Control factor node]
Graphical Models – Ancestors and descendants
[Figure: the same DAFG with one control factor node marked. Legend: Ancestors, Descendants, Control factor node]
Structural results
If some data available at a DM is independent of future rewards given the control action and other data at the DM, then that data can be ignored.
Can we automate this process?
◦ The main idea
  ⊲ Structural result ≡ conditional independence
  ⊲ Graphical models can easily test conditional independence
Conditional independence
◦ Three canonical graphs to verify $x \perp\!\!\!\perp z \mid y$
  [Figure: three graphs on $x$, $y$, $z$, labeled Markov chain, Hidden cause, and Explanation]
◦ Blocking of a trail
  A trail from $a$ to $b$ is blocked by $C$ if ∃ a node $v$ on the trail such that either:
  ⊲ $\to v \to$, $\leftarrow v \leftarrow$, or $\leftarrow v \to$, and $v \in C$
  ⊲ $\to v \leftarrow$ and neither $v$ nor any of $v$'s descendants are in $C$.
Conditional independence
◦ d-separation
  $A$ is d-separated from $B$ by $C$ if all trails from $A$ to $B$ are blocked by $C$
◦ Conditional independence
  For any probability measure $P$ that factorizes according to a DAFG, $A$ d-separated from $B$ by $C$ implies $X_A$ is conditionally independent of $X_B$ given $X_C$, $P$-a.s.
◦ Efficient algorithms to verify d-separation
  ⊲ Moral graph
  ⊲ Bayes Ball
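As an illustration of the moral-graph approach, the sketch below tests d-separation in three steps: restrict to the ancestral set of $A \cup B \cup C$, moralize (link co-parents and drop edge directions), then check whether $A$ can reach $B$ without passing through $C$. The function name `d_separated` and the three test graphs are assumptions of this sketch; the graphs are plain three-node versions of the chain, common-parent, and common-child shapes, without the factor nodes drawn on the slide.

```python
from itertools import combinations

def d_separated(edges, A, B, C):
    """True if every trail from A to B is blocked by C (moral-graph test)."""
    A, B, C = set(A), set(B), set(C)
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
        parents.setdefault(tail, set())

    # Step 1: ancestral set of A, B and C.
    keep, stack = set(), list(A | B | C)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))

    # Step 2: moralize, i.e. connect each node to its parents and
    # connect every pair of parents that share a child.
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = parents.get(v, set())
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)

    # Step 3: undirected search from A that never enters C.
    seen, stack = set(), list(A - C)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(nbrs[v] - C - seen)
    return not (seen & B)

chain = [("x", "y"), ("y", "z")]        # x -> y -> z
fork = [("y", "x"), ("y", "z")]         # x <- y -> z
collider = [("x", "y"), ("z", "y")]     # x -> y <- z

print(d_separated(chain, {"x"}, {"z"}, {"y"}))      # True
print(d_separated(fork, {"x"}, {"z"}, {"y"}))       # True
print(d_separated(collider, {"x"}, {"z"}, {"y"}))   # False
print(d_separated(collider, {"x"}, {"z"}, set()))   # True
```

The common-child graph behaves in the opposite way to the other two: conditioning on $y$ opens the trail instead of blocking it, which is the second case in the definition of blocking.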
Automated Structural results
◦ First attempt
  ⊲ Dependent rewards: $R_d(\tilde{n}) = R \cap \text{descendants}(\tilde{n})$
  ⊲ Irrelevant data: At a control node $\tilde{n}$, a parent $i$ is irrelevant if $R_d(\tilde{n})$ is d-separated from $i$ given $\text{parents}(\tilde{n}) \cup \text{children}(\tilde{n}) \setminus \{i\}$
  ⊲ Requisite data: All parents that are not irrelevant
◦ Structural result
  ⊲ Without loss of optimality, we can remove irrelevant data: $u_n = g_n(\text{requisite}(\tilde{n}))$
Structural Results for MDP – Step 1
[Figure: DAFG of the three-stage MDP with variable nodes $x_t$, $u_t$, $r_t$ and factor nodes $f_{t-1}$, $\rho_t$, $g_t$, for $t = 1, 2, 3$]
◦ Pick node $g_3$.
  ⊲ Original: $u_3 = g_3(x_1, x_2, x_3, u_1, u_2)$
  ⊲ requisite($g_3$) = $\{x_3\}$
  ⊲ Thus, $u_3 = g_3(x_3)$
Structural Results for MDP – Step 2
[Figure: the DAFG after Step 1]
◦ Pick node $g_2$.
  ⊲ Original: $u_2 = g_2(x_1, x_2, u_1)$
  ⊲ requisite($g_2$) = $\{x_2\}$
  ⊲ Thus, $u_2 = g_2(x_2)$
Structural Results for MDP – Simplified
[Figure: the simplified DAFG]
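The two steps for the MDP can be reproduced mechanically. The following is a minimal Python sketch of the first-attempt rule under some stated assumptions: d-separation is tested with the moral-graph method, control nodes are visited in the order $g_3$, $g_2$, $g_1$, and all irrelevant parents of a node are dropped together before moving on. Node names and helper functions are hypothetical.

```python
from itertools import combinations

def parents(edges, n):
    return {a for a, b in edges if b == n}

def children(edges, n):
    return {b for a, b in edges if a == n}

def descendants(edges, n):
    seen, stack = set(), [n]
    while stack:
        for c in children(edges, stack.pop()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def d_separated(edges, A, B, C):
    """Moral-graph test: True if every trail from A to B is blocked by C."""
    keep, stack = set(), list(A | B | C)
    while stack:                          # ancestral set of A, B and C
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents(edges, v))
    nbrs = {v: set() for v in keep}
    for v in keep:                        # moralize
        ps = parents(edges, v)
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = set(), list(A - C)
    while stack:                          # search that never enters C
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(nbrs[v] - C - seen)
    return not (seen & B)

def requisite(edges, control, rewards):
    """Parents of a control node that are not irrelevant."""
    r_d = rewards & descendants(edges, control)       # dependent rewards
    pa, ch = parents(edges, control), children(edges, control)
    return {i for i in pa
            if not d_separated(edges, r_d, {i}, (pa | ch) - {i})}

def mdp_edges(T):
    """T-stage MDP graph where controller t sees the full history."""
    edges = [("f0", "x1")]
    for t in range(1, T + 1):
        history = [f"{v}{s}" for s in range(1, t) for v in ("x", "u")] + [f"x{t}"]
        edges += [(h, f"g{t}") for h in history] + [(f"g{t}", f"u{t}")]
        edges += [(f"x{t}", f"rho{t}"), (f"u{t}", f"rho{t}"), (f"rho{t}", f"r{t}")]
        if t < T:
            edges += [(f"x{t}", f"f{t}"), (f"u{t}", f"f{t}"), (f"f{t}", f"x{t + 1}")]
    return edges

edges = mdp_edges(3)
rewards = {"r1", "r2", "r3"}
result = {}
for t in (3, 2, 1):                       # last controller first
    g = f"g{t}"
    result[g] = requisite(edges, g, rewards)
    # Drop the edges from irrelevant parents into this control node.
    edges = [(a, b) for a, b in edges if b != g or a in result[g]]
print(result)
```

The visiting order matters in this sketch: on the original graph, $x_1$ is still connected to $r_3$ through $g_3$, so testing $g_2$ before simplifying $g_3$ would not remove $x_1$ from the data at $g_2$.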
Does not work for all problems . . . even when structural simplification is possible
$u_n = g_n(\text{requisite}(\tilde{n}))$
A real-time source coding problem
Hans S. Witsenhausen, "On the structure of real-time source coders," Bell System Technical Journal, vol. 58, no. 6, pp. 1437-1451, July-August 1979
◦ Mathematical Model
  ⊲ Source: First-order Markov source $\{x_t, t = 1, \dots\}$
  ⊲ Real-time source coder: $y_t = c_t(x(1{:}t), y(1{:}t-1))$
  ⊲ Finite memory decoder: $\hat{x}_t = g_t(y_t, m_{t-1})$, $m_t = l_t(y_t, m_{t-1})$
  ⊲ Cost: $d_t = \rho_t(x_t, \hat{x}_t)$
Model for real-time comm – Does not simplify
[Figure: DAFG of the three-stage source coding problem with variable nodes $x_t$, $y_t$, $m_t$, $\hat{x}_t$, $d_t$ and factor nodes $f_t$, $c_t$, $g_t$, $l_t$, $\rho_t$]
Need to take care of deterministic variables!
Functionally determined nodes
◦ Functionally determined
  ⊲ $X_B$ is functionally determined by $X_A$ if $X_B \perp\!\!\!\perp X_N \mid X_A$
◦ Conditional independence with functionally determined nodes
  ⊲ Can be checked using D-separation
  ⊲ Similar to d-separation: in the definition of blocking, replace "in $C$" with "is functionally determined by $C$"
◦ Blocking of a trail (version that takes care of deterministic nodes)
  A trail from $a$ to $b$ is blocked by $C$ if ∃ a node $v$ on the trail such that either:
  ⊲ $\to v \to$, $\leftarrow v \leftarrow$, or $\leftarrow v \to$, and $v$ is functionally determined by $C$
  ⊲ $\to v \leftarrow$ and neither $v$ nor any of $v$'s descendants are in $C$.
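One way to obtain a set of functionally determined variables from the graph is a fixed-point propagation through deterministic factors: a variable is added once every input of its deterministic factor is already in the set. This is a sufficient graphical check and an assumption of this sketch, not the definition given above. The function name, the dictionary encoding, and the two-stage source coding fragment (with no initial memory at $t = 1$) are likewise hypothetical.

```python
def functionally_determined(deterministic, C):
    """Variables computable from C through deterministic factors.

    `deterministic` maps a variable to the tuple of variables read by its
    deterministic factor (encoder, decoder, or memory update).
    """
    determined = set(C)
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for v, inputs in deterministic.items():
            if v not in determined and all(i in determined for i in inputs):
                determined.add(v)
                changed = True
    return determined

# Hypothetical two-stage fragment of the real-time source coding model.
deterministic = {
    "y1": ("x1",),                        # y1 = c1(x1)
    "m1": ("y1",),                        # m1 = l1(y1)
    "xhat1": ("y1",),                     # xhat1 = g1(y1)
    "y2": ("x1", "x2", "y1"),             # y2 = c2(x1, x2, y1)
    "m2": ("y2", "m1"),                   # m2 = l2(y2, m1)
    "xhat2": ("y2", "m1"),                # xhat2 = g2(y2, m1)
}

print(sorted(functionally_determined(deterministic, {"y1"})))
print(sorted(functionally_determined(deterministic, {"x1", "x2", "y1"})))
```

In this fragment the encoder at stage 2 observes $x_1$, $x_2$, $y_1$, so the decoder memory $m_1$ is functionally determined by its data even though it is not a parent of $c_2$ in the original graph.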
Automated Structural results
◦ Second attempt
  ⊲ Irrelevant data: Replace d-separation with D-separation
  ⊲ Requisite data: All parents that are not irrelevant
◦ Structural result
  ⊲ Without loss of optimality, we can remove irrelevant data and add appropriate functionally determined data:
    $u_n = g_n(\text{requisite}(\tilde{n}),\ \text{functionally\_detm}(\tilde{n}) \cap \text{ancestors}(R_d(\tilde{n})))$