Learning Light Transport the Reinforced Way


  1. Learning Light Transport the Reinforced Way – Ken Dahm and Alexander Keller

  2. Light Transport Simulation – How to do importance sampling: compute functionals of a Fredholm integral equation of the 2nd kind
      $L(x,\omega) = L_e(x,\omega) + \int_{S^2_+(x)} L(h(x,\omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \,\mathrm{d}\omega_i$

  3. Light Transport Simulation – How to do importance sampling, example: direct illumination
      $L(x,\omega) = L_e(x,\omega) + \int_{S^2_+(x)} L_e(h(x,\omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \,\mathrm{d}\omega_i$

  4. Light Transport Simulation – How to do importance sampling, example: direct illumination
      $L(x,\omega) = L_e(x,\omega) + \int_{S^2_+(x)} L_e(h(x,\omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \,\mathrm{d}\omega_i$
      $\approx L_e(x,\omega) + \frac{1}{N} \sum_{i=0}^{N-1} \frac{L_e(h(x,\omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i}{p(\omega_i)}$

  5. Light Transport Simulation – How to do importance sampling, example: direct illumination
      $L(x,\omega) = L_e(x,\omega) + \int_{S^2_+(x)} L_e(h(x,\omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \,\mathrm{d}\omega_i$
      $\approx L_e(x,\omega) + \frac{1}{N} \sum_{i=0}^{N-1} \frac{L_e(h(x,\omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i}{p(\omega_i)}$
      with $p \sim f_r \cos\theta$

  6. Light Transport Simulation – How to do importance sampling, example: direct illumination
      $L(x,\omega) = L_e(x,\omega) + \int_{S^2_+(x)} L_e(h(x,\omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \,\mathrm{d}\omega_i$
      $\approx L_e(x,\omega) + \frac{1}{N} \sum_{i=0}^{N-1} \frac{L_e(h(x,\omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i}{p(\omega_i)}$
      with $p \sim f_r \cos\theta$ or $p \sim L_e\, f_r \cos\theta$
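
A minimal sketch of the estimator above (not the authors' code), assuming a diffuse BRDF $f_r = \text{albedo}/\pi$ and cosine-weighted sampling $p \sim f_r \cos\theta$; the scene callbacks Le (emitted radiance) and h (ray casting) are hypothetical placeholders:

```python
import math
import random

def sample_cosine_hemisphere():
    """Sample a direction with pdf p(omega) = cos(theta) / pi in the local frame (z = surface normal)."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    direction = (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1)))
    return direction, direction[2] / math.pi  # direction and its pdf value

def negate(v):
    return (-v[0], -v[1], -v[2])

def direct_illumination(x, omega, Le, h, albedo, N=16):
    """Estimate L(x, omega) ~= Le(x, omega)
       + 1/N * sum_i Le(h(x, omega_i), -omega_i) * f_r * cos(theta_i) / p(omega_i)
    with p ~ f_r * cos(theta) for a diffuse BRDF f_r = albedo / pi.
    Le(point, direction) and h(point, direction) are assumed scene callbacks."""
    radiance = Le(x, omega)
    f_r = albedo / math.pi
    for _ in range(N):
        omega_i, pdf = sample_cosine_hemisphere()
        y = h(x, omega_i)              # next hit point along omega_i
        cos_theta = omega_i[2]         # cos(theta_i) in the local frame
        radiance += Le(y, negate(omega_i)) * f_r * cos_theta / (N * pdf)
    return radiance
```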

  7. Machine Learning

  8. Machine Learning – Taxonomy
      – supervised learning: learning from labeled data; goal: extrapolate/generalize the response to unseen data; example: artificial neural networks

  9. Machine Learning – Taxonomy
      – supervised learning: learning from labeled data; goal: extrapolate/generalize the response to unseen data; example: artificial neural networks
      – unsupervised learning: learning from unlabeled data; goal: identify structure in data; example: autoencoder networks

  10. Machine Learning – Taxonomy
      – supervised learning: learning from labeled data; goal: extrapolate/generalize the response to unseen data; example: artificial neural networks
      – unsupervised learning: learning from unlabeled data; goal: identify structure in data; example: autoencoder networks
      – semi-supervised learning: reward-based supervision; goal: maximize reward; example: reinforcement learning

  11. The Reinforcement Learning Problem – Maximize reward
      – policy $\pi_t : S \to A(S_t)$
        – to select an action $A_t \in A(S_t)$
        – given the current state $S_t \in S$
      [Diagram: agent-environment loop – the agent in state $S_t$ selects action $A_t$; the environment returns the next state $S_{t+1}$ and the reward $R_{t+1}(A_t \mid S_t)$]

  12. The Reinforcement Learning Problem – Maximize reward
      – policy $\pi_t : S \to A(S_t)$
        – to select an action $A_t \in A(S_t)$
        – given the current state $S_t \in S$
      – a state transition yields the reward $R_{t+1}(A_t \mid S_t) \in \mathbb{R}$
      [Diagram: agent-environment loop – the agent in state $S_t$ selects action $A_t$; the environment returns the next state $S_{t+1}$ and the reward $R_{t+1}(A_t \mid S_t)$]

  13. The Reinforcement Learning Problem – Maximize reward
      – policy $\pi_t : S \to A(S_t)$
        – to select an action $A_t \in A(S_t)$
        – given the current state $S_t \in S$
      – a state transition yields the reward $R_{t+1}(A_t \mid S_t) \in \mathbb{R}$
      – classic goal: find a policy that maximizes the discounted cumulative reward
        $V^\pi(S_t) \equiv \sum_{k=0}^{\infty} \gamma^k R_{t+1+k}(A_{t+k} \mid S_{t+k})$, where $0 < \gamma < 1$
      [Diagram: agent-environment loop – the agent in state $S_t$ selects action $A_t$; the environment returns the next state $S_{t+1}$ and the reward $R_{t+1}(A_t \mid S_t)$]
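
As a small numeric illustration of the discounted cumulative reward, a sketch that sums a hypothetical finite reward sequence with discount factor gamma:

```python
def discounted_return(rewards, gamma=0.9):
    """V = sum over k of gamma^k * R_{t+1+k} for a finite reward sequence, 0 < gamma < 1."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# a constant reward of 1 per step approaches 1 / (1 - gamma) = 10 for gamma = 0.9
print(discounted_return([1.0] * 100))  # ~9.9997
```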

  14. The Reinforcement Learning Problem – Q-Learning [Watkins 1989]
      – finds an optimal action-selection policy for any given Markov decision process
        $Q'(s,a) = (1-\alpha) \cdot Q(s,a) + \alpha \cdot \big( r(s,a) + \gamma\, V(s') \big)$ for a learning rate $\alpha \in [0,1]$

  15. The Reinforcement Learning Problem – Q-Learning [Watkins 1989]
      – finds an optimal action-selection policy for any given Markov decision process
        $Q'(s,a) = (1-\alpha) \cdot Q(s,a) + \alpha \cdot \big( r(s,a) + \gamma\, V(s') \big)$ for a learning rate $\alpha \in [0,1]$
      – with the following options for the discounted cumulative reward $V(s')$:
        – $V(s') \equiv \max_{a' \in A} Q(s',a')$ (consider the best action)

  16. The Reinforcement Learning Problem – Q-Learning [Watkins 1989]
      – finds an optimal action-selection policy for any given Markov decision process
        $Q'(s,a) = (1-\alpha) \cdot Q(s,a) + \alpha \cdot \big( r(s,a) + \gamma\, V(s') \big)$ for a learning rate $\alpha \in [0,1]$
      – with the following options for the discounted cumulative reward $V(s')$:
        – $V(s') \equiv \max_{a' \in A} Q(s',a')$ (consider the best action)
        – $V(s') \equiv \sum_{a' \in A} \pi(s',a')\, Q(s',a')$ (policy-weighted average over a discrete action space)

  17. The Reinforcement Learning Problem – Q-Learning [Watkins 1989]
      – finds an optimal action-selection policy for any given Markov decision process
        $Q'(s,a) = (1-\alpha) \cdot Q(s,a) + \alpha \cdot \big( r(s,a) + \gamma\, V(s') \big)$ for a learning rate $\alpha \in [0,1]$
      – with the following options for the discounted cumulative reward $V(s')$:
        – $V(s') \equiv \max_{a' \in A} Q(s',a')$ (consider the best action)
        – $V(s') \equiv \sum_{a' \in A} \pi(s',a')\, Q(s',a')$ (policy-weighted average over a discrete action space)
        – $V(s') \equiv \int_A \pi(s',a')\, Q(s',a') \,\mathrm{d}a'$ (policy-weighted average over a continuous action space)
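
A minimal tabular sketch of this Q-learning update, showing two of the listed choices for $V(s')$: the best-action maximum and the policy-weighted average over a discrete action space. The state/action indexing and the policy table are assumptions for illustration:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, policy=None):
    """One step of Q'(s, a) = (1 - alpha) * Q[s, a] + alpha * (r + gamma * V(s')).

    If a policy (per-state action probabilities) is given, V(s') is the
    policy-weighted average over the discrete action space; otherwise V(s')
    is the value of the best action (classic Q-learning)."""
    if policy is None:
        v_next = np.max(Q[s_next])                   # consider the best action
    else:
        v_next = np.dot(policy[s_next], Q[s_next])   # policy-weighted average
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * (r + gamma * v_next)
    return Q

# usage with 5 states and 3 actions
Q = np.zeros((5, 3))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```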

  18. Light Transport Simulation and Reinforcement Learning

  19. Light Transport Simulation and Reinforcement Learning – Structural equivalence of the integral equations: matching terms
      $L(x,\omega) = L_e(x,\omega) + \int_{S^2_+(x)} f_r(\omega_i, x, \omega) \cos\theta_i\, L(h(x,\omega_i), -\omega_i) \,\mathrm{d}\omega_i$
      $Q'(s,a) = (1-\alpha)\, Q(s,a) + \alpha \big( r(s,a) + \gamma \int_A \pi(s',a')\, Q(s',a') \,\mathrm{d}a' \big)$

  24. Light Transport Simulation and Reinforcement Learning – Structural equivalence of the integral equations: matching terms
      $L(x,\omega) = L_e(x,\omega) + \int_{S^2_+(x)} f_r(\omega_i, x, \omega) \cos\theta_i\, L(h(x,\omega_i), -\omega_i) \,\mathrm{d}\omega_i$
      $Q'(s,a) = (1-\alpha)\, Q(s,a) + \alpha \big( r(s,a) + \gamma \int_A \pi(s',a')\, Q(s',a') \,\mathrm{d}a' \big)$
      – this hints at learning the incident radiance
        $Q'(x,\omega) = (1-\alpha)\, Q(x,\omega) + \alpha \big( L_e(y,-\omega) + \int_{S^2_+(y)} f_r(\omega_i, y, -\omega) \cos\theta_i\, Q(y,\omega_i) \,\mathrm{d}\omega_i \big)$
        as a policy for selecting an action $\omega$ in state $x$ to reach the next state $y := h(x,\omega)$
      – the learning rate $\alpha$ is the only parameter left
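
A sketch of how this update could look once Q is discretized as on the following slides: a table over (state cell, hemispherical patch), with the integral over $S^2_+(y)$ approximated by a sum over the patches at the next hit point $y$. The lookup functions, the diffuse BRDF, and the parameter names are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def update_Q(Q, state_of, patch_of, x, omega, y, Le_y, albedo_y,
             cos_thetas, patch_solid_angle, alpha=0.15):
    """One learning step of
       Q'(x, omega) = (1 - alpha) * Q(x, omega)
                    + alpha * (Le(y, -omega) + sum_j f_r * cos(theta_j) * Q(y, j) * dOmega),
    where y = h(x, omega) is the next hit point along omega.

    Q                 : table of shape (num_state_cells, num_patches)
    state_of(point)   : maps a 3D point to its state cell index (e.g. nearest-neighbor lookup)
    patch_of(dir)     : maps a direction to its hemispherical patch index
    Le_y, albedo_y    : emitted radiance towards -omega and diffuse albedo at y
    cos_thetas        : array of cos(theta_j) at the patch centers of the hemisphere above y
    patch_solid_angle : solid angle of one patch (all patches are equally sized)
    """
    s, a = state_of(x), patch_of(omega)
    s_next = state_of(y)
    f_r = albedo_y / np.pi                  # diffuse BRDF at y (assumption)
    # discretized integral over the hemisphere above y
    v_next = np.sum(f_r * cos_thetas * Q[s_next] * patch_solid_angle)
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * (Le_y + v_next)
    return Q
```

The learned row Q(x, ·) can then serve as an unnormalized density over the direction patches for importance sampling the next scattering direction.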

  25. Light Transport Simulation and Reinforcement Learning – Discretization of Q in analogy to irradiance representations
      – action space: $a \in S^2_+(y)$, partitioned into equally sized patches by the mapping
        $\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \sqrt{1-u^2}\cos(2\pi v) \\ \sqrt{1-u^2}\sin(2\pi v) \\ u \end{pmatrix}$

  26. Light Transport Simulation and Reinforcement Learning – Discretization of Q in analogy to irradiance representations
      – action space: $a \in S^2_+(y)$, partitioned into equally sized patches by the mapping
        $\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \sqrt{1-u^2}\cos(2\pi v) \\ \sqrt{1-u^2}\sin(2\pi v) \\ u \end{pmatrix}$
      – state space: $s \in \partial V$ – the Voronoi diagram of a low-discrepancy point set
        $x_i = \begin{pmatrix} \Phi_2(i) \\ i/N \end{pmatrix}$ for $i = 0, \ldots, N-1$
        – nearest neighbor search

  27. Light Transport Simulation and Reinforcement Learning – Discretization of Q in analogy to irradiance representations
      – action space: $a \in S^2_+(y)$, partitioned into equally sized patches by the mapping
        $\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \sqrt{1-u^2}\cos(2\pi v) \\ \sqrt{1-u^2}\sin(2\pi v) \\ u \end{pmatrix}$
      – state space: $s \in \partial V$ – the Voronoi diagram of a low-discrepancy point set
        $x_i = \begin{pmatrix} \Phi_2(i) \\ i/N \end{pmatrix}$ for $i = 0, \ldots, N-1$
        – nearest neighbor search including the surface normal
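
A sketch of the two lookups this discretization needs, assuming a U x V grid of equally sized cells in (u, v) for the hemispherical patches and a brute-force nearest-neighbor search over the state points; $\Phi_2$ is the radical inverse in base 2 used by the low-discrepancy construction above:

```python
import math

def radical_inverse_base2(i):
    """Phi_2(i): van der Corput radical inverse in base 2."""
    result, f = 0.0, 0.5
    while i > 0:
        result += f * (i & 1)
        i >>= 1
        f *= 0.5
    return result

def patch_index(direction, U=8, V=16):
    """Map a unit direction in the local frame (z = surface normal) to one of
    U * V equally sized (u, v) cells of the parameterization
    (sqrt(1 - u^2) cos(2 pi v), sqrt(1 - u^2) sin(2 pi v), u)."""
    x, y, z = direction
    u = min(max(z, 0.0), 1.0)                        # u = cos(theta)
    v = (math.atan2(y, x) / (2.0 * math.pi)) % 1.0   # v in [0, 1)
    return min(int(u * U), U - 1) * V + min(int(v * V), V - 1)

def nearest_state(point, state_points):
    """Brute-force nearest-neighbor lookup: the index of the state point whose
    Voronoi cell contains the query point."""
    return min(range(len(state_points)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(point, state_points[i])))
```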
