Learning Light Transport the Reinforced Way
Ken Dahm and Alexander Keller
Light Transport Simulation: How to do importance sampling

• compute functionals of a Fredholm integral equation of the second kind

  L(x, ω) = L_e(x, ω) + ∫_{S²₊(x)} L(h(x, ω_i), −ω_i) f_r(ω_i, x, ω) cos θ_i dω_i

• example: direct illumination

  L(x, ω) = L_e(x, ω) + ∫_{S²₊(x)} L_e(h(x, ω_i), −ω_i) f_r(ω_i, x, ω) cos θ_i dω_i

          ≈ L_e(x, ω) + (1/N) ∑_{i=0}^{N−1} L_e(h(x, ω_i), −ω_i) f_r(ω_i, x, ω) cos θ_i / p(ω_i)

• importance sampling densities: p ∝ f_r cos θ (sampling the BRDF) or, better, p ∝ L_e f_r cos θ (also accounting for the emitters)
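To make the estimator concrete, here is a minimal Python sketch, assuming a purely diffuse BRDF f_r = albedo/π so that the density p ∝ f_r cos θ reduces to cosine-weighted hemisphere sampling; the `trace_emitted` callback is a hypothetical stand-in for evaluating L_e(h(x, ω_i), −ω_i) against a scene.

```python
import math
import random

# Sketch of the one-sample-per-direction estimator above, assuming a diffuse
# BRDF f_r = albedo / pi, for which p ~ f_r * cos(theta) is cosine-weighted
# hemisphere sampling.  trace_emitted is a hypothetical stand-in for
# L_e(h(x, w_i), -w_i), i.e. the emission seen along w_i from x.

def sample_cosine_hemisphere(u1, u2):
    """Map two uniform numbers to a direction with pdf p(w) = cos(theta) / pi."""
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    x, y = r * math.cos(phi), r * math.sin(phi)
    z = math.sqrt(max(0.0, 1.0 - u1))       # cos(theta) in the local frame
    return (x, y, z), z / math.pi            # direction, pdf

def direct_illumination(trace_emitted, albedo, n_samples=64):
    """Estimate (1/N) sum_i L_e * f_r * cos(theta_i) / p(w_i)."""
    f_r = albedo / math.pi
    estimate = 0.0
    for _ in range(n_samples):
        w_i, pdf = sample_cosine_hemisphere(random.random(), random.random())
        cos_theta = w_i[2]
        estimate += trace_emitted(w_i) * f_r * cos_theta / pdf
    return estimate / n_samples

# Toy usage: a constant "sky" of radiance 1 over the upper hemisphere integrates
# to the albedo (white furnace test), which the estimator reproduces exactly.
print(direct_illumination(lambda w_i: 1.0, albedo=0.8))
```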
Machine Learning
Machine Learning: Taxonomy

• supervised learning: learning from labeled data
  – goal: extrapolate/generalize the response to unseen data
  – example: artificial neural networks

• unsupervised learning: learning from unlabeled data
  – goal: identify structure in the data
  – example: autoencoder networks

• semi-supervised learning: reward-based supervision
  – goal: maximize the reward
  – example: reinforcement learning
The Reinforcement Learning Problem: Maximize the reward

• a policy π_t : S → A(S_t) selects an action A_t ∈ A(S_t) given the current state S_t ∈ S
• the state transition yields a reward R_{t+1}(A_t | S_t) ∈ ℝ
• classic goal: find a policy that maximizes the discounted cumulative reward

  V_π(S_t) ≡ ∑_{k=0}^{∞} γ^k R_{t+1+k}(A_{t+k} | S_{t+k}),  where 0 < γ < 1

[Figure: agent–environment loop — the agent in state S_t takes action A_t; the environment returns the reward R_{t+1}(A_t | S_t) and the next state S_{t+1}]
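As a small aside, the discounted cumulative reward can be evaluated for a finite episode by truncating the infinite sum; the reward trajectory in the sketch below is a made-up example, not part of the talk.

```python
# Sketch of the discounted cumulative reward V_pi(S_t) above, truncating the
# infinite sum at the end of a finite episode.  rewards[k] stands for
# R_{t+1+k}(A_{t+k} | S_{t+k}) along one trajectory under the policy pi.

def discounted_return(rewards, gamma=0.9):
    assert 0.0 < gamma < 1.0
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Example: a constant reward of 1 approaches 1 / (1 - gamma) = 10 for gamma = 0.9.
print(discounted_return([1.0] * 100, gamma=0.9))
```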
The Reinforcement Learning Problem: Q-Learning [Watkins 1989]

• finds an optimal action-selection policy for any given Markov decision process

  Q′(s, a) = (1 − α) · Q(s, a) + α · ( r(s, a) + γ V(s′) )   for a learning rate α ∈ [0, 1]

• with the following options for the discounted cumulative reward V(s′):

  V(s′) ≡ max_{a′∈A} Q(s′, a′)                     consider the best action
        ≡ ∑_{a′∈A} π(s′, a′) Q(s′, a′)             policy-weighted average over a discrete action space
        ≡ ∫_A π(s′, a′) Q(s′, a′) da′              policy-weighted average over a continuous action space
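A minimal tabular sketch of this update, using the "consider best action" choice V(s′) = max_{a′∈A} Q(s′, a′); the two-state toy MDP in the usage example is purely illustrative and not from the talk.

```python
import random
from collections import defaultdict

# Sketch of the tabular Q-learning update above with V(s') = max_a' Q(s', a').
# The environment (rewards, transitions, action set) is a hypothetical toy MDP.

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Q'(s,a) = (1 - alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a'))."""
    v_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] = (1.0 - alpha) * Q[(s, a)] + alpha * (r + gamma * v_next)

# Toy usage on a two-state chain: action 1 in state 0 yields a reward of 1 and
# moves to the absorbing state 1; action 0 stays in state 0 with no reward.
Q = defaultdict(float)
for _ in range(1000):
    a = random.choice([0, 1])
    r, s_next = (1.0, 1) if a == 1 else (0.0, 0)
    q_learning_step(Q, 0, a, r, s_next, actions=[0, 1])
print(Q[(0, 1)], Q[(0, 0)])   # converges towards 1.0 and gamma * 1.0
```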
Light Transport Simulation and Reinforcement Learning
Light Transport Simulation and Reinforcement Learning: Structural equivalence of the integral equations

• matching terms:

  L(x, ω)  = L_e(x, ω)                        + ∫_{S²₊(x)} f_r(ω_i, x, ω) cos θ_i L(h(x, ω_i), −ω_i) dω_i

  Q′(s, a) = (1 − α) Q(s, a) + α ( r(s, a) + γ ∫_A π(s′, a′) Q(s′, a′) da′ )

• hints at learning the incident radiance

  Q′(x, ω) = (1 − α) Q(x, ω) + α ( L_e(y, −ω) + ∫_{S²₊(y)} f_r(ω_i, y, −ω) cos θ_i Q(y, ω_i) dω_i )

  as a policy for selecting an action ω in state x to reach the next state y := h(x, ω)

• the learning rate α is the only parameter left
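A possible sketch of this update for a single path vertex, assuming Q is already discretized into spatial cells × hemispherical patches (see the next slide) and that the emitted radiance L_e(y, −ω), the products f_r(ω_k, y, −ω) cos θ_k at the patch centers, and the patch solid angles arrive as precomputed arrays; these inputs are hypothetical stand-ins, not the talk's actual interface.

```python
import numpy as np

# Sketch of the radiance update above for one path vertex, on a Q table of
# shape (n_states, n_patches).  The scene-dependent inputs (L_e_y, frcos_y,
# patch_solid_angle) are hypothetical precomputed arrays for illustration.

def update_Q(Q, state_x, patch_w, state_y, L_e_y, frcos_y, patch_solid_angle,
             alpha=0.2):
    """Q'(x,w) = (1-a) Q(x,w) + a (L_e(y,-w) + sum_k f_r cos_k Q(y,w_k) dw_k)."""
    scattered = np.sum(frcos_y * Q[state_y] * patch_solid_angle)
    Q[state_x, patch_w] = (1.0 - alpha) * Q[state_x, patch_w] \
                          + alpha * (L_e_y + scattered)

# Toy usage with dummy values: 4 patches of equal solid angle 2*pi / 4 each,
# a diffuse surface at y (f_r cos approximated by a constant per patch), and
# a small positive initialization so Q remains a valid unnormalized pdf.
Q = np.full((16, 4), 1e-3)
update_Q(Q, state_x=0, patch_w=2, state_y=5, L_e_y=1.0,
         frcos_y=np.full(4, 0.5 / np.pi), patch_solid_angle=2.0 * np.pi / 4)
print(Q[0, 2])

# The updated row Q[state_x] then serves as the (unnormalized) discrete density
# for importance sampling the next scattering direction at points in that cell.
```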
Light Transport Simulation and Reinforcement Learning: Discretization of Q in analogy to irradiance representations

• action space: a ∈ S²₊(y)
  – equally sized patches via the equal-area mapping

    (x, y, z) = ( √(1 − u²) cos(2πv), √(1 − u²) sin(2πv), u )

• state space: s ∈ ∂V (the scene surfaces)
  – Voronoi diagram of a low-discrepancy sequence

    x_i = ( Φ₂(i), i/N )   for i = 0, ..., N − 1

  – nearest neighbor search, including the surface normal
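A minimal sketch of this discretization, assuming a 2D setting for illustration: the equal-area mapping from a grid over [0, 1)² to patch center directions, the Hammersley points x_i = (Φ₂(i), i/N), and a brute-force nearest-neighbor lookup of the containing Voronoi cell; in practice a spatial search structure and the surface-normal test mentioned above would replace the linear scan.

```python
import math

# Sketch of the discretization above: equal-solid-angle hemisphere patches,
# Hammersley points as Voronoi cell generators, and a brute-force nearest
# neighbor lookup (assumption: 2D points, no acceleration structure).

def patch_direction(j, k, n_u, n_v):
    """Center direction of patch (j, k) on an n_u x n_v grid over [0,1)^2."""
    u = (j + 0.5) / n_u                    # z = u is the equal-area mapping
    v = (k + 0.5) / n_v
    s = math.sqrt(max(0.0, 1.0 - u * u))
    return (s * math.cos(2.0 * math.pi * v),
            s * math.sin(2.0 * math.pi * v),
            u)

def radical_inverse_base2(i):
    """Van der Corput radical inverse Phi_2(i)."""
    result, f = 0.0, 0.5
    while i:
        result += f * (i & 1)
        i >>= 1
        f *= 0.5
    return result

def hammersley(i, n):
    """Point x_i = (Phi_2(i), i / n) of the Hammersley point set."""
    return (radical_inverse_base2(i), i / n)

def nearest_state(p, points):
    """Index of the Voronoi cell containing p, i.e. the nearest generator."""
    return min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, points[i])))

points = [hammersley(i, 64) for i in range(64)]
print(patch_direction(0, 0, 8, 8), nearest_state((0.3, 0.7), points))
```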