New Error Bounds for Approximations from Projected Linear Equations

H. Yu∗ and D. P. Bertsekas∗∗

∗ Department of Computer Science, University of Helsinki
∗∗ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

European Workshop on Reinforcement Learning, Lille, France, Jun. 30 – Jul. 4, 2008
Outline

• Introduction
• Data-Dependent Error Analysis
• Applications and Comparisons of Bounds
• Summary
Projected Equations and TD Type Methods

• $x^*$: a solution of the linear fixed point equation $x = Ax + b$
• $\bar{x}$: the solution of the projected equation $x = \Pi(Ax + b)$
• $\Pi$: weighted Euclidean projection on a subspace $S \subset \Re^n$, $\dim(S) \ll n$
• Assume: $I - \Pi A$ invertible

Example: TD($\lambda$) for approximate policy evaluation in MDP
• Solve a projected form of a multistep Bellman equation, with linear function approximation of the cost function
• $A$: a stochastic or substochastic matrix
• $\Pi A$ is usually a contraction

Example: large linear systems of equations in general
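A minimal numeric sketch of this setup (ours, not from the talk): build a random $A$ with $\|A\|_\xi < 1$, a random basis $\Phi$, and solve the projected equation directly in $\Re^n$. The problem sizes, the random data, and the scaling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5
xi = rng.random(n) + 0.1; xi /= xi.sum()   # weights of the projection norm
Xi = np.diag(xi)
D = np.diag(np.sqrt(xi))                   # ||x||_xi = ||D x||_2
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(D @ A @ np.linalg.inv(D), 2)  # force ||A||_xi = 0.9
b = rng.standard_normal(n)
Phi = rng.standard_normal((n, k))          # columns: a basis of S

# Weighted Euclidean projection onto S: Pi = Phi (Phi' Xi Phi)^{-1} Phi' Xi
Pi = Phi @ np.linalg.solve(Phi.T @ Xi @ Phi, Phi.T @ Xi)

x_star = np.linalg.solve(np.eye(n) - A, b)           # x* = A x* + b
x_bar = np.linalg.solve(np.eye(n) - Pi @ A, Pi @ b)  # x_bar = Pi(A x_bar + b)
print(np.allclose(x_bar, Pi @ (A @ x_bar + b)))      # -> True
```

Since $\|\Pi\|_\xi = 1$ for the weighted projection, $\|\Pi A\|_\xi \le 0.9$ here, so $I - \Pi A$ is invertible and $\bar{x}$ is well defined.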
Two Standard Error Bounds for the Contraction Case

$x^* - \bar{x}$: approximation error due to solving the projected equation.

Standard bound I (arbitrary norm): assume $\|\Pi A\| = \alpha < 1$; then
$$\|x^* - \bar{x}\| \le \frac{1}{1-\alpha} \|x^* - \Pi x^*\| \qquad (1)$$

Standard bound II (weighted Euclidean norm $\|\cdot\|_\xi$; uses the Pythagorean theorem, much sharper than I): assume $\|\Pi A\|_\xi = \alpha < 1$; then
$$\|x^* - \bar{x}\|_\xi \le \frac{1}{\sqrt{1-\alpha^2}} \|x^* - \Pi x^*\|_\xi \qquad (2)$$

• These are upper bounds on the ratios
  amplification: $\frac{\|x^* - \bar{x}\|_\xi}{\|x^* - \Pi x^*\|_\xi}$, and bias-to-distance: $\frac{\|\bar{x} - \Pi x^*\|_\xi}{\|x^* - \Pi x^*\|_\xi}$
• Our bounds will be in a similar form, $\|x^* - \bar{x}\|_\xi \le B(A, \xi, S) \|x^* - \Pi x^*\|_\xi$, but apply to both contraction and non-contraction cases.
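For a feel of the gap between the two bounds, a one-line arithmetic check (illustrative only): at $\alpha = 0.9$, bound I allows a tenfold amplification of the distance $\|x^* - \Pi x^*\|$, while bound II allows only about 2.3.

```python
alpha = 0.9
bound_I = 1.0 / (1.0 - alpha)               # = 10.0
bound_II = 1.0 / (1.0 - alpha**2) ** 0.5    # ~ 2.294
print(bound_I, bound_II)
```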
Illustration of the Form of Bounds

[Figure: $x^*$ sits off the subspace $S$; the approximation $\bar{x}$ lies within a cone around $\Pi x^*$ in $S$, the cone's opening specified by the error bound $B(A, \xi, S)$.]

• $B(A, \xi, S) = 1 \;\Rightarrow\; \bar{x} = \Pi x^*$
Data-Dependent Error Analysis: Motivations

Motivation I: with or without contraction assumptions,
$$x^* - \bar{x} = (I - \Pi A)^{-1}(x^* - \Pi x^*) \qquad (3)$$

How this equality is relaxed in the standard bounds:
• Standard bound I: $(I - \Pi A)^{-1} = I + \Pi A + (\Pi A)^2 + \cdots$, with $\|(\Pi A)^m\| \le \alpha^m$
• Standard bound II: $(I - \Pi A)^{-1} = I + \Pi A (I - \Pi A)^{-1}$, so
$$\|x^* - \bar{x}\|_\xi^2 = \|x^* - \Pi x^*\|_\xi^2 + \|\Pi A (I - \Pi A)^{-1}(x^* - \Pi x^*)\|_\xi^2 = \|x^* - \Pi x^*\|_\xi^2 + \|\Pi A (x^* - \bar{x})\|_\xi^2 \le \|x^* - \Pi x^*\|_\xi^2 + \alpha^2 \|x^* - \bar{x}\|_\xi^2$$
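A numeric check of the decomposition behind standard bound II (a sketch reusing the random setup from the earlier snippet; all names are ours): the terms $x^* - \Pi x^*$ and $\Pi A (x^* - \bar{x})$ are $\xi$-orthogonal, so their squared norms add.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5
xi = rng.random(n) + 0.1; xi /= xi.sum()
Xi = np.diag(xi)
D = np.diag(np.sqrt(xi))
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(D @ A @ np.linalg.inv(D), 2)   # ||A||_xi = 0.9
b = rng.standard_normal(n)
Phi = rng.standard_normal((n, k))
Pi = Phi @ np.linalg.solve(Phi.T @ Xi @ Phi, Phi.T @ Xi)

x_star = np.linalg.solve(np.eye(n) - A, b)
x_bar = np.linalg.solve(np.eye(n) - Pi @ A, Pi @ b)

sq = lambda v: v @ Xi @ v                    # squared xi-norm
lhs = sq(x_star - x_bar)
rhs = sq(x_star - Pi @ x_star) + sq(Pi @ A @ (x_star - x_bar))
print(np.isclose(lhs, rhs))                  # -> True
```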
Data-Dependent Error Analysis: Motivations

Motivation II: $(I - \Pi A)^{-1} = I + \Pi A (I - \Pi A)^{-1} = I + (I - \Pi A)^{-1} \Pi A$

(i) Bound the term $(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*)$ directly, so that $\alpha$ does not appear in the denominator
(ii) Seek computable bounds with low-order calculations involving small matrices

Consider the technical side of (ii): some notation and facts
• $\Phi$: $n \times k$ matrix whose columns form a basis of $S$; $\Xi = \mathrm{diag}(\xi)$
• $k \times k$ matrices: $B = \Phi' \Xi \Phi$, $M = \Phi' \Xi A \Phi$, $F = (I - B^{-1} M)^{-1}$
• $\Pi = \Phi(\Phi' \Xi \Phi)^{-1} \Phi' \Xi = \Phi B^{-1} \Phi' \Xi$; the projected equation is equivalent to
$$\Phi r = \Phi B^{-1} \big( M r + \Phi' \Xi b \big), \quad r \in \Re^k$$
• $B$ and $M$ can be computed easily by simulation.
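In code, the reduction to $k \times k$ quantities is direct (a sketch; the function name and arguments are ours): since $\Phi$ has full column rank, the projected equation $\Phi r = \Phi B^{-1}(Mr + \Phi'\Xi b)$ collapses to the $k \times k$ linear system $(B - M)\, r = \Phi' \Xi b$.

```python
import numpy as np

def solve_projected(Phi, Xi, A, b):
    """Solve x = Pi(Ax + b) via the equivalent k x k system (B - M) r = Phi' Xi b."""
    B = Phi.T @ Xi @ Phi          # k x k, easily estimated by simulation
    M = Phi.T @ Xi @ A @ Phi      # k x k, easily estimated by simulation
    r = np.linalg.solve(B - M, Phi.T @ Xi @ b)
    return Phi @ r                # x_bar = Phi r
```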
Technical Lemmas for New Error Bounds

Lemma 1
$$(I - \Pi A)^{-1} = I + (I - \Pi A)^{-1} \Pi A = I + \Phi F B^{-1} \Phi' \Xi A. \qquad (4)$$
Also, $I - \Pi A$ invertible $\iff$ $F = (I - B^{-1} M)^{-1}$ exists.

Lemma 2
Let $H$ and $D$ be $n \times k$ and $k \times n$ matrices, respectively. Then
$$\|HD\|_\xi^2 = \sigma\big( (H' \Xi H)(D \Xi^{-1} D') \big). \qquad (5)$$

Apply the lemmas to bound $\|(I - \Pi A)^{-1}(x^* - \Pi x^*)\|_\xi$:

First bound: by Lemma 1,
$$(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*) = \underbrace{\Phi F B^{-1}}_{H} \, \underbrace{\Phi' \Xi}_{D} \, A (x^* - \Pi x^*),$$
so by Lemma 2,
$$\|(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*)\|_\xi^2 \le \sigma(G_1) \, \|A\|_\xi^2 \, \|x^* - \Pi x^*\|_\xi^2,$$
where $G_1 = (H' \Xi H)(D \Xi^{-1} D') = B^{-1} F' B F$.
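Lemma 2 is easy to sanity-check numerically (a sketch; sizes and matrices are arbitrary assumptions). Here $\sigma(\cdot)$ is the spectral radius, and $\|HD\|_\xi = \|\Xi^{1/2} H D \,\Xi^{-1/2}\|_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 4
xi = rng.random(n) + 0.1; xi /= xi.sum()
Xi, Xi_inv = np.diag(xi), np.diag(1.0 / xi)
D_half = np.diag(np.sqrt(xi))                   # ||M||_xi = ||D_half M D_half^{-1}||_2
H = rng.standard_normal((n, k))
D = rng.standard_normal((k, n))

lhs = np.linalg.norm(D_half @ (H @ D) @ np.linalg.inv(D_half), 2) ** 2
rhs = max(abs(np.linalg.eigvals((H.T @ Xi @ H) @ (D @ Xi_inv @ D.T))))
print(np.isclose(lhs, rhs))                     # -> True
```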
Main Results: First Bound

Theorem 1
$$\|x^* - \bar{x}\|_\xi \le \sqrt{1 + \sigma(G_1) \|A\|_\xi^2} \; \|x^* - \Pi x^*\|_\xi \qquad (6)$$
where
• $G_1$ is the product of $k \times k$ matrices
$$G_1 = B^{-1} F' B F \qquad (7)$$
• $\sigma(G_1) = \|(I - \Pi A)^{-1} \Pi\|_\xi^2$, so the bound is invariant to the choice of basis vectors of $S$ (i.e., $\Phi$).

Notes:
• Thm. 1 is equivalent to $\|(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*)\|_\xi \le \|(I - \Pi A)^{-1} \Pi\|_\xi \, \|A\|_\xi \, \|x^* - \Pi x^*\|_\xi$
• Easy to compute, and better than the standard bound I
• Weaknesses: two over-relaxations; $\|A\|_\xi$ is required
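A sketch of computing the Theorem 1 bound from the $k \times k$ quantities (the function name is ours; $\|A\|_\xi$ is computed exactly here, which is only possible when $A$ is available explicitly):

```python
import numpy as np

def thm1_bound(Phi, Xi, A):
    """sqrt(1 + sigma(G1) ||A||_xi^2), with G1 = B^{-1} F' B F (Theorem 1)."""
    k = Phi.shape[1]
    B = Phi.T @ Xi @ Phi
    M = Phi.T @ Xi @ A @ Phi
    F = np.linalg.inv(np.eye(k) - np.linalg.solve(B, M))   # F = (I - B^{-1}M)^{-1}
    G1 = np.linalg.solve(B, F.T @ B @ F)                   # G1 = B^{-1} F' B F
    sigma_G1 = max(abs(np.linalg.eigvals(G1)))             # spectral radius
    D_half = np.diag(np.sqrt(np.diag(Xi)))
    norm_A = np.linalg.norm(D_half @ A @ np.linalg.inv(D_half), 2)
    return np.sqrt(1.0 + sigma_G1 * norm_A**2)
```

By Theorem 1, `thm1_bound(Phi, Xi, A)` upper-bounds the ratio $\|x^* - \bar{x}\|_\xi / \|x^* - \Pi x^*\|_\xi$.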
Two Over-Relaxations in Theorem 1

1. $\Pi(x^* - \Pi x^*) = 0$ is not used.
   • Effect: the bound degrades (to the standard bound I in the contraction case) if $S$ nearly contains an eigenvector of $A$ associated with the dominant real eigenvalue.
   • For applications in practice: orthogonalize the basis vectors w.r.t. the eigenspace to obtain sharper bounds.
2. When $\Pi A$ is near zero, the bound cannot fully utilize this fact.
   • This is due to the splitting of $\Pi$ and $A$ in bounding $\|(I - \Pi A)^{-1} \Pi A\|$:
     Thm. 1 $\iff$ $\|\Pi A + \Pi A (I - \Pi A)^{-1} \Pi A\|_\xi \le \|\Pi + \Pi A (I - \Pi A)^{-1} \Pi\|_\xi \, \|A\|_\xi$
   • Effect: when $\Pi A$ is near zero but $\|A\|_\xi = 1$, $\sigma(G_1) \approx \|\Pi\|_\xi^2 = 1$, and the bound tends to $\sqrt{2}$ instead of 1.

$\Rightarrow$ Apply the lemmas in a different way to sharpen the bound: the second bound.
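A tiny concrete instance of over-relaxation 2 (our construction, not from the talk): $A$ shifts coordinate 1 to coordinate 2, $S = \mathrm{span}\{e_1\}$, uniform $\xi$. Then $\Pi A = 0$ and $\|A\|_\xi = 1$, so Theorem 1 gives $\sqrt{2}$, while the true ratio is exactly 1.

```python
import numpy as np

A = np.array([[0.0, 0.0],
              [1.0, 0.0]])       # ||A||_2 = 1; first row zero => Pi A = 0
Phi = np.array([[1.0], [0.0]])   # S = span{e1}
Xi = 0.5 * np.eye(2)             # uniform weights

B = Phi.T @ Xi @ Phi             # = [[0.5]]; M = 0, so F = I and sigma(G1) = 1
print(np.sqrt(1 + 1 * 1.0**2))   # Theorem 1 bound: sqrt(2) ~ 1.414

b = np.array([1.0, 1.0])
Pi = Phi @ np.linalg.solve(B, Phi.T @ Xi)
x_star = np.linalg.solve(np.eye(2) - A, b)
x_bar = np.linalg.solve(np.eye(2) - Pi @ A, Pi @ b)   # Pi A = 0 => x_bar = Pi b = Pi x*
num = x_star - x_bar
den = x_star - Pi @ x_star
print(np.sqrt(num @ Xi @ num / (den @ Xi @ den)))     # true ratio: 1.0
```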
Main Results: Second Bound

Use the fact $\Pi(x^* - \Pi x^*) = 0$:
$$\|(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*)\|_\xi = \|(I - \Pi A)^{-1} \Pi A (I - \Pi)(x^* - \Pi x^*)\|_\xi \le \|(I - \Pi A)^{-1} \Pi A (I - \Pi)\|_\xi \, \|x^* - \Pi x^*\|_\xi$$

Relate the norm of the matrix to the spectral radius of a $k \times k$ matrix: by Lemma 1,
$$\|(I - \Pi A)^{-1} \Pi A (I - \Pi)\|_\xi^2 = \Big\| \underbrace{\Phi F B^{-1}}_{H} \, \underbrace{\Phi' \Xi A (I - \Pi)}_{D} \Big\|_\xi^2 \overset{\text{Lemma 2}}{=} \sigma\big( (H' \Xi H)(D \Xi^{-1} D') \big)$$

Notes:
• Incorporating the matrix $I - \Pi$ is crucial for improving the bound.
• $\|A\|_\xi$ is no longer needed.
Main Results: Second Bound

Theorem 2
$$\|x^* - \bar{x}\|_\xi \le \sqrt{1 + \sigma(G_2)} \; \|x^* - \Pi x^*\|_\xi \qquad (8)$$
where
• $G_2$ is the product of $k \times k$ matrices
$$G_2 = B^{-1} F' B F B^{-1} (R - M B^{-1} M'), \quad R = \Phi' \Xi A \Xi^{-1} A' \Xi \Phi, \qquad (9)$$
• $\sigma(G_2) = \|(I - \Pi A)^{-1} \Pi A (I - \Pi)\|_\xi^2$, so the bound is invariant to the choice of basis vectors of $S$ (i.e., $\Phi$).

Proposition 1 (Comparison with the Standard Bound II)
Assume that $\|\Pi A\|_\xi \le \alpha < 1$. Then the error bound (8) is always no worse than the standard bound II, i.e., $1 + \sigma(G_2) \le 1/(1 - \alpha^2)$.

Notes:
• The bound is tight in the worst-case sense.
• Estimating $R$ by simulation is less straightforward than estimating $B$ and $M$; it is doable, except for TD($\lambda$) with $\lambda > 0$.
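A sketch of the Theorem 2 bound from $k \times k$ quantities (the function name is ours; here $R$ is formed exactly from $A$, whereas the talk notes that in practice $R$ would be estimated by simulation):

```python
import numpy as np

def thm2_bound(Phi, Xi, A):
    """sqrt(1 + sigma(G2)), with G2 = B^{-1}F'BF B^{-1}(R - M B^{-1} M') (Theorem 2)."""
    k = Phi.shape[1]
    Xi_inv = np.diag(1.0 / np.diag(Xi))
    B = Phi.T @ Xi @ Phi
    M = Phi.T @ Xi @ A @ Phi
    R = Phi.T @ Xi @ A @ Xi_inv @ A.T @ Xi @ Phi            # eq. (9)
    F = np.linalg.inv(np.eye(k) - np.linalg.solve(B, M))
    core = np.linalg.solve(B, F.T @ B @ F)                  # B^{-1} F' B F
    G2 = core @ np.linalg.solve(B, R - M @ np.linalg.solve(B, M.T))
    return np.sqrt(1.0 + max(abs(np.linalg.eigvals(G2))))   # spectral radius of G2
```

Per Proposition 1, in the contraction case with $\|\Pi A\|_\xi \le \alpha < 1$ this value never exceeds $1/\sqrt{1 - \alpha^2}$; on the two-dimensional example above it returns exactly 1, since there $R = M = 0$ and hence $G_2 = 0$.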
MDP Applications and Numerical Comparisons of Bounds

Cost function approximation for MDP with TD($\lambda$):
• $A$ is defined for a pair of values $(\alpha, \lambda)$ by
$$A = P^{(\alpha,\lambda)} \overset{\text{def}}{=} (1 - \lambda) \sum_{\ell=0}^{\infty} \lambda^\ell (\alpha P)^{\ell+1}$$
  discounted cases: $\alpha \in [0, 1)$, $\lambda \in [0, 1]$; undiscounted cases: $\alpha = 1$, $\lambda \in [0, 1)$

Choices of the projection norm:
• W/o exploration: $\xi$ = invariant distribution of $P$; $\Pi A$ is a contraction
• W/ exploration: $\xi$ determined by policies/simulations that enhance exploration; $\Pi A$ may or may not be a contraction ($\lambda$ needs to be chosen properly; LSTD(0) is always safe to apply)

On applying Thm. 1:
• $e = [1, 1, \ldots, 1]'$ is an eigenvector of $A$ associated with the dominant eigenvalue $\frac{(1-\lambda)\alpha}{1 - \alpha\lambda}$.
• To obtain a sharper bound, orthogonalize the basis vectors w.r.t. $e$ (i.e., project them on $e^\perp$; easy to do online).
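A sketch of forming $A = P^{(\alpha,\lambda)}$ for a small chain by truncating the series (the toy transition matrix, $\alpha$, $\lambda$, and the truncation length are ours), with a check of the stated dominant eigenvalue:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5]])   # a toy stochastic matrix
alpha, lam = 0.95, 0.5

A = np.zeros_like(P)
aP_pow = alpha * P                # holds (alpha P)^(l+1), starting at l = 0
for l in range(200):              # geometric tail: truncation error is negligible
    A += (1 - lam) * lam**l * aP_pow
    aP_pow = aP_pow @ (alpha * P)

e = np.ones(3)
# e is an eigenvector with the dominant eigenvalue (1 - lam) alpha / (1 - alpha lam):
print(np.allclose(A @ e, (1 - lam) * alpha / (1 - alpha * lam) * e))  # -> True
```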