
NOTES ON THE VAPNIK-CHERVONENKIS THEOREM: BACKGROUND AND PROOF



ROLAND WALKER

1. Introduction

Vladimir Vapnik and Alexey Chervonenkis proved their eponymous theorem in 1968. The original Russian proof was published in 1971 and then translated to English by B. Seckler later that year. The English translation was most recently reprinted in 2015 [4]. These notes, which provide a relatively self-contained proof of the VC Theorem, assume the reader has some comfort with the basics of real analysis (e.g., Chapters 1 and 2 of [2]) but little or no background in probability theory. In addition to the original paper, we used Chapter 7 and Appendix B of [3] as a reference for the proof of the VC Theorem and Appendix A of [1] as a reference for the proof of Chernoff's theorem.

2. Products of σ-algebras

Let $I$ be a nonempty set, and let $(X_i, \mathcal{A}_i)_{i \in I}$ be a family of measurable spaces (i.e., each $X_i$ is a nonempty set and each $\mathcal{A}_i$ is a σ-algebra on $X_i$).

Definition 2.1. The product $\bigotimes_{i \in I} \mathcal{A}_i$ is the σ-algebra on $\prod_{i \in I} X_i$ given by
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \sigma\left( \left\{ \pi_i^{-1}(A_i) : i \in I,\ A_i \in \mathcal{A}_i \right\} \right).
\]
Moreover, if $I = \{0, \ldots, n-1\}$ for some $n \ge 2$, we often write $\mathcal{A}_0 \otimes \cdots \otimes \mathcal{A}_{n-1}$ for $\bigotimes_{i \in I} \mathcal{A}_i$ just as we often write $X_0 \times \cdots \times X_{n-1}$ for $\prod_{i \in I} X_i$.

Lemma 2.2. If $I$ is countable, then
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \sigma\left( \left\{ \prod_{i \in I} A_i : A_i \in \mathcal{A}_i \right\} \right).
\]

Proof. A σ-algebra is closed under taking countable intersections. □

Lemma 2.3. If $(\mathcal{E}_i)_{i \in I}$ is such that each $\mathcal{A}_i = \sigma(\mathcal{E}_i)$, then
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \sigma\left( \left\{ \pi_i^{-1}(E_i) : i \in I,\ E_i \in \mathcal{E}_i \right\} \right).
\]
If, in addition, $I$ is countable, then
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \sigma\left( \left\{ \prod_{i \in I} E_i : E_i \in \mathcal{E}_i \right\} \right).
\]
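As a small sanity check of Lemma 2.2, the following Python sketch compares the two generating families on a finite example, where closure under complements and pairwise unions is enough to produce a σ-algebra. The function name `generate_sigma_algebra` and the particular spaces `X0`, `X1` are illustrative assumptions, not part of the notes.

```python
from itertools import combinations, product


def generate_sigma_algebra(universe, generators):
    """Smallest family of subsets of a FINITE universe that contains the
    generators and is closed under complement and pairwise union."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            complement = universe - a
            if complement not in family:
                family.add(complement)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family


# Hypothetical example spaces: A0 is a proper sub-sigma-algebra of the
# power set of X0, and A1 is the full power set of X1.
X0, X1 = [0, 1, 2], ["a", "b"]
A0 = [set(), {0}, {1, 2}, {0, 1, 2}]
A1 = [set(), {"a"}, {"b"}, {"a", "b"}]
omega = set(product(X0, X1))

# Generators from Definition 2.1: preimages of the coordinate projections.
cylinders = [set(product(A, X1)) for A in A0] + [set(product(X0, B)) for B in A1]
# Generators from Lemma 2.2: rectangles A x B.
rectangles = [set(product(A, B)) for A in A0 for B in A1]

from_cylinders = generate_sigma_algebra(omega, cylinders)
from_rectangles = generate_sigma_algebra(omega, rectangles)
```

In this example both families generate the same σ-algebra, whose atoms are the four rectangles built from $\{0\}$, $\{1, 2\}$ and $\{a\}$, $\{b\}$.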

Lemma 2.4. If $I = J \sqcup K$, with both $J$ and $K$ nonempty, then
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \left( \bigotimes_{j \in J} \mathcal{A}_j \right) \otimes \left( \bigotimes_{k \in K} \mathcal{A}_k \right). \tag{2.1}
\]

Proof. By Lemma 2.3, the right-hand side of (2.1) is the σ-algebra generated by sets of the form $\pi_j^{-1}(A_j) \cap \pi_k^{-1}(A_k)$ where $j \in J$, $k \in K$, $A_j \in \mathcal{A}_j$, and $A_k \in \mathcal{A}_k$. □

Corollary 2.5. For finite products, the operator $\otimes$ is associative.

3. Product Measures

Let $n \ge 2$, and let $(X_i, \mathcal{A}_i, \mu_i)_{i<n}$ be a family of measure spaces; i.e., each $\mu_i : \mathcal{A}_i \to [0, \infty]$ is a measure (see [2, p. 24]) on the measurable space $(X_i, \mathcal{A}_i)$. Let $\mathcal{R}$ denote the collection of rectangular sets in $\mathcal{A}_0 \otimes \cdots \otimes \mathcal{A}_{n-1}$; i.e.,
\[
  \mathcal{R} = \left\{ A_0 \times \cdots \times A_{n-1} : A_i \in \mathcal{A}_i \right\}.
\]
It follows that $\mathcal{R}$ is an elementary family (see [2, p. 23]), so the set
\[
  \mathcal{F} = \left\{ \bigsqcup_{j<m} R_j : 1 \le m < \omega,\ R_j \in \mathcal{R} \right\}
\]
consisting of all finite disjoint unions of rectangles is an algebra [2, Proposition 1.7]. Let $\rho : \mathcal{R} \to [0, \infty]$ be defined by
\[
  A_0 \times \cdots \times A_{n-1} \mapsto \mu_0(A_0) \cdots \mu_{n-1}(A_{n-1}).
\]

Claim 3.1. Suppose $(S_j)_{j<\omega} \subseteq \mathcal{R}$ is a family of pairwise disjoint rectangles and $R = \bigsqcup_{j<\omega} S_j$. If $R \in \mathcal{R}$, then $\rho(R) = \sum_{j<\omega} \rho(S_j)$.

Proof. Suppose $R = A_0 \times \cdots \times A_{n-1}$ and each $S_j = B^j_0 \times \cdots \times B^j_{n-1}$ with each $A_i$ and $B^j_i$ in $\mathcal{A}_i$. Since
\begin{align*}
  1_{A_0}(x_0) \cdots 1_{A_{n-1}}(x_{n-1})
    &= 1_{A_0 \times \cdots \times A_{n-1}}(x_0, \ldots, x_{n-1}) \\
    &= \sum_{j<\omega} 1_{B^j_0 \times \cdots \times B^j_{n-1}}(x_0, \ldots, x_{n-1}) \\
    &= \sum_{j<\omega} 1_{B^j_0}(x_0) \cdots 1_{B^j_{n-1}}(x_{n-1})
\end{align*}
for all $(x_0, \ldots, x_{n-1}) \in X_0 \times \cdots \times X_{n-1}$, [2, Theorem 2.15] asserts that
\begin{align*}
  \mu_0(A_0) \cdots \mu_{n-1}(A_{n-1})
    &= \int_{X_{n-1}} \cdots \int_{X_0} 1_{A_0}(x_0) \cdots 1_{A_{n-1}}(x_{n-1}) \, d\mu_0(x_0) \cdots d\mu_{n-1}(x_{n-1}) \\
    &= \int_{X_{n-1}} \cdots \int_{X_0} \sum_{j<\omega} 1_{B^j_0}(x_0) \cdots 1_{B^j_{n-1}}(x_{n-1}) \, d\mu_0(x_0) \cdots d\mu_{n-1}(x_{n-1}) \\
    &= \sum_{j<\omega} \mu_0(B^j_0) \cdots \mu_{n-1}(B^j_{n-1}).
\end{align*}
□
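To make Claim 3.1 concrete, here is a minimal Python sketch for two finite measure spaces, where each measure is given by point masses and every integral reduces to a finite sum. The helper names (`measure`, `rho`, `points`) and the specific weights are assumptions chosen for illustration only.

```python
from fractions import Fraction
from itertools import product
from math import prod


def measure(weights, subset):
    """Measure of a subset when the measure is given by point masses."""
    return sum((weights[x] for x in subset), Fraction(0))


def rho(weight_list, rectangle):
    """rho(A_0 x ... x A_{n-1}) = mu_0(A_0) * ... * mu_{n-1}(A_{n-1})."""
    return prod(measure(w, side) for w, side in zip(weight_list, rectangle))


def points(rectangle):
    """The rectangle as an explicit set of tuples."""
    return set(product(*rectangle))


# Hypothetical point masses for mu_0 on {0, 1, 2} and mu_1 on {"a", "b"}.
mu = [
    {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)},
    {"a": Fraction(1, 4), "b": Fraction(3, 4)},
]

# A rectangle R written as a disjoint union of three smaller rectangles.
R = ({0, 1, 2}, {"a", "b"})
pieces = [({0}, {"a", "b"}), ({1, 2}, {"a"}), ({1, 2}, {"b"})]

# The pieces cover R and are pairwise disjoint (their sizes add up).
covers = set().union(*(points(p) for p in pieces)) == points(R)
disjoint = sum(len(points(p)) for p in pieces) == len(points(R))

total = rho(mu, R)
piecewise = sum((rho(mu, p) for p in pieces), Fraction(0))
```

Here `total` and `piecewise` agree, as the claim predicts; the sketch only covers finitely many pieces and does not exercise the countable case handled by [2, Theorem 2.15].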

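Let $\nu : \mathcal{F} \to [0, \infty]$ be defined by
\[
  \nu\left( \bigsqcup_{j<m} R_j \right) = \sum_{j<m} \rho(R_j).
\]
In order to show that $\nu$ is well-defined, suppose that $\bigsqcup_{j<m} R_j$ and $\bigsqcup_{k<m} S_k$ describe the same set in $\mathcal{F}$. For each $j < m$, suppose $R_j = A^j_0 \times \cdots \times A^j_{n-1}$ and $S_k = B^k_0 \times \cdots \times B^k_{n-1}$ with each $A^j_i$ and $B^k_i$ in $\mathcal{A}_i$. By Claim 3.1, we have
\begin{align*}
  \nu\left( \bigsqcup_{j<m} R_j \right)
    &= \sum_{j<m} \mu_0\left( A^j_0 \right) \cdots \mu_{n-1}\left( A^j_{n-1} \right) \\
    &= \sum_{j,k<m} \mu_0\left( A^j_0 \cap B^k_0 \right) \cdots \mu_{n-1}\left( A^j_{n-1} \cap B^k_{n-1} \right) \\
    &= \sum_{k<m} \mu_0\left( B^k_0 \right) \cdots \mu_{n-1}\left( B^k_{n-1} \right) \\
    &= \nu\left( \bigsqcup_{k<m} S_k \right).
\end{align*}

Next, we show that $\nu$ is a premeasure on $\mathcal{F}$ (see [2, p. 30]). Let $\bigsqcup_{j<m} R_j \in \mathcal{F}$, and let $\left( \bigsqcup_{k<m_\ell} S^\ell_k \right)_{\ell<\omega} \subseteq \mathcal{F}$ be pairwise disjoint. Suppose $\bigsqcup_{j<m} R_j = \bigsqcup_{\ell<\omega} \bigsqcup_{k<m_\ell} S^\ell_k$. By Claim 3.1, it follows that
\begin{align*}
  \nu\left( \bigsqcup_{j<m} R_j \right)
    &= \sum_{j<m} \rho(R_j) \\
    &= \sum_{j<m} \sum_{\ell<\omega} \sum_{k<m_\ell} \rho\left( R_j \cap S^\ell_k \right) \\
    &= \sum_{\ell<\omega} \sum_{k<m_\ell} \sum_{j<m} \rho\left( R_j \cap S^\ell_k \right) \\
    &= \sum_{\ell<\omega} \sum_{k<m_\ell} \rho\left( S^\ell_k \right) \\
    &= \sum_{\ell<\omega} \nu\left( \bigsqcup_{k<m_\ell} S^\ell_k \right).
\end{align*}

Let $\nu^*$ be the outer measure associated with $\nu$; i.e., $\nu^* : \mathcal{P}(X_0 \times \cdots \times X_{n-1}) \to [0, \infty]$ where
\[
  \nu^*(A) = \inf\left\{ \sum_{j<\omega} \nu(F_j) : F_j \in \mathcal{F},\ A \subseteq \bigcup_{j<\omega} F_j \right\}.
\]

Definition 3.2. The product measure $\mu_0 \times \cdots \times \mu_{n-1}$ is the restriction of $\nu^*$ to $\mathcal{A}_0 \otimes \cdots \otimes \mathcal{A}_{n-1}$.

By [2, Proposition 1.13], this product is indeed a measure which extends $\rho$. If, in addition, each $\mu_i$ is σ-finite, then [2, Proposition 1.14] implies that the product is the unique measure extending $\rho$ to $\mathcal{A}_0 \otimes \cdots \otimes \mathcal{A}_{n-1}$.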
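The well-definedness argument for $\nu$ earlier in this section passes through the common refinement $\{R_j \cap S_k\}$ of two decompositions of the same set. The Python sketch below replays that step on a small finite example; the helper names (`measure`, `rho`, `intersect`, `nu`) and the point masses are illustrative assumptions, and the example is a check of one instance rather than a proof.

```python
from fractions import Fraction
from math import prod


def measure(weights, subset):
    """Measure of a subset when the measure is given by point masses."""
    return sum((weights[x] for x in subset), Fraction(0))


def rho(weight_list, rectangle):
    """Product of the side measures of a rectangle."""
    return prod(measure(w, side) for w, side in zip(weight_list, rectangle))


def intersect(r, s):
    """The intersection of two rectangles is taken side by side."""
    return tuple(a & b for a, b in zip(r, s))


def nu(weight_list, rectangles):
    """nu of a finite disjoint union of rectangles: the sum of rho."""
    return sum((rho(weight_list, r) for r in rectangles), Fraction(0))


# Hypothetical point masses for mu_0 on {0, 1, 2} and mu_1 on {"a", "b"}.
mu = [
    {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)},
    {"a": Fraction(1, 4), "b": Fraction(3, 4)},
]

# Two decompositions of the same L-shaped set into disjoint rectangles.
R = [({0, 1}, {"a", "b"}), ({2}, {"a"})]
S = [({0, 1, 2}, {"a"}), ({0, 1}, {"b"})]

# Common refinement: all pairwise intersections (some may be empty).
refinement = [intersect(r, s) for r in R for s in S]

nu_R, nu_S, nu_refined = nu(mu, R), nu(mu, S), nu(mu, refinement)
```

All three sums coincide, matching the chain of equalities in the text; empty intersections contribute zero because one of their sides has measure zero.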

Lemma 3.3. If each $\mu_i$ is σ-finite, then the product $\mu_0 \times \cdots \times \mu_{n-1}$ is associative.

Proof. Suppose $I \sqcup J = \{0, \ldots, n-1\}$ where both $I$ and $J$ are nonempty. Let $\mu_I = \prod_{i \in I} \mu_i$ and $\mu_J = \prod_{j \in J} \mu_j$. It follows that $(\mu_I \times \mu_J) \upharpoonright \mathcal{R} = \rho$. □

4. Pushforwards

Suppose $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ are measurable spaces and $f : X \to Y$ is an $(\mathcal{A}, \mathcal{B})$-measurable function.

Definition 4.1. If $\mu : \mathcal{A} \to [0, \infty]$ is a measure, then we call $\mu \circ f^{-1} : \mathcal{B} \to [0, \infty]$ its pushforward by $f$.

Claim 4.2. The pushforward $\mu \circ f^{-1}$ is a measure.

Proof. Notice that $\mu \circ f^{-1}(\emptyset) = \mu(\emptyset) = 0$. Suppose $(B_i : i < \omega) \subseteq \mathcal{B}$ is pairwise disjoint. It follows that $(f^{-1}(B_i) : i < \omega) \subseteq \mathcal{A}$ is also pairwise disjoint, so
\[
  \mu \circ f^{-1}\left( \bigsqcup_{i<\omega} B_i \right) = \mu\left( \bigsqcup_{i<\omega} f^{-1}(B_i) \right) = \sum_{i<\omega} \mu \circ f^{-1}(B_i).
\]
□

5. Probability Spaces

Definition 5.1. A probability space is a measure space $(\Omega, \mathcal{A}, P)$ with $P(\Omega) = 1$.

Definition 5.2. If $(\Omega, \mathcal{A}, P)$ is a probability space, then the $P$-measurable sets (i.e., the elements of $\mathcal{A}$) are called events.

6. Random Elements and Variables

Let $(\Omega, \mathcal{A}, P)$ be a probability space.

Definition 6.1. A random element of a measurable space $(\Psi, \mathcal{B})$ is an $(\mathcal{A}, \mathcal{B})$-measurable function $X : \Omega \to \Psi$. Furthermore, if $\Psi = \mathbb{R}$ and $\mathcal{B} = \mathcal{B}(\mathbb{R})$, then we call $X$ a random variable.

When describing events using preimages of random elements, we often use $[X \in B]$ for $\{\omega \in \Omega : X(\omega) \in B\}$, $[X > r]$ for $\{\omega \in \Omega : X(\omega) > r\}$, etc. This abbreviation practice is common in the literature of probability theory. As an aid to the reader, we set off such abbreviations with square brackets rather than braces.

Definition 6.2. We say that random elements $X_0, \ldots, X_{n-1}$ of measurable spaces $(\Psi_0, \mathcal{B}_0), \ldots, (\Psi_{n-1}, \mathcal{B}_{n-1})$, respectively, are mutually independent iff for all $(B_0, \ldots, B_{n-1}) \in \mathcal{B}_0 \times \cdots \times \mathcal{B}_{n-1}$, we have
\[
  P[X_0 \in B_0, \ldots, X_{n-1} \in B_{n-1}] = P[X_0 \in B_0] \cdots P[X_{n-1} \in B_{n-1}].
\]
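Definition 6.2 can be checked exhaustively on a finite probability space. The Python sketch below does so for two fair coin flips and their exclusive-or, a standard example of random variables that are pairwise independent without being mutually independent. The function names and the choice of example are assumptions for illustration, and the check takes each $\mathcal{B}_i$ to be the power set of the finite set of values of $X_i$.

```python
from fractions import Fraction
from itertools import chain, combinations, product


def subsets(values):
    """All subsets of a finite collection, as tuples."""
    values = list(values)
    return chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )


def prob(P, event):
    """Probability of an event given point masses P on a finite space."""
    return sum((P[w] for w in event), Fraction(0))


def mutually_independent(P, variables):
    """Check the product rule of Definition 6.2 for every choice of
    (B_0, ..., B_{n-1}), with B_i ranging over subsets of the values of X_i."""
    ranges = [{X(w) for w in P} for X in variables]
    for Bs in product(*(list(subsets(r)) for r in ranges)):
        joint = [w for w in P if all(X(w) in B for X, B in zip(variables, Bs))]
        rhs = Fraction(1)
        for X, B in zip(variables, Bs):
            rhs *= prob(P, [w for w in P if X(w) in B])
        if prob(P, joint) != rhs:
            return False
    return True


# Two fair coin flips: the uniform measure on {0, 1}^2.
P = {w: Fraction(1, 4) for w in product([0, 1], repeat=2)}


def X0(w):
    return w[0]


def X1(w):
    return w[1]


def X2(w):
    return w[0] ^ w[1]  # exclusive-or of the two flips


pairwise = [
    mutually_independent(P, [X0, X1]),
    mutually_independent(P, [X0, X2]),
    mutually_independent(P, [X1, X2]),
]
all_three = mutually_independent(P, [X0, X1, X2])
```

Every pair passes the check, while the triple fails: for instance, $P[X_0 = 0, X_1 = 0, X_2 = 1] = 0$ although each factor has probability $1/2$.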
Definition 6.3. If $X$ is a random element of $(\Psi, \mathcal{B})$, then the probability distribution of $X$ is the pushforward $P \circ X^{-1} : \mathcal{B} \to [0, 1]$.
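For a finite probability space, the distribution in Definition 6.3 can be computed directly as a pushforward in the sense of Definition 4.1. The following Python sketch does this for the sum of two fair dice; the function name `pushforward` and the dice example are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def pushforward(P, f):
    """Point masses of P o f^{-1}: the mass at y is P(f^{-1}({y}))."""
    image = defaultdict(Fraction)  # Fraction() is 0
    for w, p in P.items():
        image[f(w)] += p
    return dict(image)


def mass(weights, subset):
    """Measure of a subset under a measure given by point masses."""
    return sum((weights.get(y, Fraction(0)) for y in subset), Fraction(0))


# Two fair dice: the uniform measure on {1, ..., 6}^2.
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}


def S(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]


distribution = pushforward(P, S)

# P[S in B] computed two ways: via the distribution and via the preimage.
B = {7, 11}
via_distribution = mass(distribution, B)
via_preimage = mass(P, [w for w in P if S(w) in B])
```

The two computations of $P[S \in B]$ agree by construction, and the masses of `distribution` sum to one, consistent with Claim 4.2 and with the codomain $[0, 1]$ in Definition 6.3.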
