
Vapnik-Chervonenkis Density in Model Theory (Matthias Aschenbrenner)



  1. Vapnik-Chervonenkis Density in Model Theory
     Matthias Aschenbrenner, University of California, Los Angeles
     (joint with A. Dolich, D. Haskell, D. Macpherson, and S. Starchenko)

  2. Outline
     • VC dimension and VC density
     • VC duality
     • The model-theoretic context
     • Uniform bounds on VC density

  3. VC dimension and VC density
     Let $(X, \mathcal{S})$ be a set system:
     • $X$ is a set (the base set), most of the time assumed infinite;
     • $\mathcal{S}$ is a collection of subsets of $X$.
     We sometimes also speak of a set system $\mathcal{S}$ on $X$. Given $A \subseteq X$, we let
     $\mathcal{S} \cap A := \{ S \cap A : S \in \mathcal{S} \}$
     and call $(A, \mathcal{S} \cap A)$ the set system on $A$ induced by $\mathcal{S}$. We say $A$ is shattered by $\mathcal{S}$ if $\mathcal{S} \cap A = 2^A$.

  4. VC dimension and VC density
     If $\mathcal{S} \neq \emptyset$, then we define the VC dimension of $\mathcal{S}$, denoted by $\mathrm{VC}(\mathcal{S})$, as the supremum (in $\mathbb{N} \cup \{\infty\}$) of the sizes of all finite subsets of $X$ shattered by $\mathcal{S}$. We also decree $\mathrm{VC}(\emptyset) := -\infty$.
     Examples
     (1) $X = \mathbb{R}$, $\mathcal{S}$ = all unbounded intervals. Then $\mathrm{VC}(\mathcal{S}) = 2$.
     (2) $X = \mathbb{R}^2$, $\mathcal{S}$ = all halfspaces. Then $\mathrm{VC}(\mathcal{S}) = 3$.
         [Figure, two cases: one point in the convex hull of the others; no point in the convex hull of the others.]
     (3) Let $\mathcal{S}$ = half spaces in $\mathbb{R}^d$. Then $\mathrm{VC}(\mathcal{S}) = d + 1$. (The inequality $\leq$ follows from Radon's Lemma.)
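
The definitions of shattering and VC dimension can be checked by brute force on a finite set system. The following is a minimal sketch, not part of the talk: the helper names (`trace`, `is_shattered`, `vc_dimension`) are made up for illustration, and the six-point base set with its initial and final segments is an assumed finite stand-in for the unbounded intervals of R in Example (1).

```python
from itertools import combinations

def trace(system, A):
    """Return the induced system S ∩ A as a set of frozensets."""
    A = frozenset(A)
    return {frozenset(S) & A for S in system}

def is_shattered(system, A):
    """A is shattered when every subset of A arises as some S ∩ A."""
    return len(trace(system, A)) == 2 ** len(A)

def vc_dimension(base, system):
    """Largest size of a shattered subset of a finite base set (-inf for the empty system)."""
    if not system:
        return float("-inf")
    best = 0
    for k in range(1, len(base) + 1):
        if any(is_shattered(system, A) for A in combinations(base, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so larger sets cannot work
    return best

base = list(range(6))
# Assumed finite stand-in for unbounded intervals: initial and final segments of {0,...,5}.
rays = [set(range(0, a)) for a in range(7)] + [set(range(a, 6)) for a in range(7)]
print(vc_dimension(base, rays))
```

On this toy system the search stops at size 3, because no ray picks out the two outer points of a triple without the middle one.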

  5. VC dimension and VC density
     Examples (continued)
     (4) $X = \mathbb{R}^2$, $\mathcal{S}$ = all convex polygons. Then $\mathrm{VC}(\mathcal{S}) = \infty$. (But $\mathrm{VC}(\{\text{convex } n\text{-gons in } \mathbb{R}^2\}) = 2n + 1$.)

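  6. VC dimension and VC density
     The function
     $n \mapsto \pi_{\mathcal{S}}(n) := \max\{ |\mathcal{S} \cap A| : A \in \binom{X}{n} \} : \mathbb{N} \to \mathbb{N}$
     is called the shatter function of $\mathcal{S}$ (here $\binom{X}{n}$ denotes the set of $n$-element subsets of $X$). Then
     $\mathrm{VC}(\mathcal{S}) = \sup\{ n : \pi_{\mathcal{S}}(n) = 2^n \}$.
     One says that $\mathcal{S}$ is a VC class if $\mathrm{VC}(\mathcal{S}) < \infty$.
     The notion of VC dimension was introduced by Vladimir Vapnik and Alexey Chervonenkis in the early 1970s, in the context of computational learning theory.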
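
A small sketch of the shatter function on a finite set system may help here. It is an illustration only: the function name `shatter_function` and the toy system of initial and final segments of {0,...,5} are assumptions, and the supremum is taken only over the sizes that fit in this finite base set.

```python
from itertools import combinations

def shatter_function(base, system, n):
    """pi_S(n): the largest number of distinct traces S ∩ A over all n-element A."""
    sets = [frozenset(S) for S in system]
    return max(len({S & frozenset(A) for S in sets}) for A in combinations(base, n))

base = range(6)
# Assumed toy system: initial and final segments of {0,...,5}.
rays = [set(range(0, a)) for a in range(7)] + [set(range(a, 6)) for a in range(7)]

values = [shatter_function(base, rays, n) for n in range(7)]
print(values)

# VC dimension read off as the largest n with pi_S(n) = 2^n.
vc = max(n for n, v in enumerate(values) if v == 2 ** n)
print(vc)
```

For this system the values grow linearly once n exceeds 2, which is the kind of polynomial growth the next slide makes precise.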

  7. VC dimension and VC density
     A surprising dichotomy holds for $\pi_{\mathcal{S}}$:
     Lemma (Sauer-Shelah). If $\mathrm{VC}(\mathcal{S}) = d < \infty$ (so $\pi_{\mathcal{S}}(n) < 2^n$ for $n > d$) then
     $\pi_{\mathcal{S}}(n) \leq \binom{n}{\leq d} := \binom{n}{0} + \cdots + \binom{n}{d}$ for every $n$.
     An illuminating proof of this lemma is due to Frankl: it is enough to show that if $\mathcal{S}$ is a set system on a finite set $X$, then
     $\mathcal{S} \cap B \neq 2^B$ for all $B \in \binom{X}{d+1}$ $\implies$ $|\mathcal{S}| \leq \binom{|X|}{\leq d}$.
     This claim is trivially true if $\mathcal{S}$ is assumed to be an ideal (i.e., closed under taking subsets). One then shows that there exists an ideal $\mathcal{T}$ on $X$ with $|\mathcal{S}| = |\mathcal{T}|$ and $|\mathcal{S} \cap B| \geq |\mathcal{T} \cap B|$ for all $B$.
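
The finite form of the claim can be tested exhaustively on a very small base set. The sketch below is an illustration and not Frankl's proof: it enumerates every non-empty set system on a three-element set (an assumed toy case) and counts the systems whose size exceeds the Sauer-Shelah bound. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb

def vc_dimension(base, system):
    """Largest k such that some k-element subset of base is shattered (system non-empty)."""
    best = 0
    for k in range(1, len(base) + 1):
        if any(len({S & frozenset(A) for S in system}) == 2 ** k
               for A in combinations(base, k)):
            best = k
    return best

def binom_up_to(n, d):
    """The sum binom(n, 0) + ... + binom(n, d)."""
    return sum(comb(n, i) for i in range(d + 1))

base = (0, 1, 2)
subsets = [frozenset(c) for k in range(len(base) + 1) for c in combinations(base, k)]

violations = 0
for mask in range(1, 2 ** len(subsets)):  # every non-empty system on base
    system = [s for i, s in enumerate(subsets) if (mask >> i) & 1]
    d = vc_dimension(base, system)
    if len(system) > binom_up_to(len(base), d):
        violations += 1
print(violations)  # the lemma predicts no violations
```

The same loop becomes infeasible quickly, since a base set of size m has 2^(2^m) set systems; it is meant only as a sanity check of the statement.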

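  8. VC dimension and VC density
     The Sauer-Shelah dichotomy. Either
     • $\pi_{\mathcal{S}}(n) = 2^n$ for every $n$ (if $\mathcal{S}$ is not a VC class), or
     • $\pi_{\mathcal{S}}(n) = O(n^d)$ where $d = \mathrm{VC}(\mathcal{S}) < \infty$.
     One may now define the VC density of $\mathcal{S}$ as
     $\mathrm{vc}(\mathcal{S}) = \begin{cases} \inf\{ r \in \mathbb{R}^{>0} : \pi_{\mathcal{S}}(n) = O(n^r) \} & \text{if } \mathrm{VC}(\mathcal{S}) < \infty \\ \infty & \text{otherwise} \end{cases} = \limsup_{n \to \infty} \frac{\log \pi_{\mathcal{S}}(n)}{\log n} \in \mathbb{R}^{\geq 0} \cup \{\infty\}$.
     We also define $\mathrm{vc}(\emptyset) := -\infty$.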
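
As a worked instance of the limsup formula, take the unbounded intervals of R from the earlier examples. The count of traces below is computed here for illustration and does not appear on the slides.

```latex
% S = unbounded intervals of R. On n >= 1 points x_1 < ... < x_n the traces are
% the n+1 initial segments and the n+1 final segments, which share only the
% empty set and the whole set, so
\pi_{\mathcal{S}}(n) = 2(n+1) - 2 = 2n \qquad (n \geq 1),
\qquad
\mathrm{vc}(\mathcal{S})
  = \limsup_{n \to \infty} \frac{\log (2n)}{\log n}
  = \limsup_{n \to \infty} \Bigl( 1 + \frac{\log 2}{\log n} \Bigr)
  = 1 .
```

So in this example the VC density is 1 while the VC dimension is 2: the density only records the polynomial growth rate of the shatter function.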

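  9. VC dimension and VC density
     Examples
     (1) $\mathcal{S} = \binom{X}{\leq d}$. Then $\mathrm{VC}(\mathcal{S}) = \mathrm{vc}(\mathcal{S}) = d$; in fact $\pi_{\mathcal{S}}(n) = \binom{n}{\leq d}$.
     (2) $\mathcal{S}$ = half spaces in $\mathbb{R}^d$. Then $\mathrm{VC}(\mathcal{S}) = d + 1$ [as seen above] and $\mathrm{vc}(\mathcal{S}) = d$.
     Some basic properties
     • $\mathrm{vc}(\mathcal{S}) \leq \mathrm{VC}(\mathcal{S})$, and if one is finite then so is the other;
     • $\mathrm{VC}(\mathcal{S}) = 0 \iff |\mathcal{S}| = 1$;
     • $\mathcal{S}$ is finite $\iff \mathrm{vc}(\mathcal{S}) = 0 \iff \mathrm{vc}(\mathcal{S}) < 1$;
     • $\mathcal{S} = \mathcal{S}_1 \cup \mathcal{S}_2 \implies \mathrm{vc}(\mathcal{S}) = \max\{ \mathrm{vc}(\mathcal{S}_1), \mathrm{vc}(\mathcal{S}_2) \}$.
     (So $\mathrm{vc}(\mathcal{S})$ doesn't change if we alter finitely many sets of $\mathcal{S}$.)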

  10. VC dimension and VC density
      VC density is often the right measure for the combinatorial complexity of a set system. For example, it is related to packing numbers and entropy.
      Definition. Let $(S, d)$ be a bounded pseudo-metric space, and $\varepsilon > 0$.
      (1) $D \subseteq S$ is an $\varepsilon$-packing if $d(a, b) > \varepsilon$ for all $a \neq b$ in $D$;
      (2) the $\varepsilon$-packing number of $(S, d)$ is
          $D(S, d; \varepsilon) := \max\{ |D| : D \subseteq S \text{ is a finite } \varepsilon\text{-packing} \}$;
      (3) the entropic dimension of $(S, d)$ is
          $\dim(S, d) := \inf\{ s \in \mathbb{R}^{>0} : \exists C > 0 : \forall \varepsilon > 0 : D(S, d; \varepsilon) \leq C \varepsilon^{-s} \}$.

  11. VC dimension and VC density
      If $(X, \mathcal{A}, \mu)$ is a probability space, then we equip $\mathcal{A}$ with the (bounded) pseudo-metric
      $d_\mu(A, B) := \mu(A \triangle B)$.
      Theorem (Dudley $\geq$, Assouad $\leq$).
      $\mathrm{vc}(\mathcal{S}) = \sup_\mu \dim(\mathcal{S}, d_\mu)$,
      where the supremum ranges over all probability measures $\mu$ on $X$ making all sets in $\mathcal{S}$ measurable.
      There is a refinement of the inequality $\mathrm{vc}(\mathcal{S}) \geq \dim(\mathcal{S}, d_\mu)$ for $\mu$ concentrated uniformly on a finite set (Haussler, Wernisch):
      $D(\mathcal{S}, d_\mu; \varepsilon) \leq C \varepsilon^{-\varrho}$ for all $\varepsilon > 0$,
      where $C$ only depends on $\pi_{\mathcal{S}}$ (not on $(X, \mathcal{S})$, $\mu$, ...).
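
Packing numbers with respect to the pseudo-metric µ(A △ B) can be computed by brute force when everything is finite. The sketch below assumes µ is the uniform measure on an eight-point base set and takes the initial segments as the set system; both choices, and the function names, are illustrative assumptions rather than anything from the talk.

```python
from fractions import Fraction
from itertools import combinations

def d_mu(A, B, base_size):
    """Pseudo-metric mu(A △ B) for mu uniform on a base set of the given size."""
    return Fraction(len(A ^ B), base_size)

def packing_number(system, eps, base_size):
    """Largest number of sets that are pairwise more than eps apart (brute force)."""
    for k in range(len(system), 0, -1):
        for D in combinations(system, k):
            if all(d_mu(a, b, base_size) > eps for a, b in combinations(D, 2)):
                return k
    return 0

base_size = 8
# Assumed toy system: the initial segments {0,...,a-1} of {0,...,7}.
initial_segments = [frozenset(range(a)) for a in range(base_size + 1)]

for eps in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)):
    print(eps, packing_number(initial_segments, eps, base_size))
```

Exact fractions are used so that the strict inequality in the definition of an ε-packing is not blurred by floating-point rounding. The search is exponential in the number of sets and is only meant for toy inputs.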

  12. VC duality
      Let $X$ be a set (possibly finite). Given $A_1, \dots, A_n \subseteq X$, denote by $S(A_1, \dots, A_n)$ the set of atoms of the Boolean subalgebra of $2^X$ generated by $A_1, \dots, A_n$: those subsets of $X$ of the form
      $\bigcap_{i \in I} A_i \cap \bigcap_{i \in [n] \setminus I} X \setminus A_i$ where $I \subseteq [n] = \{1, \dots, n\}$
      which are non-empty.
      Suppose now that $\mathcal{S}$ is a set system on $X$. We define
      $n \mapsto \pi^*_{\mathcal{S}}(n) := \max\{ |S(A_1, \dots, A_n)| : A_1, \dots, A_n \in \mathcal{S} \} : \mathbb{N} \to \mathbb{N}$.
      We say that $\mathcal{S}$ is independent (in $X$) if $\pi^*_{\mathcal{S}}(n) = 2^n$ for every $n$, and dependent (in $X$) otherwise.
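
The atoms generated by finitely many sets, and hence the dual shatter function of a finite system, can be computed directly from membership patterns. This is a minimal sketch under assumed toy data (initial segments of {0,...,5}); the names `atoms` and `dual_shatter` are hypothetical.

```python
from itertools import combinations_with_replacement

def atoms(base, sets):
    """Non-empty atoms of the Boolean algebra generated by `sets` inside `base`.

    Two points lie in the same atom exactly when they belong to the same sets."""
    classes = {}
    for x in base:
        pattern = tuple(x in A for A in sets)
        classes.setdefault(pattern, set()).add(x)
    return [frozenset(c) for c in classes.values()]

def dual_shatter(base, system, n):
    """pi*_S(n): the largest number of atoms generated by n members of the system."""
    return max(len(atoms(base, choice))
               for choice in combinations_with_replacement(system, n))

base = range(6)
# Assumed toy system: the proper non-empty initial segments of {0,...,5}.
initial_segments = [frozenset(range(a)) for a in range(1, 6)]
print([dual_shatter(base, initial_segments, n) for n in range(4)])
```

Choices with repetition are allowed because the definition ranges over arbitrary A_1, ..., A_n in the system, not only over distinct ones.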

  13. VC duality
      Example ($X = \mathbb{R}^2$, $\mathcal{S}$ = half planes in $\mathbb{R}^2$)
      $\pi^*_{\mathcal{S}}(n)$ = maximum number of regions into which $n$ half planes partition the plane.
      Adding one half plane to $n - 1$ given half planes divides at most $n$ of the existing regions into 2 pieces. So $\pi^*_{\mathcal{S}}(n) = O(n^2)$.
      The function $\pi^*_{\mathcal{S}}$ is called the dual shatter function of $\mathcal{S}$, since (for infinite $\mathcal{S}$) one has $\pi^*_{\mathcal{S}} = \pi_{\mathcal{S}^*}$ for a certain set system $\mathcal{S}^*$ on $X^* = \mathcal{S}$, called the dual of $\mathcal{S}$.
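
The step from the splitting argument to the quadratic bound can be written out as a short recurrence. The abbreviation r(n) is introduced here only for this computation.

```latex
% r(n) := \pi^*_{\mathcal{S}}(n) for half planes in R^2, with r(0) = 1 (the whole plane).
% The n-th half plane splits at most n of the existing regions in two, so
r(n) \leq r(n-1) + n
\quad\Longrightarrow\quad
r(n) \leq 1 + \sum_{k=1}^{n} k = 1 + \frac{n(n+1)}{2} = O(n^2).
```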

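  14. VC duality
      Let $X$, $Y$ be infinite sets, $\Phi \subseteq X \times Y$ a binary relation. Put
      $\mathcal{S}_\Phi := \{ \Phi_y : y \in Y \} \subseteq 2^X$ where $\Phi_y := \{ x \in X : (x, y) \in \Phi \}$,
      and
      $\pi_\Phi := \pi_{\mathcal{S}_\Phi}$, $\pi^*_\Phi := \pi^*_{\mathcal{S}_\Phi}$, $\mathrm{VC}(\Phi) := \mathrm{VC}(\mathcal{S}_\Phi)$, $\mathrm{vc}(\Phi) := \mathrm{vc}(\mathcal{S}_\Phi)$.
      We also write
      $\Phi^* := \{ (y, x) \in Y \times X : (x, y) \in \Phi \} \subseteq Y \times X$.
      In this way we obtain two set systems: $(X, \mathcal{S}_\Phi)$ and $(Y, \mathcal{S}_{\Phi^*})$.
      Given a finite set $A \subseteq X$ we have a bijection
      $A' \mapsto \bigcap_{x \in A'} \Phi^*_x \cap \bigcap_{x \in A \setminus A'} Y \setminus \Phi^*_x \; : \; \mathcal{S}_\Phi \cap A \to S(\Phi^*_x : x \in A)$.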
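
The bijection can be sanity-checked on a finite toy relation by comparing, for every finite A, the number of traces on A with the number of atoms generated in Y. The slide assumes X and Y infinite; the finite sets and the arithmetic rule defining the relation below are arbitrary assumptions made only so the check can run.

```python
from itertools import combinations

X = range(5)
Y = range(7)
# Arbitrary toy relation Phi ⊆ X × Y, chosen only for illustration.
phi = {(x, y) for x in X for y in Y if (x * y + x + y) % 3 == 0}

def trace_size(A):
    """|S_Phi ∩ A|: number of distinct sets Phi_y ∩ A as y ranges over Y."""
    return len({frozenset(x for x in A if (x, y) in phi) for y in Y})

def atom_count(A):
    """|S(Phi*_x : x in A)|: number of non-empty atoms generated in Y."""
    atoms = {}
    for y in Y:
        pattern = tuple((x, y) in phi for x in A)  # which Phi*_x contain y
        atoms.setdefault(pattern, []).append(y)
    return len(atoms)

all_equal = all(trace_size(A) == atom_count(A)
                for n in range(len(X) + 1)
                for A in combinations(X, n))
print(all_equal)
```

Both counts are the number of distinct membership patterns of the points of Y against the elements of A, which is exactly what the bijection expresses.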

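  15. VC duality
      Hence $\pi_\Phi = \pi^*_{\Phi^*}$ and $\pi_{\Phi^*} = \pi^*_\Phi$, and thus
      $\mathcal{S}_\Phi$ is a VC class $\iff$ $\mathcal{S}_{\Phi^*}$ is dependent,
      $\mathcal{S}_{\Phi^*}$ is a VC class $\iff$ $\mathcal{S}_\Phi$ is dependent.
      Moreover (first noticed by Assouad):
      $\mathcal{S}_\Phi$ is a VC class $\iff$ $\mathcal{S}_{\Phi^*}$ is a VC class.
      Exploiting this VC duality one easily shows:
      $\mathrm{vc}(\neg\Phi) = \mathrm{vc}(\Phi)$,
      $\mathrm{vc}(\Phi \cup \Psi) \leq \mathrm{vc}(\Phi) + \mathrm{vc}(\Psi)$,
      $\mathrm{vc}(\Phi \cap \Psi) \leq \mathrm{vc}(\Phi) + \mathrm{vc}(\Psi)$.
      VC does not satisfy similar subadditivity properties.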

  16. The model-theoretic context
      We fix:
      • $L$: a first-order language,
      • $x = (x_1, \dots, x_m)$: object variables,
      • $y = (y_1, \dots, y_n)$: parameter variables,
      • $\varphi(x; y)$: a partitioned $L$-formula,
      • $M$: an infinite $L$-structure, and
      • $T$: a complete $L$-theory without finite models.
      The set system (on $M^m$) associated with $\varphi$ in $M$:
      $\mathcal{S}^M_\varphi := \{ \varphi^M(M^m; b) : b \in M^n \}$.
      If $M \equiv N$, then $\pi_{\mathcal{S}^M_\varphi} = \pi_{\mathcal{S}^N_\varphi}$. So, picking $M \models T$ arbitrary, set
      $\pi_\varphi := \pi_{\mathcal{S}^M_\varphi}$, $\mathrm{VC}(\varphi) := \mathrm{VC}(\mathcal{S}^M_\varphi)$, $\mathrm{vc}(\varphi) := \mathrm{vc}(\mathcal{S}^M_\varphi)$.

  17. The model-theoretic context
      The dual of $\varphi(x; y)$ is $\varphi^*(y; x) := \varphi(x; y)$. Put
      $\mathrm{VC}^*(\varphi) := \mathrm{VC}(\varphi^*)$, $\mathrm{vc}^*(\varphi) := \mathrm{vc}(\varphi^*)$.
      We have $\pi^*_\varphi = \pi_{\varphi^*}$, hence $\mathrm{VC}^*(\varphi)$ and $\mathrm{vc}^*(\varphi)$ can be computed using the dual shatter function of $\varphi$.
      If $\mathrm{VC}(\varphi) < \infty$ then we say that $\varphi$ is dependent in $T$. The theory $T$ does not have the independence property (is NIP) if every partitioned $L$-formula is dependent in $T$.
      An important theorem of Shelah (given other proofs by Laskowski and others) says that for $T$ to be NIP it is enough for every $L$-formula $\varphi(x; y)$ with $|x| = 1$ to be dependent.
      Many (but not all) well-behaved theories arising naturally in model theory are NIP.

  18. The model-theoretic context
      Some questions about vc in model theory
      (1) Possible values of $\mathrm{vc}(\varphi)$. There exists a formula $\varphi(x; y)$ in $L_{\mathrm{rings}}$ with $|y| = 4$ such that
          $\mathrm{vc}^{\mathrm{ACF}_0}(\varphi) = \frac{4}{3}$; $\mathrm{vc}^{\mathrm{ACF}_p}(\varphi) = \frac{3}{2}$ for $p > 0$.
          We do not know an example of a formula $\varphi$ in a NIP theory with $\mathrm{vc}(\varphi) \notin \mathbb{Q}$.
      (2) Growth of $\pi_\varphi$. There is an example of an $\omega$-stable $T$ and an $L$-formula $\varphi(x; y)$ with $|y| = 2$ and
          $\pi_\varphi(n) = \frac{1}{2} n \log n \, (1 + o(1))$.
      (3) Uniform bounds on $\mathrm{vc}(\varphi)$. The topic of the rest of the talk.

  19. Uniform bounds on VC density
      Two extrinsic reasons why it should be interesting to obtain bounds on $\mathrm{vc}(\varphi)$ in terms of $|y|$ = number of free parameters:
      (1) Connections to strengthenings of the NIP concept: if $\mathrm{vc}(\varphi) < 2$ for each $\varphi(x; y)$ with $|y| = 1$ then $T$ is dp-minimal;
      (2) uniform bounds on VC density often “explain” why certain well-known bounds on the complexity of geometric arrangements, used in computational geometry, are polynomial in the number of objects involved.
      Example ($L$ = language of rings, $K \models \mathrm{ACF}$)
      Choose $\varphi(x; y)$ so that $\mathcal{S}^K_\varphi$ is the collection of all zero sets (in $K^m$) of polynomials in $m$ indeterminates with coefficients in $K$ having degree at most $d$. Hence $\pi^*_\varphi(t)$ is the maximum number of non-empty Boolean combinations of $t$ such hypersurfaces. Then
      $\pi^*_\varphi(t) = \pi_{\varphi^*}(t) = O(t^m)$.
