Vapnik-Chervonenkis Density in Model Theory Matthias Aschenbrenner University of California, Los Angeles (joint with A. Dolich, D. Haskell, D. Macpherson, and S. Starchenko)
Outline • VC dimension and VC density • VC duality • The model-theoretic context • Uniform bounds on VC density
VC dimension and VC density
Let $(X, \mathcal{S})$ be a set system:
• $X$ is a set (the base set), most of the time assumed infinite;
• $\mathcal{S}$ is a collection of subsets of $X$.
We sometimes also speak of a set system $\mathcal{S}$ on $X$.
Given $A \subseteq X$, we let $\mathcal{S} \cap A := \{ S \cap A : S \in \mathcal{S} \}$ and call $(A, \mathcal{S} \cap A)$ the set system on $A$ induced by $\mathcal{S}$. We say $A$ is shattered by $\mathcal{S}$ if $\mathcal{S} \cap A = 2^A$.
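VC dimension and VC density
If $\mathcal{S} \neq \emptyset$, then we define the VC dimension of $\mathcal{S}$, denoted by $\mathrm{VC}(\mathcal{S})$, as the supremum (in $\mathbb{N} \cup \{\infty\}$) of the sizes of all finite subsets of $X$ shattered by $\mathcal{S}$. We also decree $\mathrm{VC}(\emptyset) := -\infty$.
Examples
1 $X = \mathbb{R}$, $\mathcal{S}$ = all unbounded intervals. Then $\mathrm{VC}(\mathcal{S}) = 2$.
2 $X = \mathbb{R}^2$, $\mathcal{S}$ = all half spaces. Then $\mathrm{VC}(\mathcal{S}) = 3$.
[Figure: two panels, captioned "One point in the convex hull of the others" and "No point in the convex hull of the others".]
3 Let $\mathcal{S}$ = half spaces in $\mathbb{R}^d$. Then $\mathrm{VC}(\mathcal{S}) = d + 1$. (The inequality $\le$ follows from Radon's Lemma.)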
VC dimension and VC density
Examples (continued)
4 $X = \mathbb{R}^2$, $\mathcal{S}$ = all convex polygons. Then $\mathrm{VC}(\mathcal{S}) = \infty$. (But $\mathrm{VC}(\{\text{convex } n\text{-gons in } \mathbb{R}^2\}) = 2n + 1$.)
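The definitions of shattering and VC dimension can be checked by brute force on small finite set systems. The sketch below is only an illustration: the function names (`trace`, `is_shattered`, `vc_dimension`) are hypothetical, and the test system is the trace of the unbounded intervals of Example 1 on five points of the real line, which is an assumption made to keep everything finite.

```python
from itertools import combinations

def trace(system, subset):
    """Return the induced system S ∩ A as a set of frozensets."""
    a = frozenset(subset)
    return {frozenset(s) & a for s in system}

def is_shattered(system, subset):
    """A is shattered when S ∩ A consists of all 2^|A| subsets of A."""
    return len(trace(system, subset)) == 2 ** len(set(subset))

def vc_dimension(base, system):
    """Largest size of a shattered subset of a finite base set (brute force)."""
    if not system:
        return float("-inf")  # convention VC(empty system) = -infinity
    d = 0
    for k in range(1, len(base) + 1):
        if any(is_shattered(system, a) for a in combinations(base, k)):
            d = k
        else:
            break  # subsets of shattered sets are shattered, so larger k cannot succeed
    return d

# Traces of unbounded intervals on the points 0..4 are exactly the
# prefixes and suffixes of the list of points (assumed finite model).
points = list(range(5))
unbounded = [set(points[:i]) for i in range(6)] + [set(points[i:]) for i in range(6)]
print(vc_dimension(points, unbounded))
```

The search is exponential in the size of the base set, so it is only meant for toy examples.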
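VC dimension and VC density
The function
$$n \mapsto \pi_{\mathcal{S}}(n) := \max\left\{ |\mathcal{S} \cap A| : A \in \binom{X}{n} \right\} \colon \mathbb{N} \to \mathbb{N}$$
is called the shatter function of $\mathcal{S}$; here $\binom{X}{n}$ denotes the set of $n$-element subsets of $X$. Then
$$\mathrm{VC}(\mathcal{S}) = \sup\{ n : \pi_{\mathcal{S}}(n) = 2^n \}.$$
One says that $\mathcal{S}$ is a VC class if $\mathrm{VC}(\mathcal{S}) < \infty$.
The notion of VC dimension was introduced by Vladimir Vapnik and Alexey Chervonenkis in the early 1970s, in the context of computational learning theory.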
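As a small illustration of the shatter function, the following sketch computes $\pi_{\mathcal{S}}(n)$ by brute force for a finite set system. The helper name `shatter_function` and the test system (all intervals of consecutive points among six points) are assumptions chosen for the example, not part of the talk.

```python
from itertools import combinations

def shatter_function(base, system, n):
    """pi_S(n): the largest number of distinct traces S ∩ A over n-element A."""
    sets = [frozenset(s) for s in system]
    return max(
        len({s & frozenset(a) for s in sets})
        for a in combinations(base, n)
    )

# All intervals of consecutive points among 0..5, including the empty one.
base = list(range(6))
intervals = [set(range(i, j)) for i in range(7) for j in range(i, 7)]

values = [shatter_function(base, intervals, n) for n in range(5)]
# VC dimension recovered as the largest n with pi_S(n) = 2^n.
vc = max(n for n in range(5) if values[n] == 2 ** n)
print(values, vc)
```

For this system the values follow the pattern $1 + \binom{n+1}{2}$, so the shatter function grows quadratically while $2^n$ is attained only for $n \le 2$.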
VC dimension and VC density
A surprising dichotomy holds for $\pi_{\mathcal{S}}$:
Lemma (Sauer-Shelah)
If $\mathrm{VC}(\mathcal{S}) = d < \infty$ (so $\pi_{\mathcal{S}}(n) < 2^n$ for $n > d$) then
$$\pi_{\mathcal{S}}(n) \le \binom{n}{\le d} := \binom{n}{0} + \cdots + \binom{n}{d} \quad \text{for every } n.$$
An illuminating proof of this lemma is due to Frankl: it is enough to show that if $\mathcal{S}$ is a set system on a finite set $X$, then
$$\mathcal{S} \cap B \neq 2^B \text{ for all } B \in \binom{X}{d+1} \quad \Longrightarrow \quad |\mathcal{S}| \le \binom{|X|}{\le d}.$$
This claim is trivially true if $\mathcal{S}$ is assumed to be an ideal (i.e., closed under taking subsets). One then shows that there exists an ideal $\mathcal{T}$ on $X$ with $|\mathcal{S}| = |\mathcal{T}|$ and $|\mathcal{S} \cap B| \ge |\mathcal{T} \cap B|$ for all $B$.
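One standard way to produce such an ideal is repeated "down-shifting"; whether this matches the construction intended in the talk is an assumption. The sketch below (all function names hypothetical) shifts a finite set system until it is closed under subsets, then checks on an example that the size is preserved and that no trace grows.

```python
from itertools import combinations

def down_shift(fam, x):
    """Remove x from every member whose x-free copy is not already present."""
    out = set()
    for s in fam:
        t = s - {x}
        out.add(t if (x in s and t not in fam) else s)
    return out

def compress_to_ideal(base, system):
    """Apply down-shifts until nothing changes; the result is closed under subsets."""
    fam = {frozenset(s) for s in system}
    changed = True
    while changed:
        changed = False
        for x in base:
            new = down_shift(fam, x)
            if new != fam:
                fam, changed = new, True
    return fam

def is_ideal(fam):
    """Closed under removing one element, hence under taking subsets."""
    return all(s - {x} in fam for s in fam for x in s)

def trace_size(fam, b):
    """|fam ∩ B|: number of distinct traces on B."""
    b = frozenset(b)
    return len({s & b for s in fam})

# Example: intervals of consecutive points among 0..4 (VC dimension 2).
base = list(range(5))
system = {frozenset(range(i, j)) for i in range(6) for j in range(i, 6)}
ideal = compress_to_ideal(base, system)
print(len(system), len(ideal), max(len(s) for s in ideal))
```

Since an ideal shatters each of its members, an ideal that shatters no set of size $d+1$ contains only sets of size at most $d$, which gives the bound $\binom{|X|}{\le d}$ at once.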
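VC dimension and VC density
The Sauer-Shelah dichotomy
Either
• $\pi_{\mathcal{S}}(n) = 2^n$ for every $n$ (if $\mathcal{S}$ is not a VC class), or
• $\pi_{\mathcal{S}}(n) = O(n^d)$ where $d = \mathrm{VC}(\mathcal{S}) < \infty$.
One may now define the VC density of $\mathcal{S}$ as
$$\mathrm{vc}(\mathcal{S}) = \begin{cases} \inf\{ r \in \mathbb{R}^{>0} : \pi_{\mathcal{S}}(n) = O(n^r) \} & \text{if } \mathrm{VC}(\mathcal{S}) < \infty \\ \infty & \text{otherwise} \end{cases} \;=\; \limsup_{n \to \infty} \frac{\log \pi_{\mathcal{S}}(n)}{\log n} \in \mathbb{R}^{\ge 0} \cup \{\infty\}.$$
We also define $\mathrm{vc}(\emptyset) := -\infty$.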
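As a worked instance of these definitions, take $\mathcal{S}$ to be the unbounded intervals of $\mathbb{R}$. On $n \ge 1$ points the traces are exactly the initial and final segments, which share only the empty set and the whole set, so (as a short computation under that description):

```latex
\[
  \pi_{\mathcal{S}}(n) = 2n \quad (n \ge 1), \qquad
  \mathrm{VC}(\mathcal{S}) = \sup\{\, n \ge 1 : 2n = 2^n \,\} = 2, \qquad
  \mathrm{vc}(\mathcal{S}) = \limsup_{n \to \infty} \frac{\log (2n)}{\log n} = 1 .
\]
```

So this system has VC dimension 2 but VC density 1, which shows that the two invariants can differ.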
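VC dimension and VC density
Examples
1 $\mathcal{S} = \binom{X}{\le d}$, the collection of all subsets of $X$ of size at most $d$. Then $\mathrm{VC}(\mathcal{S}) = \mathrm{vc}(\mathcal{S}) = d$; in fact $\pi_{\mathcal{S}}(n) = \binom{n}{\le d}$.
2 $\mathcal{S}$ = half spaces in $\mathbb{R}^d$. Then $\mathrm{VC}(\mathcal{S}) = d + 1$ [as seen above] and $\mathrm{vc}(\mathcal{S}) = d$.
Some basic properties
• $\mathrm{vc}(\mathcal{S}) \le \mathrm{VC}(\mathcal{S})$, and if one is finite then so is the other;
• $\mathrm{VC}(\mathcal{S}) = 0 \iff |\mathcal{S}| = 1$;
• $\mathcal{S}$ is finite $\iff \mathrm{vc}(\mathcal{S}) = 0 \iff \mathrm{vc}(\mathcal{S}) < 1$;
• $\mathcal{S} = \mathcal{S}_1 \cup \mathcal{S}_2 \Rightarrow \mathrm{vc}(\mathcal{S}) = \max\{ \mathrm{vc}(\mathcal{S}_1), \mathrm{vc}(\mathcal{S}_2) \}$.
(So $\mathrm{vc}(\mathcal{S})$ doesn't change if we alter finitely many sets of $\mathcal{S}$.)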
VC dimension and VC density
VC density is often the right measure for the combinatorial complexity of a set system. For example, it is related to packing numbers and entropy.
Definition
Let $(S, d)$ be a bounded pseudo-metric space, and $\varepsilon > 0$.
1 $D \subseteq S$ is an $\varepsilon$-packing if $d(a, b) > \varepsilon$ for all $a \neq b$ in $D$;
2 the $\varepsilon$-packing number of $(S, d)$ is
$$D(S, d; \varepsilon) := \max\{ |D| : D \subseteq S \text{ is a finite } \varepsilon\text{-packing} \};$$
3 the entropic dimension of $(S, d)$ is
$$\dim(S, d) := \inf\{ s \in \mathbb{R}^{>0} : \exists C > 0 \; \forall \varepsilon > 0 : D(S, d; \varepsilon) \le C \varepsilon^{-s} \}.$$
VC dimension and VC density
If $(X, \mathcal{A}, \mu)$ is a probability space, then we equip $\mathcal{A}$ with the (bounded) pseudo-metric
$$d_\mu(A, B) := \mu(A \mathbin{\triangle} B).$$
Theorem (Dudley $\ge$, Assouad $\le$)
$$\mathrm{vc}(\mathcal{S}) = \sup_\mu \dim(\mathcal{S}, d_\mu),$$
where the supremum ranges over all probability measures $\mu$ on $X$ making all sets in $\mathcal{S}$ measurable.
There is a refinement of the inequality $\mathrm{vc}(\mathcal{S}) \ge \dim(\mathcal{S}, d_\mu)$ for $\mu$ concentrated uniformly on a finite set (Haussler, Wernisch):
$$D(\mathcal{S}, d_\mu; \varepsilon) \le C \varepsilon^{-\varrho} \quad \text{for all } \varepsilon > 0,$$
where $C$ only depends on $\pi_{\mathcal{S}}$ (not on $(X, \mathcal{S})$, $\mu$, ...).
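To make the packing numbers concrete, here is a brute-force sketch for a finite set system with $\mu$ the uniform probability measure on a finite base set. The function names and the example system (initial segments of eight points) are assumptions for illustration; the exhaustive search is exponential and only suitable for very small systems.

```python
from fractions import Fraction
from itertools import combinations

def d_mu(a, b, base_size):
    """mu(A symmetric-difference B) for mu uniform on a base set of the given size."""
    return Fraction(len(a ^ b), base_size)

def packing_number(system, base_size, eps):
    """Largest D inside the system with d_mu(a, b) > eps for all distinct a, b in D."""
    sets = list({frozenset(s) for s in system})
    for k in range(len(sets), 0, -1):
        for cand in combinations(sets, k):
            if all(d_mu(a, b, base_size) > eps for a, b in combinations(cand, 2)):
                return k
    return 0

# Initial segments {0, ..., i-1} of the eight points 0..7 (nine sets in total).
prefixes = [set(range(i)) for i in range(9)]
for eps in (Fraction(1, 4), Fraction(1, 5), Fraction(1, 10)):
    print(eps, packing_number(prefixes, 8, eps))
```

In this example two initial segments are at distance $|i - j|/8$, so the packing numbers grow roughly like $\varepsilon^{-1}$ as $\varepsilon$ shrinks, in line with a set system of VC density 1.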
VC duality
Let $X$ be a set (possibly finite). Given $A_1, \dots, A_n \subseteq X$, denote by $S(A_1, \dots, A_n)$ the set of atoms of the Boolean subalgebra of $2^X$ generated by $A_1, \dots, A_n$: those subsets of $X$ of the form
$$\bigcap_{i \in I} A_i \;\cap \bigcap_{i \in [n] \setminus I} X \setminus A_i \qquad \text{where } I \subseteq [n] = \{1, \dots, n\}$$
which are non-empty.
Suppose now that $\mathcal{S}$ is a set system on $X$. We define
$$n \mapsto \pi^*_{\mathcal{S}}(n) := \max\{ |S(A_1, \dots, A_n)| : A_1, \dots, A_n \in \mathcal{S} \} \colon \mathbb{N} \to \mathbb{N}.$$
We say that $\mathcal{S}$ is independent (in $X$) if $\pi^*_{\mathcal{S}}(n) = 2^n$ for every $n$, and dependent (in $X$) otherwise.
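The atoms can be computed by grouping the points of $X$ according to which of the sets $A_i$ contain them. The sketch below does this for a finite base set; the names `atoms` and `dual_shatter_function` are hypothetical, and the maximum is taken over distinct sets only, which assumes the system has at least $n$ members (repeating a set never creates new atoms).

```python
from itertools import combinations

def atoms(base, sets):
    """Non-empty atoms of the Boolean algebra generated by `sets` inside 2^base."""
    classes = {}
    for x in base:
        signature = tuple(x in a for a in sets)  # records I = {i : x in A_i}
        classes.setdefault(signature, set()).add(x)
    return [frozenset(c) for c in classes.values()]

def dual_shatter_function(base, system, n):
    """pi*_S(n): the largest number of atoms generated by n members of the system."""
    sets = [frozenset(s) for s in system]
    return max(len(atoms(base, choice)) for choice in combinations(sets, n))

# Example: initial segments of the ten points 0..9 (a finite stand-in for half-lines).
base = list(range(10))
half_lines = [set(range(i)) for i in range(11)]
dual_values = [dual_shatter_function(base, half_lines, n) for n in range(4)]
print(dual_values)
```

Here $n$ nested sets cut the base set into at most $n + 1$ pieces, so this system is dependent.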
VC duality
Example ($X = \mathbb{R}^2$, $\mathcal{S}$ = half planes in $\mathbb{R}^2$)
$\pi^*_{\mathcal{S}}(n)$ = maximum number of regions into which $n$ half planes partition the plane.
Adding one half plane to $n - 1$ given half planes divides at most $n$ of the existing regions into 2 pieces. So $\pi^*_{\mathcal{S}}(n) = O(n^2)$.
The function $\pi^*_{\mathcal{S}}$ is called the dual shatter function of $\mathcal{S}$, since (for infinite $\mathcal{S}$) one has $\pi^*_{\mathcal{S}} = \pi_{\mathcal{S}^*}$ for a certain set system $\mathcal{S}^*$ on $X^* = \mathcal{S}$, called the dual of $\mathcal{S}$.
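The quadratic bound follows by unwinding the counting step in the example. Starting from a single region when no half plane is present, the recurrence can be summed as follows:

```latex
\[
  \pi^*_{\mathcal{S}}(0) = 1, \qquad
  \pi^*_{\mathcal{S}}(n) \le \pi^*_{\mathcal{S}}(n-1) + n
  \quad\Longrightarrow\quad
  \pi^*_{\mathcal{S}}(n) \le 1 + \sum_{k=1}^{n} k = 1 + \frac{n(n+1)}{2} = O(n^2).
\]
```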
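VC duality
Let $X$, $Y$ be infinite sets, $\Phi \subseteq X \times Y$ a binary relation. Put
$$\mathcal{S}_\Phi := \{ \Phi_y : y \in Y \} \subseteq 2^X \qquad \text{where } \Phi_y := \{ x \in X : (x, y) \in \Phi \},$$
and
$$\pi_\Phi := \pi_{\mathcal{S}_\Phi}, \quad \pi^*_\Phi := \pi^*_{\mathcal{S}_\Phi}, \quad \mathrm{VC}(\Phi) := \mathrm{VC}(\mathcal{S}_\Phi), \quad \mathrm{vc}(\Phi) := \mathrm{vc}(\mathcal{S}_\Phi).$$
We also write
$$\Phi^* \subseteq Y \times X := \{ (y, x) \in Y \times X : (x, y) \in \Phi \}.$$
In this way we obtain two set systems: $(X, \mathcal{S}_\Phi)$ and $(Y, \mathcal{S}_{\Phi^*})$.
Given a finite set $A \subseteq X$ we have a bijection
$$A' \mapsto \bigcap_{x \in A'} \Phi^*_x \;\cap \bigcap_{x \in A \setminus A'} Y \setminus \Phi^*_x \;\colon\; \mathcal{S}_\Phi \cap A \to S(\Phi^*_x : x \in A).$$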
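The bijection can be observed numerically on a finite relation: both sides classify the elements $y$ by the set of $x \in A$ related to them. The sketch below uses a finite stand-in (divisibility between small integers), which is an assumption since the slide works with infinite $X$ and $Y$; the function names are hypothetical.

```python
from itertools import combinations

def trace_count(phi, a, ys):
    """|S_Phi ∩ A|: number of distinct traces Phi_y ∩ A as y ranges over Y."""
    return len({frozenset(x for x in a if (x, y) in phi) for y in ys})

def atom_count(phi, a, ys):
    """Number of non-empty atoms generated in Y by the sets Phi*_x = {y : (x, y) in Phi}, x in A."""
    dual_sets = [frozenset(y for y in ys if (x, y) in phi) for x in a]
    return len({tuple(y in s for s in dual_sets) for y in ys})

# Finite example relation: (x, y) is in phi when x divides y.
xs = list(range(1, 7))
ys = list(range(1, 13))
phi = {(x, y) for x in xs for y in ys if y % x == 0}

agree = all(
    trace_count(phi, a, ys) == atom_count(phi, a, ys)
    for k in range(4)
    for a in combinations(xs, k)
)
print(agree, trace_count(phi, (2, 3), ys))
```

Taking the maximum over all $A$ of a given size on both sides is what yields the identity between the shatter function of $\Phi$ and the dual shatter function of $\Phi^*$.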
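VC duality
Hence $\pi_\Phi = \pi^*_{\Phi^*}$ and $\pi_{\Phi^*} = \pi^*_\Phi$, and thus
$\mathcal{S}_\Phi$ is a VC class $\iff$ $\mathcal{S}_{\Phi^*}$ is dependent,
$\mathcal{S}_{\Phi^*}$ is a VC class $\iff$ $\mathcal{S}_\Phi$ is dependent.
Moreover (first noticed by Assouad):
$\mathcal{S}_\Phi$ is a VC class $\iff$ $\mathcal{S}_{\Phi^*}$ is a VC class.
Exploiting this VC duality one easily shows:
$$\mathrm{vc}(\neg\Phi) = \mathrm{vc}(\Phi), \qquad \mathrm{vc}(\Phi \cup \Psi) \le \mathrm{vc}(\Phi) + \mathrm{vc}(\Psi), \qquad \mathrm{vc}(\Phi \cap \Psi) \le \mathrm{vc}(\Phi) + \mathrm{vc}(\Psi).$$
VC does not satisfy similar subadditivity properties.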
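The model-theoretic context
We fix:
• $\mathcal{L}$: a first-order language,
• $x = (x_1, \dots, x_m)$: object variables,
• $y = (y_1, \dots, y_n)$: parameter variables,
• $\varphi(x; y)$: a partitioned $\mathcal{L}$-formula,
• $M$: an infinite $\mathcal{L}$-structure, and
• $T$: a complete $\mathcal{L}$-theory without finite models.
The set system (on $M^m$) associated with $\varphi$ in $M$:
$$\mathcal{S}^M_\varphi := \{ \varphi^M(M^m; b) : b \in M^n \}.$$
If $M \equiv N$, then $\pi_{\mathcal{S}^M_\varphi} = \pi_{\mathcal{S}^N_\varphi}$. So, picking $M \models T$ arbitrary, set
$$\pi_\varphi := \pi_{\mathcal{S}^M_\varphi}, \qquad \mathrm{VC}(\varphi) := \mathrm{VC}(\mathcal{S}^M_\varphi), \qquad \mathrm{vc}(\varphi) := \mathrm{vc}(\mathcal{S}^M_\varphi).$$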
The model-theoretic context
The dual of $\varphi(x; y)$ is $\varphi^*(y; x) := \varphi(x; y)$. Put
$$\mathrm{VC}^*(\varphi) := \mathrm{VC}(\varphi^*), \qquad \mathrm{vc}^*(\varphi) := \mathrm{vc}(\varphi^*).$$
We have $\pi^*_\varphi = \pi_{\varphi^*}$, hence $\mathrm{VC}^*(\varphi)$ and $\mathrm{vc}^*(\varphi)$ can be computed using the dual shatter function of $\varphi$.
If $\mathrm{VC}(\varphi) < \infty$ then we say that $\varphi$ is dependent in $T$. The theory $T$ does not have the independence property (is NIP) if every partitioned $\mathcal{L}$-formula is dependent in $T$.
An important theorem of Shelah (given other proofs by Laskowski and others) says that for $T$ to be NIP it is enough for every $\mathcal{L}$-formula $\varphi(x; y)$ with $|x| = 1$ to be dependent.
Many (but not all) well-behaved theories arising naturally in model theory are NIP.
The model-theoretic context
Some questions about vc in model theory
1 Possible values of $\mathrm{vc}(\varphi)$. There exists a formula $\varphi(x; y)$ in $\mathcal{L}_{\text{rings}}$ with $|y| = 4$ such that
$$\mathrm{vc}^{\mathrm{ACF}_0}(\varphi) = \tfrac{4}{3}; \qquad \mathrm{vc}^{\mathrm{ACF}_p}(\varphi) = \tfrac{3}{2} \text{ for } p > 0.$$
We do not know an example of a formula $\varphi$ in a NIP theory with $\mathrm{vc}(\varphi) \notin \mathbb{Q}$.
2 Growth of $\pi_\varphi$. There is an example of an $\omega$-stable $T$ and an $\mathcal{L}$-formula $\varphi(x; y)$ with $|y| = 2$ and
$$\pi_\varphi(n) = \tfrac{1}{2}\, n \log n \, (1 + o(1)).$$
3 Uniform bounds on $\mathrm{vc}(\varphi)$. The topic of the rest of the talk.
Uniform bounds on VC density
Two extrinsic reasons why it should be interesting to obtain bounds on $\mathrm{vc}(\varphi)$ in terms of $|y|$ = number of free parameters:
1 Connections to strengthenings of the NIP concept: if $\mathrm{vc}(\varphi) < 2$ for each $\varphi(x; y)$ with $|y| = 1$ then $T$ is dp-minimal;
2 uniform bounds on VC density often "explain" why certain well-known bounds on the complexity of geometric arrangements, used in computational geometry, are polynomial in the number of objects involved.
Example ($\mathcal{L}$ = language of rings, $K \models \mathrm{ACF}$)
Choose $\varphi(x; y)$ so that $\mathcal{S}^K_\varphi$ is the collection of all zero sets (in $K^m$) of polynomials in $m$ indeterminates with coefficients in $K$ having degree at most $d$. Hence $\pi^*_\varphi(t)$ is the maximum number of non-empty Boolean combinations of $t$ such hypersurfaces. Then
$$\pi^*_\varphi(t) = \pi_{\varphi^*}(t) = O(t^m).$$