Graph-Based Discriminators: Sample Complexity and Expressiveness
Roi Livni and Yishay Mansour
Discrimination
• A discriminator is given two data sets, one drawn from $p_1$ and one drawn from $p_2$.
• It must decide whether $p_1$ and $p_2$ are different.
• If they are, it must provide a certificate (a test on which they differ).
Motivation: Synthetic Data Generation
Goodfellow et al. '14
https://thispersondoesnotexist.com/
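Discrimination: Learning Lens
• A learner is defined by a class $H \subseteq \{0,1\}^X$.
• Given a labelled sample from some distribution $D$ over $X \times \{0,1\}$, the learner returns $h \in H$ such that $\Pr_{(x,y)\sim D}[h(x) \neq y] \le \min_{h' \in H} \Pr_{(x,y)\sim D}[h'(x) \neq y] + \epsilon$.
• If $\sup_{h \in H} \left( \mathbb{E}_{x \sim p_1} h(x) - \mathbb{E}_{x \sim p_2} h(x) \right) > \epsilon$, the learner succeeds in telling $p_1$ from $p_2$.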
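One way to see why the learner succeeds, assuming the labelled sample is a balanced mixture (with probability 1/2 draw $x \sim p_1$ labelled 1, otherwise $x \sim p_2$ labelled 0): since $h$ is 0/1-valued, $\Pr[h(x) \neq y] = \tfrac12(1 - \mathbb{E}_{p_1} h) + \tfrac12 \mathbb{E}_{p_2} h = \tfrac12 - \tfrac12(\mathbb{E}_{p_1} h - \mathbb{E}_{p_2} h)$, so minimizing the error is the same as maximizing the gap. The sketch below checks this identity exactly on small finitely supported distributions; the threshold class and all function names are hypothetical choices for illustration.

```python
from fractions import Fraction


def expected(h, p):
    """E_{x~p}[h(x)] for a finitely supported p given as {point: probability}."""
    return sum(w * h(x) for x, w in p.items())


def gap(h, p1, p2):
    """E_{x~p1} h(x) - E_{x~p2} h(x)."""
    return expected(h, p1) - expected(h, p2)


def balanced_error(h, p1, p2):
    """Pr[h(x) != y] on the balanced mixture: w.p. 1/2 draw x~p1 with y=1,
    otherwise draw x~p2 with y=0."""
    miss_p1 = sum(w for x, w in p1.items() if h(x) != 1)
    miss_p2 = sum(w for x, w in p2.items() if h(x) != 0)
    return Fraction(1, 2) * (miss_p1 + miss_p2)


def threshold(t):
    """Hypothetical class member h_t(x) = 1[x >= t]."""
    return lambda x: 1 if x >= t else 0


p1 = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
p2 = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
H = [threshold(t) for t in range(4)]

# The identity error(h) = 1/2 - gap(h)/2 holds for every h in H.
for h in H:
    assert balanced_error(h, p1, p2) == Fraction(1, 2) - gap(h, p1, p2) / 2

# Hence the error minimizer is also a gap maximizer.
best = min(H, key=lambda h: balanced_error(h, p1, p2))
print(gap(best, p1, p2))  # 1/4
```

With unbalanced classes the error is still affine in the gap, but the constants change, so the balanced mixture keeps the link simplest.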
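Learning as a discrimination task
• A discriminator is defined by a class of distinguishers $H \subseteq \{0,1\}^X$.
Integral Probability Metric (Müller '97):
$$\mathrm{IPM}_H(p_1, p_2) = \sup_{h \in H} \left| \mathbb{E}_{x \sim p_1} h(x) - \mathbb{E}_{x \sim p_2} h(x) \right|$$
• If $\mathrm{IPM}_H(p_1, p_2) > \epsilon$, return $h \in H$ with $\left| \mathbb{E}_{x \sim p_1} h(x) - \mathbb{E}_{x \sim p_2} h(x) \right| > \epsilon/2$.
• If not, the discriminator may fail (return EQUIVALENT).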
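A minimal sketch of the discriminator on this slide, assuming a finite class $H$ and replacing expectations by sample averages: it returns the empirically best distinguisher if its gap exceeds $\epsilon/2$, else EQUIVALENT. The function names, the grid of thresholds, and the sample sizes are hypothetical illustration choices, not the construction from the talk.

```python
import random


def empirical_mean(h, sample):
    """Average of h over a finite sample."""
    return sum(h(x) for x in sample) / len(sample)


def discriminate(H, s1, s2, eps):
    """Return the h in a finite class H with the largest empirical gap if that
    gap exceeds eps/2; otherwise return "EQUIVALENT"."""
    best_h, best_gap = None, -1.0
    for h in H:
        g = abs(empirical_mean(h, s1) - empirical_mean(h, s2))
        if g > best_gap:
            best_h, best_gap = h, g
    return best_h if best_gap > eps / 2 else "EQUIVALENT"


def threshold(t):
    """Hypothetical distinguisher h_t(x) = 1[x >= t] on the real line."""
    return lambda x: 1 if x >= t else 0


rng = random.Random(0)
H = [threshold(k / 10) for k in range(11)]
uniform = [rng.random() for _ in range(5000)]                  # p1 = U[0, 1]
upper_half = [0.5 + 0.5 * rng.random() for _ in range(5000)]   # p2 = U[0.5, 1]
uniform_again = [rng.random() for _ in range(5000)]            # fresh sample from U[0, 1]

found = discriminate(H, uniform, upper_half, eps=0.1)      # a certificate h
same = discriminate(H, uniform, uniform_again, eps=0.1)    # very likely "EQUIVALENT"
print(callable(found), same)
```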
Higher order discrimination
• Instead of considering hypothesis classes, what if we use other types of statistical tests?
• Example: the collision test.
• Estimate the probability of drawing the same point twice from each distribution. If the estimates differ, declare the distributions distinct.
• If not, the test may fail (return EQUIVALENT).
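The collision test can be sketched as follows, assuming hashable sample points. `collision_rate` is the fraction of unordered sample pairs that coincide, an unbiased estimate of $\Pr[x_1 = x_2]$ for $x_1, x_2$ drawn i.i.d.; using $\epsilon/2$ as the decision threshold is an assumption borrowed from the IPM discriminator, and both function names are hypothetical. Note that this is already a pair-based test, with $g(x_1, x_2) = 1[x_1 = x_2]$.

```python
import random
from collections import Counter


def collision_rate(sample):
    """Unbiased estimate of Pr[x1 == x2] for x1, x2 i.i.d.: the fraction of
    unordered pairs {i, j}, i < j, with sample[i] == sample[j]."""
    n = len(sample)
    counts = Counter(sample)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (n * (n - 1) // 2)


def collision_test(s1, s2, eps):
    """Declare "DISTINCT" if the two collision estimates differ by more than
    eps/2; otherwise return "EQUIVALENT" (the test may miss real differences)."""
    if abs(collision_rate(s1) - collision_rate(s2)) > eps / 2:
        return "DISTINCT"
    return "EQUIVALENT"


rng = random.Random(1)
two_points = [rng.randrange(2) for _ in range(2000)]       # collision prob 1/2
hundred = [rng.randrange(100) for _ in range(2000)]        # collision prob 1/100
hundred_again = [rng.randrange(100) for _ in range(2000)]

print(collision_rate(["a", "a", "b", "b"]))  # 2 colliding pairs out of 6
print(collision_test(two_points, hundred, eps=0.1))
print(collision_test(hundred, hundred_again, eps=0.1))
```

Counting collisions per value with `Counter` avoids enumerating all pairs, so the estimate costs linear time in the sample size.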
Higher order discrimination
• Instead of considering hypothesis classes, what if we use other types of distinguishers?
• More generally: take a family $G$ of graphs $g\colon X^2 \to \{0,1\}$ and define
$$\mathrm{IPM}_G(p_1, p_2) = \sup_{g \in G} \left| \mathbb{E}_{(x_1,x_2)\sim p_1^2}\, g(x_1,x_2) - \mathbb{E}_{(x_1,x_2)\sim p_2^2}\, g(x_1,x_2) \right|$$
• Are graph-based distinguishers stronger than classical distinguishers?
• What is their sample complexity?
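A sketch of the empirical version of $\mathrm{IPM}_G$ for a finite family $G$, assuming each $\mathbb{E}_{p^2} g$ is estimated by averaging $g$ over ordered pairs of distinct sample indices (a U-statistic, which costs quadratic time in the sample size). The decision rule mirrors the $\epsilon/2$ rule for hypothesis classes; the `close` graph, the function names, and the distributions are hypothetical examples.

```python
import random


def pair_mean(g, sample):
    """Estimate E_{(x1,x2)~p^2} g(x1, x2) by averaging g over ordered pairs of
    distinct sample indices (an unbiased U-statistic)."""
    n = len(sample)
    total = sum(g(sample[i], sample[j]) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def graph_discriminate(G, s1, s2, eps):
    """Return the graph in a finite family G with the largest empirical gap if
    the gap exceeds eps/2; otherwise return "EQUIVALENT"."""
    best_g, best_gap = None, -1.0
    for g in G:
        gap = abs(pair_mean(g, s1) - pair_mean(g, s2))
        if gap > best_gap:
            best_g, best_gap = g, gap
    return best_g if best_gap > eps / 2 else "EQUIVALENT"


def same(x1, x2):
    """Collision graph: an edge exactly when the two points coincide."""
    return 1 if x1 == x2 else 0


def close(x1, x2):
    """Hypothetical 'nearby' graph on the integers: an edge when |x1 - x2| <= 1."""
    return 1 if abs(x1 - x2) <= 1 else 0


rng = random.Random(2)
small = [rng.randrange(10) for _ in range(300)]     # p1 = uniform on {0, ..., 9}
large = [rng.randrange(100) for _ in range(300)]    # p2 = uniform on {0, ..., 99}
small_again = [rng.randrange(10) for _ in range(300)]

G = [same, close]
for other in (large, small_again):
    result = graph_discriminate(G, small, other, eps=0.1)
    print(getattr(result, "__name__", result))
```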
Expressive power of graph-based discriminators
THEOREM: Let $X$ be an infinite domain. There exists a graph $g$ such that for every hypothesis class $H$ with finite VC dimension and every $\epsilon > 0$, there are two distributions $p_{\text{real}}, p_{\text{fake}}$ such that $\mathrm{IPM}_H(p_{\text{real}}, p_{\text{fake}}) < \epsilon$ and
$$\mathbb{E}_{(x_1,x_2)\sim p_{\text{real}}^2}[g(x_1,x_2)] - \mathbb{E}_{(x_1,x_2)\sim p_{\text{fake}}^2}[g(x_1,x_2)] > \frac{1}{4}.$$
(L, Mansour '19)
Finite Version
• If $|X| = N$, there is a graph $g$ such that for every class $H$ there are two distributions that are $H$-indistinguishable but $g$-distinguishable, unless $\mathrm{VC}(H) = \Omega(\epsilon^2 \log N)$. (L, Mansour '19)
• Optimal: for every graph-based class $G$ with finite capacity there is a hypothesis class $H$ with VC dimension $O(\epsilon^2 \log N)$ such that
$$\mathrm{IPM}_G(p_{\text{real}}, p_{\text{fake}}) > \tfrac{1}{4} \implies \mathrm{IPM}_H(p_{\text{real}}, p_{\text{fake}}) > \epsilon.$$
(Alon, L, Mansour)
• Given a graph $g$, how many sets are needed to separate every dense set from every sparse set?
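One way to read the closing question: for the uniform distribution $U(S)$ on a finite set $S$, $\mathbb{E}_{(x_1,x_2)\sim U(S)^2}\, g(x_1,x_2)$ is exactly the edge density of $S$ in $g$ (over ordered pairs, self-pairs included), so $g$ separates $U(S)$ from $U(T)$ by the difference of the two densities. The sketch below computes these densities for a hypothetical planted-clique graph; it illustrates the link only and is not the construction behind the theorems above.

```python
def density(g, S):
    """Edge density of S under g over ordered pairs (self-pairs included):
    E_{(x1,x2) ~ U(S)^2} g(x1, x2) for the uniform distribution U(S) on S."""
    S = list(S)
    return sum(g(a, b) for a in S for b in S) / len(S) ** 2


def planted_clique(a, b):
    """Hypothetical graph on the integers: an edge (with self-loops) exactly
    when both endpoints lie in {0, ..., 9}."""
    return 1 if a < 10 and b < 10 else 0


dense, sparse, mixed = range(0, 10), range(10, 20), range(5, 15)
for S in (dense, sparse, mixed):
    print(S, density(planted_clique, S))
# |density(dense) - density(sparse)| is the g-gap between U(dense) and U(sparse).
```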
Sample complexity of graph-based discriminators
• For a family of graphs $G$: given samples from two unknown distributions $p_1, p_2$, decide whether $\mathrm{IPM}_G(p_1, p_2) > \epsilon$. How many examples are needed?
• Recall: for a hypothesis class $H$, a discriminator can decide whether $\mathrm{IPM}_H(p_1, p_2) > \epsilon$ if and only if $H$ has finite VC dimension; $\Theta(\mathrm{VC}(H)/\epsilon^2)$ examples are needed.
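The $\Theta(\mathrm{VC}(H)/\epsilon^2)$ rate is for VC classes. As a simpler stand-in, for a finite class a standard Hoeffding-plus-union-bound argument gives an explicit (loose) sample size with the same $1/\epsilon^2$ dependence, with $\log|H|$ in place of $\mathrm{VC}(H)$. The sketch below computes it; the function name and parameters are hypothetical, and this is not the VC-based bound on the slide.

```python
import math


def samples_for_finite_class(num_hypotheses, eps, delta):
    """Per-distribution sample size m such that, with probability >= 1 - delta,
    every h in a finite class H has empirical means within eps/4 of its true
    means on both p1 and p2 (so every empirical gap is within eps/2 of the true gap).

    Hoeffding: Pr[|mean - E| > t] <= 2 exp(-2 m t^2) for 0/1-valued h.
    Union bound over 2|H| events: 4|H| exp(-2 m t^2) <= delta."""
    t = eps / 4
    return math.ceil(math.log(4 * num_hypotheses / delta) / (2 * t ** 2))


for eps in (0.2, 0.1, 0.05):
    print(eps, samples_for_finite_class(num_hypotheses=1000, eps=eps, delta=0.05))
# Halving eps roughly quadruples m, matching the 1/eps^2 dependence.
```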
The graph-VC dimension
• The graph VC dimension is obtained by considering the projections of the graph obtained by fixing a vertex. Namely, for every $x$ consider the hypothesis class
$$H_x = \{\, g(x,\cdot)\colon X \to \{0,1\} \;:\; g \in G \,\}.$$
Then
$$\mathrm{gVC}(G) = \sup_{x \in X} \mathrm{VC}(H_x).$$
• $O(\mathrm{gVC}(G))$ examples are sufficient.
• $\Omega(\mathrm{gVC}(G))$ examples are necessary.
(L, Mansour '19)
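The definition can be checked by brute force on small finite domains, where the supremum becomes a maximum over vertices. The sketch below builds each projection class $H_x$ and computes its VC dimension by testing every subset; it takes exponential time, and the three graph families are hypothetical examples chosen only to make the projections easy to reason about.

```python
from itertools import combinations


def vc_dimension(functions, domain):
    """Brute-force VC dimension of a finite class of 0/1 functions on a finite domain.
    Subsets of shattered sets are shattered, so we stop at the first size d
    for which no d-subset is shattered."""
    domain = list(domain)
    vc = 0
    for d in range(1, len(domain) + 1):
        if not any(
            len({tuple(f(x) for x in subset) for f in functions}) == 2 ** d
            for subset in combinations(domain, d)
        ):
            break
        vc = d
    return vc


def graph_vc(G, domain):
    """gVC(G) = max over vertices x of VC(H_x), where H_x = {g(x, .) : g in G}."""
    domain = list(domain)
    return max(
        vc_dimension([lambda y, g=g, x=x: g(x, y) for g in G], domain) for x in domain
    )


# Hypothetical families on a small domain, for illustration only.
collision = [lambda a, b: 1 if a == b else 0]
sum_thresholds = [lambda a, b, t=t: 1 if a + b >= t else 0 for t in range(10)]
subset_graphs = [lambda a, b, S=S: 1 if b in S else 0 for S in [(), (0,), (1,), (0, 1)]]

print(graph_vc(collision, range(5)))       # each H_x holds a single function
print(graph_vc(sum_thresholds, range(5)))  # each H_x is a class of thresholds
print(graph_vc(subset_graphs, range(4)))   # each H_x shatters {0, 1}
```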