Nearly-tight VC-dimension bounds for piecewise linear neural networks
Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian
University of British Columbia
COLT '17, July 10, 2017
Neural networks

sigma(x) = max{x, 0} (ReLU); each unit computes sigma(w . x + b).

[Figure: a feed-forward network with inputs x_1, ..., x_5, hidden layers 1 through L of sigma units, and an identity output unit.]
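The architecture on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk; the weights below are made-up toy values.

```python
import numpy as np

def relu(z):
    # ReLU activation: sigma(z) = max{z, 0}, applied elementwise
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Evaluate a feed-forward ReLU network.

    `layers` is a list of (W, b) pairs; every layer but the last applies
    the ReLU nonlinearity, and the output unit is the identity (linear).
    """
    h = x
    for W, b in layers[:-1]:
        h = relu(W @ h + b)
    W, b = layers[-1]
    return W @ h + b

# A toy 3-2-1 network with hypothetical weights.
layers = [
    (np.array([[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]), np.array([0.0, 0.1])),
    (np.array([[1.0, -1.0]]), np.array([-0.2])),
]
y = forward(np.array([1.0, 0.5, -1.0]), layers)  # a single linear output
```

The parameter count W in the slides counts every entry of the weight matrices and bias vectors, and L counts the layers.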
VC-dimension

Defn: If F is a family of functions, then VCdim(F) >= m iff there exists S = {x_1, ..., x_m} s.t. F achieves all 2^m sign patterns, i.e. {(sign(f(x_1)), ..., sign(f(x_m))) : f in F} = {0,1}^m.

e.g. Hyperplanes in R^d have VC-dimension d+1.

[Figure: point configurations illustrating a set that can be shattered and a set that is impossible to shatter.]
Thm [Fund. thm. of learning]: F is learnable iff VCdim(F) < infinity. Moreover, the sample complexity is Theta(VCdim(F)).
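The shattering definition can be checked by brute force on small examples. The sketch below (my illustration, not part of the talk) searches a grid of halfplanes in R^2 and verifies that 3 points in general position are shattered while 4 points are not, matching VC-dimension d+1 = 3.

```python
import itertools
import numpy as np

def shatters(points, classifier_family):
    """Brute-force check whether a family of classifiers shatters `points`.

    `classifier_family` is an iterable of functions point -> {0, 1}.
    Returns True iff all 2^m labelings of the m points are achieved.
    """
    m = len(points)
    achieved = {tuple(f(p) for p in points) for f in classifier_family}
    return len(achieved) == 2 ** m

def halfplanes(resolution=9):
    # Halfplane classifiers x -> sign(w . x + b), over a grid of (w, b).
    vals = np.linspace(-2, 2, resolution)
    for w1, w2, b in itertools.product(vals, vals, vals):
        yield lambda p, w=np.array([w1, w2]), b=b: int(w @ p + b > 0)

three = [np.array(p) for p in [(0, 0), (1, 0), (0, 1)]]
four = [np.array(p) for p in [(0, 0), (1, 0), (0, 1), (1, 1)]]
print(shatters(three, halfplanes()))  # True: all 8 labelings achieved
print(shatters(four, halfplanes()))   # False: the XOR labeling is unreachable
```

The finite grid suffices here because each desired labeling is achieved by some halfplane with small integer-ish coefficients; in general one would search or solve for a separating hyperplane per labeling.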
VC-dimension of NNs

W = # parameters/edges, L = # layers

Known upper bounds:
- O(WL log W + WL^2) [BMM '98]
- O(W^2) [GJ '95]

Known lower bounds:
- Omega(WL) [BMM '98]
- Omega(W log W) [M '94]

Main Thm [HLM '17]: For a ReLU NN with W params and L layers,
  Omega(WL log(W/L)) <= VCdim <= O(WL log W).

The lower bound means there exists an NN with this VCdim. The upper bound was independently proved by Bartlett '17.

Recently, lots of work on "power of depth" for expressiveness of NNs [T '16, ES '16, Y '16, LS '16, SS '16, CSS '16, LGMRA '17, D '17].
Lower bound (refinement of [BMM '98])

- Shattered set: S = {x_i}_{i in [n]} x {e_j}_{j in [m]}
- Encode h with weights a_i = 0.a_{i,1} ... a_{i,m} (binary), where a_{i,j} = h(x_i, e_j)
- Given x_i, easy to extract a_i
- Design a bit extractor to extract a_{i,j}
- [BMM '98] do this 1 bit per layer => Omega(WL)
- More efficient: log(W/L) bits per layer => Omega(WL log(W/L))

[Figure: an NN block extracts the bits of a_i; on input (x_i, e_j), a selector picks out bit j, which feeds the rest of the network.]

Thm [HLM '17]: Suppose a ReLU NN with W params and L layers extracts the m-th bit of its input. Then m <= O(L log(W/L)).
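The encoding step above can be illustrated arithmetically. This sketch is not the paper's ReLU construction (which extracts log(W/L) bits per layer with threshold gadgets); it only shows that packing a row of h's values into the binary expansion of one weight is lossless, using `floor` as a stand-in for the comparisons a network would implement.

```python
import math

def encode(bits):
    # Pack a row of bits h(x_i, e_1..e_m) into one weight a_i = 0.b1 b2 ... bm (binary).
    a = 0.0
    for j, b in enumerate(bits, start=1):
        a += b * 2.0 ** (-j)
    return a

def extract_bit(a, j):
    # Recover bit j of a = 0.b1 b2 ...: shift left by j places, test the parity.
    return math.floor(a * 2 ** j) % 2

row = [1, 0, 1, 1, 0]           # hypothetical values h(x_i, e_j) for j = 1..5
a = encode(row)                  # a single real weight encoding the whole row
recovered = [extract_bit(a, j) for j in range(1, 6)]
print(recovered)                 # [1, 0, 1, 1, 0]
```

Since each of the n weights a_i stores m bits, shattering the n*m points S requires only that the network can decode this expansion; the theorem on this slide caps how many bits depth-L ReLU networks can decode.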
Upper bound (refinement of [BMM '98] for ReLU)

- Fix a shattered set S = {x_1, ..., x_m}
- Partition parameter space s.t. the input to the 1st hidden layer has constant sign
  - can replace sigma with 0 (if < 0) or the identity (if > 0)!
- Size of the partition is small, i.e. <= (Cm)^W [Warren '68]
- Repeat the procedure for each layer to get a partition of size <= (CmL)^{O(WL)}
- In each piece, the output is a polynomial of degree L, so the total # of sign patterns is <= (CmL)^{O(WL)}
- Since S is shattered, we need 2^m <= (CmL)^{O(WL)}, which implies m = O(WL log W)

* C > 1 is some constant
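The final counting step can be checked numerically: the largest m satisfying 2^m <= (CmL)^{cWL} indeed grows like WL log W. The constants C and c below are illustrative placeholders, not the ones hidden in the slide's O(.) notation.

```python
import math

def max_shatterable(W, L, C=2.0, c=1.0):
    """Largest m with 2^m <= (C*m*L)^(c*W*L), found by linear scan.

    Taking logs, the condition is m*ln(2) <= c*W*L*ln(C*m*L); the left side
    is linear in m and the right side logarithmic, so the scan terminates.
    """
    m = 1
    while m * math.log(2) <= c * W * L * math.log(C * m * L):
        m += 1
    return m - 1

for W, L in [(10, 2), (100, 5), (1000, 10)]:
    m = max_shatterable(W, L)
    print(W, L, m, m / (W * L * math.log2(W)))  # last ratio stays bounded
```

The printed ratio m / (WL log2 W) staying bounded as W and L grow is exactly the m = O(WL log W) conclusion of the slide.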