Nearly-tight VC-dimension bounds for piecewise linear neural networks (PowerPoint presentation)

  1. Nearly-tight VC-dimension bounds for piecewise linear neural networks. Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian (University of British Columbia). COLT '17, July 10, 2017.

  2. Neural networks. Activation: τ(y) = max{y, 0} (ReLU); each unit computes τ(a · y + c).
  [Figure: a feedforward network with inputs y_1, …, y_5, two hidden layers of τ units, and an identity output unit; columns labeled input layer, hidden layer 1, hidden layer 2, output layer.]
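A network of this form is straightforward to write down. Below is a minimal Python/NumPy sketch (the layer widths and random weights are illustrative, not from the talk) of a forward pass in which every hidden unit computes τ(a · y + c) and the output unit applies the identity:

```python
import numpy as np

def relu(y):
    # tau(y) = max{y, 0}, applied coordinatewise
    return np.maximum(y, 0.0)

def forward(x, weights, biases):
    """Forward pass: ReLU at every hidden layer, identity at the output."""
    h = x
    for A, c in zip(weights[:-1], biases[:-1]):
        h = relu(A @ h + c)               # each hidden unit: tau(a . y + c)
    return weights[-1] @ h + biases[-1]   # output layer: identity activation

# Illustrative instance: 5 inputs, two hidden layers, one output (as in the figure).
rng = np.random.default_rng(0)
sizes = [5, 4, 4, 1]                      # hypothetical layer widths
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
print(forward(rng.standard_normal(5), weights, biases))
```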

  3-4. VC-dimension. Defn: If F is a family of functions, then VCdim(F) ≥ m iff ∃ X = {x_1, …, x_m} s.t. F achieves all 2^m sign patterns, i.e. {(sign(f(x_1)), …, sign(f(x_m))) : f ∈ F} = {0,1}^m. E.g. hyperplanes in R^d have VC-dimension d + 1. [Figure: point configurations labeled "Impossible to shatter" and "Can shatter any 4 points".] Thm [Fund. thm. of learning]: F is learnable iff VCdim(F) < ∞; moreover, the sample complexity is Θ(VCdim(F)).
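The definition also gives a direct brute-force test: X is shattered iff all 2^m sign patterns are realized. A minimal sketch, assuming the function class is handed to us as a finite list (here 1-D thresholds, a hypothetical stand-in; real classes like hyperplanes are infinite, so a check like this can only certify lower bounds on the VC-dimension):

```python
def sign01(v):
    # sign(.) mapped into {0, 1}, matching the definition above
    return 1 if v > 0 else 0

def shatters(functions, points):
    """True iff `functions` realizes all 2^m sign patterns on `points`."""
    patterns = {tuple(sign01(f(x)) for x in points) for f in functions}
    return len(patterns) == 2 ** len(points)

# 1-D thresholds x -> x - t have VC-dimension 1:
thresholds = [lambda x, t=t: x - t for t in (0.5, 1.5, 2.5, 3.5)]
print(shatters(thresholds, [1.0]))        # True: a single point is shattered
print(shatters(thresholds, [1.0, 2.0]))   # False: the pattern (1, 0) is unachievable
```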

  5-8. VC-dimension of NNs (W = # parameters/edges, L = # layers).
  Known upper bounds:
  • O(WL log W + WL²) [BMM '98]
  • O(W²) [GJ '95]
  Known lower bounds:
  • Ω(WL) [BMM '98]
  • Ω(W log W) [M '94]
  Main Thm [HLM '17]: For ReLU NNs w/ W params and L layers,
  Ω(WL log(W/L)) ≤ VCdim ≤ O(WL log W).
  (Here Ω(·) means there exists a NN with this VCdim. Independently proved by Bartlett '17.)
  Recently, lots of work on the "power of depth" for expressiveness of NNs [T '16, ES '16, Y '16, LS '16, SS '16, CSS '16, LGMRA '17, D '17].
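The two sides of the main theorem differ only by a factor of O(log L): compare log W against log(W/L). A quick numeric illustration, with all suppressed constants set to 1 (so the values below are only meaningful as growth rates, not as actual VC-dimensions):

```python
import math

def lower_bound(W, L):
    # Omega(W L log(W/L)), constant suppressed
    return W * L * math.log2(W / L)

def upper_bound(W, L):
    # O(W L log W), constant suppressed
    return W * L * math.log2(W)

for W, L in [(10**4, 10), (10**6, 20), (10**8, 50)]:
    ratio = upper_bound(W, L) / lower_bound(W, L)
    print(f"W={W:.0e}  L={L}  upper/lower = {ratio:.2f}")  # = log W / log(W/L)
```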

  9-12. Lower bound (refinement of [BMM '98]).
  • Shattered set: S = {e_i}_{i∈[n]} × {e_j}_{j∈[m]}
  • Encode f w/ weights a_i = 0.a_{i,1} … a_{i,m} (in binary), where a_{i,j} = f(e_i, e_j)
  • Given e_i, easy to extract a_i
  • Design a bit extractor to extract a_{i,j} (conceptual sketch below)
  • [BMM '98] do this 1 bit per layer ⇒ Ω(WL)
  • More efficient: log(W/L) bits per layer ⇒ Ω(WL log(W/L))
  Thm [HLM '17]: Suppose a ReLU NN w/ W params and L layers extracts the m-th bit of its input. Then m ≤ O(L log(W/L)).
  [Figure: a NN block extracts the bits a_{i,1}, …, a_{i,m} from a_i; a selector picks bit j from a_i; the result feeds into the rest of the NN.]
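To make the bit-extraction step concrete: each weight a_i is a number in [0, 1) whose binary digits are the function values to be realized, and a small network block recovers a chosen digit. Below is a conceptual Python sketch of the 1-bit-per-step idea behind [BMM '98] (double, threshold, subtract; each step is piecewise linear in a). It uses an exact hard threshold, which a ReLU network can only approximate with steep ramps, and it does not implement the paper's improvement of log(W/L) bits per layer:

```python
def extract_bit(a, j):
    """Recover the j-th binary digit of a in [0, 1) by repeated doubling,
    thresholding, and subtracting; each step is piecewise linear in a."""
    bit = 0
    for _ in range(j):
        a *= 2.0
        bit = 1 if a >= 1.0 else 0   # hard threshold (a ReLU net can only
        a -= bit                     # approximate this with a steep ramp)
    return bit

# a = 0.1011 in binary (exactly representable as a float)
a = 0.5 + 0.125 + 0.0625
print([extract_bit(a, j) for j in (1, 2, 3, 4)])  # -> [1, 0, 1, 1]
```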

  13-18. Upper bound (refinement of [BMM '98] for ReLU).
  • Fix a shattered set X = {x_1, …, x_m}
  • Partition the parameter space s.t. the input to the 1st hidden layer has constant sign
  • then τ can be replaced with 0 (if < 0) or the identity (if > 0)!
  • Size of the partition is small, i.e. ≤ (Dm)^W [Warren '68]*
  • Repeat the procedure for each layer to get a partition of size ≤ (DLm)^{O(WL)}
  • In each piece, the output is a polynomial of deg. L, so the total # of sign patterns is ≤ (DLm)^{O(WL)}
  • Since X is shattered, we need 2^m ≤ (DLm)^{O(WL)}, which implies m = O(WL log W) (calculation below)
  * D > 1 is some constant
  [Figure: the network on inputs y_1, …, y_5, with τ units replaced by 0 or the identity within one piece of the partition.]
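The last implication is a short calculation; spelled out for completeness (using L ≤ W in the final step):

```latex
% Since X is shattered, all 2^m labelings occur among the sign patterns:
2^m \;\le\; (DLm)^{O(WL)}
\;\Longrightarrow\; m \;\le\; O(WL)\,\log_2(DLm).
% A bound of the form m \le a\log_2(bm) forces m = O(a\log(ab));
% with a = O(WL), b = DL, and L \le W this gives
m \;=\; O\big(WL\,\log(WL)\big) \;=\; O(WL\,\log W).
```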
