Why deep nets? Is deep better than shallow? When? Why?
[Sketch: a shallow network next to a deep network]
Some history: neural networks from the '80s to 2018
Approximation theory: is depth better, and why?
Shallow networks [sketch: one hidden layer of units]. Main example, a kernel machine: $g(x) = \sum_{i=1}^{n} c_i K(x, x_i)$, built from the training data $(x_i, y_i)$, $i = 1, \dots, n$.
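A minimal sketch of the kernel machine as a shallow network, with one unit per training point. The Gaussian kernel, its width, and the names `gaussian_kernel`, `kernel_machine`, `centers`, `coeffs` are assumptions made for illustration; the notes only write $K(x, x_i)$.

```python
import math

def gaussian_kernel(x, xi, width=1.0):
    # Assumed kernel choice; the notes leave K unspecified.
    return math.exp(-((x - xi) ** 2) / (2.0 * width ** 2))

def kernel_machine(x, centers, coeffs, kernel=gaussian_kernel):
    # g(x) = sum_i c_i K(x, x_i): one hidden unit per training point x_i.
    return sum(c * kernel(x, xi) for c, xi in zip(coeffs, centers))

# Hypothetical 1-D example with two training points.
centers = [0.0, 1.0]
coeffs = [1.0, 2.0]
value_at_zero = kernel_machine(0.0, centers, coeffs)
```

In practice the coefficients would be fitted to the labels $y_i$; here they are fixed by hand to keep the sketch short.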
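Another example [sketch: one hidden layer of ReLU units]: $\sigma(z) = |z|_+ = \max(0, z)$, $f(x) = \sum_i c_i \sigma(\langle w_i, x\rangle + b_i)$. The bias $b_i$ can be eliminated from the notation: usually one of the components of $x$ is fixed, $x_d = 1$, so that $\langle w_i, x\rangle$ already contains $b_i$.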
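The bias trick can be checked numerically. The sketch below compares a one-hidden-layer ReLU network with explicit biases against the same network with the bias folded into the weights by appending a constant component 1 to the input. The function names and the numbers are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def shallow_net(x, weights, biases, coeffs):
    # f(x) = sum_i c_i * relu(<w_i, x> + b_i)
    return sum(c * relu(dot(w, x) + b)
               for c, w, b in zip(coeffs, weights, biases))

def shallow_net_no_bias(x_aug, weights_aug, coeffs):
    # Same network, with b_i stored as the last entry of each weight vector.
    return sum(c * relu(dot(w, x_aug)) for c, w in zip(coeffs, weights_aug))

x = [0.5, -1.0]
weights = [[1.0, 2.0], [-1.0, 0.5]]
biases = [2.0, 0.25]
coeffs = [1.0, -3.0]

x_aug = x + [1.0]                                   # append constant input 1
weights_aug = [w + [b] for w, b in zip(weights, biases)]

with_bias = shallow_net(x, weights, biases, coeffs)
without_bias = shallow_net_no_bias(x_aug, weights_aug, coeffs)
```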
Deep networks, with 2 or more hidden layers [sketch: stacked layers of units]: $y = W^K \sigma(W^{K-1} \cdots \sigma(W^1 x))$, with the summation convention over repeated indices.
Networks to approximate (represent) functions. Are deep nets better than shallow ones? In the '80s the answer was no. We will see the proof of the above, and a new answer: deep nets can be much better for certain $f$.
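Key ideas in approximation theory. Functions and approximators: the target $f$ lies in a function space and is approximated by some $g \in V_N$. Example: $f \in C(\mathbb{R}^d)$, $V_N = \{ g : g(x) = \sum_{k=1}^{N} a_k \sigma(\langle w_k, x\rangle + b_k) \}$.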
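Density: $V = \bigcup_N V_N$ (the set of networks) is dense if, for every compact $K \subset \mathbb{R}^d$, every $f \in C(K)$ and every $\varepsilon > 0$, there is $g \in V$ with $\sup_{x \in K} |f(x) - g(x)| < \varepsilon$.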
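Degree of approximation: for $f \in C(\mathbb{R}^d)$, $\mathrm{dist}(f, V_N) = \inf_{g \in V_N} \|f - g\|$; the question is how large $N$ must be so that $\mathrm{dist}(f, V_N) \le \varepsilon$.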
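Shallow nets: density and degree of approximation. Consider target functions $f \in W^d_m$ (functions of $d$ variables with derivatives up to order $m$; no further structure assumed) and approximators $g(x) = \sum_{k=1}^{N} a_k \sigma(\langle w_k, x\rangle + b_k)$. Theorem: for every $f \in W^d_m$ there exists $g \in V_N$ such that $\|g - f\| \le \varepsilon$ with $N = O(\varepsilon^{-d/m})$.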
Curse of dimensionality (Bellman's term). Tasks that cannot be done efficiently in high dimension: (a) optimization over $\mathbb{R}^d$; (b) function approximation, which requires of order $\varepsilon^{-d}$ evaluations for $f$ Lipschitz; (c) integration.
Blessings of: (a) smoothness (Barron); (b) compositionality.
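Examples. $P^d_k$, the polynomials of degree at most $k$ in $d$ variables, has $\binom{d+k}{k} \approx k^d$ monomials. A function of 10 variables ($d = 10$) corresponds to a table: if each dimension $x_i$ is discretized in just 10 partitions, I have a table with $10^{10}$ entries. If $d = 10 \times 10 = 100$ pixels, then $10^{100}$ entries. If $f \in W^d_m$, then $N = O(\varepsilon^{-d/m})$, e.g. with $d = 100$.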
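The counts in these examples are easy to reproduce. A small sketch, with hypothetical helper names `table_entries` and `num_monomials`:

```python
import math

def table_entries(d, partitions=10):
    # Look-up table for a function of d variables,
    # each variable discretized into `partitions` cells.
    return partitions ** d

def num_monomials(d, k):
    # Number of monomials of degree at most k in d variables: C(d + k, k).
    return math.comb(d + k, k)

entries_10_vars = table_entries(10)      # 10 variables
entries_100_pixels = table_entries(100)  # 10 x 10 image
monomials_d10_k2 = num_monomials(10, 2)
```

Python integers are unbounded, so the $10^{100}$ entry count is computed exactly rather than overflowing.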
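Summary of the proof. Let $N_r = \{ \sum_{i=1}^{r} a_i \sigma(\langle w_i, x\rangle + b_i) \}$ (the set $V_N$ with $N = r$) and let $P_k$ be the polynomials of degree up to $k$ in $d$ variables, with $r = \binom{d+k}{k} \approx k^d$. Then $P_k \subseteq \overline{N_r}$, and for the Sobolev class $W^d_m$ the error of polynomial approximation in $L_p$ satisfies $E(W^d_m; P_k; L_p) \le C k^{-m}$. Hence $E(W^d_m; N_r; L_p) \le C r^{-m/d}$, so accuracy $\varepsilon$ needs $r = O(\varepsilon^{-d/m})$.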
Logic of the proof: univariate networks approximate univariate polynomials; univariate polynomials in $\langle w, x\rangle$ represent multivariate polynomials; multivariate polynomials approximate Sobolev functions. Thus the theorem.
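Any univariate polynomial can be represented as a linear combination of smooth ReLUs. Proof: $\frac{d}{da}\sigma(ax + b)\big|_{a=0} = \lim_{h \to 0} \frac{\sigma(hx + b) - \sigma(b)}{h} = x\,\sigma'(b)$, a limit of networks with 2 units. Theorem: if $\sigma$ is not a polynomial, the closure of $N_r$ contains the linear space $\pi_{r-1}$ of polynomials of degree up to $r - 1$.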
The second derivative, which needs 3 terms, gives $x^2 \sigma''(b)$, and so on for higher powers. Thus $\bigcup_r N_r$ is dense in $C(K)$, because of the Weierstrass theorem.
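The derivative argument can be tried numerically: finite differences in the weight $a$, taken at $a = 0$, turn a few smooth units into monomials. The sketch below assumes softplus as the "smooth ReLU" and uses central differences; both choices, the step `h`, and the function names are illustrative assumptions, not the construction used in the notes.

```python
import math

def softplus(z):
    # Smooth ReLU log(1 + exp(z)), written in a numerically stable form.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def monomial_1(x, h=1e-2, b=0.0):
    # Two units: central difference in a at a = 0 approximates x * sigma'(b).
    sigma_prime = sigmoid(b)  # derivative of softplus
    return (softplus(h * x + b) - softplus(-h * x + b)) / (2.0 * h * sigma_prime)

def monomial_2(x, h=1e-2, b=0.0):
    # Three terms: second difference in a at a = 0 approximates x^2 * sigma''(b).
    s = sigmoid(b)
    sigma_second = s * (1.0 - s)  # second derivative of softplus
    second_diff = softplus(h * x + b) - 2.0 * softplus(b) + softplus(-h * x + b)
    return second_diff / (h * h * sigma_second)

approx_x = monomial_1(1.5)
approx_x_squared = monomial_2(1.5)
```

The approximation error shrinks as `h` decreases, until floating-point cancellation takes over; this is the numerical counterpart of the statement that the monomials lie in the closure of $N_r$ rather than in $N_r$ itself.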
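From 1-D to d-D (multivariate). Let $P^d_k$ be the polynomials of degree at most $k$ in $d$ variables and $H^d_k$ the homogeneous polynomials of degree $k$ in $d$ variables: $\dim H^d_k = \binom{d-1+k}{k}$ and $r = \dim P^d_k = \binom{d+k}{k} \approx k^d$. A polynomial in $P^d_k$ can thus be represented by a network with $r$ units.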
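We want to show that if $p \in P^d_k(\mathbb{R}^d)$, then $p(x) = \sum_{i=1}^{r} p_i(\langle w_i, x\rangle)$ for some choice of $r$, $w_i$ and univariate polynomials $p_i$. No general proof here, but consider the following: with a network of $r$ units I can get powers of the $x_i$ and of their linear combinations. Can I synthesise a product of variables such as $x_1 x_2$? I can get $x_1^2$ and $x_2^2$, and also $(x_1 + x_2)^2$, so $x_1 x_2 = \frac{1}{2}\left[(x_1 + x_2)^2 - x_1^2 - x_2^2\right]$. In general $r = \dim P^d_k$ units are used.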
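A small check that a product of two variables is a sum of univariate polynomials applied to projections $\langle w_i, x\rangle$. The identity used, $x_1 x_2 = \frac{1}{2}[(x_1 + x_2)^2 - x_1^2 - x_2^2]$, is a standard one and is assumed here for illustration; the function names are hypothetical.

```python
def ridge_square(w, x):
    # Univariate polynomial p(t) = t^2 applied to the projection t = <w, x>.
    t = sum(wj * xj for wj, xj in zip(w, x))
    return t * t

def product_from_ridges(x1, x2):
    # x1 * x2 written with three ridge terms, directions (1,1), (1,0), (0,1).
    x = (x1, x2)
    return 0.5 * (ridge_square((1.0, 1.0), x)
                  - ridge_square((1.0, 0.0), x)
                  - ridge_square((0.0, 1.0), x))

value = product_from_ridges(3.0, -2.0)
```

Each `ridge_square` term depends on `x` only through one projection, which is exactly the form a single hidden unit can approximate.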
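Define $E(f; P_k; L_p) = \inf_{P \in P_k} \|f - P\|_{L_p}$ and, for a class $B$, $E(B; P_k; L_p) = \sup_{f \in B} E(f; P_k; L_p)$. Theorem: for the unit ball $B^d_m$ of the Sobolev space $W^d_m$, $E(B^d_m; P_k; L_p) \le C k^{-m}$. Proof: classical. Since $P_k \subseteq \overline{N_r}$ with $k \approx r^{1/d}$, it follows that $E(B^d_m; N_r; L_p) \le C r^{-m/d}$.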
Remark: even without depth, a shallow net can represent polynomials in $P^d_k$ well, with $r \approx k^d$ units.
Depth. For general functions, both deep and shallow nets suffer from the curse of dimensionality. But for Local Hierarchical Compositional (LHC) functions, deep nets, unlike shallow ones, do not have the curse.
LHC functions. Simplest example: $f(x_1, x_2, x_3, x_4) = h_3(h_1(x_1, x_2), h_2(x_3, x_4))$ [sketch: binary tree with nodes $h_1$, $h_2$, $h_3$]. Intuition for $d = 4$: a shallow net needs of order $k^4$ units, while a deep net needs of order $k^2$ units for each node, and there are 3 nodes (e.g. 12 units in total when each node uses 4 units). Other examples have the same form $h_3(h_1(x_1, x_2), h_2(x_3, x_4))$ [formulas illegible].
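A sketch of a binary-tree compositional function, evaluated level by level while counting the nodes. It assumes the number of inputs is a power of 2 and, purely for brevity, uses the same two-variable node function everywhere; `compose_binary_tree` and the example node are hypothetical.

```python
def compose_binary_tree(inputs, node):
    # Evaluate f = h(h(x1, x2), h(x3, x4)), ... bottom-up.
    # Assumes len(inputs) is a power of 2.
    level = list(inputs)
    nodes_used = 0
    while len(level) > 1:
        level = [node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        nodes_used += len(level)
    return level[0], nodes_used

def example_node(a, b):
    # Any function of two variables could sit at a node; addition is a placeholder.
    return a + b

value_4, nodes_4 = compose_binary_tree([1.0, 2.0, 3.0, 4.0], example_node)
value_8, nodes_8 = compose_binary_tree(range(1, 9), example_node)
```

The node count is always $d - 1$ for $d$ inputs (3 nodes for $d = 4$, 7 for $d = 8$), and every node sees only 2 variables, which is the locality the argument relies on.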
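Theorem: deep nets with the same graph approximate functions of $d$ variables in $W^{d,2}_m$ (compositional functions whose constituent functions $h$ are functions of 2 variables in $W^2_m$) with $O(\varepsilon^{-2/m})$ units per node, for $O((d-1)\,\varepsilon^{-2/m})$ total units. Proof: each $h$ can be approximated with $O(\varepsilon^{-2/m})$ units. We assume that each $h$ is Lipschitz continuous. By hypothesis, $\|h - P\| \le \varepsilon$, $\|h_1 - P_1\| \le \varepsilon$, $\|h_2 - P_2\| \le \varepsilon$. Then $\|h(h_1, h_2) - P(P_1, P_2)\| = \|h(h_1, h_2) - h(P_1, P_2) + h(P_1, P_2) - P(P_1, P_2)\| \le \|h(h_1, h_2) - h(P_1, P_2)\| + \|h(P_1, P_2) - P(P_1, P_2)\| \le c\,\varepsilon$, using the Minkowski inequality, the Lipschitz property of $h$ and the hypothesis. More general theorems hold for DAGs.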
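The error-propagation step can be checked numerically on a toy case. In the sketch below every function is a hypothetical choice: `h` is 1-Lipschitz with respect to $|\Delta u| + |\Delta v|$, and each approximant `p`, `p1`, `p2` differs from its target by at most `EPS` in sup norm, so the argument above predicts a composition error of at most $(2 \cdot 1 + 1)\,\varepsilon = 3\varepsilon$.

```python
import math

EPS = 0.01

# Hypothetical constituent functions on [0, 1]^2.
def h1(a, b): return a * b
def h2(a, b): return 0.5 * (a + b)
def h(u, v):  return math.sin(u + v)   # 1-Lipschitz in |du| + |dv|

# Hypothetical approximants, each within EPS of its target in sup norm.
def p1(a, b): return h1(a, b) + EPS * math.cos(7.0 * a)
def p2(a, b): return h2(a, b) - EPS * math.sin(5.0 * b)
def p(u, v):  return h(u, v) + EPS * math.cos(3.0 * u - v)

def max_composition_error(steps=10):
    # Largest |h(h1, h2) - p(p1, p2)| over a grid on [0, 1]^4.
    grid = [i / steps for i in range(steps + 1)]
    worst = 0.0
    for x1 in grid:
        for x2 in grid:
            for x3 in grid:
                for x4 in grid:
                    exact = h(h1(x1, x2), h2(x3, x4))
                    approx = p(p1(x1, x2), p2(x3, x4))
                    worst = max(worst, abs(exact - approx))
    return worst

worst_error = max_composition_error()
```

The bound $3\varepsilon$ splits exactly as in the proof: $2\varepsilon$ from the Lipschitz term $\|h(h_1, h_2) - h(P_1, P_2)\|$ and $\varepsilon$ from $\|h(P_1, P_2) - P(P_1, P_2)\|$.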
This theorem may explain why deep nets are successful, and why the really good ones are all CNNs. The key is locality of the constituent functions $h$, not weight sharing: weight sharing helps, but not exponentially.
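To get a feel for the gap between the shallow rate $\varepsilon^{-d/m}$ and the deep rate $(d - 1)\,\varepsilon^{-2/m}$, the two expressions can be tabulated. The sketch drops the constants hidden in the $O(\cdot)$ notation, so the numbers indicate orders of magnitude only; the function names are hypothetical.

```python
def shallow_unit_count(eps, d, m):
    # Order of the shallow bound N = O(eps^(-d/m)), constant dropped.
    return eps ** (-d / m)

def deep_unit_count(eps, d, m):
    # Order of the deep bound N = O((d - 1) * eps^(-2/m)) for a binary tree.
    return (d - 1) * eps ** (-2 / m)

eps, m = 0.1, 2
rows = [(d, shallow_unit_count(eps, d, m), deep_unit_count(eps, d, m))
        for d in (4, 8, 16, 32)]

for d, shallow, deep in rows:
    print(f"d={d:3d}  shallow ~ {shallow:.3g}  deep ~ {deep:.3g}")
```

The shallow count grows exponentially in $d$, while the deep count grows only linearly, because each node handles 2 variables regardless of $d$.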