
Shallow vs. Deep Networks: Some History and Approximation Theory (2018)



  1. Why deep nets? Is deep better than shallow? When? Why?

  2. Shallow vs. deep networks.

  3. Some history: from the 1980s to 2018.

  4. Approximation theory: is depth better, and why?

  5. Shallow networks: f(x) = Σᵢ cᵢ σ(⟨wᵢ, x⟩ + bᵢ). Main example: the kernel machine f(x) = Σᵢ cᵢ K(x, xᵢ).
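Both forms on this slide evaluate in a few lines. A minimal sketch follows; the ReLU nonlinearity and the Gaussian kernel are illustrative choices, not fixed by the slide:

```python
import math

def shallow_net(x, units):
    """Shallow net: sum of c * relu(<w, x> + b) over units (c, w, b)."""
    return sum(c * max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)
               for c, w, b in units)

def kernel_machine(x, centers, coeffs, gamma=1.0):
    """Kernel machine: sum of c_i * K(x, x_i), with a Gaussian kernel."""
    def K(u, v):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
    return sum(c * K(x, xi) for c, xi in zip(coeffs, centers))

# One ReLU unit computing relu(x1 + x2 - 1):
print(shallow_net((2.0, 0.5), [(1.0, (1.0, 1.0), -1.0)]))   # 1.5
```

Both are "one hidden layer" machines: a fixed nonlinear feature of the input, linearly combined.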

  6. Another example: ReLU units, σ(z) = |z|₊ = max(0, z). The bias b can be eliminated: append a constant component x_{d+1} = 1 to the input, so that ⟨w, x⟩ + b = ⟨(w, b), (x, 1)⟩.
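The bias-absorption identity can be checked directly; the numbers below are arbitrary:

```python
# Absorbing the bias: <w, x> + b equals <(w, b), (x, 1)> once a constant
# component 1 is appended to the input.
dot = lambda u, v: sum(a * c for a, c in zip(u, v))

w, b = [0.5, -2.0, 3.0], 0.7
x = [1.0, 2.0, -1.0]

lhs = dot(w, x) + b
rhs = dot(w + [b], x + [1.0])
print(lhs, rhs)   # identical
```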

  7. Deep networks: compositions of layers, hˡ = σ(Wˡ hˡ⁻¹); in components, with the summation convention over repeated indices, yᵢ = Vᵢₖ σ(Wₖⱼ xⱼ).
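The layer recursion hˡ = σ(Wˡ hˡ⁻¹) is a short loop; the weight matrices below are made-up numbers, just to exercise the recursion:

```python
# Minimal deep-net forward pass: h^l = relu(W^l h^{l-1}), layer by layer.
def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def forward(layers, x):
    h = x
    for W in layers:
        h = [max(0.0, z) for z in matvec(W, h)]
    return h

layers = [
    [[1.0, -1.0], [0.0, 1.0]],   # W^1 (2x2)
    [[1.0, 1.0]],                # W^2 (1x2)
]
print(forward(layers, [3.0, 1.0]))   # [3.0]
```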

  8. Networks to approximate and represent functions. Are deep nets better than shallow ones? The answer in the 1980s was no. We will see a proof of the above and a new answer: deep nets can be much better for certain f.

  9. Key ideas in approximation theory: functions and approximants. A target f ∈ C, f: ℝᵈ → ℝ, and sets of approximants, e.g. V_N = { Σᵢ₌₁ᴺ cᵢ σ(⟨wᵢ, x⟩ + bᵢ) }.

  10. Density: a set of networks V is dense in C(K), K ⊂ ℝᵈ compact, if for every f and every ε > 0 there is g ∈ V with max_{x∈K} |f(x) − g(x)| ≤ ε.

  11. Degree of approximation: for f ∈ C(ℝᵈ) and g_N ∈ V_N with ‖f − g_N‖_∞ ≤ ε, how large is N?

  12. Shallow nets: density and degree of approximation. Consider target functions f ∈ W_m^d (a Sobolev ball: m derivatives, d variables), with no further structure assumed, and approximants g(x) = Σᵢ cᵢ σ(⟨wᵢ, x⟩ + bᵢ). Theorem: for every f ∈ W_m^d there is g ∈ V_N with ‖f − g‖ ≤ ε, where N = O(ε^{−d/m}).
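Plugging sample numbers into N ~ ε^{−d/m} makes the dependence on d concrete; ε = 0.1 and m = 2 are values picked purely for illustration:

```python
# Unit counts from the shallow-net bound N ~ eps**(-d/m):
# for fixed accuracy and smoothness, N grows exponentially in d.
def units_needed(eps, d, m):
    return eps ** (-d / m)

for d in (2, 10, 100):
    print(d, units_needed(0.1, d, m=2))
```

With d = 100 and m = 2 the bound is of order 10⁵⁰ units, which previews the curse of dimensionality on the next slide.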

  13. Curse of dimensionality. In Bellman's terms: optimization, function approximation, and integration cannot be done cheaply by point evaluations; for Lipschitz f they require on the order of ε^{−d} evaluations.
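The ε^{−d} count comes from covering [0, 1]ᵈ with a grid of spacing ~ε; a quick sketch of the arithmetic:

```python
import math

# A Lipschitz function on [0,1]**d is pinned down to accuracy ~eps by
# sampling on a grid of spacing eps: about (1/eps)**d point evaluations.
def grid_evaluations(eps, d):
    per_axis = math.ceil(1.0 / eps)
    return per_axis ** d

for d in (1, 3, 10):
    print(d, grid_evaluations(0.1, d))   # 10, 1000, 10**10
```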

  14. Blessings of: (a) smoothness (Barron), (b) compositionality.

  15. Examples. P_k^d, the polynomials of degree ≤ k in d variables, has (d+k choose k) ≈ kᵈ monomials. A function of 10 variables (d = 10) corresponds to a table: if each dimension is discretized into 10 partitions, the table has 10¹⁰ entries. What if d = 100 pixels?
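The monomial count on this slide is a binomial coefficient and is easy to tabulate; the (d, k) pairs below are illustrative:

```python
import math

# Number of monomials of degree <= k in d variables: C(d + k, k).
for d, k in [(2, 3), (10, 4), (100, 4)]:
    print(d, k, math.comb(d + k, k))
```

Already at d = 100, k = 4 the count is in the millions, echoing the table-size explosion described on the slide.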

  16. If d = 100 pixels, the table has 10¹⁰⁰ entries. The shallow-net bound behaves the same way: for f ∈ W_m^d, N = O(ε^{−d/m}), which for fixed ε and m is exponential in d.

  17. Summary of the proof. Networks Σᵢ cᵢ σ(⟨wᵢ, x⟩ + bᵢ) can represent polynomials of degree k in d variables, a space of dimension ≈ kᵈ; classically, Sobolev functions f ∈ W_m^d are approximated in L_p by polynomials at rate k^{−m}. Combining the two facts gives the degree of approximation for shallow nets.

  18. Logic of the proof: (1) univariate networks σ(wx + b) approximate univariate polynomials; (2) ridge combinations Σᵢ pᵢ(⟨wᵢ, x⟩) represent multivariate polynomials; (3) multivariate polynomials approximate Sobolev functions. Thus the theorem.

  19. Any univariate monomial can be obtained from a smooth (e.g. smoothed-ReLU) σ: (d/dw)ᵏ σ(wx + b)|_{w=0} = xᵏ σ⁽ᵏ⁾(b), and derivatives are limits of finite differences of units σ(wᵢx + b). Theorem: if σ is smooth and not a polynomial, the closure of N_r (networks with r units) contains the linear space of polynomials of degree r.

  20. For example, the second derivative, which needs 3 terms, (σ(hx + b) − 2σ(b) + σ(−hx + b)) / h², gives x² σ″(b). Thus the closure of the networks is dense in C(K), by the Weierstrass theorem.
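The 3-term second difference on this slide can be run numerically. Taking σ = exp is an illustrative choice (smooth, non-polynomial, with σ″ = exp), not the σ of the lecture:

```python
import math

# Symmetric second difference in the weight w at w = 0:
#   [sigma(h*x + b) - 2*sigma(b) + sigma(-h*x + b)] / h**2
# tends to x**2 * sigma''(b) as h -> 0.
def second_difference(sigma, x, b, h):
    return (sigma(h * x + b) - 2.0 * sigma(b) + sigma(-h * x + b)) / h ** 2

x, b = 3.0, 0.0
approx = second_difference(math.exp, x, b, 1e-4)
exact = x ** 2 * math.exp(b)   # = 9.0
print(approx, exact)
```

Three units thus buy (approximately) the monomial x², and higher-order differences buy higher monomials.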

  21. From 1D to d dimensions (multivariate). H_k^d, the homogeneous polynomials of degree k in d variables, has dimension ≈ k^{d−1}; dim P_k^d ≈ kᵈ. Thus any p ∈ P_k^d can be represented by a network with ≈ kᵈ units.

  22. We want to show that if p ∈ P_k^d, then p(x) = Σᵢ₌₁ʳ pᵢ(⟨wᵢ, x⟩) for some choice of r, directions wᵢ, and univariate polynomials pᵢ. No general proof here, but consider the following: with a network of r units, can I synthesise every such p?

  23. Can I synthesise products of the variables? E.g. x₁x₂ = ¼[(x₁ + x₂)² − (x₁ − x₂)²], a combination of univariate squares of linear projections. Dimension count: dim P_k^d ≈ kᵈ while each ridge term pᵢ(⟨wᵢ, x⟩) contributes only ≈ k coefficients, which suggests how many directions r are needed.
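The product identity is a one-liner to verify:

```python
# Products from univariate squares of linear projections:
#   x1*x2 = ((x1 + x2)**2 - (x1 - x2)**2) / 4,
# the mechanism behind writing multivariate polynomials as sums of
# ridge functions p_i(<w_i, x>).
def product_from_squares(x1, x2):
    return ((x1 + x2) ** 2 - (x1 - x2) ** 2) / 4.0

print(product_from_squares(3.0, -2.5))   # -7.5
```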

  24. Define E(f, V)_p = inf_{P∈V} ‖f − P‖_{L_p(B)}. Theorem: f ∈ W_m^d ⇒ E(f, N_N)_p ≤ C N^{−m/d}. Proof: classically, E(f, P_k^d)_p ≤ C k^{−m}; by the previous slides, a network with N ≈ kᵈ units represents P_k^d; combining the two bounds gives the theorem.

  25. Remark: even a shallow net can represent polynomials in P_k^d well, with ≈ kᵈ units.

  26. Depth. For general functions, deep and shallow nets both suffer the curse of dimensionality. But for Local Hierarchical Compositional (LHC) functions, deep nets, unlike shallow ones, do not have the curse.

  27. LHC functions. Simplest example: f(x₁, …, x₈) = h₃(h₂₁(h₁₁(x₁, x₂), h₁₂(x₃, x₄)), h₂₂(h₁₃(x₅, x₆), h₁₄(x₇, x₈))), a binary tree of bivariate constituent functions. Another example: f(x₁, x₂) = Ax₁x₂ + Bx₁ + Cx₂.
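The 8-input binary tree can be sketched in a few lines. Using the same toy node function h at every position is an assumption for illustration; the slide allows a different hᵢⱼ at each node:

```python
# Simplest LHC example: 8 inputs combined pairwise through a binary tree
# of bivariate nodes.
def h(a, b):
    return a * b + 1.0   # an arbitrary smooth bivariate node

def f_tree(x):
    layer = list(x)
    while len(layer) > 1:   # combine adjacent pairs until one value is left
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

print(f_tree([1.0] * 8))   # layers: 1 -> 2 -> 5 -> 26
```

Each node sees only 2 inputs, however large d is; that locality is what the theorem on slide 29 exploits.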

  28. Intuition via unit counting: for f(x₁, …, x₄) = h₂(h₁₁(x₁, x₂), h₁₂(x₃, x₄)), a shallow net must approximate f as a whole, while a deep net with the same graph spends a few units per node — e.g. 4 units for each of the 3 nodes, 12 units total.

  29. Theorem. Deep nets with the same graph as f approximate compositional functions of d variables with constituent functions in W_m^2 using O(ε^{−2/m}) units per node, O((d − 1) ε^{−2/m}) units total. Proof: each h can be approximated with O(ε^{−2/m}) units, since each node has only 2 inputs. We assume each h is Lipschitz continuous.

  30. Each h is Lipschitz, |h(x) − h(x′)| ≤ L|x − x′|, and by hypothesis ‖h − P‖ ≤ ε at each node. Then ‖h₂(h₁₁, h₁₂) − P₂(P₁₁, P₁₂)‖ ≤ ‖h₂(h₁₁, h₁₂) − h₂(P₁₁, P₁₂)‖ + ‖h₂(P₁₁, P₁₂) − P₂(P₁₁, P₁₂)‖ ≤ L(‖h₁₁ − P₁₁‖ + ‖h₁₂ − P₁₂‖) + ε ≤ (2L + 1)ε, by Minkowski and the Lipschitz hypothesis. Similar theorems hold for more general DAGs.
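The error-propagation step can be checked numerically for a two-level tree f = h₂(h₁₁, h₁₂). All functions and the ε-perturbations below are illustrative choices; h₂(a, b) = (a + b)/2 satisfies the Lipschitz condition with L = 1:

```python
# Numeric check that the composite error stays below L*(e11 + e12) + e2.
eps = 0.01
h11 = lambda a, b: a + b
h12 = lambda a, b: a * b
h2 = lambda a, b: 0.5 * (a + b)

# eps-accurate surrogates for each node (each perturbed by exactly eps):
p11 = lambda a, b: h11(a, b) + eps
p12 = lambda a, b: h12(a, b) - eps
p2 = lambda a, b: h2(a, b) + eps

x = (0.3, -1.2, 0.7, 2.0)
true = h2(h11(x[0], x[1]), h12(x[2], x[3]))
appr = p2(p11(x[0], x[1]), p12(x[2], x[3]))
err = abs(true - appr)
bound = 1.0 * (eps + eps) + eps   # L*(e11 + e12) + e2 with L = 1
print(err, bound)
```

The point of the proof is that the error grows only linearly in the number of nodes, never exponentially in d.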

  31. This theorem may explain why deep nets are successful, and why the really good ones are CNNs. Locality, not weight sharing, is the key: weight sharing helps, but not exponentially.
