Deep Approximation via Deep Learning Zuowei Shen Department of Mathematics National University of Singapore
Outline
1 Introduction to approximation theory
2 Approximation of functions by compositions
3 Approximation rate in terms of the number of neurons
A brief introduction
For a given function f : R^d → R and ε > 0, approximation is to find a simple function g such that ‖f − g‖ < ε.
The function g : R^n → R can be as simple as g(x) = a · x. To make sense of this approximation, we need to find a map T : R^d → R^n such that ‖f − g ∘ T‖ < ε.
In practice, we only have sample data {(x_i, f(x_i))}_{i=1}^m of f, and one needs to develop algorithms to find T.
1 Classical approximation: T is independent of f or the data, while n depends on ε.
2 Learning: T is learned from the data and determined by a few parameters; n depends on ε.
3 Deep learning: T is fully learned from the data with a huge number of parameters. T is a composition of many simple maps, and n can be independent of ε.
Classical approximation
Linear approximation: Given a finite, fixed set of generators {φ_1, ..., φ_n}, e.g. splines, wavelet frames, finite elements, or generators of reproducing kernel Hilbert spaces, define
T = [φ_1, φ_2, ..., φ_n]^T : R^d → R^n and g(x) = a · x.
The linear approximation is to find a ∈ R^n such that
g ∘ T = Σ_{i=1}^n a_i φ_i ∼ f.
It is linear because f_1 ∼ g_1, f_2 ∼ g_2 ⇒ f_1 + f_2 ∼ g_1 + g_2.
The best n-term approximation: Given a dictionary D that may have infinitely many generators, e.g. D = {φ_i}_{i=1}^∞, define
T = [φ_1, φ_2, ...]^T : R^d → R^∞ and g(x) = a · x.
The best n-term approximation of f is to find a with n nonzero entries such that g ∘ T ∼ f is the best approximation among all n-term choices. It is nonlinear because f_1 ∼ g_1, f_2 ∼ g_2 does not imply f_1 + f_2 ∼ g_1 + g_2, as the supports of a_1 and a_2 depend on f_1 and f_2.
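As a concrete illustration (my own standard example, not from the slides): take d = 1 and let φ_1, ..., φ_n be the piecewise-linear hat functions on fixed nodes t_1 < ... < t_n. Choosing a_i = f(t_i) gives g ∘ T = Σ_{i=1}^n f(t_i) φ_i, the piecewise-linear interpolant of f; the scheme is linear because the generators and nodes are fixed in advance. A best n-term scheme over a wavelet dictionary instead keeps only the n largest wavelet coefficients of f, so the selected generators change with f.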
Examples
Consider the function space L^2(R^d) and let {φ_i}_{i=1}^∞ be an orthonormal basis of L^2(R^d).
Linear approximation: For a given n, T = [φ_1, ..., φ_n]^T and g = a · x, where a_j = ⟨f, φ_j⟩. Denote H = span{φ_1, ..., φ_n} ⊆ L^2(R^d). Then
g ∘ T = Σ_{i=1}^n ⟨f, φ_i⟩ φ_i
is the orthogonal projection of f onto the space H and is the best approximation of f from H.
g ∘ T provides a good approximation of f when the sequence {⟨f, φ_j⟩}_{j=1}^∞ decays fast as j → +∞. Therefore,
1 Linear approximation provides a good approximation for smooth functions.
2 Advantage: it is a good approximation scheme when d is small, the domain is simple, and the function has a complicated form but is smooth.
3 Disadvantage: it does not do well if d is big and/or the domain of f is complex.
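A minimal finite-dimensional sketch of this projection (my own illustration, not from the slides): replace L^2(R^d) by R^N with an orthonormal basis {φ_i}, and approximate a vector f by its orthogonal projection onto span{φ_1, ..., φ_n}.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 256, 16

# Orthonormal basis of R^N (columns of Q), obtained here from a QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

f = rng.standard_normal(N)        # stand-in for the target function f
a = Q.T @ f                       # coefficients a_j = <f, phi_j>

# Linear approximation: keep the first n coefficients (T is fixed in advance).
g_of_T = Q[:, :n] @ a[:n]         # g(T(x)) = sum_{i<=n} <f, phi_i> phi_i
print("linear approximation error:", np.linalg.norm(f - g_of_T))
```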
Examples
The best n-term approximation: T = (φ_j)_{j=1}^∞ : R^d → R^∞ and g(x) = a · x, where each a_j is
a_j = ⟨f, φ_j⟩ if |⟨f, φ_j⟩| is among the n largest terms of the sequence {|⟨f, φ_j⟩|}_{j=1}^∞, and a_j = 0 otherwise.
The approximation of f by g ∘ T depends less on the decay of the sequence {|⟨f, φ_j⟩|}_{j=1}^∞. Therefore,
1 The best n-term approximation is better than the linear approximation when f is nonsmooth.
2 It is still not a good scheme if d is big and/or the domain of f is complex.
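A minimal sketch of the n-term rule (my own illustration): keep the n largest coefficients |⟨f, φ_j⟩| instead of the first n, so the selected basis elements adapt to f.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 256, 16
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))    # orthonormal basis of R^N

# A target that is sparse in the basis, but not in the first n coordinates.
coeffs = np.zeros(N)
coeffs[rng.choice(N, size=n, replace=False)] = rng.standard_normal(n)
f = Q @ coeffs

a = Q.T @ f
keep = np.argsort(np.abs(a))[-n:]      # indices of the n largest |<f, phi_j>|
a_nterm = np.zeros(N)
a_nterm[keep] = a[keep]

print("linear error :", np.linalg.norm(f - Q[:, :n] @ a[:n]))
print("n-term error :", np.linalg.norm(f - Q @ a_nterm))   # ~0 for this f
```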
Approximation for deep learning
Given data {(x_i, f(x_i))}_{i=1}^m:
1 The key of deep learning is to construct T from the given data and a chosen g.
2 T can simplify the domain of f through a change of variables while keeping the key features of the domain of f, so that
3 it is robust to approximate f by g ∘ T (a minimal sketch of this setup follows).
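A minimal sketch of this setup (my own illustration, assuming PyTorch is available and using a hypothetical target f): learn T : R^d → R^n as a small ReLU network and g(x) = a · x as a linear map, then fit g ∘ T to the samples (x_i, f(x_i)).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n, m = 8, 4, 2000

def f(x):                                    # hypothetical target function
    return torch.sin(x.sum(dim=1, keepdim=True))

X = torch.randn(m, d)                        # sample data {(x_i, f(x_i))}
Y = f(X)

T = nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                  nn.Linear(64, 64), nn.ReLU(),
                  nn.Linear(64, n))          # learned change of variables T
g = nn.Linear(n, 1, bias=False)              # simple g(x) = a . x

opt = torch.optim.Adam(list(T.parameters()) + list(g.parameters()), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = ((g(T(X)) - Y) ** 2).mean()       # empirical || f - g o T ||^2
    loss.backward()
    opt.step()

print("final training MSE:", float(loss))
```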
Classical approximation vs deep learning
For both the linear and the best n-term approximations, T is fixed. Neither of them suits approximating f when f is defined on a complex domain, e.g. a manifold in a very high dimensional space.
For deep learning, T is constructed from and adapted to the given data. T changes variables and maps the domain of f to match that of a simple function g. It is normally used to approximate f with a complex domain.
What is the mathematics behind this? Setting: construct a measurable map T : R^d → R^n and a simple function g (e.g. g = a · x) from the data such that the features of the domain of f can be rearranged by T to match those of g. This leads to g ∘ T providing a good approximation of f.
Outline
1 Introduction to approximation theory
2 Approximation of functions by compositions
3 Approximation rate in terms of the number of neurons
Approximation by compositions (with Qianxiao Li and Cheng Tai)
Question 1: For given f and g, is there a measurable T : R^d → R^n such that f = g ∘ T?
Answer: Yes! We have proven
Theorem. Let f : R^d → R and g : R^n → R, and assume Im(f) ⊆ Im(g) and g is continuous. Then there exists a measurable map T : R^d → R^n such that
f = g ∘ T, a.e.
This is an existence proof; T cannot be written out analytically. This leads to the following relaxed question.
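A simple sanity check of the statement (my own illustration, not part of the slides): take n = 1 and g(y) = y, so Im(g) = R ⊇ Im(f); then T = f itself satisfies f = g ∘ T exactly, provided f is measurable. The content of the theorem is that a fixed, continuous g whose image covers Im(f) already suffices, at the price of a T that is only known to exist and is not given in closed form.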
Approximation by compositions
Question 2: For arbitrarily given ε > 0, can one construct a measurable T : R^d → R^n such that ‖f − g ∘ T‖ ≤ ε?