Basics of Numerical Optimization: Preliminaries
Ju Sun
Computer Science & Engineering, University of Minnesota, Twin Cities
February 11, 2020
Supervised learning as function approximation
– Underlying true function: f_0
– Training data: {x_i, y_i} with y_i ≈ f_0(x_i)
– Choose a family of functions H, so that ∃ f ∈ H with f and f_0 close
– Find f, i.e., optimization:
    min_{f ∈ H}  ∑_i ℓ(y_i, f(x_i)) + Ω(f)
– Approximation capacity: universal approximation theorems (UAT) ⟹ replace H by DNN_W, i.e., a deep neural network with weights W
– Optimization (a minimal code sketch follows below):
    min_W  ∑_i ℓ(y_i, DNN_W(x_i)) + Ω(W)
– Generalization: how to avoid over-complicated DNN_W in view of UAT
Now we start to focus on optimization.
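A minimal sketch of this training problem, min_W ∑_i ℓ(y_i, DNN_W(x_i)) + Ω(W), assuming a toy 1-D regression task in PyTorch with squared loss as ℓ and weight decay playing the role of Ω(W); the choice of f_0, the two-layer network, and all hyperparameters are illustrative, not from the slides.

```python
# Minimize the regularized empirical risk over the weights W of a small network.
import torch

torch.manual_seed(0)
f0 = lambda x: torch.sin(3 * x)                 # underlying true function f0
x = torch.rand(200, 1) * 2 - 1                  # training inputs x_i in [-1, 1]
y = f0(x) + 0.05 * torch.randn_like(x)          # noisy labels y_i ≈ f0(x_i)

dnn = torch.nn.Sequential(                      # DNN_W: a small hypothesis class
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
loss = torch.nn.MSELoss()                       # ℓ(y, f(x)) = (y - f(x))^2
opt = torch.optim.SGD(dnn.parameters(), lr=0.1,
                      weight_decay=1e-4)        # weight decay acts as Ω(W)

for _ in range(2000):                           # gradient-based minimization of the objective
    opt.zero_grad()
    loss(dnn(x), y).backward()
    opt.step()

print("final training loss:", loss(dnn(x), y).item())
```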
Outline
– Elements of multivariate calculus
– Optimality conditions of unconstrained optimization
Recommended references: [Munkres, 1997, Zorich, 2015, Coleman, 2012]
Our notation
– scalars: x, vectors: x, matrices: X, tensors: X, sets: S
– vectors are always column vectors, unless stated otherwise
– x_i: i-th element of x; x_{ij}: (i, j)-th element of X; x_i: i-th row of X as a row vector; x_j: j-th column of X as a column vector (see the NumPy illustration below)
– R: real numbers, R_+: positive reals, R^n: space of n-dimensional vectors, R^{m×n}: space of m×n matrices, R^{m×n×k}: space of m×n×k tensors, etc.
– [n] := {1, ..., n}
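A small NumPy illustration of the indexing conventions above; note NumPy is 0-indexed, while the slides index entries starting from 1, and the array shapes here are arbitrary examples.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])          # a (column) vector x in R^3
X = np.arange(12.0).reshape(3, 4)      # a matrix X in R^{3x4}
T = np.arange(24.0).reshape(2, 3, 4)   # a tensor in R^{2x3x4}

print(x[0])       # x_1: first entry of x
print(X[1, 2])    # x_{23}: (2, 3)-th entry of X
print(X[1, :])    # second row of X (returned as a 1-D array)
print(X[:, 2])    # third column of X (returned as a 1-D array)
print(T.shape)    # (2, 3, 4)
```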
Differentiability — first order
Consider f(x): R^n → R^m.
– Definition: f is first-order differentiable at a point x if there exists a matrix B ∈ R^{m×n} such that
    (f(x + δ) − f(x) − Bδ) / ‖δ‖_2 → 0 as δ → 0,
  i.e., f(x + δ) = f(x) + Bδ + o(‖δ‖_2) as δ → 0 (a numerical check of this expansion is sketched below).
– B is called the (Fréchet) derivative. When m = 1, the column vector B^⊺ is called the gradient, denoted ∇f(x). For general m, B is also called the Jacobian matrix, denoted J_f(x).
– Calculation: b_{ij} = ∂f_i/∂x_j (x)
– Sufficient condition: if all partial derivatives exist and are continuous at x, then f(x) is differentiable at x.
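A small numerical check of the definition, assuming the concrete map f(x) = (x_1 x_2, sin x_1) from R^2 to R^2 (chosen only for illustration): with B the analytic Jacobian, the remainder f(x + δ) − f(x) − Bδ should vanish faster than ‖δ‖_2.

```python
import numpy as np

def f(x):
    return np.array([x[0] * x[1], np.sin(x[0])])

def jacobian(x):                 # analytic Jacobian B = J_f(x), entries b_ij = ∂f_i/∂x_j
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0]])

x = np.array([0.7, -1.3])
B = jacobian(x)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    d = t * np.array([1.0, 2.0])                     # shrink the perturbation δ
    rem = np.linalg.norm(f(x + d) - f(x) - B @ d)    # first-order remainder
    print(f"||d|| = {np.linalg.norm(d):.1e},  remainder/||d|| = {rem / np.linalg.norm(d):.2e}")
```

The printed ratio decreases roughly linearly in ‖δ‖_2, consistent with the o(‖δ‖_2) remainder.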
Calculus rules
Assume f, g: R^n → R^m are differentiable at a point x ∈ R^n.
– Linearity: λ_1 f + λ_2 g is differentiable at x and ∇[λ_1 f + λ_2 g](x) = λ_1 ∇f(x) + λ_2 ∇g(x)
– Product: assume m = 1; fg is differentiable at x and ∇[fg](x) = f(x) ∇g(x) + g(x) ∇f(x)
– Quotient: assume m = 1 and g(x) ≠ 0; f/g is differentiable at x and
    ∇[f/g](x) = (g(x) ∇f(x) − f(x) ∇g(x)) / g²(x)
– Chain rule: let f: R^n → R^m and h: R^m → R^k, with f differentiable at x, y = f(x), and h differentiable at y. Then h ∘ f: R^n → R^k is differentiable at x, and
    J[h ∘ f](x) = J_h(f(x)) J_f(x).
  When k = 1, ∇[h ∘ f](x) = J_f^⊺(x) ∇h(f(x)) (numerical check below).
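A quick numerical check of the k = 1 chain rule, ∇[h ∘ f](x) = J_f^⊺(x) ∇h(f(x)), using the illustrative (not from the slides) choices f(x) = (x_1², x_1 x_2, sin x_2) and h(y) = y_1 + y_2 y_3.

```python
import numpy as np

def f(x):
    return np.array([x[0]**2, x[0]*x[1], np.sin(x[1])])

def Jf(x):                                   # Jacobian of f, shape (3, 2)
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    np.cos(x[1])]])

def h(y):
    return y[0] + y[1]*y[2]

def grad_h(y):                               # gradient of h, shape (3,)
    return np.array([1.0, y[2], y[1]])

x = np.array([0.4, 1.1])
chain = Jf(x).T @ grad_h(f(x))               # gradient of h ∘ f via the chain rule

eps = 1e-6                                   # central finite differences of h ∘ f
numeric = np.array([(h(f(x + eps*e)) - h(f(x - eps*e))) / (2*eps) for e in np.eye(2)])
print(chain, numeric)                        # the two vectors should agree to ~1e-9
```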
Differentiability — second order
Consider f(x): R^n → R and assume f is 1st-order differentiable in a small ball around x.
– Write ∂²f/∂x_j∂x_i (x) := ∂/∂x_j ( ∂f/∂x_i ) (x), provided the right-hand side is well defined.
– Symmetry: if both ∂²f/∂x_j∂x_i (x) and ∂²f/∂x_i∂x_j (x) exist and are continuous at x, then they are equal.
– Hessian (matrix):
    ∇²f(x) := [ ∂²f/∂x_j∂x_i (x) ]_{j,i} ∈ R^{n×n},    (1)
  whose (j, i)-th element is ∂²f/∂x_j∂x_i (x) (a finite-difference symmetry check is sketched below).
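A minimal sketch that builds the Hessian of an illustrative function f(x) = x_1² x_2 + exp(x_1 x_3) by central differences of its (analytic) gradient, and checks that the result is numerically symmetric, as the slide states; the function and evaluation point are arbitrary examples.

```python
import numpy as np

def grad_f(x):                       # analytic gradient of f(x) = x1^2 * x2 + exp(x1 * x3)
    return np.array([2*x[0]*x[1] + x[2]*np.exp(x[0]*x[2]),
                     x[0]**2,
                     x[0]*np.exp(x[0]*x[2])])

def hessian_fd(x, eps=1e-6):         # H[j, i] ≈ ∂/∂x_j (∂f/∂x_i)(x), by central differences
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[j, :] = (grad_f(x + e) - grad_f(x - e)) / (2*eps)
    return H

H = hessian_fd(np.array([0.5, -1.0, 0.3]))
print(np.round(H, 4))
print("symmetry gap:", np.abs(H - H.T).max())   # should be ~1e-8 or smaller
```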