What is Parameter Optimization? Optimization Techniques


15-486/782: Artificial Neural Networks, Dave Touretzky, Fall 2006
(based on slides by A. Courville, Spring 2002, and K. Laskowski, Spring 2004)

Reading: C.M. Bishop, NNPR §7

Parameter optimization is a fancy name for training: the selection of parameter values which are optimal in some desired sense (e.g. they minimize an objective function you choose over a dataset you choose). The parameters are the weights and biases of the network.

In this lecture we will not address learning of network structure; we assume a fixed number of layers and a fixed number of hidden units.

In neural networks, training is typically iterative and time-consuming. It is in our interest to reduce the training time as much as possible.

Lecture Outline

In detail:
1. Gradient Descent (and some extensions)
2. Line Search
3. Conjugate Gradient Search

In passing:
4. Newton's method
5. Quasi-Newton methods

We will not cover Model Trust Region methods (Scaled Conjugate Gradients, Levenberg-Marquardt).

Linear Optimization

Applicable to networks with exclusively linear units (which can therefore be reduced to single-layer networks). The optimal weights are found in one step. Collect the weights, the N training inputs (each with a leading 1 for the bias), and the N target outputs into matrices:

$$
W^* =
\begin{pmatrix}
w^*_{1,0} & w^*_{1,1} & w^*_{1,2} & \cdots & w^*_{1,I} \\
w^*_{2,0} & w^*_{2,1} & w^*_{2,2} & \cdots & w^*_{2,I} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
w^*_{K,0} & w^*_{K,1} & w^*_{K,2} & \cdots & w^*_{K,I}
\end{pmatrix}
\qquad
X =
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_1(1) & x_1(2) & \cdots & x_1(N) \\
\vdots & \vdots & \ddots & \vdots \\
x_I(1) & x_I(2) & \cdots & x_I(N)
\end{pmatrix}
\qquad
Y =
\begin{pmatrix}
y_1(1) & y_1(2) & \cdots & y_1(N) \\
y_2(1) & y_2(2) & \cdots & y_2(N) \\
\vdots & \vdots & \ddots & \vdots \\
y_K(1) & y_K(2) & \cdots & y_K(N)
\end{pmatrix}
$$

We want W* · X ≍ Y. Multiplying both sides on the right by X^T and solving:

$$
W^* \cdot X \cdot X^T = Y \cdot X^T
\qquad\Longrightarrow\qquad
W^* = \left( Y \cdot X^T \right) \left( X \cdot X^T \right)^{-1}
$$

This is linear regression. A good idea to always try first – maybe you don't need non-linearities.
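A practical aside, not from the original slides: when evaluating this closed-form solution numerically, it is usually better to avoid forming the explicit inverse of X·X^T. A minimal Matlab sketch, assuming X and Y are laid out as above (one training case per column, with a leading row of ones in X); the variable names here are illustrative:

% Solve W * (X*X') = Y*X' without forming inv(X*X') explicitly.
% Right matrix division (/) and pinv are generally better behaved
% numerically than multiplying by an explicit inverse.
W_slash = (Y*X') / (X*X');    % right division: solves W*(X*X') = Y*X'
W_pinv  = Y * pinv(X);        % Moore-Penrose pseudo-inverse of X

When X·X^T is well conditioned, these agree with (Y*X')*inv(X*X'), as the "Try It In Matlab" demo below shows for inv and pinv.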

Non-linear Optimization

Given a fixed neural network architecture with non-linearities, we seek iterative algorithms which implement a search in parameter space:

$$
w^{(\tau+1)} = w^{(\tau)} + \Delta w^{(\tau)}, \qquad w \in \mathbb{R}^W
$$

At each timestep τ, Δw^(τ) is chosen to reduce an objective (error) function E({x, t}; w). For example, for a network with K linear output units, the appropriate choice is the sum-of-squares error

$$
E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} \left\| y_k(x_n; w) - t_{nk} \right\|^2
$$

where N is the number of patterns.

Try It In Matlab

>> W = [1 2 3];
>> x = [ ones(1,50); rand(2,50) ];
>> y = W*x;
>> W_star = (y*x') * inv(x*x')

W_star =

    1.0000    2.0000    3.0000

>> W_hat = y * pinv(x)

W_hat =

    1.0000    2.0000    3.0000

Approximating Error Surface Behaviour

Holding the dataset {x, t} fixed, consider a second order Taylor series expansion of E(w) about a point w_0:

$$
E(w) = E(w_0) + (w - w_0)^T b + \frac{1}{2} (w - w_0)^T H (w - w_0) \tag{1}
$$

where b is the gradient of E at w_0 and H is the Hessian of E at w_0:

$$
b \equiv \nabla E \big|_{w_0} =
\begin{pmatrix}
\frac{\partial E}{\partial w_1} \\
\vdots \\
\frac{\partial E}{\partial w_W}
\end{pmatrix}_{w_0}
\qquad
H \equiv \nabla^2 E \big|_{w_0} =
\begin{pmatrix}
\frac{\partial^2 E}{\partial w_1 \partial w_1} & \cdots & \frac{\partial^2 E}{\partial w_1 \partial w_W} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 E}{\partial w_W \partial w_1} & \cdots & \frac{\partial^2 E}{\partial w_W \partial w_W}
\end{pmatrix}_{w_0}
$$

In a similar way, we can define a first order approximation to the gradient:

$$
\nabla E \big|_{w} = b + H (w - w_0) \tag{2}
$$

The Parameter Space

[Figure: a two-layer network with inputs x_1 … x_I plus a bias unit, hidden units z_1 … z_J plus a bias unit, and outputs y_1 … y_K, annotated with the two weight matrices]

$$
W_1 =
\begin{pmatrix}
w^{(1)}_{0,1} & w^{(1)}_{1,1} & \cdots & w^{(1)}_{I,1} \\
\vdots & \vdots & \ddots & \vdots \\
w^{(1)}_{0,J} & w^{(1)}_{1,J} & \cdots & w^{(1)}_{I,J}
\end{pmatrix}
\in \mathbb{R}^{(J,\, I+1)}
\qquad
W_2 =
\begin{pmatrix}
w^{(2)}_{0,1} & w^{(2)}_{1,1} & \cdots & w^{(2)}_{J,1} \\
\vdots & \vdots & \ddots & \vdots \\
w^{(2)}_{0,K} & w^{(2)}_{1,K} & \cdots & w^{(2)}_{J,K}
\end{pmatrix}
\in \mathbb{R}^{(K,\, J+1)}
$$

We want to think of (and operate on) the weight matrices as a single vector:

$$
w = \mathrm{mapping}(W_1, W_2), \qquad w \in \mathbb{R}^W, \qquad W = J(I+1) + K(J+1)
$$

It doesn't matter what the mapping is, as long as we can reverse it when necessary.
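One concrete choice of mapping, as a minimal Matlab sketch (the layer sizes and variable names here are illustrative assumptions, not from the original slides): flatten each weight matrix column-wise and concatenate, then reverse with reshape.

% Pack two weight matrices into a single parameter vector w, and
% unpack w back into matrices of the original shapes.
I = 4; J = 3; K = 2;                    % example layer sizes (assumed)
W1 = randn(J, I+1);                     % input-to-hidden weights, incl. bias column
W2 = randn(K, J+1);                     % hidden-to-output weights, incl. bias column

w = [W1(:); W2(:)];                     % the mapping: column-major flattening,
                                        % length(w) == J*(I+1) + K*(J+1)

W1_back = reshape(w(1 : J*(I+1)), J, I+1);        % reverse mapping
W2_back = reshape(w(J*(I+1)+1 : end), K, J+1);    % recovers W1 and W2 exactly

A gradient computed with respect to w can be unpacked into matrix form in exactly the same way.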

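To see equations (1) and (2) in action, here is a small numerical sketch, not from the original slides: for the sum-of-squares error of a purely linear model, E is exactly quadratic in w, so the second-order expansion about any w_0 reproduces E(w) exactly. The toy data and the expansion point below are arbitrary illustrative choices.

% Sum-of-squares error of a linear model y = w'*x on fixed toy data.
x = rand(2, 20);                        % 2 inputs, 20 patterns (toy data)
t = [1.5 -0.7] * x;                     % targets from an assumed "true" weight vector
E = @(w) 0.5 * sum((w'*x - t).^2);

w0 = [0; 0];                            % expansion point
b  = x * (x'*w0 - t');                  % gradient of E at w0
H  = x * x';                            % Hessian of E (constant, since E is quadratic)

w = [2; -1];                            % some other point in parameter space
E_quad    = E(w0) + (w - w0)'*b + 0.5*(w - w0)'*H*(w - w0);   % eq. (1)
grad_quad = b + H*(w - w0);                                   % eq. (2)

% Up to rounding, E_quad equals E(w) and grad_quad equals the true
% gradient x*(x'*w - t'), because this E has no higher-order terms.

For a network with non-linear hidden units, E is no longer quadratic, and (1) and (2) hold only as local approximations near w_0.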


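Finally, as a preview of the first item in the lecture outline, here is a hedged sketch, not from the original slides, of the simplest choice of update: plain gradient descent, Δw^(τ) = −η ∇E at w^(τ), applied to the sum-of-squares error of a linear model. The learning rate η and the iteration count are arbitrary illustrative choices.

% Plain gradient descent on E(w) = 0.5*sum((w'*x - t).^2).
% Iterative update: w(tau+1) = w(tau) + delta_w(tau), delta_w = -eta*grad.
x   = [ones(1, 50); rand(2, 50)];       % inputs with a bias row, as in the demo
t   = [1 2 3] * x;                      % targets from a known weight vector
w   = zeros(3, 1);                      % initial parameters
eta = 0.005;                            % learning rate (assumed small enough)

for tau = 1:10000
    grad = x * (x'*w - t');             % gradient of the sum-of-squares error
    w = w - eta * grad;                 % step downhill
end

% w now approaches [1; 2; 3], the solution the closed-form linear
% regression found in a single step.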