An Optimization Methodology for Neural Network Weights and Architectures

Teresa B. Ludermir
tbl@cin.ufpe.br
Centro de Informática – Universidade Federal de Pernambuco
Outline

• Motivation
• Simulated Annealing and Tabu Search
• Optimization Methodology
• Implementation Details
• Experiments and Results
• Final Remarks
Motivation

• Architecture design is crucial in MLP applications.
• A lack of connections can make the network incapable of solving a problem, because there are too few parameters to adjust.
• Too many connections can cause overfitting.
• In general, we try many different architectures.
• It is therefore important to develop automatic processes for defining MLP architectures.
Motivation

• There are several global optimization methods that can be used to deal with this problem.
• Ex.: genetic algorithms, simulated annealing and tabu search.
• Architecture design for MLPs can be formulated as an optimization problem, where each solution represents an architecture.
• The cost measure can be a function of the training error and the network size.
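For instance, a typical cost of this kind (an illustrative assumption, not necessarily the exact formulation used in this work) blends the training error with the fraction of connections in use:

    f(s) = λ · E_train(s) + (1 − λ) · C(s) / C_max

where E_train(s) is the training error of the network encoded by solution s, C(s) is its number of active connections, C_max is the maximum possible number of connections, and λ ∈ [0, 1] balances accuracy against network size.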
Motivation

• Most solutions represent only topological information, not the weight values.
• Disadvantage: noisy fitness evaluation.
• Each solution contains only the architecture, but a network with a full set of weights must be used to calculate the training error for the cost function.
• Good option: optimizing neural network architectures and weights simultaneously.
• Each point in the search space is then a fully specified ANN with complete weight information.
• Cost evaluation becomes more accurate.
Motivation

• Global optimization techniques are relatively inefficient at fine-tuned local search.
• Hybrid training:
• A global technique for training the network, followed by a local algorithm (Ex.: backpropagation) to improve the generalization performance.
Goal

• A methodology for the simultaneous optimization of MLP network weights and architectures.
• It combines the advantages of simulated annealing and tabu search while avoiding the limitations of both methods.
• It applies backpropagation as a local search algorithm to improve the weight adjustments.
• Results from the application of the methodology to real-world problems are presented and compared to those obtained by BP, SA and TS.
Simulated Annealing

• The method has the ability to escape from local minima due to the probability of accepting a new solution that increases the cost.
• This probability is regulated by a parameter called temperature, which is decreased during the optimization process.
• In many cases, the method may take a very long time to converge if the temperature reduction rule is too slow.
• However, a slow rule is often necessary in order to allow an efficient exploration of the search space.
Implementation Details

• Basic structure of simulated annealing:

    s_0 ← initial solution in S
    For i = 0 to I − 1
        Generate a neighbor solution s′
        If f(s′) ≤ f(s_i)
            s_{i+1} ← s′
        Else
            s_{i+1} ← s′ with probability e^{−[f(s′) − f(s_i)] / T_{i+1}}
            otherwise s_{i+1} ← s_i
    Return s_I

• S is the set of solutions, f is the real-valued cost function, I is the maximum number of epochs, and T_i is the temperature of epoch i.
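A minimal runnable sketch of this loop in Python (the helpers neighbor, cost and temperature are assumed to be supplied by the caller; they are not part of the original slides):

```python
import math
import random

def simulated_annealing(s0, neighbor, cost, temperature, num_iters):
    """Basic SA loop as in the pseudocode above.

    neighbor(s) returns a random neighbor of s, cost(s) computes f(s),
    and temperature(i) returns the temperature T_i of epoch i.
    """
    s = s0
    for i in range(num_iters):
        s_new = neighbor(s)
        delta = cost(s_new) - cost(s)
        if delta <= 0:
            s = s_new  # improving (or equal) moves are always accepted
        elif random.random() < math.exp(-delta / temperature(i + 1)):
            s = s_new  # worsening move accepted with probability e^(-delta/T)
    return s

# A commonly used geometric cooling schedule (an assumption, not
# necessarily the schedule adopted in this work):
# temperature = lambda i: 1.0 * (0.95 ** i)
```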
Tabu Search

• Tabu search evaluates many new solutions in each iteration, instead of only one.
• The best solution (i.e., the one with the lowest cost) is always accepted as the current solution.
• This strategy makes tabu search faster than simulated annealing.
• It demands implementing a list containing a set of recently visited solutions (the tabu list), in order to avoid the acceptance of previously evaluated solutions.
• Using the tabu list to compare new solutions to the prohibited (tabu) solutions increases the computational cost of tabu search when compared to simulated annealing.
Implementation Details

• Basic structure of tabu search:

    s_0 ← initial solution in S
    s_BSF ← s_0 (best solution so far)
    Insert s_0 in the tabu list
    For i = 0 to I − 1
        Generate a set V of neighbor solutions
        Choose the best solution s′ in V (i.e., f(s′) ≤ f(s) for any s in V) that is not in the tabu list
        s_{i+1} ← s′
        Insert s_{i+1} in the tabu list
        If f(s_{i+1}) ≤ f(s_BSF)
            s_BSF ← s_{i+1}
    Return s_BSF

• The tabu list stores the K most recently visited solutions.
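A minimal Python sketch of this loop (neighbors and cost are assumed caller-supplied helpers; solutions must support equality tests for the tabu check):

```python
from collections import deque

def tabu_search(s0, neighbors, cost, num_iters, tabu_size):
    """Basic tabu search as in the pseudocode above.

    neighbors(s) returns a set V of candidate neighbors of s; the tabu
    list keeps only the tabu_size most recently visited solutions.
    """
    s = s0
    best = s0
    tabu = deque([s0], maxlen=tabu_size)  # oldest entries drop out automatically
    for _ in range(num_iters):
        candidates = [v for v in neighbors(s) if v not in tabu]
        if not candidates:  # every neighbor is tabu; stop early
            break
        s = min(candidates, key=cost)  # best (lowest-cost) non-tabu neighbor
        tabu.append(s)
        if cost(s) <= cost(best):
            best = s
    return best
```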
Optimization Methodology

• A set of new solutions is generated at each iteration, and the best one is selected according to the cost function, as in tabu search.
• However, it is possible to accept a new solution that increases the cost, since this decision is guided by a probability distribution, the same one used by simulated annealing.
• During the execution of the methodology, the topology and the weights are optimized, and the best solution found so far (s_BSF) is stored.
• At the end of this process, the MLP architecture contained in s_BSF is kept constant, and its weights are taken as the initial ones for training with the backpropagation algorithm, in order to perform a fine-tuned local search.
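A minimal sketch of how this combined loop might look in Python (the helpers neighbors, cost, temperature and backprop_finetune are assumptions introduced here for illustration, not the authors' code):

```python
import math
import random

def hybrid_optimize(s0, neighbors, cost, temperature, num_iters, backprop_finetune):
    """Tabu-style candidate sets with SA-style probabilistic acceptance,
    followed by backpropagation fine-tuning of the best solution found.

    backprop_finetune(s) is assumed to keep the architecture in s fixed
    and refine its weights with backpropagation, returning the result.
    """
    s = s0
    best = s0
    for i in range(num_iters):
        # Evaluate a set of neighbors and pick the best one (as in tabu search).
        s_new = min(neighbors(s), key=cost)
        delta = cost(s_new) - cost(s)
        # Accept cost increases with the SA probability e^(-delta/T).
        if delta <= 0 or random.random() < math.exp(-delta / temperature(i + 1)):
            s = s_new
        if cost(s) <= cost(best):
            best = s  # keep the best solution found so far (s_BSF)
    # Freeze the architecture in best and fine-tune its weights locally.
    return backprop_finetune(best)
```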