ReLU and Maxout Networks and Their Possible Connections to Tropical Methods
Jörg Zimmermann, AG Weber
Institute of Computer Science, University of Bonn, Germany
Overview
1. Artificial Neural Networks
2. Activation Functions
3. Maxout Networks: are they tropical?
Artificial Neural Networks
• Network of functions
• If the network graph is acyclic (a DAG), it is called a Feedforward Neural Network (FNN).
• If the network graph contains loops, it is called a Recurrent Neural Network (RNN).
Feedforward Neural Networks
• FNNs can be organised in layers: input layer, hidden layer(s), output layer.
• Networks with several hidden layers are called deep networks.
Feedforward Neural Networks
• A node computes
  f(x) = σ(∑_i w_i x_i + b)
• σ is a nonlinear function and is called the activation function.
• Typically, σ was the logistic function:
  σ(x) = 1 / (1 + e^(−x))
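A minimal sketch of such a node in Python/NumPy (the input, weights, and offset below are illustrative assumptions, not values from the slides):

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def node(x, w, b):
    # f(x) = sigma(sum_i w_i * x_i + b)
    return logistic(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # output vector of the previous layer
w = np.array([0.1, 0.4, -0.3])   # weights
b = 0.2                          # offset (bias)
print(node(x, w, b))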
Feedforward Neural Networks: Basic theorems
• Neural networks are universal approximators for continuous functions on a bounded domain.
• Even networks with one hidden layer are universal approximators.
• But the number of neurons can grow exponentially when one flattens a deep network.
Feedforward Neural Networks: Training
• Weights (and offsets) are parameters which can be learned (with respect to a training data set).
• Classical training algorithm: Backpropagation (BP)
• Error signals are propagated from the output layer back to the input layer via the hidden layer(s) (gradient descent).
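A minimal sketch of one backpropagation / gradient-descent step for a network with a single hidden layer, logistic activations, and squared-error loss (all sizes, data, and the learning rate are illustrative assumptions):

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                           # input
y = np.array([1.0])                              # target
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer parameters
lr = 0.1                                         # learning rate

# forward pass
h = logistic(W1 @ x + b1)
out = logistic(W2 @ h + b2)

# backward pass: propagate error signals from the output layer back through the hidden layer
delta_out = (out - y) * out * (1 - out)          # error signal at the output layer
delta_h = (W2.T @ delta_out) * h * (1 - h)       # error signal at the hidden layer

# gradient descent update
W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
W1 -= lr * np.outer(delta_h, x);   b1 -= lr * delta_h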
Feedforward Neural Networks: Problems
• Problem of BP (using the logistic activation function): backpropagated error signals grow or vanish exponentially from hidden layer to hidden layer.
• This was one of the main roadblocks preventing the training of deep networks, and one of the main reasons interest in FNNs dropped in the mid-1990s.
Feedforward Neural Networks: Approaches for training deep networks
Approaches to circumvent the vanishing gradient problem:
• Introduce amplifier neurons ⇒ recurrent networks, LSTM networks (Jürgen Schmidhuber, Lugano, Switzerland)
• Transform supervised learning of all layers simultaneously into a sequence of unsupervised learning tasks, hidden layer by hidden layer (Geoffrey Hinton, Toronto, Canada)
• Introduce new activation functions.
Activation Functions
The surge in interest in deep learning has led to the investigation of many different activation functions.
ReLU (Rectified Linear Unit):
  ReLU(x) = max(0, x)
Smooth ReLU:
  sReLU(x) = log(1 + e^x)
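A sketch of these two functions in NumPy (the function names are mine):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def smooth_relu(x):
    return np.log1p(np.exp(x))   # log(1 + e^x)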
Activation Functions
ELU (Exponential Linear Unit):
  ELU(x) = x             for x ≥ 0
  ELU(x) = α(e^x − 1)    otherwise
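A sketch of ELU in NumPy (the default alpha = 1.0 is a common choice, not taken from the slides):

import numpy as np

def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))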
Activation Functions
SELU (Scaled Exponential Linear Unit):
  SELU(x) = λ x             for x ≥ 0
  SELU(x) = λ α(e^x − 1)    otherwise
This activation function was introduced in the article “Self-Normalizing Neural Networks” (2017, Sepp Hochreiter et al.). It is one of the few examples where the authors prove properties of their activation function: they show that, on average, there is no vanishing (or exploding) gradient problem (for specific “magic numbers” α and λ).
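A sketch of SELU in NumPy; the constants below are the approximate values of α and λ reported for that paper (treat them as assumptions here):

import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805

def selu(x):
    return SELU_LAMBDA * np.where(x >= 0, x, SELU_ALPHA * (np.exp(x) - 1.0))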
Activation Functions
Maxout:
  maxout(z_1, ..., z_k) = max(∑_i w_1i x_i + b_1, ..., ∑_i w_ki x_i + b_k)
where x = (x_1, ..., x_n) is the output vector of the previous layer and z_j = ∑_i w_ji x_i + b_j.
The maxout node applies k different scalar products to x, adds the k offsets (b_1, ..., b_k), and finally takes the maximum of these k values.
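A minimal sketch of a maxout node in NumPy (the weight matrix W, offsets b, and input x are illustrative):

import numpy as np

def maxout(x, W, b):
    # W has shape (k, n), b has shape (k,):
    # z_j = sum_i W[j, i] * x[i] + b[j], and the node outputs max_j z_j
    return np.max(W @ x + b)

x = np.array([1.0, -2.0])
W = np.array([[0.5, 1.0],
              [-0.3, 0.2],
              [0.0, 0.0]])
b = np.array([0.1, -0.5, 0.0])
print(maxout(x, W, b))           # k = 3 affine pieces

With k = 2, one weight row set to zero and both offsets zero, the node computes max(w·x, 0), i.e. a ReLU applied to a scalar product, which matches the remark on the next slide that ReLU is a special case.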
Maxout Networks
In 2013 an article titled “Maxout Networks” (Ian Goodfellow et al.) was published, introducing a max-based activation function.
• Maxout networks are neural networks using the maxout function as activation function.
• ReLU is a special case.
• If k scalar products are provided for a node, this node can effectively learn a local nonlinear activation function by approximating it with a piecewise linear function consisting of k pieces.
Maxout Networks
• Maxout networks are universal approximators, too.
• Analogously, they can be flattened to one max-layer, but again at the cost of blowing up the network exponentially.
• A maxout network tessellates the input space into polytopes and computes a linear function on each polytope.
• The authors claim that maxout networks are able to generalize from smaller data samples, but there is only empirical evidence, no proofs.
Maxout Networks: are they tropical?
If one accepts non-integer coefficients, then maxout networks are tropical!
So tropical geometry may be useful to prove properties of maxout networks.
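An illustrative sketch of this reading (my interpretation, not stated on the slides): in the max-plus (tropical) semiring, “addition” is max and “multiplication” is +, so a maxout node max_j(∑_i w_ji x_i + b_j) is the evaluation of a tropical polynomial with coefficients b_j and (possibly non-integer) exponents w_ji:

import numpy as np

def tropical_eval(x, W, b):
    # row j encodes the tropical monomial b_j ⊙ x_1^(w_j1) ⊙ ... ⊙ x_n^(w_jn),
    # i.e. b_j + sum_i w_ji * x_i; the tropical sum of the monomials is the max.
    return np.max(W @ x + b)

x = np.array([0.7, -1.5])
W = np.array([[1.0, 2.0],
              [0.5, -0.25]])     # non-integer "exponents" are allowed
b = np.array([0.0, 1.0])
print(tropical_eval(x, W, b))    # identical to the maxout node's output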