  1. Feature Significance in Wide Neural Networks
     Janusz A. Starzyk (1, 2), starzykj@ohio.edu, Google: Janusz Starzyk
     Rafał Niemiec (2), rniemiec@wsiz.rzeszow.pl, Google: Rafał Niemiec
     Adrian Horzyk (3), horzyk@agh.edu.pl, Google: Horzyk
     (1) Ohio University, School of Electrical Engineering and Computer Science, Athens, Ohio, U.S.A.
     (2) University of Information Technology and Management, Rzeszow, Poland
     (3) AGH University of Science and Technology, Krakow, Poland

  2. Introduction
     ▪ Wide neural networks were recently proposed as a less costly alternative to deep neural networks, with no problems of exploding or vanishing gradients.
     ▪ We analyzed the quality of features and the properties of wide neural networks.
     ▪ We compared the random selection of weights in the hidden layer to a selection based on radial basis functions.
     ▪ Our study is devoted to feature selection and feature significance in wide neural networks.
     ▪ We introduced a measure to compare various feature selection techniques.
     ▪ We showed that this approach is computationally more efficient.

  3. Wide (Broad) Neural Network
     C.L.P. Chen and Z. Liu, "Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture", IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, Issue 1, Jan. 2018, pp. 10-24.
     Wide neural networks can be described by the equations:
       Y1 = Z1 · W1
       Z2 = Ψ(Y1)
       Y2 = Z2 · W2
       W2 = Z2⁺ · Y_D
     W1 is ASSUMED (it is usually randomly generated, but we propose a different approach to reduce the number of necessary neurons and connections).
     W2 is COMPUTED using the pseudoinverse; we try to measure the feature quality, i.e. how good the assumed W1 was.
     These networks were trained 276 to 1543 times faster than autoencoders, multilayer perceptrons, and deep neural networks.
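     To illustrate how these equations fit together, here is a minimal NumPy sketch of one-shot training of a wide network: W1 is assumed (generated randomly, as in the usual setup), the hidden features are Z2 = Ψ(Z1 · W1), and W2 is computed in closed form with the pseudoinverse. The tanh nonlinearity, the function names, and the array shapes are illustrative assumptions, not taken from the slides.

    import numpy as np

    def train_wide_network(Z1, Y_D, n_hidden, rng=None):
        """One-shot training of a wide (broad) network sketch.
        Z1: (k, m) input features, Y_D: (k, o) desired outputs."""
        rng = rng or np.random.default_rng(0)
        m = Z1.shape[1]
        W1 = rng.standard_normal((m, n_hidden))   # ASSUMED hidden weights (random here)
        Z2 = np.tanh(Z1 @ W1)                     # Z2 = Psi(Y1), Y1 = Z1 · W1; tanh is an assumption
        W2 = np.linalg.pinv(Z2) @ Y_D             # COMPUTED: W2 = Z2+ · Y_D
        return W1, W2

    def predict(Z1, W1, W2):
        return np.tanh(Z1 @ W1) @ W2              # Y2 = Z2 · W2

     Because the only training step is a single pseudoinverse, this is where the reported speedup over iteratively trained deep models comes from.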

  4. Neural Features
     [Diagram: Z1 (k×m, INPUT FEATURES) → W1 (m×n) → Ψ → Z2 (NEURAL FEATURES) → W2 (n×o) → Y2 (k×o, OUTPUT FEATURES)]
     It is all about a proper column space setup of W1.

  5. Random Features (used as reference)
     The classification error is approximated experimentally.

  6. Radial Basis Features
     The slide defines three quantities (a sketch of one common form follows below):
     ▪ the hidden neuron function,
     ▪ the distance between the input data and a hidden neuron,
     ▪ the mean value of norms of differences between weights of hidden neurons.
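     The exact formulas are not reproduced in this transcript, so the sketch below uses a common Gaussian radial basis form as a stand-in: each hidden neuron stores a center taken from the training data, its activation decays with the distance between the input and that center, and the shared width is set from the mean norm of differences between the hidden-neuron weights. The Gaussian choice and all names below are illustrative assumptions, not the authors' exact definitions.

    import numpy as np

    def rbf_hidden_layer(X, n_hidden, rng=None):
        """Radial-basis hidden features (assumed Gaussian form).
        X: (k, m) input data; centers are sampled from the data itself."""
        rng = rng or np.random.default_rng(0)
        centers = X[rng.choice(len(X), size=n_hidden, replace=False)]        # (n, m)

        # Mean value of norms of differences between hidden-neuron weights,
        # used here as a common width sigma (rough choice; includes zero diagonal).
        diffs = centers[:, None, :] - centers[None, :, :]
        sigma = np.linalg.norm(diffs, axis=-1).mean()

        # Distance between each input sample and each hidden neuron.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)  # (k, n)
        return np.exp(-(dist ** 2) / (2.0 * sigma ** 2)), centers, sigma

     Drawing the centers from the data keeps every hidden neuron close to the input distribution, which is the intuition behind needing fewer neurons than with purely random weights.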

  7. Comparison of Random and Radial Basis Feature Selection Approaches
     [Two panels: RANDOM FEATURE SELECTION and RADIAL BASIS FEATURE SELECTION]

  8. Random vs. Radial Basis Features
     [Plot comparing RND and RBF features]
     Open question: how significant can the improvement be?

  9. Feature Significance
     ▪ RBF weights are always better than random weights in the range of 0 to 6000 neurons, but this benefit diminishes as the number of neurons grows.
     ▪ Random weights are always better than the reference weights, and this advantage still grows with the number of neurons.
     [Plot legend: e1 for random weights W1; theoretical limit of e1 for RVFLNN networks; e2 for RBF weights W1; e2 for random weights W1]

  10. % Increase of RND Relative to RBF
     [Plot: percentage increase of RND relative to RBF]

  11. Incremental Feature Significance
     Suppose we have n features and want to add n_f new features. We want to measure how many RND features can be saved if the n_f features are added (see the sketch below).
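     One hedged way to read this measure: after adding the n_f new features, find the smallest number of purely random features that reaches the same error; the difference is the number of RND neurons saved. Everything below (the function name, the error-curve dictionary, and the example numbers) is an illustrative assumption, not the authors' exact procedure.

    def equivalent_random_features(error_with_new, random_error_curve):
        """Smallest RND feature count whose error is at least as good as the
        error reached after adding the n_f new features."""
        candidates = [n for n, e in sorted(random_error_curve.items())
                      if e <= error_with_new]
        return candidates[0] if candidates else None

    # Hypothetical measured error curve for purely random (RND) features:
    curve = {1000: 0.080, 1500: 0.065, 2000: 0.055, 2500: 0.048, 3000: 0.044}
    # Adding the n_f new features reached an error of 0.048, which would
    # otherwise have required about 2500 RND features:
    print(equivalent_random_features(0.048, curve))   # -> 2500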

  12. Example A
     [Plot: e1 and e2 vs. the number of features n]

  13. Example B
     Incremental significance of endpoint features.

  14. Fully Connected Cascade
     The number of connections (and weights to train) is 7840!

  15. Modified Cascade

  16. Summary & Future Plans
     ▪ We discussed the significance of feature selection for wide neural networks.
     ▪ We compared recognition accuracy on the MNIST dataset using two approaches.
     ▪ We compared wide networks to connected cascades.
     ▪ We introduced two simple feature significance measures.
     ▪ In the future, we want to explore the tradeoff between the number of hidden neurons in wide neural networks and the width and depth of deep neural networks, to better understand how these parameters interact in modern network structures.

  17. Thank You
     Janusz A. Starzyk, starzykj@ohio.edu, Google: Janusz Starzyk
     Rafał Niemiec, rniemiec@wsiz.rzeszow.pl, Google: Rafał Niemiec
     Adrian Horzyk, horzyk@agh.edu.pl, Google: Horzyk

  18. Pseudoinverse
     SVD: Z2 = U Σ Vᵀ
     Z2⁺ = (U Σ Vᵀ)⁺ = V Σ⁺ Uᵀ
     Σ (m×n) = diag(σ1, σ2, 0), Σ⁺ (n×m) = diag(1/σ1, 1/σ2, 0)
     Σ Σ⁺ ≈ I (m×m) and Σ⁺ Σ ≈ I (n×n): projections onto C(Z2) and C(Z2ᵀ) where possible; the nullspace is wiped out.
     W2 ≈ Z2⁺ · Y_D is the minimum-norm least-squares solution.
     Reference: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning.
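     The same computation as a NumPy sketch: the pseudoinverse is built from the SVD by inverting only the significant singular values (the tolerance and function names below are assumptions), and the resulting W2 is the minimum-norm least-squares solution.

    import numpy as np

    def pinv_via_svd(Z2, tol=1e-10):
        """Moore-Penrose pseudoinverse Z2+ = V Sigma+ U^T from the SVD Z2 = U Sigma V^T."""
        U, s, Vt = np.linalg.svd(Z2, full_matrices=False)
        s_plus = np.where(s > tol * s.max(), 1.0 / s, 0.0)   # invert sigma_i, wipe the nullspace
        return (Vt.T * s_plus) @ U.T                          # V Sigma+ U^T

    def output_weights(Z2, Y_D):
        """Minimum-norm least-squares solution W2 ~ Z2+ · Y_D."""
        return pinv_via_svd(Z2) @ Y_D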
