  1. Feature Significance in Wide Neural Networks
     Janusz A. Starzyk (1, 2), starzykj@ohio.edu, Google: Janusz Starzyk
     Rafał Niemiec (2), rniemiec@wsiz.rzeszow.pl, Google: Rafał Niemiec
     Adrian Horzyk (3), horzyk@agh.edu.pl, Google: Horzyk
     (1) Ohio University, School of Electrical Engineering and Computer Science, Athens, Ohio, U.S.A.
     (2) University of Information Technology and Management, Rzeszow, Poland
     (3) AGH University of Science and Technology, Krakow, Poland

  2. Introduction
     ▪ Wide neural networks were recently proposed as a less costly alternative to deep neural networks, with no problems of exploding or vanishing gradients.
     ▪ We analyzed the quality of features and the properties of wide neural networks.
     ▪ We compared the random selection of weights in the hidden layer to a selection based on radial basis functions.
     ▪ Our study is devoted to feature selection and feature significance in wide neural networks.
     ▪ We introduced a measure to compare various feature selection techniques.
     ▪ We showed that this approach is computationally more efficient.

  3. Wide (Broad) Neural Network
     C.L.P. Chen and Z. Liu, "Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture", IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, Issue 1, Jan. 2018, pp. 10-24.
     Wide neural networks can be described by the equations:
       Y1 = Z1 · W1
       Z2 = Ψ(Y1)
       Y2 = Z2 · W2
       W2 = Z2⁺ · Y_D
     W1 is ASSUMED (it is usually randomly generated, but we propose a different approach to reduce the number of necessary neurons and connections).
     W2 is COMPUTED using the pseudoinverse; we try to measure the feature quality, i.e. how good the assumed W1 was.
     These networks were trained 276 to 1543 times faster than autoencoders, multilayer perceptrons, and deep neural networks.
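     To illustrate how these equations fit together, here is a minimal NumPy sketch of one-shot training of a wide network: W1 is assumed (generated randomly, as in the usual setup), the hidden features are Z2 = Ψ(Z1 · W1), and W2 is computed in closed form with the pseudoinverse. The tanh nonlinearity, the function names, and the array shapes are illustrative assumptions, not taken from the slides.

    import numpy as np

    def train_wide_network(Z1, Y_D, n_hidden, rng=None):
        """One-shot training of a wide (broad) network sketch.
        Z1: (k, m) input features, Y_D: (k, o) desired outputs."""
        rng = rng or np.random.default_rng(0)
        m = Z1.shape[1]
        W1 = rng.standard_normal((m, n_hidden))   # ASSUMED hidden weights (random here)
        Z2 = np.tanh(Z1 @ W1)                     # Z2 = Psi(Y1), Y1 = Z1 · W1; tanh is an assumption
        W2 = np.linalg.pinv(Z2) @ Y_D             # COMPUTED: W2 = Z2+ · Y_D
        return W1, W2

    def predict(Z1, W1, W2):
        return np.tanh(Z1 @ W1) @ W2              # Y2 = Z2 · W2

     Because the only training step is a single pseudoinverse, this is where the reported speedup over iteratively trained deep models comes from.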

  4. Neural Features
     [Diagram: Z1 (k×m, INPUT FEATURES) → W1 (m×n) → Ψ → Z2 (NEURAL FEATURES) → W2 (n×o) → Y2 (k×o, OUTPUT FEATURES)]
     It is all about a proper column space setup of W1.

  5. Random Features (used as reference)
     The classification error is approximated experimentally.

  6. Radial Basis Features
     The slide defines three quantities (a sketch of one common form follows below):
     ▪ the hidden neuron function,
     ▪ the distance between the input data and a hidden neuron,
     ▪ the mean value of norms of differences between weights of hidden neurons.
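     The exact formulas are not reproduced in this transcript, so the sketch below uses a common Gaussian radial basis form as a stand-in: each hidden neuron stores a center taken from the training data, its activation decays with the distance between the input and that center, and the shared width is set from the mean norm of differences between the hidden-neuron weights. The Gaussian choice and all names below are illustrative assumptions, not the authors' exact definitions.

    import numpy as np

    def rbf_hidden_layer(X, n_hidden, rng=None):
        """Radial-basis hidden features (assumed Gaussian form).
        X: (k, m) input data; centers are sampled from the data itself."""
        rng = rng or np.random.default_rng(0)
        centers = X[rng.choice(len(X), size=n_hidden, replace=False)]        # (n, m)

        # Mean value of norms of differences between hidden-neuron weights,
        # used here as a common width sigma (rough choice; includes zero diagonal).
        diffs = centers[:, None, :] - centers[None, :, :]
        sigma = np.linalg.norm(diffs, axis=-1).mean()

        # Distance between each input sample and each hidden neuron.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)  # (k, n)
        return np.exp(-(dist ** 2) / (2.0 * sigma ** 2)), centers, sigma

     Drawing the centers from the data keeps every hidden neuron close to the input distribution, which is the intuition behind needing fewer neurons than with purely random weights.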

  7. Comparison of Random and Radial Basis Feature Selection Approaches
     [Two panels: RANDOM FEATURE SELECTION and RADIAL BASIS FEATURE SELECTION]

  8. Random vs. Radial Basis Features
     [Plot comparing RND and RBF features]
     Open question: how significant can the improvement be?

  9. Feature Significance
     ▪ RBF weights are always better than random weights in the range of 0 to 6000 neurons, but this benefit diminishes as the number of neurons grows.
     ▪ Random weights are always better than the reference weights, and this advantage still grows with the number of neurons.
     [Plot legend: e1 for random weights W1; theoretical limit of e1 for RVFLNN networks; e2 for RBF weights W1; e2 for random weights W1]

  10. % Increase of RND Relative to RBF
     [Plot: percentage increase of RND relative to RBF]

  11. Incremental Feature Significance
     Suppose we have n features and want to add n_f new features. We want to measure how many RND features can be saved if the n_f features are added (see the sketch below).
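     One hedged way to read this measure: after adding the n_f new features, find the smallest number of purely random features that reaches the same error; the difference is the number of RND neurons saved. Everything below (the function name, the error-curve dictionary, and the example numbers) is an illustrative assumption, not the authors' exact procedure.

    def equivalent_random_features(error_with_new, random_error_curve):
        """Smallest RND feature count whose error is at least as good as the
        error reached after adding the n_f new features."""
        candidates = [n for n, e in sorted(random_error_curve.items())
                      if e <= error_with_new]
        return candidates[0] if candidates else None

    # Hypothetical measured error curve for purely random (RND) features:
    curve = {1000: 0.080, 1500: 0.065, 2000: 0.055, 2500: 0.048, 3000: 0.044}
    # Adding the n_f new features reached an error of 0.048, which would
    # otherwise have required about 2500 RND features:
    print(equivalent_random_features(0.048, curve))   # -> 2500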

  12. Example A
     [Plot: e1 and e2 vs. the number of features n]

  13. Example B
     Incremental significance of endpoint features.

  14. Fully Connected Cascade
     The number of connections (and weights to train) is 7840!

  15. Modified Cascade

  16. Summary & Future Plans
     ▪ We discussed the significance of feature selection for wide neural networks.
     ▪ We compared recognition accuracy on the MNIST dataset using two approaches.
     ▪ We compared wide networks to connected cascades.
     ▪ We introduced two simple feature significance measures.
     ▪ In the future, we want to explore the tradeoff between the number of hidden neurons in wide neural networks and the width and depth of deep neural networks, to better understand how these parameters interact in modern network structures.

  17. Thank You
     Janusz A. Starzyk, starzykj@ohio.edu, Google: Janusz Starzyk
     Rafał Niemiec, rniemiec@wsiz.rzeszow.pl, Google: Rafał Niemiec
     Adrian Horzyk, horzyk@agh.edu.pl, Google: Horzyk

  18. Pseudoinverse
     SVD: Z2 = U Σ Vᵀ
     Z2⁺ = (U Σ Vᵀ)⁺ = V Σ⁺ Uᵀ
     Σ (m×n) = diag(σ1, σ2, 0), Σ⁺ (n×m) = diag(1/σ1, 1/σ2, 0)
     Σ Σ⁺ ≈ I (m×m) and Σ⁺ Σ ≈ I (n×n): projections onto C(Z2) and C(Z2ᵀ) where possible; the nullspace is wiped out.
     W2 ≈ Z2⁺ · Y_D is the minimum-norm least-squares solution.
     Reference: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning.
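     The same computation as a NumPy sketch: the pseudoinverse is built from the SVD by inverting only the significant singular values (the tolerance and function names below are assumptions), and the resulting W2 is the minimum-norm least-squares solution.

    import numpy as np

    def pinv_via_svd(Z2, tol=1e-10):
        """Moore-Penrose pseudoinverse Z2+ = V Sigma+ U^T from the SVD Z2 = U Sigma V^T."""
        U, s, Vt = np.linalg.svd(Z2, full_matrices=False)
        s_plus = np.where(s > tol * s.max(), 1.0 / s, 0.0)   # invert sigma_i, wipe the nullspace
        return (Vt.T * s_plus) @ U.T                          # V Sigma+ U^T

    def output_weights(Z2, Y_D):
        """Minimum-norm least-squares solution W2 ~ Z2+ · Y_D."""
        return pinv_via_svd(Z2) @ Y_D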
