  1. An Introduction to Neural Network Rule Extraction Algorithms
     By Sarah Jackson

  2. Can we trust magic?
     - Neural Networks
       - Machine learning black boxes
       - Magical, unexplainable results
     - Problems
       - People won't trust Neural Networks because they are difficult to understand
       - The end result isn't always the only thing we are looking for
       - They pose an unacceptable risk in certain scenarios

  3. Why do we want them then?
     - Neural Networks have been shown to classify data accurately
     - Neural Networks are capable of learning and classifying in ways that other machine learning techniques may not be

  4. Who cares about rules?
     - Rules help to bridge the gap between connectionist and symbolic methods
     - Rule extraction from Neural Networks will increase their acceptance
     - Rules will also improve the usefulness of data gathered from Neural Networks

  5. What do we do with these rules?
     - Validation
       - We can tell that something has been learned
     - Integration
       - Rules can be used with symbolic systems
     - Theory discovery
       - May reveal relationships that would not have been seen otherwise
     - Explanation ability
       - Allows exploration of the knowledge in the network

  6. Are the rules good?
     - Accuracy
       - Correctly classify unseen examples
     - Fidelity
       - Exhibit the same behavior as the Neural Network
     - Consistency
       - Classify unseen examples the same way from one extraction to the next
     - Comprehensibility
       - Measured by the size of the rule set and the number of clauses per rule
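
     Accuracy and fidelity reduce to simple agreement counts. A minimal
     Python sketch (not from the slides; the prediction lists are
     hypothetical stand-ins for a rule set's and a network's outputs):

         def accuracy(rule_preds, true_labels):
             """Accuracy: fraction of unseen examples the rules classify correctly."""
             return sum(r == t for r, t in zip(rule_preds, true_labels)) / len(true_labels)

         def fidelity(rule_preds, net_preds):
             """Fidelity: fraction of examples where the rules agree with the
             network, whether or not either of them is actually correct."""
             return sum(r == n for r, n in zip(rule_preds, net_preds)) / len(net_preds)

         # Hypothetical predictions over five unseen examples:
         rules = [1, 0, 1, 1, 0]
         net   = [1, 0, 1, 0, 0]
         truth = [1, 0, 0, 0, 0]
         print(accuracy(rules, truth))  # 0.6 -- rules vs. ground truth
         print(fidelity(rules, net))    # 0.8 -- rules vs. network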

  7. How does extraction work?
     - Knowledge in a Neural Network is represented by numerical weights
     - Extraction algorithms analyze that numerical data, either directly or indirectly
     - The extracted rules explain the Neural Network's behavior in a new, symbolic form

  8. Decompositional Algorithms
     - Knowledge is extracted from each node in the network individually
     - Each node's rules are expressed in terms of the previous layer
     - Usually simply described and accurate
     - Limitations:
       - Require a threshold approximation for each node
       - Restricted generalization and scalability
       - A special training procedure
       - A special network architecture
       - Require sigmoidal transfer functions for hidden nodes
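
     To make the per-node idea concrete, here is a minimal sketch (my own
     illustration, not a published algorithm): with binary inputs, a
     sigmoid unit "fires" when its net input exceeds 0 (activation above
     0.5), so its rules can be found by enumerating input combinations --
     which is also why these methods scale poorly:

         from itertools import product

         def node_rules(weights, bias):
             """Return every binary input vector that makes a sigmoid unit
             fire (net input > 0). Exponential in the number of inputs --
             the scalability limit noted above."""
             return [x for x in product([0, 1], repeat=len(weights))
                     if sum(w * v for w, v in zip(weights, x)) + bias > 0]

         # A unit that turns out to fire when either of its inputs is on:
         print(node_rules([1.2, 0.9], -0.5))  # [(0, 1), (1, 0), (1, 1)]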

  9. Global Algorithms
     - Describe output nodes as functions of the input nodes
     - The internal structure of the network is not important
     - Represent networks as decision trees
     - Extract rules from the constructed decision trees
     - May not be efficient as the complexity of the network grows
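
     A minimal sketch of the global recipe, assuming scikit-learn and an
     already-trained classifier `net` (any object with a .predict method)
     together with its training inputs `X`:

         from sklearn.tree import DecisionTreeClassifier, export_text

         # Ask the *network* for labels: the tree learns to mimic the
         # black box rather than the original ground truth.
         net_labels = net.predict(X)

         tree = DecisionTreeClassifier(max_depth=3)  # depth cap keeps rules readable
         tree.fit(X, net_labels)

         # Each root-to-leaf path is an if/then rule describing the network.
         print(export_text(tree))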

  10. Combinatorial Algorithms
      - Use aspects of both decompositional and global algorithms
      - Both the network architecture and the values of the weights are necessary
      - Attempt to gain the advantages of each without the disadvantages

  11. TREPAN
      - TREes PArroting Networks
      - A global method
      - Represents network knowledge through a decision tree
      - Uses the same basic construction as C4.5 and CART
      - Expands the tree best-first rather than depth-first

  12. TREPAN
      - The classes used for the decision tree are those defined by the neural network
      - A list of leaf nodes is kept, each with related data:
        - A subset of the training data
        - A set of complementary data
        - A set of constraints
      - The data sets are used to determine whether a node should be further divided or left as a terminal leaf
      - The data sets must meet the constraints

  13. TREPAN
      - Nodes are removed from the list when they are split or become terminal leaves
        - They are never added to the list again
        - Their children are added to the list
      - The decision function determines the type of decision tree constructed:
        - M-of-N: each node represents an m-of-n test
        - 1-of-N: each node represents a 1-of-n test
        - Simple: each node represents a true-or-false test on one attribute
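
     The split tests themselves are simple to state in code. A sketch of
     an m-of-n test (the attribute checks are hypothetical):

         def m_of_n(m, conditions):
             """Passes when at least m of the n boolean conditions hold.
             m == 1 gives a 1-of-n test; m == n requires all of them."""
             return sum(bool(c) for c in conditions) >= m

         # A 2-of-3 test over three hypothetical attribute checks:
         example = {"a": 1, "b": 0, "c": 1}
         print(m_of_n(2, [example["a"] == 1,
                          example["b"] == 1,
                          example["c"] == 1]))  # True: two of the three hold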

  14. TREPAN
      - Comparison on the UCI Tic-Tac-Toe data
      - Network: 27 inputs, 20 hidden nodes, 2 outputs

  15. TREPAN
      - Typically, the shortest tree is the easiest to understand
      - The M-of-N tree has the fewest nodes, but is very difficult to understand
      - TREPAN provides higher quality information

  16.-18. TREPAN (figure slides; images not reproduced)

  19. Another Global Algorithm
      - Uses only the training data to construct its decision tree
        - By contrast, TREPAN uses the training data and may also use artificially generated data
      - Uses the CN2 and C4.5 algorithms

  20. BDT
      - Bound Decomposition Tree
      - A decompositional algorithm
      - Designed with the goals of no retraining, high accuracy, and low complexity
      - The algorithm works for Multi-Layer Perceptrons

  21. BDT
      - Maximum upper bound of a neuron:
        - All inputs with positive weight take the value 1
        - Inputs with negative weight take the value 0
      - Minimum lower bound of a neuron:
        - Only inputs with negative weight take the value 1
        - Inputs with positive weight take the value 0

  22. BDT
      - Each neuron has its own minimum and maximum bounds
      - The minimum is found by adding the bias plus all negative weights
      - The maximum is found by adding the bias plus all positive weights

                      Weight   Min Bound   Max Bound
        I1            -0.25      -0.25
        I2             0.65                   0.65
        I3            -0.48      -0.48
        I4             0.72                   0.72
        Bias (-1)      1.00      -1.00       -1.00
        Total                    -1.73        0.37
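
     The table's totals can be reproduced directly. A minimal sketch, with
     the bias input (-1 at weight 1) folded in as a bias term of -1:

         def neuron_bounds(weights, bias):
             """BDT-style bounds for one neuron:
             min = bias + sum of negative weights (those inputs at 1, rest at 0)
             max = bias + sum of positive weights (the reverse assignment)."""
             lo = bias + sum(w for w in weights if w < 0)
             hi = bias + sum(w for w in weights if w > 0)
             return lo, hi

         # The worked example above, inputs I1..I4:
         print(neuron_bounds([-0.25, 0.65, -0.48, 0.72], -1.0))
         # (-1.73, 0.37), up to floating-point rounding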

  23. BDT
      - Each neuron's input cube is divided into two subcubes based on the first input
        - One subcube assumes 0 as the value and the other assumes 1
      - The remaining inputs are used to construct the input vectors for each subcube
      - Bounds are calculated for each subcube:
        - Positive subcube: the lower bound is positive
        - Negative subcube: the upper bound is negative
        - Uncertain subcube: the lower bound is negative and the upper bound is positive

  24. BDT
      - Positive subcubes will always fire
        - Each represents a rule for the neuron
      - Negative subcubes will never fire
      - Uncertain subcubes must be further subdivided until positive and/or negative subcubes are reached
      - The rules for a neuron are the set of all input vectors on positive subcubes
      - A Δ threshold above 0 can be used to prune the neuron
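
     Putting slides 21-24 together, a minimal sketch of the subdivision
     loop (my own simplification, without the Δ pruning threshold):

         def positive_subcubes(weights, bias, fixed=()):
             """Recursively split the input cube one input at a time. A
             subcube whose lower bound is positive always fires and yields
             a rule; one whose upper bound is negative never fires; an
             uncertain subcube is split further. Returns the fixed input
             vectors of the positive subcubes."""
             free = weights[len(fixed):]
             lo = bias + sum(w for w in free if w < 0)
             hi = bias + sum(w for w in free if w > 0)
             if lo > 0:
                 return [fixed]       # positive subcube: a rule for the neuron
             if hi < 0 or not free:
                 return []            # negative subcube, or nothing left to fix
             rules = []
             for value in (0, 1):     # split on the next input
                 rules += positive_subcubes(weights, bias + free[0] * value,
                                            fixed + (value,))
             return rules

         # The example neuron again: it fires exactly when I2=1, I3=0, I4=1.
         print(positive_subcubes([-0.25, 0.65, -0.48, 0.72], -1.0))
         # [(0, 1, 0, 1), (1, 1, 0, 1)]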

  25. BDT (figure slide; image not reproduced)

  26. Sources
      Milare, R., De Carvalho, A., & Monard, M. (2002). An Approach to Explain Neural Networks Using Symbolic Algorithms. International Journal of Computational Intelligence and Applications, 2(4), 365-376.
      Heh, J. S., Chen, J. C., & Chang, M. (2008). Designing a decompositional rule extraction algorithm for neural networks with bound decomposition tree. Neural Computing and Applications, 17, 297-309.
      Nobre, C., Martinelle, E., Braga, A., De Carvalho, A., Rezende, S., Braga, J. L., & Ludermir, T. (1999). Knowledge Extraction: A Comparison between Symbolic and Connectionist Methods. International Journal of Neural Systems, 9(3), 257-264.
