An Introduction to Neural Network Rule Extraction Algorithms
By Sarah Jackson
Can we trust magic?
• Neural networks are machine learning black boxes: they produce seemingly magical, unexplainable results.
• Problems:
  - People are reluctant to trust neural networks because it is difficult to understand how they reach their answers.
  - The end result is not always the only thing we are looking for.
  - This opacity is an unacceptable risk in certain scenarios.
Why do we want them then?
• Neural networks have been shown to classify data accurately.
• Neural networks are capable of learning and classifying in ways that other machine learning techniques may not be.
Who cares about rules?
• Rules help bridge the gap between connectionist and symbolic methods.
• Rule extraction from neural networks will increase their acceptance.
• Rules also improve the usefulness of the knowledge gathered from neural networks.
What do we do with these rules?
• Validation: we can verify that something meaningful has been learned.
• Integration: rules can be used with symbolic systems.
• Theory discovery: rules may reveal relationships that would not have been seen otherwise.
• Explanation ability: rules allow exploration of the knowledge in the network.
Are the rules good?
• Accuracy: the rules correctly classify unseen examples.
• Fidelity: the rules exhibit the same behavior as the neural network.
• Consistency: rules extracted under different training sessions classify unseen examples the same way.
• Comprehensibility: measured by the size of the rule set and the number of clauses per rule.
The first two criteria are sketched in code below.
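As a concrete reading of accuracy and fidelity, here is a minimal Python sketch; rules_predict and network_predict are hypothetical stand-ins for an extracted rule set and a trained network, each a callable mapping one example to a class label.

    def accuracy(rules_predict, examples, true_labels):
        """Fraction of unseen examples the extracted rules classify correctly."""
        hits = sum(rules_predict(x) == y for x, y in zip(examples, true_labels))
        return hits / len(examples)

    def fidelity(rules_predict, network_predict, examples):
        """Fraction of examples on which the rules mimic the network's output."""
        agree = sum(rules_predict(x) == network_predict(x) for x in examples)
        return agree / len(examples)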
How does extraction work?
• Knowledge in a neural network is represented by numerical weights.
• Extraction algorithms analyze this numerical data either directly or indirectly.
• The network's behavior is then re-expressed in a human-readable, symbolic form.
Decompositional Algorithms
• Knowledge is extracted from each node in the network individually; each node's rules are expressed in terms of the previous layer.
• The extracted rules are usually simply described and accurate.
• Require a threshold approximation for each node.
• Limitations: restricted generalization and scalability; may require a special training procedure, a special network architecture, or sigmoidal transfer functions for hidden nodes.
A toy single-neuron example follows.
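To make the per-node idea concrete, here is a toy Python sketch (not any specific published algorithm): for a single thresholded neuron with binary inputs, exhaustively enumerate the input vectors whose weighted sum exceeds the threshold; each satisfying vector is a rule under which the neuron fires. The weights and bias are illustrative (the same example neuron reappears in the BDT bounds table later), and the exponential enumeration is exactly the scalability problem noted above.

    from itertools import product

    def neuron_rules(weights, bias, threshold=0.0):
        rules = []
        for inputs in product([0, 1], repeat=len(weights)):
            activation = bias + sum(w * x for w, x in zip(weights, inputs))
            if activation > threshold:
                rules.append(inputs)  # this input pattern makes the neuron fire
        return rules

    print(neuron_rules([-0.25, 0.65, -0.48, 0.72], bias=-1.0))
    # [(0, 1, 0, 1), (1, 1, 0, 1)]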
Global Algorithms
• Describe output nodes as functions of the input nodes; the internal structure of the network is not important.
• Typically represent the network as a decision tree, then extract rules from the constructed tree (see the sketch below).
• May become inefficient as the complexity of the network grows.
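A minimal sketch of the global idea, using scikit-learn as a stand-in for the C4.5/CART-style learners the sources use; the data is synthetic and purely illustrative.

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Synthetic stand-in data: 9 binary features, majority-vote target.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 9))
    y = (X.sum(axis=1) > 4).astype(int)

    # Train the "black box" network...
    net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000).fit(X, y)

    # ...then fit a tree to the NETWORK'S predictions, not the true labels,
    # treating the network purely as an oracle; its internals are ignored.
    tree = DecisionTreeClassifier(max_depth=4).fit(X, net.predict(X))
    print(export_text(tree))

Because the tree is fit to the network's outputs rather than the true labels, the tree's agreement with the network is its fidelity.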
Combinatorial Algorithms
• Use aspects of both decompositional and global algorithms.
• Both the network architecture and the values of the weights are needed.
• Attempt to gain the advantages of each approach without the disadvantages.
TREPAN
• "Trees Parroting Networks."
• A global method: represents the network's knowledge through a decision tree.
• Uses the same basic tree construction as C4.5 and CART, but grows the tree best-first rather than depth-first.
TREPAN
• The classes used for the decision tree are those defined by the neural network.
• A list of leaf nodes is kept, each with associated data:
  - a subset of the training data,
  - a set of complementary (artificially generated) data,
  - a set of constraints that the node's data must satisfy.
• These data sets determine whether a node should be divided further or left as a terminal leaf.
TREPAN
• Nodes are removed from the list when they are split or become terminal leaves; they are never added back, but their children are added.
• The decision function determines the type of decision tree constructed (illustrated below):
  - M-of-N: each node represents an m-of-n test.
  - 1-of-N: each node represents a 1-of-n test.
  - Simple: each node tests a single attribute (true or false).
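A minimal sketch of the m-of-n test (the conditions below are made up): it passes when at least m of its n boolean conditions hold, so a 1-of-n test is the m=1 case and a simple single-attribute test is the m=n=1 case.

    def m_of_n(m, conditions, example):
        """True if at least m of the n condition predicates hold on example."""
        return sum(cond(example) for cond in conditions) >= m

    # A 2-of-3 test over a feature vector x:
    test = [lambda x: x[0] == 1, lambda x: x[2] == 1, lambda x: x[5] == 0]
    print(m_of_n(2, test, [1, 0, 1, 0, 0, 1]))  # True: the first two conditions hold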
TREPAN
• Comparison on the UCI Tic-Tac-Toe data set.
• Network: 27 inputs, 20 hidden nodes, 2 outputs.
TREPAN
• Typically, the shortest tree is the easiest to understand.
• The m-of-n tree has the fewest nodes, however, yet is very difficult to understand, since each node aggregates many conditions.
• TREPAN provides higher-quality information.
Another Global Algorithm
• Uses only the training data to construct the decision tree (TREPAN uses the training data and may also use artificially generated data).
• Based on the CN2 and C4.5 symbolic learning algorithms.
BDT
• Bound Decomposition Tree.
• A decompositional algorithm.
• Designed with the goals of no retraining, high accuracy, and low complexity.
• Works for multi-layer perceptrons.
BDT
• The maximum upper bound of a neuron occurs when all inputs with positive weights have a value of 1 and all inputs with negative weights have a value of 0.
• The minimum lower bound occurs when only the inputs with negative weights have a value of 1 and the inputs with positive weights have a value of 0.
BDT
• Each neuron has its own minimum and maximum bounds.
• The minimum is the bias plus the sum of all negative weights; the maximum is the bias plus the sum of all positive weights (sketched in code below). For example:

    Input      Weight   Min Bound   Max Bound
    I1         -0.25    -0.25
    I2          0.65                 0.65
    I3         -0.48    -0.48
    I4          0.72                 0.72
    Bias (-1)   1       -1          -1
    Total               -1.73        0.37
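The table's arithmetic as a tiny Python function (the function name is mine; the values are the example above):

    def neuron_bounds(weights, bias):
        lo = bias + sum(w for w in weights if w < 0)  # min: negative weights on
        hi = bias + sum(w for w in weights if w > 0)  # max: positive weights on
        return lo, hi

    print(neuron_bounds([-0.25, 0.65, -0.48, 0.72], bias=-1.0))
    # approximately (-1.73, 0.37)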
BDT
• Each neuron's input space (a cube) is divided into two subcubes based on the first input: one subcube assumes the value 0 and the other assumes 1.
• The remaining inputs are used to construct the input vectors for each subcube.
• Bounds are calculated for each subcube:
  - Positive subcube: the lower bound is positive.
  - Negative subcube: the upper bound is negative.
  - Uncertain subcube: the lower bound is negative and the upper bound is positive.
BDT
• A positive subcube always fires, so it represents a rule for the neuron; a negative subcube never fires.
• Uncertain subcubes must be subdivided further until only positive and/or negative subcubes remain (see the sketch below).
• The rules for a neuron are the set of all input vectors on its positive subcubes.
• A threshold Δ > 0 can be used in place of 0 to prune borderline subcubes and simplify the neuron.
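A minimal sketch of the subdivision loop under the assumptions above; the names are my own, and the paper's tree bookkeeping and Δ pruning are omitted (replacing the 0 thresholds with ±Δ would implement the pruning). A returned tuple shorter than the weight vector leaves the remaining inputs as don't-cares.

    def extract_rules(weights, bias, fixed=()):
        i = len(fixed)                                    # next input to fix
        base = bias + sum(w * v for w, v in zip(weights, fixed))
        lo = base + sum(w for w in weights[i:] if w < 0)  # all-negatives completion
        hi = base + sum(w for w in weights[i:] if w > 0)  # all-positives completion
        if lo > 0:
            return [fixed]   # positive subcube: fires for any completion
        if hi <= 0:
            return []        # negative subcube: never fires
        # Uncertain subcube: subdivide on input i.
        return (extract_rules(weights, bias, fixed + (0,)) +
                extract_rules(weights, bias, fixed + (1,)))

    # The example neuron from the bounds table:
    print(extract_rules([-0.25, 0.65, -0.48, 0.72], bias=-1.0))
    # [(0, 1, 0, 1), (1, 1, 0, 1)] -- the neuron fires iff I2=1, I3=0, I4=1

This matches the brute-force enumeration sketched earlier, but the bound tests let entire subcubes be accepted or rejected without enumerating every completion.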
Sources
Milare, R., De Carvalho, A., & Monard, M. (2002). An Approach to Explain Neural Networks Using Symbolic Algorithms. International Journal of Computational Intelligence and Applications, 2(4), 365-376.
Heh, J. S., Chen, J. C., & Chang, M. (2008). Designing a decompositional rule extraction algorithm for neural networks with bound decomposition tree. Neural Computing and Applications, 17, 297-309.
Nobre, C., Martinelle, E., Braga, A., De Carvalho, A., Rezende, S., Braga, J. L., & Ludermir, T. (1999). Knowledge Extraction: A Comparison between Symbolic and Connectionist Methods. International Journal of Neural Systems, 9(3), 257-264.