26:198:722 Expert Systems
- Knowledge representation
- Knowledge acquisition
- Machine learning
- ID3 & C4.5

Knowledge Representation
- Recall:
  - Knowledge engineering
    - Knowledge acquisition
      - Knowledge elicitation
    - Knowledge representation
      - Production rules
      - Semantic networks
      - Frames

Knowledge Representation
- Representation is more than just encoding (encrypting)
- Coding preserves structural ambiguity
- Communication assumes prior knowledge
- Representation implies organization

Knowledge Representation
- Representation
  - A set of syntactic and semantic conventions that make it possible to describe things (Winston)
- Description
  - Makes use of the conventions of a representation to describe some particular thing
- Syntax v. semantics

Knowledge Representation
- STRIPS
  - Predicate-argument expressions
    - at(robot, roomA)
  - World models
  - Operator tables
    - push(X, Y, Z)
      - Preconditions: at(robot, Y), at(X, Y)
      - Delete list: at(robot, Y), at(X, Y)
      - Add list: at(robot, Z), at(X, Z)

Knowledge Representation
- STRIPS
  - Maintained lists of goals
  - Selected a goal to work on next
  - Searched for applicable operators
  - Matched goals against formulas in add lists
  - Set up preconditions as sub-goals
  - Used means-end analysis

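A minimal sketch of the operator-table idea from the slides, assuming a world model held as a set of ground atoms. The push/at names follow the slide's example; the apply_op helper is a hypothetical illustration, not STRIPS's own machinery, and the goal-selection loop is not shown.

```python
# Sketch: a STRIPS-style operator as precondition/delete/add lists,
# applied to a world model represented as a set of ground atoms.

def push(x, y, z):
    """Instantiate the push(X, Y, Z) operator from the slide."""
    return {
        "pre":    {("at", "robot", y), ("at", x, y)},
        "delete": {("at", "robot", y), ("at", x, y)},
        "add":    {("at", "robot", z), ("at", x, z)},
    }

def apply_op(world, op):
    """Apply an operator if its preconditions hold in the world model."""
    if not op["pre"] <= world:
        raise ValueError("preconditions not satisfied")
    return (world - op["delete"]) | op["add"]

world = {("at", "robot", "roomA"), ("at", "box", "roomA")}
world = apply_op(world, push("box", "roomA", "roomB"))
print(world)  # {('at', 'robot', 'roomB'), ('at', 'box', 'roomB')}
```
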
Knowledge Representation
- STRIPS - lessons
  - Heuristic search
  - Uniform representation
  - Problem reduction
- Procedural semantics

Knowledge Representation
- MYCIN
  - Assists physicians who are not experts in the field of antibiotics in treating blood infections
  - Consists of
    - Knowledge base
    - Dynamic patient database
    - Consultation program
    - Explanation program
    - Knowledge acquisition program

Knowledge Representation
- MYCIN
  - Production rules
    - Premises
      - Conjunctions of conditions
    - Actions
      - Conclusions or instructions
  - Patient information stored in a context tree
  - Certainty factors for uncertain reasoning
  - Backward chaining control structure (based on AND/OR tree)

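As a toy illustration of the production-rule-plus-certainty-factor idea, a minimal sketch follows. The rule content, the predicates, and the 0.7 CF are all invented for illustration; MYCIN's actual rule language, context tree, and CF-combination scheme are considerably richer.

```python
# Sketch: a MYCIN-style production rule whose premises are a conjunction
# of conditions and whose conclusion carries a certainty factor.

def rule_applies(facts, premises):
    """A premise conjunction holds when every condition is among the facts."""
    return all(p in facts for p in premises)

rule = {
    "premises": [("gram_stain", "negative"), ("morphology", "rod")],
    "conclusion": ("organism", "e.coli"),
    "cf": 0.7,  # invented certainty factor attached to the conclusion
}

facts = {("gram_stain", "negative"), ("morphology", "rod")}
if rule_applies(facts, rule["premises"]):
    print(rule["conclusion"], "with CF", rule["cf"])
```
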
Knowledge Representation
- MYCIN
  - Evaluation
    - A panel of experts approved 72% of its recommendations
    - As good as experts
    - Better than non-experts
    - Knowledge base incomplete (400 rules)
    - Required more computing power than was available in hospitals
    - Doctors did not like the user interface

Knowledge Acquisition
- Stages
  - Identification
  - Conceptualization
  - Formalization
  - Implementation
  - Testing
- KADS
- Ontological analysis

Knowledge Acquisition
- Expert system shells
  - EMYCIN
  - TEIRESIAS
    - Rule models (meta-rules)
    - Schemas for data types
    - Domain-specific knowledge
    - Representation-specific knowledge
    - Representation-independent knowledge
    - Explain-Test-Review

Knowledge Acquisition
- Methods and tools
  - Structured interview
  - Unstructured interview
  - Case studies
    - Retrospective v. observational
    - Familiar v. unfamiliar
  - Concurrent protocols
    - Verbalization, "thinking aloud"
  - Tape recording
  - Video recording

Knowledge Acquisition
- Methods and tools
  - Automated knowledge acquisition
    - Domain models
    - Graphical interfaces
    - Visual programming languages

Knowledge Acquisition
- Different types of knowledge
  - Procedural knowledge
    - Rules, strategies, agendas, procedures
  - Declarative knowledge
    - Concepts, objects, facts
  - Meta-knowledge
    - Knowledge about other types of knowledge and how to use them
  - Structural knowledge
    - Rule sets, concept relationships, concept-to-object relationships

Knowledge Acquisition
- Sources of knowledge
  - Experts
  - End-users
  - Multiple experts (panels)
  - Reports
  - Books
  - Regulations
  - Guidelines

Knowledge Acquisition
- Major difficulties with elicitation
  - The expert may
    - be unaware of the knowledge used
    - be unable to verbalize the knowledge used
    - provide irrelevant knowledge
    - provide incomplete knowledge
    - provide incorrect knowledge
    - provide inconsistent knowledge

Knowledge Acquisition
- "The more competent domain experts become, the less able they are to describe the knowledge they used to solve problems" (Waterman)

Knowledge Acquisition
- Detailed guidelines for conducting structured and unstructured interviews, and both retrospective and observational case studies, are given in Durkin (Chapter 17)

Knowledge Acquisition
- Technique capabilities, by type of knowledge elicited:

  Knowledge  | Interviews               | Retrospective case studies | Observational case studies
             | Unstructured  Structured | Familiar      Unfamiliar   | Familiar      Unfamiliar
  -----------+--------------------------+----------------------------+---------------------------
  Facts      | Poor          Good       | Fair          Average      | Good          Excellent
  Concepts   | Excellent     Excellent  | Average       Average      | Good          Good
  Objects    | Good          Excellent  | Average       Average      | Good          Good
  Rules      | Fair          Average    | Average       Average      | Good          Excellent
  Strategies | Average       Average    | Good          Good         | Excellent     Excellent
  Heuristics | Fair          Average    | Excellent     Good         | Good          Poor
  Structures | Fair          Excellent  | Average       Average      | Average       Average

Knowledge Acquisition
- Analyzing the knowledge collected
  - Producing transcripts
  - Interpreting transcripts
    - Chunking
  - Analyzing transcripts
    - Knowledge dictionaries
    - Graphical techniques
      - Cognitive maps
      - Inference networks
      - Flowcharts
      - Decision trees

Machine Learning
- Rote learning
- Supervised learning
  - Induction
    - Concept learning
    - Descriptive generalization
- Unsupervised learning

Machine Learning
- META-DENDRAL
  - RULEMOD
    - Removing redundancy
    - Merging rules
    - Making rules more specific
    - Making rules more general
    - Selecting final rules

Machine Learning
- META-DENDRAL
  - Version spaces
    - Partial ordering
    - Boundary sets
    - Candidate elimination algorithm
    - Monotonic, non-heuristic
    - Results independent of the order of presentation
    - Each training instance is examined only once
    - Discarded hypotheses are never reconsidered
    - Learning is properly incremental

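A textbook-style sketch of the candidate elimination algorithm for conjunctive hypotheses over nominal attributes ('?' standing for "any value"), maintaining the specific boundary S and the general boundary G. The weather-style domains and examples are invented for illustration; META-DENDRAL's actual hypothesis language was richer than this.

```python
# Sketch: candidate elimination over a version space of conjunctive
# hypotheses, with S the most specific and G the most general boundary.

def matches(h, x):
    """Hypothesis h covers tuple x if each position is '?' or equal."""
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

def generalize(s, x):
    """Minimal generalization of specific hypothesis s to cover x."""
    return tuple(sv if sv == xv else "?" for sv, xv in zip(s, x))

def specialize(g, x, domains):
    """All minimal specializations of g that exclude negative example x."""
    out = []
    for i, gv in enumerate(g):
        if gv == "?":
            for v in domains[i]:
                if v != x[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    """examples: list of (attribute_tuple, is_positive)."""
    G = [("?",) * len(domains)]   # most general boundary
    S = None                      # most specific; set from first positive
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = x if S is None else generalize(S, x)
        else:
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)   # already excludes the negative
                else:                 # keep only specializations >= S
                    new_G += [h for h in specialize(g, x, domains)
                              if S is None or matches(h, S)]
            G = new_G
    return S, G

domains = [("sunny", "rainy"), ("warm", "cold"), ("normal", "high")]
examples = [
    (("sunny", "warm", "normal"), True),
    (("sunny", "warm", "high"),   True),
    (("rainy", "cold", "high"),   False),
]
S, G = candidate_elimination(examples, domains)
print(S)  # ('sunny', 'warm', '?')
print(G)  # [('sunny', '?', '?'), ('?', 'warm', '?')]
```

Note how the properties on the slide show up in the code: each instance is processed once, and hypotheses dropped from S or G are never reconsidered.
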
Machine Learning
- Decision trees and production rules
  - Decision trees are an alternative way of structuring rules
  - Efficient algorithms exist for constructing decision trees
  - There is a whole family of such learning systems:
    - CLS (1966)
    - ID3 (1979)
    - ACLS (1982)
    - ASSISTANT (1984)
    - IND (1990)
    - C4.5 (1993) - and C5.0
  - Decision trees can be converted to rules later

Machine Learning
- Entropy
  - Let X be a variable with states x_1, ..., x_n
  - Define the entropy of X by

      H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)

  - N.B. \log_2(x) = \log_{10}(x) / \log_{10}(2) = \ln(x) / \ln(2)

Machine Learning
- Entropy
  - Consider flipping a perfect coin: n = 2

      X : x_1, x_2
      p(x_1) = p(x_2) = 1/2

Machine Learning
- Entropy

      H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)
           = -\left( \tfrac{1}{2} \log_2 \tfrac{1}{2} + \tfrac{1}{2} \log_2 \tfrac{1}{2} \right)
           = -\left( -\tfrac{1}{2} - \tfrac{1}{2} \right) = 1

Machine Learning
- Entropy
  - Consider n equiprobable outcomes:

      H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)
           = -\sum_{i=1}^{n} \tfrac{1}{n} \log_2 \tfrac{1}{n}
           = \sum_{i=1}^{n} \tfrac{1}{n} \log_2(n) = \log_2(n)

Machine Learning
- Entropy
  - Consider flipping a totally biased coin: n = 2

      X : x_1, x_2
      p(x_1) = 1, p(x_2) = 0

Machine Learning
- Entropy

      H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)
           = -\left( 1 \log_2(1) + 0 \log_2(0) \right)
           = -(0 + 0) = 0

  (0 \log_2(0) = 0 in the limit, by L'Hôpital's rule)

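The entropy definition and the three worked examples above can be checked with a few lines of Python. This is an illustrative sketch, with 0*log2(0) taken as 0 as on the slide.

```python
# Sketch: H(X) = -sum_i p(x_i) * log2 p(x_i)
import math

def entropy(probs):
    """Entropy of a distribution; terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # perfect coin        -> 1.0
print(entropy([1/8] * 8))    # 8 equiprobable      -> log2(8) = 3.0
print(entropy([1.0, 0.0]))   # totally biased coin -> 0.0
```
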
Machine Learning
- Entropy
  - Entropy is a measure of chaos or disorder
  - H(X) is maximal for equiprobable outcomes

Machine Learning
- Entropy
  - Let X : x_1, ..., x_m and Y : y_1, ..., y_n be two variables

      H(X, Y) = -\sum_{i=1}^{m} \sum_{j=1}^{n} p(x_i, y_j) \log_2 p(x_i, y_j)

  - If X and Y are independent,

      H(X, Y) = H(X) + H(Y)

Machine Learning
- Conditional Entropy
  - Partial conditional entropy of Y, given that X is in state x_i:

      H(Y | x_i) = -\sum_{j=1}^{n} p(y_j | x_i) \log_2 p(y_j | x_i)

  - Full conditional entropy of Y given X:

      H(Y | X) = \sum_{i=1}^{m} p(x_i) \cdot H(Y | x_i)

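A sketch of the two definitions above, computing the full conditional entropy from a joint distribution table via the partial conditional entropies. The 2x2 joint distribution is invented for illustration.

```python
# Sketch: H(Y|X) = sum_i p(x_i) * H(Y | x_i), from a joint table
# joint[i][j] = p(x_i, y_j).
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    h = 0.0
    for row in joint:                 # row i holds p(x_i, y_j) for all j
        px = sum(row)                 # marginal p(x_i)
        if px > 0:                    # partial entropy H(Y | x_i)
            h += px * entropy([pxy / px for pxy in row])
    return h

joint = [[0.4, 0.1],                  # p(x1, y1), p(x1, y2)
         [0.2, 0.3]]                  # p(x2, y1), p(x2, y2)
py = [sum(col) for col in zip(*joint)]
print(entropy(py) - conditional_entropy(joint))  # H(Y) - H(Y|X) >= 0
```

The printed difference H(Y) - H(Y|X) is exactly the information gain used by ID3 on the following slides, with Y the decision and X an attribute.
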
Machine Learning
- Binary Logarithms

  n | log2 n
  --+-------
  1 | 0.0000
  2 | 1.0000
  3 | 1.5850
  4 | 2.0000
  5 | 2.3219
  6 | 2.5850
  7 | 2.8074
  8 | 3.0000

Machine Learning
- ID3
  - Builds a decision tree first, then rules
  - Given a set of attributes and a decision, recursively selects the attribute to be the root of the (sub)tree based on information gain:

      H(decision) - H(decision | attribute)

  - Favors attributes with many outcomes
  - Is not guaranteed to find the simplest decision tree
  - Is not incremental

Machine Learning
- C4.5
  - Selects attributes based on the information gain ratio:

      (H(decision) - H(decision | attribute)) / H(attribute)

  - Uses pruning heuristics
    - to simplify decision trees
    - to reduce dependence on the training set
  - Tunes the resulting rule(s)

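A sketch covering both ID3's information gain (previous slide) and C4.5's gain ratio for a single nominal attribute. The toy outlook/play dataset is invented, and real C4.5 additionally handles continuous attributes, missing values, and pruning.

```python
# Sketch: information gain and gain ratio for one nominal attribute.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attr_values, decisions):
    """ID3: Gain = H(decision) - H(decision | attribute)."""
    n = len(decisions)
    h_cond = 0.0
    for v in set(attr_values):
        subset = [d for a, d in zip(attr_values, decisions) if a == v]
        h_cond += len(subset) / n * entropy(subset)
    return entropy(decisions) - h_cond

def gain_ratio(attr_values, decisions):
    """C4.5: gain divided by the attribute's own entropy."""
    h_attr = entropy(attr_values)
    return information_gain(attr_values, decisions) / h_attr if h_attr else 0.0

outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play    = ["no",    "no",    "yes",      "yes",  "no",   "yes"]
print(information_gain(outlook, play))  # ~0.667
print(gain_ratio(outlook, play))        # ~0.421
```

Dividing by H(attribute) is what counteracts ID3's bias toward attributes with many outcomes: such attributes have a large H(attribute), which shrinks their ratio.
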
Machine Learning
- C4.5 rule tuning
  - Derive initial rules by enumerating the paths through the decision tree
  - Generalize the rules by deleting unnecessary conditions where possible
  - Group the rules according to target classes and delete any that do not contribute to overall performance on the class
  - Order the sets of rules for the target classes and choose a default class

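The first step, enumerating root-to-leaf paths as IF-THEN rules, can be sketched as below. The tiny tree and the paths_to_rules helper are hypothetical illustrations; C4.5's actual tree format and the later generalization, grouping, and ordering steps are not shown.

```python
# Sketch: turning each root-to-leaf path of a decision tree into a rule.
# A node is either a class label (leaf) or (attribute, {value: subtree}).

def paths_to_rules(node, conditions=()):
    """Yield (conditions, class) pairs, one per root-to-leaf path."""
    if isinstance(node, str):                   # leaf: a class label
        yield list(conditions), node
        return
    attribute, branches = node
    for value, child in branches.items():
        yield from paths_to_rules(child, conditions + ((attribute, value),))

tree = ("outlook", {
    "sunny":    ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain":     "no",
})
for conds, label in paths_to_rules(tree):
    ifs = " AND ".join(f"{a} = {v}" for a, v in conds)
    print(f"IF {ifs} THEN play = {label}")
```
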
Machine Learning
- Rule tuning
  - Rule tuning may be useful for rules derived by a variety of other means besides C4.5
    - Evaluate the contribution of individual rules
    - Evaluate the performance of the rule set as a whole