Machine Learning I: Decision Trees
AI Class 14 (Ch. 18.1–18.3)
Cynthia Matuszek – CMSC 671
Material from Dr. Marie desJardin, Dr. Manfred Kerber,

Bookkeeping (Lots)
• Schedule mostly finalized
  • Link on Teams now Piazza
• HW4 due 11/8 @ 11:59
• No HW6
• Final date and time posted
• Full project description posted
Dates:
• HW 4: 11/8
• HW 5: 11/20
• Final Exam: 12/19, 1:00-3:00
• Project Design: 11/5
• Project Phase 1: 11/15, 11:59 pm
• Project Phase II: 11/29
• Final Writeup: 12/11

Today’s Class
• Machine learning
  • What is ML?
• Inductive learning ← Review: What is induction?
  • Supervised
  • Unsupervised
• Decision trees
• Later: Bayesian learning, naïve Bayes, and BN learning

What is Learning?
• “Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time.” –Herbert Simon
• “Learning is constructing or modifying representations of what is being experienced.” –Ryszard Michalski
• “Learning is making useful changes in our minds.” –Marvin Minsky

Why Learn?
• Discover previously-unknown new things or structure
  • Data mining, scientific discovery
• Fill in skeletal or incomplete domain knowledge
• Large, complex AI systems:
  • Cannot be completely derived by hand and
  • Require dynamic updating to incorporate new information
  • Learning new characteristics expands the domain or expertise and lessens the “brittleness” of the system
• Build agents that can adapt to users or other agents
• Understand and improve efficiency of human learning
  • Use to improve methods for teaching and tutoring people (e.g., better computer-aided instruction)

Pre-Reading Quiz
• What’s supervised learning?
• What’s classification? What’s regression?
• What’s a hypothesis? What’s a hypothesis space?
• What are the training set and test set?
• What is Ockham’s razor?
• What’s unsupervised learning?
A General Model of Learning Agents
(Figure: general model of a learning agent.)

Some Terminology
The Big Idea: given some data, you learn a model of how the world works that lets you predict new data.
• Training set: Data from which you learn initially.
• Model: What you learn. A “model” of how inputs are associated with outputs.
• Test set: New data you test your model against.
• Corpus: A body of data. (pl.: corpora)
• Representation: The computational expression of data

Major Paradigms of Machine Learning
• Rote learning: 1:1 mapping from inputs to stored representation
  • You’ve seen a problem before
  • Learning by memorization
  • Association-based storage and retrieval
• Induction: Specific examples → general conclusions
• Clustering: Unsupervised grouping of data

Major Paradigms of Machine Learning
• Analogy: Model is correspondence between two different representations
• Discovery: Unsupervised, specific goal not given
• Genetic algorithms: “Evolutionary” search techniques
  • Based on an analogy to “survival of the fittest”
  • Surprisingly hard to get right/working
• Reinforcement: Feedback (positive or negative reward) given at the end of a sequence of steps

The Classification Problem
• Extrapolate from examples to make accurate predictions about future data points
  • Examples are called training data
• Predict into classes, based on attributes (“features”)
  • Example: it has tomato sauce, cheese, and no bread. Is it pizza?
  • Example: does this image contain a cat?
(Figure: example data points labeled yes / no.)

Supervised vs. Unsupervised
• Goal: Learn an unknown function f(X) = Y, where
  • X is an input example
  • Y is the desired output. (f is the...?)
• Supervised learning: given a training set of (X, Y) pairs by a “teacher”
  X                                  Y (“class labels” provided)
  bread, cheese, tomato sauce        pizza
  ¬bread, ¬cheese, tomato sauce      ¬pizza (not pizza)
  bread, cheese, ¬tomato sauce       pizza (gross, but still pizza)
  lots more rows…
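The terminology above (corpus, training set, test set, model) can be made concrete with a small sketch. Everything here is an assumption for illustration: the six-row corpus is made up in the spirit of the pizza example, and the "model" is a rote learner that memorizes training inputs and falls back to the majority label for anything unseen. The helper names `split`, `train_rote`, and `accuracy` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical corpus of (X, Y) pairs: X = (bread, cheese, sauce), Y = "is it pizza?"
corpus = [
    ((True, True, True), True),
    ((False, False, True), False),
    ((True, True, False), True),
    ((False, True, True), False),
    ((True, False, True), False),
    ((False, False, False), False),
]

def split(rows, test_fraction=0.5, seed=0):
    """Shuffle a copy of the corpus, then cut it into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_rote(training_set):
    """Rote learning: memorize each input; fall back to the majority label."""
    memory = dict(training_set)
    majority = Counter(y for _, y in training_set).most_common(1)[0][0]
    return lambda x: memory.get(x, majority)

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

training_set, test_set = split(corpus)
model = train_rote(training_set)
print("training accuracy:", accuracy(model, training_set))
print("test accuracy:", accuracy(model, test_set))
```

A rote learner scores perfectly on the training set by construction, so only the test-set score says anything about how well the model predicts new data.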
Supervised vs. Unsupervised
• Goal: Learn an unknown function f(X) = Y, where
  • X is an input example
  • Y is the desired output. (f is the...?)
• Unsupervised learning: only given Xs and some (eventual) feedback
  X
  bread, cheese, tomato sauce
  ¬bread, ¬cheese, tomato sauce
  bread, cheese, ¬tomato sauce
  lots more rows…
(Figure: “cat?”)

Concept Learning
• Concept learning or classification (aka “induction”)
• Given a set of examples of some concept/class/category:
  1. Determine if a given example is an instance of the concept (class member) or not
  2. If it is: positive example
  3. If it is not: negative example
  4. Or we can make a probabilistic prediction (e.g., using a Bayes net)

Supervised Concept Learning
• Given a training set of positive and negative examples of a concept
• Construct a description (model) that will accurately classify whether future examples are positive or negative
• I.e., learn estimate of function f given a training set:
  {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
  where each y_i is either + (positive) or - (negative), or a probability distribution over +/-

Inductive Learning Framework
• Raw input data from sensors preprocessed to obtain feature vector, X
  • Relevant features for classifying examples
  • Each X is a list of (attribute, value) pairs
• n attributes (a.k.a. features): fixed, positive, and finite
• Features have a fixed, finite number of possible values
  • Or continuous within some well-defined space, e.g., “age”
• Each example is a point in an n-dimensional feature space
  • X = [Person:Sue, EyeColor:Brown, Age:Young, Sex:Female]
  • X = [Cheese: f, Sauce: t, Bread: t]
  • X = [Texture:Fuzzy, Ears:Pointy, Purrs:Yes, Legs:4]

Inductive Learning as Search
• Instance space, I, is set of all possible examples
  • Defines the language for the training and test instances
  • Usually each instance i ∈ I is a feature vector
  • Features are also sometimes called attributes or variables
  • I: V_1 × V_2 × … × V_k, i = (v_1, v_2, …, v_k)
• Class variable C gives an instance’s class (to be predicted)

Inductive Learning as Search
• C gives an instance’s class
• Model space M defines the possible classifiers
  • M: I → C, M = {m_1, … m_n} (possibly infinite)
  • Model space is sometimes defined using same features as instance space (not always)
• Training data lets us search for a good (consistent, complete, simple) hypothesis in the model space
• The learned model is a classifier
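The "learning as search" view above can be sketched by enumerating a tiny instance space and model space and filtering for hypotheses consistent with the training data. This is a minimal sketch under stated assumptions: the three boolean features and the training rows loosely follow the pizza example, and the model space (conjunctions where each feature must be true, must be false, or is ignored) is a choice made for this illustration, not something the slides specify.

```python
from itertools import product

FEATURES = ("bread", "cheese", "sauce")  # hypothetical attribute names
VALUES = (True, False)

# Instance space I = V_1 x V_2 x V_3: every possible feature vector.
instance_space = list(product(VALUES, repeat=len(FEATURES)))

# Model space M (an assumption for this sketch): conjunctions in which each
# feature must be True, must be False, or is ignored (None).
model_space = list(product((True, False, None), repeat=len(FEATURES)))

def predict(model, x):
    """Return True (+) when x satisfies every constraint in the conjunction."""
    return all(want is None or want == have for want, have in zip(model, x))

def consistent(model, training_set):
    """A model is consistent when it reproduces every training label."""
    return all(predict(model, x) == y for x, y in training_set)

# Training set of (x, y) pairs, loosely following the pizza rows.
training_set = [
    ((True, True, True), True),     # bread, cheese, sauce       -> pizza
    ((False, False, True), False),  # no bread, no cheese, sauce -> not pizza
    ((True, True, False), True),    # bread, cheese, no sauce    -> still pizza
]

# Search: keep every model that agrees with all the training data.
candidates = [m for m in model_space if consistent(m, training_set)]

# Prefer the simplest consistent hypothesis: fewest constrained features.
best = min(candidates, key=lambda m: sum(c is not None for c in m))
print(len(instance_space), len(model_space), candidates, best)
```

Several hypotheses remain consistent with three examples, so the final `min` step has to break the tie. Preferring fewer constraints is one possible bias; a different preference would select a different classifier from the same data.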
Inductive Learning Pipeline
(Figure, built up over several slides: Training data → TRAINING → Classifier (trained model); Test data → Classifier (trained model) → Label: puppy!)

Inductive Learning Pipeline
Training data, X:
  Texture   Ears      Legs   Class
  Fuzzy     Round     4      +
  Slimy     Missing   8      -
  Fuzzy     Pointy    4      -
  Fuzzy     Round     4      +
  Fuzzy     Pointy    4      +
  …
Test data: x_1 = <Fuzzy, Pointy, 4>
Classifier (trained model) → Label: +

Model Spaces (1)
• Decision trees
  • Partition the instance space I into axis-parallel regions
  • Labeled with class value
• Nearest-neighbor classifiers
  • Partition the instance space I into regions defined by centroid instances (or cluster of k instances)
• Bayesian networks
  • Probabilistic dependencies of class on attributes
  • Naïve Bayes: special case of BNs where class → each attribute

Model Spaces (2)
• Neural networks
  • Nonlinear feed-forward functions of attribute values
• Support vector machines
  • Find a separating plane in a high-dimensional feature space
• Associative rules (feature values → class)
• First-order logical rules
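The pipeline (training data → training → classifier → label for a test instance) can be sketched with one of the model spaces listed above, a nearest-neighbor classifier. This is an illustrative sketch only: the three training rows are a simplified, hypothetical subset in the style of the Texture/Ears/Legs table, the distance is a plain count of mismatched attributes, and the function names are made up here.

```python
def distance(a, b):
    """Count the attributes on which two feature vectors disagree (Hamming distance)."""
    return sum(u != v for u, v in zip(a, b))

def train(training_data):
    """'Training' a nearest-neighbor model just stores the labeled examples."""
    return list(training_data)

def classify(model, x):
    """Label x with the class of its closest stored example (first one wins ties)."""
    _, nearest_label = min(model, key=lambda pair: distance(pair[0], x))
    return nearest_label

# Hypothetical training data: (Texture, Ears, Legs) -> class.
training_data = [
    (("Fuzzy", "Round", 4), "+"),
    (("Slimy", "Missing", 8), "-"),
    (("Fuzzy", "Pointy", 4), "+"),
]

model = train(training_data)

x1 = ("Fuzzy", "Pointy", 4)      # test instance from the pipeline example
x2 = ("Slimy", "Round", 8)       # an unseen combination of attribute values
print(classify(model, x1), classify(model, x2))
```

Other model spaces plug into the same pipeline by swapping `train` and `classify`; the surrounding flow from training data to predicted label stays the same.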
Decision Trees
• Goal: Build a tree to classify examples as positive or negative instances of a concept using supervised learning from a training set
• A decision tree is a tree where:
  • Each non-leaf node is an attribute (feature)
  • Each leaf node is a classification (+ or -)
    • Positive and negative data points
  • Each arc is one possible value of the attribute at the node from which the arc is directed
• Generalization: allow for >2 classes
  • e.g., {sell, hold, buy}

Learning Decision Trees
• Each non-leaf node is associated with an attribute (feature)
• Each leaf node is associated with a classification (+ or -)
• Each arc is associated with one possible value of the attribute at the node from which the arc is directed

Decision Tree-Induced Partition – Example I
(Figure.)

Will You Buy My Product?
(Figure; source: http://www.edureka.co/blog/decision-trees/)

Inductive Learning and Bias
• We want to learn a function f(x) = y
• We are given sample (x,y) pairs, as in figure (a)
• Several hypotheses for this function: (b), (c) and (d) (and others)
• A preference here reveals our learning technique’s bias
  • Prefer piece-wise functions? (b)
  • Prefer a smooth function? (c)
  • Prefer a simple function and treat outliers as noise? (d)
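The decision tree structure defined earlier (non-leaf node = attribute, arc = attribute value, leaf = classification) maps directly onto a nested data structure. The sketch below is a hand-built tree, not one learned from data; the attribute names and values are hypothetical and borrowed from the Texture/Ears/Legs example.

```python
# A tree is either a leaf (a class label string) or a pair
# (attribute, {value: subtree}) whose dictionary keys are the outgoing arcs.
tree = ("Texture", {
    "Slimy": "-",
    "Fuzzy": ("Ears", {
        "Pointy": "+",
        "Round": "+",
        "Missing": "-",
    }),
})

def classify(tree, example):
    """Follow the arc matching the example's value at each node until a leaf."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

def count_leaves(tree):
    """Each leaf corresponds to one region of the partitioned instance space."""
    if isinstance(tree, str):
        return 1
    _, branches = tree
    return sum(count_leaves(subtree) for subtree in branches.values())

example = {"Texture": "Fuzzy", "Ears": "Pointy", "Legs": 4}
print(classify(tree, example), count_leaves(tree))
```

The "Legs" attribute never appears in this tree, so it has no effect on the prediction. Allowing more than two classes, as in {sell, hold, buy}, only changes the strings stored at the leaves.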