CS 391L: Machine Learning
Introduction

Raymond J. Mooney
University of Texas at Austin

What is Learning?
• Herbert Simon: “Learning is any process by which a system improves performance from experience.”
• What is the task?
  – Classification
  – Problem solving / planning / control

Classification
• Assign an object/event to one of a given finite set of categories.
  – Medical diagnosis
  – Credit card applications or transactions
  – Fraud detection in e-commerce
  – Worm detection in network packets
  – Spam filtering in email
  – Recommended articles in a newspaper
  – Recommended books, movies, music, or jokes
  – Financial investments
  – DNA sequences
  – Spoken words
  – Handwritten letters
  – Astronomical images

Problem Solving / Planning / Control
• Performing actions in an environment in order to achieve a goal.
  – Solving calculus problems
  – Playing checkers, chess, or backgammon
  – Balancing a pole
  – Driving a car or a jeep
  – Flying a plane, helicopter, or rocket
  – Controlling an elevator
  – Controlling a character in a video game
  – Controlling a mobile robot

Why Study Machine Learning?
Engineering Better Computing Systems
• Develop systems that are too difficult or expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task (the knowledge engineering bottleneck).
• Develop systems that can automatically adapt and customize themselves to individual users.
  – Personalized news or mail filter
  – Personalized tutoring
• Discover new knowledge from large databases (data mining).
  – Market basket analysis (e.g. diapers and beer)
  – Medical text mining (e.g. migraines to calcium channel blockers to magnesium)

Measuring Performance
• Classification accuracy
• Solution correctness
• Solution quality (length, efficiency)
• Speed of performance
Why Study Machine Learning?
Cognitive Science
• Computational studies of learning may help us understand learning in humans and other biological organisms.
  – Hebbian neural learning
    • “Neurons that fire together, wire together.”
  – Humans’ relative difficulty of learning disjunctive concepts vs. conjunctive ones.
  – Power law of practice (illustrated numerically in the sketch after these slides)
    [Figure: log(perf. time) vs. log(# training trials) — practice curves are linear in log–log coordinates.]

Why Study Machine Learning?
The Time is Ripe
• Many basic, effective, and efficient algorithms are available.
• Large amounts of on-line data are available.
• Large amounts of computational resources are available.

Related Disciplines
• Artificial Intelligence
• Data Mining
• Probability and Statistics
• Information theory
• Numerical optimization
• Computational complexity theory
• Control theory (adaptive)
• Psychology (developmental, cognitive)
• Neurobiology
• Linguistics
• Philosophy

Defining the Learning Task
Improve on task, T, with respect to performance metric, P, based on experience, E.

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver

T: Categorizing email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels

Designing a Learning System
• Choose the training experience.
• Choose exactly what is to be learned, i.e. the target function.
• Choose how to represent the target function.
• Choose a learning algorithm to infer the target function from the experience.

[Diagram: Environment/Experience → Learner → Knowledge → Performance Element]

Sample Learning Problem
• Learn to play checkers from self-play.
• We will develop an approach analogous to that used in the first machine learning system, developed by Arthur Samuel at IBM in 1959.
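The power law of practice says performance time falls off as a power of the number of practice trials, which is why the plot above is a straight line in log–log coordinates. A minimal numeric illustration; the constants a and b here are made up for the example and are not from the slides:

```python
import math

# Power law of practice: perf_time = a * trials**(-b)
# (a and b are task-specific constants; these values are illustrative only)
a, b = 10.0, 0.4

for trials in [1, 10, 100, 1000]:
    t = a * trials ** (-b)
    # log(perf_time) = log(a) - b * log(trials): linear in log-log space
    print(f"trials={trials:5d}  time={t:6.3f}  "
          f"log-log point=({math.log10(trials):.1f}, {math.log10(t):+.3f})")
```

Each factor-of-10 increase in trials shifts log(perf. time) down by the same amount b, matching the straight line in the figure.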
Training Experience
• Direct experience: given sample input and output pairs for a useful target function.
  – Checker boards labeled with the correct move, e.g. extracted from records of expert play.
• Indirect experience: given feedback which is not direct I/O pairs for a useful target function.
  – Potentially arbitrary sequences of game moves and their final game results.
• Credit/blame assignment problem: how to assign credit or blame to individual moves given only indirect feedback?

Source of Training Data
• Random examples provided outside of the learner’s control.
  – Negative examples available, or only positive?
• Good training examples selected by a “benevolent teacher.”
  – “Near miss” examples
• Learner can query an oracle about the class of an unlabeled example in the environment.
• Learner can construct an arbitrary example and query an oracle for its label.
• Learner can design and run experiments directly in the environment without any human guidance.

Training vs. Test Distribution
• Generally assume that the training and test examples are independently drawn from the same overall distribution of data.
  – IID: independently and identically distributed
• If examples are not independent, collective classification is required.
• If the test distribution is different, transfer learning is required.

Choosing a Target Function
• What function is to be learned, and how will it be used by the performance system?
• For checkers, assume we are given a function for generating the legal moves for a given board position and want to decide the best move.
  – Could learn a function:
      ChooseMove(board, legal-moves) → best-move
  – Or could learn an evaluation function, V(board) → R, that gives each board position a score for how favorable it is. V can be used to pick a move by applying each legal move, scoring the resulting board position, and choosing the move that results in the highest-scoring board position (a minimal move-selection sketch appears after these slides).

Ideal Definition of V(b)
• If b is a final winning board, then V(b) = 100.
• If b is a final losing board, then V(b) = –100.
• If b is a final draw board, then V(b) = 0.
• Otherwise, V(b) = V(b′), where b′ is the highest-scoring final board position that is achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well).
  – Can be computed using complete mini-max search of the finite game tree (sketched in code after these slides).

Approximating V(b)
• Computing V(b) this way is intractable, since it involves searching the complete exponential game tree.
• Therefore, this definition is said to be non-operational.
• An operational definition can be computed in reasonable (polynomial) time.
• Need to learn an operational approximation to the ideal evaluation function.
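To make the role of the evaluation function concrete, here is a minimal sketch of move selection using a learned V. The Board type and the legal_moves and apply_move helpers are hypothetical stand-ins for a real checkers engine, passed in as parameters; only the argmax-over-successors logic comes from the slides:

```python
from typing import Callable, List

Board = tuple  # hypothetical immutable board representation
Move = tuple   # hypothetical move representation

def choose_move(board: Board,
                legal_moves: Callable[[Board], List[Move]],
                apply_move: Callable[[Board, Move], Board],
                V: Callable[[Board], float]) -> Move:
    """Pick the legal move whose resulting board position scores highest under V."""
    return max(legal_moves(board),
               key=lambda move: V(apply_move(board, move)))
```

Given a learned V, a call like choose_move(b, engine.legal_moves, engine.apply_move, V) would return the program's move, so only V itself needs to be learned.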
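The ideal (non-operational) definition of V(b) can be written directly as complete mini-max search. This sketch uses the same hypothetical board interface as above, with assumed helpers is_final and outcome; the exponential recursion is exactly why the definition is intractable:

```python
def ideal_V(board, my_turn, legal_moves, apply_move, is_final, outcome) -> float:
    """Ideal evaluation: +100 win, -100 loss, 0 draw, else mini-max over successors.

    Searching the complete game tree like this is exponential in game length,
    which is why the ideal definition is called non-operational.
    """
    if is_final(board):
        return outcome(board)  # assumed to return +100, -100, or 0
    values = [ideal_V(apply_move(board, m), not my_turn,
                      legal_moves, apply_move, is_final, outcome)
              for m in legal_moves(board)]
    # On our move we pick the best successor; an optimal opponent
    # picks the successor that is worst for us.
    return max(values) if my_turn else min(values)
```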
Representing the Target Function
• The target function can be represented in many ways: lookup table, symbolic rules, numerical function, neural network.
• There is a trade-off between the expressiveness of a representation and the ease of learning.
• The more expressive a representation, the better it will be at approximating an arbitrary function; however, more examples will be needed to learn an accurate function.

Linear Function for Representing V(b)
• In checkers, use a linear approximation of the evaluation function (sketched in code after these slides):
    V(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)
  – bp(b): number of black pieces on board b
  – rp(b): number of red pieces on board b
  – bk(b): number of black kings on board b
  – rk(b): number of red kings on board b
  – bt(b): number of black pieces threatened (i.e. which can be immediately taken by red on its next turn)
  – rt(b): number of red pieces threatened

Obtaining Training Values
• Direct supervision may be available for the target function.
  – <<bp=3, rp=0, bk=1, rk=0, bt=0, rt=0>, 100>  (win for black)
• With indirect feedback, training values can be estimated using temporal difference learning (used in reinforcement learning, where supervision is delayed reward).

Temporal Difference Learning
• Estimate training values for intermediate (non-terminal) board positions by the estimated value of their successor in an actual game trace:
    V_train(b) = V(successor(b))
  where successor(b) is the next board position where it is the program’s move in actual play.
• Values towards the end of the game are initially more accurate, and continued training slowly “backs up” accurate values to earlier board positions.

Learning Algorithm
• Uses training values for the target function to induce a hypothesized definition that fits these examples and hopefully generalizes to unseen examples.
• In statistics, learning to approximate a continuous function is called regression.
• Attempts to minimize some measure of error (loss function), such as mean squared error:
    E = (1/|B|) · Σ_{b ∈ B} [V_train(b) − V(b)]²

Least Mean Squares (LMS) Algorithm
• A gradient descent algorithm that incrementally updates the weights of a linear function in an attempt to minimize the mean squared error. (A combined TD/LMS training sketch appears after these slides.)

Until weights converge:
  For each training example b:
    1) Compute the error:
         error(b) = V_train(b) − V(b)
    2) For each board feature f_i, update its weight w_i:
         w_i ← w_i + c · f_i · error(b)
       for some small constant (learning rate) c.
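A minimal sketch of the linear evaluation function as code. The individual feature extractors (bp, rp, bk, rk, bt, rt) are assumed to come from a checkers engine, so they are passed in as a list to keep the sketch self-contained:

```python
from typing import Callable, Sequence

def linear_V(weights: Sequence[float],
             features: Sequence[Callable[[object], float]],
             board) -> float:
    """V(b) = w0 + w1*bp(b) + w2*rp(b) + ... + w6*rt(b).

    `weights` holds w0..w6; `features` holds the six engine-supplied
    extractors bp, rp, bk, rk, bt, rt in order.
    """
    return weights[0] + sum(w * f(board)
                            for w, f in zip(weights[1:], features))
```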
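And a sketch of one LMS pass over a single game trace, with training values obtained by the temporal-difference rule V_train(b) = V(successor(b)). To stay self-contained, each board is represented directly by its feature vector (bp, rp, bk, rk, bt, rt), and the trace is assumed to contain only positions where it is the program's move; the feature values below are made up. Only the update rule itself comes from the slides:

```python
from typing import List, Sequence

def V(w: List[float], x: Sequence[float]) -> float:
    """Linear evaluation of a board given as its feature vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def lms_pass(w: List[float], trace: List[Sequence[float]],
             final_value: float, c: float = 0.1) -> None:
    """One LMS pass over one game trace (feature vectors, in move order).

    V_train for the final position is the actual game outcome (+100/-100/0);
    for every earlier position it is V of its successor (temporal difference).
    Update: w_i <- w_i + c * f_i * error(b), with f_0 = 1 for the bias weight.
    """
    for t, x in enumerate(trace):
        v_train = final_value if t == len(trace) - 1 else V(w, trace[t + 1])
        error = v_train - V(w, x)
        w[0] += c * 1.0 * error              # bias feature f_0 = 1
        for i, f_i in enumerate(x, start=1):
            w[i] += c * f_i * error

# Illustrative use: a two-position trace ending in a win for black (+100).
w = [0.0] * 7                        # w0 plus one weight per feature
trace = [(4, 4, 0, 0, 1, 0),         # (bp, rp, bk, rk, bt, rt) -- made-up values
         (3, 0, 1, 0, 0, 0)]
lms_pass(w, trace, final_value=100)
print(w)
```

Repeating such passes over many self-play games is what slowly "backs up" accurate end-of-game values to earlier positions.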