Machine Learning
CSE 4308/5360: Artificial Intelligence I
University of Texas at Arlington
Machine Learning
• Machine learning is useful for constructing agents that improve themselves using observations.
• Instead of hardcoding how the agent should behave, we allow the behavior to be optimized based on training data.
• In many AI applications, such as speech recognition, computer vision, and game-playing, machine learning methods vastly outperform hardcoded agents.
Pattern Recognition
• In pattern recognition (also known as pattern classification) the setting is this:
• We have patterns, which can be, for example:
  – Images or videos.
  – Strings.
  – Sequences of numbers, booleans, or strings (or a mixture thereof).
• We have classes, and each pattern is associated with a class. For example:

  Pattern                                          Class
  A photograph of a face                           The identity of the human
  A video of a sign from American Sign Language    The sign
  A book (represented as a string)                 The genre of the book

• Our goal: build a system that, given a pattern, estimates its class.
  – E.g., given a photograph of a face, recognize the person.
  – Given a video of a sign, recognize the sign.
Pattern Recognition
• More formally: the goal in pattern recognition is to construct a classifier that is as accurate as possible.
• A classifier is a function F, mapping patterns to classes:
  F: set of patterns → set of classes
  – The input to F is a pattern (e.g., a photograph of a face).
  – The output of F is a class (e.g., the ID of the human that the face belongs to).
• Typically, classifiers are not perfect.
  – In most real-world cases, the classifier will make some mistakes, and for some patterns it will output the wrong class.
• One key measure of performance of a classifier is its error rate: the percentage of patterns for which F provides the wrong answer.
  – Obviously, we want the error rate to be as low as possible.
• Another term is classification accuracy, equal to 1 − error rate.
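To make the error rate and accuracy definitions concrete, here is a minimal sketch in Python. The toy classifier and the labeled test set are hypothetical, made up purely for illustration; only the error-rate formula itself comes from the slide.

```python
# Minimal sketch of measuring a classifier's error rate and accuracy.
# The classifier and the labeled test set below are hypothetical examples.

def error_rate(classifier, labeled_patterns):
    """Fraction of patterns for which the classifier outputs the wrong class."""
    mistakes = sum(1 for pattern, true_class in labeled_patterns
                   if classifier(pattern) != true_class)
    return mistakes / len(labeled_patterns)

# A toy classifier: guesses the class of a number by its sign.
def toy_classifier(x):
    return "positive" if x >= 0 else "negative"

# Toy test set of (pattern, true class) pairs; one label is deliberately mismatched.
test_set = [(3, "positive"), (-2, "negative"), (0, "negative"), (5, "positive")]

print("error rate:", error_rate(toy_classifier, test_set))      # 0.25
print("accuracy:  ", 1 - error_rate(toy_classifier, test_set))  # 0.75
```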
Learning and Recognition
• Machine learning and pattern recognition are not the same thing.
  – This is a point that confuses many people.
• You can use machine learning to learn things that are not classifiers. For example:
  – Learn how to walk on two feet.
  – Learn how to grasp a medical tool.
• You can construct classifiers without machine learning.
  – You can hardcode a bunch of rules that the classifier applies to each pattern in order to estimate its class.
• However, machine learning and pattern recognition are heavily related.
  – A big part of machine learning research focuses on pattern recognition.
  – Modern pattern recognition systems are usually based exclusively on machine learning.
Supervised Learning
• In supervised learning, our training data is a set of pairs.
• Each pair consists of:
  – A pattern.
  – The true class for that pattern.
• Another way to think about it is this:
  – There exists a perfect classifier F_true that knows the true class of each pattern.
  – The training data gives us the value of F_true for many examples.
  – Our goal is to learn a classifier F, mapping patterns to classes, that agrees with F_true as much as possible.
• The difficulty of the problem is this:
  – The training data provide values of F_true for only some patterns.
  – Based on those examples, we need to construct a classifier F that provides an answer for ANY possible pattern.
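The sketch below illustrates this setup as a simulation (in practice F_true is unknown, so this is only a thought experiment in code): it defines a hidden F_true, draws training pairs from it, learns a trivial nearest-neighbor classifier F, and checks how often F agrees with F_true on patterns outside the training set. All names and data here are invented for the illustration.

```python
import random

# Hidden "perfect" classifier; in a real problem we never get to see this.
def f_true(x):
    return "small" if x < 50 else "large"

# Training data: (pattern, true class) pairs, i.e., values of f_true at some points.
random.seed(0)
train = [(x, f_true(x)) for x in random.sample(range(100), 10)]

# A trivial learned classifier F: predict the class of the nearest training pattern.
def f(x):
    nearest_pattern, nearest_class = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_class

# F must answer for ANY pattern, including ones never seen in training.
test_patterns = [x for x in range(100) if x not in dict(train)]
agreement = sum(f(x) == f_true(x) for x in test_patterns) / len(test_patterns)
print(f"F agrees with F_true on {agreement:.0%} of unseen patterns")
```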
Supervised Learning Example
• This is a toy example.
  – From the textbook.
• Here, the “pattern” is a single real number.
• The class is also a real number.
• So, F_true is a function from the reals to the reals.
  – Usually patterns are much more complex.
  – In this toy example it is easy to visualize training examples and classifiers.
• Each training example is an X on the figure.
  – The x coordinate is the pattern, the y coordinate is the class.
• Based on these examples, what do you think F_true looks like?
Supervised Learning Example
• Different people may give different answers as to what F_true may look like.
• That shows the challenge in supervised learning: we can find several plausible functions, but how do we know which one (if any) is correct?
Supervised Learning Example
• Here is one possible classifier F.
• Can anyone guess how it was obtained?
• It was obtained by fitting a line to the training data.
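A line like this can be obtained by least-squares fitting. Here is a minimal sketch with NumPy; the training points are made up stand-ins, since the slide's actual data lives in a figure.

```python
import numpy as np

# Hypothetical stand-ins for the training examples shown on the figure.
patterns = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
classes  = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])

# Fit a degree-1 polynomial (a line) by least squares.
slope, intercept = np.polyfit(patterns, classes, deg=1)
line = np.poly1d([slope, intercept])

print(f"F(x) = {slope:.2f} * x + {intercept:.2f}")
print("prediction for a new pattern 3.5:", line(3.5))
```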
Supervised Learning Example
• Here we see another possible classifier F, shown in green.
• It looks like a quadratic function (second-degree polynomial).
• It fits all the data perfectly, except for one point.
Supervised Learning Example
• Here we see a third possible classifier F, shown in blue.
• It looks like a cubic (third-degree) polynomial.
• It fits all the data perfectly.
Supervised Learning Example
• Here we see a fourth possible classifier F, shown in orange.
• It zig-zags a lot.
• It fits all the data perfectly.
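Classifiers like the line, the quadratic, the cubic, and the zig-zagging curve can all be mimicked by fitting polynomials of increasing degree, as in the sketch below. The data is again hypothetical; note how a degree-6 polynomial on 7 points interpolates every point exactly, much like the zig-zagging orange classifier.

```python
import numpy as np

# Hypothetical training data standing in for the X marks on the figure.
patterns = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
classes  = np.array([0.1, 0.9, 1.5, 3.6, 3.9, 5.2, 5.4])

# Fit polynomials of increasing degree. A degree-6 polynomial on 7 points
# passes through every point exactly, so its training error drops to zero.
for degree in (1, 2, 3, 6):
    coeffs = np.polyfit(patterns, classes, deg=degree)
    f = np.poly1d(coeffs)
    sse = np.sum((classes - f(patterns)) ** 2)
    print(f"degree {degree}: sum of squared training errors = {sse:.4f}")
```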
Supervised Learning Example
• Overall, we can come up with an infinite number of possible classifiers here.
• The question is: how do we choose which one is best?
• Or, an easier version: how do we choose a good one?
• Or, an even easier version: given a classifier, how can we measure how good it is?
• What are your thoughts on this?
Supervised Learning Example
• One naïve solution is to evaluate classifiers based on training error.
• For any classifier F, its training error can be measured as a sum of squared errors over training patterns X:

  $\sum_{X} \left[ F_{\text{true}}(X) - F(X) \right]^2$

• What are the pitfalls of choosing the “best” classifier based on training error?
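Computed directly, the training error of a candidate F is just this sum evaluated over the training set. A minimal sketch follows; the data and the candidate classifier are placeholders.

```python
import numpy as np

def training_error(f, train_patterns, train_classes):
    """Sum of squared errors of classifier f over the training set."""
    predictions = np.array([f(x) for x in train_patterns])
    return np.sum((train_classes - predictions) ** 2)

# Placeholder training data and a placeholder linear classifier.
train_patterns = np.array([1.0, 2.0, 3.0, 4.0])
train_classes  = np.array([1.1, 2.3, 2.9, 4.2])
linear_f = lambda x: 1.0 * x + 0.1

print("training error:", training_error(linear_f, train_patterns, train_classes))
```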
Supervised Learning Example
• What are the pitfalls of choosing the “best” classifier based on training error?
• The zig-zagging orange classifier comes out as “perfect”: its training error is zero.
• As a human, which would you find more reasonable: the orange classifier, or the blue classifier (the cubic polynomial)?
  – They both have zero training error.
  – However, the zig-zagging classifier looks pretty arbitrary.
Supervised Learning Example
• Ockham’s razor: given two equally good explanations, choose the simpler one.
  – This is an old philosophical principle (Ockham lived in the 14th century).
• Based on that, we prefer the cubic polynomial over the crazy zig-zagging classifier: it is simpler, and they both have zero training error.
Supervised Learning Example
• However, real life is more complicated.
• What if none of the classifiers have zero training error?
• How do we weigh simplicity versus training error?
• There is no standard or straightforward solution to this.
• There exist many machine learning algorithms. Each corresponds to a different approach for resolving the trade-off between simplicity and training error.
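One simple way to make this trade-off concrete (an illustration of the idea only, not a method endorsed by the slides) is to score each candidate by its training error plus a complexity penalty, here the polynomial degree weighted by a hand-picked constant. The data is the same hypothetical set used above.

```python
import numpy as np

# Hypothetical training data standing in for the figure's X marks.
patterns = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
classes  = np.array([0.1, 0.9, 1.5, 3.6, 3.9, 5.2, 5.4])

LAMBDA = 0.5  # hand-picked weight on complexity; other choices pick other winners

def penalized_score(degree):
    """Training error plus a simplicity penalty proportional to the degree."""
    f = np.poly1d(np.polyfit(patterns, classes, deg=degree))
    sse = np.sum((classes - f(patterns)) ** 2)
    return sse + LAMBDA * degree

best = min((1, 2, 3, 6), key=penalized_score)
print("degree chosen by this particular trade-off:", best)
```

The point of the sketch is that the "best" classifier depends entirely on how the penalty is chosen, which is exactly why different learning algorithms resolve this trade-off differently.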
The Road Ahead
• In the remainder of this course, we will mostly study supervised learning methods for pattern recognition.
• Some methods we will see, if we have time:
  – Decision trees.
  – Decision forests.
  – Bayesian classifiers.
  – Nearest neighbor classifiers.
  – Neural networks (in very little detail).
• Studying these methods should give you a good first experience with machine learning and pattern recognition.
• The current trend in AI is that machine learning and pattern recognition methods are becoming more and more dominant, with rapidly growing commercial applications and impact.