Learning Theory Part 3: Bias-Variance Tradeoff
Yingyu Liang
Computer Sciences 760, Fall 2017
http://pages.cs.wisc.edu/~yliang/cs760/
Some of the slides in these lectures have been adapted/borrowed from materials developed by Mark Craven, David Page, Jude Shavlik, Tom Mitchell, Nina Balcan, Matt Gormley, Elad Hazan, Tom Dietterich, and Pedro Domingos.
Goals for the lecture
you should understand the following concepts:
• estimation bias and variance
• the bias-variance decomposition
Estimation bias and variance
• How will predictive accuracy (error) change as we vary k in k-NN?
• Or as we vary the complexity of our decision trees?
• the bias/variance decomposition of error can lend some insight into these questions
note that this is a different sense of bias than in the term inductive bias
Background: Expected values
• the expected value of a random variable that takes on numerical values is defined as:
  $E[X] = \sum_x x \, P(x)$
  this is the same thing as the mean
• we can also talk about the expected value of a function of a random variable:
  $E[g(X)] = \sum_x g(x) \, P(x)$
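As a quick aside (not from the original slides), here is a minimal Python sketch that computes E[X] and E[g(X)] for a small made-up discrete distribution; the values, probabilities, and the choice g(x) = x² are all assumptions of this example.

```python
import numpy as np

# hypothetical discrete distribution: values of X and their probabilities
x_vals = np.array([1.0, 2.0, 3.0, 4.0])
p_vals = np.array([0.1, 0.2, 0.3, 0.4])   # must sum to 1

# E[X] = sum_x x * P(x)
e_x = np.sum(x_vals * p_vals)

# E[g(X)] = sum_x g(x) * P(x), here with g(x) = x^2 as an arbitrary example
g = lambda x: x ** 2
e_gx = np.sum(g(x_vals) * p_vals)

print(e_x)    # 3.0
print(e_gx)   # 10.0
```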
Defining bias and variance
• consider the task of learning a regression model $f(x; D)$ given a training set $D = \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$; the $D$ in $f(x; D)$ indicates the dependency of the model on $D$
• a natural measure of the error of $f$ is
  $E\left[\left(y - f(x; D)\right)^2 \mid x, D\right]$
  where the expectation is taken with respect to the real-world distribution of instances
Defining bias and variance
• this can be rewritten as:
  $E\left[\left(y - f(x; D)\right)^2 \mid x, D\right] = E\left[\left(y - E[y \mid x]\right)^2 \mid x, D\right] + \left(f(x; D) - E[y \mid x]\right)^2$
  the first term is noise: the variance of $y$ given $x$; it doesn't depend on $D$ or $f$
  the second term is the error of $f(x; D)$ as a predictor of $E[y \mid x]$
Defining bias and variance
• now consider the expectation (over different data sets D) of the second term:
  $E_D\left[\left(f(x; D) - E[y \mid x]\right)^2\right] = \underbrace{\left(E_D[f(x; D)] - E[y \mid x]\right)^2}_{\text{bias}^2} + \underbrace{E_D\left[\left(f(x; D) - E_D[f(x; D)]\right)^2\right]}_{\text{variance}}$
• bias: if on average f(x; D) differs from E[y | x], then f(x; D) is a biased estimator of E[y | x]
• variance: f(x; D) may be sensitive to D and vary a lot from its expected value
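To make the decomposition concrete, here is a hedged simulation sketch (not from the original slides): it draws many training sets D from an assumed toy data-generating process, fits a fixed-degree polynomial to each, and estimates the bias² and variance terms at a single query point. The target function, noise level, and all parameter choices are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    """E[y | x] for the toy data-generating process (an assumption of this sketch)."""
    return np.sin(2 * np.pi * x)

def draw_training_set(m=25, noise_sd=0.3):
    """Draw one training set D of m points with Gaussian noise."""
    x = rng.uniform(0, 1, size=m)
    y = true_f(x) + rng.normal(0, noise_sd, size=m)
    return x, y

def fit_and_predict(x_train, y_train, x_query, degree=4):
    """Fit a degree-`degree` polynomial to D and predict at x_query."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.polyval(coeffs, x_query)

x_query = 0.3
preds = np.array([fit_and_predict(*draw_training_set(), x_query)
                  for _ in range(2000)])    # one prediction per simulated D

bias_sq = (preds.mean() - true_f(x_query)) ** 2   # (E_D[f(x;D)] - E[y|x])^2
variance = preds.var()                            # E_D[(f(x;D) - E_D[f(x;D)])^2]
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```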
Bias/variance for polynomial interpolation
• the 1st-order polynomial has high bias, low variance
• the 50th-order polynomial has low bias, high variance
• the 4th-order polynomial represents a good trade-off
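A small follow-up sketch (again an illustration, not the experiment behind the slides' figure): sweeping the polynomial degree on the same kind of toy problem shows the trade-off numerically; degree 15 is used here as a stand-in for the 50th-order fit to keep np.polyfit numerically well-behaved.

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(2 * np.pi * x)      # assumed E[y | x] for the toy problem
x_query, m, noise_sd, trials = 0.3, 30, 0.3, 2000

for degree in (1, 4, 15):                     # 15 stands in for the 50th-order fit
    preds = []
    for _ in range(trials):                   # each trial draws a fresh training set D
        x = rng.uniform(0, 1, size=m)
        y = true_f(x) + rng.normal(0, noise_sd, size=m)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_query))
    preds = np.array(preds)
    print(f"degree {degree:2d}: bias^2 = {(preds.mean() - true_f(x_query))**2:.4f}, "
          f"variance = {preds.var():.4f}")
```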
Bias/variance trade-off for nearest-neighbor regression
• consider using k-NN regression to learn a model of this surface in a 2-dimensional feature space
Bias/variance trade-off for nearest-neighbor regression
[figure: bias and variance maps for 1-NN and for 10-NN over the feature space; darker pixels correspond to higher values]
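A minimal sketch of how such bias and variance estimates could be produced (not the code behind the slides' figures): a tiny NumPy k-NN regressor is evaluated over many simulated training sets at one query point, for k = 1 and k = 10. The 2-D target surface, noise level, and sample sizes are all assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(2)
true_surface = lambda X: np.sin(3 * X[:, 0]) * np.cos(3 * X[:, 1])   # assumed 2-D target surface

def knn_predict(x_train, y_train, x_query, k):
    """Average the y-values of the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(x_train - x_query, axis=1)
    return y_train[np.argsort(dists)[:k]].mean()

x_query = np.array([0.5, 0.5])
for k in (1, 10):
    preds = []
    for _ in range(500):                      # one k-NN prediction per simulated training set
        x_train = rng.uniform(0, 1, size=(200, 2))
        y_train = true_surface(x_train) + rng.normal(0, 0.2, size=200)
        preds.append(knn_predict(x_train, y_train, x_query, k))
    preds = np.array(preds)
    target = true_surface(x_query[None, :])[0]
    print(f"k={k:2d}: bias^2 = {(preds.mean() - target)**2:.4f}, variance = {preds.var():.4f}")
```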
Bias/variance trade-off
• consider k-NN applied to digit recognition
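One way to run this kind of experiment (an illustration, not the lecture's original setup) is to sweep k on a handwritten-digit dataset and look at cross-validated error; the use of scikit-learn and its bundled 8x8 digits dataset is an assumption here.

```python
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)          # 8x8 handwritten digits, 10 classes

for k in (1, 3, 5, 10, 25, 50):
    # 5-fold cross-validated accuracy for a k-NN classifier
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:3d}: error = {1 - acc:.3f}")  # small k: low bias/high variance; large k: the reverse
```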
Bias/variance discussion
• predictive error has two controllable components
• expressive/flexible learners reduce bias, but increase variance
• for many learners we can trade off these two components (e.g. via our selection of k in k-NN)
• the optimal point in this trade-off depends on the particular problem domain and training set size
• this is not necessarily a strict trade-off; e.g. with ensembles we can often reduce bias and/or variance without increasing the other term (see the bagging sketch below)
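For instance, here is a hedged bagging sketch (an assumption of this write-up, not from the lecture): it averages polynomial fits trained on bootstrap resamples of each training set and compares the resulting bias² and variance to those of a single fit, where one would typically expect the averaged predictor to show lower variance.

```python
import numpy as np

rng = np.random.default_rng(3)
true_f = lambda x: np.sin(2 * np.pi * x)     # assumed E[y | x] for the toy problem
x_query, m, noise_sd, degree, n_bags = 0.3, 40, 0.3, 10, 25

single_preds, bagged_preds = [], []
for _ in range(1000):                        # each trial uses one fresh training set D
    x = rng.uniform(0, 1, size=m)
    y = true_f(x) + rng.normal(0, noise_sd, size=m)
    single_preds.append(np.polyval(np.polyfit(x, y, degree), x_query))
    # bagging: average predictions of models fit to bootstrap resamples of D
    boots = []
    for _ in range(n_bags):
        idx = rng.integers(0, m, size=m)
        boots.append(np.polyval(np.polyfit(x[idx], y[idx], degree), x_query))
    bagged_preds.append(np.mean(boots))

for name, preds in (("single", np.array(single_preds)), ("bagged", np.array(bagged_preds))):
    print(f"{name}: bias^2 = {(preds.mean() - true_f(x_query))**2:.4f}, variance = {preds.var():.4f}")
```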
Bias/variance discussion
the bias/variance analysis
• helps explain why simple learners can outperform more complex ones
• helps us understand and avoid overfitting