RECSM Summer School: Machine Learning for Social Sciences
Session 1.3: Supervised Learning and Model Accuracy
Reto Wüest
Department of Political Science and International Relations, University of Geneva
Supervised Learning
Statistical Decision Theory
• Let $X \in \mathbb{R}^p$ be a vector of input variables and $Y \in \mathbb{R}$ an output variable, with joint distribution $\Pr(X, Y)$.
• Our goal is to find a function $f(X)$ for predicting $Y$ given values of $X$.
• We need a loss function $L(Y, f(X))$ that penalizes errors in prediction.
• The most common loss function is squared error loss
$$L(Y, f(X)) = (Y - f(X))^2. \quad (1.3.1)$$
• The expected prediction error, or expected test error, is
$$\text{expected test error} = E(Y - f(X))^2. \quad (1.3.2)$$
• We choose $f$ so as to minimize the expected test error.
• The solution is the conditional expectation (a one-line justification follows below)
$$f(x) = E(Y \mid X = x). \quad (1.3.3)$$
• Hence, the best prediction of $Y$ at the point $X = x$ is the conditional expectation.
• Let's look at two simple methods that differ in how they approximate the conditional expectation.
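Why the conditional expectation solves the minimization in (1.3.2) deserves one line of argument. A minimal sketch of the standard derivation, conditioning on $X$ and minimizing pointwise:

```latex
% Minimal sketch: why f(x) = E(Y | X = x) minimizes E(Y - f(X))^2.
% Step 1: condition on X (law of iterated expectations):
%   E(Y - f(X))^2 = E_X [ E( (Y - f(X))^2 | X ) ].
% Step 2: minimize the inner expectation pointwise at each x; for any
% random variable Z, E(Z - c)^2 is minimized at c = E(Z), hence
\[
  f(x) \;=\; \operatorname*{arg\,min}_{c}\; E\!\left[ (Y - c)^2 \mid X = x \right]
        \;=\; E(Y \mid X = x).
\]
```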
Method I: Linear Model and Least Squares
• In linear regression, we specify a model to estimate the conditional expectation in (1.3.3):
$$f(x) = x^T \beta. \quad (1.3.4)$$
• Using the method of least squares, we choose $\beta$ to minimize the residual sum of squares
$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2. \quad (1.3.5)$$
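To make (1.3.5) concrete, here is a minimal sketch in Python with NumPy. The simulated data, the coefficient vector `beta_true`, and the use of `np.linalg.lstsq` rather than an explicit $(X^T X)^{-1} X^T y$ are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Toy training data: N = 100 observations, p = 2 inputs plus an intercept column.
rng = np.random.default_rng(0)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=N)

# Least squares: choose beta to minimize RSS(beta) = sum_i (y_i - x_i^T beta)^2.
# lstsq solves this directly and is numerically safer than inverting X^T X.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated beta:", beta_hat)
print("RSS at the minimum:", np.sum((y - X @ beta_hat) ** 2))
```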
Linear Model and Least Squares: Example

• The goal is to predict the outcome variable $G \in \{\text{blue}, \text{orange}\}$ on the basis of training data on the inputs $X_1 \in \mathbb{R}$ and $X_2 \in \mathbb{R}$.
• We fit a linear regression to the training data, with $Y$ coded as 0 for blue and 1 for orange.
• Fitted values $\hat{Y}$ are converted to a fitted class variable $\hat{G}$ as follows:
$$\hat{G} = \begin{cases} \text{orange} & \text{if } \hat{Y} > 0.5, \\ \text{blue} & \text{if } \hat{Y} \leq 0.5. \end{cases} \quad (1.3.6)$$
• The set of points classified as orange is $\{x \in \mathbb{R}^2 : x^T \hat{\beta} > 0.5\}$ and the set of points classified as blue is $\{x \in \mathbb{R}^2 : x^T \hat{\beta} \leq 0.5\}$. The linear decision boundary separating the two predicted classes is $\{x \in \mathbb{R}^2 : x^T \hat{\beta} = 0.5\}$. A sketch that reproduces this setup on simulated data follows below.
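A minimal sketch of the classification-by-regression example. The two Gaussian point clouds stand in for the slide's training data and are purely an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated training data: class blue (Y = 0) and class orange (Y = 1),
# each a Gaussian cloud in the (X1, X2) plane.
n = 100
X_blue = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
X_orange = rng.normal(loc=[1.5, 1.5], scale=1.0, size=(n, 2))
X = np.column_stack([np.ones(2 * n), np.vstack([X_blue, X_orange])])
y = np.concatenate([np.zeros(n), np.ones(n)])  # blue = 0, orange = 1

# Fit the linear regression of the 0/1 outcome on the inputs.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classify by thresholding the fitted values at 0.5, as in (1.3.6).
y_hat = X @ beta_hat
G_hat = np.where(y_hat > 0.5, "orange", "blue")
G_true = np.where(y == 1, "orange", "blue")

print("training error rate:", np.mean(G_hat != G_true))
```

Because the fitted values are linear in $x$, the set $\{x : x^T \hat{\beta} = 0.5\}$ is a straight line in the $(X_1, X_2)$ plane, which is why this classifier has a linear decision boundary.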