CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 6: LEARNING PRINCIPLES Spring 2019 Marion Neumann
RECAP: MACHINE LEARNING • Workflow
NOISE • noisy samples from the true function
WHY IS NOISE A PROBLEM? • small random sample from the noisy data
WHY IS NOISE A PROBLEM? • best model for this (training) data
WHY IS NOISE A PROBLEM? → fitting the noise instead of the true function
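To make this concrete, here is a minimal sketch of the setup in these slides. The true function, the noise level, and the sample size are illustrative assumptions, not the slides' exact values:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    # assumed true function for the demo
    return np.sin(2 * np.pi * x)

# noisy samples from the true function: y = f(x) + Gaussian noise
x_all = rng.uniform(0, 1, size=200)
y_all = true_function(x_all) + rng.normal(scale=0.2, size=x_all.shape)

# a small random sample of the noisy data (what we actually train on)
idx = rng.choice(len(x_all), size=10, replace=False)
x_train, y_train = x_all[idx], y_all[idx]
```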
REGRESSION AND MODEL COMPLEXITY • Error on training set: linear model >> quadratic >> 6th-order polynomial ← error is zero! • Is the model with zero (training) error the best? [figure: linear regression fits of increasing degree, PDSH p393]
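A minimal sketch of this comparison, reusing `x_train`/`y_train` from the snippet above (the degrees match the slide; everything else is an illustrative assumption):

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Fit models of increasing complexity and compare their errors on the
# *training* data; the high-order polynomial drives training error toward 0.
X_train = x_train.reshape(-1, 1)
for degree in [1, 2, 6]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_train, model.predict(X_train))
    print(f"degree {degree}: training MSE = {mse:.4f}")
```

With only a handful of noisy points, the degree-6 fit can chase the noise; near-zero training error is a warning sign, not a guarantee of quality.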
EVALUATION FOR REGRESSION • Training Error vs. Test Error • $\hat{y} = f(x^{(test)})$: predictions for the test data • Error measures: • RMSE (root mean squared error): $\mathrm{RMSE}(\hat{y}, y^{(test)}) = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i^{(test)})^2}$ • MAE (mean absolute error): $\mathrm{MAE}(\hat{y}, y^{(test)}) = \tfrac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i^{(test)}|$
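Both measures are one-liners in NumPy; a sketch matching the formulas above (the function names are mine):

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean squared error: penalizes large deviations more heavily."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_pred, y_true):
    """Mean absolute error: average magnitude of the deviations."""
    return np.mean(np.abs(y_pred - y_true))
```

scikit-learn provides the same measures as `mean_squared_error` (take the square root for RMSE) and `mean_absolute_error`.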
OVERFITTING • [hand-drawn figure: a linear model underfits, a high-order polynomial overfits the noisy samples]
EVALUATION FOR CLASSIFICATION • Quality Measures: we again have training and test error • error rate (or misclassification rate) = (# misclassified test points) / (# test points) • accuracy (= 1 − error rate) • Noise in Classification • where do labels come from? → noisy labels
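A sketch of both measures on label vectors (the function names are mine):

```python
import numpy as np

def error_rate(y_pred, y_true):
    """Fraction of misclassified points: # wrong / # total."""
    return np.mean(y_pred != y_true)

def accuracy(y_pred, y_true):
    """accuracy = 1 - error rate."""
    return np.mean(y_pred == y_true)
```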
EVALUATION FOR CLASSIFICATION • Confusion matrix (rows = true label, columns = prediction):
                  predicted +1            predicted -1
  true label +1   true positive (TP) ✓    false negative (FN) ✘
  true label -1   false positive (FP) ✘   true negative (TN) ✓
• Rates (from the handwritten annotations): TPR = TP/P, FPR = FP/N, TNR = TN/N, where P and N are the numbers of positive and negative examples • Can you define accuracy using these measures?
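A sketch with scikit-learn's `confusion_matrix` on hypothetical ±1 labels, including one answer to the slide's question (accuracy from the four cells):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical labels, +1/-1 as on the slide
y_true = np.array([+1, +1, -1, -1, +1, -1])
y_pred = np.array([+1, -1, -1, +1, +1, -1])

# fixing the label order to [+1, -1] makes rows = true label,
# columns = prediction, so ravel() yields TP, FN, FP, TN in order
tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=[+1, -1]).ravel()

# accuracy = correct predictions over all predictions
acc = (tp + tn) / (tp + tn + fp + fn)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}, accuracy={acc:.2f}")
```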
CLASSIFICATION AND MODEL COMPLEXITY • [figure: three classifiers of increasing complexity]
CLASSIFICATION AND MODEL COMPLEXITY • Compare training and test errors for all three models (see the sketch below).
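The slides' three classifiers are not recoverable from the extracted text, so as a stand-in this sketch uses k-NN with decreasing k as a proxy for increasing model complexity; the dataset is synthetic:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# noisy two-class data, half held out as a test set
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for k in [50, 10, 1]:  # simple -> complex
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"k={k:2d}: train error = {1 - clf.score(X_tr, y_tr):.2f}, "
          f"test error = {1 - clf.score(X_te, y_te):.2f}")
```

The most complex model (k=1) gets zero training error yet typically the worst test error: the same overfitting pattern as in regression.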
OVERFITTING • Draw this yourself! [hand-drawn figure]
COMBATING OVERFITTING • Several strategies: 1) prefer simpler models over more complicated ones 2) use a validation set for model selection (see the sketch after this list) [figure: workflow — train candidate models A, B, C on the training data, compare their predictions against the ground truth on the validation set, pick the model with the best validation performance, then evaluate it] 3) add a regularization term to your optimization problem during training, i.e., penalize large weights
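A minimal sketch of strategies 2) and 3); the synthetic data, the candidate degrees, and the Ridge penalty strength are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(90, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=90)

# split into train / validation / test
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=2/3, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 2) model selection: pick the degree with the lowest validation error
best_deg, best_err = None, np.inf
for degree in [1, 2, 6]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    err = mean_squared_error(y_val, model.predict(X_val))
    if err < best_err:
        best_deg, best_err = degree, err
print(f"selected degree {best_deg} (validation MSE {best_err:.3f})")

# 3) regularization: Ridge adds alpha * ||w||^2 to the loss, penalizing large weights
ridge = make_pipeline(PolynomialFeatures(6), Ridge(alpha=1.0)).fit(X_tr, y_tr)
print(f"regularized degree-6 test MSE: {mean_squared_error(y_te, ridge.predict(X_te)):.3f}")
```

Note that the test set is touched only once, at the very end; the validation set absorbs all the model-selection decisions.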
HOW MUCH DATA DO WE NEED? • Learning curve: performance as a function of training set size (see the sketch below)
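scikit-learn can compute a learning curve directly; a sketch on a synthetic classification task (the dataset and the classifier are my choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=500, random_state=0)

# train on growing fractions of the data, score with 5-fold cross-validation
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:3d}: train acc = {tr:.2f}, validation acc = {va:.2f}")
```

Once the validation score plateaus, more data of the same kind stops helping; a persistent gap between the two curves instead suggests the model is too complex.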
DATA ≠ DATA • Two kinds of data: population vs. sample • A population is the entire set of objects or events under study. The population can be hypothetical ("all students") or all students in this class. • A sample is a (representative) subset of the objects or events under study. → needed because it's impossible or intractable to obtain or use population data. • What are problems with sample data?
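A toy illustration (the "population" here is synthetic, an assumption for the demo): a reasonably sized random sample estimates a population statistic well, which is why working with samples is viable at all:

```python
import numpy as np

rng = np.random.default_rng(3)
# pretend this is the whole population, e.g., heights in cm
population = rng.normal(loc=170, scale=10, size=1_000_000)

# a small random (hopefully representative) sample
sample = rng.choice(population, size=100, replace=False)
print(f"population mean:     {population.mean():.2f}")
print(f"sample mean (n=100): {sample.mean():.2f}")
```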
SAMPLING BIAS • What if our sample is biased? • Think about real-world ML applications where this might have a (negative) impact!
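Continuing the toy setup above (again an assumption for the demo), a convenience sample that only covers one subgroup gives a systematically skewed estimate, no matter how carefully it is analyzed afterwards:

```python
import numpy as np

rng = np.random.default_rng(4)
# a population made of two subgroups with different means
heights = np.concatenate([rng.normal(178, 7, 500),   # subgroup A
                          rng.normal(164, 7, 500)])  # subgroup B

random_sample = rng.choice(heights, size=100, replace=False)
biased_sample = heights[:500][:100]  # convenience sample: subgroup A only

print(f"true mean:          {heights.mean():.1f}")
print(f"random sample mean: {random_sample.mean():.1f}")
print(f"biased sample mean: {biased_sample.mean():.1f}")
```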
SUMMARY & READING • Avoid overfitting! • Model selection using a validation set can prevent overfitting. • Learning curve → training data size matters and influences model selection. • Model evaluation for classification is more than just looking at the error. • Reading: • DSFS: Ch11 (p142-147) • PDSH: Ch5 (p357, 370-373), Ch5 (p393-398)