Overview: DS GA 1002 Probability and Statistics for Data Science

Overview
DS GA 1002 Probability and Statistics for Data Science
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda

Probability and statistics
◮ Probability: Framework for dealing with uncertainty
◮ Statistics: Framework for extracting information from data by making probabilistic assumptions

Probability
◮ Probability basics: Probability spaces, conditional probability, independence
◮ Random variables: continuous/discrete, important distributions, generating random variables (rejection sampling)
◮ Multivariate random variables: random vectors, continuous/discrete, independence (conditional independence, graphical models), generating multivariate random variables

Probability
◮ Expectation: expectation operator, mean, variance, Markov and Chebyshev inequalities, covariance, covariance matrices, conditional expectation
◮ Random processes: Definition, mean, autocovariance, important processes (iid sequences, Gaussian, Poisson, random walk)
◮ Convergence of random sequences: Types of convergence (in probability/distribution), law of large numbers, central limit theorem, Monte Carlo simulation
◮ Markov chains: Definition, recurrence, periodicity, convergence, Markov chain Monte Carlo (Metropolis-Hastings)

Statistics
◮ Descriptive statistics: Histogram, empirical mean/variance, order statistics, empirical covariance, empirical covariance matrix (principal component analysis)
◮ Frequentist statistics: iid sampling, mean square error, consistency, nonparametric model estimation (kernel density estimation), parametric model estimation (method of moments, maximum likelihood)

Statistics
◮ Bayesian statistics: Bayesian parametric models, conjugate priors, Bayesian estimators (minimum MSE estimator, maximum a posteriori)
◮ Hypothesis testing: Hypothesis-testing framework, parametric testing, nonparametric testing (permutation test), multiple testing
◮ Linear regression: Linear models, least-squares estimation, overfitting

Why should I take this course?

To understand probabilistic models

United States presidential election
◮ Indirect election: citizens of the US cast ballots for electors in the Electoral College
◮ These electors vote for the President and Vice President
◮ Number of electors per state = its number of members of Congress (Washington D.C. gets 3)
◮ Except in Maine and Nebraska, all electors in a state go to the candidate who wins the state

538 probabilistic model (from fivethirtyeight.com)
Aim: Predict the election result using poll data
Probabilistic models make it possible to take into account that
◮ Polls have different sample sizes
◮ Some pollsters are unreliable
◮ In some states there may be few polls (especially at the start of the campaign)
◮ Historic trends in each state are important
◮ Polls from states with similar demographics are correlated
◮ Additional information (approval ratings, contributions, party identification, ...) can be useful
In addition, probabilistic models quantify the uncertainty of the prediction
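To make the first and last points concrete, here is a minimal sketch of how polls with different sample sizes might be pooled and how the uncertainty of the result could be quantified. It is not the FiveThirtyEight model: the poll numbers are hypothetical, the function name `pooled_estimate` is made up for illustration, and the polls are assumed to be independent simple random samples, which real polls are not.

```python
import math

# Hypothetical polls for one state: (sample size, share supporting candidate A).
# These numbers are invented for illustration only.
polls = [(400, 0.52), (1200, 0.49), (800, 0.51)]

def pooled_estimate(polls):
    """Sample-size-weighted average share and its binomial standard error."""
    total = sum(n for n, _ in polls)
    share = sum(n * p for n, p in polls) / total
    # Standard error of a proportion estimated from `total` independent responses.
    std_err = math.sqrt(share * (1 - share) / total)
    return share, std_err

share, std_err = pooled_estimate(polls)

# Normal approximation to the probability that the true share exceeds 50 %.
win_prob = 0.5 * (1 + math.erf((share - 0.5) / (std_err * math.sqrt(2))))

print(f"pooled share = {share:.3f} +/- {std_err:.3f}, P(share > 0.5) = {win_prob:.2f}")
```

Larger polls pull the pooled share toward their own result, and the standard error shrinks as the total number of respondents grows. Pollster reliability, historic trends and correlations between states are ignored in this sketch.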

538 probabilistic model (from fivethirtyeight.com )

To understand statistical methodology

Polio vaccine
◮ Poliomyelitis is an infectious disease that induces paralysis and can be lethal
◮ It has almost been eradicated by vaccination (98 cases in 2015, down from 350 000 in 1988)
◮ The first vaccine was developed in 1952 by Jonas Salk and collaborators
◮ Two experiments were carried out to evaluate whether the vaccine was effective

Polio vaccine
◮ Experiment 1: Students in 2nd grade with consent of their parents were vaccinated. Students in 1st and 3rd grade were not.
◮ Experiment 2: A group of children, whose parents consented, was randomly divided in half to form the treatment and control groups.

              Experiment 1        Experiment 2
              Size      Rate      Size      Rate
Treatment     225 000   25        200 000   28
Control       725 000   54        200 000   71
No consent    125 000   44        350 000   46
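One simple way to compare the two experiments is to divide the control rate by the treatment rate in each. The short sketch below does that arithmetic with the Rate values from the table; the dictionary layout and the name `rate_ratio` are illustrative choices, not part of the slides.

```python
# Rate values copied from the table above (treatment, control, no consent).
rates = {
    "Experiment 1": {"treatment": 25, "control": 54, "no consent": 44},
    "Experiment 2": {"treatment": 28, "control": 71, "no consent": 46},
}

def rate_ratio(experiment):
    """Control rate divided by treatment rate for one experiment."""
    r = rates[experiment]
    return r["control"] / r["treatment"]

ratios = {name: rate_ratio(name) for name in rates}

for name, ratio in ratios.items():
    print(f"{name}: control/treatment rate ratio = {ratio:.2f}")
```

The ratio is larger in the randomized Experiment 2 than in Experiment 1, so the two designs give different impressions of how effective the vaccine is.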

To understand machine-learning algorithms

Quadratic discriminant analysis Labeled data

Quadratic discriminant analysis Aim: Classify unlabeled examples

Quadratic discriminant analysis Quadratic discriminant analysis fits a Gaussian distribution to each class
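The idea of fitting one Gaussian per class can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code behind the slides: the synthetic two-class data, the class priors estimated from label frequencies, and the function names `fit_qda`, `log_gaussian` and `posterior` are all hypothetical.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate a prior, mean vector and covariance matrix for every class."""
    params = {}
    for label in sorted(set(y.tolist())):
        Xc = X[y == label]
        params[label] = (len(Xc) / len(X), Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def log_gaussian(x, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at the point x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def posterior(x, params):
    """Posterior probability of each class given x, computed with Bayes' rule."""
    labels = list(params)
    scores = np.array([np.log(p) + log_gaussian(x, m, c) for p, m, c in params.values()])
    scores -= scores.max()  # subtract the maximum to avoid overflow in exp
    probs = np.exp(scores)
    probs /= probs.sum()
    return dict(zip(labels, probs))

# Synthetic labeled data (assumed for illustration): two Gaussian clusters.
rng = np.random.default_rng(0)
red = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200)
blue = rng.multivariate_normal([4, 4], [[2, -0.3], [-0.3, 0.5]], size=200)
X = np.vstack([red, blue])
y = np.array(["red"] * 200 + ["blue"] * 200)

params = fit_qda(X, y)
post_near_red = posterior(np.array([0.0, 0.0]), params)
post_near_blue = posterior(np.array([4.0, 4.0]), params)
print(post_near_red, post_near_blue)
```

Because each class has its own covariance matrix, the boundary between classes is a quadratic curve rather than a straight line, which is where the method gets its name. The posterior probabilities play the same role as the percentages reported for each classified example.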

Quadratic discriminant analysis Results: red (99.9 %), blue (55.8 %), blue (97.2 %)
