The Probabilistic Approach to Learning from Data

10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University
The Probabilistic Approach to Learning from Data
Matt Gormley
Lecture 4, January 30, 2016
Prob. Readings:
• Lecture notes from 10-600 (see Piazza post for the pointers)
• Murphy 2
• Bishop 2
• HTF --
• Mitchell --

Reminders
• Website schedule updated
• Background Exercises (Homework 1)
  – Released: Wed, Jan. 25
  – Due: Wed, Feb. 1 at 5:30pm (the deadline was extended!)
• Homework 2: Naive Bayes
  – Released: Wed, Feb. 1
  – Due: Mon, Feb. 13 at 5:30pm

Outline
• Generating Data
  – Natural (stochastic) data
  – Synthetic data
  – Why synthetic data?
  – Examples: Multinomial, Bernoulli, Gaussian
• Data Likelihood
  – Independent and Identically Distributed (i.i.d.)
  – Example: Dice Rolls
• Learning from Data (Frequentist)
  – Principle of Maximum Likelihood Estimation (MLE)
  – Optimization for MLE
  – Examples: 1D and 2D optimization
  – Example: MLE of Multinomial
  – Aside: Method of Lagrange Multipliers
• Learning from Data (Bayesian)
  – Maximum a posteriori (MAP) estimation
  – Optimization for MAP
  – Example: MAP of Bernoulli-Beta

Generating Data
Whiteboard
– Natural (stochastic) data
– Synthetic data
– Why synthetic data?
– Examples: Multinomial, Bernoulli, Gaussian
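One reason to generate synthetic data is that the true parameters are known, so an estimator can be checked against them before it is trusted on real data. The sketch below assumes only Python's standard `random` and `statistics` modules; the helper name `check_estimators` and the parameter values (θ = 0.3, μ = 2.0, σ = 0.5) are illustrative choices, not taken from the lecture. It covers the Bernoulli and Gaussian examples; sampling from a Multinomial/Categorical is the subject of the in-class exercise.

```python
import random
import statistics


def check_estimators(seed=0, n=10_000):
    """Draw synthetic Bernoulli and Gaussian data with known (illustrative)
    parameters and return (true value, estimate) pairs for comparison."""
    rng = random.Random(seed)  # seeded so the check is reproducible

    # Bernoulli(theta): 1 with probability theta, else 0.
    theta = 0.3
    flips = [1 if rng.random() < theta else 0 for _ in range(n)]
    theta_hat = sum(flips) / n  # fraction of ones

    # Gaussian(mu, sigma^2) using the standard-library sampler.
    mu, sigma = 2.0, 0.5
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    mu_hat = statistics.fmean(xs)      # sample mean
    sigma_hat = statistics.pstdev(xs)  # divide-by-n standard deviation

    return {"theta": (theta, theta_hat),
            "mu": (mu, mu_hat),
            "sigma": (sigma, sigma_hat)}
```

With 10,000 samples each estimate should sit close to its true value; if one does not, the bug is more likely in the estimator (or the sampler) than in the data.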

In-Class Exercise
1. With your neighbor, write a function which returns samples from a Categorical.
   – Assume access to the rand() function
   – Function signature should be: categorical_sample(phi), where phi is the array of parameters
   – Make your implementation as efficient as possible!
2. What is the expected runtime of your function?
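One possible answer, sketched in Python under the assumption that `rand()` returns a uniform float in [0, 1); here `rand` is a hypothetical wrapper around `random.random`. A linear scan over the cumulative sums of phi costs O(K) per draw in the worst case (K = len(phi)), and its expected number of loop iterations is Σ_k (k + 1)·phi[k], which is smallest when phi is sorted in decreasing order. If many draws are needed, precomputing the cumulative sums once lets each draw use binary search in O(log K); the alias method (not shown) reaches O(1) per draw after O(K) setup.

```python
import bisect
import itertools
import random


def rand():
    """Hypothetical stand-in for the exercise's rand(): uniform float in [0, 1)."""
    return random.random()


def categorical_sample(phi):
    """Return index k with probability phi[k] via a linear scan of the CDF.

    Worst case O(K) per draw for K = len(phi); the expected number of loop
    iterations is sum_k (k + 1) * phi[k].
    """
    u = rand()
    cumulative = 0.0
    for k, p in enumerate(phi):
        cumulative += p
        if u < cumulative:
            return k
    # Reached only if round-off leaves the running sum at or below u.
    return len(phi) - 1


def make_categorical_sampler(phi):
    """Precompute the CDF once in O(K); each draw then costs O(log K)."""
    cdf = list(itertools.accumulate(phi))

    def sample():
        u = rand() * cdf[-1]  # rescale so a slightly unnormalized phi still works
        # bisect_right gives the first index whose cumulative sum exceeds u.
        return min(bisect.bisect_right(cdf, u), len(cdf) - 1)

    return sample
```

For example, `make_categorical_sampler([0.2, 0.5, 0.3])` returns a zero-argument function that yields 0, 1, or 2 with those probabilities.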

Data Likelihood
Whiteboard
– Independent and Identically Distributed (i.i.d.)
– Example: Dice Rolls
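Under the i.i.d. assumption the probability of a whole dataset factors into a product of per-roll probabilities, so its logarithm is a sum: log p(D | φ) = Σ_i log φ_{x_i} = Σ_k N_k log φ_k, where N_k counts how often face k appears. A minimal Python sketch, assuming faces are labeled 1..6 and φ is a list indexed from 0; the function names are illustrative.

```python
import math
from collections import Counter


def log_likelihood(rolls, phi):
    """Log-likelihood of i.i.d. die rolls (faces 1..6) under face probabilities phi.

    Independence turns the joint probability into a product, so the log is a sum:
    log p(D | phi) = sum_i log phi[x_i].
    """
    return sum(math.log(phi[x - 1]) for x in rolls)


def log_likelihood_from_counts(rolls, phi):
    """The same quantity grouped by face: sum_k N_k * log phi[k]."""
    counts = Counter(rolls)
    return sum(n * math.log(phi[face - 1]) for face, n in counts.items())
```

For rolls `[6, 6, 2, 6, 1]`, a die loaded toward six such as `[0.1, 0.1, 0.1, 0.1, 0.1, 0.5]` receives a higher log-likelihood than the fair die `[1/6] * 6`, and the two functions agree up to floating-point round-off.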

Learning from Data (Frequentist)
Whiteboard
– Principle of Maximum Likelihood Estimation (MLE)
– Optimization for MLE
– Examples: 1D and 2D optimization
– Example: MLE of Multinomial
– Aside: Method of Lagrange Multipliers
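For the Multinomial (die) case, the MLE maximizes Σ_k N_k log φ_k subject to Σ_k φ_k = 1. With a Lagrange multiplier λ, setting the derivative of Σ_k N_k log φ_k − λ(Σ_k φ_k − 1) with respect to φ_k to zero gives N_k / φ_k = λ, so φ_k = N_k / λ; the constraint then forces λ = N, the total number of rolls. The closed form φ_k = N_k / N is easy to compute; a minimal Python sketch, assuming faces labeled 1..num_faces (the function name is illustrative):

```python
from collections import Counter


def multinomial_mle(rolls, num_faces=6):
    """Closed-form MLE for a die: phi_k = N_k / N (count of face k over total rolls)."""
    counts = Counter(rolls)  # a Counter returns 0 for faces never seen
    n = len(rolls)
    return [counts[k] / n for k in range(1, num_faces + 1)]
```

For rolls `[1, 3, 3, 6]` this returns `[0.25, 0.0, 0.5, 0.0, 0.0, 0.25]`. Faces that never appeared get probability zero, so a later roll of a 2 would have zero likelihood; this is one motivation for adding a prior and using MAP estimation instead.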

Learning from Data (Bayesian)
Whiteboard
– Maximum a posteriori (MAP) estimation
– Optimization for MAP
– Example: MAP of Bernoulli-Beta
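For the Bernoulli-Beta example: with a Beta(α, β) prior on θ = P(X = 1) and N_1 ones among N flips, the posterior is Beta(N_1 + α, N − N_1 + β), and the MAP estimate is its mode, (N_1 + α − 1) / (N + α + β − 2). A minimal Python sketch, assuming α, β ≥ 1 so that this closed form applies; the function names are illustrative.

```python
def bernoulli_mle(flips):
    """MLE of theta = P(X = 1): the fraction of ones."""
    return sum(flips) / len(flips)


def bernoulli_map(flips, alpha, beta):
    """MAP estimate of theta under a Beta(alpha, beta) prior.

    The posterior is Beta(N1 + alpha, N0 + beta); its mode is
    (N1 + alpha - 1) / (N + alpha + beta - 2), used here for alpha, beta >= 1.
    """
    if alpha < 1 or beta < 1:
        raise ValueError("this closed form assumes alpha >= 1 and beta >= 1")
    n1 = sum(flips)
    n = len(flips)
    denom = n + alpha + beta - 2
    if denom <= 0:
        raise ValueError("no data and a uniform prior: the mode is not unique")
    return (n1 + alpha - 1) / denom
```

After three heads, `bernoulli_mle([1, 1, 1])` is 1.0 (tails judged impossible), while `bernoulli_map([1, 1, 1], 2, 2)` is (3 + 1) / (3 + 2) = 0.8. With α = β = 1 (a uniform prior) the MAP estimate coincides with the MLE.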

Takeaways
• One view of what ML is trying to accomplish is function approximation
• The principle of maximum likelihood estimation provides an alternative view of learning
• Synthetic data can help debug ML algorithms
• Probability distributions can be used to model real data that occurs in the world (don't worry, we'll make our distributions more interesting soon!)

The remaining slides are extra slides for your reference. Since they are background material, they were not (explicitly) covered in class.

Outline of Extra Slides
• Probability Theory
  – Sample space, Outcomes, Events
  – Kolmogorov's Axioms of Probability
• Random Variables
  – Random variables, Probability mass function (pmf), Probability density function (pdf), Cumulative distribution function (cdf)
  – Examples
  – Notation
  – Expectation and Variance
  – Joint, conditional, marginal probabilities
  – Independence
  – Bayes' Rule
• Common Probability Distributions
  – Beta, Dirichlet, etc.

PROBABILITY THEORY

Probability Theory: Definitions
Example 1: Flipping a coin
• Sample space Ω: {Heads, Tails}
• Outcome ω ∈ Ω: e.g. Heads
• Event E ⊆ Ω: e.g. {Heads}
• Probability P(E): P({Heads}) = 0.5, P({Tails}) = 0.5

Probability Theory: Definitions
Probability provides a science for inference about interesting events.
• Sample space Ω: the set of all possible outcomes
• Outcome ω ∈ Ω: a possible result of an experiment
• Event E ⊆ Ω: any subset of the sample space
• Probability P(E): the non-negative number assigned to each event in the sample space
Notes:
• Each outcome is unique
• Only one outcome can occur per experiment
• An outcome can be in multiple events
• An elementary event consists of exactly one outcome

Probability Theory: Definitions
Example 2: Rolling a 6-sided die
• Sample space Ω: {1,2,3,4,5,6}
• Outcome ω ∈ Ω: e.g. 3
• Event E ⊆ Ω: e.g. {3} (the event "the die came up 3")
• Probability P(E): P({3}) = 1/6, P({4}) = 1/6

Probability Theory: Definitions
Example 2: Rolling a 6-sided die
• Sample space Ω: {1,2,3,4,5,6}
• Outcome ω ∈ Ω: e.g. 3
• Event E ⊆ Ω: e.g. {2,4,6} (the event "the roll was even")
• Probability P(E): P({2,4,6}) = 0.5, P({1,3,5}) = 0.5
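On a finite sample space, a probability can be specified by giving each outcome its probability; the probability of any event is then the sum over the outcomes it contains, because an event is a disjoint union of elementary events. A minimal Python sketch, assuming a fair die and using `fractions.Fraction` for exact arithmetic; the dict representation and the names `DIE` and `prob` are illustrative.

```python
from fractions import Fraction

# Illustrative representation: a finite sample space as {outcome: probability}.
DIE = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event, space=DIE):
    """P(E) for an event E ⊆ Ω: sum the probabilities of the outcomes in E."""
    if not set(event) <= set(space):
        raise ValueError("event must be a subset of the sample space")
    return sum((space[w] for w in event), Fraction(0))
```

Here `prob({3})` is 1/6 and `prob({2, 4, 6})` is 1/2, matching the two die examples; `prob(set(DIE))` is 1 and `prob(set())` is 0.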

Probability Theory: Definitions
Example 3: Timing how long it takes a monkey to reproduce Shakespeare
• Sample space Ω: [0, +∞)
• Outcome ω ∈ Ω: e.g. 1,433,600 hours
• Event E ⊆ Ω: e.g. [1, 6] hours
• Probability P(E): P([1, 6]) = 0.000000000001, P([1,433,600, +∞)) = 0.99

Kolmogorov's Axioms
1. P(E) ≥ 0, for all events E
2. P(Ω) = 1
3. If E_1, E_2, ... are disjoint, then P(E_1 or E_2 or ...) = P(E_1) + P(E_2) + ..., that is,
   $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$
All of probability can be derived from just these!
In words:
1. Each event has non-negative probability.
2. The probability that some outcome in the sample space occurs is one.
3. The probability of the union of many disjoint sets is the sum of their probabilities.

Probability Theory: Definitions
• The complement of an event E, denoted ~E, is the event that E does not occur.
[Figure: Venn diagram of the sample space Ω split into E and ~E]
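The complement rule P(~E) = 1 − P(E) follows directly from Kolmogorov's axioms, since E and ~E are disjoint and together make up all of Ω:

```latex
\begin{aligned}
E \cap {\sim}E = \emptyset,\; E \cup {\sim}E = \Omega
  &\;\Rightarrow\; P(E) + P({\sim}E) = P(E \cup {\sim}E) = P(\Omega) = 1
  \quad \text{(axioms 3 and 2)} \\
  &\;\Rightarrow\; P({\sim}E) = 1 - P(E).
\end{aligned}
```

For the die with E = {2,4,6}, this gives P({1,3,5}) = 1 − 0.5 = 0.5.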

RANDOM VARIABLES

Random Variables: Definitions
• Random variable X (capital letters)
  – Def 1: A variable whose possible values are the outcomes of a random experiment
  – Def 2: A measurable function from the sample space to the real numbers: X : Ω → ℝ
• Value of a random variable x (lowercase letters): the value taken by a random variable
• Discrete random variable X: a random variable whose values come from a countable set (e.g. the natural numbers or {True, False})
• Continuous random variable X: a random variable whose values come from an interval or collection of intervals (e.g. the real numbers or the range (3, 5))

Random Variables: Definitions
• Discrete random variable X: a random variable whose values come from a countable set (e.g. the natural numbers or {True, False})
• Probability mass function (pmf) p(x): the function giving the probability that discrete r.v. X takes value x, p(x) := P(X = x)
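Combining Def 2 (a random variable is a function on the sample space) with the pmf definition, p(x) = P(X = x) = P({ω ∈ Ω : X(ω) = x}). A minimal Python sketch, assuming a sample space of two fair dice and, as a hypothetical example random variable, the sum of the two faces; these are illustrations and do not appear in the slides.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
OMEGA = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}


def pmf(X, space=OMEGA):
    """pmf of a discrete random variable X: Omega -> R.

    p(x) = P({w in Omega : X(w) = x}), accumulated outcome by outcome.
    """
    p = defaultdict(Fraction)  # Fraction() is 0
    for w, prob_w in space.items():
        p[X(w)] += prob_w
    return dict(p)


def total(w):
    """Hypothetical example random variable: the sum of the two faces."""
    return w[0] + w[1]
```

`pmf(total)` maps each value 2 through 12 to its probability; for instance the value 7 gets 6/36 = 1/6 and the value 2 gets 1/36.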

Random Variables: Definitions
Example 2 (revisited): Rolling a 6-sided die
• Sample space Ω: {1,2,3,4,5,6}
• Outcome ω ∈ Ω: e.g. 3
• Event E ⊆ Ω: e.g. {3} (the event "the die came up 3")
• Probability P(E): P({3}) = 1/6, P({4}) = 1/6
• Discrete random variable X: e.g. the value on the top face of the die
• Probability mass function (pmf) p(x): p(3) = 1/6, p(4) = 1/6
