

  1. Data Mining Techniques
  CS 6220 - Section 3 - Fall 2016
  Lecture 3: Probability
  Jan-Willem van de Meent (credit: Zhao, CS 229, Bishop)

  2. Project Vote
  1. Freeform: Develop your own project proposals
  • 30% of grade (homework 30%)
  • Present proposals after midterm
  • Peer-review reports
  2. Predefined: Same project for whole class
  • 20% of grade (homework 40%)
  • More like a “super-homework”
  • Teaching assistants and instructors

  3. Homework Problems
  Homework 1 will be out today (due 30 Sep)
  • 4 or (more likely) 5 problem sets
  • 30%-40% of grade (depends on type of project)
  • Can use any language (within reason)
  • Discussion is encouraged, but submissions must be completed individually (absolutely no sharing of code)
  • Submission via zip file by 11:59pm on day of deadline (no late submissions)
  • Please follow submission guidelines on website (TAs have authority to deduct points)

  4. Regression: Probabilistic Interpretation
  Log joint probability of N independent data points; maximizing it gives the Maximum Likelihood estimate.
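  The equations on this slide did not survive extraction; a sketch of the standard reconstruction, assuming a linear model with Gaussian noise, y_n = wᵀx_n + ε_n with ε_n ~ N(0, σ²):

    log p(y_1, ..., y_N | x_1, ..., x_N, w, σ²) = Σ_{n=1}^N log N(y_n | wᵀx_n, σ²)
        = −(1/2σ²) Σ_{n=1}^N (y_n − wᵀx_n)² − (N/2) ln(2πσ²)

  Maximizing over w drops the constant term and the factor 1/σ², so the maximum-likelihood weights coincide with the least-squares solution.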

  5. Probability

  6. Examples: Independent Events
  1. What's the probability of getting the sequence 1, 2, 3, 4, 5, 6 if we roll a die six times?
  2. A school survey found that 9 out of 10 students like pizza. If three students are chosen at random with replacement, what is the probability that all three students like pizza?
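  Worked answers (not on the slide); for independent events the probabilities multiply:

    1. P(1, 2, 3, 4, 5, 6) = (1/6)⁶ = 1/46656 ≈ 0.0000214
    2. P(all three like pizza) = (9/10)³ = 0.729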

  7. Dependent Events
  [Figure: a red bin with 2 apples and 6 oranges, and a blue bin with 3 apples and 1 orange]
  If I take a fruit from the red bin, what is the probability that I get an apple?

  8. Dependent Events
  Conditional Probability
  P(fruit = apple | bin = red) = 2/8

  9. Dependent Events
  Joint Probability
  P(fruit = apple, bin = red) = 2/12

  10. Dependent Events
  Joint Probability
  P(fruit = apple, bin = blue) = ?

  11. Dependent Events
  Joint Probability
  P(fruit = apple, bin = blue) = 3/12

  12. Dependent Events
  Joint Probability
  P(fruit = orange, bin = blue) = ?

  13. Dependent Events
  Joint Probability
  P(fruit = orange, bin = blue) = 1/12

  14. Two Rules of Probability
  1. Sum Rule (Marginal Probabilities)
  P(fruit = apple) = P(fruit = apple, bin = blue) + P(fruit = apple, bin = red) = ?

  15. Two Rules of Probability
  1. Sum Rule (Marginal Probabilities)
  P(fruit = apple) = P(fruit = apple, bin = blue) + P(fruit = apple, bin = red) = 3/12 + 2/12 = 5/12

  16. Two Rules of Probability
  2. Product Rule
  P(fruit = apple, bin = red) = P(fruit = apple | bin = red) P(bin = red) = ?

  17. Two Rules of Probability
  2. Product Rule
  P(fruit = apple, bin = red) = P(fruit = apple | bin = red) P(bin = red) = 2/8 × 8/12 = 2/12

  18. Two Rules of Probability
  2. Product Rule (reversed)
  P(fruit = apple, bin = red) = P(bin = red | fruit = apple) P(fruit = apple) = ?

  19. Two Rules of Probability
  2. Product Rule (reversed)
  P(fruit = apple, bin = red) = P(bin = red | fruit = apple) P(fruit = apple) = 2/5 × 5/12 = 2/12
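  A quick Python sketch (not from the slides) that encodes the bin contents and checks both rules numerically; the counts are inferred from the probabilities on the slides:

    from fractions import Fraction

    # Fruit counts per bin: red has 2 apples and 6 oranges,
    # blue has 3 apples and 1 orange (12 fruit in total).
    counts = {("red", "apple"): 2, ("red", "orange"): 6,
              ("blue", "apple"): 3, ("blue", "orange"): 1}
    total = sum(counts.values())

    def joint(bin_, fruit):
        # P(fruit, bin): fraction of all fruit in this (bin, fruit) cell
        return Fraction(counts[(bin_, fruit)], total)

    def marginal_bin(bin_):
        # Sum rule: P(bin) = sum over fruit of P(fruit, bin)
        return sum(joint(bin_, f) for f in ("apple", "orange"))

    def conditional(fruit, bin_):
        # Product rule rearranged: P(fruit | bin) = P(fruit, bin) / P(bin)
        return joint(bin_, fruit) / marginal_bin(bin_)

    # Sum rule: P(apple) = P(apple, blue) + P(apple, red)
    print(joint("blue", "apple") + joint("red", "apple"))     # 5/12
    # Product rule: P(apple, red) = P(apple | red) * P(red)
    print(conditional("apple", "red") * marginal_bin("red"))  # 1/6, i.e. 2/12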

  20. Bayes' Rule
  Posterior, Likelihood, Prior; follows from the Sum Rule and Product Rule
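  The equations on this slide were images; the standard statements they label are:

    Product rule:  P(A, B) = P(A | B) P(B)
    Sum rule:      P(A) = Σ_B P(A, B) = Σ_B P(A | B) P(B)
    Bayes' rule:   P(B | A) = P(A | B) P(B) / P(A)   (posterior ∝ likelihood × prior)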

  21. Bayes' Rule
  Posterior, Likelihood, Prior
  Probability of rare disease: 0.005
  Probability of detection (positive test given disease): 0.98
  Probability of false positive: 0.05
  Probability of disease when test positive?

  22. Bayes' Rule
  P(disease | positive) = P(positive | disease) P(disease) / P(positive)
  Numerator: 0.98 × 0.005 = 0.0049
  Evidence: 0.98 × 0.005 + 0.05 × 0.995 = 0.05465
  Posterior: 0.0049 / 0.05465 ≈ 0.09

  23. Measures

  24. Elements of Probability
  • Sample space Ω: the set of all outcomes ω ∈ Ω of an experiment
  • Event space F: the set of all possible events A ∈ F, which are subsets A ⊆ Ω of possible outcomes
  • Probability measure P: a function P: F → R

  25. Axioms of Probability
  A probability measure must satisfy:
  1. P(A) ≥ 0 for all A ∈ F
  2. P(Ω) = 1
  3. If A_1, A_2, ... are disjoint, then P(∪_i A_i) = Σ_i P(A_i)

  26. Corollaries of the Axioms
  • A ⊆ B ⇒ P(A) ≤ P(B)
  • P(A ∩ B) ≤ min(P(A), P(B))
  • P(A ∪ B) ≤ P(A) + P(B)   (union bound)
  • P(Ω \ A) = 1 − P(A)
  • If A_1, ..., A_k is a disjoint partition of Ω, then Σ_{i=1}^k P(A_i) = 1

  27. Conditional Probability
  • Conditional probability: the probability of event A, conditioned on the occurrence of event B
    P(A | B) = P(A ∩ B) / P(B)
  • Independence: events A and B are independent iff P(A | B) = P(A), which implies P(A ∩ B) = P(A) P(B)

  28. Conditional Probability [figure not recovered]

  29. Conditional Probability What is the probability P ( B 3 )?

  30. Conditional Probability What is the probability P ( B 1 | B 3 )?

  31. Conditional Probability What is the probability P ( B 2 | A)?

  32. Examples: Conditional Probability
  1. A math teacher gave her class two tests.
  • 25% of the class passed both tests
  • 42% of the class passed the first test
  What percent of those who passed the first test also passed the second test?
  2. Suppose that for houses in New England:
  • 84% of the houses have a garage
  • 65% of the houses have a garage and a back yard
  What is the probability that a house has a back yard given that it has a garage?
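  Worked answers (not on the slide), both direct applications of P(A | B) = P(A ∩ B) / P(B):

    1. P(passed second | passed first) = 0.25 / 0.42 ≈ 0.595, i.e. about 60%
    2. P(back yard | garage) = 0.65 / 0.84 ≈ 0.774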

  33. Random Variable
  • A random variable X is a function X: Ω → R
  Rolling a die:
  • X = number on the die
  • p(X = i) = 1/6 for i = 1, 2, ..., 6
  Rolling two dice at the same time:
  • X = sum of the two numbers
  • p(X = 2) = 1/36

  34. Probability Mass Function
  • For a discrete random variable X, a PMF is a function p: R → R such that p(x) = P(X = x)
  Rolling a die:
  • X = number on the die
  • p(X = i) = 1/6 for i = 1, 2, ..., 6
  Rolling two dice at the same time:
  • X = sum of the two numbers
  • p(X = 2) = 1/36
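  A small Python sketch (not from the slides) that tabulates the PMF of the sum of two dice and confirms p(X = 2) = 1/36:

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    # Tabulate the PMF of X = sum of two dice over all 36 equally likely outcomes
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    pmf = {s: Fraction(c, 36) for s, c in counts.items()}

    print(pmf[2])             # 1/36, matching the slide
    print(sum(pmf.values()))  # 1: a PMF sums to one over all values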

  35. Continuous Random Variables
  [Figure: a joint density p(X, Y) over Y = 1, 2, with the marginals p(X), p(Y) and the conditional p(X | Y = 1)]

  36. Probability Density Functions
  [Figure: a density p(x) with its cumulative distribution P(x); the probability of a small interval of width δx is approximately p(x) δx]
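  The defining relations the figure illustrates (standard definitions, stated here since the slide's equations were lost):

    p(x) ≥ 0,   ∫ p(x) dx = 1
    P(x ∈ (a, b)) = ∫_a^b p(x) dx
    P(z) = P(x ≤ z) = ∫_{−∞}^z p(x) dx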

  37-38. Expected Values
  [Equations not recovered: expectation notation as used in statistics vs. machine learning]

  39. Expected Values
  Mean, Variance, Covariance
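  The formulas on slide 39 were images; the standard definitions are:

    Mean:        E[x] = Σ_x x p(x) (discrete), ∫ x p(x) dx (continuous)
    Variance:    var[x] = E[(x − E[x])²] = E[x²] − E[x]²
    Covariance:  cov[x, y] = E[(x − E[x])(y − E[y])] = E[xy] − E[x] E[y]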

  40. Conjugate Distributions

  41. Bernoulli
  Bern(x | µ) = µ^x (1 − µ)^(1−x)
  E[x] = µ
  var[x] = µ(1 − µ)
  mode[x] = 1 if µ ≥ 0.5, 0 otherwise
  H[x] = −µ ln µ − (1 − µ) ln(1 − µ)
  Describes a binary variable x ∈ {0, 1} in terms of a single continuous parameter µ ∈ [0, 1].

  42. Binomial
  Bin(m | N, µ) = (N choose m) µ^m (1 − µ)^(N−m)
  E[m] = Nµ
  var[m] = Nµ(1 − µ)
  mode[m] = ⌊(N + 1)µ⌋

  43. Beta
  Beta(µ | a, b) = [Γ(a + b) / (Γ(a) Γ(b))] µ^(a−1) (1 − µ)^(b−1)
  E[µ] = a / (a + b)
  var[µ] = ab / ((a + b)² (a + b + 1))
  mode[µ] = (a − 1) / (a + b − 2)

  44-45. Conjugacy
  Binomial likelihood: Bin(m | N, µ) = (N choose m) µ^m (1 − µ)^(N−m)
  Beta prior: Beta(µ | a, b) = [Γ(a + b) / (Γ(a) Γ(b))] µ^(a−1) (1 − µ)^(b−1)
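  Written out, the conjugacy these slides point at (a standard derivation, reconstructed rather than recovered):

    p(µ | m, N) ∝ Bin(m | N, µ) Beta(µ | a, b)
              ∝ µ^(m+a−1) (1 − µ)^(N−m+b−1)
              ∝ Beta(µ | m + a, N − m + b)

  The posterior is again a Beta distribution, which is what makes the Beta prior conjugate to the Binomial likelihood.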

  46. Conjugacy
  Posterior ∝ Likelihood × Prior
  Example: Biased Coin
  • Observed data: flip outcomes
  • Unknown variable: coin bias

  47. Conjugacy
  Posterior ∝ Likelihood × Prior
  Example: Biased Coin
  • Likelihood of outcome given bias
  • Prior belief about bias
  • Posterior belief after trials

  48-51. Conjugacy
  Posterior ∝ Likelihood × Prior
  [Animation frames: the posterior over the coin bias, updated after successive flips]
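  A runnable sketch of the updating shown in those frames, assuming the Beta-Bernoulli model from the previous slides (the flip sequence is illustrative):

    # Beta-Bernoulli updating for a biased coin: start from a Beta(a, b)
    # prior over the bias and fold in each flip as a pseudo-count.
    flips = [1, 0, 1, 1, 0, 1, 1, 1]   # illustrative data: 1 = heads

    a, b = 1.0, 1.0                    # Beta(1, 1): uniform prior over the bias
    for x in flips:
        a += x                         # heads increments a
        b += 1 - x                     # tails increments b
        mean = a / (a + b)             # posterior mean E[mu] = a / (a + b)
        print(f"after flip {x}: Beta({a:.0f}, {b:.0f}), E[mu] = {mean:.3f}")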

  52-53. Discrete (Multinomial)
  p(x | µ) = Π_{k=1}^K µ_k^(x_k)
  E[x_k] = µ_k
  var[x_k] = µ_k(1 − µ_k)
  cov[x_j, x_k] = I_jk µ_k − µ_j µ_k

  54. Dirichlet
  Dir(µ | α) = C(α) Π_{k=1}^K µ_k^(α_k − 1),  where C(α) = Γ(α̂) / (Γ(α_1) ··· Γ(α_K)) and α̂ = Σ_k α_k
  E[µ_k] = α_k / α̂
  var[µ_k] = α_k (α̂ − α_k) / (α̂² (α̂ + 1))
  cov[µ_j, µ_k] = −α_j α_k / (α̂² (α̂ + 1))   (j ≠ k)
  mode[µ_k] = (α_k − 1) / (α̂ − K)
  E[ln µ_k] = ψ(α_k) − ψ(α̂),  where ψ is the digamma function
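  As with Beta and Binomial, the Dirichlet is conjugate to the discrete/multinomial likelihood; written out (standard result, not recovered from the slide):

    p(µ | m) ∝ [Π_k µ_k^(m_k)] Dir(µ | α) ∝ Π_k µ_k^(m_k + α_k − 1) ∝ Dir(µ | α + m)

  where m_k is the number of observations of category k.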

  55. Dirichlet
  [Plots: Dirichlet densities on the simplex for α = (0.1, 0.1, 0.1), α = (1, 1, 1), and α = (10, 10, 10)]

  56. Multivariate Normal
  N(x | µ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))
  E[x] = µ
  cov[x] = Σ
  mode[x] = µ
  Linear Gaussian model:
  p(x) = N(x | µ, Λ⁻¹)
  p(y | x) = N(y | Ax + b, L⁻¹)
  p(y) = N(y | Aµ + b, L⁻¹ + A Λ⁻¹ Aᵀ)
  p(x | y) = N(x | Σ̃ {Aᵀ L (y − b) + Λµ}, Σ̃),  where Σ̃ = (Λ + Aᵀ L A)⁻¹

  57. Bayesian Linear Regression
  Prior and Likelihood → Posterior
  Maximum A Posteriori (MAP) gives Ridge Regression
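  The derivation on this slide was an image; a sketch of the standard argument, assuming a Gaussian prior w ~ N(0, τ²I) and the Gaussian likelihood from slide 4:

    log p(w | y, X) = log p(y | X, w) + log p(w) + const
                    = −(1/2σ²) Σ_n (y_n − wᵀx_n)² − (1/2τ²) ‖w‖² + const

  Maximizing over w (the MAP estimate) is therefore equivalent to minimizing Σ_n (y_n − wᵀx_n)² + λ‖w‖² with λ = σ²/τ², which is exactly ridge regression.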
