Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang

Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

In the rest of this course we switch gears to the interplay between information theory and statistics.

1. In this lecture, we introduce the basic elements of statistical decision theory:
▶ It is about how to make decisions from data samples collected from a statistical model.
▶ It is about how to evaluate decision-making algorithms (decision rules) under a statistical model.
▶ It also gives an overview of the contents to be covered in the follow-up lectures.
2. In the follow-up lectures, we go into the details of several topics, including:
▶ Hypothesis testing: large-sample asymptotic performance limits.
▶ Point estimation: Bayes vs. minimax, lower bounding techniques, high-dimensional problems, etc.

Alongside, we introduce tools and techniques for investigating the asymptotic performance of several statistical problems, and show their interplay with information theory:
▶ Tools from probability theory: large deviations, concentration inequalities, etc.
▶ Elements from information theory: information measures, lower bounding techniques, etc.

Overview of this Lecture

In this lecture, the goal is to establish the basics of statistical decision theory.
1. We begin by setting up the framework of statistical decision theory, including:
▶ Statistical experiment: parameter space, data samples, statistical model
▶ Decision rule: deterministic vs. randomized
▶ Performance evaluation: loss function, risk, minimax vs. Bayes
2. Next, we introduce two basic statistical decision-making problems:
▶ Hypothesis testing
▶ Point estimation

1. Statistical Model and Decision Making
▶ Basic Framework
▶ Examples
▶ Paradigms
2. Hypothesis Testing
▶ Basics
3. Estimation
▶ Mean-Squared Error (MSE) and Cramér-Rao Lower Bound
▶ Maximum Likelihood Estimator, Consistency, and Efficiency

Statistical Model and Decision Making: Basic Framework

[Figure: block diagram of the framework. A parameter $\theta \in \Theta$ enters the statistical experiment $P_\theta(\cdot)$, which produces data $X \in \mathcal{X}$; decision making maps $X$ to the inferred result $\hat{T} = \tau(X)$.]

Statistical Experiment
▶ Statistical model: a collection of data-generating distributions $\mathcal{P} \triangleq \{ P_\theta \mid \theta \in \Theta \}$, where
  ▶ $\Theta$ is called the parameter space; it could be finite, countably infinite, or uncountable.
  ▶ $P_\theta(\cdot)$ is a probability distribution that accounts for the implicit randomness in experiments, sampling, or making observations.
▶ Data (sample/outcome/observation): $X$ is generated by a random draw from $P_\theta$, that is, $X \sim P_\theta$.
  ▶ $X$ could be random variables, vectors, matrices, processes, etc.

Inference Task
▶ Objective: $T(\theta)$, a function of the parameter $\theta$.
▶ From the data $X \sim P_\theta$, one would like to infer $T(\theta)$.

Decision Rule
▶ Decision rule (deterministic): $\tau(\cdot)$ is a function of $X$, and $\hat{T} = \tau(X)$ is the inferred result.
▶ Decision rule (randomized): $\tau(\cdot, \cdot)$ is a function of $(X, U)$, where $U$ is external randomness, and $\hat{T} = \tau(X, U)$ is the inferred result.

Performance Evaluation: how good is a decision rule $\tau$?

[Figure: the block diagram extended with performance evaluation. The loss function $l(\cdot, \cdot)$ compares $T(\theta)$ with $\hat{T} = \tau(X)$, and taking $\mathbb{E}_{X \sim P_\theta}[\cdot]$ of the loss gives the risk $L_\theta(\tau)$.]

▶ Loss function: $l(T(\theta), \tau(X))$ measures how bad the decision rule $\tau$ is (with a specific data point $X$). Note: since $X$ is random, $l(T(\theta), \tau(X))$ is also random.
▶ Risk: $L_\theta(\tau) \triangleq \mathbb{E}_{X \sim P_\theta}\left[ l(T(\theta), \tau(X)) \right]$ measures on average how bad the decision rule $\tau$ is when the true parameter is $\theta$.
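As a minimal sketch of the risk definition above, the following Python snippet computes the risk exactly for one assumed setting that is not in the slides: X is the number of successes in n Bernoulli(θ) trials, the task is T(θ) = θ, the decision rule is the sample mean τ(x) = x/n, and the loss is the squared error. All function names are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, theta):
    """P_theta(x): probability of x successes in n Bernoulli(theta) trials."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

def squared_loss(t, t_hat):
    """Assumed loss l(T(theta), tau(x)) = (T(theta) - tau(x))^2."""
    return (t - t_hat) ** 2

def sample_mean_rule(x, n):
    """Deterministic decision rule tau(x) = x / n."""
    return x / n

def risk(theta, n, rule, loss=squared_loss):
    """Risk: the expectation of the loss over X ~ P_theta, as an exact sum."""
    return sum(
        binomial_pmf(x, n, theta) * loss(theta, rule(x, n))
        for x in range(n + 1)
    )

risk_at_03 = risk(0.3, 10, sample_mean_rule)
print(risk_at_03)
```

In this assumed setting the risk has the closed form θ(1 − θ)/n, so it changes with the unknown θ: a single decision rule has a whole risk function, one value per θ ∈ Θ.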

Performance Evaluation: what if the decision rule $\tau$ is randomized?
▶ Loss function becomes $l(T(\theta), \tau(X, U))$.
▶ Risk becomes $L_\theta(\tau) \triangleq \mathbb{E}_{U, X \sim P_\theta}\left[ l(T(\theta), \tau(X, U)) \right]$.
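The randomized case can be sketched the same way. The toy model below is an assumption for illustration only: θ and X are binary, U is a fair coin flip drawn independently of X, and the rule τ(x, u) follows the data when u = 1 and always answers 0 when u = 0. The risk is the expectation of the 0-1 loss over both X and U.

```python
# Assumed toy model (not from the slides): P[theta][x] = P_theta(x).
P = {
    0: {0: 0.8, 1: 0.2},
    1: {0: 0.3, 1: 0.7},
}
# Assumed external randomness: U is a fair coin, independent of X.
P_U = {0: 0.5, 1: 0.5}

def zero_one_loss(t, t_hat):
    """l(T(theta), tau(x, u)) = 1 if the guess is wrong, else 0."""
    return 1.0 if t != t_hat else 0.0

def tau(x, u):
    """Randomized rule: trust the data if u == 1, otherwise always guess 0."""
    return x if u == 1 else 0

def risk(theta):
    """Expectation of the loss over X ~ P_theta and U ~ P_U (exact sum)."""
    return sum(
        P[theta][x] * P_U[u] * zero_one_loss(theta, tau(x, u))
        for x in P[theta]
        for u in P_U
    )

risk_0 = risk(0)
risk_1 = risk(1)
print(risk_0, risk_1)
```

Because U is assumed independent of X, the risk of this randomized rule is the P_U-weighted average of the risks of the two deterministic rules τ(·, 0) and τ(·, 1).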

Statistical Model and Decision Making: Examples

Sometimes we care about the inferred object itself.

Example: Decoding

Decoding in channel coding over a DMC is one example that we are familiar with.
▶ Parameter is the message: $\theta \leftrightarrow m$
▶ Parameter space is the message set: $\Theta \leftrightarrow \{1, 2, \ldots, 2^{NR}\}$
▶ Data is the received sequence: $X \leftrightarrow Y^N$
▶ Statistical model is Encoder+Channel: $P_\theta(x) \leftrightarrow \prod_{i=1}^{N} P_{Y|X}(y_i \mid x_i(m))$
▶ Task is to decode the message: $T(\theta) \leftrightarrow m$
▶ Decision rule is the decoding algorithm: $\tau(X) \leftrightarrow \mathrm{dec}(Y^N)$
▶ Loss function is the 0-1 loss: $l(T(\theta), \tau(x)) \leftrightarrow \mathbb{1}\{ m \neq \mathrm{dec}(y^N) \}$
▶ Risk is the decoding error probability: $L_\theta(\tau) \leftrightarrow \lambda_{m,\mathrm{dec}} \triangleq P\{ m \neq \mathrm{dec}(Y^N) \mid m \text{ is sent} \}$
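To make the correspondence concrete, here is a small sketch under assumptions that are not in the slides: a length-3 repetition code with two messages, a binary symmetric channel with crossover probability 0.1 as the DMC, and majority-vote decoding as dec. It evaluates the product-form model and the per-message error probability by enumerating all received sequences.

```python
from itertools import product

N = 3                                     # assumed block length
P_FLIP = 0.1                              # assumed BSC crossover probability
CODEBOOK = {1: (0, 0, 0), 2: (1, 1, 1)}   # assumed encoder: m -> x^N(m)

def channel(y, x):
    """P_{Y|X}(y | x) for the assumed binary symmetric channel."""
    return 1 - P_FLIP if y == x else P_FLIP

def model(y_seq, m):
    """P_theta(x): product over i of P_{Y|X}(y_i | x_i(m))."""
    prob = 1.0
    for y, x in zip(y_seq, CODEBOOK[m]):
        prob *= channel(y, x)
    return prob

def dec(y_seq):
    """Decision rule: majority vote over the received bits."""
    return 2 if sum(y_seq) > N / 2 else 1

def error_probability(m):
    """Risk under 0-1 loss: P{ m != dec(Y^N) | m is sent }."""
    return sum(
        model(y_seq, m)
        for y_seq in product((0, 1), repeat=N)
        if dec(y_seq) != m
    )

lam = {m: error_probability(m) for m in CODEBOOK}
print(lam)
```

Here the risk is one number per message m, which is exactly the per-message error probability from channel coding.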

Example: Hypothesis Testing

Decoding in channel coding belongs to a more general class of problems called hypothesis testing.
▶ Parameter space is a finite set: $|\Theta| < \infty$
▶ Task is to infer the parameter $\theta$: $T(\theta) = \theta$
▶ Loss function is the 0-1 loss: $l(T(\theta), \tau(x)) = \mathbb{1}\{ \theta \neq \tau(x) \}$
▶ Risk is the probability of error: $L_\theta(\tau) = P_{X \sim P_\theta}\{ \theta \neq \tau(X) \}$
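A short sketch of this setup, with everything specific being an assumption for illustration: three hypotheses, a three-letter observation alphabet, a made-up probability table, and a rule that picks the θ under which the observed x is most likely. The slide itself does not prescribe any particular rule.

```python
# Assumed model table (not from the slides): P[theta][x] = P_theta(x).
P = {
    0: [0.6, 0.3, 0.1],
    1: [0.2, 0.5, 0.3],
    2: [0.1, 0.2, 0.7],
}

def tau(x):
    """Illustrative rule: choose the theta that makes x most likely."""
    return max(P, key=lambda theta: P[theta][x])

def risk(theta):
    """Probability of error under theta: P_theta{ theta != tau(X) }."""
    return sum(p for x, p in enumerate(P[theta]) if tau(x) != theta)

risks = {theta: risk(theta) for theta in P}
print(risks)
```

The result is a vector of error probabilities, one per hypothesis, rather than a single number.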

Example: Density Estimation

Estimate the probability density function from the collected samples.
▶ Parameter space is a (huge) set of density functions: $\Theta \leftrightarrow \mathcal{F} = \{ f : \mathbb{R} \to [0, +\infty) \text{ which is concave/continuous/Lipschitz continuous/etc.} \}$
▶ Data is the observed i.i.d. sequence: $X \leftrightarrow X^n$, $X_i \overset{\text{i.i.d.}}{\sim} f(\cdot)$
▶ Task is to infer the density function $f(\cdot)$: $T(\theta) \leftrightarrow f$
▶ Decision rule is the density estimator: $\tau(X) \leftrightarrow \hat{f}_{X^n}(\cdot)$
▶ Loss function is some sort of divergence: $l(T(\theta), \tau(x)) \leftrightarrow D\left( f \,\middle\|\, \hat{f}_{x^n} \right)$
▶ Risk is the expected loss: $L_\theta(\tau) \leftrightarrow \mathbb{E}_{X^n \sim f^{\otimes n}}\left[ D\left( f \,\middle\|\, \hat{f}_{X^n} \right) \right]$
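The sketch below instantiates these pieces under assumptions chosen only for illustration: the true density f is uniform on [0, 1), the estimator is an equal-width histogram, and the divergence D is the Kullback-Leibler divergence in nats. The loss is computed in closed form for a given sample, and the risk is approximated by Monte Carlo averaging over freshly drawn samples.

```python
import math
import random

K = 4  # assumed number of equal-width histogram bins on [0, 1)

def histogram_estimator(samples, k=K):
    """Decision rule: bin heights of the histogram density estimate."""
    counts = [0] * k
    for x in samples:
        counts[min(int(x * k), k - 1)] += 1
    n = len(samples)
    return [k * c / n for c in counts]

def kl_uniform_to_histogram(heights):
    """Loss D(f || f_hat) for f uniform on [0, 1); infinite if a bin is empty."""
    k = len(heights)
    if any(hgt == 0 for hgt in heights):
        return math.inf
    return sum((1 / k) * math.log(1 / hgt) for hgt in heights)

def monte_carlo_risk(n, trials, rng):
    """Approximate the risk by averaging the loss over independent samples X^n."""
    total = 0.0
    for _ in range(trials):
        samples = [rng.random() for _ in range(n)]
        total += kl_uniform_to_histogram(histogram_estimator(samples))
    return total / trials

# Loss for one fixed sample with two bins: heights are [1.5, 0.5].
loss_example = kl_uniform_to_histogram(
    histogram_estimator([0.1, 0.2, 0.3, 0.6], k=2)
)
risk_estimate = monte_carlo_risk(n=200, trials=500, rng=random.Random(0))
print(loss_example, risk_estimate)
```

The loss is a random quantity because it depends on the drawn sample; only after averaging over samples does it become the risk, a deterministic function of f.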

Sometimes we care about the utility of the inferred object.

Example: Classification/Prediction

A basic problem in learning is to train a classifier that predicts the category of a new object.
▶ Parameter space is a collection of labelings: $\Theta \leftrightarrow \mathcal{H} = \{ h : \mathcal{X} \to [1:K] \}$
▶ Data is the training data set: $X \leftrightarrow (X^n, Y^n)$, with labels $Y_i \in [1:K]$
▶ Statistical model is the noisy labeling: $P_\theta(x) \leftrightarrow \prod_{i=1}^{n} P_X(x_i) \, P_{Y|h(X)}(y_i \mid h(x_i))$
▶ Task is to infer the true labeling $h \in \mathcal{H}$: $T(\theta) \leftrightarrow h(\cdot)$, $\tau(X) \leftrightarrow \hat{h}_{X^n, Y^n}(\cdot)$
▶ Loss function is the prediction error probability: $l(T, \tau(x)) \leftrightarrow \mathbb{E}_{X \sim P_X}\left[ \mathbb{1}\{ h(X) \neq \hat{h}(X) \} \right]$
  (Note: this is still random, as $\hat{h}$ depends on the randomly drawn training data $(X, Y)^n$.)
▶ Risk is the averaged loss over training: $L_\theta(\tau) \leftrightarrow \mathbb{E}_{(X^n, Y^n) \sim (P_X P_{Y|h(X)})^{\otimes n}}\left[ \mathbb{E}_{X \sim P_X}\left[ \mathbb{1}\{ h(X) \neq \hat{h}(X) \} \right] \right]$
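The following sketch shows the loss for one fixed training set, under assumptions made only for illustration: a four-point feature space with uniform P_X, two classes, a made-up true labeling, and a learner that predicts the majority training label seen at each point (with a default label for unseen points). None of these choices comes from the slides.

```python
from collections import Counter, defaultdict

# Assumed feature distribution P_X on a four-point feature space.
P_X = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

def h_true(x):
    """Assumed true labeling h: class 1 for x < 2, class 2 otherwise."""
    return 1 if x < 2 else 2

def fit_majority(train, default=1):
    """Decision rule: learn h_hat by majority vote of the labels seen at each x."""
    votes = defaultdict(Counter)
    for x, y in train:
        votes[x][y] += 1
    table = {x: counts.most_common(1)[0][0] for x, counts in votes.items()}
    return lambda x: table.get(x, default)

def prediction_error(h, h_hat):
    """Loss: probability over a fresh X ~ P_X that h(X) != h_hat(X)."""
    return sum(p for x, p in P_X.items() if h(x) != h_hat(x))

# One fixed, noisily labeled training set (x_i, y_i).
train = [(0, 1), (0, 1), (1, 1), (2, 2), (2, 1), (2, 2), (3, 1)]
h_hat = fit_majority(train)
loss = prediction_error(h_true, h_hat)
print(loss)
```

The inner expectation over a fresh X is what the loss computes here; the risk would additionally average this value over all training sets drawn from the model.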
