A Utility-based Approach for Estimating Conditional Probability of Human Lung Cancer From Microarray Data
Craig Friedman (craig.friedman@cims.nyu.edu)
Wenbo Cao (wcao@gc.cuny.edu)
• Introduction
• Measuring model performance and building probabilistic models from a model-user's perspective
• Probabilistic models from Microarray data
  – 2-state conditional probability (NL or AD)
  – 6-state conditional probability (NL, AD, SMCL, SQ, COID, OA)
  – Conditional density (survival models, time permitting)
• Conclusion
Introduction
Conditional Probability Models
• Prob(Y=y|x), where
  – X is a vector of explanatory variables (for example, microarray data),
  – Y is a random variable or vector (for example, Y in {NL, AD}), and
  – y is a particular value of Y (for example, y=AD).
• Given one (or more) cutoff(s), each conditional probability model can be converted to a classification model.
  – For example, classify as AD if and only if Prob(Y=AD|x)>cutoff.
• Conditional probability models tell us directly about the probabilities. They provide more information than classification models, in general.
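The cutoff rule above can be sketched in a few lines. This is a minimal illustration, not part of the slides; the probability values and the 0.5 default cutoff are made-up placeholders.

```python
def classify(prob_ad, cutoff=0.5):
    """Classify as AD if and only if Prob(Y=AD|x) > cutoff."""
    return "AD" if prob_ad > cutoff else "NL"

# Hypothetical model outputs, chosen only to illustrate the rule.
print(classify(0.9))  # high model probability of AD
print(classify(0.2))  # low model probability of AD
```

Different cutoffs trade false positives against false negatives; the conditional probability model itself is unchanged.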
Introduction
Principal References
• Friedman, C. and Sandow, S., Model Performance Measures for Expected Utility Maximizing Investors, International Journal of Theoretical and Applied Finance, June 2003.
• Friedman, C. and Sandow, S., Learning Probabilistic Models: An Expected Utility Maximization Approach, Journal of Machine Learning Research, July 2003.
• Friedman, C. and Huang, J., A Utility-Based Public Firm Default Probability Model, Working Paper, 2003.
• Friedman, C. and Sandow, S., Ultimate Recoveries, Risk, August 2003.
Introduction
Model Performance Measures
• Develop good conditional probability models.
  – To do so, we must have a way to measure model performance.
  – The models will be used by model-users to make decisions; performance should be measured accordingly.
Performance Measures
Utility Theory Elements
• Utility functions assign values (utilities) to random wealth levels (a power=2 utility is used by Morningstar to rank funds).
• Utility functions characterize the investor's risk aversion.
• Rational investors maximize their expected utility (from utility theory).
Performance Measures: Utility Theory Example
A fair coin is tossed. In each game, the investor stakes $1:
  – Game 1: receive $1.10 on heads, $0.91 on tails.
  – Game 2: receive $2.00 on heads, $0.80 on tails.
Maximizing expected utility, the more risk averse investor chooses Game 1; the less risk averse investor chooses Game 2.
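The coin-toss comparison can be checked numerically. The two utility functions below are illustrative assumptions (the slides do not specify them): a power utility with a large risk-aversion parameter stands in for the more risk averse investor, and a linear (risk-neutral) utility for the less risk averse one.

```python
def power_utility(w, kappa=10.0):
    # Assumed strongly risk averse utility: U(W) = (1 - W**(1-kappa)) / (kappa-1).
    return (1.0 - w ** (1.0 - kappa)) / (kappa - 1.0)

def linear_utility(w):
    # Assumed risk neutral utility: U(W) = W.
    return w

def expected_utility(u, payoffs):
    # Fair coin: each payoff occurs with probability 1/2.
    return sum(u(w) for w in payoffs) / len(payoffs)

game1 = [1.10, 0.91]  # Game 1 payoffs on a $1 stake
game2 = [2.00, 0.80]  # Game 2 payoffs on a $1 stake

more_averse_prefers_game1 = (expected_utility(power_utility, game1)
                             > expected_utility(power_utility, game2))
less_averse_prefers_game2 = (expected_utility(linear_utility, game2)
                             > expected_utility(linear_utility, game1))
print(more_averse_prefers_game1, less_averse_prefers_game2)
```

Both investors rank the games by expected utility, but their different curvatures reverse the ranking, which is the point of the example.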
Performance Measures
Utility Theory Elements
• More is preferred to less (the utility function is a strictly increasing function of wealth).
• The slope of the utility function decreases as wealth increases (a gift of $1 provides more utility when your wealth is low than when it is high).
Performance Measures
These model performance measures:
  – are natural extensions of the axioms of utility theory,
  – include a widely used practical implementation (the log-likelihood),
  – apply in many probabilistic contexts (not just 2-state, e.g., ROC), and
  – are consistent with our approach to model formulation.
Performance Measures
• Assumptions:
  – Investor/model-user with a utility function.
  – Market with an odds ratio for each state (NL and AD have different payoffs).
  – The model-user believes the model and makes decisions to maximize expected utility (a consequence of utility theory).
• Paradigm: we base our model performance measure on an (out-of-sample) estimate of expected utility.
• Accurate models allow for effective decisions/investment strategies.
• Inaccurate models induce over-betting and under-betting.
• Our performance measures have a financial interpretation.
Performance Measures: An Important Class of Utilities
• Interpretations (for a rich, tractable class of utility functions):
  – Estimated wealth growth rate pickup (for a certain type of investor) from using model 2 rather than model 1.
  – Logarithm of the likelihood ratio (our old friend from classical statistics).
  – Performance measure that generates an optimal (in the sense of the Neyman-Pearson Lemma) decision surface.
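Under the log-utility interpretation above, the wealth growth rate pickup is just the average log-likelihood ratio over observed outcomes. A minimal sketch, with made-up two-state probabilities rather than real model output:

```python
import math

def ewgrp(model2, model1, outcomes):
    # Average log-likelihood ratio over observed outcomes: for a log-utility
    # (Kelly) investor, this is the expected wealth growth rate pickup from
    # betting with model 2 rather than model 1.
    return sum(math.log(p2[y] / p1[y])
               for p2, p1, y in zip(model2, model1, outcomes)) / len(outcomes)

# Placeholder conditional probabilities for two samples over states NL and AD.
p1 = [{"NL": 0.5, "AD": 0.5}, {"NL": 0.5, "AD": 0.5}]  # uninformative model 1
p2 = [{"NL": 0.8, "AD": 0.2}, {"NL": 0.3, "AD": 0.7}]  # sharper model 2
observed = ["NL", "AD"]

print(ewgrp(p2, p1, observed))  # positive: model 2 outperforms model 1
```

A positive value means model 2 puts more probability on what actually occurred, on average, than model 1 does.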
Performance Measures: An Important Class of Utilities
• Morningstar uses the power utility with power 2.
• This is how a member of our family approximates Morningstar's utility function.
Performance Measures
The investor has utility function U(W).

Performance Measures
Our Paradigm
Maximum Expected Utility Models
Introduction
• To find a model, we balance, from an investor/model-user's perspective:
  – consistency with the data (there is a precise measure), and
  – consistency with prior beliefs (there is a precise measure).
• Result: a one-hyperparameter family of models (an efficient frontier), each of which is associated with a given level of consistency with the data. Each model
  – asymptotically maximizes expected utility over a potentially rich family of models, and
  – is robust: it maximizes outperformance of a benchmark model under the most adverse true measure (this can be made precise).
• Choose the optimal hyperparameter value by maximizing expected utility on an out-of-sample data set.
Maximum Expected Utility Models Formulation
• We define the notion of dominance (of one model measure over another). We select a measure on the efficient frontier.

Maximum Expected Utility Models Formulation
• The probability simplex for a 2-state model:

Maximum Expected Utility Models Formulation
• The probability simplex for a 3-state model:

Maximum Expected Utility Models Formulation
MEU Modeling Approach
• Balance
  – Consistency with the Data
  – Consistency with Prior Beliefs
MEU Modeling Approach
Dual Problem
• We solve the dual of the above optimization problem, which amounts to a maximization with respect to a set of Lagrange multipliers.
• The dual problem can be interpreted as the search for a maximum-likelihood exponential model, with regularization. Choose α to maximize the out-of-sample log-likelihood.
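The recipe on this slide (fit a regularized maximum-likelihood exponential model, then pick α by out-of-sample log-likelihood) can be sketched on a toy scale. Everything below is an illustrative stand-in, not the actual MEU implementation: a one-feature logistic model, a quadratic penalty α·β², plain gradient ascent, and a tiny placeholder data set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, alpha, steps=2000, lr=0.1):
    # Gradient ascent on the penalized log-likelihood
    #   sum_i log p(y_i | x_i; beta) - alpha * beta**2
    # for a one-feature exponential (logistic) model p(1|x) = sigmoid(beta*x).
    beta = 0.0
    for _ in range(steps):
        grad = sum((y - sigmoid(beta * x)) * x for x, y in zip(xs, ys))
        grad -= 2.0 * alpha * beta
        beta += lr * grad / len(xs)
    return beta

def log_likelihood(beta, xs, ys):
    return sum(math.log(sigmoid(beta * x)) if y == 1
               else math.log(1.0 - sigmoid(beta * x))
               for x, y in zip(xs, ys))

# Tiny placeholder data set (one feature, two states).
train_x, train_y = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
test_x, test_y = [-1.5, 1.5], [0, 1]

# Choose alpha to maximize the out-of-sample log-likelihood.
best_alpha = max([0.01, 0.1, 1.0, 10.0],
                 key=lambda a: log_likelihood(fit(train_x, train_y, a),
                                              test_x, test_y))
print("selected alpha:", best_alpha)
```

On this cleanly separable toy data, light regularization wins; with noisier data, a larger α would be selected, which is the overfitting control the slide describes.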
MEU Modeling Approach
• The dual problem is strictly convex.
• The model produced is theoretically robust (best worst-case measure).
• Features allow for flexibility.
• The approach avoids overfitting.
Maximum Expected Utility Models
Logistic Regression
The models can be made flexible enough to conform to the data, but avoid overfitting. Linear logistic regression (and some generalizations) has the axis-alignment property.
Toy (2 Variable) Problem
• To illustrate, based on data from the Wisconsin Breast Cancer Databases, we select 2 variables:
  – standard error of texture
  – fractal dimension
• We want to model the probability of malignancy, given the standard error of texture and the fractal dimension.
Toy (2 Variable) Version Data
• (We have transformed the data so that they lie in the unit square.)
Toy (2 Variable) Problem
The Model
• For each explanatory variable pair, we seek the function prob(malignancy|se, fd), where
  se = standard error of texture
  fd = fractal dimension
We model in 2 ways:
• Linear logistic regression
• MEU methodology
Toy Problem
Same Data, 2 Different Models
• Plots: Logistic Regression; MEU
Toy Problem Contour Plots
• The MEU model is clearly more consistent with the data, without overfitting. The linear logit is too stiff to reflect the story told by the data.
Model Performance Results
Toy (2 Variable) Version
• Logistic Regression:
  – ROC: 0.4646
  – EWGRP: -0.0035
• MEU Model:
  – ROC: 0.6196
  – EWGRP: 0.0223
where EWGRP = Expected Wealth Growth Rate Pickup.
• By adding additional explanatory variables, we can improve model performance.
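The ROC area reported above has a simple probabilistic reading: it is the chance that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A short sketch with hypothetical scores (not the reported results):

```python
def roc_area(pos_scores, neg_scores):
    # Fraction of (positive, negative) pairs ranked correctly by the model,
    # counting ties as one half (the Mann-Whitney form of the ROC area).
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

malignant = [0.9, 0.8, 0.6]  # hypothetical model scores for positive cases
normal = [0.7, 0.3, 0.2]     # hypothetical model scores for negative cases
print(roc_area(malignant, normal))  # 8 of 9 pairs ranked correctly
```

An ROC area of 0.5 is chance-level ranking, which is why the logistic regression's 0.4646 above indicates a model slightly worse than random on this toy problem.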
Model Performance
MEU vs Benchmark
• Overfit linear logit versus MEU model
Applications
2 Categories: Harvard and Michigan Data
Categories:
1) AD and suspected AD
2) Normal lung
RESULTS, using 58 common probe set variables, level-set features:
• Training on Harvard data, testing on Michigan data
  – MEU method: ROC = 1, delta = 0.3341
  – Linear logit: ROC = 0.8186, delta = -22.1663
• Training on Michigan data, testing on Harvard data
  – MEU method: ROC = 1, delta = 0.3276
  – Linear logit: ROC = 0.7245, delta = -1.5982
We note that the MEU model produced perfect classification on the out-of-sample data sets.
Applications
2 Categories: Harvard and Michigan Data
• Training on Harvard, testing on Michigan
  – Blue: Normal Lung; Red: AD and Suspected AD (OA)

Applications
2 Categories: Harvard and Michigan Data
• Training on Michigan, testing on Harvard
  – Blue: Normal Lung; Red: AD and Suspected AD (OA)