Estimating Discrete Choice Models with Market Level Zeroes: An Application to Scanner Data
Amit Gandhi, Zhentong Lu, Xiaoxia Shi
University of Wisconsin-Madison
February 3, 2015

Introduction
- Zeroes are highly prevalent in choice data
  - Discrete choice models (à la McFadden) were designed to explain corner solutions in individual demand
- Our research program: the empirical analysis of choice data with market zeroes.
  - Zero demand for a choice alternative after summing over the sample of consumers in a market
  - A major feature of choice data from a diversity of environments
  - Causes serious problems for standard estimation techniques

Scanner Data
- Store-level scanner data covering all Dominick’s Finer Foods (DFF) stores in Chicago from 1989 to 1997
  - ≈ 80 stores over 300 weeks
- For each week/store/UPC (universal product code) observation:
  - price
  - quantity
  - marketing (display, feature, etc.)
  - product characteristics (brand, size, premium, etc.)
  - wholesale price

Product Variety

Category              Avg. no. of UPCs    % of total sales from    % of zero
                      in a store/week     the top 20% of UPCs      sales
Analgesics            224                 80.12%                   58.02%
Beer                  179                 87.18%                   50.45%
Bottled Juices        187                 74.40%                   29.87%
Cereals               212                 72.08%                   27.14%
Canned Soup           218                 76.25%                   19.80%
Fabric Softeners      123                 65.74%                   43.74%
Laundry Detergents    200                 65.52%                   50.46%
Refrigerated Juices   91                  83.18%                   27.83%
Soft Drinks           537                 91.21%                   38.54%
Toothbrushes          137                 73.69%                   58.63%
Canned Tuna           118                 82.74%                   35.34%
Toothpastes           187                 74.19%                   51.93%
Bathroom Tissues      50                  84.06%                   28.14%

Long Tail

A “Big Data” Problem
- Quan and Williams (2014): data on 13.5 million shoe sales across 100,000 products from an online retailer

A “Big Data” Problem
- Marwell (2014): collects daily data on project donations to Kickstarter

Kickstarter
#Projects           90,876
#Days               555
Project-Day Obs.    2,615,839

             #Daily Projects    %Zero Contribution
Mean         4,713              0.59
Std. Dev.    1,706              0.14

A “not so Big Data” Problem
- Nurski and Verboven (2013): Belgian data on 488 car models in 588 towns for 2 consumer types (men and women).

Discrete Choice Model
- Classic McFadden (1973, 1980) discrete choice model
- Markets $t = 1, \dots, T$ are the store/week realizations (a menu of products, prices, and promotions)
- Products $j = 1, \dots, J_t$ with attributes $x_{jt} \in \mathbb{R}^{d_w}$
- Consumers $i = 1, \dots, N_t$ with “demographics” $w_{it} \in \mathbb{R}^{d_z}$

$$u_{ijt} = \begin{cases} \delta_{jt} + w_{it} \Gamma x_{jt} + \varepsilon_{ijt} & \text{if } j > 0 \\ \varepsilon_{i0t} & \text{if } j = 0 \end{cases}$$

- BLP (1995) add the new layer: $\delta_{jt} = \beta x_{jt} + \xi_{jt}$

The Zeroes Problem
- Consider the simplest case of “simple logit” ($\Gamma = 0$):

$$\log\left(\frac{s_{jt}}{s_{0t}}\right) = \beta x_{jt} + \xi_{jt}, \qquad jt = 1, \dots, JT,$$

  where $E[\xi_{jt} \mid z_{jt}] = 0$.
- If $s_{jt} = 0$ then $\log(s_{jt})$ does not exist (or can only be defined as $-\infty$).
- However, dropping zeroes induces selection bias: $E[\xi_{jt} \mid z_{jt}, s_{jt} > 0] \neq 0$
  - IV estimation is asymptotically biased for $\beta$ (and the bias can be severe).
  - The size of the bias will depend on the strength of this selection effect.

Questions
- Why would the model generate an estimating equation that can’t be estimated with the data?
  - Is it a deep rejection of the choice model?
  - Is it a problem with the empirical strategy of taking the model to data?

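Identification
- For simplicity focus on “simple logit” ($\Gamma = 0$):

$$\pi_{jt} = \frac{e^{\delta_{jt}}}{\sum_{k=0}^{J_t} e^{\delta_{kt}}}, \qquad j = 0, \dots, J_t.$$

- Consumer variation:

$$\log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) = \delta_{jt} = \sigma_j^{-1}(\pi_t)$$

- Products/markets variation:

$$\delta_{jt} = \beta x_{jt} + \xi_{jt} \;\Rightarrow\; \beta = \left(E\left[z_{jt}' x_{jt}\right]\right)^{-1} E\left[z_{jt}' \delta_{jt}\right]$$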

Estimation
- Standard estimation (aka BLP) uses sample analogues of both stages.
- Replace $\pi_{jt}$ with

$$\hat{\pi}_{jt}^{MLE} = s_{jt} = \frac{\sum_i y_{ijt}}{n_t},$$

  which implies

$$\hat{\delta}_{jt}^{MLE} = \sigma_j^{-1}(s_t).$$

- Plug $\hat{\delta}_{jt}^{MLE}$ into 2SLS:

$$\hat{\beta} = \left(\sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' x_{jt}\right)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' \hat{\delta}_{jt}^{MLE}$$
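As a rough illustration of the two stages on this slide, the sketch below computes empirical shares, applies the simple-logit inversion $\log(s_{jt}/s_{0t})$, and forms the IV ratio for the special case of one scalar regressor and one scalar instrument with no intercept. The function names and the toy numbers are assumptions made for this example; they are not from the slides.

```python
import math

def empirical_shares(quantities):
    """Empirical shares s_jt = q_jt / n_t; index 0 is the outside good."""
    n = sum(quantities)
    return [q / n for q in quantities]

def logit_inversion(shares):
    """Simple-logit inversion: delta_j = log(s_j / s_0) for inside goods j >= 1.

    A zero share has no finite log, so it is reported as -inf here.
    """
    s0 = shares[0]
    return [math.log(s / s0) if s > 0 else float("-inf") for s in shares[1:]]

def iv_slope(z, x, delta):
    """Just-identified IV with scalar x and z: sum(z * delta) / sum(z * x)."""
    num = sum(zi * di for zi, di in zip(z, delta))
    den = sum(zi * xi for zi, xi in zip(z, x))
    return num / den

# Toy market with no zeroes: both inside goods get a finite delta.
deltas = logit_inversion(empirical_shares([50, 25, 25]))

# Toy market where the second inside good never sells: its delta is -inf,
# so it cannot enter the IV sums.
deltas_with_zero = logit_inversion(empirical_shares([90, 10, 0]))
```

Any observation with a `-inf` entry has to be dropped or repaired before `iv_slope` can be applied, which is exactly the choice the deck is concerned with.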

What is happening?
- Source of the problem is $\hat{\delta}_{jt} = \sigma_j^{-1}\left(\hat{\pi}_t^{MLE}\right)$
  - does not exist when $\hat{\pi}_{jt}^{MLE} = 0$
- Why use MLE in the first place?
  - MLE is potentially a bad estimator when choice data are sparse.
- Very old problem
  - Laplace’s “Law of Succession”
  - Multinomial cell probabilities and sparse contingency tables
- Zeroes arise when some $\pi_{jt}$’s are small and $n_t$ is finite.
- Treating $n_t$ as finite but letting $JT \to \infty$ makes $\pi_{jt}$, and hence $\delta_{jt}$, an incidental parameter.

Bayesian Analysis of Multinomial Cells
- Consider multinomial probabilities $\pi_t = (\pi_{0t}, \dots, \pi_{J_t t}) \in \Delta^{J_t}$
- We observe quantities $q_t = (q_{0t}, q_{1t}, \dots, q_{J_t t})$ for $n_t$ consumers
- The likelihood of $\pi_t$ is $q_t \sim MN(n_t, \pi_t)$
- Conjugate prior is $\pi_t \sim Dir(\alpha_{0t}, \dots, \alpha_{J_t t})$
  - Uniform prior: $\alpha_{jt} = 1$ (Laplace/De Morgan)
  - Non-informative prior: $\alpha_{jt} = .5$ (Jeffreys/Bernardo)
- Posterior is $\pi_t \mid q_t, n_t \sim Dir(\alpha_{0t} + q_{0t}, \alpha_{1t} + q_{1t}, \dots, \alpha_{J_t t} + q_{J_t t})$

Laplace’s “Law of Succession”
- “What is the probability the sun will rise tomorrow given that it has risen every day until now?”
- He used a uniform prior $\alpha_{jt} = 1$
- Bayesian estimate:

$$\hat{\pi}_{jt} = E[\pi_{jt} \mid q_t, n_t] = \frac{q_{jt} + 1}{n_t + J_t + 1}$$

- $\hat{\pi}_{jt}$ “shrinks” the empirical share $s_{jt}$ towards the prior mean $1 / (J_t + 1)$
- $\hat{\pi}_{jt}$ is consistent (like $s_{jt}$), i.e., $\hat{\pi}_{jt} \to_p \pi_{jt}$
  - Data dominates the prior in large samples
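The shrinkage reading of the Laplace estimate can be checked numerically. The sketch below computes the Dirichlet posterior mean for a symmetric prior (with $\alpha = 1$ giving the Laplace rule) and the weight that the posterior mean places on the empirical share. The helper names and the example counts are assumptions for illustration only.

```python
def posterior_mean_shares(quantities, alpha=1.0):
    """Posterior mean of pi under a symmetric Dirichlet(alpha) prior.

    quantities has J_t + 1 entries (outside good first), so alpha = 1
    reproduces (q_jt + 1) / (n_t + J_t + 1).
    """
    k = len(quantities)          # k = J_t + 1 cells
    n = sum(quantities)          # n_t consumers
    return [(q + alpha) / (n + alpha * k) for q in quantities]

def shrinkage_weight(n, k, alpha=1.0):
    """Weight w on the empirical share in: pi_hat = w * s + (1 - w) / k."""
    return n / (n + alpha * k)

q = [7, 3, 0]                    # 10 consumers, one product with zero sales
laplace = posterior_mean_shares(q)
w = shrinkage_weight(sum(q), len(q))
```

With these counts the zero-sales product gets share 1/13 rather than 0, and the weight `w` on the data rises towards 1 as the number of consumers grows with the number of cells held fixed.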

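Demand Application
- We want to estimate

$$\hat{\delta}_{jt} = E\left[\log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) \,\middle|\, q_t, n_t\right] = \psi(\alpha_{jt} + q_{jt}) - \psi(\alpha_{0t} + q_{0t}),$$

  where $\psi$ is the digamma function.
- Use $\hat{\delta}_t$ to compute “optimal market shares”:

$$\pi_{kt}^* = \frac{\exp\left(\hat{\delta}_{kt}\right)}{1 + \sum_{j=1}^{J_t} \exp\left(\hat{\delta}_{jt}\right)}$$

- Plug optimal shares into 2SLS:

$$\hat{\beta} = \left(\sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' x_{jt}\right)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' \sigma_j^{-1}(\pi_t^*)$$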
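A minimal sketch of the digamma step, assuming a symmetric prior with a single value `alpha` for every cell. To stay within the standard library, `digamma` here is a hand-rolled approximation (recurrence plus a truncated asymptotic series), not a library routine; all function names and the example counts are assumptions for this illustration.

```python
import math

def digamma(x):
    """Approximate digamma for x > 0.

    Uses psi(x) = psi(x + 1) - 1/x to push x up to at least 6, then the
    asymptotic series log(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def posterior_mean_delta(quantities, alpha=1.0):
    """E[log(pi_j / pi_0) | q] = psi(alpha + q_j) - psi(alpha + q_0), j >= 1."""
    base = digamma(alpha + quantities[0])
    return [digamma(alpha + q) - base for q in quantities[1:]]

def optimal_shares(delta):
    """Logit shares implied by delta; index 0 is the outside good."""
    denom = 1.0 + sum(math.exp(d) for d in delta)
    return [1.0 / denom] + [math.exp(d) / denom for d in delta]

# Outside good sold 5 times, product 1 never sold, product 2 sold 5 times.
delta_hat = posterior_mean_delta([5, 0, 5])
shares = optimal_shares(delta_hat)
```

The zero-sales product receives a finite `delta_hat` and a strictly positive share, so it can stay in the 2SLS sums instead of being dropped.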

Why is this a good estimator?
- We take a “Frequentist” interpretation of the prior
  - “Empirical Bayes” approach.
- Choice probabilities $\pi_t$ are the endogenous variable of the structural model.
- Let $z_t = (z_{1t}, \dots, z_{J_t t})$ be the collection of exogenous variables
- Then the conditional distribution $\pi_t \mid z_t$ is the reduced form of the structural model
- Prior distribution = Reduced form

Asymptotic Bias
- Finite $n_t$ implies $\hat{\beta}$ will in general have asymptotic bias.
- The probability limit is

$$\operatorname*{plim}_{JT \to \infty} \hat{\beta} = \beta + Q_{xz}^{-1} E\left[z_{jt}' \left(\sigma_j^{-1}(\hat{\pi}_t) - \sigma_j^{-1}(\pi_t)\right)\right],$$

  where $Q_{xz} = E\left[z_{jt}' x_{jt}\right]$.

Theorem. If optimal market shares $\pi_t^*$ are constructed from the “correct prior” $F_{\pi_t \mid z_t} = F^0_{\pi_t \mid z_t}$, then

$$E\left[z_{jt}' \left(\sigma_j^{-1}(\pi_t^*) - \sigma_j^{-1}(\pi_t)\right)\right] = 0.$$

- Thus optimal market shares give consistent estimates: $\hat{\beta} \to_p \beta$.

Robust Prior
- What happens if we are not exactly right about the prior, i.e., $F_{\pi_t \mid z_t} \approx F^0_{\pi_t \mid z_t}$?
- Use the “Robust Priors” approach of Arellano and Bonhomme (Econometrica, 2009).

Theorem. If the prior $F_t \neq F_t^0$ is not exact, then

$$E\left[z_{jt}' \left(\sigma_j^{-1}(\hat{\pi}_t) - \sigma_j^{-1}(\pi_t)\right)\right] = n_t^{-1}\, KLIC\left(F_t^0, F_t\right) + o\left(n_t^{-1}\right).$$

- So long as the prior is sensible (and $n_t$ relatively large), the bias reduction will be good (and much better than with the implicit MLE prior).

Dirichlet and the Long Tail
- Dirichlet is a conjugate prior
  - gives closed form optimal shares $\pi_t^*$
- Dirichlet prior also gives rise to the long tail
  - A key feature of demand data.

Theorem. If $q_t \sim MN(n_t, \pi_t)$ and $\pi_t \sim Dir(\alpha \cdot 1_{J_t + 1})$ (symmetric Dirichlet), then (for large $J_t$) the quantity histogram will exhibit the long tail shape (Pareto decay).

- A restatement of Chen (1980) on probability foundations for Zipf’s Law
- $\alpha$ is the concentration parameter

An Illustration
- 500 products and 10,000 consumers

[Figure: Zipf’s Law and the Symmetric Dirichlet]

Picking the Prior
- Jeffreys prior: $\pi_t \mid z_t \sim Dir(.5 \cdot 1_{J_t + 1})$
- If $\pi_t \sim Dir(\alpha \cdot 1_{J_t + 1})$ and $q_t \sim MN(n_t, \pi_t)$, then $q_t \sim \text{DirichletMultinomial}(\alpha)$
  - $\hat{\alpha}$ can be estimated with MLE.
- More generally we can allow $\alpha_{jt} = \gamma z_{jt}$ and estimate $\hat{\gamma}$ (built into Stata).
- We can also allow for mixtures of Dirichlet priors for increased flexibility at little analytic cost
  - Posterior is also a mixture of Dirichlet distributions
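To make the “$\hat{\alpha}$ can be estimated with MLE” step concrete, here is a small sketch of the symmetric Dirichlet-multinomial log-likelihood together with a crude grid search over candidate values of $\alpha$. The grid search stands in for a proper optimizer, and the function names, grid, and toy markets are all assumptions for this example rather than anything specified in the slides.

```python
import math

def dm_log_likelihood(quantities, alpha):
    """Log pmf of one market's quantities under DirichletMultinomial(alpha).

    quantities has J_t + 1 cells; alpha is the common concentration parameter.
    """
    k = len(quantities)
    n = sum(quantities)
    # log multinomial coefficient n! / prod(q_j!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(q + 1) for q in quantities)
    # log Gamma(k * alpha) / Gamma(n + k * alpha)
    log_ratio = math.lgamma(k * alpha) - math.lgamma(n + k * alpha)
    # sum over cells of log Gamma(q_j + alpha) / Gamma(alpha)
    log_cells = sum(math.lgamma(q + alpha) - math.lgamma(alpha) for q in quantities)
    return log_coef + log_ratio + log_cells

def grid_mle_alpha(markets, grid):
    """Return the grid value of alpha with the highest total log-likelihood."""
    return max(grid, key=lambda a: sum(dm_log_likelihood(q, a) for q in markets))

grid = [0.1, 1.0, 10.0]
alpha_sparse = grid_mle_alpha([[30, 0, 0]], grid)    # sales concentrated on one cell
alpha_even = grid_mle_alpha([[10, 10, 10]], grid)    # sales spread evenly
```

Concentrated sales favor a small $\alpha$ and evenly spread sales favor a large one, which matches the role of $\alpha$ as a concentration parameter.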

Mixed Logit
- All the theory generalizes to mixed logit models:

$$\left(\hat{\lambda}^{Bayes\prime}, \hat{\beta}^{Bayes\prime}\right)' = \arg\min_{(\lambda, \beta)} \; \bar{m}_T^{Bayes}(\lambda, \beta)' \, W_T \, \bar{m}_T^{Bayes}(\lambda, \beta), \qquad (0.1)$$

  with $\bar{m}_T^{Bayes}(\lambda, \beta) = T^{-1} \sum_{t=1}^{T} m_t^{Bayes}(\lambda, \beta)$, where

$$m_t^{Bayes}(\lambda, \beta) = J_t^{-1} \sum_{j=1}^{J_t} z_{jt} \left[\sigma_j^{-1}\left(\pi_t^{Bayes}, x_t; \lambda\right) - x_{jt}' \beta\right] \qquad (0.2)$$

  and

$$\pi_t^{Bayes} := \sigma\left(\delta_t^{post}(\lambda \mid q_t), x_t; \lambda\right). \qquad (0.3)$$

- $\sigma_j^{-1}(\pi_t, x_t; \lambda) \approx \log\left(\frac{\pi_{jt}}{\pi_{0t}}\right)$ with second order approximation (Gandhi and Nevo 2013)
  - Log of zero is the first order problem for mixed logit
  - Can use logit optimal shares as an approximation to the optimal shares in general.

Monte Carlo I: Binary Logit
- DGP
  - utility function:

$$u_{it} = \begin{cases} \alpha + \beta x_t + \xi_t + \varepsilon_{it} & \text{inside good} \\ \varepsilon_{0t} & \text{outside good} \end{cases}$$

  - random draws: $x_t \sim \text{Uniform}[0, 15]$, $\varepsilon_{it} \sim T1EV$, $\xi_t \sim N(0, .5^2) \times x_t$
  - $\beta = -1$, $\alpha$ varies to produce different fractions of zeros
- Results

Fraction of Zeros    16.48%    36.90%    49.19%    63.70%
Empirical Share      .3833     .6589     .7965     .9424
Laplace Share        .2546     .5394     .6978     .8476
Optimal Share        -.0798    -.0924    -.0066    .0362

Note: $T = 500$, $n = 10{,}000$, Number of Repetitions $= 1{,}000$.
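A scaled-down, single-replication sketch of a design like the one above, for readers who want to experiment. It rests on several assumptions not confirmed by the slides: $\xi_t$ is read as $x_t$ times an $N(0, .5^2)$ draw, the regressor serves as its own instrument (so the estimator is OLS with an intercept), the “optimal share” row is stood in for by the posterior mean under a uniform prior rather than an estimated prior, and the default `T` and `n` are much smaller than on the slide. It does not attempt to reproduce the reported numbers.

```python
import math
import random

def digamma(x):
    """Approximate digamma for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(alpha=1.0, beta=-1.0, T=300, n=200, seed=0):
    """One replication; returns (fraction of zeros, three slope estimates)."""
    rng = random.Random(seed)
    xs, qs = [], []
    for _ in range(T):
        x = rng.uniform(0.0, 15.0)
        xi = x * rng.gauss(0.0, 0.5)            # assumed reading of the xi_t draw
        delta = alpha + beta * x + xi
        p = 1.0 / (1.0 + math.exp(-delta))      # binary logit choice probability
        q = sum(rng.random() < p for _ in range(n))
        xs.append(x)
        qs.append(q)
    zero_frac = sum(q == 0 for q in qs) / T
    # Empirical shares: markets with q = 0 or q = n have no finite log odds.
    keep = [(x, q) for x, q in zip(xs, qs) if 0 < q < n]
    b_emp = ols_slope([x for x, _ in keep],
                      [math.log(q / (n - q)) for _, q in keep])
    # Laplace shares: log odds of (q + 1) / (n + 2) against (n - q + 1) / (n + 2).
    b_lap = ols_slope(xs, [math.log((q + 1) / (n - q + 1)) for q in qs])
    # Posterior mean of the log odds under a uniform Dirichlet prior.
    b_post = ols_slope(xs, [digamma(1 + q) - digamma(1 + n - q) for q in qs])
    return zero_frac, b_emp, b_lap, b_post

zero_frac, b_emp, b_lap, b_post = simulate()
```

Comparing each estimate with the true `beta` across many seeds, and across values of `alpha` that change the fraction of zeros, gives a rough feel for how the three approaches behave under these assumptions.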
