Estimating Discrete Choice Models with Market Level Zeroes: An Application to Scanner Data
Amit Gandhi, Zhentong Lu, Xiaoxia Shi
University of Wisconsin-Madison
February 3, 2015
Introduction
- Zeroes are highly prevalent in choice data
- Discrete choice models (a la McFadden) were designed to explain corner solutions in individual demand
- Our research program: the empirical analysis of choice data with market zeroes
  - Zero demand for a choice alternative after summing over the sample of consumers in a market
  - A major feature of choice data from a diversity of environments
  - Causes serious problems for standard estimation techniques
Scanner Data
- Store-level scanner data covering all Dominick's Finer Foods (DFF) stores in Chicago from 1989-1997
- ≈ 80 stores over 300 weeks
- For each week/store/UPC (universal product code) observation:
  - price
  - quantity
  - marketing (display, feature, etc.)
  - product characteristics (brand, size, premium, etc.)
  - wholesale price
Product Variety

Category              Avg No. of UPCs    % of Total Sales    % of Zero
                      per Store/Week     of Top 20% UPCs     Sales
---------------------------------------------------------------------
Analgesics                  224               80.12%           58.02%
Beer                        179               87.18%           50.45%
Bottled Juices              187               74.40%           29.87%
Cereals                     212               72.08%           27.14%
Canned Soup                 218               76.25%           19.80%
Fabric Softeners            123               65.74%           43.74%
Laundry Detergents          200               65.52%           50.46%
Refrigerated Juices          91               83.18%           27.83%
Soft Drinks                 537               91.21%           38.54%
Toothbrushes                137               73.69%           58.63%
Canned Tuna                 118               82.74%           35.34%
Toothpastes                 187               74.19%           51.93%
Bathroom Tissues             50               84.06%           28.14%
Long Tail

[Figure: long-tail distribution of sales across products]
A "Big Data" Problem
- Quan and Williams (2014): data on 13.5 million shoe sales across 100,000 products from an online retailer
A "Big Data" Problem
- Marwell (2014): collects daily data on project donations to Kickstarter

Kickstarter
  #Projects          90,876
  #Days              555
  Project-Day Obs.   2,615,839

            #Daily Project Contributions    %Zero
  Mean            4,713                      0.59
  Std. Dev        1,706                      0.14
A "not so Big Data" Problem
- Nurski and Verboven (2013): Belgian data on 488 car models in 588 towns for 2 consumer types (men and women)
Discrete Choice Model
- Classic McFadden (1973, 1980) discrete choice model
- Markets t = 1, ..., T are the store/week realizations (a menu of products, prices, and promotions)
- Products j = 1, ..., J_t with attributes x_jt ∈ R^{dx}
- Consumers i = 1, ..., N_t with "demographics" w_it ∈ R^{dw}

    u_ijt = δ_jt + w_it' Γ x_jt + ε_ijt   if j > 0
    u_ijt = ε_i0t                         if j = 0

- BLP (1995) add the new layer:

    δ_jt = x_jt' β + ξ_jt
The Zeroes Problem
- Consider the simplest case of the "simple logit" (Γ = 0):

    log(s_jt / s_0t) = x_jt' β + ξ_jt,   jt = 1, ..., JT

  where E[ξ_jt | z_jt] = 0.
- If s_jt = 0 then log(s_jt) does not exist (or can only be defined as -∞).
- However, dropping zeroes induces selection bias: E[ξ_jt | z_jt, s_jt > 0] ≠ 0.
- IV estimation is asymptotically biased for β (and the bias can be severe).
- The bias will depend on the strength of this selection effect.
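A small simulation can make the selection channel concrete. This sketch uses a hypothetical binary-logit DGP (all parameter values are illustrative, not from the paper): with finitely many consumers per market, zero shares occur disproportionately when the demand shock ξ_jt is low, so conditioning on s_jt > 0 truncates ξ from below.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_t = 5000, 30                 # markets, consumers per market (illustrative)
x = rng.uniform(0, 4, T)          # exogenous product attribute
xi = rng.normal(0, 1, T)          # demand shock with E[xi | x] = 0
delta = -4 + x + xi               # mean utility of the single inside good
p = 1 / (1 + np.exp(-delta))      # logit choice probability
s = rng.binomial(n_t, p) / n_t    # empirical market share

print("fraction of zeroes:", (s == 0).mean())
print("E[xi | s > 0]     :", xi[s > 0].mean())   # positive: selection on xi
```

The conditional mean of ξ among surviving observations is strictly positive, which is exactly the E[ξ_jt | z_jt, s_jt > 0] ≠ 0 condition on the slide.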
Questions
- Why would the model generate an estimating equation that can't be estimated with the data?
- Is it a deep rejection of the choice model?
- Is it a problem with the empirical strategy of taking the model to data?
Identification
- For simplicity, focus on the "simple logit" (Γ = 0):

    π_jt = e^{δ_jt} / Σ_{k=0}^{J_t} e^{δ_kt},   j = 0, ..., J_t

- Consumer variation:

    log(π_jt / π_0t) = δ_jt,   i.e.,   δ_jt = s_j^{-1}(π_t)

- Product/market variation:

    δ_jt = x_jt' β + ξ_jt   ⇒   β = (E[z_jt x_jt'])^{-1} E[z_jt δ_jt]
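The logit inversion is easy to sanity-check numerically. This tiny sketch (with made-up δ values, normalizing δ_0t = 0) verifies that mapping mean utilities to shares and back via log(π_jt / π_0t) is exact:

```python
import numpy as np

delta = np.array([1.0, -0.5, 2.0])             # illustrative mean utilities, delta_0t = 0
expu = np.exp(np.concatenate(([0.0], delta)))  # include the outside good
pi = expu / expu.sum()                         # logit choice probabilities
delta_rec = np.log(pi[1:] / pi[0])             # the inversion s^{-1}(pi_t)
print(np.allclose(delta_rec, delta))           # True
```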
Estimation
- Standard estimation (aka BLP) uses sample analogues of both stages.
- Replace π_jt with the MLE

    π̂_jt^{MLE} = s_jt = Σ_i y_ijt / n_t

  which implies

    δ̂_jt^{MLE} = s_j^{-1}(s_t)

- Plug δ̂_jt^{MLE} into 2SLS:

    β̂ = ( Σ_{t=1}^{T} Σ_{j=1}^{J_t} z_jt x_jt' )^{-1} Σ_{t=1}^{T} Σ_{j=1}^{J_t} z_jt δ̂_jt^{MLE}
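A sketch of this plug-in estimator on a hypothetical binary-logit DGP (one inside good; x is exogenous and used as its own instrument, so the 2SLS step collapses to a least-squares fit; all values are illustrative). Zero shares must be dropped before taking logs, and the resulting slope estimate is biased away from the true β:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_t, beta = 20000, 30, 1.0
x = rng.uniform(0, 4, T)
xi = rng.normal(0, 1, T)
delta = -4 + beta * x + xi
p = 1 / (1 + np.exp(-delta))
s = rng.binomial(n_t, p) / n_t               # empirical shares, many zeroes

keep = (s > 0) & (s < 1)                     # log(0) forces dropping corner markets
delta_hat = np.log(s[keep] / (1 - s[keep]))  # log(s_jt / s_0t) with one inside good
beta_hat = np.polyfit(x[keep], delta_hat, 1)[0]
print("true beta:", beta, "| plug-in estimate:", round(beta_hat, 3))
```

The estimate is attenuated relative to β = 1, combining the selection effect from dropping zeroes with sampling error in the empirical shares.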
What is happening?
- The source of the problem is δ̂_jt = s_j^{-1}(π̂_t^{MLE})
  - which does not exist when π̂_jt^{MLE} = 0
- Why use the MLE in the first place?
  - The MLE is potentially a bad estimator when choice data is sparse.
- This is a very old problem:
  - Laplace's "Law of Succession"
  - Multinomial cell probabilities and sparse contingency tables
- Zeroes arise when some π_jt's are small and n_t is finite.
- Treating n_t as finite but letting JT → ∞ makes π_jt, and hence δ_jt, an incidental parameter.
Bayesian Analysis of Multinomial Cells
- Consider multinomial probabilities π_t = (π_0t, ..., π_{J_t,t}) ∈ Δ^{J_t}
- We observe quantities q_t = (q_0t, q_1t, ..., q_{J_t,t}) for n_t consumers
- The likelihood of π_t is q_t ~ MN(n_t, π_t)
- The conjugate prior is π_t ~ Dir(α_0t, ..., α_{J_t,t})
  - Uniform prior: α_jt = 1 (Laplace/De Morgan)
  - Non-informative prior: α_jt = 0.5 (Jeffreys/Bernardo)
- The posterior is π_t | q_t, n_t ~ Dir(α_0t + q_0t, α_1t + q_1t, ..., α_{J_t,t} + q_{J_t,t})
Laplace's "Law of Succession"
- "What is the probability the sun will rise tomorrow given that it has risen every day until now?"
- He used a uniform prior α_jt = 1
- Bayesian estimate:

    π̂_jt = E[π_jt | q_t, n_t] = (q_jt + 1) / (n_t + J_t + 1)

- π̂_jt "shrinks" the empirical share s_jt towards the prior mean 1/(J_t + 1)
- π̂_jt is consistent (like s_jt), i.e., π̂_jt →p π_jt
  - The data dominates the prior in large samples
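The posterior mean under a symmetric Dirichlet prior is available in closed form; a minimal sketch with made-up sales counts shows the shrinkage, and that every cell (including the zero) gets strictly positive probability:

```python
import numpy as np

q = np.array([12, 0, 3, 5])   # sales counts across 4 cells, one zero (illustrative)
n = q.sum()
alpha = 1.0                   # Laplace's uniform prior
p_hat = (q + alpha) / (n + alpha * len(q))  # posterior mean of each cell probability
print(p_hat)                  # strictly positive, shrunk toward the prior mean 1/4
```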
Demand Application
- We want to estimate

    δ̂_jt = E[ log(π_jt / π_0t) | q_t, n_t ] = ψ(α_jt + q_jt) - ψ(α_0t + q_0t)

  where ψ is the digamma function.
- Use δ̂_t to compute "optimal market shares":

    π*_kt = exp(δ̂_kt) / ( 1 + Σ_{j=1}^{J_t} exp(δ̂_jt) )

- Plug the optimal shares into 2SLS:

    β̂ = ( Σ_{t=1}^{T} Σ_{j=1}^{J_t} z_jt x_jt' )^{-1} Σ_{t=1}^{T} Σ_{j=1}^{J_t} z_jt s_j^{-1}(π*_t)
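A sketch of the optimal-shares step for one market (the sales counts are illustrative; α = 0.5 is the Jeffreys prior), using scipy's digamma function:

```python
import numpy as np
from scipy.special import digamma

alpha = 0.5                          # Jeffreys prior
q = np.array([50, 0, 3, 12])         # q_0t first; one inside good sold zero units
delta_hat = digamma(alpha + q[1:]) - digamma(alpha + q[0])   # psi-based delta_hat
p_star = np.exp(delta_hat) / (1 + np.exp(delta_hat).sum())   # optimal inside shares
p0_star = 1 - p_star.sum()                                   # outside share
print("optimal inside shares:", p_star)   # all strictly positive, even the zero cell
```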
Why is this a good estimator?
- We take a "Frequentist" interpretation of the prior: an "Empirical Bayes" approach.
- The choice probabilities π_t are the endogenous variable of the structural model.
- Let z_t = (z_1t, ..., z_{J_t,t}) be the collection of exogenous variables.
- Then the conditional distribution π_t | z_t is the reduced form of the structural model.
- Prior distribution = Reduced form
Asymptotic Bias
- Finite n_t implies that β̂ will in general have asymptotic bias:

    plim_{JT→∞} β̂ = β + Q_{xz}^{-1} E[ z_jt ( s_j^{-1}(π̂_t) - s_j^{-1}(π_t) ) ]

  where Q_{xz} = E[z_jt x_jt'].

Theorem. If the optimal market shares π*_t are constructed from the "correct prior" F_{π_t | z_t} = F^0_{π_t | z_t}, then

    E[ z_jt ( s_j^{-1}(π*_t) - s_j^{-1}(π_t) ) ] = 0

- Thus optimal market shares give consistent estimates: β̂ →p β.
Robust Prior
- What happens if we are not exactly right about the prior, i.e., F_{π_t | z_t} ≈ F^0_{π_t | z_t}?
- Use the "Robust Priors" approach of Arellano and Bonhomme (ECMA 2009).

Theorem. If the prior F_t ≠ F^0_t is not exact, then

    E[ z_jt ( s_j^{-1}(π̂_t) - s_j^{-1}(π_t) ) ] = n_t^{-1} KLIC(F^0_t, F_t) + o(n_t^{-1})

- So long as the prior is sensible (and n_t is relatively large), the bias reduction will be good (and much better than the implicit MLE prior).
Dirichlet and the Long Tail
- The Dirichlet is a conjugate prior: it gives closed-form optimal shares π*_t.
- The Dirichlet prior also gives rise to the long tail, a key feature of demand data.

Theorem. If q_t ~ MN(n_t, π_t) and π_t ~ Dir(α · 1_{J_t + 1}) (symmetric Dirichlet), then (for large J_t) the quantity histogram will exhibit the long-tail shape (Pareto decay).

- A restatement of Chen (1980) on probability foundations for Zipf's Law
- α is the concentration parameter
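The long-tail shape can be eyeballed by simulating one symmetric-Dirichlet market. This sketch uses 500 products and 10,000 consumers as in the illustration slide, but the concentration value α = 0.1 is an assumed one:

```python
import numpy as np

rng = np.random.default_rng(0)
J, n = 500, 10_000                     # products and consumers, as in the illustration
alpha = 0.1                            # assumed concentration parameter
p = rng.dirichlet(alpha * np.ones(J))  # symmetric Dirichlet market shares
q = rng.multinomial(n, p)              # realized quantities

q_sorted = np.sort(q)[::-1]
top20 = q_sorted[: J // 5].sum() / n   # sales concentration in the top 20% of products
print("top-20% sales share:", top20, "| zero-sales products:", (q == 0).mean())
```

A small α concentrates sales on a few hits while leaving a long tail of products with few or zero sales, matching the scanner-data pattern in the product-variety table.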
An Illustration
- 500 products and 10,000 consumers

Figure: Zipf's Law and the Symmetric Dirichlet
Picking the Prior
- Jeffreys prior: π_t | z_t ~ Dir(0.5 · 1_{J_t + 1})
- If π_t ~ Dir(α · 1_{J_t + 1}) and q_t ~ MN(n_t, π_t), then q_t ~ DirichletMultinomial(α)
  - α̂ can be estimated with MLE.
- More generally, we can allow α_jt = g(z_jt) and estimate ĝ (built into Stata).
- We can also allow for mixtures of Dirichlet priors for increased flexibility at little analytic cost
  - The posterior is also a mixture of Dirichlet distributions.
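A sketch (not the authors' implementation) of the symmetric-α MLE step: the Dirichlet-multinomial log-likelihood is written with log-gamma functions, dropping the multinomial coefficient since it does not involve α, and maximized over a bounded interval. All simulation settings are illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def dm_negloglik(alpha, Q):
    """Negative Dirichlet-multinomial log-likelihood, symmetric Dir(alpha) prior."""
    n = Q.sum(axis=1)
    k = Q.shape[1]                      # number of cells (J_t + 1)
    ll = (gammaln(k * alpha) - gammaln(n + k * alpha)
          + (gammaln(Q + alpha) - gammaln(alpha)).sum(axis=1))
    return -ll.sum()

rng = np.random.default_rng(0)
true_alpha, T, k, n_t = 0.5, 2000, 20, 200
Q = np.stack([rng.multinomial(n_t, rng.dirichlet(true_alpha * np.ones(k)))
              for _ in range(T)])       # simulated sales vectors q_t
res = minimize_scalar(dm_negloglik, bounds=(1e-3, 10.0), args=(Q,), method="bounded")
print("alpha_hat:", round(res.x, 3))    # close to the true 0.5
```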
Mixed Logit
- All the theory generalizes to mixed logit models:

    (λ̂_Bayes', β̂_Bayes')' = arg min_{λ,β} m̄_T^Bayes(λ, β)' W_T m̄_T^Bayes(λ, β),      (0.1)

  where m̄_T^Bayes(λ, β) = T^{-1} Σ_{t=1}^{T} m_t^Bayes(λ, β), with

    m_t^Bayes(λ, β) = J_t^{-1} Σ_{j=1}^{J_t} z_jt [ s_j^{-1}(π_t^Bayes, x_t; λ) - x_jt' β ],   (0.2)

  and

    π_t^Bayes := s( δ_t^post(λ | q_t), x_t; λ ).      (0.3)

- s_j^{-1}(π_t, x_t; λ) ≈ log(π_jt / π_0t) with a second-order approximation (Gandhi and Nevo 2013)
- The log of zero is the first-order problem for mixed logit.
- Can use the logit optimal shares as an approximation to the optimal shares in general.
Monte Carlo I: Binary Logit
- DGP:
  - utility function: u_it = α + β x_t + ξ_t + ε_it (inside good); u_it = ε_0t (outside good)
  - random draws: x_t ~ Uniform[0, 15], ε_it ~ T1EV, ξ_t ~ N(0, 0.5²)
  - β = -1; α varies to produce different fractions of zeros
- Results:

    Fraction of Zeros   16.48%    36.90%    49.19%    63.70%
    Empirical Share      .3833     .6589     .7965     .9424
    Laplace Share        .2546     .5394     .6978     .8476
    Optimal Share       -.0798    -.0924    -.0066     .0362

  Note: T = 500, n = 10,000, Number of Repetitions = 1,000.
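A short script can reproduce the flavor of this experiment. This sketch fixes one parameter setting (α = 3 is an assumed value, not from the paper's grid) and contrasts the plug-in estimator that drops zeroes with a Laplace-smoothed share; both are biased upward relative to β = -1:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, beta, a = 500, 10_000, -1.0, 3.0   # a = 3 assumed, chosen to generate zeroes

def one_rep(rng):
    x = rng.uniform(0, 15, T)
    xi = rng.normal(0, 0.5, T)
    p = 1 / (1 + np.exp(-(a + beta * x + xi)))   # binary logit choice probability
    q = rng.binomial(n, p)
    keep = (q > 0) & (q < n)                     # empirical shares: drop the corners
    b_emp = np.polyfit(x[keep], np.log(q[keep] / (n - q[keep])), 1)[0]
    s_lap = (q + 1) / (n + 2)                    # Laplace smoothing, defined everywhere
    b_lap = np.polyfit(x, np.log(s_lap / (1 - s_lap)), 1)[0]
    return b_emp, b_lap

reps = np.array([one_rep(rng) for _ in range(200)])
bias = reps.mean(axis=0) - beta
print("bias (empirical, Laplace):", np.round(bias, 3))
```

Both biases are positive, as in the table; the "optimal share" row would require constructing the correct reduced-form prior, which is omitted here.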