The Role of DDA for Pricing

[Figure: The Information Continuum, running from Poor Information to Rich Information. Poor Information: means, proportions, graphs, tabs. Rich Information: regressions and elasticities, machine learning and predictive modeling. OLS regression sits between the two. Shallow Data Analysis works at the poor end of the continuum; Deep Data Analysis works at the rich end.]
Deep Data Analytics
Deep Data Analytics is the process of taking raw data bricks (Poor Information), regardless of source, and assembling and converting them into Rich Information using advanced statistical, econometric, and machine learning methods appropriate to a data set's structure.

Advantage of Deep Data Analytics
Shallow Data Analyses leave information untapped. Deep Data Analyses reveal Rich Information.
There's No Such Thing as a Free Lunch

[Figure: Cost trade-off along the Information Continuum from Poor to Rich Information: the cost of analytics rises toward DDA from a base analysis cost, while the cost of approximation falls from a base approximation cost.]
Part IV: DDA Drill-down and Case Study

Identify Data Structure

Data structure and variables are often (most of the time?) overlooked. Both have major implications.
Data structure is not multiple data tables.

[Diagram: Customer DB, Order DB, and Product DB feed a SQL Engine, which builds the Pricing Data Mart (PDM).]

This is data organization, not structure. It's a convenience for storage and parsing, not analytics.
This database framework is important, but it is not my view of data structure. The PDM built from the database structure is part of the Analytical Toolset. Data structure goes deeper: in one sense it is shallow and simple, yet in another it is complex and subtle.
Some variables impose a structure, explicitly or implicitly.

Dummy variables impose structure by dividing the data. This is an explicit structural definition.
Example: Gender, age grouping, commercial/residential, large/small.

Clustering of cases or variables also divides the data. This is an implicit structural definition. The structure is hidden or latent and must be revealed.
Example: Segmentation (not a priori).
Case Study: Overview

Stores come in different sizes (i.e., selling surfaces), from small storefronts (e.g., Mom & Pops) to "Big Box" stores (e.g., warehouse clubs). Even within one chain, store sizes vary, perhaps due to geography.
Example: Grocery stores in Manhattan vs. the suburbs of Central Jersey.
Store size impacts "sales lift" forecasting from price promotions.

If η_P^Q is the price elasticity, then η_P^TR = 1 + η_P^Q is the total-revenue elasticity.
Sales lift: LIFT = η_P^TR × (%ΔP) = (1 + η_P^Q) × (%ΔP).

Example: If η_P^Q = −1.8, then η_P^TR = −0.8. If %ΔP = −0.25, then LIFT = −0.8 × (−0.25) = 0.2, or 20%.

Overestimate the lift: storage costs or perishable losses. Underestimate the lift: stock-outs, lost sales, lost good-will.
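To make the lift arithmetic concrete, here is a minimal sketch of the calculation; the function name is hypothetical, and the example values (an elasticity of −1.8 and a 25% price cut) come from the slide above.

    def sales_lift(price_elasticity, pct_price_change):
        """Revenue lift from a price change: LIFT = (1 + eta_Q) * (%dP)."""
        revenue_elasticity = 1.0 + price_elasticity   # eta_TR = 1 + eta_Q
        return revenue_elasticity * pct_price_change

    # Values from the example: eta_Q = -1.8 and a 25% price cut.
    print(round(sales_lift(-1.8, -0.25), 2))   # 0.2, i.e., a 20% lift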
Question: What is the elasticity allowing for store size?

Retailers are moving to "customized pricing practices . . . in which pricing depends on store size and clientele." Evidence shows that retailers "price promote more intensely in their large stores," but the evidence is shaky. [2]
Large stores tend to be in suburban areas; small stores in urban areas. Available real estate is the issue.
Implication: demand is more elastic in large stores, but this is not clear-cut.
Demand may be more elastic in urban areas because of intense competition.
Demand may be more inelastic in suburban areas because of the value of time for shopping: shoppers would have to drive to the next shopping mall, which is time consuming. If the value of time outweighs a price saving, demand is more inelastic.

[2] See Haans and Gijsbrechts (2011, 428).
Side Effects by Store Size

Large Stores
  Benefits: increased parking; additional services; wider variety; one-stop shopping.
  Costs: longer distance to travel; more/longer aisles; longer checkout time; higher in-store search.

Small Stores
  Benefits: personal treatment; more competition; neighborhood focus; wider variety of stores.
  Costs: smaller variety; fewer products; frequent store entry/exit; higher store search.

These will affect price elasticities, but the impact is unclear.
Case Study: Background

Fictional data on a retail chain in the New England states.
One consumer product.
Six stores: 3 Urban (Small) and 3 Suburban (Large).
600 consumers. Each consumer's purchases and prices are averaged to one annual number, so n = 600. (Aggregation is discussed later.)
Consumer data: average price paid, household income, average purchase size.
Case Study: Pooled Model

A simple, naive model is a pooled regression based on a Stat 101 data structure. The structure is a simple rectangular array of rows and columns – a "tidy" structure in R terminology. [3]
1. Each variable in the data set is placed in its own column.
2. Each observation is placed in its own row.
3. Each value is placed in its own cell.

[Figure: A Stat 101 – Tidy Data Structure]

[3] See Wickham and Grolemund (2017). The chart and rules are on p. 149.
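As a minimal sketch of this tidy layout (the column names and values are hypothetical, chosen to mirror the case-study variables), each row is one consumer and each column one variable:

    import pandas as pd

    # One row per consumer, one column per variable, one value per cell.
    tidy = pd.DataFrame({
        "consumer_id": [1, 2, 3, 4, 5, 6],
        "avg_price":   [2.49, 2.19, 2.89, 2.59, 2.09, 2.99],   # average price paid
        "income":      [55_000, 72_000, 41_000, 65_000, 38_000, 90_000],
        "quantity":    [120, 95, 150, 110, 160, 85],           # average annual purchases
    })
    print(tidy)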
The model based on this simple data structure is: [4]

Q_i = e^{β0} × P_i^{β1} × I_i^{β2} × e^{ε_i}

or

ln Q_i = β0 + β1 × ln P_i + β2 × ln I_i + ε_i

where P_i is the price paid by the i-th consumer and I_i is that consumer's income. The model is "pooled" since the structure is simple and naive. The price elasticity is simply β1.

[4] See the Appendix for a discussion of this model.
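A minimal sketch of estimating this pooled constant-elasticity model with statsmodels, continuing the hypothetical tidy data frame above; the coefficient on the log-price term is the price elasticity β1 (the actual case-study data yield the value reported on the next slide):

    import numpy as np
    import statsmodels.formula.api as smf

    # Log-transform so the coefficients are elasticities.
    tidy["lnQ"] = np.log(tidy["quantity"])
    tidy["lnP"] = np.log(tidy["avg_price"])
    tidy["lnI"] = np.log(tidy["income"])

    pooled = smf.ols("lnQ ~ lnP + lnI", data=tidy).fit()
    print(pooled.params["lnP"])   # estimated price elasticity beta_1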
Price elasticity: −2.4
The Components of a Pricing Strategy: Stores

[Pricing Strategy summary – Pricing Structure: Uniform Pricing. Pricing Effect Assessment: η = −2.4.]
This is an example of Shallow Data Analysis.

Case Study: Dummy Variable Model
Analysts often impose structure with dummies.

Example: Segment consumers into homogeneous groups, say J segments. The groups could be a priori or derived. Pool all consumers into one model with J − 1 dummies to identify the groups.

Let Y_i = purchase intent and X_i = an individual trait or price. Then for J = 4 segments:

Y_i = β0 + β1 × X_i + γ1 × D_1 + γ2 × D_2 + γ3 × D_3 + ε_i

The effect is to shift the intercept for each segment, but maintain the same slope.

You could interact the dummies with the independent variable(s) to change the slope(s):

Y_i = β0 + β1 × X_i + γ1 × D_1 × X_i + γ2 × D_2 × X_i + γ3 × D_3 × X_i + ε_i

Or you could do both:

Y_i = β0 + β1 × X_i + γ1 × D_1 + γ2 × D_2 + γ3 × D_3 + γ4 × D_1 × X_i + γ5 × D_2 × X_i + γ6 × D_3 × X_i + ε_i

Regardless of the model specification, you can calculate elasticities.
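A minimal sketch of these three dummy-variable specifications with the statsmodels formula API, using a small hypothetical data frame (purchase intent y, driver x, and a segment label); C(segment) expands into the J − 1 dummies automatically:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: purchase intent y, a driver x (e.g., price), and a segment label.
    df = pd.DataFrame({
        "y":       [3.1, 4.2, 5.0, 2.8, 3.5, 4.1, 3.9, 4.4, 5.2, 2.5, 3.3, 4.0],
        "x":       [1.0, 2.0, 3.0, 1.2, 2.1, 3.2, 1.1, 2.3, 3.1, 1.4, 2.2, 3.3],
        "segment": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
    })

    # Intercept shifts only: x plus the J - 1 segment dummies.
    m_intercepts = smf.ols("y ~ x + C(segment)", data=df).fit()

    # Slope shifts only: segment-by-x interactions.
    m_slopes = smf.ols("y ~ x + C(segment):x", data=df).fit()

    # Both intercept and slope shifts.
    m_both = smf.ols("y ~ C(segment) * x", data=df).fit()
    print(m_both.params)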
A location dummy, as a proxy for store size, can be added to the basic store model. A model is:

Q_i = e^{β0 + γ1 × Location_i} × P_i^{β1 + γ2 × Location_i} × I_i^{β2} × e^{ε_i}

or

ln Q_i = β0 + γ1 × Location_i + β1 × ln P_i + γ2 × Location_i × ln P_i + β2 × ln I_i + ε_i

where Location_i = 1 if Suburban and 0 if Urban.

Store price elasticities: [5]
Urban: β1
Suburban: β1 + γ2

[5] See the Appendix for the elasticities.
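A minimal sketch of this location model, using synthetic data generated so that the urban and suburban elasticities roughly match the values reported on the next slide (the column names and data-generating values are assumptions, not the case-study data):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in for the case-study file: 600 consumers with a suburban flag.
    rng = np.random.default_rng(0)
    n = 600
    location = rng.integers(0, 2, n)              # 1 = Suburban, 0 = Urban
    lnP = np.log(rng.uniform(1.5, 3.5, n))
    lnI = np.log(rng.uniform(30_000, 120_000, n))
    lnQ = 1.0 - 1.2 * lnP + 0.6 * location * lnP + 0.3 * lnI + rng.normal(0, 0.1, n)
    stores = pd.DataFrame({"lnQ": lnQ, "lnP": lnP, "lnI": lnI, "location": location})

    fit = smf.ols("lnQ ~ location + lnP + location:lnP + lnI", data=stores).fit()
    urban = fit.params["lnP"]                      # beta_1
    suburban = urban + fit.params["location:lnP"]  # beta_1 + gamma_2
    print(round(urban, 2), round(suburban, 2))     # about -1.2 and -0.6 by construction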
Urban elasticity: −1.2. Suburban elasticity: −0.6.
The Components of a Pricing Strategy: Stores

[Pricing Strategy summary – Pricing Structure: 3rd-degree discrimination. Price Level: Urban: low price; Suburban: high price. Pricing Effect Assessment: Urban: η = −1.2; Suburban: η = −0.6.]
Problems with the dummy variable approach:
1. Proliferation of parameters from the dummies.
2. Inefficient use of data – it ignores a hierarchical data structure.
3. Shifts in parameters (intercepts and slopes) are fixed effects.
4. Does not allow for key drivers for the dummies. Everyone in a segment behaves the same way – but what drives that behavior?

For the stores example, not all stores (or customers) are included; only a sample of stores is used. Random effects due to sampling are excluded.
Latent Regression Analysis: Another View of Structure

A data structure may be hidden or latent. No explicit variables such as location or size. Need to uncover or reveal a latent structure. This structure is implicit in the data.
You can estimate segments and elasticities simultaneously using latent class regression. [6] This makes more efficient use of the data.
This class of models tries to find a latent variable that explains, determines, or drives variables we can measure.
Example: You cannot measure or observe religious preference, a latent variable, but it may determine consumption patterns for some products.

[6] See Paczkowski (2018) for an extensive discussion.
[Figure based on Collins and Lanza (2010, 5).]
There is a plethora of (sometimes confusing) types:
Latent Class Analysis (LCA) – the dependent variable is discrete/categorical.
Latent Profile Analysis (LPA)
Latent Regression Analysis (LRA)
All have a similar characteristic – there is a hidden, unknown factor or class or segment or group that drives or determines the results we see. What we see or observe is sometimes said to be realized. The results point to, or indicate, the latent class. Observed behaviors provide clues to the latent variables that drive those behaviors.

The Latent Regression model recognizes that a plain regression model fit to a super-population, without allowing for classes, estimates a single set of parameters across all observations – a pooled model. This may be misleading if the observations come from a number of unknown, heterogeneous groups with different parameter values θ_s. Latent Regression has been developed for the ordinary OLS-style family of responses: normal, binary, and count.

Key Assumption: There is one discrete latent variable with several classes or segments. Each individual belongs to a class, but which one is unknown. The indicator variables point to a class, but they could just as well point to several classes at once; the error associated with each indicator variable helps to hide the class. So there is a probability that an individual belongs to each class, with the probabilities summing to 1.0 over the classes.
Assume we have n objects (e.g., people or firms in a survey or database) and that the i-th object has T_i records or observations, where the number may differ for each object. The total number of observations is T = Σ_{i=1}^{n} T_i. Each object has one response for each record, y_it, t = 1, ..., T_i.
Example: A store i has T_i = 12 responses for 12 months.
The responses can be continuous, nominal, or counts. Each response appears in its own record, with an ID variable connecting them. This is a repeated measures format.

There are two types of independent variables.
1. Predictors: Q predictors, z_itq^pred, 1 ≤ q ≤ Q. The predictors may vary by object i and repeated measure t, so they are used to predict the response.
   Example: Price is a predictor.
2. Covariates: R covariates, z_ir^cov, 1 ≤ r ≤ R. The covariates do not vary by repeated measure for an object, but they do vary across objects. The covariates help predict class membership.
   Example: Firm size and other firmographics are covariates.
This is a two-level data structure. Low-level replications within a high-level object. The predictor variables are for the low-level replications. The covariates are for the high-level objects.
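A minimal sketch of this two-level, repeated-measures layout (all names and values hypothetical): the predictor varies by object and by repeated measure, while the covariate is constant within an object:

    import pandas as pd

    # Long format: one row per object (store) per repeated measure (month).
    long = pd.DataFrame({
        "store_id": [1, 1, 1, 2, 2, 2],                      # high-level object
        "month":    [1, 2, 3, 1, 2, 3],                      # low-level replication
        "price":    [2.5, 2.4, 2.6, 2.2, 2.3, 2.1],          # predictor: varies by i and t
        "sales":    [100, 110, 95, 140, 135, 150],           # response y_it
        "sq_feet":  [8000, 8000, 8000, 45000, 45000, 45000], # covariate: constant within store
    })
    print(long)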
For the latent variable, there is one variable x with K categories or levels or classes or segments, 1 ≤ k ≤ K.

The regression model is from the GLM family, with parameters differing across the latent classes. This is how we get price elasticities by segments or classes. The GLM family includes a number of members:
"In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value." [8]

[8] https://en.wikipedia.org/wiki/Generalized_linear_model. Last accessed August 11, 2015.
The model structure is not simple – there are two parts.
1. General probability structure. This explains how the responses (i.e., the dependent variable) are generated. It is a general mixture-model probability structure that defines the relationships between the exogenous, latent, and response variables: a probability density for a particular set of y_i values given a particular set of exogenous values.
2. Conditional distributions. An assumed distributional form for the response variables, which depends on the scale types of the variables concerned.

The general probability structure is:

f(y_i | z_i) = Σ_{x=1}^{K} Pr(x | z_i^cov) × f(y_i | x, z_i^pred)
             = Σ_{x=1}^{K} Pr(x | z_i^cov) × Π_{t=1}^{T_i} f(y_it | x, z_it^pred)

where the responses are assumed to be independent and f is a probability density function. The Pr(x | z_i^cov) term is a mixture weight.

Since the basic probability structure is conditional, we need the conditional distributions. For continuous dependent variables, we use the normal distribution. For nominal data, we use a binomial. For count data, we use a Poisson. The normal can be a truncated normal if y_it > 0, or a censored normal if y_it ≥ 0 but with many y_it = 0.
The latent variable probabilities (i.e., the mixture weights) are multinomial logit:

Pr(x | z_i^cov) = e^{η_{x|z_i^cov}} / Σ_{x'=1}^{K} e^{η_{x'|z_i^cov}}

where the η functions are linear-in-parameters combinations of the covariates.
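A minimal numeric sketch of these mixture weights, assuming K = 3 classes and made-up covariate values and coefficients: the linear η functions are pushed through the multinomial (softmax) transform so the class probabilities sum to 1.0 for each object:

    import numpy as np

    # Hypothetical covariate vector for one object (intercept, firm size, region dummy).
    z_cov = np.array([1.0, 0.7, 1.0])

    # Made-up coefficient vectors, one per latent class (K = 3).
    gammas = np.array([
        [ 0.2, 0.5, -0.1],
        [ 0.0, 0.0,  0.0],   # a reference class
        [-0.3, 0.8,  0.4],
    ])

    eta = gammas @ z_cov                        # linear-in-parameters eta functions
    weights = np.exp(eta) / np.exp(eta).sum()   # multinomial mixture weights
    print(weights, weights.sum())               # probabilities sum to 1.0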
Model estimation is based on maximum likelihood. [9]
Estimation requires knowing how many classes exist. We don't know this – it is part of the problem to solve.
The procedure is to specify the number of classes, K, and examine basic fit statistics to determine which models do better. Typical measures are AIC and BIC:

AIC = −2 × ln L + 2 × (Number of Parameters)
BIC = −2 × ln L + (ln N) × (Number of Parameters)

where N is the total sample size. Choose the model with the lowest AIC or BIC.

[9] Use an EM algorithm for computation (sometimes augmented by Newton-Raphson).
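As an illustration of choosing K with an information criterion – using scikit-learn's GaussianMixture as a simple stand-in, not the latent class regression software used in the case study – fit the model for several candidate K and keep the one with the lowest BIC (the data here are synthetic):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Synthetic two-column data (e.g., log price and log quantity) with three latent groups.
    X = np.vstack([
        rng.normal([0.8, 4.0], 0.2, size=(200, 2)),
        rng.normal([1.1, 3.2], 0.2, size=(200, 2)),
        rng.normal([0.6, 4.6], 0.2, size=(200, 2)),
    ])

    bics = {}
    for k in range(1, 7):
        gm = GaussianMixture(n_components=k, random_state=0).fit(X)
        bics[k] = gm.bic(X)

    best_k = min(bics, key=bics.get)
    print(bics, "-> choose K =", best_k)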
Once the parameters have been estimated, posterior class membership probabilities can be calculated. For each object, these probabilities sum to 1.0 and give the chance of the object belonging to each class. This means we have a "fuzzy" solution, unlike hierarchical clustering and decision trees, where each object is assigned to a single, unique class. [10]

[10] There are fuzzy clustering approaches, but they are not widely used.

The predictor variables are used to estimate the parameters for the latent classes – we still need to predict, profile, or characterize the classes. In an explanatory study, we want to predict class membership. In a descriptive study, we want to profile the classes based on a set of variables. Price segmentation is really a combination of the two, since profiling is just as important as predicting given the marketing mix concept.
There are two ways to handle the covariates.
1. One-step Approach: Estimate a model with both predictor and covariate variables. This involves estimating the class model and the mixing probabilities simultaneously – the framework described above. Most software in this area allows this.
2. Three-step Approach:
   1. Estimate the class model with just the predictor variables.
   2. Assign objects to a class.
   3. Estimate a separate (usually logit) model of class membership using the covariates.

The one-step approach has an advantage – it's one step! But it has major issues:
1. It is impractical when the number of covariates is large, as is typical in many explanatory studies. Each time a covariate is added or deleted, the model must be re-estimated.
2. You have to decide on the type of model: with or without covariates.
3. Most researchers view modeling as adding covariates after the classes are developed – profiling, as in the old cluster/discriminant approach.
The three-step approach involves:
1. Estimating the latent classes.
2. Assigning objects to a class using the posterior probabilities based on the observed responses and the estimated parameters from the first step. There is one posterior probability per class, and the probabilities sum to 1.0 for each object.
   Modal approach: assign each object to the class with the largest posterior probability.
   Proportional assignment is an alternative.
3. Regressing the estimated class memberships on the covariates. This assumes that the assignments are actual memberships.
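A minimal sketch of steps 2 and 3 under the modal approach, assuming a posterior-probability matrix from step 1 and object-level covariates (both simulated here): assign each object to its highest-probability class, then fit a multinomial logit of class membership on the covariates:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    # Step 1 output (simulated): posterior probabilities, one row per object, K = 3 classes.
    posteriors = rng.dirichlet(alpha=[2.0, 1.0, 1.0], size=200)

    # Step 2: modal assignment - the class with the largest posterior probability.
    assigned_class = posteriors.argmax(axis=1)

    # Step 3: regress the assigned class on object-level covariates (simulated firmographics).
    covariates = rng.normal(size=(200, 2))
    membership_model = LogisticRegression(max_iter=1000).fit(covariates, assigned_class)
    print(membership_model.coef_)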
Case Study: Several models were estimated.
Models with more than four segments had very small segment sizes (< 1%), so anything with more than four segments was dropped.
A three-segment model did well but didn't seem practical for this market, so four segments were retained.
Models were estimated with and without covariates. Without covariates, segment four was < 1%, so the covariates were kept.
Elasticity Summary

Segment   Elasticity
   1        -1.04
   2        -0.46
   3        -2.10
The Components of a Pricing Strategy: Stores Extended

[Pricing Strategy summary – Pricing Structure: Price Segmentation. Pricing Effect Assessment: by segment, η = −1.04, −0.46, −2.10.]
The LR model reveals latent structures. We have different parameter estimates by group. But the parameter estimates are not themselves functions of variables or key drivers. There is no context for the parameters or the groups.
Multilevel Structure

There are two data structures:
1. Non-Nested
2. Nested or Multilevel

Definition – Non-nested Data Structure: The data in the population are at the same level. The sample data come from a simple random sampling (SRS) process.

Definition – Nested/Multilevel Data Structure: The data in the population are hierarchical. The sample data come from a multistage sampling process.
For a non-nested data structure, the central statistical model is successive sampling from one level only. With one level, there is no context for the behavior of the measurement units in the sample.
Example: All consumers in a random sample are treated the same. Their behavior is driven solely by their traits – and the prices they see.
Variables can, of course, be aggregated or disaggregated to a different level. Aggregation or disaggregation is sometimes done to hide or avoid data complexity.
Not all data are created equal. In simple statistical analysis – say, Stat 101 descriptive statistics and basic OLS – all data are at the same level. This means that some data must be aggregated and others disaggregated to put them on the same level.

Definition – Aggregation: Taking data at a low level and redefining its values to be used at a higher level. For example, averaging income at the individual household level and using the mean as the marketing region's average household income.

Definition – Disaggregation: Taking data at a high level and redefining it to be used at a lower level. For example, dividing quarterly income by 3 and using the result as average monthly income, or taking marketing region sales and dividing by the number of states in the region to get average state sales.
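A minimal sketch of aggregation in this sense, with hypothetical household-level incomes rolled up to a region-level mean (disaggregation would move in the opposite direction, spreading a high-level total back down):

    import pandas as pd

    households = pd.DataFrame({
        "region": ["Northeast", "Northeast", "South", "South"],
        "income": [55_000, 72_000, 48_000, 61_000],
    })

    # Aggregation: household-level income -> region-level average household income.
    region_income = households.groupby("region")["income"].mean()
    print(region_income)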
Example – Time Series: Convert from one frequency [a] to another. The Case Study weekly sales (high-frequency data) of the 600 consumers were aggregated to an annual number (low-frequency data).

[a] Frequency: the number of measurements per year.

Example – Sales: Convert from store to region and vice versa.
Aggregation and disaggregation are very common. Data at one level are "moved" to another so that all the data are at one level. Standard statistical/econometric methods (e.g., OLS, ANOVA) are then applied.
But there are problems with aggregation and disaggregation, even though aggregation may make the problem "easier": there are less data to manage, and it avoids problems such as autocorrelation with time series.

Aggregation problems: Information, the input needed for decision making, is made more hidden and obscure. Recall that information is buried inside the data and must be extracted. Statistically, there is a loss of power for statistical tests and procedures.
Disaggregation problems: Data are "blown up." Statistical tests assume the observations are independent draws from a distribution, but they are not, since they share a common base, violating this key assumption. Also, the sample size is affected, since the measures are at a higher level than what the sampling was designed for.
There are also two subtle issues associated with converting to a single level:
1. The Ecological Fallacy; and
2. The Atomistic Fallacy.
Simpson's Paradox could also be an issue.

Definition – Ecological Fallacy: Aggregated data are used to draw conclusions about disaggregated units.
Example: Model sales and estimate price elasticities at the marketing-region level, then use these elasticities to price at the store level. What holds at the region level may not hold at the store level.
Each store has its own defining characteristics:
  Clientele
  SES
  Size
  Local preferences