What are we trying to solve for?

  1. What are we trying to solve for?
     - How does a visitor’s interaction with the leading auto manufacturer site during a web session influence their propensity to undertake activities that are considered ‘high value’ (HVA) for The leading auto manufacturer, e.g., Build My Own (BMO) vehicle?
     - Which non-HVA pages serve as triggers in the visitor’s mind for engaging in HVA?
     - What are the key differences in revealed behavior between one-time and repeat visitors, i.e., does the journey matter?
     Confidential

  3. Data collection plan (DCP)
     - Source: the analysis data was extracted from the Hadoop Sandbox Database (DB) environment developed by The leading auto manufacturer. The DB comprises a Hadoop cluster holding weblog data for the 5 domestic brand sites.
     - Modeling sample: in the pilot model development phase, web session data for the month of June 2013 was considered.
     - Exclusions: web sessions of zero duration (i.e., bounce visits) were held out of the analysis, as were web sessions pertaining to the other brands.
     - Model data creation: the raw data was transformed into the modeling sample by applying standard data streamlining and variable generation procedures. The sample was sorted by unique visitor and, within each visitor, by visit time in ascending order.
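
The exclusion and ordering steps in the data collection plan can be sketched as follows. This is a minimal illustration in plain Python, not the procedure actually run on the Hadoop cluster; the field names (`visitor_id`, `brand`, `start`, `duration_sec`), the brand label `"pilot"`, and the sample records are all assumptions made for the example.

```python
from operator import itemgetter

# Hypothetical session records; field names and values are illustrative only.
sessions = [
    {"visitor_id": "v2", "brand": "pilot", "start": "2013-06-03T10:00", "duration_sec": 240},
    {"visitor_id": "v1", "brand": "pilot", "start": "2013-06-12T09:30", "duration_sec": 0},
    {"visitor_id": "v1", "brand": "pilot", "start": "2013-06-02T18:15", "duration_sec": 95},
    {"visitor_id": "v3", "brand": "other", "start": "2013-06-05T12:00", "duration_sec": 300},
    {"visitor_id": "v1", "brand": "pilot", "start": "2013-06-20T21:05", "duration_sec": 410},
]


def build_modeling_sample(records, brand="pilot"):
    """Drop bounce visits and other-brand sessions, then sort by visitor and time."""
    kept = [
        r for r in records
        if r["brand"] == brand and r["duration_sec"] > 0  # exclusions
    ]
    # ISO-8601 timestamps sort chronologically as plain strings.
    return sorted(kept, key=itemgetter("visitor_id", "start"))


sample = build_modeling_sample(sessions)
for row in sample:
    print(row["visitor_id"], row["start"], row["duration_sec"])
```

Sorting on the pair (visitor, start time) keeps each visitor's sessions contiguous and in chronological order, which is the layout a panel model expects.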

  5. Intuitive considerations
     - The modeling sample had a panel data structure, with web session data for each visitor for June 2013. As expected, not all visitors came to the site at the same time points during this period, which resulted in an unbalanced panel for our analysis.
     - HVA by a visitor at a given point in time is not necessarily attributable to that visitor's activities on the site at that same point in time.
     - The visitor may have arrived at the decision over a string of previous visits during which she progressed through the AIDA journey.
     - The activities on the actual day of HVA could be merely the execution of a decision that has already been made in the visitor’s mind.
     - It was therefore imperative for us to analyze the entire ‘journey’ for each visitor rather than a single visit.
     - Our objective was not so much to predict the likelihood of HVA for the visitors as to identify the pages that trigger HVA and estimate their impact.
     - Accordingly, our outcome (decision) variable for any visitor was binary, i.e., HVA vs. no HVA at any time point.
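
The unbalanced panel and the binary HVA outcome described above can be illustrated with a small sketch. The log layout, the page names, and the `HVA_PAGES` set are hypothetical, and for simplicity the example treats each visitor-date pair as one visit, which the source does not state.

```python
from collections import defaultdict

# Assumed set of pages that count as high-value activity (illustrative).
HVA_PAGES = {"build_my_own"}

# Hypothetical page-view log: (visitor_id, visit_date, page).
page_views = [
    ("v1", "2013-06-02", "vehicle_overview"),
    ("v1", "2013-06-02", "gallery"),
    ("v1", "2013-06-20", "build_my_own"),
    ("v2", "2013-06-03", "vehicle_overview"),
    ("v3", "2013-06-05", "offers"),
    ("v3", "2013-06-09", "vehicle_overview"),
    ("v3", "2013-06-21", "offers"),
]


def build_panel(views):
    """Return {visitor: [(date, hva_flag), ...]} with visits in time order."""
    visits = defaultdict(dict)
    for visitor, date, page in views:
        previous = visits[visitor].get(date, 0)
        # A visit is flagged 1 as soon as any HVA page is viewed during it.
        visits[visitor][date] = max(previous, int(page in HVA_PAGES))
    return {visitor: sorted(by_date.items()) for visitor, by_date in visits.items()}


panel = build_panel(page_views)
one_time = sorted(v for v, rows in panel.items() if len(rows) == 1)
repeat = sorted(v for v, rows in panel.items() if len(rows) > 1)
print(panel)
print("one-time:", one_time, "repeat:", repeat)
```

Visitors contribute different numbers of rows at different dates, which is what makes the panel unbalanced; the one-time versus repeat split is the same split the two models are built on.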

  7. Modeling scheme
     - In the modeling sample, 78% of the traffic comprised “one-time” visitors, for whom there was no concept of ‘journey’. For this category, a time-invariant binary logistic regression model was developed.
     - For the 22% of visitors who recorded multiple visits, we implemented a random effects model of the following form:
       $E[y_{it} \mid X_{it}] = \alpha_i + X_{it}\beta + \varepsilon_{it}$
       where
       $y_{it}$ = purchase indicator for the $i$-th visitor in period $t$ ($y_{it} = 1$ if purchase > 0; $y_{it} = 0$ if purchase = 0)
       $\alpha_i$ = intercept term related to the $i$-th visitor
       $X_{it}$ = regressor vector (i.e., site activities data) for the $i$-th visitor in period $t$
       $\varepsilon_{it}$ = unobserved error term for the $i$-th visitor in period $t$
     - Since our decision variable is binary, we applied the following form:
       $\log\left(\dfrac{\Pr[y_{it} = 1]}{1 - \Pr[y_{it} = 1]}\right) = \alpha_i + X_{it}\beta + \varepsilon_{it}$
     - To address the correlation between the visitor-specific intercept term and the covariates, we implemented the adjustment proposed by Mundlak (1978) and later refined by Chamberlain (1982), the so-called Mundlak-Chamberlain device.
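
In Mundlak's formulation, the visitor-specific intercept is modeled as a function of the visitor-level means of the time-varying covariates, so those means are added to the regressor vector before the random effects model is fitted. The sketch below shows only that augmentation step, under assumed inputs: the row layout and the two covariates (per-visit counts of vehicle and offers page views) are hypothetical, and the estimation itself is not shown.

```python
from collections import defaultdict

# Hypothetical panel rows: (visitor_id, period, covariates), where covariates are
# illustrative per-visit page-view counts [vehicle_pages, offers_pages].
rows = [
    ("v1", 1, [2.0, 0.0]),
    ("v1", 2, [4.0, 1.0]),
    ("v3", 1, [1.0, 1.0]),
    ("v3", 2, [0.0, 3.0]),
    ("v3", 3, [2.0, 2.0]),
]


def add_mundlak_means(panel_rows):
    """Append each visitor's covariate means to every one of that visitor's rows."""
    sums = {}
    counts = defaultdict(int)
    for visitor, _, x in panel_rows:
        if visitor not in sums:
            sums[visitor] = [0.0] * len(x)
        sums[visitor] = [s + v for s, v in zip(sums[visitor], x)]
        counts[visitor] += 1
    means = {v: [s / counts[v] for s in total] for v, total in sums.items()}
    # Augmented regressors: [x_it, xbar_i]
    return [(visitor, t, x + means[visitor]) for visitor, t, x in panel_rows]


augmented = add_mundlak_means(rows)
for row in augmented:
    print(row)
```

With the means included, the coefficients on the per-visit covariates capture within-visitor variation, while the coefficients on the means absorb the part of the visitor effect that is correlated with the covariates.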

  9. Model results and inferences
     - Both models returned statistically significant, non-collinear predictors with robust fit diagnostics:
       - Binary logistic model: concordance ratio of 85.9%
       - Mundlak-Chamberlain model: Gaussian quadrature converged, with significant t-ratios
     - The overall HVA rate for repeat visitors was significantly higher than that for one-time visitors.
     - For one-time visitors, the vehicle-related pages had a strong negative influence on HVA, implying that a visitor who has already browsed one or more standard vehicle pages is generally unlikely to build their own vehicle during the same session.
     - For repeat visitors, the vehicle pages showed the opposite direction of relationship with HVA: visitors who had browsed the standard vehicle pages at some point in their journey were more likely to try out the BMO page at some point.
     - In essence, for one-time visitors the existing vehicle pages may be cannibalizing the BMO opportunity within the same session.
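
The slides do not define the concordance ratio. A common definition, assumed here, is the share of all (HVA, no-HVA) observation pairs in which the HVA observation received the higher predicted probability, with ties counted separately. The outcomes and scores below are made up for illustration and are unrelated to the reported 85.9%.

```python
def concordance(y_true, p_hat):
    """Return (concordant_share, tied_share) over all event/non-event pairs."""
    events = [p for y, p in zip(y_true, p_hat) if y == 1]
    non_events = [p for y, p in zip(y_true, p_hat) if y == 0]
    pairs = len(events) * len(non_events)
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(1 for e in events for n in non_events if e > n)
    tied = sum(1 for e in events for n in non_events if e == n)
    return concordant / pairs, tied / pairs


# Illustrative outcomes (1 = HVA) and predicted probabilities.
y = [1, 0, 1, 0, 0]
p = [0.8, 0.3, 0.4, 0.4, 0.1]

conc, tied = concordance(y, p)
print(f"concordant: {conc:.3f}, tied: {tied:.3f}")
```

The pairwise loop is quadratic in the number of observations, which is fine for a sketch; on a full month of sessions a rank-based computation would be the more practical choice.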

  11. Implications for further analytics and strategy
      - Extend the modeling methodology to The leading auto manufacturer's other brands, both to validate the existing model and to uncover brand-specific idiosyncrasies in the key HVA triggers and their relative impacts.
      - Review the design and content elements of the existing vehicle pages and the BMO page to understand what may be causing the potential cannibalization.
      - Apply advanced text-mining techniques to transform the raw content of these pages into measurable variables, and introduce them into an expanded model formulation.
      - Develop a separate model to identify the factors that determine the likelihood of one-time vs. multiple visits.
      - Develop a model to statistically establish the relationship between HVA and sales.
