Using Web Scraped Data to Construct Consumer Price Indices


Nigel Swier, NTTS Conference, 10-12 March 2015, Brussels


  1. Using Web Scraped Data to Construct Consumer Price Indices
     Nigel Swier
     NTTS Conference, 10-12 March 2015, Brussels

  2. Background
     • One of 4 “big data” pilots in ONS
     • Price collection is manually based
     • Difficulties accessing retail scanner data
     • Web scraping as a possible alternative (although it lacks quantity information)
     • More detailed, more frequent and cheaper
     • Price scraping for supermarket groceries is relatively unexplored

  3. Prototype web scrapers
     • 3 supermarkets
     • 35 CPI/RPI item categories
     • Written in Python (Scrapy)
     • Daily collection (around 6,500 price quotes)
     • Item counts monitored daily
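The daily item-count monitoring mentioned on this slide could be sketched roughly as follows. The slides do not say how the monitoring was done, so the function name `flag_count_changes`, the 20% tolerance, and the example counts are all hypothetical.

```python
def flag_count_changes(previous, current, tolerance=0.2):
    """Return categories whose item count moved by more than `tolerance`
    (as a fraction of the previous day's count).

    A category missing from `current` is treated as a count of zero,
    which is what a broken scraper would typically look like.
    The 0.2 default is an assumed threshold, not one from the slides.
    """
    flagged = {}
    for category, prev_count in previous.items():
        if prev_count == 0:
            continue  # no baseline to compare against
        curr_count = current.get(category, 0)
        change = (curr_count - prev_count) / prev_count
        if abs(change) > tolerance:
            flagged[category] = round(change, 3)
    return flagged


# Hypothetical counts for two consecutive collection days.
yesterday = {"bread": 120, "apples": 80, "whisky": 45}
today = {"bread": 118, "apples": 12, "whisky": 45}

alerts = flag_count_changes(yesterday, today)
```

A sharp fall in one category, as with `apples` here, usually points to a changed page layout or a blocked request rather than a real change in the product range, so it is worth checking before the quotes feed into an index.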

  4. Web scraping
     Rendered webpage: [screenshot]
     HTML code:
     ...... </div><div class="productLists" id="endFacets-1"><ul class="cf products line"><li id="p-254942348-3" class=" first"><div class="desc">
     <h3 class="inBasketInfoContainer"><a id="h-254942348" href="/groceries/Product/Details/?id=254942348" class="si_pl_254942348-title"><span class="image"><img src="http://img.tesco.com/Groceries/pi/121\5010044000121\IDShot_90x90.jpg" alt="" /><!----></span>Warburtons Toastie Sliced White Bread 800G </a></h3>
     <p class="limitedLife"><a href="http://www.tesco.com/groceries/zones/default.aspx?name=quality-and-freshness">Delivering the freshest food to your door- Find out more &gt;</a></p>
     <div class="descContent"><!----><div class="promo"><a href="/groceries/SpecialOffers/SpecialOfferDetail/Default.aspx?promoId=A31234788" title="All products available for this offer" id="flyout-254942348-promo-A31234788--pos" class="promoFlyout"><span class="promoImgBox"><img src="/Groceries/UIAssets/I/Sites/Retail/Superstore/Online/Product/pos/2for.png" class="promoFlyout promo" alt="Special Offer" id="flyout-254942348-promo-A31234788--posimg" /></span><em>Any 2 for £2.00</em></a><span> valid from 21/1/2014 until 10/2/2014</span></div>
     <div class="tools"><div class="moreInfo"><a href="/groceries/Product/Details/?id=254942348" class="midiFlyout" id="flyout-254942348-midi-0-"><img class="midiFlyout hd" src="http://ui.tescoassets.com/groceries/UIAssets/I/../Compressed/I_635209615845382232/Sites/Retail/Superstore/Online/Product/infoBlue.gif" alt="" title="View product information" id="flyout-254942348-midi-1-" /></a></div><!---->
     <div class="links"><ul><li><a href="http://www.tesco.com/groceries/product/browse/default.aspx?notepad=white%20sliced%20loaf%20800g&amp;N=4294793217" class="shelfFlyout active plaintooltip" id="s-tt-254942348" title="Premium White Bread"> Rest of <span class="hide">Premium White Bread <!----></span>shelf </a></li></ul></div></div></div></div>
     <div class="quantity"><div class="content addToBasket"><p class="price"><span class="linePrice"> £1.45<!----></span><span class="linePriceAbbr"> (£0.18/100g)</span></p><h4 class="hide">Add to basket</h4><form method="post" id="fMultisearch-254942348" .....
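To illustrate how a product name and shelf price can be pulled out of markup like the fragment above, here is a minimal sketch. It uses Python's standard-library `html.parser` as a stand-in for the Scrapy selectors the prototype would have used, and `SNIPPET` is an abridged copy of the slide's HTML. The class `PriceQuoteParser` and the function `parse_quote` are hypothetical names, not part of the ONS scrapers.

```python
import re
from html.parser import HTMLParser

# Abridged from the HTML on the slide: only the title link and the price.
SNIPPET = (
    '<li id="p-254942348-3" class=" first"><div class="desc">'
    '<h3 class="inBasketInfoContainer"><a id="h-254942348" '
    'href="/groceries/Product/Details/?id=254942348" '
    'class="si_pl_254942348-title"><span class="image"></span>'
    'Warburtons Toastie Sliced White Bread 800G </a></h3></div>'
    '<div class="quantity"><p class="price">'
    '<span class="linePrice"> £1.45<!----></span>'
    '<span class="linePriceAbbr"> (£0.18/100g)</span></p></div></li>'
)


class PriceQuoteParser(HTMLParser):
    """Collect the text of the title link and of the linePrice span."""

    def __init__(self):
        super().__init__()
        self._field = None  # which field is being captured, if any
        self._depth = 0     # nesting depth inside the captured element
        self.quote = {"name": "", "price": ""}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if self._field is not None:
            self._depth += 1
        elif tag == "a" and css_class.endswith("-title"):
            self._field, self._depth = "name", 1
        elif tag == "span" and css_class == "linePrice":
            self._field, self._depth = "price", 1

    def handle_endtag(self, tag):
        if self._field is not None:
            self._depth -= 1
            if self._depth == 0:
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.quote[self._field] += data


def parse_quote(html):
    """Return (product name, price in pounds) from one product listing."""
    parser = PriceQuoteParser()
    parser.feed(html)
    parser.close()
    name = " ".join(parser.quote["name"].split())
    match = re.search(r"£(\d+(?:\.\d{1,2})?)", parser.quote["price"])
    price = float(match.group(1)) if match else None
    return name, price


name, price = parse_quote(SNIPPET)
```

Note that this captures only the shelf price. The promotional text on the slide ("Any 2 for £2.00") sits in a separate element and would need its own handling.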

  5. Mapping categories

  6. Data Manipulation (Wrangling)

     ONS Item Category        Item Description                            Search Term  Correct Match
     Apples, dessert, per kg  WAITROSE PINK LADY APPLES 4S                'APPLE*'     Yes
     Apples, dessert, per kg  SAINSBURY'S APPLE, KIWI & STRAWBERRY 160G   'APPLE*'     No
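The table above shows why a wildcard search term alone is not enough to classify items. A minimal sketch of such a match follows, assuming the term is applied word by word to the item description. The slides do not say how the search terms were applied, and `matches_search_term` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matches_search_term(description, pattern):
    """True if any word of the description matches the wildcard pattern.

    Commas are treated as word separators and matching ignores case.
    """
    words = description.upper().replace(",", " ").split()
    return any(fnmatchcase(word, pattern.upper()) for word in words)


descriptions = [
    "WAITROSE PINK LADY APPLES 4S",
    "SAINSBURY'S APPLE, KIWI & STRAWBERRY 160G",
]

# Both descriptions match 'APPLE*', although only the first is a
# correct match for "Apples, dessert, per kg".
hits = [matches_search_term(d, "APPLE*") for d in descriptions]
```

Both items pass the filter, so the search term only produces candidates. The "Correct Match" column still has to come from a further check, whether manual review or additional rules.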

  7. Price quote distributions (charts: Whiskey; Onions)

  8. Experimental Monthly Indices
     • All items with index day
     • Random item from each item category with an index day (bootstrapping)
     • All items, all days
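The second variant above, a random item from each category with bootstrapping, might look roughly like the sketch below. The slides do not state the index formula, so this assumes an unweighted geometric mean of price relatives (a Jevons-type index) and averages it over repeated random draws. The function names, the replicate count, and the example relatives are all hypothetical.

```python
import math
import random


def jevons_index(relatives):
    """Unweighted geometric mean of price relatives, with base = 100.

    A price relative is the current price divided by the base-period
    price of the same item.
    """
    logs = [math.log(r) for r in relatives]
    return 100.0 * math.exp(sum(logs) / len(logs))


def bootstrap_index(relatives_by_category, replicates=1000, seed=0):
    """Average index over repeated draws of one item per category."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    categories = sorted(relatives_by_category)
    draws = []
    for _ in range(replicates):
        sample = [rng.choice(relatives_by_category[c]) for c in categories]
        draws.append(jevons_index(sample))
    return sum(draws) / len(draws)


# Hypothetical price relatives on the index day, grouped by category.
relatives = {
    "bread": [1.00, 1.02, 0.98],
    "apples": [1.10, 1.05],
    "whisky": [1.00, 0.95, 1.03],
}

monthly_index = bootstrap_index(relatives)
```

The "all items" variants would instead pass every relative to `jevons_index` directly, either for the index day only or pooled across all days in the month.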

  9. Daily Price Index (Whiskey)

  10. Next Steps
      • Experimental high frequency index
      • Analysis of mySupermarket data
      • Targeted use of web scraped data for temporal sampling project (HICP compliance)
      • Machine learning for product categorisation

  11. Acknowledgements
      • Rob Breton (Office for National Statistics)
      • Rob O’Neill (University of Huddersfield)
