
Working with Time Series: Smoothing and Imputation Frameworks to Improve Data Density

  1. Working with Time Series: Smoothing and Imputation Frameworks to Improve Data Density
     Anjali Samani, Director of Data Science
     asamani@circleup.com @anjalisamani @circleup
     September 26, 2019
     This presentation was prepared by CircleUp Network, Inc. This is for information purposes only and is not intended to be an offer or solicitation of any services or products offered by CircleUp Network, Inc. or any products or offerings of its subsidiaries or affiliates.

  2. SETTING EXPECTATIONS
     Conceptual frameworks and approach rather than low-level technical details
     What will not be covered in this presentation
     • Mathematical theory and implementation details
     What will be covered
     • When you should and should not use smoothing or denoising
     • How to correctly approach imputation for time-series datasets and quantitative evaluation of techniques
     • Downstream implications and considerations of different choices
     O’Reilly Strata Data Conference, New York, 2019 strataconf.com #stratadata

  3. OUTLINE
     Roadmap to the frameworks
     • Introductions: About CircleUp
     • Problem Statement: Why are smoothing and imputation needed?
     • Semantics: Denoising vs Smoothing
     • Smoothing Framework: To smooth or not to smooth…
     • Missing Value Imputation: How to approach
     • Imputation Framework: Making data-driven choices

  4. This presentation was prepared by CircleUp Advisors LLC for individuals believed to be Qualified Institutional Buyers and interested in learning more about CircleUp Advisors LLC and the technological platform of its affiliate. This is for information purposes only and is not intended to be an offer or solicitation of any services or products offered by CircleUp Advisors LLC or any products or offerings of its affiliate(s).

  5. CIRCLEUP
     A data-driven investment platform for the consumer sector
     [Diagram: HELIO PLATFORM, with layers labeled APPLICATIONS, MODELS, and DATA. Labels as extracted: Equity, Credit, Interpretable Insights, Funds, Fund Models, Black-Box Models; Public Sources, Partnerships, Proprietary]

  6. CIRCLEUP
     Unlocking private market investing with data
     • Helio finds, classifies, and evaluates a universe of ~1.4 million companies across more than 200 sources through a combination of practitioner, public, and partner data.
     • With this data, Helio evaluates businesses on an array of factors to spotlight breakout brands.
     [Diagram labels: Descriptive Analytics, Predictive Modeling, Prescriptive Analytics; Data Aggregation; Knowledge Graph; Data Acquisition & Ingestion; Brand Discovery]

  7. Why are smoothing and imputation needed?

  8. IF YOUR DATA IS BAD, YOUR ANALYTICS TOOLS ARE USELESS
     Your predictions are only as good as the data used by the model to learn patterns
     • A model’s ability to learn and correctly predict future outcomes is greatly influenced by the underlying data
     • Noisy, incomplete data can restrict its application to only a small set of techniques
     • Business decisions based on poor-quality data, and hence on poor modelling outcomes, can be very costly

  9. ALTERNATIVE DATA: THE NEW GAME CHANGER
     Complementary to conventional data
     • Data curated from non-traditional sources
     • Typically complementary to traditional data in terms of the additional signal that can be derived from it
     Alternative data is noisy and ephemeral

  10. ALTERNATIVE DATA: SOURCES
      Nontraditional sources
      Alternative data is often generated by
      • Connected devices and varied sensors: e.g. satellite images, geolocation data
      • Transactional systems: e.g. credit card transactions
      • Social networking sites: e.g. interests and affiliations
      • The internet: e.g. online browsing activity
      And many more…

  11. ALTERNATIVE DATA: DIAMOND IN THE ROUGH
      Mining signal from alternative data requires significant cleaning and processing
      To extract meaningful signals from alternative data, it is necessary to apply appropriate smoothing or denoising and imputation techniques to generate clean and complete time series.

  12. Denoising vs Smoothing signals

  13. DE-NOISING: MATHEMATICAL SOLUTIONS
      Removal of noise from a mixture of signal and noise to preserve maximum information
      • If the structure of the signal or noise is known, it can be explicitly modeled
      • If the statistical properties of the signal are known, identify and remove noise
      • Create a series of test functions and noisy versions of test functions to simulate signal and types of noise it may be susceptible to
      • Apply different de-noising techniques and choose the one that maximizes signal-to-noise ratio
      [Figure: Signal, Observed Signal, Noise]
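The test-function procedure on this slide can be sketched roughly as follows. This is an illustration only, not the presenter's code: the sine test function, the Gaussian noise level, the two candidate filters, and the helper names `snr_db`, `moving_average`, and `moving_median` are all assumptions made for the example.

```python
import numpy as np

def snr_db(clean, estimate):
    """Signal-to-noise ratio (dB) of an estimate against the known clean signal."""
    residual = estimate - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

def moving_average(x, window):
    """Centered moving average; edges are padded by repeating the end values."""
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def moving_median(x, window):
    """Centered moving median with the same edge padding (window must be odd)."""
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)

# Hypothetical test function and a noisy version of it.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + rng.normal(0.0, 0.3, t.size)

# Candidate de-noising techniques (assumed for illustration).
candidates = {
    "moving_average_11": lambda x: moving_average(x, 11),
    "moving_median_11": lambda x: moving_median(x, 11),
}

# Score every candidate against the known clean signal and keep the best.
scores = {name: snr_db(clean, f(noisy)) for name, f in candidates.items()}
best = max(scores, key=scores.get)
print(best, {name: round(score, 2) for name, score in scores.items()})
```

The comparison is only possible because the clean test function is known; with real alternative data that reference is usually unavailable.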

  14. DE-NOISING: ENGINEERING SOLUTION
      Redundancy in data collection can help with noise removal
      • If the system is stable, build redundancy in data collection
      May not be possible with ephemeral data
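One way to picture why redundancy helps: averaging several independent noisy readings of the same stable signal shrinks the noise. The sketch below is a synthetic illustration under assumed conditions (16 hypothetical sensors, independent Gaussian noise), not a description of any system in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# A stable underlying signal observed by 16 hypothetical redundant sensors,
# each adding its own independent Gaussian noise.
signal = np.linspace(0.0, 1.0, 200)
readings = signal + rng.normal(0.0, 0.5, size=(16, signal.size))

# Error of a single sensor versus the error of the across-sensor average.
single_err = float(np.std(readings[0] - signal))
averaged_err = float(np.std(readings.mean(axis=0) - signal))
print(single_err, averaged_err)
```

With independent noise, the error of the average falls roughly with the square root of the number of readings, which is why the approach depends on being able to collect the same observation more than once.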

  15. DE-NOISING
      Not easy, but possible
      • Advantage: Greater confidence in the processed signal and its information content
      • Disadvantage: Can be challenging; requires technical and domain expertise
      Likely unsuitable for alternative data: the nature of signal and noise is typically unknown

  16. SMOOTHING: SIGNAL OR NOISE?
      When it is difficult to know what is signal and what is noise
      [Figure: growth in signal for an emerging brand]

  17. SMOOTHING: SIGNAL OR NOISE?
      When it is difficult to know what is signal and what is noise
      [Figure: growth in signal for an emerging brand. Annotation: Brand shows up on our radar]

  18. SMOOTHING: SIGNAL OR NOISE?
      When it is difficult to know what is signal and what is noise
      [Figure: growth in signal for an emerging brand. Annotations: Brand shows up on our radar; Signal or Noise?]

  19. SMOOTHING: SIGNAL OR NOISE?
      When it is difficult to know what is signal and what is noise
      [Figure: growth in signal for an emerging brand. Annotations: Brand shows up on our radar; Signal or Noise?; Actual future values]

  20. SMOOTHING: SIGNAL OR NOISE?
      Reduction of excess variance from the data to highlight patterns and trends
      • Smoothing can be a little experimental in nature
      • Difficult to distinguish legitimate observations from noisy outliers
      [Figure: growth in signal for an emerging brand. Annotations: Brand shows up on our radar; Signal or Noise?]

  21. SMOOTHING: SIGNAL OR NOISE?
      Smoothing can hide the very patterns you may want to identify
      [Figure: growth in signal for an emerging brand. Series: Unsmoothed values, Smoothed values. Annotation: Brand shows up on our radar]

  22. SMOOTHING: SIGNAL OR NOISE?
      Smoothing can hide the very patterns you may want to identify
      [Figure: growth in signal for an emerging brand. Series: Unsmoothed values, Smoothed values. Annotations: Brand shows up on our radar; Hypothetical Growth Threshold for investment]

  23. SMOOTHING: SIGNAL OR NOISE?
      Smoothing can hide the very patterns you may want to identify
      Cost of missed opportunities can far outweigh any time-saving benefits of smoothing
      [Figure: growth in signal for an emerging brand. Series: Unsmoothed values, Smoothed values, Actual future values. Annotations: Brand shows up on our radar; Hypothetical Growth Threshold for investment]
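The effect described on this slide can be reproduced on a toy series. The sketch below assumes a synthetic brand signal growing 15% per week, a 6-week trailing mean as the smoother, and a hypothetical investment threshold of 100; none of these numbers come from the talk, and `first_crossing` and `trailing_mean` are helper names invented for the example.

```python
import numpy as np

def first_crossing(series, threshold):
    """Index of the first value at or above the threshold, or None if never reached."""
    hits = np.flatnonzero(series >= threshold)
    return int(hits[0]) if hits.size else None

def trailing_mean(x, window):
    """Trailing moving average over past and current values only.

    The first window-1 entries are NaN because a full window is not yet available.
    """
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

weeks = np.arange(30)
raw = 10.0 * 1.15 ** weeks            # synthetic signal for an emerging brand
smoothed = trailing_mean(raw, window=6)

threshold = 100.0                     # hypothetical growth threshold for investment
raw_week = first_crossing(raw, threshold)
smooth_week = first_crossing(smoothed, threshold)
print(raw_week, smooth_week)
```

On this synthetic series the smoothed values reach the threshold later than the unsmoothed ones, which is the kind of delay that turns into a missed opportunity when the decision rule is applied to the smoothed series.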

  24. DANGERS OF SMOOTHING
      Smoothing can be misleading, and the cost of missed opportunities can be high
      • Smoothing can make a series appear less volatile than it is
      • It may also mask the very patterns a practitioner is seeking to identify
      Example borrowed from https://serialmentor.com/dataviz/visualizing-trends.html
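A quick way to see the first point is to compare a simple volatility measure before and after smoothing. The sketch below uses a synthetic trend-plus-noise series and a 7-point moving average, and takes the standard deviation of period-to-period changes as the volatility measure; all of these are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic series: a linear trend plus Gaussian noise.
n = 300
raw = np.linspace(50.0, 80.0, n) + rng.normal(0.0, 5.0, n)

# 7-point moving average; "valid" drops the edges where the window is incomplete.
window = 7
smoothed = np.convolve(raw, np.ones(window) / window, mode="valid")

def volatility(series):
    """Standard deviation of period-to-period changes."""
    return float(np.std(np.diff(series)))

raw_vol = volatility(raw)
smoothed_vol = volatility(smoothed)
print(raw_vol, smoothed_vol)
```

The smoothed series reports a much lower volatility even though the underlying process has not changed, so any risk or stability judgment made on the smoothed values understates the variation in the raw data.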
