
Prophet: Forecasting at Scale. Sean J. Taylor and Ben Letham, Facebook

  1. Prophet Forecasting at Scale Sean J. Taylor and Ben Letham Facebook / Core Data Science

  2. Outline • Motivation and requirements • Review of forecasting methods • Curve-fitting as forecasting • Uncertainty estimation • Tuning parameters (image: https://xkcd.com/605/)

  3. Motivation

  4. Background • We have many applications that require forecasts. • Often even a single metric must be forecast many times (e.g., once for each country). • Few people have forecasting training or experience. • There are few existing solutions or tools.

  5. Many applications Capacity planning • How many servers, employees, meals, parking spaces, etc., are we going to need? Goal setting • How much would a metric grow by next year if we did nothing at all? Anomaly detection • Is this spike in bug reports due to some actual problem or because it’s a holiday in Brazil? Stuff we haven’t thought of yet • Forecasts can become components in complex data pipelines.

  6. Forecasting experience is uncommon Often people train for: • training and deploying predictive models • working with text, image, and graph data • experimentation (A/B testing) • data visualization • deep learning

  7. Pareto principle for forecasting • 80% of applications can be handled by a relatively constrained class of models. • We don’t target the “top of the market”: very complex forecasting problems that can benefit from the most advanced approaches (e.g., LSTMs). • We focus on scaling to more applications by making forecasting quick, simple, and repeatable for human analysts to use. • We focus on scaling to more users by making the tool easy to use for beginners, with a path to improve models for experts. [Figure: cumulative share of applications, ordered from lowest to highest complexity.]

  8. Prophet: (semi-)automate forecasting • Find similarities across forecasting problems. • Build a tool that can solve most of them. • Make it easy to use, and teach everyone to use it. • Give a path forward for improving forecasts.

  9. Implementation • Python and R packages • CRAN: prophet • PyPI: fbprophet • Core procedure implemented in Stan (a probabilistic programming language). • Version 0.1 released Feb 2017 • Version 0.4 released Dec 2018 • More than 8,000 GitHub stars
     Python API:
     >>> from fbprophet import Prophet
     >>> m = Prophet()
     >>> m.fit(data)  # data: pandas DataFrame with a date column `ds` and a value column `y`
     >>> future = m.make_future_dataframe(periods=365)  # history plus 365 future dates
     >>> forecast = m.predict(future)  # includes yhat, yhat_lower, yhat_upper per date

  10. Review of time series methods

  11. AR and MA models [Figure: dependency diagrams linking noise terms $\epsilon_0, \epsilon_1, \epsilon_2$ to observations $X_0, X_1, X_2$ for four processes: white noise, MA(1), AR(1), and ARMA(1,1).]

  12. ARMA models • ARMA(p, q): $X_t = \sum_{i=1}^{p} \alpha_i X_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \epsilon_t$ • A special case of ARIMA models with no integration (i.e., no initial differencing step). • Problem: the parameters don’t correspond to any human-interpretable properties of the time series.
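To make the ARMA(p, q) recursion concrete, here is a minimal pure-Python sketch that simulates it. The function `simulate_arma` is hypothetical (not from any library), and it assumes zero pre-sample values and standard-normal innovations. The four processes diagrammed on the previous slide are special cases: white noise is `([], [])`, MA(1) is `([], [theta])`, AR(1) is `([alpha], [])`, and ARMA(1,1) is `([alpha], [theta])`.

```python
import random


def simulate_arma(alphas, thetas, n, seed=0):
    """Simulate X_t = sum_i alpha_i X_{t-i} + sum_i theta_i eps_{t-i} + eps_t.

    Assumption of this sketch: X and eps are zero before t = 0, and the
    innovations eps_t are i.i.d. standard normal.
    Returns (xs, eps) so the innovations can be inspected.
    """
    rng = random.Random(seed)
    xs, eps = [], []
    for t in range(n):
        e = rng.gauss(0.0, 1.0)  # white-noise innovation eps_t
        # Autoregressive part: weighted sum of past observations.
        ar = sum(a * xs[t - i] for i, a in enumerate(alphas, start=1) if t - i >= 0)
        # Moving-average part: weighted sum of past innovations.
        ma = sum(th * eps[t - i] for i, th in enumerate(thetas, start=1) if t - i >= 0)
        xs.append(ar + ma + e)
        eps.append(e)
    return xs, eps
```

For example, `simulate_arma([0.8], [0.5], 200)` produces an ARMA(1,1) path; the values 0.8 and 0.5 do not directly tell an analyst what the series’ trend or seasonal pattern looks like, which is the problem the slide points out.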

  13. Exponential smoothing $S_t = \alpha X_t + (1 - \alpha) S_{t-1}$
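Simple exponential smoothing is a one-line recursion. A sketch in Python, assuming the common convention of initialising the level with the first observation; the function name is hypothetical:

```python
def exponential_smoothing(xs, alpha, s0=None):
    """Simple exponential smoothing: S_t = alpha * X_t + (1 - alpha) * S_{t-1}.

    If s0 is None, the level starts at X_0 (an assumed convention), so S_0 = X_0.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = xs[0] if s0 is None else s0
    smoothed = []
    for x in xs:
        # New level is a weighted average of the new observation and the old level.
        level = alpha * x + (1.0 - alpha) * level
        smoothed.append(level)
    return smoothed
```

Its forecast for every future step is the last smoothed level, so on its own it cannot extrapolate a trend.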

  14. Double exponential smoothing $S_t = \alpha X_t + (1 - \alpha)(S_{t-1} + B_{t-1})$, $B_t = \beta (S_t - S_{t-1}) + (1 - \beta) B_{t-1}$
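A sketch of the two coupled recursions above, which add a trend term $B_t$ to the level $S_t$. The initialisation ($S_0 = X_0$, $B_0 = X_1 - X_0$) and the h-step forecast $S_T + h B_T$ follow the usual Holt’s linear method convention; both are assumptions of this sketch rather than something the slide specifies, and the function name is hypothetical.

```python
def double_exponential_smoothing(xs, alpha, beta, horizon):
    """Level/trend smoothing followed by an h-step linear forecast.

    S_t = alpha * X_t + (1 - alpha) * (S_{t-1} + B_{t-1})
    B_t = beta * (S_t - S_{t-1}) + (1 - beta) * B_{t-1}
    Assumed initialisation: S_0 = X_0, B_0 = X_1 - X_0.
    Returns [S_T + h * B_T for h = 1..horizon].
    """
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    level, trend = xs[0], xs[1] - xs[0]
    for x in xs[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Extrapolate the final level along the final trend.
    return [level + h * trend for h in range(1, horizon + 1)]
```

On a perfectly linear series such as `[2 * t + 1 for t in range(10)]`, the level and trend stay on the line, so the forecast continues it (21, 23, 25, ...).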

  15. Business time series features • outliers • multiple seasonalities • changes in trends • abrupt changes

  16. Parameters should capture structure

  17. Curve fitting • “Curve Fitting by Segmented Straight Lines”, Bellman and Roth (1969)

  18. Additive model y(t) = piecewise trend(t) + seasonality(t) + holiday effects(t) + i.i.d. noise
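The additive decomposition can be illustrated by building a synthetic daily series from its parts. Every component below (a slope change at t = 50, a single weekly sine term, one-day bumps on days 25 and 60) is a hypothetical toy choice, not anything Prophet fits:

```python
import math
import random


def toy_trend(t):
    # Hypothetical trend: slope 1.0 until t = 50, then slope 0.5 (continuous at t = 50).
    return 1.0 * t if t < 50 else 50.0 + 0.5 * (t - 50)


def toy_seasonality(t, period=7.0, amplitude=3.0):
    # Hypothetical weekly pattern made of a single sine term.
    return amplitude * math.sin(2.0 * math.pi * t / period)


def toy_holidays(t, holiday_days=(25, 60), effect=10.0):
    # Hypothetical one-day bump on each listed day.
    return effect if t in holiday_days else 0.0


def simulate_additive(n, noise_sd=1.0, seed=0):
    """y(t) = trend(t) + seasonality(t) + holiday effects(t) + i.i.d. Gaussian noise."""
    rng = random.Random(seed)
    return [
        toy_trend(t) + toy_seasonality(t) + toy_holidays(t) + rng.gauss(0.0, noise_sd)
        for t in range(n)
    ]
```

Because the terms simply add, each fitted component can be plotted and reasoned about separately.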

  19. Polynomials • Polynomials are a natural choice for fitting curves. • We can control the complexity of the fit using the degree of the polynomial. • But polynomials are terrible at extrapolation.

  20. Splines • Splines are piecewise polynomial curves. • They can have lower interpolation error than polynomials with fewer terms.

  21. Piecewise linear • The main curve that Prophet uses is piecewise linear. • These curves are simple to fit and tend to extrapolate well. • The hard part is deciding which “knots” or changepoints to use.
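A sketch of a continuous piecewise-linear trend, assuming the formulation described in the Forecasting at Scale paper: the slope is $k$ plus the changes $\delta_j$ for every changepoint $s_j \le t$, and the offset is adjusted by $-s_j \delta_j$ so the curve stays continuous at each changepoint. The function and argument names are illustrative.

```python
def piecewise_linear(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend g(t).

    k: initial slope, m: initial offset,
    changepoints: times s_j, deltas: slope changes delta_j applied at s_j.
    g(t) = (k + sum_{s_j <= t} delta_j) * t + (m + sum_{s_j <= t} (-s_j * delta_j))
    """
    slope, offset = k, m
    for s, d in zip(changepoints, deltas):
        if t >= s:
            slope += d
            # Shift the intercept so the two line segments meet at t = s.
            offset -= s * d
    return slope * t + offset
```

For example, with `k=1.0, m=0.0` and one change of `+2.0` at `t=5.0`, the curve passes through 4, 5, 8 at t = 4, 5, 6: slope 1 before the knot, slope 3 after, with no jump.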

  22. Changepoint selection • We generate a grid of potential changepoints. • Each changepoint is an opportunity for the underlying curve to change its slope. • We apply a Laplace prior (equivalent to an L1 penalty) to the slope changes, which favors simpler curves.

  23. Changepoints in action • The Laplace prior is tuned using a prior scale that is an input to the procedure. • Smaller prior scales result in fewer changepoints and less flexible curves. • Notice how the trend line does not vary much!
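Why a Laplace prior selects changepoints can be seen in a deliberately simplified setting. Assume (for this sketch only) that each slope change is observed directly with Gaussian noise of variance σ² and estimated independently of the others. Then the MAP estimate under a Laplace(0, τ) prior minimises $(x - \delta)^2 / (2\sigma^2) + |\delta| / \tau$, which is soft-thresholding at $\sigma^2 / \tau$: small changes become exactly zero, and a smaller prior scale τ zeroes out more of them. In the Prophet packages this scale is exposed as `changepoint_prior_scale`; the function below is hypothetical.

```python
def map_rate_change(observed_change, noise_var, prior_scale):
    """MAP estimate of one slope change under a Laplace(0, prior_scale) prior.

    Toy assumption: the change is observed directly with Gaussian noise of
    variance noise_var. The minimiser is soft-thresholding with
    threshold noise_var / prior_scale.
    """
    threshold = noise_var / prior_scale
    magnitude = max(abs(observed_change) - threshold, 0.0)
    return magnitude if observed_change >= 0 else -magnitude


changes = [0.05, -0.3, 0.8, 0.02]
loose = [map_rate_change(x, 0.05, 0.5) for x in changes]   # threshold 0.1: two changes survive
tight = [map_rate_change(x, 0.05, 0.05) for x in changes]  # threshold 1.0: all set to zero
```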

  24. Seasonality • A partial Fourier sum can approximate an arbitrary periodic signal. • For a period $P$, we generate $N$ pairs of terms using the following periodic equation: $s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)$ • Coefficient parameters are fit to data.
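A sketch of the Fourier construction: the $2N$ features are fixed functions of $t$, so once they are computed, fitting the coefficients $a_n, b_n$ is a linear regression problem. The function names are hypothetical, and $t$ is assumed to be measured in days (so $P = 7$ for weekly and $P = 365.25$ for yearly seasonality).

```python
import math


def fourier_features(t, period, n_terms):
    """[cos(2*pi*1*t/P), sin(2*pi*1*t/P), ..., cos(2*pi*N*t/P), sin(2*pi*N*t/P)]."""
    feats = []
    for n in range(1, n_terms + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.extend([math.cos(angle), math.sin(angle)])
    return feats


def seasonality(t, period, coeffs):
    """s(t) = sum_n a_n cos(2*pi*n*t/P) + b_n sin(2*pi*n*t/P), with coeffs = [(a_1, b_1), ...]."""
    feats = fourier_features(t, period, len(coeffs))
    return sum(a * feats[2 * i] + b * feats[2 * i + 1] for i, (a, b) in enumerate(coeffs))
```

Raising `n_terms` lets the seasonal curve change faster within a period, at the cost of more parameters to fit.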

  25. Estimating uncertainty Three sources of uncertainty: • irreducible noise ($\sigma$) • parameter uncertainty • trend forecast uncertainty

  26. Irreducible uncertainty • Anything Prophet cannot fit is modeled as mean-zero i.i.d. random noise. • This creates tube-shaped uncertainty in the forecast. • Large uncertainty indicates the model has fit the historical data poorly.
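If the only uncertainty is i.i.d. noise, the interval has the same width at every date, which is the tube shape described above. A sketch, assuming Gaussian noise whose standard deviation is estimated as the root-mean-square of mean-zero residuals (a stand-in for the noise scale the model estimates); the 0.8 default mirrors Prophet’s `interval_width` default, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def noise_interval(yhat, residuals, width=0.8):
    """Constant-width ('tube') interval yhat +/- z * sigma for each forecast point.

    sigma is the RMS of the residuals, assuming they are mean-zero Gaussian noise.
    """
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    z = NormalDist().inv_cdf(0.5 + width / 2.0)  # e.g. the 90th percentile for width=0.8
    return [(y - z * sigma, y + z * sigma) for y in yhat]
```

A poor historical fit inflates the residuals, which inflates `sigma` and widens every interval equally.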

  27. Parameter uncertainty • Every parameter we fit in the model has sampling variance. • This includes all seasonalities, trends, and changepoints. • Optionally, Stan’s built-in HMC implementation can be used to sample draws from the posterior. (Figure credit: Thomas Wiecki)
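Given posterior draws, parameter uncertainty turns into forecast intervals by predicting once per draw and taking quantiles at each date. In the Prophet packages, MCMC sampling is switched on with the `mcmc_samples` argument; the sketch below instead uses hypothetical Gaussian draws of a single slope as a stand-in for posterior samples, and all names are illustrative.

```python
import random


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def interval_from_draws(slope_draws, intercept, horizon, width=0.8):
    """Per-step (lower, upper) for yhat(t) = intercept + k * t, one trajectory per draw k."""
    lo_q, hi_q = (1 - width) / 2, 1 - (1 - width) / 2
    out = []
    for t in range(1, horizon + 1):
        preds = [intercept + k * t for k in slope_draws]
        out.append((quantile(preds, lo_q), quantile(preds, hi_q)))
    return out


rng = random.Random(0)
draws = [rng.gauss(1.0, 0.1) for _ in range(1000)]  # stand-in for posterior samples of the slope
bands = interval_from_draws(draws, intercept=0.0, horizon=5)
```

Because slope uncertainty is multiplied by $t$, these bands widen with the forecast horizon.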

  28. Trend uncertainty [Figure: distribution of simulated future trends, with one large trend change annotated.]

  29. Trend change simulation • At each date in the forecast, we allow the trend to change. • How often changes occur is estimated from how many changepoints were selected in the history. • The distribution of change sizes is estimated from the magnitudes of the historical changes.
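A Monte Carlo sketch of this simulation. The specific choices are assumptions of this sketch: a change occurs at each future step independently with probability equal to (number of nonzero historical changes) / (history length), and its size is drawn from Laplace(0, λ) with λ equal to the mean absolute historical change. The function name and arguments are hypothetical.

```python
import random


def simulate_future_trends(hist_deltas, hist_length, last_value, last_slope,
                           horizon, n_sims=500, seed=0):
    """Simulated future trend paths whose slope may change at each future step."""
    rng = random.Random(seed)
    nonzero = [d for d in hist_deltas if d != 0.0]
    p = len(nonzero) / hist_length  # per-step probability of a trend change
    lam = sum(abs(d) for d in nonzero) / len(nonzero) if nonzero else 0.0
    paths = []
    for _ in range(n_sims):
        value, slope, path = last_value, last_slope, []
        for _ in range(horizon):
            if rng.random() < p:
                # Laplace(0, lam) draw: difference of two exponentials with mean lam.
                slope += rng.expovariate(1.0 / lam) - rng.expovariate(1.0 / lam)
            value += slope
            path.append(value)
        paths.append(path)
    return paths
```

Because a slope change keeps affecting every later step, the spread of the simulated paths grows with the horizon, giving the cone shape in the figure.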

  30. Tuning If you run a forecasting procedure and you don’t like the forecast, what can you do? • Adjust the input data you supply. • Manually edit the results in a spreadsheet. • Change the parameters you used for your model.

  31. Changepoint prior scale • Controls how likely we are to include changepoints in the model. • Controls the flexibility of the curve. • Rigid curves: large i.i.d. errors (tube-shaped uncertainty). • Flexible curves: large trend uncertainty (cone-shaped uncertainty).

  32. Seasonality prior scale • Regularizes the parameters on the Fourier expansion. • Overfitting seasonality can also be controlled by turning off various types of seasonal patterns or using fewer Fourier terms.
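The regularization here comes from a Normal prior on the Fourier coefficients (exposed in the Prophet packages as `seasonality_prior_scale`). In the same simplified setting used for changepoints, assuming a coefficient observed directly with Gaussian noise of variance σ², the MAP estimate under a Normal(0, τ²) prior is $x \tau^2 / (\tau^2 + \sigma^2)$: proportional, ridge-style shrinkage rather than the exact zeros a Laplace prior produces. The function is hypothetical.

```python
def map_fourier_coefficient(observed_coef, noise_var, prior_scale):
    """MAP estimate of one Fourier coefficient under a Normal(0, prior_scale**2) prior.

    Toy assumption: the coefficient is observed directly with Gaussian noise of
    variance noise_var. Minimising (x - b)^2 / (2 * noise_var) + b^2 / (2 * prior_scale^2)
    shrinks x toward zero by the factor prior_scale^2 / (prior_scale^2 + noise_var).
    """
    shrink = prior_scale ** 2 / (prior_scale ** 2 + noise_var)
    return shrink * observed_coef
```

With `noise_var=1.0`, a prior scale of 1 halves an observed coefficient of 2.0, while a prior scale of 10 leaves it at about 1.98.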

  33. Capacities • Piecewise logistic growth curves have a capacity parameter that we do not fit from data. • Often we can use obvious constraints as upper and lower bounds on forecasts. • The user can specify the capacity as a constant or as a time series.
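A minimal logistic growth curve with a user-supplied capacity, either a constant or a function of time. This sketch omits the changepoints that the piecewise version adds, and the function name is hypothetical. (In the Prophet packages, the capacity is supplied as a `cap` column in the input data frame, and a lower bound as a `floor` column.)

```python
import math


def logistic_trend(t, capacity, k, m):
    """Logistic growth g(t) = C(t) / (1 + exp(-k * (t - m))).

    capacity: a number, or a callable t -> C(t) for a time-varying capacity.
    k (growth rate) and m (offset) would be fit from data; here they are inputs.
    """
    c = capacity(t) if callable(capacity) else capacity
    return c / (1.0 + math.exp(-k * (t - m)))
```

Far into the future the curve approaches the capacity instead of growing without bound, which is why a known market size or population makes a useful upper bound.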

  34. Holidays • Recurring events that can’t be modeled by smooth curves. • Users can configure custom holidays and events with their own dates. • We also provide standard holidays for dozens of countries.
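One way to feed holidays into a regression is a 0/1 indicator column for each holiday and day offset, so each day around the event gets its own coefficient. The `(name, date, lower_window, upper_window)` tuples below mirror the columns of Prophet’s holidays data frame, but the function itself is a hypothetical sketch, not Prophet’s internal implementation.

```python
from datetime import date, timedelta


def holiday_features(dates, holidays):
    """Return {column_name: [0/1 per date]} with one column per (holiday, day offset).

    holidays: list of (name, holiday_date, lower_window, upper_window), where
    lower_window <= 0 <= upper_window are day offsets around holiday_date.
    Repeated occurrences of the same holiday (e.g. across years) share columns.
    """
    columns = {}
    for name, day, lower, upper in holidays:
        for offset in range(lower, upper + 1):
            key = f"{name}_{offset:+d}"  # e.g. "christmas_-1", "christmas_+0"
            target = day + timedelta(days=offset)
            col = columns.setdefault(key, [0] * len(dates))
            for i, d in enumerate(dates):
                if d == target:
                    col[i] = 1
    return columns
```

For example, a `("christmas", date(2018, 12, 25), -1, 1)` entry yields separate columns for Christmas Eve, Christmas Day, and the day after.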

  35. Takeaways • Forecasting “at scale” is 25% a technology problem and 75% a people problem. • Prophet is a simple model (with some tricks) but covers many important use cases at Facebook and elsewhere. • Simple is good! Prophet works robustly and fails in understandable ways. • Curve fitting with interpretable parameters lets users put their domain knowledge into forecasts.

  36. Conclusions Try out Prophet! https://facebook.github.io/prophet/ • Give us feedback, both when it works well and when it doesn’t. Contribute to the project! • We welcome pull requests :) Read our paper! • “Forecasting at Scale” (The American Statistician): https://peerj.com/preprints/3190/
