Blog: A Novel Approach for Forecasting With Future Indicators
Time series modeling and forecasting is of fundamental importance in many practical domains, and it has been an active research area for many years. Many important models have been proposed in the literature to improve the accuracy and efficiency of time series modeling and forecasting. Forecasting is also a data science task that is central to many activities within an organization. For instance, large organizations like Apple must allocate scarce resources and set goals in order to measure performance relative to a baseline.
The Financial Planning & Analysis team at Corporate has used driver-based arithmetic models for years to predict global cash flows. Our goal was to explore the most modern techniques, validate them against traditional methods, and develop a global cash forecast that is more robust, flexible, and consistent than existing business practices. Producing high-quality forecasts is not an easy problem for either machines or analysts. We observed two main themes in the practice of creating business forecasts:
- Completely automatic forecasting techniques can be brittle and they are often too inflexible to incorporate useful assumptions or heuristics
- Analysts who can produce high quality forecasts are quite rare because forecasting is a specialized data science skill requiring substantial experience
Our goal was to forecast cash flows for Accounts Payable (AP) and its sub-components over a 90-day (13-week) horizon, using machine learning techniques driven by business drivers.
An accurate forecast model offers several benefits:
- Corporate finance can allocate the necessary budget and invest the rest in other revenue-earning verticals
- If the forecast is far off, the teams can run a root cause analysis of the discrepancy using the future indicators provided by the model
- The model improves continuously, as it captures holiday effects and shifts in business ideology
Our solution for the Accounts Payable (AP) cash forecast:
The data came from several sources. We carried out data cleansing to make the data usable for modeling and exploratory data analysis (EDA), and an integrity analysis to validate the source data against bank statement actuals. We analyzed 58 independent variables for significant dependence on the payment amount (USD_AMT). Of these, pay term is the major driver of AP payments.
Pay terms are categorized into three buckets, as shown below.
- Lag 0 (posted and paid in the same month): all pay terms with values of 0–30 days fall into this category
- Lag 1 (posted and paid one month apart): all pay terms with values of 30–60 days fall into this category
- Lag 2 (posted and paid two months apart): all pay terms with values of 60–90 days or more fall into this category
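The bucketing rule above can be sketched in a few lines of Python. This is only an illustration: the function name `lag_bucket` is hypothetical, and because the stated ranges share their endpoints, the sketch assumes half-open intervals (exactly 30 days goes to Lag 1, exactly 60 days to Lag 2). The actual boundary handling in the production pipeline is not described here.

```python
def lag_bucket(pay_term_days: int) -> int:
    """Map a pay term in days to a lag bucket (0, 1 or 2).

    Assumption: boundaries are half-open, so 30 days falls in Lag 1 and
    60 days falls in Lag 2. The post does not say where boundary values go.
    """
    if pay_term_days < 0:
        raise ValueError("pay term must be non-negative")
    if pay_term_days < 30:
        return 0  # posted and paid in the same month
    if pay_term_days < 60:
        return 1  # paid one month after posting
    return 2      # paid two or more months after posting


# Illustrative pay terms, not real data.
pay_terms = [0, 15, 30, 45, 60, 90, 120]
buckets = [lag_bucket(days) for days in pay_terms]
```

Each bucket can then be modeled as its own time series and the three forecasts summed for reporting.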
Figure 1 below illustrates the behavioral difference between the pay categories in terms of the difference in their statistical means. There is a clear need to model the lag buckets separately across worldwide pay terms and to integrate them in reporting.
Inclusion of future indicators: Including a future indicator adds considerable explainability for the business. It also guides the slope of the forecast in the direction in which the future indicator moves. In statistics such a factor is called a regressor: when payments depend on the indicator, the forecast should move in a correlated way with it. The Data Science team tried the three main future indicators available: COS (Cost of Sales), COS plus OpEx (Operational Expenses), and COS plus Open Inventory. The graph below shows the effect of COS as a driver of AP payments; the relationship is statistically supported by a correlation coefficient of 0.602 for Lag 2.
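One way to check a candidate future indicator against lagged payments is a lagged Pearson correlation. The sketch below is a minimal illustration under assumptions: the helper names are hypothetical, the numbers are made up, and it is not the procedure that produced the 0.602 figure, which is not described in this post.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(driver, target, lag):
    """Correlate driver[t] with target[t + lag] for a non-negative lag."""
    if lag < 0 or lag >= len(driver):
        raise ValueError("lag must be between 0 and len(driver) - 1")
    if lag == 0:
        return pearson_r(driver, target)
    return pearson_r(driver[:-lag], target[lag:])


# Made-up monthly values: payments follow the driver exactly two months later.
cos_by_month = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 16.0, 20.0]
ap_by_month = [0.0, 0.0] + [2.0 * c + 1.0 for c in cos_by_month[:-2]]
r_lag2 = lagged_correlation(cos_by_month, ap_by_month, 2)
```

With real data the coefficient would sit well below 1; the toy series is perfectly linear only so the result is easy to verify.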
We use a decomposable time series model (Harvey & Peters 1990) with three main model components: trend, seasonality, and holidays. They are combined in the following equation:
$$y(t) = g(t) + s(t) + h(t) + \epsilon_t$$
Here g(t) is the trend function, which models non-periodic changes in the value of the time series, s(t) represents periodic changes (e.g., weekly and yearly seasonality), and h(t) represents the effects of holidays, which occur on potentially irregular schedules over one or more days. The error term represents any idiosyncratic changes that are not accommodated by the model; later we will make the parametric assumption that the error term is normally distributed.
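For the trend we use a logistic growth model, which in its most basic form is
$$g(t) = \frac{C}{1 + \exp(-k(t - m))}$$
with C the carrying capacity, k the growth rate, and m an offset parameter.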
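As a minimal sketch of a saturating trend with these three parameters, assuming the standard logistic form C / (1 + exp(-k (t - m))), the function below evaluates the trend at a single time point. The function name and the example values are illustrative, not taken from the production model.

```python
import math


def logistic_trend(t: float, C: float, k: float, m: float) -> float:
    """Logistic growth trend.

    C is the carrying capacity, k the growth rate, m the offset
    (the time at which the trend reaches half of C).
    """
    return C / (1.0 + math.exp(-k * (t - m)))


# Illustrative parameters only.
halfway = logistic_trend(5.0, C=100.0, k=0.3, m=5.0)
early = logistic_trend(0.0, C=100.0, k=0.3, m=5.0)
late = logistic_trend(50.0, C=100.0, k=0.3, m=5.0)
```

At t = m the trend sits at exactly half the carrying capacity, and for a positive growth rate it rises toward C without exceeding it.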
We rely on Fourier series to provide a flexible model of periodic effects (Harvey & Shephard 1993). Let P be the regular period we expect the time series to have (e.g. P = 365.25 for yearly data or P = 7 for weekly data, when we scale our time variable in days). We can approximate arbitrary smooth seasonal effects with
$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)$$
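To make the Fourier representation concrete, the sketch below builds the cosine and sine terms for a given period P and order N, and evaluates a seasonal effect from a coefficient list. The function names and coefficient values are hypothetical; in a fitted model the coefficients would be estimated from data.

```python
import math


def fourier_features(t: float, period: float, order: int) -> list:
    """Return [cos_1, sin_1, ..., cos_N, sin_N] evaluated at time t."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.append(math.cos(angle))
        features.append(math.sin(angle))
    return features


def seasonal_effect(t: float, period: float, coeffs: list) -> float:
    """Evaluate s(t) for coefficients ordered [a_1, b_1, ..., a_N, b_N]."""
    if len(coeffs) % 2 != 0:
        raise ValueError("coeffs must hold one (a_n, b_n) pair per order")
    order = len(coeffs) // 2
    features = fourier_features(t, period, order)
    return sum(c * f for c, f in zip(coeffs, features))


# Illustrative weekly seasonality (P = 7 days) with N = 2 terms.
weekly_coeffs = [1.5, -0.5, 0.25, 0.1]
s_day3 = seasonal_effect(3.0, 7.0, weekly_coeffs)
s_day10 = seasonal_effect(10.0, 7.0, weekly_coeffs)  # one full period later
```

A larger order N lets the seasonal curve change more quickly, at the cost of a higher risk of overfitting.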
Explanation of HPprophecy and the grid search used to set a benchmark: The Prophet model, an ensembling method, is used as the baseline. Here an ensembling method means a group of methods that are trained on the data, with the output forecast taken from the model that evaluates best on the training data. Facebook Prophet is the baseline for HPprophecy. Both use ensembling approaches built on generalized additive models. What makes HPprophecy the better of the two is its additional time series cross-validation and grid search.
Cross-validation and grid search in HPprophecy: Cross-validation combined with grid search evaluates model accuracy on different time series chunks of the data and over parameters such as the effect of weekly, monthly, quarterly, and yearly seasonality. It works mainly with Fourier series when training the algorithm on the data, and it ranks the parameter pipelines by their RMSE on the validation set. The algorithm outperforms most algorithms in the industry because it is trained on multiple algorithms, selects the best ones during training, and takes its parameters from the grid search by evaluating them on a validation set (data the algorithm was never trained on). The final forecast is produced with the best parameters from the grid search.
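The general pattern of rolling-origin cross-validation with a grid search can be sketched as follows. This is not HPprophecy or Prophet: to stay self-contained it swaps in a simple seasonal-mean forecaster as a stand-in model, the grid covers only a hypothetical list of candidate periods, and the series is synthetic.

```python
import math


def seasonal_mean_forecast(train, horizon, period):
    """Stand-in model: predict each step as the mean of same-phase history."""
    n = len(train)
    forecast = []
    for h in range(horizon):
        phase = (n + h) % period
        same_phase = [v for i, v in enumerate(train) if i % period == phase]
        forecast.append(sum(same_phase) / len(same_phase))
    return forecast


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rolling_origin_cv(series, period, initial, horizon, step):
    """Average validation RMSE over successive training cutoffs."""
    scores = []
    cutoff = initial
    while cutoff + horizon <= len(series):
        train = series[:cutoff]
        valid = series[cutoff:cutoff + horizon]  # never seen during fitting
        scores.append(rmse(valid, seasonal_mean_forecast(train, horizon, period)))
        cutoff += step
    return sum(scores) / len(scores)


def grid_search(series, periods, initial, horizon, step):
    """Score every candidate period and return the best one with all scores."""
    results = {p: rolling_origin_cv(series, p, initial, horizon, step) for p in periods}
    best = min(results, key=results.get)
    return best, results


# Synthetic series with an exact 7-step cycle; candidate periods are illustrative.
series = [float(t % 7) for t in range(84)]
best_period, scores = grid_search(series, periods=[5, 7, 30], initial=42, horizon=13, step=7)
```

The key design point is that each candidate is scored only on data after its training cutoff, so the ranking reflects out-of-sample error rather than fit to the training window.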
Evidence the solution works
Figure 4 below shows the moving-window forecast of AP payments with the inclusion of COS plus Open Inventory and future indicators.
The model is currently in production, delivering forecasts for the AP team and helping HP finance with budget allocation.