Comparing Time Series Forecasting Techniques

Introduction

A time series is a sequence of numerical data points in successive order. Time series analysis helps to identify the pattern or nature of the phenomenon represented by the sequence of observations. It can be used to understand the past and predict the future. The purpose of this notebook is to illustrate the use of different time series prediction techniques in forecasting short-term, item-level sales given 5 years of store-item sales data. Let's begin!

Getting Started

Load required libraries
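
The exact set of packages isn't critical, but the functions used throughout this notebook suggest something like the following setup (the list is an assumption):

```r
library(forecast)  # forecast(), Arima(), Acf()/Pacf(), ndiffs(), checkresiduals()
library(xgboost)   # gradient boosted trees
library(ggplot2)   # plotting
```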

Understanding the data

The data provided contains the following fields:

date - Date of the sale data. There are no holiday effects or store closures.
store - Store ID
item - Item ID
sales - Number of items sold at a particular store on a particular date.

The given dataset contains sales data for 10 stores with 50 items each. Ideally, we would use the whole dataset, but for this notebook we will use only the sales data of store 1, item 1 as the benchmark dataset for comparing the different techniques.

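A minimal sketch of this step; the file name train.csv and the 90-day holdout used later for evaluation are assumptions:

```r
# Read the raw data and keep store 1, item 1
train <- read.csv("train.csv")
train$date <- as.Date(train$date)
s1i1 <- subset(train, store == 1 & item == 1)
head(s1i1)

# Hold out the last 90 days for evaluating the forecasts (hypothetical split)
n       <- nrow(s1i1)
s1i1_tr <- s1i1[1:(n - 90), ]
s1i1_te <- s1i1[(n - 89):n, ]

# Daily series; a yearly seasonal frequency is assumed here
sales_ts <- ts(s1i1_tr$sales, frequency = 365, start = c(2013, 1))
```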

Exploring the time series

Let's plot the time series data and check its characteristics.

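For example:

```r
# Raw daily sales for store 1, item 1
plot(sales_ts, xlab = "Time", ylab = "Sales",
     main = "Daily sales of item 1 at store 1")
```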

We can clearly see that there is an upward trend and a fairly obvious seasonality.

The following plots allow us to visually examine the trend variation of monthly and weekly sales.

Yearly sales by month plot

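A sketch using ggplot2; the derived calendar columns are hypothetical helpers:

```r
# Calendar fields derived from the date column
s1i1$year  <- format(s1i1$date, "%Y")
s1i1$month <- format(s1i1$date, "%m")
s1i1$week  <- format(s1i1$date, "%U")

# Total sales per month, one line per year
monthly <- aggregate(sales ~ year + month, data = s1i1, FUN = sum)
ggplot(monthly, aes(month, sales, group = year, colour = year)) +
  geom_line() + labs(title = "Yearly sales by month")
```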

Weekly sales by year plot

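Similarly:

```r
# Total sales per week, one line per year
weekly <- aggregate(sales ~ year + week, data = s1i1, FUN = sum)
ggplot(weekly, aes(week, sales, group = year, colour = year)) +
  geom_line() + labs(title = "Weekly sales by year")
```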

Time series analysis and modeling

For time series prediction, we will use and compare three common techniques: Holt-Winters Exponential Smoothing, ARIMA and XGBoost.

Decompose the time series

The first step is to decompose the data to extract the components of the time series. The four components of a time series are trend, seasonality, cycle and residuals. The trend component refers to the overall pattern of the series. The seasonal component captures fluctuations in the data related to calendar cycles. The cycle component consists of decreasing or increasing patterns that are not seasonal, and it is usually grouped together with the trend component. Finally, the residual component is whatever remains that cannot be attributed to the three components above. These components are crucial in time series analysis as they capture the historical patterns in the series.

For our data, additive decomposition is the most appropriate since the seasonal fluctuations and random variations seem to be constant in size over time. We can plot the estimated trend, seasonal, and irregular components of the time series by using the decompose function, for example:

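For example:

```r
# Additive decomposition into trend, seasonal and random components
sales_decomp <- decompose(sales_ts, type = "additive")
plot(sales_decomp)
```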

The plot above shows the original time series (top), the estimated seasonal component (second), the estimated trend (third) and the estimated residual component (bottom). We can see that the trend is increasing over time and there is a similar pattern every year. The remainder plot shows what is left over when the seasonal and trend components have been removed from the data. If we add up the last three components, we can reconstruct the original time series.

Holt-Winters Exponential Smoothing

Holt-Winters exponential smoothing estimates the level, slope and seasonal component at the current time point. Unlike a moving average, exponential smoothing smooths the time series using an exponential window function, where decreasing weights are assigned exponentially over time. The weights are controlled by three parameters, alpha, beta and gamma, each ranging between 0 and 1. Values closer to 0 mean that little weight is placed on the most recent observations, and vice versa. This method is often used for short-term forecasts of time series that can be described using an additive model. We can simply use the HoltWinters function to apply this algorithm to our data.

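For example:

```r
# Fit Holt-Winters with additive seasonality; alpha, beta and gamma are
# estimated by minimising the squared one-step-ahead prediction errors
hw_fit <- HoltWinters(sales_ts, seasonal = "additive")
hw_fit
```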

The alpha, beta and gamma values shown in the output above are 0.018, 0 and 0.468, respectively. Since the alpha value is extremely small, considerable weight is placed on less recent observations. The beta value of 0 tells us that the estimate of the slope is constant over time.

We can plot the Holt-Winters model with the forecasted values as a red line on top of the original time series.

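For example:

```r
# Observed series in black, one-step-ahead fitted values in red
plot(hw_fit)
```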

From the plot, we can tell that this method is fairly good at predicting the seasonal pattern.

To make future prediction, we use the forecast.HoltWinters function from the forecast package.

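A sketch, assuming a 90-day horizon to match the holdout:

```r
hw_fc <- forecast(hw_fit, h = 90)
plot(hw_fc)
```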

Prediction Result

For evaluation, we use the RMSE and SMAPE metrics.

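Both metrics are easy to define by hand; here they are evaluated against the hypothetical 90-day holdout:

```r
# Root mean squared error
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

# Symmetric mean absolute percentage error, in percent
smape <- function(actual, pred) {
  100 * mean(2 * abs(pred - actual) / (abs(actual) + abs(pred)))
}

hw_pred <- as.numeric(hw_fc$mean)
rmse(s1i1_te$sales, hw_pred)
smape(s1i1_te$sales, hw_pred)
```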

The result is actually pretty good. But let's move forward to a slightly more complex model.

ARIMA

Exponential smoothing methods are relatively simple and useful for making forecasts. However, they do not make assumptions about the correlations between observations. Thus, to improve the predictive power of a model, it may be necessary to account for the correlation within the data. The Autoregressive Integrated Moving Average (ARIMA) model includes trends, level shifts and seasonal pulses in its equation, and it assumes correlation between successive time-sequenced observations. An ARIMA model describes movements in a stationary time series as a function of AR (autoregressive) and MA (moving average) parameters. The AR component, denoted by the p parameter, refers to the use of lagged values in the regression equation for the series. The MA component, denoted by the q parameter, models the error as a combination of previous error terms.

Non-seasonal ARIMA

Non-seasonal ARIMA is specified by three order parameters: (p,d,q). The p parameter specifies the number of lags used in the model, d refers to the order of differencing and q determines the number of moving average terms to be included in the model. The process of fitting an ARIMA model is known as the Box-Jenkins method.

Differencing

As mentioned, fitting an ARIMA model requires a stationary time series. In general, a time series is stationary when its mean, variance, and autocovariance are time invariant. Theoretically, a time series with trend or seasonality is not considered stationary, since the behaviour of the series changes over time. Differencing helps to stabilize the mean of a time series by removing changes in its level, thus eliminating the trend.

ACF plot

Autocorrelation plots (ACF) are useful in determining whether a series is stationary. The series is said to have trend or seasonal components when it is correlated with its lags, suggesting inconstant statistical properties over time. Let's take a quick look at the ACF plot of the sales time series.

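For example:

```r
# ACF of the raw series; slow decay points to non-stationarity
Acf(sales_ts, lag.max = 30)
```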

The ACF of the time series above decays slowly and the autocorrelations remain well above the significance bounds. This indicates that the time series is non-stationary.

We can check the differencing order using the ndiffs function:

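For example:

```r
# Estimated number of differences required for stationarity
ndiffs(sales_ts)
```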

It appears that we should apply a first order differencing on the data to make it stationary.

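For example:

```r
# Apply first-order differencing
sales_diff <- diff(sales_ts, differences = 1)
```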

Let's check the ACF of the differenced time series.

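For example:

```r
Acf(sales_diff, lag.max = 30)
```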

Note that the ACF drops to zero very quickly. This is indicative of a stationary series.

PACF plot

Partial autocorrelation plots (PACF) display the correlation between a variable and its lags that is not explained by previous lags. The PACF plot is useful in determining the order of the AR model.

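For example, on the differenced series:

```r
Pacf(sales_diff, lag.max = 30)
```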

The partial correlogram shows that the partial autocorrelations at the first few lags all exceed the significance bounds and slowly decrease in magnitude with increasing lag.

In classical ARIMA modeling, the main problem is deciding which specification to use, that is, how many AR and MA terms to include. Since we know that the order of differencing is 1, we can use the ACF and PACF plots above to determine the p and q parameters. We will also try out different models and select the one with the lowest AIC value.

ARIMA(0,1,1)

The ACF plot suggests an MA(1) structure, since the autocorrelogram tails off to zero after lag 1.

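Each candidate can be fitted with the Arima function from the forecast package; the printed summary includes the AIC used for comparison. A sketch:

```r
fit011 <- Arima(sales_ts, order = c(0, 1, 1))
fit011
```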

ARIMA(1,1,0)

The PACF plot suggests an AR(1) structure, due to the significant first lag and the tapering pattern in the ACF plot.

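Similarly:

```r
fit110 <- Arima(sales_ts, order = c(1, 1, 0))
fit110
```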

ARIMA(7,1,0)

We can also try an AR(7) structure, since the first seven lags are significant in the PACF plot.

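For example:

```r
fit710 <- Arima(sales_ts, order = c(7, 1, 0))
fit710
```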

ARIMA(7,1,7)

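Finally, the larger combined model:

```r
fit717 <- Arima(sales_ts, order = c(7, 1, 7))
fit717
```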

So far, ARIMA(7,1,7) has achieved the lowest AIC value, 10308. Let's further interpret the model by plotting its residuals.

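For example:

```r
# Residual time plot, ACF and Ljung-Box test in one call
checkresiduals(fit717)
```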

The ACF plot shows that the autocorrelations are not particularly large, although a few significant spikes are spotted.

Let's see if our model will be better off by adding a seasonal component.

Seasonal ARIMA

Seasonal Autoregressive Integrated Moving Average (SARIMA) is an extension of ARIMA that supports time series with a seasonal component. SARIMA introduces additional parameters: the period of the seasonality, the seasonal AR/MA terms and the degree of seasonal differencing.

The time series plot of our sales data shows that the series is strongly non-stationary and seasonal. In the ACF plot, peaks occur every 7 lags, so we need to apply a seasonal differencing to the data with period = 7. A seasonal difference is the difference between an observation and the previous observation from the same season.

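A sketch; rebuilding the series with frequency = 7 (an assumption) makes the weekly period explicit:

```r
# Weekly-frequency copy of the training series
sales_w <- ts(s1i1_tr$sales, frequency = 7)

# Seasonal difference at lag 7: each value minus the value 7 days earlier
sales_sdiff <- diff(sales_w, lag = 7)
plot(sales_sdiff)
```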

Visual inspection of the plot indicates a stationary series.

We can confirm it by using the ndiffs function to check whether first differencing is required.

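For example:

```r
ndiffs(sales_sdiff)
```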

There is no need for further differencing since the seasonally differenced time series is already stationary.

ACF plot for the seasonally differenced series

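For example:

```r
Acf(sales_sdiff, lag.max = 30)
```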

PACF plot for the seasonally differenced series

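For example:

```r
Pacf(sales_sdiff, lag.max = 30)
```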

For the SARIMA model, we will also try out different variations and select the best one for the final prediction.

ARIMA(0,0,0)(1,1,0)7

There is a significant spike at the first seasonal lag in the PACF.

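A sketch, fitted on the weekly-frequency series so the seasonal period of 7 is picked up from the frequency:

```r
fit_s1 <- Arima(sales_w, order = c(0, 0, 0), seasonal = c(1, 1, 0))
fit_s1
```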

ARIMA(0,0,0)(0,1,1)7

There is an exponential decay in the seasonal lags of the PACF and a single large significant spike at lag 7 in the ACF.

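Similarly:

```r
fit_s2 <- Arima(sales_w, order = c(0, 0, 0), seasonal = c(0, 1, 1))
fit_s2
```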

ARIMA(1,0,1)(0,1,1)7

The significant spike at lag 1 in the ACF suggests a non-seasonal MA(1) component and the significant spike at lag 1 in the PACF suggests a non-seasonal AR(1) component.

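And the combined model:

```r
fit_s3 <- Arima(sales_w, order = c(1, 0, 1), seasonal = c(0, 1, 1))
fit_s3
```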

As shown, ARIMA(1,0,1)(0,1,1)7 achieves the lowest AIC score, 10255.77.

Residual diagnostics of ARIMA(1,0,1)(0,1,1)7

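For example:

```r
checkresiduals(fit_s3)
```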

Although the residual ACFs of the seasonal and non-seasonal ARIMA models look similar, more lags stay within the significance bounds for the seasonal model.

Prediction using ARIMA(1,0,1)(0,1,1)7

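For example:

```r
sarima_fc <- forecast(fit_s3, h = 90)
plot(sarima_fc)
```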

Prediction Result

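Reusing the metric helpers defined earlier against the same holdout:

```r
sarima_pred <- as.numeric(sarima_fc$mean)
rmse(s1i1_te$sales, sarima_pred)
smape(s1i1_te$sales, sarima_pred)
```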

XGBoost

XGBoost, or eXtreme Gradient Boosting, is an optimized, distributed gradient boosting library. In boosting, trees are grown based on information from the previously grown trees: the model learns from past data and tries to improve its predictions in subsequent iterations. XGBoost is generally fast, flexible and highly predictive, and it comes with a built-in cross-validation function. In addition, it is able to rank the importance of the variables in the data.

We will do a basic implementation of XGBoost here to predict future sales. The first thing we'll do is preprocess the data for the model.

We will create and include additional variables in the data, such as month, day, day mean, day median and so on.
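
A sketch of this kind of feature engineering in base R; the exact feature set and the fe data frame are hypothetical:

```r
fe <- s1i1[, c("date", "sales")]
fe$month <- as.numeric(format(fe$date, "%m"))
fe$day   <- as.numeric(format(fe$date, "%d"))
fe$wday  <- as.numeric(format(fe$date, "%u"))  # day of week, 1-7
fe$year  <- as.numeric(format(fe$date, "%Y"))

# Day-of-week mean and median sales, merged back as features
# (computed on the full series here for simplicity)
day_stats <- aggregate(sales ~ wday, data = fe,
                       FUN = function(x) c(mean = mean(x), median = median(x)))
day_stats <- data.frame(wday       = day_stats$wday,
                        day_mean   = day_stats$sales[, "mean"],
                        day_median = day_stats$sales[, "median"])
fe <- merge(fe, day_stats, by = "wday")
fe <- fe[order(fe$date), ]
```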

Training the XGBoost model
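
A sketch of training on the same 90-day split; the hyperparameters are assumptions:

```r
feature_cols <- c("month", "day", "wday", "year", "day_mean", "day_median")
X_train <- as.matrix(fe[1:(n - 90), feature_cols])
y_train <- fe$sales[1:(n - 90)]

# Gradient-boosted regression trees
xgb_model <- xgboost(data = X_train, label = y_train,
                     nrounds = 200, max_depth = 6, eta = 0.1,
                     objective = "reg:squarederror", verbose = 0)
```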

Prediction

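For example:

```r
X_test   <- as.matrix(fe[(n - 89):n, feature_cols])
xgb_pred <- predict(xgb_model, X_test)
```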

Prediction Result

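Again reusing the metric helpers:

```r
rmse(s1i1_te$sales, xgb_pred)
smape(s1i1_te$sales, xgb_pred)
```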

Conclusion

In this notebook we have applied three widely used models to our time series forecasting task. Based on the prediction results, Holt-Winters Exponential Smoothing achieved the lowest SMAPE while XGBoost achieved the lowest RMSE, although the differences between the results of all three models are small.

While there is still room for improvement in the above application, through this project, I have learnt the following pros and cons of each method:

Holt-Winters Exponential Smoothing

Pros:

  • Simple and fast, since it automatically captures the trend of the time series and adjusts its parameters accordingly.
  • High predictive power

Cons:

  • Predicts only one time series at a time; for 10 stores and 50 items, that is a total of 500 time series.

ARIMA/SARIMA

Pros:

  • Robust, as it incorporates both the correlation between observations and previous error terms in its forecasts.

Cons:

  • Predicts only one time series at a time
  • Steep learning curve; has more parameters to specify
  • Must check for trend and seasonality beforehand

XGBoost

Pros:

  • High predictive power
  • Able to predict multiple time series at a time

Cons:

  • Slow implementation
  • Need to optimize many parameters and create additional features that allow it to capture trend and seasonality