M10 Estimation and Forecasting

Estimation #

Summary: analysing the ACF and PACF #

Behaviour of the ACF and PACF for ARMA Models #

         AR(p)                   MA(q)                   ARMA(p,q)
ACF      Tails off               Cuts off after lag q    Tails off
PACF     Cuts off after lag p    Tails off               Tails off
  • The PACF for MA models behaves much like the ACF for AR models.
  • The PACF for AR models behaves much like the ACF for MA models.
  • Because an invertible ARMA model has an infinite AR representation, its PACF will not cut off.
  • Remember that the data might have to be detrended and/or transformed first (e.g., a log transform to stabilise the variance) before such an analysis is performed. A short simulation check of these patterns is sketched below.
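
As a quick check of these patterns, one can simulate series from known models and inspect their sample ACF and PACF (a sketch; the seed and parameter values are arbitrary, and library(astsa) is loaded for acf2):

library(astsa)                                       # for acf2()
set.seed(90210)                                      # arbitrary seed
ar2 <- arima.sim(list(ar = c(1.5, -0.75)), n = 500)  # an AR(2)
ma2 <- arima.sim(list(ma = c(0.8, 0.4)), n = 500)    # an MA(2)
acf2(ar2, 24)  # ACF tails off; PACF cuts off after lag 2
acf2(ma2, 24)  # ACF cuts off after lag 2; PACF tails off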

Example: Recruitment Series #

acf2(rec, 48)  # will produce values and a graph
##      [,1]  [,2]  [,3]  [,4] [,5]  [,6]  [,7] [,8] [,9] [,10] [,11]
## ACF  0.92  0.78  0.63  0.48 0.36  0.26  0.18 0.13 0.09  0.07  0.06
## PACF 0.92 -0.44 -0.05 -0.02 0.07 -0.03 -0.03 0.04 0.05 -0.02 -0.05
##      [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## ACF   0.02 -0.04 -0.12 -0.19 -0.24 -0.27 -0.27 -0.24 -0.19 -0.11
## PACF -0.14 -0.15 -0.05  0.05  0.01  0.01  0.02  0.09  0.11  0.03
##      [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
## ACF  -0.03  0.03  0.06  0.06  0.02 -0.02 -0.06 -0.09 -0.12 -0.13
## PACF -0.03 -0.01 -0.07 -0.12 -0.03  0.05 -0.08 -0.04 -0.03  0.06
##      [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41]
## ACF  -0.11 -0.05  0.02  0.08  0.12  0.10  0.06  0.01 -0.02 -0.03
## PACF  0.05  0.15  0.09 -0.04 -0.10 -0.09 -0.02  0.05  0.08 -0.02
##      [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF  -0.03 -0.02  0.01  0.06  0.12  0.17  0.20
## PACF -0.01 -0.02  0.05  0.01  0.05  0.08 -0.04


  • The joint behaviour of the ACF and PACF is consistent with an AR(2): the ACF tails off, while the PACF cuts off after lag 2.
  • The following slide shows the fit of a regression using the data triplets $\{(x; z_1, z_2)\} = \{(x_3; x_2, x_1), (x_4; x_3, x_2), \ldots, (x_{447}; x_{446}, x_{445})\}$ to fit a model of the form $x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t$ for $t = 3, 4, \ldots, 447$.
  • Let us fit this AR process using ar.ols which, according to the R documentation, will "fit an autoregressive time series model to the data by ordinary least squares", by default selecting the order by AIC.
  • Note that the "optimal" order will be chosen up to the maximum defined by the order.max argument of the function.

regr <- ar.ols(rec, order.max = 10, demean = FALSE, intercept = TRUE)
regr
## 
## Call:
## ar.ols(x = rec, order.max = 10, demean = FALSE, intercept = TRUE)
## 
## Coefficients:
##       1        2  
##  1.3541  -0.4632  
## 
## Intercept: 6.737 (1.111) 
## 
## Order selected 2  sigma^2 estimated as  89.72
regr$asy.se.coef  # standard errors of the estimates
## $x.mean
## [1] 1.110599
## 
## $ar
## [1] 0.04178901 0.04187942
ts.plot(rec, main = "Results of regression using previous two values of the time series")
lines(time(rec)[-c(1, 2)], regr$x.intercept + regr$ar[1] * rec[-c(1,
  length(rec))] + regr$ar[2] * rec[-c(length(rec) - 1, length(rec))],
  col = "blue", lwd = 1)

ACF and PACF in presence of seasonality #

         AR(P)_s                    MA(Q)_s                    ARMA(P,Q)_s
ACF*     Tails off at lags ks,      Cuts off after lag Qs      Tails off at lags ks
         k = 1, 2, ...
PACF*    Cuts off after lag Ps      Tails off at lags ks,      Tails off at lags ks
                                    k = 1, 2, ...

*The values at nonseasonal lags h ≠ ks, for k = 1, 2, ..., are zero.

  • This can be seen as a generalisation of the previous table (which is the special case s = 1).
  • Fitting the seasonal autoregressive and moving average components first generally leads to more satisfactory results (a simulated illustration follows).
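
For illustration, a purely seasonal AR(1)_s with s = 12 can be simulated with arima.sim by placing the only non-zero AR coefficient at lag 12 (a sketch; the coefficient 0.8 and the seed are arbitrary, and library(astsa) is assumed to be loaded for acf2):

set.seed(666)                                              # arbitrary seed
sar1 <- arima.sim(list(ar = c(rep(0, 11), 0.8)), n = 600)  # seasonal AR(1), s = 12
acf2(sar1, 60)  # ACF tails off at lags 12, 24, 36, ...; PACF cuts off after lag 12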

The Box-Jenkins methodology #

Overview - The Box-Jenkins methodology #

  1. Determine the integration order d of the ARIMA(p,d,q) and work on the corresponding ARMA(p,q) obtained by differencing.
  2. Then choose candidates for p and q by examining the ACF and PACF.
  3. For fixed p and q (candidates), estimate the parameters via:
  • Method of moments
  • Maximum likelihood (note: MLE is more involved than the method of moments, but considerably more efficient when q > 0)
  • Least squares and variations
  4. Perform diagnostics to choose the best p and q. This may suggest new candidates for p and q, in which case we perform a new iteration from step 2.
  5. Use the chosen model for forecasting.

This general approach to fitting is called the Box-Jenkins methodology. (A compact sketch of the full cycle in R is given below.)
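
A minimal sketch of the cycle in R, using the astsa package and the Recruitment series as a placeholder (the orders shown are the ones retained later for this series; in practice they come out of steps 2-4):

library(astsa)
x <- rec                              # placeholder series; substitute your own data
acf(x)                                # step 1: slow, positive ACF decay would suggest differencing
dx <- x                               # here no differencing is needed, so d = 0
acf2(dx, 48)                          # step 2: read candidate p and q off the ACF and PACF
fit <- sarima(x, 2, 0, 0)             # steps 3-4: fit the ARIMA(p,d,q) and inspect the diagnostics
sarima.for(x, n.ahead = 36, 2, 0, 0)  # step 5: forecast with the retained model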

ARIMA(p,d,q) → ARMA(p,q): choosing d #

Choosing d involves:

  • A time series $x_t$ may be modelled by a stationary ARMA model if the sample ACF decays rapidly. If the decay is slow (and the ACF is positive), this suggests further differencing.
  • Let $\sigma_d^2$ be the sample variance of $\nabla^d x_t$. This quantity should first decrease with d until stationarity is achieved, and then start to increase. Hence d could be set to the value which minimises $\sigma_d^2$, $d \ge 0$ (a sketch of this criterion follows below).

Several candidates could be selected, a final choice of which could be made after full estimation of all candidates and based on diagnostics.

One should be careful to not over-difference, as this introduces artificial dependence in the data.
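
A sketch of the variance criterion for d, applied here to the logged AirPassengers series used later in this module (the range of d values is arbitrary):

y <- log(AirPassengers)   # transformed series from the example treated later
sapply(0:3, function(d) if (d == 0) var(y) else var(diff(y, differences = d)))
# sigma_d^2 typically drops sharply at d = 1 and increases again once we over-difference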

Example: Recruitment series #

Example: Chicken prices #

Choosing p and q for the ARMA(p,q) #

  • We work on the differenced series, which is then assumed to be ARMA(p,q).
  • We assume the mean of the differenced series is 0. If that is not the case in the data, subtract its sample mean and work on the residuals.
  • Examine ACF and PACF to choose candidates for p and q:
    • p can be inferred from the number of spikes in the PACF until a geometrical decay to zero is observed.
    • q can be inferred from the number of spikes in the ACF until a geometrical decay to zero is observed.
  • Alternatively or additionally, work iteratively from the simplest models and increase orders p and q:
    • Higher orders will always reduce the sum of squares (more parameters). At the extreme a model with n parameters will replicate the data.
    • The appropriate order could be chosen using information criteria (e.g., AIC or BIC), as in the sketch below.
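
A sketch of such a search over a small grid of orders, using the AIC from arima (the maximum orders, and the use of the GNP growth-rate series introduced later, are arbitrary choices):

library(astsa)                         # for the gnp series
x <- diff(log(gnp))                    # differenced series (the GNP growth rate used later)
ic <- matrix(NA, 4, 4, dimnames = list(paste0("p=", 0:3), paste0("q=", 0:3)))
for (p in 0:3) for (q in 0:3)
  ic[p + 1, q + 1] <- tryCatch(AIC(arima(x, order = c(p, 0, q))),
                               error = function(e) NA)
ic                                     # the smallest value indicates the preferred (p, q)
which(ic == min(ic, na.rm = TRUE), arr.ind = TRUE)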

Example: Recruitment series #

Estimation of the parameters #

  • We now need estimates for ϕ1,,ϕp and θ1,,θq for given p and q
  • Use R to fit the model with fit <- arima(x, order = c(p, 0, q)), where x is the differenced time series and where p and q need to be replaced by their chosen numerical values. The default estimation procedure (unless there are missing values) is to use conditional sum-of-squares to find starting values, followed by maximum likelihood.
  • This will output parameter estimates, their standard errors, as well as σ2 and the AIC.
  • (One could alternatively use a method of moments approach (see later), but this is much less efficient than MLE if q>0.)

Example: Recruitment series #

MLE estimators are implemented in R as follows:

rec.mle <- ar.mle(rec, order = 2)
rec.mle$x.mean
## [1] 62.26153
rec.mle$ar
## [1]  1.3512809 -0.4612736
sqrt(diag(rec.mle$asy.var.coef))
## [1] 0.04099159 0.04099159
rec.mle$var.pred
## [1] 89.33597

and then the fit is displayed as follows:

ts.plot(rec, main = "Results of fit using the R MLE estimators")
lines(rec[1:length(rec)] - rec.mle$resid, col = "blue", lwd = 1)


The preferred estimators in R (conditional sum-of-squares starting values followed by full maximum likelihood, as used by arima and sarima) are obtained as follows:

rec.arima0 <- arima(rec, order = c(2, 0, 0))  # to use with tsdiag
rec.arima <- sarima(rec, 2, 0, 0)
rec.arima0$coef[3]  # 61.86
## intercept 
##  61.85847
rec.arima0$coef[1:2]  # 1.35, -.46
##        ar1        ar2 
##  1.3512196 -0.4612128
sqrt(diag(rec.arima0$var.coef))  # .04, .04
##        ar1        ar2  intercept 
## 0.04158461 0.04166812 4.00393378
rec.arima0$sigma2  # 89.33
## [1] 89.33436

and then the fit is displayed as follows:

ts.plot(rec, main = "Results of fit using the R ARIMA estimators")
lines(rec[1:length(rec)] - rec.arima$fit$residuals, col = "blue",
  lwd = 1)

Diagnostics #

Guiding principle: if we have a good fit, the residuals should be (uncorrelated) white noise. This can be checked in three ways:

  • Residuals: if visual inspection of the residuals reveals any pattern (in trend or magnitude), then they cannot be white noise.
  • ACF and PACF: these should fall within their confidence bands with the appropriate frequency (95%).
  • More formally, one can test for white noise with the Ljung-Box portmanteau test. This test accounts for the dimension of the model (the number of parameters) and tests whether the residual autocorrelations are jointly zero up to a given lag, displaying p-values. High p-values mean we cannot reject the (null) hypothesis of white noise, which is what we want. (A manual computation with Box.test is sketched further below.)

In base R, these three diagnostics are produced by running tsdiag(fit), where fit holds the output of the arima function (but this function has errors!).
Better still, use the fitting function sarima(rec, 2, 0, 0) from astsa.


tsdiag(rec.arima0)

[Note: tsdiag produces diagnostic plots only; there is no console output.]
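
For reference, the Ljung-Box statistic behind these diagnostics can also be computed directly with Box.test, adjusting the degrees of freedom (fitdf) for the number of estimated ARMA parameters (a sketch; the choice of lag = 20 is arbitrary):

res <- residuals(rec.arima0)
Box.test(res, lag = 20, type = "Ljung-Box", fitdf = 2)  # fitdf = 2 for the two AR parameters
# a large p-value means we cannot reject that the residuals are white noise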



RecSARIMAdiag <- sarima(rec, 2, 0, 0)
## initial  value 3.332380 
## iter   2 value 3.251366
## iter   3 value 2.564654
## iter   4 value 2.430141
## iter   5 value 2.258212
## iter   6 value 2.253343
## iter   7 value 2.248346
## iter   8 value 2.248345
## iter   9 value 2.248345
## iter  10 value 2.248341
## iter  11 value 2.248332
## iter  12 value 2.248331
## iter  13 value 2.248330
## iter  13 value 2.248330
## iter  13 value 2.248330
## final  value 2.248330 
## converged
## initial  value 2.248862 
## iter   2 value 2.248857
## iter   3 value 2.248855
## iter   4 value 2.248855
## iter   5 value 2.248854
## iter   6 value 2.248854
## iter   7 value 2.248854
## iter   8 value 2.248854
## iter   9 value 2.248853
## iter  10 value 2.248853
## iter  10 value 2.248853
## iter  10 value 2.248853
## final  value 2.248853 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE  t.value p.value
## ar1     1.3512 0.0416  32.4933       0
## ar2    -0.4612 0.0417 -11.0687       0
## xmean  61.8585 4.0039  15.4494       0
## 
## sigma^2 estimated as 89.33436 on 450 degrees of freedom 
##  
## AIC = 7.353244  AICc = 7.353362  BIC = 7.389587 
## 

RecSARIMAdiag$fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2    xmean
##       1.3512  -0.4612  61.8585
## s.e.  0.0416   0.0417   4.0039
## 
## sigma^2 estimated as 89.33:  log likelihood = -1661.51,  aic = 3331.02
RecSARIMAdiag$ICs
##      AIC     AICc      BIC 
## 7.353244 7.353362 7.389587

Overfitting caveat #

  • If we choose an order that is too high, the unnecessary parameters will be insignificant, but their presence reduces the efficiency of the estimates of the parameters that are significant. This might look like a better fit, but can lead to bad forecasts; hence it should be avoided.
  • This can be used as a diagnostic though: if we increment the order and we get similar parameter estimates, then the smaller (original) model should have the appropriate order.

Although this point is sometimes overlooked, seasoned modellers regard it as obvious that extrapolating overfitted models is almost guaranteed to lead to aberrant results.

Example: Overfitting caveat #

  • The figure shows the U.S. population by official census, every ten years from 1910 to 1990, as points. If we use these nine observations to predict the future population, we can use an eighth-degree polynomial so that the fit to the nine observations is perfect (a generic illustration is sketched below).
  • The model predicts that the population of the United States will be close to zero in the year 2000, and will cross zero sometime in the year 2002!
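
A generic illustration of the same phenomenon with simulated (not census) data: a polynomial of degree n − 1 fits n points perfectly, yet its extrapolations are wild.

set.seed(1)                                 # illustrative data only (not the census figures)
t <- 1:9
y <- 10 + 0.5 * t + rnorm(9, sd = 0.3)      # nine roughly linear "observations"
perfect <- lm(y ~ poly(t, 8))               # degree-8 polynomial: fits the nine points exactly
predict(perfect, newdata = data.frame(t = 10:12))  # extrapolations typically stray far from the trend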

When seasonality is involved #

The process described above is applied by analogy when some seasonality is present in the residuals:

  1. Of course, the first step is to include a seasonal component in the trend if appropriate (such as with sin and cos functions).
  2. The series is then differenced, seasonally and/or non-seasonally, to reach stationarity (if needed).
  3. Peaks in the ACF and PACF are then analysed with the tables presented at the beginning of this section, and eliminated with an appropriate ARIMA(p,d,q)×(P,D,Q)s model.
  4. Goodness-of-fit is assessed as usual by examining the whiteness of the residuals
    (e.g., this can be done thanks to the diagnostics of sarima)

Example: Air Passengers #

Remember the steps 1.-2. performed in the previous module:

x <- AirPassengers
lx <- log(x)  # log transform to stabilise the variance
dlx <- diff(lx)  # non-seasonal difference to remove the trend
ddlx <- diff(dlx, 12)  # seasonal difference (s = 12) to remove the seasonality
plot.ts(cbind(x, lx, dlx, ddlx), main = "")

This corresponds to step 2 above, and leads to residuals that look reasonably stationary.

We now need to choose a model for them.



acf2(ddlx, 50)
##       [,1]  [,2]  [,3]  [,4] [,5] [,6]  [,7]  [,8] [,9] [,10] [,11]
## ACF  -0.34  0.11 -0.20  0.02 0.06 0.03 -0.06  0.00 0.18 -0.08  0.06
## PACF -0.34 -0.01 -0.19 -0.13 0.03 0.03 -0.06 -0.02 0.23  0.04  0.05
##      [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## ACF  -0.39  0.15 -0.06  0.15 -0.14  0.07  0.02 -0.01 -0.12  0.04
## PACF -0.34 -0.11 -0.08 -0.02 -0.14  0.03  0.11 -0.01 -0.17  0.13
##      [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
## ACF  -0.09  0.22 -0.02  -0.1  0.05 -0.03  0.05 -0.02 -0.05 -0.05
## PACF -0.07  0.14 -0.07  -0.1 -0.01  0.04 -0.09  0.05  0.00 -0.10
##      [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41]
## ACF   0.20 -0.12  0.08 -0.15 -0.01  0.05  0.03 -0.02 -0.03 -0.07
## PACF -0.02  0.01 -0.02  0.02 -0.16 -0.03  0.01  0.05 -0.08 -0.17
##      [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
## ACF   0.10 -0.09  0.03 -0.04 -0.04  0.11 -0.05  0.11 -0.02
## PACF  0.07 -0.10 -0.06 -0.03 -0.12 -0.01 -0.05  0.09  0.13


Seasonal component:

  • At the seasonal lags, the ACF appears to be cutting off after lag 1s (s = 12).
  • The PACF appears to be tailing off at lags 1s, 2s, 3s, 4s, ...
  • This suggests an SMA(1): P = 0, Q = 1, in the season s = 12.

Non-Seasonal component:

  • We inspect the ACF and PACF at lower lags.
  • Both appear to be tailing off, which suggests ARMA within the seasons, say with p=q=1.
  • ARMA(1,1)

Thus, we will first try an ARIMA(1,1,1)×(0,1,1)_12 model on the logged data:

AirPassFit1 <- sarima(lx, 1, 1, 1, 0, 1, 1, 12)

AirPassFit1$fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1     sma1
##       0.1960  -0.5784  -0.5643
## s.e.  0.2475   0.2132   0.0747
## 
## sigma^2 estimated as 0.001341:  log likelihood = 244.95,  aic = -481.9

The AR parameter is not significant, so we try dropping one parameter from the within-seasons part. We will try:

AirPassFit2 <- sarima(lx, 0, 1, 1, 0, 1, 1, 12)  # ARIMA(0,1,1)x(0,1,1)_12
AirPassFit3 <- sarima(lx, 1, 1, 0, 0, 1, 1, 12)  # ARIMA(1,1,0)x(0,1,1)_12

AirPassFit2$fit
...
## Coefficients:
##           ma1     sma1
##       -0.4018  -0.5569
## s.e.   0.0896   0.0731
## 
## sigma^2 estimated as 0.001348:  log likelihood = 244.7,  aic = -483.4
...
AirPassFit3$fit
...
## Coefficients:
##           ar1     sma1
##       -0.3395  -0.5619
## s.e.   0.0822   0.0748
## 
## sigma^2 estimated as 0.001367:  log likelihood = 243.74,  aic = -481.49
...

AirPassFit2$ICs
##       AIC      AICc       BIC 
## -3.690069 -3.689354 -3.624225
AirPassFit3$ICs
##       AIC      AICc       BIC 
## -3.675493 -3.674777 -3.609649

All information criteria prefer the ARIMA(0,1,1)×(0,1,1)_12 model, which is displayed next.


AirPassFit2 <- sarima(lx, 0, 1, 1, 0, 1, 1, 12)  # ARIMA(0,1,1)x(0,1,1)_12

Except for one or two outliers, the model fits well.


Checking whether we should have P=1? #

AirPassFit4 <- sarima(lx, 0, 1, 1, 1, 1, 1, 12)  # ARIMA(0,1,1)x(1,1,1)_12
AirPassFit2$ICs
##       AIC      AICc       BIC 
## -3.690069 -3.689354 -3.624225
AirPassFit4$ICs
##       AIC      AICc       BIC 
## -3.678726 -3.677284 -3.590934

All information criteria prefer the ARIMA(0,1,1)×(0,1,1)_12 model again.


AirPassFit4 <- sarima(lx, 0, 1, 1, 1, 1, 1, 12)  # ARIMA(0,1,1)x(1,1,1)_12

Complete case study: GNP data #

plot(gnp, main = "Quarterly U.S. GNP from 1947(1) to 2002(3)")

GNP data seems to have exponential growth,
so a log transformation might be appropriate.


acf2(gnp, 50, main = "Sample ACF and PACF of the GNP data. Lag is in terms of years.")

...
##      [,1] [,2]  [,3] [,4] [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11]
## ACF  0.99 0.97  0.96 0.94 0.93  0.91  0.90  0.88  0.87  0.85  0.83
## PACF 0.99 0.00 -0.02 0.00 0.00 -0.02 -0.02 -0.02 -0.01 -0.02  0.00
...

The slow decay of the ACF suggests that differencing may be appropriate.


Making those two modifications leads to

gnp.log.diff <- diff(log(gnp))  # quarterly growth rate
ts.plot(gnp.log.diff, main = "U.S. GNP quarterly growth rate. The horizontal line displays the average growth of the process, which is close to 1%.")
abline(mean(gnp.log.diff), 0, col = "blue", lwd = 2)


acf2(gnp.log.diff, 24)

...
##      [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11]
## ACF  0.35 0.19 -0.01 -0.12 -0.17 -0.11 -0.09 -0.04 0.04  0.05  0.03
## PACF 0.35 0.08 -0.11 -0.12 -0.09  0.01 -0.03 -0.02 0.05  0.01 -0.03
...

GNP data: MA(2) fit #

GNP.MA2 <- sarima(gnp.log.diff, 0, 0, 2)  # MA(2)
GNP.MA2$fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ma1     ma2   xmean
##       0.3028  0.2035  0.0083
## s.e.  0.0654  0.0644  0.0010
## 
## sigma^2 estimated as 8.919e-05:  log likelihood = 719.96,  aic = -1431.93
GNP.MA2$ICs
##       AIC      AICc       BIC 
## -6.450133 -6.449637 -6.388823

GNP data: AR(1) fit #

GNP.AR1 <- sarima(gnp.log.diff, 1, 0, 0)  # AR(1)
GNP.AR1$fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1   xmean
##       0.3467  0.0083
## s.e.  0.0627  0.0010
## 
## sigma^2 estimated as 9.03e-05:  log likelihood = 718.61,  aic = -1431.22
GNP.AR1$ICs
##       AIC      AICc       BIC 
## -6.446940 -6.446693 -6.400958

GNP data: ARMA(1,2) fit #

GNP.ARMA <- sarima(gnp.log.diff, 1, 0, 2)  # ARMA(1,2)
GNP.ARMA$fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ma1     ma2   xmean
##       0.2407  0.0761  0.1623  0.0083
## s.e.  0.2066  0.2026  0.0851  0.0010
## 
## sigma^2 estimated as 8.877e-05:  log likelihood = 720.47,  aic = -1430.95
GNP.ARMA$ICs
##       AIC      AICc       BIC 
## -6.445712 -6.444882 -6.369075

GNP data: Comparison #

Note that the fitted models are:

  • MA(2): $\hat{x}_t = 0.008_{(0.001)} + 0.303_{(0.065)}\,\hat{w}_{t-1} + 0.204_{(0.064)}\,\hat{w}_{t-2} + \hat{w}_t$, where $\hat{\sigma}_w = 0.0094$.
  • AR(1): $\hat{x}_t = 0.008_{(0.001)}\,(1 - 0.347) + 0.347_{(0.063)}\,\hat{x}_{t-1} + \hat{w}_t$, where $\hat{\sigma}_w = 0.0095$.

  • In fact, both models are nearly the same. This is because the AR model can be rewritten (ignoring the constant) as $x_t \approx 0.35\,w_{t-1} + 0.12\,w_{t-2} + w_t$, where the coefficients were obtained via
formatC(ARMAtoMA(ar = 0.35, ma = 0, 6), digits = 3)
## [1] "0.35"    "0.122"   "0.0429"  "0.015"   "0.00525" "0.00184"

GNP data: Model selection #

  • Information criteria (the lower the better):
    • AR(1): AIC = -6.446940, AICc = -6.446693, BIC = -6.400958
    • MA(2): AIC = -6.450133, AICc = -6.449637, BIC = -6.388823
    • ARMA(1,2): AIC = -6.445712, AICc = -6.444882, BIC = -6.369075
  • The AIC and AICc both prefer the MA(2) fit to the AR(1).
  • The BIC prefers the simpler AR(1) model to MA(2).
  • It is often the case that the BIC will select a model of smaller order than the AIC or AICc. In either case, it is not unreasonable to retain the AR(1) because pure autoregressive models are easier to work with.
  • Combining the two to ARMA(1,2) leads to poorer scores.

Side comment: compare with ARIMA(1,1,2) on log(GNP) #

GNP.ARIMA112 <- sarima(log(gnp), 1, 1, 2)  # ARIMA(1,1,2) on the log scale
GNP.ARIMA112$fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ma1     ma2  constant
##       0.2407  0.0761  0.1623    0.0083
## s.e.  0.2066  0.2026  0.0851    0.0010
## 
## sigma^2 estimated as 8.877e-05:  log likelihood = 720.47,  aic = -1430.95
GNP.ARIMA112$ICs
##       AIC      AICc       BIC 
## -6.445711 -6.444880 -6.369074

Appendix: Method of moments for AR(p) models #

We seek to estimate $\phi_1, \ldots, \phi_p$ in the AR(p) model $x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t$. A method of moments approach leads us back to the Yule-Walker equations $$\gamma(h) = \phi_1 \gamma(h-1) + \cdots + \phi_p \gamma(h-p), \qquad h = 1, 2, \ldots, p,$$ $$\sigma_w^2 = \gamma(0) - \phi_1 \gamma(1) - \cdots - \phi_p \gamma(p).$$ In matrix notation, these become $$\Gamma_p \phi = \gamma_p \qquad \text{and} \qquad \sigma_w^2 = \gamma(0) - \phi' \gamma_p,$$ where $\Gamma_p = \{\gamma(k-j)\}_{j,k=1}^{p}$ is a $p \times p$ matrix, $\phi = (\phi_1, \ldots, \phi_p)'$ and $\gamma_p = (\gamma(1), \ldots, \gamma(p))'$.


Using the method of moments, we replace $\gamma(h)$ by $\hat{\gamma}(h)$ and solve $$\hat{\phi} = \hat{\Gamma}_p^{-1} \hat{\gamma}_p, \qquad \hat{\sigma}_w^2 = \hat{\gamma}(0) - \hat{\gamma}_p' \hat{\Gamma}_p^{-1} \hat{\gamma}_p.$$ These are called the Yule-Walker estimators. Equivalently, $$\hat{\phi} = \hat{R}_p^{-1} \hat{\rho}_p, \qquad \hat{\sigma}_w^2 = \hat{\gamma}(0)\left[1 - \hat{\rho}_p' \hat{R}_p^{-1} \hat{\rho}_p\right],$$ where $\hat{R}_p = \{\hat{\rho}(k-j)\}_{j,k=1}^{p}$ is a $p \times p$ matrix and $\hat{\rho}_p = (\hat{\rho}(1), \ldots, \hat{\rho}(p))'$.

  • For AR(p) models, if the sample size is large, the Yule-Walker estimators are approximately normally distributed, and $\hat{\sigma}_w^2$ is close to the true value of $\sigma_w^2$.
  • The Yule-Walker estimators are essentially least squares estimators, which work well on AR models because these are linear in the parameters. MA (and ARMA) processes are nonlinear in the parameters, and will fare badly with the Yule-Walker estimators. (A "by hand" computation of the Yule-Walker estimates is sketched below.)
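
A sketch of the Yule-Walker computation done "by hand" from the sample autocovariances, mirroring the formulas above (compare with ar.yw below; small differences can arise from how the sample mean is handled):

p <- 2
g <- acf(rec, lag.max = p, type = "covariance", plot = FALSE)$acf[, , 1]  # gamma_hat(0), ..., gamma_hat(p)
Gamma_hat <- toeplitz(g[1:p])                 # p x p matrix {gamma_hat(k - j)}
gamma_hat <- g[2:(p + 1)]                     # (gamma_hat(1), ..., gamma_hat(p))'
phi_hat <- solve(Gamma_hat, gamma_hat)        # Yule-Walker estimates of phi_1, ..., phi_p
sigw2_hat <- g[1] - sum(phi_hat * gamma_hat)  # estimate of sigma_w^2
phi_hat
sigw2_hat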

Example: Recruitment series #

Yule-Walker estimators are implemented in R as follows:

rec.yw <- ar.yw(rec, order = 2)
rec.yw$x.mean  #  (mean estimate)
## [1] 62.26278
rec.yw$ar  # (coefficient estimates)
## [1]  1.3315874 -0.4445447
sqrt(diag(rec.yw$asy.var.coef))  # (standard errors)
## [1] 0.04222637 0.04222637
rec.yw$var.pred  # (error variance estimate)
## [1] 94.79912

The fit is then displayed as follows:

ts.plot(rec, main = "Results of fit using the Yule-Walker estimators")
lines(rec[1:length(rec)] - rec.yw$resid, col = "blue", lwd = 1)

The estimators obtained are nearly identical to the MLE ones (although MLE has lower variance as expected). It can be shown that this will typically be the case for AR models.

Comparison of estimation techniques via residuals of rec #

ts.plot(rec.yw$resid - rec.arima$fit$residuals, ylab = "Difference in residuals",
  main = "Residuals of {black: Yule-Walker, blue: Regression} - Residuals of ARIMA(2,0,0) fit")
abline(a = mean((rec.yw$resid[3:length(rec)] - rec.arima$fit$residuals[3:length(rec)])^1),
  0, col = "black")
lines(regr$resid - rec.arima$fit$residuals, col = "blue")
abline(a = mean((regr$resid[3:length(rec)] - rec.arima$fit$residuals[3:length(rec)])^1),
  0, col = "blue")


ts.plot(regr$resid - rec.arima$fit$residuals, col = "blue", ylab = "Difference in residuals",
  main = "Residuals of {black: Yule-Walker (only mean), blue: Regression, red: MLE} - Residuals of ARIMA(2,0,0) fit")
abline(a = mean((regr$resid[3:length(rec)] - rec.arima$fit$residuals[3:length(rec)])^1),
  0, col = "blue")
lines(rec.mle$resid - rec.arima$fit$residuals, col = "red")
abline(a = mean((rec.mle$resid[3:length(rec)] - rec.arima$fit$residuals[3:length(rec)])),
  0, col = "red")
# lines(rec.yw$resid-rec.arima$fit$residuals,col='black')
abline(a = mean((rec.yw$resid[3:length(rec)] - rec.arima$fit$residuals[3:length(rec)])^1),
  0, col = "black")


Model residuals / innovations ê               Mean     Mean Δ to arima   ∑ ê²
regr$resid[3:length(rec)]                     0.000    -0.020            40343
rec.yw$resid[3:length(rec)]                   -0.021   -0.041            40369
rec.mle$resid[3:length(rec)]                  -0.020   -0.040            40344
rec.arima$fit$residuals[3:length(rec)]        0.020    -                 404344
  • Yule-Walker and MLE are very close (location) as pointed out earlier.
  • Yule-Walker is focussed on moments irrespective of residuals, so it is the poorest in terms of least squares.
  • As compared to Yule-Walker and MLE, the arima methodology sacrifices some least-squares performance in favour of a better fit (the mean of its residuals is closer to 0). The flatness of the MLE curve (in red) suggests arima behaves like a shifted MLE. It seems to be the best compromise.
  • The regression is pure least squares, so it minimises the sum of squares and has residuals with mean 0. However, it cannot be the preferred model as it ignores the fact that the model is autoregressive: the regressors are lagged values of the series itself, not an independent, assumed-known series. Its forecasts (and their standard errors) should not be trusted.
    (See the next slide for the code that generated these numbers.)

mean((regr$resid[3:length(rec)] - rec.arima$fit$residuals[3:length(rec)])^1)
## [1] 0.01208978
mean((rec.yw$resid[3:length(rec)] - rec.arima$fit$residuals[3:length(rec)])^1)
## [1] -0.04591007
mean((rec.mle$resid[3:length(rec)] - rec.arima$fit$residuals[3:length(rec)])^1)
## [1] -0.04432752
mean(regr$resid[3:length(rec)]^1)
## [1] -3.17507e-13
mean(rec.yw$resid[3:length(rec)]^1)
## [1] -0.05799985
mean(rec.mle$resid[3:length(rec)]^1)
## [1] -0.05641729
mean(rec.arima$fit$residuals[3:length(rec)]^1)
## [1] -0.01208978
sum(regr$resid[3:length(rec)]^2)
## [1] 40462.39
sum(rec.yw$resid[3:length(rec)]^2)
## [1] 40491.55
sum(rec.mle$resid[3:length(rec)]^2)
## [1] 40464.39
sum(rec.arima$fit$residuals[3:length(rec)]^2)
## [1] 40463.03

Forecasting #

Introduction #

  • In forecasting, the goal is to predict future values of a time series, $x_{n+m}$, $m = 1, 2, \ldots$, based on the data collected to the present, $x_{1:n} = \{x_1, x_2, \ldots, x_n\}$.
  • We assume here that xt is stationary and the model parameters are known.
  • The problem of forecasting when the model parameters are unknown is more complicated. We mostly focus on performing predictions using R (which handles this appropriately), rather than on the deep technicalities, which are outside the scope of this course.

Best Linear Predictors (BLPs) #

We will restrict our attention to predictors that are linear functions of the data, that is, predictors of the form $$x_{n+m}^n = \alpha_0 + \sum_{k=1}^{n} \alpha_k x_k,$$ where $\alpha_0, \alpha_1, \ldots, \alpha_n$ are real numbers. These are called Best Linear Predictors (BLPs).

Note:

  • In fact the α’s depend on m too, but that is not reflected in the notation.
  • Such estimators depend only on the second-order moments of the process, which are easy to estimate from the data.
  • Most actuarial credibility estimators belong to the family of BLPs (Bühlmann, Bühlmann-Straub, ...).

Best Linear Prediction for Stationary Processes #

Given data $x_1, \ldots, x_n$, the best linear predictor $x_{n+m}^n = \alpha_0 + \sum_{k=1}^{n} \alpha_k x_k$ of $x_{n+m}$, for $m \ge 1$, is found by solving $$\mathrm{E}\left[(x_{n+m} - x_{n+m}^n)\,x_k\right] = 0, \qquad k = 0, 1, \ldots, n,$$ where $x_0 = 1$, for $\alpha_0, \alpha_1, \ldots, \alpha_n$.

  • The $n+1$ equations specified above are called the prediction equations, and they are used to solve for the coefficients $\{\alpha_0, \alpha_1, \ldots, \alpha_n\}$.
  • This result stems from minimising the mean squared prediction error (least squares).

If $\mathrm{E}[x_t] = \mu$, the first equation ($k = 0$) implies $\mathrm{E}[x_{n+m}^n] = \mathrm{E}[x_{n+m}] = \mu$. Thus, taking the expectation of the BLP leads to $\mu = \alpha_0 + \sum_{k=1}^{n} \alpha_k \mu$, or $\alpha_0 = \mu\left(1 - \sum_{k=1}^{n} \alpha_k\right)$. Hence, the form of the BLP is $$x_{n+m}^n = \mu + \sum_{k=1}^{n} \alpha_k (x_k - \mu).$$ Henceforth, we will assume that $\mu = 0$ (so that $\alpha_0 = 0$), without loss of generality (as long as the parameters are assumed known).

One-step-ahead prediction #

  • Given $x_{1:n} = \{x_1, x_2, \ldots, x_n\}$, we wish to forecast the value of the time series at the next point in time, $x_{n+1}$.
  • The BLP of $x_{n+1}$ is of the form $x_{n+1}^n = \phi_{n1} x_n + \phi_{n2} x_{n-1} + \cdots + \phi_{nn} x_1$, where we now display the dependence of the coefficients on $n$.
  • In this case, $\alpha_k$ is $\phi_{n,n+1-k}$ for $k = 1, \ldots, n$.
  • Using the BLP result above, the coefficients $\{\phi_{n1}, \phi_{n2}, \ldots, \phi_{nn}\}$ satisfy $\sum_{j=1}^{n} \phi_{nj}\,\gamma(k-j) = \gamma(k)$ for $k = 1, \ldots, n$, or $\Gamma_n \phi_n = \gamma_n$, where $\Gamma_n = \{\gamma(k-j)\}_{j,k=1}^{n}$ is an $n \times n$ matrix, $\phi_n = (\phi_{n1}, \ldots, \phi_{nn})'$, and $\gamma_n = (\gamma(1), \ldots, \gamma(n))'$.
    [Note that these correspond to the Yule-Walker equations.]

  • The matrix $\Gamma_n$ is nonnegative definite (and in fact guaranteed to be positive definite for ARMA models). We then have $\phi_n = \Gamma_n^{-1}\gamma_n$.
  • The one-step-ahead forecast is then $x_{n+1}^n = \phi_n'\,x$, where $x = (x_n, x_{n-1}, \ldots, x_1)'$.
  • The mean square one-step-ahead prediction error is $P_{n+1}^n = \mathrm{E}\left[(x_{n+1} - x_{n+1}^n)^2\right] = \gamma(0) - \gamma_n' \Gamma_n^{-1} \gamma_n$. (A numerical sketch of these computations follows.)
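
A numerical sketch: for an AR(2) with (assumed known) parameters close to the Recruitment fit, the BLP weights are obtained by solving $\Gamma_n \phi_n = \gamma_n$ directly, with the autocorrelations computed by ARMAacf:

phi <- c(1.35, -0.46); sigw2 <- 90             # illustrative "known" AR(2) parameters
n <- 5                                         # small sample size for display
rho <- ARMAacf(ar = phi, lag.max = n)          # rho(0), ..., rho(n)
gamma0 <- sigw2 / (1 - sum(phi * rho[2:3]))    # gamma(0) from gamma(0) - phi1*gamma(1) - phi2*gamma(2) = sigma_w^2
gam <- gamma0 * rho                            # gamma(0), ..., gamma(n)
Gamma_n <- toeplitz(gam[1:n])                  # {gamma(k - j)}, an n x n matrix
gamma_n <- gam[2:(n + 1)]                      # (gamma(1), ..., gamma(n))'
phi_n <- solve(Gamma_n, gamma_n)               # BLP weights phi_n1, ..., phi_nn
phi_n                                          # ~ (1.35, -0.46, 0, 0, 0) for an AR(2)
gam[1] - sum(phi_n * gamma_n)                  # P_{n+1}^n, here equal to sigma_w^2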

The Durbin-Levinson Algorithm #

The prediction equations (and the associated mean-square errors) of the one-step-ahead prediction can be found iteratively thanks to the Durbin-Levinson Algorithm (an R transcription is sketched after the steps below):

  • Initial values: $\phi_{00} = 0$, $P_1^0 = \gamma(0)$.
  • For all $n \ge 1$, the last $\phi$ is: $\phi_{nn} = \dfrac{\rho(n) - \sum_{k=1}^{n-1} \phi_{n-1,k}\,\rho(n-k)}{1 - \sum_{k=1}^{n-1} \phi_{n-1,k}\,\rho(k)}$.
  • If $n \ge 2$, the middle $\phi$'s are: $\phi_{nk} = \phi_{n-1,k} - \phi_{nn}\,\phi_{n-1,n-k}$, for $k = 1, 2, \ldots, n-1$.
  • For $n \ge 1$, the prediction error is $P_{n+1}^n = P_n^{n-1}\left(1 - \phi_{nn}^2\right) = \gamma(0)\prod_{j=1}^{n}\left[1 - \phi_{jj}^2\right]$.
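
A direct transcription of the algorithm in R (a sketch; it takes $\rho(1), \ldots, \rho(n)$ and $\gamma(0)$ as inputs and returns the $\phi_{nk}$, the PACF values $\phi_{nn}$, and the prediction errors $P_{n+1}^n$):

durbin_levinson <- function(rho, gamma0) {
  n <- length(rho)                       # rho = (rho(1), ..., rho(n))
  phi <- matrix(0, n, n)                 # phi[k, j] holds phi_{kj}
  P <- numeric(n + 1); P[1] <- gamma0    # P[k + 1] stores P_{k+1}^k; P[1] = P_1^0 = gamma(0)
  phi[1, 1] <- rho[1]
  P[2] <- P[1] * (1 - phi[1, 1]^2)
  if (n >= 2) for (k in 2:n) {
    phi[k, k] <- (rho[k] - sum(phi[k - 1, 1:(k - 1)] * rho[(k - 1):1])) /
                 (1 - sum(phi[k - 1, 1:(k - 1)] * rho[1:(k - 1)]))
    phi[k, 1:(k - 1)] <- phi[k - 1, 1:(k - 1)] - phi[k, k] * phi[k - 1, (k - 1):1]
    P[k + 1] <- P[k] * (1 - phi[k, k]^2)
  }
  list(phi = phi, pacf = diag(phi), P = P)
}

# e.g. for the AR(2) used earlier: the PACF is (rho(1), phi_2, 0, 0, ...)
dl <- durbin_levinson(ARMAacf(ar = c(1.35, -0.46), lag.max = 5)[-1], gamma0 = 1)
dl$pacf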

Concluding notes #

  • The results shown above are provided to illustrate the process of forecasting. Students are not expected to be able to show the results.
  • There are other algorithms for calculating one-step-ahead forecasts, such as the innovations algorithm (outside scope).
  • The Durbin-Levinson Algorithm can be adapted to calculate the PACF iteratively.

Forecasting ARMA processes #

Introduction #

  • The technical side of forecasting ARMA models can get involved.
  • Throughout, we assume $x_t$ is a causal and invertible ARMA(p,q) process $\phi(B)x_t = \theta(B)w_t$, where $w_t \overset{\text{iid}}{\sim} N(0, \sigma_w^2)$.
  • In the non-zero-mean case $\mathrm{E}[x_t] = \mu_x$, simply replace $x_t$ with $x_t - \mu_x$ in the model.

Two types of forecasts #

We consider two types of forecasts:

  1. The minimum mean square error predictor of $x_{n+m}$ based on the data $\{x_n, \ldots, x_1\}$, defined as $x_{n+m}^n = \mathrm{E}[x_{n+m} \mid x_n, x_{n-1}, \ldots, x_1]$.
  2. The predictor of $x_{n+m}$ based on the infinite past, defined as $\tilde{x}_{n+m} = \mathrm{E}[x_{n+m} \mid x_n, x_{n-1}, \ldots, x_1, x_0, x_{-1}, \ldots]$.

Note:

  • For ARMA models, $\tilde{x}_{n+m}$ is easier to calculate.
  • In general, $x_{n+m}^n$ and $\tilde{x}_{n+m}$ are not the same.
  • The idea is that, for large samples, $\tilde{x}_{n+m}$ provides a good approximation to $x_{n+m}^n$.

Forecasts for more than one step #

Write $x_{n+m}$ in its invertible form: $$w_{n+m} = \sum_{j=0}^{\infty} \pi_j x_{n+m-j}, \qquad \pi_0 = 1.$$ Taking conditional expectations (given the infinite past up to time $n$) we have $$0 = \tilde{x}_{n+m} + \sum_{j=1}^{\infty} \pi_j \tilde{x}_{n+m-j}.$$ Because $\mathrm{E}[x_t \mid x_n, x_{n-1}, \ldots, x_0, x_{-1}, \ldots] = x_t$ for $t \le n$, this can be rewritten as $$\tilde{x}_{n+m} = -\sum_{j=1}^{m-1} \pi_j \tilde{x}_{n+m-j} - \sum_{j=m}^{\infty} \pi_j x_{n+m-j},$$ the second sum of which involves only past (observed) values.


Prediction is then accomplished recursively using $$\tilde{x}_{n+m} = -\sum_{j=1}^{m-1} \pi_j \tilde{x}_{n+m-j} - \sum_{j=m}^{\infty} \pi_j x_{n+m-j},$$ starting with the one-step-ahead forecast ($m = 1$), and then continuing for $m = 2, 3, \ldots$.

This illustrates why it is easier to forecast with $\tilde{x}_{n+m}$: thanks to the infinite-past representation of $x_t$, extending the conditioning of the predictor $x_{n+m}^n$ to the infinite past allowed the simple calculations above.

The mean-square prediction error is calculated next, but we first need an intermediary result: an expression for $\tilde{x}_{n+m}$ as a function of the $w$'s (so that the variance can be calculated easily).


Write $x_{n+m}$ in its causal form: $$x_{n+m} = \sum_{j=0}^{\infty} \psi_j w_{n+m-j}, \qquad \psi_0 = 1.$$ Taking conditional expectations we have $$\tilde{x}_{n+m} = \sum_{j=0}^{\infty} \psi_j \tilde{w}_{n+m-j} = \sum_{j=m}^{\infty} \psi_j w_{n+m-j},$$ because $$\tilde{w}_t = \mathrm{E}[w_t \mid x_n, x_{n-1}, \ldots, x_0, x_{-1}, \ldots] = \begin{cases} 0 & t > n, \\ w_t & t \le n. \end{cases}$$


The mean-square prediction error can now be calculated: since $\tilde{x}_{n+m} = \sum_{j=m}^{\infty} \psi_j w_{n+m-j}$, we have $$x_{n+m} - \tilde{x}_{n+m} = \sum_{j=0}^{m-1} \psi_j w_{n+m-j},$$ and hence $$P_{n+m}^n = \mathrm{E}\left[(x_{n+m} - \tilde{x}_{n+m})^2\right] = \sigma_w^2 \sum_{j=0}^{m-1} \psi_j^2.$$

Note that, for a fixed sample size $n$, the prediction errors are correlated. That is, for time lag $k \ge 1$, $$\mathrm{E}\left[(x_{n+m} - \tilde{x}_{n+m})(x_{n+m+k} - \tilde{x}_{n+m+k})\right] = \sigma_w^2 \sum_{j=0}^{m-1} \psi_j \psi_{j+k}.$$
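
A sketch of these prediction-error formulas in R, using ARMAtoMA to obtain the $\psi$-weights (the ARMA(1,1) parameters are illustrative):

phi <- 0.9; theta <- 0.5; sigw2 <- 1                              # illustrative ARMA(1,1) parameters
m.max <- 4
psi <- c(1, ARMAtoMA(ar = phi, ma = theta, lag.max = m.max - 1))  # psi_0, ..., psi_{m.max - 1}
sigw2 * cumsum(psi^2)                                             # P_{n+1}^n, P_{n+2}^n, ..., P_{n+m.max}^n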

Predictions in practice #

  • When $n$ is small, the system of "prediction equations" $\mathrm{E}[(x_{n+m} - x_{n+m}^n)\,x_k] = 0$, $k = 0, 1, \ldots, n$ (where $x_0 = 1$), can be solved directly for $\alpha_0, \alpha_1, \ldots, \alpha_n$.
  • However, when $n$ is large one will want to use $\tilde{x}_{n+m} = -\sum_{j=1}^{m-1} \pi_j \tilde{x}_{n+m-j} - \sum_{j=m}^{\infty} \pi_j x_{n+m-j}$. This is correct "in theory" but, unfortunately, in practice we do not observe $x_0, x_{-1}, x_{-2}, \ldots$; only the data $x_1, x_2, \ldots, x_n$ are available. We then need to truncate the infinite sum on the RHS.

  • The truncated predictor is then written as $\tilde{x}_{n+m}^n = -\sum_{j=1}^{m-1} \pi_j \tilde{x}_{n+m-j}^n - \sum_{j=m}^{n+m-1} \pi_j x_{n+m-j}$, which is applied recursively as described above.
  • The mean square prediction error, in this case, is approximated using $P_{n+m}^n$ as before.

Truncated prediction for ARMA #

For ARMA(p,q) models, the truncated predictors for $m = 1, 2, \ldots$ are $$\tilde{x}_{n+m}^n = \phi_1 \tilde{x}_{n+m-1}^n + \cdots + \phi_p \tilde{x}_{n+m-p}^n + \theta_1 \tilde{w}_{n+m-1}^n + \cdots + \theta_q \tilde{w}_{n+m-q}^n,$$ where $\tilde{x}_t^n = x_t$ for $1 \le t \le n$ and $\tilde{x}_t^n = 0$ for $t \le 0$. The truncated prediction errors are given by $$\tilde{w}_t^n = \begin{cases} 0 & t \le 0 \text{ or } t > n, \\ \phi(B)\tilde{x}_t^n - \theta_1 \tilde{w}_{t-1}^n - \cdots - \theta_q \tilde{w}_{t-q}^n & 1 \le t \le n. \end{cases}$$ (See Example 3.24 of Shumway and Stoffer (2017) for an illustration.) Note:

  • For AR(p) models with $n > p$, we have $\tilde{x}_{n+m}^n = \tilde{x}_{n+m} = x_{n+m}^n$, and there is no need for approximations.
  • The above approximation is required for MA(q) and ARMA(p,q) models with $q > 0$ (a recursion sketch for an ARMA(1,1) follows).
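
A sketch of the truncated recursion for an ARMA(1,1) with (assumed known) parameters; the simulated series and the seed are purely illustrative:

set.seed(90210)                                 # illustrative simulated series
phi <- 0.9; theta <- 0.5
x <- as.numeric(arima.sim(list(ar = phi, ma = theta), n = 100))
n <- length(x); m.max <- 10

# truncated innovations w~_t^n for t = 1, ..., n (they are zero for t <= 0)
w <- numeric(n)
w[1] <- x[1]                                    # uses x~_0^n = 0 and w~_0^n = 0
for (t in 2:n) w[t] <- x[t] - phi * x[t - 1] - theta * w[t - 1]

# truncated forecasts x~_{n+m}^n for m = 1, ..., m.max
xf <- numeric(m.max)
xf[1] <- phi * x[n] + theta * w[n]              # m = 1: last observation and last innovation
for (m in 2:m.max) xf[m] <- phi * xf[m - 1]     # for m >= 2 all future w~'s are zero
xf                                              # settles geometrically towards the (zero) mean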

Long-range forecasts #

  • Consider forecasting an ARMA process with mean μx.
  • Replacing $x_{n+m}$ with $x_{n+m} - \mu_x$ in the causal representation above, and taking expectations in an analogous way, implies that the $m$-step-ahead forecast can be written as $\tilde{x}_{n+m} = \mu_x + \sum_{j=m}^{\infty} \psi_j w_{n+m-j}$.
  • Because the $\psi_j$ dampen to zero exponentially fast, $\tilde{x}_{n+m} \to \mu_x$ exponentially fast as $m \to \infty$.
  • Moreover, the mean square prediction error $P_{n+m}^n \to \sigma_w^2 \sum_{j=0}^{\infty} \psi_j^2 = \gamma_x(0) = \sigma_x^2$ exponentially fast as $m \to \infty$.
  • ARMA forecasts quickly settle to the mean with a constant prediction error as the forecast horizon, m, grows.

Example: Recruitment Series #

fore2 <- predict(rec.arima0, n.ahead = 36)
cbind(fore2$pred, fore2$se)
...
##          fore2$pred  fore2$se
## Oct 1987   20.36547  9.451686
## Nov 1987   26.08036 15.888378
## Dec 1987   32.65148 20.464325
## Jan 1988   38.89474 23.492457
## Feb 1988   44.30006 25.393693
## Mar 1988   48.72437 26.537088
## Apr 1988   52.20958 27.199368
## May 1988   54.87831 27.570234
## Jun 1988   56.87693 27.771616
## Jul 1988   58.34666 27.877923
## Aug 1988   59.41079 27.932597
## Sep 1988   60.17081 27.960042
## Oct 1988   60.70697 27.973508
## Nov 1988   61.08091 27.979974
## Dec 1988   61.33890 27.983014
...

ts.plot(rec, fore2$pred, col = 1:2, xlim = c(1980, 1990.5), ylab = "Recruitment")
U <- fore2$pred + fore2$se  # upper bound: forecast + 1 prediction standard error
L <- fore2$pred - fore2$se  # lower bound: forecast - 1 prediction standard error
xx <- c(time(U), rev(time(U)))
yy <- c(L, rev(U))
polygon(xx, yy, border = 8, col = gray(0.6, alpha = 0.2))  # shaded +/- 1 s.e. band
lines(fore2$pred, type = "p", col = 2)

Example: Air Passengers #

We forecast the model chosen in the previous section:

sarima.for(lx, 12, 0, 1, 1, 0, 1, 1, 12)

Including seasonality leads (apparently) to much higher precision.

References #

Shumway, Robert H., and David S. Stoffer. 2017. Time Series Analysis and Its Applications: With R Examples. Springer.