M9 Time Series Models

Introduction (TS 3.1, 3.3, 3.6, S6) #

Overview of main models #

We are now ready to introduce the main models with full generality, and to discuss an overall estimation, fitting, and forecasting approach. The main models we will review are:

  • Autoregressive models of order $p$, denoted AR($p$): $x_t = \alpha + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t$, where $\alpha = \mu(1 - \phi_1 - \phi_2 - \dots - \phi_p)$.
  • Moving averages of order $q$, denoted MA($q$): $x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q}$.
  • ARMA($p,q$) models, which are a combination of the above: $x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q}$.

Variations #

  • ARIMA(p,d,q) models, which reduce to ARMA(p,q) when differenced d times.
  • Multiplicative seasonal ARIMA models:
    • Pure seasonal autoregressive moving average ARMA(P,Q)s models
    • Multiplicative seasonal autoregressive moving average models
    • Seasonal autoregressive integrated moving average models
  • Multivariate time series—Vector Auto-Regressive VAR(p) models
  • Extra models mentioned by the CS2 syllabus:
    • bilinear model
    • threshold autoregressive model
    • ARCH(p) models

Partial Autocorrelation Function (PACF) #

Motivation #

  • There is one more property we need to introduce, the Partial Autocorrelation Function (PACF).
  • The ACF can help identify $q$ for an MA($q$) model, because it should be 0 for lags $> q$.
  • However, the order of an AR(p) or ARMA(p,q) model cannot be recognised from the ACF.
  • What we need for autoregressive models is a measure of dependence at two different points in time,
    with the effect of the other points in time removed.
  • We will remove the linear effect of all variables in-between, and so by increasing the lag iteratively, this partial correlation should be 0 after some lag ( p for AR(p))

Reminder: partial correlation #

  • Let X, Y, and Z be random variables.
  • The partial correlation between X and Y, given Z, is obtained by
    • regressing X on Z to obtain $\hat X$;
    • regressing Y on Z to obtain $\hat Y$.
  • Then the partial correlation is $\rho_{XY|Z} = \text{corr}\{X - \hat X,\, Y - \hat Y\}$.
  • In the special case where the random variables are multivariate normal (and hence dependence is linear), $\rho_{XY|Z} = \text{corr}\{X, Y \mid Z\}$. (Otherwise, $\hat X$ and $\hat Y$ are only linear projections, not the full story.)

The partial autocorrelation function (PACF) #

The partial autocorrelation function (PACF) of a stationary process $x_t$, denoted $\phi_{hh}$, for $h = 1, 2, \dots$, is $\phi_{11} = \text{corr}(x_{t+1}, x_t) = \rho(1)$, and $$\phi_{hh} = \text{corr}(x_{t+h} - \hat x_{t+h},\; x_t - \hat x_t), \quad h \geq 2.$$ Note:

  • The PACF, $\phi_{hh}$, is the correlation between $x_{t+h}$ and $x_t$ with the linear dependence of $\{x_{t+1}, \dots, x_{t+h-1}\}$ on each removed.
  • The PACF of an AR($p$) model will be 0 for $h > p$.
  • The R function pacf will display a sample $\phi_{hh}$ for $h > 0$.

Examples of AR PACFs #

set.seed(2)
AR1 <- arima.sim(list(order = c(1, 0, 0), ar = 0.8), n = 10000)
AR2 <- arima.sim(list(order = c(2, 0, 0), ar = c(0.8, -0.1)),
  n = 10000)
AR5 <- arima.sim(list(order = c(5, 0, 0), ar = c(0.8, -0.8, 0.5,
  -0.5, 0.3)), n = 10000)
par(mfrow = c(2, 3))
acf(AR1)
acf(AR2)
acf(AR5)
pacf(AR1)
pacf(AR2)
pacf(AR5)

Examples of MA PACFs #

set.seed(2)
MA1 <- arima.sim(list(order = c(0, 0, 1), ma = 0.8), n = 10000)
MA2 <- arima.sim(list(order = c(0, 0, 2), ma = c(0.8, -0.1)),
  n = 10000)
MA5 <- arima.sim(list(order = c(0, 0, 5), ma = c(0.8, -0.8, 0.5,
  -0.5, 0.3)), n = 10000)
par(mfrow = c(2, 3))
acf(MA1)
acf(MA2)
acf(MA5)
pacf(MA1)
pacf(MA2)
pacf(MA5)

AR(p) models (TS 3.1, 3.3, 3.6, S6) #

Definition #

AR(p) models #

An autoregressive model of order $p$, AR($p$), is of the form $$x_t = \alpha + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t,$$ where $x_t$ is stationary, $w_t \sim wn(0, \sigma_w^2)$, $\phi_1, \phi_2, \dots, \phi_p$ are constants ($\phi_p \neq 0$), and where $\alpha = \mu(1 - \phi_1 - \phi_2 - \dots - \phi_p)$.

Assume $\alpha = 0$ and rewrite $$(1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p)x_t = w_t \quad\text{or}\quad \phi(B)x_t = w_t.$$ Note:

  • Unless stated otherwise we assume α=0. The mean of xt is then zero.
  • Recall that here the regressors xt1,,xtp are random components (as opposed to the ‘usual’ regression).
  • The autoregressive operator is defined as $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$, the roots of which will be crucial for the analysis.

Example: AR(1) model #

An AR(1) is $$x_t = \phi x_{t-1} + w_t = \phi(\phi x_{t-2} + w_{t-1}) + w_t = \dots = \phi^k x_{t-k} + \sum_{j=0}^{k-1}\phi^j w_{t-j}.$$ If one continues to iterate backwards, one gets (provided $|\phi| < 1$ and $\sup_t \text{Var}(x_t) < \infty$) $$x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j},$$ a linear process! This is called the stationary solution of the model.


The stationary solution of the model has mean $E[x_t] = \sum_{j=0}^{\infty}\phi^j E[w_{t-j}] = 0$, and autocovariance function (see (3.7) in TS) $$\gamma(h) = \frac{\sigma_w^2\,\phi^h}{1 - \phi^2}, \quad h \geq 0$$ (remember $\gamma(-h) = \gamma(h)$). The ACF of an AR(1) is $$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \phi^h, \quad h \geq 0,$$ and $\rho(h)$ satisfies the recursion $\rho(h) = \phi\,\rho(h-1)$, $h = 1, 2, \dots$.
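A quick numerical sanity check of $\rho(h) = \phi^h$ (an illustrative sketch with $\phi = 0.8$; any $|\phi| < 1$ behaves the same), comparing R's theoretical ACF with the closed form:

phi <- 0.8
rbind(ARMAacf(ar = phi, lag.max = 5),  # theoretical ACF at lags 0 to 5
      phi^(0:5))                       # closed form rho(h) = phi^h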

Sample paths of AR(1) processes #

par(mfrow = c(2, 1))
plot(arima.sim(list(order = c(1, 0, 0), ar = 0.9), n = 100),
  ylab = "x", main = (expression(AR(1) ~ ~~phi == +0.9 ~ ~~sigma^2 ==
    1 ~ ~~rho == 0.9^h)))
plot(arima.sim(list(order = c(1, 0, 0), ar = -0.9), n = 100),
  ylab = "x", main = (expression(AR(1) ~ ~~phi == -0.9 ~ ~~sigma^2 ==
    1 ~ ~~rho == (-0.9)^h)))

Stationarity (and Causality) #

Stationarity of AR(p) processes #

The time series process $x_t$ is stationary and causal if, and only if, the roots of the characteristic polynomial $$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p = 0$$ (which can be complex numbers) are all greater than 1 in absolute value (outside the unit circle).

We will get a sense of why as we make our way through this module.
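In practice, the roots can be checked numerically with polyroot. A minimal sketch, using the same illustrative AR(2) coefficients as the simulation above:

phi <- c(0.8, -0.1)
roots <- polyroot(c(1, -phi))  # roots of phi(z) = 1 - 0.8 z + 0.1 z^2
Mod(roots)                     # all moduli > 1, so stationary and causal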

ACF and PACF #

Autocovariances #

Remember that for stationary $x_t = \sum_{j=1}^p \phi_j x_{t-j} + w_t$ we have $$\gamma(k) = \text{Cov}\Big(\sum_{j=1}^p \phi_j x_{t-j} + w_t,\; x_{t-k}\Big) = \sum_{j=1}^p \phi_j\gamma(k-j) \quad\text{for } k > 0.$$ This is a $p$-th order difference equation with constant coefficients, with solution (see TS 3.2) $$\gamma(k) = \sum_{j=1}^p A_j z_j^{-k} \quad\text{for all } k \geq 0,$$ where $z_1, \dots, z_p$ are the $p$ roots of the characteristic polynomial $\phi(z) = 0$, and $A_1, \dots, A_p$ are constants depending on the initial values.

R can calculate this for you (code is provided later).


Note:

  • we expect $\gamma(k) \to 0$ as $k \to \infty$ for stationary $x_t$
  • this is equivalent to $|z_j| > 1$ for all $j$
  • this is equivalent to our iff condition for $x_t$ to be stationary and causal

Yule-Walker equations #

  • We have a solution form for γ(k), but no explicit solution yet. These need to be solved for.
  • Sometimes, it is quicker to use the so-called “Yule-Walker equations”. Consider the equation for an AR($p$) model with $\mu = 0$, multiply by $x_{t-k}$, and take expectations to get $$\text{Cov}(x_t, x_{t-k}) = \phi_1\text{Cov}(x_{t-1}, x_{t-k}) + \phi_2\text{Cov}(x_{t-2}, x_{t-k}) + \dots + \phi_p\text{Cov}(x_{t-p}, x_{t-k}) + \text{Cov}(w_t, x_{t-k}), \quad 0 \leq k \leq p,$$ that is, $$\gamma(k) = \phi_1\gamma(k-1) + \phi_2\gamma(k-2) + \dots + \phi_p\gamma(k-p) + \sigma_w^2\,1_{\{k=0\}}, \quad 0 \leq k \leq p.$$ This is solvable thanks to the fact that $\gamma(-k) = \gamma(k)$.
  • A matrix representation will be introduced later

Example #

  • For an AR(3) model we have $$\begin{aligned}\gamma(3) &= \phi_1\gamma(2) + \phi_2\gamma(1) + \phi_3\gamma(0),\\ \gamma(2) &= \phi_1\gamma(1) + \phi_2\gamma(0) + \phi_3\gamma(1),\\ \gamma(1) &= \phi_1\gamma(0) + \phi_2\gamma(1) + \phi_3\gamma(2),\\ \gamma(0) &= \phi_1\gamma(1) + \phi_2\gamma(2) + \phi_3\gamma(3) + \sigma_w^2.\end{aligned}$$
  • The second and third equations can be solved linearly to obtain expressions for $\gamma(1)$ and $\gamma(2)$ as a constant times $\gamma(0)$, which yields explicitly $\rho(1)$ and $\rho(2)$ (remember $\rho(k) = \gamma(k)/\gamma(0)$ and $\rho(0) = 1$): $$\rho(2) = \phi_1\rho(1) + \phi_2 + \phi_3\rho(1), \qquad \rho(1) = \phi_1 + \phi_2\rho(1) + \phi_3\rho(2).$$
  • For numerical values of $\gamma(1)$ and $\gamma(2)$, and indeed the others, too, the system needs to be solved for.
    The matrix representation will help; a small numerical sketch in R follows below.
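A minimal sketch of solving the Yule-Walker equations numerically for an AR(3), with illustrative coefficients, cross-checked against R's ARMAacf:

phi <- c(0.5, 0.2, 0.1)                       # illustrative AR(3) coefficients
A <- matrix(c(1 - phi[2],         -phi[3], 0,
              -(phi[1] + phi[3]),       1, 0,
              -phi[2],            -phi[1], 1), nrow = 3, byrow = TRUE)
rho <- solve(A, phi)                            # rho(1), rho(2), rho(3)
rbind(rho, ARMAacf(ar = phi, lag.max = 3)[-1])  # the two rows should agree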

ACF #

As illustrated above, $\rho(h) = \sum_{j=1}^p\phi_j\rho(h-j)$, because $\rho(h) = \gamma(h)/\gamma(0)$.

Note:

  • For stationary and causal AR, $|z_i| > 1$, $i = 1, \dots, r$ (with $r \leq p$ distinct roots).
  • If all the roots are real, then $\rho(h)$ dampens exponentially fast to zero as $h \to \infty$.
  • If some of the roots are complex, then they will be in conjugate pairs and $\rho(h)$ will dampen, in a sinusoidal fashion, exponentially fast to zero as $h \to \infty$. In this case, the time series may appear to be cyclic in nature.
  • This property flows on to ARMA models.

PACF #

When $h > p$, the regression of $x_{t+h}$ on $\{x_{t+1}, \dots, x_{t+h-1}\}$ is $$\hat x_{t+h} = \sum_{j=1}^p\phi_j x_{t+h-j}, \quad h > p.$$ This is not a typo, it is a really nice result! (see TS for a proof).

[why? because this means that $\hat x_{t+h} = x_{t+h} - w_{t+h}$]

This means that $x_t - \hat x_t$, which will depend only on $\{w_{t+h-1}, w_{t+h-2}, \dots\}$, has no overlap with $x_{t+h} - \hat x_{t+h} = w_{t+h}$.

Hence the PACF $$\phi_{hh} = \text{corr}(x_{t+h} - \hat x_{t+h},\; x_t - \hat x_t) = \text{corr}(w_{t+h},\; x_t - \hat x_t) = 0, \quad h > p.$$


In summary, we know that $\phi_{hh} = 0$ for all $h > p$ “by design” (we wanted a measure that would die down beyond $p$ for diagnostic reasons).

“In-between”, we have that $\phi_{11}, \dots, \phi_{pp}$ are not necessarily 0.

Furthermore, it can be shown that $\phi_{pp} = \phi_p$, a really nice feature.


Explosive AR models and causality #

Remember the random walk (an AR(1) with $\phi = 1$) $x_t = x_{t-1} + w_t$ is not stationary. This was because $x_{t-1}$ includes all past $w_t$’s, leading to an exploding variance.


Examination of the AR(1) representation $x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}$ suggests that the key to containing the explosion lies with $\phi$. If $|\phi|^j$ increases without bound as $j \to \infty$, this sum will not converge. Explosive processes quickly become large in magnitude, leading to non-stationarity.

Unfortunately, not all AR models are stationary. For the random walk, $\phi = 1$ and it is not stationary. What are the values of $\phi$ such that this does not happen?


Assume $|\phi| > 1$. Write (by iterating forward $k$ steps) $$x_t = \phi^{-1}x_{t+1} - \phi^{-1}w_{t+1} = \phi^{-1}\big(\phi^{-1}x_{t+2} - \phi^{-1}w_{t+2}\big) - \phi^{-1}w_{t+1} = \dots = \phi^{-k}x_{t+k} - \sum_{j=1}^{k-1}\phi^{-j}w_{t+j}.$$

Because $|\phi|^{-1} < 1$, this result suggests the future-dependent AR(1) model $$x_t = -\sum_{j=1}^{\infty}\phi^{-j}w_{t+j}.$$ This is of the AR(1) form $x_t = \phi x_{t-1} + w_t$, but it is useless because it requires us to know the future to be able to predict the future!


When a process does not depend on the future— such as AR(1) when |ϕ|<1— we will say the process is causal.

The model above with |ϕ|>1 is stationary, but it is also future dependent, and hence is not causal.

Here is the lesson for p>1:

  • stationary and causal are not equivalent conditions
  • depending on the parameters of your AR(p) model, you may have a future dependent model without knowing it: when p>1 it is not obvious by just looking at the parameters ϕj
  • that’s why the condition above stated “stationary and causal”
  • this is further discussed in the ARMA section, where the argument above is further generalised to p>1

MA(q) models (TS 3.1, 3.3, 3.6, S6) #

Definition #

MA(q) models #

The moving average model of order $q$, MA($q$), is of the form $$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q} = \theta(B)w_t,$$ where $w_t \sim wn(0, \sigma_w^2)$ and $\theta_1, \theta_2, \dots, \theta_q$ ($\theta_q \neq 0$) are parameters.

Note:

  • The AR combines linearly the $x_t$’s, whereas the MA combines linearly the $w_t$’s.
  • The moving average operator is defined as $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \dots + \theta_q B^q$.
  • Unlike the AR, the MA is stationary for any values of the parameters $\theta$.

Example: MA(1) model #

An MA(1) is $x_t = w_t + \theta w_{t-1}$. Hence $E[x_t] = 0$, $$\gamma(h) = \begin{cases}(1+\theta^2)\sigma_w^2 & h = 0,\\ \theta\sigma_w^2 & h = 1,\\ 0 & h > 1,\end{cases}$$ and the ACF is $$\rho(h) = \begin{cases}1 & h = 0,\\ \dfrac{\theta}{1+\theta^2} & h = 1,\\ 0 & h > 1.\end{cases}$$ Furthermore, $|\rho(1)| \leq 1/2$ for all values of $\theta$.

Sample paths of MA(1) processes #

par(mfrow = c(2, 1))
plot(arima.sim(list(order = c(0, 0, 1), ma = 0.5), n = 100),
  ylab = "x", main = (expression(MA(1) ~ ~~theta == +0.5)))
plot(arima.sim(list(order = c(0, 0, 1), ma = -0.5), n = 100),
  ylab = "x", main = (expression(MA(1) ~ ~~theta == -0.5)))

Non-uniqueness and invertibility #

  • Note that (multiplying numerator and denominator of the right-hand side by $\theta^2$ recovers the original formula): $$\rho(1) = \frac{\theta}{1+\theta^2} = \frac{1/\theta}{1+1/\theta^2}.$$
  • Furthermore, the pair $(\theta = 5, \sigma_w^2 = 1)$ yields the same autocovariance function as the pair $(\theta = 1/5, \sigma_w^2 = 25)$, namely $$\gamma(h) = \begin{cases}26 & h = 0,\\ 5 & h = 1,\\ 0 & h > 1.\end{cases}$$

  • Thus, the MA(1) processes $x_t = w_t + \frac{1}{5}w_{t-1}$ with $w_t \sim N(0, 25)$, and $y_t = v_t + 5v_{t-1}$ with $v_t \sim N(0, 1)$, are the same because of normality (the same first two moments, and the $\gamma$’s fully determine their dependence structure).
  • We can only observe xt or yt, but not wt or vt so we cannot distinguish between them.
  • We encountered this phenomenon already in Tutorial Exercise TS5.

For convenience, we will systematically choose the version that is invertible, which means a process that has an infinite AR representation, as explained below.

Example:

  • Consider the following inversion of the roles of $x_t$ and $w_t$ in the specific case of an MA(1) model: $$w_t = -\theta w_{t-1} + x_t = \sum_{j=0}^{\infty}(-\theta)^j x_{t-j} \quad\text{if } |\theta| < 1,$$ which is an infinite AR representation of the model.
  • Since we need $|\theta| < 1$ for this to work, we will choose the version with $(\theta = 1/5, \sigma_w^2 = 25)$.

How can we generalise this to q>1?

  • As in the AR case, the polynomial $\theta(z)$ is key. The inversion in general is $$x_t = \theta(B)w_t \iff \pi(B)x_t = w_t, \quad\text{where } \pi(B) = \theta^{-1}(B).$$
  • Just as we required $|\theta| < 1$ for the MA(1) inversion to converge, we will require the roots of $\theta(z)$ to lie outside the unit circle in general.

Example: in the MA(1) case, $\theta(z) = 1 + \theta z$, so $$\pi(z) = \theta^{-1}(z) = \frac{1}{1+\theta z} = \sum_{j=0}^{\infty}(-\theta)^j z^j \quad\text{if } |\theta z| < 1.$$ Consequently, $\pi(B) = \sum_{j=0}^{\infty}(-\theta)^j B^j$.
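A quick numerical check of these geometric $\pi$ weights (a sketch with illustrative $\theta = 0.5$), reusing stats::ARMAtoMA with the roles of $x_t$ and $w_t$ swapped:

theta <- 0.5
rbind(ARMAtoMA(ar = -theta, lag.max = 8),  # weights of x_{t-j} in w_t, j >= 1
      (-theta)^(1:8))                      # closed form (-theta)^j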

ACF and PACF #

Autocovariance #

We have $$\gamma(h) = \text{Cov}(x_{t+h}, x_t) = \text{Cov}\Big(\sum_{j=0}^q\theta_j w_{t+h-j},\; \sum_{k=0}^q\theta_k w_{t-k}\Big) = \begin{cases}\sigma_w^2\sum_{j=0}^{q-h}\theta_j\theta_{j+h} & 0 \leq h \leq q,\\ 0 & h > q,\end{cases}$$ with $\theta_0 = 1$, and $\gamma(-h) = \gamma(h)$. Note

  • $\gamma(q) \neq 0$ because $\theta_q \neq 0$
  • the cutting off of $\gamma(h)$ after $q$ lags is the signature of the MA($q$) model.

ACF #

The ACF is then $$\rho(h) = \begin{cases}\dfrac{\sum_{j=0}^{q-h}\theta_j\theta_{j+h}}{1 + \theta_1^2 + \dots + \theta_q^2} & 1 \leq h \leq q,\\ 0 & h > q.\end{cases}$$

PACF #

An invertible MA($q$) can be written as $x_t = -\sum_{j=1}^{\infty}\pi_j x_{t-j} + w_t$. No finite representation exists, and hence the PACF will never cut off (as opposed to the case of AR($p$)).

Example: For an MA(1), $$\phi_{11} = \rho(1), \qquad \phi_{22} = \frac{-\theta^2}{1+\theta^2+\theta^4}, \qquad \phi_{hh} = -\frac{(-\theta)^h(1-\theta^2)}{1-\theta^{2(h+1)}}, \quad h \geq 1.$$ This is derived in the textbook.


ARMA(p,q) models (TS 3.1, 3.3, 3.6, S6) #

Definition #

A time series $\{x_t;\, t = 0, \pm 1, \pm 2, \dots\}$ is ARMA($p,q$) if it is stationary and $$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q},$$ with $\phi_p \neq 0$, $\theta_q \neq 0$, and $\sigma_w^2 > 0$. The parameters $p$ and $q$ are called the autoregressive and the moving average orders, respectively. If $x_t$ has a nonzero mean $\mu$, we set $\alpha = \mu(1 - \phi_1 - \dots - \phi_p)$ and write the model as $$x_t = \alpha + \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q},$$ where $w_t \sim wn(0, \sigma_w^2)$.

This can be rewritten very concisely as $\phi(B)x_t = \theta(B)w_t$.

Parameter redundancy #

Multiplying both sides by a third operator $\eta(B)$ leads to $$\eta(B)\phi(B)x_t = \eta(B)\theta(B)w_t,$$ which doesn’t change the dynamics. One could then have redundant parameters. For example:

  • Consider the white noise process $x_t = w_t$, which is “ARMA(0,0)”.
  • If we multiply both sides by $\eta(B) = 1 - 0.5B$, then the model becomes $x_t = 0.5x_{t-1} - 0.5w_{t-1} + w_t$, which looks like ARMA(1,1), even though it is still white noise.
  • We have hidden that fact through over-parametrisation, leading to parameter redundancy.
  • Fitting results are likely to suggest that all parameters (including
    the unnecessary ones) are significant; a small simulation sketch follows this list.
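A minimal illustrative sketch (not from the textbook): fit an over-parametrised ARMA(1,1) to pure white noise and inspect the near-cancelling AR and MA estimates. Exact values depend on the seed, and the optimiser may warn precisely because of the redundancy.

set.seed(1)
wn <- rnorm(500)                                     # plain white noise
fit <- arima(wn, order = c(1, 0, 1), include.mean = FALSE)
fit$coef                                             # ar1 and ma1 tend to (nearly) cancel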

The AR and MA polynomials #

The AR and MA polynomials are defined as $$\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p, \quad \phi_p \neq 0,$$ and $$\theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q, \quad \theta_q \neq 0,$$ respectively, where $z$ may be a complex number.

These will be used extensively to study the properties of ARMA processes.

Properties #

Problems to keep in mind #

In the previous sections, we identified the following potential “problems”, or potential “issues to keep in mind”:

  • parameter redundant models,
  • stationary AR models that depend on the future, and
  • MA models that are not unique.

We now discuss how to avoid those issues.

Parameter redundant models #

We will require:

  • ϕ(z) and θ(z) cannot have common factors

This will ensure that when referring to an ARMA(p,q) model, there can’t be a reduced form of it.

Future dependence and causality #

[What does causal mean?] An ARMA($p,q$) model is said to be causal if the time series $\{x_t;\, t = 0, \pm 1, \pm 2, \dots\}$ can be written as a one-sided linear process: $$x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j} = \psi(B)w_t,$$ where $\psi(B) = \sum_{j=0}^{\infty}\psi_j B^j$ and $\sum_{j=0}^{\infty}|\psi_j| < \infty$; we set $\psi_0 = 1$.

Note:

  • key here is that the sum starts at j=0 (so that it depends only on the past);
  • the ψ parameters are new ones, which are defined in the equation above (“such that” xt can be written as such a sum).

[When is it causal?] An ARMA($p,q$) model is causal if and only if $\phi(z) \neq 0$ for $|z| \leq 1$.

To see this, note that the coefficients of the linear process given above can be determined by solving $$\psi(z) = \sum_{j=0}^{\infty}\psi_j z^j = \frac{\theta(z)}{\phi(z)}, \quad |z| \leq 1.$$

Equivalently, an ARMA process is causal only when the roots of $\phi(z)$ lie outside the unit circle, that is, $\phi(z) = 0$ only when $|z| > 1$.

Invertibility #

[What does invertible mean?] An ARMA($p,q$) model is said to be invertible if the time series $\{x_t;\, t = 0, \pm 1, \pm 2, \dots\}$ can be written as $$\pi(B)x_t = \sum_{j=0}^{\infty}\pi_j x_{t-j} = w_t,$$ where $\pi(B) = \sum_{j=0}^{\infty}\pi_j B^j$ and $\sum_{j=0}^{\infty}|\pi_j| < \infty$; we set $\pi_0 = 1$.

Note:

  • key here is that we express the model as “$w_t$ is a function of xt’s” even though our focus (what we model and observe) is xt. That’s the “invertible” idea;
  • the π parameters are new ones, which are defined in the equation above (“such that” wt can be written as such a sum of xt’s).

[When is it invertible?] An ARMA($p,q$) model is invertible if and only if $\theta(z) \neq 0$ for $|z| \leq 1$. The coefficients $\pi_j$ of $\pi(B)$ given above can be determined by solving $$\pi(z) = \sum_{j=0}^{\infty}\pi_j z^j = \frac{\phi(z)}{\theta(z)}, \quad |z| \leq 1.$$ Equivalently, an ARMA process is invertible only when the roots of $\theta(z)$ lie outside the unit circle; that is, $\theta(z) = 0$ only when $|z| > 1$.

Example: parameter redundancy, causality, invertibility #

Consider the process $$x_t = 0.4x_{t-1} + 0.45x_{t-2} + w_t + w_{t-1} + 0.25w_{t-2}.$$ The first step here is to express it with the help of the backshift operator: $$(1 - 0.4B - 0.45B^2)x_t = (1 + B + 0.25B^2)w_t.$$ This looks like an ARMA(2,2) process, but there’s a trick! Write the AR and MA polynomials, and try to factorise them: $$\phi(B) = 1 - 0.4B - 0.45B^2 = (1 + 0.5B)(1 - 0.9B), \qquad \theta(B) = 1 + B + 0.25B^2 = (1 + 0.5B)^2.$$ There is a common factor, leading to parameter redundancy!


Factor the common term $(1 + 0.5B)$ out, and get $\phi(B) = 1 - 0.9B$ and $\theta(B) = 1 + 0.5B$, so that our model is in fact $$x_t = 0.9x_{t-1} + 0.5w_{t-1} + w_t,$$ an ARMA(1,1) model.
[Parameter redundancy: checked!]

We next check whether the process is causal. We need the root of $\phi(z) = 1 - 0.9z = 0$ to be outside the unit circle, which it is, as the solution is $z = 10/9 > 1$. [Causality: checked!]


We next check whether the model is invertible. We need the root of $\theta(z) = 1 + 0.5z = 0$ to be outside the unit circle, which it is, as the solution is $z = -2$, with $|z| = 2 > 1$. [Invertibility: checked!]

Now, we would like to find the linear representation of the process, that is, get the $\psi$ weights. Because $\phi(z)\psi(z) = \theta(z)$, $$(1 - 0.9z)(1 + \psi_1 z + \psi_2 z^2 + \dots + \psi_j z^j + \dots) = 1 + 0.5z,$$ and, regrouping coefficients of powers of $z$, $$1 + (\psi_1 - 0.9)z + (\psi_2 - 0.9\psi_1)z^2 + \dots + (\psi_j - 0.9\psi_{j-1})z^j + \dots = 1 + 0.5z.$$ We compare coefficients of the powers of $z$, and note that all coefficients of $z^j$ are 0 for $j > 1$ on the RHS.


We then obtain $$\psi_1 - 0.9 = 0.5 \implies \psi_1 = 1.4, \qquad \psi_j - 0.9\psi_{j-1} = 0 \implies \psi_j = 0.9\,\psi_{j-1}, \quad j > 1,$$ and thus $\psi_j = 1.4(0.9)^{j-1}$ for $j \geq 1$, and hence $$x_t = w_t + 1.4\sum_{j=1}^{\infty}0.9^{j-1}w_{t-j}.$$ In R, this is much quicker! Just use (recalling $x_t = 0.9x_{t-1} + 0.5w_{t-1} + w_t$):

format(ARMAtoMA(ar = 0.9, ma = 0.5, 10), digits = 2)  # first 10 psi-weights
##  [1] "1.40" "1.26" "1.13" "1.02" "0.92" "0.83" "0.74" "0.67" "0.60"
## [10] "0.54"

[Linear representation: checked!]


(no, it’s not over yet!)

Next, we want to determine the invertible representation of the model. Because $\theta(z)\pi(z) = \phi(z)$, $$(1 + 0.5z)(1 + \pi_1 z + \pi_2 z^2 + \pi_3 z^3 + \dots) = 1 - 0.9z,$$ and, regrouping coefficients of powers of $z$, $$1 + (\pi_1 + 0.5)z + (\pi_2 + 0.5\pi_1)z^2 + \dots + (\pi_j + 0.5\pi_{j-1})z^j + \dots = 1 - 0.9z.$$ We compare coefficients of the powers of $z$.


We get $\pi_j = (-1)^j\,1.4\,(0.5)^{j-1}$ for $j \geq 1$, and then $$w_t = \sum_{j=0}^{\infty}\pi_j x_{t-j} = x_t + \sum_{j=1}^{\infty}\pi_j x_{t-j}, \quad\text{so that}\quad x_t = -\sum_{j=1}^{\infty}\pi_j x_{t-j} + w_t.$$

Again, this is much quicker in R (using the inverted form $w_t = -0.5w_{t-1} - 0.9x_{t-1} + x_t$):

format(ARMAtoMA(ar = -0.5, ma = -0.9, 10), digits = 1)  # first 10 pi-weights
##  [1] "-1.400" " 0.700" "-0.350" " 0.175" "-0.087" " 0.044" "-0.022"
##  [8] " 0.011" "-0.005" " 0.003"

[Invertible representation: checked!]

Stationarity, Causality and Invertibility #

Wrapping it up #

First, it is helpful to rewrite the ARMA representation as $$x_t = \frac{\theta(B)}{\phi(B)}w_t = \psi(B)w_t.$$ To summarise:

  • first, we require $\theta(B)$ and $\phi(B)$ to not have common factors. If they do, these will obviously cancel out in the ratio $\theta(B)/\phi(B)$ and we, really, are only dealing with the “reduced” model (without the redundant parameters).
  • pure AR($p$) models will be stationary as long as $\psi(B)$ is well behaved (say, finite), which will happen as long as the roots of $\phi(B)$ are all greater than one in modulus ($|z_j| > 1$); this is because $\phi(B)$ appears in the denominator. Since this is not impacted by $\theta(B)$, which is in the numerator, this result also holds in the case of ARMA($p,q$) models.

  • pure MA(q) models—where ϕ(B)=1—are always stationary (under some mild conditions on the coefficients which we ignore here), because by definition they include a finite number of w’s (all covariances are finite). They are also causal by definition.
  • now, even though MA($q$) models are always causal, establishing causality for AR($p$) and ARMA($p,q$) models—where $p > 0$—is a little trickier. We not only need $\psi(B)$ to be well behaved, but we also need it to depend on past values only (an example of a stationary but non-causal model is provided on slide 26 of Module 9 above; see also the next subsection). It turns out that you only need the roots to not be on the unit circle for the process to be stationary. For it to be causal, the additional requirement is that the roots need to be outside the unit circle. In other words, while you could have roots inside the unit circle and still achieve stationarity, the process would not be causal. This means that causality will imply stationarity, but not the other way around.

  • Again, stationarity does not require the roots of $\theta(B)$ to be greater than one in modulus. This is required for invertibility, which aims to flip things around (“invert” the process): $$w_t = \frac{\phi(B)}{\theta(B)}x_t.$$ It becomes clear now why it is the roots of $\theta(B)$ that need to be well behaved (as it is now $\theta(B)$ that is in the denominator).

Revisiting the future-dependent example #

Let us revisit that example: we have $x_t = \phi x_{t-1} + w_t$. In this case the characteristic equation is $\phi(z) = 1 - \phi z = 0$, which has root $z_0 = 1/\phi$.


We distinguish three cases:

  1. $\phi < 1$ (say, $\phi = 0.5$ so that $z_0 = 1/\phi = 2$): the root is not on the unit circle, and also outside the unit circle, so that the process (which is AR(1)) is stationary and causal.
  2. $\phi = 1$ ($z_0 = 1$): the root is on the unit circle, and hence the process is not stationary. This is the random walk case. Note it is not causal either, because a causal process is a linear process that depends on the past only, and while the random walk depends on the present and past only, it does not satisfy the requirement (for it to be a linear process) that the sum of the absolute values of the weights is finite (the sum of an infinite number of 1’s is infinite) - see Module 7.
  3. $\phi > 1$ (say, $\phi = 3$ so that $z_0 = 1/3 < 1$): the root is not on the unit circle, and hence the process is stationary. However, the root is inside the unit circle, which implies it is not causal. The process will depend on the future, as demonstrated earlier.

ACF and PACF #

Autocovariance and ACF #

For a causal ARMA($p,q$) model $\phi(B)x_t = \theta(B)w_t$, we use the linear representation $x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j}$, from which it follows immediately that $E[x_t] = 0$, and the autocovariance function of $x_t$ is $$\gamma(h) = \text{Cov}(x_{t+h}, x_t) = \sigma_w^2\sum_{j=0}^{\infty}\psi_j\psi_{j+h}, \quad h \geq 0.$$ This approach requires solving for the $\psi$’s.


Alternatively, it is possible to write a general homogeneous equation for the ACF of a causal ARMA process and solve for the $\gamma$’s directly (the proof is outside scope but available in TS): $$\gamma(h) - \phi_1\gamma(h-1) - \dots - \phi_p\gamma(h-p) = 0, \quad h \geq \max(p, q+1),$$ with initial conditions $$\gamma(h) - \sum_{j=1}^p\phi_j\gamma(h-j) = \sigma_w^2\sum_{j=h}^q\theta_j\psi_{j-h}, \quad 0 \leq h < \max(p, q+1).$$ Finally, the ACF is $\rho(h) = \gamma(h)/\gamma(0)$. In general, the ACF cannot distinguish between AR and ARMA, which is why the PACF is useful (in the presence of a pure AR, it will cut off).

Example: ACF of ARMA(1,1) #

Consider the ARMA(1,1) process $x_t = \phi x_{t-1} + \theta w_{t-1} + w_t$, where $|\phi| < 1$. The autocovariance then satisfies $$\gamma(h) - \phi\gamma(h-1) = 0, \quad h = 2, 3, \dots,$$ which has general solution $\gamma(h) = c\,\phi^h$, $h = 1, 2, \dots$. The initial conditions are $$\gamma(0) = \phi\gamma(1) + \sigma_w^2(\theta_0\psi_0 + \theta_1\psi_1) = \phi\gamma(1) + \sigma_w^2\big[1 + \theta\phi + \theta^2\big], \qquad \gamma(1) = \phi\gamma(0) + \sigma_w^2\theta.$$ Note that $\psi_1 = \theta + \phi$ for an ARMA(1,1) model (see Example 3.12 in the textbook).


Solving for $\gamma(0)$ and $\gamma(1)$ yields $$\gamma(0) = \sigma_w^2\,\frac{1 + 2\theta\phi + \theta^2}{1 - \phi^2}, \qquad \gamma(1) = \sigma_w^2\,\frac{(1 + \theta\phi)(\phi + \theta)}{1 - \phi^2}.$$ Since $\gamma(1) = c\,\phi$, we get $c = \gamma(1)/\phi$ and $$\gamma(h) = \frac{\gamma(1)}{\phi}\,\phi^h = \sigma_w^2\,\frac{(1 + \theta\phi)(\phi + \theta)}{1 - \phi^2}\,\phi^{h-1}, \quad h \geq 1,$$ from which we obtain the ACF $$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \frac{(1 + \theta\phi)(\phi + \theta)}{1 + 2\theta\phi + \theta^2}\,\phi^{h-1}, \quad h \geq 1.$$ This has the same pattern as an AR(1).
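A quick numerical check of this ACF formula (a sketch with the illustrative values $\phi = 0.9$, $\theta = 0.5$ used elsewhere in this module), compared against ARMAacf:

phi <- 0.9; theta <- 0.5
h <- 1:5
rho <- (1 + theta * phi) * (phi + theta) / (1 + 2 * theta * phi + theta^2) * phi^(h - 1)
rbind(rho, ARMAacf(ar = phi, ma = theta, lag.max = 5)[-1])  # the two rows should agree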

ARIMA(p,d,q) models (TS 3.1, 3.3, 3.6, S6) #

Definition #

A process $x_t$ is said to be ARIMA($p,d,q$) if $$\nabla^d x_t = (1 - B)^d x_t$$ is ARMA($p,q$). In general, we will write the model as $$\phi(B)(1 - B)^d x_t = \theta(B)w_t.$$ If $E[\nabla^d x_t] = \mu$, we write the model as $$\phi(B)(1 - B)^d x_t = \delta + \theta(B)w_t,$$ where $\delta = \mu(1 - \phi_1 - \dots - \phi_p)$. The integrated ARMA, or ARIMA, is a broadening of the class of ARMA models to include differencing.

Remarks #

  • Because of nonstationarity, care must be taken when deriving forecasts. It is best to use so-called state-space models for handling nonstationary models (but these are outside the scope of this course).
  • Since $y_t = \nabla^d x_t$ is ARMA, it suffices to discuss how to fit and forecast ARMA models. For instance, if $d = 1$, given forecasts $y^n_{n+m}$ for $m = 1, 2, \dots$, we have $y^n_{n+m} = x^n_{n+m} - x^n_{n+m-1}$, so that $$x^n_{n+m} = y^n_{n+m} + x^n_{n+m-1}$$ with initial condition $x^n_{n+1} = y^n_{n+1} + x_n$ (noting $x^n_n = x_n$); a minimal R sketch follows this list.
  • Derivation of prediction errors is slightly more involved (but also outside the scope of this course).
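A minimal sketch on simulated data (illustrative only): predict() on an ARIMA(0,1,1) fit handles the undifferencing automatically, which mirrors the recursion above, i.e. cumulating the ARMA forecasts of $y_t = (1-B)x_t$ onto $x_n$.

set.seed(1)
x <- arima.sim(list(order = c(0, 1, 1), ma = -0.8), n = 200)  # simulated ARIMA(0,1,1)
fit <- arima(x, order = c(0, 1, 1))
predict(fit, n.ahead = 5)$pred   # forecasts of x_{n+1}, ..., x_{n+5}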

The IMA(1,1) model and exponential smoothing #

IMA(1,1) and EWMA #

  • The ARIMA(0,1,1), or IMA(1,1) model is of interest because many economic time series can be successfully modelled this way.
  • The model leads to a frequently used— and abused! —forecasting method called exponentially weighted moving averages (EWMA) (“exponential smoothing” in S6).
  • One should not use this method unless there is strong statistical evidence that this is the right model to use!
    (as per the following sections)

EWMA example #

set.seed(666)
x <- arima.sim(list(order = c(0, 1, 1), ma = -0.8), n = 100)  #simulate IMA(1,1)
x.ima <- HoltWinters(x, beta = FALSE, gamma = FALSE)  # fit EWMA
plot(x.ima)

Multiplicative seasonal ARIMA models (TS 3.9) #

Introduction #

  • We observed seasonality in many time series so far. This is manifested when the dependence on the past tends to occur most strongly at multiples of some underlying seasonal lag s. For example:
    • monthly economic data: strong yearly component occurring at lags that are multiples of s=12;
    • data taken quarterly: will exhibit the yearly repetitive period at s=4 quarters.
  • In this section, we introduce several modifications made to the ARIMA model to account for seasonal and nonstationary behavior.
  • The reference for this section is TS 3.9.
  • Note that seasonality can be due to:
    • Deterministic reasons: include in trend
    • Stochastic dependence at seasonal lags s: use SARIMA models

Pure seasonal ARMA(P,Q)s models #

The pure seasonal autoregressive moving average model ARMA($P,Q$)$_s$ takes the form $$\Phi_P(B^s)x_t = \Theta_Q(B^s)w_t,$$ where $$\Phi_P(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \dots - \Phi_P B^{Ps}$$ is the seasonal autoregressive operator of order $P$ with seasonal period $s$, and where $$\Theta_Q(B^s) = 1 + \Theta_1 B^s + \Theta_2 B^{2s} + \dots + \Theta_Q B^{Qs}$$ is the seasonal moving average operator of order $Q$ with seasonal period $s$.

Here, inter-temporal correlations are exclusively seasonal (there are no non-seasonal correlations).


Note: (analogous to nonseasonal ARMA)

  • It will be causal only when the roots of $\Phi_P(z^s)$ lie outside the unit circle.
  • It will be invertible only when the roots of $\Theta_Q(z^s)$ lie outside the unit circle.

Example: first order seasonal (s=12) AR model #

A first-order seasonal autoregressive series that might run over months could be written as $$(1 - \Phi B^{12})x_t = w_t \quad\text{or}\quad x_t = \Phi x_{t-12} + w_t.$$ Note:

  • xt is expressed in terms of past lags at multiple of the (yearly) seasonal period s=12 months
  • very similar to the unit lag model AR(1) that we know
  • causal if |Φ|<1
    Simulated example (with Φ=0.9):
set.seed(666)
phi <- c(rep(0, 11), 0.9)
sAR <- arima.sim(list(order = c(12, 0, 0), ar = phi), n = 37)
sAR <- ts(sAR, freq = 12)

par(mar = c(3, 3, 2, 1), mgp = c(1.6, 0.6, 0))
plot(sAR, axes = FALSE, main = "Seasonal AR(1)", xlab = "year",
  type = "c")
Months <- c("J", "F", "M", "A", "M", "J", "J", "A", "S", "O",
  "N", "D")
points(sAR, pch = Months, cex = 1.25, font = 4, col = 1:4)
axis(1, 1:4)
abline(v = 1:4, lty = 2, col = gray(0.7))
axis(2)
box()


Theoretical ACF and PACF of the model:

ACF <- ARMAacf(ar = phi, ma = 0, 100)
PACF <- ARMAacf(ar = phi, ma = 0, 100, pacf = TRUE)
par(mfrow = c(2, 1), mar = c(3, 3, 2, 1), mgp = c(1.6, 0.6, 0))
plot(ACF, type = "h", xlab = "LAG", ylim = c(-0.1, 1))
abline(h = 0)
plot(PACF, type = "h", xlab = "LAG", ylim = c(-0.1, 1))
abline(h = 0)

Autocovariance and ACF of first-order seasonal models #

  1. For the first-order seasonal ($s = 12$) MA model $x_t = w_t + \Theta w_{t-12}$ we have $$\gamma(0) = (1 + \Theta^2)\sigma^2, \qquad \gamma(\pm 12) = \Theta\sigma^2, \qquad \gamma(h) = 0 \text{ otherwise}.$$ Thus, the only nonzero correlation (aside from lag zero) is $$\rho(\pm 12) = \frac{\Theta}{1 + \Theta^2}.$$ The PACF will tail off at multiples of $s = 12$.

  2. For the first-order seasonal ($s = 12$) AR model $x_t = \Phi x_{t-12} + w_t$ we have $$\gamma(0) = \frac{\sigma^2}{1 - \Phi^2}, \qquad \gamma(\pm 12k) = \frac{\sigma^2\Phi^k}{1 - \Phi^2}, \quad k = 1, 2, \dots, \qquad \gamma(h) = 0 \text{ otherwise}.$$ Thus, the only nonzero correlations are $$\rho(\pm 12k) = \Phi^k, \quad k = 1, 2, \dots.$$ The PACF will have one nonzero value at lag $s = 12$ and then cut off.

Multiplicative seasonal ARMA(p,q)×(P,Q)s models #

In general, we can combine the seasonal and nonseasonal operators into a multiplicative seasonal autoregressive moving average model ARMA($p,q$)$\times$($P,Q$)$_s$, and write $$\Phi_P(B^s)\phi(B)x_t = \Theta_Q(B^s)\theta(B)w_t$$ as the overall model.

Note:

  • When selecting a model, we will need to carefully examine the ACF and PACF of the data.
  • Choosing the seasonal autoregressive and moving average components first will generally lead to better results.
  • This will be discussed more in Module 10.

Example: A mixed seasonal model ARMA(0,1)×(1,0)12 #

Consider an ARMA(0,1)$\times$(1,0)$_{12}$ model $$x_t = \Phi x_{t-12} + w_t + \theta w_{t-1},$$ where $|\Phi| < 1$ and $|\theta| < 1$.

Because

  • $x_{t-12}$, $w_t$ and $w_{t-1}$ are uncorrelated; and
  • xt is stationary

then $$\gamma(0) = \Phi^2\gamma(0) + \sigma_w^2 + \theta^2\sigma_w^2 \quad\implies\quad \gamma(0) = \frac{1 + \theta^2}{1 - \Phi^2}\,\sigma_w^2.$$


Furthermore, multiplying the model by $x_{t-h}$, $h > 0$, $$x_t x_{t-h} = \Phi x_{t-12}x_{t-h} + w_t x_{t-h} + \theta w_{t-1}x_{t-h},$$ and taking expectations leads to $$\gamma(1) = \Phi\gamma(11) + \theta\sigma_w^2, \qquad \gamma(h) = \Phi\gamma(h - 12) \;\text{ for } h \geq 2.$$ The first result stems from $$\gamma(1) = E[x_t x_{t-1}] = E[\Phi x_{t-12}x_{t-1}] + E[w_t x_{t-1}] + E[\theta w_{t-1}x_{t-1}] = \Phi\gamma(11) + 0 + \theta\sigma_w^2,$$ because $x_{t-1} = \Phi x_{t-13} + w_{t-1} + \theta w_{t-2}$. The proof of the second result is similar.


Thus, the ACF for this model (this requires some algebra) is $$\rho(12h) = \Phi^h, \quad h = 1, 2, \dots,$$ $$\rho(12h - 1) = \rho(12h + 1) = \frac{\theta}{1 + \theta^2}\,\Phi^h, \quad h = 0, 1, 2, \dots,$$ $$\rho(h) = 0 \quad\text{otherwise}.$$

Example: if $\Phi = 0.8$ and $\theta = -0.5$, then the theoretical ACF and PACF become

phi <- c(rep(0, 11), 0.8)
ACF <- ARMAacf(ar = phi, ma = -0.5, 50)[-1]  # [-1] removes 0 lag
PACF <- ARMAacf(ar = phi, ma = -0.5, 50, pacf = TRUE)
par(mfrow = c(1, 2))
plot(ACF, type = "h", xlab = "LAG", ylim = c(-0.4, 0.8))
abline(h = 0)
plot(PACF, type = "h", xlab = "LAG", ylim = c(-0.4, 0.8))
abline(h = 0)

Seasonal differencing #

Motivating example #

  • Consider average temperatures over the years.
  • Each January would be approximately the same (as would February, etc…).
  • In this case we might think of the average monthly temperature as $x_t = S_t + w_t$, where $S_t$ is a seasonal component that varies a little from one year to the next, say (random walk) $S_t = S_{t-12} + v_t$.
  • Here, vt and wt are uncorrelated white noise processes.

  • Note $x_t = S_{t-12} + v_t + w_t$ and $x_{t-12} = S_{t-12} + w_{t-12}$.
  • If we subtract the effect of successive years from each other (“seasonal differencing”), we find that $$(1 - B^{12})x_t = x_t - x_{t-12} = v_t + w_t - w_{t-12}.$$
  • This model is a stationary, purely seasonal MA(1)$_{12}$, and its ACF will have a peak only at lag 12; a small simulation sketch follows this list.
  • The undifferenced series $x_t$, by contrast, would exhibit an ACF that is large and decays very slowly at lags $h = 12k$, for $k = 1, 2, \dots$.
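A minimal simulation sketch of this motivating model (illustrative construction: each calendar month follows its own random walk, observed with noise); the lag-12 difference should then look stationary, with the MA(1)$_{12}$ signature in its ACF:

set.seed(1)
years <- 20
v <- matrix(rnorm(12 * years), nrow = 12)       # one innovation per month and year
S <- as.vector(t(apply(v, 1, cumsum)))          # 12 parallel random walks: S_t = S_{t-12} + v_t
x <- ts(S + rnorm(12 * years), frequency = 12)  # observed series x_t = S_t + w_t
plot(diff(x, lag = 12))                         # (1 - B^12) x_t
acf(as.numeric(diff(x, lag = 12)), 48)          # spike around lag 12 only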

Seasonal differencing #

  • In general, seasonal differencing can be indicated when the ACF decays slowly at multiples of some season s, but is negligible between the periods.
  • The seasonal difference of order $D$ is defined as $$\nabla_s^D x_t = (1 - B^s)^D x_t,$$ where $D = 1, 2, \dots$ takes positive integer values.
  • Typically, D=1 is sufficient to obtain seasonal stationarity

How do we combine this idea with an ARIMA model?

SARIMA model ARIMA(p,d,q)×(P,D,Q)s #

The multiplicative seasonal autoregressive integrated moving average model, or SARIMA model, is given by $$\Phi_P(B^s)\phi(B)\nabla_s^D\nabla^d x_t = \delta + \Theta_Q(B^s)\theta(B)w_t,$$ where $w_t$ is the (usual) Gaussian white noise process.

Note:

  • This is denoted as ARIMA($p,d,q$)$\times$($P,D,Q$)$_s$.
  • The ordinary autoregressive and moving average components are represented by the polynomials $\phi(B)$ and $\theta(B)$ of orders $p$ and $q$, respectively.
  • The seasonal autoregressive and moving average components are represented by $\Phi_P(B^s)$ and $\Theta_Q(B^s)$ of orders $P$ and $Q$, respectively.
  • The ordinary and seasonal difference components are $\nabla^d = (1 - B)^d$ and $\nabla_s^D = (1 - B^s)^D$.

A typical SARIMA model #

Consider the following ARIMA(0,1,1)$\times$(0,1,1)$_{12}$ model, with seasonal fluctuations that occur every 12 months, $$\nabla_{12}\nabla x_t = \Theta(B^{12})\theta(B)w_t,$$ where we have set $\delta = 0$.

Note:

  • This model often provides a reasonable representation for seasonal, nonstationary, economic time series.
  • This model can be represented equivalently as $$(1 - B^{12})(1 - B)x_t = (1 + \Theta B^{12})(1 + \theta B)w_t,$$ that is, $$(1 - B - B^{12} + B^{13})x_t = (1 + \theta B + \Theta B^{12} + \Theta\theta B^{13})w_t.$$

Yet another representation is $$x_t = x_{t-1} + x_{t-12} - x_{t-13} + w_t + \theta w_{t-1} + \Theta w_{t-12} + \Theta\theta w_{t-13}.$$

  • The multiplicative nature implies that the coefficient of $w_{t-13}$ is $\Theta\theta$ rather than yet another free parameter.
  • However, this often works well, and reduces the number of parameters needed.

Example: Air Passengers #

Consider the R data set AirPassengers, which are the monthly totals of international airline passengers, 1949 to 1960, taken from Box & Jenkins (1970):

x <- AirPassengers

We’ll explore this dataset and see whether we can “difference out” some seasonality.


Let’s have a look at the series:

plot.ts(x, main = "Air Passengers series, unmodified")

This shows trend plus increasing variance, so we try a log transformation.


lx <- log(x)
plot.ts(lx, main = "Log of Air Passengers series")

The log transformation has stabilised the variance.


We now need to remove the trend, and try differencing:

dlx <- diff(lx)
plot.ts(dlx, main = "Differenced Log of Air Passengers series")

It is clear that there is still persistence in the seasons, that is, $\text{dlx}_t \approx \text{dlx}_{t-12}$.


We apply a twelfth-order (seasonal) difference:

ddlx <- diff(dlx, 12)
plot.ts(ddlx, main = "[s=12]-differenced Differenced Log of Air Passengers series")


This seems to have removed the seasonality:

par(mfrow = c(2, 1), mar = c(3, 3, 2, 1), mgp = c(1.6, 0.6, 0))
monthplot(dlx)
monthplot(ddlx)

Note the monthplot function.


To summarise:

plot.ts(cbind(x, lx, dlx, ddlx), main = "")

The transformed data appears to be stationary and
we are now ready to fit a model (see Module 10).
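As a brief preview of Module 10 (a sketch only; the systematic identification and fitting workflow comes later), the classic ARIMA(0,1,1)$\times$(0,1,1)$_{12}$ “airline model” can be fitted to the logged series with base R:

fit <- arima(lx, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit  # estimated theta, Theta and sigma^2_w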

Multivariate time series (TS 5.6) #

Introduction #

  • Many data sets involve more than one time series, and we are often interested in the possible dynamics relating all series.
  • We are thus interested in modelling and forecasting $k \times 1$ vector-valued time series $\mathbf{x}_t = (x_{t1}, \dots, x_{tk})'$, $t = 0, \pm 1, \pm 2, \dots$.
  • Unfortunately, extending univariate ARMA models to the multivariate case is not so simple.
  • The multivariate autoregressive model, however, is a straight-forward extension of the univariate AR model.
  • The resulting models are called Vector Auto-Regressive (VAR) models.
  • The reference for this section is TS 5.6.

VAR(1) model #

For the first-order vector autoregressive model, VAR(1), we take $$\mathbf{x}_t = \boldsymbol{\alpha} + \Phi\mathbf{x}_{t-1} + \mathbf{w}_t,$$ where $\Phi$ is a $k \times k$ transition matrix that expresses the dependence of $\mathbf{x}_t$ on $\mathbf{x}_{t-1}$ (note these are vectors). The vector white noise process $\mathbf{w}_t$ is assumed to be multivariate normal with mean zero and covariance matrix $E[\mathbf{w}_t\mathbf{w}_t'] = \Sigma_w$. The vector $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_k)'$ is similar to a constant in a regression setting. If $E[\mathbf{x}_t] = \boldsymbol{\mu}$, then $\boldsymbol{\alpha} = (I - \Phi)\boldsymbol{\mu}$ as before.

Example: Mortality, Temperature, Pollution #

  • Define $\mathbf{x}_t = (x_{t1}, x_{t2}, x_{t3})'$ as a vector of dimension $k = 3$ for cardiovascular mortality $x_{t1}$, temperature $x_{t2}$, and particulate levels $x_{t3}$.
  • We might envision dynamic relations with the first-order relations $$\begin{aligned} x_{t1} &= \alpha_1 + \beta_1 t + \phi_{11}x_{t-1,1} + \phi_{12}x_{t-1,2} + \phi_{13}x_{t-1,3} + w_{t1},\\ x_{t2} &= \alpha_2 + \beta_2 t + \phi_{21}x_{t-1,1} + \phi_{22}x_{t-1,2} + \phi_{23}x_{t-1,3} + w_{t2},\\ x_{t3} &= \alpha_3 + \beta_3 t + \phi_{31}x_{t-1,1} + \phi_{32}x_{t-1,2} + \phi_{33}x_{t-1,3} + w_{t3}.\end{aligned}$$
  • This can be rewritten in matrix form as $$\mathbf{x}_t = \Gamma\mathbf{u}_t + \Phi\mathbf{x}_{t-1} + \mathbf{w}_t,$$ where $\Gamma = [\boldsymbol{\alpha} \mid \boldsymbol{\beta}]$ is $3 \times 2$ and $\mathbf{u}_t = (1, t)'$ is $2 \times 1$.

We use the R package vars to fit VAR models via least squares.

library(vars)
x <- cbind(cmort, tempr, part)
summary(VAR(x, p = 1, type = "both"))  # 'both' fits constant + trend
...
## VAR Estimation Results:
## ========================= 
## Endogenous variables: cmort, tempr, part 
## Deterministic variables: both 
## Sample size: 507 
## Log Likelihood: -5116.02 
## Roots of the characteristic polynomial:
## 0.8931 0.4953 0.1444
## Call:
## VAR(y = x, p = 1, type = "both")
...

Note that, in this output, roots less than one ensure stability: the vars package reports the moduli of the eigenvalues of the companion matrix (the reciprocals of the roots discussed earlier), so the stability condition is that they lie inside the unit circle.


library(vars)
x <- cbind(cmort, tempr, part)
summary(VAR(x, p = 1, type = "both"))  # 'both' fits constant + trend
...
## Estimation results for equation cmort: 
## ====================================== 
## cmort = cmort.l1 + tempr.l1 + part.l1 + const + trend 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## cmort.l1  0.464824   0.036729  12.656  < 2e-16 ***
## tempr.l1 -0.360888   0.032188 -11.212  < 2e-16 ***
## part.l1   0.099415   0.019178   5.184 3.16e-07 ***
## const    73.227292   4.834004  15.148  < 2e-16 ***
## trend    -0.014459   0.001978  -7.308 1.07e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 5.583 on 502 degrees of freedom
## Multiple R-Squared: 0.6908,	Adjusted R-squared: 0.6883 
## F-statistic: 280.3 on 4 and 502 DF,  p-value: < 2.2e-16 
...

library(vars)
x <- cbind(cmort, tempr, part)
summary(VAR(x, p = 1, type = "both"))  # 'both' fits constant + trend
...
## Estimation results for equation tempr: 
## ====================================== 
## tempr = cmort.l1 + tempr.l1 + part.l1 + const + trend 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## cmort.l1 -0.244046   0.042105  -5.796 1.20e-08 ***
## tempr.l1  0.486596   0.036899  13.187  < 2e-16 ***
## part.l1  -0.127661   0.021985  -5.807 1.13e-08 ***
## const    67.585598   5.541550  12.196  < 2e-16 ***
## trend    -0.006912   0.002268  -3.048  0.00243 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 6.4 on 502 degrees of freedom
## Multiple R-Squared: 0.5007,	Adjusted R-squared: 0.4967 
## F-statistic: 125.9 on 4 and 502 DF,  p-value: < 2.2e-16 
...

library(vars)
x <- cbind(cmort, tempr, part)
summary(VAR(x, p = 1, type = "both"))  # 'both' fits constant + trend
...
## Estimation results for equation part: 
## ===================================== 
## part = cmort.l1 + tempr.l1 + part.l1 + const + trend 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## cmort.l1 -0.124775   0.079013  -1.579    0.115    
## tempr.l1 -0.476526   0.069245  -6.882 1.77e-11 ***
## part.l1   0.581308   0.041257  14.090  < 2e-16 ***
## const    67.463501  10.399163   6.487 2.10e-10 ***
## trend    -0.004650   0.004256  -1.093    0.275    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 12.01 on 502 degrees of freedom
## Multiple R-Squared: 0.3732,	Adjusted R-squared: 0.3683 
## F-statistic: 74.74 on 4 and 502 DF,  p-value: < 2.2e-16 
...

library(vars)
x <- cbind(cmort, tempr, part)
summary(VAR(x, p = 1, type = "both"))  # 'both' fits constant + trend
...
## Covariance matrix of residuals:
##        cmort  tempr   part
## cmort 31.172  5.975  16.65
## tempr  5.975 40.965  42.32
## part  16.654 42.323 144.26
## 
## Correlation matrix of residuals:
##        cmort  tempr   part
## cmort 1.0000 0.1672 0.2484
## tempr 0.1672 1.0000 0.5506
## part  0.2484 0.5506 1.0000
...

VAR(p) models #

  • It is easy to extend the VAR(1) process to higher orders (that means with correlations going farther than one step into the past), leading to VAR(p).
  • The regressors are $(1, \mathbf{x}_{t-1}, \mathbf{x}_{t-2}, \dots, \mathbf{x}_{t-p})$.
  • The regression model is then $$\mathbf{x}_t = \boldsymbol{\alpha} + \sum_{j=1}^p\Phi_j\mathbf{x}_{t-j} + \mathbf{w}_t.$$
  • The function VARselect suggests the optimal order p according to different criteria: AIC, Hannan-Quinn (similar to BIC), BIC, and Final Prediction Error (which minimises the approximate mean squared one-step-ahead prediction error).

Example: VAR(1) on Mortality, Temperature, Pollution #

VARselect(x, lag.max = 10, type = "both")
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      9      5      2      9 
## 
## $criteria
##                   1           2           3           4           5
## AIC(n)     11.73780    11.30185    11.26788    11.23030    11.17634
## HQ(n)      11.78758    11.38149    11.37738    11.36967    11.34557
## SC(n)      11.86463    11.50477    11.54689    11.58541    11.60755
## FPE(n) 125216.91717 80972.28678 78268.19568 75383.73647 71426.10041
##                  6           7           8           9          10
## AIC(n)    11.15266    11.15247    11.12878    11.11915    11.12019
## HQ(n)     11.35176    11.38144    11.38760    11.40784    11.43874
## SC(n)     11.65996    11.73587    11.78827    11.85473    11.93187
## FPE(n) 69758.25113 69749.89175 68122.40518 67476.96374 67556.45243

We will proceed with order p=2 according to BIC.


summary(fit <- VAR(x, p = 2, type = "both"))
...
## VAR Estimation Results:
## ========================= 
## Endogenous variables: cmort, tempr, part 
## Deterministic variables: both 
## Sample size: 506 
## Log Likelihood: -4987.186 
## Roots of the characteristic polynomial:
## 0.8807 0.8807 0.5466 0.4746 0.4746 0.4498
## Call:
## VAR(y = x, p = 2, type = "both")
...

summary(fit <- VAR(x, p = 2, type = "both"))
...
## Estimation results for equation cmort: 
## ====================================== 
## cmort = cmort.l1 + tempr.l1 + part.l1 + cmort.l2 + tempr.l2 + part.l2 + const + trend 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## cmort.l1  0.297059   0.043734   6.792 3.15e-11 ***
## tempr.l1 -0.199510   0.044274  -4.506 8.23e-06 ***
## part.l1   0.042523   0.024034   1.769  0.07745 .  
## cmort.l2  0.276194   0.041938   6.586 1.15e-10 ***
## tempr.l2 -0.079337   0.044679  -1.776  0.07639 .  
## part.l2   0.068082   0.025286   2.692  0.00733 ** 
## const    56.098652   5.916618   9.482  < 2e-16 ***
## trend    -0.011042   0.001992  -5.543 4.84e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 5.295 on 498 degrees of freedom
## Multiple R-Squared: 0.7227,	Adjusted R-squared: 0.7188 
## F-statistic: 185.4 on 7 and 498 DF,  p-value: < 2.2e-16 
...

summary(fit <- VAR(x, p = 2, type = "both"))
...
## Estimation results for equation tempr: 
## ====================================== 
## tempr = cmort.l1 + tempr.l1 + part.l1 + cmort.l2 + tempr.l2 + part.l2 + const + trend 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## cmort.l1 -0.108889   0.050667  -2.149  0.03211 *  
## tempr.l1  0.260963   0.051292   5.088 5.14e-07 ***
## part.l1  -0.050542   0.027844  -1.815  0.07010 .  
## cmort.l2 -0.040870   0.048587  -0.841  0.40065    
## tempr.l2  0.355592   0.051762   6.870 1.93e-11 ***
## part.l2  -0.095114   0.029295  -3.247  0.00125 ** 
## const    49.880485   6.854540   7.277 1.34e-12 ***
## trend    -0.004754   0.002308  -2.060  0.03993 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 6.134 on 498 degrees of freedom
## Multiple R-Squared: 0.5445,	Adjusted R-squared: 0.5381 
## F-statistic: 85.04 on 7 and 498 DF,  p-value: < 2.2e-16 
...

summary(fit <- VAR(x, p = 2, type = "both"))
...
## Estimation results for equation part: 
## ===================================== 
## part = cmort.l1 + tempr.l1 + part.l1 + cmort.l2 + tempr.l2 + part.l2 + const + trend 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## cmort.l1  0.078934   0.091773   0.860 0.390153    
## tempr.l1 -0.388808   0.092906  -4.185 3.37e-05 ***
## part.l1   0.388814   0.050433   7.709 6.92e-14 ***
## cmort.l2 -0.325112   0.088005  -3.694 0.000245 ***
## tempr.l2  0.052780   0.093756   0.563 0.573724    
## part.l2   0.382193   0.053062   7.203 2.19e-12 ***
## const    59.586169  12.415669   4.799 2.11e-06 ***
## trend    -0.007582   0.004180  -1.814 0.070328 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 11.11 on 498 degrees of freedom
## Multiple R-Squared: 0.4679,	Adjusted R-squared: 0.4604 
## F-statistic: 62.57 on 7 and 498 DF,  p-value: < 2.2e-16 
...

summary(fit <- VAR(x, p = 2, type = "both"))
...
## Covariance matrix of residuals:
##        cmort  tempr   part
## cmort 28.034  7.076  16.33
## tempr  7.076 37.627  40.88
## part  16.325 40.880 123.45
## 
## Correlation matrix of residuals:
##        cmort  tempr   part
## cmort 1.0000 0.2179 0.2775
## tempr 0.2179 1.0000 0.5998
## part  0.2775 0.5998 1.0000
...

Furthermore, the Portmanteau test rejects the null hypothesis of white noise, suggesting poor fit:

serial.test(fit, lags.pt = 12, type = "PT.adjusted")
## 
## 	Portmanteau Test (adjusted)
## 
## data:  Residuals of VAR object fit
## Chi-squared = 162.35, df = 90, p-value = 4.602e-06

This is confirmed by visual inspection:

acf(resid(fit), 52)



Predictions are produced with

fit.pr <- predict(fit, n.ahead = 24, ci = 0.95)  # 24 weeks ahead
fanchart(fit.pr)  # plot prediction + error

Special cases briefly mentioned in CS2 syllabus (S6) #

Bilinear models #

  • Consider the bilinear model $$x_t - \alpha(x_{t-1} - \mu) = \mu + w_t + \beta w_{t-1} + b(x_{t-1} - \mu)w_{t-1}.$$
  • This equation is linear in $x_t$ and also linear in $w_t$, hence the name “bilinear”.
  • This can be rewritten as $$x_t = \mu + w_t + \big(\beta + b(x_{t-1} - \mu)\big)w_{t-1} + \alpha(x_{t-1} - \mu).$$
  • As opposed to ARMA models, bilinear models exhibit state-dependent behaviour: when the process is far from its mean it tends to exhibit larger fluctuations.
  • When the process is “at” its mean, dynamics are similar to those of an MA(1) process. A minimal simulation sketch follows this list.
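A minimal simulation sketch of this bilinear model (illustrative parameter values, $\mu = 0$), to visualise the burstier fluctuations away from the mean:

set.seed(1)
n <- 500; alpha <- 0.5; beta <- 0.3; b <- 0.4   # illustrative values
w <- rnorm(n); x <- numeric(n)
for (t in 2:n) {
  x[t] <- alpha * x[t - 1] + w[t] + (beta + b * x[t - 1]) * w[t - 1]
}
plot.ts(x)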

Threshold autoregressive models #

  • Consider the threshold autoregressive model $$x_t = \mu + \begin{cases}\alpha_1(x_{t-1} - \mu) + w_t & x_{t-1} \leq d,\\ \alpha_2(x_{t-1} - \mu) + w_t & x_{t-1} > d.\end{cases}$$
  • A distinctive feature of some models from the threshold autoregressive class is limit cycle behaviour. This makes the threshold autoregressive models potential candidates for modelling ‘cyclic’ phenomena.
  • The idea is to allow the parameters to change in a “regime-switching” fashion, sometimes depending on the past values of $x_t$; a minimal simulation sketch follows this list.
  • Some additional details can be found, for instance, on Wikipedia (not assessable).
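A minimal simulation sketch of this threshold AR model (illustrative values $\mu = 0$, $d = 0$, $\alpha_1 = 0.9$, $\alpha_2 = -0.9$):

set.seed(1)
n <- 500; d <- 0; a1 <- 0.9; a2 <- -0.9   # illustrative values
w <- rnorm(n); x <- numeric(n)
for (t in 2:n) {
  x[t] <- if (x[t - 1] <= d) a1 * x[t - 1] + w[t] else a2 * x[t - 1] + w[t]
}
plot.ts(x)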

Random coefficient autoregressive models #

  • Here $x_t = \mu + \alpha_t(x_{t-1} - \mu) + w_t$, where $\{\alpha_1, \alpha_2, \dots, \alpha_n\}$ is a sequence of independent random variables.
  • Such a model could be used to represent the behaviour of an investment fund, with $\mu = 0$ and $\alpha_t = 1 + i_t$, where $i_t$ is a random rate of return.
  • The extra randomness makes such processes more irregular than the corresponding AR(1).

Autoregressive models with conditional heteroscedasticity #

  • A feature that is frequently observed in asset price data is that a significant change in the price of an asset is often followed by a period of high volatility.
  • The class of autoregressive models with conditional heteroscedasticity of order $p$, the ARCH($p$) models, is defined by the relation $$x_t = \mu + w_t\sqrt{\alpha_0 + \sum_{k=1}^p\alpha_k(x_{t-k} - \mu)^2},$$ where $w_t \sim iid\ N(0, \sigma_w^2)$.
  • Because $w_t$ is scaled by a term that is larger when previous values of $x_t$ are farther from their mean $\mu$, ARCH models go towards capturing this feature.

  • If $p = 1$, the ARCH(1) model is $$x_t = \mu + w_t\sqrt{\alpha_0 + \alpha(x_{t-1} - \mu)^2}.$$ A minimal simulation sketch follows this list.
  • ARCH models have been used for modelling financial time series. If $z_t$ is the price of an asset at the end of the $t$-th trading day, set $x_t = \log(z_t/z_{t-1})$ (the daily return on day $t$).
  • Note:
    • Setting all $\alpha_k = 0$, $k = 1, \dots, p$, will “switch off” the “ARCH” behaviour, and we will be back with a white noise process of variance $\alpha_0$ (plus constant mean $\mu$).
    • So what we are modelling here is purely the variance of the process $x_t$, given its past ($p$) values.
    • If we extend the idea of conditional heteroskedasticity to an ARMA structure (rather than just AR), we get the so-called “GARCH” models.
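A minimal ARCH(1) simulation sketch (illustrative values $\mu = 0$, $\alpha_0 = 0.5$, $\alpha = 0.8$, $\sigma_w = 1$), showing the volatility clustering this class is designed to capture:

set.seed(1)
n <- 1000; a0 <- 0.5; a1 <- 0.8   # illustrative values
w <- rnorm(n); x <- numeric(n)
for (t in 2:n) {
  x[t] <- w[t] * sqrt(a0 + a1 * x[t - 1]^2)
}
plot.ts(x)   # bursts of high volatility
acf(x^2)     # the squared series is autocorrelated: the ARCH signature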

References #

Shumway, Robert H., and David S. Stoffer. 2017. Time Series Analysis and Its Applications: With R Examples. Springer.