M7 Characteristics of Time Series

Introduction (TS 1.0) #

Definition #

  • Consider data of the same nature that have been observed at different points in time
  • The mere fact that they are of the same nature means that they are likely related in one way or another - let’s call those “correlations”
    (an acceptable term in this context as we focus on this measure, at least in this course)
  • This is in contrast with the usual “i.i.d.” assumptions associated with a sample of outcomes of a random variable
  • This invalidates some of the techniques we know, and brings additional difficulties, but also opportunities! (such as forecasting)

Definition: “The systematic approach by which one goes about answering the mathematical and statistical questions posed by these time correlations is commonly referred to as time series analysis.”

Applications #

The applications of time series are many, and crucial in many cases:

  • Economics: unemployment, GDP, CPI, etc
  • Finance: share prices, indices, etc
  • Medicine: COVID-19 cases and fatalities, biometric data for a patient (blood pressure, iron levels, …), etc
  • Global warming: ocean temperatures, CO2 levels, concentration of particulates in the atmosphere, sea levels, all in relation with one another, and with many other variables
  • Actuarial studies: frequency and severity of claims in a LoB, mortality (at different ages, in different locations, …), superimposed inflation, IBNR claims, etc

Process for time series analysis #

Sketch of process:

  • Careful examination of data plotted over time (Module 7)
  • Compute major statistical indicators (Modules 7 and 8)
  • Guess an appropriate method/model for analysing the data (Modules 8 and 9)
  • Fit and assess your model (Module 9)
  • Use your model to perform forecasts if relevant (Module 10)

We distinguish two types of approaches:

  • Time domain approach: investigate lagged relationships (impact of today on tomorrow)
  • Frequency domain approach: investigate cycles (understand regular variations)

In actuarial studies, both are relevant.

Examples (TS 1.1) #

Johnson & Johnson quarterly earnings per share #

What is the primary pattern?
Can you see any cyclical pattern as well?
How does volatility change over time (if at all)?

Global mean land-ocean temperature index #

Can you see a trend? Are there periods of continuous increase?
What would be the main focus for global warming: trend or cycles?


How do these graphs support the global warming thesis?

Dow Jones Industrial Average #

How is this time series special?
What qualities would a good forecast model need to have?

Analysis of two series together: El Niño & fish population #

How many cycles can you spot?
Is there a relationship between both series?

Signals within noise #

Typically we only see the signal obscured by noise.

Basic models (TS 1.2) #

Preliminaries #

  • Our primary objective is to develop mathematical models that provide plausible descriptions for sample data
  • A time series is a sequence of random variables $x_1, x_2, x_3, \ldots$, denoted $\{x_t\}$
  • In this course, $t$ will typically be discrete, with values in $\mathbb{N}$ (or a subset thereof)
  • One set of observed values of $\{x_t\}$ is referred to as a realisation
  • Time series are usually plotted with time on the x-axis, with observations connected at adjacent periods
  • The sampling rate must be sufficient, lest the appearance of the data be changed completely (aliasing; see also this explanation of how car wheels can appear to go backwards)
  • Smoothness of the time series suggests some level of correlation between adjacent points, or in other words that $x_t$ depends in some way on the past values $x_{t-1}, x_{t-2}, \ldots$. This is a good starting point for imagining appropriate theoretical models!

White noise - 3 scales :-) #

Let’s define $w_t$ as a collection of uncorrelated (over $t$) random variables with mean 0 and finite variance $\sigma_w^2$. This is denoted $w_t \sim wn(0, \sigma_w^2)$, and is called a white noise. Two special cases:

  • White independent noise (or iid noise): additional assumption of iid, denoted $w_t \sim iid(0, \sigma_w^2)$.
  • Gaussian white noise: further additional assumption of a normal distribution, denoted $w_t \sim iid\ N(0, \sigma_w^2)$. Usually, time series are smoother than that (see the bottom graph on the next slide). Ways of introducing serial correlation and more smoothness into time series include filtering and autoregression.

Gaussian white noise series and its 3-point moving average #

w <- rnorm(500, 0, 1)  # 500 N(0,1) variates
plot.ts(w, ylim = c(-3, 3), main = "white noise")
v <- stats::filter(w, sides = 2, filter = rep(1/3, 3))  # moving average
plot.ts(v, ylim = c(-3, 3), main = "moving average")

Filtering (and moving average) #

A series $v_t$ which is a linear combination of values of a more fundamental time series $w_t$ is called a filtered series.

  • Example: 3-point moving average (see bottom of previous slide for graph): $v_t = \frac{1}{3}\left(w_{t-1} + w_t + w_{t+1}\right)$.

  • In R, moving averages are implemented through the function

    filter(x, filter, method = c("convolution"), sides = 2)
    

    where x is the original series, filter is a vector of weights (in reverse time order), method = c("convolution") is the default (the alternative is "recursive"), and where sides is 1 for past values only, and 2 if the weights are centred around lag 0 (this requires an odd number of weights).

  • Moving average smoothers will be further discussed in Module 8.

Autoregressions #

A series xt that depends on some of its past values, as well as a noise wt is called an autoregression, because the formula looks like a regression—not of independent variables, but of its own past values—hence autoregression.

  • Example: An autoregression of the white noise: $x_t = x_{t-1} - 0.9\,x_{t-2} + w_t$.

  • If the autoregression goes back $k$ periods, one needs $k$ initial conditions (filter will use 0’s otherwise).

  • In R, autoregressions are implemented through the function

    filter(x, filter, method = c("recursive"), init)
    

    where x is the original series, filter is a vector of weights (in reverse time order) and init a vector of initial values (in reverse time order); a small sketch using init follows the example below.

  • Autoregressions will be denoted AR(p) (details in Module 9).


# take this series:
wt <- 1:10
# a simple filter:
filter(wt, rep(1/3, 3), method = c("convolution"), sides = 1)
## Time Series:
## Start = 1 
## End = 10 
## Frequency = 1 
##  [1] NA NA  2  3  4  5  6  7  8  9
# an autoregression
filter(wt, rep(1/3, 3), method = c("recursive"))
## Time Series:
## Start = 1 
## End = 10 
## Frequency = 1 
##  [1]  1.000000  2.333333  4.111111  6.481481  9.308642 12.633745
##  [7] 16.474623 20.805670 25.638012 30.972768
# note how the previous _results_ (rather than wt) are
# being used in the recursion
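
To illustrate the init argument mentioned above, here is a small sketch (not from the original slides; the filter weight 0.5 and the starting value 4 are arbitrary choices):

# sketch: a recursive filter of order 1, x_t = w_t + 0.5 * x_{t-1},
# started from x_0 = 4 instead of the default 0, so x_1 = 1 + 0.5 * 4 = 3, and so on
filter(wt, filter = 0.5, method = c("recursive"), init = 4)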

Autoregression examples #

w <- rnorm(550, 0, 1)  # 50 extra to avoid startup problems
x <- stats::filter(w, filter = c(1, -0.9), method = "recursive")[-(1:50)]
# remove first 50
plot.ts(x, ylab = "autoregression", main = "Autoregressive series generated from model x_t=x_{t-1}-0.9x_{t-2}+w_t")
y <- stats::filter(w, filter = c(1, -0.3), method = "recursive")[-(1:50)]
# remove first 50
plot.ts(y, ylab = "autoregression", main = "Autoregressive series generated from model x_t=x_{t-1}-0.3x_{t-2}+w_t")

Random walk with drift #

  • The autoregressions introduced above are all centred around 0 (in expectation) for all $t$.
  • Assume now that the series increases linearly by $\delta$ (called the drift) every time unit.
  • The random walk with drift looks back only one time unit: $x_t = \delta + x_{t-1} + w_t = \delta t + \sum_{j=1}^{t} w_j$ for $t = 1, 2, \ldots$, with initial condition $x_0 = 0$ and with $w_t$ a white noise.
  • If $\delta = 0$ this is simply called a random walk.
  • The term can be explained by visualising each increment from $t$ to $t+1$ as a purely random step from wherever the process is at $x_t$, ignoring what happened before.

Random walk with drift $\delta = 0.2$ and $\sigma_w = 1$ #


Code used to generate the plot:

set.seed(155)  # so you can reproduce the results
w <- rnorm(200)
x <- cumsum(w)
wd <- w + 0.2
xd <- cumsum(wd)
plot.ts(xd, ylim = c(-5, 55), main = "random walk", ylab = "")
lines(x, col = 4)
abline(h = 0, col = 4, lty = 2)

Describing the behaviour of basic models (TS 1.3) #

Motivation #

  • In this section we would like to develop theoretical measures to help describe how times series behave.

  • We are particularly interested in describing the relationships between observations at different points in time.

Full specification #

  • A full specification of a time series of size $n$ at times $t_1, t_2, \ldots, t_n$, for any $n$, would require the full joint distribution function $$F_{t_1, t_2, \ldots, t_n}(c_1, c_2, \ldots, c_n) = \Pr[x_{t_1} \le c_1, x_{t_2} \le c_2, \ldots, x_{t_n} \le c_n].$$ This is quite an unwieldy tool for analysis.
  • Examination of the margins $F_t(x) = \Pr[x_t \le x]$ and the corresponding pdf $f_t(x)$, when they exist, can be informative.
  • These are very theoretical. In practice, one often has only one realisation for each $x_t$, so that inferring full distributions (let alone their dependence structure) is simply impractical without tricks, manipulations, and assumptions (some of which we will learn).

What would be the most basic descriptors?

Mean function #

The mean function is defined as $$\mu_{xt} = E[x_t] = \int_{-\infty}^{\infty} x f_t(x)\, dx.$$

Examples:

  • Moving average series: we have $\mu_{vt} = E[v_t] = \frac{1}{3}\left(E[w_{t-1}] + E[w_t] + E[w_{t+1}]\right) = 0$. Smoothing does not change the mean.
  • Random walk with drift: we have $\mu_{xt} = E[x_t] = \delta t + \sum_{j=1}^{t} E[w_j] = \delta t$ (a quick simulation check follows below).
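
A quick Monte Carlo check of the random walk with drift result (a sketch, not part of the original material; $\delta = 0.2$ and $\sigma_w = 1$ are arbitrary choices):

# sketch: average many simulated random-walk-with-drift paths and compare the result
# with the theoretical mean function mu_t = delta * t
set.seed(1)
paths <- replicate(2000, cumsum(0.2 + rnorm(100)))  # 100 x 2000 matrix, one path per column
plot.ts(rowMeans(paths), ylab = "average across paths")
lines(0.2 * (1:100), col = 2, lty = 2)  # theoretical mean delta * t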

Autocovariance function #

The autocovariance function is defined as the second moment product $$\gamma_x(s,t) = \mathrm{Cov}(x_s, x_t) = E\left[(x_s - \mu_{xs})(x_t - \mu_{xt})\right]$$ for all $s$ and $t$. Note:

  • We will write $\gamma_x(s,t) = \gamma(s,t)$ if no confusion is possible.
  • This is a measure of linear dependence.
  • Smooth series $\Rightarrow$ large $\gamma$ even for $t$ and $s$ far apart
  • Choppy series $\Rightarrow$ $\gamma$ is nearly zero for large separations
  • $\gamma_x(s,t) = 0$ implies independence only if all variables are (jointly) normal; in general it implies only the absence of linear dependence

For two series $x_t$ and $y_t$ this becomes $$\gamma_{xy}(s,t) = \mathrm{Cov}(x_s, y_t) = E\left[(x_s - \mu_{xs})(y_t - \mu_{yt})\right],$$ called the cross-covariance function.

Examples of autocovariance functions #

White noise: The white noise series $w_t$ has $E[w_t] = 0$ and $$\gamma_w(s,t) = \mathrm{Cov}(w_s, w_t) = \begin{cases} \sigma_w^2 & s = t \\ 0 & s \ne t \end{cases}$$


Remember that if $U = \sum_{j=1}^{m} a_j X_j$ and $V = \sum_{k=1}^{r} b_k Y_k$, then $$\mathrm{Cov}(U, V) = \sum_{j=1}^{m} \sum_{k=1}^{r} a_j b_k \mathrm{Cov}(X_j, Y_k).$$ This will be useful for computing $\gamma$ of filtered series.


Moving average: Applying a 3-point moving average $v_t$ to the white noise series $w_t$ gives $$\gamma_v(s,t) = \mathrm{Cov}(v_s, v_t) = \begin{cases} \frac{3}{9}\sigma_w^2 & s = t \\ \frac{2}{9}\sigma_w^2 & |s-t| = 1 \\ \frac{1}{9}\sigma_w^2 & |s-t| = 2 \\ 0 & |s-t| > 2 \end{cases}$$ This depends only on the time separation (the lag), and not on the absolute location along the series.

This is related to the concept of weak stationarity, which we will introduce later.
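
These theoretical values can be checked empirically (a simulation sketch, not from the original slides):

# sketch: simulate a long white noise, apply the 3-point moving average, and compare the
# sample autocovariances at lags 0, 1, 2, 3 with the theoretical 3/9, 2/9, 1/9, 0
set.seed(1)
v <- stats::filter(rnorm(1e5), sides = 2, filter = rep(1/3, 3))
acf(na.omit(v), lag.max = 3, type = "covariance", plot = FALSE)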


Random walk: For the random walk $x_t = \sum_{j=1}^{t} w_j$ we have $$\gamma_x(s,t) = \mathrm{Cov}(x_s, x_t) = \mathrm{Cov}\left(\sum_{j=1}^{s} w_j, \sum_{k=1}^{t} w_k\right) = \min\{s,t\}\,\sigma_w^2.$$ Contrary to the previous examples, this depends on the absolute location rather than the lag.

Also $\mathrm{Var}(x_t) = t\sigma_w^2$ increases without bound as $t$ increases.

The autocorrelation function (ACF) #

The autocorrelation function (ACF) is defined as $$-1 \le \rho(s,t) = \frac{\gamma(s,t)}{\sqrt{\gamma(s,s)\gamma(t,t)}} \le 1.$$

  • The ACF measures the linear predictability of the series at time $t$, say $x_t$, using only the value $x_s$.
  • If we could do that perfectly, then $\rho(s,t) = \pm 1$ and $x_t = \beta_0 + \beta_1 x_s$, with $\beta_1$ of the same sign as $\rho(s,t)$.

In the case of two series this becomes $$-1 \le \rho_{xy}(s,t) = \frac{\gamma_{xy}(s,t)}{\sqrt{\gamma_x(s,s)\gamma_y(t,t)}} \le 1,$$ called the cross-correlation function (CCF).

Stationary time series (TS 1.4) #

Strict stationarity #

A strictly stationary time series is one for which the probabilistic behaviour of every collection of values $\{x_{t_1}, x_{t_2}, \ldots, x_{t_k}\}$ is identical to that of the time-shifted set (for any $h$) $\{x_{t_1+h}, x_{t_2+h}, \ldots, x_{t_k+h}\}$. That is, $$\Pr[x_{t_1} \le c_1, x_{t_2} \le c_2, \ldots, x_{t_k} \le c_k] = \Pr[x_{t_1+h} \le c_1, x_{t_2+h} \le c_2, \ldots, x_{t_k+h} \le c_k]$$ for all $k = 1, 2, \ldots$, all time points $t_1, t_2, \ldots, t_k$, all numbers $c_1, c_2, \ldots, c_k$ and all time shifts $h = 0, \pm 1, \pm 2, \ldots$. This implies

  • identical marginals of dimensions $\le k$ for any shift $h$
  • constant mean: $\mu_{x_s} = \mu_{x_t} \equiv \mu$
  • for $k = 2$, an autocovariance function that depends only on $s - t$: $\gamma(s,t) = \gamma(s+h, t+h)$

We need something less constraining, that still allows us to infer
properties from a single series.

Weak stationarity #

A weakly stationary time series, $x_t$, is a finite variance process such that

  1. the mean value function, $\mu_{x_t}$, is constant and does not depend on time $t$, and
  2. the autocovariance function, $\gamma(s,t)$, depends on $s$ and $t$ only through their difference $|s - t|$.

Note:

  • We dropped full distributional requirements. This imposes conditions on the first two moments of the series only.
  • Since those completely define a normal distribution, a weakly stationary Gaussian time series is also strictly stationary.
  • We will use the term stationary to mean weakly stationary; if a process is stationary in the strict sense, we will use the term strictly stationary.

Stationarity means we can estimate those two quantities by averaging over a single series. This is what we needed.

Properties of stationary series #

  • Because of condition 1, $\mu_t = \mu$.

  • Because of condition 2, $\gamma(t+h, t) = \mathrm{Cov}(x_{t+h}, x_t) = \mathrm{Cov}(x_h, x_0) = \gamma(h, 0) \equiv \gamma(h)$, and the autocovariance of a stationary time series is then $$\gamma(h) = \mathrm{Cov}(x_{t+h}, x_t) = E\left[(x_{t+h} - \mu)(x_t - \mu)\right].$$

  • $\gamma(h)$ is non-negative definite, which means that the variance of linear combinations of variates $x_t$ will never be negative, that is, $$0 \le \mathrm{Var}(a_1 x_1 + \cdots + a_n x_n) = \sum_{j=1}^{n} \sum_{k=1}^{n} a_j a_k \gamma(j - k).$$


  • Furthermore, $|\gamma(h)| \le \gamma(0)$ (the variance of the time series) and $\gamma(h) = \gamma(-h)$.
  • The autocorrelation function (ACF) of a stationary time series becomes $$-1 \le \rho(h) = \frac{\gamma(t+h, t)}{\sqrt{\gamma(t+h, t+h)\gamma(t,t)}} = \frac{\gamma(h)}{\gamma(0)} \le 1.$$

Examples of (non)-stationarity #

White noise: We have $\mu_{w_t} = 0$ and $$\gamma_w(h) = \mathrm{Cov}(w_{t+h}, w_t) = \begin{cases} \sigma_w^2 & h = 0, \\ 0 & h \ne 0, \end{cases}$$ which are both independent of time. Hence, the white noise satisfies both conditions and is (weakly) stationary. Furthermore, $$\rho_w(h) = \begin{cases} 1 & h = 0, \\ 0 & h \ne 0. \end{cases}$$ If in addition $w_t \sim iid\ N(0, \sigma_w^2)$, then it is also strictly stationary.


Moving average: For the 3-point MA we have $\mu_{v_t} = 0$ and $$\gamma_v(h) = \begin{cases} \frac{3}{9}\sigma_w^2 & h = 0, \\ \frac{2}{9}\sigma_w^2 & h = \pm 1, \\ \frac{1}{9}\sigma_w^2 & h = \pm 2, \\ 0 & |h| > 2, \end{cases}$$ which are both independent of time. Hence, the 3-point MA satisfies both conditions and is stationary. Furthermore, $$\rho_v(h) = \begin{cases} 1 & h = 0, \\ \frac{2}{3} & h = \pm 1, \\ \frac{1}{3} & h = \pm 2, \\ 0 & |h| > 2, \end{cases}$$ which is symmetric around lag 0.


Random walk: For the random walk with drift $x_t = \delta t + \sum_{j=1}^{t} w_j$ we have $\mu_{x_t} = \delta t$, which is a function of time $t$, and $\gamma(s,t) = \min\{s,t\}\,\sigma_w^2$, which depends on $s$ and $t$ (not just their difference), so the random walk is not stationary.

Furthermore, remember that $\mathrm{Var}(x_t) = \gamma_x(t,t) = t\sigma_w^2$, which increases without bound as $t \to \infty$.

Trend stationarity #

  • If only the second condition (on the ACF) is satisfied, but not the first condition (on the mean value function), we have trend stationarity
  • This means that the model has a stationary behaviour around its trend.
  • Example: if $x_t = \alpha + \beta t + y_t$, where $y_t$ is stationary, then the mean function is $\mu_{x,t} = E[x_t] = \alpha + \beta t + \mu_y$, which is *not* independent of time. The autocovariance function, $$\gamma_x(h) = \mathrm{Cov}(x_{t+h}, x_t) = E\left[(x_{t+h} - \mu_{x,t+h})(x_t - \mu_{x,t})\right] = E\left[(y_{t+h} - \mu_y)(y_t - \mu_y)\right] = \gamma_y(h),$$ however, is independent of time (a small simulation sketch follows below).
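
A small sketch of what trend stationarity looks like in practice (not from the original slides; $\alpha = 1$, $\beta = 0.05$ and an AR(1)-type $y_t$ are arbitrary choices):

# sketch: a trend-stationary series and its behaviour after removing the linear trend
set.seed(1)
t <- 1:200
y <- stats::filter(rnorm(200), filter = 0.5, method = "recursive")  # a stationary-looking y_t
x <- 1 + 0.05 * t + y  # alpha + beta * t + y_t
plot.ts(x, main = "trend-stationary series")
plot.ts(resid(lm(x ~ t)), main = "residuals after removing the linear trend")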

Joint stationarity #

Two time series, say $x_t$ and $y_t$, are said to be jointly stationary if they are each stationary, and the cross-covariance function $$\gamma_{xy}(h) = \mathrm{Cov}(x_{t+h}, y_t) = E\left[(x_{t+h} - \mu_x)(y_t - \mu_y)\right]$$ is a function only of lag $h$. The corresponding cross-correlation function (CCF) is $$-1 \le \rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\gamma_y(0)}} \le 1.$$


Note that because $\mathrm{Cov}(x_2, y_1)$ and $\mathrm{Cov}(x_1, y_2)$ (for example) need not be the same, it follows that typically $\rho_{xy}(h) \ne \rho_{xy}(-h)$, that is, the CCF is not generally symmetric about zero. However, we have $\rho_{xy}(h) = \rho_{yx}(-h)$.

Example of joint stationarity #

Consider $x_t = w_t + w_{t-1}$ and $y_t = w_t - w_{t-1}$, where the $w_t$ are independent with mean 0 and variance $\sigma_w^2$. We then have $$\gamma_x(0) = \gamma_y(0) = 2\sigma_w^2, \quad \gamma_x(1) = \gamma_x(-1) = \sigma_w^2, \quad \gamma_y(1) = \gamma_y(-1) = -\sigma_w^2$$ and $\gamma_{xy}(1) = \sigma_w^2$, $\gamma_{xy}(0) = 0$, and $\gamma_{xy}(-1) = -\sigma_w^2$, so that $$\rho_{xy}(h) = \begin{cases} 0 & h = 0, \\ 1/2 & h = 1, \\ -1/2 & h = -1, \\ 0 & |h| \ge 2, \end{cases}$$ which depends only on the lag $h$, so both series are jointly stationary.
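
These values can be verified by simulation (a sketch, not part of the original slides):

# sketch: simulate x_t = w_t + w_{t-1} and y_t = w_t - w_{t-1}; the sample CCF should be
# close to 0 at h = 0, 1/2 at h = 1 and -1/2 at h = -1
set.seed(1)
n <- 10000
w <- rnorm(n + 1)
x <- w[2:(n + 1)] + w[1:n]
y <- w[2:(n + 1)] - w[1:n]
ccf(x, y, lag.max = 2, plot = FALSE)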

Prediction using cross-correlation #

Prediction using cross-correlation: A lagging relation between two series $x_t$ and $y_t$ may be exploited for predictions. For instance, if $y_t = A x_{t-\ell} + w_t$, then $x_t$ is said to lead $y_t$ for $\ell > 0$, and is said to lag $y_t$ for $\ell < 0$.

If the relation above holds true, then the lag $\ell$ can be inferred from the cross-covariance $\gamma_{yx}(h)$, whose shape mirrors the autocovariance of the input series $x_t$:

  • If $w_t$ is uncorrelated with $x_t$, then $$\gamma_{yx}(h) = \mathrm{Cov}(y_{t+h}, x_t) = \mathrm{Cov}(A x_{t+h-\ell} + w_{t+h}, x_t) = \mathrm{Cov}(A x_{t+h-\ell}, x_t) = A\gamma_x(h - \ell).$$
  • We know $\gamma_x(h) \le \gamma_x(0)$, so the peak of $\gamma_{yx}(h)$ should be at $h = \ell$. Also:
    $h$ will be positive if $x_t$ leads $y_t$, and negative if $x_t$ lags $y_t$.

  • Here $\ell = 5$ and $x_t$ leads $y_t$:

Note this example was simulated and uses the R functions lag and ccf:

x <- rnorm(100)
y <- stats::lag(x, -5) + rnorm(100)
ccf(y, x, ylab = "CCovF", type = "covariance")

Linear process #

A linear process, $x_t$, is defined to be a linear combination of white noise variates $w_t$, and is given by $$x_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j w_{t-j}, \qquad \sum_{j=-\infty}^{\infty} |\psi_j| < \infty.$$ This is an important class of models because it encompasses moving averages, autoregressions, and also the combination of both, called autoregressive moving average (ARMA) processes, which we will introduce later.

Example:

  • Moving average: The 3-point moving average has $\psi_{-1} = \psi_0 = \psi_1 = 1/3$ (and all other $\psi_j = 0$) and is hence a linear process.

Properties of linear processes #

  • The autocovariance function of a linear process is given by $$\gamma_x(h) = \sigma_w^2 \sum_{j=-\infty}^{\infty} \psi_{j+h}\psi_j \qquad \text{for } h \ge 0.$$
  • It has finite variance if $\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty$.
  • In its most general form, $x_t$ depends on the future ($j<0$ components), the present ($j=0$) and the past ($j>0$).

For forecasting, a model dependent on the future is useless. We will focus on processes that do not depend on the future. Such processes are called causal, that is, $x_t$ is causal $\Leftrightarrow$ $\psi_j = 0$ for $j < 0$, which we will assume unless stated otherwise.
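
As a small numerical illustration (a sketch, not from the original slides), the autocovariance formula above reproduces the 3-point moving average values computed earlier:

# sketch: gamma_x(h) = sigma_w^2 * sum_j psi_{j+h} * psi_j, where the only non-zero
# weights of the 3-point MA are psi_{-1} = psi_0 = psi_1 = 1/3
psi <- rep(1/3, 3)  # positions 1..3 stand for j = -1, 0, 1
sigma2_w <- 1
gamma_x <- function(h) if (h > 2) 0 else sigma2_w * sum(psi[1:(3 - h)] * psi[(1 + h):3])
sapply(0:3, gamma_x)  # 3/9, 2/9, 1/9, 0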

Estimation of correlation (TS 1.5) #

Background #

  • One can very rarely hypothesise (fully specify) a time series model. In practice, most analyses are performed using sample data.
  • Furthermore, one often has only one realisation of the time series.
  • This means that we don’t have n realisations of the time series to estimate its covariance and correlation functions.
  • This is why the assumption of stationarity is essential: in this case, the assumed “homogeneity” of the data means we can estimate those functions from one realisation only.
  • This also means that one needs to manipulate / de-trend series such that they are arguably stationary before we can fit parameters to them and use them for projections.

Sample mean #

If a time series is stationary, the mean function $\mu_t = \mu$ is constant, so that we can estimate it by the sample mean, $$\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t.$$ This estimator is unbiased, $E[\bar{x}] = \mu$, and has standard error equal to the square root of $$\mathrm{Var}(\bar{x}) = \frac{1}{n^2}\mathrm{Cov}\left(\sum_{t=1}^{n} x_t, \sum_{s=1}^{n} x_s\right) = \frac{1}{n}\sum_{h=-n}^{n}\left(1 - \frac{|h|}{n}\right)\gamma_x(h).$$
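
A hedged simulation check of this variance formula for the 3-point moving average (not from the original slides; $n = 100$ and the number of replications are arbitrary choices):

# sketch: compare the simulated variance of the sample mean of a 3-point MA series with
# (1/n) * sum_h (1 - |h|/n) * gamma_x(h), using gamma(0) = 3/9, gamma(+-1) = 2/9, gamma(+-2) = 1/9
set.seed(1)
n <- 100
xbar <- replicate(5000, mean(stats::filter(rnorm(n + 2), rep(1/3, 3)), na.rm = TRUE))
var(xbar)  # simulated variance of the sample mean
(1/n) * (3/9 + 2 * (1 - 1/n) * 2/9 + 2 * (1 - 2/n) * 1/9)  # theoretical value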

Sample autocovariance function #

The sample autocovariance function is defined as $$\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-h}(x_{t+h} - \bar{x})(x_t - \bar{x}), \qquad \text{with } \hat{\gamma}(-h) = \hat{\gamma}(h), \text{ for } h = 0, 1, \ldots, n-1.$$ Note:

  • The estimator is biased.
  • The sum runs over a restricted range ($n-h$) because xt+h is not available for t+h>n.
  • One could wonder why the factor in front of the sum is not $1/(n-h)$ (the number of terms in the sum), but the factor $1/n$ is not a mistake. It ensures that the estimate of the variance of linear combinations, $$\widehat{\mathrm{Var}}(a_1 x_1 + \cdots + a_n x_n) = \sum_{j=1}^{n}\sum_{k=1}^{n} a_j a_k \hat{\gamma}(j-k),$$ is non-negative (R’s acf follows this convention, as the short check below illustrates).
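
As a brief check (a sketch, not from the original slides), R’s acf() can be compared with a hand computation using the $1/n$ convention:

# sketch: acf(type = "covariance") divides by n, not by n - h
set.seed(1)
x <- rnorm(50)
acf(x, lag.max = 1, type = "covariance", plot = FALSE)$acf[2]  # lag-1 sample autocovariance
sum((x[2:50] - mean(x)) * (x[1:49] - mean(x))) / 50  # the same value, computed by hand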

Sample autocorrelation function #

The sample autocorrelation function (SACF) is $$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}.$$

Under some conditions (see the book for details), if $x_t$ is a white noise, then for large $n$ the SACF $\hat{\rho}(h)$ is approximately normally distributed with zero mean and standard deviation given by $\sigma_{\hat{\rho}(h)} = \frac{1}{\sqrt{n}}$.

Testing for significance of autocorrelation #

The asymptotic result for the variance of the SACF means we can test whether lagged observations are uncorrelated (which is a requirement for white noise):

  • test for significance of the $\hat{\rho}$’s at different lags: check how many $\hat{\rho}$’s lie outside the interval $\pm 2/\sqrt{n}$ (a 95% confidence interval)
  • One should expect approximately 1 out of 20 to lie outside the interval if the sequence is white noise; many more than that would invalidate the whiteness assumption (a quick simulation check follows this list).
  • This allows for a recursive approach for manipulating / de-trending series until they are white noise, called whitening.
  • The R function acf automatically displays those bounds with dashed blue lines.
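
A quick illustration of this test on simulated white noise (a sketch, not part of the original slides):

# sketch: for white noise, roughly 5% of sample ACF values should fall outside +/- 2/sqrt(n)
set.seed(1)
n <- 1000
w <- rnorm(n)
r <- acf(w, lag.max = 40, plot = FALSE)$acf[-1]  # drop lag 0, which is always 1
mean(abs(r) > 2 / sqrt(n))  # proportion outside the bounds; should be around 0.05 on average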

SOI autocorrelation #

library(astsa)  # the soi (Southern Oscillation Index) series comes from the astsa package
acf(soi, main = "Sample autocorrelation function (SACF) of SOI")

r <- round(acf(soi, 6, plot = FALSE)$acf[-1], 3)  # first 6 sample acf values
## [1]  0.604  0.374  0.214  0.050 -0.107 -0.187

The SOI series is clearly not a white noise.


plot(stats::lag(soi, -1), soi, main = "SOI pairs of values 1 month apart")
legend("topleft", legend = r[1])
plot(stats::lag(soi, -6), soi, main = "SOI pairs of values 6 months apart")
legend("topleft", legend = r[6])

Scatterplots provide a visual representation of the dependence (which may not necessarily be linear).

Sample cross-covariances and cross-correlations #

The sample cross-covariance function is $$\hat{\gamma}_{xy}(h) = \frac{1}{n}\sum_{t=1}^{n-h}(x_{t+h} - \bar{x})(y_t - \bar{y}),$$ where $\hat{\gamma}_{xy}(-h) = \hat{\gamma}_{yx}(h)$ determines the function for negative lags.

The sample cross-correlation function is $$-1 \le \hat{\rho}_{xy}(h) = \frac{\hat{\gamma}_{xy}(h)}{\sqrt{\hat{\gamma}_x(0)\hat{\gamma}_y(0)}} \le 1.$$ Note:

  • Graphical examinations of ρ^xy(h) provide information about the leading or lagging relations in the data.

Testing for independent cross-whiteness #

If $x_t$ and $y_t$ are independent linear processes, then the large-sample distribution of $\hat{\rho}_{xy}(h)$ has mean 0 and $\sigma_{\hat{\rho}_{xy}} = \frac{1}{\sqrt{n}}$ if at least one of the processes is independent white noise.

This is very useful, and adds to the toolbox:

  • This provides feedback about the quality of our explanation of the relationship between both time series: if we have explained the trends and relationships between both processes, then their residuals should be independent white noise.
  • After each improvement of our model, significance of the $\hat{\rho}_{xy}$’s of the residuals can be tested: if we have independent cross-whiteness then we have a good model. If the $\hat{\rho}_{xy}$’s are still significant (outside the boundaries), then we still have things to explain (to add); a minimal sketch of this check follows below.
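
A minimal sketch of this check with two independent white noise series (not from the original slides):

# sketch: the sample CCF of two independent white noise series should stay (mostly)
# within +/- 2/sqrt(n), the dashed bounds drawn by ccf()
set.seed(1)
x <- rnorm(200)
y <- rnorm(200)
ccf(x, y, lag.max = 20, main = "two independent white noise series")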

SOI and recruitment correlation analysis #

acf(soi, 48, main = "Southern Oscillation Index")
acf(rec, 48, main = "Recruitment")
ccf(soi, rec, 48, main = "SOI vs Recruitment", ylab = "CCF")

The SCCF (bottom) has a different cycle, and its peak at $h = -6$ suggests that SOI leads Recruitment by 6 months (with a negative correlation).

Idea of prewhitening #

  • to use the test of cross-whiteness one needs to “prewhiten” at least one of the series
  • for the SOI vs recruitment example, there is strong seasonality which, upon removal, may whiten the series
  • we look at an example here that looks like the SOI vs recruitment example, and show how this seasonality could be removed with the help of sin and cos functions

Example:

  • Let us generate two series $x_t$ and $y_t$, for $t = 1, \ldots, 120$, independently as $$x_t = 2\cos\left(2\pi \tfrac{t}{12}\right) + w_{t1} \quad \text{and} \quad y_t = 2\cos\left(2\pi \tfrac{t+5}{12}\right) + w_{t2},$$ where $\{w_{t1}, w_{t2};\ t = 1, \ldots, 120\}$ are all independent standard normals.

  • this generates the data and plots it:
set.seed(1492)
num <- 120
t <- 1:num
X <- ts(2 * cos(2 * pi * t/12) + rnorm(num), freq = 12)
Y <- ts(2 * cos(2 * pi * (t + 5)/12) + rnorm(num), freq = 12)
par(mfrow = c(1, 2), mgp = c(1.6, 0.6, 0), mar = c(3, 3, 1, 1))
plot(X)
plot(Y)


  • looking at the ACFs one can see seasonality
par(mfrow = c(3, 2), mgp = c(1.6, 0.6, 0), mar = c(3, 3, 1, 1))
acf(X, 48, ylab = "ACF(X)")
acf(Y, 48, ylab = "ACF(Y)")
ccf(X, Y, 24, ylab = "CCF(X,Y)")

  • furthermore the CCF suggests cross-correlation
    even though the series are independent

  • what we do now is to “prewhiten” $y_t$ by removing the signal from the data, by running a regression of $y_t$ on $\cos(2\pi t/12)$ and $\sin(2\pi t/12)$ and then putting $\tilde{y}_t = y_t - \hat{y}_t$, where $\hat{y}_t$ are the predicted values from the regression.
  • in the R code below, Yw is $\tilde{y}_t$
par(mgp = c(1.6, 0.6, 0), mar = c(3, 3, 1, 1))
Yw <- resid(lm(Y ~ cos(2 * pi * t/12) + sin(2 * pi * t/12), na.action = NULL))
ccf(X, Yw, 24, ylab = "CCF(X,Yw)", ylim = c(-0.3, 0.3))

  • the updated CCF now suggests cross-independence, as it should

References #

Shumway, Robert H., and David S. Stoffer. 2017. Time Series Analysis and Its Applications: With R Examples. Springer.