Code should execute sequentially if run in a Jupyter notebook

- See the set up page to install Jupyter, Julia (0.6+) and all necessary libraries
- Please direct feedback to contact@quantecon.org or the discourse forum

# Covariance Stationary Processes

## Overview

In this lecture we study covariance stationary linear stochastic processes, a class of models routinely used to study economic and financial time series

This class has the advantage of being

- simple enough to be described by an elegant and comprehensive theory
- relatively broad in terms of the kinds of dynamics it can represent

We consider these models in both the time and frequency domain

### ARMA Processes

We will focus much of our attention on linear covariance stationary models with a finite number of parameters

In particular, we will study stationary ARMA processes, which form a cornerstone of the standard theory of time series analysis

Every ARMA process can be represented in linear state space form

However, ARMA processes have some important structure that makes it valuable to study them separately

### Spectral Analysis

Analysis in the frequency domain is also called spectral analysis

In essence, spectral analysis provides an alternative representation of the autocovariance function of a covariance stationary process

Having a second representation of this important object

- shines light on the dynamics of the process in question
- allows for a simpler, more tractable representation in some important cases

The famous *Fourier transform* and its inverse are used to map between the two representations

## Introduction

Consider a sequence of random variables \(\{ X_t \}\) indexed by \(t \in \mathbb Z\) and taking values in \(\mathbb R\)

Thus, \(\{ X_t \}\) begins in the infinite past and extends to the infinite future — a convenient and standard assumption

As in other fields, successful economic modeling typically assumes the existence of features that are constant over time

If these assumptions are correct, then each new observation \(X_t, X_{t+1},\ldots\) can provide additional information about the time-invariant features, allowing us to learn as data arrive

For this reason, we will focus in what follows on processes that are *stationary* — or become so after a transformation
(see for example this lecture and this lecture)

### Definitions

A real-valued stochastic process \(\{ X_t \}\) is called *covariance stationary* if

- Its mean \(\mu := \mathbb E X_t\) does not depend on \(t\)
- For all \(k\) in \(\mathbb Z\), the \(k\)-th autocovariance \(\gamma(k) := \mathbb E (X_t - \mu)(X_{t + k} - \mu)\) is finite and depends only on \(k\)

The function \(\gamma \colon \mathbb Z \to \mathbb R\) is called the *autocovariance function* of the process

Throughout this lecture, we will work exclusively with zero-mean (i.e., \(\mu = 0\)) covariance stationary processes

The zero-mean assumption costs nothing in terms of generality, since working with non-zero-mean processes involves no more than adding a constant

### Example 1: White Noise

Perhaps the simplest class of covariance stationary processes is the class of white noise processes

A process \(\{ \epsilon_t \}\) is called a *white noise process* if

- \(\mathbb E \epsilon_t = 0\)
- \(\gamma(k) = \sigma^2 \mathbf 1\{k = 0\}\) for some \(\sigma > 0\)

(Here \(\mathbf 1\{k = 0\}\) is defined to be 1 if \(k = 0\) and zero otherwise)

White noise processes play the role of **building blocks** for processes with more complicated dynamics

### Example 2: General Linear Processes

From the simple building block provided by white noise, we can construct a very flexible family of covariance stationary processes, the *general linear processes*

\[
X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j},
\qquad t \in \mathbb Z
\tag{1}
\]

where

- \(\{\epsilon_t\}\) is white noise
- \(\{\psi_t\}\) is a square summable sequence in \(\mathbb R\) (that is, \(\sum_{t=0}^{\infty} \psi_t^2 < \infty\))

The sequence \(\{\psi_t\}\) is often called a *linear filter*

Equation (1) is said to present a **moving average** process or a moving average representation

With some manipulations it is possible to confirm that the autocovariance function for (1) is

\[
\gamma(k) = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k}
\tag{2}
\]

By the Cauchy-Schwarz inequality one can show that the sum in (2) is finite, so \(\gamma(k)\) is well defined

Evidently, \(\gamma(k)\) does not depend on \(t\)
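To make the autocovariance formula concrete, here is a minimal sketch that evaluates \(\gamma(k) = \sigma^2 \sum_j \psi_j \psi_{j+k}\) for a filter with finitely many nonzero coefficients. The helper name `linear_process_acov` and the example coefficients are illustrative assumptions, not part of any library

```julia
# Autocovariance of a general linear process with a finite filter,
# using gamma(k) = sigma^2 * sum_j psi_j * psi_{j+k}.
# `linear_process_acov` is a hypothetical helper written for this example.
function linear_process_acov(psi, sigma, k)
    k = abs(k)                       # gamma is even: gamma(-k) = gamma(k)
    n = length(psi)
    total = 0.0
    for j in 1:(n - k)               # psi[1] stores psi_0 (1-based indexing)
        total += psi[j] * psi[j + k]
    end
    return sigma^2 * total           # empty loop gives 0 when k >= n
end

psi_example = [1.0, 0.5, 0.25]       # psi_0, psi_1, psi_2; all later terms are zero
acov_values = [linear_process_acov(psi_example, 1.0, k) for k in 0:3]
println(acov_values)
```

The result depends only on the lag \(k\) and vanishes once \(k\) exceeds the length of the filter, which is the pattern seen for finite moving averages.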

### Wold’s Decomposition

Remarkably, the class of general linear processes goes a long way towards describing the entire class of zero-mean covariance stationary processes

In particular, Wold’s decomposition theorem states that every zero-mean covariance stationary process \(\{X_t\}\) can be written as

\[
X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j} + \eta_t
\]

where

- \(\{\epsilon_t\}\) is white noise
- \(\{\psi_t\}\) is square summable
- \(\eta_t\) can be expressed as a linear function of \(X_{t-1}, X_{t-2},\ldots\) and is perfectly predictable over arbitrarily long horizons

For intuition and further discussion, see [Sar87], p. 286

### AR and MA

General linear processes are a very broad class of processes

It often pays to specialize to those for which there exists a representation having only finitely many parameters

(Experience and theory combine to indicate that models with a relatively small number of parameters typically perform better than larger models, especially for forecasting)

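One very simple example of such a model is the first-order autoregressive or AR(1) process

\[
X_t = \phi X_{t-1} + \epsilon_t
\quad \text{where} \quad
|\phi| < 1
\quad \text{and } \{ \epsilon_t \} \text{ is white noise}
\tag{3}
\]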

By direct substitution, it is easy to verify that \(X_t = \sum_{j=0}^{\infty} \phi^j \epsilon_{t-j}\)

Hence \(\{X_t\}\) is a general linear process

Applying (2) to the previous expression for \(X_t\), we get the AR(1) autocovariance function

\[
\gamma(k) = \phi^k \frac{\sigma^2}{1 - \phi^2},
\qquad k = 0, 1, \ldots
\tag{4}
\]

The next figure plots an example of this function for \(\phi = 0.8\) and \(\phi = -0.8\) with \(\sigma = 1\)

```
using PyPlot

num_rows, num_cols = 2, 1
fig, axes = subplots(num_rows, num_cols, figsize=(10, 8))

for (i, phi) in enumerate((0.8, -0.8))
    ax = axes[i]
    times = 0:16
    acov = [phi^k / (1 - phi^2) for k in times]
    label = latexstring("autocovariance, \\phi = $phi")
    ax[:plot](times, acov, "bo-", alpha=0.6, label=label)
    ax[:legend](loc="upper right")
    ax[:set](xlabel="time", xlim=(0, 15))
    ax[:hlines](0, 0, 15, linestyle="--", alpha=0.5)
end
```
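As a numerical sanity check, the closed-form AR(1) autocovariance \(\phi^k \sigma^2 / (1 - \phi^2)\) can be compared with a truncated version of the general linear process formula \(\sigma^2 \sum_j \psi_j \psi_{j+k}\) with \(\psi_j = \phi^j\). The function names and the truncation point `J` below are assumptions made for this sketch

```julia
# Closed-form AR(1) autocovariance
ar1_acov_closed(phi, sigma, k) = phi^k * sigma^2 / (1 - phi^2)

# Truncated sum sigma^2 * sum_{j=0}^{J} psi_j * psi_{j+k} with psi_j = phi^j
ar1_acov_truncated(phi, sigma, k, J) =
    sigma^2 * sum(phi^j * phi^(j + k) for j in 0:J)

J = 200    # truncation point, an arbitrary illustrative choice
for phi in (0.8, -0.8), k in 0:3
    println("phi = $phi, k = $k: closed form = ", ar1_acov_closed(phi, 1.0, k),
            ", truncated sum = ", ar1_acov_truncated(phi, 1.0, k, J))
end
```

Because the omitted terms are of order \(\phi^{2J}\), the two columns should agree to within rounding error for any moderate `J`.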

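Another very simple process is the MA(1) process (here MA means “moving average”)

\[
X_t = \epsilon_t + \theta \epsilon_{t-1}
\]

You will be able to verify that

\[
\gamma(0) = \sigma^2 (1 + \theta^2),
\quad
\gamma(1) = \sigma^2 \theta,
\quad \text{and} \quad
\gamma(k) = 0 \quad \forall \, k > 1
\]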
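For readers who want to see the verification spelled out, here is a short derivation for the MA(1) process \(X_t = \epsilon_t + \theta \epsilon_{t-1}\). It uses only the white noise properties \(\mathbb E \epsilon_t^2 = \sigma^2\) and \(\mathbb E \epsilon_t \epsilon_s = 0\) for \(s \neq t\)

\[
\begin{aligned}
\gamma(0)
  &= \mathbb E (\epsilon_t + \theta \epsilon_{t-1})^2
   = \mathbb E \epsilon_t^2 + 2 \theta \, \mathbb E \epsilon_t \epsilon_{t-1}
     + \theta^2 \mathbb E \epsilon_{t-1}^2
   = \sigma^2 (1 + \theta^2) \\
\gamma(1)
  &= \mathbb E (\epsilon_t + \theta \epsilon_{t-1})(\epsilon_{t+1} + \theta \epsilon_t)
   = \theta \, \mathbb E \epsilon_t^2
   = \sigma^2 \theta \\
\gamma(k)
  &= \mathbb E (\epsilon_t + \theta \epsilon_{t-1})(\epsilon_{t+k} + \theta \epsilon_{t+k-1})
   = 0 \qquad \text{for } k > 1
\end{aligned}
\]

In the last line no shock appears in both factors, so every cross term has zero expectation.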

The AR(1) can be generalized to an AR(\(p\)) and likewise for the MA(1)

Putting all of this together, we get the

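### ARMA Processes

A stochastic process \(\{X_t\}\) is called an *autoregressive moving average process*, or ARMA(\(p,q\)), if it can be written as

\[
X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} +
    \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}
\tag{5}
\]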

where \(\{ \epsilon_t \}\) is white noise

An alternative notation for ARMA processes uses the *lag operator* \(L\)

**Def.** Given arbitrary variable \(Y_t\), let \(L^k Y_t := Y_{t-k}\)

It turns out that

- lag operators facilitate succinct representations for linear stochastic processes
- algebraic manipulations that treat the lag operator as an ordinary scalar are legitimate

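Using \(L\), we can rewrite (5) as

\[
L^0 X_t - \phi_1 L^1 X_t - \cdots - \phi_p L^p X_t
= L^0 \epsilon_t + \theta_1 L^1 \epsilon_t + \cdots + \theta_q L^q \epsilon_t
\tag{6}
\]

If we let \(\phi(z)\) and \(\theta(z)\) be the polynomials

\[
\phi(z) := 1 - \phi_1 z - \cdots - \phi_p z^p
\quad \text{and} \quad
\theta(z) := 1 + \theta_1 z + \cdots + \theta_q z^q
\tag{7}
\]

then (6) becomes

\[
\phi(L) X_t = \theta(L) \epsilon_t
\tag{8}
\]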

In what follows we **always assume** that the roots of the polynomial \(\phi(z)\) lie outside the unit circle in the complex plane

This condition is sufficient to guarantee that the ARMA(\(p,q\)) process is covariance stationary

In fact it implies that the process falls within the class of general linear processes described above

That is, given an ARMA(\(p,q\)) process \(\{ X_t \}\) satisfying the unit circle condition, there exists a square summable sequence \(\{\psi_t\}\) with \(X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}\) for all \(t\)

The sequence \(\{\psi_t\}\) can be obtained by a recursive procedure outlined on page 79 of [CC08]

The function \(t \mapsto \psi_t\) is often called the *impulse response function*
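One common way to compute the coefficients is to match powers of \(z\) in \(\phi(z) \psi(z) = \theta(z)\), which gives \(\psi_0 = 1\) and \(\psi_j = \theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}\), with \(\theta_j = 0\) for \(j > q\). The sketch below implements this recursion under the sign conventions \(\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p\) and \(\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q\). The function name is hypothetical, and the presentation in [CC08] may differ in detail

```julia
# Impulse response coefficients psi_0, ..., psi_{n-1} of an ARMA(p, q) process.
# `arma_impulse_response` is a hypothetical helper, not a library function.
function arma_impulse_response(phi, theta, n)
    p, q = length(phi), length(theta)
    psi = zeros(n)
    psi[1] = 1.0                                # psi[j + 1] stores psi_j
    for j in 1:(n - 1)
        value = j <= q ? theta[j] : 0.0         # theta_j = 0 beyond lag q
        for i in 1:min(j, p)
            value += phi[i] * psi[j - i + 1]    # phi_i * psi_{j-i}
        end
        psi[j + 1] = value
    end
    return psi
end

# AR(1) with phi = 0.5: coefficients are 0.5^j
println(arma_impulse_response([0.5], Float64[], 5))

# ARMA(1, 1) with phi = 0.5 and theta = 0.4: psi_j = 0.5^(j - 1) * 0.9 for j >= 1
println(arma_impulse_response([0.5], [0.4], 5))
```

For the AR(1) case the output reproduces \(\psi_j = \phi^j\), in line with the representation \(X_t = \sum_{j=0}^{\infty} \phi^j \epsilon_{t-j}\).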

## Spectral Analysis

Autocovariance functions provide a great deal of information about covariance stationary processes

In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire joint distribution

Even for non-Gaussian processes, it provides a significant amount of information

It turns out that there is an alternative representation of the autocovariance function of a covariance stationary process, called the *spectral density*

At times, the spectral density is easier to derive, easier to manipulate, and provides additional intuition

### Complex Numbers

Before discussing the spectral density, we invite you to recall the main properties of complex numbers (or skip to the next section)

It can be helpful to remember that, in a formal sense, complex numbers are just points \((x, y) \in \mathbb R^2\) endowed with a specific notion of multiplication

When \((x, y)\) is regarded as a complex number, \(x\) is called the *real part* and \(y\) is called the *imaginary part*

The *modulus* or *absolute value* of a complex number \(z = (x, y)\) is just its Euclidean norm in \(\mathbb R^2\), but is usually written as \(|z|\) instead of \(\|z\|\)

The product of two complex numbers \((x, y)\) and \((u, v)\) is defined to be \((xu - vy, xv + yu)\), while addition is standard pointwise vector addition

When endowed with these notions of multiplication and addition, the set of complex numbers forms a field — addition and multiplication play well together, just as they do in \(\mathbb R\)

The complex number \((x, y)\) is often written as \(x + i y\), where \(i\) is called the *imaginary unit*, and is understood to obey \(i^2 = -1\)

The \(x + i y\) notation provides an easy way to remember the definition of multiplication given above, because, proceeding naively,

\[
(x + i y) (u + i v) = xu - yv + i (xv + yu)
\]

Converted back to our first notation, this becomes \((xu - vy, xv + yu)\) as promised

Complex numbers can be represented in the polar form \(r e^{i \omega}\) where

\[
r := |z| = \sqrt{x^2 + y^2}
\]

and \(x = r \cos(\omega), y = r \sin(\omega)\), so that \(\tan(\omega) = y/x\) (and \(\omega = \arctan(y/x)\) when \(x > 0\))
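These facts are easy to check in Julia, which has a built-in complex type. The sketch below multiplies two numbers stored as pairs using the rule \((x, y)(u, v) = (xu - vy, xv + yu)\), compares the result with built-in complex multiplication, and then recovers a number from its polar form. The helper `pair_mul` and the sample values are illustrative assumptions

```julia
# Multiply two complex numbers stored as (real, imaginary) pairs.
# `pair_mul` is a hypothetical helper used only for illustration.
pair_mul(a, b) = (a[1] * b[1] - b[2] * a[2], a[1] * b[2] + a[2] * b[1])

a, b = (1.0, 2.0), (3.0, -1.0)
product_pairs = pair_mul(a, b)
product_builtin = (1.0 + 2.0im) * (3.0 - 1.0im)   # Julia's built-in complex type
println(product_pairs, "  ", product_builtin)

# Polar form: r is the modulus and omega the angle of z
z = 1.0 + 1.0im
r, omega = abs(z), angle(z)
println(r * exp(im * omega))                      # recovers z up to rounding error
```

Using `angle` rather than an explicit arctangent avoids the quadrant ambiguity that arises when the real part is not positive.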

### Spectral Densities

Let \(\{ X_t \}\) be a covariance stationary process with autocovariance function \(\gamma\) satisfying \(\sum_{k} \gamma(k)^2 < \infty\)

The *spectral density* \(f\) of \(\{ X_t \}\) is defined as the discrete time Fourier transform of its autocovariance function \(\gamma\)

\[
f(\omega) := \sum_{k \in \mathbb Z} \gamma(k) e^{-i \omega k},
\qquad \omega \in \mathbb R
\]

(Some authors normalize the expression on the right by constants such as \(1/\pi\) — the convention chosen makes little difference provided you are consistent)

Using the fact that \(\gamma\) is *even*, in the sense that \(\gamma(t) = \gamma(-t)\) for all \(t\), we can show that

\[
f(\omega) = \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k)
\tag{9}
\]

It is not difficult to confirm that \(f\) is

- real-valued
- even (\(f(\omega) = f(-\omega)\) ), and
- \(2\pi\)-periodic, in the sense that \(f(2\pi + \omega) = f(\omega)\) for all \(\omega\)

It follows that the values of \(f\) on \([0, \pi]\) determine the values of \(f\) on all of \(\mathbb R\) — the proof is an exercise

For this reason it is standard to plot the spectral density only on the interval \([0, \pi]\)
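The three properties can be checked numerically. The sketch below uses the AR(1) autocovariance \(\gamma(k) = \phi^{|k|} \sigma^2 / (1 - \phi^2)\) and evaluates a truncated spectral density in two ways: as the complex sum \(\sum_{|k| \leq K} \gamma(k) e^{-i \omega k}\) and as the cosine sum \(\gamma(0) + 2 \sum_{k=1}^{K} \gamma(k) \cos(\omega k)\). The function names, parameter values and truncation point `K` are assumptions made for illustration

```julia
# AR(1) autocovariance, defined for negative lags through abs(k)
ar1_acov(phi, sigma, k) = phi^abs(k) * sigma^2 / (1 - phi^2)

# Truncated complex sum over k = -K, ..., K
sd_complex(phi, sigma, omega, K) =
    sum(ar1_acov(phi, sigma, k) * exp(-im * omega * k) for k in -K:K)

# Truncated cosine form
sd_cosine(phi, sigma, omega, K) =
    ar1_acov(phi, sigma, 0) +
    2 * sum(ar1_acov(phi, sigma, k) * cos(omega * k) for k in 1:K)

phi, sigma, K, omega = 0.8, 1.0, 400, 0.7
println(sd_complex(phi, sigma, omega, K))       # imaginary part is zero up to rounding
println(sd_cosine(phi, sigma, omega, K))        # matches the real part above
println(sd_cosine(phi, sigma, -omega, K))       # evenness
println(sd_cosine(phi, sigma, omega + 2pi, K))  # 2 pi periodicity
```

All four printed values should coincide up to rounding error, with a negligible imaginary part in the first.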

### Example 1: White Noise

Consider a white noise process \(\{\epsilon_t\}\) with standard deviation \(\sigma\)

It is easy to check that in this case \(f(\omega) = \sigma^2\). So \(f\) is a constant function

As we will see, this can be interpreted as meaning that “all frequencies are equally present”

(White light has this property when frequency refers to the visible spectrum, a connection that provides the origins of the term “white noise”)

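### Example 2: AR and MA and ARMA

It is an exercise to show that the MA(1) process \(X_t = \theta \epsilon_{t-1} + \epsilon_t\) has spectral density

\[
f(\omega) = \sigma^2 ( 1 + 2 \theta \cos(\omega) + \theta^2 )
\tag{10}
\]

With a bit more effort, it’s possible to show (see, e.g., p. 261 of [Sar87]) that the spectral density of the AR(1) process \(X_t = \phi X_{t-1} + \epsilon_t\) is

\[
f(\omega) = \frac{\sigma^2}{ 1 - 2 \phi \cos(\omega) + \phi^2 }
\tag{11}
\]

More generally, it can be shown that the spectral density of the ARMA process (5) is

\[
f(\omega) = \left| \frac{\theta(e^{i\omega})}{\phi(e^{i\omega})} \right|^2 \sigma^2
\tag{12}
\]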

where

- \(\sigma\) is the standard deviation of the white noise process \(\{\epsilon_t\}\)
- the polynomials \(\phi(\cdot)\) and \(\theta(\cdot)\) are as defined in (7)

The derivation of (12) uses the fact that convolutions become products under Fourier transformations

The proof is elegant and can be found in many places — see, for example, [Sar87], chapter 11, section 4

It’s a nice exercise to verify that (10) and (11) are indeed special cases of (12)
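The exercise can also be checked numerically. The sketch below assumes the ARMA spectral density has the form \(f(\omega) = \sigma^2 |\theta(e^{i\omega})|^2 / |\phi(e^{i\omega})|^2\), with \(\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p\) and \(\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q\), and compares it with the MA(1) and AR(1) closed forms \(\sigma^2 (1 + 2 \theta \cos \omega + \theta^2)\) and \(\sigma^2 / (1 - 2 \phi \cos \omega + \phi^2)\). All function names are hypothetical, and this is not the QuantEcon.jl implementation

```julia
# Spectral density of an ARMA(p, q) process at a single frequency omega.
# `arma_sd` is a hypothetical helper written for this example.
function arma_sd(phi, theta, sigma, omega)
    z = exp(im * omega)
    phi_z = 1.0 + 0.0im
    for (j, c) in enumerate(phi)
        phi_z -= c * z^j              # phi(z) = 1 - phi_1 z - ... - phi_p z^p
    end
    theta_z = 1.0 + 0.0im
    for (j, c) in enumerate(theta)
        theta_z += c * z^j            # theta(z) = 1 + theta_1 z + ... + theta_q z^q
    end
    return sigma^2 * abs2(theta_z) / abs2(phi_z)
end

# Closed forms for the two special cases
ma1_sd_closed(theta, sigma, omega) = sigma^2 * (1 + 2 * theta * cos(omega) + theta^2)
ar1_sd_closed(phi, sigma, omega) = sigma^2 / (1 - 2 * phi * cos(omega) + phi^2)

omega = 1.1
println(arma_sd(Float64[], [0.6], 1.5, omega), "  ", ma1_sd_closed(0.6, 1.5, omega))
println(arma_sd([0.8], Float64[], 1.5, omega), "  ", ar1_sd_closed(0.8, 1.5, omega))
```

Each line should print the same number twice, since \(|1 + \theta e^{i\omega}|^2 = 1 + 2 \theta \cos \omega + \theta^2\) and \(|1 - \phi e^{i\omega}|^2 = 1 - 2 \phi \cos \omega + \phi^2\).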

### Interpreting the Spectral Density

Plotting (11) reveals the shape of the spectral density for the AR(1) model when \(\phi\) takes the values 0.8 and -0.8 respectively

```
function ar1_sd(phi, omega)
    return 1 ./ (1 .- 2 .* phi .* cos.(omega) .+ phi.^2)
end

omegas = linspace(0, pi, 180)
num_rows, num_cols = 2, 1
fig, axes = subplots(num_rows, num_cols, figsize=(10, 8))

for (i, phi) in enumerate((0.8, -0.8))
    ax = axes[i]
    sd = ar1_sd(phi, omegas)
    label = latexstring("spectral \\ density, \\phi = $phi")
    ax[:plot](omegas, sd, "b-", alpha=0.6, lw=2, label=label)
    ax[:legend](loc="upper center")
    ax[:set](xlabel="frequency", xlim=(0, pi))
end
```

These spectral densities correspond to the autocovariance functions for the AR(1) process shown above

Informally, we think of the spectral density as being large at those \(\omega \in [0, \pi]\) at which the autocovariance function seems approximately to exhibit big damped cycles

To see the idea, let’s consider why, in the lower panel of the preceding figure, the spectral density for the case \(\phi = -0.8\) is large at \(\omega = \pi\)

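Recall that the spectral density can be expressed as

\[
f(\omega)
= \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k),
\qquad
\gamma(k) = \frac{(-0.8)^k}{1 - (-0.8)^2}
\tag{13}
\]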

When we evaluate this at \(\omega = \pi\), we get a large number because \(\cos(\pi k)\) is large and positive when \((-0.8)^k\) is positive, and large in absolute value and negative when \((-0.8)^k\) is negative

Hence the product is always large and positive, and hence the sum of the products on the right-hand side of (13) is large

These ideas are illustrated in the next figure, which has \(k\) on the horizontal axis

```
phi = -0.8
times = 0:16
y1 = [phi.^k ./ (1 - phi.^2) for k in times]
y2 = [cos.(pi * k) for k in times]
y3 = [a * b for (a, b) in zip(y1, y2)]
num_rows, num_cols = 3, 1
fig, axes = subplots(num_rows, num_cols, figsize=(10, 8))
# Autocovariance when phi = -0.8
ax = axes[1]
ax[:plot](times, y1, "o-", alpha=0.6, label=L"\gamma(k)")
ax[:legend](loc="upper right")
ax[:set](xlim=(0, 15), yticks=(-2, 0, 2))
ax[:hlines](0, 0, 15, linestyle="--", alpha=0.5)
# Cycles at frequency pi
ax = axes[2]
ax[:plot](times, y2, "o-", alpha=0.6, label=L"$\cos(\pi k)$")
ax[:legend](loc="upper right")
ax[:set](xlim=(0, 15), yticks=(-1, 0, 1))
ax[:hlines](0, 0, 15, linestyle="--", alpha=0.5)
# Product
ax = axes[3]
ax[:stem](times, y3, label=L"$\gamma(k) \cos(\pi k)$")
ax[:legend](loc="upper right")
ax[:set](xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax[:hlines](0, 0, 15, linestyle="--", alpha=0.5)
```

On the other hand, if we evaluate \(f(\omega)\) at \(\omega = \pi / 3\), then the cycles are not matched, the sequence \(\gamma(k) \cos(\omega k)\) contains both positive and negative terms, and hence the sum of these terms is much smaller

```
phi = -0.8
times = 0:16
y1 = [phi.^k ./ (1 - phi.^2) for k in times]
y2 = [cos.(pi * k/3) for k in times]
y3 = [a * b for (a, b) in zip(y1, y2)]
num_rows, num_cols = 3, 1
fig, axes = subplots(num_rows, num_cols, figsize=(10, 8))
# Autocovariance when phi = -0.8
ax = axes[1]
ax[:plot](times, y1, "o-", alpha=0.6, label=L"\gamma(k)")
ax[:legend](loc="upper right")
ax[:set](xlim=(0, 15), yticks=(-2, 0, 2))
ax[:hlines](0, 0, 15, linestyle="--", alpha=0.5)
# Cycles at frequency pi / 3
ax = axes[2]
ax[:plot](times, y2, "o-", alpha=0.6, label=L"$\cos(\pi k/3)$")
ax[:legend](loc="upper right")
ax[:set](xlim=(0, 15), yticks=(-1, 0, 1))
ax[:hlines](0, 0, 15, linestyle="--", alpha=0.5)
# Product
ax = axes[3]
ax[:stem](times, y3, label=L"$\gamma(k) \cos(\pi k/3)$")
ax[:legend](loc="upper right")
ax[:set](xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax[:hlines](0, 0, 15, linestyle="--", alpha=0.5)
```

In summary, the spectral density is large at frequencies \(\omega\) where the autocovariance function exhibits damped cycles
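To put numbers on the comparison, the sketch below evaluates the AR(1) spectral density with \(\phi = -0.8\) and \(\sigma = 1\) at \(\omega = \pi\) and \(\omega = \pi / 3\), using both the closed form \(1 / (1 - 2 \phi \cos \omega + \phi^2)\) and a truncated cosine sum built from \(\gamma(k) = \phi^k / (1 - \phi^2)\). The function names and the truncation point `K` are illustrative assumptions

```julia
# Truncated cosine sum: gamma(0) + 2 * sum_{k=1}^{K} gamma(k) * cos(omega * k),
# with gamma(k) = phi^k / (1 - phi^2)
cosine_sum_sd(phi, omega, K) =
    (1 + 2 * sum(phi^k * cos(omega * k) for k in 1:K)) / (1 - phi^2)

# Closed-form AR(1) spectral density with sigma = 1
closed_form_sd(phi, omega) = 1 / (1 - 2 * phi * cos(omega) + phi^2)

phi, K = -0.8, 400
for omega in (pi, pi / 3)
    println("omega = ", omega, ": closed form = ", closed_form_sd(phi, omega),
            ", cosine sum = ", cosine_sum_sd(phi, omega, K))
end
```

At \(\omega = \pi\) the density is about 25, while at \(\omega = \pi / 3\) it is about 0.41, which matches the visual impression from the two figures.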

### Inverting the Transformation

We have just seen that the spectral density is useful in the sense that it provides a frequency-based perspective on the autocovariance structure of a covariance stationary process

Another reason that the spectral density is useful is that it can be “inverted” to recover the autocovariance function via the *inverse Fourier transform*

In particular, for all \(k \in \mathbb Z\), we have

\[
\gamma(k) = \frac{1}{2 \pi} \int_{-\pi}^{\pi} f(\omega) e^{i \omega k} \, d\omega
\tag{14}
\]

This is convenient in situations where the spectral density is easier to calculate and manipulate than the autocovariance function

(For example, the expression (12) for the ARMA spectral density is much easier to work with than the expression for the ARMA autocovariance)
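As a quick check of the inversion idea, take the white noise case, where \(f(\omega) = \sigma^2\), and assume the inverse transform has the standard form \(\gamma(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\omega) e^{i \omega k} \, d\omega\). Then

\[
\begin{aligned}
k = 0: \quad
  & \frac{1}{2\pi} \int_{-\pi}^{\pi} \sigma^2 \, d\omega = \sigma^2 \\
k \neq 0: \quad
  & \frac{\sigma^2}{2\pi} \int_{-\pi}^{\pi} e^{i \omega k} \, d\omega
    = \frac{\sigma^2}{2\pi} \cdot \frac{e^{i \pi k} - e^{-i \pi k}}{i k}
    = \sigma^2 \, \frac{\sin(\pi k)}{\pi k}
    = 0
\end{aligned}
\]

This recovers \(\gamma(k) = \sigma^2 \mathbf 1\{k = 0\}\), the white noise autocovariance function.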

### Mathematical Theory

This section is loosely based on [Sar87], pp. 249-253, and included for those who

- would like a bit more insight into spectral densities
- and have at least some background in Hilbert space theory

Others should feel free to skip to the next section — none of this material is necessary to progress to computation

Recall that every separable Hilbert space \(H\) has a countable orthonormal basis \(\{ h_k \}\)

The nice thing about such a basis is that every \(f \in H\) satisfies

\[
f = \sum_k \alpha_k h_k
\quad \text{where} \quad
\alpha_k := \langle f, h_k \rangle
\tag{15}
\]

where \(\langle \cdot, \cdot \rangle\) denotes the inner product in \(H\)

Thus, \(f\) can be represented to any degree of precision by linearly combining basis vectors

The scalar sequence \(\alpha = \{\alpha_k\}\) is called the *Fourier coefficients* of \(f\), and satisfies \(\sum_k |\alpha_k|^2 < \infty\)

In other words, \(\alpha\) is in \(\ell_2\), the set of square summable sequences

Consider an operator \(T\) that maps \(\alpha \in \ell_2\) into its expansion \(\sum_k \alpha_k h_k \in H\)

The Fourier coefficients of \(T\alpha\) are just \(\alpha = \{ \alpha_k \}\), as you can verify by confirming that \(\langle T \alpha, h_k \rangle = \alpha_k\)

Using elementary results from Hilbert space theory, it can be shown that

- \(T\) is one-to-one — if \(\alpha\) and \(\beta\) are distinct in \(\ell_2\), then so are their expansions in \(H\)
- \(T\) is onto — if \(f \in H\) then its preimage in \(\ell_2\) is the sequence \(\alpha\) given by \(\alpha_k = \langle f, h_k \rangle\)
- \(T\) is a linear isometry — in particular \(\langle \alpha, \beta \rangle = \langle T\alpha, T\beta \rangle\)

Summarizing these results, we say that any separable Hilbert space is isometrically isomorphic to \(\ell_2\)

In essence, this says that each separable Hilbert space we consider is just a different way of looking at the fundamental space \(\ell_2\)

With this in mind, let’s specialize to a setting where

- \(\gamma \in \ell_2\) is the autocovariance function of a covariance stationary process, and \(f\) is the spectral density
- \(H = L_2\), where \(L_2\) is the set of square integrable functions on the interval \([-\pi, \pi]\), with inner product \(\langle g, h \rangle = \int_{-\pi}^{\pi} g(\omega) \overline{h(\omega)} d \omega\)
- \(\{h_k\} =\) the orthonormal basis for \(L_2\) given by the set of trigonometric functions

\[
h_k(\omega) = \frac{e^{i \omega k}}{\sqrt{2 \pi}},
\quad k \in \mathbb Z,
\quad \omega \in [-\pi, \pi]
\]

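Using the definition of \(T\) from above and the fact that \(f\) is even, we now have

\[
T \gamma
= \sum_{k \in \mathbb Z} \gamma(k) \frac{e^{i \omega k}}{\sqrt{2 \pi}}
= \frac{1}{\sqrt{2 \pi}} f(\omega)
\tag{16}
\]

In other words, apart from a scalar multiple, the spectral density is just a transformation of \(\gamma \in \ell_2\) under a certain linear isometry, that is, a different way to view \(\gamma\)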

In particular, it is an expansion of the autocovariance function with respect to the trigonometric basis functions in \(L_2\)

As discussed above, the Fourier coefficients of \(T \gamma\) are given by the sequence \(\gamma\), and, in particular, \(\gamma(k) = \langle T \gamma, h_k \rangle\)

Transforming this inner product into its integral expression and using (16) gives (14), justifying our earlier expression for the inverse transform

## Implementation

Most code for working with covariance stationary models deals with ARMA models

Julia code for studying ARMA models can be found in the `DSP.jl` package

Since this code doesn’t quite cover our needs, particularly vis-a-vis spectral analysis, we’ve put together the module arma.jl, which is part of the QuantEcon.jl package

The module provides functions for mapping ARMA(\(p,q\)) models into their

- impulse response function
- simulated time series
- autocovariance function
- spectral density

### Application

Let’s use this code to replicate the plots on pages 68–69 of [LS12]

Here are some functions to generate the plots

```
using QuantEcon, PyPlot
using PyCall  # needed so that the PyCall.PyObject annotation resolves

# == Plot functions == #

function plot_spectral_density(arma::ARMA, ax::PyCall.PyObject)
    (w, spect) = spectral_density(arma, two_pi=false)
    ax[:plot](w, spect, lw=2, alpha=0.7)
    ax[:set](title="Spectral density", xlim=(0, pi),
             xlabel="frequency", ylabel="spectrum", yscale="log")
    return ax
end

function plot_spectral_density(arma::ARMA)
    fig, ax = subplots()
    plot_spectral_density(arma, ax)
    return ax
end

function plot_autocovariance(arma::ARMA, ax::PyCall.PyObject)
    acov = autocovariance(arma)
    n = length(acov)
    ax[:stem](0:(n - 1), acov)
    ax[:axhline](y=0, c="red", lw=0.5)
    ax[:set](title="Autocovariance", xlim=(-0.5, n - 0.5),
             xlabel="time", ylabel="autocovariance")
    return ax
end

function plot_autocovariance(arma::ARMA)
    fig, ax = subplots()
    plot_autocovariance(arma, ax)    # previously called plot_spectral_density
    return ax
end

function plot_impulse_response(arma::ARMA, ax::PyCall.PyObject)
    psi = impulse_response(arma)
    n = length(psi)
    ax[:stem](0:(n - 1), psi)
    ax[:axhline](y=0, c="red", lw=0.5)
    ax[:set](title="Impulse response", xlim=(-0.5, n - 0.5),
             xlabel="time", ylabel="response")
    return ax
end

function plot_impulse_response(arma::ARMA)
    fig, ax = subplots()
    plot_impulse_response(arma, ax)  # previously called plot_spectral_density
    return ax
end

function plot_simulation(arma::ARMA, ax::PyCall.PyObject)
    X = simulation(arma)
    n = length(X)
    ax[:plot](0:(n - 1), X, lw=2, alpha=0.7)
    ax[:set](title="Sample path", xlim=(0.0, n),
             xlabel="time", ylabel="state space")
    return ax
end

function plot_simulation(arma::ARMA)
    fig, ax = subplots()
    plot_simulation(arma, ax)        # previously called plot_spectral_density
    return ax
end

# Draw all four plots on a 2 x 2 grid of axes
function quad_plot(arma::ARMA)
    fig, axes = subplots(2, 2, figsize=(12, 8))
    plot_functions = [plot_impulse_response,
                      plot_spectral_density,
                      plot_autocovariance,
                      plot_simulation]
    for (plot_func, ax) in zip(plot_functions, reshape(axes, prod(size(axes))))
        plot_func(arma, ax)
    end
    fig[:tight_layout]()
    return axes                      # `ax` is not defined outside the loop
end
```

Now let’s call these functions to generate the plots

We’ll use the model \(X_t = 0.5 X_{t-1} + \epsilon_t - 0.8 \epsilon_{t-2}\)

```
phi = 0.5;
theta = [0, -0.8];
arma = ARMA(phi, theta, 1.0)
quad_plot(arma)
```

### Explanation

The call

```
arma = ARMA(phi, theta, sigma)
```

creates an instance `arma` that represents the ARMA(\(p, q\)) model

\[
X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} +
    \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}
\]

If `phi` and `theta` are arrays or sequences, then the interpretation will be

- `phi` holds the vector of parameters \((\phi_1, \phi_2,..., \phi_p)\)
- `theta` holds the vector of parameters \((\theta_1, \theta_2,..., \theta_q)\)

The parameter `sigma` is always a scalar, the standard deviation of the white noise

We also permit `phi` and `theta` to be scalars, in which case the model will be interpreted as

\[
X_t = \phi X_{t-1} + \epsilon_t + \theta \epsilon_{t-1}
\]

The two numerical packages most useful for working with ARMA models are `DSP.jl` and the `fft` routine in Julia

### Computing the Autocovariance Function

As discussed above, for ARMA processes the spectral density has a simple representation that is relatively easy to calculate

Given this fact, the easiest way to obtain the autocovariance function is to recover it from the spectral density via the inverse Fourier transform

Here we use Julia’s Fourier transform routine fft, which wraps a standard C-based package called FFTW

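A look at the fft documentation shows that the inverse transform ifft takes a given sequence \(A_0, A_1, \ldots, A_{n-1}\) and returns the sequence \(a_0, a_1, \ldots, a_{n-1}\) defined by

\[
a_k = \frac{1}{n} \sum_{t=0}^{n-1} A_t e^{i k 2 \pi t / n}
\]

Thus, if we set \(A_t = f(\omega_t)\), where \(f\) is the spectral density and \(\omega_t := 2 \pi t / n\), then

\[
a_k
= \frac{1}{n} \sum_{t=0}^{n-1} f(\omega_t) e^{i \omega_t k}
= \frac{1}{2\pi} \frac{2 \pi}{n} \sum_{t=0}^{n-1} f(\omega_t) e^{i \omega_t k}
\]

For \(n\) sufficiently large, we then have

\[
a_k
\approx \frac{1}{2\pi} \int_0^{2 \pi} f(\omega) e^{i \omega k} \, d\omega
= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\omega) e^{i \omega k} \, d\omega
\]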

(You can check the last equality)

In view of (14) we have now shown that, for \(n\) sufficiently large, \(a_k \approx \gamma(k)\) — which is exactly what we want to compute
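The approximation can be tried out on the AR(1) model, whose spectral density \(\sigma^2 / (1 - 2 \phi \cos \omega + \phi^2)\) and autocovariance \(\phi^k \sigma^2 / (1 - \phi^2)\) are both known in closed form. To keep the sketch free of package dependencies, the inverse transform \(a_k = \frac{1}{n} \sum_{t=0}^{n-1} A_t e^{i k 2 \pi t / n}\) is written out as a direct sum instead of calling an FFT routine, so it is slower than `ifft` but computes the same quantity. The helper names and the choice \(n = 1024\) are assumptions made for illustration

```julia
# Direct evaluation of one inverse DFT coefficient:
#     a_k = (1 / n) * sum_{t=0}^{n-1} A_t * exp(i * k * 2 * pi * t / n)
# `inverse_dft_coefficient` is a hypothetical helper written for this example.
function inverse_dft_coefficient(A, k)
    n = length(A)
    return sum(A[t + 1] * exp(im * 2 * pi * t * k / n) for t in 0:(n - 1)) / n
end

# AR(1) spectral density in closed form
ar1_density(phi, sigma, omega) = sigma^2 / (1 - 2 * phi * cos(omega) + phi^2)

phi, sigma, n = 0.8, 1.0, 1024
A = [ar1_density(phi, sigma, 2 * pi * t / n) for t in 0:(n - 1)]   # A_t = f(omega_t)

for k in 0:3
    approx = real(inverse_dft_coefficient(A, k))
    exact = phi^k * sigma^2 / (1 - phi^2)
    println("k = $k: inverse DFT = ", approx, ", exact = ", exact)
end
```

With a grid this fine the two columns should agree to many decimal places. In practice one would pass the whole vector `A` to `ifft` and read off the first few entries.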