Download PDF

We are working to support a site-wide PDF but it is not yet available. You can download PDFs for individual lectures through the download badge on each lecture page.

How to read this lecture...

Code should execute sequentially if run in a Jupyter notebook

Reverse engineering a la Muth

Co-author: Chase Coleman

In addition what’s in Anaconda, this lecture uses the quantecon library

In [1]:
!pip install quantecon

We’ll also need the following imports

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.linalg as la

from quantecon import Kalman
from quantecon import LinearStateSpace
from scipy.stats import norm

%matplotlib inline
np.set_printoptions(linewidth=120, precision=4, suppress=True)

This lecture uses the Kalman filter to reformulate John F. Muth’s first paper [Mut60] about rational expectations

Muth used classical prediction methods to reverse engineer a stochastic process that renders optimal Milton Friedman’s [Fri56] “adaptive expectations” scheme

Friedman (1956) and Muth (1960)

Milton Friedman [Fri56] (1956) posited that consumer’s forecast their future disposable income with the adaptive expectations scheme

$$ y_{t+i,t}^* = K \sum_{j=0}^\infty (1 - K)^j y_{t-j} \tag{1} $$

where $ K \in (0,1) $ and $ y_{t+i,t}^* $ is a forecast of future $ y $ over horizon $ i $

Milton Friedman justified the exponential smoothing forecasting scheme (1) informally, noting that it seemed a plausible way to use past income to forecast future income

In his first paper about rational expectations, John F. Muth [Mut60] reverse engineered a univariate stochastic process $ \{y_t\}_{t=- \infty}^\infty $ for which Milton Friedman’s adaptive expectations scheme gives linear least forecasts of $ y_{t+j} $ for any horizon $ i $

Muth sought a setting and a sense in which Friedman’s forecasting scheme is optimal

That is, Muth asked for what optimal forecasting question is Milton Friedman’s adaptive expectation scheme the answer

Muth (1960) used classical prediction methods based on lag-operators and $ z $-tranforms to find the answer to his question

Please see lectures Classical Control with Linear Algebra and Classical Filtering and Prediction with Linear Algebra for an introduction to the classical tools that Muth used

Rather than using those classical tools, in this lecture we apply the Kalman filter to express the heart of Muth’s analysis concisely

The lecture First Look at Kalman Filter describes the Kalman filter

We’ll use limiting versions of the Kalman filter corresponding to what are called stationary values in that lecture

A $ \{y_t\} $ Process for Which Adaptive Expectations are Optimal

Suppose that an observable $ y_t $ is the sum of an unobserved random walk $ x_t $ and an i.i.d. shock $ \epsilon_{2,t} $:

$$ \eqalign{ x_{t+1} & = x_t + \sigma_x \epsilon_{1,t+1} \cr y_t & = x_t + \sigma_y \epsilon_{2,t} } \tag{2} $$


$$ \left[\matrix{\epsilon_{1,t+1} \cr \epsilon_{2,t} } \right] \sim {\mathcal N} (0, I) $$

is an i.i.d. process

Note: A property of the statespace representation (2) is that in general neither $ \epsilon_{1,t} $ nor $ \epsilon_{2,t} $ is in the space spanned by square-summable lineary combinations of $ y_t, y_{t-1}, \ldots $

In general $ \begin{bmatrix} \epsilon_{1,t} \cr \epsilon_{2t} \end{bmatrix} $ has more information about future $ y_{t+j} $’s than is contained in $ y_t, y_{t-1}, \ldots $

We can use the asymptotic or stationary values of the Kalman gain and the one-step ahead conditional state covarariance matrix to compute a time-invariant innovations representation

$$ \eqalign{ \hat x_{t+1} & = \hat x_t + K a_t \cr y_t & = \hat x_t + a_t } \tag{3} $$

where $ \hat x_t = E [x_t | y_{t-1}, y_{t-2}, \ldots ] $ and $ a_t = y_t - E[y_t |y_{t-1}, y_{t-2}, \ldots ] $

Note: A key property about an innovations representation is that $ a_t $ is in the space spanned by square summable linear combinations of $ y_t, y_{t-1}, \ldots $

For more ramifications of this property, see the lectures Shock Non Invertibility and Recursive Models of Dynamic Linear Economies

Later we’ll stack these statespace systems (2) and (3) to display some classic findings of Muth

But first let’s create an instance of the statespace system (2) then apply the quantecon Kalman class, then uses it to construct the associated “innovations representation”

In [3]:
# Make some parameter choices
# sigx/sigy are state noise std err and measurement noise std err
μ_0, σ_x, σ_y = 10, 1, 5

# Create a LinearStateSpace object
A, C, G, H = 1, σ_x, 1, σ_y
ss = LinearStateSpace(A, C, G, H, mu_0=μ_0)

# Set prior and initialize the Kalman type
x_hat_0, Σ_0 = 10, 1
kmuth = Kalman(ss, x_hat_0, Σ_0)

# Computes stationary values which we need for the innovation representation
S1, K1 = kmuth.stationary_values()

# Form innovation representation state space
Ak, Ck, Gk, Hk = A, K1, G, 1

ssk = LinearStateSpace(Ak, Ck, Gk, Hk, mu_0=x_hat_0)

Some useful statespace math

Now we want to map the time-invariant innovations representation (3) and the original statespace system (2) into a convenient form for deducing the impulse responses from the original shocks to the $ x_t $ and $ \hat x_t $

Putting both of these representations into a single state space system is yet another application of the insight that “finding the state is an art”

We’ll define a state vector and appropriate statespace matrices that allow us to represent both systems in one fell swoop

Note that

$$ a_t = x_t + \sigma_y \epsilon_{2,t} - \hat x_t $$

so that

$$ \eqalign{ \hat x_{t+1} & = \hat x_t + K (x_t + \sigma_y \epsilon_{2,t} - \hat x_t) \cr & = (1-K) \hat x_t + K x_t + K \sigma_y \epsilon_{2,t} } $$

The stacked system

$$ \left[ \matrix{ x_{t+1} \cr \hat x_{t+1} \cr \epsilon_{2,t+1} } \right] = \left[\matrix{ 1 & 0 & 0 \cr K & (1-K) & K \sigma_y \cr 0 & 0 & 0 } \right] \left[ \matrix{ x_{t} \cr \hat x_t \cr \epsilon_{2,t} } \right]+ \left[ \matrix{ \sigma_x & 0 \cr 0 & 0 \cr 0 & 1} \right] \left[ \matrix{ \epsilon_{1,t+1} \cr \epsilon_{2,t+1} } \right] $$$$ \left[ \matrix{ y_t \cr a_t } \right] = \left[\matrix{ 1 & 0 & \sigma_y \cr 1 & -1 & \sigma_y } \right] \left[ \matrix{ x_{t} \cr \hat x_t \cr \epsilon_{2,t} } \right] $$

is a statespace system that tells us how the shocks $ \left[ \matrix{ \epsilon_{1,t+1} \cr \epsilon_{2,t+1} } \right] $ affect states $ \hat x_{t+1}, x_t $, the observable $ y_t $, and the innovation $ a_t $

With this tool at our disposal, let’s form the composite system and simulate it

In [4]:
# Create grand state space for y_t, a_t as observed vars -- Use stacking trick above
Af = np.array([[ 1,      0,        0],
               [K1, 1 - K1, K1 * σ_y],
               [ 0,      0,        0]])
Cf = np.array([[σ_x,        0],
               [  0, K1 * σ_y],
               [  0,        1]])
Gf = np.array([[1,  0, σ_y],
               [1, -1, σ_y]])

μ_true, μ_prior = 10, 10
μ_f = np.array([μ_true, μ_prior, 0]).reshape(3, 1)

# Create the state space
ssf = LinearStateSpace(Af, Cf, Gf, mu_0=μ_f)

# Draw observations of y from state space model
N = 50
xf, yf = ssf.simulate(N)

print(f"Kalman gain = {K1}")
print(f"Conditional variance = {S1}")
Kalman gain = [[0.181]]
Conditional variance = [[5.5249]]

Now that we have simulated our joint system, we have $ x_t $, $ \hat{x_t} $, and $ y_t $

We can now investigate how these variables are related by plotting some key objects

Estimates of unobservables

First, let’s plot the hidden state $ x_t $ and the filtered version $ \hat x_t $ that is linear-least squares projection of $ x_t $ on the history $ y_{t-1}, y_{t-2}, \ldots $

In [5]:
plt.plot(xf[0, :], label="$x_t$")
plt.plot(xf[1, :], label="Filtered $x_t$")
plt.title(r"$x$ vs $\hat{x}$")

Note how $ x_t $ and $ \hat{x_t} $ differ

For Friedman, $ \hat x_t $ and not $ x_t $ is the consumer’s idea about her/his permanent income

Relation between unobservable and observable

Now let’s plot $ x_t $ and $ y_t $

Recall that $ y_t $ is just $ x_t $ plus white noise

In [6]:
plt.plot(yf[0, :], label="y")
plt.plot(xf[0, :], label="x")
plt.title(r"$x$ and $y$")

We see above that $ y $ seems to look like white noise around the values of $ x $


Recall that we wrote down the innovation representation that depended on $ a_t $. We now plot the innovations $ \{a_t\} $:

In [7]:
plt.plot(yf[1, :], label="a")
plt.title(r"Innovation $a_t$")

MA and AR Representations

Now we shall extract from the Kalman instance kmuth coefficients of

  • a fundamental moving average representation that represents $ y_t $ as a one-sided moving sum of current and past $ a_t $s that are square summable linear combinations of $ y_t, y_{t-1}, \ldots $
  • a univariate autogression representation that depicts the coefficients in a linear least squares projection of $ y_t $ on the semi-infinite history $ y_{t-1}, y_{t-2}, \ldots $

Then we’ll plot each of them

In [8]:
# Kalman Methods for MA and VAR
coefs_ma = kmuth.stationary_coefficients(5, "ma")
coefs_var = kmuth.stationary_coefficients(5, "var")

# Coefficients come in a list of arrays; but we
# want to plot them and so need to stack into array
coefs_ma_array = np.vstack(coefs_ma)
coefs_var_array = np.vstack(coefs_var)

fig, ax = plt.subplots(2)
ax[0].plot(coefs_ma_array, label="MA")
ax[1].plot(coefs_var_array, label="VAR")

The moving average coefficients in the top panel show tell-tale signs of $ y_t $ being a process whose first difference is a first order autoregression

The autoregressive coefficients decline geometrically with decay rate $ (1-K) $

These are exactly the target outcomes that Muth (1960) aimed to reverse engineer

In [9]:
print(f'decay parameter 1 - K1 = {1 - K1}')
decay parameter 1 - K1 = [[0.819]]