A Stochastic Optimal Growth Model


In this lecture we’re going to study a simple optimal growth model with one agent

The model is a version of the standard one sector infinite horizon growth model studied in

The technique we use to solve the model is dynamic programming

Our treatment of dynamic programming follows on from earlier treatments in our lectures on shortest paths and job search

We’ll discuss some of the technical details of dynamic programming as we go along

The Model

Consider an agent who owns an amount \(y_t \in \mathbb R_+ := [0, \infty)\) of a consumption good at time \(t\)

This output can either be consumed or invested

When the good is invested it is transformed one-for-one into capital

The resulting capital stock, denoted here by \(k_{t+1}\), will then be used for production

Production is stochastic, in that it also depends on a shock \(\xi_{t+1}\) realized at the end of the current period

Next period output is

\[y_{t+1} := f(k_{t+1}) \xi_{t+1}\]

where \(f \colon \RR_+ \to \RR_+\) is called the production function

The resource constraint is

(1)\[k_{t+1} + c_t \leq y_t\]

and all variables are required to be nonnegative

Assumptions and Comments

In what follows,

  • The sequence \(\{\xi_t\}\) is assumed to be IID
  • The common distribution of each \(\xi_t\) will be denoted \(\phi\)
  • The production function \(f\) is assumed to be increasing and continuous
  • Depreciation of capital is not made explicit but can be incorporated into the production function

While that many other treatments of the stochastic growth model use \(k_t\) as the state variable, we will use \(y_t\)

This will allow us to treat a stochastic model while maintaining only one state variable

We consider alternative states and timing specifications in some of our other lectures


Taking \(y_0\) as given, the agent wishes to maximize

(2)\[\mathbb E \left[ \sum_{t = 0}^{\infty} \beta^t u(c_t) \right]\]

subject to

(3)\[y_{t+1} = f(y_t - c_t) \xi_{t+1} \quad \text{and} \quad 0 \leq c_t \leq y_t \quad \text{for all } t\]


  • \(u\) is a bounded, continuous and strictly increasing utility function and
  • \(\beta \in (0, 1)\) is a discount factor

In (3) we are assuming that the resource constraint (1) holds with equality — which is reasonable because \(u\) is strictly increasing and no output will be wasted at the optimum

In summary, the agent’s aim is to select a path \(c_0, c_1, c_2, \ldots\) for consumption that is

  1. nonnegative,
  2. feasible in the sense of (1),
  3. optimal, in the sense that it maximizes (2) relative to all other feasible consumption sequences, and
  4. adapted, in the sense that the action \(c_t\) depends only on observable outcomes, not future outcomes such as \(\xi_{t+1}\)

In the present context

  • \(y_t\) is called the state variable — it summarizes the “state of the world” at the start of each period
  • \(c_t\) is called the control variable — a value chosen by the agent each period after observing the state

The Policy Function Approach

One way to think about solving this problem is to look for the best policy function

A policy function is a map from past and present observables into current action

We’ll be particularly interested in Markov policies, which are maps from the current state \(y_t\) into a current action \(c_t\)

For standard dynamic programming problems such as this one (also called Markov decision processes, the optimal policy is always a Markov policy

In other words, the current state \(y_t\) provides a sufficient statistic for the history in terms of making an optimal decision today

This is quite intuitive but if you wish you can find proofs in texts such as [SLP89] (section 4.1)

Hereafter we focus on finding the best Markov policy

In our context, a Markov policy is a function \(\sigma \colon \mathbb R_+ \to \mathbb R_+\), with the understanding that states are mapped to actions via

\[c_t = \sigma(y_t) \quad \text{for all } t\]

In what follows, we will call \(\sigma\) a feasible consumption policy if it satisfies

(4)\[0 \leq \sigma(y) \leq y \quad \text{for all} \quad y \in \mathbb R_+\]

In other words, a feasible consumption policy is a Markov policy that respects the resource constraint

The set of all feasible consumption policies will be denoted by \(\Sigma\)

Each \(\sigma \in \Sigma\) determines a continuous state Markov process \(\{y_t\}\) for output via

(5)\[y_{t+1} = f(y_t - \sigma(y_t)) \xi_{t+1}, \quad y_0 \text{ given}\]

This is the time path for output when we choose and stick with the policy \(\sigma\)

We insert this process into the objective to get

(6)\[\mathbb E \left[ \, \sum_{t = 0}^{\infty} \beta^t u(c_t) \, \right] = \mathbb E \left[ \, \sum_{t = 0}^{\infty} \beta^t u(\sigma(y_t)) \, \right]\]

This is the total expected present value of following policy \(\sigma\) forever, given initial income \(y_0\)

The aim is to select a policy that makes this number as large as possible

The next section covers these ideas more formally


The policy value function \(v_{\sigma}\) associated with a given policy \(\sigma\) is the mapping defined by

(7)\[v_{\sigma}(y) = \mathbb E \left[ \sum_{t = 0}^{\infty} \beta^t u(\sigma(y_t)) \right]\]

when \(\{y_t\}\) is given by (5) with \(y_0 = y\)

In other words, it is the lifetime value of following policy \(\sigma\) starting at initial condition \(y\)

The value function is then defined as

(8)\[v^*(y) := \sup_{\sigma \in \Sigma} \; v_{\sigma}(y)\]

The value function gives the maximal value that can be obtained from state \(y\), after considering all feasible policies

A policy \(\sigma \in \Sigma\) is called optimal if it attains the supremum in (8) for all \(y \in \mathbb R_+\)

The Bellman Equation

With our assumptions on utility and production function, the value function as defined in (8) also satisfies a Bellman equation

For this problem, the Bellman equation takes the form

(9)\[v^*(y) = \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int v^*(f(y - c) z) \phi(dz) \right\} \quad \text{for all} \quad y \in \mathbb R_+\]

Here \(\int v^*(f(y - c) z) \phi(dz) = \mathbb E v^*(f(y - c) \xi_{t+1})\) is the expected next period value when the state is \(y\) and consumption is set to \(c\)

  • See e.g., EDTC, theorem 10.1.11

The intuitive content of the Bellman equation is that maximal value from a given state can be obtained by optimally trading off

  • current reward from a given action vs
  • expected discounted future value of the state resulting from that action

The Bellman equation is important because it gives us more information about the value function

It also suggests a way of computing the value function, which we discuss below

Greedy policies

The primary importance of the value function is that we can use it to compute optimal policies

The details are as follows

Given a continuous function \(w\) on \(\mathbb R_+\), we say that \(\sigma \in \Sigma\) is \(w\)-greedy if \(\sigma(y)\) is a solution to

(10)\[\max_{0 \leq c \leq y} \left\{ u(c) + \beta \int w(f(y - c) z) \phi(dz) \right\}\]

for every \(y \in \mathbb R_+\)

In other words, \(\sigma \in \Sigma\) is \(w\)-greedy if it optimally trades off current and future rewards when \(w\) is taken to be the value function

In our setting, a feasible consumption policy is optimal if and only if it is \(v^*\)-greedy policy (see, e.g., theorem 10.1.11 of EDTC)

Hence, once we have a good approximation to \(v^*\), we can compute the (approximately) optimal policy by computing the corresponding greedy policy

The advantage is that we are now solving a much lower dimensional optimization problem

The Bellman Operator

How, then, should we compute the value function?

One way is to use the so-called Bellman operator

(An operator is a map that sends functions into functions)

The Bellman operator is denoted by \(T\) and defined by

(11)\[Tw(y) := \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int w(f(y - c) z) \phi(dz) \right\}\]

In other words, \(T\) sends the function \(w\) into the new function \(Tw\) defined by

\[y \mapsto \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int w(f(y - c) z) \phi(dz) \right\}\]

By construction, the value function \(v^*\) is a fixed point of \(T\)

In other words, \(T\) maps \(v^*\) back to itself, since, by the definition of the Bellman operator and the Bellman equation

\[Tv^*(y) = \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int v^*(f(y - c) z) \phi(dz) \right\} = v^*(y)\]

It turns out that, in addition, all bounded continuous functions are mapped towards \(v^*\) under iteration with \(T\)

This will be a key point for our algorithm (value function iteration) and is discussed further below

Summary of Theoretical Results

Let’s collect and summarize the theoretical results discussed above

Under our assumptions:

  1. The value function \(v^*\) is finite, bounded, continuous and satisfies the Bellman equation
  2. At least one optimal policy exists
  3. A policy is optimal if and only if it is \(v^*\)-greedy
  4. The sequence of functions \(w, Tw, T(Tw) = T^2 w, \ldots\) generated by iteratively applying \(T\) starting at some initial \(w\) converges uniformly to \(v^*\) whenever \(w\) is bounded and continuous

The iterative method in the last step is called value function iteration

For a full proof of these results see, e.g., proposition 10.1.13 and lemma 10.1.20 of EDTC

Unbounded Utility

The results stated above assume that the utility function is bounded

In practice economists often work with unbounded utility functions — and so will we

In the unbounded setting, various optimality theories exist but they tend to be case specific

We’ll give references as we go along, but you can consult, for example, section 12.2 of EDTC, [Kam12] or [MdRV10]


Let’s now look at computing the value function and the optimal policy

Fitted Value Iteration

The first step is to compute the value function by value function iteration

In theory, the algorithm is as follows

  1. Begin with a function \(w\) — an initial condition
  2. Solving (11), obtain the function \(T w\)
  3. Unless some stopping condition is satisfied, set \(w = Tw\) and go to step 2

This generates the sequence \(w, Tw, T^2 w, \ldots\)

However, there is a problem we must confront before we implement this procedure: The iterates can neither be calculated exactly nor stored on a computer

To see the issue, consider (11)

Even if \(w\) is a known function, unless \(Tw\) can be shown to have some special structure, the only way to store this function is to record the value \(Tw(y)\) for every \(y \in \mathbb R_+\)

Clearly this is impossible

What we will do instead is use fitted value function iteration

The procedure is to record the value of the function \(Tw\) at only finitely many “grid” points \(y_1 < y_2 < \cdots < y_I\) and reconstruct it from this information when required

More precisely, the algorithm will be

  1. Begin with an array of values \(\{ w_1, \ldots, w_I \}\) representing the values of some initial function \(w\) on the grid points \(\{ y_1, \ldots, y_I \}\)
  2. Build a function \(\hat w\) on the state space \(\mathbb R_+\) by interpolation or approximation, based on the grid values
  3. By repeatedly solving (11), obtain and record the value \(T \hat w(y_i)\) on each grid point \(y_i\)
  4. Unless some stopping condition is satisfied, set \(\{ w_1, \ldots, w_I \} = \{ T \hat w(y_1), \ldots, T \hat w(y_I) \}\) and go to step 2

How should we go about step 2?

This is a problem of function approximation, and there are many ways to approach it

What’s important here is that the function approximation scheme must not only produce a good approximation to \(Tw\), but also combine well with the broader iteration algorithm described above

One good choice from both respects is continuous piecewise linear interpolation (see this paper for further discussion)

The next figure illustrates piecewise linear interpolation of an arbitrary function on grid points \(0, 0.2, 0.4, \ldots, 1\)


Another advantage of piecewise linear interpolation is that it preserves useful shape properties such as monotonicity and concavity / convexity

The Bellman Operator

Here’s a function that implements the Bellman operator using linear interpolation

Solving the optimal growth problem via value function iteration.

@authors : Spencer Lyon, John Stachurski

@date : Thu Feb 2 14:39:36 AEDT 2017



using QuantEcon
using Optim

The approximate Bellman operator, which computes and returns the
updated value function Tw on the grid points.  An array to store
the new set of values Tw is optionally supplied (to avoid having to
allocate new arrays at each iteration).  If supplied, any existing data in 
Tw will be overwritten.

#### Arguments

w : array_like(float, ndim=1)
    The value of the input function on different grid points
grid : array_like(float, ndim=1)
    The set of grid points
u : function
    The utility function
f : function
    The production function
shocks : numpy array
    An array of draws from the shock, for Monte Carlo integration (to
    compute expectations).
beta : scalar
    The discount factor
Tw : array_like(float, ndim=1) optional (default=None)
    Array to write output values to
compute_policy : Boolean, optional (default=False)
    Whether or not to compute policy function

function bellman_operator(w, grid, beta, u, f, shocks; compute_policy=false)

    # === Apply linear interpolation to w === #
    w_func = LinInterp(grid, w)
    Tw = similar(w)

    if compute_policy
        sigma = similar(w)

    # == set Tw[i] = max_c { u(c) + beta E w(f(y  - c) z)} == #
    for (i, y) in enumerate(grid)
        objective(c) = - u(c) - beta * mean(w_func.(f(y - c) .* shocks))
        res = optimize(objective, 1e-10, y)

        if compute_policy
            sigma[i] = res.minimizer
        Tw[i] = - res.minimum

    if compute_policy
        return Tw, sigma
        return Tw

This code is from the file optgrowth.jl

The arguments to bellman_operator are described in the docstring to the function

Notice that the expectation in (11) is computed via Monte Carlo, using the approximation

\[\int w(f(y - c) z) \phi(dz) \approx \frac{1}{n} \sum_{i=1}^n w(f(y - c) \xi_i)\]

where \(\{\xi_i\}_{i=1}^n\) are IID draws from \(\phi\)

Monte Carlo is not always the most efficient way to compute integrals numerically but it does have some theoretical advantages in the present setting

(For example, it preserves the contraction mapping property of the Bellman operator — see, e.g., [PalS13])

An Example

Let’s test out our operator when

  • \(f(k) = k^{\alpha}\)
  • \(u(c) = \ln c\)
  • \(\phi\) is the distribution of \(\exp(\mu + \sigma \zeta)\) when \(\zeta\) is standard normal

As is well-known (see [LS12], section 3.1.2), for this particular problem an exact analytical solution is available, with

(12)\[v^*(y) = \frac{\ln (1 - \alpha \beta) }{ 1 - \beta} + \frac{(\mu + \alpha \ln (\alpha \beta))}{1 - \alpha} \left[ \frac{1}{1- \beta} - \frac{1}{1 - \alpha \beta} \right] + \frac{1}{1 - \alpha \beta} \ln y\]

The optimal consumption policy is

\[\sigma^*(y) = (1 - \alpha \beta ) y\]

Let’s code this up now so we can test against it below


Log linear optimal growth model, with log utility, CD production and
multiplicative lognormal shock, so that

    y = f(k, z) = z k^alpha

with z ~ LN(mu, s).


alpha = 0.4
beta = 0.96
mu = 0
s = 0.1

ab = alpha * beta
c1 = log(1 - ab) / (1 - beta)
c2 = (mu + alpha * log(ab)) / (1 - alpha)
c3 = 1 / (1 - beta)
c4 = 1 / (1 - ab)

# Utility 
u(c) = log(c)

u_prime(c) = 1 / c

# Deterministic part of production function
f(k) = k^alpha

f_prime(k) = alpha * k^(alpha - 1)

# True optimal policy
c_star(y) = (1 - alpha * beta) * y

# True value function
v_star(y) = c1 + c2 * (c3 - c4) + c4 * log(y)

A First Test

To test our code, we want to see if we can replicate this solution numerically, using fitted value function iteration

We need a grid and some shock draws for Monte Carlo integration

grid_max = 4         # Largest grid point
grid_size = 200      # Number of grid points
shock_size = 250     # Number of shock draws in Monte Carlo integral

grid = linspace(1e-5, grid_max, grid_size)
shocks = exp(mu + s * randn(shock_size))

Now let’s do some tests

In the code shown below and all other code, we assume that you’ve already run the code above, as well as

using PyPlot

As one preliminary test, let’s see what happens when we apply our Bellman operator to the exact solution \(v^*\)

In theory, the resulting function should again be \(v^*\)

In practice we expect some small numerical error

w = bellman_operator(v_star(grid),
                     k -> k^alpha,

fig, ax = subplots(figsize=(9, 5))

ax[:set_ylim](-35, -24)
ax[:plot](grid, w, lw=2, alpha=0.6, label=L"$Tv^*$")
ax[:plot](grid, v_star(grid), lw=2, alpha=0.6, label=L"$v^*$")
ax[:legend](loc="lower right")


Here’s the output:


The two functions are essentially indistinguishable, so we are off to a good start

Now let’s have a look at iterating with the Bellman operator, starting off from an arbitrary initial condition

The initial condition we’ll start with is \(w(y) = 5 \ln (y)\)

w = 5 * log(grid)  # An initial condition -- fairly arbitrary
n = 35
fig, ax = subplots(figsize=(9, 6))

ax[:set_ylim](-50, 10)
ax[:set_xlim](minimum(grid), maximum(grid))
lb = "initial condition"
jet = ColorMap("jet")
ax[:plot](grid, w, color=jet(0), lw=2, alpha=0.6, label=lb)
for i in 1:n
    w = bellman_operator(w,
                         k -> k^alpha,

    ax[:plot](grid, w, color=jet(i / n), lw=2, alpha=0.6)

lb = "true value function"
ax[:plot](grid, v_star(grid), "k-", lw=2, alpha=0.8, label=lb)
ax[:legend](loc="lower right")


The figure shows

  1. the first 36 functions generated by the fitted value function iteration algorithm, with hotter colors given to higher iterates
  2. the true value function \(v^*\) drawn in black

The sequence of iterates converges towards \(v^*\)

We are clearly getting closer

Let’s try a more serious attempt to converge to \(v^*\), using QuantEcon’s compute_fixed_point function

import QuantEcon: compute_fixed_point

Tw = similar(grid)
initial_w = 5 * log(grid)

bellman_operator(w) = bellman_operator(w,
                         k -> k^alpha,

v_star_approx = compute_fixed_point(bellman_operator,

Here’s the output

Compute iterate 5 with error 0.946698700118977
Compute iterate 10 with error 0.6988224004801369
Compute iterate 15 with error 0.5691870511391119
Compute iterate 20 with error 0.46409447475507903
Compute iterate 25 with error 0.37840992130950113
Compute iterate 30 with error 0.30854511798280626
Compute iterate 35 with error 0.2515792651781048
Compute iterate 40 with error 0.20513086410855763
Compute iterate 45 with error 0.1672581060291911
Compute iterate 50 with error 0.13637769310849635
Compute iterate 55 with error 0.11119864752242492
Compute iterate 60 with error 0.09066834119981593
Compute iterate 65 with error 0.07392848995095136
Compute iterate 70 with error 0.06027927228089425
Compute iterate 75 with error 0.04915007284901307
Compute iterate 80 with error 0.04007562748615712
Compute iterate 85 with error 0.03267657249148925
Compute iterate 90 with error 0.026643585060639907
Compute iterate 95 with error 0.021724451824759683
Compute iterate 100 with error 0.017713524888225862
Compute iterate 105 with error 0.014443124572117938
Compute iterate 110 with error 0.011776529444226469
Compute iterate 115 with error 0.009602260581335287
Compute iterate 120 with error 0.00782942111325724
Compute iterate 125 with error 0.00638389621369484
Compute iterate 130 with error 0.005205254677008497
Compute iterate 135 with error 0.004244222547782783
Compute iterate 140 with error 0.003460623188022538
Compute iterate 145 with error 0.0028216976641211033
Compute iterate 150 with error 0.002300735236314466
Compute iterate 155 with error 0.0018759566960540042
Compute iterate 160 with error 0.0015296038719085914
Compute iterate 165 with error 0.0012471972353047533
Compute iterate 170 with error 0.001016930574156305
Compute iterate 175 with error 0.000829177425632821
Compute iterate 180 with error 0.000676088634357086
Compute iterate 185 with error 0.0005512642135769852
Compute iterate 190 with error 0.00044948578893411195
Compute iterate 195 with error 0.00036649844026470646
Compute iterate 200 with error 0.0002988328219863945
Compute iterate 205 with error 0.00024366012416265903
Compute iterate 210 with error 0.00019867381280747054
Compute iterate 215 with error 0.00016199320264220773
Compute iterate 220 with error 0.00013208483474258514
Compute iterate 225 with error 0.00010769836800150756
Compute iterate 230 with error 8.78143089515504e-5
Compute iterate 235 with error 7.160138991935128e-5
Compute iterate 240 with error 5.8381818430319754e-5
Compute iterate 245 with error 4.7602940842494945e-5
Compute iterate 250 with error 3.881413832473868e-5
Compute iterate 255 with error 3.1647988627980794e-5
Compute iterate 260 with error 2.580490586723272e-5
Compute iterate 265 with error 2.10406158309695e-5
Compute iterate 270 with error 1.7155943691449238e-5
Compute iterate 275 with error 1.3988488081650985e-5
Compute iterate 280 with error 1.1405831287447654e-5
Converged in 284 steps

Let’s have a look at the result

fig, ax = subplots(figsize=(9, 5))
ax[:set_ylim](-35, -24)
ax[:plot](grid, v_star_approx, lw=2, alpha=0.6, label="approximate value function")
ax[:plot](grid, v_star(grid), lw=2, alpha=0.6, label="true value function")
ax[:legend](loc="lower right")

The figure shows that we are pretty much on the money

The Policy Function

To compute an approximate optimal policy, we take the approximate value function we just calculated and then compute the corresponding greedy policy

The next figure compares the result to the exact solution, which, as mentioned above, is \(\sigma(y) = (1 - \alpha \beta) y\)

Tw, sigma = bellman_operator(v_star_approx,
                            k -> k^alpha,

cstar = (1 - alpha * beta) * grid

fig, ax = subplots(figsize=(9, 5))
ax[:plot](grid, sigma, lw=2, alpha=0.6, label="approximate policy function")
ax[:plot](grid, cstar, lw=2, alpha=0.6, label="true policy function")
ax[:legend](loc="lower right")

The figure shows that we’ve done a good job in this instance of approximating the true policy


Exercise 1

Once an optimal consumption policy \(\sigma\) is given, the dynamics for the capital stock follows (5)

The next figure shows a simulation of 100 elements of this sequence for three different discount factors (and hence three different policies)


In each sequence, the initial condition is \(y_0 = 0.1\)

The discount factors are discount_factors = (0.9, 0.94, 0.98)

Otherwise, the parameters and primitives are the same as the log linear model discussed earlier in the lecture

Replicate the figure modulo randomness


Solution to Exercise 1

Here’s our solution, which assumes you’ve executed the code in the lecture

Compute a time series given consumption policy sigma.
function simulate_og(sigma, y0 = 0.1, ts_length=100)
    y = zeros(ts_length)
    xi = randn(ts_length)
    y[1] = y0
    for t in 1:(ts_length-1)
        y[t+1] = (y[t] - sigma(y[t]))^alpha * exp(mu + s * xi[t+1])
    return y

fig, ax = subplots(figsize=(9, 6))

for beta in (0.9, 0.94, 0.98)

    Tw = similar(grid)
    initial_w = 5 * log(grid)

    v_star_approx = compute_fixed_point(bellman_operator,

    Tw, sigma = bellman_operator(v_star_approx,
                            k -> k^alpha,

    sigma_func = LinInterp(grid, sigma)
    y = simulate_og(sigma_func)
    ax[:plot](y, lw=2, alpha=0.6, label="beta = $beta" )

ax[:legend](loc="lower right")