How to read this lecture...

Code should execute sequentially if run in a Jupyter notebook

Other Scientific Libraries

In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]:
!pip install --upgrade quantecon

Overview

In this lecture, we review some other scientific libraries that are useful for economic research and analysis.

We have, however, already picked most of the low hanging fruit in terms of economic research.

Hence you should feel free to skip this lecture on first pass.

Cython

Like Numba, Cython provides an approach to generating fast compiled code that can be used from Python.

As was the case with Numba, a key problem is the fact that Python is dynamically typed.

As you’ll recall, Numba solves this problem (where possible) by inferring type.

Cython’s approach is different — programmers add type definitions directly to their “Python” code.

As such, the Cython language can be thought of as Python with type definitions.

In addition to a language specification, Cython is also a language translator, transforming Cython code into optimized C and C++ code.

Cython also takes care of building language extensions — the wrapper code that interfaces between the resulting compiled code and Python.

Important Note:

In what follows code is executed in a Jupyter notebook.

This is to take advantage of a Cython cell magic that makes Cython particularly easy to use.

Some modifications are required to run the code outside a notebook.

A First Example

Let’s start with a rather artificial example.

Suppose that we want to compute the sum $ \sum_{i=0}^n \alpha^i $ for given $ \alpha, n $.

Suppose further that we’ve forgotten the basic formula

$$ \sum_{i=0}^n \alpha^i = \frac{1 - \alpha^{n+1}}{1 - \alpha} $$

for a geometric progression and hence have resolved to rely on a loop.

Python vs C

Here’s a pure Python function that does the job

In [2]:
def geo_prog(alpha, n):
    current = 1.0
    sum = current
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum

This works fine but for large $ n $ it is slow.

Here’s a C function that will do the same thing

double geo_prog(double alpha, int n) {
    double current = 1.0;
    double sum = current;
    int i;
    for (i = 1; i <= n; i++) {
        current = current * alpha;
        sum = sum + current;
    }
    return sum;
}

If you’re not familiar with C, the main thing you should take notice of is the type definitions

  • int means integer
  • double means double precision floating-point number
  • the double in double geo_prog(... indicates that the function will return a double

Not surprisingly, the C code is faster than the Python code.

A Cython Implementation

Cython implementations look like a convex combination of Python and C.

We’re going to run our Cython code in the Jupyter notebook, so we’ll start by loading the Cython extension in a notebook cell

In [3]:
%load_ext Cython

In the next cell, we execute the following

In [4]:
%%cython
def geo_prog_cython(double alpha, int n):
    cdef double current = 1.0
    cdef double sum = current
    cdef int i
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum

Here cdef is a Cython keyword indicating a variable declaration and is followed by a type.

The %%cython line at the top is not actually Cython code — it’s a Jupyter cell magic indicating the start of Cython code.

After executing the cell, you can now call the function geo_prog_cython from within Python.

What you are in fact calling is compiled C code with a Python call interface

In [5]:
import quantecon as qe
qe.util.tic()
geo_prog(0.99, int(10**6))
qe.util.toc()
TOC: Elapsed: 0:00:0.12
Out[5]:
0.12008047103881836
In [6]:
qe.util.tic()
geo_prog_cython(0.99, int(10**6))
qe.util.toc()
TOC: Elapsed: 0:00:0.05
Out[6]:
0.051430463790893555

Example 2: Cython with NumPy Arrays

Let’s go back to the first problem that we worked with: generating the iterates of the quadratic map

$$ x_{t+1} = 4 x_t (1 - x_t) $$

The problem of computing iterates and returning a time series requires us to work with arrays.

The natural array type to work with is NumPy arrays.

Here’s a Cython implementation that initializes, populates and returns a NumPy array

In [7]:
%%cython
import numpy as np

def qm_cython_first_pass(double x0, int n):
    cdef int t
    x = np.zeros(n+1, float)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)

If you run this code and time it, you will see that its performance is disappointing — nothing like the speed gain we got from Numba

In [8]:
qe.util.tic()
qm_cython_first_pass(0.1, int(10**5))
qe.util.toc()
TOC: Elapsed: 0:00:0.04
Out[8]:
0.04188847541809082

This example was also computed in the Numba lecture, and you can see Numba is around 90 times faster.

The reason is that working with NumPy arrays incurs substantial Python overheads.

We can do better by using Cython’s typed memoryviews, which provide more direct access to arrays in memory.

When using them, the first step is to create a NumPy array.

Next, we declare a memoryview and bind it to the NumPy array.

Here’s an example:

In [9]:
%%cython
import numpy as np
from numpy cimport float_t

def qm_cython(double x0, int n):
    cdef int t
    x_np_array = np.zeros(n+1, dtype=float)
    cdef float_t [:] x = x_np_array
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)

Here

  • cimport pulls in some compile-time information from NumPy
  • cdef float_t [:] x = x_np_array creates a memoryview on the NumPy array x_np_array
  • the return statement uses np.asarray(x) to convert the memoryview back to a NumPy array

Let’s time it:

In [10]:
qe.util.tic()
qm_cython(0.1, int(10**5))
qe.util.toc()
TOC: Elapsed: 0:00:0.00
Out[10]:
0.0008900165557861328

This is fast, although still slightly slower than qm_numba.

Summary

Cython requires more expertise than Numba, and is a little more fiddly in terms of getting good performance.

In fact, it’s surprising how difficult it is to beat the speed improvements provided by Numba.

Nonetheless,

  • Cython is a very mature, stable and widely used tool.
  • Cython can be more useful than Numba when working with larger, more sophisticated applications.

Joblib

Joblib is a popular Python library for caching and parallelization.

To install it, start Jupyter and type

In [11]:
!pip install joblib
Requirement already satisfied: joblib in /home/qebuild/anaconda3/lib/python3.7/site-packages (0.13.2)

from within a notebook.

Here we review just the basics.

Caching

Perhaps, like us, you sometimes run a long computation that simulates a model at a given set of parameters — to generate a figure, say, or a table.

20 minutes later you realize that you want to tweak the figure and now you have to do it all again.

What caching will do is automatically store results at each parameterization.

With Joblib, results are compressed and stored on file, and automatically served back up to you when you repeat the calculation.

An Example

Let’s look at a toy example, related to the quadratic map model discussed above.

Let’s say we want to generate a long trajectory from a certain initial condition $ x_0 $ and see what fraction of the sample is below 0.1.

(We’ll omit JIT compilation or other speedups for simplicity)

Here’s our code

In [12]:
from joblib import Memory
location = './cachedir'
memory = Memory(location='./joblib_cache')

@memory.cache
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return np.mean(x < 0.1)

We are using joblib to cache the result of calling qm at a given set of parameters.

With the argument location=’./joblib_cache’, any call to this function results in both the input values and output values being stored a subdirectory joblib_cache of the present working directory.

(In UNIX shells, . refers to the present working directory)

The first time we call the function with a given set of parameters we see some extra output that notes information being cached

In [13]:
qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()
________________________________________________________________________________
[Memory] Calling __main__--home-qebuild-repos-quantecon.build.lectures-_source-lecture-source-py-_build-website-jupyter-executed-__ipython-input__.qm...
qm(0.2, 10000000)
______________________________________________________________qm - 10.4s, 0.2min
TOC: Elapsed: 0:00:10.40
Out[13]:
10.408669471740723

The next time we call the function with the same set of parameters, the result is returned almost instantaneously

In [14]:
qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()
TOC: Elapsed: 0:00:0.00
Out[14]:
0.0008406639099121094

Other Options

There are in fact many other approaches to speeding up your Python code.

One is interfacing with Fortran.

If you are comfortable writing Fortran you will find it very easy to create extension modules from Fortran code using F2Py.

F2Py is a Fortran-to-Python interface generator that is particularly simple to use.

Robert Johansson provides a very nice introduction to F2Py, among other things.

Recently, a Jupyter cell magic for Fortran has been developed — you might want to give it a try.

Exercises

Exercise 1

Later we’ll learn all about finite-state Markov chains.

For now, let’s just concentrate on simulating a very simple example of such a chain.

Suppose that the volatility of returns on an asset can be in one of two regimes — high or low.

The transition probabilities across states are as follows

For example, let the period length be one month, and suppose the current state is high.

We see from the graph that the state next month will be

  • high with probability 0.8
  • low with probability 0.2

Your task is to simulate a sequence of monthly volatility states according to this rule.

Set the length of the sequence to n = 100000 and start in the high state.

Implement a pure Python version, a Numba version and a Cython version, and compare speeds.

To test your code, evaluate the fraction of time that the chain spends in the low state.

If your code is correct, it should be about 2/3.

Solutions

Exercise 1

We let

  • 0 represent “low”
  • 1 represent “high”
In [15]:
p, q = 0.1, 0.2  # Prob of leaving low and high state respectively

Here’s a pure Python version of the function

In [16]:
def compute_series(n):
    x = np.empty(n, dtype=np.int_)
    x[0] = 1  # Start in state 1
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return x

Let’s run this code and check that the fraction of time spent in the low state is about 0.666

In [17]:
n = 100000
x = compute_series(n)
print(np.mean(x == 0))  # Fraction of time x is in state 0
0.67497

Now let’s time it

In [18]:
qe.util.tic()
compute_series(n)
qe.util.toc()
TOC: Elapsed: 0:00:0.09
Out[18]:
0.09005856513977051

Next let’s implement a Numba version, which is easy

In [19]:
from numba import jit

compute_series_numba = jit(compute_series)

Let’s check we still get the right numbers

In [20]:
x = compute_series_numba(n)
print(np.mean(x == 0))
0.66727

Let’s see the time

In [21]:
qe.util.tic()
compute_series_numba(n)
qe.util.toc()
TOC: Elapsed: 0:00:0.00
Out[21]:
0.001298666000366211

This is a nice speed improvement for one line of code.

Now let’s implement a Cython version

In [22]:
%load_ext Cython
The Cython extension is already loaded. To reload it, use:
  %reload_ext Cython
In [23]:
%%cython
import numpy as np
from numpy cimport int_t, float_t

def compute_series_cy(int n):
    # == Create NumPy arrays first == #
    x_np = np.empty(n, dtype=int)
    U_np = np.random.uniform(0, 1, size=n)
    # == Now create memoryviews of the arrays == #
    cdef int_t [:] x = x_np
    cdef float_t [:] U = U_np
    # == Other variable declarations == #
    cdef float p = 0.1
    cdef float q = 0.2
    cdef int t
    # == Main loop == #
    x[0] = 1
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return np.asarray(x)
In [24]:
compute_series_cy(10)
Out[24]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
In [25]:
x = compute_series_cy(n)
print(np.mean(x == 0))
0.67253
In [26]:
qe.util.tic()
compute_series_cy(n)
qe.util.toc()
TOC: Elapsed: 0:00:0.00
Out[26]:
0.003190279006958008

The Cython implementation is fast but not as fast as Numba.