Useful Libraries

Overview

While Julia lacks the massive scientific ecosystem of Python, it has successfully attracted a small army of enthusiastic and talented developers

As a result, its package system is moving towards a critical mass of useful, well written libraries

In addition, a major advantage of Julia libraries is that, because Julia itself is sufficiently fast, there is less need to mix in low level languages like C and Fortran

As a result, most Julia libraries are written exclusively in Julia

Not only does this make the libraries more portable, it makes them much easier to dive into, read, learn from and modify

In this lecture we introduce a few of the Julia libraries that we’ve found particularly useful for quantitative work in economics

Credits: Thanks to @cc7768, @vgregory757 and @spencerlyon2 for keeping us up to date with current best practice

Distributions

Functions for manipulating probability distributions and generating random variables are supplied by the excellent Distributions.jl package

We’ll restrict ourselves to a few simple examples (the package itself has detailed documentation)

  • d = Normal(m, s) creates a normal distribution with mean \(m\) and standard deviation \(s\)

    • defaults are m = 0 and s = 1
  • d = Uniform(a, b) creates a uniform distribution on interval \([a, b]\)

    • defaults are a = 0 and b = 1
  • d = Binomial(n, p) creates a binomial over \(n\) trials with success probability \(p\)

  • defaults are n = 1 and p = 0.5

Distributions.jl defines various methods for acting on these instances in order to obtain

  • random draws
  • evaluations of pdfs (densities), cdfs (distribution functions), quantiles, etc.
  • mean, variance, kurtosis, etc.

For example,

  • To generate k draws from the instance d use rand(d, k)
  • To obtain the mean of the distribution use mean(d)
  • To evaluate the probability density function of d at x use pdf(d, x)

Further details on the interface can be found here

Several multivariate distributions are also implemented

Working with Data

A useful package for working with data is DataFrames

The most important data type provided is a DataFrame, a two dimensional array for storing heterogeneous data

Although data can be heterogeneous within a DataFrame, the contents of the columns must be homogeneous

This is analogous to a data.frame in R, a DataFrame in Pandas (Python) or, more loosely, a spreadsheet in Excel

The DataFrames package also supplies a DataArray type, which is like a one dimensional DataFrame

In terms of working with data, the advantage of a DataArray over a standard numerical array is that it can handle missing values

Here’s an example

julia> using DataFrames

julia> commodities = ["crude", "gas", "gold", "silver"]
4-element Array{ASCIIString,1}:
 "crude"
 "gas"
 "gold"
 "silver"

julia> last = @data([4.2, 11.3, 12.1, NA])  # Create DataArray
4-element DataArray{Float64,1}:
  4.2
 11.3
 12.1
   NA

julia> df = DataFrame(commod = commodities, price = last)
4x2 DataFrame
|-------|----------|-------|
| Row # | commod   | price |
| 1     | "crude"  | 4.2   |
| 2     | "gas"    | 11.3  |
| 3     | "gold"   | 12.1  |
| 4     | "silver" | NA    |

Columns of the DataFrame can be accessed by name

julia> df[:price]
4-element DataArray{Float64,1}:
  4.2
 11.3
 12.1
   NA

julia> df[:commod]
4-element DataArray{ASCIIString,1}:
 "crude"
 "gas"
 "gold"
 "silver"

The DataFrames package provides a number of methods for acting on DataFrames

A simple one is describe()

julia> describe(df)
commod
Length  4
Type    ASCIIString
NAs     0
NA%     0.0%
Unique  4

price
Min      4.2
1st Qu.  7.75
Median   11.3
Mean     9.200000000000001
3rd Qu.  11.7
Max      12.1
NAs      1
NA%      25.0%

There are also functions for splitting, merging and other data munging operations

Data can be read from and written to CSV files using syntax df = readtable("data_file.csv") and writetable("data_file.csv", df) respectively

Other packages for working with data can be found at JuliaStats and JuliaQuant

Interpolation

In economics we often wish to interpolate discrete data (i.e., build continuous functions that join discrete sequences of points)

We also need such representations to be fast and efficient

The package we usually turn to for this purpose is Interpolations.jl

One downside of Interpolations.jl is that the code to set up simple interpolation objects is relatively verbose

The upside is that the routines have excellent performance

The package is also well written and well maintained

Another alternative, if using univariate linear interpolation, is QuantEcon’s interpolation function

The syntax for this function is much simpler

Univariate Interpolation

Let’s start with the univariate case

We begin by creating some data points, using a sine function

using Interpolations
using Plots
plotlyjs()

x = -7:7    # x points, coase grid
y = sin.(x)  # corresponding y points

xf = -7:0.1:7  # fine grid
plot(xf, sin.(xf), label="sine function")
scatter!(x, y, label="sampled data", markersize=4)

Here’s the resulting figure

../_images/jl_interp1.png

Now let’s interpolate the sampled data points using piecewise constant, piecewise linear and cubic interpolation

itp_const = scale(interpolate(y, BSpline(Constant()), OnGrid()), x)
itp_linear = scale(interpolate(y, BSpline(Linear()), OnGrid()), x)
itp_cubic = scale(interpolate(y, BSpline(Cubic(Line())), OnGrid()), x)

When we want to evaluate them at points in their domain (i.e., between min(x) and max(x)) we can do so as follows

julia> itp_cubic[0.3]
0.29400097760820687

Note the use of square brackets, rather than parentheses!

Let’s plot these functions

xf = -7:0.1:7
y_const = [itp_const[x] for x in xf]
y_linear = [itp_linear[x] for x in xf]
y_cubic = [itp_cubic[x] for x in xf]

plot(xf, [y_const y_linear y_cubic], label=["constant" "linear" "cubic"])
scatter!(x, y, label="sampled data", markersize=4)

Here’s the figure we obtain

../_images/jl_interp2.png

Here is code for how to do the same thing with QuantEcon’s 1D linear interpolation function

using QuantEcon

li = LinInterp(x, y)        # create LinInterp object
li(0.3)                     # evaluate at a single point
y_linear_qe = li.(xf)       # evaluate at multiple points

plot(xf, y_linear_qe, label="linear")
scatter!(x, y, label="sampled data", markersize=4)

Univariate with Irregular Grid

Here’s an example with an irregular grid

using Interpolations
using Plots
plotlyjs()

x = log.(linspace(1, exp(4), 10)) + 1  # Uneven grid
y = log.(x)                            # Corresponding y points

itp_const = interpolate((x, ), y, Gridded(Constant()))
itp_linear = interpolate((x, ), y, Gridded(Linear()))

xf = log.(linspace(1, exp(4), 100)) + 1
y_const = [itp_const[x] for x in xf]
y_linear = [itp_linear[x] for x in xf]
y_true = [log(x) for x in xf]

labels = ["piecewise constant" "linear" "true function"]
plot(xf, [y_const y_linear y_true], label=labels)
scatter!(x, y, label="sampled data", markersize=4, size=(800, 400))

The figure looks as follows

../_images/jl_interp3.png

QuantEcon’s function works in exactly the same way with irregular grids

Multivariate Interpolation

We can also interpolate in higher dimensions

The following example gives one illustration

using Interpolations
using Plots
plotlyjs()
using QuantEcon: gridmake

n = 5
x = linspace(-3, 3, n)
y = copy(x)

z = Array(Float64, n, n)
f(x, y) = cos(x^2 + y^2) / (1 + x^2 + y^2)
for i in 1:n
    for j in 1:n
        z[j, i] = f(x[i], y[j])
    end
end

itp = interpolate((x, y), z, Gridded(Linear()));

nf = 50
xf = linspace(-3, 3, nf)
yf = copy(xf)

zf = Array(Float64, nf, nf)
ztrue = Array(Float64, nf, nf)
for i in 1:nf
    for j in 1:nf
        zf[j, i] = itp[xf[i], yf[j]]
        ztrue[j, i] = f(xf[i], yf[j])
    end
end

grid = gridmake(x, y)
z = reshape(z, n*n, 1)

pyplot()
surface(xf, yf, zf, color=:greens, falpha=0.7, cbar=false)
surface!(xf, yf, ztrue, fcolor=:blues, falpha=0.25, cbar=false)
scatter!(grid[:, 1], grid[:, 2], vec(z), legend=:none, color=:black, markersize=4)

This code produces the following figure

../_images/jl_interp4.png

The original function is in blue, while the linear interpolant is shown in green

Optimization, Roots and Fixed Points

Let’s look briefly at the optimization and root finding algorithms

Roots

A root of a real function \(f\) on \([a,b]\) is an \(x \in [a, b]\) such that \(f(x)=0\)

For example, if we plot the function

(1)\[f(x) = \sin(4 (x - 1/4)) + x + x^{20} - 1\]

with \(x \in [0,1]\) we get

../_images/root.png

The unique root is approximately 0.408

One common root-finding algorithm is the Newton-Raphson method

This is implemented as newton() in the Roots package and is called with the function and an initial guess

julia> using Roots

julia> f(x) = sin(4 * (x - 1/4)) + x + x^20 - 1
f (generic function with 1 method)

julia> newton(f, 0.2)
0.40829350427936706

The Newton-Raphson method uses local slope information, which can lead to failure of convergence for some initial conditions

julia> newton(f, 0.7)
-1.0022469256696989

For this reason most modern solvers use more robust “hybrid methods”, as does Roots’s fzero() function

julia> fzero(f, 0, 1)
0.40829350427936706

Optimization

For constrained, univariate minimization a useful option is optimize() from the Optim package

This function defaults to a robust hybrid optimization routine called Brent’s method

julia> using Optim

julia> optimize(x -> x^2, -1.0, 1.0)
Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [-1.000000, 1.000000]
 * Minimizer: -0.000000
 * Minimum: 0.000000
 * Iterations: 5
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 6

For other optimization routines, including least squares and multivariate optimization, see the documentation

A number of alternative packages for optimization can be found at JuliaOpt

Others Topics

Numerical Integration

The base library contains a function called quadgk() that performs Gaussian quadrature

julia> quadgk(x -> cos(x), -2pi, 2pi)
(5.644749237155177e-15,4.696156369056425e-22)

This is an adaptive Gauss-Kronrod integration technique that’s relatively accurate for smooth functions

However, its adaptive implementation makes it slow and not well suited to inner loops

For this kind of integration you can use the quadrature routines from QuantEcon

julia> using QuantEcon

julia> nodes, weights = qnwlege(65, -2pi, 2pi);

julia> integral = do_quad(x -> cos(x), nodes, weights)
-2.912600716165059e-15

Let’s time the two implementations

julia> @time quadgk(x -> cos(x), -2pi, 2pi)
elapsed time: 2.732162971 seconds (984420160 bytes allocated, 40.55% gc time)

julia> @time do_quad(x -> cos(x), nodes, weights)
elapsed time: 0.002805691 seconds (1424 bytes allocated)

We get similar accuracy with a speed up factor approaching three orders of magnitude

More numerical integration (and differentiation) routines can be found in the package Calculus

Linear Algebra

The standard library contains many useful routines for linear algebra, in addition to standard functions such as det(), inv(), eye(), etc.

Routines are available for

  • Cholesky factorization
  • LU decomposition
  • Singular value decomposition,
  • Schur factorization, etc.

See here for further details

Further Reading

The full set of libraries available under the Julia packaging system can be browsed at pkg.julialang.org