Types, Methods and Performance


In this lecture we delve more deeply into the structure of Julia, and in particular into

  • the concept of types
  • building user defined types
  • methods and multiple dispatch

These concepts relate to the way that Julia stores and acts on data

While they might be thought of as advanced topics, some understanding is necessary to

  1. Read Julia code written by other programmers
  2. Write “well organized” Julia code that’s easy to maintain and debug
  3. Improve the speed at which your code runs

At the same time, don’t worry about following all the nuances on your first pass

If you return to these topics after doing some programming in Julia they will make more sense


In Julia all objects (all “values” in memory) have a type, which can be queried using the typeof() function


Note here that the type resides with the object itself, not with the name x

The name x is just a symbol bound to an object of type Int64

Here we rebind it to another object, and now typeof(x) gives the type of that new object


Common Types

We’ve already met many of the types defined in the core Julia language and its standard library

For numerical data, the most common types are integers and floats

For those working on a 64 bit machine, the default integers and floats are 64 bits, and are called Int64 and Float64 respectively (they would be Int32 and Float64 on a 32 bit machine)

There are many other important types, used for arrays, strings, iterators and so on

typeof(1 + 1im)
typeof(linspace(0, 1, 100))
typeof('c')  # Single character is a *Char*

Type is important in Julia because it determines what operations will be performed on the data in a given situation

Moreover, if you try to perform an action that is unexpected for a given type the function call will usually fail

100 + "100"
MethodError: no method matching +(::Int64, ::String)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:138
  +{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}, ::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}) at int.jl:32
  +(::Integer, ::Ptr{T}) at pointer.jl:108

Some languages will try to guess what the programmer wants here and return 200

Julia doesn’t — in this sense, Julia is a “strongly typed” language

Type is important and it’s up to the user to supply data in the correct form (as specified by type)

Methods and Multiple Dispatch

Looking more closely at how this works brings us to a very important topic concerning Julia’s data model — methods and multiple dispatch

Let’s look again at the error message

100 + "100"
MethodError: no method matching +(::Int64, ::String)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:138
  +{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}, ::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}) at int.jl:32
  +(::Integer, ::Ptr{T}) at pointer.jl:108

As discussed earlier, the operator + is just a function, and we can rewrite that call using functional notation to obtain exactly the same result

+(100, "100")
MethodError: no method matching +(::Int64, ::String)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:138
  +{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}, ::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}) at int.jl:32
  +(::Integer, ::Ptr{T}) at pointer.jl:108

Multiplication is similar

100 * "100"
MethodError: no method matching *(::Int64, ::String)
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:138
  *{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}, ::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}) at int.jl:33
  *(::Real, ::Complex{Bool}) at complex.jl:158
*(100, "100")
MethodError: no method matching *(::Int64, ::String)
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:138
  *{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}, ::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}) at int.jl:33
  *(::Real, ::Complex{Bool}) at complex.jl:158

What the message tells us is that *(a, b) doesn’t work when a is an integer and b is a string

In particular, the function * has no matching method

In essence, a method in Julia is a version of a function that acts on a particular tuple of data types

For example, if a and b are integers then a method for multiplying integers is invoked

*(100, 100)

On the other hand, if a and b are strings then a method for string concatenation is invoked

*("foo", "bar")

In fact we can see the precise methods being invoked by applying @which

@which *(100, 100)
*{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(x::<b>T</b>, y::<b>T</b>) at int.jl:33
@which *("foo", "bar")
*(s1::AbstractString, ss::AbstractString...) at strings/basic.jl:84

We can see the same process with other functions and their methods

isfinite(1.0)  # Call isfinite on a float
@which isfinite(1)
isfinite(x::Integer) at float.jl:360
@which isfinite(1.0)
isfinite(x::AbstractFloat) at float.jl:358

Here isfinite() is a function with multiple methods

It has a method for acting on floating points and another method for acting on integers

In fact it has quite a few methods

9 methods for generic function isfinite:
isfinite(x::BigFloat) at mpfr.jl:799
isfinite(x::Float16) at float16.jl:119
isfinite(x::AbstractFloat) at float.jl:358
isfinite(x::Integer) at float.jl:360
isfinite(::Irrational) at irrationals.jl:82
isfinite(x::Real) at float.jl:359
isfinite(z::Complex) at complex.jl:57
isfinite{T<:Number}(x::AbstractArray{T,N<:Any}) at operators.jl:555
isfinite{T<:Base.Dates.TimeType}(::Union{T,Type{T}}) at dates/types.jl:218

The particular method being invoked depends on the data type on which the function is called

We’ll discuss some of the more complicated data types you see towards the end of this list as we go along

Abstract Types

Looking at the list of methods above you can see references to types that we haven’t met before, such as Real and Number

These two types are examples of what are known in Julia as abstract types

Abstract types serve a different purpose to concrete types such as Int64 and Float64

To understand what that purpose is, imagine that you want to write a function with two methods, one to handle real numbers and the other for complex numbers

As we know, there are multiple types for real numbers, such as integers and floats

There are even multiple integer and float types, such as Int32, Int64, Float32, Float64, etc.

If we want to handle all of these types for real numbers in the same way, it’s useful to have a “parent” type called Real

Rather than writing a separate method for each concrete type, we can just write one for the abstract Real type

In this way, the purpose of abstract types is to serve as a unifying parent type that concrete types can “inherit” from

Indeed, we can see that Float64, Int64, etc. are subtypes of Real as follows

Float64 <: Real
Int64 <: Real

On the other hand, 64 bit complex numbers are not reals

Complex64 <: Real

They are, however, subtypes of Number

Complex64 <: Number

Number in turn is a subtype of Any, which is a parent of all types

Number <: Any

Type Hierarchy

In fact the types form a hierarchy, with Any at the top of the tree and the concrete types at the bottom

Note that we never see instances of abstract types

For example, if x is a value then typeof(x) will never return an abstract type

The point of abstract types is simply to categorize the concrete types (as well as other abstract types that sit below them in the hierarchy)

On the other hand, we cannot subtype concrete types

While we can build subtypes of abstract types we cannot do the same for concrete types

Multiple Dispatch

We can now be a little bit clearer about what happens when you call a function on given types

Suppose we execute the function call f(a, b) where a and b are of concrete types S and T respectively

The Julia interpreter first queries the types of a and b to obtain the tuple (S, T)

It then parses the list of methods belonging to f, searching for a match

If it finds a method matching (S, T) it calls that method

If not, it looks to see whether the pair (S, T) matches any method defined for immediate parent types

For example, if S is Float64 and T is Complex64 then the immediate parents are AbstractFloat and Number respectively


Hence the interpreter looks next for a method of the form f(x::AbstractFloat, y::Number)

If the interpreter can’t find a match in immediate parents (supertypes) it proceeds up the tree, looking at the parents of the last type it checked at each iteration

  • If it eventually finds a matching method it invokes that method
  • If not, we get an error

This is the process that leads to the error that we saw above:

*(100, "100")
MethodError: no method matching *(::Int64, ::String)
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:138
  *{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}, ::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}) at int.jl:33
  *(::Real, ::Complex{Bool}) at complex.jl:158

The procedure of matching data to appropriate methods is called dispatch

Because the procedure starts from the concrete types and works upwards, dispatch always invokes the most specific method that is available

For example, if you have methods for function f that handle

  1. (Float64, Int64) pairs
  2. (Number, Number) pairs

and you call f with f(0.5, 1) then the first method will be invoked

This makes sense because (hopefully) the first method is designed to work well with exactly this kind of data

The second method is probably more of a “catch all” method that handles other data in a less optimal way

Defining Types and Methods

Let’s look at defining our own methods and data types, including composite data types

User Defined Methods

It’s straightforward to add methods to existing functions or functions you’ve defined

In either case the process is the same:

  • use the standard syntax to define a function of the same name
  • but specify the data type for the method in the function signature

For example, we saw above that + is just a function with various methods

  • recall that a + b and +(a, b) are equivalent

We saw also that the following call fails because it lacks a matching method

+(100, "100")
MethodError: no method matching +(::Int64, ::String)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:138
  +{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}, ::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}) at int.jl:32
  +(::Integer, ::Ptr{T}) at pointer.jl:108

This is sensible behavior, but if you want to change it by defining a method to handle the case in question there’s nothing to stop you:

importall Base.Operators
+(x::Integer, y::String) = x + parse(Int, y)
+ (generic function with 164 methods)
+(100, "100")
100 + "100"

Here’s another example, involving a user defined function

We begin with a file called test.jl in the present working directory with the following content

function f(x)
    println("Generic function invoked")

function f(x::Number)
    println("Number method invoked")

function f(x::Integer)
    println("Integer method invoked")

Clearly these methods do nothing more than tell you which method is being invoked

Let’s now run this and see how it relates to our discussion of method dispatch above

Integer method invoked
Number method invoked
Generic function invoked

Since 3 is an Int64 and Int64 <: Integer <: Number, the call f(3) proceeds up the tree to Integer and invokes f(x::Integer)

On the other hand, 3.0 is a Float64, which is not a subtype of Integer

Hence the call f(3.0) continues up to f(x::Number)

Finally, f("foo") is handled by the generic function, since it is not a subtype of Number

User Defined Types

Most languages have facilities for creating new data types and Julia is no exception

type Foo end
foo = Foo()

Let’s make some observations about this code

First note that to create a new data type we use the keyword type followed by the name

  • By convention, type names use CamelCase (e.g., FloatingPoint, Array, AbstractArray)

When a new data type is created in this way, the interpreter simultaneously creates a default constructor for the data type

This constructor is a function for generating new instances of the data type in question

It has the same name as the data type but uses function call notion — in this case Foo()

In the code above, foo = Foo() is a call to the default constructor

A new instance of type Foo is created and the name foo is bound to that instance

Now if we want to we can create methods that act on instances of Foo

Just for fun, let’s define how to add one Foo to another

+(x::Foo, y::Foo) = "twofoos"
+ (generic function with 165 methods)
foo1, foo2 = Foo(), Foo()  # Create two Foos
+(foo1, foo2)

We can also create new functions to handle Foo data

foofunc(x::Foo) = "onefoo"
foofunc (generic function with 1 method)

This example isn’t of much use but more useful examples follow

Composite Data Types

Since the common primitive data types are already built in, most new user-defined data types are composite data types

Composite data types are data types that contain distinct fields of data as attributes

For example, let’s say we are doing a lot of work with AR(1) processes, which are random sequences \(\{X_t\}\) that follow a law of motion of the form

(1)\[X_{t+1} = a X_t + b + \sigma W_{t+1}\]

Here \(a\), \(b\) and \(\sigma\) are scalars and \(\{W_t\}\) is an iid sequence of shocks with some given distribution \(\phi\)

At times it might be convenient to take these primitives \(a\), \(b\), \(\sigma\) and \(\phi\) and organize them into a single entity like so

type AR1

For the distribution phi we’ll assign a Distribution from the Distributions package

After reading in the AR1 definition above we can do the following

using Distributions
m = AR1(0.9, 1, 1, Beta(5, 5))
AR1(0.9,1,1,Distributions.Beta{Float64}(α=5.0, β=5.0))

In this call to the constructor we’ve created an instance of AR1 and bound the name m to it

We can access the fields of m using their names and “dotted attribute” notation

Distributions.Beta{Float64}(α=5.0, β=5.0)

For example, the attribute m.phi points to an instance of Beta, which is in turn a subtype of Distribution as defined in the Distributions package

typeof(m.phi) <: Distribution

We can reach in to m and change this if we want to

m.phi = Exponential(0.5)

In our type definition we can be explicit that we want phi to be a Distribution, and the other elements to be real scalars

type AR1_explicit

(Before reading this in you might need to restart your REPL session in order to clear the old definition of AR1 from memory)

Now the constructor will complain if we try to use the wrong data type

m = AR1_explicit(0.9, 1, "foo", Beta(5, 5))
MethodError: Cannot `convert` an object of type String to an object of type Real
This may have arisen from a call to the constructor Real(...),
since type constructors fall back to convert methods.
in AR1_explicit(::Float64, ::Int64, ::String, ::Distributions.Beta{Float64}) at ./In[60]:2

This is useful if we’re going to have functions that act on instances of AR1

  • e.g., simulate time series, compute variances, generate histograms, etc.

If those functions only work with AR1 instances built from the specified data types then it’s probably best if we get an error as soon we try to make an instance that doesn’t fit the pattern

Better to fail early rather than deeper into our code where errors are harder to debug

Type Parameters

Consider the following output

typeof([10, 20, 30])

Here Array is one of Julia’s predefined types (Array <: DenseArray <: AbstractArray <: Any)

The Int64,1 in curly brackets are type parameters

In this case they are the element type and the dimension

Many other types have type parameters too

typeof(1.0 + 1.0im)
typeof(1 + 1im)

Types with parameters are therefore in fact an indexed family of types, one for each possible value of the parameter

Defining Parametric Types

We can use parametric types in our own type definitions

Let’s say we’re defining a type called FooBar with attributes foo and bar

type FooBar

Suppose we now decide that we want foo and bar to have the same type, although we don’t much care what that type is

We can achieve this with the syntax

type FooBar_explicit{T}

Now our constructor is happy provided that the arguments do in fact have the same type

fb = FooBar_explicit(1.0, 2.0)
fb = FooBar_explicit(1, 2)
fb = FooBar_explicit(1, 2.0)
MethodError: no method matching FooBar_explicit{T}(::Int64, ::Float64)
Closest candidates are:
  FooBar_explicit{T}{T}(::T, ::T) at In[66]:2
  FooBar_explicit{T}{T}(::Any) at sysimg.jl:53

Now let’s say we want the data to be of the same type and that type must be a subtype of Number

We can achieve this as follows

type FooBar_Number{T <: Number}

Let’s try it

fb = FooBar_Number(1, 2)
fb = FooBar_Number("fee", "fi")
MethodError: no method matching FooBar_Number{T<:Number}(::String, ::String)
Closest candidates are:
  FooBar_Number{T<:Number}{T}(::Any) at sysimg.jl:53

In the second instance we get an error because ASCIIString is not a subtype of Number

Writing Fast Code

Let’s briefly discuss how to write Julia code that executes quickly (for a given hardware configuration)

For now our focus is on generating more efficient machine code from essentially the same program (i.e., without parallelization or other more significant changes to the way the program runs)

Basic Concepts

The benchmark for performance is well written compiled code, expressed in languages such as C and Fortran

This is because computer programs are essentially operations on data, and the details of the operations implemented by the CPU depend on the nature of the data

When code is written in a language like C and compiled, the compiler has access to sufficient information to build machine code that will organize the data optimally in memory and implement efficient operations as required for the task in hand

To approach this benchmark, Julia needs to know about the type of data it’s processing as early as possible

An Example

Consider the following function, which essentially does the same job as Julia’s sum() function but acts only on floating point data

function sum_float_array(x::Array{Float64, 1})
    sum = 0.0
    for i in 1:length(x)
        sum += x[i]
    return sum

Calls to this function run very quickly

x = linspace(0, 1, 1e6)
1000000-element LinSpace{Float64}:
x = collect(x);  # Convert to array of Float64s
1000000-element Array{Float64,1}:
@time sum_float_array(x)
0.005524 seconds (1.74 k allocations: 82.486 KB)

One reason is that data types are fully specified

When Julia compiles this function via its just-in-time compiler, it knows that the data passed in as x will be an array of 64 bit floats

Hence it’s known to the compiler that the relevant method for + is always addition of floating point numbers

Moreover, the data can be arranged into continuous 64 bit blocks of memory to simplify memory access

Finally, data types are stable — for example, the local variable sum starts off as a float and remains a float throughout

Type Inferences

What happens if we don’t supply type information?

Here’s the same function minus the type annotation in the function signature

function sum_array(x)
    sum = 0.0
    for i in 1:length(x)
        sum += x[i]
    return sum

When we run it with the same array of floating point numbers it executes at a similar speed as the function with type information

@time sum_array(x)
0.005556 seconds (1.61 k allocations: 75.002 KB)

The reason is that when sum_array() is first called on a vector of a given data type, a newly compiled version of the function is produced to handle that type

In this case, since we’re calling the function on a vector of floats, we get a compiled version of the function with essentially the same internal representation as sum_float_array()

Things get tougher for the interpreter when the data type within the array is imprecise

For example, the following snippet creates an array where the element type is Any

x = Any[1/i for i in 1:1e6];

Now summation is much slower and memory management is less efficient

@time sum_array(x)
0.021680 seconds (1.00 M allocations: 15.332 MB)

Summary and Tips

To write efficient code use functions to segregate operations into logically distinct blocks

Data types will be determined at function boundaries

If types are not supplied then they will be inferred

If types are stable and can be inferred effectively your functions will run fast

Further Reading

There are many other aspects to writing fast Julia code

A good next stop for further reading is the relevant part of the Julia documentation


Exercise 1

Write a function with the signature simulate(m::AR1, n::Integer, x0::Real) that takes as arguments

  • an instance m of AR1
  • an integer n
  • a real number x0

and returns an array containing a time series of length n generated according to (1) where

  • the primitives of the AR(1) process are as specified in m
  • the initial condition \(X_0\) is set equal to x0

Here AR1 is as defined above:

type AR1

Hint: If d is an instance of Distribution then rand(d) generates one random draw from the distribution specified in d

Exercise 2

The term universal function is sometimes applied to functions which

  • when called on a scalar return a scalar
  • when called on an array of scalars return an array of the same length by acting elementwise on the scalars in the array

For example, sin() has this property in Julia

julia> sin(pi)

julia> sin([pi, 2pi])
2-element Array{Float64,1}:

Write a universal function f such that

  • f(k) returns a chi-squared random variable with k degrees of freedom when k is an integer
  • f(k_vec) returns a vector where f(k_vec)[i] is chi-squared with k_vec[i] degrees of freedom

Hint: If we take k independent standard normals, square them all and sum we get a chi-squared with k degrees of freedom


Exercise 1

Let’s start with the AR1 definition as specified in the lecture

using Distributions

type AR1_ex1

Now let’s write the function to simulate AR1s

function simulate(m::AR1_ex1, n::Integer, x0::Real)
    X = Array{Float64}(n)
    X[1] = x0
    for t in 1:(n-1)
        X[t+1] = m.a * X[t] + m.b + rand(m.phi)
    return X

Let’s test it out on the AR(1) process discussed in the lecture

m = AR1_ex1(0.9, 1, 1, Beta(5, 5))
X = simulate(m, 100, 0.0)
100-element Array{Float64,1}:

Next let’s plot the time series to see what it looks like

using Plots
plot(X, legend=:none)

Exercise 2

Here’s the function to act on integers

function f_ex2(k::Integer)
    @assert k > 0 "k must be a natural number"
    z = randn(k)
    return sum(z.^2)
f_ex2 (generic function with 1 method)

Calls with non-integer arguments will raise a “no matching method” error

MethodError: no method matching f_ex2(::Float64)
Closest candidates are:
  f_ex2(::Integer) at In[89]:2

Calls with integers less than 1 will trigger an assertion failure inside the function body

AssertionError: k must be a natural number

 in f_ex2(::Int64) at ./In[89]:2

Now let’s add the version that works on vectors. The notation we’ll use is slightly different to what we saw in the lectures. We are using parametric types to indicate that f will act on vectors of the form Array{T, 1} where T is a subtype Integer

function f_ex2{T <: Integer}(k_vec::Array{T, 1})
    for k in k_vec
        @assert k > 0 "Each integer must be a natural number"
    n = length(k_vec)
    draws = Array{Float64}(n)
    for i in 1:n
        z = randn(k_vec[i])
        draws[i] = sum(z.^2)
    return draws
f_ex2 (generic function with 2 methods)
f_ex2([2, 4, 6])
3-element Array{Float64,1}:

The first version of f continues to work as before when we call it on scalars

f_ex2(k::<b>Integer</b>) at In[89]:2
f_ex2<i>{T<:Integer}</i>(k_vec::<b>Array{T,1}</b>) at In[93]:2