Types, Methods and Performance

Overview

In this lecture we delve more deeply into the structure of Julia, and in particular into

  • the concept of types
  • building user defined types
  • methods and multiple dispatch

These concepts relate to the way that Julia stores and acts on data

While they might be thought of as advanced topics, some understanding is necessary to

  1. Read Julia code written by other programmers
  2. Write “well organized” Julia code that’s easy to maintain and debug
  3. Improve the speed at which your code runs

At the same time, don’t worry about following all the nuances on your first pass

If you return to these topics after doing some programming in Julia they will make more sense

Types

In Julia all objects (all “values” in memory) have a type, which can be queried using the typeof() function

julia> x = 42
42

julia> typeof(x)
Int64

Note here that the type resides with the object itself, not with the name x

The name x is just a symbol bound to an object of type Int64

Here we rebind it to another object, and now typeof(x) gives the type of that new object

julia> x = 42.0
42.0

julia> typeof(x)
Float64

Common Types

We’ve already met many of the types defined in the core Julia language and its standard library

For numerical data, the most common types are integers and floats

For those working on a 64 bit machine, the default integers and floats are 64 bits, and are called Int64 and Float64 respectively (they would be Int32 and Float64 on a 32 bit machine)

There are many other important types, used for arrays, strings, iterators and so on

julia> typeof(1 + 1im)
Complex{Int64} (constructor with 1 method)

julia> typeof(linspace(0, 1, 100))
LinSpace{Float64}

julia> typeof(eye(2))
Array{Float64,2}

julia> typeof("foo")
ASCIIString

julia> typeof(1:10)
UnitRange{Int64}

julia> typeof('c')  # Single character is a *Char*
Char

Type is important in Julia because it determines what operations will be performed on the data in a given situation

Moreover, if you try to perform an action that is unexpected for a given type the function call will usually fail

julia> 100 + "100"
ERROR: `+` has no method matching +(::Int64, ::ASCIIString)

Some languages will try to guess what the programmer wants here and return 200

Julia doesn’t — in this sense, Julia is a “strongly typed” language

Type is important and it’s up to the user to supply data in the correct form (as specified by type)

Methods and Multiple Dispatch

Looking more closely at how this works brings us to a very important topic concerning Julia’s data model — methods and multiple dispatch

Let’s look again at the error message

julia> 100 + "100"
ERROR: `+` has no method matching +(::Int64, ::ASCIIString)

As discussed earlier, the operator + is just a function, and we can rewrite that call using functional notation to obtain exactly the same result

julia> +(100, "100")
ERROR: `+` has no method matching +(::Int64, ::ASCIIString)

Multiplication is similar

julia> 100 * "100"
ERROR: `*` has no method matching *(::Int64, ::ASCIIString)

julia> *(100, "100")
ERROR: `*` has no method matching *(::Int64, ::ASCIIString)

What the message tells us is that *(a, b) doesn’t work when a is an integer and b is a string

In particular, the function * has no matching method

In essence, a method in Julia is a version of a function that acts on a particular tuple of data types

For example, if a and b are integers then a method for multiplying integers is invoked

julia> *(100, 100)
10000

On the other hand, if a and b are strings then a method for string concatenation is invoked

julia> *("foo", "bar")
"foobar"

In fact we can see the precise methods being invoked by applying @which

julia> @which *(100, 100)
*(x::Int64,y::Int64) at int.jl:47

julia> @which *("foo", "bar")
*(s1::AbstractString, ss::AbstractString...) at strings/basic.jl:50

We can see the same process with other functions and their methods

julia> isfinite(1.0)  # Call isfinite on a float
true

julia> @which isfinite(1)
isfinite(x::Integer) at float.jl:311

julia> @which isfinite(1.0)
isfinite(x::AbstractFloat) at float.jl:309

Here isfinite() is a function with multiple methods

It has a method for acting on floating points and another method for acting on integers

In fact it has quite a few methods

julia> methods(isfinite)
# 11 methods for generic function "isfinite":
isfinite(x::Float16) at float16.jl:119
isfinite(x::BigFloat) at mpfr.jl:790
isfinite(x::AbstractFloat) at float.jl:309
isfinite(x::Integer) at float.jl:311
isfinite(::Irrational{sym}) at irrationals.jl:66
isfinite(x::Real) at float.jl:310
isfinite(z::Complex{T<:Real}) at complex.jl:44
isfinite{T<:Number}(::AbstractArray{T<:Number,1}) at operators.jl:380
...

The particular method being invoked depends on the data type on which the function is called

We’ll discuss some of the more complicated data types you see towards the end of this list as we go along

Abstract Types

Looking at the list of methods above you can see references to types that we haven’t met before, such as Real and Number

These two types are examples of what are known in Julia as abstract types

Abstract types serve a different purpose to concrete types such as Int64 and Float64

To understand what that purpose is, imagine that you want to write a function with two methods, one to handle real numbers and the other for complex numbers

As we know, there are multiple types for real numbers, such as integers and floats

There are even multiple integer and float types, such as Int32, Int64, Float32, Float64, etc.

If we want to handle all of these types for real numbers in the same way, it’s useful to have a “parent” type called Real

Rather than writing a separate method for each concrete type, we can just write one for the abstract Real type

In this way, the purpose of abstract types is to serve as a unifying parent type that concrete types can “inherit” from

Indeed, we can see that Float64, Int64, etc. are subtypes of Real as follows

julia> Float64 <: Real
true

julia> Int64 <: Real
true

On the other hand, 64 bit complex numbers are not reals

julia> Complex64 <: Real
false

They are, however, subtypes of Number

julia> Complex64 <: Number
true

Number in turn is a subtype of Any, which is a parent of all types

julia> Number <: Any
true

Type Hierarchy

In fact the types form a hierarchy, with Any at the top of the tree and the concrete types at the bottom

Note that we never see instances of abstract types

For example, if x is a value then typeof(x) will never return an abstract type

The point of abstract types is simply to categorize the concrete types (as well as other abstract types that sit below them in the hierarchy)

On the other hand, we cannot subtype concrete types

While we can build subtypes of abstract types we cannot do the same for concrete types

Multiple Dispatch

We can now be a little bit clearer about what happens when you call a function on given types

Suppose we execute the function call f(a, b) where a and b are of concrete types S and T respectively

The Julia interpreter first queries the types of a and b to obtain the tuple (S, T)

It then parses the list of methods belonging to f, searching for a match

If it finds a method matching (S, T) it calls that method

If not, it looks to see whether the pair (S, T) matches any method defined for immediate parent types

For example, if S is Float64 and T is Complex64 then the immediate parents are AbstractFloat and Number respectively

julia> super(Float64)
AbstractFloat

julia> super(Complex64)
Number

Hence the interpreter looks next for a method of the form f(x::AbstractFloat, y::Number)

If the interpreter can’t find a match in immediate parents (supertypes) it proceeds up the tree, looking at the parents of the last type it checked at each iteration

  • If it eventually finds a matching method it invokes that method
  • If not, we get an error

This is the process that leads to the error that we saw above:

julia> *(100, "100")
ERROR: `*` has no method matching *(::Int64, ::ASCIIString)

The procedure of matching data to appropriate methods is called dispatch

Because the procedure starts from the concrete types and works upwards, dispatch always invokes the most specific method that is available

For example, if you have methods for function f that handle

  1. (Float64, Int64) pairs
  2. (Number, Number) pairs

and you call f with f(0.5, 1) then the first method will be invoked

This makes sense because (hopefully) the first method is designed to work well with exactly this kind of data

The second method is probably more of a “catch all” method that handles other data in a less optimal way

Defining Types and Methods

Let’s look at defining our own methods and data types, including composite data types

User Defined Methods

It’s straightforward to add methods to existing functions or functions you’ve defined

In either case the process is the same:

  • use the standard syntax to define a function of the same name
  • but specify the data type for the method in the function signature

For example, we saw above that + is just a function with various methods

  • recall that a + b and +(a, b) are equivalent

We saw also that the following call fails because it lacks a matching method

julia> +(100, "100")
ERROR: `+` has no method matching +(::Int64, ::ASCIIString)

This is sensible behavior, but if you want to change it by defining a method to handle the case in question there’s nothing to stop you:

julia> importall Base.Operators

julia> +(x::Integer, y::ASCIIString) = x + parse(Int, y)
+ (generic function with 172 methods)

julia> +(100, "100")
200

julia> 100 + "100"
200

Here’s another example, involving a user defined function

We begin with a file called test.jl in the present working directory with the following content

function f(x)
    println("Generic function invoked")
end

function f(x::Number)
    println("Number method invoked")
end

function f(x::Integer)
    println("Integer method invoked")
end

Clearly these methods do nothing more than tell you which method is being invoked

Let’s now run this and see how it relates to our discussion of method dispatch above

julia> include("test.jl")
f (generic function with 3 methods)

julia> f(3)
Integer method invoked

julia> f(3.0)
Number method invoked

julia> f("foo")
Generic function invoked

Since 3 is an Int64 and Int64 <: Integer <: Number, the call f(3) proceeds up the tree to Integer and invokes f(x::Integer)

On the other hand, 3.0 is a Float64, which is not a subtype of Integer

Hence the call f(3.0) continues up to f(x::Number)

Finally, f("foo") is handled by the generic function, since it is not a subtype of Number

User Defined Types

Most languages have facilities for creating new data types and Julia is no exception

julia> type Foo end

julia> foo = Foo()
Foo()

julia> typeof(foo)
Foo (constructor with 1 method)

Let’s make some observations about this code

First note that to create a new data type we use the keyword type followed by the name

  • By convention, type names use CamelCase (e.g., FloatingPoint, Array, AbstractArray)

When a new data type is created in this way, the interpreter simultaneously creates a default constructor for the data type

This constructor is a function for generating new instances of the data type in question

It has the same name as the data type but uses function call notion — in this case Foo()

In the code above, foo = Foo() is a call to the default constructor

A new instance of type Foo is created and the name foo is bound to that instance

Now if we want to we can create methods that act on instances of Foo

Just for fun, let’s define how to add one Foo to another

julia> +(x::Foo, y::Foo) = "twofoos"
+ (generic function with 126 methods)

julia> foo1, foo2 = Foo(), Foo()  # Create two Foos
(Foo(),Foo())

julia> +(foo1, foo2)
"twofoos"

julia> foo1 + foo2
"twofoos"

We can also create new functions to handle Foo data

julia> foofunc(x::Foo) = "onefoo"
foofunc (generic function with 1 method)

julia> foofunc(foo)
"onefoo"

This example isn’t of much use but more useful examples follow

Composite Data Types

Since the common primitive data types are already built in, most new user-defined data types are composite data types

Composite data types are data types that contain distinct fields of data as attributes

For example, let’s say we are doing a lot of work with AR(1) processes, which are random sequences \(\{X_t\}\) that follow a law of motion of the form

(1)\[X_{t+1} = a X_t + b + \sigma W_{t+1}\]

Here \(a\), \(b\) and \(\sigma\) are scalars and \(\{W_t\}\) is an iid sequence of shocks with some given distribution \(\phi\)

At times it might be convenient to take these primitives \(a\), \(b\), \(\sigma\) and \(\phi\) and organize them into a single entity like so

type AR1
    a
    b
    sigma
    phi
end

For the distribution phi we’ll assign a Distribution from the Distributions package

After reading in the AR1 definition above we can do the following

julia> using Distributions

julia> m = AR1(0.9, 1, 1, Beta(5, 5))
AR1(0.9,1,1,Beta( alpha=5.0 beta=5.0 ))

In this call to the constructor we’ve created an instance of AR1 and bound the name m to it

We can access the fields of m using their names and “dotted attribute” notation

julia> m.a
0.9

julia> m.b
1

julia> m.sigma
1

julia> m.phi
Beta( alpha=5.0 beta=5.0 )

For example, the attribute m.phi points to an instance of Beta, which is in turn a subtype of Distribution as defined in the Distributions package

julia> typeof(m.phi)
Beta (constructor with 3 methods)

julia> typeof(m.phi) <: Distribution
true

We can reach in to m and change this if we want to

julia> m.phi = Exponential(0.5)
Exponential( scale=0.5 )

In our type definition we can be explicit that we want phi to be a Distribution, and the other elements to be real scalars

type AR1
    a::Real
    b::Real
    sigma::Real
    phi::Distribution
end

(Before reading this in you might need to restart your REPL session in order to clear the old definition of AR1 from memory)

Now the constructor will complain if we try to use the wrong data type

julia> m = AR1(0.9, 1, "foo", Beta(5, 5))
ERROR: `convert` has no method matching convert(::Type{Real}, ::ASCIIString) in AR1 at no file

This is useful if we’re going to have functions that act on instances of AR1

  • e.g., simulate time series, compute variances, generate histograms, etc.

If those functions only work with AR1 instances built from the specified data types then it’s probably best if we get an error as soon we try to make an instance that doesn’t fit the pattern

Better to fail early rather than deeper into our code where errors are harder to debug

Type Parameters

Consider the following output

julia> typeof([10, 20, 30])
Array{Int64,1}

Here Array is one of Julia’s predefined types (Array <: DenseArray <: AbstractArray <: Any)

The Int64,1 in curly brackets are type parameters

In this case they are the element type and the dimension

Many other types have type parameters too

julia> typeof(1.0 + 1.0im)
Complex{Float64} (constructor with 1 method)

julia> typeof(1 + 1im)
Complex{Int64} (constructor with 1 method)

Types with parameters are therefore in fact an indexed family of types, one for each possible value of the parameter

Defining Parametric Types

We can use parametric types in our own type definitions

Let’s say we’re defining a type called FooBar with attributes foo and bar

type FooBar
    foo
    bar
end

Suppose we now decide that we want foo and bar to have the same type, although we don’t much care what that type is

We can achieve this with the syntax

type FooBar{T}
    foo::T
    bar::T
end

Now our constructor is happy provided that the arguments do in fact have the same type

julia> fb = FooBar(1.0, 2.0)
FooBar{Float64}(1.0,2.0)

julia> fb = FooBar(1, 2)
FooBar{Int64}(1,2)

julia> fb = FooBar(1, 2.0)
ERROR: `FooBar{T}` has no method matching FooBar{T}(::Int64, ::Float64)

Now let’s say we want the data to be of the same type and that type must be a subtype of Number

We can achieve this as follows

type FooBar{T <: Number}
    foo::T
    bar::T
end

Let’s try it

julia> fb = FooBar(1, 2)
FooBar{Int64}(1,2)

julia> fb = FooBar("fee", "fi")
ERROR: `FooBar{T<:Number}` has no method matching FooBar{T<:Number}(::ASCIIString, ::ASCIIString)

In the second instance we get an error because ASCIIString is not a subtype of Number

Writing Fast Code

Let’s briefly discuss how to write Julia code that executes quickly (for a given hardware configuration)

For now our focus is on generating more efficient machine code from essentially the same program (i.e., without parallelization or other more significant changes to the way the program runs)

Basic Concepts

The benchmark for performance is well written compiled code, expressed in languages such as C and Fortran

This is because computer programs are essentially operations on data, and the details of the operations implemented by the CPU depend on the nature of the data

When code is written in a language like C and compiled, the compiler has access to sufficient information to build machine code that will organize the data optimally in memory and implement efficient operations as required for the task in hand

To approach this benchmark, Julia needs to know about the type of data it’s processing as early as possible

An Example

Consider the following function, which essentially does the same job as Julia’s sum() function but acts only on floating point data

function sum_float_array(x::Array{Float64, 1})
    sum = 0.0
    for i in 1:length(x)
        sum += x[i]
    end
    return sum
end

Calls to this function run very quickly

julia> x = linspace(0, 1, 1e6)
linspace(0.0,1.0,1000000)

julia> x = collect(x);  # Convert to array of Float64s

julia> typeof(x)
Array{Float64,1}

julia> @time sum_float_array(x)
  0.001800 seconds (149 allocations: 10.167 KB)
499999.9999999796

One reason is that data types are fully specified

When Julia compiles this function via its just-in-time compiler, it knows that the data passed in as x will be an array of 64 bit floats

Hence it’s known to the compiler that the relevant method for + is always addition of floating point numbers

Moreover, the data can be arranged into continuous 64 bit blocks of memory to simplify memory access

Finally, data types are stable — for example, the local variable sum starts off as a float and remains a float throughout

Type Inferences

What happens if we don’t supply type information?

Here’s the same function minus the type annotation in the function signature

function sum_array(x)
    sum = 0.0
    for i in 1:length(x)
        sum += x[i]
    end
    return sum
end

When we run it with the same array of floating point numbers it executes at a similar speed as the function with type information

julia> @time sum_array(x)
0.001949 seconds (5 allocations: 176 bytes)

The reason is that when sum_array() is first called on a vector of a given data type, a newly compiled version of the function is produced to handle that type

In this case, since we’re calling the function on a vector of floats, we get a compiled version of the function with essentially the same internal representation as sum_float_array()

Things get tougher for the interpreter when the data type within the array is imprecise

For example, the following snippet creates an array where the element type is Any

julia> x = Any[1/i for i in 1:1e6];

julia> eltype(x)
Any

Now summation is much slower and memory management is less efficient

julia> @time sum_array(x)
0.058874 seconds (1.00 M allocations: 15.259 MB, 41.67% gc time)

Summary and Tips

To write efficient code use functions to segregate operations into logically distinct blocks

Data types will be determined at function boundaries

If types are not supplied then they will be inferred

If types are stable and can be inferred effectively your functions will run fast

Further Reading

There are many other aspects to writing fast Julia code

A good next stop for further reading is the relevant part of the Julia documentation

Exercises

Exercise 1

Write a function with the signature simulate(m::AR1, n::Integer, x0::Real) that takes as arguments

  • an instance m of AR1
  • an integer n
  • a real number x0

and returns an array containing a time series of length n generated according to (1) where

  • the primitives of the AR(1) process are as specified in m
  • the initial condition \(X_0\) is set equal to x0

Here AR1 is as defined above:

type AR1
    a::Real
    b::Real
    sigma::Real
    phi::Distribution
end

Hint: If d is an instance of Distribution then rand(d) generates one random draw from the distribution specified in d

Exercise 2

The term universal function is sometimes applied to functions which

  • when called on a scalar return a scalar
  • when called on an array of scalars return an array of the same length by acting elementwise on the scalars in the array

For example, sin() has this property in Julia

julia> sin(pi)
1.2246467991473532e-16

julia> sin([pi, 2pi])
2-element Array{Float64,1}:
  1.22465e-16
 -2.44929e-16

Write a universal function f such that

  • f(k) returns a chi-squared random variable with k degrees of freedom when k is an integer
  • f(k_vec) returns a vector where f(k_vec)[i] is chi-squared with k_vec[i] degrees of freedom

Hint: If we take k independent standard normals, square them all and sum we get a chi-squared with k degrees of freedom