Python Essentials

In this lecture we’ll cover features of the language that are essential to reading and writing Python code

Overview

Topics:

  • Data types
  • Imports
  • Basic file I/O
  • The Pythonic approach to iteration
  • More on user-defined functions
  • Comparisons and logic
  • Standard Python style

Data Types

So far we’ve briefly met several common data types, such as strings, integers, floats and lists

Let’s learn a bit more about them

Primitive Data Types

A particularly simple data type is the Boolean, whose value is either True or False

x = True
y = 100 < 10   # Python evaluates expression on right and assigns it to y
y
False
type(y)
bool

In arithmetic expressions, True is converted to 1 and False is converted to 0

x + y
1
x * y
0
True + True
2
bools = [True, True, False, True]  # List of Boolean values

sum(bools)
3

This is called Boolean arithmetic and is very useful in programming
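
As a small illustration of why this is useful, the sketch below counts how many entries of a list satisfy a condition and what share of the list they make up

The names draws and threshold and the numbers in them are made up for this example

```python
draws = [0.2, 0.9, 0.5, 0.7, 0.1]        # Hypothetical data
threshold = 0.4                          # Hypothetical cutoff

above = [d > threshold for d in draws]   # List of Boolean values
num_above = sum(above)                   # Each True counts as 1
share_above = num_above / len(draws)     # Fraction of True values

print(num_above, share_above)
```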

The two most common data types used to represent numbers are integers and floats

a, b = 1, 2
c, d = 2.5, 10.0
type(a)
int
type(c)
float

Computers distinguish between the two because, while floats are more informative, internal arithmetic operations on integers are more straightforward
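
One practical consequence is that integer arithmetic is exact while float arithmetic is only approximate, because most decimal fractions have no exact binary representation

The following sketch shows the difference and one common way to compare floats, using math.isclose from the standard library

```python
import math

int_check = (1 + 2 == 3)                      # Integer arithmetic is exact
float_check = (0.1 + 0.2 == 0.3)              # Floats carry rounding error
close_check = math.isclose(0.1 + 0.2, 0.3)    # Compare up to a tolerance

print(int_check, float_check, close_check)
```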

Warning

Be careful: If you’re still using Python 2.x, division of two integers returns only the integer part

To clarify:

1 / 2       # Integer division in Python 2.x
0           # Returns integer component only

For Python 3.x

1 / 2       # Division of two integers in Python 3+
0.5         # Returns a floating point number
1 // 2       # Integer division in Python 3+
0

Otherwise the following syntax is the same regardless of Python version

1.0 / 2.0   # Floating point division
0.5
1.0 / 2     # Floating point division
0.5

Complex numbers are another primitive data type in Python

x = complex(1, 2)
y = complex(2, 1)
x * y
5j

There are several more primitive data types that we’ll introduce as necessary

Containers

Python has several basic types for storing collections of (possibly heterogeneous) data

We have already discussed lists

A related data type is tuples, which are “immutable” lists

x = ('a', 'b')  # Round brackets instead of the square brackets
x = 'a', 'b'   # Or no brackets at all---the meaning is identical
x
('a', 'b')
type(x)
tuple

In Python, an object is called “immutable” if, once created, the object cannot be changed

Lists are mutable while tuples are not

x = [1, 2]  # Lists are mutable
x[0] = 10   # Now x = [10, 2], so the list has "mutated"
x = (1, 2)  # Tuples are immutable
x[0] = 10   # Trying to mutate them produces an error
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<python-input-21-6cb4d74ca096> in <module>()
----> 1 x[0]=10

TypeError: 'tuple' object does not support item assignment

We’ll say more about mutable vs immutable a bit later, and explain why the distinction is important

Tuples (and lists) can be “unpacked” as follows

integers = (10, 20, 30)
x, y, z = integers
x
10
y
20

You’ve actually seen an example of this already

Tuple unpacking is convenient and we’ll use it often
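
Two common uses of unpacking are sketched below: receiving several return values from a function, and swapping two variables

The helper min_and_max is a hypothetical function written only for this illustration

```python
def min_and_max(values):
    "Return the smallest and largest elements as a tuple (hypothetical helper)"
    return min(values), max(values)

lo, hi = min_and_max([3, 1, 4, 1, 5])   # Unpack the returned tuple
print(lo, hi)

a, b = 1, 2
a, b = b, a                             # Swap without a temporary variable
print(a, b)
```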

Slice Notation

To access multiple elements of a list or tuple, you can use Python’s slice notation

For example,

a = [2, 4, 6, 8]
a[1:]
[4, 6, 8]
a[1:3]
[4, 6]

The general rule is that a[m:n] returns n - m elements, starting at a[m]

Negative numbers are also permissible

a[-2:]  # Last two elements of the list
[6, 8]

The same slice notation works on tuples and strings

s = 'foobar'
s[-3:]  # Select the last three elements
'bar'
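
The following sketch checks the n - m rule on the list from above and shows a few related forms

The optional third number in a slice, the step, is not covered in the text above and is included only as a pointer

```python
a = [2, 4, 6, 8]
m, n = 1, 3

rule_holds = len(a[m:n]) == n - m   # The n - m rule
first_two = a[:2]                   # Omitting m starts from the beginning
every_other = a[::2]                # A third number sets the step size
backwards = a[::-1]                 # A step of -1 reverses the sequence

print(rule_holds, first_two, every_other, backwards)
```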

Sets and Dictionaries

Two other container types we should mention before moving on are sets and dictionaries

Dictionaries are much like lists, except that the items are named instead of numbered

d = {'name': 'Frodo', 'age': 33}
type(d)
dict
d['age']
33

The names 'name' and 'age' are called the keys

The objects that the keys are mapped to ('Frodo' and 33) are called the values
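
Here is a brief sketch of some other everyday dictionary operations, using the same dictionary as above

The added key 'home' and its value are made up for the example

```python
d = {'name': 'Frodo', 'age': 33}

d['home'] = 'The Shire'        # Add a new key-value pair
d['age'] = d['age'] + 1        # Update an existing value

for key, value in d.items():   # Step through key-value pairs
    print(key, value)

has_age = 'age' in d           # Membership tests look at the keys
print(has_age)
```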

Sets are unordered collections without duplicates, and set methods provide the usual set theoretic operations

s1 = {'a', 'b'}
type(s1)
set
s2 = {'b', 'c'}
s1.issubset(s2)
False
s1.intersection(s2)
{'b'}

The set() function creates sets from sequences

s3 = set(('foo', 'bar', 'foo'))
s3
{'foo', 'bar'}  # Unique elements only (display order may vary)
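
A few more of the set theoretic operations are sketched below, along with the equivalent operator forms

```python
s1 = {'a', 'b'}
s2 = {'b', 'c'}

both = s1.union(s2)            # Elements in either set
only_s1 = s1.difference(s2)    # Elements in s1 but not in s2
common = s1 & s2               # Operator form of intersection

print(both, only_s1, common)
print(s1 | s2 == both)         # Operator form of union
```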

Imports

From the start, Python has been designed around the twin principles of

  • a small core language
  • extra functionality in separate libraries or modules

For example, if you want to compute the square root of an arbitrary number, there’s no built-in function that will perform this for you

Instead, you need to import the functionality from a module — in this case a natural choice is math

import math

math.sqrt(4)
2.0

We discussed the mechanics of importing earlier

Note that the math module is part of the standard library, which is part of every Python distribution

On the other hand, the scientific libraries we’ll work with later are not part of the standard library

We’ll talk more about modules as we go along

As a final comment about modules and imports, in your Python travels you will often see the following syntax

from math import *

sqrt(4)
2.0

Here from math import * pulls all of the functionality of math into the current “namespace” — a concept we’ll define formally later on

Actually this kind of syntax should be avoided for the most part

In essence the reason is that it pulls in lots of variable names without explicitly listing them — a potential source of conflicts
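
To make the risk concrete, the sketch below imports the name sqrt from two standard library modules, math and cmath

The second import replaces the first, which is what a star import can do for many names at once without any of them appearing in the code

```python
import math                  # Names stay behind the math. prefix
from math import sqrt        # Only sqrt enters the namespace

real_root = sqrt(4)          # A float
print(math.sqrt(4), real_root)

from cmath import sqrt       # Same name again: replaces math's sqrt
complex_root = sqrt(4)       # Now a complex number comes back
print(complex_root)
```

Importing the module itself, or listing the names you need, keeps it clear where each function comes from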

Input and Output

Let’s have a quick look at basic file input and output

We discuss only reading and writing to text files

Input and Output

Let’s start with writing

f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Testing\n')           # Here '\n' means new line
f.write('Testing again')
f.close()

Here

  • The built-in function open() creates a file object for writing to
  • Both write() and close() are methods of file objects

Where is this file that we’ve created?

Recall that Python maintains a concept of the present working directory (pwd) that can be located by

import os
print(os.getcwd())

or, in IPython or a Jupyter notebook,

%pwd

If a path is not specified, then this is where Python writes to

You can confirm that the file newfile.txt is in your present working directory using a file browser or some other method

(Use %ls to list the files in the present working directory)

We can also use Python to read the contents of newfile.txt as follows

f = open('newfile.txt', 'r')
out = f.read()
out
'Testing\nTesting again'
print(out)
Testing
Testing again
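
Note that the file opened for reading above was never closed

A common alternative, not used in the code above, is the with statement, which closes the file automatically when the indented block ends

The sketch below writes to a temporary directory, an assumption made here so that no files are left behind

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'newfile.txt')

    with open(path, 'w') as f:      # File is closed when the block ends
        f.write('Testing\n')
        f.write('Testing again')

    with open(path, 'r') as f:
        out = f.read()

print(out)
print(f.closed)                     # Confirms the file has been closed
```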

Paths

Note that if newfile.txt is not in the present working directory then this call to open() fails

In this case you can either specify the full path to the file

f = open('insert_full_path_to_file/newfile.txt', 'r')

or change the present working directory to the location of the file via os.chdir('path_to_file')

(In IPython, use %cd to change directories)

Details are OS specific – a Google search on paths and Python should yield plenty of examples
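
One way to reduce the OS-specific details is to let the standard library build the path, as in the following sketch

It assumes the file of interest is newfile.txt in the present working directory, and only constructs and inspects the path

```python
import os

cwd = os.getcwd()                              # Present working directory
full_path = os.path.join(cwd, 'newfile.txt')   # Separator chosen for this OS

print(full_path)
print(os.path.exists(full_path))               # True only if the file is there
```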

Iterating

One of the most important tasks in computing is stepping through a sequence of data and performing a given action

One of Python’s strengths is its simple, flexible interface to this kind of iteration via the for loop

Looping over Different Objects

Many Python objects are “iterable”, in the sense that they can be looped over

To give an example, consider the file us_cities.txt, which lists US cities and their population

new york: 8244910
los angeles: 3819702
chicago: 2707120 
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229

Suppose that we want to make the information more readable, by capitalizing names and adding commas to mark thousands

The program us_cities.py reads the data in and makes the conversion:

data_file = open('us_cities.txt', 'r')
for line in data_file:
    city, population = line.split(':')            # Tuple unpacking
    city = city.title()                           # Capitalize city names
    population = '{0:,}'.format(int(population))  # Add commas to numbers
    print(city.ljust(15) + population)
data_file.close()

Here format() is a string method used for inserting variables into strings

The output is as follows

New York       8,244,910
Los Angeles    3,819,702
Chicago        2,707,120
Houston        2,145,146
Philadelphia   1,536,471
Phoenix        1,469,471
San Antonio    1,359,758
San Diego      1,326,179
Dallas         1,223,229

The reformatting of each line is the result of three different string methods, the details of which can be left till later

The interesting part of this program for us is line 2, which shows that

  1. The file object data_file is iterable, in the sense that it can be placed to the right of in within a for loop
  2. Iteration steps through each line in the file

This leads to the clean, convenient syntax shown in our program

Many other kinds of objects are iterable, and we’ll discuss some of them later on
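
As a preview, the sketch below loops over a string, a dictionary and a list of strings

The list lines is a stand-in for the lines of a file like us_cities.txt, so the example runs without that file

```python
for letter in 'abc':                  # Strings yield characters
    print(letter)

d = {'name': 'Frodo', 'age': 33}
for key in d:                         # Dictionaries yield their keys
    print(key, d[key])

lines = ['chicago: 2707120', 'houston: 2145146']   # Stand-in for file lines
cities = []
for line in lines:                    # Lists yield their elements
    city, population = line.split(':')
    cities.append(city.title())
print(cities)
```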

Looping without Indices

One thing you might have noticed is that Python tends to favor looping without explicit indexing

For example,

for x in x_values:
    print(x * x)

is preferred to

for i in range(len(x_values)):
    print(x_values[i] * x_values[i])

When you compare these two alternatives, you can see why the first one is preferred

Python provides some facilities to simplify looping without indices

One is zip(), which is used for stepping through pairs from two sequences

For example, try running the following code

countries = ('Japan', 'Korea', 'China')
cities = ('Tokyo', 'Seoul', 'Beijing')
for country, city in zip(countries, cities):
    print('The capital of {0} is {1}'.format(country, city))

The zip() function is also useful for creating dictionaries — for example

names = ['Tom', 'John']
marks = ['E', 'F']
dict(zip(names, marks))
{'Tom': 'E', 'John': 'F'}

If we actually need the index from a list, one option is to use enumerate()

To understand what enumerate() does, consider the following example

letter_list = ['a', 'b', 'c']
for index, letter in enumerate(letter_list):
    print("letter_list[{0}] = '{1}'".format(index, letter))

The output of the loop is

letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
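
Two details of these functions are worth a quick sketch: enumerate() accepts a start argument, and zip() stops as soon as the shorter input runs out

```python
letter_list = ['a', 'b', 'c']

numbered = list(enumerate(letter_list, start=1))   # Count from 1, not 0
for position, letter in numbered:
    print(position, letter)

pairs = list(zip([1, 2, 3], ['x', 'y']))   # Third element of first list dropped
print(pairs)
```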

Comparisons and Logical Operators

Comparisons

Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or False)

A common type is comparisons, such as

x, y = 1, 2
x < y
True
x > y
False

One of the nice features of Python is that we can chain inequalities

1 < 2 < 3
True
1 <= 2 <= 3
True

As we saw earlier, when testing for equality we use ==

x = 1    # Assignment
x == 2   # Comparison
False

For “not equal” use !=

1 != 2
True

Note that when testing conditions, we can use any valid Python expression

x = 'yes' if 42 else 'no'
x
'yes'
x = 'yes' if [] else 'no'
x
'no'

What’s going on here?

The rule is:

  • Expressions that evaluate to zero, empty sequences or containers (strings, lists, etc.) and None are all equivalent to False

    • for example, [] and () are equivalent to False in an if clause
  • All other values are equivalent to True

    • for example, 42 is equivalent to True in an if clause
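
The rule can be checked directly with the built-in function bool(), which returns the Boolean value that an if clause would see

The particular test values in the sketch below are arbitrary

```python
values = [0, 0.0, '', [], (), {}, None, 42, 'no', [0], -1]

for v in values:
    print(repr(v), bool(v))

falsy = [v for v in values if not v]    # Values treated as False
truthy = [v for v in values if v]       # Values treated as True
```

Note that the nonempty string 'no' and the nonempty list [0] both count as True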

Combining Expressions

We can combine expressions using and, or and not

These are the standard logical connectives (conjunction, disjunction and negation)

1 < 2 and 'f' in 'foo'
True
1 < 2 and 'g' in 'foo'
False
1 < 2 or 'g' in 'foo'
True
not True
False
not not True
True

Remember

  • P and Q is True if both are True, else False
  • P or Q is False if both are False, else True
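
These two rules can be verified by building the full truth table, as in the following short sketch

```python
table = []
for P in (True, False):
    for Q in (True, False):
        table.append((P, Q, P and Q, P or Q))

for row in table:
    print(row)     # Columns: P, Q, P and Q, P or Q
```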

More Functions

Let’s talk a bit more about functions, which are all-important for good programming style

Python has a number of built-in functions that are available without import

We have already met some

max(19, 20)
20
list(range(4))  # In Python 3, range(4) alone displays as range(0, 4)
[0, 1, 2, 3]
str(22)
'22'
type(22)
int

Two more useful built-in functions are any() and all()

bools = False, True, True
all(bools)  # True if all are True and False otherwise
False
any(bools)  # False if all are False and True otherwise
True

The full list of Python built-ins is given in the official Python documentation

Now let’s talk some more about user-defined functions constructed using the keyword def

Why Write Functions?

User-defined functions are important for improving the clarity of your code by

  • separating different strands of logic
  • facilitating code reuse

(Writing the same thing twice is almost always a bad idea)

The basics of user-defined functions were discussed in the previous lecture

The Flexibility of Python Functions

As we discussed in the previous lecture, Python functions are very flexible

In particular

  • Any number of functions can be defined in a given file
  • Functions can be (and often are) defined inside other functions
  • Any object can be passed to a function as an argument, including other functions
  • A function can return any kind of object, including functions

We already gave an example of how straightforward it is to pass a function to a function
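
The remaining points can be illustrated with a short sketch in which one function is defined inside and returned by another, and then passed to a third

The names make_power and apply_twice are hypothetical and chosen only for this example

```python
def make_power(n):
    "Return a function that raises its argument to the power n"
    def power(x):            # Defined inside another function
        return x**n
    return power             # A function is the return value

def apply_twice(g, x):
    "Apply the function g to x two times"
    return g(g(x))           # A function is received as an argument

square = make_power(2)
print(square(3))
print(apply_twice(square, 3))
```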

Note that a function can have arbitrarily many return statements (including zero)

Execution of the function terminates when the first return is hit, allowing code like the following example

def f(x):
    if x < 0:
        return 'negative'
    return 'nonnegative'

Functions without a return statement automatically return the special Python object None
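
For instance, the hypothetical function greet below prints a message but has no return statement, so the value it hands back is None

```python
def greet(name):
    print('Hello, ' + name)   # No return statement

result = greet('Frodo')
print(result is None)
```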

Docstrings

Python has a system for adding comments to functions, modules, etc. called docstrings

The nice thing about docstrings is that they are available at run-time

For example, let’s say that this code resides in file temp.py

# Filename: temp.py
def f(x):
    """
    This function squares its argument
    """
    return x**2

After running this code in IPython or a Jupyter notebook, the docstring is available as follows

f?
Type:       function
String Form:<function f at 0x2223320>
File:       /home/john/temp/temp.py
Definition: f(x)
Docstring:  This function squares its argument
f??
Type:       function
String Form:<function f at 0x2223320>
File:       /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
    """
    This function squares its argument
    """
    return x**2

With one question mark we bring up the docstring, and with two we get the source code as well
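
The question mark syntax belongs to IPython and Jupyter rather than to Python itself

In a plain Python session the same docstring can be reached through the __doc__ attribute or the standard library's inspect module, as sketched below (the built-in help(f) prints a formatted version)

```python
import inspect

def f(x):
    """
    This function squares its argument
    """
    return x**2

raw_doc = f.__doc__                # Raw docstring, including indentation
clean_doc = inspect.getdoc(f)      # Docstring with indentation cleaned up

print(repr(raw_doc))
print(clean_doc)
```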

One-Line Functions: lambda

The lambda keyword is used to create simple functions on one line

For example, the definitions

def f(x):
    return x**3

and

f = lambda x: x**3

are entirely equivalent

To see why lambda is useful, suppose that we want to calculate \(\int_0^2 x^3 dx\) (and have forgotten our high-school calculus)

The SciPy library has a function called quad that will do this calculation for us

The syntax of the quad function is quad(f, a, b) where f is a function and a and b are numbers

To create the function \(f(x) = x^3\) we can use lambda as follows

from scipy.integrate import quad

quad(lambda x: x**3, 0, 2)
(4.0, 4.440892098500626e-14)

Here the function created by lambda is said to be anonymous, because it was never given a name
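
Anonymous functions are also commonly passed to built-in functions that accept a key argument, such as sorted() and max()

The sketch below uses made-up data to sort a list of pairs by their second entry

```python
pairs = [('Tom', 3), ('John', 1), ('Ann', 2)]   # Hypothetical data

by_number = sorted(pairs, key=lambda pair: pair[1])   # Sort on second entry
print(by_number)

largest = max(pairs, key=lambda pair: pair[1])        # Largest second entry
print(largest)
```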

Keyword Arguments

If you did the exercises in the previous lecture, you would have come across the statement

plt.plot(x, 'b-', label="white noise")

In this call to Matplotlib’s plot function, notice that the last argument is passed in name=argument syntax

This is called a keyword argument, with label being the keyword

Non-keyword arguments are called positional arguments, since their meaning is determined by order

  • plot(x, 'b-', label="white noise") is different from plot('b-', x, label="white noise")

Keyword arguments are particularly useful when a function has a lot of arguments, in which case it’s hard to remember the right order

You can adopt keyword arguments in user-defined functions with no difficulty

The next example illustrates the syntax

def f(x, coefficients=(1, 1)):
    a, b = coefficients
    return a + b * x

After running this code we can call it as follows

f(2, coefficients=(0, 0))
0
f(2)  # Use default values (1, 1)
3

Notice that the keyword argument values we supplied in the definition of f become the default values
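
As a further sketch, here is a hypothetical function line with two keyword arguments, called in several equivalent and non-equivalent ways

```python
def line(x, intercept=0, slope=1):
    "Evaluate intercept + slope * x (hypothetical example function)"
    return intercept + slope * x

y1 = line(2)                         # Both defaults: 0 + 1 * 2
y2 = line(2, slope=3)                # Override one default by name
y3 = line(2, slope=3, intercept=1)   # Keyword order does not matter
y4 = line(2, 1, 3)                   # Positional: order determines meaning

print(y1, y2, y3, y4)
```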

Coding Style and PEP8

To learn more about the Python programming philosophy type import this at the prompt

Among other things, Python strongly favors consistency in programming style

We’ve all heard the saying about consistency and little minds

In programming, as in mathematics, the opposite is true

  • A mathematical paper where the symbols \(\cup\) and \(\cap\) were reversed would be very hard to read, even if the author told you so on the first page

In Python, the standard style is set out in PEP8

(Occasionally we’ll deviate from PEP8 in these lectures to better match mathematical notation)

Exercises

Solve the following exercises

(For some, the built-in function sum() comes in handy)

Exercise 1

Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their inner product using zip()

Part 2: In one line, count the number of even numbers in 0,...,99

  • Hint: x % 2 returns 0 if x is even, 1 otherwise

Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of pairs (a, b) such that both a and b are even

Exercise 2

Consider the polynomial

\[p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^n a_i x^i \tag{1}\]

Write a function p such that p(x, coeff) computes the value in (1) given a point x and a list of coefficients coeff

Try to use enumerate() in your loop

Exercise 3

Write a function that takes a string as an argument and returns the number of capital letters in the string

Hint: 'foo'.upper() returns 'FOO'

Exercise 4

Write a function that takes two sequences seq_a and seq_b as arguments and returns True if every element in seq_a is also an element of seq_b, else False

  • By “sequence” we mean a list, a tuple or a string
  • Do the exercise without using sets and set methods

Exercise 5

When we cover the numerical libraries, we will see they include many alternatives for interpolation and function approximation

Nevertheless, let’s write our own function approximation routine as an exercise

In particular, without using any imports, write a function linapprox that takes as arguments

  • A function f mapping some interval \([a, b]\) into \(\mathbb R\)
  • two scalars a and b providing the limits of this interval
  • An integer n determining the number of grid points
  • A number x satisfying a <= x <= b

and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points a = point[0] < point[1] < ... < point[n-1] = b

Aim for clarity, not efficiency

Solutions

Exercise 1

Part 1 solution:

Here’s one possible solution

x_vals = [1, 2, 3]
y_vals = [1, 1, 1]
sum([x * y for x, y in zip(x_vals, y_vals)])
6

This also works

sum(x * y for x, y in zip(x_vals, y_vals))
6

Part 2 solution:

One solution is

sum([x % 2 == 0 for x in range(100)])
50

This also works:

sum(x % 2 == 0 for x in range(100))
50

Some less natural alternatives that nonetheless help to illustrate the flexibility of list comprehensions are

len([x for x in range(100) if x % 2 == 0])
50

and

sum([1 for x in range(100) if x % 2 == 0])
50

Part 3 solution

Here’s one possibility

pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])
2

Exercise 2

def p(x, coeff):
    return sum(a * x**i for i, a in enumerate(coeff))
p(1, (2, 4))
6

Exercise 3

Here’s one solution:

def f(string):
    count = 0
    for letter in string:
        if letter == letter.upper() and letter.isalpha():
            count += 1
    return count
f('The Rain in Spain')
3

Exercise 4

Here’s a solution:

def f(seq_a, seq_b):
    is_subset = True
    for a in seq_a:
        if a not in seq_b:
            is_subset = False
    return is_subset

# == test == #

print(f([1, 2], [1, 2, 3]))
print(f([1, 2, 3], [1, 2]))
True
False
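
Another possibility, still without sets, is to use the built-in function all() discussed earlier in the lecture

The function name is_contained is hypothetical

```python
def is_contained(seq_a, seq_b):
    "True if every element of seq_a is also in seq_b (hypothetical name)"
    return all(a in seq_b for a in seq_a)

# == test == #

print(is_contained([1, 2], [1, 2, 3]))
print(is_contained('oo', 'foo'))
print(is_contained([1, 2, 3], [1, 2]))
```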

Of course if we use the sets data type then the solution is easier

def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))

Exercise 5

def linapprox(f, a, b, n, x):
    """
    Evaluates the piecewise linear interpolant of f at x on the interval
    [a, b], with n evenly spaced grid points.

    Parameters
    ===========
        f : function
            The function to approximate

        x, a, b : scalars (floats or integers)
            Evaluation point and endpoints, with a <= x <= b

        n : integer
            Number of grid points

    Returns
    =========
        A float. The interpolant evaluated at x

    """
    length_of_interval = b - a
    num_subintervals = n - 1
    step = length_of_interval / num_subintervals

    # === find first grid point larger than x === #
    point = a
    while point <= x:
        point += step

    # === x must lie between the gridpoints (point - step) and point === #
    u, v = point - step, point

    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)
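
One thing to be aware of in the solution above is that when x equals b, the while loop can step one grid point past b, so f may be evaluated just outside the interval

The variant below is only a sketch of one way around this, not part of the original solution: it computes the index of the subinterval directly and caps it at the last one

The name linapprox_indexed is hypothetical, and Python 3 division is assumed

```python
def linapprox_indexed(f, a, b, n, x):
    "Piecewise linear interpolation of f at x using n evenly spaced grid points"
    step = (b - a) / (n - 1)

    # Index of the subinterval containing x, capped at the last subinterval
    i = min(int((x - a) / step), n - 2)

    # x lies between the grid points u and v
    u = a + i * step
    v = u + step

    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)

print(linapprox_indexed(lambda x: x**2, 0, 1, 3, 0.25))
print(linapprox_indexed(lambda x: x**2, 0, 1, 3, 1))
```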