Writing Functions
Overview
Teaching: 40 min
Exercises: 10 min
Questions
How can I create my own functions?
Objectives
Explain and identify the difference between function definition and function call.
Write a function that takes a small, fixed number of arguments and produces a single result.
If you don’t have the data files already (e.g., using Google Colab), you will want to download them.
# Same download function as last time
import urllib.request
import os
def download_data(filename):
    if not os.path.exists('data'):
        os.mkdir('data')
    url = 'https://raw.githubusercontent.com/ualberta-rcg/python-intro/gh-pages/data/' + filename
    output_file = 'data/' + filename
    urllib.request.urlretrieve(url, output_file)
    print("Downloaded " + filename + " to the data directory")
# Run the function to download
filenames = ['gapminder_gdp_africa.csv',
             'gapminder_gdp_americas.csv',
             'gapminder_gdp_oceania.csv',
             'gapminder_gdp_europe.csv',
             'gapminder_gdp_asia.csv',
             'gapminder_all.csv']
for filename in filenames:
    download_data(filename)
Break programs down into functions to make them easier to understand.
- Human beings can only keep a few items in working memory at a time.
- Understand larger/more complicated ideas by understanding and combining pieces.
- Components in a machine.
- Lemmas when proving theorems.
- Functions serve the same purpose in programs.
- Encapsulate complexity so that we can treat it as a single “thing”.
- Also enables re-use.
- Write one time, use many times.
Define a function using def with a name, parameters, and a block of code.
- Begin the definition of a new function with def.
- Followed by the name of the function.
- Must obey the same rules as variable names.
- Then parameters in parentheses.
- Empty parentheses if the function doesn’t take any inputs.
- We will discuss this in detail in a moment.
- Then a colon.
- Then an indented block of code.
def print_greeting():
    print('Hello!')
Defining a function does not run it.
- Defining a function does not run it.
- Like assigning a value to a variable.
- Must call the function to execute the code it contains.
print_greeting()
Hello!
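A minimal sketch of the difference between defining and calling. The `calls` list and the `record_greeting` function are hypothetical names introduced only to make the effect visible; they are not part of the lesson's code.

```python
calls = []  # hypothetical log, used only for this illustration

def record_greeting():
    # This body runs only when the function is called.
    calls.append('Hello!')

# The definition above has not executed the body yet.
count_after_definition = len(calls)

record_greeting()  # first call runs the body once
record_greeting()  # second call runs it again

print('entries after definition:', count_after_definition)
print('entries after two calls:', len(calls))
```

The log stays empty until the first call, and each call adds one entry.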
Arguments in call are matched to parameters in definition.
- Functions are most useful when they can operate on different data.
- Specify parameters when defining a function.
- These become variables when the function is executed.
- Are assigned the arguments in the call (i.e., the values passed to the function).
- If you don’t name the arguments when using them in the call, the arguments will be matched to parameters in the order the parameters are defined in the function.
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
print_date(1871, 3, 19)
1871/3/19
Or, we can name the arguments when we call the function, which allows us to specify them in any order:
print_date(month=3, day=19, year=1871)
1871/3/19
- Via Twitter: () contains the ingredients for the function while the body contains the recipe.
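The matching rules above can be sketched with a variant of print_date that returns the string instead of printing it. The name `format_date` is an assumption made for this illustration so the results can be compared directly.

```python
def format_date(year, month, day):
    # Same joining logic as print_date, but the string is returned.
    return str(year) + '/' + str(month) + '/' + str(day)

# Positional arguments are matched to parameters left to right.
positional = format_date(1871, 3, 19)

# Named arguments are matched by name, so their order does not matter.
named = format_date(day=19, year=1871, month=3)

# Positional arguments in the wrong order are still matched left to right.
swapped = format_date(3, 19, 1871)

print(positional)
print(named)
print(swapped)
```

The first two calls produce the same string, while the third silently puts 3 in the year position. This is one reason naming the arguments can be safer.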
Functions may return a result to their caller using return.
- Use return ... to give a value back to the caller.
- May occur anywhere in the function.
- But functions are easier to understand if return occurs:
- At the start to handle special cases.
- At the very end, with a final result.
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)
a = average([1, 3, 4])
print('average of actual values:', a)
average of actual values: 2.6666666666666665
print('average of empty list:', average([]))
average of empty list: None
- Remember: every function returns something.
- A function that doesn’t explicitly return a value automatically returns None.
result = print_date(1871, 3, 19)
print('result of call is:', result)
1871/3/19
result of call is: None
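Because a missing result shows up as None, callers often check for it before using the value. The sketch below reuses average from above; `describe_average` is a hypothetical helper added only for this example.

```python
def average(values):
    if len(values) == 0:
        return None  # special case handled at the start
    return sum(values) / len(values)  # final result at the very end

def describe_average(values):
    # Hypothetical helper: checks for None before using the result.
    result = average(values)
    if result is None:
        return 'no data'
    return 'average is ' + str(result)

print(describe_average([2, 4]))
print(describe_average([]))
```

Comparing with `is None` separates "no result" from a legitimate average of 0.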
Identifying Syntax Errors
1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. Is it a SyntaxError or an IndentationError?
3. Fix the error.
4. Repeat steps 2 and 3 until you have fixed all the errors.
def another_function
  print("Syntax errors are annoying.")
   print("But at least python tells us about them!")
  print("So they are usually not too hard to fix.")
Solution
def another_function():
    print("Syntax errors are annoying.")
    print("But at least Python tells us about them!")
    print("So they are usually not too hard to fix.")
Definition and Use
What does the following program print?
def report(pressure):
    print('pressure is', pressure)
print('calling', report, 22.5)
Solution
calling <function report at 0x7fd128ff1bf8> 22.5
A function call always needs parentheses; otherwise you get the function object itself, which prints with its memory address. So, if we wanted to call the function named report and give it the value 22.5 to report on, we could write our function call as follows:
print("calling")
report(22.5)
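The difference between naming a function and calling it can also be checked directly. This is a small sketch using the built-in `callable()` and the `__name__` attribute, neither of which appears in the lesson itself.

```python
def report(pressure):
    print('pressure is', pressure)

# Without parentheses, the name refers to the function object itself.
is_function = callable(report)
function_name = report.__name__

# With parentheses, the body runs; report has no return statement,
# so the value handed back to the caller is None.
result = report(22.5)

print(is_function, function_name, result)
```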
Order of Operations
The example above:
result = print_date(1871, 3, 19)
print('result of call is:', result)
printed:
1871/3/19
result of call is: None
Explain why the two lines of output appeared in the order they did.
What’s wrong in this example?
result = print_date(1871,3,19)
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
Solution
- The first line of output (1871/3/19) is from the print function inside print_date(), while the second line is from the print function below the function call. All of the code inside print_date() is executed first, and the program then “leaves” the function and executes the rest of the code.
- The problem with the example is that the function is defined after the call to the function is made. When the call runs, Python does not yet know the name print_date, so it raises a NameError.
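The ordering problem can be observed without stopping the program by catching the error. In this sketch the function is named `print_date_later`, a hypothetical name chosen so the example does not depend on an earlier definition of print_date.

```python
try:
    # The call comes before the definition, so the name is unknown here.
    print_date_later(1871, 3, 19)
except NameError as error:
    message = str(error)
    print('call failed:', message)

def print_date_later(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

# After the definition has run, the same call works.
print_date_later(1871, 3, 19)
```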
Encapsulation
Fill in the blanks to create a function that takes a single filename as an argument, loads the data in the file named by the argument, and returns the minimum value in that data.
import pandas as pd
def min_in_data(____):
    df = ____
    return ____
Solution
import pandas as pd
def min_in_data(filename):
    df = pd.read_csv(filename)
    return df.min()
Find the First
Fill in the blanks to create a function that takes a list of numbers as an argument and returns the first negative value in the list. What does your function do if the list is empty?
def first_negative(values):
    for v in ____:
        if ____:
            return ____
Solution
def first_negative(values):
    for v in values:
        if v<0:
            return v
If an empty list is passed to this function, it returns None:
my_list = []
print(first_negative(my_list))
None
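The same reasoning applies to a list with no negative values: the loop finishes without reaching return, so the function falls off the end and returns None. A short sketch with made-up input lists:

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v  # leaves the function as soon as a negative is found

found = first_negative([3, -2, 5, -7])     # stops at the first negative value
none_negative = first_negative([1, 2, 3])  # loop ends without returning
empty = first_negative([])                 # loop body never runs

print(found, none_negative, empty)
```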
Calling by Name
Earlier we saw this function:
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
We saw that we can call the function using named arguments, like this:
print_date(day=1, month=2, year=2003)
- What does print_date(day=1, month=2, year=2003) print?
- When have you seen a function call like this before?
- When and why is it useful to call functions this way?
Solution
- 2003/2/1
- We saw examples of using named arguments when working with the pandas library. For example, when reading in a dataset using data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'), the last argument index_col is a named argument.
- Using named arguments can make code more readable since one can see from the function call what name the different arguments have inside the function. It can also reduce the chances of passing arguments in the wrong order, since by using named arguments the order doesn’t matter.
Encapsulation of an If/Print Block
The code below will run on a label-printer for chicken eggs. A digital scale will report a chicken egg mass (in grams) to the computer and then the computer will print a label.
Please re-write the code so that the if-block is folded into a function.
import random
for i in range(10):
    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass=70+20.0*(2.0*random.random()-1.0)
    print(mass)
    #egg sizing machinery prints a label
    if(mass>=85):
        print("jumbo")
    elif(mass>=70):
        print("large")
    elif(mass<70 and mass>=55):
        print("medium")
    else:
        print("small")
The simplified program follows. What function definition will make it functional?
# revised version
import random
for i in range(10):
    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass=70+20.0*(2.0*random.random()-1.0)
    print(mass,print_egg_label(mass))
- Create a function definition for print_egg_label() that will work with the revised program above. Note, the function’s return value will be significant. Sample output might be 71.23 large.
- A dirty egg might have a mass of more than 90 grams, and a spoiled or broken egg will probably have a mass that’s less than 50 grams. Modify your print_egg_label() function to account for these error conditions. Sample output could be 25 too light, probably spoiled.
Solution
def print_egg_label(mass):
    #egg sizing machinery prints a label
    if(mass>=90):
        return("warning: egg might be dirty")
    elif(mass>=85):
        return("jumbo")
    elif(mass>=70):
        return("large")
    elif(mass<70 and mass>=55):
        return("medium")
    elif(mass<50):
        return("too light, probably spoiled")
    else:
        return("small")
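Since the program above uses random masses, one way to check each branch is to call the function with fixed values. The sketch below repeats the solution so it runs on its own; the masses in `sample_masses` are made up for illustration, not measurements.

```python
def print_egg_label(mass):
    # Same branches as the solution above.
    if mass >= 90:
        return "warning: egg might be dirty"
    elif mass >= 85:
        return "jumbo"
    elif mass >= 70:
        return "large"
    elif mass < 70 and mass >= 55:
        return "medium"
    elif mass < 50:
        return "too light, probably spoiled"
    else:
        return "small"

# Made-up masses (in grams), one per branch of the function.
sample_masses = [95, 86, 71.23, 60, 52, 25]
for mass in sample_masses:
    print(mass, print_egg_label(mass))
```

Note that with these branches a mass from 50 up to, but not including, 55 grams is labelled "small".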
Encapsulating Data Analysis
Assume that the following code has been executed:
import pandas as pd
asia_df = pd.read_csv('data/gapminder_gdp_asia.csv', index_col=0)
japan = asia_df.loc['Japan']
1. Complete the statements below to obtain the average GDP for Japan across the years reported for the 1980s.
year = 1983
gdp_decade = 'gdpPercap_' + str(year // ____)
avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2
2. Abstract the code above into a single function.
def avg_gdp_in_decade(country, continent, year):
    df = pd.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
    ____
    ____
    ____
    return avg
3. How would you generalize this function if you did not know beforehand which specific years occurred as columns in the data? For instance, what if we also had data from years ending in 1 and 9 for each decade? (Hint: use the columns to filter out the ones that correspond to the decade, instead of enumerating them in the code.)
Solution
1.
year = 1983
gdp_decade = 'gdpPercap_' + str(year // 10)
avg = (japan.loc[gdp_decade + '2'] + japan.loc[gdp_decade + '7']) / 2
2.
def avg_gdp_in_decade(country, continent, year):
    df = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
    c = df.loc[country]
    gdp_decade = 'gdpPercap_' + str(year // 10)
    avg = (c.loc[gdp_decade + '2'] + c.loc[gdp_decade + '7'])/2
    return avg
3.
We need to loop over the reported years to obtain the average for the relevant ones in the data.
def avg_gdp_in_decade(country, continent, year):
    df = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
    c = df.loc[country]
    gdp_decade = 'gdpPercap_' + str(year // 10)
    total = 0.0
    num_years = 0
    for yr_header in c.index: # c's index contains reported years
        if yr_header.startswith(gdp_decade):
            total = total + c.loc[yr_header]
            num_years = num_years + 1
    return total/num_years
The function can now be called by:
avg_gdp_in_decade('Japan','asia',1983)
20880.023800000003
Using Functions With Conditionals in Pandas
Functions will often contain conditionals. Here is a short example that will indicate which quartile the argument is in based on hand-coded values for the quartile cut points.
def calculate_life_quartile(exp):
    if exp < 58.41:
        # This observation is in the first quartile
        return 1
    elif exp >= 58.41 and exp < 67.05:
        # This observation is in the second quartile
        return 2
    elif exp >= 67.05 and exp < 71.70:
        # This observation is in the third quartile
        return 3
    elif exp >= 71.70:
        # This observation is in the fourth quartile
        return 4
    else:
        # This observation has bad data
        return None
calculate_life_quartile(62.5)
2
That function would typically be used within a for loop, but Pandas has a different, more efficient way of doing the same thing, and that is by applying a function to a dataframe or a portion of a dataframe. Here is an example, using the definition above.
all_df = pd.read_csv('data/gapminder_all.csv', index_col='country')
all_df['life_qrtl_1992'] = all_df['lifeExp_1992'].apply(calculate_life_quartile)
There is a lot in that second line, so let’s take it piece by piece. On the right side of the = we start with all_df['lifeExp_1992'], which is the column in the dataframe called all_df labeled lifeExp_1992. We use the apply() method to do what it says: apply calculate_life_quartile to the value of this column for every row in the dataframe. We write the results to a new column in the dataframe.
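To see that apply() gives the same answers as an explicit for loop, the sketch below runs both on a tiny dataframe built in memory. The dataframe `small_df`, its row labels, and its life-expectancy values are invented for illustration and are not taken from the gapminder files.

```python
import pandas as pd

def calculate_life_quartile(exp):
    # Same hand-coded cut points as the function above.
    if exp < 58.41:
        return 1
    elif exp >= 58.41 and exp < 67.05:
        return 2
    elif exp >= 67.05 and exp < 71.70:
        return 3
    elif exp >= 71.70:
        return 4
    else:
        return None

# Invented stand-in data: four made-up rows, one per quartile.
small_df = pd.DataFrame({'lifeExp_1992': [50.0, 62.5, 70.0, 75.0]},
                        index=['A', 'B', 'C', 'D'])

# apply() calls the function once for each value in the column.
small_df['life_qrtl_1992'] = small_df['lifeExp_1992'].apply(calculate_life_quartile)

# The equivalent explicit loop, collecting results in a list.
looped = []
for value in small_df['lifeExp_1992']:
    looped.append(calculate_life_quartile(value))

print(small_df)
print(looped)
```

Note that the function is passed to apply() by name, without parentheses, so that pandas can call it for each value.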
Simulating a dynamical system
In mathematics, a dynamical system is a system in which a function describes the time dependence of a point in a geometrical space. A canonical example of a dynamical system is a system called the logistic map.
1. Define a function called logistic_map that takes two inputs: x, representing the state of the system at time t, and a parameter r. This function should return a value representing the state of the system at time t+1.
2. Using a for loop, iterate the logistic_map function defined in part 1 starting from an initial condition of 0.5 for t_final=10, 100, and 1000 periods. Store the intermediate results in a list so that after the for loop terminates you have accumulated a sequence of values representing the state of the logistic map at time t=0,1,…,t_final.
3. Encapsulate the logic of your for loop into a function called iterate that takes the initial condition as its first input, the parameter t_final as its second input and the parameter r as its third input. The function should return the list of values representing the state of the logistic map at time t=0,1,…,t_final.
Solution
1.
def logistic_map(x, r):
    return r * x * (1 - x)
2.
initial_condition = 0.5
t_final = 10
r = 1.0
trajectory = [initial_condition]
# range ends at t_final + 1 so the list covers t=0,1,…,t_final
for t in range(1, t_final + 1):
    trajectory.append( logistic_map(trajectory[t-1], r) )
3.
def iterate(initial_condition, t_final, r):
    trajectory = [initial_condition]
    # range ends at t_final + 1 so the list covers t=0,1,…,t_final
    for t in range(1, t_final + 1):
        trajectory.append( logistic_map(trajectory[t-1], r) )
    return trajectory
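A short usage sketch for the three requested lengths. It repeats the two functions so it runs on its own, and it records t_final + 1 states so that the list covers t=0 through t_final, as the exercise statement asks. The parameter values r = 2.0 and r = 3.5 are assumptions chosen for illustration; with r = 2.0 the starting value 0.5 maps to itself, which makes the result easy to check by hand.

```python
def logistic_map(x, r):
    return r * x * (1 - x)

def iterate(initial_condition, t_final, r):
    trajectory = [initial_condition]
    # One new state per period, so the list holds t = 0, 1, ..., t_final.
    for t in range(1, t_final + 1):
        trajectory.append(logistic_map(trajectory[t - 1], r))
    return trajectory

# With r = 2.0, x = 0.5 is a fixed point: 2.0 * 0.5 * (1 - 0.5) == 0.5.
fixed = iterate(0.5, 10, 2.0)
print(fixed)

# Lengths of the trajectories for the three requested horizons.
for periods in [10, 100, 1000]:
    print(periods, len(iterate(0.5, periods, 3.5)))
```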
Key Points
Break programs down into functions to make them easier to understand.
Define a function using def with a name, parameters, and a block of code.
Defining a function does not run it.
Arguments in call are matched to parameters in definition.
Functions may return a result to their caller using return.