Testing
Untested software can be compared to uncalibrated detectors
“Before relying on a new experimental device, an experimental scientist always establishes its accuracy. A new detector is calibrated when the scientist observes > its responses to known input signals. The results of this calibration are compared against the expected response.”
From Testing and Continuous Integration with Python, created by K. Huff
Simulations and analysis using software should be held to the same standards as experimental measurement devices!
Further motivation for testing: - A Scientist’s Nightmare: Software Problem Leads to Five Retractions - Researchers find bug in Python script may have affected hundreds of studies
Testing helps to detect errors before they cause problems
In software tests, expected results are compared with observed results in order to establish accuracy: - As projects grow, it becomes easier to break things without noticing immediately - Software defects can be caused by both human errors and non-controllable events (i.e. environmental conditions) - Testing is essential for research software because we care about reproducibility of scientific results
Testing also encourages others to use your code: - It provides a way for users to see if the code is installed correctly - It allows users to better judge the quality of the code
Finally, testing encourages other developers to contribute to your code as it is easier for external developers to contribute to the project without breaking your code (or at least it is clear when they have broken the code!)
However bear in mind that tested code does not mean the code is perfect; “Program testing can be used to show the presence of bugs, but never to show their absence!” (Edsger W. Dijkstra)
Discussion question: Can you think of examples where it is not necessary to share your code?
Show answer
Some examples where you may choose not to test your code: - A Jupyter notebook which produces a plot and you know by looking at the plot whether it worked - A short, ‘obviously correct’ Python script which you never intend to reuse
Use assertions for things you believe will/should never happen.
def kelvin_to_celsius(temp_k):
"""
Converts temperature in Kelvin
to Celsius.
"""
assert temp_k >= 0.0, "ERROR: negative T_K"
= temp_k - 273.15
temp_c return temp_c
Units testing is for small components (units) of a code
def fahrenheit_to_celsius(temp_f):
"""Converts temperature in Fahrenheit
to Celsius.
"""
= (temp_f - 32.0) * (5.0/9.0)
temp_c return temp_c
# This is the test function: `assert` raises an error if something
# is wrong.
def test_fahrenheit_to_celsius():
= fahrenheit_to_celsius(temp_f=100.0)
temp_c = 37.777777
expected_result assert abs(temp_c - expected_result) < 1.0e-6
- Unit tests are functions
- Unit testing is used to test one unit: for example, a single function
Question: In the example above we want to check if the calculated temp_c
is equal to the expected_result
. Why do we use the assert statement abs(temp_c - expected_result) < 1.0e-6
instead of the simpler assert statement temp_c == expected_result
?
Show answer
Due to the inherent limitations to computational accuracy, we should not test for the exact equality between two floats. Instead we check that they are the same to within a numerical tolerance (in this case 1.0e-6). For more details please see the lesson Evaluating numerical errors and accuracy.
Another commonly used type of testing is called end-to-end. End-to-end tests check if the software is working as a whole, from beginning to end. For example, you input example data at the start of a simulation and then check that the end results of the simulation are correct.
Pytest can be used to implement a Python test suite
Rather than running unit tests one-by-one we can use pytest to automatically find and run all the tests within a project; pytest collects and runs all test functions starting with test_.
In the following steps we will make a simple Python function and use pytest to test it.
- Create a new directory and change into it:
mkdir pytest-example
cd pytest-example
- Then create a file called example.py and copy-paste the following code into it:
def add(a, b):
return a + b
def test_add():
assert add(2, 3) == 5
assert add('space', 'ship') == 'spaceship'
This code contains one genuine function and a test function. pytest finds any functions beginning with test_ and treats them as tests.
- Let us try to test it with pytest:
pytest -v example.py
============================================================ test session starts =================================
platform linux -- Python 3.7.2, pytest-4.3.1, py-1.8.0, pluggy-0.9.0 -- /home/user/pytest-example/venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/user/pytest-example, inifile:
collected 1 item
example.py::test_add PASSED
========================================================= 1 passed in 0.01 seconds ===============================
Yay! The test passed!
- Let us break the test! Introduce a code change which breaks the code and check whether pytest detects the change:
pytest -v example.py
============================================================ test session starts =================================
platform linux -- Python 3.7.2, pytest-4.3.1, py-1.8.0, pluggy-0.9.0 -- /home/user/pytest-example/venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/user/pytest-example, inifile:
collected 1 item
example.py::test_add FAILED
================================================================= FAILURES =======================================
_________________________________________________________________ test_add _______________________________________
def test_add():
> assert add(2, 3) == 5
E assert -1 == 5
E --1
E +5
example.py:6: AssertionError
========================================================= 1 failed in 0.05 seconds ==============
Notice how pytest is smart and includes context: lines that failed, values of the relevant variables.
Question: In the example above we have compared integers. In this optional exercise we want to learn how to compare floating point numbers since they are more tricky.
The following test will fail and this might be surprising. Try it out:
def add(a, b):
return a + b
def test_add():
assert add(0.1, 0.2) == 0.3
Your goal: find a more robust way to test this addition.
Show answer
One solution is to use pytest.approx
:
from pytest import approx
def add(a, b):
return a + b
def test_add():
assert add(0.1, 0.2) == approx(0.3)
But maybe you didn’t know about pytest.approx: and did this instead:
def test_add():
= add(0.1, 0.2)
result assert abs(result - 0.3) < 1.0e-7
This is OK but the 1.0e-7 can be a bit arbitrary.
TASKS
- Write and run a unit test for the
Lorenz()
function in theLorenz.py
script you developed in the Scripting lesson.
EXTENSION TASKS: Automate everything!
- Automated testing using Continuous Integration allows us to automatically run tests when there is a commit to a Github repository. Following the tutorial from The Code Refinery, implement continuous integration for a Github repository holding
Lorenz.py
(this could be the Github repository you created in the previous lesson). Note that you will have to adapt the workflow file so that pip installs the python packages imported in your script (for Lorenz.py the packages are numpy and matplotlib).