Documentation

Documentation comes in different forms

There is nothing in the programming field more despicable than an undocumented program.  — Ed Yourdon, Software Engineering pioneer

  • Tutorials: learning-oriented, allows the newcomer to get started
  • How-to guides: goal-oriented, shows how to solve a specific problem
  • Explanation: understanding-oriented, explains a concept
  • Reference: information-oriented, describes the machinery

These are distinct. For an excellent discussion, please see What nobody tells you about documentation.

There is no one size fits all: however a good starting point is almost always to include a README file with your code. Your code may, depending on the number of users apart from yourself, also include:

  • Purpose
  • Authors
  • License
  • Recommended citation
  • Copy-paste-able example to get started
  • Dependencies and their versions or version ranges
  • Installation instructions
  • Tutorials covering key functionality
  • Reference documentation (e.g. API) covering all functionality
  • How do you want to be asked questions (mailing list or forum or chat or issue tracker)
  • FAQ section
  • Contribution guide

Question: Visit the Github repository for effmass and explore the repository and documentation site.

  1. Which different types (tutorials/how-to/explanation/reference) of documentation can you find?
  2. Are all the items in the bullet point list included?
Show answer
  • There is a README file with basic installation how-to instructions and an overview of the functionality.
  • There are notebook tutorials for allowing newcomers to get started.
  • There is API reference information here which describes how each function works.
  • There is a paper which explains some of the concepts behind the code (for example the different definitions of effective mass).
  1. All of the bullet points on the list are included except i) there is no FAQ ii) the example is a Jupyter notebook and is not quickly copy-pastable (you would end up with the code and the surrounding text, which would then need to be removed).

The README file is important for users

  • The README file is important as it is usually the first thing people look at when visiting a code repository.
  • Use it to communitcate important information about your project.
  • Write your README using a markup language such as Markdown. Github will automatically format your markdown file when you push it to your Github repository.
  • The README should be in the top-level of your repository
  • To make your README extra friendly you can use emojis.
  • A README file might contains: A summary of the software purpose, author list, a link to the license file, a recommended citation, installation instructions and a list of code dependencies (e.g. numpy, matplotlib), a link to the issues tracker, and any other contact details.

In-code documentation is important for code developers

  • In-code documentation are very useful for people wanting to contribute to your code.
  • They can also be used to auto-generate online documentation for functions/classes (for more information see the extension task below).
  • The most commonly used Python in-code documentation are comments starting with # and Python docstrings

Question: Which of the comments below is the best and why?

# Now we check if temperature is larger then -50:
if temperature > -50:
    print('do something')
# We regard temperatures below -50 degrees as measurement errors
if temperature > -50:
    print('do something')
Show answer

The first comment describes what happens in this piece of code, whereas the second describes why this piece of code is there, i.e. its purpose. Comments like the second comment are much more useful.

A docstring is a structured comment associated to a segment of code (i.e. a function or class).

We have already written basic docstrings. Here we will see how to write Google style docstrings. Let’s look at the following function:

def mean_temperature(temperatures):
    return sum(temperatures)/len(temperatures)

We can make it clearer what this function does and how to use it using a docstring. A good docstring describes:

  • What the function does
  • What goes in (including the type of the input variables)
  • What goes out (including the return type)
def mean_temperature(temperatures):
    """
      Get the mean temperature

      Args:
          data (list): A list with air temperature measurements.

      Returns:
          The mean air temperature (float)
    """
    return sum(temperatures)/len(temperatures)

Two key advantages of docstrings are: - Python parses docstrings, for example calling the help function will display the docstring: try help(mean_temperature) - You can generate your documentation as you are generating the code. In fact, thinking carefully about what the input/output/behaviour should be may encourage you to write better code!

TASKS

  1. Write a docstring for the Lorenz() function in the Lorenz.py script. Follow the Google docstring format as in the example above.

EXTENSION TASKS: Automate everything!

  1. API (Application Programming Interface) documentation can be automatically generated using a tool such as pdoc3. These tools read in all of the docstrings within a Python package and generate a webpage accordingly. For example, see the API documentation for the effmass project. Automatic API documentation is a real gamechanger; as long as we write update the docstrings alongside our code, we can quickly generate new documentation. Following the documentation on pdoc3 generate a html page with the API documentation for Lorenz.py. In the next lesson you will learn how to host this page using Github Pages.