How to Code with Me - Organizing a Package

This blog post is the next installment in the series about all of the very particular ways I do software development in Python. This round is about where to put your code, your tests, your CLI, and the right metadata for each.

Package Structure

After following the debate on pypa/packaging.python.org#320, I’ve opted to use the src/ layout as aptly described by Ionel Cristian Mărieș and Hynek Schlawack.

This means that there’s a top-level tests/ directory (I’ll come back to that later) and no possibility of mixing up your working directory when making imports. I also enforce the sole usage of relative imports to make sure there are no accidental circular imports. While I know this is allowed and reasonable sometimes, I assume others will misuse it. Additionally, relative imports force users to run their scripts as modules from the command line, like python -m my_module.my_submodule. This is good because I believe there should be no such thing as Python scripts. You should always think about packaging and how someone else will use your code later.
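
Here’s a minimal sketch of what that looks like in practice. It builds a throwaway src/ layout in a temporary folder, then runs the same file once as a module and once as a plain script. The names my_package, greeting, and cli are made up for illustration, and setting PYTHONPATH is only a stand-in for actually pip installing the package.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> Path:
    """Create a throwaway src/ layout under ``root`` and return the src/ directory."""
    package = root / "src" / "my_package"
    package.mkdir(parents=True)
    (package / "__init__.py").write_text('"""A hypothetical demo package."""\n')
    (package / "greeting.py").write_text(
        'def greet(name):\n'
        '    return f"hello, {name}"\n'
    )
    # cli.py uses a relative import, so it only works when run as a module
    (package / "cli.py").write_text(
        'from .greeting import greet\n'
        '\n'
        'if __name__ == "__main__":\n'
        '    print(greet("world"))\n'
    )
    return root / "src"


def run_as_module(src: Path) -> str:
    """Run ``python -m my_package.cli`` with src/ on the path and return its output."""
    env = dict(os.environ, PYTHONPATH=str(src))
    result = subprocess.run(
        [sys.executable, "-m", "my_package.cli"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_as_script(src: Path) -> int:
    """Run cli.py directly as a script and return the exit code."""
    result = subprocess.run(
        [sys.executable, str(src / "my_package" / "cli.py")],
        capture_output=True, text=True,
    )
    return result.returncode


with tempfile.TemporaryDirectory() as directory:
    src = build_demo_package(Path(directory))
    module_output = run_as_module(src)
    script_exit_code = run_as_script(src)
```

Running it as a module prints the greeting, while running the file directly exits with an error because Python doesn’t know which package the relative import belongs to.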

Licensing

Your package should have a file called LICENSE (no extension) that tells people how they’re allowed to use your code. Even if you’re working in a company and won’t be sharing code, it’s still good practice.

An excellent resource to help you choose a license is https://choosealicense.com/. I normally pick the MIT License because it’s easy for other people to use and modify.

Ignore the Junk

This repository uses a .gitignore file to make sure no junk gets committed. GitHub will ask you if you want a pre-populated .gitignore added to your repo on creation. You can also go to gitignore.io to get more options.

Things that are especially bad to commit to repos:

compiled python files (*.pyc)
Jupyter notebook checkpoint folders (.ipynb_checkpoints/)
documentation builds (let ReadTheDocs take care of this!)
tox and other automation/build tool caches (.tox/, .pytest_cache/, .mypy_cache/, build/, dist/, etc.)
basically any file you didn’t make on purpose

I usually use this pre-configured .gitignore, which has ignores appropriate for Python, Jupyter Notebooks, IntelliJ/PyCharm, Mac, Windows, and Linux.

Packaging

I use a declarative setup in all of my packages. It’s not easy to figure out everything in this documentation, so I either copy-paste from a previous project (usually pybel/pybel) or use my cookiecutter template.

First, you need to create a setup.py file in the root of your repository when using a declarative setup. It should always look exactly like this:

# -*- coding: utf-8 -*-

"""The setup module."""

import setuptools

if __name__ == '__main__':
    setuptools.setup()

The first section in the setup.cfg is [metadata]. The top of the setup.cfg for PyBEL looks like this:

[metadata]
# The name of the package (should be same as what's in `src/{your project name}`)
name = pybel
# The version of the package (you should start with 0.0.1-dev for new projects)
version = 0.14.6-dev
# A one line description of your package, should be the same as the module-level docstring
# in src/{your project}/__init__.py
description = Parsing, validation, compilation, and data exchange of Biological Expression Language (BEL)
# The `file:` magical prefix tells it to load what's in your README.rst. You did write a nice readme, right?
long_description = file: README.rst

The next few lines describe the places where project resources live on the internet:

# [metadata]
# Where is the project
url = https://github.com/pybel/pybel
# Where can people get your code
download_url = https://github.com/pybel/pybel/releases
# You can put whatever key-values here you want, but these three are special
project_urls =
    Bug Tracker = https://github.com/pybel/pybel/issues
    Source Code = https://github.com/pybel/pybel
    Documentation = https://pybel.readthedocs.io

Next is author and licensing information.

# [metadata]
# Author information
author = Charles Tapley Hoyt
author_email = cthoyt@gmail.com
# Who is actually taking care of the code? Might not be the same as the author
maintainer = Charles Tapley Hoyt
maintainer_email = cthoyt@gmail.com

# What kind of license are you using? This uses a SPDX identifier (https://spdx.org/licenses/)
license = MIT
# The thing you put here is the name of the file in the same directory as the setup.cfg
license_file = LICENSE

The license_file entry points to the LICENSE file itself. The license entry is what gets shown on PyPI, and it uses the Software Package Data Exchange (SPDX) controlled vocabulary.

Next come the PyBEL classifiers. This is a list of trove classifiers, a controlled vocabulary for describing your project’s development status, who should use it, its topics, etc.

# [metadata]
# Search tags
classifiers =
    Development Status :: 5 - Production/Stable
    Environment :: Console
    Intended Audience :: Developers
    Intended Audience :: Science/Research
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent
    Programming Language :: Python
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.5
    Programming Language :: Python :: 3 :: Only
    Topic :: Scientific/Engineering :: Bio-Informatics
    Topic :: Scientific/Engineering :: Chemistry

Again, the license is very important! pyroma (see below) won’t pass if you don’t have this. Also, the other things are important too, because this will tell users that you’re cool and only allow the newest Python versions. Unfortunately, at the time of writing this post, I still had to support Python 3.5 in PyBEL for downstream dependencies :/

Next are the keywords, which can be whatever you want. Here’s what I’ve got for PyBEL:

# [metadata]
keywords =
    Biological Expression Language
    BEL
    Domain Specific Language
    DSL
    Systems Biology
    Networks Biology
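
If the indented multi-line values look strange, it can help to see how an ini parser reads them. This is a sketch that uses the standard library’s configparser on a trimmed copy of the metadata above. It only illustrates the file format; setuptools does its own handling of things like the file: prefix, which this doesn’t cover. The read_list helper is a hypothetical name.

```python
from configparser import ConfigParser

# A trimmed copy of the [metadata] section from above
SETUP_CFG = """\
[metadata]
name = pybel
keywords =
    Biological Expression Language
    BEL
classifiers =
    Development Status :: 5 - Production/Stable
    License :: OSI Approved :: MIT License
"""


def read_list(parser: ConfigParser, section: str, option: str) -> list:
    """Split an indented multi-line value into a list of stripped, non-empty lines."""
    raw = parser.get(section, option)
    return [line.strip() for line in raw.splitlines() if line.strip()]


parser = ConfigParser()
parser.read_string(SETUP_CFG)

name = parser.get("metadata", "name")
keywords = read_list(parser, "metadata", "keywords")
classifiers = read_list(parser, "metadata", "classifiers")
```

The indentation is what tells the parser that each line continues the value of the key above it, which is why the colons inside the classifiers don’t get mistaken for new keys.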

Next is the [options] section. First we’ll tell it what the requirements for the package are:

[options]
install_requires =
    networkx>=2.1
    sqlalchemy
    click
    click-plugins
    bel_resources>=0.0.3
    more_itertools
    requests
    requests_file
    pyparsing
    tqdm

This is the part that’s hard to explain. The packages and package_dir options are tricky… you just have to do it this way and everything magically works. Then, you basically say the same thing one more time in the [options.packages.find] section.

# [options]
# You're always supposed to set zip_safe = false
zip_safe = false
# If you have some non-python files inside your `src/{your package}/` directory that you
# want to come along for the ride when other people use your code, do this
include_package_data = True
# Always tell people what python you support! It's redundant with the classifiers, but that's how it is.
python_requires = >=3.5

# Where is my code?
packages = find:
package_dir =
    = src

[options.packages.find]
where = src

Testing (The Prequel Series)

To make a tiny little test that shows everything works, create the following file at tests/test_import.py:

# -*- coding: utf-8 -*-

"""Test the module can be imported."""

import unittest

class TestImport(unittest.TestCase):
    """A test case for import tests."""

    def test_import(self):
        """Test that PyBEL can be imported."""
        import pybel

Now you’re ready to run some tests. Do the following in your shell:

pip install pytest
pytest tests/

pytest isn’t actually required to use pybel, but it’s useful to have installed, so you can specify it as an optional requirement in the [options.extras_require] section of your setup.cfg:

[options.extras_require]
testing =
    pytest
docs =
    sphinx
    sphinx-rtd-theme
    sphinx-click
    sphinx-autodoc-typehints

You can install PyBEL with the testing and docs extras like:

pip install -e .[testing,docs]

Then you wouldn’t have to worry about the availability of pytest. However, there’s a better way to make sure that pytest is available and, more generally, to run any testing or build task: tox.

Building with Tox

There are three parts to automated builds with tox. First, you have to pip install tox. Second, you have to make a file in the root of the repository called tox.ini like below:

# The name of the default tox environment is [testenv]
[testenv]
# This is a list of commands to run as if you were in the shell yourself
commands =
    pytest tests/
# This is a list of extra dependencies to install with pip just for this testing environment
deps =
    pytest
description = Run the tests using pytest.

Third, you just have to run tox when the working directory in your shell is the root of the repository. Then everything is taken care of for you! tox makes a new virtual environment, installs the repository using the setup.cfg/setup.py, installs the tox environment-specific dependencies, then runs the commands in order. There are tons of other options available for customizing tox listed in its documentation. There will be several more examples here that show some of them being used.

Since you’re using the extras in the setup.cfg, you can actually rewrite this configuration to use them:

[testenv]
commands =
    pytest tests/
extras =
    testing

Packaging Metadata

I use pyroma to make sure that I remembered to put everything in the packaging metadata. It can be run with

python -m pip install pyroma
pyroma --min=10 .

I welcome and encourage you to copy my configuration, but don’t forget to carefully change everything to your metadata. It’s pretty embarrassing if you accidentally attribute your work to me. I’ve done it before by accident. I’ve seen others do it too…

[testenv:pyroma]
deps =
    pygments
    pyroma
skip_install = true
commands = pyroma --min=10 .
description = Run the pyroma tool to check the package friendliness of the project.

This environment adds the skip_install key, which just says not to bother pip installing the whole package for the tests. This makes sense here because checking the metadata contained in setup.cfg doesn’t require actually installing the code.

Where to Put Persistent Data

If your application needs to download data that is not related to user input or configuration, it’s best that it has a default location for storing stuff that isn’t in a place a normal user will delete or corrupt. This is a situation where it might make sense to make a folder in the user’s home directory.

# -*- coding: utf-8 -*-

"""Constants for PyBEL."""

import os

__all__ = [
    'PYBEL_HOME',
]

# Have a reasonable default location
_DEFAULT_HOME = os.path.join(os.path.expanduser('~'), '.pybel')
# Allow the user to modify the location with an environment variable
PYBEL_HOME = os.path.abspath(os.getenv('PYBEL_HOME', _DEFAULT_HOME))
os.makedirs(PYBEL_HOME, exist_ok=True)

Now you can import PYBEL_HOME and do all sorts of nice os.path.joins to build the directory structure to help organize the data you might need. For example, PyBEL will download a copy of Daniel Himmelstein’s hetionet for conversion to BEL and put it in its cache folder.
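
As a small sketch of those os.path.joins, here’s a helper that builds a subfolder and makes sure it exists. The name ensure_subdirectory and the cache/hetionet folder names are hypothetical, and a temporary folder stands in for PYBEL_HOME so the example doesn’t touch your real home directory.

```python
import os
import tempfile


def ensure_subdirectory(home: str, *parts: str) -> str:
    """Join ``parts`` onto ``home``, create the folder if needed, and return its path."""
    path = os.path.join(home, *parts)
    os.makedirs(path, exist_ok=True)
    return path


# A temporary folder stands in for PYBEL_HOME in this sketch
with tempfile.TemporaryDirectory() as demo_home:
    cache = ensure_subdirectory(demo_home, "cache", "hetionet")
    cache_exists = os.path.isdir(cache)
    cache_relative = os.path.relpath(cache, demo_home)
```

In a real package you’d call it like ensure_subdirectory(PYBEL_HOME, "cache", "hetionet") and write your downloads into the folder it returns.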

Because this kind of configuration is so ubiquitous, I’ve written a package called pystow that supports doing this and simplifies the constants module above to:

import pystow

pybel_module = pystow.module("pybel")
PYBEL_HOME = pybel_module.base

Where to Put Configuration

Only use the following section if your package actually needs configuration. Keep in mind that there should always be reasonable defaults for anything that can be configured, so users don’t actually have to engage with configuration.

Most packages put configuration inside the ~/.config/ folder, so you should do the same. PyBEL uses something a bit different, but here’s a mock of how loading configuration might look.

import os
from configparser import ConfigParser

CONFIG_PATH = os.path.join(os.path.expanduser('~'), '.config', 'pybel.ini')

cfp = ConfigParser()
cfp.read(CONFIG_PATH)

try:
    config = cfp['pybel']
except KeyError:
    config = {}


def get_config(key):
    """Get PyBEL-specific configuration."""
    return config.get(key)
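
For reference, the mock above expects an ini file with a [pybel] section. Here’s a sketch of the same lookup done on an in-memory string so you can see both the file format and the fallback to an empty configuration. The connection key and its value are made up for illustration, and load_section is a hypothetical helper.

```python
from configparser import ConfigParser

# What a hypothetical ~/.config/pybel.ini might contain
EXAMPLE_INI = """\
[pybel]
connection = sqlite:///example.db
"""


def load_section(text: str, section: str = "pybel") -> dict:
    """Parse ini-formatted text and return one section as a dict (empty if it's missing)."""
    parser = ConfigParser()
    parser.read_string(text)
    if parser.has_section(section):
        return dict(parser[section])
    return {}


config = load_section(EXAMPLE_INI)
empty_config = load_section("")

connection = config.get("connection")
missing = empty_config.get("connection")
```

Falling back to an empty dict means a missing file or a missing section just gives you None for every key, so the package’s own defaults can take over.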

You could also use JSON, but ini/cfg files are ubiquitous for configuration so it’s best to stick to what’s expected. Because I’ve written the previous code so often, I encapsulated it in a function in pystow:

import pystow

def get_config(key):
    """Get PyBEL-specific configuration."""
    return pystow.get_config("pybel", key)

Note that pystow’s function has a few more bells and whistles than the boilerplate code above, such as fallbacks, passthroughs, error handling, and type coercion.

Code Style

Mercilessly use flake8 to check that your code has good style. If your code doesn’t have good style, nobody else will be able to read it. I already wrote a whole blog post on this one called Flake8 Hell.

Random Code Style Necessities

Every Python file must start with the file encoding, a newline, then the module docstring like:

# -*- coding: utf-8 -*-

"""The module-level docstring."""

This docstring has to follow flake8 rules, meaning there’s a short description that fits on the first line and ends with a period. After that, there can be a blank line followed by any other reStructuredText-formatted documentation you’d like.
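
Here’s a sketch of what that looks like for a function too. The greet function is a made-up example, and the :param: and :returns: fields are one common reStructuredText convention rather than the only acceptable one.

```python
# -*- coding: utf-8 -*-

"""Utilities for greeting people."""


def greet(name: str) -> str:
    """Build a greeting for the given name.

    :param name: The name of the person to greet
    :returns: A friendly greeting
    """
    return f"Hello, {name}!"


# The first line of the docstring is the short description
summary = greet.__doc__.splitlines()[0]
```

The short first line is what shows up in most documentation summaries, so it’s worth making it a complete sentence.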

Bonus Round: How to Code with Ben Gyori

This isn’t something I do, but to maintain a clean git history, Ben Gyori frequently reminds me to rebase on master. This keeps a more linear history of what happened and when. Here are his instructions:

git fetch --all
# This is your master
git checkout master
# This is their master
git merge --ff-only upstream/master
git rebase master <your branch name>
# Optionally
git push -f origin <your branch name>

This is by no means everything I have to say on this topic. I’ll be back with more on documentation, ReadTheDocs, using CI, checking unit test coverage, and more.