How to Code with Me - Organizing a Package
This blog post is the next installment in the series about all of the very particular ways I do software development in Python. This round is about where to put your code, your tests, your CLI, and the right metadata for each.
Package Structure
After following the debate on
I’ve opted to use the src/
layout as aptly described by
Ionel Cristian Mărieș,
and Hynek Schlawack.
This means that there’s a top-level tests/
directory (will come back to that
later) and no possibility of mixing up your working directory when making
imports. I also enforce the sole usage of
relative imports
to make sure there are no accidental circular imports. While I know this is
allowed and reasonable sometimes, I assume others will misuse it. Additionally,
relative imports force users to access their scripts like modules using the
command line like python -m my_module.my_submodule. This is good because I
believe there should be no such thing as Python scripts. You should always think
about packaging and how someone else will use your code later.
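To make the point about relative imports concrete, here is a minimal sketch that builds a throwaway src/ layout in a temporary directory and runs the same submodule both ways. The helpers.py file and the greet function are hypothetical names made up for the illustration, and changing into src/ stands in for actually installing the package.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def demo_relative_imports():
    """Build a throwaway src/ layout package and run a submodule two ways."""
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp) / "src" / "my_module"
        package.mkdir(parents=True)
        (package / "__init__.py").write_text('"""A hypothetical package."""\n')
        (package / "helpers.py").write_text("def greet():\n    return 'hello'\n")
        (package / "my_submodule.py").write_text(
            "from .helpers import greet\n"
            "\n"
            "if __name__ == '__main__':\n"
            "    print(greet())\n"
        )
        src = package.parent
        # Running as a module gives Python a parent package, so the
        # relative import resolves
        as_module = subprocess.run(
            [sys.executable, "-m", "my_module.my_submodule"],
            cwd=src, capture_output=True, text=True,
        )
        # Running the file as a script has no parent package, so the
        # relative import raises an ImportError
        as_script = subprocess.run(
            [sys.executable, str(package / "my_submodule.py")],
            cwd=src, capture_output=True, text=True,
        )
        return as_module, as_script
```

The first call should exit cleanly, while the second should fail with an error about an attempted relative import, which is exactly the nudge towards python -m.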
Your package should have a file called LICENSE (no extension) that tells people how they’re allowed to use your code. Even if you’re working in a company and won’t be sharing code, it’s still good practice.
An excellent resource to help you choose a license is I normally pick MIT License because it’s easy for other people to use and modify.
Ignore the Junk
This repository uses a .gitignore
file to make sure no junk gets committed.
GitHub will ask you if you want a pre-populated .gitignore
added to your repo
on creation. You can also go to to get
more options.
Things that are especially bad to commit to repos:
- compiled Python files
- Jupyter notebook checkpoint folders
- documentation builds (let ReadTheDocs take care of this!)
- tox and other automation/build tool caches
- basically any file you didn’t make on purpose
I usually use a pre-configured .gitignore that has ignores appropriate for
Python, Jupyter Notebooks, IntelliJ/PyCharm, Mac, Windows, and Linux.
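If you want to double check that none of this junk slipped in anyway, a small script can scan a list of paths. This is only a sketch: the patterns in JUNK_PATTERNS are my assumptions about typical Python junk, not the contents of any particular .gitignore, and find_junk is a hypothetical helper.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Assumed patterns for illustration; adjust them to match your own .gitignore
JUNK_PATTERNS = ["*.pyc", "__pycache__", ".ipynb_checkpoints", ".tox", "*.egg-info"]


def find_junk(paths):
    """Return the paths where any path component matches a junk pattern."""
    junk = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if any(fnmatch(part, pattern) for part in parts for pattern in JUNK_PATTERNS):
            junk.append(path)
    return junk
```

You could feed it the output of git ls-files to see whether anything that should have been ignored is already tracked.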
I use a declarative setup in all of my packages. It’s not easy to figure out everything in this documentation, so I either copy-paste from a previous project (usually pybel/pybel) or use my cookiecutter template.
First, you need to create a
file in the root of your repository when
using a declarative setup. It should always look exactly like this:
# -*- coding: utf-8 -*-

"""The setup module."""

import setuptools

if __name__ == '__main__':
    setuptools.setup()
The first section in the setup.cfg is [metadata]. For PyBEL, the start of it
looks like this:
[metadata]
# The name of the package (should be same as what's in `src/{your project name}`)
name = pybel
# The version of the package (you should start with 0.0.1-dev for new projects)
version = 0.14.6-dev
# A one line description of your package, should be the same as the module-level docstring
# in src/{your project}/
description = Parsing, validation, compilation, and data exchange of Biological Expression Language (BEL)
# The `file:` magical prefix tells it to load what's in your README.rst. You did write a nice readme, right?
long_description = file: README.rst
The next few lines describe the places where project resources live on the internet:
# [metadata]
# Where is the project
url =
# Where can people get your code
download_url =
# You can put whatever key-values here you want, but these three are special
project_urls =
Bug Tracker =
Source Code =
Documentation =
Next is author and licensing information.
# [metadata]
# Author information
author = Charles Tapley Hoyt
author_email =
# Who is actually taking care of the code? Might not be the same as the author
maintainer = Charles Tapley Hoyt
maintainer_email =
# What kind of license are you using? This uses an SPDX identifier
license = MIT
# The thing you put here is the name of the file in the same directory as the setup.cfg
license_file = LICENSE
The license_file entry obviously points to the file. The license entry is
what gets shown on PyPI, and it uses the Software Package Data Exchange (SPDX)
controlled vocabulary.
Next come the PyBEL classifiers. This is a list of trove classifiers, a controlled vocabulary for describing your project’s development status, who should use it, its topics, etc.
# [metadata]
# Search tags
classifiers =
    Development Status :: 5 - Production/Stable
    Environment :: Console
    Intended Audience :: Developers
    Intended Audience :: Science/Research
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent
    Programming Language :: Python
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.5
    Programming Language :: Python :: 3 :: Only
    Topic :: Scientific/Engineering :: Bio-Informatics
    Topic :: Scientific/Engineering :: Chemistry
Again, the license is very important! pyroma
(see below) won’t pass if you
don’t have this. Also, the other things are important too, because this will
tell users that you’re cool and only allow the newest Python versions.
Unfortunately, at the time of writing this post, I still had to support Python
3.5 in PyBEL for downstream dependencies :/
Next are the keywords, which can be whatever you want. Here’s what I’ve got for PyBEL:
# [metadata]
keywords =
    Biological Expression Language
    Domain Specific Language
    Systems Biology
    Networks Biology
Next is the [options]
section. First we’ll tell it what the requirements for
the package are:
[options]
install_requires =
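This is the part that’s hard to explain. The packages and package_dir options
are tricky… you just have to do it this way and everything magically works.
Roughly, packages = find: asks setuptools to discover your packages
automatically, and package_dir tells it that they live under src/. Then, you
basically say the same thing one more time in the [options.packages.find]
section: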
# [options]
# You're always supposed to set zip_safe = false
zip_safe = false
# If you have some non-Python files inside your `src/{your package}/` directory that you
# want to come along for the ride when other people use your code, do this
include_package_data = True
# Always tell people what Python you support! This is redundant with the classifiers, but that's how it is.
python_requires = >=3.5
# Where is my code?
packages = find:
package_dir =
    = src

[options.packages.find]
where = src
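Since python_requires repeats what the classifiers already say, it’s easy for the two to drift apart. Here is a minimal sketch of a consistency check using only the standard library. It works on a trimmed copy of the values shown above, and it assumes python_requires has the simple >=X.Y form; the function names are hypothetical.

```python
from configparser import ConfigParser

# A trimmed copy of the values shown above, just for illustration
SETUP_CFG = """
[metadata]
name = pybel
classifiers =
    Programming Language :: Python
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.5
    Programming Language :: Python :: 3 :: Only

[options]
python_requires = >=3.5
"""

PREFIX = "Programming Language :: Python :: "


def oldest_classifier_version(parser):
    """Get the lowest (major, minor) version named in the classifiers."""
    versions = []
    for line in parser["metadata"]["classifiers"].splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        parts = line[len(PREFIX):].split(".")
        # Skip entries like "3 :: Only" that do not name a minor version
        if len(parts) == 2 and all(part.isdigit() for part in parts):
            versions.append((int(parts[0]), int(parts[1])))
    return min(versions)


def required_version(parser):
    """Get (major, minor) from python_requires, assuming the form >=X.Y."""
    spec = parser["options"]["python_requires"].strip()
    if not spec.startswith(">="):
        raise ValueError(f"unsupported python_requires: {spec}")
    major, minor = spec[2:].strip().split(".")[:2]
    return int(major), int(minor)


def versions_agree(text):
    """Check that the classifiers and python_requires name the same minimum."""
    parser = ConfigParser()
    parser.read_string(text)
    return oldest_classifier_version(parser) == required_version(parser)
```

Calling versions_agree(SETUP_CFG) compares the oldest version classifier against python_requires, so it would flag a setup.cfg where one was bumped and the other forgotten.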
Testing (The Prequel Series)
To make a tiny little test that shows everything works, make the following file
in tests/
# -*- coding: utf-8 -*-
"""Test the module can be imported."""
import unittest
class TestImport(unittest.TestCase):
    """A test case for import tests."""

    def test_import(self):
        """Test that PyBEL can be imported."""
        import pybel
Now you’re ready to run some tests. Do the following in your shell:
pip install pytest
pytest tests/
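If you want the smoke test to go one step further, a sketch like the following tries to import every submodule too, which would also surface broken or circular imports. It uses the standard library’s json package as a stand-in so it runs anywhere; PACKAGE_NAME and iter_module_names are hypothetical names, and you would swap in your own package.

```python
import importlib
import pkgutil
import unittest

# Stand-in so the sketch runs anywhere; replace with your own package name
PACKAGE_NAME = "json"


def iter_module_names(package_name):
    """Yield the package name followed by the names of all of its submodules."""
    package = importlib.import_module(package_name)
    yield package_name
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        # Skip __main__ modules, since they exist to run a command line tool
        if not info.name.endswith("__main__"):
            yield info.name


class TestImportAll(unittest.TestCase):
    """A test case that imports every module in the package."""

    def test_import_all(self):
        """Test that each module can be imported without errors."""
        for name in iter_module_names(PACKAGE_NAME):
            with self.subTest(module=name):
                importlib.import_module(name)
```

Because it is a plain unittest.TestCase, pytest should pick it up from the tests/ directory just like the test above.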
Because pytest
isn’t actually a requirement to use pybel
, but it’s useful to
have installed, you can specify it in an optional requirement in the
section of your setup.cfg
like in:
testing =
docs =
You can install PyBEL with the testing and docs extras like:
pip install -e .[testing,docs]
Then you wouldn’t have to worry about the availability of pytest. However,
there’s a better way to make sure that pytest is available and, more generally,
to run any testing or build task: tox.
Building with Tox
There are three parts to automated builds with tox. First, you have to
pip install tox. Second, you have to make a file in the root of the repository
called tox.ini like below:
# The name of the default tox environment is [testenv]
[testenv]
# This is a list of commands to run as if you were in the shell yourself
commands =
    pytest tests/
# This is a list of extra dependencies to install with pip just for this testing environment
deps =
description = Run the tests using pytest.
Third, you just have to run tox
when the working directory in your shell is in
the root of the repository. Then everything is taken care of for you! tox
makes a new virtual environment, installs the repository using the
setup.cfg/, installs the tox environment-specific dependencies, then
runs the commands in order. There are tons of other options for customizing
tox listed in its documentation. Several more examples below show some of
them in use.
Since you’re using the extras in the setup.cfg, you can actually rewrite
this configuration to use them:
commands =
    pytest tests/
extras =
Packaging Metadata
I use pyroma
to make sure that I remembered to put everything in the packaging
metadata. It can be run with
python -m pip install pyroma
pyroma --min=10 .
I welcome and encourage you to copy my configuration, but don’t forget to carefully change everything to your metadata. It’s pretty embarrassing if you accidentally attribute your work to me. I’ve done it before by accident. I’ve seen others do it too…
deps =
skip_install = true
commands = pyroma --min=10 .
description = Run the pyroma tool to check the package friendliness of the project.
This environment adds the skip_install
key, which just says not to bother pip
installing the whole package for the tests. This makes sense here because
checking the metadata contained in setup.cfg
doesn’t require actually
installing the code.
Where to Put Persistent Data
If your application needs to download data that is not related to user input or configuration, it’s best that it has a default location for storing stuff that isn’t in a place a normal user will delete or corrupt. This is a situation where it might make sense to make a folder in the user’s home directory.
# -*- coding: utf-8 -*-
"""Constants for PyBEL."""
import os
__all__ = [
    'PYBEL_HOME',
]
# Have a reasonable default location
_DEFAULT_HOME = os.path.join(os.path.expanduser('~'), '.pybel')
# Allow the user to modify the location with an environment variable
PYBEL_HOME = os.path.abspath(os.getenv('PYBEL_HOME', _DEFAULT_HOME))
os.makedirs(PYBEL_HOME, exist_ok=True)
Now you can import PYBEL_HOME and do all sorts of nice os.path.join calls to
build the directory structure to help organize the data you might need. For
example, PyBEL will download a copy of Daniel Himmelstein’s
hetionet for conversion to BEL and put it
in its cache folder.
Because this kind of configuration is so ubiquitous, I’ve written a package
called pystow that supports doing this. It simplifies the previous code to:
import pystow
pybel_module = pystow.module("pybel")
PYBEL_HOME = pybel_module.base
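If you’d rather stick to the standard library, the os.path.join approach can be wrapped in a small helper that builds subdirectories on demand. This is a sketch under my own assumptions: get_home and ensure_directory are hypothetical names, not part of PyBEL or pystow, and the demo at the bottom uses a temporary directory so nothing is written to your real home folder.

```python
import os
import tempfile


def get_home(name="pybel", environ=None):
    """Get the data directory, preferring the <NAME>_HOME environment variable."""
    environ = os.environ if environ is None else environ
    default = os.path.join(os.path.expanduser("~"), f".{name}")
    return os.path.abspath(environ.get(f"{name.upper()}_HOME", default))


def ensure_directory(*parts, home=None):
    """Join the parts under the home directory, create it, and return the path."""
    directory = os.path.join(home or get_home(), *parts)
    os.makedirs(directory, exist_ok=True)
    return directory


# Demo against a temporary directory instead of the real home folder
with tempfile.TemporaryDirectory() as tmp:
    cache = ensure_directory("hetionet", home=tmp)
    created = os.path.isdir(cache)
```

Creating directories lazily like this means each part of your code can ask for the folder it needs without caring whether it already exists.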
Where to Put Configuration
Only use the following section if your package actually needs configuration. Keep in mind that there should always be reasonable defaults for anything that can be configured, so users don’t actually have to engage with configuration.
Most packages put configuration inside the ~/.config/
folder, so you should do
the same. PyBEL uses something a bit different, but here’s a mock of how loading
configuration might look.
import os
from configparser import ConfigParser

CONFIG_PATH = os.path.join(os.path.expanduser('~'), '.config', 'pybel.ini')

cfp = ConfigParser()
# read() silently skips the file if it does not exist
cfp.read(CONFIG_PATH)

try:
    config = cfp['pybel']
except KeyError:
    config = {}


def get_config(key):
    """Get PyBEL-specific configuration."""
    return config.get(key)
You could also use JSON, but ini/cfg files are ubiquitous for configuration so
it’s best to stick to what’s expected. Because I’ve written the previous code so
often, I encapsulated it in a function in pystow:
import pystow


def get_config(key):
    """Get PyBEL-specific configuration."""
    return pystow.get_config("pybel", key)
Note, this function has a few more bells and whistles than the boilerplate code for fallbacks, passthroughs, error handling, and type coercion.
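To give a feel for what fallbacks, passthroughs, and type coercion can mean in practice, here is a standard-library sketch. The lookup order (explicit passthrough, then environment variable, then ini file, then default) and the MODULE_KEY naming of the environment variable are assumptions made for this illustration, not a description of how pystow is implemented.

```python
import os
from configparser import ConfigParser


def get_config(parser, module, key, *, passthrough=None, default=None, dtype=str, environ=None):
    """Look up a configuration value with fallbacks and type coercion."""
    environ = os.environ if environ is None else environ
    # 1. A value given explicitly by the caller always wins
    if passthrough is not None:
        return dtype(passthrough)
    # 2. Next, check an environment variable like PYBEL_PORT
    env_name = f"{module}_{key}".upper()
    if env_name in environ:
        return dtype(environ[env_name])
    # 3. Then check the [module] section of the parsed ini file
    if parser.has_option(module, key):
        return dtype(parser.get(module, key))
    # 4. Finally, fall back to the default
    return default
```

For example, with an ini file containing port = 5000 under [pybel], calling get_config(parser, "pybel", "port", dtype=int) would give the integer from the file unless PYBEL_PORT is set in the environment.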
Code Style
Mercilessly use flake8
to check your code has good style. If your code doesn’t
have good style, nobody else will be able to read it. I already wrote a whole
blog post on this one called Flake8
Random Code Style Necessities
Every Python file must start with the file encoding, a newline, then the module docstring like:
# -*- coding: utf-8 -*-
"""The module-level docstring."""
This docstring has to follow flake8 rules, meaning there’s a short description
that fits on the first line and ends with a period. After that, there can be a
blank line before any other reStructuredText-formatted documentation you’d
like to add.
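The header rules above are simple enough to check mechanically. The following is a sketch of such a check using the ast module. It only encodes the rules as I’ve described them here, it is not how flake8 itself works, and check_module_header is a hypothetical name.

```python
import ast

ENCODING_LINE = "# -*- coding: utf-8 -*-"


def check_module_header(source):
    """Return a list of problems with a module's encoding line and docstring."""
    problems = []
    lines = source.splitlines()
    if not lines or lines[0] != ENCODING_LINE:
        problems.append("missing encoding line")
    elif len(lines) < 2 or lines[1].strip():
        problems.append("missing blank line after encoding")
    # ast.get_docstring returns None when the module has no docstring
    docstring = ast.get_docstring(ast.parse(source))
    if not docstring:
        problems.append("missing module docstring")
    elif not docstring.splitlines()[0].endswith("."):
        problems.append("docstring summary should end with a period")
    return problems
```

An empty list means the file starts the way I like it; anything else tells you what to fix first.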
Bonus Round: How to Code with Ben Gyori
This isn’t something I do, but to maintain a clean git history, Ben Gyori frequently reminds me to rebase on master. This keeps a more linear history of what happened and when. Here are his instructions:
git fetch --all
# This is your master
git checkout master
# This is their master
git merge --ff-only upstream/master
git rebase master <your branch name>
# Optionally
git push -f origin <your branch name>
This is by no means everything I have to say on this topic. I’ll be back with more on documentation, ReadTheDocs, using CI, checking unit test coverage, and more.