One of the cardinal sins in computational science is to hard-code a file path in your analysis. This post is a guide to reorganizing your code to avoid this, and then to generating a command line interface (CLI) for it using click.

The best way around this is to make all of your code live inside a function that takes a file path as an argument. Here’s an example of some sinful code:

# sinful_analysis.py
import pandas as pd

df = pd.read_csv('/Users/cthoyt/data/example.tsv', sep='\t')
analysis = do_analysis(df)
save_analysis(analysis, '/Users/cthoyt/data/analysis.tsv')

Here’s the same code, but enlightened:

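# enlightened_analysis.py
import pandas as pd

# do_analysis() and save_analysis() stand in for your own analysis functions.

def do_enlightened_analysis(input_path, output_path):
    df = pd.read_csv(input_path, sep='\t')  # the example data is tab-separated
    analysis = do_analysis(df)
    save_analysis(analysis, output_path)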

The enlightened code doesn’t contain any references to the file paths on which you’re doing analysis. In fact, it can’t even be run without passing the file paths in as arguments. This pattern gets you into the mindset of separating the code from the configuration used to run it. Again, this matters because the file paths will change depending on who’s running the code, whether you do some spring cleaning on your hard drive, or when you get new files.
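To make the pattern concrete, here is a minimal runnable sketch. It swaps pandas for the standard-library csv module so it has no dependencies. Here do_analysis and save_analysis are hypothetical stand-ins that count rows per value of a made-up "group" column; substitute whatever your real analysis does. The file paths only show up at the very end, where the caller decides them.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def do_analysis(rows):
    """Hypothetical analysis: count how many rows share each value of the 'group' column."""
    return Counter(row["group"] for row in rows)


def save_analysis(analysis, output_path):
    """Hypothetical writer: save the counts as a two-column TSV file."""
    with open(output_path, "w", newline="") as file:
        writer = csv.writer(file, delimiter="\t", lineterminator="\n")
        writer.writerow(["group", "count"])
        for group, count in sorted(analysis.items()):
            writer.writerow([group, count])


def do_enlightened_analysis(input_path, output_path):
    """Read a TSV file, analyze it, and write the results; no paths are hard-coded."""
    with open(input_path, newline="") as file:
        rows = list(csv.DictReader(file, delimiter="\t"))
    analysis = do_analysis(rows)
    save_analysis(analysis, output_path)


# The configuration (the two paths) is chosen by the caller, not baked into the code.
# Temporary files are used here only so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "example.tsv"
    output_path = Path(tmp) / "analysis.tsv"
    input_path.write_text("name\tgroup\na\tx\nb\ty\nc\tx\n")
    do_enlightened_analysis(input_path, output_path)
    print(output_path.read_text())
```

The same function can be called with completely different paths on someone else’s machine without touching its body.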

There are lots of ways you might pass the input and output paths into this function. The most obvious, since you’ve probably read my previous blog post and are now a packaging master, is to import enlightened_analysis and call the function from the Python REPL. Another way would be to write a one-off Python script whose only job is to run the analysis (as opposed to this module, which defines the workflow to be run). Though both are better than the sinful analysis, they still force you to interact with Python manually every time you want to run your code.

You’re likely familiar with using pip from the command line. Wouldn’t it be terrible if you had to write a Python script just to call pip (like R makes you do with install.packages(), ughhh!!)? You should have the same visceral reaction to writing one-off Python code just to run an analysis.

Making your first CLI

As you might have guessed, the solution is to make a CLI. Once you have a function that does the hard work, the CLI’s only job is to get the configuration (e.g., file paths) from the user and pass it to that function.

CLIs should be very, very short! If you’re putting lots of logic inside your CLI, you should probably refactor that logic into more generally reusable functions. Here’s an example of a hello world CLI:

# cli_simple.py

if __name__ == '__main__':
    print('hello')

Then you run this Python script with python cli_simple.py. It’s boring. You didn’t need to read this guide to do this. However, you might not know what if __name__ == '__main__' is. It turns out that every Python module knows its own name: when a module is imported, Python stores that name in the module’s __name__ variable. However, if you run a Python file as a script, __name__ is set to the string '__main__' instead. This lets you make sure that print('hello') only runs when the user actually runs the script as a CLI, and not when someone imports the file.
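If you want to see __name__ change for yourself, the following sketch writes a throwaway module to a temporary directory. The file name cli_demo.py is just for illustration. The sketch first imports the module with importlib, then runs it the way python cli_demo.py would, using the standard-library runpy module with run_name="__main__".

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# A tiny module that records whether its __main__ guard fired.
SOURCE = """\
print("__name__ is", repr(__name__))
ran_as_script = False
if __name__ == "__main__":
    ran_as_script = True
"""


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cli_demo.py"
        path.write_text(SOURCE)

        # 1. Import it: __name__ is the module's name, so the guard is skipped.
        spec = importlib.util.spec_from_file_location("cli_demo", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # 2. Run it as a script: runpy sets __name__ to "__main__", like `python cli_demo.py`.
        script_globals = runpy.run_path(str(path), run_name="__main__")

    return module.ran_as_script, script_globals["ran_as_script"]


print(demo())  # prints the two __name__ lines, then (False, True)
```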

However, don’t be tempted to put lots of code in if __name__ == '__main__'. Instead, always write a main() function that contains all of the code that runs the CLI, and just call it from there:

# cli_simple_2.py

def main():
    print('hello')

if __name__ == '__main__':
    main()

You still run this script with python cli_simple_2.py. Another reason we introduced main() is that it gives us a place to take care of getting information from the user. That’s where click comes in. It lets a function receive its arguments from the command line with very little extra code. All you have to do is use a function decorator (something starting with the @ symbol) to mark the function as a click.command(). If you’re not already familiar with decorators, check this short video or this long video.
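If decorators are new to you, the key fact is that the @ line is shorthand for passing the function below it to another function and rebinding the name to whatever comes back. Here is a minimal sketch with a made-up decorator called announce. It has nothing to do with click.

```python
import functools


def announce(func):
    """A toy decorator: wraps func so it prints a message before running."""
    @functools.wraps(func)  # keep the original name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@announce  # equivalent to writing: greet = announce(greet)
def greet(name):
    return f"hello {name}"


print(greet("world"))  # prints "calling greet", then "hello world"
```

click.command() works on the same principle, except that what it hands back is a click command object rather than a plain function.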

Run pip install click in your shell then update your code to look like this:

# cli_simple_3.py
import click

@click.command()
def main():
    print('hello')

if __name__ == '__main__':
    main()

You still run this script with python cli_simple_3.py, and it does exactly the same thing as the last one. However, once your main function is a click.command(), you can do all sorts of wonderful things. The first is to pass arguments from the command line into the function. Update your code to look like the following:

# cli_simple_4.py
import click

@click.command()
@click.argument('text')
def main(text):
    print(text)

if __name__ == '__main__':
    main()

You’ve probably noticed that the call to main() at the bottom doesn’t pass anything for text. That’s because the decorators replaced the original function: the name main now refers to a click command object which, when called with no arguments, reads its values from the command line (sys.argv). click uses the extra decorators (e.g., click.argument(...) and click.option(...)) to work out what to pass into the function that we actually wrote. Notice that the name in @click.argument('text') matches the function’s parameter name. That’s no coincidence.
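You can confirm this in a Python session. The sketch below rebuilds cli_simple_4.py, using click.echo (click’s print-like helper), and inspects what the decorators left behind:

- main is a click.Command.
- Its params list records the text argument.
- The original function is still available as main.callback.

Passing an explicit list of arguments together with standalone_mode=False tells click not to read sys.argv or call sys.exit, which is handy for experimenting.

```python
import click


@click.command()
@click.argument("text")
def main(text):
    click.echo(text)


# The decorators replaced the function with a click Command object.
print(isinstance(main, click.Command))        # True
print([param.name for param in main.params])  # ['text']

# The function we wrote is kept as the command's callback.
main.callback("called the original function directly")

# Calling the command parses a list of strings, just like command-line arguments.
main(["parsed from an argument list"], standalone_mode=False)
```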

You can now run this script from the command line with python cli_simple_4.py. You’ll see that it yells at you for forgetting the text argument. Better not! Try again with python cli_simple_4.py "Hello World!" and you’ll be happy to see you’re now at Hello World for CLIs. From here you can do all sorts of stuff which is all outlined in the excellent click documentation.

click also automatically generates documentation for you, so you can always run the command with the --help flag, as in python cli_simple_4.py --help. It shows a usage line listing the expected arguments, along with the available options and their help text. If your function has a docstring, click uses it as the command’s description.
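click also ships a test helper, click.testing.CliRunner, that invokes a command in-process and captures its output and exit code. The sketch below uses it to replay three runs: one with the argument missing, one with a real argument, and one with --help. The exact wording of click’s messages varies a little between versions, so it is safest to only rely on stable fragments like "Missing argument" and "Usage:".

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("text")
def main(text):
    """Print TEXT back to the terminal."""
    click.echo(text)


runner = CliRunner()

missing = runner.invoke(main, [])              # forgot the argument
hello = runner.invoke(main, ["Hello World!"])  # the happy path
help_page = runner.invoke(main, ["--help"])    # auto-generated documentation

print(missing.exit_code)  # 2, click's exit code for usage errors
print(hello.output)       # Hello World!
print(help_page.output)   # usage line, the docstring, and the options
```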

CLIs in Package World

It’s my strong opinion that almost all code should be packaged, and the CLI is no exception. To finish solving our original problem, we’ll create a Python file, cli.py, next to enlightened_analysis.py in the package, and import our function from there. Then we’ll add the right arguments to click, pass them to the right place, and profit!

# cli.py
import click
from .enlightened_analysis import do_enlightened_analysis

@click.command()
@click.argument('input_path')
@click.argument('output_path')
def main(input_path, output_path):
    do_enlightened_analysis(input_path, output_path)

if __name__ == '__main__':
    main()

If you’ve done it right, the body of your CLI’s main() function should be very boring. Of course, there are other ways to organize your code, but this is a good way to do it until you’re more comfortable.
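One small, optional improvement is to let click validate the paths before your analysis starts. With click.Path(exists=True, dir_okay=False), click rejects an input path that doesn’t exist or is a directory. The user gets a usage error instead of a traceback from deep inside pandas. In this sketch, do_enlightened_analysis is a hypothetical stub that just copies the input to the output so the example runs on its own. In your package it would be the import from .enlightened_analysis.

```python
import click
from click.testing import CliRunner


def do_enlightened_analysis(input_path, output_path):
    """Hypothetical stand-in for the real analysis: copies the input file to the output."""
    with open(input_path) as source, open(output_path, "w") as target:
        target.write(source.read())


@click.command()
@click.argument("input_path", type=click.Path(exists=True, dir_okay=False))
@click.argument("output_path", type=click.Path(dir_okay=False))
def main(input_path, output_path):
    """Analyze INPUT_PATH and write the results to OUTPUT_PATH."""
    do_enlightened_analysis(input_path, output_path)


# In cli.py you would keep the `if __name__ == '__main__': main()` guard; here,
# click's test runner exercises the command in a throwaway directory instead.
runner = CliRunner()
with runner.isolated_filesystem():
    with open("example.tsv", "w") as file:
        file.write("name\tgroup\na\tx\n")
    good = runner.invoke(main, ["example.tsv", "analysis.tsv"])
    bad = runner.invoke(main, ["does_not_exist.tsv", "analysis.tsv"])

print(good.exit_code, bad.exit_code)  # 0 2
```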

However, now we’re living in package world. In this tutorial, I’ve skipped the explanation of turning enlightened_analysis.py and cli.py into a package. I’ll assume from here that you’ve done this and named the package superanalysis. If you’re not familiar with doing that, check my previous blog post or my tutorial on YouTube.

We don’t want to interact with this code by running it as a script with python cli.py. Instead, we want to interact with it via the package. Further, if you cd into the directory containing the code and run python cli.py, you’ll get an ImportError, because relative imports (the leading dot in from .enlightened_analysis import do_enlightened_analysis) don’t work outside of a Python package context. This error is a good thing: it’s a reminder that you should always live in the packaged world.

The solution to the problem is to use the -m flag of the python command. Remember that the enlightened_analysis.py and cli.py modules are in a package called superanalysis (which you should also have already installed). You can now run the CLI using python -m superanalysis.cli <input_path> <output_path>. This also sets __name__ to '__main__', just like running it as a script, but you’re in the Python package context!
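To see why -m matters, here is a sketch that builds a throwaway package in a temporary directory. The package demopkg and its two modules are made-up stand-ins for superanalysis. The sketch runs the package’s cli.py both ways with the current interpreter, via subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a hypothetical package 'demopkg' with an analysis module and a CLI module."""
    package = root / "demopkg"
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "analysis.py").write_text(
        "def greet():\n"
        "    return 'hello from the package'\n"
    )
    (package / "cli.py").write_text(
        "from .analysis import greet  # relative import, like cli.py in this post\n"
        "\n"
        "def main():\n"
        "    print(greet())\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    )


def run_python(args, cwd):
    """Run the current interpreter with the given arguments and capture its output."""
    return subprocess.run([sys.executable, *args], cwd=cwd, capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    as_script = run_python([str(Path("demopkg") / "cli.py")], cwd=tmp)  # like `python cli.py`
    as_module = run_python(["-m", "demopkg.cli"], cwd=tmp)              # like `python -m ...`

print(as_script.stderr.strip().splitlines()[-1])  # an ImportError about the relative import
print(as_module.stdout.strip())                   # hello from the package
```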

Vanity is a Virtue

The -m flag can be used to run almost any Python file inside your package as a command line interface, which means you should always wrap the code for the CLI in if __name__ == '__main__' so it doesn’t accidentally get run when the module is imported.

The exception is __init__.py: a package’s __init__.py can’t be run this way. If you were to write python -m superanalysis, Python wouldn’t run the __init__.py file as a script and would instead throw an error saying that the package cannot be directly executed. If you want to associate a CLI with the package itself, you need to make an additional file called __main__.py sitting next to cli.py in the superanalysis package. As an aside, this also works in subpackages.

# __main__.py

"""Entrypoint module, in case you use `python -m superanalysis`.

Why does this file exist, and why `__main__`? For more info, read:

 - https://www.python.org/dev/peps/pep-0338/
 - https://docs.python.org/3/using/cmdline.html#cmdoption-m
"""

from .cli import main

if __name__ == '__main__':
    main()

This Python module simply reuses the main function we already wrote. It can basically be copied verbatim from package to package, but don’t forget to change the package name in the first line of the docstring to match yours! I like to copy it because it also carries the links to the Python docs explaining why it works.
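The effect of __main__.py is easy to check with the same subprocess trick. This sketch again uses a made-up package name, demopkg. It runs python -m demopkg twice:

- Without a __main__.py, where Python refuses to execute the package.
- After writing a __main__.py that, like the one above, just imports and calls main from cli.py.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(args, cwd):
    """Run the current interpreter with the given arguments and capture its output."""
    return subprocess.run([sys.executable, *args], cwd=cwd, capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "demopkg"  # hypothetical stand-in for superanalysis
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "cli.py").write_text(
        "def main():\n"
        "    print('running the package CLI')\n"
    )

    # Without __main__.py, `python -m demopkg` has nothing to run.
    before = run_python(["-m", "demopkg"], cwd=tmp)

    # A minimal __main__.py that reuses the CLI's main function.
    (package / "__main__.py").write_text(
        "from .cli import main\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    )
    after = run_python(["-m", "demopkg"], cwd=tmp)

print(before.returncode, "cannot be directly executed" in before.stderr)  # 1 True
print(after.stdout.strip())  # running the package CLI
```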

Now you can run python -m superanalysis instead of python -m superanalysis.cli. We can also do one better. Wouldn’t it be nice to just run superanalysis <input_path> <output_path>? You’re in luck: since we grouped all of our CLI code in a main() function, we can add a small entry to the entry points in setup.cfg that tells pip to automatically create a superanalysis command in your shell:

[options.entry_points]
console_scripts =
    superanalysis = superanalysis.cli:main

The left-hand side is the name of the command that will be available in your shell. The right-hand side is the dotted path to a module, followed by a colon (:), followed by the name of the function to run.
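Roughly speaking, the installer turns that module:function string into a small wrapper script that imports the function and calls it. The sketch below shows only the lookup step, using importlib. The helper name load_entry_point is hypothetical, and this is not pip’s actual code. A standard-library target (json:dumps) stands in for superanalysis.cli:main so the example runs without the package installed. For installed packages, importlib.metadata.entry_points() exposes the registered entry points.

```python
import importlib
import json


def load_entry_point(spec):
    """Resolve a 'package.module:function' string, as used in console_scripts, to an object.

    Hypothetical helper that mimics the lookup; it is not the installer's real implementation.
    """
    module_name, _, function_name = spec.partition(":")
    module = importlib.import_module(module_name)  # e.g. superanalysis.cli
    return getattr(module, function_name)          # e.g. main


function = load_entry_point("json:dumps")
print(function is json.dumps)  # True
```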

That’s pretty much it! Now you can make beautiful command line interfaces. There’s one more topic that I think is worth noting at the end of this tutorial, and that’s to use command line groups. This allows you to organize subcommands in your CLI and import other CLIs from other modules. It would look like this:

# cli.py

import click

@click.group()
def main():
    pass  # because this is a group, you don't actually need to do anything in it

@main.command()  # note that the main function now can assign commands
def subcommand1():
    print('hello world')

@main.command()
def subcommand2():
    print('other greeting')

@main.group()
def subgroup():
    pass

@subgroup.command()
def turtle():
    print('you can go as deep as you want with subcommands')

# You can include other CLIs from other modules in your package
# to make everything much more unified
from .my_other_module.cli import main as other_module_command
main.add_command(other_module_command)

You can see that rather than getting the click.command() decorator, main() got the click.group() decorator. This means that it can be used to issue subcommands or even subgroups! At the end, it was also used to combine CLIs from another part of the same package. This is good if you have a package that does lots of things, but you want a single unified CLI to access all of it. Just be careful with how the commands are named (the function doesn’t always have to be called main). By default, a command’s name comes from its function name, and if two commands or groups end up with the same name, they clash and only one of them will be available.
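Here is a self-contained version of the group example that you can poke at with click’s CliRunner. The other module in the snippet above is hypothetical, so this sketch defines a stand-in command inline and registers it under an explicit name with add_command(..., name=...).

It also shows the clash described above. The two made-up status commands share a name. In the click versions I’ve used, a group keeps its commands in a dictionary keyed by name, so the second registration quietly replaces the first.

```python
import click
from click.testing import CliRunner


@click.group()
def main():
    """Top-level group; it only dispatches to subcommands."""


@main.command()
def subcommand1():
    click.echo("hello world")


@main.group()
def subgroup():
    """A nested group."""


@subgroup.command()
def turtle():
    click.echo("you can go as deep as you want with subcommands")


# Hypothetical stand-in for a CLI imported from another module of the package.
@click.command()
def other_main():
    click.echo("from another module")


# Registering with an explicit name avoids depending on the function's name.
main.add_command(other_main, name="other")


# Two hypothetical commands that end up with the same name.
@click.command(name="status")
def status_a():
    click.echo("status from module A")


@click.command(name="status")
def status_b():
    click.echo("status from module B")


main.add_command(status_a)
main.add_command(status_b)  # replaces status_a under the name "status"

runner = CliRunner()
print(runner.invoke(main, ["subgroup", "turtle"]).output)
print(runner.invoke(main, ["other"]).output)
print(runner.invoke(main, ["status"]).output)  # only module B's version survives
print(sorted(main.commands))
```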


CLIs are really powerful! In the end, you can write in your README how you used your CLI on your data to run your experiment. This gives others the best shot at reproducing your work. Happy hunting, and see you in the next installment of “How to Code with Me”!