How to Code with Me - Making a CLI
One of the cardinal sins in computational science is hard-coding a file path in your analysis. This post is a guide to reorganizing your code to avoid this, and then to generating a command line interface (CLI) with click.
The best way around this is to make all of your code live inside a function that takes a file path as an argument. Here’s an example of some sinful code:
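# sinful_analysis.py
import pandas as pd

# do_analysis() and save_analysis() stand in for your own analysis code
df = pd.read_csv('/Users/cthoyt/data/example.tsv', sep='\t')  # the example data is tab-separated
analysis = do_analysis(df)
save_analysis(analysis, '/Users/cthoyt/data/analysis.tsv')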
Here’s the same code, but enlightened:
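# enlightened_analysis.py
import pandas as pd


def do_enlightened_analysis(input_path, output_path):
    """Run the analysis on the file at input_path and save the results to output_path."""
    # do_analysis() and save_analysis() stand in for your own analysis code
    df = pd.read_csv(input_path, sep='\t')
    analysis = do_analysis(df)
    save_analysis(analysis, output_path)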
The enlightened code doesn’t contain any references to the file paths on which you’re doing analysis. In fact, it can’t even be run without being given the file paths as arguments. This pattern gets you in the mindset of separating the code from the configuration for running it. Again, this is important because the file paths will change depending on who’s running the code, whether you decide to do some spring cleaning on your hard drive, or whether you get new files.
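Here’s a minimal, self-contained sketch of the enlightened pattern that you can actually run. It swaps pandas for Python’s built-in csv module so it runs anywhere, and do_analysis() and save_analysis() are hypothetical stand-ins that just count rows per category in a tab-separated file with a category column. The point is that the same function works unchanged for whatever pair of paths you hand it.

```python
import csv
import os
import tempfile


def do_analysis(rows):
    """Hypothetical stand-in: count how many rows fall into each category."""
    counts = {}
    for row in rows:
        counts[row["category"]] = counts.get(row["category"], 0) + 1
    return counts


def save_analysis(analysis, output_path):
    """Hypothetical stand-in: write the counts to a tab-separated file."""
    with open(output_path, "w", newline="") as file:
        writer = csv.writer(file, delimiter="\t")
        writer.writerow(["category", "count"])
        for category, count in sorted(analysis.items()):
            writer.writerow([category, count])


def do_enlightened_analysis(input_path, output_path):
    """Run the analysis; every file path comes in as an argument."""
    with open(input_path, newline="") as file:
        rows = list(csv.DictReader(file, delimiter="\t"))
    analysis = do_analysis(rows)
    save_analysis(analysis, output_path)


if __name__ == "__main__":
    # Make some throwaway data in a temporary directory; only the paths are machine-specific.
    directory = tempfile.mkdtemp()
    input_path = os.path.join(directory, "example.tsv")
    output_path = os.path.join(directory, "analysis.tsv")
    with open(input_path, "w") as file:
        file.write("name\tcategory\na\tx\nb\ty\nc\tx\n")
    do_enlightened_analysis(input_path, output_path)
    with open(output_path) as file:
        print(file.read())
```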
There are lots of ways you might pass the input and output paths into this function. The most obvious, since you’ve probably read my previous blog post and you’re now a packaging master, is to import enlightened_analysis and run it from the Python REPL. Another way would be to make a one-off Python script whose job is to actually run the analysis (as opposed to this example, which defines the workflow to be run). Though these are both better than the sinful analysis, they share a problem: you have to manually interact with Python to run your code.
You’re likely familiar with using the CLI for pip. Wouldn’t it be terrible if you had to write a Python script that calls pip (like R makes you do with install.packages(), ughhh!!)? You should have the same visceral reaction to writing one-off Python code just to run an analysis.
Making your first CLI
As you might have guessed, the solution is to make a CLI. Once you’ve written the function that does the hard work, the CLI’s only job should be to get the configuration (e.g., file paths) from the user and pass it to that function.
CLIs should be very, very short! If you’re putting lots of logic inside your CLI, you should probably refactor that logic into more generally reusable functions. Here’s a hello world CLI:
# cli_simple.py
if __name__ == '__main__':
    print('hello')
Then you run this Python script with python cli_simple.py. It’s boring. You didn’t need to read this guide to do this. However, you might not know what if __name__ == '__main__' means. Every Python module knows its own name and stores it in the __name__ variable: when a module is imported, __name__ is set to the module’s name. However, if you run a Python file as a script, __name__ gets set to the string '__main__' instead. This makes sure that print('hello') only ever runs when the user is actually running the script as a CLI, and not when the file is imported.
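If you want to check this for yourself, the sketch below writes a tiny throwaway module (the name name_demo is made up for illustration) to a temporary directory, imports it, and then runs it the way python name_demo.py would. It uses the standard library’s runpy module so everything happens in one process.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Write a tiny throwaway module that records the value of __name__ it sees.
directory = tempfile.mkdtemp()
module_path = os.path.join(directory, "name_demo.py")
with open(module_path, "w") as file:
    file.write("NAME_SEEN = __name__\n")

# Importing the module: __name__ is the module's own name.
sys.path.insert(0, directory)
importlib.invalidate_caches()  # make sure the freshly written file is found
imported = importlib.import_module("name_demo")
print(imported.NAME_SEEN)  # name_demo

# Running the same file as a script: __name__ is '__main__'.
script_globals = runpy.run_path(module_path, run_name="__main__")
print(script_globals["NAME_SEEN"])  # __main__
```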
However, don’t be tempted to put lots of code in the if __name__ == '__main__' block. You should always make a function main() that contains all of the code that runs the CLI, and just call it from there:
# cli_simple_2.py
def main():
    print('hello')


if __name__ == '__main__':
    main()
You still run this script with python cli_simple_2.py. Another reason we introduced main() is so that it can take care of getting information from the user. That’s where click comes in. It makes functions do all sorts of magical things to get information from command line arguments. All you have to do is use a function decorator (something starting with the @ symbol) to annotate that the function is a click.command(). If you’re not already familiar with decorators, check out this short video or this long video.
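As a quick refresher, a decorator is just a function that takes a function and returns a replacement for it, and the @ line is shorthand for reassigning the name. The toy shout decorator below is made up for illustration and has nothing to do with click.

```python
import functools


def shout(function):
    """A toy decorator: wrap `function` so that its return value is upper-cased."""

    @functools.wraps(function)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        return function(*args, **kwargs).upper()

    return wrapper


@shout
def greet(name):
    return f"hello {name}"


# The @shout line above is shorthand for: greet = shout(greet)
print(greet("world"))  # HELLO WORLD
```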
Run pip install click in your shell, then update your code to look like this:
# cli_simple_3.py
import click


@click.command()
def main():
    print('hello')


if __name__ == '__main__':
    main()
You still run this script with python cli_simple_3.py, and it does exactly the same thing as the last one. However, once your main function is a click.command(), you can do all sorts of wonderful things. The first is to pass arguments from the command line into the function. Update your code to look like the following:
# cli_simple_4.py
import click


@click.command()
@click.argument('text')
def main(text):
    print(text)


if __name__ == '__main__':
    main()
You probably noticed that the call to main() at the bottom does not include an argument for text. That’s because click’s decorators replace the original function: the thing called main is now a click command object, and when it’s called with no arguments it reads them from the command line instead. This is good, because click uses the extra decorators (e.g., click.argument(...), click.option(...)) to figure out what to pass as arguments to the function that we actually wrote. Notice that the name in @click.argument('text') matches the parameter name text. That’s no coincidence.
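You can see the replacement for yourself with click’s built-in test helper, click.testing.CliRunner, which invokes a command as if it had been called from the shell. This sketch assumes click is installed.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("text")
def main(text):
    print(text)


# The decorators replaced our function with a click Command object.
print(isinstance(main, click.Command))  # True

# CliRunner passes the arguments in as if they came from the shell.
runner = CliRunner()
result = runner.invoke(main, ["Hello World!"])
print(result.exit_code, repr(result.output))  # 0 'Hello World!\n'

# Leaving out the required argument makes click exit with a usage error.
missing = runner.invoke(main, [])
print(missing.exit_code)  # 2
```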
You can now run this script from the command line with python cli_simple_4.py. You’ll see that it yells at you for forgetting the text argument. Better not forget it next time!
Try again with python cli_simple_4.py "Hello World!" and you’ll be happy to see that you’ve reached Hello World for CLIs. From here you can do all sorts of stuff, all of which is outlined in the excellent click documentation.
click also automatically generates documentation for you, so you can always run the command with the --help flag, as in python cli_simple_4.py --help. It will give you information about all of the arguments and options, their types, and more.
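As a taste of what’s in the click docs, here’s a sketch of cli_simple_4.py extended with two options: a typed --count option with a default, and an --upper on/off flag. The option names are made up for illustration; click.option, is_flag, show_default, and click.echo are all part of click. The function’s docstring becomes the description shown by --help.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("text")
@click.option("--count", type=int, default=1, show_default=True, help="How many times to print TEXT.")
@click.option("--upper", is_flag=True, help="Print TEXT in upper case.")
def main(text, count, upper):
    """Print TEXT one or more times."""
    for _ in range(count):
        click.echo(text.upper() if upper else text)


runner = CliRunner()
result = runner.invoke(main, ["hi", "--count", "2", "--upper"])
print(repr(result.output))  # 'HI\nHI\n'

# --help lists the argument, each option with its type, and the docstring.
print(runner.invoke(main, ["--help"]).output)
```

From a shell, the equivalent call would be python cli_simple_4.py hi --count 2 --upper.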
CLIs in Package World
It’s my strong opinion that almost all code should be packaged, and the CLI is no exception. To solve our original problem, we’ll create a Python file cli.py in the same package as enlightened_analysis.py and import our function from it. Then we’ll add the right arguments with click, pass them to the right place, and profit!
# cli.py
import click

from .enlightened_analysis import do_enlightened_analysis


@click.command()
@click.argument('input_path')
@click.argument('output_path')
def main(input_path, output_path):
    """Run the enlightened analysis on INPUT_PATH and write the results to OUTPUT_PATH."""
    do_enlightened_analysis(input_path, output_path)


if __name__ == '__main__':
    main()
If you’ve done it right, the body of your CLI’s main() function should be very boring. Of course, there are other ways to organize your code, but this is a good way to do it until you’re more comfortable.
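One optional refinement is to let click check the paths for you with click.Path, so a typo in the input path fails with a friendly message before any analysis runs. The sketch below uses a hypothetical do_enlightened_analysis() stand-in that just copies the file, plus CliRunner’s isolated_filesystem() to run in a scratch directory; it assumes click is installed.

```python
import click
from click.testing import CliRunner


def do_enlightened_analysis(input_path, output_path):
    """Hypothetical stand-in for the real analysis: copy the input to the output."""
    with open(input_path) as source, open(output_path, "w") as target:
        target.write(source.read())


@click.command()
@click.argument("input_path", type=click.Path(exists=True, dir_okay=False))
@click.argument("output_path", type=click.Path(dir_okay=False))
def main(input_path, output_path):
    """Run the analysis on INPUT_PATH and write the results to OUTPUT_PATH."""
    do_enlightened_analysis(input_path, output_path)


runner = CliRunner()
with runner.isolated_filesystem():  # a temporary working directory
    with open("example.tsv", "w") as file:
        file.write("a\tb\n")
    ok = runner.invoke(main, ["example.tsv", "analysis.tsv"])
    missing = runner.invoke(main, ["does_not_exist.tsv", "analysis.tsv"])

print(ok.exit_code, missing.exit_code)  # 0 2
```

The body of main() stays just as boring; click.Path only adds validation in front of it.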
However, now we’re living in package world. In this tutorial, I’ve skipped the explanation of turning enlightened_analysis.py and cli.py into a package. I’ll assume from here that you’ve done this and named the package superanalysis. If you’re not familiar with doing that, check my previous blog post or my tutorial on YouTube.
We don’t want to interact with this code by running it as a script with python cli.py. Instead, we want to interact with it via the package. Further, if you cd into the directory where the code is and run python cli.py, you’ll get an ImportError, because relative imports like the one in cli.py don’t work outside of a Python package context. This error is a good thing: it’s a reminder that you should always live in the packaged world.
The solution is to use the -m flag of the python command. Remember that the enlightened_analysis.py and cli.py modules are in a package called superanalysis (which you should have also already installed). You can now run the CLI using python -m superanalysis.cli <input_path> <output_path>. This also sets __name__ to '__main__', the same way as running it as a script, but you’re in the Python package context!
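Here’s a sketch that reproduces both behaviors with a throwaway package on disk. The package name demo_pkg and its helpers.py module are made up for illustration; subprocess runs the current Python interpreter (sys.executable) once with the file as a plain script and once with -m.

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway package on disk (the name "demo_pkg" is hypothetical).
root = tempfile.mkdtemp()
package = os.path.join(root, "demo_pkg")
os.makedirs(package)
files = {
    "__init__.py": "",
    "helpers.py": "def greet():\n    return 'hello from a package'\n",
    "cli.py": (
        "from .helpers import greet\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    print(greet())\n"
    ),
}
for name, source in files.items():
    with open(os.path.join(package, name), "w") as file:
        file.write(source)

# Like `cd demo_pkg && python cli.py`: the relative import fails.
as_script = subprocess.run(
    [sys.executable, "cli.py"], cwd=package, capture_output=True, text=True
)
print(as_script.returncode != 0, "ImportError" in as_script.stderr)  # True True

# Like `python -m demo_pkg.cli` from the parent directory: the relative import works.
as_module = subprocess.run(
    [sys.executable, "-m", "demo_pkg.cli"], cwd=root, capture_output=True, text=True
)
print(as_module.stdout.strip())  # hello from a package
```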
Vanity is a Virtue
The -m flag can be used to run almost any Python file inside your package as a command line interface, which means you should always wrap up code for the CLI in if __name__ == '__main__' so it doesn’t accidentally get run when the module is imported.
The exception is __init__.py files, which can’t be run this way. If you were to write python -m superanalysis, Python wouldn’t run the __init__.py file as a script and would instead throw an error. If you want to associate a CLI with the package itself, you need to make an additional file called __main__.py sitting next to cli.py in the superanalysis package. As an aside, this also works in subpackages.
# __main__.py
"""Entrypoint module, in case you use `python -m superanalysis`.

Why does this file exist, and why `__main__`? For more info, read:

- https://www.python.org/dev/peps/pep-0338/
- https://docs.python.org/3/using/cmdline.html#cmdoption-m
"""

from .cli import main

if __name__ == '__main__':
    main()
This Python module simply reuses the main function we already wrote. It can basically be copied verbatim from package to package, but don’t forget to change the package name in the first line of the docstring to match yours! I like to copy it because it also links to the Python docs explaining why it works.
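To see what __main__.py buys you, this sketch builds another throwaway package (demo_app is a made-up name) whose cli.py has a main() function, then runs python -m demo_app before and after adding a __main__.py like the one above.

```python
import os
import subprocess
import sys
import tempfile


def run_package(root):
    """Run `python -m demo_app` from `root` and return the finished process."""
    return subprocess.run(
        [sys.executable, "-m", "demo_app"], cwd=root, capture_output=True, text=True
    )


# A throwaway package (hypothetical name "demo_app") with __init__.py and cli.py only.
root = tempfile.mkdtemp()
package = os.path.join(root, "demo_app")
os.makedirs(package)
with open(os.path.join(package, "__init__.py"), "w") as file:
    file.write("")
with open(os.path.join(package, "cli.py"), "w") as file:
    file.write("def main():\n    print('hello from cli.main')\n")

without_main = run_package(root)
print("cannot be directly executed" in without_main.stderr)  # True

# Adding a __main__.py that reuses cli.main makes the package itself runnable.
with open(os.path.join(package, "__main__.py"), "w") as file:
    file.write("from .cli import main\n\nif __name__ == '__main__':\n    main()\n")

with_main = run_package(root)
print(with_main.stdout.strip())  # hello from cli.main
```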
Now, you can run python -m superanalysis instead of python -m superanalysis.cli. We can also do one better. Wouldn’t it be nice to make a command so we could just run superanalysis <input_path> <output_path>? You’re in luck: because we grouped all of our code in a main() function, we can make a small addition to the entry points in setup.cfg that tells pip to automatically create a superanalysis command in your shell when it installs the package:
[options.entry_points]
console_scripts =
    superanalysis = superanalysis.cli:main
The left part is the name of the command that will be available in your shell, and the right part is the path to a module followed by a colon : and then the name of the function to be run.
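To make the module:function format concrete, here’s a simplified sketch of how such a string gets resolved. load_entry_point() is a hypothetical helper for illustration, not the code pip actually runs, and it only handles a plain function name after the colon. The wrapper script that pip generates does roughly the equivalent of importing main from superanalysis.cli and calling sys.exit(main()).

```python
import importlib


def load_entry_point(spec):
    """Resolve a 'package.module:function' string into the function it names."""
    module_name, separator, function_name = spec.partition(":")
    if not separator or not function_name:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# With superanalysis installed, this would return the click command defined in superanalysis/cli.py:
# main = load_entry_point("superanalysis.cli:main")

# A standard-library example that runs anywhere:
dumps = load_entry_point("json:dumps")
print(dumps([1, 2]))  # [1, 2]
```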
That’s pretty much it! Now you can make beautiful command line interfaces. There’s one more topic worth noting at the end of this tutorial: command line groups. They allow you to organize subcommands in your CLI and to include CLIs from other modules. It would look like this:
# cli.py
import click


@click.group()
def main():
    pass  # because this is a group, you don't actually need to do anything in it


@main.command()  # note that the main function can now register commands
def subcommand1():
    print('hello world')


@main.command()
def subcommand2():
    print('other greeting')


@main.group()
def subgroup():
    pass


@subgroup.command()
def turtle():
    print('you can go as deep as you want with subcommands')


# You can include other CLIs from other modules in your package
# to make everything much more unified
# (my_other_module is a placeholder for one of your own subpackages)
from .my_other_module.cli import main as other_module_command

main.add_command(other_module_command)
You can see that main() is decorated with click.group() instead of click.command(). This means that it can be used to issue subcommands or even subgroups! At the end, it was also used to combine CLIs from another part of the same package. This is good if you have a package that does lots of things, but you want a single unified CLI to access all of it. Just be careful with how the functions are named (they don’t always have to be called main): click names each command after the function it decorates, not the name you import it as, so if two are called the same thing there will be a name clash and one subcommand or subgroup won’t get shown. Alternatively, you can give a command an explicit name when adding it, as in main.add_command(other_module_command, name='other').
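Here’s a self-contained sketch (assuming click is installed) that exercises a group with CliRunner. The other command stands in for a CLI that would normally be imported from another module, and it’s added under an explicit name with the name parameter of add_command().

```python
import click
from click.testing import CliRunner


@click.group()
def main():
    """A top-level group; its body can stay empty."""


@main.command()
def subcommand1():
    print("hello world")


@click.command()
def other():
    """Stands in for a CLI that would normally be imported from another module."""
    print("from another module")


# Register it under an explicit name instead of its function name.
main.add_command(other, name="other-module")

runner = CliRunner()
print(runner.invoke(main, ["subcommand1"]).output.strip())  # hello world
print(runner.invoke(main, ["other-module"]).output.strip())  # from another module
```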
CLIs are really powerful! In the end, you can write in your README how you used your CLI on your data to run your experiment. This gives others the best shot at reproducing your work. Happy hunting, and see you in the next installment of “How to Code with Me”!