How to Code with Me - Making a CLI
One of the cardinal sins in computational science is to hard code a file path in your analysis. This post is a guide to reorganizing your code to avoid this and then to generate a command line interface (CLI) using click.
The best way around this is to make all of your code live inside a function that takes a file path as an argument. Here’s an example of some sinful code:
# sinful_analysis.py
import pandas as pd
df = pd.read_csv('/Users/cthoyt/data/example.tsv')
analysis = do_analysis(df)
save_analysis(analysis, '/Users/cthoyt/data/analysis.tsv')
Here’s the same code, but enlightened:
# enlightened_analysis.py
import pandas
def do_enlightened_analysis(input_path, output_path):
df = pd.read_csv(input_path)
analysis = do_analysis(df)
save_analysis(analysis, output_path)
The enlightened code doesn’t contain any references to the file paths on which you’re doing analysis. In fact, the enlightened code can’t even be run directly without passing the file paths as variables. This pattern gets you in the mindset of separating the code from the configuration for running the code. Again, this is important because the file path will change depending on who’s running it, if you decide to do spring cleaning on your hard drive, or if you get new files.
There are lots of ways you might pass the input and output paths into this function. The
most obvious, since you’ve probably read my previous blog post
and you’re now a packaging master, is to
import enlightened_analysis
and run it from the Python REPL. Another way would be to
make a one-off Python script whose job is to actually run the analysis (as opposed to this
example, which is creating the workflow to be run). Though this are both better than the
sinful analysis, it’s a problem since you have to manually interact with Python to run your code.
You’re likely familiar with using the CLI for pip
.
Wouldn’t it be terrible if you had to write a Python script
that calls pip
(like R makes you do with install.packages()
, ughhh!!). This is the
same visceral reaction you should have to having to make specific python code for an analysis.
Making your first CLI
As you might have guessed, the solution is to make a CLI. After making your function that does the hard work, the job of the CLI should just take care of getting the configuration (e.g., file paths) from the user and passing them to your functions that do the hard work.
CLIs should be very very short! If you’re putting lots of logic inside your CLI, then you should probably reconsider refactoring that logic into more generally reusable functions. Here’s an example of the hello world CLI:
# cli_simple.py
if __name__ == '__main__':
print('hello')
Then you run this python script with python cli_simple.py
. It’s boring. You didn’t need
to read this guide to do this. However, you might not know what if __name__ == '__main__'
is.
It turns out that all python files know their name and store it in the __name__
variable when
they’re imported. However, if you are running a python file as a script then __name__
gets
set to the string '__main__'
. This allows you to make sure that the print('hello')
is only
ever run if the user is actually running the script as a CLI.
However, don’t be tempted to put lots of code in if __name__ == '__main__'
. You should always
make a function main()
where all of the code that runs the CLI, and just call it.
# cli_simple_2.py
def main():
print('hello')
if __name__ == '__main__':
main()
You still run this script with python cli_simple_2.py
.
Another reason we introduced main()
is to take care of getting information from the user. That’s
where click
comes in. It makes functions do all sorts of magical things to get information
from command line arguments. All you have to do is use function decorator (something
starting with the @
symbol) to annotate that the function is a click.command()
. If you’re not
already familiar with decorators, check this short video
or this long video.
Run pip install click
in your shell then update your code to look like this:
# cli_simple_3.py
import click
@click.command()
def main():
print('hello')
if __name__ == '__main__':
main()
You still run this script with python cli_simple_3.py
and it does exactly the same as the last one.
However, once your main function is a click.command()
, you can do all sorts of wonderful things.
The first is to pass arguments from the command line into the function. Update your code to look like
the following:
# cli_simple_4.py
import click
@click.command()
@click.argument('text')
def main(text):
print(text)
if __name__ == '__main__':
main()
You probably notice that the call to main()
at the bottom does not include an argument
for text
. That’s because click
is decorating the original function in the meantime, which means
that the actual thing called main
is a function that takes no arguments in the end. This
is good, because click uses the extra decorators (e.g. click.argument(...)
, click.option(...)
)
to figure out what to put in the arguments
from the function that we actually wrote. Notice that the @click.argument('text')
matches up to the variable name. That’s no coincidence.
You can now run this script from the command line with python cli_simple_4.py
. You’ll
see that it yells at you for forgetting the text
argument. Better not! Try again
with python cli_simple_4.py "Hello World!"
and you’ll be happy to see you’re now at
Hello World for CLIs. From here you can do all sorts of stuff which is all outlined in
the excellent click
documentation.
click
also automatically generates documentation for you, so it’s always possible to run
the command without arguments and with the --help
flag as in python cli_simple_4.py --help
.
It will give you information about all of the arguments, their types, and more.
CLIs in Package World
It’s my strong opinion that almost all code should be packaged, and the CLI is no exception.
To finish our original problem, we’ll create a python file cli.py
in the package
where enlightened_analysis.py
is and import our function from there. Then we’ll add
the right arguments to click
, pass them to the right place, and profit!
# cli.py
import click
from .enlightened_analysis import do_enlightened_analysis
@click.command()
@click.argument('input_path')
@click.argument('output_path')
def main(input_path, output_path):
do_enlightened_analysis(input_path, output_path)
if __name__ == '__main__':
main()
If you’ve done it right, a the body of the main()
function for your CLI should very boring function.
Of course, there are other ways to organize your code, but this is a good
way to do it until you’re more comfortable.
However, now we’re living in package world. In this tutorial, I’ve skipped the explanation of
turning the enlightened_analysis.py
and cli.py
into a package. I’ll assume from here that
you’ve done this and named the package superanalysis
. If you’re not familiar with doing that,
check my previous blog post or
my tutorial on YouTube.
We don’t want to interact with this code by running it as a script with python cli.py
. Instead, we want to
interact with the code via the package. Further, if you cd
into the place where the code is and
run python cli.py
, you’ll get an import warning because relative imports don’t work
when you’re not in a Python package context. This error is a good thing - it’s a reminder that you
should always live in the packaged world.
The solution to the problem is to use the -m
flag in the python
CLI. Remember that
enlightened_analysis.py
and cli.py
modules are in a package called superanalysis
(that
you should have also already installed).
You can now run the CLI using python -m superanalysis.cli <input_path> <output_path>
. This is
also going to set __name__
to '__main__'
the same way as running it as a script, but you’re
in the python package context!
Vanity is a Virtue
The -m
can almost be used to run any python file inside your package as command line interface,
which means you should always wrap up code for the CLI in if __name__ == '__main__'
so it doesn’t
accidentally get run if the module is imported.
The exception is the __init__.py
files can’t be run as a module. If you were to write
python -m superanalysis
, it wouldn’t run the __init__.py
file as a script and instead
would throw an error.
If you want to associate a CLI with the package , you need to make an additional file
called __main__.py
sitting next to cli.py
in the superanalysis
package. As an aside,
this also works in subpackages.
# __main__.py
"""Entrypoint module, in case you use `python -m superanalysis`.
Why does this file exist, and why `__main__`? For more info, read:
- https://www.python.org/dev/peps/pep-0338/
- https://docs.python.org/3/using/cmdline.html#cmdoption-m
"""
from .cli import main
if __name__ == '__main__':
main()
This python module simply reuses the main function we already wrote before. It can basically be copied verbatim from package to package, but don’t forget to change the first line to match yours! I like to copy it because it also has the information from the python docs on why it works.
Now, you can run python -m superanalysis
instead of python -m superanalysis.cli
.
We can also do one better. Wouldn’t it be nice to make a CLI function so we could just run
superanalysis <input_path> <output_path>
? You’re in luck, because since we grouped all
of our code in a main()
function, we can make a small addition to the setup.cfg
’s
entry points to tell pip
to automatically create a superanalysis
CLI in your shell
by doing the following:
[options.entry_points]
console_scripts =
superanalysis = superanalysis.cli:main
The left part is the name of the CLI that will be in your shell, then the right part has
the path to a module followed by a colon :
then the name of the function to be run.
That’s pretty much it! Now you can make beautiful command line interfaces. There’s one more topic that I think is worth noting at the end of this tutorial, and that’s to use command line groups. This allows you to organize subcommands in your CLI and import other CLIs from other modules. It would look like this:
# cli.py
import click
@click.group()
def main():
pass # becuase this is a group, you don't actually need to do anything in it
@main.command() # note that the main function now can assign commands
def subcommand1():
print('hello world')
@main.command()
def subcommand2():
print('other greeting')
@main.group()
def subgroup():
pass
@subgroup.command()
def turtle():
print('you can go as deep as you want with subcommands')
# You can include other CLIs from other modules in your package
# to make everything much more unified
from .my_other_module.cli import main as other_module_command
main.add_command(other_module_command)
You can see that rather than using the click.command()
decorator, the main()
got
the click.group()
decorator. This means that it can be used to issue subcommands or
even subgroups! At the end, it was also used to combine CLIs from another part of the
same package. This is good if you have a package that does lots of things, but you
want a single unified CLI to access all of it. Just be careful with how the functions
are named (it doesn’t always have to be main) because if two are called the same thing
then there will be a name clash and one sub-command or sub-group won’t get shown.
CLIs are really powerful! In the end, you can write in your README how you used your CLI on your data to run your experiment. This gives others the best shot at reproducing your work. Happy hunting, and see you the next installment of “How to Code with Me”!