R is a scripting language.
While often used interactively in exploratory data science, or run
batch-style to replicate a previous analysis, the language can also be
used for shell-like scripting at a command line. This usage is supported
by the Rscript
binary provided with R, and by the littler
project, but parsing script arguments requires additional
effort.
There are several R packages available to provide option and argument
parsing, and choosing between them is largely a matter of taste. docopt
is cute, and available for R
thanks to Edwin de Jonge, but I find some of
its heuristics strange and that puts me off. Also available on CRAN
are argparse
,
argparser
,
batch
,
defineOptions
,
getopt
,
GetoptLong
,
optigrab
,
optparse
and scribe
,
so there’s no shortage of options.
But none of these suited me, because I’m picky, so I wrote my own.
The arrg
package is not yet on CRAN, though it should be
soon. In the meantime it can be installed using the remotes
package:
# install.packages("remotes")
::install_github("jonclayden/arrg") remotes
The package is still in an experimental phase, so the interface and syntax are likely to change as it is developed. Among features planned but not currently implemented are default patterns, subcommands and automatic help options.
The package’s key function is also called arrg
. It takes
structured information about how your command is to be used, and creates
a parser to extract the necessary information from user arguments, and
check them for validity.
Here’s an example.
#! /usr/bin/env Rscript --vanilla
library(arrg)
<- arrg("test",
parser opt("h,help", "Display this usage information and exit"),
opt("n,times", "Run test the specifed number of times", arg="count", default=1L),
opt("t,time", "Print the overall run-time once the test is completed"),
opt("install", "Install the code before testing it"),
patterns=list(
pat(options="h!"),
pat("command", "arg...?", options="nt"),
pat("path?", options="n,t,install")),
header="Test your code",
footer="Run the test on the code at the specified path (default \".\"), or run a specific command.")
There’s quite a lot going on here, so let’s take it a bit at a time.
The “shebang” line
starting #!
allows the script to be run directly from the
command line without explicitly passing it through Rscript
or r
. It’s not required but is a helpful convenience on
Unix-like systems.
The first argument to arrg()
, in this case
"test"
, defines the name of the command. That should
generally match the name of the file, but this is an abstract example
without a file so it could be anything suitable.
The various calls to opt()
specify options that the
command accepts. When the user calls the command they would need to
preface them with one or two hyphens, for short and long versions
respectively, but when specifying them this is optional. A short
(single-character) or long (word-length) option must be specified, or
both, separated by a comma as in "h,help"
. The second
argument gives a description of the option to help the user, which is
displayed in the usage output.
Some options take an argument, like the -n
or
--times
option above. In that case, we specify the name of
the argument (arg="count"
) and, optionally, a default value
(default=1L
). Since the default is of integer mode, any
specified value of this argument will be coerced to integer.
Patterns give mutually exclusive ways in which the command may be
used, and will be displayed separately in the usage information. Here,
we have one pattern that requires the -h
(or
--help
) option, as the exclamation mark indicates. The
second pattern accepts a subcommand and possible arguments (the
ellipsis, ...
, indicates one or more values, and the
question mark, ?
, that the argument is optional), and
accepts the -n
and -t
options. The third takes
an optional path and accepts the -n
, -t
and
--install
options.
This is all made explicit to the user in the usage information, which
can be shown using the show()
method. This is also where
the header
and footer
arguments above come in,
as they’re shown before and after the usage and option summary:
$show()
parser## Test your code
##
## Usage:
## test -h
## test [-n <count>] [-t] <command> [<arg>...]
## test [-n <count>] [-t] [--install] [<path>]
##
## Options:
## -h, --help Display this usage information and
## exit
## -n <count>, --times=<count> Run test the specifed number of times
## -t, --time Print the overall run-time once the
## test is completed
## --install Install the code before testing it
##
## Run the test on the code at the specified path (default "."), or run
## a specific command.
Note that the text blocks are wrapped neatly, each pattern is shown separately, optional arguments are surrounded by square brackets, argument names are used in appropriate places, and so on.
The parse()
method then does the actual parse. By
default it takes script arguments from the result of
commandArgs(trailingOnly=TRUE)
, but arguments can also be
specified explicitly. (We would use argv
for
littler
.)
$parse("-h")
parser## $help
## [1] TRUE
The result is a list with the valid arguments included, keyed by
their argument names, long option labels or short option labels, in that
order of preference. Note that -h
is a flag, so its value
is a Boolean (TRUE
/FALSE
) value. In practice
this form would prompt us to show()
the usage information
above. A more interesting example could be
$parse(c("-tn3", "--install", "."))
parser## $path
## [1] "."
##
## $times
## [1] 3
##
## $time
## [1] TRUE
##
## $install
## [1] TRUE
Here, we’re using the third pattern, which arrg
distinguishes from the second by the presence of the
--install
option, which the second pattern doesn’t accept.
Note that -tn3
combines the -t
option with the
-n
option and its argument, which is a common space-saving
syntax that arrg
supports; and that the argument to the
-n
(times) option gets converted to integer mode
automatically to match the default value.
The script can now use the parsed argument list to implement its core functionality.
arrg
?OK, so why might you want to use arrg
in preference to
one of the many alternatives for R? As I stated at the outset, this is
largely a question of taste, but here are my reasons.
For a fairly basic piece of functionality like this, portability is
important to me, and that means that I want dependencies to be minimal,
especially outside the R ecosystem. That excludes the
argparse
package, which requires Python, and
GetoptLong
, which requires Perl.
Package batch
doesn’t use Unix-style options, but rather
an argument list of alternating variable names and values which are
interpreted as R expressions. It’s an neat and simple solution, but a
little verbose to use in my opinion, especially for simple on/off flags.
getopt
is very bare-bones, and is essentially subsumed by
optparse
. I couldn’t get defineOptions
to work
as expected using the documentation.
That still leaves various alternatives. docopt
is a
solid option if its heuristics work for you. Indeed, it parses
arrg
’s help string correctly for the test case above, and
is the only other package that seems to handle multiple usage patterns
and short-option clusters like -tn3
.
# Write the usage help output to a character vector
$show(textConnection("help", "w"))
parser
::docopt(paste(help,collapse="\n"), c("-tn3", "--install", "."))
docopt## List of 16
## $ --help : logi FALSE
## $ --times : chr "3"
## $ --time : logi TRUE
## $ --install: logi TRUE
## $ <command>: NULL
## $ <arg> : list()
## $ <path> : chr "."
## $ is : logi FALSE
## $ completed: logi FALSE
## $ help : logi FALSE
## $ times : chr "3"
## $ time : logi TRUE
## $ install : logi TRUE
## $ command : NULL
## $ arg : list()
## $ path : chr "."
## NULL
optigrab
doesn’t use a parser object or up-front
interface specification, but just searches for each option on demand. As
a result, the requested options must already be in scope when its
functions are called. It generates basic usage for options it has seen
when opt_help()
is called.
library(optigrab)
## optigrab-0.9.2.1 (2019-01-05) - Copyright © 2019 Decision Patterns
<- c("-t", "-n", "3", "--install", ".")
opts
list(help=opt_get(c("h","help"), FALSE, description="Display this usage information and exit", opts=opts),
times=opt_get(c("n","times"), 1L, description="Run test the specifed number of times", opts=opts),
time=opt_get(c("t","time"), FALSE, description="Print the overall run-time once the test is completed", opts=opts),
install=opt_get("install", FALSE, description="Install the code before testing it", opts=opts),
opt_get_verb(opts=opts))
## $help
## [1] FALSE
##
## $times
## [1] 3
##
## $time
## [1] TRUE
##
## $install
## [1] TRUE
##
## [[5]]
## [1] "."
# This will quit a non-interative R session
# opt_help(opts="--help")
The optparse
package works similarly to
arrg
, although it has quite basic support for positional
arguments and creates only a simple usage block by default.
library(optparse)
<- OptionParser(prog="test",
op.parser add_help_option=FALSE,
description="Test your code",
epilogue="Run the test on the code at the specified path (default \".\"), or run a specific command.") |>
add_option(c("-h","--help"), "store_true", help="Display this usage information and exit") |>
add_option(c("-n","--times"), "store", default=1L, help="Run test the specifed number of times") |>
add_option(c("-t","--time"), "store_true", help="Print the overall run-time once the test is completed") |>
add_option("--install", "store_true", help="Install the code before testing it")
print_help(op.parser)
## Usage: test [options]
## Test your code
##
## Options:
## -h, --help
## Display this usage information and exit
##
## -n TIMES, --times=TIMES
## Run test the specifed number of times
##
## -t, --time
## Print the overall run-time once the test is completed
##
## --install
## Install the code before testing it
##
## Run the test on the code at the specified path (default "."), or run a specific command.
parse_args(op.parser, "-h", print_help_and_exit=FALSE, positional_arguments=TRUE)
## $options
## $options$help
## [1] TRUE
##
## $options$times
## [1] 1
##
##
## $args
## character(0)
parse_args(op.parser, c("-t", "-n", "3", "--install", "."), print_help_and_exit=FALSE, positional_arguments=TRUE)
## $options
## $options$times
## [1] 3
##
## $options$time
## [1] TRUE
##
## $options$install
## [1] TRUE
##
##
## $args
## [1] "."
And finally, scribe
generates nice help output, but I
couldn’t figure out how to fully suppress its default
--help
and --version
options:
library(scribe)
<- command_args(c("-t", "-n", "3", "--install", "."), include=NA_character_)
s.parser $add_description("Test your code")
s.parser
$add_argument("-h", "--help", action="flag", options=list(no=FALSE), info="Display this usage information and exit")
s.parser$add_argument("-n", "--times", n=1L, default=1L, info="Run test the specifed number of times")
s.parser$add_argument("-t", "--time", action="flag", options=list(no=FALSE), info="Print the overall run-time once the test is completed")
s.parser$add_argument("--install", action="flag", options=list(no=FALSE), info="Install the code before testing it")
s.parser$add_argument("command", n=1L, info="Command to run or path to code")
s.parser
$help()
s.parser## {scribe} command_args
##
## file : {path}
##
## DESCRIPTION
## Test your code
##
## USAGE
## {command} [--help | --version]
## {command} [-h, --help] [-n, --times [ARG]] [-t, --time] [--install] [command [ARG]]
##
## ARGUMENTS
## -h, --help : Display this usage information and exit
## -n, --times [ARG] : Run test the specifed number of times
## -t, --time : Print the overall run-time once the test is completed
## --install : Install the code before testing it
## command [ARG] : Command to run or path to code
$parse()
s.parser## $help
## [1] FALSE
##
## $times
## [1] 3
##
## $time
## [1] TRUE
##
## $install
## [1] TRUE
##
## $command
## [1] "."