Julia. Scripts and parsing command line arguments



We continue our look at the Julia programming language. A batch mode is a necessity for a language aimed at data analysis and processing, so this article covers how Julia scripts are put together and how command line arguments are passed to them. The topic may seem commonplace to some, but given how new the language is, I hope a short overview of the argument-parsing techniques and libraries available in Julia will still be useful.


First, a few words about how a script is put together. A script normally starts with a specially formatted line that names the interpreter. The line begins with the character sequence #!, known as a shebang. For Julia, this line is:


#!/usr/bin/env julia

You can omit this line, but then you have to run the script with the command:


julia script_name.jl

Also, a script should end with a newline character. This follows from the POSIX standard, which defines a line as a sequence of characters terminated by a newline character.


For the script to be run directly, it must have the executable permission. You can set it in the terminal with the command:


chmod +x script_name.jl

These rules are valid for all modern operating systems, except, perhaps, MS Windows.


The ARGS array


Let's start with the first way of passing parameters. Command line arguments are available in a Julia script through the constant array Base.ARGS. Let's prepare the simplest possible script:


#!/usr/bin/env julia
@show typeof(ARGS)
@show ARGS

This script simply prints the type and contents of the ARGS array to the console.


Very often, file names are passed as command line arguments, and file patterns passed this way are handled in a particular manner. For example, let's run our script with the command ./args.jl *.jl and get:


>./args.jl *.jl
typeof(ARGS) = Array{String,1}
ARGS = ["argparse.jl", "args.jl", "docopt.jl"]

Now let's change the command line slightly and enclose the mask in quotation marks: ./args.jl "*.jl". As a result, we get:


>./args.jl "*.jl"
typeof(ARGS) = Array{String,1}
ARGS = ["*.jl"]

We see an obvious difference. In the first case, we got an array with the names of all the .jl files in the current directory. In the second case, we got just the mask itself, exactly as it was passed on the command line. The reason for the difference is that the bash interpreter (like other shells close to it), from which the script was launched, recognizes file name patterns and expands them before the script starts. More information can be found by searching for "Bash Pattern Matching" or "Bash Wildcards". Collectively, these patterns are called globs.


The patterns support matching any number of characters with *, matching a single character with ?, matching a range with [...], and even more complex combinations:


>./args.jl {args,doc}*
typeof(ARGS) = Array{String,1}
ARGS = ["args.jl", "docopt.jl"]

For more information, see the GNU/Linux Command-Line Tools Summary documentation.
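To make the meaning of the * and ? masks concrete, here is a minimal sketch in plain Julia that translates a simple mask into a regular expression and filters a list of names with it. The helper mask_to_regex and the list candidates are hypothetical names introduced only for this illustration. The sketch is deliberately simpler than bash: it does not handle [...] ranges or {a,b} alternatives, and its * also matches a leading dot and the / character.

```julia
# Hypothetical helper: translate a simple mask with * and ? into a Regex.
function mask_to_regex(mask::AbstractString)
    special = raw"\.+()[]{}^$|"          # regex metacharacters to escape
    io = IOBuffer()
    print(io, '^')
    for c in mask
        if c == '*'
            print(io, ".*")              # any number of characters
        elseif c == '?'
            print(io, '.')               # exactly one character
        elseif c in special
            print(io, '\\', c)           # literal character, escaped
        else
            print(io, c)
        end
    end
    print(io, '$')
    return Regex(String(take!(io)))
end

# Assumed list of names; in a real script this could come from readdir().
candidates = ["argparse.jl", "args.jl", "docopt.jl", "notes.txt"]
@show filter(n -> occursin(mask_to_regex("*.jl"), n), candidates)
@show filter(n -> occursin(mask_to_regex("a?gs.jl"), n), candidates)
```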


If, for some reason, we do not want to rely on the glob mechanism provided by bash, the files matching a mask can be found from within the script using the Glob.jl package.
The following code turns everything found in the argument list into a single array of file names. That is, regardless of whether the user specified masks in quotes, masks without quotes, or simply listed the names of existing or non-existent files, only the names of actual files or directories will remain in the resulting array filelist.


using Glob 
filelist = unique(collect(Iterators.flatten(map(arg -> glob(arg), ARGS))))

These simple examples essentially demonstrate the use of the ARGS array where the programmer implements all the argument-parsing logic. This approach is often used when the set of arguments is extremely simple: for example, a list of file names, or one or two options that can be handled with simple string operations. The elements of ARGS are accessed like those of any other array. Just remember that the first element of an array in Julia has index 1.
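As an illustration of such manual parsing, here is a minimal sketch that separates a single flag from a list of file names using only string comparisons. The function split_args and the -v/--verbose flag are hypothetical and not part of the original example. In a real script the literal vector would be replaced with ARGS.

```julia
# Hypothetical helper: split arguments into known flags and positional names.
function split_args(args::AbstractVector{<:AbstractString})
    flags = Dict{String,Bool}("verbose" => false)
    files = String[]
    for arg in args
        if arg == "-v" || arg == "--verbose"
            flags["verbose"] = true
        elseif startswith(arg, "-")
            error("unknown option: $arg")   # anything else starting with '-' is rejected
        else
            push!(files, arg)               # everything else is treated as a file name
        end
    end
    return flags, files
end

# A literal vector stands in for ARGS so the sketch can be run as-is.
flags, files = split_args(["-v", "args.jl", "docopt.jl"])
@show flags
@show files[1]    # indexing starts at 1
```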


ArgParse.jl package


ArgParse.jl is a flexible tool for describing command line arguments and options without having to implement the parsing logic yourself.
Let's use a slightly modified example from the package documentation, http://carlobaldassi.github.io/ArgParse.jl/stable/ :


#!/usr/bin/env julia
using ArgParse
function parse_commandline()
    s = ArgParseSettings()
    # Note: ArgParse 1.0 and later name this macro @add_arg_table!
    @add_arg_table s begin
        "--opt1"
            help = "an option with an argument"
        "--opt2", "-o"
            help = "another option with an argument"
            arg_type = Int
            default = 0
        "--flag1"
            help = "an option without argument, i.e. a flag"
            action = :store_true
        "arg1"
            help = "a positional argument"
            required = true
    end
    return parse_args(s)
end
function main()
    @show parsed_args = parse_commandline()
    println("Parsed args:")
    for (arg,val) in parsed_args
        print("  $arg  =>  ")
        show(val)
        println()
    end
end
main()

If we run this script without arguments, it reports the error and prints a usage summary:


>./argparse.jl 
required argument arg1 was not provided
usage: argparse.jl [--opt1 OPT1] [-o OPT2] [--flag1] arg1

Optional arguments are shown in square brackets, whereas the argument shown as arg1 (that is, whatever we substitute in its place) is mandatory.


Let's run it again, this time specifying the required argument arg1.


>./argparse.jl test
parsed_args = parse_commandline() = Dict{String,Any}("flag1"=>false,"arg1"=>"test","opt1"=>nothing,"opt2"=>0)
Parsed args:
  flag1  =>  false
  arg1  =>  "test"
  opt1  =>  nothing
  opt2  =>  0

We can see that parsed_args is an associative array (a Dict) whose keys are the argument names declared in the function parse_commandline and whose values are either the defaults or the values passed on the command line. The values have the types explicitly specified in the declaration.


Arguments are declared using the macro @add_arg_table. It is possible to declare options:


    "--opt2", "-o"
        help = "another option with an argument"
        arg_type = Int
        default = 0

Or positional arguments:


    "arg1"
        help = "a positional argument"
        required = true

Options can be given both a long and a short form (here --opt2 and -o at the same time), or just a single form. The type is specified in the arg_type field. The default value is set with default = .... The alternative to a default value is to make the argument mandatory with required = true.
It is also possible to declare an automatic action, for example storing true or false depending on whether the argument is present. This is done with action = :store_true:


        "--flag1"
            help = "an option without argument, i.e. a flag"
            action = :store_true

The help field contains the text displayed in the command line help.
If we specify all the arguments at launch, we get the following (ArgParse accepts --flag here as an unambiguous abbreviation of --flag1):


>./argparse.jl --opt1 "2+2" --opt2 "4" somearg --flag
parsed_args = parse_commandline() = Dict{String,Any}("flag1"=>true,"arg1"=>"somearg","opt1"=>"2+2","opt2"=>4)
Parsed args:
  flag1  =>  true
  arg1  =>  "somearg"
  opt1  =>  "2+2"
  opt2  =>  4

When debugging from the Atom/Juno IDE, you can add the following somewhat dirty but working code, which fills the ARGS array, to the first lines of the script.


if abspath(PROGRAM_FILE) != @__FILE__
    # Not launched as a script (e.g. run from the REPL or an IDE):
    # fill ARGS in place with test arguments.
    append!(Base.ARGS,
            ["--opt1", "2+2", "--opt2", "4", "somearg", "--flag"]
    )
end

The macro @__FILE__ expands to the path of the file in which it is used. When the code is run from the REPL or the IDE rather than as a script, this path differs from PROGRAM_FILE, the script name that was passed to julia on the command line. The constant Base.ARGS cannot be rebound to another array, but new elements can be added to it, because the array itself is mutable. That is why we use append!, which modifies the array in place; vcat would only return a new array and leave ARGS untouched.


Alternatively, the Juno editor settings let you specify the arguments used to run a script, but they have to be changed separately for each script being debugged.


DocOpt.jl package


This package implements the approach of the docopt markup language, http://docopt.org/ . The main idea of the language is a declarative description of options and arguments in a form that can also serve as the script's built-in help text. A special template language is used.


Let's use an example from the documentation of this package, https://github.com/docopt/DocOpt.jl :


#!/usr/bin/env julia
doc = """Naval Fate.
Usage:
  naval_fate.jl ship new <name>...
  naval_fate.jl ship <name> move <x> <y> [--speed=<kn>]
  naval_fate.jl ship shoot <x> <y>
  naval_fate.jl mine (set|remove) <x> <y> [--moored|--drifting]
  naval_fate.jl -h | --help
  naval_fate.jl --version
Options:
  -h --help     Show this screen.
  --version     Show version.
  --speed=<kn>  Speed in knots [default: 10].
  --moored      Moored (anchored) mine.
  --drifting    Drifting mine.
"""
using DocOpt  # import docopt function
args = docopt(doc, version=v"2.0.0")
@show args

The statement doc = ... creates a Julia string doc that contains the entire declaration for docopt. Running the script from the command line with no arguments produces:


>./docopt.jl 
Usage:
  naval_fate.jl ship new <name>...
  naval_fate.jl ship <name> move <x> <y> [--speed=<kn>]
  naval_fate.jl ship shoot <x> <y>
  naval_fate.jl mine (set|remove) <x> <y> [--moored|--drifting]
  naval_fate.jl -h | --help
  naval_fate.jl --version

If we follow the hint and try to "create a new ship", we get a printout of the associative array args, built from the result of parsing the command line.


>./docopt.jl ship new Bystriy
args = Dict{String,Any}(
  "remove"=>false,
  "--help"=>false,
  "<name>"=>["Bystriy"],
  "--drifting"=>false,
  "mine"=>false,
  "move"=>false,
  "--version"=>false,
  "--moored"=>false,
  "<x>"=>nothing,
  "ship"=>true,
  "new"=>true,
  "shoot"=>false,
  "set"=>false,
  "<y>"=>nothing,
  "--speed"=>"10")

The docopt function is declared as:


docopt(doc::AbstractString, argv=ARGS;
           help=true, version=nothing, options_first=false, exit_on_error=true)

The keyword arguments help, version, options_first, and exit_on_error define the default behavior of the command line parser: for example, whether to terminate execution on errors (exit_on_error), which value to print in response to a version request (the value passed as version=…), and whether to print the help in response to -h (help). options_first is used to indicate that options must come before positional arguments.


Now let's take a closer look at this declarative language and at how the argument parser reacts to the values entered.


The declaration begins with arbitrary text which, besides being shown on the command line, can serve as part of the documentation of the script itself. The keyword "Usage:" introduces the usage patterns for the script.


Usage:
  naval_fate.jl ship new <name>...
  naval_fate.jl ship <name> move <x> <y> [--speed=<kn>]

Arguments are declared in the form <name>, <x>, <y>. Notice that in the associative array args obtained earlier, these arguments act as keys. We launched the script as ./docopt.jl ship new Bystriy, so we got the following explicitly initialized values:


  "<name>"=>["Bystriy"],
  "ship"=>true,
  "new"=>true,

In the docopt language, optional elements are specified in square brackets, for example [--speed=<kn>]. Parentheses group required elements subject to a condition: for example, (set|remove) requires exactly one of the two. If an element is given without brackets, as in naval_fate.jl --version, it means that in this particular usage pattern --version is mandatory.


The next section describes the options. It begins with the word "Options:".
Each option is declared on a separate line. The indentation at the start of the line matters. For each option, you can specify the long and the short form, as well as the description shown in the help. The options -h | --help and --version are recognized automatically, and the reaction to them is controlled by the arguments of the docopt function. An interesting declaration to consider is:


  --speed=<kn>  Speed in knots [default: 10].

Here the form ...=<kn> indicates that the option takes a value, and [default: 10] sets the default value. Let's look again at the values obtained in args:


"--speed"=>"10"

The principal difference from, for example, the ArgParse package is that the values are not typed. That is, the value default: 10 is stored as the string "10".
As for the other elements present in args after parsing, their values deserve attention:


  "remove"=>false,
  "--help"=>false,
  "--drifting"=>false,
  "mine"=>false,
  "move"=>false,
  "--version"=>false,
  "--moored"=>false,
  "<x>"=>nothing,
  "shoot"=>false,
  "set"=>false,
  "<y>"=>nothing,

That is, absolutely all template elements specified in the docopt declaration, for all usage patterns, appear in the parsing result under their original names. All commands and flag options that were not present on the command line are set to false. The arguments <x> and <y> are also absent from the command line and are set to nothing. The elements that matched the parsed pattern are set to true:


  "ship"=>true,
  "new"=>true,

And we received specific values for the following template elements:


  "<name>"=>["Bystriy"],
  "--speed"=>"10"

The first value was given explicitly on the command line in place of the argument, and the second comes from an option with a default value.
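Because docopt returns untyped values, the script has to convert them itself. The following minimal sketch shows one way to do this. The dictionary here is written by hand as a stand-in for the result shown above, so that the sketch runs without the DocOpt package, and the variable names speed and x are chosen only for illustration.

```julia
# Hand-written stand-in for part of the Dict that docopt returned above.
args = Dict{String,Any}(
    "ship" => true, "new" => true, "mine" => false,
    "<name>" => ["Bystriy"], "<x>" => nothing, "--speed" => "10")

# Option values arrive as strings, so convert them explicitly.
speed = parse(Int, args["--speed"])

# Positional arguments that were not supplied are `nothing`; check before converting.
x = args["<x>"] === nothing ? nothing : parse(Int, args["<x>"])

# Commands are plain Bool flags, so dispatching is an ordinary condition.
if args["ship"] && args["new"]
    println("New ship(s): ", join(args["<name>"], ", "), ", speed ", speed, " kn")
end
```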

Also note that the name of the current script can be computed automatically. For example, we can write:


doc = """Naval Fate.
Usage:
  $(Base.basename(@__FILE__)) ship new <name>...
"""

One more recommendation: place the command line argument parser at the very beginning of the file. An unpleasant feature of Julia at the moment is that loading modules takes quite a long time. For example, using Plots; using DataFrames can make a script wait for several seconds. This is not a problem for server-side scripts that are loaded once, but it will annoy users who just want to see the command line help. That is why you should first print the help and check the command line arguments, and only then load the necessary libraries.
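A minimal sketch of this ordering is shown below. The usage text, the script name myscript.jl, and the helper wants_help are hypothetical, and the standard library module Dates merely stands in for a slow package such as Plots or DataFrames.

```julia
# Hypothetical usage text; adapt it to the actual script.
usage = """
Usage: myscript.jl [-h|--help] <file>...
"""

# Hypothetical helper: true if the user asked for help.
wants_help(args) = any(a -> a in ("-h", "--help"), args)

if wants_help(ARGS)
    print(usage)
    exit(0)          # leave before any heavy package is loaded
end

# Heavy dependencies are loaded only after the arguments have been checked.
using Dates          # stand-in for a slow package
println("Started at ", Dates.now())
```

The using statement stays at the top level of the file; it is simply placed after the argument check rather than at the first lines of the script.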


Conclusion


This article does not claim to cover every way of parsing arguments in Julia. However, the options considered cover three possible approaches: completely manual analysis of the ARGS array; strictly declared but automatically parsed arguments in ArgParse; and the fully declarative, though less strict, form of docopt. The choice depends entirely on the complexity of the arguments being parsed. The docopt variant seems to be the easiest to use, although it requires explicit type conversion of the argument values received. However, if the script accepts nothing but file names, it is perfectly adequate to print the help with an ordinary println("Run me with file name") and take the file names directly from ARGS, as shown in the first section.

