kokorins March 26, 2013 at 22:20

A brief introduction to boost::program_options

Tutorial

When developing algorithms, I constantly second-guess myself: what if a change that works on a small example throws the results into disarray on other, larger data sets? That is when the command line comes to my aid. The worst part is that writing an argument parser from scratch every time gets tiresome, which is why the program_options package from the Boost library is far from the last tool a C++ programmer should reach for.

Let's start with an example. Suppose I am developing a trainable recognition algorithm, and we have the following data: data files with the extension .dat (data), files with training information with the extension .trn (train), and parameter files with the extension .prs (parameters). Parameter files are produced by training and are used for recognition. So we have three actions: train, recognize, and score (evaluate the quality of recognition). A script that runs the train, recognize, score chain might look like this:

  recognizer --type=train --input=train.dat --info=train.trn --output=best.prs
  recognizer --type=recognize --input=test1.dat --input=test2.dat --params=best.prs --output=./
  recognizer --type=score --etalon=test1_expected.trn --test=test1.trn --output=scores.txt
  recognizer --type=score --etalon=test2_expected.trn --test=test2.trn --output=scores.txt

In this example, a parameter file is created from the data file and the training file. The parameter file is then used to recognize other data files, and each recognition result is compared with the expected reference result and appended to the end of the results file. Programming all of this command-line parsing logic with program_options takes next to nothing:

  // Assumes: namespace po = boost::program_options;
  po::options_description desc("General options");
  std::string task_type;
  desc.add_options()
    ("help,h", "Show help")
    ("type,t", po::value<std::string>(&task_type), "Select task: train, recognize, score")
    ;
  po::options_description train_desc("Train options");
  train_desc.add_options()
    ("input,I", po::value<std::string>(), "Input .dat file")
    ("info,i", po::value<std::string>(), "Input .trn file")
    ("output,O", po::value<std::string>(), "Output parameters file .prs")
    ;
  po::options_description recognize_desc("Recognize options");
  recognize_desc.add_options()
    ("input,I",  po::value<std::vector<std::string> >(), "Input .dat file")
    ("params,p", po::value<std::string>(), "Input .prs file")
    ("output,O", po::value<std::string>(), "Output directory")
    ;
  po::options_description score_desc("Score options");
  score_desc.add_options()
    ("etalon,e",  po::value<std::string>(), "Etalon .trn file")
    ("test,t", po::value<std::string>(), "Testing .trn file")
    ("output,O", po::value<std::string>(), "Output comparison file")
    ;

The description of the valid command-line arguments includes their types, a short verbal description of each, and some grouping. Because the library checks the conversion of each value to its declared type, there is much less to worry about with incorrect input. The short descriptions keep the information organized and make separate comments almost unnecessary, and grouping lets you separate mandatory arguments from optional ones. Let's take a closer look at a specific line:

  ("input,I", po::value<std::string>(), "Input .dat file")

The first argument, "input,I", actually holds two variants of the option name: input is the long name and I is the short one (case matters). A peculiarity of boost::program_options is that the short name must always be a single letter (although it can be omitted entirely). Using the long name on the command line looks like this:

  --input=train.dat

Passing the argument by its short name is less readable at first glance, but it is the form I prefer:

  -Itrain.dat

The second parameter, po::value<std::string>(), defines the format of the argument value (the part after the equals sign) and may be omitted if the option takes no value. For example, help takes no value, and the following calls are equivalent:

  recognizer --help
  recognizer -h

If you look more closely, you will notice that in the recognize group, the input argument is of type:

 po::value<std::vector<std::string> >()

std::vector<std::string> means that input can appear in the command-line arguments more than once; in our case, more than one file can be recognized in a single run. For instance:

 recognizer --type=recognize -Itest1.dat -Itest2.dat -pbest.prs -O./

The third and final parameter is the description. It is very useful, especially when you need to run one more computation six months after writing the last line of recognizer. In our case, the help output will look something like this:

me@my: ./recognizer -h
General options:
  -h [ --help ]         Show help
  -t [ --type ] arg     Select task: train, recognize, score
Train options:
  -I [ --input ] arg    Input .dat file
  -i [ --info ] arg     Input .trn file
  -O [ --output ] arg   Output parameters file .prs
Recognize options:
  -I [ --input ] arg    Input .dat file
  -p [ --params ] arg   Input .prs file
  -O [ --output ] arg   Output directory
Score options:
  -e [ --etalon ] arg   Etalon .trn file
  -t [ --test ] arg     Testing .trn file
  -O [ --output ] arg   Output comparison file

Let's move on to parsing the command line arguments. The first thing to do is to find out the task that the recognizer should execute:

  namespace po = boost::program_options;
  po::variables_map vm;
  po::parsed_options parsed = po::command_line_parser(ac, av).options(desc).allow_unregistered().run();
  po::store(parsed, vm);
  po::notify(vm);

Only the General options are passed as the template here. That template describes nothing but the task type and help, so without the call to allow_unregistered, boost::program_options would complain about every other argument on the command line. After this code runs, the task_type variable is filled in and you can write the "switch":

  if(task_type == "train") {
    // Second pass: add the group for this task and parse strictly.
    desc.add(train_desc);
    po::store(po::parse_command_line(ac,av,desc), vm);
    train(vm);
  }
  else if(task_type == "recognize") {
    //...
  }
  else {
    // Unknown or missing task: print help for all groups.
    desc.add(train_desc).add(recognize_desc).add(score_desc);
    std::cout << desc << std::endl;
  }

The appropriate group is added to the template, and this time the command-line arguments are parsed in full, with nothing skipped. The vm variable is a dictionary with string keys and values that wrap boost::any. The help output, as you can see, comes practically for free.

Let's take a closer look at the train(vm) procedure to understand how to get values out of the resulting dictionary.

  void train(const po::variables_map& vm)
  {
    std::string input_path, info_path, output_path;
    if (vm.count("input")) {
      input_path = vm["input"].as<std::string>();
    }
    if(vm.count("info")) {
      info_path = vm["info"].as<std::string>();
    }
    if(vm.count("output")) {
      output_path = vm["output"].as<std::string>();
    }
    //...
  }

As you can see, it is all quite simple. Note, however, that arguments must be accessed by their long name, not by the string passed in the description: compare "info,i" with just "info".

Conclusion

The full version of the example can be found on pastebin. These are not all of the library's features, but anyone who got interested has probably left to read the official documentation halfway through the article.

Benefits:

intuitiveness (at least for me)
self-sufficiency (comments, types, names and groups out of the box)
works with both command-line arguments and configuration files (although the latter was not covered here)

Disadvantages:

sparse documentation
requires linking against a compiled library (unlike many other Boost packages)
short option names are limited to a single letter
