OCaml: Parsing a program’s arguments with the Arg module


The OCaml language has a rather large number of modules in its standard library, which can help you do almost anything you could want.

In the following, we’ll quickly see how to use the Arg module to declare and handle the optional parameters (some just call them options) of an OCaml program.

One of OCaml’s sexy logos, making other languages jealous

The other motivation for this article is that I was myself at a loss when searching for how to do it… it took me a while to figure it out.

This article is addressed, as usual, to people who already know the language (here OCaml). I won’t re-explain how it works (documentation on the Internet is abundant anyway).

The Arg(h) module

An IT student unable to make the Arg module work

To parse command-line arguments, Arg is the module you want! Let’s read the module’s documentation on INRIA’s website:

Parsing of command line arguments.

This module provides a general mechanism for extracting options and arguments from the command line to the program.

Syntax of command lines: A keyword is a character string starting with a -.

An option is a keyword alone or followed by an argument.

The types of keywords are: Unit, Bool, Set, Clear, String, Set_string, Int, Set_int, Float, Set_float, Tuple, Symbol, and Rest. Unit, Set and Clear keywords take no argument. A Rest keyword takes the remaining of the command line as arguments.

Every other keyword takes the following word on the command line as argument.

Arguments not preceded by a keyword are called anonymous arguments.

What does it tell us?

  • It is the module for parsing command-line arguments (so we’re in the right place!)
  • In this module’s vocabulary, a string beginning with a dash (-) is called a keyword
  • Keywords are divided into types (we’ll come back to this later)
  • Arguments that don’t come after a keyword are called anonymous.

OK, so by reading the rest of the documentation, we learn plenty of interesting things about the types defined by the module. We’ll have the opportunity to come back to some of them.

The function that interests us here is Arg.parse. Not Arg.parse_argv, as we might expect! Reading its documentation, we learn that Arg.parse_argv is actually meant to parse an arbitrary string array as if it were the real argv, the string array containing the program’s actual parameters. No: the function we’ll really be using to parse the program’s arguments is Arg.parse, which reads Sys.argv for us.
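To make the difference concrete, here is a small sketch of Arg.parse_argv working on a hand-made array instead of the real command line. The -n option, the fake array and the reference names are made up for the illustration.

```ocaml
(* Sketch: Arg.parse_argv parses any string array, not only Sys.argv.
   The "-n" option and the fake array below are illustrative assumptions. *)
let count = ref 0
let anonymous = ref []

let speclist = [ ("-n", Arg.Set_int count, "Sets a number") ]

(* Element 0 plays the role of the program name and is skipped. *)
let fake_argv = [| "my_program"; "-n"; "3"; "file.txt" |]

let () =
  (* ~current:(ref 0) makes parsing start right after element 0. *)
  Arg.parse_argv ~current:(ref 0) fake_argv speclist
    (fun arg -> anonymous := arg :: !anonymous)
    "Usage: my_program [-n <int>] [file ...]";
  Printf.printf "count = %d, anonymous = [%s]\n" !count
    (String.concat "; " (List.rev !anonymous))
```

Arg.parse does the same job, except that it always works on the program’s own arguments and exits by itself when something is wrong.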

Arg.parse

Its documentation is pretty dense:

Arg.parse speclist anon_fun usage_msg parses the command line. speclist is a list of triples (key, spec, doc).

key is the option keyword, it must start with a '-' character. spec gives the option type and the function to call when this option is found on the command line. doc is a one-line description of this option. anon_fun is called on anonymous arguments.

The functions in spec and anon_fun are called in the same order as their arguments appear on the command line.

If an error occurs, Arg.parse exits the program, after printing to standard error an error message as follows:

  • The reason for the error: unknown option, invalid or missing argument, etc.
  • usage_msg
  • The list of options, each followed by the corresponding doc string. Beware: options that have an empty doc string will not be included in the list.

For the user to be able to specify anonymous arguments starting with a -, include for example ("-", String anon_fun, doc) in speclist. By default, parse recognizes two unit options, -help and --help, which will print to standard output usage_msg and the list of options, and exit the program. You can override this behaviour by specifying your own -help and --help options in speclist.

… and maybe not very understandable at first. Let’s break down, step by step, what it is and what it does. First, its signature:

val parse : (key * spec * doc) list -> anon_fun -> usage_msg -> unit

It tells us that parse is a function taking three parameters: a list of (key * spec * doc) triplets, an anon_fun and a usage_msg, and that it returns nothing (unit).

Geez, what are all those weird types?

To better understand this declaration, we must carefully read the documentation of the Arg module. It then becomes clear that all these types are defined by Arg:

  • key, doc and usage_msg are just aliases for plain strings
  • spec is a rather special variant, which lets us specify what to do when a given keyword is encountered. We’ll come back to it below
  • anon_fun is declared in the documentation as string -> unit, which means it describes a function taking a string parameter and returning nothing (the name refers to anonymous arguments, not to anonymous functions: you can declare it inline in the call to Arg.parse with the fun keyword, for example, but it’s not mandatory).

What’s the funniest programming language ever? OCaml, because it’s just pure fun!

How to use it?

With these elements in mind, we can reread the function’s definition and understand it better.

The (key * spec * doc) triplet list that Arg.parse expects as its first parameter is in fact where we declare all our keywords (to use the module’s vocabulary), give a short explanation of what each one is for, and tell Arg.parse what to do when it encounters one. This is where the spec variant becomes important: looking at the Arg documentation, we find that it’s a variant type whose constructors each describe a different action, depending on the type in which we want to receive a given keyword’s argument. For example, the spec Arg.Int lets us call a function taking an integer parameter, once Arg.parse has converted the keyword’s argument into an int. Same thing for Arg.String, but for strings… With Arg.Set and Arg.Clear, the Arg module even offers to set a boolean flag to true or false for us when it encounters the corresponding keyword.
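As a quick illustration of these four specs, here is a minimal sketch of a speclist. The option names, the references and the messages are all invented for the example.

```ocaml
(* Sketch of a speclist using four common spec constructors.
   Option names and references are illustrative assumptions. *)
let color = ref true
let debug = ref false
let level = ref 0
let name = ref "world"

let speclist = [
  ("-debug", Arg.Set debug, "Sets the debug flag to true");
  ("-no-color", Arg.Clear color, "Sets the color flag to false");
  ("-level", Arg.Int (fun n -> level := n), "Calls a function with an int");
  ("-name", Arg.String (fun s -> name := s), "Calls a function with a string");
]

let () =
  (* Anonymous arguments are just reported on the standard output. *)
  Arg.parse speclist
    (fun arg -> print_endline ("anonymous argument: " ^ arg))
    "Usage: demo [options]";
  Printf.printf "debug=%b color=%b level=%d name=%s\n"
    !debug !color !level !name
```

Run without any option, this program simply prints the default values of the four references.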

Besides, the function anon_fun is called each time Arg.parse encounters an anonymous argument, that is, if you’ve been following, any argument that is neither a keyword nor the argument of one.

The function’s documentation also says that it provides the keywords -help and --help by default. These two options, which have the same effect, print the usage_msg string to show how to use the program, followed by the list of all available options.
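To look at that help text without actually passing -help, we can ask Arg.usage_string for it. This is only a sketch: the -v option and the messages are assumptions.

```ocaml
(* Sketch: building the text that -help / --help would print.
   The option and messages are illustrative assumptions. *)
let verbose = ref false
let speclist = [ ("-v", Arg.Set verbose, "Enables verbose mode") ]
let usage_msg = "Usage: demo [-v]"

let () =
  (* Arg.usage_string returns the help text instead of printing it:
     usage_msg first, then one line per option. *)
  print_string (Arg.usage_string speclist usage_msg)
```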

The example!

Let’s take an example that uses all of this: we want to create MyLs2000, a revolutionary program that will list the files inside a directory (wow). It will accept three parameters:

  • -v, to activate verbose mode
  • -n, to specify a maximum number of files to display (it’s a revolutionary program)
  • -d, to give the name of the directory whose files should be listed

For now, we don’t handle anonymous arguments, so we’ll just print them on the standard output with print_endline (after all, it’s a function with exactly the string -> unit signature that Arg.parse asks for!).

Which gives us the following first version:
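A minimal sketch of what such a first version could look like. Only set_max_files and set_directory are names taken from this article; the references, the default values and the messages are assumptions.

```ocaml
(* Sketch of a first version of MyLs2000. Reference names, default
   values and messages are assumptions made for the illustration. *)
let verbose = ref false
let max_files_to_list = ref 1
let dir_to_list = ref "."

let set_max_files nbr_of_files = max_files_to_list := nbr_of_files
let set_directory dir = dir_to_list := dir

let speclist = [
  ("-v", Arg.Set verbose, "Enables verbose mode");
  ("-n", Arg.Int set_max_files, "Sets the maximum number of files to list");
  ("-d", Arg.String set_directory, "Names the directory whose files are listed");
]

let usage_msg =
  "MyLs2000 is a revolutionary file listing tool. Options available:"

let main () =
  (* Anonymous arguments are simply printed with print_endline. *)
  Arg.parse speclist print_endline usage_msg;
  print_endline ("Verbose mode: " ^ string_of_bool !verbose);
  print_endline ("Max files to list: " ^ string_of_int !max_files_to_list);
  print_endline ("Directory to list files: " ^ !dir_to_list)

let () = main ()
```

The sketch only parses and echoes the options; the actual file listing is left out, since the point here is the argument handling.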

 

Let’s compile the file to test our code:

 

Note that thanks to its spec mechanism, Arg.parse saves us from having to check by ourselves the validity of the user’s input against what we expect. It also checks that a keyword expecting a parameter actually has one:
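These checks can also be observed from OCaml itself. The sketch below uses Arg.parse_argv rather than Arg.parse, because it raises Arg.Bad where Arg.parse would print the error and exit; the -n option and the is_rejected helper are assumptions made for the illustration.

```ocaml
(* Sketch: what happens when an option argument is invalid or missing.
   Arg.parse would print the message and exit; Arg.parse_argv raises
   Arg.Bad instead, which lets us observe the error. *)
let max_files = ref 1

let speclist =
  [ ("-n", Arg.Set_int max_files, "Sets the maximum number of files") ]

(* Hypothetical helper: true when parsing the array fails with Arg.Bad. *)
let is_rejected argv =
  try
    Arg.parse_argv ~current:(ref 0) argv speclist ignore
      "Usage: demo [-n <int>]";
    false
  with Arg.Bad _ -> true

let () =
  Printf.printf "-n abc rejected: %b\n" (is_rejected [| "demo"; "-n"; "abc" |]);
  Printf.printf "-n alone rejected: %b\n" (is_rejected [| "demo"; "-n" |]);
  Printf.printf "-n 5 rejected: %b\n" (is_rejected [| "demo"; "-n"; "5" |])
```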

 

Let’s go further with our example

The most attentive among you will have noticed that our example can be improved quite a bit.

For example, given what they do, our functions set_max_files and set_directory are totally useless: we could just replace them with the specs Set_int and Set_string respectively. While we’re at it, we could change the function that handles anonymous arguments: instead of just printing them out, it could also say that they are anonymous arguments.

In addition to these small changes, we’ll now study three rather special specs, which the documentation covers very briefly but which can turn out to be very powerful tools: Arg.Tuple, Arg.Symbol and Arg.Rest. Let’s first introduce what they’re for:

  • Arg.Tuple: the spec that lets a keyword take several separate parameters
  • Arg.Symbol: the spec that lets a keyword accept only a specific set of values
  • Arg.Rest: the spec that lets us say “after this keyword, stop the parsing and send everything that follows to a given function”.

In order to show how to use them, let’s improve our example. Suppose we want to add three options to MyLs2000:

  • the option -t, which lists only the files created at a specific time. The option takes an hour and a number of minutes as parameters, separated by a space
  • the option -s, which sorts the listed files in one of three ways: alphabetically, chronologically, or by owner
  • and finally the option -- (double dash), which says “Stop!” and prints everything that comes after it on the standard output.

The three specs we have just seen do exactly what we want! Our example, modified to integrate the new options, now looks like this:
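A minimal sketch of what the extended program could look like. sort_files is the function named in this article; the references, the default values, the accepted sort names ("alpha", "chrono", "owner") and the messages are assumptions.

```ocaml
(* Sketch of the extended MyLs2000. Reference names, defaults, sort
   names and messages are assumptions made for the illustration. *)
let verbose = ref false
let max_files_to_list = ref 1
let dir_to_list = ref "."
let time_hours = ref 0
let time_minutes = ref 0
let sort_method = ref "alpha"

(* Called by Arg.Symbol with a value already checked against the list,
   so the last case should never be reached. *)
let sort_files = function
  | ("alpha" | "chrono" | "owner") as s -> sort_method := s
  | _ -> raise (Arg.Bad "Shouldn't happen")

let speclist = [
  ("-v", Arg.Set verbose, "Enables verbose mode");
  ("-n", Arg.Set_int max_files_to_list, "Sets the maximum number of files to list");
  ("-d", Arg.Set_string dir_to_list, "Names the directory whose files are listed");
  ("-t", Arg.Tuple [Arg.Set_int time_hours; Arg.Set_int time_minutes],
   "Sets the hour and minutes at which the listed files were created");
  (* Leading space: the doc is printed right after the {a|b|c} list. *)
  ("-s", Arg.Symbol (["alpha"; "chrono"; "owner"], sort_files),
   " Sets how to sort the listed files");
  ("--", Arg.Rest print_endline,
   "Stops interpreting keywords and prints the remaining arguments");
]

let usage_msg =
  "MyLs2000 is a revolutionary file listing tool. Options available:"

let main () =
  Arg.parse speclist
    (fun arg -> print_endline ("Anonymous argument: " ^ arg))
    usage_msg;
  Printf.printf "Verbose mode: %b\n" !verbose;
  Printf.printf "Max files to list: %d\n" !max_files_to_list;
  Printf.printf "Directory to list files: %s\n" !dir_to_list;
  Printf.printf "Time of files to list: %d:%02d\n" !time_hours !time_minutes;
  Printf.printf "Sort order: %s\n" !sort_method

let () = main ()
```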

 

Several things are worth noting here:

  • as Arg.Tuple just takes a spec list, all we have to do is insert here the list of operations we want to perform for our keyword, in the order in which their arguments appear.
  • Arg.Symbol takes a string list * (string -> unit), which means it expects a string list containing the acceptable values, and the function to call if the supplied argument is part of this list. In our example, it’s the sort_files function, which has type string -> unit, that is called in that situation.
  • there is also a small example of using one of the exceptions the module declares, Arg.Bad, in a case that should normally never be reached (because Arg.parse will have checked for us)
  • Arg.Rest is pretty simple to understand: upon encountering the keyword --, keyword interpretation stops, and the remaining arguments are passed, one by one, to the print_endline function.
  • finally, we replaced the print_endline function with an anonymous one to print out the anonymous arguments, for better clarity

We can make some more tests to convince ourselves our new additions work as expected:

 

I’m not doing an exhaustive test here, I think you’ve understood how and why it works this way… I’ll let you copy-paste the code onto your machine and try it if you have any doubt or want to test it yourself 😉

To go (even) further

Well, not too far either! Hey! Come back!!

Et voilà, we have seen most of the powerful features of the Arg module’s parse function. Of course, the module offers much more, but parse is usually the function we use. Speaking of argument parsing, there are also modules like Getopt or OptParse. I have never used them but, long story short, they are alternatives to Arg that work more like GNU getopt. If one day you’re stuck with the Arg module, you may want to give these two a chance.

Quite by chance while I was searching, I found a presentation by the University of Valenciennes (in French) that makes pretty extensive use of the parse function (around the middle).

To finish, I’ll leave the last words of the article to Joe:

“See you, tuaregs! And remember: smoking kills!”

One thought on “OCaml: Parsing a program’s arguments with the Arg module”

  1. Barth

    Much more appetizing than the official doc!

    Just saying, the bit of code:
    let main = ...
    might not work (it seems to me).
    You need to make it explicit that the function takes unit as a parameter, don’t you?

    Thanks for the tutorial 😉
