It all started with an innocuous snippet of Idris posted on Twitter. But before we get into details, we need to talk about format strings and first-class types.
Format strings are an archetypal example demonstrating dependent typing. It’s a simple but telling use case. In most languages we program in we don’t have access to first-class types. As a result, we have to rely on various tools to help us program with format strings. Those tools may be embedded in our compilers, IDEs, or other analyzers. In any case, they are not intrinsic to the language.
Here, for instance, a C compiler is warning us of an invalid call to printf
.
That error got caught at compile time. Unfortunately, sometimes the feedback loop is not as tight. Let’s consider this piece of Java interacting with a SQL database.
The code is invalid: there’s no fourth argument, and the two which remain are never set. In spite of errors, the compiler accepts the file without complaints. Unless our IDE warns us earlier, we have to count on feedback from our test suite or static code analysers like SonarJava.
In a language with first-class types the situation changes.
Types and the number of arguments to a function like printf
can be determined at compile time based on its first parameter: the format string.
Idris is a language with first-class types. In fact, the format string problem appears as an example in Edwin Brady’s Type-Driven Development with Idris. I can’t recommend this book enough; get a copy if you have a chance.
The first five chapters of the book offer a gentle and unassuming introduction to the language. In chapter six the author leads us through a process of defining a printf
function which will fail to compile unless arguments match the specification in the format string. We can experiment with the printf
in the REPL:
What’s important to note is that the rules you see above are not embedded into the language, the standard library, or the compiler. We specify them ourselves by using Idris' type system. How? Let’s take a look, loosely following the solution from the book.
The implementation begins with a definition of a type representing our format string. We will limit ourselves to formats with %d
and %s
specifications.
We represent a format string "%d is a %s"
as a value Number (Lit " is a " (Str End))
.
Specifically:
- it begins with a number passed as an argument to
printf
, - then it continues with a literal constant string
" is a "
, - then it expects a string passed as another argument to
printf
, and - it ends.
Given a Format
like the one above we want to compute the type of our printf
function. We need a function for this.
We’ll call it PrintfType
, and its implementation might look as follows.
If the format is a Number
, expect an Int
argument and continue computing the type. If it’s a Str
, expect a String
argument. If it’s a literal, no argument is necessary but we still need to continue parsing; the rest of the format might need some arguments. Finally, at the end of the format we end up with a regular String
.
Using the REPL, we can verify that the type for our format is correct. It’s a type describing a function, taking an Int
and a String
and returning a String
.
Now we need a way of parsing our format string into its Format
representation.
Using unpack
we transform the given string into a list of characters. Such a list is much easier to process and pattern-match against than a normal string. After unpacking we parse the list using the following toFormat
function.
Let’s go through all the top-level cases.
- If no characters are left, we’re done and return
End
. - If the list begins with
"%d"
, we parse it as aNumber
. - If it begins with
"%s"
— it’s aStr
. - Otherwise, we read a literal string using the
lit
helper function.
lit
keeps reading from its first argument and accumulates read characters in the second one. Upon encountering a '%'
we know we have a complete literal. Using pack
— the inverse of unpack
— we transform what we have back into a string.
We return the string in a Lit
, with the rest of parsing being handled by a call to toFormat
.
Let’s try it out by parsing a string into a Format
and then computing its PrintfType
.
Great!
Now it’s time to wrap it up by declaring the type of printf
.
Can you see? printf
takes a String
as its first argument and the rest of its signature is computed from the first argument.
Let’s inspect it in the REPL!
The implementation of the printf
function remains an open topic, but we’ve solved the most interesting part of the problem. Code using our printf
will fail to compile if our arguments don’t match the format string. Using exactly the same approach we could bullet-proof our functions for building
SQL statements.
At this point, I can’t recommend Type-Driven Development with Idris enough. In chapter six you will find a far more precise discussion of printf
’s implementation, including an alternative definition of toFormat
.
Having an understanding of how printf
works, we are well-equipped to attack the next problem: format strings with precision specifications, such as "%0.3f"
.
We’ll also feature the aforementioned snippet from Twitter. But for now, let’s take a break, have a walk outside, and let what we saw sink in. We’ll be back soon. Until then!