I enjoy occasionally writing custom command line tools. Some recent examples are walk (like find(1) but without options) and stest (like test(1) but as a Unix filter).
Then a while ago I read about Rob Landley’s idea of writing a words tool (like cut(1) but simpler). I liked the idea, so I want to (re)implement this tool and see whether I can keep it simpler than Rob’s C version.
We’ll start with Swift.
Swift
read/print
Words is supposed to be a filter, which means it should read from standard input and write to standard output.
Let’s get started by writing the simplest possible filter that just echoes stdin unmodified, like cat(1) does.
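A minimal sketch of such an echo filter, assuming nothing beyond the Swift standard library:

```swift
// readLine() returns nil at end of input; it strips the trailing newline,
// and print() appends one again.
while let line = readLine() {
    print(line)
}
```

One small difference from cat(1): a final line without a trailing newline comes out with one, because print always terminates its output.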
split/join
To solve the problem at hand we can process each line to
- split the line into words
- select some of those words
  - in arbitrary order
  - allowing multiple selections
- combine the selected words into a new line
Let’s implement the easier part of this:
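A minimal sketch of the split and join steps, with the selection step still missing. The helper name `squeeze` is an assumption made for this illustration:

```swift
// Split a line at whitespace and join the words with single spaces.
// split(whereSeparator:) drops empty pieces by default, so runs of
// whitespace never produce empty words.
func squeeze(_ line: String) -> String {
    let words = line.split(whereSeparator: { $0.isWhitespace })
    return words.joined(separator: " ")
}

while let line = readLine() {
    print(squeeze(line))
}
```

Note that this sketch also drops leading and trailing whitespace, since no words surround it.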
The difference in behaviour is that this version replaces sequences of whitespace with single space characters.
transparent/opaque
Which words are selected, and in which order, needs to be
- indicated by the user in some way
- evaluated by the program in some way
We’ll handle the user interface later. The program can store the required information either as a (passive) data structure or as a(n active) function or object.
This illustrates a typical design conflict between… well, not between functional and object-oriented programming, which I believe is entirely independent from this. I’m referring to what Noel Welsh describes as “opaque and transparent interpreters”.
In our program the difference would look something like this:
transparent
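A sketch of the transparent variant, in which the selection is plain data and a separate function interprets it. The names `selectedIndices` and `select` are assumptions made for this illustration:

```swift
// Transparent: the selection is passive data that any code can inspect.
let selectedIndices: [Int] = [2, 0]

// A separate interpreter gives the data its meaning.
func select(_ words: [String], using indices: [Int]) -> [String] {
    return indices.map { words[$0] }
}
```

Because the data is visible, other code could validate, sort, or print the selection without running it.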
opaque
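A sketch of the opaque variant, in which the selection is a function and all we can do with it is call it. The names `Selection`, `word(at:)`, `combine` and `pick` are assumptions made for this illustration:

```swift
// Opaque: a selection is a function from words to words.
typealias Selection = ([String]) -> [String]

// Build a selection that picks a single word.
func word(at index: Int) -> Selection {
    return { words in [words[index]] }
}

// Build a selection that applies several selections in order.
func combine(_ selections: [Selection]) -> Selection {
    return { words in selections.flatMap { $0(words) } }
}

let pick = combine([word(at: 2), word(at: 0)])
```

Here the parsing code decides what a selection does, and the rest of the program no longer needs to know how it works.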
The trade-offs boil down to the question: Which parts of our solution do we want to separate?
We’ll go for the transparent variant first and see where it takes us.
arguments
We want to enter the selection on the command line. I can think of two approaches to this:
- Each command line argument indicates a single selection; the program only ever operates on standard input.
- The first command line argument specifies all selections; the remaining ones indicate files to process.
We’ll go with the first approach here because I believe it’s easier to implement.
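A sketch of how the arguments could be turned into indices. The function name `parseIndices` and the wording of the error message are assumptions made for this illustration:

```swift
import Foundation

// Convert one-based indices from the command line into zero-based array indices.
func parseIndices(_ arguments: [String]) -> [Int] {
    return arguments.map { string -> Int in
        guard let number = Int(string) else {
            // Not a number: complain on standard error and exit with EX_USAGE.
            FileHandle.standardError.write(Data("words: invalid index '\(string)'\n".utf8))
            exit(EX_USAGE)
        }
        return number - 1
    }
}

// The first element is the path to the executable, so skip it.
let indices = parseIndices(Array(CommandLine.arguments.dropFirst()))
```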
There’s a lot going on here, so let’s walk through the code:
The first element of CommandLine.arguments contains the path to the executable, so we transform only the remaining arguments into indices.
We convert each argument string to a number. If that fails, for example when the argument doesn’t contain a number, we print an error message and abort with EX_USAGE (from sysexits(3)).
Indices supplied on the command line should start at one, so we subtract one here to get array indices.
select
Once we have the indices, we can grab the corresponding elements from the array.
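A naive sketch of the selection step, assuming zero-based indices. The function name `select` is an assumption made for this illustration:

```swift
// Pick the words at the given zero-based indices and join them again.
// Subscripting traps when an index is out of range for this line.
func select(_ indices: [Int], from line: String) -> String {
    let words = line.split(whereSeparator: { $0.isWhitespace })
    return indices.map { words[$0] }.joined(separator: " ")
}
```

For example, `select([1, 0], from: "a b c")` swaps the first two words and drops the third.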
“fatal error: Index out of range”
This naive version of the program crashes as soon as one of the indices is out of range for any line. We can prevent this problem by ignoring indices that are out of bounds for a given line.
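One way to ignore out-of-bounds indices, sketched under the same assumptions as before, is to filter them against the valid indices of each line:

```swift
// Like the naive version, but indices that do not exist in this line are
// skipped instead of crashing the program.
func select(_ indices: [Int], from line: String) -> String {
    let words = line.split(whereSeparator: { $0.isWhitespace })
    return indices
        .filter { words.indices.contains($0) }
        .map { words[$0] }
        .joined(separator: " ")
}
```

This also covers the index -1 that results from a 0 on the command line.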
Range
One very useful feature of cut(1) is that besides numbers you can supply ranges of numbers, like -f 3-6 to print the third through sixth fields.
We can model these ranges with the Range type, so let’s try adding that feature to our program.
Once again, the more complex part is parsing the indices:
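A sketch of parsing a single range argument. The names `parseRange`, `parseNumber` and `fail`, the error messages, and the rejection of reversed ranges are assumptions made for this illustration. As far as I know, CountableRange<Int> is just another name for Range<Int> in current Swift versions, so the sketch uses the latter:

```swift
import Foundation

// Parse one argument such as "4", "3-6", "3-" or "-4" into a zero-based,
// half-open range of word indices.
func parseRange(_ argument: String) -> Range<Int> {
    // Report a usage error on standard error and exit with EX_USAGE.
    func fail(_ reason: String) -> Never {
        FileHandle.standardError.write(Data("words: \(reason) in '\(argument)'\n".utf8))
        exit(EX_USAGE)
    }
    // Inner function: a missing or empty component yields the fallback value.
    func parseNumber(_ text: Substring?, default fallback: Int) -> Int {
        guard let text = text, !text.isEmpty else { return fallback }
        guard let number = Int(text) else { fail("invalid number '\(text)'") }
        return number
    }

    let parts = argument.split(separator: "-", omittingEmptySubsequences: false)
    let lower = parseNumber(parts.first, default: 1)
    let upper = parseNumber(parts.last, default: Int.max)
    // Assumption: reversed ranges like 5-2 are rejected, because Range traps on them.
    guard lower - 1 <= upper else { fail("reversed range") }
    // Only the lower bound moves: the upper bound is already exclusive.
    return (lower - 1)..<upper
}
```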
We have extracted the parsing of a single number into an inner function that we can call multiple times. As an inner function it has access to the parameters of its containing function, and we can make use of that to provide better error messages. This function also expects a default value that it returns when one bound of the range is missing, which means we can write 3- to mean “every word starting with the third”.
For each range we parse the first and the last component as its lower and upper bounds (respectively), using 1 and Int.max as the default values, and construct a CountableRange from those bounds. We only need to correct the lower bound by one because CountableRange excludes its upper bound: the one-based inclusive bound 6 and the zero-based exclusive bound 6 both stop after the sixth word.
To apply our selections we take each range, trim it to the bounds of the array using clamped(to:), and grab the corresponding elements from the array. Since this returns collections instead of single words, we call flatMap instead of map to flatten all those collections.
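A sketch of applying the ranges to one line, assuming zero-based half-open ranges. The function name `select` is an assumption made for this illustration:

```swift
// Clamp each range to the words that exist in this line, take the matching
// slice, and flatten all slices into one list of words.
func select(_ ranges: [Range<Int>], from line: String) -> String {
    let words = line.split(whereSeparator: { $0.isWhitespace })
    return ranges
        .flatMap { words[$0.clamped(to: words.indices)] }
        .joined(separator: " ")
}
```

A range that lies entirely outside the line clamps to an empty range and therefore contributes no words.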
Conclusion
Here’s the code as a whole:
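The following is a reconstruction of how the pieces described above could fit together, not the original listing. The names `fail`, `parseSelections` and `select`, the error messages, and the rejection of reversed ranges are assumptions:

```swift
import Foundation

// Print a message to standard error and exit with EX_USAGE from sysexits(3).
func fail(_ message: String) -> Never {
    FileHandle.standardError.write(Data("words: \(message)\n".utf8))
    exit(EX_USAGE)
}

// Turn arguments such as "2", "3-6", "3-" or "-4" into zero-based half-open ranges.
func parseSelections(_ arguments: [String]) -> [Range<Int>] {
    return arguments.map { argument -> Range<Int> in
        func number(_ text: Substring?, default fallback: Int) -> Int {
            guard let text = text, !text.isEmpty else { return fallback }
            guard let value = Int(text) else {
                fail("invalid number '\(text)' in '\(argument)'")
            }
            return value
        }
        let parts = argument.split(separator: "-", omittingEmptySubsequences: false)
        let lower = number(parts.first, default: 1)
        let upper = number(parts.last, default: Int.max)
        // Assumption: reversed ranges are reported instead of trapping.
        guard lower - 1 <= upper else { fail("reversed range '\(argument)'") }
        return (lower - 1)..<upper
    }
}

// Select the words covered by each range, ignoring parts that are out of bounds.
func select(_ selections: [Range<Int>], from line: String) -> String {
    let words = line.split(whereSeparator: { $0.isWhitespace })
    return selections
        .flatMap { words[$0.clamped(to: words.indices)] }
        .joined(separator: " ")
}

// Body of the script: no type annotation on `selections`.
let selections = parseSelections(Array(CommandLine.arguments.dropFirst()))
while let line = readLine() {
    print(select(selections, from: line))
}
```

Compiled to a binary named words (a hypothetical name), `echo "a b c d" | ./words 3- 1` would select the third word onwards followed by the first.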
The change in requirements has stayed pleasantly local to the two places that produce and consume the modified data. We left the body of the script unmodified since we had omitted the type signature of the changed data and since we didn’t need any other changes in processing.
The size of the script is comparable to the C code, but take this comparison with a huge grain of salt: the two versions support different features (ours supports ranges while the other one supports custom word separators) and use entirely different frameworks (the toybox infrastructure is intended for Unix command line tools, while the Swift standard library has only a few bare-bones provisions for them).
Future Work
I have a bunch of ideas for more work on this tool, including:
- support the -d flag
- support a -v flag to invert range matching
- support reverse ranges like 5-2
- print usage information, for example when no ranges were supplied
- rewrite the tool, maybe in Rust or Go
If you have more ideas, I’d love to hear about them!