Let’s start with some requirements:
- Imagine you need to maintain documents.
- The desired output format is PDF or HTML. Our approach could easily handle *.docx or LaTeX as well, but we will keep it simple for now.
- Several people constantly provide updates to these documents, and they need to collaborate without interfering with each other.
- Changes should be reviewed and approved by somebody else.
- From time to time, you need to release updated versions of your documents.
- For readers, the version number is important, so it needs to be contained within the documents.
- Maybe it is self-evident for you, but we strive for a high degree of automation. So please don’t come up with a “Save document as PDF” function within a word processor.
If some of these requirements sound familiar because source code is maintained the same way: good news, you will recognize a few of our proposals.
Let’s visualize the situation:
- Figure 1 depicts a few authors that independently update distinct parts of an English and a German document.
- Figure 2 shows three hypothetical document releases with two languages.
What kind of documents?
We (Ben and Gernot) are (co-)authors and maintainers of a few documents, for example an extensive glossary of software architecture terminology [1] and a number of technical curricula [2].
We maintain these documents (together with a group of additional authors) in English and German. Our problem is that we write and speak only these two languages, but you will see below that additional languages can be easily integrated.
Collaboration first
As software developers, you will have experienced the numerous advantages of professional version control, namely Git. Combined with services like GitLab or GitHub, you get a rock-solid and proven platform for collaboration, including pull/merge requests (in our case: document reviews and approvals).
Therefore, we obviously maintain our documents on such a git platform.
Pull and merge requests require that differences between documents can be determined automatically, so the technical format for documents needs to be plain text. A number of such formats are used in practice (see our explanatory box below). Several of these lack the babylonic features we require to process several languages automatically, which is why we decided to use AsciiDoc. AsciiDoc is open-source and provides several incredibly powerful features that will come in handy later on.
AsciiDoc HelloWorld
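As an illustrative sketch (the content below is our assumption, chosen only to show the basic syntax), a minimal AsciiDoc document could look like this:

```asciidoc
= Hello, AsciiDoc!

// A plain paragraph with inline formatting.
This is a paragraph with *bold* and _italic_ text.

== A First Section

// An unordered list.
* first item
* second item
```

The first line is the document title, a line starting with == opens a section, and lines starting with // are comments that do not appear in the output.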
Using the Asciidoc processor (either on your favourite shell or wrapped in a build script), you get the following output from the text above:
We compiled the AsciiDoc with gradle, using the following simple build file:
Split Documents into Parts
Now that we know how to create a document, let’s prepare for more complicated stuff.
At first, we should modularize our document and split it into distinct parts.
It’s like creating a larger software system from distinct components or modules, but for AsciiDoc documents.
Luckily, AsciiDoc comes with a highly practical feature called include, which allows for modularization of documents (see the following diagram).
Of course, these include directives may contain path or directory information so that you can organize your files in adequate ways.
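A minimal sketch of such a modularized document, assuming hypothetical file names and a hypothetical chapters directory, might look like this:

```asciidoc
= My Modular Document

// Each part lives in its own file; the paths are relative to this file.
include::chapters/01-introduction.adoc[]

include::chapters/02-main-part.adoc[]

include::chapters/03-summary.adoc[]
```

The blank lines between the include directives keep the last line of one file from merging with the first line of the next.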
Hey Babylon: Multiple Languages
For multiple languages, you have two different options to organize your content (explained in Fig. 4 for EN and DE, English and German):
- Put EN content in an English-only file tree, and DE content in a second file tree.
- Put EN and DE content in the same files, and find a clever mechanism to separate these languages when creating output for a single language.
Let’s consider an important text passage in both English and German (we took the liberty of using the introductory paragraph of the Agile Manifesto):
We have the two language versions next to each other, but we need to create an English-only output, without the German stuff in it.
Excursion: The C Preprocessor
A few old-generation developers might remember the days of the C programming language. Programs sometimes contained nerdy statements like the following:
In C or C++, these conditional includes are quite common. Sometimes, even the behavior of the compiler is controlled via such directives. We tell you this for a reason, so just read on.
But We Are Writing Documents, Not C?
If we had a similar directive, a kind of conditional compilation, for our documents, then we could for example write #ifdef ENGLISH #include page-1-EN.adoc, and omit the other languages for a moment.
The AsciiDoc processors have learned their lessons from history and come up with a conditional include on steroids: one can include specific parts of a file, for example just the English parts. Such include statements can even be written with variables, and these variables can be set during the build process. Wow!
AsciiDoc performs this magic by using tags, explicitly marked parts of a document. Here is a simple example:
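As a sketch, a file holding both language versions of a passage could mark each one as a tagged region. The tag names EN and DE and the file name manifesto.adoc are assumptions made for this illustration:

```asciidoc
// File: manifesto.adoc (hypothetical name)

// tag::EN[]
We are uncovering better ways of developing software by doing it and helping others do it.
// end::EN[]

// tag::DE[]
Wir erschließen bessere Wege, Software zu entwickeln, indem wir es selbst tun und anderen dabei helfen.
// end::DE[]
```

The tag:: and end:: markers sit in comment lines, so they never show up in the rendered output.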
We can then tell AsciiDoc to pass the tag for EN when including the file. See the following image.
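In text form, such an include could be sketched as follows. The file name manifesto.adoc and the attribute name language are assumptions for this illustration; the file is expected to contain regions marked with tag::EN[] and tag::DE[]:

```asciidoc
// Include only the region tagged EN.
include::manifesto.adoc[tags=EN]

// Or let an attribute decide which tagged region is included.
// The attribute can be set from outside, for example on the
// command line with: asciidoctor -a language=DE document.adoc
include::manifesto.adoc[tags={language}]
```

With the second form, the same source produces an English or a German document depending only on the value passed in by the build.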
Now our build script needs to iterate over all the desired output languages, call the AsciiDoc transformer, and create a distinct output for each one. Common build tools like Gradle, Maven, or make each have their specific mechanisms, and a detailed explanation would exceed the scope of this article. The structure of such a build script (in Gradle) looks as follows:
You find a specific task definition per language (here: EN and DE), where the generic RenderDocumentTask gets called with the filename and the language as parameters.
The heavy lifting of AsciiDoc conversion is done by the Asciidoctor Gradle plugin.
More Conditions
AsciiDoc offers additional options to include conditions in your documents: you can use ifeval:: or the plain old ifdef::
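A small sketch of both directives follows. The attribute name language is an assumption for this illustration; backend-pdf is an attribute that Asciidoctor sets itself when the PDF backend is active:

```asciidoc
// ifdef:: checks whether an attribute is set.
ifdef::backend-pdf[]
This paragraph only appears in the PDF output.
endif::[]

// ifeval:: compares values, here the hypothetical language attribute.
ifeval::["{language}" == "DE"]
Dieser Absatz erscheint nur in der deutschen Ausgabe.
endif::[]
```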
But let’s have a look at a more realistic example.
Configuring the output
When we started with this toolchain, we knew that we had to find a way to create either a PDF file or an HTML representation of our documents. Fortunately, AsciiDoc allows us to do both.
PDF files
AsciiDoc allows you to create a PDF theme that configures the output: a cover image, the position of elements on the pages, background images, and more. You can even use variables in the theme file; in our case they are filled with language-dependent text, like the date in the footer (you can have a look at our PDF theme here). All you need to do is tell the asciidoctor task where to look for the theme. Let’s have a look at our gradle task to generate the PDF.
We removed everything from the task that is not relevant for the PDF creation (you can check the full file here). You have to enable pdf as backend (line 11) and then set the name of the theme (pdf-style), the directory where to look for the fonts that are used (pdf-fontsdir), and the directory where to look for the theme (pdf-stylesdir). Why are there two more lines that don’t seem to be related to PDF? Well, glad you asked!
HTML files?
The two additional lines you see in the code snippet above can be used to also style the HTML output. Asciidoctor has a default theme that is used for HTML output. If you want to adjust the result, all you have to do is provide a CSS file that contains all the magic you want for your result. Enable html as backend and tell AsciiDoc where to find the stylesheet (stylesheet) and where to look for images or fonts that might be referenced in the stylesheet (stylesheet-dir). You can check one of our examples below to see the PDF and HTML results.
Ok, that’s fine for a single project, but the Advanced Level has more than ten curricula, so we would have to copy the themes to each project. If we adjusted the PDF theme in one repository, how can we make sure that all other curricula also benefit from the changes?
A Family of Similar Documents
To define both the HTML theme and the PDF theme only once, we moved them to separate repositories. These repositories are then linked into each curriculum repository as Git submodules. This offers several advantages.
- There is only one place where we have to change the themes. If we’re working on a specific curriculum and want to improve on one of the themes, we can open the submodule and commit/push our changes.
- Owners (curators) of other curricula don’t have to think about doing the same changes. All they have to do is to update the respective submodule.
- Should owners of a curriculum not want to upgrade the themes for whatever reason, they can decide to just keep their submodules at the revision they are happy with.
We also identified the copyright of each curriculum as a candidate for a separate submodule. It changes every year (the current year is added), and without a submodule that change would have to be made in each repository. Extracting the copyright file as a submodule allows us to change only one single file. Everyone who updates their curriculum also updates the submodule to the latest revision, and that’s it.
Real World Examples
The Curriculum for Software Architecture, iSAQB CPSA-F®
Worldwide, courses and classes in software architecture are taught based upon the iSAQB Software Architecture Foundation curriculum, guiding thousands of developers towards their “Professional for Software Architecture” certification, CPSA-F. Therefore, the iSAQB needs to provide versions in different languages, both in HTML and PDF formats. This curriculum consists of approximately 40 learning goals (LGs) in 5 parts, resulting in about 30 pages. Every two years, the iSAQB releases an updated version of the curriculum, based upon new ideas and input from the international software architecture community.
We (Ben and Gernot) belong to the core maintainers' group of this document.
Let’s dissect its structure:
- The entry point is the file curriclum-foundation.adoc, which contains a number of include statements.
- The first is setup.adoc, which defines several variables that are used all around the document. Among others, the document type (book), position of the table-of-contents (left), and the location of the image directory.
- Now a list of all learning goals is included. It is worth noting that this list is generated as part of the build process, to ensure we always have an up-to-date list of learning goals.
- Next all the chapters are included, one by one, which in turn include important terms, all learning goals, and the references helpful for this chapter.
This allows us to change and review each single learning goal without conflicting with other learning goals of the document. We keep both the English and the German version of a learning goal in one file, so if one language is changed, the other one is less likely to be forgotten.
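The kind of settings described for setup.adoc can be sketched with standard AsciiDoc attribute entries. This is an illustration under assumptions, not the actual file: the attribute values follow the description above, and the image path is hypothetical:

```asciidoc
// Document type, as described above.
:doctype: book
// Place the table of contents on the left.
:toc: left
// Directory in which image references are resolved (hypothetical path).
:imagesdir: ./images
```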
For translations in other languages, we added the possibility to easily upload PDF files to the repository which will be added to the next release automatically.
The Curricula of the iSAQB Advanced Level CPSA-A®
We use the same template described in the previous example for each Advanced Level module. This ensures a clear and overarching design and structure of the documents, so that participants can navigate through the different modules with ease, always knowing where to find what. Updating the formatting is no real effort, since this is done via the submodules. Only changes to the build environment or GitHub Actions require manual adjustments in each repository.
A Large Glossary
We maintain a glossary of software architecture terminology (available for free from the iSAQB), with close to a dozen authors. A few parts of this document change quite frequently (new terms are added, explanations are updated), others are highly stable (e.g. the introduction, copyright notice and authors' biographies).
We maintained this glossary on GitHub before, but we had to manually create a PDF and upload it to Leanpub. The current approach with AsciiDoc and our build pipeline allows us to create a new release by creating a new Git tag and pushing it to GitHub. That’s it.
Summary
You can maintain multilingual documents with a pragmatic, simple, and free (as in open-source) toolchain that is developer-friendly and proven in practice. Business and other non-IT people might miss their favorite word processing tool, but the benefit of multiple languages organized along the principle of “one fact, one place” will help you in the long run. Until then: may the power of expressive wording be with you.