Type-safe SQL queries in Java and Rust

Part 1: Type-safe HTTP routing in Java and Rust
Part 2: Type-safe HTML templates in Java and Rust
Part 3: Type-safe SQL queries in Java and Rust (this article)

In the previous posts in this blog series, we looked at the benefits and drawbacks of build-time checks and code generation for HTTP request routing and HTML templating. In this part, we will look at the final missing piece of a typical web stack: SQL queries.

Although the mainstream approach within the Java community is to rely on ORM and avoid writing SQL as much as possible, in this blog post, we’re only interested in approaches that are very close to SQL. This means that we will examine two kinds of tools: those that require you to write SQL queries as strings and those that provide a DSL that has a very straightforward mapping to the corresponding SQL query in text form.

Raw SQL in Java

If you want to stay close to the language of the relational database, one option you have in Java is to write your SQL queries as strings. This option is always available, whether you use old blocking JDBC, or R2DBC for non-blocking interaction with the database. For example, using the battle-tested JDBC Template approach provided by Spring, you could write code like this:

@GetMapping(value = "/messages/{id}", produces = "text/plain")  
@ResponseBody  
String message(@PathVariable("id") UUID identifier) {  
    return Optional.ofNullable(jdbcTemplate.query("SELECT body FROM message WHERE id = :id",  
            new MapSqlParameterSource("id", identifier),  
            rs -> { return rs.getString("body"); }  
    )).orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND, "message not found"));
}

For this and all other examples in this blog post, assume that our database schema is defined by the following two database migrations:

-- migration 1:

CREATE TABLE message (
	id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
	author TEXT NOT NULL,
	topic TEXT NOT NULL,
	body TEXT NOT NULL
);

-- migration 2:

CREATE TABLE comment (
	id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
	author TEXT NOT NULL,
	message_id UUID NOT NULL,
	body TEXT NOT NULL
);

If you like staying close to the SQL dialect spoken by your relational database, this approach is great. However, it’s not by any means type-safe — you could put arbitrary garbage into your SQL string that you pass to the query method, and your application would compile and build just fine.

On a side note, the one safety advantage that the JDBC Template approach gives you over putting together your SQL queries via string concatenation is protection against SQL injection. However, this is a feature that all of the approaches we look at in this blog post have in common.

Generating code from your SQL schema

If you want more or less full control over the SQL you write, but also more build-time assistance for composing your SQL queries correctly, the Java ecosystem provides a few interesting options that rely on code generation based on your SQL schema. The most popular tools that follow this approach are JOOQ and Querydsl. According to its website, JOOQ allows you to “build type safe SQL queries through its fluent API”. Querydsl, too, is advertised as “safe” on its website.

Let’s see how type-safe they really are. In this blog post, we’ll use code examples based on Querydsl, but JOOQ works very similarly.

With Querydsl, you can write the previous example like this:

import static com.example.demo.db.QMessage.message;

@GetMapping(value = "/messages/{id}", produces = "text/plain")  
@ResponseBody  
String message(@PathVariable("id") UUID identifier) {  
    return Optional.ofNullable(queryFactory  
        .select(message.body)  
        .from(message)  
        .where(message.id.eq(identifier))  
        .fetchOne()).orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND, "message not found"  
    ));
}

Here, message is statically imported from the QMessage class. With QueryDSL or JOOQ, the idea is to generate Java source code, typically one class for each table in your schema. This generated Java code is a representation of your schema, helping you to write SQL queries.

The code generation requires a connection to a database with the current schema required by your application. This step can happen manually, on a developer’s machine, or automatically, as a build step that happens before compilation. In the latter case, the generated sources would not be under version control.

But how much more type-safe is this approach really? In most cases, the generated code will not allow you to pass in values of the wrong type in the SELECT or WHERE clauses, preventing a certain class of errors from slipping through. For example, you cannot pass an int value to the eq method of a column that is of type TEXT — represented as String in the generated code. However, because UUID columns are not represented in a type-safe manner, but using the type Object, this does not hold true for all queries.

Moreover, the generated code and the DSL provided by these libraries do not prevent you from accidentally writing SQL queries that violate your database schema. Consider this variation of the previous code example:

import static com.example.demo.db.QMessage.message;
import static com.example.demo.db.QComment.comment;

@GetMapping(value = "/messages/{id}", produces = "text/plain")  
@ResponseBody  
String message(@PathVariable("id") UUID identifier) {  
    return Optional.ofNullable(queryFactory  
        .select(message.body)  
        .from(message)  
        .where(comment.messageId.eq(identifier))  
        .fetchOne()).orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND, "message not found"  
    ));
}

Here, we build a condition for the WHERE clause that refers to the message_id column defined in the comment table. However, the comment table does not appear in the FROM clause of our query. And yet, the compiler is satisfied, letting you ship this invalid query.

JPA Criteria Query

Although this blog post focusses on non-ORM approaches, it’s worth having a quick look at JPA’s Criteria Query with the optional static meta model. This meta model is generated from the database schema, just as with Querydsl or JOOQ, and its goal is to provide a type-safe API for creating SQL queries. Indeed, our example doesn’t look that different from what we did with Querydsl:

@GetMapping(value = "/messages/{id}", produces = "text/plain")
@ResponseBody
String messages(@PathVariable("id") UUID identifier) {
    final var cb = em.getCriteriaBuilder();
    final var query = cb.createQuery(Message.class);
    final var messages = query.from(Message.class);
    final var result = em.createQuery(query
                    .select(messages)
                    .where(cb.equal(messages.get(Message_.id), identifier))
            ).getSingleResult();
    return result.getBody();
}

However, JPA Criteria Query API are not as type-safe as promised. For example, the equal method on the CriteriaBuilder allows you to pass in objects of any type as a second parameter, so that the following would accepted as well:

@GetMapping(value = "/messages/{id}", produces = "text/plain")
@ResponseBody
String messages(@PathVariable("id") int identifier) {
    final var cb = em.getCriteriaBuilder();
    final var query = cb.createQuery(Message.class);
    final var messages = query.from(Message.class);
    final var result = em.createQuery(query
                    .select(messages)
                    .where(cb.equal(messages.get(Message_.id), identifier))
            ).getSingleResult();
    return result.getBody();
}

Here, we compare a UUID with an int, and thanks to way the API of JPA Criteria Query is designed, the compiler sees no reason to complain.

Clearly, JPA Criteria Query with a static meta model is not a big win in terms of type safety and cannot compete with Querydsl or JOOQ.

Beyond Java: Compile-time verified SQL

We have seen that in Java, we can make use of code generation in order to reduce the chances of writing invalid SQL queries. However, the class of bugs we can prevent this way is relatively small. Can Rust, with its macros and strong type system, do better?

There are a few libraries in the Rust ecosystem that let you write SQL queries with an SQL-like DSL, for example SeaQuery or Diesel. The latter, however, follows an ORM approach and uses code generation, the former requires you to write some boilerplate to represent your schema in Rust. Also, both of them provide about the same capabilities in terms of type-safety as Querydsl or JOOQ, so we will not examine them further in this blog post.

Instead, we are going to take a look at SQLx. This library is all about writing your database queries as plain SQL strings, just as you do in Java with Spring JDBC Templates:

#[get("/messages/<id>")]
async fn message(mut database: Connection<Gagyolo>, id: Identifier) -> Option<String> {
    let message = sqlx::query!("SELECT body FROM message WHERE id = $1", id.value)
        .fetch_one(&mut **database)
		.await
		.ok();
    return message.map(|record| record.body);
}

However, in contrast to Spring JDBC Templates, SQLx allows for compile-time verification of your queries by means of the sqlx::query! macro.

In order for this to work, you need an active connection to a database with the current schema of your application while your program is being compiled. This is similar to what is needed by Querydsl or JOOQ. The only difference is that it’s needed in the compilation phase, not in the code generation phase of your build. In practice, this connection is only required during local development, while you are potentially making changes to the code interacting with the database. In your build pipeline, you will never make any changes to your code whatsoever, so you can use the offline mode of sqlx, which means that it will rely on a cached version of the results of the SQL query analysis.

So how type-safe is the query! macro? It provides syntactic and semantic verification of your SQL queries. Like Querydsl, JOOQ, and similar tools, it will prevent you from passing in parameters of the wrong type. In goes further than that, though. You will not be able to write SQL queries that contain references to non-existing columns, or references to tables in the SELECT clause that are missing in the FROM clause. Basically, any query that is not consistent with your database schema is denied — in most cases, with really helpful error messages.

There are limits to what bugs we can prevent at compile-time, though. Not every query that is abides by our database schema is also correct. Let’s say we want to do a left outer join of a message with all the comments for that message. The correct query would look like this:

SELECT m.topic, c.body 
FROM message m LEFT OUTER JOIN comment c ON m.id = c.message_id 
WHERE m.id = ? LIMIT 10

One probable mistake we could make is to use a wrong id column in the join condition:

SELECT m.topic, c.body
FROM message m LEFT OUTER JOIN comment c ON m.id = c.id 
WHERE m.id = ? LIMIT 10

Now the message id needs to be equal to the comment id, which will not yield any results. The types are right, the query is valid, but it’s absolutely not what we wanted.

The verdict

If you want to write plain SQL queries for your database access layer in Java, the compiler will not help you at all in preventing you from making mistakes. Using code generation based on your database schema, certain classes of bugs can be prevented, most notably those related to passing in parameters of the wrong type, or returning values of the wrong type. It is still possible to create invalid queries, though, for example by referring to tables in the SELECT clause that are not mentioned in the FROM clause.

However, the real benefit of having generated code and an SQL-like DSL may be that, since you use plain Java classes and methods to write your query, you get all the usual Java autocomplete suggestions. You can effectively let your IDE guide you in order to come up with valid queries.

Also, since SQL queries and their components are represented by Java expressions and Java types, it’s very easy to define re-usable elements that can be composed to various different queries in flexible ways. This is just not possible with a String-based approach, or at the very least, it becomes ugly and error-prone fairly quickly. While this is a definitive advantage of approaches relying on code generation, it’s only marginally related to type-safety and build-time verification.

One might argue that if you have a sufficiently good IDE, you can get most of the benefits when writing plain SQL queries with Spring JDBC Templates. After all, IntelliJ IDEA, when fed with a connection to your local database, can provide autocomplete suggestions for inline SQL strings as well, and inform you about errors in your SQL query.

So, just looking at the options we presented from the Java ecosystem, if you are fine with relying on a specific IDE vendor and don’t care that much about composability of queries from fragments, the raw SQL approach can be a reasonable choice. Introducing code generation will also complicate your development workflow, and possibly your build configuration and pipeline. There is also a chance that you introduce errors by accidentally generating code from an outdated schema, or a schema that was never committed to version control, depending on the workflow you choose.

And what about Rust and sqlx? The idea that you can completely prevent developers from expressing invalid queries at compile-time sounds very attractive, and it’s really impressive how many mistakes can be caught at compile-time by the Rust compiler with sqlx.

However, it has the same potential for errors as the code generation approaches: You need to ensure that the schema to which the compiler connects is really the correct one, and if you want to avoid the need to have a connection to a database in your build pipeline, some manual steps by the developers are necessary that are as error-prone as the steps required with Querydsl or JOOQ.

Also, sqlx does not offer the benefit of code completion suggestions. You can’t discover the schema via autocomplete suggestions and let the IDE guide you — you need to know your database schema.

One might also argue that there is no point in preventing invalid queries at build-time, because the database access layer should be covered with integration tests anyway, and those would catch such obvious mistakes. That is true, but it’s true for Querydsl and JOOQ as well. Also, with sqlx, you get instant feedback in your code editor as soon as you write or edit a query so that it violates your database schema. This feedback is a lot faster than any integration test can ever be, which is very valuable — compile-time verification does not replace integration tests, but increases your productivity.

Finally, there is the question of whether the benefits are worth the costs of a more complicated build pipeline or an additional manual step for developers, which can lead to mistakes as well. Those costs are pretty much the same, whether you use sqlx, code generation with Querydsl or JOOQ, or a connection to a local database from IntelliJ Idea. The benefits, on the other hand, depend on what is more important to you: being guided by your IDE via code completion suggestions, or getting instant feedback about a much broader set of errors, effectively preventing you from expressing any illegal queries.

You also need to take into account how often your team actually makes mistakes that can be prevented by these tools. You will not always find those mistakes in your Git history, because they may have been fixed before being pushed to the remote repository.

Critics of these tools argue that most mistakes regarding SQL queries are of the kind that cannot be caught even by sqlx, like our left outer join example. In my experience, mistakes related to the database schema occur often enough — time time spent with finding and fixing those errors is just not always visible, because the fixes often happen before a developer pushes their changes to the remote repository. If you have a chance to prevent such mistakes in the first place, with instant feedback in your IDE, you should absolutely make use of it.

Blog Post