Since a new JDK is released every six months, the number of completed JDK Enhancement Proposals (JEPs) adds up quickly. At first glance, it looks as if we have to learn a lot of new things on a regular basis to stay up to date.
However, thanks to the concept of incubator modules and preview features, the amount remains manageable. Moreover, the intermediate stages can often be ignored; it is sufficient to deal with a feature once it is final. Between JDK 17 and 21, a total of 38 JEPs were completed, but 23 of them were incubators or previews, which leaves 15 finished JEPs as of JDK 21.
Furthermore, there are always several JEPs that are important for the JDK and its ecosystem but less relevant for application developers.
And yet, in my opinion, it always makes sense to take an early look at unfinished features or at those that have no direct impact on application development. In this article, we will therefore take a closer look at the Class-File API, which enters its second preview with JEP 466 in JDK 23.
A bit of history first
Java, as you probably know, is compiled into bytecode that the JVM executes at runtime. Consequently, the JVM ecosystem has always needed ways to read, generate, or transform bytecode. This need ranges from the JDK itself to libraries, frameworks, and code analysis tools as well as other languages implemented on the JVM. Currently, there are various libraries for these tasks, such as ASM, cglib, or Byte Buddy, and even the JDK already contains three implementations for this purpose.
Because of this proliferation, there has long been a need for an official API for these tasks. The shortened release cycle of the JDK has increased this need even more, since it puts pressure on the existing libraries to publish a new version as quickly as possible after each JDK release. Only then can the new class-file version be read correctly and new bytecode instructions be supported.
This is why an official API within the JDK makes sense. It can evolve together with the JDK and support a new JDK version from the outset. It can also be used immediately within the JDK itself.
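To make the version coupling concrete: every class file starts with a magic number followed by a minor and a major version, and a tool has to recognize that major version before it can process the file. The following minimal sketch reads this header with plain java.io, without any bytecode library. The class name ClassFileHeader and the helper read are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassFileHeader {

    /** Returns {magic, minor, major}, read from the first eight bytes of the class file. */
    static int[] read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (var in = new DataInputStream(type.getResourceAsStream(resource))) {
            int magic = in.readInt();            // always 0xCAFEBABE
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();  // 52 for Java 8, one more per release
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = read(String.class);
        // major - 44 yields the Java feature release that introduced this class-file version
        System.out.printf("magic=%08X major=%d (Java %d)%n",
                header[0], header[2], header[2] - 44);
    }
}
```

Run against the JDK's own String class, the reported major version changes with every JDK release, which is exactly the moving target that external libraries have to chase.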
Another option would have been to integrate one of the existing libraries into the JDK. ASM would have been an obvious choice, since the JDK already contains a fork of it in its code base and some of the other libraries are also based on ASM. However, since ASM is an old code base, its API no longer meets modern requirements.
Consequently, a first preview of the completely new Class-File API was delivered with JEP 457 in JDK 22. It is now going through a second preview round in JDK 23 with JEP 466 and will probably be finalized in JDK 24 with JEP 484.
Basics of the API
The new Class-File API lives in the package java.lang.classfile and is essentially based on three abstractions. The first one is the element, which can be anything from a single bytecode instruction to an entire class. Each element is immutable and some, such as classes or methods, can contain additional elements as children.
Builders are used to create the elements. They make it possible to use a series of methods to define the properties of the element to be created.
The third and final abstraction is the transformation. Transformations are usually functions that accept an element and a builder and decide whether and how the element is passed on to the builder.
The API relies on newer Java language features such as lambdas, records, sealed classes, and pattern matching, which gives it a very modern feel, as we will see below. The class java.lang.classfile.ClassFile, obtained through its static factory method of, always serves as the main entry point into the Class-File API.
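The three abstractions can be seen side by side in a small sketch. It assumes JDK 24 or later, where the API is final (on JDK 22 and 23 it would need --enable-preview and some method names differ). The class Demo, its methods answer and unused, and the helper names are hypothetical and only serve as illustration.

```java
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.ClassTransform;
import java.lang.classfile.MethodModel;
import java.lang.constant.ClassDesc;
import java.lang.constant.ConstantDescs;
import java.lang.constant.MethodTypeDesc;
import java.util.List;

class ThreeAbstractions {

    /** Builders: describe a class "Demo" with two static methods. */
    static byte[] buildDemo() {
        int flags = ClassFile.ACC_PUBLIC | ClassFile.ACC_STATIC;
        return ClassFile.of().build(ClassDesc.of("Demo"), classBuilder -> classBuilder
                .withMethodBody("answer", MethodTypeDesc.of(ConstantDescs.CD_int), flags,
                        code -> code.bipush(42).ireturn())
                .withMethodBody("unused", MethodTypeDesc.of(ConstantDescs.CD_void), flags,
                        code -> code.return_()));
    }

    /** Elements: parse the bytes into an immutable model and read its children. */
    static List<String> methodNames(byte[] classBytes) {
        ClassModel model = ClassFile.of().parse(classBytes);
        return model.methods().stream()
                .map(method -> method.methodName().stringValue())
                .toList();
    }

    /** Transformations: a function of builder and element that decides what is passed on. */
    static byte[] dropMethod(byte[] classBytes, String name) {
        ClassFile classFile = ClassFile.of();
        ClassTransform dropByName = (builder, element) -> {
            boolean drop = element instanceof MethodModel method
                    && method.methodName().equalsString(name);
            if (!drop) {
                builder.with(element); // keep every other element unchanged
            }
        };
        return classFile.transformClass(classFile.parse(classBytes), dropByName);
    }

    public static void main(String[] args) {
        byte[] original = buildDemo();
        System.out.println(methodNames(original));
        System.out.println(methodNames(dropMethod(original, "unused")));
    }
}
```

The transformation never mutates the parsed model. It receives each element in turn and either forwards it to the builder or leaves it out, which is why elements can stay immutable.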
Reading bytecode
If we want to read bytecode with the API, we call the parse method with either a byte array or a java.nio.file.Path and receive a java.lang.classfile.ClassModel as the result. We can use this result to analyze the bytecode. Listing 1, for example, reads all fields and methods of java.lang.String and prints them.
try (var in = String.class.getResourceAsStream("/java/lang/String.class")) {
    var classModel = ClassFile.of().parse(in.readAllBytes());
    System.out.println("## Methods");
    classModel.methods().stream()
        .map(method -> method.methodName().stringValue())
        .map(methodName -> " * " + methodName)
        .forEach(System.out::println);
    System.out.println("## Fields");
    classModel.fields().stream()
        .map(field -> field.fieldName().stringValue())
        .map(fieldName -> " * " + fieldName)
        .forEach(System.out::println);
}
Instead of obtaining the methods or fields directly, in many cases it makes more sense (and is more convenient) to iterate over all child elements and use pattern matching to select the correct elements. Listing 2 shows how this procedure can be used to collect the dependencies of a class.
try (var in = String.class.getResourceAsStream("/java/lang/String.class")) {
    var classModel = ClassFile.of().parse(in.readAllBytes());
    var deps = classModel.elementStream()
        .flatMap(ce -> ce instanceof MethodModel mm
            ? mm.elementStream() : Stream.empty())
        .flatMap(me -> me instanceof CodeModel com
            ? com.elementStream() : Stream.empty())
        .<ClassDesc>mapMulti((element, c) -> {
            switch (element) {
                case InvokeInstruction i -> c.accept(i.owner().asSymbol());
                case FieldInstruction i -> c.accept(i.owner().asSymbol());
                default -> {}
            }
        })
        .collect(Collectors.toSet());
    deps.forEach(System.out::println);
}
Here we iterate over all elements of the class java.lang.String and keep only those of type java.lang.classfile.MethodModel, i.e. methods. We then iterate over their children and keep only the java.lang.classfile.CodeModel elements. From the children of these elements in turn, we pick out the java.lang.classfile.instruction.InvokeInstruction and FieldInstruction elements and record the owner of each, that is, the class whose method or field is accessed. The owners are finally collected in a java.util.Set, which eliminates duplicates, and then printed to standard output.
Generating bytecode
Analogous to reading, generating bytecode also starts with ClassFile.of, but then uses one of the build methods, for example buildTo (Listing 3).
// Assumes static imports of ClassDesc.of, ConstantDescs.* and ClassFile.ACC_*
var system = of("java.lang", "System");
var printStream = of("java.io", "PrintStream");
ClassFile.of().buildTo(
    Path.of("Echo.class"),
    of("Echo"),
    classBuilder -> classBuilder
        .withMethodBody(
            "main",
            MethodTypeDesc.of(CD_void, CD_String.arrayType()),
            ACC_PUBLIC | ACC_STATIC,
            codeBuilder -> codeBuilder
                .getstatic(system, "out", printStream)
                .aload(codeBuilder.parameterSlot(0))
                .iconst_0()
                .aaload()
                .invokevirtual(printStream, "println", MethodTypeDesc.of(CD_void, CD_String))
                .return_()));
A class Echo is generated here in the file Echo.class, and the method main is added to it. This method has the return type void, a single parameter of type String[], and the modifiers public and static. The body of the method first pushes the static field out of the class java.lang.System onto the stack. Then we push the array from the first method parameter and the constant 0. aaload then loads the element at index 0 of the array. With this value as the argument, the println method is invoked on the previously loaded static field. Finally, the method must execute return in order to terminate.
If we execute this class, it prints its first argument to standard output:
$ java -cp . Echo Hallo Welt
Hallo
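Writing a file is not required: the build method returns the bytes directly, so a generated class can also be defined and called within the running JVM. The following sketch assumes JDK 24 or later, where the API is final. The class First, its method first, and the helper names are hypothetical. The generated method reuses the aload, iconst_0, aaload sequence from above, but returns the element instead of printing it.

```java
import static java.lang.classfile.ClassFile.ACC_PUBLIC;
import static java.lang.classfile.ClassFile.ACC_STATIC;
import static java.lang.constant.ConstantDescs.CD_String;

import java.lang.classfile.ClassFile;
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;

class InMemoryFirst {

    /** Generates: public class First { public static String first(String[] a) { return a[0]; } } */
    static byte[] build() {
        return ClassFile.of().build(ClassDesc.of("First"), classBuilder -> classBuilder
                .withMethodBody(
                        "first",
                        MethodTypeDesc.of(CD_String, CD_String.arrayType()),
                        ACC_PUBLIC | ACC_STATIC,
                        code -> code
                                .aload(code.parameterSlot(0)) // the String[] parameter
                                .iconst_0()                   // index 0
                                .aaload()                     // load array[0]
                                .areturn()));
    }

    /** Defines the generated bytes as a class in a throwaway class loader. */
    static Class<?> define(byte[] classBytes) {
        return new ClassLoader(InMemoryFirst.class.getClassLoader()) {
            Class<?> load() {
                return defineClass("First", classBytes, 0, classBytes.length);
            }
        }.load();
    }

    /** Calls the generated static method via reflection. */
    static Object invokeFirst(String... args) {
        try {
            return define(build())
                    .getMethod("first", String[].class)
                    .invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeFirst("Hallo", "Welt"));
    }
}
```

A dedicated class loader is only one way to define the class. It is used here because it keeps the sketch independent of the package of the calling code.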
In addition to the low-level bytecode instructions used so far, the Class-File API also offers higher-level methods that save us a great deal of work and may each generate several low-level instructions. For example, Listing 4 generates the bytecode shown in Listing 5 and thus spares us from having to wire up the jumps and labels of an if-else ourselves.
var system = of("java.lang", "System");
var printStream = of("java.io", "PrintStream");
ClassFile.of().buildTo(
    Path.of("Foo.class"),
    of("Foo"),
    classBuilder -> classBuilder
        .withMethodBody(
            "main",
            MethodTypeDesc.of(CD_void, CD_String.arrayType()),
            ACC_PUBLIC | ACC_STATIC,
            codeBuilder -> codeBuilder
                .iconst_1()
                .ifThenElse(
                    t -> t
                        .getstatic(system, "out", printStream)
                        .loadConstant("True")
                        .invokevirtual(printStream, "println", MethodTypeDesc.of(CD_void, CD_String)),
                    f -> f
                        .getstatic(system, "out", printStream)
                        .loadConstant("False")
                        .invokevirtual(printStream, "println", MethodTypeDesc.of(CD_void, CD_String)))
                .return_()));
$ javap -c Foo.class
public class Foo {
  public static void main(java.lang.String[]);
    Code:
       0: iconst_1
       1: ifeq          15
       4: getstatic     #10  // Field ...
       7: ldc           #12  // String True
       9: invokevirtual #18  // Method ...
      12: goto          23
      15: getstatic     #10  // Field ...
      18: ldc           #20  // String False
      20: invokevirtual #18  // Method ...
      23: return
}
Transforming bytecode
The last area that the Class-File API supports is the transformation of bytecode. Of course, this could also be accomplished by reading and then writing, but the API makes it even more convenient with direct support in the form of the transform* methods.
For example, the code in Listing 6 can be used to remove all static calls to the debug method of a class Logger.
var classModel = ClassFile.of().parse(Path.of(".../Application.class"));
var removeDebugInvocations = MethodTransform.transformingCode(
    (builder, element) -> {
        switch (element) {
            case InvokeInstruction i when
                    i.opcode() == Opcode.INVOKESTATIC
                    && i.owner().asInternalName().equals(".../Logger")
                    && i.method().name().equalsString("debug") ->
                builder.pop();
            default ->
                builder.accept(element);
        }
    });
var newClassBytes = ClassFile.of().transformClass(
    classModel,
    ClassTransform.transformingMethods(removeDebugInvocations));
try (var out = new FileOutputStream(".../Application.class")) {
    out.write(newClassBytes);
}
If we use this code to transform the class Application shown in Listing 7, the bytecode shown in Listing 8 is created.
public class Application {
    public static void main(String[] args) {
        Logger.debug("Starting");
        Logger.info("Greeting");
        System.out.println("Hallo Welt!");
        Logger.debug("Finished");
    }
}
$ javap -c .../Application.class
Compiled from "Application.java"
public class ....Application {
  public ....Application();
    Code:
       0: aload_0
       1: invokespecial #1   // Method java/lang/Object."<init>":()V
       4: return
  public static void main(java.lang.String[]);
    Code:
       0: ldc           #7   // String Starting
       2: pop
       3: ldc           #15  // String Greeting
       5: invokestatic  #17  // Method .../Logger.info:(Ljava/lang/String;)V
       8: getstatic     #20  // Field ...
      11: ldc           #26  // String Hallo Welt!
      13: invokevirtual #28  // Method ...
      16: ldc           #33  // String Finished
      18: pop
      19: return
}
In this example, we remove the calls to the debug method, but the instructions that load the arguments are retained. Although the arguments are immediately removed from the stack again by pop, for real-world use the transformation should be refined so that they are not loaded in the first place.
Use cases
Use of the Class-File API in application development will probably remain rather limited. After all, very few applications need to read, write, or transform bytecode to achieve their goal.
However, integrating such an API into the JDK can help various tools get by with one less dependency in the future. At the same time, they benefit from no longer having to push out a new release as quickly as possible after every new JDK.
The typical use cases here are probably static analysis of code and dynamic generation of code at runtime. Certain aspects, such as transaction control or instrumentation for observability, are candidates that can benefit from this API.
However, the API is also very convenient and useful for compiling other or new languages into bytecode and executing them on the JVM. The blog post “Build A Compiler With The JEP 457 Class-File API” by Dr. James Hamilton illustrates nicely how the API can be used to write a compiler for the Brainf**k language. Listing 9 shows a shortened version of the code created there to give an impression.
// Assumes static imports of ConstantDescs.*, ClassFile.ACC_*, TypeKind.ByteType
// and java.util.jar.Attributes.Name.* (MANIFEST_VERSION, MAIN_CLASS)
public class BrainFckCompiler {
    static final ClassDesc SYSTEM = ClassDesc.of("java.lang", "System");
    static final ClassDesc PRINT_STREAM = ClassDesc.of("java.io", "PrintStream");
    // Local variable slots used by the generated main method
    static final int DATA_POINTER = 0;
    static final int MEMORY = 1;

    public static void main(String[] args) throws IOException {
        var input = Files.readString(Path.of(args[0]));
        var bytes = ClassFile.of()
            .build(
                ClassDesc.of("BrainFckProgram"),
                classBuilder -> classBuilder
                    .withMethodBody(
                        "main",
                        MethodTypeDesc.of(CD_void, CD_String.arrayType()),
                        ACC_PUBLIC | ACC_STATIC,
                        codeBuilder -> {
                            // memory = new byte[30_000]
                            codeBuilder
                                .sipush(30_000)
                                .newarray(ByteType)
                                .astore(MEMORY);
                            // dataPointer = 0
                            codeBuilder
                                .iconst_0()
                                .istore(DATA_POINTER);
                            generateInstructions(input, codeBuilder);
                            codeBuilder.return_();
                        }));
        generateJarFile(args[1], bytes);
    }

    static void generateInstructions(String input, CodeBuilder codeBuilder) {
        input.chars().forEach(c -> {
            switch (c) {
                case '>' -> move(codeBuilder, 1);
                case '<' -> move(codeBuilder, -1);
                case '+' -> increment(codeBuilder, 1);
                case '-' -> increment(codeBuilder, -1);
                case '.' -> printChar(codeBuilder);
                default -> {}
            }
        });
    }

    static void move(CodeBuilder codeBuilder, int amount) {
        codeBuilder.iinc(DATA_POINTER, amount);
    }

    static void increment(CodeBuilder codeBuilder, int amount) {
        codeBuilder
            .aload(MEMORY)
            .iload(DATA_POINTER)
            .dup2()
            .baload()
            .loadConstant(amount)
            .iadd()
            .bastore();
    }

    static void printChar(CodeBuilder codeBuilder) {
        codeBuilder
            .getstatic(SYSTEM, "out", PRINT_STREAM)
            .aload(MEMORY)
            .iload(DATA_POINTER)
            .baload()
            .i2c()
            .invokevirtual(PRINT_STREAM, "print", MethodTypeDesc.of(CD_void, CD_char));
    }

    static void generateJarFile(String jarFileName, byte[] classContent) throws IOException {
        var manifest = new Manifest();
        manifest.getMainAttributes().put(MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(MAIN_CLASS, "BrainFckProgram");
        try (var os = new JarOutputStream(
                new BufferedOutputStream(new FileOutputStream(jarFileName)), manifest)) {
            os.putNextEntry(new JarEntry("BrainFckProgram.class"));
            os.write(classContent);
            os.closeEntry();
        }
    }
}
A JAR file is generated that contains a class BrainFckProgram and declares it as Main-Class. The Class-File API is used to generate this class. Two local variables are set up: a byte array with 30,000 slots and an integer that serves as the data pointer. The source code is then processed character by character.
If a > or < appears, the pointer is increased or decreased by 1 using iinc. If a + or - is read, the value in the byte array at the current position of the pointer must be increased or decreased. To do this, the byte array is first pushed onto the stack using aload, followed by the position of the pointer using iload.
These two values are then duplicated using dup2. The baload instruction now reads the value from the byte array: it removes the top two values from the stack and replaces them with the value read. Next, we push the amount to add onto the stack using loadConstant. This is again a high-level method that generates different bytecode depending on the value and its data type. We can now add the top two values using iadd and store the result using bastore. This works because we previously duplicated the byte array and the pointer using dup2, so bastore finds the three values it needs on top of the stack: array, index, and new value. Finally, a . causes the value at the current pointer position to be printed. The use of this compiler and the subsequent execution can be seen in Listing 10.
$ cat hallo.bf
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++.
-------.
+++++++++++.
.
+++.
$ java --enable-preview -cp . BrainFckCompiler hallo.{bf,jar}
$ java -jar hallo.jar
HALLO
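The stack walk-through for + and - described above can be replayed in plain Java. The following sketch does not generate any bytecode. It merely simulates the operand stack with a Deque to show why dup2 is needed before baload. The class name IncrementStackDemo and its method are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IncrementStackDemo {

    /**
     * Replays aload, iload, dup2, baload, loadConstant, iadd, bastore
     * on a simulated operand stack and returns the maximum stack depth.
     */
    static int increment(byte[] memory, int pointer, int amount) {
        Deque<Object> stack = new ArrayDeque<>();
        int maxDepth = 0;

        stack.push(memory);                         // aload MEMORY
        stack.push(pointer);                        // iload DATA_POINTER

        Object top = stack.pop();                   // dup2: copy the top two values
        Object below = stack.pop();
        stack.push(below);
        stack.push(top);
        stack.push(below);
        stack.push(top);
        maxDepth = Math.max(maxDepth, stack.size());

        int index = (Integer) stack.pop();          // baload: consumes index and array,
        byte[] array = (byte[]) stack.pop();        // pushes the element as an int
        stack.push((int) array[index]);

        stack.push(amount);                         // loadConstant(amount)
        maxDepth = Math.max(maxDepth, stack.size());

        int right = (Integer) stack.pop();          // iadd
        int left = (Integer) stack.pop();
        stack.push(left + right);

        int value = (Integer) stack.pop();          // bastore: consumes value, index, array
        int targetIndex = (Integer) stack.pop();
        byte[] target = (byte[]) stack.pop();
        target[targetIndex] = (byte) value;

        if (!stack.isEmpty()) {
            throw new IllegalStateException("stack should be empty after bastore");
        }
        return maxDepth;
    }

    public static void main(String[] args) {
        byte[] memory = new byte[30_000];
        int depth = increment(memory, 0, 1);
        System.out.println("memory[0]=" + memory[0] + ", max stack depth=" + depth);
    }
}
```

Without the duplicated array and index, baload would consume the only copies and bastore would have nothing left to store into.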
So there are use cases for this API, even if we will rarely come into direct contact with it in application development. The API provides a good foundation for them, although knowledge of bytecode and its instructions is still required.