tl;dr

Using GraalVM, I was able to take a small Java microservice running Kafka Streams and build it into a native application which doesn’t require a JVM to execute. The Docker image shrank to less than a third of its previous size, application memory consumption dropped to perhaps a ninth, and CPU usage dropped to perhaps a quarter of what it was. But beware: some things are not supported and you may need to change your source code to work around problems. Also be aware that third-party libraries may not be compatible.

History

When Java and the JVM were introduced, they solved the problem of running “compiled” code on different hardware. The JVM abstracted away the hardware problem. Today, we’re using the JVM in more ways than was ever imagined. However, we don’t need the portability so much nowadays, as that problem has been solved by containerisation and Docker. If you’re working with microservices then your runtime is likely to be Linux on Docker, perhaps with Kubernetes.

The “run anywhere” flexibility of the JVM is now often a drawback, as we need to bundle our applications into an environment with a JVM installed. That means that we have a sizeable amount of software to install for even a “Hello World” program. Non-JVM languages do not have this drawback, and simple programs are of a suitably small size.

Stage Right

Enter GraalVM, available in a Community Edition or an Enterprise Edition from Oracle. It runs on Linux or OSX, and a Windows version is under development (an early adopter version is available).

In addition to interesting polyglot facilities, the compiler is able to take an existing Java program (say a fat Jar) and compile it from the Java bytecode into a native executable file, which does not require a JVM to run. That’s new (the first official “production” release was in May 2019) and pretty darn cool! It means that we can reduce the size of our Docker images considerably. On top of this, the memory footprint of the application seems to be drastically smaller, CPU resources required are reduced and app start-up times are orders of magnitude better.

Versions

While I was working on this blog, GraalVM 19.1.0 was released. I used both 19.0.0 and 19.1.0 with the examples. I used GraalVM EE with the Mac/OSX native image, and GraalVM CE with the Docker native-image.

Moving in

Having already attempted to run IntelliJ as a native image, I tried a simple Java microservice that picks up XML in any format from an input Kafka topic and translates it into JSON which is written to an output Kafka topic. Fairly simple, and not something I’d want to be taking up much space – perfect for an experiment!

I quickly ran into some issues when attempting to compile the app to a native image. If GraalVM runs into problems that it can’t handle, it will create a “fallback” image. While it looks like a native executable, it still uses the JVM to execute. Not being what I was after, I played with the build options that can be supplied to the GraalVM compiler and got a standalone native image built.

However, when running, it failed with the following:

Exception in thread "main" java.lang.ExceptionInInitializerError
 	at com.oracle.svm.core.hub.ClassInitializationInfo.initialize(ClassInitializationInfo.java:290)
 	at java.lang.Class.ensureInitialized(DynamicHub.java:451)
 	at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:544)
 	at com.aimyourtechnology.xmljson.converter.ConverterStream.runTopology(ConverterStream.java:58)
 	at com.aimyourtechnology.xmljson.converter.ConverterApp.main(ConverterApp.java:36)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value org.apache.kafka.streams.errors.LogAndFailExceptionHandler for configuration default.deserialization.exception.handler: Class org.apache.kafka.streams.errors.LogAndFailExceptionHandler could not be found.
 	at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:720)
 	at org.apache.kafka.common.config.ConfigDef$ConfigKey.<init>(ConfigDef.java:1091)
 	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:150)
 	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:170)
 	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:209)
 	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:371)
 	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:384)
 	at org.apache.kafka.streams.StreamsConfig.<clinit>(StreamsConfig.java:514)
 	at com.oracle.svm.core.hub.ClassInitializationInfo.invokeClassInitializer(ClassInitializationInfo.java:350)
 	at com.oracle.svm.core.hub.ClassInitializationInfo.initialize(ClassInitializationInfo.java:270)

The class it was complaining about certainly was present in the fat Jar. I tried using the compilation options for an “assisted build” and placed the generated config files in the project under “graalOutput”. You can find out more about the assisted build here and here.
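How can a class be in the fat Jar and still “not be found”? The stack trace above passes through ConfigDef.parseType, where the class is given as a string and has to be looked up by name at run time. A native image only contains classes that the build could see being used, so a class that is only ever named in a string needs to be listed in the reflection configuration. The following is a minimal sketch of that kind of lookup, assuming a plain Class.forName call; ClassLookupSketch is a made-up name and this is not the Kafka source.

```java
import java.util.Optional;

// Illustrative only: resolves a class from a name that arrives as data,
// much as a string-valued configuration entry has to be resolved.
// ClassLookupSketch is a hypothetical name, not part of Kafka or GraalVM.
class ClassLookupSketch {

    // Returns the class if the runtime can find it by name, otherwise empty.
    static Optional<Class<?>> resolve(String className) {
        try {
            Class<?> found = Class.forName(className);
            return Optional.of(found);
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The name is only known at run time (here: a program argument),
        // so a build-time analysis cannot tell which class will be needed.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        String result = resolve(name).map(Class::getName).orElse("could not be found");
        System.out.println(name + " -> " + result);
    }
}
```

On a normal JVM the lookup succeeds for anything on the classpath. In a native image it can only succeed for classes that were compiled in and registered for reflection, which is what the generated config files are for.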

To pick up these config files, my build command evolved into the following:

native-image -O0 -H:+ReportExceptionStackTraces -H:ConfigurationFileDirectories=./graalOutput --initialize-at-build-time -jar ./target/xmlJsonConverter-1.0-SNAPSHOT-jar-with-dependencies.jar ./target/macXmlToJsonConverter

This wasn’t enough, however. I was now seeing the following GraalVM native compiler error:

Error: com.oracle.graal.pointsto.constraints.UnsupportedFeatureException: Invoke with MethodHandle argument could not be reduced to at most a single call: java.lang.invoke.LambdaForm$MH.1921375740.invoke_MT(Object, Object, Object)
Trace:
        at parsing org.apache.kafka.common.record.CompressionType$5.wrapForInput(CompressionType.java:131)

One of GraalVM’s limitations is that it cannot handle some code that uses reflection or method handles, and it turns out that the Kafka Streams library isn’t compatible as it stands. I checked out the Kafka codebase to have a closer look and, following the diagnosis from a previous Jira ticket, I modified the code. This solved the problem and I now had a standalone native image (for OSX at least). My native OSX build command now looked like this:
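To give a feel for the kind of code behind the “Invoke with MethodHandle argument” error, here is a generic sketch of two ways of wrapping an input stream. It is not the Kafka source and it is not presented as the exact change made to Kafka; the class and method names are invented for illustration. The first method finds a constructor from a class name at run time and calls it through a MethodHandle, which a native-image build may be unable to reduce to a single known call. The second is an ordinary constructor call that the compiler can see directly.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustrative only; MethodHandleSketch and its methods are hypothetical names.
class MethodHandleSketch {

    // Dynamic style: the wrapper class is chosen by name at run time and its
    // (InputStream) constructor is invoked through a MethodHandle.
    static InputStream wrapViaHandle(String className, InputStream in) {
        try {
            Class<?> type = Class.forName(className);
            MethodHandle constructor = MethodHandles.publicLookup()
                    .findConstructor(type, MethodType.methodType(void.class, InputStream.class));
            return (InputStream) constructor.invoke(in);
        } catch (Throwable t) {
            throw new IllegalStateException("Could not wrap stream with " + className, t);
        }
    }

    // Direct style: a plain constructor call, fully visible at build time.
    static InputStream wrapDirectly(InputStream in) {
        return new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {42};
        InputStream viaHandle = wrapViaHandle("java.io.BufferedInputStream", new ByteArrayInputStream(data));
        InputStream direct = wrapDirectly(new ByteArrayInputStream(data));
        // Both streams yield the same byte; only the way they were built differs.
        System.out.println(viaHandle.read() + " " + direct.read());
    }
}
```

Both methods behave the same on a JVM. The difference only matters to a compiler that has to know, ahead of time, exactly which code will run.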

native-image -H:+ReportExceptionStackTraces -H:ConfigurationFileDirectories=./graalOutput --no-fallback -jar ./target/xmlJsonConverter-1.0-SNAPSHOT-jar-with-dependencies.jar ./target/macXmlToJsonConverter

This led to the next problem:

[main] ERROR org.apache.kafka.common.metrics.Metrics - Error when registering metric on org.apache.kafka.common.metrics.JmxReporter
java.lang.NullPointerException
        at org.apache.kafka.common.metrics.JmxReporter.unregister(JmxReporter.java:157)
        at org.apache.kafka.common.metrics.JmxReporter.reregister(JmxReporter.java:165)
        at org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:85)
        at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:568)
        at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:246)
        at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:227)
        at org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:1132)
        at org.apache.kafka.common.network.Selector.<init>(Selector.java:178)
        at org.apache.kafka.common.network.Selector.<init>(Selector.java:214)
        at org.apache.kafka.common.network.Selector.<init>(Selector.java:227)
        at org.apache.kafka.common.network.Selector.<init>(Selector.java:231)
        at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:385)
        at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:67)
        at org.apache.kafka.streams.processor.internals.DefaultKafkaClientSupplier.getAdminClient(DefaultKafkaClientSupplier.java:34)
        at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:713)
        at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:634)
        at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:544)
        at com.aimyourtechnology.xmljson.converter.ConverterStream.runTopology(ConverterStream.java:58)
        at com.aimyourtechnology.xmljson.converter.ConverterApp.main(ConverterApp.java:36)

Here, we’re seeing another “limitation” of GraalVM native images: they don’t support JMX. I’m not sure that it’s fair to call it a limitation, though, as JMX requires access to the Java bytecode, and we got rid of that entirely by building a native executable! I made another modification to the Kafka source code and commented out the related JMX code.

Result

With the Kafka code modified, I was now able to build a native image which ran successfully on both OSX and Linux in Docker. Mission accomplished! The next step was to look at resource use.

Memory Leak?

Watching the native image process on my Mac showed that the memory consumption seemed to grow continually, even though I wasn’t sending any messages through Kafka for processing. Further investigation showed that running the app on a normal JVM showed similar symptoms initially, but then stabilised. This led me to consider garbage collection and heap size settings. By adding an “Xmx” option to limit the heap size, I could see the garbage collection running more frequently. I applied this to each different way I had of running the application and the memory usage immediately stabilised. I experimented with different sizes of heap and in the end I settled on -Xmx48M, though I could probably reduce it further if there was a significant benefit.
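A quick way to confirm that a heap limit has actually been picked up is to ask the runtime what it thinks the maximum is. The following is a small sketch using only the standard Runtime API; HeapLimitSketch is a made-up name and is not part of the converter application.

```java
// Illustrative only: prints the heap limit and current heap use as seen
// by the running process. HeapLimitSketch is a hypothetical name.
class HeapLimitSketch {

    private static final long MIB = 1024L * 1024L;

    // Maximum heap the runtime will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently occupied by objects (allocated minus free), in MiB.
    static long usedHeapMiB() {
        Runtime runtime = Runtime.getRuntime();
        return (runtime.totalMemory() - runtime.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Run it once without arguments and once with -Xmx48m to compare. The reported maximum can be a little below the value requested, depending on the garbage collector in use.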

The Stats

Running on Mac

Runtime              | Arguments | Memory Usage                              | Physical Footprint | CPU Usage
JVM                  | -Xmx48m   | Real: 370MB; Private: 337MB; Shared: 25MB | 343M               | 0.6%->3.7%
GraalVM Native Image | -Xmx48m   | Real: 22MB; Private: 8MB; Shared: 1MB     | 10M                | 0.4%

Running on Docker

Runtime              | Arguments | Docker Image Size | Memory Usage | CPU Usage
JVM                  | -Xmx48m   | 114MB             | 73MiB        | 1.5%->6.8%
GraalVM Native Image | -Xmx48m   | 32.5MB            | 8MiB         | 1.5%
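
The fractions quoted in the tl;dr come from the Docker figures above. As a small worked check, the sketch below divides each native image figure by the JVM figure. It assumes the upper end of the JVM CPU range (6.8%) is the fair comparison, and FootprintRatioSketch is a made-up name.

```java
// Illustrative arithmetic only, using the Docker figures from the table above.
// FootprintRatioSketch is a hypothetical name.
class FootprintRatioSketch {

    // Native image figure as a fraction of the JVM figure.
    static double ratio(double nativeImage, double jvm) {
        return nativeImage / jvm;
    }

    public static void main(String[] args) {
        // Image size in MB: 32.5 (native) against 114 (JVM).
        System.out.printf("Docker image size: %.2f of the JVM image%n", ratio(32.5, 114));
        // Container memory in MiB: 8 (native) against 73 (JVM).
        System.out.printf("Memory usage:      %.2f of the JVM figure%n", ratio(8, 73));
        // CPU in percent: 1.5 (native) against the JVM peak of 6.8 (assumption: peak is the right comparison).
        System.out.printf("CPU usage:         %.2f of the JVM figure%n", ratio(1.5, 6.8));
    }
}
```

The three ratios come out at roughly 0.29, 0.11 and 0.22, which is where “less than a third”, “a ninth” and “a quarter” come from.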

Kafka change?

I raised a JIRA ticket for Apache Kafka, and have forked the codebase. At the time of writing I have not raised a pull request as I’m looking for some feedback first. If it looks like there is a future for it then I’ll have a shot at moving the changes behind a feature flag so that JVM users don’t lose the JMX feature! You can watch its progress and vote for it here.
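
As a rough idea of what a feature flag for JMX could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the property name, the Reporter interface and the class names are invented, Kafka defines no such flag, and whether a guard like this is enough for a native-image build has not been tested here. The flag defaults to on, so JVM users would keep JMX unless they opt out.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;

// Illustrative only: selects a JMX-backed or a no-op reporter from a flag.
// All names here are hypothetical and are not part of Kafka.
class JmxFeatureFlagSketch {

    // Invented property name for this sketch.
    static final String JMX_FLAG = "sketch.metrics.jmx.enabled";

    interface Reporter {
        String describe();
    }

    // Used when JMX is switched off: touches no JMX classes at all.
    static final class NoOpReporter implements Reporter {
        public String describe() {
            return "JMX disabled: metrics are not exported";
        }
    }

    // Used when JMX is switched on: talks to the platform MBean server.
    static final class PlatformJmxReporter implements Reporter {
        private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        public String describe() {
            return "JMX enabled: " + server.getMBeanCount() + " MBeans registered";
        }
    }

    // Picks the reporter implementation for the given flag value.
    static Reporter createReporter(boolean jmxEnabled) {
        return jmxEnabled ? new PlatformJmxReporter() : new NoOpReporter();
    }

    // Reads the flag from a system property, defaulting to enabled.
    static boolean jmxEnabledFromSystemProperty() {
        return Boolean.parseBoolean(System.getProperty(JMX_FLAG, "true"));
    }

    public static void main(String[] args) {
        System.out.println(createReporter(jmxEnabledFromSystemProperty()).describe());
    }
}
```

Running it with -Dsketch.metrics.jmx.enabled=false selects the no-op reporter, so the JMX code path is never entered.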

And onto Quarkus

Part 2 of the blog will cover a closer look at Quarkus – a “Kubernetes native Java stack tailored for GraalVM & OpenJDK HotSpot”. I’ll take the same application and build it using the Quarkus framework rather than Kafka Streams. Stay tuned!