When you run Java applications in Kubernetes (or any other container orchestrator), the JVM won’t work well with its default settings. The most common problems are wasted or exhausted memory, inefficient CPU utilization, and a suboptimal garbage collector choice. In this article, I will explain these issues and show you how to tune Java applications for Kubernetes.
How much system memory can Java use on a machine (virtual or not)? It depends. The algorithm that determines the amount isn’t simple and sometimes changes with Java versions. How much memory should Java use on a typical server? It also depends. There may be other applications running, like a database, or other services.
How much memory should Java use in a container? In this case, the answer is simple. All that’s available! If you spawn a pod with a 1 GB memory limit and use only 256 MB, it’s a waste.
The default limits are very small, so you should always set the memory limits in your Java application when running it in a container.
However, setting the memory sizes so that your container limits are utilized to their fullest isn’t that easy. The system will consume some memory, some will be used to store class data, some will be used for stacks on threads, and only the rest of it should be used for heap.
Calculating memory usage
As a Java developer, you don’t know precisely how much memory is required for each purpose, and even if you measure it and use the obtained value, it might change under certain circumstances, such as switching the container base image, the Java version, or the GC, or simply when the application spawns additional threads under heavy load. Below are some hints for your own calculations.
System and class data
The system and class data will likely use just a few dozen megabytes. Reserving 50 MB for this purpose should be fine.
The amount used by thread stacks depends on how many threads your application has. The default stack size varies, but it is usually 1 MB per thread, so the memory used may range from a few MB for a backend queue-processing service to a hundred MB or more for an HTTP REST service accessing a database or two. If you want to know how many threads your application is using, run it in debug mode, perform some operations, and pause the JVM. Then multiply the observed thread count by the stack size and by your own safety factor (like 2) to estimate the memory required.
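As an alternative to pausing a debugger, the thread count can be read from inside the application. The following is a minimal sketch using the standard ThreadMXBean; the class name ThreadStackEstimate and the helper estimateStackMb are made up for illustration, and the 1 MB stack size and safety factor of 2 are the assumptions from the paragraph above, not measured values.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: estimates the memory to reserve for thread stacks.
final class ThreadStackEstimate {

    /** Estimated stack memory in MB: thread count x stack size x safety factor. */
    static long estimateStackMb(int threadCount, int stackSizeMb, int safetyFactor) {
        return (long) threadCount * stackSizeMb * safetyFactor;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int live = threads.getThreadCount();     // threads alive right now
        int peak = threads.getPeakThreadCount(); // highest count since JVM start

        // Assumption: 1 MB per stack (changeable with -Xss) and a safety factor of 2.
        System.out.println("live=" + live + ", peak=" + peak
                + ", estimated stack memory=" + estimateStackMb(peak, 1, 2) + " MB");
    }
}
```

Running this logic after the application has been under realistic load gives a more useful peak value than a reading taken right after startup.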
Maximum heap size
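The heap is where most of your application’s data in memory is stored. Its maximum size is the value you can really control, and perhaps the one that matters the most. It can be configured in two ways: as an absolute size with the -Xmx flag, or as a percentage of the available memory with the -XX:MaxRAMPercentage flag.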
In most cases, however, you can simply use the magic range of 75-80% of total available RAM as the heap memory size. For a smaller container, like 256-512 MB, it should probably be 75%, and the larger the container, the more percent can be used by the heap, up to around 80% (though for some really large pods, it can be even more).
TL;DR: set your maximum heap size to 75-80% of the memory available in the container.
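The rule of thumb can be turned into a quick calculation. This is only a sketch: the class HeapBudget, its methods, and the 512 MB cut-off between 75% and 80% are illustrative assumptions, since the text describes a gradual range rather than a hard threshold. The last line prints the maximum heap the running JVM actually settled on, which is a handy way to check that a flag took effect.

```java
// Hypothetical helper: applies the 75-80% rule of thumb to a container memory limit.
final class HeapBudget {

    /** Heap share in percent; assumption: containers up to 512 MB get 75%, larger ones 80%. */
    static int heapPercent(long containerLimitMb) {
        return containerLimitMb <= 512 ? 75 : 80;
    }

    /** Maximum heap in MB for the given container limit (rounded down). */
    static long maxHeapMb(long containerLimitMb) {
        return containerLimitMb * heapPercent(containerLimitMb) / 100;
    }

    public static void main(String[] args) {
        long limitMb = 1024; // example container limit
        System.out.println("Suggested: -XX:MaxRAMPercentage=" + heapPercent(limitMb)
                + " (about " + maxHeapMb(limitMb) + " MB of heap)");

        // What this JVM is really allowed to use for the heap.
        long effectiveMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Effective max heap: " + effectiveMb + " MB");
    }
}
```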
Properly setting your container’s CPU limit is quite difficult. Just as with memory, there are ways to manually fine-tune the processor usage – this time, however, there is no solution that is right 90% of the time. It depends on your particular use case. Also, there is some insider knowledge you should have to make proper decisions.
The number of threads used in Java
Even if your application is just a single-threaded batch processor, in practice it will always use more than one thread. In most cases, you will have dozens of threads: garbage collection threads, ForkJoinPool threads, DB connection pool threads, HTTP handler threads, Spring cron threads, and so on.
The number of threads used by each library is different, usually has a default value, and may depend on hardware parameters. But it can also always be configured manually.
If you scale your REST service to many small pods, it makes sense to limit the HTTP worker threads, as having too many will only result in errors instead of delays. You don’t need to have more DB connections in a pool than there are possible DB connection users, like the HTTP worker threads.
TL;DR: Your application uses multiple threads and many of those threads spend a lot of time actually not using the CPU.
How container CPU limiting works in Kubernetes
You now know that you have many threads in your Java application. But why does it matter? Well, it’s important because of how the CPU is actually allocated for a pod in Kubernetes.
Imagine you set the limit to 1000m for your application. What does this really mean? Is exactly one CPU core allocated to your container for its entire lifetime? No. It’s not that simple.
The CPU limit for a container specifies how much total CPU time your application can use in each scheduling period (read more in the Kubernetes documentation). On Linux, the limit is enforced by the CFS scheduler, whose default period is 100 ms, so a 1000m limit means 100 ms of CPU time per 100 ms period. For example, if you have 5 threads, each doing CPU-intensive operations, and you set the limit to 1000m, your application will use 5 processors for the first 20 ms of each period and then wait for the remaining 80 ms.
The number of processors is not explicitly limited; it’s the total CPU time used that is measured and throttled in each period. To be honest, the name chosen for this parameter in k8s is really misleading. It should be called processing power per second, CPUps, or something like that.
If your application handles REST requests and runs out of CPU, every request being handled at that moment is paused until the next period starts, and a request that needs more CPU time than one period allows is paused again in each period it spans. The same applies to an event-handling application: processing each event takes longer, because the application has to wait for the next period to continue its work.
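The arithmetic behind the five-thread example can be sketched as follows. The class CpuThrottleSketch and its methods are hypothetical; the model assumes that every busy thread gets its own core and burns CPU continuously. The length of the accounting period is a parameter (Linux CFS bandwidth control uses 100 ms by default), and the run-to-wait proportions stay the same for any period length.

```java
// Hypothetical model of CPU-limit throttling for CPU-bound threads.
final class CpuThrottleSketch {

    /** CPU time budget per period in ms; 1000m equals one full core for the whole period. */
    static long quotaMs(int limitMillicores, long periodMs) {
        return limitMillicores * periodMs / 1000;
    }

    /** Wall-clock ms per period during which the busy threads may run before throttling. */
    static long runnableMs(int limitMillicores, int busyThreads, long periodMs) {
        long wallClock = quotaMs(limitMillicores, periodMs) / busyThreads;
        return Math.min(wallClock, periodMs); // cannot run longer than the period itself
    }

    /** Wall-clock ms per period during which all threads are throttled. */
    static long throttledMs(int limitMillicores, int busyThreads, long periodMs) {
        return periodMs - runnableMs(limitMillicores, busyThreads, periodMs);
    }

    public static void main(String[] args) {
        long periodMs = 100; // assumption: the default CFS period
        System.out.println("run " + runnableMs(1000, 5, periodMs) + " ms, then wait "
                + throttledMs(1000, 5, periodMs) + " ms in every period");
    }
}
```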
TL;DR: If your container CPU limit is too low, your application will be extremely slow under heavy stress. Your SLA commitment on request handling time will likely be breached.
If your application mostly just waits for database query results, you are unlikely to reach the limit. On the other hand, if you never actually reach the limit, is there any point in setting it unreasonably low? You likely scale your Kubernetes cluster to real usage, not to theoretical limits, anyway. There is little risk in setting the limit higher than the expected usage, as the application can never use more CPU than the node’s hardware provides.
Other impacts of Kubernetes CPU limits
How many processors does Java see when it runs in the container? As always, it depends. If everything works properly and you’re using a reasonably modern version of Java (if not, make sure to read my Java 8 to Java 17 migration guide), it sees ceil(containerCpuLimit). For 500m it would be 1 CPU, for 1000m – 1 CPU, for 1500m – 2 CPU.
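The rounding rule above can be sketched in a few lines. The class VisibleProcessors and the method fromMillicores are made up for illustration; the last line prints what the running JVM really detected, so you can compare it with the expected value for your own limit.

```java
// Hypothetical helper: mirrors the ceil(containerCpuLimit) rule described above.
final class VisibleProcessors {

    /** Number of CPUs the JVM is expected to report for a limit given in millicores. */
    static int fromMillicores(int limitMillicores) {
        // Integer ceiling of limit / 1000, never less than one processor.
        return Math.max(1, (limitMillicores + 999) / 1000);
    }

    public static void main(String[] args) {
        for (int limit : new int[] {500, 1000, 1500}) {
            System.out.println(limit + "m -> " + fromMillicores(limit) + " CPU");
        }
        System.out.println("This JVM sees "
                + Runtime.getRuntime().availableProcessors() + " CPU");
    }
}
```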
The value is available at runtime via Runtime.getRuntime().availableProcessors(), and many libraries read it when determining their parallelism level. Java’s ForkJoinPool uses it to size its common pool, and so does Spring for its ThreadPoolExecutor.
So, we’ve established that the visible number of processors is usually low and that Java in a container can actually use more processors at once, if necessary. Low parallelism may therefore throttle a short spike of processing unnecessarily. If you have few requests but handle them with a fork/join-based algorithm, you will wait longer than necessary. Remember, there is nothing else running in that container.
To avoid unnecessary throttling, increase the number of threads your application uses over the default, small number.
TL;DR: You can do either or both of these things:
- Set the number of CPUs visible by Java with the flag -XX:ActiveProcessorCount to ~2x the number of your container CPU limit;
- Set the number of threads explicitly for important pools in Java libraries (HTTP server worker pool, DB connection pool, Spring executor pool).
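For pools you create yourself, the second option could look like the sketch below. It uses only standard java.util.concurrent classes; the class ExplicitPools, the method poolSize, the 2000m limit, and the factor of 2 are illustrative assumptions. Framework-managed pools (HTTP server, connection pool) are sized through each framework’s own configuration instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Hypothetical example: size pools from the CPU limit instead of the detected CPU count.
final class ExplicitPools {

    /** Thread count: CPU limit rounded up to whole cores, multiplied by a factor. */
    static int poolSize(int limitMillicores, int factor) {
        int cores = Math.max(1, (limitMillicores + 999) / 1000);
        return cores * factor;
    }

    public static void main(String[] args) throws InterruptedException {
        int size = poolSize(2000, 2); // assumption: 2000m limit, factor of 2

        ExecutorService workers = Executors.newFixedThreadPool(size);
        ForkJoinPool forkJoin = new ForkJoinPool(size);

        System.out.println("detected CPUs=" + Runtime.getRuntime().availableProcessors()
                + ", worker threads=" + size
                + ", fork/join parallelism=" + forkJoin.getParallelism());

        // Shut the pools down so the JVM can exit.
        workers.shutdown();
        forkJoin.shutdown();
        workers.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```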
Which garbage collector is your application using? You likely don’t know. Java chooses it for you, and thus – it depends.
From my experience, most Java containers run with around 1 GB RAM limit and 2 CPUs. For such a configuration, the default GC will be … the Serial GC. Which, usually, isn’t the optimal choice.
Serial GC is only good for small heap/single-core apps. As we discussed, any k8s-deployed Java application is in fact a multicore application, and there are always many threads running. You shouldn’t go for Serial GC unless you really know what you’re doing.
In most cases, you should go with Parallel GC for smaller heaps. It introduces little overhead and is multi-threaded; its pauses should be generally shorter than for Serial GC. It pauses the entire JVM for the garbage collection duration, and the pauses get longer with larger heap space. Enabling Parallel GC can be done through the -XX:+UseParallelGC flag.
With a heap of around 2-4 GB, you can consider switching to G1 (before JDK 17) or ZGC (JDK 17 and later). Both collectors have shorter pauses than Parallel GC. ZGC is a particularly interesting choice because of its very short pauses (they should be shorter than 1 ms). With shorter pauses, however, comes a larger CPU overhead, so more of the CPU limit will be consumed by the GC itself. You can enable the G1 or Z garbage collector by specifying, respectively, the -XX:+UseG1GC or -XX:+UseZGC flag.
TL;DR: Use Parallel GC for your typical 1GB RAM, 2 CPU Java containerized application.
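If you are not sure which collector your application ended up with, you can ask the JVM at runtime. The sketch below uses the standard GarbageCollectorMXBean API; the class name ActiveGc is made up. The reported names are JVM-specific: on HotSpot, as far as I know, Serial GC shows up as "Copy" and "MarkSweepCompact", Parallel GC as "PS Scavenge" and "PS MarkSweep", and G1 as "G1 Young Generation" and "G1 Old Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the garbage collectors active in this JVM.
final class ActiveGc {

    /** Names of the collector beans registered by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run once with defaults and once with -XX:+UseParallelGC to see the difference.
        System.out.println("Active collectors: " + collectorNames());
    }
}
```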
When running your Java application in a container, make sure to allow it to use the allocated resources to the fullest. Specify heap size, think and test CPU usage, and set GC explicitly.
What if you just run your application with java -jar myapp.jar? It will work, but it won’t work well. It will likely use only 25% of available memory for heap, it will use imperfect Serial GC, and will probably not utilize CPU efficiently. You don’t want that, do you?
TL;DR: Use the following command when running Java in a container:
java -XX:MaxRAMPercentage=75 -XX:+UseParallelGC -XX:ActiveProcessorCount=<2x yourCpuLimit> -jar myapp.jar
It will be better than the defaults 99% of the time.