Project Loom with Spring Boot: performance tests
Today I would like to test whether Project Loom is ready to replace Spring WebFlux (the most popular reactive framework) for writing high-throughput concurrent applications.
Problems of Reactive/Non-blocking services
WebFlux is great, the performance is fantastic, but:
- Functional code is difficult to write
- Difficult to debug
- Poor stack traces
- Clients/libraries should be async/reactive as well
A few facts about Project Loom:
- A preview feature since Java 19; implementation started in 2017
- Virtual threads: lightweight threads that dramatically reduce the effort of writing, maintaining, and observing applications.
- Can create millions of virtual threads
- Context switching is fast
- Required code changes are minimal
- The virtual thread stack is stored in JVM heap
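These properties can be seen in a tiny self-contained sketch (a hypothetical demo class; run on a Loom-enabled JDK, e.g. Java 19/20 with --enable-preview or Java 21+):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadsDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        // Each submitted task gets its own cheap virtual thread
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(100); // blocking call; only the virtual thread parks
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(completed.get()); // prints 10000
    }
}
```

Ten thousand sleeping platform threads would exhaust memory on a small box; ten thousand virtual threads are cheap because their stacks live on the JVM heap.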
Looks like Project Loom can solve all main problems of Reactive/Non-blocking code. But what about performance?
Test scenario
We are going to test the performance of a service that simply proxies requests to another service, which replies with an expected 500 ms delay.
All the source code is here
We are going to test 3 implementations:
- Spring Boot (Tomcat)+Project Loom
- Spring Webflux
- Spring Webflux + Project Loom
Hardware:
- All tests run on AWS
- Our service uses a t2.micro node (1 CPU, 1 GB)
- The third-party service uses a t2.medium node (2 CPU, 4 GB)
- Load is generated from a separate EC2 instance
Tomcat + Loom implementation
Customization of Spring Boot with Tomcat application to support Project Loom:
@Configuration
public class Config {

    @Bean
    AsyncTaskExecutor applicationTaskExecutor() {
        // enable async servlet support
        ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor();
        return new TaskExecutorAdapter(executorService);
    }

    @Bean
    TomcatProtocolHandlerCustomizer<?> protocolHandlerVirtualThreadExecutorCustomizer() {
        return protocolHandler -> protocolHandler.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
    }
}
Controller
@RestController
public class Controller {

    private final RestTemplate restTemplate = new RestTemplate();
    private final String host = "http://test:7000/address/";

    @GetMapping("/address/{timeout}")
    String getAddress(@PathVariable long timeout) throws URISyntaxException {
        URI uri = new URI(host + timeout);
        return restTemplate.getForObject(uri, String.class);
    }
}
WebFlux implementation
For the WebFlux implementation I'm using WebClient as the HTTP client. To support a large number of concurrent requests, I changed the default connection settings.
@RestController
public class Controller {

    private final WebClient webClient = init();
    private final String host = "http://test:7000/address/";

    private WebClient init() {
        String connectionProviderName = "myConnectionProvider";
        HttpClient httpClient = HttpClient.create(ConnectionProvider.builder(connectionProviderName)
                .maxConnections(10_000)
                .pendingAcquireMaxCount(10_000)
                .pendingAcquireTimeout(Duration.of(100, ChronoUnit.SECONDS))
                .build());
        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }

    private Mono<String> getAddressInternal(long timeout) {
        return webClient.get()
                .uri(host + timeout)
                .exchangeToMono(clientResponse -> clientResponse.bodyToMono(String.class))
                .timeout(Duration.ofSeconds(200));
    }

    @GetMapping("/address-reactive/{timeout}")
    Mono<String> getAddress(@PathVariable long timeout) {
        return getAddressInternal(timeout);
    }
}
WebFlux + Project Loom implementation
For this implementation, we are going to use blocking code but run it inside Executors.newVirtualThreadPerTaskExecutor(). The result of the blocking call just needs to be wrapped into a Mono.
@GetMapping("/address-loom/{timeout}")
Mono<String> getAddressWithLoom(@PathVariable long timeout) {
    return Mono.fromFuture(CompletableFuture.supplyAsync(() ->
            // Any blocking code here, can be Hibernate etc
            // (that will not pin the system thread)
            getAddressInternal(timeout).block(),
            Executors.newVirtualThreadPerTaskExecutor()));
}
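The same pattern can be shown without Reactor: blocking work is submitted to a virtual-thread executor and consumed as a CompletableFuture. This is a minimal standalone sketch (class name and the simulated 500 ms call are illustrative, not from the project):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LoomWrapDemo {
    public static void main(String[] args) {
        try (ExecutorService vt = Executors.newVirtualThreadPerTaskExecutor()) {
            // The blocking body runs on a virtual thread; the caller gets a future
            CompletableFuture<String> result = CompletableFuture.supplyAsync(() -> {
                try {
                    Thread.sleep(500); // stand-in for a blocking HTTP/DB call
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e);
                }
                return "address";
            }, vt);
            System.out.println(result.join()); // prints "address"
        }
    }
}
```

Mono.fromFuture in the controller above is just the Reactor adapter for exactly this kind of future.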
Performance tests result
Tomcat with Loom is not good enough. The throughput is low due to high GC activity (~50% CPU).
Tomcat with Loom was not able to process more than 4k parallel users; at 8k users it failed with an OutOfMemoryError:
I analyzed the heap dump: almost all memory is occupied by Tomcat's SocketWrapper instances. This is easy to explain: Tomcat was designed for the thread-per-request model, and its socket wrappers are too heavy to be used with Loom at this scale.
Let’s compare WebFlux and WebFlux + Loom profiles
Memory, CPU, and GC activity are the same. That is why we see almost the same throughput, and for 10k users Loom is even better!
Summary
Project Loom is a game changer. We showed that it is very efficient and lets us write simple, blocking code that can be as fast as reactive/non-blocking code. It means that we can easily migrate our blocking code to Loom and continue using blocking libraries like Hibernate. But we still need Netty + WebFlux as a wrapper for our blocking code, because Tomcat is not designed for it.
P.S. Project Loom limitations
Currently, Loom has one important limitation.
The carrier (system) thread will be pinned if, inside a virtual thread, we have:
- A native code invocation
- A synchronized section/method. Solution: use ReentrantLock instead; -Djdk.tracePinnedThreads=full helps detect pinning
It means that existing libraries should be fixed and their synchronized blocks replaced with ReentrantLock. For example: https://github.com/pgjdbc/pgjdbc/issues/1951
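A minimal sketch of that fix: replacing synchronized with ReentrantLock, so a virtual thread that blocks while holding the lock can unmount from its carrier thread (the class and method here are illustrative, not from pgjdbc):

```java
import java.util.concurrent.locks.ReentrantLock;

public class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    // Before: synchronized long increment() { return ++value; }
    // Blocking inside a synchronized block pins the carrier thread;
    // blocking while holding a ReentrantLock does not.
    long increment() {
        lock.lock();
        try {
            return ++value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        c.increment();
        System.out.println(c.increment()); // prints 2
    }
}
```

The behavior is identical to the synchronized version; only the pinning characteristics under virtual threads change.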