Project Loom with Spring Boot: performance tests

Aleksandr Filichkin
Dec 3, 2022

Today I would like to test whether Project Loom is ready to replace Spring WebFlux (the most popular efficient reactive framework) for writing high-throughput concurrent applications.

Problems of Reactive/Non-blocking services

WebFlux is great and its performance is fantastic, but:

  • Functional-style code is hard to write and read
  • It is difficult to debug
  • Stack traces are hard to follow
  • Clients and libraries must be async/reactive as well

A few facts about Project Loom:

  • A preview feature since Java 19 (JEP 425); work on it started in 2017
  • Virtual threads: lightweight threads that dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications
  • You can create millions of virtual threads
  • Context switching between virtual threads is cheap
  • Code changes are minimal
  • Virtual thread stacks are stored in the JVM heap
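The "millions of virtual threads" point is easy to try yourself. The sketch below is our own illustration (the class ManyVirtualThreads and the method runSleepingTasks are hypothetical names): it submits many blocking tasks to Executors.newVirtualThreadPerTaskExecutor(), and each task sleeps, which parks its virtual thread instead of holding an OS thread. It assumes Java 21, where virtual threads are final; on Java 19/20 compile and run with --enable-preview.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ManyVirtualThreads {

    // Submits `tasks` blocking tasks, each sleeping for `delay`, and returns how many completed
    public static int runSleepingTasks(int tasks, Duration delay)
            throws InterruptedException, ExecutionException {
        List<Future<Integer>> futures = new ArrayList<>(tasks);
        // One new virtual thread per submitted task; close() waits for all of them to finish
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(() -> {
                    Thread.sleep(delay); // blocking call: the virtual thread unmounts from its carrier
                    return 1;
                }));
            }
        }
        int completed = 0;
        for (Future<Integer> f : futures) {
            completed += f.get();
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        int done = runSleepingTasks(100_000, Duration.ofMillis(500));
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + millis + " ms");
    }
}
```

Because the sleeping tasks overlap, the wall time should be on the order of a single delay plus thread-creation overhead, not tasks × delay; a fixed pool of platform threads would instead be limited by its pool size.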

It looks like Project Loom can solve all the main problems of reactive/non-blocking code. But what about performance?

Test scenario

We are going to test the performance of a service that simply proxies each request to another service, which replies with the expected 500 ms delay.
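The downstream service's code is not shown here. Purely as an illustration, this is a hypothetical stand-in built on the JDK's built-in com.sun.net.httpserver server: it listens on port 7000 (matching the http://test:7000/address/ URL the controllers call), reads the delay in milliseconds from the last path segment, sleeps, and replies. The class name DelayStub, the response body, and the use of virtual threads for the handler are our assumptions, not the author's implementation; it assumes Java 21.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

public class DelayStub {

    // Extracts the delay from a path such as "/address/500" (hypothetical URL layout)
    public static long parseDelayMillis(String path) {
        String last = path.substring(path.lastIndexOf('/') + 1);
        return Long.parseLong(last);
    }

    private static void handle(HttpExchange exchange) throws IOException {
        try {
            long delay = parseDelayMillis(exchange.getRequestURI().getPath());
            Thread.sleep(delay); // simulate a slow downstream dependency
            byte[] body = ("address after " + delay + " ms").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        } catch (NumberFormatException e) {
            exchange.sendResponseHeaders(400, -1); // delay segment missing or not a number
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            exchange.sendResponseHeaders(500, -1);
        } finally {
            exchange.close();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(7000), 0);
        server.createContext("/address/", DelayStub::handle);
        // Each request is handled on its own virtual thread, so thousands of sleeping requests are cheap
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.start();
    }
}
```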

All the source code is here.

We are going to test 3 implementations:

  1. Spring Boot (Tomcat) + Project Loom
  2. Spring WebFlux
  3. Spring WebFlux + Project Loom

Test workflow

Hardware:

  • All tests run on AWS
  • Our service runs on a t2.micro node (1 vCPU, 1 GB)
  • The third-party service runs on a t2.medium node (2 vCPU, 4 GB)
  • Load is generated from a separate EC2 instance

Tomcat + Loom implementation

To make a Spring Boot application running on Tomcat use Project Loom, we customize it as follows:

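```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.boot.web.embedded.tomcat.TomcatProtocolHandlerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.AsyncTaskExecutor;
import org.springframework.core.task.support.TaskExecutorAdapter;

@Configuration
public class Config {

    // Replaces Spring Boot's default "applicationTaskExecutor" (used for @Async and
    // async Spring MVC request processing) with one virtual thread per task
    @Bean
    AsyncTaskExecutor applicationTaskExecutor() {
        ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor();
        return new TaskExecutorAdapter(executorService);
    }

    // Makes Tomcat handle each request on its own virtual thread instead of a
    // thread from its bounded platform-thread pool
    @Bean
    TomcatProtocolHandlerCustomizer<?> protocolHandlerVirtualThreadExecutorCustomizer() {
        return protocolHandler -> protocolHandler.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
    }
}
```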

Controller

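```java
import java.net.URI;
import java.net.URISyntaxException;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class Controller {
    private final RestTemplate restTemplate = new RestTemplate();
    private final String host = "http://test:7000/address/";

    @GetMapping("/address/{timeout}")
    String getAddress(@PathVariable long timeout) throws URISyntaxException {
        URI uri = new URI(host + timeout);
        // Plain blocking HTTP call, executed on the request's virtual thread
        return restTemplate.getForObject(uri, String.class);
    }
}
```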

WebFlux implementation

For the WebFlux implementation, I use WebClient as the HTTP client. To support a large number of concurrent requests, I changed its default connection pool settings.

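```java
import java.time.Duration;
import java.time.temporal.ChronoUnit;

import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

@RestController
public class Controller {

    private final WebClient webClient = init();
    private final String host = "http://test:7000/address/";

    private WebClient init() {
        String connectionProviderName = "myConnectionProvider";
        // Enlarge the connection pool so that up to 10k downstream requests can be in flight
        HttpClient httpClient = HttpClient.create(ConnectionProvider.builder(connectionProviderName)
                .maxConnections(10_000)           // max open connections in the pool
                .pendingAcquireMaxCount(10_000)   // max requests queued while waiting for a connection
                .pendingAcquireTimeout(Duration.of(100, ChronoUnit.SECONDS))
                .build()
        );
        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient)).build();
    }

    // Non-blocking call to the downstream service, failing if no reply arrives within 200 s
    private Mono<String> getAddressInternal(long timeout) {
        return webClient.get()
                .uri(host + timeout)
                .exchangeToMono(clientResponse -> clientResponse.bodyToMono(String.class))
                .timeout(Duration.ofSeconds(200));
    }

    @GetMapping("/address-reactive/{timeout}")
    Mono<String> getAddress(@PathVariable long timeout) {
        return getAddressInternal(timeout);
    }

}
```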

WebFlux + Project Loom implementation

For this implementation, we write ordinary blocking code but run it on a virtual thread created by Executors.newVirtualThreadPerTaskExecutor(). We then only need to wrap the result of that blocking code in a Mono.

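```java
// Added to the same Controller class; also needs imports for
// java.util.concurrent.CompletableFuture, ExecutorService and Executors.

// One shared executor that starts a new virtual thread for every submitted task
private final ExecutorService virtualThreadExecutor = Executors.newVirtualThreadPerTaskExecutor();

@GetMapping("/address-loom/{timeout}")
Mono<String> getAddressWithLoom(@PathVariable long timeout) {
    return Mono.fromFuture(CompletableFuture.supplyAsync(() ->
            // Any blocking code can go here (e.g. Hibernate). It blocks a cheap virtual
            // thread, not a Netty event-loop thread, so calling block() is allowed here.
            getAddressInternal(timeout).block(),
            virtualThreadExecutor));
}
```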
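The same idea can be tried without Spring. In this sketch, the class BlockingToFuture and its methods are hypothetical: blockingCall stands in for any blocking client such as RestTemplate.getForObject, and each call runs on its own virtual thread through CompletableFuture.supplyAsync, which produces the kind of future the endpoint above hands to Mono.fromFuture. It assumes Java 21 (Thread.isVirtual() is used).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class BlockingToFuture {

    // Hypothetical stand-in for a blocking client call such as RestTemplate.getForObject
    static String blockingCall(long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "address after " + delayMillis + " ms, virtual=" + Thread.currentThread().isVirtual();
    }

    // Runs each blocking call on its own virtual thread and exposes it as a CompletableFuture
    public static List<String> callConcurrently(int requests, long delayMillis) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<CompletableFuture<String>> futures = IntStream.range(0, requests)
                    .mapToObj(i -> CompletableFuture.supplyAsync(() -> blockingCall(delayMillis), executor))
                    .toList();
            // join() waits for every call; the calls overlap while they are blocked
            return futures.stream().map(CompletableFuture::join).toList();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = callConcurrently(10_000, 500);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " calls finished in " + millis + " ms");
    }
}
```

Because the blocked calls overlap, the total wall time should be on the order of a single delay rather than requests × delay.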

Performance tests result

Tomcat with Loom does not perform well enough: throughput is low because of high GC activity (~50% CPU).

Tomcat + Loom for 4,000 users

Tomcat with Loom could not handle more than 4k concurrent users; with 8k users it failed with an OutOfMemoryError (OOM):

Tomcat + Loom OOM for 8k users

I analyzed the heap dump: almost all memory is retained by Tomcat's SocketWrapper objects. This is easy to explain: Tomcat was designed for the thread-per-request model, and its socket wrappers are too heavy to be used with Loom at this scale.

Tomcat + Loom: heap dump report

Let’s compare the WebFlux and WebFlux + Loom profiles:

WebFlux: 8k users profile

WebFlux + Loom: 8k users profile

Memory, CPU, and GC usage are the same for both, which is why the throughput is almost identical. At 10k users, WebFlux + Loom is even better!

Summary

Project Loom is a game changer. These tests show that it is very efficient: it lets us write simple, blocking code that can be as fast as reactive/non-blocking code. This means we can migrate our blocking code to Loom and keep using blocking libraries such as Hibernate. However, we still need Netty + WebFlux as a wrapper for that blocking code, because Tomcat is not designed for this model.

P.S. Project Loom limitations

Currently, Loom has one important limitation.

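The carrier (OS) thread is pinned, meaning it cannot be released while the virtual thread blocks, if the virtual thread is:

  • Executing native code (a native method or foreign function call)
  • Inside a synchronized block or method. Solution: replace it with ReentrantLock, and run with -Djdk.tracePinnedThreads=full to find where pinning happens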

This means that existing libraries need to be fixed by replacing synchronized blocks with ReentrantLock; see, for example, https://github.com/pgjdbc/pgjdbc/issues/1951
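The fix described above can be sketched as follows. This is our own illustration, not code from pgjdbc or any other library; the class PinningDemo and its methods are hypothetical. Both variants guard the same critical section, which blocks (Thread.sleep stands in for I/O). On JDK 19 through 23, a virtual thread that blocks inside synchronized pins its carrier thread, and running with -Djdk.tracePinnedThreads=full prints a stack trace where that happens; in the ReentrantLock variant the virtual thread can unmount instead. It assumes Java 21 (JDK 24 lifts the synchronized limitation via JEP 491).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

public class PinningDemo {

    private static final Object MONITOR = new Object();
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static int counter = 0;

    // Blocking while holding a monitor: on JDK 19-23 this pins the carrier thread
    static void incrementSynchronized() throws InterruptedException {
        synchronized (MONITOR) {
            Thread.sleep(10);
            counter++;
        }
    }

    // Same critical section guarded by ReentrantLock: the virtual thread parks and frees its carrier
    static void incrementWithLock() throws InterruptedException {
        LOCK.lock();
        try {
            Thread.sleep(10);
            counter++;
        } finally {
            LOCK.unlock();
        }
    }

    public static int run(int tasks, boolean useLock) throws Exception {
        counter = 0;
        List<Future<?>> futures = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(() -> {
                    if (useLock) {
                        incrementWithLock();
                    } else {
                        incrementSynchronized();
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for the task and makes its update visible to this thread
            }
        }
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // Run with -Djdk.tracePinnedThreads=full to see where the synchronized variant pins
        System.out.println("synchronized: " + run(100, false));
        System.out.println("ReentrantLock: " + run(100, true));
    }
}
```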
