AWS Lambda battle 2021: performance comparison for all languages (cold and warm start)

Let’s compare the performance of all supported runtimes + 2 custom runtimes (Rust and GraalVM).

We will compare both cold and warm starts.

Source code is here: https://github.com/Aleksandr-Filichkin/aws-lambda-runtimes-performance. It requires minimal local setup (almost everything is Dockerized).

  • NodeJs (14.x)
  • Python (3.9)
  • Go (1.x)
  • Ruby (2.7)
  • .Net (3.1)
  • Java (11)
  • Rust (1.54.0)
  • GraalVM (21.2)

Disclaimer:

All benchmarks were performed in September 2021.

I’m not an expert in all of these languages, and I’m happy to see pull requests with performance improvements in the GitHub repo. I’m going to maintain this repo and rerun the performance tests every 3 months. I believe in open-source collaboration :)

Test scenario

We are going to test the API Gateway -> AWS Lambda -> DynamoDB flow.

We will test only the POST endpoint, which saves a book into a DynamoDB table in a known AWS region (us-east-2).
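To make the flow concrete, here is a minimal Python sketch of what such a POST handler could look like. It is not the code from the repo: the table name, the book fields, and the `make_handler` helper are assumptions made for illustration, and the storage call is injected so the logic can be read (and run) without AWS access.

```python
import json
import uuid

REGION = "us-east-2"   # region named in the article
TABLE_NAME = "book"    # hypothetical table name, not taken from the repo


def build_item(event):
    """Turn an API Gateway proxy event into a plain dict to be stored."""
    body = json.loads(event.get("body") or "{}")
    return {
        "id": str(uuid.uuid4()),           # hypothetical key attribute
        "name": body.get("name", ""),      # hypothetical book fields
        "author": body.get("author", ""),
    }


def make_handler(put_item):
    """Build a Lambda-style handler around any callable that stores one item."""
    def handler(event, context):
        item = build_item(event)
        put_item(item)
        return {"statusCode": 200, "body": json.dumps(item)}
    return handler


# In a real deployment the callable could come from boto3, for example:
#   import boto3
#   table = boto3.resource("dynamodb", region_name=REGION).Table(TABLE_NAME)
#   handler = make_handler(lambda item: table.put_item(Item=item))
```

Injecting `put_item` is only a convenience for the sketch; a real function would normally create the DynamoDB client once at module level and call it directly.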

The main flow

Cold start test

I did my best to reduce the cold start time:

  • Removed unused dependencies.
  • Moved as much as possible to the initialization phase (for example, in Java, moved everything to static fields) to use the CPU burst on startup.
  • Specified the AWS region explicitly.
  • Got rid of all DI frameworks.
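The initialization-phase idea in the list above can be sketched as follows. This is an illustration, not the repo's code: `FakeClient` is a hypothetical stand-in for an SDK client that is expensive to construct, and the region is hard-coded to mirror the "specified the region" step.

```python
REGION = "us-east-2"  # set explicitly instead of being resolved at runtime


class FakeClient:
    """Hypothetical stand-in for an SDK client that is costly to build."""
    instances = 0  # counts how many times a client was constructed

    def __init__(self, region):
        FakeClient.instances += 1
        self.region = region

    def put(self, item):
        return {"ok": True, "region": self.region}


# Init phase: module-level code runs once per execution environment,
# so the client is built during the cold start and then reused.
CLIENT = FakeClient(REGION)


def handler(event, context):
    # Invoke phase: only the per-request work happens here.
    return CLIENT.put(event)
```

Calling `handler` many times in the same environment leaves `FakeClient.instances` at 1, which is the effect the Java "move everything to static" step aims for.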

For detailed information about cold starts, read here.

Cold start result
  • All languages (except Java and .Net) have a pretty small cold start.
  • Java cannot even start with 128 MB; it needs more memory. GraalVM can help in this case. Feel free to read a detailed page about GraalVM and AWS Lambda.
  • Rust beats all runtimes in every setup; the only exception is 128 MB, where Python is the best.
  • The largest memory setup helps only Java and .Net.

Warm test

The test sends 15,000 requests to each Lambda, one by one.
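"One by one" means the requests are strictly sequential: the next request is sent only after the previous one has returned. The sketch below illustrates that idea in Python; it is not the tool used for the actual test, `send` is a hypothetical callable standing in for the HTTP POST, and the timing is measured on the client side, which may differ from the durations shown in the charts.

```python
import time


def run_sequential(send, total):
    """Call send(i) `total` times, one after another.

    Returns a list of (start_seconds, duration_ms) samples.
    """
    samples = []
    for i in range(total):
        started = time.perf_counter()
        send(i)  # blocks until the request finishes
        elapsed_ms = (time.perf_counter() - started) * 1000.0
        samples.append((started, elapsed_ms))
    return samples
```

With a real client, `send` would issue the POST request to the API Gateway endpoint and `total` would be 15000.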

For the load test, I’m using JMeter. It looks like:

  • The average (per minute) duration for each language (256 MB setup; a short 128 MB result can be found at the end)
  • The maximum (per minute) duration for each language (256 MB setup)
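The two per-minute metrics listed above can be computed from raw samples as in the following sketch. It assumes each sample is a `(timestamp_seconds, duration_ms)` pair; the function name and the input shape are illustrative, not something taken from the test setup.

```python
from collections import defaultdict


def per_minute_stats(samples):
    """Group (timestamp_seconds, duration_ms) samples by minute.

    Returns {minute_index: (average_ms, maximum_ms)}.
    """
    buckets = defaultdict(list)
    for timestamp, duration in samples:
        buckets[int(timestamp // 60)].append(duration)
    return {
        minute: (sum(values) / len(values), max(values))
        for minute, values in sorted(buckets.items())
    }


# Two samples fall into minute 0 and one into minute 1.
print(per_minute_stats([(0, 10.0), (30, 20.0), (61, 5.0)]))
# {0: (15.0, 20.0), 1: (5.0, 5.0)}
```

The average smooths out individual slow calls, while the maximum exposes them, which is why the two charts are shown side by side for each runtime.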

NodeJS

NodeJS behaves as expected.

It is slow at first, but it gets better after JIT optimization:

NodeJS 256MB average duration
NodeJS 256MB maximum duration

Python

Python has stable performance: the 100th and 15,000th invocations take the same time.

Python 256MB average duration
Python 256MB maximum duration

Ruby

I observed very odd behavior for Ruby: the average duration keeps growing (it looks like a memory leak or a bug in the code).

Ruby 256MB average duration
Ruby 256MB maximum duration

.NET

The first ~1k invocations are slow, but then it has very good performance:

.Net 256MB average duration
.Net 256MB maximum duration

Golang

Stable, brilliant performance:

Golang 256MB average duration
Golang 256MB maximum duration

Java

The first ~1k iterations are slow, then it becomes faster (the C1 JIT compiler helps).

Java 256MB average duration
Java 256MB maximum duration

For Java, I expected C2 JIT optimization to kick in after 10k iterations, but there is no further optimization even after 20k invocations, and the duration stays the same. See the screenshot below:

Java 256 MB, no C2 optimization.

GraalVM

As expected, GraalVM has stable good performance from the very beginning.

GraalVM 256MB average duration
GraalVM 256MB maximum duration

Rust

Rust has consistently awesome performance.

Rust 256MB average duration
Rust 256MB maximum duration

All together

It’s very tricky to measure average performance because every new Lambda gives slightly different results (I believe this is because Lambdas are allocated on different hardware). I ran the test 3 times with a 30-minute delay between runs to get 3 different Lambda allocations.

5K iterations for 3 time slots (256 MB Lambda)
256 MB Lambda

I also tested the same flow with a 128 MB Lambda, and here we can see a big difference.

128MB average warm state
128MB maximum(per minute) warm state

I assume that for a CPU-intensive flow the difference between compiled and interpreted languages would be much bigger. My guess is that GraalVM doesn’t perform well at 128 MB because it still has a JVM-like runtime inside, needs too much memory, and so performs GC too often.

Conclusion:

  • All languages (except Java and .Net) have a pretty small cold start.
  • Java cannot even start with 128 MB; it needs more memory. GraalVM can help in this case.
  • Rust beats all runtimes in every setup for cold start; the only exception is 128 MB, where Python is the best.
  • Golang and Rust are the winners. They have the same brilliant performance.
  • .Net has almost the same performance as Golang and Rust, but only after 1k iterations (after JIT).
  • GraalVM has stable, great performance, almost the same as .Net and a bit worse than Rust and Golang, but it doesn’t perform well in the smallest setup.
  • Java comes next after GraalVM. Like .Net, Java needs some time (1–3k iterations) for JIT (C1). Unfortunately, for this particular use case, I was not able to achieve the expected great performance after JIT C2 compilation. Maybe AWS just disabled it.
  • Python has stable, good performance but is too slow at 128 MB.
  • Ruby has almost the same performance as Python, but the duration starts growing after 20 minutes of invocations (after 15k iterations).
  • NodeJs is the slowest runtime. After some time it gets better (JIT?) but is still not good enough. In addition, NodeJS has the worst maximum duration.

The overall cold and warm start winners are Golang and Rust. They are always faster than the other runtimes and demonstrate very stable results.

Check my next performance comparison for AWS Lambda: x86 vs ARM https://filia-aleks.medium.com/aws-lambda-battle-x86-vs-arm-graviton2-perfromance-3581aaef75d9

Java, AWS expert