Finding the optimal values for the requests and limits of a microservice by performing load tests.
Load testing generally refers to the practice of modeling the expected usage of a software program by simulating multiple users accessing the program concurrently.
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory.
Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can provide at least that amount of the resource.
Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted.
CPU resources are defined in millicores. If your container needs two full cores to run, you would put the value "2000m". If your container only needs ¼ of a core, you would put a value of "250m".
One thing to keep in mind about CPU requests is that if you put in a value larger than the core count of your biggest node, your pod will never be scheduled.
Unless your app is specifically designed to take advantage of multiple cores, it is usually a best practice to keep the CPU request at 1 core ("1000m") or below, and run more replicas to scale it out. This gives the system more flexibility and reliability.
CPU is considered a "compressible" resource. If your app hits its CPU limit, Kubernetes will start throttling the container (in reality, if limit level is low, throttling can start much earlier, even before hitting requests level). Throttling means the CPU will be artificially restricted, giving your app potentially worse performance! However, it won’t be terminated or evicted.
Memory resources are defined in bytes. Normally, you give a mebibyte value for memory.
Just like CPU, if you put in a memory request that is larger than the amount of memory on your nodes, the pod will never be scheduled.
Unlike CPU resources, memory cannot be compressed. Because there is no way to throttle memory usage, if a container goes past its memory limit it will be terminated.
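Putting the two together, requests and limits are declared per container in the pod spec. A minimal sketch follows; the actual values are illustrative and must come from your own load tests:

```yaml
resources:
  requests:
    cpu: 250m        # guaranteed ¼ of a core
    memory: 128Mi
  limits:
    cpu: 1000m       # CPU usage is throttled above this
    memory: 256Mi    # the container is terminated if it goes past this
```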
For further information check out this article:
To test a microservice, a series of operations need to be applied to load and stress that specific part of the project.
Usually, this process consists of:
Understanding the very purpose of the tested microservice
Finding specific APIs using that service
Creating multiple requests to those APIs
Testing the microservice-gateway
This service is responsible for the orchestration of API decorators.
In the Console Design area you can create a few microservices to implement decorators and make various requests to the endpoints associated with those services.
In order to reproduce the interaction of multiple users, we can leverage different load-testing tools, among which:
In this tutorial, we will focus on Locust as our main load testing tool. However, the following explanation can be useful even when using any other testing tool.
Locust is a Python library for mocking users making requests to your application.
To configure the types of actions and access points that need to be tested, Locust leverages a configuration file referred to as the Locust file.
A Locust file makes use of classes mocking different types of users, and each class defines a series of tasks to be reproduced by that user.
Further information regarding the creation of Locust files can be found in the official documentation.
Here’s an example of a Locust file instance:
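Below is a minimal sketch consistent with the tasks described later in this tutorial; the class name, endpoint paths, and client-key header are illustrative assumptions, not the actual APIs of your project:

```python
from locust import FastHttpUser, task, between

class ConsoleUser(FastHttpUser):
    # Simulated users wait between 1 and 1.5 seconds after each task
    wait_time = between(1, 1.5)

    @task(1)
    def crud_service(self):
        # Retrieves the my_collection collection from the database
        # (the endpoint path is an assumption)
        self.client.get("/v2/my_collection/")

    @task(2)
    def authorization_service(self):
        # Performs a login with an appropriate client-key
        # (the endpoint and header name are assumptions)
        self.client.post(
            "/api/oauth/token",
            headers={"client-key": "YOUR_CLIENT_KEY"},
        )
```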
You can use this Locust file as a template to test your application.
In this tutorial we will create a load-testing folder and save the file as my-locustfile.py
The FastHttpUser class defines a fast and lightweight user implemented on top of geventhttpclient. The wait_time property, instead, makes the simulated users wait between 1 and 1.5 seconds after each task.
We defined two tasks by decorating functions with the @task decorator and assigning a specific weight between brackets:
crud_service will be used to test services responsible for retrieving the collection my_collection from the database (please keep in mind that operating on a database during a load test may require appropriate database scaling)
authorization_service will be used to test a login operation with an appropriate client-key
Tasks are picked at random; however, the weight determines the probability of each task being executed. In this configuration, Locust will be twice as likely to pick authorization_service as crud_service.
Once you have created a Locust file, you need to launch a Locust instance to test your application. You can execute a Locust instance locally by calling:
Following the previous example you can execute:
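Assuming the folder and file names chosen above, the command would be:

```shell
locust -f load-testing/my-locustfile.py
```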
Now, you can access your Locust instance by connecting to http://localhost:8089, the default Locust port.
This will prompt a new Locust window where you can specify:
- users: the total number of users testing your application. Each user opens a TCP connection to your application and tests it.
- spawn rate: the number of users added each second, until the total number of users is reached.
- host: the host of the application you wish to test.
Later in the process we will learn how to start a Locust instance from the cluster. However, launching a Locust instance locally is a necessary step to assess the correctness of your test configuration.
Select a value for users and spawn rate. In this example we choose 50 users and a spawn rate of 10.
When executing a Locust instance locally, it might be convenient to access the desired host using localhost. In our case, this might be easily achieved connecting to an existing cluster and using port forwarding to redirect a service to our local machine:
For instance, to redirect the service api-gateway from development namespace you can execute:
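A sketch of the port-forward command; the service's remote port (80 here) is an assumption and should be replaced with the port your api-gateway actually exposes:

```shell
kubectl port-forward svc/api-gateway 8080:80 -n development
```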
so you can use
http://localhost:8080 as your host value.
You can now click Start Swarming and begin your test.
Here’s what a working example of a Locust swarm test should look like. Once you have made sure your test configuration is correct, you can move to the execution of a swarm test directly from the cluster.
On most occasions, it will be useful to have access to cluster resources to run your tests (mocking thousands of requests per second can be pretty computationally demanding!).
In this case, an instance of Locust needs to be run directly from the cluster.
This tutorial requires the usage of a preconfigured namespace.
In order to continue with the tutorial, you may need to ask your Mia Platform referent to give you access to a namespace created for load testing purposes.
In this tutorial we will refer to a specific namespace called load-testing, however, the following explanation is valid for any namespace on K8s.
First, you need to make sure you have a properly configured kubectl connection to the cluster.
You can visualize your current configuration running:
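For example:

```shell
kubectl config current-context   # name of the active context
kubectl config view --minify     # full configuration of the active context
```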
Once you are connected to the cluster, you can create a config map containing your Locust instance with the following command:
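A sketch of the command, where YOUR_LOCUST_FILE, the file path, and YOUR_NAMESPACE are placeholders to replace with your own values:

```shell
kubectl create configmap YOUR_LOCUST_FILE --from-file=my-locustfile.py -n YOUR_NAMESPACE
```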
This command will create a config map called YOUR_LOCUST_FILE on your K8s namespace.
You can later edit your configmap with:
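For example (placeholders as before):

```shell
kubectl edit configmap YOUR_LOCUST_FILE -n YOUR_NAMESPACE
```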
or overwrite with:
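A common sketch for overwriting an existing config map from a file (placeholders as before):

```shell
kubectl create configmap YOUR_LOCUST_FILE --from-file=my-locustfile.py -n YOUR_NAMESPACE \
  --dry-run=client -o yaml | kubectl apply -f -
```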
If you are using the load-testing namespace and you want to create a my-loadtest-locustfile config map, you can execute:
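A sketch of the command, assuming the file path used earlier in this tutorial:

```shell
kubectl create configmap my-loadtest-locustfile \
  --from-file=load-testing/my-locustfile.py -n load-testing
```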
and subsequently overwrite my-loadtest-locustfile with:
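A sketch of the overwrite, recreating the config map from the updated file:

```shell
kubectl create configmap my-loadtest-locustfile \
  --from-file=load-testing/my-locustfile.py -n load-testing \
  --dry-run=client -o yaml | kubectl apply -f -
```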
To apply these modifications, it might be necessary either to create a new pod or to restart the deployment mounting the config map you previously created.
You can create a new pod by deleting the existing one. Kubernetes will automatically start a new pod.
For instance, if you are using load-testing namespace you can execute:
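For example:

```shell
kubectl get pods -n load-testing
```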
to retrieve the names of the existing pods, then by executing
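For example, using the pod name mentioned below:

```shell
kubectl delete pod locust-worker-7f9f8c955f-l8nc4 -n load-testing
```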
you can kill the pod
locust-worker-7f9f8c955f-l8nc4 and create a new one.
The easiest way to restart a deployment is to set its replicas to 0 and then back to 1:
You can set replicas of a specific deployment by using:
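A sketch of the command, where YOUR_DEPLOYMENT, N, and YOUR_NAMESPACE are placeholders:

```shell
kubectl scale deployment YOUR_DEPLOYMENT --replicas=N -n YOUR_NAMESPACE
```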
When using load-testing, you can perform this operation with the following two commands:
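A sketch of the two commands; the deployment name locust is an assumption and should be replaced with the name of the deployment mounting your config map:

```shell
kubectl scale deployment locust --replicas=0 -n load-testing
kubectl scale deployment locust --replicas=1 -n load-testing
```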
Finally, you will need to access your Locust instance from your local machine.
The most straightforward approach is to redirect the cluster instance to a local port, which can be easily achieved with port forwarding.
If you are using load-testing namespace you can execute:
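A sketch of the port-forward; the deployment name locust is an assumption:

```shell
kubectl port-forward deployment/locust 8089:8089 -n load-testing
```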
In this example we chose the default Locust port 8089; however, you can use any port to achieve this result, as long as it matches the local port you connect to.
Now you should be able to visualize the Locust interface:
In our tests, our goal is to identify requests and limits for multiple user categories.
We selected four levels of users representing different user bases. Tests should therefore be performed for each category and will produce a different output for each one.
To model an increasing number of users interacting with the console, we will keep spawning a small number of new users every second. This will help us test realistic scenarios where the load on microservices builds up over time. The closer the spawn rate is to the total number of users, the steeper the curve in reaching that value.
Finally, we exploit Kubernetes DNS records for services and pods, so we can contact services with consistent DNS names instead of IP addresses.
In our tests we will use the following values:
- spawn rate: the value selected for users, or less (typically 5 - 10 users per second)
- host: the service DNS name; e.g. for the api-gateway running on the namespace "my-namespace":
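Following the standard Kubernetes service DNS pattern (`service.namespace.svc.cluster.local`), the host would be:

```
http://api-gateway.my-namespace.svc.cluster.local
```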
A useful tool that can be used to track system performance is Grafana.
For the cluster handled by Mia Platform, a series of dashboards has been set up, ready to use for testing purposes.
First, you have to connect to https://grafana.mia-platform.eu
Here you will be able to visualize CPU and memory usage for each microservice during your tests.
Once you have started a test, you can use Grafana dashboards to monitor the advancement of the test and report your results.
Please notice that, in order to be considered valid for inferring requests and limits, a test execution should be long enough for CPU and memory usage to have reached a stable situation, a plateau.
The most important graphs you need to take into consideration are:
- CPU Usage
- CPU Throttling
- Memory Usage
In particular, you should always keep track of maximum values.
Once a test is considered to have reached a plateau, with stable CPU and memory levels, you should report the following information from your setup:
- How the test has been set up (e.g. requests to api-gateway + authorization service (no login), with requests and limits set as returned by the previous test)
- Time range of the test execution (e.g. 22/04/2021 16.00 - 16.30)
- Requests and limits set for the microservice
- Number of users spawned
- Number of requests per second (RPS) served by the console
- Number of failures
- Maximum CPU Throttling level
- Maximum CPU consumption reached during the test
- Maximum Memory consumption reached during the test
Additionally, you can collect reports from the tools you used for the test execution:
- Locust allows you to download, from the Download Data tab, a report in HTML format and CSV files regarding requests, failures, and exceptions
- Grafana allows you to create a link or a snapshot of the current dashboard using the share button.
From the data you previously collected you can directly infer if requests and limits have been correctly set for the selected microservice.
There are a few constraints that should be taken into consideration before selecting the next test parameters:
- Requests per second should match almost exactly the number of users spawned (< 0.5% difference)
- Failures should be almost 0
If one of these two conditions is not met during the execution, the test should be stopped and restarted.
If a test fails for one of the aforementioned reasons, you should look for a bottleneck in your test setup. This result could be linked to other microservices not being correctly configured.
If you successfully passed the previous constraints, you should be able to determine the correctness of requests and limits using these rules:
- CPU usage should always stay at or below the specified CPU request level
- CPU throttling should be 0
- Memory usage should always stay at or below the specified memory request level
If all these conditions are met, you correctly identified request and limit levels!
Here’s an example of a successful test configuration:
In most test cases, one of these conditions will fail. In that case, you need to correctly interpret the outcomes to update the requests and limits for the next test configuration.
If the maximum CPU/memory usage exceeds the request level, you should consider updating the request level to that specific value.
Accordingly, the level of the limits should be updated to be around 3-4 times the value specified for the requests.
Here’s an example of a test configuration where the CPU levels should be updated:
In particular, for the next test setup we will have:
- CPU requests updated from 230 to 380
- CPU limits updated from 800 to 1200