Behind the Scenes at KernelCare: How We Test Patches Before Release

October 23, 2020 - TuxCare PR Team

Testing is essential for any software update, including patches, and it matters even more when changes are made to critical infrastructure that powers revenue-impacting services. Releasing security updates that are not thoroughly tested can result in kernel crashes, operating system reboots, and system- or service-level failures. Some of these outcomes are critical and some merely unpleasant, but all of them can hurt your business and service-level agreements. KernelCare has a strict testing process that every patch must go through before it’s deployed to production, and this article details how we ensure customer infrastructure reliability and uptime after patch deployments.

Patching Delays are Risky

It’s essential that all software, including the Linux kernel, is patched when developers release a security update. It’s even more critical to patch as soon as possible when a vulnerability is published with exploit code. The longer servers stay unpatched, the greater the risk of becoming the next large data breach victim. It’s not uncommon for administrators to test a patch and then wait for a scheduled date before applying it. Within this timeframe, an attacker could scan the server, find the vulnerability and exploit it.

Leaving patch management to KernelCare removes much of the overhead for administrators, but our clients need reassurance that proper testing is done before they allow a third party to deploy to critical infrastructure. KernelCare’s patches go through rigorous testing before deployment to our client servers. Our testing relieves large organizations of much of this administrative work, so you no longer need several machines with the various vendor operating systems installed to ensure a clean, bug-free patching experience.

KernelCare Development and Testing

Buggy code can cause several issues, including the introduction of new vulnerabilities. A good example of why it’s so important to test patches before deployment is the trivially exploitable vulnerability discovered in May 2020 in a patch set named Huawei Kernel Self Protection. The patch set was supposed to offer a series of security-hardening options for the Linux kernel. Instead, it opened the operating system to possible backdoors, lacked a clear threat model, and allowed for disclosure of kernel memory. Because the patch was tested, the vulnerability was caught before it could weaken the security of the kernel. This is one example of why it’s critical to test patches before they’re installed, and KernelCare performs several types of testing before deployment to our client servers.

To help our customers understand the high-level testing we require from our teams, we’ve detailed the process below.

We provision a bare-metal physical server or a virtual machine with the target operating system. If the patch for a specific CVE (Common Vulnerabilities and Exposures) entry targets a specific hardware component (e.g. a CPU vulnerability), we provision the corresponding hardware component on the server. For a kernel-based virtual machine (KVM), we provision nested virtual machines and the necessary hardware features.
We pull the kernel that will be used for testing from the vendor and install it on our provisioned test server.
We reboot the server into the newly installed kernel.
We install KernelCare in the same way a customer would. We publish the instructions here.
We execute the KernelCare patching command ($ /usr/bin/kcarectl --update).
We load the kernel modules created for the patch.
We validate and test all functionality changed by the module.
If exploit code is available, we use it to try to reproduce the vulnerability on the server and verify that the patches remediate it.
Run our four tests (see below).
If patches pass our testing, we deploy them to production.
If tests are unsuccessful, we analyze logs to find the cause of failure, remediate the issue, and then repeat these steps until patches pass our tests.
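
The pass-or-repeat logic of the steps above can be sketched in a few lines of Python. This is only an illustration of the control flow, not KernelCare’s actual tooling: the names `first_failure`, `release_gate` and `remediate`, and the `max_attempts` limit, are hypothetical and do not come from our test infrastructure.

```python
from typing import Callable, List, Optional, Tuple

# A step is a (name, action) pair; the action returns True when the step succeeds.
Step = Tuple[str, Callable[[], bool]]


def first_failure(steps: List[Step]) -> Optional[str]:
    """Run the steps in order and return the name of the first failing one, or None."""
    for name, action in steps:
        if not action():
            return name
    return None


def release_gate(
    steps: List[Step],
    remediate: Callable[[str], None],
    max_attempts: int = 3,  # hypothetical limit; the process above simply repeats until it passes
) -> bool:
    """Repeat the whole sequence until every step passes or attempts run out."""
    for _ in range(max_attempts):
        failed = first_failure(steps)
        if failed is None:
            return True  # every step passed, so the patch may be deployed
        remediate(failed)  # analyze logs and fix the cause, then start over
    return False
```

The point of the sketch is that a failure at any step sends the patch back to the start of the sequence rather than resuming from the failed step.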

KernelCare’s Four Types of Tests

Thorough testing is critical for successful deployments, and we know that bugs impact our customers. We have four tests that run simultaneously for each kernel to ensure patches don’t cause critical issues.

Before any patch is deployed, it goes through the following automated tests, depending on how quickly the patch is needed in production and the severity of the vulnerability:

We apply our patches to the running kernel and then unpatch it to test rollbacks. This process takes about 10 minutes per kernel.
We apply our patches and then perform a subset of tests from the Linux Test Project (LTP) test suite which add some load to the operating system. This test takes about 15 minutes per kernel.
The same as the previous test, but with the full set of LTP tests. This is a thorough process that takes 4 hours per kernel and includes:

Filesystem stress test
Storage I/O test
Memory management stress test
IPC stress test
Scheduler test
Commands functional verification tests
System call functional verification tests
etc

LTP extra. This is a high-load test with continuous patching and unpatching throughout. It takes 4 hours per kernel.
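
A patch-and-unpatch cycle like the one used in the rollback and LTP extra tests could look roughly like the following Python sketch. It assumes patches are applied with `kcarectl --update` and removed with `kcarectl --unload`; this article does not describe how our test harness triggers unpatching, so treat that choice, along with the helper names `run_command` and `patch_unpatch_cycles`, as assumptions made for illustration.

```python
import subprocess
from typing import Callable, Sequence

# Assumed commands: --update applies the latest patches, --unload removes them.
PATCH_CMD = ["/usr/bin/kcarectl", "--update"]
UNPATCH_CMD = ["/usr/bin/kcarectl", "--unload"]


def run_command(cmd: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(cmd), check=False).returncode == 0


def patch_unpatch_cycles(
    cycles: int,
    run: Callable[[Sequence[str]], bool] = run_command,
) -> int:
    """Apply and remove patches repeatedly; return the number of completed cycles."""
    completed = 0
    for _ in range(cycles):
        if not run(PATCH_CMD):
            break  # patching failed, stop and investigate
        if not run(UNPATCH_CMD):
            break  # rollback failed, stop and investigate
        completed += 1
    return completed
```

Passing the runner in as a parameter keeps the loop easy to exercise on a machine without KernelCare installed, for example `patch_unpatch_cycles(2, run=lambda cmd: True)`.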

Conclusion

To provide our clients with a stable and reliable patching service, one you can rely on and even forget exists on your servers, we test each patch on each affected kernel for at least 4 hours per kernel and only release patches when we are 100% sure they will not disturb your operations. With KernelCare, administrators can not only patch their systems with no reboot required, but they also know that these patches are thoroughly tested before installation.
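
As a quick check on the 4-hour figure, the per-kernel durations listed earlier can be combined as follows. The sketch assumes all four tests run for a given kernel and that "simultaneously" means they start together, so the elapsed time is set by the longest test; the dictionary keys and the function name `wall_clock_minutes` are illustrative only.

```python
# Approximate per-kernel durations in minutes, taken from the test list in this article.
DURATIONS_MIN = {
    "patch/unpatch rollback": 10,
    "LTP subset": 15,
    "LTP full": 4 * 60,
    "LTP extra": 4 * 60,
}


def wall_clock_minutes(durations: dict, parallel: bool = True) -> int:
    """Elapsed time if tests run in parallel (longest test) or one after another (sum)."""
    values = list(durations.values())
    return max(values) if parallel else sum(values)


print(wall_clock_minutes(DURATIONS_MIN))         # 240 minutes, i.e. 4 hours
print(wall_clock_minutes(DURATIONS_MIN, False))  # 505 minutes if run back to back
```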