A comprehensive guide to QEMU patching
When it comes to patching, thoroughness is a critical aspect – it takes just one unpatched service to open the doors to a damaging intrusion. The result is a long list of devices, services, and applications that need regular patching – including services that run in the background, such as QEMU.
QEMU (Quick EMUlator) is, of course, the popular virtualization tool that makes it more efficient to run large, complex workloads in the enterprise environment. Even though QEMU sits quietly working away in the background, it is still an essential service that must be patched consistently.
However, patching QEMU can be disruptive, which is why QEMU can sometimes end up at the back of the line when it comes to patching. In this article, we outline why patching QEMU is so important and point to the steps you can take to ensure QEMU is safely patched. We also cover a non-disruptive route to patching QEMU – live patching.
Contents:
1 – Why does QEMU patching matter – and why is it so tricky?
2 – Your guide to patching QEMU
3 – The quick and dirty – but careless – method
4 – A sysadmin approach
5 – Best-practice, enterprise-grade approach
6 – Live patching is an alternative
7 – Conclusion
Why does QEMU patching matter – and why is it so tricky?
Just about every application and service on a machine is a target for hackers – you’re guaranteed that if it’s live, it’ll get targeted. It doesn’t matter if it runs deep in the background doing an invisible job.
QEMU should be patched just like any other service, even if it’s not the most visible element of your workload. Go looking and you’ll find vulnerabilities for QEMU, just like for any other service and application that you use.
For example, in 2019 a vulnerability (CVE-2019-14835) was reported affecting QEMU-KVM virtual machines that use the vhost/vhost_net network backend. By forcing the virtual machine to migrate, an attacker can make the log of dirty pages overflow because the kernel does not check the bounds of the log.
Another QEMU vulnerability emerged in 2015, this time in QEMU's virtual floppy disk controller. It was called Venom (CVE-2015-3456), and it affected systems even where QEMU had not been configured with a floppy drive, simply because the floppy controller code is always active within QEMU.
Vulnerabilities that affect QEMU will continue to accumulate. These vulnerabilities are particularly pernicious, given that any vulnerability that affects your QEMU services will also affect the virtual machines running under QEMU – including your clients' virtual machines, for example.
That’s why you have to make sure that you patch QEMU just as consistently as any other tool in your tech arsenal. However, QEMU patching can sometimes be neglected because patching the virtual environment also affects the machines running on it.
In other words, depending on the avenue you choose for patching QEMU, you may find that your patching leads to significant downtime, which is something your clients or other stakeholders may not be entirely happy with.
Your guide to patching QEMU
There are a few ways to go about patching QEMU, and we walk you through your options in this section. The approach you take will determine how much risk you create with your efforts – and how much disruption your patching efforts entail.
It is worth noting that the approaches we list below – with the exception of live patching – are applicable to most virtualization solutions, including VMware, Xen, Hyper-V, and so forth, even if the steps might be slightly different.
The quick and dirty – but careless – method
If you’re in a rush you could opt to simply go ahead and apply a patch without doing much planning of any nature. These would be the steps that you take:
- You log into your host and update QEMU and its dependencies by using your package manager of choice.
- The package manager applies the patches and may or may not immediately restart affected services – which may also stop and restart running virtual machines.
- You restart QEMU to pick up the updates, effectively stopping all the virtual machines.
- Your virtual machine tenants and stakeholders are not happy that their work was affected, interrupted or destroyed.
- You restart the virtual machines, hoping they automatically restart their workloads.
Though you could take this approach if you are running a single-system, single-user home lab, you are guaranteed to get into deep trouble with your employer – or with your clients – if you try to patch live workloads in such a ramshackle manner. Simply put: try to patch QEMU this way and you are going to spend the rest of your week attempting to explain yourself and fixing your own and your colleagues' applications.
A sysadmin approach
If you’re a professional systems administrator you’ll know better, of course. In this section we talk about a more sensible way to try and patch QEMU.
- You start by planning a maintenance window, including estimating the time required, a list of affected virtual machines and the impact on other systems that interact with the virtual machines.
- You inform your stakeholders about the downtime and get authorization for the operation one month or so from now. How much notice you give will depend on how critical the vulnerability is – the more dangerous the vulnerability appears to be, the faster you will need to patch.
- Wait one month to start the maintenance, and hope that you haven’t been hacked in the meantime.
- You gracefully shut down all the virtual machines and use the package manager for your distribution to apply all the updates for QEMU.
- You restart QEMU, and you restart all the virtual machines.
- With everything going according to plan, you end the maintenance operation and inform your stakeholders that everything is back to normal.
- You spend the next couple of days troubleshooting – for example, tracking down the one virtual machine stuck in the boot process and fixing it.
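The shutdown, update, and restart steps above can be sketched as a small shell script. This is only a minimal illustration, assuming the virtual machines are managed through libvirt so that `virsh` is available; the variables `TIMEOUT_SECS` and `POLL_SECS` and all of the function names are our own, and the package-manager step is left as a placeholder because it differs between distributions.

```shell
# Sketch only: graceful shutdown, patch, restart on one libvirt-managed host.
# Assumed names (not from the article): TIMEOUT_SECS, POLL_SECS, all functions.
TIMEOUT_SECS="${TIMEOUT_SECS:-300}"   # how long to wait for guests to power off
POLL_SECS="${POLL_SECS:-5}"           # how often to re-check

# Print the names of all running guests, one per line (blank lines removed).
running_guests() {
    virsh list --name --state-running | sed '/^$/d'
}

# Ask every guest named in $1 to shut down, then wait until no guest is
# running. Returns non-zero if guests are still up when the timeout expires.
shutdown_guests() {
    local guest waited=0
    for guest in $1; do
        virsh shutdown "$guest"
    done
    while [ -n "$(running_guests)" ]; do
        if [ "$waited" -ge "$TIMEOUT_SECS" ]; then
            echo "guests still running after ${TIMEOUT_SECS}s" >&2
            return 1
        fi
        sleep "$POLL_SECS"
        waited=$((waited + POLL_SECS))
    done
}

# Start every guest named in $1 again.
start_guests() {
    local guest
    for guest in $1; do
        virsh start "$guest"
    done
}

# Example flow (commented out so that sourcing this file changes nothing):
#   guests="$(running_guests)"
#   shutdown_guests "$guests" || exit 1
#   ... apply the QEMU updates with your distribution's package manager ...
#   start_guests "$guests"
```

Recording the list of running guests before the shutdown means that only the machines that were up beforehand are started again afterwards. A guest that ignores the shutdown request is reported by the timeout rather than being powered off forcibly, which leaves that decision to you.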
You won’t upset as many people by following the above approach, and you won’t lose as much time trying to fix broken applications and services. While it will work in the enterprise environment, it’s not really an enterprise-grade approach and it won’t be acceptable under some conditions. In particular, with this approach, you’re still at risk of uncovering all sorts of unexpected problems along the way.
Best-practice, enterprise-grade approach
In technology as in life, there are always trade-offs, and that is the case for our best-practice method too. While the following method is the most secure and least disruptive way to patch QEMU, it is also the most involved route and will require a fair bit of resources from your technology team.
Nonetheless, if you want a reasonably non-disruptive method to patch your workloads that depend on QEMU, we suggest that you follow these steps:
- You start off by ensuring that your workloads are spread across several QEMU hosts. Splitting the workload this way also provides a form of redundancy in case one of the nodes fails.
- You put in place high availability mechanisms between virtual machines in different nodes, protecting both the host and the virtual workloads from failures.
- You plan the maintenance operation, taking into consideration key details – for example, that different high availability virtual machines that are part of the same service must not be on the same virtualization host when the patching process is initiated. This process is called “Orchestration”.
- Inform your stakeholders that there will be a maintenance operation and that they should expect reduced performance but that there should not be any service unavailability.
- You request authorization to go ahead with the operation, and because no service is expected to be unavailable, you receive this authorization immediately.
- You start the maintenance operation on the first QEMU node – which in turn consists of a number of steps:
A. You live migrate the virtual machines to other nodes, ensuring virtual machine nodes from the same high availability service are still kept separate because you would lose high availability in that service otherwise.
An example migration command would look like this (the --live flag keeps the virtual machine running while it is moved):
virsh migrate --live web1 qemu+ssh://desthost/system
There are many different ways to accomplish this; the command above is just one example.
B. You wait for the necessary data to transfer over the network. It is advisable to use a secondary network connection for migration traffic where one is available. You can select it by passing a migration URI as an extra argument:
virsh migrate --live web1 qemu+ssh://desthost/system tcp://10.0.0.1/
This migration phase can take quite long if there are many virtual machines, or if a large amount of RAM allocated to each migrated virtual machine has to be sent across the network.
- After all the virtual machines are migrated you stop QEMU.
- Using your package manager, you update QEMU and dependencies.
- You start QEMU.
- You live migrate the virtual machines back.
- You repeat the per-node steps, from live migrating the virtual machines away through to migrating them back, on each of the remaining virtualization nodes.
- You end the maintenance operation, informing your stakeholders that operations are restored to normal.
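The "drain one node" part of the steps above, including the rule that virtual machines from the same high availability service stay on different hosts, could be sketched roughly as follows. This is a simplified illustration and not a full orchestration tool: it assumes libvirt and `virsh`, and the `HA_GROUPS` and `DEST_HOSTS` variables, the host and guest names, and the function names are all hypothetical.

```shell
# Sketch only: live migrate every running guest off the local node, choosing a
# destination that does not already run a guest from the same HA group.
# Hypothetical names: HA_GROUPS, DEST_HOSTS, node2/node3, all functions.

# "guest:group" pairs; guests sharing a group must stay on different hosts.
HA_GROUPS="${HA_GROUPS:-web1:web web2:web db1:db db2:db}"
DEST_HOSTS="${DEST_HOSTS:-node2 node3}"

# Print the HA group of guest $1 (prints nothing if it has none).
group_of() {
    local pair
    for pair in $HA_GROUPS; do
        [ "${pair%%:*}" = "$1" ] && { echo "${pair#*:}"; return 0; }
    done
    return 0
}

# Succeed if host $2 already runs another guest from the same group as $1.
has_group_peer() {
    local group peer
    group="$(group_of "$1")"
    [ -n "$group" ] || return 1
    for peer in $(virsh -c "qemu+ssh://$2/system" list --name --state-running); do
        [ "$peer" != "$1" ] && [ "$(group_of "$peer")" = "$group" ] && return 0
    done
    return 1
}

# Print the first destination host that keeps guest $1 apart from its peers.
pick_destination() {
    local host
    for host in $DEST_HOSTS; do
        has_group_peer "$1" "$host" || { echo "$host"; return 0; }
    done
    return 1
}

# Live migrate all running guests away, stopping at the first failure.
drain_node() {
    local guest dest
    for guest in $(virsh list --name --state-running); do
        dest="$(pick_destination "$guest")" || {
            echo "no safe destination for $guest" >&2
            return 1
        }
        virsh migrate --live "$guest" "qemu+ssh://$dest/system" || return 1
        echo "$guest -> $dest"
    done
}
```

Running `drain_node` on the node about to be patched prints one line per migrated guest, which doubles as a record of where to migrate each machine back from. The sketch refuses to move a guest when no destination keeps it apart from its peers, on the assumption that a stalled maintenance operation is preferable to silently losing high availability.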
You have live-migrated your virtual machines, so there should be no disruption. There is one thing to watch out for though: some workloads are very sensitive to even minor packet loss during the migration operation, and in rare edge cases this packet loss can cause problems. Ensure that your stakeholders are aware of this risk, and that they confirm everything is indeed fine on each virtual machine.
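As a first sanity check before asking stakeholders to confirm, you could verify that every migrated virtual machine is at least reported as running on its new host. The sketch below assumes libvirt and `virsh` again; the `check_guests` function and its `guest@host` argument format are our own invention for illustration.

```shell
# Sketch only: confirm each guest reports "running" on the host it moved to.
# Hypothetical names: check_guests and the "guest@host" argument format.
check_guests() {
    local entry guest host state failed=0
    for entry in "$@"; do
        guest="${entry%@*}"
        host="${entry#*@}"
        # domstate prints the hypervisor's view of the guest, e.g. "running".
        state="$(virsh -c "qemu+ssh://$host/system" domstate "$guest" 2>/dev/null)"
        if [ "$state" != "running" ]; then
            echo "$guest on $host: ${state:-unknown}" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example (commented out): check_guests web1@node3 db1@node2
```

This only reflects the state the hypervisor reports. It says nothing about whether the applications inside each virtual machine coped with the migration, so it complements the stakeholder confirmation rather than replacing it.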
Live patching is an alternative
While our third suggestion is a safe and secure way to patch QEMU by migrating workloads, it is undoubtedly resource intensive, and it requires a high level of technical skill to offer high availability and therefore avoid downtime.
There is an alternative in the shape of TuxCare’s QEMUCare live patching agent. Instead of performing patching that takes down a QEMU instance, TuxCare’s patching solution applies code fixes while QEMU is live and running.
It is easy to get started with QEMU live patching from TuxCare. It is a one-time operation: you simply install the TuxCare Live Patch agent and add the appropriate license. You can accelerate installation across many servers by deploying the TuxCare agent using Ansible or another automation tool.
Once installed, TuxCare’s live patching tool will continuously patch your QEMU instances, without disrupting the virtual machines you’re running on QEMU.
Conclusion
We’ve taken a close look at the different options you have when you patch your QEMU instances. The first option really isn’t available to you unless you’re patching a test or experimental system. Depending on your resources and the complexity of your workloads you could choose the second or the third option – and most serious enterprise workloads would probably opt for the third.
Our final option, live patching from TuxCare, eliminates so many problematic steps and simplifies the whole process to such a degree that it’s not even in the same league as other ways of patching QEMU. Needless to say, live QEMU patching is worth considering.
Regardless of which patching option you choose, the point remains that patching is a critical part of SecOps and that you must patch thoroughly – touching every service in the process, including QEMU.