A Deep Dive on the xz Compromise

Joao Correia

April 2, 2024 - Technical Evangelist

xz is a widely distributed package that provides lossless compression for both users and developers, and is included by default in most, if not all, Linux distributions. Since its creation in 2009, the project has published numerous releases.

As an open-source project, it is available on GitHub. However, at the time of writing, visiting the project page greets you with a message stating that “this repository has been disabled due to a violation of the terms of service” instead of the usual GitHub page. The violation was the distribution of malware. In this article, we dig into the what, the why, the how, and perhaps even the “who” behind this incident.

When the story broke late in the week before Easter, the Twitter cybersec sphere exploded, with Internet sleuths obsessing over every detail. This article draws heavily on the information these investigations gathered and published, much of which is consolidated here, as well as on the author’s own research. Attribution is provided by linking back to the original sources where possible; where no direct source is linked, the reader can assume the information came from either the GitHub gist recapping the event or this timeline.

Keep in mind that this story is very recent. Some aspects are still unresolved, and there are hints at the (unlikely) possibility that the person originally believed to be behind the incident may not be responsible after all. This is discussed further in the caveat section of this article.

Buckle up, the rabbit hole goes deep in this one.

Historical Context

The xz project, like many other open-source projects, was established as the pet project of a single developer who had an idea and shared it with the world. In 2009, Lasse Collin, previously responsible for maintaining lzma-utils, another compression-related project, created xz. It was designed as an extension, or evolution, of lzma, to the extent that liblzma now ships as part of the xz package by default.

As is common with most one-person projects, the author was responsible for the bulk of the code and for managing the occasional commits from third parties, integrating the code into the repository, and publishing new versions. As the project gained attention, the burden of management became heavier, and, eventually, other contributors were recognized for their efforts and given permissions to manage tasks like reviewing code, accepting or rejecting submissions, and managing the repository.

Jia Tan, a developer who had previously contributed to the project, was granted commit and then release management permissions a little over two years ago. In retrospect, the process of adding a new maintainer seems to have been part of a social engineering campaign targeting Lasse Collin. The timeline of these email exchanges can be seen here.

Several different users (or perhaps a single individual behind multiple usernames) sent messages to the project’s mailing list accusing the maintainer of abandoning the project or losing interest. They complained about delays in accepting patches and releasing new versions, and kept varying their tactics until the project lead began to rely on a new, eager contributor. The email accounts behind these pressure tactics seem to have no presence on the Internet beyond that mailing list.

To say that xz is widely used is an understatement. xz is one of the compression formats you can use to package the Linux kernel on a system, making it indispensable for booting such a system. The compression ratios achieved and the decompression speed make it a perfect option for space-efficient releases, kernel images, and other components.

The Missing Time

In an ironic twist of fate, a software engineer at Microsoft, Andres Freund, was conducting timing measurements on Linux virtual machines to establish a baseline for an unrelated task when he began to notice some “noise” in the readings. The sshd process, responsible for accepting remote secure shell connections, was using an excessive amount of CPU even on failed connection attempts, and connections were delayed by roughly 500 ms. This unexplained delay led Freund to investigate further, using performance monitoring (perf) and debugging tools to identify the problem.
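The kind of check that surfaced this problem can be sketched generically: time an operation many times, take a robust statistic such as the median, and compare it against a known baseline. The Python below is a minimal illustration under that assumption; the function names and the 25% tolerance are hypothetical choices, not what Freund actually ran.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(operation: Callable[[], None], runs: int = 20) -> float:
    """Run `operation` repeatedly and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def is_regression(baseline_ms: float, current_ms: float,
                  tolerance: float = 0.25) -> bool:
    """Flag a slowdown larger than `tolerance` (25% by default) over the baseline."""
    return current_ms > baseline_ms * (1.0 + tolerance)


if __name__ == "__main__":
    # Hypothetical stand-ins: a trivial operation, then the same with 5 ms of extra work.
    baseline = median_latency_ms(lambda: None)
    current = median_latency_ms(lambda: time.sleep(0.005))
    print(f"baseline={baseline:.3f} ms, current={current:.3f} ms, "
          f"regression={is_regression(baseline, current)}")
```

The median is used so that a single slow outlier does not raise a false alarm; a consistent extra 500 ms per connection, on the other hand, stands out immediately.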

He had discovered a backdoor in xz. Specifically, he identified the injected code and how it operated: by piggybacking off certain functions called by sshd. It’s important to note, and keen-eyed readers might already be aware, that upstream sshd does not depend on xz or liblzma. However, many distributions patch sshd so that it can notify systemd when it has started, which links sshd against libsystemd, and libsystemd in turn depends on liblzma. This chain allowed the backdoor code to be loaded into sshd. Systems that fall into this category include Fedora, Ubuntu, Debian, and most of their derivative distributions.
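Whether a running sshd has pulled in liblzma can be checked without reading any source code: on Linux, /proc/&lt;pid&gt;/maps lists every file mapped into a process. A widely shared quick check at the time was running ldd against the sshd binary and looking for liblzma. The Python sketch below parses the maps format instead; the helper names are hypothetical, and reading another process’s maps file usually requires root.

```python
from __future__ import annotations

from pathlib import Path


def mapped_libraries(maps_text: str) -> set[str]:
    """Collect shared-object paths from the text of a /proc/<pid>/maps file.

    Each line holds: address range, permissions, offset, device, inode and,
    optionally, the path of the mapped file.
    """
    libraries = set()
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        # Only lines with a sixth field name a backing file; keep shared objects.
        if len(fields) == 6 and ".so" in fields[5]:
            libraries.add(fields[5].strip())
    return libraries


def process_loads_liblzma(pid: int) -> bool:
    """True if the (Linux) process `pid` currently has liblzma mapped."""
    maps_text = Path(f"/proc/{pid}/maps").read_text()
    return any("liblzma" in path for path in mapped_libraries(maps_text))
```

Pointing process_loads_liblzma at the PID of the listening sshd answers the question for that specific system, which is more reliable than reasoning about which distribution patches apply.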

The Backdoor

The method by which the code was introduced into xz (and liblzma) involved a few binary files added to the xz repository, and a compromised build process. These files, originally submitted by Jia Tan as part of a “test suite” for xz, were binary blobs “handcrafted with a hex editor.” Reading source code is already a complex task; reviewing opaque binary blobs is far harder still. There is a tongue-in-cheek comment that “there is no better source code than the files themselves.” Indeed.

The commit including the malicious files was made on February 23, 2024. As far as is currently known, the malicious code is present in only two of the binary files, with the others appearing to perform their intended functions — testing different conditions for failures in the compression or decompression process of xz and liblzma.

One of the backdoor’s notable aspects is that a user who pulled the code from the xz repository and built it after the backdoor was added would not end up with a compromised binary (as far as is currently known, though this may later prove wrong). The injection code was only included in the output if a specific “build-to-host.m4” file was present during compilation. That file was also added to “.gitignore,” so it never reached the git repository that developers and end users typically work from. It was, however, shipped in the release tarballs that Linux distributions use to build their packages.
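Because the modified build-to-host.m4 reached users only through release tarballs, one defensive habit is to compare a tarball’s contents against the files tracked in git. A minimal Python sketch, assuming the tracked file list has already been obtained (for example from git ls-files in a checkout of the matching tag) and that the tarball unpacks into a single top-level directory; the function name is hypothetical:

```python
from __future__ import annotations

import tarfile
from pathlib import PurePosixPath


def tarball_only_files(tarball_path: str, git_files: set[str]) -> list[str]:
    """List regular files in a release tarball that the git tree does not track.

    Release tarballs usually unpack into a top-level directory such as
    "xz-5.6.1/", so that first path component is stripped before comparing.
    """
    extra = []
    # Mode "r" autodetects gzip, bzip2 and xz compression.
    with tarfile.open(tarball_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            parts = PurePosixPath(member.name).parts
            relative = "/".join(parts[1:]) if len(parts) > 1 else parts[0]
            if relative not in git_files:
                extra.append(relative)
    return sorted(extra)
```

Generated files such as configure scripts legitimately appear only in tarballs, so the output is a list of files to review by hand rather than a verdict.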

This approach was highly targeted at getting the code into actual Linux distributions, while simultaneously making it harder for security researchers to replicate the incident in a controlled environment.

On February 24, a day after adding the malicious files, the same user tagged (meaning: presented as the next version) xz-5.6.0. Linux distributions typically pick up new versions of packages, and, for example, Debian included it in the unstable repository a couple of days later.

On the 28th, Jia Tan added another patch that interfered with the detection of Landlock during the build process. Landlock is a Linux security feature that lets a program sandbox itself, restricting (among other things) which files it can access. Enabled at compile time, it could have gotten in the way of the malicious code, so it had to be bypassed for the attack to succeed.

This was accomplished in a very subtle manner. A single “.” (dot) added to “CMakeLists.txt” broke the compilation of the sandbox check, causing Landlock detection to always return false and, in turn, the build to always disable the feature. This change was not detected when submitted. Significantly, Lasse Collin (not Jia Tan) is credited with this commit, which means either the attribution was forged or the very subtle change was not spotted before the code was added.
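Why a single dot is enough becomes clearer once you recall how build-time feature detection works: the build system tries to compile a small test program and treats any failure, whatever its cause, as “not supported.” CMake’s check_c_source_compiles is a typical example of such a check. The sketch below is a Python analogy rather than the actual CMake code: it uses Python’s built-in compile() as the “compiler,” and the probe sources are hypothetical.

```python
def feature_probe(test_source: str) -> bool:
    """Return True if the probe program compiles, mimicking a build-system check.

    Any compile error, including one caused by a stray character, is
    indistinguishable here from "the feature is not available".
    """
    try:
        compile(test_source, "<probe>", "exec")
        return True
    except SyntaxError:
        return False


# A well-formed probe.
GOOD_PROBE = "import os\nresult = os.getpid()\n"
# The same probe with one stray "." inserted, analogous to the sabotaged check.
SABOTAGED_PROBE = ".\nimport os\nresult = os.getpid()\n"

if __name__ == "__main__":
    print(feature_probe(GOOD_PROBE), feature_probe(SABOTAGED_PROBE))
```

Both probes test the same thing, yet the second reports the feature as unavailable without surfacing any error to the person running the build. A more defensive build would fail loudly when a security feature expected on the platform cannot be detected.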

Notably, on February 29, a pull request on the systemd project requested to stop linking liblzma in order to reduce image sizes. This would have thwarted the attack. The event appears unrelated to the attack itself and is generally accepted as a coincidence, though it may have accelerated the attacker’s timeline. It is unknown whether earlier conversations on the topic were public and known to the attacker.

The function that invokes the malicious code is _get_cpuid; there may be other functions yet to be discovered. This function runs the malicious payload before performing the original _get_cpuid work and returning its result, which added the latency noticed by the Microsoft engineer. More specifically, _get_cpuid is reached through liblzma’s IFUNC resolvers while the library is being loaded into sshd, and the payload uses that early foothold to hook RSA_public_decrypt, a function sshd calls when verifying keys and certificates during connection setup.
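The pattern described here, extra work slipped in before a legitimate function does its job, can be illustrated with a Python wrapper. This is an analogy only: the real backdoor operated on native code during dynamic loading, and get_cpu_features and slow_hidden_work below are hypothetical stand-ins.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def hijack(original: Callable[..., T], payload: Callable[[], None]) -> Callable[..., T]:
    """Return a stand-in for `original` that runs `payload` first.

    Callers still receive exactly what `original` returns, so nothing looks
    wrong functionally; the only outward symptom is the extra time spent.
    """
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        payload()
        return original(*args, **kwargs)
    return wrapper


def get_cpu_features() -> dict:
    """Hypothetical stand-in for a CPU feature query."""
    return {"sse4_2": True, "pclmul": True}


def slow_hidden_work() -> None:
    """Hypothetical payload: 50 ms of busy-waiting."""
    deadline = time.perf_counter() + 0.05
    while time.perf_counter() < deadline:
        pass


hooked = hijack(get_cpu_features, slow_hidden_work)
```

The caller gets the same result and even the same function name, which is why functional tests alone would not catch this; only the extra time shows.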

Some distributions began to observe Valgrind errors in that function during analysis. On March 8th, Jia Tan submitted code to address the Valgrind detections, which, in hindsight, appears simply to have hidden the malicious code from the analysis.

The following day, on March 9th, Jia Tan updated the backdoored files, under the pretext that “they were created with randomness local to his machine, and to better reproduce them in the future, a constant seed value should be used.” Again, binary files are difficult to analyze, especially when the backdoor code merely selects values from predefined locations within the binary blob, making detection virtually impossible. On the same day, he tagged and released xz-5.6.1.
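To see why a fixed seed and fixed offsets make such blobs so hard to review, consider a toy version in Python: a “test file” made of seeded random bytes, with a compressed payload written at a predetermined offset. Everything here is hypothetical (the offset, the sizes, the payload); the real extraction was performed by shell commands in the build scripts and differed in detail. The sketch only illustrates the idea of pulling a payload from known locations in a blob.

```python
import lzma
import random

PAYLOAD_OFFSET = 1024  # hypothetical location known only to the extraction step


def make_test_blob(payload: bytes, seed: int = 1, size: int = 4096) -> bytes:
    """Seeded random 'test data' with an xz-compressed payload at a fixed offset."""
    hidden = lzma.compress(payload)
    if PAYLOAD_OFFSET + len(hidden) > size:
        raise ValueError("payload does not fit inside the blob")
    rng = random.Random(seed)  # constant seed: output is byte-for-byte reproducible
    blob = bytearray(rng.getrandbits(8) for _ in range(size))
    blob[PAYLOAD_OFFSET:PAYLOAD_OFFSET + len(hidden)] = hidden
    return bytes(blob)


def extract_payload(blob: bytes, length: int) -> bytes:
    """Extraction side: slice the known region and decompress it."""
    return lzma.decompress(blob[PAYLOAD_OFFSET:PAYLOAD_OFFSET + length])
```

Regenerating the blob with the same seed yields identical bytes, which sounds like good hygiene. Yet a reviewer looking at a hex dump sees only random-looking data, and nothing in the bytes themselves reveals where, or whether, something is hidden.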

On March 25th, one of the email accounts used to pressure xz’s original maintainer resurfaced, now pressuring the Debian project to include xz’s new version. Other apparent ghost accounts joined in to increase the pressure.

On March 28th, Jia Tan filed a bug report with Ubuntu for the inclusion of xz-5.6.1, since Debian had already included it.

The malicious behavior is targeted specifically at x86-64 systems running glibc (it depends on a function provided by glibc) and running sshd through systemd. This set of conditions matches Debian-based and Red-Hat-based distributions.
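These activation conditions can be expressed as a simple predicate. The Python sketch below covers only the three conditions named here (x86-64, glibc, sshd run through systemd); the helper names are hypothetical, and the actual payload is reported to check further environmental details that are not modeled.

```python
import platform


def matches_target_profile(machine: str, libc_name: str, sshd_via_systemd: bool) -> bool:
    """All three conditions named in the article must hold at once."""
    is_x86_64 = machine.lower() in {"x86_64", "amd64"}
    is_glibc = libc_name == "glibc"
    return is_x86_64 and is_glibc and sshd_via_systemd


def this_host_matches(sshd_via_systemd: bool) -> bool:
    """Evaluate the profile for the current machine.

    Whether sshd pulls in liblzma through systemd cannot be read from the
    platform module, so the caller supplies it (for instance from an ldd or
    /proc/<pid>/maps check).
    """
    libc_name, _libc_version = platform.libc_ver()
    return matches_target_profile(platform.machine(), libc_name, sshd_via_systemd)
```

On glibc systems platform.libc_ver() returns the name "glibc" and a version string; elsewhere it may return empty strings, which this predicate treats as “not targeted.”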

The deobfuscation and analysis of the malicious code revealed that it activates each time sshd receives an incoming connection. It has only recently been discovered that the code inspects the public key presented during a connection. If the key does not carry data signed by the attacker, sshd functions as usual. When such an attacker-controlled key is present, however, the code triggers a different, remotely controlled behavior: it executes commands supplied by the attacker.
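The key-gated trigger can be sketched as a hook that inspects the presented key and only deviates from normal behavior when the key carries data authenticated by the attacker. In this Python sketch, HMAC-SHA256 with a hypothetical shared secret stands in for the real mechanism, which analysts report to be an Ed448 signature check; all names are hypothetical.

```python
import hashlib
import hmac
from typing import Callable

# Hypothetical shared secret standing in for the attacker's signing key.
ATTACKER_SECRET = b"hypothetical-attacker-secret"
TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def make_trigger_key(command: bytes) -> bytes:
    """Attacker side: append an authentication tag to the command."""
    tag = hmac.new(ATTACKER_SECRET, command, hashlib.sha256).digest()
    return command + tag


def hooked_key_check(key_blob: bytes,
                     original_check: Callable[[bytes], bool],
                     run_command: Callable[[bytes], None]) -> bool:
    """Stand-in for a hooked verification routine.

    A blob carrying a valid tag has its command executed first. In this
    sketch every blob, triggered or not, is then passed to the original
    check, so ordinary logins behave exactly as before.
    """
    if len(key_blob) > TAG_LEN:
        command, tag = key_blob[:-TAG_LEN], key_blob[-TAG_LEN:]
        expected = hmac.new(ATTACKER_SECRET, command, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            run_command(command)
    return original_check(key_blob)
```

The real backdoor’s use of a public-key signature matters: an HMAC secret would have to be embedded in the binary, letting anyone who extracted it trigger the backdoor, whereas a signature check only needs the public key, so only the holder of the private key can use it.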

Detection, and the Open Source World Is Shaken

On March 28th, Andres Freund’s investigation led to the identification of the backdoor. Distributions were notified, and updates were released that essentially rolled the packages back to a version predating the backdoored releases. This means there is now a “5.6.1+really5.4.5” version of xz available in several distributions’ repositories.
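The odd-looking “5.6.1+really5.4.5” follows a Debian packaging convention for rollbacks: the version number keeps increasing so package managers accept it as an upgrade, while the part after “+really” names the code actually shipped. A minimal Python sketch for checking whether a version string refers to one of the two affected releases (5.6.0 and 5.6.1), assuming simplified Debian version syntax; the helper names are hypothetical:

```python
from __future__ import annotations

import re

# The two releases known to contain the backdoor.
COMPROMISED = {"5.6.0", "5.6.1"}


def effective_upstream_version(package_version: str) -> str:
    """Map a Debian-style package version to the upstream code it contains."""
    version = package_version.split(":", 1)[-1]   # drop an epoch such as "1:"
    version = version.rsplit("-", 1)[0]           # drop a revision such as "-1"
    if "+really" in version:
        version = version.split("+really", 1)[1]  # "5.6.1+really5.4.5" -> "5.4.5"
    return version


def is_compromised(package_version: str) -> bool:
    return effective_upstream_version(package_version) in COMPROMISED


def version_from_xz_output(output: str) -> str | None:
    """Extract the version from `xz --version` output, e.g. "xz (XZ Utils) 5.4.5"."""
    match = re.search(r"\(XZ Utils\) (\d+\.\d+\.\d+)", output)
    return match.group(1) if match else None
```

The xz binary itself reports the version of the source it was built from, so on a rolled-back system xz --version shows 5.4.5 even though the package version still begins with 5.6.1.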

So far, distributions that have acknowledged the presence of the compromised package include Fedora Rawhide, Fedora 40 beta, and Debian.

Because xz played a crucial role in the package build process itself, the Debian project opted to rebuild their entire build system, as it had been running a version of xz that included Jia Tan’s patches (prior to the supposed inclusion of the backdoor).

A supply chain attack can have a staggering reach. Although this is often said, it only becomes truly evident when an incident like this one provides clear and unequivocal evidence.

Several issues become apparent when analyzing this situation:

First, the way open-source projects are run, without adequate funding and resources, leads to situations where maintainers can become overwhelmed and tricked into accepting less-than-ideal compromises. This vulnerability allows threat actors to infiltrate and eventually gain control over key projects.

Second, xz was merely the project that was caught. If not for the delay caused by the malicious code, which likely could have been optimized with more time, this attack might not have been detected at all, and everyone would eventually be running a compromised version of xz on their Debian and Red-Hat-based systems. By reverting xz, it is assumed that this particular threat has been neutralized. However, the certainty of this assertion remains unclear at this time.

Third, code is added to open-source projects every minute of every day, across countless different projects. The way dependencies are used and vetted allows an attack to travel through a chain of indirect dependencies to its intended target, even if no malicious code ever directly touches the target package itself.

Fourth, the argument that open-source code is more secure simply because it is open is flawed. It only holds true while enough sufficiently motivated individuals scrutinize every line of code, of every project, on every commit. This requires a vast number of highly skilled people performing a significant amount of unpaid work, indefinitely. The ability to check and vet code is only effective if the checking and vetting are actually performed. Even then, given the nature of code and the obfuscation methods available, it is possible to slip malicious code into unsuspecting projects. In fact, one might argue that, from a security perspective, the availability of source code may help malicious actors looking for vulnerabilities as much as, or even more than, it helps end users. While open-source code offers many undeniable benefits, intrinsic security is not always one of them. This backdoor was discovered entirely by accident, not because someone was actively reviewing the code.

Fifth, building a reputation in the open-source community often relies solely on the commit history in GitHub and nothing else. Demonstrating interest in a project and contributing some code fixes is often enough to be accepted. Demonstrating eagerness to help can be easily fabricated. This makes identifying specific individuals behind essentially anonymous accounts very challenging. A user may have multiple accounts, or create fake ones when an account becomes suspect or is identified as malicious. An organization with enough resources can build and staff an army of these accounts.

Sixth, there is often no clearly defined disaster recovery plan for most open-source projects. Reverting to a known good state is challenging if it is difficult to even determine what that known good state is. While one account is being blamed for this attack, it is uncertain whether it was the only one used.

The Possible Caveat

From the analysis of how this incident unfolded, everything seems to indicate the involvement of a well-funded, highly skilled threat actor or group. Some noteworthy observations include:

Jia Tan appears to follow a 9-to-5 (or 6) workday schedule. Commit and GitHub activity suggest this pattern, with breaks on weekends and on holidays consistent with certain countries evident at a glance. However, it is important to note that timestamps in git commits can be set at will, and, compared to the level of expertise demonstrated in the backdoor code, altering the time on a commit is trivial.
Jia Tan has contributed code to multiple projects outside of xz. In fact, the account has even submitted code to Google’s OSS-Fuzz, a service that continuously fuzzes open-source projects to find bugs and security issues, as well as to several other compression-related projects. All of this code is now under scrutiny and is no longer trusted.
Curiously, the timezone of the commits where the malicious files were uploaded differs from the typical time zone associated with the Jia Tan account. A detailed analysis of this inconsistency can be found here, but it raised the possibility that the developer account had been compromised rather than being intentionally used. While this remains a possibility, other aspects of the incident make it less plausible. Nonetheless, it is a point of consideration.
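Timestamp analyses like the ones above typically start from the dates git records. The Python sketch below tallies recorded UTC offsets and local hours from strict ISO 8601 dates (the format printed by git log --format=%aI) and flags commits whose offset differs from the usual one. Keep in mind that this is self-reported data: git takes these values from the committer’s machine, and the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables let anyone set them arbitrarily. The helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime


def parse_commit_dates(lines: list[str]) -> list[datetime]:
    """Parse strict ISO 8601 dates such as "2024-02-23T22:09:00+02:00"."""
    return [datetime.fromisoformat(line.strip()) for line in lines if line.strip()]


def offset_hours(date: datetime) -> float:
    """The UTC offset recorded with a commit, in hours (e.g. 8.0 for +08:00)."""
    return date.utcoffset().total_seconds() / 3600


def offset_histogram(dates: list[datetime]) -> Counter:
    """Commits per recorded UTC offset."""
    return Counter(offset_hours(d) for d in dates)


def local_hour_histogram(dates: list[datetime]) -> Counter:
    """Commits per hour of day, in each commit's own recorded time zone."""
    return Counter(d.hour for d in dates)


def offset_outliers(dates: list[datetime]) -> list[datetime]:
    """Commits whose recorded offset differs from the most common one."""
    usual, _count = offset_histogram(dates).most_common(1)[0]
    return [d for d in dates if offset_hours(d) != usual]
```

An outlier found this way is a lead, not proof: it is equally consistent with travel, a misconfigured machine, a compromised account, or deliberate misdirection.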

Closing Remarks

This story is likely far from over. The response so far has involved shutting down xz’s GitHub repository, rolling back distribution packages to an older version, and reassessing other commits from the supposed responsible user.

The open questions include:

This is one of the most sophisticated supply chain attacks to date. It is difficult to conceive of it as the work of a single hacker. Who was behind it? If Jia Tan was adhering to a work schedule, what country, organization, or group was paying the salary?
Maintaining an operation for more than two years requires considerable resources and effort. What was the ultimate goal? Eventually, all Debian-based and Red Hat-based systems would have been compromised. Who stands to benefit from having access to every system running those distributions?
Times and time zones are easily manipulated in commit metadata. Were they intentionally set to point to specific zones?
Has all the malicious code been identified? Is there more payload yet to be discovered?
Was sshd the only target process?
What other projects were impacted by the attack?

In cybersecurity, we are constantly dodging bullets. Until we don’t. And the curious thing is, we won’t even see the one that gets us until it’s too late.

Summary