Linux 6.1 Helps Users Identify Faulty CPUs - TuxCare

Obanla Opeyemi

September 4, 2022


The upcoming Linux kernel 6.1 adds a logging feature that helps users identify faulty CPUs, and the cores within them, in a server.

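When a program crashes with a segmentation fault on an x86 system, the kernel log message now also records the CPU, core, and socket on which the program was most likely running. The information is not guaranteed to be accurate, because the task may be rescheduled onto another CPU or core between the moment the fault occurs and the moment the message is printed. Even so, it can help pinpoint faulty CPUs or cores.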

“This is not perfect, since the task might get rescheduled on another CPU between when the fault hit, and when the message is printed, but in practice, this has been good enough to help people identify several bad CPU cores,” explained Rik van Riel, the author of the change.

Failures in degraded CPUs are often "oddly specific": only certain programs or pieces of code crash on the bad core, while everything else appears to run normally.

“In a large enough fleet of computers, it is common to have a few bad CPUs. Those can often be identified by seeing that some commonly run kernel code, which runs fine everywhere else, keeps crashing on the same CPU core on one particular bad system. However, the failure mode in CPUs that have gone bad over the years are often oddly specific, and the only bad behavior seen might be segfaulting in programs like bash, Python, or various system daemons that run fine everywhere else,” said van Riel.
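To illustrate the approach van Riel describes, here is a minimal Python sketch that counts segfault messages per CPU in kernel log output and flags cores that crash repeatedly. It assumes the message ends with text of the form `likely on CPU 5 (core 2, socket 0)`, which is our understanding of the upstream x86 change; check the exact wording in your own kernel log before relying on it. The function names and the `min_hits` threshold are hypothetical choices for this example, not part of any kernel tooling.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed wording of the suffix Linux 6.1 appends to x86 segfault messages, e.g.
# "bash[4321]: segfault at 0 ip ... error 4 in libc.so.6[...] likely on CPU 5 (core 2, socket 0)".
# This format is an assumption; verify it against your own kernel log.
SEGFAULT_CPU = re.compile(
    r"segfault at .*? likely on CPU (\d+) \(core (\d+), socket (\d+)\)"
)

CpuId = Tuple[int, int, int]  # (cpu, core, socket)


def tally_segfaults_by_cpu(lines: Iterable[str]) -> Counter:
    """Count segfault messages per (cpu, core, socket) triple; other lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = SEGFAULT_CPU.search(line)
        if match:
            cpu, core, socket = (int(group) for group in match.groups())
            counts[(cpu, core, socket)] += 1
    return counts


def suspicious_cpus(counts: Counter, min_hits: int = 3) -> List[CpuId]:
    """Return CPUs with at least `min_hits` segfaults, most frequent first.

    `min_hits` is a hypothetical threshold. The kernel only reports the
    *likely* CPU, so treat the result as a hint, not a diagnosis.
    """
    return [cpu_id for cpu_id, hits in counts.most_common() if hits >= min_hits]
```

For example, the lines of `journalctl -k` output, or of `/var/log/kern.log` on distributions that keep that file, could be passed to `tally_segfaults_by_cpu`. Because a task may be rescheduled before the message is printed, a single hit proves little; a core that accumulates far more crashes than its siblings is the signal worth investigating.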

The feature is expected to ship with Linux 6.1 later this year. It complements other hardware error-reporting mechanisms such as the new Intel In-Field Scan (IFS), Machine Check Exceptions (MCEs), and EDAC (Error Detection and Correction) reporting.

The sources for this piece include an article in TechRadar.
