Overview
About vulnerability
In the Linux kernel, the following vulnerability has been resolved:
dm cache: prevent BUG_ON by blocking retries on failed device resumes
A cache device failing to resume due to mapping errors should not be retried, as the failure leaves a partially initialized policy object. Repeating the resume operation risks triggering BUG_ON when reloading cache mappings into the incomplete policy object.
Reproduce steps:
- create cache metadata consisting of 512 or more cache blocks, with some mappings stored in the first array block of the mapping array. Here we use cache_restore v1.0 to build the metadata.
cat <<EOF >> cmeta.xml
<superblock uuid="" block_size="128" nr_cache_blocks="512" \
policy="smq" hint_width="4">
- wipe the second array block of the mapping array to simulate data degradation.
mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \
2>/dev/null | hexdump -e '1/8 "%u\n"')
ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \
2>/dev/null | hexdump -e '1/8 "%u\n"')
dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock
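The two dd | hexdump pipelines above each read one 64-bit integer at a byte offset. A minimal sketch of that pattern as a reusable helper, run against a scratch file rather than a real device. The helper name read_u64, the scratch file, and the stored value 5 are hypothetical; the offset 192 and the expression 4096*root+2056 are copied from the step above. The sketch assumes a little-endian host and an od that accepts -t u8 (GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper: print the unsigned 64-bit integer stored at byte
# offset $2 of file $1, in host byte order (assumed little-endian here).
read_u64() {
    od -An -t u8 -j "$2" -N 8 "$1" | tr -d ' \n'
}

# Work on a scratch file instead of a real metadata device.
scratch=$(mktemp)

# Zero-fill 256 bytes, then store the value 5 as 8 little-endian bytes
# at byte offset 192.
dd if=/dev/zero of="$scratch" bs=1 count=256 2>/dev/null
printf '\005\000\000\000\000\000\000\000' |
    dd of="$scratch" bs=1 seek=192 conv=notrunc 2>/dev/null

root=$(read_u64 "$scratch" 192)

# Byte offset that the second read in the step above would use.
next_offset=$((4096 * root + 2056))

echo "root=$root next_offset=$next_offset"
rm -f "$scratch"
```

Reading with od -j/-N avoids the one-byte block size (bs=1c) that dd needs for byte-granular skips, but either approach yields the same number.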
- try bringing up the cache device. The resume is expected to fail due to the broken array block.
dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
dmsetup create cache --notable
dmsetup load cache --table "0 524288 cache /dev/mapper/cmeta \
/dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
dmsetup resume cache
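The table sizes above are given in 512-byte sectors, so it can help to check that they agree with the metadata described earlier. A small sketch of that arithmetic; the variable names are hypothetical and the numbers are copied from the dmsetup tables and from cmeta.xml.

```shell
#!/bin/sh
# Values copied from the steps above; variable names are illustrative.
sector_bytes=512        # device-mapper tables count 512-byte sectors
meta_sectors=8192       # length of the cmeta linear target
cdata_start=8192        # start of cdata on /dev/sdc
cdata_sectors=65536     # length of the cdata linear target
origin_start=262144     # start of corig on /dev/sdc
origin_sectors=524288   # length of the corig linear target
block_sectors=128       # cache block size given to the cache target

# Number of cache blocks that fit in the cache data device.
cache_blocks=$((cdata_sectors / block_sectors))

# Human-readable sizes.
block_kib=$((block_sectors * sector_bytes / 1024))
meta_mib=$((meta_sectors * sector_bytes / 1024 / 1024))
cdata_mib=$((cdata_sectors * sector_bytes / 1024 / 1024))
origin_mib=$((origin_sectors * sector_bytes / 1024 / 1024))

# The cache data region must end before the origin region begins.
cdata_end=$((cdata_start + cdata_sectors))

echo "cache_blocks=$cache_blocks block=${block_kib}KiB"
echo "meta=${meta_mib}MiB cdata=${cdata_mib}MiB origin=${origin_mib}MiB"
echo "cdata_end=$cdata_end origin_start=$origin_start"
```

The computed block count matches nr_cache_blocks="512" in cmeta.xml, and the cache data region ends well before sector 262144, so the three linear targets do not overlap.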
- try resuming the cache again. An unexpected BUG_ON is triggered while loading cache mappings.
dmsetup resume cache
Kernel logs:
(snip)
------------[ cut here ]------------
kernel BUG at drivers/md/dm-cache-policy-smq.c:752!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 332 Comm: dmsetup Not tainted 6.13.4 #3
RIP: 0010:smq_load_mapping+0x3e5/0x570
Fix by disallowing resume operations for devices that failed the initial attempt.
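The idea of refusing a retry after a failed first attempt can be illustrated outside the kernel. The sketch below is not the kernel change; it is a hypothetical userspace wrapper, resume_once, that records a failed attempt in a marker file and rejects later attempts for the same name without running the command again. The function name, the marker-file scheme, and the stand-in commands true and false are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical illustration only: this is not the kernel patch.
state_dir=$(mktemp -d)

# resume_once NAME CMD [ARGS...]
# Runs CMD for NAME. After one failure, later calls for the same NAME
# are refused without running CMD again.
resume_once() {
    name=$1
    shift
    if [ -e "$state_dir/$name.failed" ]; then
        echo "refusing to retry resume of $name" >&2
        return 1
    fi
    if ! "$@"; then
        # Remember the failure so the next attempt is blocked.
        : > "$state_dir/$name.failed"
        return 1
    fi
    return 0
}

# Stand-in commands: `false` models a failing resume, `true` a working one.
resume_once cache false; first=$?
resume_once cache true;  second=$?   # refused, `true` is never run
resume_once other true;  third=$?    # unrelated name, runs normally

echo "first=$first second=$second third=$third"
rm -rf "$state_dir"
```

With a real device the wrapped command would be dmsetup resume cache, for example resume_once cache dmsetup resume cache. A wrapper like this only avoids issuing the second resume from one script; it does not replace updating to a kernel that contains the fix.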
Details
- Affected product:
- AlmaLinux 9.2 ESU, CentOS 6 ELS, CentOS 7 ELS, CentOS 8.4 ELS, CentOS 8.5 ELS, CentOS Stream 8 ELS, CloudLinux 7 ELS, Oracle Linux 6 ELS, Oracle Linux 7 ELS, RHEL 7 ELS, TuxCare 9.6 ESU, Ubuntu 16.04 ELS, Ubuntu 18.04 ELS, Ubuntu 20.04 ELS
- Affected packages:
- linux @ 4.15.0 (+15 more)