Overview
About vulnerability
In the Linux kernel, the following vulnerability has been resolved:
bpf: fix ktls panic with sockmap
```
[ 2172.936997] ------------[ cut here ]------------
[ 2172.936999] kernel BUG at lib/iov_iter.c:629!
……
[ 2172.944996] PKRU: 55555554
[ 2172.945155] Call Trace:
[ 2172.945299]
```
After calling bpf_exec_tx_verdict(), the size of msg_pl->sg may increase, e.g., when the BPF program executes bpf_msg_push_data().
If the BPF program sets cork_bytes and sg.size is smaller than cork_bytes, bpf_exec_tx_verdict() returns -ENOSPC and the send path attempts to roll back to the non-zero-copy logic. However, the rollback resets msg->msg_iter with the call below, and because msg_pl->sg.size has grown, the amount reverted exceeds the number of bytes actually consumed from msg_iter:

```
iov_iter_revert(&msg->msg_iter, msg_pl->sg.size - orig_size);
```
The changes in this commit are based on the following considerations:
- When cork_bytes is set, rolling back to the non-zero-copy logic is pointless, and the zero-copy logic can be used directly.
- We cannot calculate the correct number of bytes by which to revert msg_iter.
Assume the original data is “abcdefgh” (8 bytes) and that, after 3 pushes by the BPF program, it becomes the 11-byte data “abc?de?fgh?”. Then we set cork_bytes to 6, which means the first 6 bytes have been processed and the remaining 5 bytes, “?fgh?”, will be cached until the length meets the cork_bytes requirement.
However, some of the data in “?fgh?” is not in ‘msg->msg_iter’ (it is in msg_pl instead), in particular the “?” bytes we pushed.
So it does not seem to be as simple as reverting msg_iter by an offset.
- For non-TLS sockets in tcp_bpf_sendmsg, when a “cork” situation occurs, the user-space send() doesn’t return an error, and the returned length is the same as the input length parameter, even if some data is cached.
Additionally, I saw that the current non-zero-copy logic for handling corking (line 1177) is written as:

```
else if (ret != -EAGAIN) {
	if (ret == -ENOSPC)
		ret = 0;
	goto send_end;
}
```

So it is OK to just return ‘copied’ without an error when a “cork” situation occurs.
Details
- Affected products:
  - AlmaLinux 9.2 ESU, Oracle Linux 7 ELS, TuxCare 9.6 ESU, Ubuntu 20.04 ELS
- Affected packages:
  - linux @ 5.4.0 (+3 more)