Linux 6.15 was released
Summary: This release includes a number of VFS improvements, such as mount notifications, creating idmapped mounts from idmapped mounts, creating detached mounts from detached mounts, mounting detached mounts on detached mounts, and using detached mounts in overlayfs. There is also support for latency profiling in perf, zero-copy receive for io_uring networking, a new fwctl subsystem to standardize firmware management, bcachefs improvements such as scrubbing, and support for broadcast TLB invalidation using AMD's INVLPGB instruction. As always, there are many other features, new drivers, improvements and fixes. Also, you might be interested in the LWN merge window report: part 1
Mount notifications
This release includes an API to listen for mount topology changes without requiring looking at {{{/proc/
Currently, notifications are generated for mount, umount and mount move operations. Each notification record contains the unique mount id of the affected mount, which can then be passed to {{{listmount()}}} and {{{statmount()}}}.
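A minimal sketch of consuming these notifications, assuming they are delivered through fanotify using the names found in 6.15 uapi headers (`FAN_REPORT_MNT`, `FAN_MARK_MNTNS`, `FAN_MNT_ATTACH`, `FAN_MNT_DETACH`, `FAN_EVENT_INFO_TYPE_MNT` and `struct fanotify_event_info_mnt`); treat those names as assumptions to check against your headers. The helpers `watch_mounts()` and `print_mount_events()` are hypothetical. The code marks the caller's mount namespace (via `/proc/self/ns/mnt`) and prints each reported mount id, which could then be handed to `statmount()`. Marking a mount namespace requires privileges, and when the headers predate the feature the sketch compiles to a stub that fails with `ENOSYS`.

{{{
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

#ifdef FAN_REPORT_MNT /* assumption: defined by 6.15+ uapi headers */

/* Hypothetical helper: returns a fanotify fd watching the caller's mount
 * namespace for attach/detach events, or -1 with errno set. */
int watch_mounts(void)
{
    int saved;
    int fan_fd = fanotify_init(FAN_REPORT_MNT | FAN_CLOEXEC, 0);
    if (fan_fd < 0)
        return -1;

    /* The mark targets a mount namespace, passed as an nsfs fd. */
    int ns_fd = open("/proc/self/ns/mnt", O_RDONLY | O_CLOEXEC);
    if (ns_fd >= 0 &&
        fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_MNTNS,
                      FAN_MNT_ATTACH | FAN_MNT_DETACH, ns_fd, NULL) == 0) {
        close(ns_fd);
        return fan_fd;
    }
    saved = errno;
    if (ns_fd >= 0)
        close(ns_fd);
    close(fan_fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: read one batch of events and print the mount ids. */
void print_mount_events(int fan_fd)
{
    char buf[4096] __attribute__((aligned(8)));
    ssize_t len = read(fan_fd, buf, sizeof(buf));

    for (struct fanotify_event_metadata *ev = (struct fanotify_event_metadata *)buf;
         FAN_EVENT_OK(ev, len); ev = FAN_EVENT_NEXT(ev, len)) {
        /* Variable-length info records follow the fixed metadata. */
        for (size_t off = ev->metadata_len; off < ev->event_len;) {
            struct fanotify_event_info_header *hdr =
                (struct fanotify_event_info_header *)((char *)ev + off);
            if (hdr->info_type == FAN_EVENT_INFO_TYPE_MNT) {
                struct fanotify_event_info_mnt *info =
                    (struct fanotify_event_info_mnt *)hdr;
                printf("mnt_id=%" PRIu64 "%s%s\n", (uint64_t)info->mnt_id,
                       (ev->mask & FAN_MNT_ATTACH) ? " attach" : "",
                       (ev->mask & FAN_MNT_DETACH) ? " detach" : "");
            }
            if (hdr->len == 0) /* malformed record: avoid looping forever */
                break;
            off += hdr->len;
        }
    }
}

#else /* headers predate mount notifications */

int watch_mounts(void)
{
    errno = ENOSYS;
    return -1;
}

#endif
}}}

A caller would typically loop on `print_mount_events()` (or `poll()` the fd first) and look up each id with `statmount()` to learn where the mount was attached.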
Allow creating idmapped mounts from idmapped mounts
In previous releases, it was not possible to create idmapped mounts from already idmapped mounts. This release adds a new system call, {{{open_tree_attr()}}}, which works just like {{{open_tree()}}} but takes an optional {{{struct mount_attr}}} parameter, so mount attributes such as a new idmapping can be applied to the cloned tree as part of the same call.
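A minimal sketch of the new call, assuming the raw interface `open_tree_attr(dfd, path, flags, attr, size)` and syscall number 467 as a fallback when the C library does not define `SYS_open_tree_attr`; `clone_with_idmap()` is a hypothetical helper. Passing `OPEN_TREE_CLONE` together with `MOUNT_ATTR_IDMAP` clones the tree and gives the copy a new idmapping in one step, even if the source mount is already idmapped. It needs privileges over the mount namespace.

{{{
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/mount.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 467 is open_tree_attr's number in the unified syscall table. */
#ifndef SYS_open_tree_attr
#define SYS_open_tree_attr 467
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000 /* clone the whole mount subtree */
#endif

/* Raw wrapper, since C libraries may not provide open_tree_attr() yet. */
static int sys_open_tree_attr(int dfd, const char *path, unsigned int flags,
                              struct mount_attr *attr, size_t size)
{
    return (int)syscall(SYS_open_tree_attr, dfd, path, flags, attr, size);
}

/*
 * Hypothetical helper: clone the mount tree at `path` (which may already be
 * idmapped) and give the copy the idmapping of the user namespace referred
 * to by `userns_fd`. Returns a detached mount fd, or -1 with errno set.
 */
int clone_with_idmap(const char *path, int userns_fd)
{
    struct mount_attr attr = {
        .attr_set = MOUNT_ATTR_IDMAP,
        .userns_fd = (uint64_t)userns_fd,
    };

    return sys_open_tree_attr(AT_FDCWD, path,
                              OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC | AT_RECURSIVE,
                              &attr, sizeof(attr));
}
}}}

In practice `userns_fd` would come from opening `/proc/<pid>/ns/user` of a process in the desired user namespace, and the returned fd can then be attached with `move_mount()`.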
Support creating detached mounts from a detached mount
Previously, detached mounts could only be created from attached mounts. This limitation prevented various use cases, for example mounting a subdirectory without ever having to make the whole filesystem visible first. This release removes this limitation.
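A minimal sketch of that use case, assuming privileges (CAP_SYS_ADMIN) and raw `syscall(2)` access to the new mount API, with generic syscall numbers as fallbacks; `detached_subdir_mount()` is a hypothetical helper. It creates a tmpfs that is never attached anywhere, makes a `sub` directory in it, and clones only that directory into a second detached mount. On kernels before 6.15, the final `open_tree()` call on the detached tmpfs is expected to fail.

{{{
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks: generic syscall numbers (most architectures) if libc lacks them. */
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef SYS_fsopen
#define SYS_fsopen 430
#endif
#ifndef SYS_fsconfig
#define SYS_fsconfig 431
#endif
#ifndef SYS_fsmount
#define SYS_fsmount 432
#endif

/*
 * Hypothetical helper: return a detached mount fd exposing only the "sub"
 * directory of a fresh tmpfs, or -1 with errno set. The tmpfs itself is
 * never attached to any visible mount point.
 */
int detached_subdir_mount(void)
{
    int mnt_fd = -1, sub_fd = -1, saved;
    int fs_fd = (int)syscall(SYS_fsopen, "tmpfs", FSOPEN_CLOEXEC);

    if (fs_fd < 0)
        return -1;
    if (syscall(SYS_fsconfig, fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
        goto out;

    /* First detached mount: the whole tmpfs, reachable only through mnt_fd. */
    mnt_fd = (int)syscall(SYS_fsmount, fs_fd, FSMOUNT_CLOEXEC, 0);
    if (mnt_fd < 0)
        goto out;
    if (mkdirat(mnt_fd, "sub", 0755) < 0)
        goto out;

    /* New in 6.15: clone part of a detached mount into another detached mount. */
    sub_fd = (int)syscall(SYS_open_tree, mnt_fd, "sub",
                          OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
out:
    saved = errno; /* keep the error from the failing call */
    if (mnt_fd >= 0)
        close(mnt_fd);
    close(fs_fd);
    errno = saved;
    return sub_fd;
}
}}}

The returned fd can then be attached wherever it is needed with `move_mount(2)` and `MOVE_MOUNT_F_EMPTY_PATH`, so the rest of the tmpfs never becomes visible.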
Allow mounting detached mounts on detached mounts
Previously, detached mounts could only be mounted onto attached mounts. This limitation made it impossible to assemble a new private rootfs and move it into place. Instead, a detached tree had to be created and attached, have further mounts placed on it, and then be either moved or detached again. This release lifts this restriction.
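A minimal sketch of assembling a small tree purely out of detached mounts, under the same assumptions as above (privileges, raw `syscall(2)`, generic syscall numbers as fallbacks); `new_detached_tmpfs()` and `assemble_private_tree()` are hypothetical helpers. A second tmpfs is mounted on `data/` inside a first tmpfs while both are still detached, which is the step this release permits.

{{{
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks: generic syscall numbers (most architectures) if libc lacks them. */
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif
#ifndef SYS_fsopen
#define SYS_fsopen 430
#endif
#ifndef SYS_fsconfig
#define SYS_fsconfig 431
#endif
#ifndef SYS_fsmount
#define SYS_fsmount 432
#endif

/* Create a tmpfs as a detached mount; returns its fd or -1 with errno set. */
static int new_detached_tmpfs(void)
{
    int mnt_fd = -1, saved;
    int fs_fd = (int)syscall(SYS_fsopen, "tmpfs", FSOPEN_CLOEXEC);

    if (fs_fd < 0)
        return -1;
    if (syscall(SYS_fsconfig, fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) == 0)
        mnt_fd = (int)syscall(SYS_fsmount, fs_fd, FSMOUNT_CLOEXEC, 0);
    saved = errno;
    close(fs_fd);
    errno = saved;
    return mnt_fd;
}

/*
 * Hypothetical helper: build root/ with a second tmpfs on root/data using
 * only detached mounts. Returns the root fd, or -1 with errno set.
 */
int assemble_private_tree(void)
{
    int data_fd = -1, saved;
    int root_fd = new_detached_tmpfs();

    if (root_fd < 0)
        return -1;
    data_fd = new_detached_tmpfs();
    if (data_fd < 0 || mkdirat(root_fd, "data", 0755) < 0)
        goto fail;

    /* New in 6.15: the target (root_fd) is itself a detached mount. */
    if (syscall(SYS_move_mount, data_fd, "", root_fd, "data",
                MOVE_MOUNT_F_EMPTY_PATH) < 0)
        goto fail;
    close(data_fd);
    return root_fd;

fail:
    saved = errno;
    if (data_fd >= 0)
        close(data_fd);
    close(root_fd);
    errno = saved;
    return -1;
}
}}}

Once the tree is complete, a single `move_mount(root_fd, "", AT_FDCWD, "/path/to/target", MOVE_MOUNT_F_EMPTY_PATH)` would move the whole tree into place; the target path here is only a placeholder.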
Support detached mounts in overlayfs
Since the last cycle, overlayfs supports specifying layers via file descriptors. However, it did not allow detached mounts, which meant userspace could not use file descriptors received from {{{open_tree(OPEN_TREE_CLONE)}}} and {{{fsmount()}}} directly and had to resort to dirty tricks. This release allows detached mounts to be used directly.
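A minimal sketch, assuming privileges, raw `syscall(2)` access to the new mount API, and that overlayfs accepts layer file descriptors through `fsconfig()` with `FSCONFIG_SET_FD` and the `lowerdir+` key; the helper names are hypothetical. Two detached tmpfs mounts are passed directly as the lower layers of a read-only overlay, without attaching them anywhere first.

{{{
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks: generic syscall numbers (most architectures) if libc lacks them. */
#ifndef SYS_fsopen
#define SYS_fsopen 430
#endif
#ifndef SYS_fsconfig
#define SYS_fsconfig 431
#endif
#ifndef SYS_fsmount
#define SYS_fsmount 432
#endif

/* Create a tmpfs as a detached mount; returns its fd or -1 with errno set. */
static int new_detached_tmpfs(void)
{
    int mnt_fd = -1, saved;
    int fs_fd = (int)syscall(SYS_fsopen, "tmpfs", FSOPEN_CLOEXEC);

    if (fs_fd < 0)
        return -1;
    if (syscall(SYS_fsconfig, fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) == 0)
        mnt_fd = (int)syscall(SYS_fsmount, fs_fd, FSMOUNT_CLOEXEC, 0);
    saved = errno;
    close(fs_fd);
    errno = saved;
    return mnt_fd;
}

/*
 * Hypothetical helper: build a read-only overlay whose two lower layers are
 * detached mounts passed by fd. Returns the overlay's detached mount fd,
 * or -1 with errno set.
 */
int overlay_from_detached_layers(void)
{
    int lower[2] = { -1, -1 };
    int ovl_fs = -1, ovl_mnt = -1, saved;

    if ((lower[0] = new_detached_tmpfs()) < 0 ||
        (lower[1] = new_detached_tmpfs()) < 0)
        goto out;

    ovl_fs = (int)syscall(SYS_fsopen, "overlay", FSOPEN_CLOEXEC);
    if (ovl_fs < 0)
        goto out;
    /* Append each layer by fd; 6.15 accepts detached mounts here. */
    for (int i = 0; i < 2; i++)
        if (syscall(SYS_fsconfig, ovl_fs, FSCONFIG_SET_FD, "lowerdir+",
                    NULL, lower[i]) < 0)
            goto out;
    if (syscall(SYS_fsconfig, ovl_fs, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
        goto out;
    ovl_mnt = (int)syscall(SYS_fsmount, ovl_fs, FSMOUNT_CLOEXEC, 0);
out:
    saved = errno;
    for (int i = 0; i < 2; i++)
        if (lower[i] >= 0)
            close(lower[i]);
    if (ovl_fs >= 0)
        close(ovl_fs);
    errno = saved;
    return ovl_mnt;
}
}}}

As with any detached mount, the returned overlay fd stays invisible until it is attached with `move_mount(2)`.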
Support for broadcast TLB invalidation using AMD's INVLPGB instruction
This release adds support for the INVLPGB instruction on AMD systems that support it (Zen 3 and later). It allows the kernel to invalidate TLB entries on remote CPUs without sending IPIs, without waiting for remote CPUs to handle those interrupts, and with less interruption to whatever was running on those CPUs. If you didn't understand any of these words, it means better overall performance.
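The instruction itself is only executed by the kernel, but a program can check whether the CPU advertises it. A minimal sketch, assuming GCC or Clang's `<cpuid.h>` and AMD's encoding of the feature in CPUID leaf 0x80000008, EBX bit 3 (which covers both INVLPGB and TLBSYNC); `cpu_has_invlpgb()` is a hypothetical name. A hypervisor may hide the bit from guests, and a set bit alone does not say whether a given kernel uses the instruction.

{{{
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: true if CPUID reports INVLPGB/TLBSYNC support. */
bool cpu_has_invlpgb(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the extended leaf is not implemented. */
    if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx & (1u << 3)) != 0; /* EBX bit 3: INVLPGB and TLBSYNC */
#else
    return false; /* INVLPGB is an x86 (AMD) instruction */
#endif
}
}}}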
Support for latency profiling in perf
This release introduces latency profiling using scheduler information. Latency profiling shows the impact on wall-clock time rather than CPU time. By tracking context switches, it can weight samples and find which parts of the code contributed most to the execution latency.
An example (after passing {{{--latency}}} to {{{perf record}}}):
{{{
$ perf report -s comm
...
#
# Overhead Latency Command
# ........ ........ ...............
#
78.97% 48.66% cc1
6.54% 25.68% python3
4.21% 0.39% shellcheck
3.28% 13.70% ld
[...]
}}}
Here, cc1 accounts for about 79% of the CPU overhead but only about 49% of the latency, while python3 and ld contribute much more to latency than their overhead suggests.
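To make the weighting concrete, here is a simplified model, not perf's actual code: assume each sample records how many threads of the profiled workload were running at that moment (its parallelism). CPU overhead counts every sample once, while latency weights each sample by 1/parallelism, so time spent while the rest of the workload sits idle counts fully. The `struct sample` type and the `shares()` function are hypothetical.

{{{
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One profiling sample (hypothetical, simplified representation). */
struct sample {
    const char *comm; /* command the sample was taken in */
    int parallelism;  /* workload threads running on CPUs at that moment */
};

/*
 * Compute the CPU-overhead and latency shares (in percent) of `comm`.
 * Overhead counts samples; latency sums 1/parallelism per sample.
 */
void shares(const struct sample *s, size_t n, const char *comm,
            double *overhead_pct, double *latency_pct)
{
    double hits = 0.0, lat = 0.0, lat_total = 0.0;

    for (size_t i = 0; i < n; i++) {
        assert(s[i].parallelism >= 1);
        double w = 1.0 / s[i].parallelism;
        lat_total += w;
        if (strcmp(s[i].comm, comm) == 0) {
            hits += 1.0;
            lat += w;
        }
    }
    *overhead_pct = n ? 100.0 * hits / (double)n : 0.0;
    *latency_pct = lat_total > 0.0 ? 100.0 * lat / lat_total : 0.0;
}
}}}

For example, eight cc1 samples taken while four workload threads run in parallel and two python3 samples taken while python3 runs alone give cc1 80% of the overhead but only 50% of the latency (a weight of 8 × 1/4 = 2 out of a total of 4), while python3 rises from 20% to 50%: the same pattern as in the report above.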
io_uring networking support for zero-copy receive
This release adds support for zero-copy receive with io_uring, enabling fast bulk receive of data directly into application memory rather than having to copy the data out of kernel memory. This version only supports host memory, which was the initial target, but support for other memory types is planned.
This release also adds support for reading epoll events via io_uring. While this may seem counter-intuitive (and/or counter-productive), the reasoning is that quite a few existing epoll event loops can easily be partially converted to a completion-based model, but remain stuck with one (or a few) event types that are still readiness-based.
New fwctl subsystem to standardize firmware management
fwctl is a new subsystem intended to bring some common rules and order to the growing practice of exposing a secure firmware interface directly to userspace. Unlike existing subsystems such as RDMA, DRM, VFIO and uacce, which expose a device for datapath operations, fwctl is focused on debugging, configuring and provisioning the device.
Documentation: fwctl subsystem
bcachefs improvements
This release adds some important features to the bcachefs filesystem, such as scrubbing, support for block sizes larger than the page size, and casefolding. These features require a number of on-disk format changes.