Linux 6.11 has been released
Summary: This release includes a vDSO implementation of getrandom(), nested-software interrupt locking for better realtime support, better namespace management APIs, block layer atomic writes, multi-size support for anonymous shmem, a dedicated bucket slab allocator for better protection against heap spraying, a new uretprobe system call for faster uretprobes, a binary interface for {{{/proc/
Implement getrandom() in a vDSO
This release implements getrandom() in the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages at any time under memory pressure. This makes it possible to allocate memory that never gets swapped to disk but also doesn't count as mlocked. Then it introduces the vDSO implementation of getrandom(), which provides a fast and cryptographically secure random number generator.
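As a rough illustration of the two pieces described above, the following Python sketch requests random bytes through getrandom() and tries to create a MAP_DROPPABLE mapping. The helper names `random_bytes` and `try_droppable_mapping` are made up for this example. The constant value 0x08 is taken from the kernel's uapi header and is defined by hand here, on the assumption that Python's mmap module does not export it. Whether a given getrandom() call actually takes the vDSO path depends on the C library and the running kernel, and is invisible to the caller.

```python
import mmap
import os

# Value of MAP_DROPPABLE in include/uapi/linux/mman.h as of Linux 6.11.
# Assumption: Python's mmap module does not export this constant.
MAP_DROPPABLE = 0x08


def random_bytes(n: int) -> bytes:
    """Return n random bytes (n <= 256) via getrandom(), else os.urandom()."""
    if hasattr(os, "getrandom"):
        try:
            return os.getrandom(n)
        except OSError:
            pass  # getrandom() unavailable here: fall through
    return os.urandom(n)


def try_droppable_mapping(length: int = mmap.PAGESIZE):
    """Try to create an anonymous MAP_DROPPABLE mapping.

    Returns the mmap object, or None if the kernel rejects the flag
    (for example, on kernels older than 6.11).
    """
    try:
        return mmap.mmap(-1, length,
                         flags=MAP_DROPPABLE | mmap.MAP_ANONYMOUS,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(random_bytes(16).hex())
    mapping = try_droppable_mapping()
    print("MAP_DROPPABLE supported:", mapping is not None)
```

Because the kernel may zero a droppable page at any time, such a mapping only suits data that can be regenerated, such as cached generator state.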
Recommended LWN article: Another try for getrandom() in the vDSO
Better namespace management APIs
This release includes various API improvements that let programs deal with namespaces more easily:
* In nsfs (the namespace filesystem), a couple of ioctls have been added that translate PIDs between PID namespaces
* In pidfs, it is now possible to derive namespace file descriptors from pidfd file descriptors
* Both listmount() and statmount() have been extended to list and stat mounts in foreign mount namespaces
Block layer atomic writes
This release implements atomic writes in the kernel for torn-write protection. Atomic write hardware support is required, such as SCSI ATOMIC WRITE. The feature provides an interface that allows applications to use application-specific block sizes larger than the logical block size or the filesystem block size. With this new interface, application blocks are never torn or fractured when written. After a power failure, each individual application block contains either all or none of the data being written. A racing atomic write and read means that the read sees either all the old data or all the new data, but never a mix of old and new.
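From user space, an atomic write is requested by passing the RWF_ATOMIC flag to pwritev2(). The Python sketch below shows the idea under a few assumptions: the flag value 0x40 is copied from include/uapi/linux/fs.h and defined by hand, the helper names are invented for this example, and the size limits `unit_min` and `unit_max` are taken as parameters (in practice they would come from statx()). The validity check encodes the rule that an atomic write must be a power of two in size, lie within the advertised unit limits, and be naturally aligned. A real atomic write also needs a file opened with O_DIRECT on a device that supports it, so on an ordinary file this sketch simply falls back to a normal write.

```python
import os

# Value of RWF_ATOMIC in include/uapi/linux/fs.h as of Linux 6.11.
# Assumption: the running Python's os module may not export it.
RWF_ATOMIC = 0x40


def is_valid_atomic_write(offset: int, length: int,
                          unit_min: int, unit_max: int) -> bool:
    """Check the size and alignment rules for a single atomic write."""
    power_of_two = length > 0 and (length & (length - 1)) == 0
    return (power_of_two
            and unit_min <= length <= unit_max
            and offset % length == 0)


def write_block(fd: int, data: bytes, offset: int) -> bool:
    """Write `data` at `offset`; return True if RWF_ATOMIC was accepted."""
    try:
        os.pwritev(fd, [data], offset, RWF_ATOMIC)
        return True
    except (OSError, NotImplementedError, AttributeError):
        # Kernel, file, or libc does not support atomic writes here,
        # so this write has no torn-write protection.
        os.pwrite(fd, data, offset)
        return False
```

Returning a boolean keeps the fallback visible: a caller that depends on torn-write protection should treat `False` as an error rather than silently continue.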
Recommended LWN article: Atomic writes without tears
Nested-software interrupt locking for better realtime support
Software interrupt handlers, called "bottom halves" in Linux, are an important part of the kernel, but they do not play well with the realtime patchset because they can introduce latency. This release reworks their locking in order to make them preemptible and thus better suited to realtime needs.
Recommended LWN article: Nested bottom-half locking for realtime kernels
Add multi-size support for anonymous shmem
This release adds multi-size transparent huge page (mTHP) support for anonymous shmem, with dramatic improvements in page fault latency.
The strategy is similar to the one used for anonymous mTHP: a new interface, '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled', is added. It accepts almost the same values as the top-level '/sys/kernel/mm/transparent_hugepage/shmem_enabled', adding a new "inherit" option and dropping the testing options 'force' and 'deny'. By default all sizes are set to "never" except the PMD size, which is set to "inherit".
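To see how the per-size settings look on a running system, a small Python sketch like the one below can read them back. It assumes the usual sysfs convention in which the file lists every allowed value and marks the active one with square brackets. The function names are hypothetical, and on kernels without this feature the result is simply an empty dictionary.

```python
import glob
import os
import re

THP_SYSFS = "/sys/kernel/mm/transparent_hugepage"


def active_value(contents: str) -> str:
    """Return the bracketed (currently selected) value from a sysfs line."""
    match = re.search(r"\[(\w+)\]", contents)
    return match.group(1) if match else ""


def shmem_mthp_settings(root: str = THP_SYSFS) -> dict:
    """Map each hugepages-<size>kB directory to its shmem_enabled value."""
    settings = {}
    pattern = os.path.join(root, "hugepages-*kB", "shmem_enabled")
    for path in glob.glob(pattern):
        size = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                settings[size] = active_value(f.read())
        except OSError:
            continue  # unreadable entry: skip it
    return settings


if __name__ == "__main__":
    for size, value in sorted(shmem_mthp_settings().items()):
        print(f"{size}: {value}")
```

A size reporting "inherit" follows whatever the top-level `shmem_enabled` file selects, so both need to be read to know the effective policy.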
Allow writing to all executables
Under some circumstances, rewriting an executable in Linux can fail with a "Text file busy" (ETXTBSY) error. This release makes the kernel ignore the {{{MAP_DENYWRITE}}} flag.
Dedicated bucket slab allocator for better protection against heap spraying
This release introduces a dedicated bucket allocator in the slab allocator. It enhances the probabilistic defense against heap spraying and grooming provided by CONFIG_RANDOM_KMALLOC_CACHES, which was added last year.
Recommended LWN article: Hardening the kernel against heap-spraying attacks
New uretprobe system call for faster uretprobes
This release adds a new uretprobe syscall that makes uretprobes 10-30% faster. The syscall is used automatically by the user-space trampolines that uretprobes generate. If a normal user program calls it directly, the program receives SIGILL. It is currently implemented only on x86_64.
Binary interface for /proc/
This release aims to solve some problems with the text interface of {{{/proc/
iommufd: Deliver IO page faults to user space
This release adds support for delivering IO page faults to user space through the IOMMUFD framework. One feasible use case is nested translation, a hardware feature that supports two-stage translation tables for the IOMMU. The second-stage translation table is managed by the host VMM, while the first-stage translation table is owned by user space. This allows user space to control the IOMMU mappings for its devices.