List for version 6.11

Linux 6.11 has been released

Summary: This release includes support for a vDSO implementation of getrandom(), nested software-interrupt locking for better realtime support, better namespace management APIs, block layer atomic writes, multi-size support for anonymous shmem, a dedicated bucket slab allocator for better protection against heap spraying, a new uretprobe system call for faster uretprobes, a binary interface for {{{/proc//maps}}}, and an iommufd facility to deliver IO page faults to user space. As always, there are many other features, new drivers, improvements and fixes.

Implement getrandom() in a vDSO

This release implements getrandom() in the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages at any time under memory pressure. This makes it possible to allocate memory that never gets swapped to disk but also doesn't count as being mlocked. Then, the vDSO implementation of getrandom() is introduced on top of it. The result is a fast and cryptographically secure random number generator.
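From a caller's point of view the interface stays the same: programs keep calling getrandom(), and it is up to the C library whether the call goes through the vDSO. As a rough sketch in Python, assuming a Linux system, `os.getrandom()` wraps that interface. The `MAP_DROPPABLE` value below is copied by hand from the 6.11 uapi header and should be treated as an assumption to check against your own headers; the helper names `random_bytes` and `try_droppable_mapping` are hypothetical.

```python
import mmap
import os

# Assumption: value taken from include/uapi/linux/mman.h in Linux 6.11.
# Python's mmap module does not export this constant.
MAP_DROPPABLE = 0x08


def random_bytes(n=32):
    """Return n random bytes via getrandom(); blocks until the pool is ready."""
    data = os.getrandom(n)
    # Small requests are normally filled completely, but a short read is
    # possible in general, so check for it.
    if len(data) != n:
        raise OSError("short read from getrandom()")
    return data


def try_droppable_mapping(length=mmap.PAGESIZE):
    """Try to create an anonymous MAP_DROPPABLE mapping.

    Returns the mapping, or None when the kernel rejects the flag
    (for example on kernels older than 6.11).
    """
    try:
        return mmap.mmap(
            -1,
            length,
            flags=MAP_DROPPABLE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
    except (OSError, ValueError):
        return None
```

Because the kernel may zero a droppable page at any time, such a mapping is only suitable for data that can be regenerated, which is how the random number generator state uses it.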

Recommended LWN article: Another try for getrandom() in the vDSO

Better namespace management APIs

This release includes various API improvements that let programs deal with namespaces more easily:

* In nsfs (the namespace filesystem), a couple of ioctls are added that translate PIDs between PID namespaces

* In pidfs, it is now possible to derive namespace file descriptors from pidfd file descriptors

* Both listmount() and statmount() have been extended to list and stat mounts in foreign mount namespaces
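As an illustration of the pidfs item, the sketch below asks a pidfd for the PID namespace of its process. It is a minimal example under stated assumptions: the ioctl number is transcribed by hand from `include/uapi/linux/pidfd.h` in 6.11, the `_io()` encoding matches x86 and arm64 but not every architecture, and the helper name `pid_namespace_fd` is hypothetical.

```python
import fcntl
import os

# Assumptions: constants transcribed from include/uapi/linux/pidfd.h
# (Linux 6.11); the encoding below matches x86 and arm64 only.
PIDFS_IOCTL_MAGIC = 0xFF


def _io(magic, nr):
    """Encode an argument-less ioctl request number, like the C _IO() macro."""
    return (magic << 8) | nr


PIDFD_GET_PID_NAMESPACE = _io(PIDFS_IOCTL_MAGIC, 5)


def pid_namespace_fd(pid):
    """Return an fd for the PID namespace of `pid`, or None if unsupported."""
    if not hasattr(os, "pidfd_open"):
        return None
    try:
        pidfd = os.pidfd_open(pid)
    except OSError:
        return None
    try:
        # On success the ioctl's return value is the new namespace fd.
        return fcntl.ioctl(pidfd, PIDFD_GET_PID_NAMESPACE)
    except OSError:
        return None  # kernel older than 6.11, or access denied
    finally:
        os.close(pidfd)
```

The returned descriptor refers to the same kind of object as opening the corresponding file under the process's `ns` directory in procfs, so the caller is responsible for closing it.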

Block layer atomic writes

This release implements atomic writes in the kernel for torn-write protection. Hardware support for atomic writes is required, such as SCSI ATOMIC WRITE. The new interface allows applications to use application-specific block sizes larger than the logical block size or the filesystem block size. With this interface, application blocks will never be torn or fractured when written. After a power failure, each individual application block will contain either all or none of the data that was to be written. When an atomic write races with a read, the read sees either all the old data or all the new data, but never a mix of old and new.
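A minimal sketch of how an application might issue such a write follows. It assumes the `RWF_ATOMIC` flag value from the 6.11 uapi header (Python's `os` module may not export it), and that the kernel's documented rules apply: the length must be a power of two within the limits reported for the device (for example through statx()), and the offset must be aligned to the length. The helper names are hypothetical.

```python
import os

# Assumption: value of RWF_ATOMIC from include/uapi/linux/fs.h (Linux 6.11).
RWF_ATOMIC = 0x40


def is_valid_atomic_write(offset, length, unit_min, unit_max):
    """Check the size and alignment rules for one atomic write.

    unit_min and unit_max are the limits the device reports, in bytes.
    """
    if length <= 0 or length & (length - 1):
        return False  # length must be a power of two
    if not unit_min <= length <= unit_max:
        return False  # outside the range the device supports
    return offset % length == 0  # offset must be naturally aligned


def atomic_pwrite(fd, buf, offset):
    """Write buf at offset as a single untorn unit.

    fd is expected to be opened with O_DIRECT, and buf to satisfy the
    usual O_DIRECT memory alignment. Raises OSError when the kernel or
    device does not support atomic writes.
    """
    written = os.pwritev(fd, [buf], offset, RWF_ATOMIC)
    if written != len(buf):
        raise OSError("atomic write was not fully written")
    return written
```

Checking the rules in user space first is only a convenience: the kernel performs its own validation and rejects a request that breaks them.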

Recommended LWN article: Atomic writes without tears

Nested-software interrupt locking for better realtime support

Software interrupt handlers, called "bottom halves" in Linux, are an important part of the kernel, but they do not play well with the realtime patchset because they can introduce latency. This release reworks how their locking works in order to make them preemptible and thus better suited to realtime needs.

Recommended LWN article: Nested bottom-half locking for realtime kernels

Add multi-size support for anonymous shmem

This release adds multi-size transparent huge page (mTHP) support for anonymous shmem, which can bring dramatic improvements in page fault latency.

The strategy is similar to the one used for anonymous mTHP: a new interface, '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled', is added. It accepts almost the same values as the top-level '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with a new "inherit" option added and the testing options 'force' and 'deny' dropped. By default, all sizes are set to "never" except the PMD size, which is set to "inherit".
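The relationship between a per-size setting and the top-level one can be sketched as follows. This is an illustration only: it assumes the usual sysfs convention of listing every option with the active one in square brackets, it treats "inherit" simply as "use the top-level value", and the function names and sample strings are hypothetical.

```python
from pathlib import Path

# Top-level knob named in the text; per-size files use the same format.
TOP_LEVEL = Path("/sys/kernel/mm/transparent_hugepage/shmem_enabled")


def selected_option(text):
    """Return the bracketed (active) option from a sysfs choice list."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no option is marked as selected")


def effective_policy(per_size_text, top_level_text):
    """Resolve a per-size setting, following "inherit" to the top level."""
    choice = selected_option(per_size_text)
    if choice == "inherit":
        return selected_option(top_level_text)
    return choice


def read_top_level():
    """Read the active top-level policy, or None if the file is absent."""
    try:
        return selected_option(TOP_LEVEL.read_text())
    except (OSError, ValueError):
        return None
```

With the defaults described above, only the PMD size follows the top-level policy, so existing configurations keep their behaviour until another size is explicitly enabled.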

Allow writing to all executables

Under some circumstances, rewriting an executable in Linux can fail with a "text file is busy" message. This release makes the kernel ignore the {{{MAP_DENYWRITE}}} flag.

Dedicated bucket slab allocator for better protection against heap spraying

This release introduces a dedicated bucket allocator in the slab allocator. It enhances the probabilistic defense against heap spraying/grooming provided by CONFIG_RANDOM_KMALLOC_CACHES, which was added last year.

Recommended LWN article: Hardening the kernel against heap-spraying attacks

New uretprobe system call for faster uretprobes

This release adds a new uretprobe system call that makes uretprobes 10-30% faster. The syscall is used automatically by the user-space trampolines that uretprobes generate. If a normal user program calls it directly, the program receives SIGILL. It is currently implemented only on x86_64.

Binary interface for /proc//maps

This release aims to solve some problems with the text interface of {{{/proc//maps}}} by adding a new binary API. It addresses both the non-selectiveness and the text nature of /proc//maps: users get more control over which VMAs are queried, and a binary interface eliminates the overhead of text formatting (on the kernel side) and parsing (on the user space side).
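To give an idea of what a binary query looks like, the sketch below asks the kernel for the single VMA that covers a given address in the calling process. It rests on several assumptions: the structure layout, the ioctl number and the magic value are transcribed by hand from `include/uapi/linux/fs.h` in 6.11, the `_iowr()` encoding matches x86 and arm64, the query is issued on the `/proc/self/maps` file, and the names `ProcmapQuery` and `query_vma` are hypothetical.

```python
import ctypes
import fcntl

# Assumptions: layout and constants transcribed from include/uapi/linux/fs.h
# (Linux 6.11); the _IOWR() encoding below matches x86 and arm64.
PROCFS_IOCTL_MAGIC = ord("f")


class ProcmapQuery(ctypes.Structure):
    _fields_ = [
        ("size", ctypes.c_uint64),           # in: size of this structure
        ("query_flags", ctypes.c_uint64),    # in
        ("query_addr", ctypes.c_uint64),     # in
        ("vma_start", ctypes.c_uint64),      # out
        ("vma_end", ctypes.c_uint64),        # out
        ("vma_flags", ctypes.c_uint64),      # out
        ("vma_page_size", ctypes.c_uint64),  # out
        ("vma_offset", ctypes.c_uint64),     # out
        ("inode", ctypes.c_uint64),          # out
        ("dev_major", ctypes.c_uint32),      # out
        ("dev_minor", ctypes.c_uint32),      # out
        ("vma_name_size", ctypes.c_uint32),  # in/out
        ("build_id_size", ctypes.c_uint32),  # in/out
        ("vma_name_addr", ctypes.c_uint64),  # in
        ("build_id_addr", ctypes.c_uint64),  # in
    ]


def _iowr(magic, nr, size):
    """Encode a read/write ioctl request number, like the C _IOWR() macro."""
    return (3 << 30) | (size << 16) | (magic << 8) | nr


PROCMAP_QUERY = _iowr(PROCFS_IOCTL_MAGIC, 17, ctypes.sizeof(ProcmapQuery))


def query_vma(addr):
    """Return (start, end, name) of the VMA covering addr, or None.

    None means the kernel lacks the ioctl or no VMA covers addr.
    """
    name_buf = ctypes.create_string_buffer(4096)
    query = ProcmapQuery()
    query.size = ctypes.sizeof(query)
    query.query_addr = addr
    query.vma_name_size = ctypes.sizeof(name_buf)
    query.vma_name_addr = ctypes.addressof(name_buf)
    try:
        with open("/proc/self/maps", "rb") as maps:
            fcntl.ioctl(maps.fileno(), PROCMAP_QUERY, query)
    except OSError:
        return None
    name = name_buf.value.decode(errors="replace")
    return query.vma_start, query.vma_end, name
```

Compared with reading the text file, a caller interested in one address receives one fixed-size record and does not need to parse every line of the mapping list.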

iommufd: Deliver IO page faults to user space

This release implements delivery of IO page faults to user space through the IOMMUFD framework. One feasible use case is nested translation, a hardware feature that supports two-stage translation tables in the IOMMU. The second-stage translation table is managed by the host VMM, while the first-stage translation table is owned by user space. This allows user space to control the IOMMU mappings for its devices.