Ready for version 6.9

Linux 6.9 was released

Summary: This release includes support for x86 FRED, which is a new way of transitioning between CPU ring privileges; it also includes support for creating pidfds for threads; support for BPF arenas, which are sparse shared memory regions between BPF programs and user space; and BPF tokens, which allow delegating functionality to less privileged programs; host support for AMD Secure Nested Paging; support for weighted interleaving memory policies; support for a FUSE passthrough mode that makes regular file I/O faster; and a new device mapper VDO deduplication target. As always, there are many other features, new drivers, improvements and fixes.

pidfd: pidfds for threads and pidfs

pidfds (PID fds, file descriptors that represent a process) are a concept that was first added in Linux 5.3. This release improves them in several ways:

* Create pidfds for threads. Until now, pidfds could only be created for thread-group leaders.

* {{{clone()}}} and {{{clone3()}}} can now be called with {{{CLONE_PIDFD | CLONE_THREAD}}}

* Move pidfds to a tiny pseudo filesystem (pidfs), which allows several improvements
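
As a rough sketch of the first item, the snippet below asks the kernel for a pidfd that refers to a single thread rather than to a thread-group leader. It assumes Python 3.9 or later on Linux (for {{{os.pidfd_open}}}) and passes the {{{PIDFD_THREAD}}} flag, which the kernel UAPI headers define as {{{O_EXCL}}}. The helper names {{{open_thread_pidfd}}} and {{{thread_pidfd_supported}}} are made up for this illustration. On kernels older than 6.9 the call is expected to fail, so the helper reports that as {{{None}}} instead of raising.

```python
import os
import threading

# The kernel UAPI header defines PIDFD_THREAD as O_EXCL.
PIDFD_THREAD = os.O_EXCL


def open_thread_pidfd(tid):
    """Return a pidfd for thread `tid`, or None if the request is rejected."""
    if not hasattr(os, "pidfd_open"):
        return None  # not Linux, or Python older than 3.9
    try:
        return os.pidfd_open(tid, PIDFD_THREAD)
    except OSError:
        # Typically EINVAL on kernels without thread pidfds; a sandbox
        # may also filter the syscall. Either way, report "unsupported".
        return None


def thread_pidfd_supported():
    """Start a helper thread and try to open a pidfd for it."""
    ready = threading.Event()
    done = threading.Event()
    tids = []

    def worker():
        tids.append(threading.get_native_id())  # kernel thread id (TID)
        ready.set()
        done.wait()  # stay alive until the pidfd attempt has finished

    thread = threading.Thread(target=worker)
    thread.start()
    ready.wait()
    fd = open_thread_pidfd(tids[0])
    done.set()
    thread.join()
    if fd is None:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    print("thread pidfds supported:", thread_pidfd_supported())
```

The worker thread is kept alive with an event so that its TID is still valid when the pidfd is requested.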

Recommended LWN article: A new filesystem for pidfds

x86 FRED support

FRED (Fast Return and Event Delivery) is a new architecture on Intel processors that defines simple new transitions that change privilege level (ring transitions). It is a replacement for IDT event delivery on x86 and addresses most of the technical nightmares which IDT exposes.

Documentation

BPF improvements: arenas and tokens

This new release incorporates, as usual, new BPF features. A couple of features stand out from the rest:

* BPF arenas: a sparse shared memory region between the BPF program and user space

* BPF tokens: provide the ability to delegate a subset of BPF subsystem functionality from a privileged system-wide daemon (e.g., systemd or any other container manager) to a trusted unprivileged application, through special mount options for a userns-bound BPF FS. The main motivation is to enable containerized BPF applications to be used together with user namespaces. This is currently impossible because CAP_BPF, which is required for BPF subsystem usage, cannot as a general rule be namespaced or sandboxed.

Recommended LWN articles:

* A proposal for shared memory in BPF programs

* Finer-grained BPF tokens

Host support for AMD Secure Nested Paging

AMD EPYC systems utilizing Zen 3 and newer microarchitectures add support for a new feature called SEV-SNP, which adds Secure Nested Paging support on top of the SEV/SEV-ES support already present on existing EPYC systems. This release adds support for acting as a KVM host capable of running SNP guests. One of the main features of SNP is the addition of an RMP (Reverse Map) table to enforce additional security protections for private guest memory.

Weighted interleaving memory policies

When allocating memory, the kernel has to decide which NUMA node that memory should come from. The existing memory interleave mechanism does an even round-robin distribution of memory across all nodes. This release provides a weighted interleave mechanism that distributes memory across nodes according to a provided weight, which helps to make better use of the total available memory bandwidth.
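
To make the difference between even and weighted interleaving concrete, here is a small user-space simulation. It is only a simplified model, not the kernel's implementation: the function names and the example weights are assumptions chosen for illustration, and each node simply receives as many consecutive pages per round as its weight.

```python
from collections import Counter
from itertools import islice


def weighted_interleave(weights):
    """Yield node ids forever; each node gets `weight` pages per round."""
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("every node needs a positive weight")
    while True:
        for node in sorted(weights):
            for _ in range(weights[node]):
                yield node


def place_pages(weights, n_pages):
    """Count how many of `n_pages` pages land on each node."""
    return Counter(islice(weighted_interleave(weights), n_pages))


if __name__ == "__main__":
    # Even interleave is the special case where every weight is 1.
    print(place_pages({0: 1, 1: 1}, 8))
    # A node with more bandwidth (hypothetical weight 3) gets 3 of every 4 pages.
    print(place_pages({0: 3, 1: 1}, 8))
```

With weights of 3 and 1, node 0 ends up with six of the eight pages and node 1 with two, whereas the even policy splits them four and four.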

Recommended LWN article: Weighted interleaving for memory tiering

Faster FUSE I/O

This release adds a passthrough mode for regular file I/O. This allows performing reads and writes (also via memory maps) on a backing file without incurring the overhead of round trips to userspace. For now this is only allowed for privileged servers, but this limitation will go away in the future.

Recommended LWN article: FUSE passthrough for file I/O

Slightly faster timer setup

Kernel code uses a lot of timers that are canceled or rearmed before they expire. However, when a timer is enqueued a target CPU is chosen, which is wasted work if the timer never fires. In this release, the CPU selection is avoided whenever possible, which makes some hot code paths in the kernel a bit faster in some cases.
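
The argument can be illustrated with a toy count of how often a target CPU would have to be picked. This is not how the kernel implements it; the function name and the workload (nine canceled timers out of ten) are assumptions made purely to show why skipping the selection at enqueue time saves work.

```python
def count_cpu_selections(timers, select_at_enqueue):
    """Count CPU selections for a list of timers.

    `timers` holds one bool per timer: True if it was canceled or
    rearmed before expiring, False if it actually fired.
    """
    selections = 0
    for canceled in timers:
        if select_at_enqueue:
            selections += 1  # paid for every timer, even canceled ones
        elif not canceled:
            selections += 1  # paid only for timers that really expire
    return selections


if __name__ == "__main__":
    workload = [True] * 9 + [False]  # hypothetical: 9 of 10 timers canceled
    print("eager:", count_cpu_selections(workload, select_at_enqueue=True))
    print("deferred:", count_cpu_selections(workload, select_at_enqueue=False))
```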

Recommended LWN article: Better CPU selection for timer expiration

Device mapper VDO deduplication target

This release adds a new device mapper VDO (virtual data optimizer) target which provides block-level deduplication, compression, and thin provisioning. As a device mapper target, it can add these features to the storage stack, compatible with any file system. The vdo target does not protect against data corruption, relying instead on integrity protection of the storage below it.
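
For readers unfamiliar with block-level deduplication, the following sketch shows the general idea in user space. It is not the dm-vdo design: the {{{DedupStore}}} class, the use of SHA-256 as the block fingerprint, and the 4096-byte block size are assumptions for illustration, and it omits compression, reference counting and space reclamation.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one physical copy per distinct block."""

    BLOCK_SIZE = 4096  # assumed block size for this sketch

    def __init__(self):
        self.physical = {}  # fingerprint -> block contents
        self.logical = {}   # logical block number -> fingerprint

    def write(self, lbn, data):
        if len(data) != self.BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        fingerprint = hashlib.sha256(data).digest()
        # Store the contents only if this fingerprint is new.
        self.physical.setdefault(fingerprint, data)
        self.logical[lbn] = fingerprint

    def read(self, lbn):
        fingerprint = self.logical.get(lbn)
        if fingerprint is None:
            # Never-written blocks read back as zeros (thin provisioning).
            return bytes(self.BLOCK_SIZE)
        return self.physical[fingerprint]


if __name__ == "__main__":
    store = DedupStore()
    block = b"A" * DedupStore.BLOCK_SIZE
    for lbn in range(3):
        store.write(lbn, block)
    print(len(store.logical), "logical blocks,",
          len(store.physical), "physical block")
```

Three logical writes of identical data consume a single physical block here, which is the saving a deduplicating target aims for.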

Documentation: dm-vdo

Documentation: Design of dm-vdo