
Linux 5.8 has been released

Summary: This release adds: memory management changes to improve the behaviour of systems in thrashing situations; an event notification mechanism built on top of standard pipes that splices messages from the kernel into pipes opened by userspace; support for multiple procfs mounts, each with its own mount options; a Kernel Concurrency Sanitizer that helps to find data race bugs; the ability to use pidfds with setns(2) for easier attachment to the namespaces of a process; support for Shadow Call Stack and Branch Target Identification on ARM64 to prevent security exploits; support for Inline Encryption hardware; new CAP_BPF and CAP_PERFMON capabilities for BPF and performance monitoring programs; and IPv6 MPLS support. As always, there are many other new drivers and improvements.

Better behavior in memory thrashing situations

The reclaim code that balances between swapping and cache memory reclaim tries to predict the likely reuse of a memory page. When that prediction fails, it cannot detect when the cache is thrashing pathologically, or when the system is in the middle of a swap storm. This code has been tuned over time to the point where, even in the presence of large amounts of cold anonymous memory and a capable swap device, the VM refuses to even seriously scan these pages, and can leave the page cache thrashing needlessly. The proliferation of fast random IO devices such as SSDs has made this undesirable behavior more noticeable.

This release sets out to address this. Since Linux 3.15

Kernel Concurrency Sanitizer

The Kernel Concurrency Sanitizer (KCSAN) is a data race detector for the kernel. Key priorities in KCSAN's design are lack of false positives, scalability, and simplicity. KCSAN uses compile-time instrumentation to instrument memory accesses and it is supported in both GCC and Clang.

Documentation: The Kernel Concurrency Sanitizer (KCSAN)

Recommended LWN article: Concurrency bugs should fear the big bad data-race detector (part 1)

Kernel event notification mechanism

This release adds an event notification mechanism built on top of standard pipes: it splices notification messages from the kernel into pipes opened by userspace. The pipe is opened in a special mode, and its internal buffer is used to hold messages generated by the kernel, which are then read out with read(2). The owner of the pipe tells the kernel which sources it would like to watch through that pipe, and filters may also be placed on a pipe so that certain source types and subevents can be ignored if they are not of interest. In this release, the only event source is keys/keyrings (for example, linking and unlinking keys and changing their attributes), which will be used by GNOME.
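As a rough, unofficial illustration, the userspace side of this mechanism can be sketched in Python. The constants are copied from the Linux 5.8 {{{<linux/watch_queue.h>}}} UAPI header as we understand it and should be checked against your kernel headers; the helper names {{{open_notification_pipe}}} and {{{parse_notifications}}} are hypothetical; and registering a watch on a key (the {{{KEYCTL_WATCH_KEY}}} keyctl operation) is left out. The sketch also assumes a little-endian machine when decoding the record header.

```python
import fcntl
import os
import struct

# Values as found in <linux/watch_queue.h> (Linux 5.8); verify before relying on them.
O_NOTIFICATION_PIPE = os.O_EXCL                    # pipe2() flag selecting notification mode
IOC_WATCH_QUEUE_SET_SIZE = (ord("W") << 8) | 0x60  # _IO('W', 0x60)
WATCH_INFO_LENGTH = 0x0000007F                     # record length in bytes
WATCH_INFO_ID = 0x0000FF00                         # watch ID chosen by the pipe owner

# Record header: one 32-bit word holding type:24 and subtype:8, then the info word.
# Assumes little-endian, where the 24-bit type occupies the low bits of the first word.
HEADER = struct.Struct("<II")


def open_notification_pipe(nr_notes=256):
    """Create a pipe in notification mode and size its message ring.

    Raises OSError on kernels built without CONFIG_WATCH_QUEUE.
    """
    rfd, wfd = os.pipe2(O_NOTIFICATION_PIPE)
    fcntl.ioctl(rfd, IOC_WATCH_QUEUE_SET_SIZE, nr_notes)
    return rfd, wfd


def parse_notifications(buf):
    """Split bytes returned by read(2) into (type, subtype, watch_id, payload) tuples."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(buf):
        word, info = HEADER.unpack_from(buf, pos)
        length = info & WATCH_INFO_LENGTH
        if length < HEADER.size or pos + length > len(buf):
            break  # malformed or truncated record: stop rather than loop forever
        records.append((
            word & 0xFFFFFF,               # notification type
            word >> 24,                    # subtype
            (info & WATCH_INFO_ID) >> 8,   # watch ID
            bytes(buf[pos + HEADER.size:pos + length]),
        ))
        pos += length
    return records
```

A reader would typically call {{{parse_notifications(os.read(rfd, 4096))}}} in a loop after setting up a watch. Parsing is kept separate from the pipe setup so that it can be exercised without kernel support.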

Documentation: General notification mechanism

Recommended LWN article: A kernel event notification mechanism

Private procfs instances

Procfs was historically tied to PID namespaces. As a result, all new procfs mounts are just a mirror of the internal one: any change, any mount option update, and any future addition propagates to all other procfs mounts in the same PID namespace.

This release makes it possible to have several procfs mounts with different mount options within the same PID namespace. The main aim of this work is to support embedded systems that run a single supervisor for apps. It also adds some convenient mount options: one lets a private procfs mount show only ptraceable processes, which makes it possible to support lightweight sandboxes in Embedded Linux, and another hides non-pid inodes.
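The two options described above correspond to the {{{hidepid=ptraceable}}} and {{{subset=pid}}} procfs mount options. The Python sketch below builds that option string and passes it to mount(2) through ctypes; the helper names are hypothetical, and mounting procfs normally requires {{{CAP_SYS_ADMIN}}}.

```python
import ctypes
import ctypes.util
import os

# hidepid values accepted by procfs since Linux 5.8 (Documentation/filesystems/proc.rst).
HIDEPID_VALUES = ("off", "noaccess", "invisible", "ptraceable")


def procfs_options(hidepid="ptraceable", subset_pid=True):
    """Build the option string for a private procfs mount."""
    if hidepid not in HIDEPID_VALUES:
        raise ValueError(f"unknown hidepid value: {hidepid!r}")
    options = [f"hidepid={hidepid}"]
    if subset_pid:
        # subset=pid hides top-level entries that are not per-process, e.g. /proc/meminfo
        options.append("subset=pid")
    return ",".join(options)


def mount_private_procfs(target, **kwargs):
    """Mount a new procfs instance at `target` with the options built above."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    data = procfs_options(**kwargs).encode()
    # mount(source, target, filesystemtype, mountflags, data)
    if libc.mount(b"proc", os.fsencode(target), b"proc", 0, data) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

For example, {{{mount_private_procfs("/tmp/sandbox-proc")}}} would give a sandbox a procfs view that lists only the processes the reader may ptrace and hides the non-pid entries, without affecting other procfs mounts in the same PID namespace.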

Using pidfds to attach to namespaces

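This release makes it possible to use pidfds with setns(2) to attach to the namespaces of a process. When a pidfd is passed to setns(2), the nstype argument can be a combination of CLONE_NEW* flags, so a caller can attach to several namespaces of that process with a single call.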

These features support various use cases where callers setns to a subset of namespaces to retain privilege, perform an action, and then re-attach another subset of namespaces. Apart from reducing the number of syscalls needed to attach to all currently supported namespaces, this also makes it possible to setns to a set of namespaces atomically, which is useful for a standard container manager interacting with a running container.
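A minimal Python sketch of attaching to several namespaces of another process in one call, assuming Linux 5.8 or later and Python 3.9 or later for {{{os.pidfd_open()}}}. setns() is called through ctypes with a bitmask of CLONE_NEW* flags; the helper names {{{ns_mask}}} and {{{attach}}} and the short namespace names are hypothetical, and the caller still needs the privileges setns(2) requires for each namespace.

```python
import ctypes
import ctypes.util
import os

# Namespace flags from <linux/sched.h>.
NAMESPACE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def ns_mask(names):
    """OR together the CLONE_NEW* flags for the given namespace names."""
    mask = 0
    for name in names:
        mask |= NAMESPACE_FLAGS[name]
    return mask


def attach(pid, names):
    """Join several namespaces of `pid` with a single setns(2) call."""
    mask = ns_mask(names)
    if mask == 0:
        raise ValueError("setns() with a pidfd needs at least one CLONE_NEW* flag")
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    pidfd = os.pidfd_open(pid)  # refers to the process, not to a single namespace
    try:
        if libc.setns(pidfd, mask) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(pidfd)
```

For example, {{{attach(target_pid, ["net", "uts"])}}} joins the network and UTS namespaces of {{{target_pid}}} together, instead of opening and passing two separate /proc/<pid>/ns files. Python 3.12 also exposes {{{os.setns()}}} and the CLONE_NEW* constants directly, which would replace the ctypes call.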

Shadow Call Stack and Branch Target Identification for improved security on ARM64

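This release adds generic support for Clang's Shadow Call Stack (SCS), and enables it on ARM64. SCS stores function return addresses on a separate shadow stack, which makes it harder for an attacker to hijack control flow by overwriting return addresses on the regular stack.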

There is also support for ARMv8.5-BTI in both user space and kernel space. BTI allows branch targets to limit the types of branches from which they can be reached, and additionally prevents branching to arbitrary code.

Recommended LWN article: Some near-term arm64 hardening patches

Support for Inline Encryption hardware

This release supports Inline Encryption in the block layer. Inline Encryption hardware allows software to specify an encryption context (an encryption key, crypto algorithm, data unit number, data unit size, etc.) along with a data transfer request to a storage device, and the inline encryption hardware will use that context to encrypt and decrypt the data. The inline encryption hardware is part of the storage device, and it conceptually sits on the data path between system memory and the storage device.
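To make the role of the data unit number (DUN) concrete, here is a hypothetical Python model; {{{EncryptionContext}}} and {{{data_units}}} are illustrative names, not kernel APIs. It assumes the scheme where a request's context carries the DUN of its first data unit, each following data unit uses the next DUN, and the hardware uses that DUN as the tweak or IV for the unit. No actual encryption is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncryptionContext:
    """Hypothetical model of the context attached to a data transfer request."""
    key_id: int          # handle for a key programmed into the hardware
    algorithm: str       # e.g. "AES-256-XTS"
    first_dun: int       # data unit number of the first data unit in the request
    data_unit_size: int  # bytes per data unit


def data_units(ctx, payload):
    """Split a request's payload into (dun, chunk) pairs.

    Each data unit is processed independently with its own DUN, so a request
    covering n units uses DUNs first_dun .. first_dun + n - 1.
    """
    if len(payload) % ctx.data_unit_size:
        raise ValueError("payload must be a whole number of data units")
    return [
        (ctx.first_dun + i, payload[off:off + ctx.data_unit_size])
        for i, off in enumerate(range(0, len(payload), ctx.data_unit_size))
    ]
```

One consequence of this model: if a request is split in two, the context of the second part has to start at {{{first_dun}}} plus the number of data units in the first part, so that every data unit is still processed with the same DUN.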

Recommended LWN article: Inline encryption for filesystems

Introduce CAP_BPF and CAP_PERFMON security capabilities

Until now, using BPF has required the {{{CAP_SYS_ADMIN}}} capability. This means that any software that needs to use BPF must be given that capability, which grants far too many privileges. This release grants access to BPF functionality through a new {{{CAP_BPF}}} capability combined with {{{CAP_PERFMON}}} and {{{CAP_NET_ADMIN}}}, with some operations still kept under {{{CAP_SYS_ADMIN}}}. The user process must have {{{CAP_BPF}}} to create maps and run other {{{sys_bpf()}}} commands, {{{CAP_BPF}}} and {{{CAP_PERFMON}}} to load tracing programs, and {{{CAP_BPF}}} plus {{{CAP_NET_ADMIN}}} to load networking programs.

This release also adds the {{{CAP_PERFMON}}} capability for performance monitoring and observability.
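The capability rules above can be summarized as a small lookup table. The Python sketch below is a simplified model, not the kernel's actual permission check: it reads the effective capability set from /proc/self/status and tests it against the combinations listed above, using capability numbers from {{{<linux/capability.h>}}}. The operation names are hypothetical, and the model assumes that {{{CAP_SYS_ADMIN}}} alone still suffices, as it does for backward compatibility.

```python
# Capability numbers from <linux/capability.h> (CAP_PERFMON and CAP_BPF are new in 5.8).
CAP_NET_ADMIN = 12
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38
CAP_BPF = 39

# Simplified model of the rules described above; operation names are hypothetical.
REQUIRED_CAPS = {
    "create_map": (CAP_BPF,),
    "load_tracing_prog": (CAP_BPF, CAP_PERFMON),
    "load_networking_prog": (CAP_BPF, CAP_NET_ADMIN),
}


def effective_caps(status_path="/proc/self/status"):
    """Return the effective capability bitmask of the current process."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise RuntimeError("no CapEff line in " + status_path)


def allowed(cap_mask, operation):
    """Return True if cap_mask permits the operation under this model."""
    if cap_mask & (1 << CAP_SYS_ADMIN):
        return True  # CAP_SYS_ADMIN keeps working, for backward compatibility
    return all(cap_mask & (1 << cap) for cap in REQUIRED_CAPS[operation])
```

Under this model, {{{allowed(effective_caps(), "load_tracing_prog")}}} tells a tool whether it can expect to load a tracing program without {{{CAP_SYS_ADMIN}}}.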

Recommended LWN article: CAP_PERFMON — and new capabilities in general

IPv6 MPLS support

This release extends the Multi-Protocol Label Switching support to IPv6.

Bridge support for the Media Redundancy Protocol (MRP)

This release adds bridge support for the Media Redundancy Protocol (MRP), a data network protocol standardized by the International Electrotechnical Commission as IEC 62439-2. It allows rings of Ethernet switches to overcome any single failure with a recovery time faster than that of the Spanning Tree Protocol (STP). It is primarily used in Industrial Ethernet applications.