List for version 5.16

Linux 5.16 was released on

Summary: This release adds a new futex_waitv syscall that can speed up games by letting them wait for multiple futexes with a single system call; a file system health reporting API based on fanotify; the concept of "memory folios", which speeds up some memory management areas significantly; support in the task scheduler for CPU "clusters" that share some L2/L3 cache; support for Intel AMX instructions; support for DAMON-based proactive memory reclamation; and improved write congestion management. As always, there are many other features, new drivers, improvements and fixes.

New futex_waitv() system call for faster game performance

This release adds a new system call, {{{futex_waitv(2)}}}, which allows a thread to wait on multiple futexes with a single system call. The main use case is emulating Windows' {{{WaitForMultipleObjects}}} call, which allows software like Proton to improve the performance of Windows games. Native Linux games can also benefit from this interface, as this is a common wait pattern in this kind of application.
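As a rough illustration of the interface, the sketch below builds the waiter array and issues the raw system call through {{{ctypes}}}. The syscall number 449 and the helper names {{{make_waiters}}} and {{{futex_waitv}}} are assumptions made for this example (no libc wrapper is assumed to exist), and the call only works on a 5.16 or later kernel.

```python
import ctypes
import time

SYS_FUTEX_WAITV = 449  # assumption: syscall number on x86-64 and arm64
FUTEX_32 = 2           # flag from <linux/futex.h>: the futex word is 32 bits wide
CLOCK_MONOTONIC = 1    # Linux clock id used for the absolute timeout


class FutexWaitv(ctypes.Structure):
    """Mirrors struct futex_waitv from <linux/futex.h>."""
    _fields_ = [
        ("val", ctypes.c_uint64),       # value the futex word is expected to hold
        ("uaddr", ctypes.c_uint64),     # user-space address of the futex word
        ("flags", ctypes.c_uint32),     # futex size and shared/private flags
        ("reserved", ctypes.c_uint32),  # must be zero
    ]


class KernelTimespec(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_int64), ("tv_nsec", ctypes.c_int64)]


def make_waiters(words, expected):
    """Build the waiter array for a list of ctypes.c_uint32 futex words."""
    waiters = (FutexWaitv * len(words))()
    for i, (word, value) in enumerate(zip(words, expected)):
        waiters[i].val = value
        waiters[i].uaddr = ctypes.addressof(word)
        waiters[i].flags = FUTEX_32
    return waiters


def futex_waitv(waiters, timeout_s):
    """Hypothetical wrapper: returns (woken_index, 0) or (-1, errno)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    # The timeout is absolute, measured on the clock passed as the last argument.
    deadline_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC) + int(timeout_s * 1e9)
    ts = KernelTimespec(deadline_ns // 10**9, deadline_ns % 10**9)
    ret = libc.syscall(
        ctypes.c_long(SYS_FUTEX_WAITV),
        ctypes.byref(waiters),
        ctypes.c_uint(len(waiters)),
        ctypes.c_uint(0),               # syscall flags, currently unused
        ctypes.byref(ts),
        ctypes.c_int(CLOCK_MONOTONIC),
    )
    return (ret, 0) if ret >= 0 else (-1, ctypes.get_errno())
```

A caller would keep the {{{c_uint32}}} words alive for the duration of the wait, and another thread would wake one of them with a regular {{{FUTEX_WAKE}}} operation. If any word does not hold its expected value when the call is made, the kernel returns {{{EAGAIN}}} instead of sleeping.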

Recommended LWN article: Short subjects: Realtime, Futexes, and ntfs3

Documentation: Documentation/userspace-api/futex2.rst

File system health reporting through fanotify

This release adds a new {{{FAN_FS_ERROR}}} fanotify event type for file system-wide error reporting. It is meant to be used by file system health monitoring daemons, which listen for these events and take actions (notify sysadmin, start recovery) when a file system problem is detected. It tries to report only the first error that occurred for a file system since the last notification, and it simply counts additional errors. This ensures that the most important pieces of information are never lost. Right now, the only file system that supports this interface is Ext4.
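The "first error plus a count" behaviour shows up in the information record attached to each event. The sketch below decodes such a record from raw bytes. The layout follows {{{struct fanotify_event_info_error}}} in the kernel UAPI headers as understood here, and the helper name {{{parse_error_info}}} is hypothetical. Setting up the fanotify group and reading events from it is left out.

```python
import struct

FAN_FS_ERROR = 0x00008000      # event mask bit, from <linux/fanotify.h>
FAN_EVENT_INFO_TYPE_ERROR = 5  # info record type, from <linux/fanotify.h>

# struct fanotify_event_info_error:
#   u8 info_type, u8 pad, u16 len   (the common info header)
#   s32 error                       (errno value of the first error)
#   u32 error_count                 (errors seen since the last notification)
ERROR_INFO = struct.Struct("=BBHiI")


def parse_error_info(record):
    """Decode one error info record that follows a FAN_FS_ERROR event."""
    info_type, _pad, length, error, error_count = ERROR_INFO.unpack_from(record)
    if info_type != FAN_EVENT_INFO_TYPE_ERROR:
        raise ValueError("not an error info record: type %d" % info_type)
    return {"len": length, "error": error, "error_count": error_count}
```

A monitoring daemon could act on {{{error}}} (the first error observed) and use {{{error_count}}} to judge how many further errors were folded into the same notification.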

Documentation: Documentation/admin-guide/filesystem-monitoring.rst

Memory folios infrastructure for a faster memory management

To manage the system's memory, the available RAM is split into small units called pages. The size of these pages varies depending on the architecture, but on x86 systems it's 4 KB. On modern systems with several tens of GB of RAM, such a small page size results in a vast number of pages, which are difficult to manage. To solve this problem, the Linux kernel developed the concept of compound pages, which are page structures that can contain more than one physical page. But the semantics of compound pages are unclear, and their bug-prone APIs introduce overhead all across the kernel.
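To put the scale in numbers, here is a back-of-the-envelope calculation. It assumes a 4 KiB base page and a 64-byte {{{struct page}}} (a typical size on 64-bit systems, not a figure taken from this text); the 16-page grouping is purely hypothetical and only shows how the number of objects to track shrinks when pages are handled in larger units.

```python
PAGE_SIZE = 4 * 1024    # assumption: x86 base page size in bytes
STRUCT_PAGE_BYTES = 64  # assumption: typical sizeof(struct page) on 64-bit


def page_count(ram_bytes, pages_per_unit=1):
    """Number of units to track when RAM is managed pages_per_unit pages at a time."""
    return ram_bytes // (PAGE_SIZE * pages_per_unit)


def metadata_bytes(ram_bytes):
    """Memory taken by one struct page per base page."""
    return page_count(ram_bytes) * STRUCT_PAGE_BYTES


ram = 64 * 2**30                     # a 64 GiB machine
print(page_count(ram))               # 16777216 base pages
print(metadata_bytes(ram) // 2**20)  # 1024 (MiB of struct page metadata)
print(page_count(ram, 16))           # 1048576 units if handled 16 pages at a time
```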

This release introduces the concept of page folios, which are like compound pages, but with better semantics. Using page folios in some core parts of the kernel brings performance improvements in common workloads. This release includes the core infrastructure of page folios and converts some parts of the core memory management subsystem and the page cache. Future releases will convert some file systems and introduce multi-page folios.

Recommended LWN article: Clarifying memory management with page folios

Add cluster scheduler support to the task scheduler

Some machines have a level of hardware topology in which some CPU cores, typically 4 cores, share L3 tags (e.g. ARM's Kunpeng 920) or L2 cache (e.g. x86's Jacobsville). Awareness of this topology can drastically improve task scheduling decisions: spreading tasks between clusters brings more memory bandwidth and decreases cache contention (but this isn't always a win: packing tasks together might help decrease the latency of cache synchronization). This release adds support for cluster topologies to the task scheduler.
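On kernels that expose cluster information, the grouping can be inspected from user space. The sketch below assumes the per-CPU sysfs file {{{topology/cluster_cpus_list}}} is present; the function names are made up for this example. On machines or kernels without that file it simply reports no clusters.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def read_clusters(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct CPU clusters as sorted tuples of CPU numbers."""
    clusters = set()
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "cluster_cpus_list")
    for path in glob.glob(pattern):
        with open(path) as f:
            clusters.add(tuple(parse_cpu_list(f.read())))
    return sorted(clusters)


if __name__ == "__main__":
    for cluster in read_clusters():
        print("cluster:", cluster)
```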

Add support for AMX instructions

This release adds support for Intel's Advanced Matrix Extensions (AMX)

Because of the size that this new extension would add to the signal stack of each task, it requires using a new {{{arch_prctl(2)}}} mechanism to read the supported features and request permission for dynamically enabling it for the calling process and its children.
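A minimal sketch of that flow on x86-64 follows. The constant values are taken from the kernel's x86 UAPI headers as understood here, the syscall number 158 is specific to x86-64, and the helper names are hypothetical. Treat it as an illustration, not a reference implementation.

```python
import ctypes

SYS_ARCH_PRCTL = 158          # assumption: x86-64 syscall number
ARCH_GET_XCOMP_SUPP = 0x1021  # read the mask of supported dynamic features
ARCH_GET_XCOMP_PERM = 0x1022  # read the mask of features already permitted
ARCH_REQ_XCOMP_PERM = 0x1023  # request permission for one feature
XFEATURE_XTILEDATA = 18       # AMX tile data state component


def has_feature(mask, feature):
    """True if bit number `feature` is set in an xfeature bitmask."""
    return bool((mask >> feature) & 1)


def _arch_prctl(code, arg):
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    if libc.syscall(ctypes.c_long(SYS_ARCH_PRCTL), ctypes.c_long(code), arg) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "arch_prctl(0x%x) failed" % code)


def supported_features():
    """Return the bitmask reported by ARCH_GET_XCOMP_SUPP."""
    mask = ctypes.c_uint64(0)
    _arch_prctl(ARCH_GET_XCOMP_SUPP, ctypes.byref(mask))
    return mask.value


def request_amx():
    """Ask the kernel to let this process (and its children) use AMX tile data."""
    if not has_feature(supported_features(), XFEATURE_XTILEDATA):
        raise RuntimeError("AMX tile data is not supported on this system")
    _arch_prctl(ARCH_REQ_XCOMP_PERM, ctypes.c_ulong(XFEATURE_XTILEDATA))
```

A program would call {{{request_amx()}}} once, early, before executing any AMX instruction.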

DAMON-based proactive memory reclamation, operation schemes and physical memory monitoring

Following up on the merge of DAMON in Linux 5.15, this release adds several DAMON features:

* DAMON_RECLAIM, which uses DAMON to find cold memory regions and reclaims them immediately. It is intended to be used as proactive, lightweight reclamation logic under light memory pressure. To keep the paging out operation from consuming too much CPU, a speed limit can be configured, and under heavy memory pressure it can be configured to disable itself and fall back to the traditional page-scanning based reclamation. Documentation: Documentation/admin-guide/mm/damon/reclaim.rst

* Data Access Monitoring-based Operation Schemes (DAMOS). In short, this feature allows applying a defined {{{madvise()}}} operation to a memory region that has a specific access frequency for a specified time. This is done by writing a line to a {{{schemes}}} file in the debugfs filesystem. For example, the configuration {{{# echo "4096 8192 0 5 10 20 2" > schemes}}} means "If a memory region of size in [4KiB, 8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate interval in [10, 20], apply the madvise MADV_PAGEOUT operation". Documentation: Documentation/admin-guide/mm/damon/usage.rst

* Support for Physical Memory Address Space Monitoring. Previous versions only supported monitoring virtual memory addresses.
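To make the DAMOS scheme line from the example above easier to read, the sketch below splits it into named fields. The field names and the {{{Scheme}}} type are invented for this illustration, only the seven-field form shown in the example is handled, and only action code 2 (page out) is mapped, since that is the one the example describes.

```python
from collections import namedtuple

Scheme = namedtuple(
    "Scheme",
    "min_size max_size min_accesses max_accesses min_age max_age action",
)

# Only the action used in the example is named here.
ACTION_NAMES = {2: "MADV_PAGEOUT"}


def parse_scheme(line):
    """Split a seven-field DAMOS scheme line into named integer fields."""
    fields = [int(token) for token in line.split()]
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    return Scheme(*fields)


def describe(scheme):
    """Render a scheme as a readable sentence."""
    action = ACTION_NAMES.get(scheme.action, "action %d" % scheme.action)
    return (
        "regions of %d-%d bytes with %d-%d accesses per aggregate interval "
        "for %d-%d aggregate intervals: apply %s"
        % (scheme.min_size, scheme.max_size, scheme.min_accesses,
           scheme.max_accesses, scheme.min_age, scheme.max_age, action)
    )


print(describe(parse_scheme("4096 8192 0 5 10 20 2")))
```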

Improve write congestion

When a process writes lots of data and the disk can't keep up (i.e. it's "congested"), the process must not be allowed to keep making write requests until the current ones are completed. The mechanism used to signal when congestion is happening was completely broken and is being replaced with a new approach.

Recommended LWN article: Replacing congestion_wait()