
Linux 6.0 has been released

Summary: This release includes a Runtime Verification system that aims to complement classical exhaustive verification techniques; several io_uring features, such as async buffered writes, an io_uring-based userspace block driver, and zero-copy network transmission; a new version of the Btrfs send protocol with improved features; XFS scalability enhancements; task scheduler performance improvements; a dma-buf API for exporting and importing sync files; BPF improvements; and better LRU list quality with DAMON. As always, there are many other features, new drivers, improvements and fixes.

io_uring features

This release includes several io_uring improvements: async buffered writes (a 3x performance improvement on XFS); a userspace block driver, which delivers I/O requests from an ublk block device (/dev/ublkbN) to a userspace ublk server; a synchronous cancellation API; zero-copy network transmission; LSM hooks for IORING_OP_URING_CMD; multishot recvmsg; and other features.
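As a sketch of how the new ublk driver might be used, the session below assumes the userspace `ublk` tool from the ublksrv project; the command names and flags are those of ublksrv and may differ between versions:

```shell
# Create a loop-type ublk device backed by a regular file; the kernel
# exposes /dev/ublkbN while the I/O is actually served in userspace.
truncate -s 1G /tmp/backing.img
sudo ublk add -t loop -f /tmp/backing.img

# List ublk devices and their queue/daemon information.
sudo ublk list

# The new device behaves like any other block device.
sudo mkfs.ext4 /dev/ublkb0

# Tear the device down when finished.
sudo ublk del -n 0
```

Because the data path runs through io_uring's IORING_OP_URING_CMD, the userspace server can implement arbitrary block-device logic (loop, qcow2, network-backed storage, etc.) without a kernel driver of its own.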

Recommended LWN article: Zero-copy network transmission with io_uring

Runtime verification system

Runtime Verification is a lightweight (yet rigorous) method that complements classical exhaustive verification techniques (such as model checking and theorem proving) with a more practical approach for complex systems.

Instead of relying on a fine-grained model of a system (e.g., a re-implementation at the instruction level), RV works by analyzing the trace of the system's actual execution, comparing it against a formal specification of the system's behavior.

Recommended LWN article: The runtime verification subsystem

Recommended video: Formal Verification Made Easy (and fast!)

Btrfs send v2 and other improvements

This release includes support for a new version of the protocol used by the btrfs `send/receive` utility. The new protocol adds several capabilities: writing data chunks larger than 64K, sending raw compressed extents using the encoded data ioctls (avoiding the decompress-recompress round trip), sending 'otime' (inode creation time), and sending file attributes (file flags and xflags).
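A possible invocation, assuming a btrfs-progs release that supports protocol v2 (the `--proto` and `--compressed-data` options referenced here come from the btrfs-send documentation and require a sufficiently new btrfs-progs):

```shell
# Send a read-only snapshot using protocol v2, transmitting compressed
# extents as-is instead of decompressing and recompressing them.
btrfs send --proto 2 --compressed-data /mnt/snapshots/snap1 | \
    btrfs receive /mnt/backup/
```

On compressed filesystems this avoids both the CPU cost of recompression and the bandwidth cost of sending uncompressed data.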

There are also performance improvements in several areas, especially in direct I/O reads (throughput improved by 3x on a sample workload).

XFS scalability improvements

This release includes some XFS log scalability improvements by removing spinlocks and global synchronization points. Also, there are lockless lookups for the buffer cache, which provide much better performance with higher CPU counts.

New perf tools: lock contention and kwork

This release brings support for a new 'perf lock contention' subtool, which uses new lock contention tracepoints and BPF for in-kernel aggregation, with userspace processing relying on the perf tooling infrastructure for symbol resolution, target specification, etc.
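A typical session might look like the following; the exact options are taken from the perf-lock documentation and may vary by perf build (in particular, the BPF mode flag may require a newer perf than the one shipped with 6.0):

```shell
# Record the new lock contention tracepoints system-wide for one second...
perf lock record -a -- sleep 1

# ...then report aggregated contention statistics (wait time, count,
# contended lock callers).
perf lock contention

# Alternatively, aggregate in the kernel with BPF, skipping the
# perf.data recording step entirely.
perf lock contention -a -b -- sleep 1
```

The BPF path avoids writing every contention event to disk, which matters on heavily contended systems where the event rate is high.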

There is also a new 'perf kwork' tool to trace the time properties of kernel work (such as softirqs and workqueues). It uses eBPF skeletons to collect information in kernel space, aggregating data that then gets processed by the userspace tool.
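An assumed workflow, following the perf-kwork documentation (check `perf kwork --help` on your system for the exact subcommands):

```shell
# Record softirq and workqueue events for one second...
perf kwork record -- sleep 1

# ...then report per-work runtime statistics...
perf kwork report

# ...or the latency between when work was raised and when it ran.
perf kwork latency
```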

BPF improvements

As usual, the BPF subsystem includes support for several new features. This release adds type matching support, 64-bit enums, sleepable uprobes, improved loop performance, a new eBPF-based LSM flavor, and many other improvements.

Task scheduler improvements

This release includes several improvements to the task scheduler: improved NUMA balancing on AMD Zen systems for affine workloads; improved handling of reduced-capacity CPUs in load balancing; Energy Model improvements; much less time spent searching for an idle CPU on overloaded systems; improved NUMA imbalance behavior; and improved core scheduling and wakeup balancing.

Better LRU list quality with DAMON

This release includes DAMON-based LRU-lists Sorting (DAMON_LRU_SORT), a static kernel module aimed at improving the quality of the LRU lists, which are used to determine whether a page has been accessed recently, and hence which pages should be reclaimed first under memory pressure.

As the overhead of page-granularity access checking can be significant on huge systems, LRU lists are normally not proactively sorted, but only partially and reactively sorted in response to special events, including specific user requests, system calls and memory pressure. As a result, LRU lists are sometimes not perfectly prepared to serve as a trustworthy access pattern source in some situations, such as selecting the target pages for reclamation under sudden memory pressure.

DAMON can identify access patterns with best-effort accuracy while incurring only a user-specified range of overhead, so proactively running DAMON_LRU_SORT can help make LRU lists a more trustworthy access pattern source with low and controlled overhead. DAMON_LRU_SORT uses DAMON to find hot pages (pages in memory regions showing access rates higher than a user-specified threshold) and cold pages (pages in memory regions showing no access for longer than a user-specified time), and prioritizes hot pages while deprioritizing cold pages on their LRU lists. To keep this prioritization work from consuming too much CPU, a CPU time usage limit can be configured.
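The thresholds and the CPU limit above are exposed as module parameters. The following sketch assumes the module is built in (CONFIG_DAMON_LRU_SORT=y); the parameter names and units are taken from the DAMON_LRU_SORT admin guide and should be verified against Documentation/admin-guide/mm/damon/lru_sort.rst for your kernel:

```shell
# Treat regions accessed in at least half of the sampling checks as hot
# (value in per-thousand of the maximum access frequency).
echo 500 | sudo tee /sys/module/damon_lru_sort/parameters/hot_thres_access_freq

# Treat regions unaccessed for two minutes as cold (value in microseconds).
echo 120000000 | sudo tee /sys/module/damon_lru_sort/parameters/cold_min_age

# Cap DAMON_LRU_SORT's own CPU usage at 10 ms per second.
echo 10 | sudo tee /sys/module/damon_lru_sort/parameters/quota_ms

# Start proactive LRU sorting.
echo Y | sudo tee /sys/module/damon_lru_sort/parameters/enabled
```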

Documentation: DAMON-based LRU-lists Sorting

dma-buf: Add API for exporting and importing sync files

Modern userspace APIs like Vulkan are built on an explicit synchronization model. This doesn't always play nicely with the implicit synchronization used in the kernel and assumed by X11 and Wayland. The client -> compositor half of the synchronization isn't too bad, because the kernel can control whether or not the graphics driver synchronizes on the buffer and whether or not it's considered written. The harder part is the compositor -> client synchronization, when we get the buffer back from the compositor. We're required to be able to provide the client with a VkSemaphore and VkFence representing the point in time where the window system (compositor and/or display) finished using the buffer. With current APIs, it's very hard to do this in such a way that we don't get confused by the Vulkan driver's access to the buffer. In particular, once we tell the kernel that we're rendering to the buffer again, any CPU waits on the buffer or GPU dependencies will wait on some of the client rendering and not just the compositor.

This release adds a new ioctl that solves this problem by allowing userspace to get a snapshot of the implicit synchronization state of a given dma-buf in the form of a sync file. It's effectively the same as poll() or I915_GEM_WAIT, except that instead of waiting on the CPU directly, it encapsulates the wait operation, at the current moment in time, in a sync_file so we can check/wait on it later. As long as the Vulkan driver does the sync_file export from the dma-buf before we re-introduce it for rendering, it will only contain fences from the compositor or display. This allows it to be accurately turned into a VkFence or VkSemaphore without any over-synchronization.

There is also another ioctl that allows you to import a sync_file into a dma-buf. Unlike the previous one, however, this does add genuinely new functionality to dma-buf. Without this, the only way to attach a sync_file to a dma-buf is to submit a batch to your driver of choice which waits on the sync_file and claims to write to the dma-buf. Even if said batch is a no-op, a submit is typically way more overhead than just attaching a fence. A submit may also imply extra synchronization with other work because it happens on a hardware queue.

In the Vulkan world, this is useful for dealing with the out-fence from vkQueuePresent. Current Linux window systems (X11, Wayland, etc.) all rely on dma-buf implicit sync. Since Vulkan is an explicit sync API, we get a set of fences (VkSemaphores) in vkQueuePresent and have to stash those as an exclusive (write) fence on the dma-buf. We handle it in Mesa today with the above-mentioned dummy-submit trick. This ioctl allows us to set it directly, without the dummy submit. This may also open up possibilities for GPU drivers to move away from implicit sync for their kernel driver uAPI and instead provide sync files and rely on dma-buf import/export for communicating with other implicit sync clients.

Support for Intel SGX2

This release includes support for Intel Software Guard Extensions v2, also referred to as Enclave Dynamic Memory Management (EDMM). These extensions allow for several features absent in the first version by permitting changes to initialized enclaves: modifying enclave page permissions and type, and dynamically adding and removing enclave pages. When an enclave accesses an address within its address range that does not have a backing page, a new regular page is dynamically added to the enclave.