Linux 4.6 was released
Summary: This release adds support for USB 3.1 !SuperSpeedPlus (10 Gbps), the new distributed file system OrangeFS, more reliable out-of-memory handling, support for Intel memory protection keys, a facility that makes implementations of application layer protocols easier and faster, support for 802.1AE MAC-level encryption (MACsec), support for version V of the B.A.T.M.A.N. protocol, an OCFS2 online inode checker, support for cgroup namespaces, support for the pNFS SCSI layout, and many other improvements and new drivers.
USB 3.1 SuperSpeedPlus (10 Gbps) support
The USB 3.1 specification includes a new !SuperSpeedPlus protocol supporting speeds of up to 10 Gbps. USB 3.1 devices using the new !SuperSpeedPlus protocol are called USB 3.1 Gen2 devices (note that USB 3.1 !SuperSpeedPlus is not the same as Type-C or power delivery).
This release adds support for the USB 3.1 !SuperSpeedPlus 10 Gbps speeds to the USB core and the xHCI host controller driver, meaning that a USB 3.1 mass storage device connected to a USB 3.1 capable xHCI host should work at 10 Gbps.
Code: commit
Improve the reliability of the Out Of Memory task killer
In previous releases, the OOM killer (which tries to kill a task to free memory) killed a single task in the hope that the task would terminate in a reasonable time and free up its memory. In practice, it has been shown that it's easy to find workloads which break that assumption, and the OOM victim might take an unbounded amount of time to exit, because it might be blocked in the uninterruptible state waiting for an event which is held up by another task looping in the page allocator. This release adds a specialized kernel thread, {{{oom_reaper}}}, that tries to reclaim memory by preemptively reaping the anonymous or swapped-out memory owned by the OOM victim, under the assumption that such memory won't be needed once its owner is killed anyway.
Recommended LWN article: Toward more predictable and reliable out-of-memory handling
Code: commit
Support for Intel memory protection keys
This release adds support for a memory protection hardware feature that is available in upcoming Intel CPUs: protection keys. Protection keys allow the encoding of user-controllable permission masks in the page table entries (pte). Instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per-page basis), the user can map a handful of protection mask variants. User space can then manipulate a new user-accessible, thread-local register (PKRU) with two separate bits (Access Disable and Write Disable) for each mask. This makes it possible to dynamically switch the protection bits of very large amounts of virtual memory by just manipulating a CPU register, without having to change every single page in the affected virtual memory range.
It also allows more precise control of MMU permission bits: for example, the executable bit is separate from the read bit. This release adds the infrastructure for that, plus a high-level API to make use of protection keys. If a user-space application calls {{{mmap(..., PROT_EXEC)}}} or {{{mprotect(ptr, sz, PROT_EXEC)}}} (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case and will set a special protection key on this memory range. It also sets the appropriate bits in the PKRU register so that the memory becomes unreadable and unwritable. So, using protection keys, the kernel is able to implement 'true' {{{PROT_EXEC}}}: code that can be executed, but not read, which is a small security advantage (but note that malicious code can manipulate the PKRU register too). In the future there will be further work on protection keys that will offer more high-level APIs to manage them.
Recommended LWN article: Memory protection keys
Code: (merge)
OrangeFS, a new distributed file system
OrangeFS is an LGPL scale-out parallel storage system. Originally called PVFS, it was first developed in 1993 by Walt Ligon and Eric Blumer as a parallel file system for the Parallel Virtual Machine, as part of a NASA grant to study the I/O patterns of parallel programs. It is aimed at the large storage problems faced by HPC, !BigData, streaming video, genomics and bioinformatics. OrangeFS can be accessed through included system utilities, user integration libraries and MPI-IO, and can be used by the Hadoop ecosystem as an alternative to the HDFS filesystem.
Applications often don't require OrangeFS to be mounted into the VFS, but the OrangeFS kernel client allows OrangeFS filesystems to be mounted like any other filesystem through the VFS. The kernel client communicates with a userspace daemon, which in turn communicates with the OrangeFS server daemons that implement the file system. The server daemons (there's almost always more than one) need not be running on the same host as the kernel client. OrangeFS filesystems can also be mounted with FUSE.
Recommended LWN article: The OrangeFS distributed filesystem
Documentation: Documentation/filesystems/orangefs.txt
Website: http://www.orangefs.org/
Code: fs/orangefs
Kernel Connection Multiplexor, a facility for accelerating application layer protocols
This release adds Kernel Connection Multiplexor (KCM), a facility that provides a message-based interface over TCP for accelerating application layer protocols. The motivation for this is based on the observation that, although TCP is a byte-stream transport protocol with no concept of message boundaries, a common use case is to implement a framed application layer protocol running over TCP. Most TCP stacks offer a byte-stream API to applications, which places the burden of message delineation, message I/O operation atomicity, and load balancing on the application.
With KCM, an application can efficiently send and receive application protocol messages over TCP using a datagram interface. The kernel provides the necessary assurances that messages are sent and received atomically. This relieves much of the burden applications have in mapping a message-based protocol onto the TCP stream. KCM also makes application layer messages a unit of work in the kernel for the purposes of steering and scheduling, which in turn allows a simpler networking model in multithreaded applications. In order to delineate messages in a TCP stream for receiving in KCM, the kernel implements a message parser based on BPF, which parses application layer messages and returns a message length. Nearly all binary application protocols are parseable in this manner, so KCM should be applicable across a wide range of applications.
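The delineation step can be illustrated with the framing logic such a parser computes. The sketch below assumes a hypothetical protocol that prefixes every message with a 4-byte big-endian payload length; {{{parse_msg_len()}}} is an illustrative name, and in a real setup this logic would be written as a BPF program and attached to the KCM socket (see Documentation/networking/kcm.txt):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohl */

/* The logic a KCM message parser implements, here for a hypothetical
 * protocol framing each message as a 4-byte big-endian payload length
 * followed by the payload. The kernel runs the (BPF) parser on the
 * head of the TCP stream and uses the returned total length to split
 * the byte stream into datagrams delivered on KCM sockets. */
static uint32_t parse_msg_len(const unsigned char *head, size_t avail)
{
    uint32_t payload_len;

    if (avail < 4)
        return 0;                  /* header incomplete: need more data */
    memcpy(&payload_len, head, 4);
    return 4 + ntohl(payload_len); /* total message = header + payload */
}
```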
For development plans, benchmarks and FAQ, see the merge
Recommended LWN article: The kernel connection multiplexer
API documentation: Documentation/networking/kcm.txt
Code: commit
802.1AE MAC-level encryption (MACsec)
This release adds support for IEEE 802.1AE MAC-level encryption (MACsec).
Media: DevConf.cz video about MACsec
Code: commit
BATMAN V protocol
This release adds support for version V of the B.A.T.M.A.N. (Better Approach To Mobile Adhoc Networking) routing protocol.
Code: commit
dma-buf: new ioctl to manage cache coherency between CPU and GPU
Userspace might need some sort of cache coherency management, e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To address this, begin/end coherency markers are provided that forward directly to existing dma-buf device drivers' vfunc hooks. Userspace can make use of those markers through the {{{DMA_BUF_IOCTL_SYNC}}} ioctl.
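A minimal sketch of how user space might bracket a CPU access with these markers; the {{{dma_buf_cpu_sync()}}} helper is hypothetical, and a real dma-buf file descriptor would come from an exporting driver (e.g. a GPU driver):

```c
#include <sys/ioctl.h>
#include <linux/types.h>

/* UAPI bits from <linux/dma-buf.h> (added in 4.6), reproduced here
 * so the sketch builds even where that header is not installed. */
struct dma_buf_sync { __u64 flags; };
#define DMA_BUF_SYNC_READ   (1 << 0)
#define DMA_BUF_SYNC_WRITE  (2 << 0)
#define DMA_BUF_SYNC_RW     (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
#define DMA_BUF_SYNC_START  (0 << 2)
#define DMA_BUF_SYNC_END    (1 << 2)
#define DMA_BUF_IOCTL_SYNC  _IOW('b', 0, struct dma_buf_sync)

/* Bracket a CPU access to an mmap()ed dma-buf: call with
 * DMA_BUF_SYNC_START before touching the buffer and with
 * DMA_BUF_SYNC_END afterwards, OR-ing in the intended access
 * (READ, WRITE or RW). Returns the ioctl() result. */
static int dma_buf_cpu_sync(int dmabuf_fd, __u64 start_or_end, __u64 access)
{
    struct dma_buf_sync sync = { .flags = start_or_end | access };

    return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```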
Recommended article: Sharing CPU and GPU buffers on Linux
Code: commit
OCFS2 online inode checker
OCFS2 is often used in high-availability systems. OCFS2 usually converts the filesystem to read-only when it encounters an error, but this decreases availability and is not always necessary. OCFS2 has a mount option ({{{errors=continue}}}) which returns the {{{EIO}}} error to the calling process without remounting the filesystem read-only; the problematic file's inode number is reported in the kernel log. This release adds a very simple in-kernel inode checker that can be used to check and reset the inode. Note that this feature is intended for very small issues which may hinder the day-to-day operation of a cluster filesystem by turning the filesystem read-only; it is not suited for complex checks which involve dependencies on other components of the filesystem. In those cases, the offline fsck is recommended.
The scope of checking/fixing is at the file level, initially only for regular files. To use the file checker, write the inode number reported in dmesg to {{{/sys/fs/ocfs2/devname/filecheck/check}}}, then read that file back to learn what kind of error the inode has. If you decide to fix the inode, write the inode number to {{{/sys/fs/ocfs2/devname/filecheck/fix}}}, then read the file to learn whether the inode could be fixed. For more details see the documentation
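The sysfs sequence above can be sketched in C. The {{{ocfs2_filecheck_request()}}} helper is hypothetical, and the device name and inode number would come from your own mount and dmesg output:

```c
#include <stdio.h>

/* Ask the OCFS2 online filecheck to inspect (or fix) one inode by
 * writing its number, as reported in dmesg, to the per-device sysfs
 * file; the result is read back from the same file afterwards.
 * 'op' is "check" or "fix". Returns 0 on success, -1 if the sysfs
 * file is not available (e.g. no OCFS2 device with that name). */
static int ocfs2_filecheck_request(const char *devname, const char *op,
                                   unsigned long ino)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/fs/ocfs2/%s/filecheck/%s", devname, op);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%lu\n", ino);
    fclose(f);
    return 0;
}
```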
Code: commit
Support for cgroup namespaces
This release adds support for cgroup namespaces
Without cgroup namespaces, the {{{/proc/$PID/cgroup}}} file shows the complete path of the cgroup of a process. In a container setup, where a set of cgroups and namespaces is intended to isolate processes, the {{{/proc/$PID/cgroup}}} file may leak system-level information to the isolated processes.
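A minimal sketch of entering a new cgroup namespace with {{{unshare(2)}}}; the {{{enter_cgroup_ns()}}} wrapper name is made up for this example, and the call needs CAP_SYS_ADMIN, so unprivileged callers typically get EPERM:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <errno.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000  /* new in Linux 4.6 */
#endif

/* Move the calling process into a fresh cgroup namespace: after
 * this, /proc/self/cgroup shows paths relative to the cgroup the
 * process was in at unshare() time, instead of the full system-level
 * path. Returns 0 on success, or a negative errno: -EPERM without
 * CAP_SYS_ADMIN, -EINVAL on kernels older than 4.6. */
static int enter_cgroup_ns(void)
{
    if (unshare(CLONE_NEWCGROUP) == 0)
        return 0;
    return -errno;
}
```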
Documentation: https://git.kernel.org/torvalds/c/d4021f6cd41f03017f831b3d40b0067bed54893d
Code: commit
Add support for the pNFS SCSI layout
This release adds NFSv4.1 support for parallel NFS SCSI layouts in the Linux NFS server, a variant of the block layout which uses SCSI features to offer improved fencing and device identification. With pNFS SCSI layouts, the NFS server acts as Metadata Server for pNFS, which in addition to handling all the metadata access to the NFS export, also hands out layouts to the clients so that they can directly access the underlying SCSI LUNs that are shared with the client. See draft-ietf-nfsv4-scsi-layout
To use pNFS SCSI layouts, the exported file system needs to support the pNFS SCSI layouts (currently just XFS), and the file system must sit on a SCSI LUN that is accessible to the clients in addition to the MDS. As of now the file system needs to sit directly on the exported LUN; striping or concatenation of LUNs on the MDS and clients is not supported yet. On a server built with {{{CONFIG_NFSD_SCSI}}}, the pNFS SCSI volume support is automatically enabled if the file system is exported using the "pnfs" option and the underlying SCSI device supports persistent reservations. On the client, make sure the kernel has the {{{CONFIG_PNFS_BLOCK}}} option enabled, and the file system is mounted using the NFSv4.1 protocol version ({{{mount -o vers=4.1}}}).
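The client-side mount can also be issued through {{{mount(2)}}} directly. The sketch below is illustrative only: the server, export and mount point are placeholders, and a real NFS mount normally goes through the {{{mount.nfs}}} helper, which supplies additional options (such as {{{addr=}}}) that the kernel requires:

```c
#include <sys/mount.h>

/* Mount an NFS export with the NFSv4.1 protocol version, which is
 * what lets the client negotiate pNFS layouts with the server.
 * This is the syscall-level equivalent of `mount -o vers=4.1`;
 * it needs root, a reachable server, and (in practice) the extra
 * options that mount.nfs normally fills in. Returns mount(2)'s
 * result: 0 on success, -1 with errno set on failure. */
static int mount_nfs41(const char *export_spec, const char *mountpoint)
{
    return mount(export_spec, mountpoint, "nfs4", 0, "vers=4.1");
}
```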
Code: commit