Linux 4.7 was released
Summary: This release adds support for the recent Radeon RX 480 GPUs, support for parallel pathname lookups in the same directory, a new experimental 'schedutil' frequency governor that should be faster and more accurate than existing governors, support for the EFI 'Capsule' mechanism for upgrading firmware, support for virtual USB devices in USB/IP to make emulated phones behave as real USB devices, a new security module '{{{LoadPin}}}' that ensures that all kernel modules are loaded from the same filesystem, an interface to create histograms of events in the ftrace interface, support for attaching BPF programs to kernel tracepoints, support for callchains of events in the perf trace utility, stable support for Android's sync_file fencing mechanism, and many other improvements and new drivers.
Support for Radeon RX 480 GPUs
This release includes support for the recently released Radeon RX 480 GPUs.
Code: (merge)
Parallel directory lookups
The directory cache caches information about path names to make them quickly available for pathname lookup. This speeds up many common operations; for example, it makes it possible to determine whether a particular file or directory exists without having to read the disk. Until now, this cache used a mutex to serialize lookups of names in the same directory.
In this release, the serializing mutex has been switched to a read-write semaphore, allowing for parallel pathname lookups in the same directory. Most workloads won't notice any improvement (cached pathname lookups are fast, and locking contention there is very rare), but specific workloads that make very heavy use of pathname lookups in the same directory will be faster because they will be able to do them in parallel. Most filesystems have been converted to allow this feature.
Code: commit
New 'schedutil' frequency governor
This release adds a new governor to the dynamic frequency scaling subsystem (cpufreq). There are two main differences between it and the existing governors. First, it uses information provided by the scheduler directly for making its decisions. Second, it can invoke cpufreq drivers and change the frequency to adjust CPU performance right away, without having to spawn work items to be executed in process context or similar.
What this means is that the latency to make frequency changes in the face of workload variations should be very small and, thanks to the information provided by the scheduler, the governor can make more accurate decisions. Note also that the schedutil governor, as included in this release, is very simple and is regarded as a foundation for improving the integration of the scheduler with CPU power management; but it works, and the preliminary results are encouraging. The governor shares some tunables management with other governors.
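Governors are selected per cpufreq policy through sysfs. The following is a minimal sketch of switching to schedutil, assuming the kernel was built with the governor enabled and exposes the standard {{{scaling_available_governors}}} and {{{scaling_governor}}} files; the {{{CPUFREQ_DIR}}} variable and the two function names are illustrative and not part of any tool.

```shell
#!/bin/sh
# Sketch: switch one cpufreq policy to schedutil if the kernel offers it.
# CPUFREQ_DIR is an illustrative variable; the default is cpu0's usual sysfs directory.
CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}

# Succeeds if the governor named in $2 is listed in $1/scaling_available_governors.
governor_available() {
    for g in $(cat "$1/scaling_available_governors" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Writes the governor name in $2 to $1/scaling_governor (needs root on a real system).
select_governor() {
    governor_available "$1" "$2" || { echo "$2 not available" >&2; return 1; }
    echo "$2" > "$1/scaling_governor"
}

# Example (as root): select_governor "$CPUFREQ_DIR" schedutil
```

Checking the list of available governors first avoids a confusing write error on kernels where schedutil is not built in or its module is not loaded.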
Recommended LWN article: Improvements in CPU frequency management
Code: commit
Histograms of events in ftrace
'Hist' triggers are a new addition to ftrace, the Linux tracing infrastructure available since 2.6.27. They create histograms of events by aggregating event hits. For example:
{{{echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger}}}
This command writes a hist command to the {{{trigger}}} file of the {{{sys_enter_read}}} event (the one corresponding to a process entering the read() system call, that is, trying to read a file). Every hit on this event then runs the hist command ({{{hist:}}}), which means the following: for each hit on the event, get the PID ({{{common_pid}}}; you can see all the fields available for querying in {{{/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/format}}}) and show it together with the process name ({{{.execname}}} suffix); this is used as the key ({{{key=}}}) of the histogram. The {{{val=count}}} parameter makes the hist command also sum the {{{count}}} field, which in the {{{sys_enter_read}}} event is the number of bytes the process asked to read. Finally, after the {{{:}}} separator, {{{sort=count.descending}}} makes the command sort the result by the {{{count}}} field in descending order. This is the resulting output (note that hits for the same PID are aggregated):
{{{ # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist
# trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active]
{ common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512
{ common_pid: Xorg [ 1309] } hitcount: 525 count: 256640
{ common_pid: compiz [ 2889] } hitcount: 59 count: 254400
{ common_pid: bash [ 8710] } hitcount: 3 count: 66369
{ common_pid: dbus-daemon-lau [ 8703] } hitcount: 49 count: 47739
{ common_pid: irqbalance [ 1252] } hitcount: 27 count: 27648
{ common_pid: 01ifupdown [ 8705] } hitcount: 3 count: 17216
{ common_pid: dbus-daemon [ 772] } hitcount: 10 count: 12396
{ common_pid: Socket Thread [ 8342] } hitcount: 11 count: 11264
{ common_pid: nm-dhcp-client. [ 8701] } hitcount: 6 count: 7424
{ common_pid: gmain [ 1315] } hitcount: 18 count: 6336
.
.
.
{ common_pid: postgres [ 1892] } hitcount: 2 count: 32
{ common_pid: postgres [ 1891] } hitcount: 2 count: 32
{ common_pid: gmain [ 8704] } hitcount: 2 count: 32
{ common_pid: upstart-dbus-br [ 2740] } hitcount: 21 count: 21
{ common_pid: nm-dispatcher.a [ 8696] } hitcount: 1 count: 16
{ common_pid: indicator-datet [ 2904] } hitcount: 1 count: 16
{ common_pid: gdbus [ 2998] } hitcount: 1 count: 16
{ common_pid: rtkit-daemon [ 2052] } hitcount: 1 count: 8
{ common_pid: init [ 1] } hitcount: 2 count: 2
Totals:
Hits: 2116
Entries: 51
Dropped: 0
}}}
This output shows which processes are reading files, how much they request (count), and how often they try to read (hitcount, which wasn't specified but is included by default). For more information about hist and its possibilities, see the hist triggers documentation in Documentation/trace/events.txt.
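The trigger syntax described above can be wrapped in small helpers. This is a sketch only: the function names and the {{{TRACING}}} variable are hypothetical, the tracing directory is assumed to be the debugfs path used in this article, and removing a trigger by writing the same string prefixed with {{{!}}} follows the hist triggers documentation.

```shell
#!/bin/sh
# Sketch: helpers around the hist trigger syntax.
# TRACING is an illustrative variable; the default is the path used in this article.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# Build a trigger string: $1 = key, $2 = value field, $3 = sort field, $4 = ascending|descending.
hist_trigger() {
    printf 'hist:key=%s:val=%s:sort=%s.%s\n' "$1" "$2" "$3" "$4"
}

# Install a trigger: $1 = subsystem/event, $2 = trigger string (needs root).
hist_enable() {
    echo "$2" > "$TRACING/events/$1/trigger"
}

# Remove a trigger: the same string prefixed with '!' deletes it.
hist_disable() {
    echo "!$2" > "$TRACING/events/$1/trigger"
}

# Example (as root):
#   t=$(hist_trigger common_pid.execname count count descending)
#   hist_enable syscalls/sys_enter_read "$t"
#   cat "$TRACING/events/syscalls/sys_enter_read/hist"
#   hist_disable syscalls/sys_enter_read "$t"
```

Removing the trigger once the data has been collected stops the aggregation, so the histogram does not keep growing in the background.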
Code: commit
Callchains in perf trace
In this release, {{{perf trace}}} adds the ability to print a userspace callchain each time a system call is hit. An example of a callchain for a recvmsg() syscall issued by gnome-shell:
{{{3292.421 ( 0.002 ms): gnome-shell/2287 recvmsg(fd: 11
__GI___libc_recvmsg+0x2d (/usr/lib64/libpthread-2.22.so)
_xcb_in_read+0xa7 (/usr/lib64/libxcb.so.1.1.0)
poll_for_next_event+0x68 (/usr/lib64/libxcb.so.1.1.0)
poll_for_event+0xb8 (/usr/lib64/libX11.so.6.3.0)
poll_for_response+0xab (/usr/lib64/libX11.so.6.3.0)
_XEventsQueued+0x5d (/usr/lib64/libX11.so.6.3.0)
XPending+0x57 (/usr/lib64/libX11.so.6.3.0)
gdk_event_source_check+0x51 (/usr/lib64/libgdk-3.so.0.1800.9)
g_main_context_check+0x1b1 (/usr/lib64/libglib-2.0.so.0.4600.2)
g_main_context_iterate.isra.29+0x120 (/usr/lib64/libglib-2.0.so.0.4600.2)
g_main_loop_run+0xc2 (/usr/lib64/libglib-2.0.so.0.4600.2)
meta_run+0x2c (/usr/lib64/libmutter.so.0.0.0)
main+0x3f7 (/usr/bin/gnome-shell)
__libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
[0x2909] (/usr/bin/gnome-shell)}}}
You can try it with commands such as {{{# perf trace --call-graph dwarf ping 127.0.0.1}}}. You can also print callchains for a single event only, for example: {{{perf trace --event sched:sched_switch/call-graph=fp/ -a sleep 1}}}. Page fault tracing (option {{{-F/--pf}}}) also supports it; for example, tracing write syscalls and major page faults with callchains while starting firefox, limiting the stack to 5 frames, can be done with {{{# perf trace -e write --pf maj --max-stack 5 firefox}}}. An excerpt of a system wide {{{perf trace --call-graph dwarf}}} session can be found here.
Allow BPF programs to attach to tracepoints
This release adds a new type of BPF program ({{{BPF_PROG_TYPE_TRACEPOINT}}}) that can be attached to kernel tracepoints. This makes it possible to collect data from tracepoints and process it inside the BPF program. It is a faster way to access tracepoints than kprobes, it can make tracing programs more stable, and it allows building more complex tracing tools.
Recommended LWN article: Tracepoints with BPF
Code: commit
EFI 'Capsule' firmware updates
This release adds support for the EFI Capsule mechanism, which allows passing data blobs to the EFI firmware. The firmware then parses them and makes some decision based upon their contents. The most common use case is to bundle a flashable firmware image into a capsule that the firmware can use on the next boot to upgrade the existing version in the flash. Users can upload a capsule by writing it to the {{{/dev/efi_capsule_loader}}} device.
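Uploading a capsule amounts to writing the file to the device node. The sketch below shows this under some assumptions: the {{{upload_capsule}}} function and the {{{CAPSULE_DEV}}} variable are hypothetical names, the capsule path is whatever image the hardware vendor provides, and whether and when the firmware applies the update depends on the platform.

```shell
#!/bin/sh
# Sketch: hand a capsule file to the EFI capsule loader.
# CAPSULE_DEV is an illustrative variable; the default is the device named in the text.
CAPSULE_DEV=${CAPSULE_DEV:-/dev/efi_capsule_loader}

# $1 = path to the capsule image supplied by the caller.
upload_capsule() {
    [ -r "$1" ] || { echo "cannot read capsule: $1" >&2; return 1; }
    [ -w "$CAPSULE_DEV" ] || { echo "no writable $CAPSULE_DEV" >&2; return 1; }
    # Writing the whole file to the device submits it to the kernel's capsule loader.
    cat "$1" > "$CAPSULE_DEV"
}

# Example (as root, with a vendor-provided image): upload_capsule ./firmware.bin
```

The two checks only catch the obvious failure modes (missing file, missing device node); a capsule rejected by the firmware would show up as a write error from {{{cat}}}.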
Recommended blog: Better Firmware Updates in Linux using UEFI Capsules
Code: commit
Support for creating virtual USB Device Controllers in USB/IP
This feature has several uses; for example, it makes it possible to improve phone emulation in development environments. An emulated phone can now be connected to a developer's machine or to another virtual machine as if it were a physical phone. It is also useful for testing USB and for educational purposes.
Code: commit
Android's sync_file fencing mechanism considered stable
In this release, the sync_file code that was in the staging/ directory has been moved to the main kernel tree. Until now, the Linux kernel only had an implicit fencing mechanism, where fences are attached directly to buffers and userspace is unaware of what is happening; explicit fencing was not supported.
sync_file is an explicit fencing mechanism designed for Android that lets userspace handle fences directly. Instead of attaching a fence to the buffer, a producer driver sends the fence related to the buffer to userspace via a sync_file, which can then be sent to the consumer, which will not use the buffer for anything before the fence(s) signal. Explicit fencing provides a global mechanism that optimizes the flow of buffers between consumers and producers and avoids a lot of waiting. For example, instead of waiting for a buffer to be processed by the GPU before sending it to DRM in an atomic IOCTL, a compositor can get a sync_file fd from the GPU driver at the moment it submits the buffer for processing. The compositor then passes these fds to DRM in an atomic commit request, and the buffer will not be displayed until the fences signal, that is, until the GPU has finished processing the buffer and it is ready to display.
Documentation: Documentation/sync_file.txt
Code: commit
LoadPin, a security module to restrict the origin of kernel modules
{{{LoadPin}}} is a new Linux Security Module that ensures that all files loaded by the kernel (kernel modules, firmware, kexec images, security policies) originate from the same filesystem. The expectation is that the filesystem is backed by a read-only device such as a CD-ROM or dm-verity (this feature comes from ChromeOS, where the device as a whole is verified cryptographically via dm-verity). This allows systems that have a verified and/or unchangeable filesystem to enforce module and firmware loading restrictions without needing to sign the files.
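On systems where the pinned filesystem is not on a read-only device, LoadPin can be toggled at run time. The following sketch reports the state, assuming the module exposes the {{{kernel.loadpin.enabled}}} sysctl in that case; the {{{loadpin_status}}} function and the {{{LOADPIN_SYSCTL}}} variable are illustrative names. A missing sysctl file is not proof that LoadPin is off, since the file may be absent when the backing device is read-only and pinning cannot be disabled.

```shell
#!/bin/sh
# Sketch: report the LoadPin state as seen through its sysctl.
# LOADPIN_SYSCTL is an illustrative variable; the default path is an assumption
# based on the kernel.loadpin.enabled sysctl name.
LOADPIN_SYSCTL=${LOADPIN_SYSCTL:-/proc/sys/kernel/loadpin/enabled}

loadpin_status() {
    if [ ! -r "$LOADPIN_SYSCTL" ]; then
        # Not conclusive: LoadPin may be absent, or pinned to a read-only device.
        echo "no-sysctl"
    elif [ "$(cat "$LOADPIN_SYSCTL")" = "1" ]; then
        echo "enforcing"
    else
        echo "disabled"
    fi
}

# Example: loadpin_status
```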
Recommended LWN article: The LoadPin security module
Code: commit