Linus Torvalds [Fri, 10 Mar 2017 00:30:37 +0000 (16:30 -0800)]
Merge tag 'pm-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix several issues in the intel_pstate driver and one issue in
the schedutil cpufreq governor, clean up that governor a bit and hook
up existing code for disabling cpufreq to a new kernel command line
option.
Specifics:
- Three fixes for intel_pstate problems related to the passive mode
(in which it acts as a regular cpufreq scaling driver), two for the
handling of global P-state limits and one for the handling of the
cpu_frequency tracepoint in that mode (Rafael Wysocki).
- Three fixes for the handling of P-state limits in intel_pstate in
the active mode (Rafael Wysocki).
- Introduction of a new cpufreq.off=1 kernel command line argument
that will disable cpufreq entirely if passed to the kernel and is
simply hooked up to the existing code used by Xen (Len Brown).
- Fix for the schedutil cpufreq governor to prevent it from using
stale raw frequency values in configurations with mutiple CPUs
sharing one policy object and a cleanup for it reducing its
overhead slightly (Viresh Kumar)"
* tag 'pm-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
cpufreq: intel_pstate: Fix intel_pstate_verify_policy()
cpufreq: intel_pstate: Fix global settings in active mode
cpufreq: Add the "cpufreq.off=1" cmdline option
cpufreq: schedutil: Pass sg_policy to get_next_freq()
cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily
cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy()
cpufreq: intel_pstate: Do not use performance_limits in passive mode
Linus Torvalds [Fri, 10 Mar 2017 00:02:04 +0000 (16:02 -0800)]
Merge tag 'pci-v4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"PCI fixes:
- fix NULL pointer dereference in Exynos driver
- fix NULL pointer dereference in ASPM with pre-1.1 PCIe devices
- blacklist QLogic ISP2722 to prevent panics while reading VPD"
* tag 'pci-v4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/ASPM: Always set link->downstream to avoid NULL dereference on remove
PCI: Prevent VPD access for QLogic ISP2722
PCI: exynos: Initialize elbi_base even when using PHY framework
Linus Torvalds [Thu, 9 Mar 2017 23:53:25 +0000 (15:53 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Sending this a bit sooner than I otherwise would have, as a fix in the
merge window had some unfortunate issues and side effects for some
folks.
This contains:
- Fixes from Jan for the bdi registration/unregistration. These have
been tested by the various parties reporting issues, and should be
solid at this point.
- Also from Jan, fix for axonram gendisk registration.
- A stable fix for zram from Johannes.
- A small series from Ming, fixing up some long standing issues with
blk-mq hardware queue kobject initialization and registration.
- A fix for sed opal from Jon, fixing a nonsensical range check and
some set-but-not-used variables.
- A fix from Neil for a long standing deadlock issue for stacking
device drivers. With this in place, dm/md don't have to work around
the issue anymore, and can be properly fixed up"
* 'for-linus' of git://git.kernel.dk/linux-block:
axonram: Fix gendisk handling
blk: improve order of bio handling in generic_make_request()
Revert "scsi, block: fix duplicate bdi name registration crashes"
block: Make del_gendisk() safer for disks without queues
bdi: Fix use-after-free in wb_congested_put()
block: Allow bdi re-registration
block/sed: Fix opal user range check and unused variables
zram: set physical queue limits to avoid array out of bounds accesses
blk-mq: free hctx->cpumask in release handler of hctx's kobject
blk-mq: make lifetime consistent between hctx and its kobject
blk-mq: make lifetime consitent between q/ctx and its kobject
blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
Linus Torvalds [Thu, 9 Mar 2017 23:50:56 +0000 (15:50 -0800)]
Merge tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Media regression fixes:
- serial_ir: fix a Kernel crash during boot on Kernel 4.11-rc1, due
to an IRQ code called too early
- other IR regression fixes at lirc and at the raw IR decoding
- a deadlock fix at the RC nuvoton driver
- fix another issue with DMA on stack at dw2102 driver
There's an extra patch there that change a driver interface for the
SoC VSP1 driver, with is shared between the DRM and V4L2 driver. The
patch itself is trivial, and was acked by David Arlie"
* tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] v4l: vsp1: Adapt vsp1_du_setup_lif() interface to use a structure
[media] dw2102: don't do DMA on stack
[media] rc: protocol is not set on register for raw IR devices
[media] rc: raw decoder for keymap protocol is not loaded on register
[media] rc: nuvoton: fix deadlock in nvt_write_wakeup_codes
[media] lirc: fix dead lock between open and wakeup_filter
[media] serial_ir: ensure we're ready to receive interrupts
Linus Torvalds [Thu, 9 Mar 2017 20:23:30 +0000 (12:23 -0800)]
Merge tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix and cleanup from Juergen Gross:
"This contains one fix for MSIX handling under Xen and a trivial
cleanup patch"
* tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: Remove duplicate inclusion of linux/init.h
xen: do not re-use pirq number cached in pci device msi msg data
* pm-cpufreq:
cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
cpufreq: intel_pstate: Fix intel_pstate_verify_policy()
cpufreq: intel_pstate: Fix global settings in active mode
cpufreq: Add the "cpufreq.off=1" cmdline option
cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily
cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy()
cpufreq: intel_pstate: Do not use performance_limits in passive mode
Tony Luck [Wed, 8 Mar 2017 17:35:39 +0000 (09:35 -0800)]
mm, page_alloc: Add missing check for memory holes
Commit 13ad59df67f1 ("mm, page_alloc: avoid page_to_pfn() when merging
buddies") moved the check for memory holes out of page_is_buddy() and
had the callers do the check.
But this wasn't done correctly in one place which caused ia64 to crash
very early in boot.
Update to fix that and make ia64 boot again.
[ v2: Vlastimil pointed out we don't need to call page_to_pfn()
since we already have the result of that in "buddy_pfn" ]
Fixes: 13ad59df67f1 ("avoid page_to_pfn() when merging buddies") Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 8 Mar 2017 19:06:05 +0000 (11:06 -0800)]
Merge tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest fixes from Steven Rostedt:
"Greg Kroah-Hartman reported to me that the ktest of v4.11-rc1 locked
up in an infinite loop while doing the make mrproper.
Looking into the cause I noticed that a recent update to the function
run_command (used for running all shell commands, including "make
mrproper") changed the internal loop to use the function
wait_for_input.
The wait_for_input function uses select to look at two file
descriptors. One is the file descriptor of the command it is running,
the other is STDIN. The STDIN check was not checking the return status
of the sysread call, and was also just writing a lot of data into
syswrite without regard to the size of the data read.
Changing the code to check the return status of sysread, and also to
still process the passed in descriptor data without looping back to
the select fixed Greg's problem.
While looking at this code I also realized that the loop did not honor
the timeout if STDIN always had input (or for some reason return
error). this could prevent wait_for_input to timeout on the file
descriptor it is suppose to be waiting for. That is fixed too"
* tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Make sure wait_for_input does honor the timeout
ktest: Fix while loop in wait_for_input
Linus Torvalds [Wed, 8 Mar 2017 18:42:13 +0000 (10:42 -0800)]
overlayfs: remove now unnecessary header file include
This removes the extra include header file that was added in commit e58bc927835a "Pull overlayfs updates from Miklos Szeredi" now that it
is no longer needed.
There are probably other such includes that got added during the
scheduler header splitup series, but this is the one that annoyed me
personally and I know about.
Linus Torvalds [Tue, 7 Mar 2017 23:33:14 +0000 (15:33 -0800)]
sched/headers: fix up header file dependency on <linux/sched/signal.h>
The scheduler header file split and cleanups ended up exposing a few
nasty header file dependencies, and in particular it showed how we in
<linux/wait.h> ended up depending on "signal_pending()", which now comes
from <linux/sched/signal.h>.
That's a very subtle and annoying dependency, which already caused a
semantic merge conflict (see commit e58bc927835a "Pull overlayfs updates
from Miklos Szeredi", which added that fixup in the merge commit).
It turns out that we can avoid this dependency _and_ improve code
generation by moving the guts of the fairly nasty helper #define
__wait_event_interruptible_locked() to out-of-line code. The code that
includes the signal_pending() check is all in the slow-path where we
actually go to sleep waiting for the event anyway, so using a helper
function is the right thing to do.
Using a helper function is also what we already did for the non-locked
versions, see the "__wait_event*()" macros and the "prepare_to_wait*()"
set of helper functions.
We might want to try to unify all these macro games, we have a _lot_ of
subtly different wait-event loops. But this is the minimal patch to fix
the annoying header dependency.
NeilBrown [Tue, 7 Mar 2017 20:38:05 +0000 (07:38 +1100)]
blk: improve order of bio handling in generic_make_request()
To avoid recursion on the kernel stack when stacked block devices
are in use, generic_make_request() will, when called recursively,
queue new requests for later handling. They will be handled when the
make_request_fn for the current bio completes.
If any bios are submitted by a make_request_fn, these will ultimately
be handled seqeuntially. If the handling of one of those generates
further requests, they will be added to the end of the queue.
This strict first-in-first-out behaviour can lead to deadlocks in
various ways, normally because a request might need to wait for a
previous request to the same device to complete. This can happen when
they share a mempool, and can happen due to interdependencies
particular to the device. Both md and dm have examples where this happens.
These deadlocks can be erradicated by more selective ordering of bios.
Specifically by handling them in depth-first order. That is: when the
handling of one bio generates one or more further bios, they are
handled immediately after the parent, before any siblings of the
parent. That way, when generic_make_request() calls make_request_fn
for some particular device, we can be certain that all previously
submited requests for that device have been completely handled and are
not waiting for anything in the queue of requests maintained in
generic_make_request().
An easy way to achieve this would be to use a last-in-first-out stack
instead of a queue. However this will change the order of consecutive
bios submitted by a make_request_fn, which could have unexpected consequences.
Instead we take a slightly more complex approach.
A fresh queue is created for each call to a make_request_fn. After it completes,
any bios for a different device are placed on the front of the main queue, followed
by any bios for the same device, followed by all bios that were already on
the queue before the make_request_fn was called.
This provides the depth-first approach without reordering bios on the same level.
This, by itself, it not enough to remove all deadlocks. It just makes
it possible for drivers to take the extra step required themselves.
To avoid deadlocks, drivers must never risk waiting for a request
after submitting one to generic_make_request. This includes never
allocing from a mempool twice in the one call to a make_request_fn.
A common pattern in drivers is to call bio_split() in a loop, handling
the first part and then looping around to possibly split the next part.
Instead, a driver that finds it needs to split a bio should queue
(with generic_make_request) the second part, handle the first part,
and then return. The new code in generic_make_request will ensure the
requests to underlying bios are processed first, then the second bio
that was split off. If it splits again, the same process happens. In
each case one bio will be completely handled before the next one is attempted.
With this is place, it should be possible to disable the
punt_bios_to_recover() recovery thread for many block devices, and
eventually it may be possible to remove it completely.
Ref: http://www.spinics.net/lists/raid/msg54680.html Tested-by: Jinpu Wang <jinpu.wang@profitbricks.com> Inspired-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Wed, 8 Mar 2017 16:48:34 +0000 (17:48 +0100)]
Revert "scsi, block: fix duplicate bdi name registration crashes"
This reverts commit 0dba1314d4f81115dce711292ec7981d17231064. It causes
leaking of device numbers for SCSI when SCSI registers multiple gendisks
for one request_queue in succession. It can be easily reproduced using
Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
Furthermore the protection provided by this commit is not needed anymore
as the problem it was fixing got also fixed by commit 165a5e22fafb
"block: Move bdi_unregister() to del_gendisk()".
Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Wed, 8 Mar 2017 16:48:33 +0000 (17:48 +0100)]
block: Make del_gendisk() safer for disks without queues
Commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()"
added disk->queue dereference to del_gendisk(). Although del_gendisk()
is not supposed to be called without disk->queue valid and
blk_unregister_queue() warns in that case, this change will make it oops
instead. Return to the old more robust behavior of just warning when
del_gendisk() gets called for gendisk with disk->queue being NULL.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Wed, 8 Mar 2017 16:48:32 +0000 (17:48 +0100)]
bdi: Fix use-after-free in wb_congested_put()
bdi_writeback_congested structures get created for each blkcg and bdi
regardless whether bdi is registered or not. When they are created in
unregistered bdi and the request queue (and thus bdi) is then destroyed
while blkg still holds reference to bdi_writeback_congested structure,
this structure will be referencing freed bdi and last wb_congested_put()
will try to remove the structure from already freed bdi.
With commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()", SCSI started to destroy bdis without calling
bdi_unregister() first (previously it was calling bdi_unregister() even
for unregistered bdis) and thus the code detaching
bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we
started hitting this use-after-free bug. It is enough to boot a KVM
instance with virtio-scsi device to trigger this behavior.
Fix the problem by detaching bdi_writeback_congested structures in
bdi_exit() instead of bdi_unregister(). This is also more logical as
they can get attached to bdi regardless whether it ever got registered
or not.
Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820 Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Wed, 8 Mar 2017 16:48:31 +0000 (17:48 +0100)]
block: Allow bdi re-registration
SCSI can call device_add_disk() several times for one request queue when
a device in unbound and bound, creating new gendisk each time. This will
lead to bdi being repeatedly registered and unregistered. This was not a
big problem until commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()" since bdi was only registered repeatedly (bdi_register()
handles repeated calls fine, only we ended up leaking reference to
gendisk due to overwriting bdi->owner) but unregistered only in
blk_cleanup_queue() which didn't get called repeatedly. After 165a5e22fafb we were doing correct bdi_register() - bdi_unregister()
cycles however bdi_unregister() is not prepared for it. So make sure
bdi_unregister() cleans up bdi in such a way that it is prepared for
a possible following bdi_register() call.
An easy way to provoke this behavior is to enable
CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a
scsi disk which immediately hangs without this fix.
Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820 Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
zram: set physical queue limits to avoid array out of bounds accesses
zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
the NVMe over Fabrics loopback target which potentially sends a huge bulk of
pages attached to the bio's bvec this results in a kernel panic because of
array out of bounds accesses in zram_decompress_page().
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Wed, 22 Feb 2017 10:14:02 +0000 (18:14 +0800)]
blk-mq: free hctx->cpumask in release handler of hctx's kobject
It is obviously that hctx->cpumask is per hctx, and both
share same lifetime, so this patch moves freeing of hctx->cpumask
into release handler of hctx's kobject.
Signed-off-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Wed, 22 Feb 2017 10:14:01 +0000 (18:14 +0800)]
blk-mq: make lifetime consistent between hctx and its kobject
This patch removes kobject_put() over hctx in __blk_mq_unregister_dev(),
and trys to keep lifetime consistent between hctx and hctx's kobject.
Now blk_mq_sysfs_register() and blk_mq_sysfs_unregister() become
totally symmetrical, and kobject's refcounter drops to zero just
when the hctx is freed.
Signed-off-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Wed, 22 Feb 2017 10:14:00 +0000 (18:14 +0800)]
blk-mq: make lifetime consitent between q/ctx and its kobject
Currently from kobject view, both q->mq_kobj and ctx->kobj can
be released during one cycle of blk_mq_register_dev() and
blk_mq_unregister_dev(). Actually, sw queue's lifetime is
same with its request queue's, which is covered by request_queue->kobj.
So we don't need to call kobject_put() for the two kinds of
kobject in __blk_mq_unregister_dev(), instead we do that
in release handler of request queue.
Signed-off-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
ktest: Make sure wait_for_input does honor the timeout
The function wait_for_input takes in a timeout, and even has a default
timeout. But if for some reason the STDIN descriptor keeps sending in data,
the function will never time out. The timout is to wait for the data from
the passed in file descriptor, not for STDIN. Adding a test in the case
where there's no data from the passed in file descriptor that checks to see
if the timeout passed, will ensure that it will timeout properly even if
there's input in STDIN.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The run_command function was changed to use the wait_for_input function to
allow having a timeout if the command to run takes too much time. There was
a bug in the wait_for_input where it could end up going into an infinite
loop. There's two issues here. One is that the return value of the sysread
wasn't used for the write (to write a proper size), and that it should
continue processing the passed in file descriptor too even if there was
input. There was no check for error, if for some reason STDIN returned an
error, the function would go into an infinite loop and never exit.
Arnd Bergmann [Wed, 8 Mar 2017 07:29:31 +0000 (08:29 +0100)]
MIPS: Add missing include files
After the split of linux/sched.h, several platforms in arch/mips stopped building.
Add the respective additional #include statements to fix the problem I first
tried adding these into asm/processor.h, but ran into circular header
dependencies with that which I could not figure out.
The commit I listed as causing the problem is the branch merge, as there is
likely a combination of multiple patches in that branch.
Linus Torvalds [Tue, 7 Mar 2017 22:47:24 +0000 (14:47 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes and minor updates all over the place:
- an SGI/UV fix
- a defconfig update
- a build warning fix
- move the boot_params file to the arch location in debugfs
- a pkeys fix
- selftests fix
- boot message fixes
- sparse fixes
- a resume warning fix
- ioapic hotplug fixes
- reboot quirks
... plus various minor cleanups"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build/x86_64_defconfig: Enable CONFIG_R8169
x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk
x86/hpet: Prevent might sleep splat on resume
x86/boot: Correct setup_header.start_sys name
x86/purgatory: Fix sparse warning, symbol not declared
x86/purgatory: Make functions and variables static
x86/events: Remove last remnants of old filenames
x86/pkeys: Check against max pkey to avoid overflows
x86/ioapic: Split IOAPIC hot-removal into two steps
x86/PCI: Implement pcibios_release_device to release IRQ from IOAPIC
x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h
x86/vmware: Remove duplicate inclusion of asm/timer.h
x86/hyperv: Hide unused label
x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk
x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register
x86/selftests: Add clobbers for int80 on x86_64
x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR()
x86/apic: Fix a warning message in logical CPU IDs allocation
x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/
Linus Torvalds [Tue, 7 Mar 2017 22:45:22 +0000 (14:45 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"This includes a fix for lockups caused by incorrect nsecs related
cleanup, and a capabilities check fix for timerfd"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jiffies: Revert bogus conversion of NSEC_PER_SEC to TICK_NSEC
timerfd: Only check CAP_WAKE_ALARM when it is needed
Linus Torvalds [Tue, 7 Mar 2017 22:42:34 +0000 (14:42 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"A fix for KVM's scheduler clock which (erroneously) was always marked
unstable, a fix for RT/DL load balancing, plus latency fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
sched/core: Fix pick_next_task() for RT,DL
sched/fair: Make select_idle_cpu() more aggressive
Linus Torvalds [Tue, 7 Mar 2017 22:38:16 +0000 (14:38 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This includes a fix for a crash if certain special addresses are
kprobed, plus does a rename of two Kconfig variables that were a minor
misnomer"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS
kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed
Linus Torvalds [Tue, 7 Mar 2017 22:33:11 +0000 (14:33 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
- Change the new refcount_t warnings from WARN() to WARN_ONCE()
- two ww_mutex fixes
- plus a new lockdep self-consistency check for a bug that triggered in
practice
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/ww_mutex: Adjust the lock number for stress test
locking/lockdep: Add nest_lock integrity test
locking/ww_mutex: Replace cpu_relax() with cond_resched() for tests
locking/refcounts: Change WARN() to WARN_ONCE()
Yinghai Lu [Wed, 1 Mar 2017 08:25:40 +0000 (00:25 -0800)]
PCI/ASPM: Always set link->downstream to avoid NULL dereference on remove
We call pcie_aspm_exit_link_state() when we remove a device. If the device
is the last PCIe function to be removed below a bridge and the bridge has
an ASPM link_state struct, we disable ASPM on the link. Disabling ASPM
requires link->downstream (used in pcie_config_aspm_link()).
We previously set link->downstream in pcie_aspm_cap_init(), but only if the
device was not blacklisted. Removing the blacklisted device caused a NULL
pointer dereference in the pcie_aspm_exit_link_state() ->
pcie_config_aspm_link() path:
Linus Torvalds [Tue, 7 Mar 2017 18:52:26 +0000 (10:52 -0800)]
Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax
Pull idr fix (and new tests) from Matthew Wilcox:
"One urgent patch in here; freeing the correct IDA bitmap.
Everything else is changes to the test suite"
* 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax:
radix tree test suite: Specify -m32 in LDFLAGS too
ida: Free correct IDA bitmap
radix tree test suite: Depend on Makefile and quieten grep
radix tree test suite: Fix build with --as-needed
radix tree test suite: Build 32 bit binaries
radix tree test suite: Add performance test for radix_tree_join()
radix tree test suite: Add performance test for radix_tree_split()
radix tree test suite: Add performance benchmarks
radix tree test suite: Add test for radix_tree_clear_tags()
radix tree test suite: Add tests for ida_simple_get() and ida_simple_remove()
radix tree test suite: Add test for idr_get_next()
Jaehoon Chung [Tue, 7 Mar 2017 10:54:05 +0000 (19:54 +0900)]
PCI: exynos: Initialize elbi_base even when using PHY framework
Even when using the PHY framework, we need the elbi_base. Before this
patch, we didn't initialize elbi_base, which caused NULL pointer
dereferences later.
Fixes: e7cd7ef58e1f ("PCI: exynos: Support the PHY generic framework") Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Linus Torvalds [Tue, 7 Mar 2017 18:46:10 +0000 (10:46 -0800)]
Merge tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Five fairly small fixes for things that went in this cycle.
A fairly large patch to rework the CAS logic on Power9, necessitated
by a late change to the firmware API, and we can't boot without it.
Three fixes going to stable, allowing more instructions to be emulated
on LE, fixing a boot crash on 32-bit Freescale BookE machines, and the
OPAL XICS workaround.
And a patch from me to sort the selects under CONFIG PPC. Annoying
churn, but worth it in the long run, and best for it to go in now to
avoid conflicts.
Thanks to:
Alexey Kardashevskiy, Anton Blanchard, Balbir Singh, Gautham R.
Shenoy, Laurentiu Tudor, Nicholas Piggin, Paul Mackerras, Ravi
Bangoria, Sachin Sant, Shile Zhang, Suraj Jitindar Singh"
* tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Sort the selects under CONFIG_PPC
powerpc/64: Fix L1D cache shape vector reporting L1I values
powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info()
powerpc: Update to new option-vector-5 format for CAS
powerpc: Parse the command line before calling CAS
powerpc/xics: Work around limitations of OPAL XICS priority handling
powerpc/64: Fix checksum folding in csum_add()
powerpc/powernv: Fix opal tracepoints with JUMP_LABEL=n
powerpc/booke: Fix boot crash due to null hugepd
powerpc: Fix compiling a BE kernel with a powerpc64le toolchain
selftest/powerpc: Fix false failures for skipped tests
powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop
powerpc/64: Invalidate process table caching after setting process table
powerpc: emulate_step() tests for load/store instructions
powerpc: Emulation support for load/store instructions on LE
Matthew Wilcox [Fri, 3 Mar 2017 17:28:37 +0000 (12:28 -0500)]
radix tree test suite: Specify -m32 in LDFLAGS too
Michael's patch to use the default make rule for linking and the patch
from Rehas to use -m32 if building a 32-bit test-suite on a 64-bit
platform don't work well together.
Reported-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Matthew Wilcox [Fri, 3 Mar 2017 17:16:10 +0000 (12:16 -0500)]
ida: Free correct IDA bitmap
There's a relatively rare race where we look at the per-cpu preallocated
IDA bitmap, see it's NULL, allocate a new one, and atomically update it.
If the kmalloc() happened to sleep and we were rescheduled to a different
CPU, or an interrupt came in at the exact right time, another task
might have successfully allocated a bitmap and already deposited it.
I forgot what the semantics of cmpxchg() were and ended up freeing the
wrong bitmap leading to KASAN reporting a use-after-free.
Dmitry found the bug with syzkaller & wrote the patch. I wrote the test
case that will reproduce the bug without his patch being applied.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Matthew Wilcox [Thu, 2 Mar 2017 17:24:28 +0000 (12:24 -0500)]
radix tree test suite: Depend on Makefile and quieten grep
Changing the CFLAGS in the Makefile didn't always lead to a
recompilation because the OFILES didn't depend on the Makefile.
Also, after doing make clean, grep would still complain about
a missing map-shift.h; we need -s as well as -q.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Currently the radix tree test suite doesn't build with toolchains that
use --as-needed by default, for example Ubuntu's:
cc -I. -I../../include -g -O2 -Wall -D_LGPL_SOURCE -fsanitize=address -lpthread -lurcu main.o ... -o main
/usr/bin/ld: regression1.o: undefined reference to symbol 'pthread_join@@GLIBC_2.17'
/lib/powerpc64le-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
This is caused by the custom makefile rules placing LDFLAGS before the
.o files that need the libraries.
We could fix it by using --no-as-needed, or rewriting the custom rules.
But we can also just drop the custom rules and move the libraries to
LDLIBS, and then the default rules work correctly - with the one caveat
that we need to add -fsanitize=address to LDFLAGS because that must be
passed to the linker as well as the compiler.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Rehas Sachdeva [Sun, 26 Feb 2017 11:33:00 +0000 (06:33 -0500)]
radix tree test suite: Add test for radix_tree_clear_tags()
Assert that radix_tree_clear_tags() clears the tags on the passed node and
slot. Assert that the case where the radix tree has only one entry at index
zero and the node is NULL, is also handled.
Signed-off-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Linus Torvalds [Tue, 7 Mar 2017 18:06:25 +0000 (10:06 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fix from Eric Biederman:
"This fixes a race between put_ucounts and get_ucounts that can cause a
use after free. The fix works by simplifying the code and so there is
not even a temptation to be clever and play spinlock vs atomic
reference games"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucount: Remove the atomicity from ucount->count
Linus Torvalds [Tue, 7 Mar 2017 17:37:28 +0000 (09:37 -0800)]
Merge tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"There was some breakage with the changes for jump labels in the 4.11
merge window:
- powerpc broke as jump labels uses the two LSB bits as flags in
initialization.
A check was added to make sure that all jump label entries were 4
bytes aligned, but powerpc didn't work that way for modules. Adding
an alignment in the module linker script appeared to be the best
solution.
- Jump labels also added an anonymous union to access those LSB bits
as a normal long. But because this structure had static
initialization, it broke older compilers that could not statically
initialize anonymous unions without brackets.
- The command line parameter for setting function graph filter broke
the "EMPTY_HASH" descriptor by modifying it instead of creating a
new hash to hold the entries.
- The command line parameter ftrace_graph_max_depth was added to
allow its setting at boot time. It uses existing code and only the
command line hook was added.
This is not really a fix, but as it uses existing code without
affecting anything else, I added it to this release. It was ready
before the merge window closed, but I wanted to let it sit in
linux-next for a couple of days first"
* tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/graph: Add ftrace_graph_max_depth kernel parameter
tracing: Add #undef to fix compile error
jump_label: Add comment about initialization order for anonymous unions
jump_label: Fix anonymous union initialization
module: set __jump_table alignment to 8
ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter
tracing: Fix code comment for ftrace_ops_get_func()
Josh Poimboeuf [Thu, 2 Mar 2017 22:57:23 +0000 (16:57 -0600)]
objtool: Fix another GCC jump table detection issue
Arnd Bergmann reported a (false positive) objtool warning:
drivers/infiniband/sw/rxe/rxe_resp.o: warning: objtool: rxe_responder()+0xfe: sibling call from callable instruction with changed frame pointer
The issue is in find_switch_table(). It tries to find a switch
statement's jump table by walking backwards from an indirect jump
instruction, looking for a relocation to the .rodata section. In this
case it stopped walking prematurely: the first .rodata relocation it
encountered was for a variable (resp_state_name) instead of a jump
table, so it just assumed there wasn't a jump table.
The fix is to ignore any .rodata relocation which refers to an ELF
object symbol. This works because the jump tables are anonymous and
have no symbols associated with them.
Guenter Roeck [Mon, 6 Mar 2017 01:05:57 +0000 (17:05 -0800)]
avr32: Fix build error caused by include file reshuffling
Various avr32 builds fail:
arch/avr32/oprofile/backtrace.c:58: error: dereferencing pointer to incomplete type
arch/avr32/oprofile/backtrace.c:60: error: implicit declaration of function 'user_mode'
Always increment/decrement ucount->count under the ucounts_lock. The
increments are there already and moving the decrements there means the
locking logic of the code is simpler. This simplification in the
locking logic fixes a race between put_ucounts and get_ucounts that
could result in a use-after-free because the count could go zero then
be found by get_ucounts and then be freed by put_ucounts.
A bug presumably this one was found by a combination of syzkaller and
KASAN. JongWhan Kim reported the syzkaller failure and Dmitry Vyukov
spotted the race in the code.
Cc: stable@vger.kernel.org Fixes: f6b2db1a3e8d ("userns: Make the count of user namespaces per user") Reported-by: JongHwan Kim <zzoru007@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
On Kernel 4.9, WARNINGs about doing DMA on stack are hit at
the dw2102 driver: one in su3000_power_ctrl() and the other in tt_s2_4600_frontend_attach().
Both were due to the use of buffers on the stack as parameters to
dvb_usb_generic_rw() and the resulting attempt to do DMA with them.
The device was non-functional as a result.
So, switch this driver over to use a buffer within the device state
structure, as has been done with other DVB-USB drivers.
Tested with TechnoTrend TT-connect S2-4600.
[mchehab@osg.samsung.com: fixed a warning at su3000_i2c_transfer() that
state var were dereferenced before check 'd'] Signed-off-by: Jonathan McDowell <noodles@earth.li> Cc: <stable@vger.kernel.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
We have a big list of selects under CONFIG_PPC, and currently they're
completely unsorted. This means people tend to add new selects at the
bottom of the list, and so two commits which both add a new select will
often conflict.
Instead sort it alphabetically. This is nicer in and of itself, but also
means two commits that add a new select will have a greater chance of
not conflicting.
Add a note at the top and bottom asking people to keep it sorted.
And while we're here pad out the 'if' expressions to make them stand
out.
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It seems we didn't pay quite enough attention when testing the new cache
shape vectors, which means we didn't notice the bug where the vector for
the L1D was using the L1I values. Fix it, resulting in eg:
Fixes: 98a5f361b862 ("powerpc: Add new cache geometry aux vectors") Cut-and-paste-bug-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Badly-reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Sat, 4 Mar 2017 23:54:34 +0000 (10:54 +1100)]
powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info()
I see a panic in early boot when building with a recent gcc toolchain.
The issue is a divide by zero, which is undefined. Older toolchains
let us get away with it:
int foo(int a) { return a / 0; }
foo:
li 9,0
divw 3,3,9
extsw 3,3
blr
But newer ones catch it:
foo:
trap
Add a check to avoid the divide by zero.
Fixes: e2827fe5c156 ("powerpc/64: Clean up ppc64_caches using a struct per cache") Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc: Update to new option-vector-5 format for CAS
On POWER9 the ibm,client-architecture-support (CAS) negotiation process
has been updated to change how the host to guest negotiation is done for
the new hash/radix mmu as well as the nest mmu, process tables and guest
translation shootdown (GTSE).
This is documented in the unreleased PAPR ACR "CAS option vector
additions for P9".
The host tells the guest which options it supports in
ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
to request in the CAS call and these are agreed to in the
ibm,architecture-vec-5 property of the chosen node.
Thus we read ibm,arch-vec-5-platform-support and make our selection before
calling CAS. We then parse the ibm,architecture-vec-5 property of the
chosen node to check whether we should run as hash or radix.
ibm,arch-vec-5-platform-support format:
index value pairs: <index, val> ... <index, val>
index: Option vector 5 byte number
val: Some representation of supported values
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Don't print about unknown options, be consistent with OV5_FEAT] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc: Parse the command line before calling CAS
On POWER9 the hypervisor requires the guest to decide whether it would
like to use a hash or radix mmu model at the time it calls
ibm,client-architecture-support (CAS) based on what the hypervisor has
said it's allowed to do. It is possible to disable radix by passing
"disable_radix" on the command line. The next patch will add support for
the new CAS format, thus we need to parse the command line before calling
CAS so we can correctly select which mmu we would like to use.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Balbir Singh [Fri, 3 Mar 2017 00:58:44 +0000 (11:58 +1100)]
powerpc/xics: Work around limitations of OPAL XICS priority handling
The CPPR (Current Processor Priority Register) of a XICS interrupt
presentation controller contains a value N, such that only interrupts
with a priority "more favoured" than N will be received by the CPU,
where "more favoured" means "less than". So if the CPPR has the value 5
then only interrupts with a priority of 0-4 inclusive will be received.
In theory the CPPR can support a value of 0 to 255 inclusive.
In practice Linux only uses values of 0, 4, 5 and 0xff. Setting the CPPR
to 0 rejects all interrupts, setting it to 0xff allows all interrupts.
The values 4 and 5 are used to differentiate IPIs from external
interrupts. Setting the CPPR to 5 allows IPIs to be received but not
external interrupts.
The CPPR emulation in the OPAL XICS implementation only directly
supports priorities 0 and 0xff. All other priorities are considered
equivalent, and mapped to a single priority value internally. This means
when using icp-opal we can not allow IPIs but not externals.
This breaks Linux's use of priority values when a CPU is hot unplugged.
After migrating IRQs away from the CPU that is being offlined, we set
the priority to 5, meaning we still want the offline CPU to receive
IPIs. But the effect of the OPAL XICS emulation's use of a single
priority value is that all interrupts are rejected by the CPU. With the
CPU offline, and not receiving IPIs, we may not be able to wake it up to
bring it back online.
The first part of the fix is in icp_opal_set_cpu_priority(). CPPR values
of 0 to 4 inclusive will correctly cause all interrupts to be rejected,
so we pass those CPPR values through to OPAL. However if we are called
with a CPPR of 5 or greater, the caller is expecting to be able to allow
IPIs but not external interrupts. We know this doesn't work, so instead
of rejecting all interrupts we choose the opposite which is to allow all
interrupts. This is still not correct behaviour, but we know for the
only existing caller (xics_migrate_irqs_away()), that it is the better
option.
The other part of the fix is in xics_migrate_irqs_away(). Instead of
setting priority (CPPR) to 0, and then back to 5 before migrating IRQs,
we migrate the IRQs before setting the priority back to 5. This should
have no effect on an ICP backend with a working set_priority(), and on
icp-opal it means we will keep all interrupts blocked until after we've
finished doing the IRQ migration. Additionally we wait for 5ms after
doing the migration to make sure there are no IRQs in flight.
cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
If the current P-state selection algorithm is set to "performance"
in intel_pstate_set_policy(), the limits may be initialized from
scratch, but only if no_turbo is not set and the maximum frequency
allowed for the given CPU (i.e. the policy object representing it)
is at least equal to the max frequency supported by the CPU. In all
of the other cases, the limits will not be updated.
That is confusing for two reasons. First, the initial attempt to
change min_perf_pct to 94 seems to have no effect, even though
setting the global limits should always work. Second, after
changing scaling_max_freq for policy0 the global min_perf_pct
attribute shows 94, even though it should have not been affected
by that operation in principle.
Moreover, the final attempt to change min_perf_pct to 95 worked
as expected, because scaling_max_freq for the only policy with
scaling_governor equal to "performance" was different from the
maximum at that time.
To make all that confusion go away, modify intel_pstate_set_policy()
so that it doesn't reinitialize the limits at all.
At the same time, change intel_pstate_set_performance_limits() to
set min_sysfs_pct to 100 in the "performance" limits set so that
switching the P-state selection algorithm to "performance" causes
intel_pstate/min_perf_pct in sysfs to go to 100 (or whatever value
min_sysfs_pct in the "performance" limits is set to later).
That requires per-CPU limits to be initialized explicitly rather
than by copying the global limits to avoid setting min_sysfs_pct
in the per-CPU limits to 100.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The code added to intel_pstate_verify_policy() by commit 1443ebbacfd7
(cpufreq: intel_pstate: Fix sysfs limits enforcement for performance
policy) should use perf_limits instead of limits, because otherwise
setting global limits via sysfs may affect policies inconsistently.
For example, in the sequence of shell commands below, the
scaling_min_freq attribute for policy1 and policy2 should be
affected in the same way, because scaling_governor is set in
the same way for both of them:
The are affected differently, because intel_pstate_verify_policy()
is invoked with limits set to &performance_limits (left behind by
policy0) for policy1 and with limits set to &powersave_limits (left
behind by policy1) for policy2. Since perf_limits is set to the
set of limits matching the policy being updated, using it instead
of limits fixes the inconsistency.
Fixes: 1443ebbacfd7 (cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq: intel_pstate: Fix global settings in active mode
Commit 111b8b3fe4fa (cpufreq: intel_pstate: Always keep all
limits settings in sync) changed intel_pstate to invoke
cpufreq_update_policy() for every registered CPU on global sysfs
attributes updates, but that led to undesirable effects in the
active mode if the "performance" P-state selection algorithm is
configufred for one CPU and the "powersave" one is chosen for
all of the other CPUs.
The reason why this happens is because intel_pstate attempts to
maintain two sets of global limits in the active mode, one for
the "performance" P-state selection algorithm and one for the
"powersave" P-state selection algorithm, but the P-state selection
algorithms are set per policy, so the global limits cannot reflect
all of them at the same time if they are different for different
policies.
In the particular situation above, the attempt to change
min_perf_pct to 94 caused cpufreq_update_policy() to be run
for a CPU with the "powersave" P-state selection algorithm
and intel_pstate_set_policy() called by it silently switched the
global limits to the "powersave" set which finally was reflected
by the sysfs interface.
To prevent that from happening, modify intel_pstate_update_policies()
to always switch back to the set of limits that was used right before
it has been invoked.
Fixes: 111b8b3fe4fa (cpufreq: intel_pstate: Always keep all limits settings in sync) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Tue, 28 Feb 2017 21:44:16 +0000 (16:44 -0500)]
cpufreq: Add the "cpufreq.off=1" cmdline option
Add the "cpufreq.off=1" cmdline option.
At boot-time, this allows a user to request CONFIG_CPU_FREQ=n
behavior from a kernel built with CONFIG_CPU_FREQ=y.
This is analogous to the existing "cpuidle.off=1" option
and CONFIG_CPU_IDLE=y
This capability is valuable when we need to debug end-user
issues in the BIOS or in Linux. It is also convenient
for enabling comparisons, which may otherwise require a new kernel,
or help from BIOS SETUP, which may be buggy or unavailable.
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Thu, 2 Mar 2017 08:33:21 +0000 (14:03 +0530)]
cpufreq: schedutil: Pass sg_policy to get_next_freq()
get_next_freq() uses sg_cpu only to get sg_policy, which the callers of
get_next_freq() already have. Pass sg_policy instead of sg_cpu to
get_next_freq(), to make it more efficient.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Thu, 2 Mar 2017 08:33:20 +0000 (14:03 +0530)]
cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
cached_raw_freq applies to the entire cpufreq policy and not individual
CPUs. Apart from wasting per-cpu memory, it is actually wrong to keep it
in struct sugov_cpu as we may end up comparing next_freq with a stale
cached_raw_freq of a random CPU.
Move cached_raw_freq to struct sugov_policy.
Fixes: 5cbea46984d6 (cpufreq: schedutil: map raw required frequency to driver frequency) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
4) Fix sendmsg deadlock in rxrpc, from David Howells.
5) Add missing RCU locking to transport hashtable scan, from Xin Long.
6) Fix potential packet loss in mlxsw driver, from Ido Schimmel.
7) Fix race in NAPI handling between poll handlers and busy polling,
from Eric Dumazet.
8) TX path in vxlan and geneve need proper RCU locking, from Jakub
Kicinski.
9) SYN processing in DCCP and TCP need to disable BH, from Eric
Dumazet.
10) Properly handle net_enable_timestamp() being invoked from IRQ
context, also from Eric Dumazet.
11) Fix crash on device-tree systems in xgene driver, from Alban Bedel.
12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de
Melo.
13) Fix use-after-free in netvsc driver, from Dexuan Cui.
14) Fix max MTU setting in bonding driver, from WANG Cong.
15) xen-netback hash table can be allocated from softirq context, so use
GFP_ATOMIC. From Anoob Soman.
16) Fix MAC address change bug in bgmac driver, from Hari Vyas.
17) strparser needs to destroy strp_wq on module exit, from WANG Cong.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
strparser: destroy workqueue on module exit
sfc: fix IPID endianness in TSOv2
sfc: avoid max() in array size
rds: remove unnecessary returned value check
rxrpc: Fix potential NULL-pointer exception
nfp: correct DMA direction in XDP DMA sync
nfp: don't tell FW about the reserved buffer space
net: ethernet: bgmac: mac address change bug
net: ethernet: bgmac: init sequence bug
xen-netback: don't vfree() queues under spinlock
xen-netback: keep a local pointer for vif in backend_disconnect()
netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails
netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups
netfilter: nf_conntrack_sip: fix wrong memory initialisation
can: flexcan: fix typo in comment
can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
can: gs_usb: fix coding style
can: gs_usb: Don't use stack memory for USB transfers
ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines
ixgbe: update the rss key on h/w, when ethtool ask for it
...
Linus Torvalds [Sat, 4 Mar 2017 19:36:19 +0000 (11:36 -0800)]
Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM changes for the 4.11 merge window:
PPC:
- correct assumption about ASDR on POWER9
- fix MMIO emulation on POWER9
x86:
- add a simple test for ioperm
- cleanup TSS (going through KVM tree as the whole undertaking was
caused by VMX's use of TSS)
- fix nVMX interrupt delivery
- fix some performance counters in the guest
... and two cleanup patches"
* tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Fix pending events injection
x86/kvm/vmx: remove unused variable in segment_base()
selftests/x86: Add a basic selftest for ioperm
x86/asm: Tidy up TSS limit code
kvm: convert kvm.users_count from atomic_t to refcount_t
KVM: x86: never specify a sample period for virtualized in_tx_cp counters
KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9
KVM: PPC: Book3S HV: Fix software walk of guest process page tables
Linus Torvalds [Sat, 4 Mar 2017 19:26:18 +0000 (11:26 -0800)]
Merge tag 'staging-4.11-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver fixes from Greg KH:
"Here are a few small staging and IIO driver fixes for issues that
showed up after the big set if changes you merged last week.
Nothing major, just small bugs resolved in some IIO drivers, a lustre
allocation fix, and some RaspberryPi driver fixes for reported
problems, as well as a MAINTAINERS entry update.
All of these have been in linux-next for a week with no reported
issues"
* tag 'staging-4.11-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: fsl-mc: fix warning in DT ranges parser
MAINTAINERS: Remove Noralf Trønnes as fbtft maintainer
staging: vchiq_2835_arm: Make cache-line-size a required DT property
staging: bcm2835/mmal-vchiq: unlock on error in buffer_from_host()
staging/lustre/lnet: Fix allocation size for sv_cpt_data
iio: adc: xilinx: Fix error handling
iio: 104-quad-8: Fix off-by-one error when addressing flag register
iio: adc: handle unknow of_device_id data
Linus Torvalds [Sat, 4 Mar 2017 18:42:53 +0000 (10:42 -0800)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- vmalloc stack regression in CCM
- Build problem in CRC32 on ARM
- Memory leak in cavium
- Missing Kconfig dependencies in atmel and mediatek
- XTS Regression on some platforms (s390 and ppc)
- Memory overrun in CCM test vector
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: vmx - Use skcipher for xts fallback
crypto: vmx - Use skcipher for cbc fallback
crypto: testmgr - Pad aes_ccm_enc_tv_template vector
crypto: arm/crc32 - add build time test for CRC instruction support
crypto: arm/crc32 - fix build error with outdated binutils
crypto: ccm - move cbcmac input off the stack
crypto: xts - Propagate NEED_FALLBACK bit
crypto: api - Add crypto_requires_off helper
crypto: atmel - CRYPTO_DEV_MEDIATEK should depend on HAS_DMA
crypto: atmel - CRYPTO_DEV_ATMEL_TDES and CRYPTO_DEV_ATMEL_SHA should depend on HAS_DMA
crypto: cavium - fix leak on curr if curr->head fails to be allocated
crypto: cavium - Fix couple of static checker errors
Shile Zhang [Sat, 4 Feb 2017 09:03:40 +0000 (17:03 +0800)]
powerpc/64: Fix checksum folding in csum_add()
Paul's patch to fix checksum folding, commit b492f7e4e07a ("powerpc/64:
Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold")
missed a case in csum_add(). Fix it.
Signed-off-by: Shile Zhang <shile.zhang@nokia.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/powernv: Fix opal tracepoints with JUMP_LABEL=n
The recent commit to allow calling OPAL calls in real mode, commit ab9bad0ead9a ("powerpc/powernv: Remove separate entry for OPAL real mode
calls"), introduced a bug when CONFIG_JUMP_LABEL=n.
The commit moved the "mfmsr r12" prior to the call to OPAL_BRANCH, but
we missed that OPAL_BRANCH clobbers r12 when jump labels are disabled.
This leads to us using the tracepoint refcount as the MSR value,
typically zero, and saving that into PACASAVEDMSR. When we return from
OPAL we use that value as the MSR value for rfid, meaning we switch to
32-bit BE real mode - hilarity ensues.
Fix it by using r11 in OPAL_BRANCH, which is not live at the time the
macro is used in OPAL_CALL.
Fixes: ab9bad0ead9a ("powerpc/powernv: Remove separate entry for OPAL real mode calls") Suggested-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Sat, 4 Mar 2017 05:36:56 +0000 (21:36 -0800)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is the set of stuff that didn't quite make the initial pull and a
set of fixes for stuff which did.
The new stuff is basically lpfc (nvme), qedi and aacraid. The fixes
cover a lot of previously submitted stuff, the most important of which
probably covers some of the failing irq vectors allocation and other
fallout from having the SCSI command allocated as part of the block
allocation functions"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (59 commits)
scsi: qedi: Fix memory leak in tmf response processing.
scsi: aacraid: remove redundant zero check on ret
scsi: lpfc: use proper format string for dma_addr_t
scsi: lpfc: use div_u64 for 64-bit division
scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m
scsi: cciss: correct check map error.
scsi: qla2xxx: fix spelling mistake: "seperator" -> "separator"
scsi: aacraid: Fixed expander hotplug for SMART family
scsi: mpt3sas: switch to pci_alloc_irq_vectors
scsi: qedf: fixup compilation warning about atomic_t usage
scsi: remove scsi_execute_req_flags
scsi: merge __scsi_execute into scsi_execute
scsi: simplify scsi_execute_req_flags
scsi: make the sense header argument to scsi_test_unit_ready mandatory
scsi: sd: improve TUR handling in sd_check_events
scsi: always zero sshdr in scsi_normalize_sense
scsi: scsi_dh_emc: return success in clariion_std_inquiry()
scsi: fix memory leak of sdpk on when gd fails to allocate
scsi: sd: make sd_devt_release() static
scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.
...
WANG Cong [Fri, 3 Mar 2017 20:21:14 +0000 (12:21 -0800)]
strparser: destroy workqueue on module exit
Fixes: 43a0c6751a32 ("strparser: Stream parser for messages") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Missing check for full sock in ip_route_me_harder(), from
Florian Westphal.
2) Incorrect sip helper structure initilization that breaks it when
several ports are used, from Christophe Leroy.
3) Fix incorrect assumption when looking up for matching with adjacent
intervals in the nft_set_rbtree.
4) Fix broken netlink event error reporting in nf_tables that results
in misleading ESRCH errors propagated to userspace listeners.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 4 Mar 2017 00:48:48 +0000 (16:48 -0800)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A fix and regression test case for nvdimm namespace label
compatibility.
Details:
- An "nvdimm namespace label" is metadata on an nvdimm that
provisions dimm capacity into a "namespace" that can host a block
device / dax-filesytem, or a device-dax character device.
A namespace is an object that other operating environment and
platform firmware needs to comprehend for capabilities like booting
from an nvdimm.
The label metadata contains a checksum that Linux was not
calculating correctly leading to other environments rejecting the
Linux label.
These have received a build success notification from the kbuild
robot, and a positive test result from Nick who reported the problem"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit, libnvdimm: fix interleave set cookie calculation
tools/testing/nvdimm: make iset cookie predictable
Linus Torvalds [Sat, 4 Mar 2017 00:44:21 +0000 (16:44 -0800)]
Merge tag 'pci-v4.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- fix NULL pointer dereferences in many DesignWare-based drivers due to
refactoring error
- fix Altera config write breakage due to my refactoring error
* tag 'pci-v4.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: altera: Fix TLP_CFG_DW0 for TLP write
PCI: dwc: Fix crashes seen due to missing assignments
In the passive mode the cpu_frequency trace event is already
triggered by the cpufreq core or by scaling governors, so
intel_pstate should not trigger it once again for the same
P-state updates.
In addition to that, the frequency returned by
intel_cpufreq_fast_switch() and passed via freqs.new from
intel_cpufreq_target() to cpufreq_freq_transition_end() should
reflect the P-state actually set, so make that happen.
Fixes: 001c76f05b01 (cpufreq: intel_pstate: Generic governors support) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The intel_pstate_update_perf_limits() called from
intel_cpufreq_verify_policy() may cause global P-state limits
to change which is generally confusing and unnecessary.
In the passive mode the global limits are only applied to the
frequency selected by the scaling governor (they are not taken
into account by governors when making decisions anyway), so making
them follow the per-policy limits serves no purpose and may go
against user expectations (as it generally causes the global
attributes in sysfs to change even though they have not been
written to in some cases).
Fix that by dropping the intel_pstate_update_perf_limits()
invocation from intel_cpufreq_verify_policy() (which also
reduces the code size by a few lines).
This change does not affect the per-CPU limits case, because those
limits allow any P-state to be set by default in the passive mode
and it removes the only piece of code updating them in that mode,
so the per-policy settings will be the only ones taken into account
in that case as expected.
Fixes: 001c76f05b01 (cpufreq: intel_pstate: Generic governors support) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq: intel_pstate: Do not use performance_limits in passive mode
Using performance_limits in the passive mode doesn't make
sense, because in that mode the global limits are applied to the
frequency selected by the scaling governor.
The maximum and minimum P-state limits in performance_limits are both
set to 100 percent which will put all CPUs into the turbo range
regardless of what governor is used and what frequencies are
selected by it (that is particularly undesirable on CPUs with the
generic powersave governor attached).
For this reason, make intel_pstate_register_driver() always point
limits to powersave_limits in the passive mode.
Fixes: 001c76f05b01 (cpufreq: intel_pstate: Generic governors support) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Sat, 4 Mar 2017 00:20:06 +0000 (16:20 -0800)]
Merge branch 'parisc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes and cleanups from Helge Deller:
"Nothing really important in this patchset: fix resource leaks in error
paths, coding style cleanups and code removal"
* 'parisc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Remove flush_user_dcache_range and flush_user_icache_range
parisc: fix a printk
parisc: ccio-dma: Handle return NULL error from ioremap_nocache
parisc: Define access_ok() as macro
parisc: eisa: Fix resource leaks in error paths
parisc: eisa: Remove coding style errors
Linus Torvalds [Sat, 4 Mar 2017 00:17:55 +0000 (16:17 -0800)]
Merge tag 'xtensa-20170303' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa updates from Max Filippov:
- clean up bootable image build targets: provide separate 'Image',
'zImage' and 'uImage' make targets that only build corresponding
image type. Make 'all' build all images appropriate for a platform
- allow merging vectors code into .text section as a preparation step
for XIP support
- fix handling external FDT when the kernel is built without
BLK_DEV_INITRD support
* tag 'xtensa-20170303' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: allow merging vectors into .text section
xtensa: clean up bootable image build targets
xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD
Linus Torvalds [Sat, 4 Mar 2017 00:15:48 +0000 (16:15 -0800)]
Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC late DT updates from Arnd Bergmann:
"These updates have been kept in a separate branch mostly because they
rely on updates to the respective clk drivers to keep the shared
header files in sync.
This includes two branches for arm64 dt updates, both following up on
earlier changes for the same platforms that are already merged:
Samsung:
- add USB3 support in Exynos7
- minor PM related updates
Amlogic:
- new machines: WeTek Set-top-boxes
- various devices added to DT
There are also a couple of bugfixes that trickled in since the start
of the merge window:
- The moxart_defconfig was not building the intended platform
- CPU-hotplug was broken on ux500
- Coresight was broken on Juno (never worked)"
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (26 commits)
ARM: deconfig: fix the moxart defconfig
ARM: ux500: resume the second core properly
arm64: dts: juno: update definition for programmable replicator
arm64: dts: exynos: Add regulators for Vbus and Vbus-Boost
arm64: dts: exynos: Add USB 3.0 controller node for Exynos7
arm64: dts: exynos: Use macros for pinctrl configuration on Exynos7
pinctrl: dt-bindings: samsung: Add Exynos7 specific pinctrl macro definitions
arm64: dts: exynos: Add initial configuration for DISP clocks for TM2/TM2e
ARM64: dts: meson-gxbb-p200: add ADC laddered keys
ARM64: dts: meson: meson-gx: add the SAR ADC
ARM64: dts: meson-gxl: add the pwm_ao_b pin
ARM64: dts: meson-gx: add the missing pwm_AO_ab node
clk: gxbb: fix CLKID_ETH defined twice
ARM64: dts: meson-gxl: rename Nexbox A95x for consistency
clk: gxbb: add the SAR ADC clocks and expose them
dt-bindings: amlogic: Add WeTek boards
ARM64: dts: meson-gxbb: Add support for WeTek Hub and Play
dt-bindings: vendor-prefix: Add wetek vendor prefix
ARM64: dts: meson-gxm: Rename q200 and q201 DT files for consistency
ARM64: dts: meson-gx: Add HDMI HPD/DDC pinctrl nodes
...
Linus Torvalds [Sat, 4 Mar 2017 00:00:59 +0000 (16:00 -0800)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull SMB3 fixes from Steve French:
"Some small bug fixes as well as SMB2.1/SMB3 enablement for DFS (global
namespace) which previously was only enabled for CIFS"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
smb2: Enforce sec= mount option
CIFS: Fix sparse warnings
CIFS: implement get_dfs_refer for SMB2+
CIFS: use DFS pathnames in SMB2+ Create requests
CIFS: set signing flag in SMB2+ TreeConnect if needed
CIFS: let ses->ipc_tid hold smb2 TreeIds
CIFS: add use_ipc flag to SMB2_ioctl()
CIFS: add build_path_from_dentry_optional_prefix()
CIFS: move DFS response parsing out of SMB1 code
CIFS: Fix possible use after free in demultiplex thread
Martyn Welch [Fri, 3 Mar 2017 22:43:30 +0000 (22:43 +0000)]
docs: Fix htmldocs build failure
Build of HTML docs failing due to conversion of deviceiobook.tmpl in 8a8a602f and regulator.tmpl in 028f2533 to RST without removing from
DOCBOOKS in Makefile, resulting (in the case of deviceiobook) the
following error:
make[1]: *** No rule to make target 'Documentation/DocBook/deviceiobook.xml', needed by 'Documentation/DocBook/deviceiobook.aux.xml'. Stop.
Makefile:1452: recipe for target 'htmldocs' failed
make: *** [htmldocs] Error 2
Update DOCBOOKS to reflect available books.
Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk> Signed-off-by: Jonathan Corbet <corbet@lwn.net>