Radim Krčmář [Thu, 18 May 2017 17:37:31 +0000 (19:37 +0200)]
KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
Static analysis noticed that pmu->nr_arch_gp_counters can be 32
(INTEL_PMC_MAX_GENERIC) and therefore cannot be used to shift 'int'.
I didn't add BUILD_BUG_ON for it as we have a better checker.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 2dd01dcc98f6 ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Radim Krčmář [Thu, 18 May 2017 17:37:30 +0000 (19:37 +0200)]
KVM: x86: zero base3 of unusable segments
Static checker noticed that base3 could be used uninitialized if the
segment was not present (useable). Random stack values probably would
not pass VMCS entry checks.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 9e17427875c9 ("KVM: x86 emulator: consolidate segment accessors") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Wanpeng Li [Fri, 19 May 2017 09:46:56 +0000 (02:46 -0700)]
KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation.
- "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only,
a read should be ignored) in guest can get a random number.
- "rep insb" instruction to access PIT register port 0x43 can control memcpy()
in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes,
which will disclose the unimportant kernel memory in host but no crash.
The similar test program below can reproduce the read out-of-bounds vulnerability:
void hexdump(void *mem, unsigned int len)
{
unsigned int i, j;
The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation
w/o clear after using which results in some random datas are left over in
the buffer. Guest reads port 0x43 will be ignored since it is write only,
however, the function kernel_pio() can't distigush this ignore from successfully
reads data from device's ioport. There is no new data fill the buffer from
port 0x43, however, emulator_pio_in_emulated() will copy the stale data in
the buffer to the guest unconditionally. This patch fixes it by clearing the
buffer before in instruction emulation to avoid to grant guest the stale data
in the buffer.
In addition, string I/O is not supported for in kernel device. So there is no
iteration to read ioport %RCX times for string I/O. The function kernel_pio()
just reads one round, and then copy the io size * %RCX to the guest unconditionally,
actually it copies the one round ioport data w/ other random datas which are left
over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by
introducing the string I/O support for in kernel device in order to grant the right
ioport datas to the guest.
Before the patch:
0x000000: fe 38 93 93 ff ff ab ab .8......
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
After the patch:
0x000000: 1e 02 f8 00 ff ff ab ab ........
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: d2 e2 d2 df d2 db d2 d7 ........
0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: 00 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 00 00 00 00 ........
0x000018: 00 00 00 00 00 00 00 00 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
This can be reproduced by run kvm-unit-tests/hyperv_stimer.flat w/
CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT enabled.
Safe access to per-CPU data requires a couple of constraints, though: the
thread working with the data cannot be preempted and it cannot be migrated
while it manipulates per-CPU variables. If the thread is preempted, the
thread that replaces it could try to work with the same variables; migration
to another CPU could also cause confusion. However there is no preemption
disable when reads host per-CPU tsc rate to calculate the current kvmclock
timestamp.
This patch fixes it by utilizing get_cpu/put_cpu pair to guarantee both
__this_cpu_read() and rdtsc() are not preempted.
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Dan Carpenter [Thu, 18 May 2017 07:38:53 +0000 (10:38 +0300)]
KVM: Silence underflow warning in avic_get_physical_id_entry()
Smatch complains that we check cap the upper bound of "index" but don't
check for negatives. It's a false positive because "index" is never
negative. But it's also simple enough to make it unsigned which makes
the code easier to audit.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Radim Krčmář [Thu, 18 May 2017 12:40:32 +0000 (14:40 +0200)]
Merge tag 'kvm-arm-for-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM Fixes for v4.12-rc2.
Includes:
- A fix for a build failure introduced in -rc1 when tracepoints are
enabled on 32-bit ARM.
- Disabling use of stack pointer protection in the hyp code which can
cause panics.
- A handful of VGIC fixes.
- A fix to the init of the redistributors on GICv3 systems that
prevented boot with kvmtool on GICv3 systems introduced in -rc1.
- A number of race conditions fixed in our MMU handling code.
- A fix for the guest being able to program the debug extensions for
the host on the 32-bit side.
Christoffer Dall [Wed, 17 May 2017 11:12:51 +0000 (13:12 +0200)]
KVM: arm/arm64: Fix bug when registering redist iodevs
If userspace creates the VCPUs after initializing the VGIC, then we end
up in a situation where we trigger a bug in kvm_vcpu_get_idx(), because
it is called prior to adding the VCPU into the vcpus array on the VM.
There is no tight coupling between the VCPU index and the area of the
redistributor region used for the VCPU, so we can simply ensure that all
creations of redistributors are serialized per VM, and increment an
offset when we successfully add a redistributor.
The vgic_register_redist_iodev() function can be called from two paths:
vgic_redister_all_redist_iodev() which is called via the kvm_vgic_addr()
device attribute handler. This patch already holds the kvm->lock mutex.
The other path is via kvm_vgic_vcpu_init, which is called through a
longer chain from kvm_vm_ioctl_create_vcpu(), which releases the
kvm->lock mutex just before calling kvm_arch_vcpu_create(), so we can
simply take this mutex again later for our purposes.
Fixes: ab6f468c10 ("KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs") Signed-off-by: Christoffer Dall <cdall@linaro.org> Tested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
Paolo Bonzini [Tue, 18 Apr 2017 10:41:18 +0000 (12:41 +0200)]
KVM: x86: lower default for halt_poll_ns
In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to
increase heavily even in cases where the performance improvement was
small. In particular, bandwidth divided by CPU usage was as much as
60% lower.
To some extent this is the expected effect of the patch, and the
additional CPU utilization is only visible when running the
benchmarks. However, halving the threshold also halves the extra
CPU utilization (from +30-130% to +20-70%) and has no negative
effect on performance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Suzuki K Poulose [Tue, 16 May 2017 09:34:55 +0000 (10:34 +0100)]
kvm: arm/arm64: Fix use after free of stage2 page table
We yield the kvm->mmu_lock occassionaly while performing an operation
(e.g, unmap or permission changes) on a large area of stage2 mappings.
However this could possibly cause another thread to clear and free up
the stage2 page tables while we were waiting for regaining the lock and
thus the original thread could end up in accessing memory that was
freed. This patch fixes the problem by making sure that the stage2
pagetable is still valid after we regain the lock. The fact that
mmu_notifer->release() could be called twice (via __mmu_notifier_release
and mmu_notifier_unregsister) enhances the possibility of hitting
this race where there are two threads trying to unmap the entire guest
shadow pages.
While at it, cleanup the redudant checks around cond_resched_lock in
stage2_wp_range(), as cond_resched_lock already does the same checks.
Cc: Mark Rutland <mark.rutland@arm.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: andreyknvl@google.com Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Suzuki K Poulose [Tue, 16 May 2017 09:34:54 +0000 (10:34 +0100)]
kvm: arm/arm64: Force reading uncached stage2 PGD
Make sure we don't use a cached value of the KVM stage2 PGD while
resetting the PGD.
Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Paolo Bonzini [Thu, 11 May 2017 11:23:29 +0000 (13:23 +0200)]
KVM: nVMX: fix EPT permissions as reported in exit qualification
This fixes the new ept_access_test_read_only and ept_access_test_read_write
testcases from vmx.flat.
The problem is that gpte_access moves bits around to switch from EPT
bit order (XWR) to ACC_*_MASK bit order (RWX). This results in an
incorrect exit qualification. To fix this, make pt_access and
pte_access operate on raw PTE values (only with NX flipped to mean
"can execute") and call gpte_access at the end of the walk. This
lets us use pte_access to compute the exit qualification with XWR
bit order.
Wanpeng Li [Thu, 11 May 2017 09:58:56 +0000 (02:58 -0700)]
KVM: VMX: Don't enable EPT A/D feature if EPT feature is disabled
We can observe eptad kvm_intel module parameter is still Y
even if ept is disabled which is weird. This patch will
not enable EPT A/D feature if EPT feature is disabled.
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
SDM mentioned that "The MXCSR has several reserved bits, and attempting to write
a 1 to any of these bits will cause a general-protection exception(#GP) to be
generated". The syzkaller forks' testcase overrides xsave area w/ random values
and steps on the reserved bits of MXCSR register. The damaged MXCSR register
values of guest will be restored to SSEx MXCSR register before vmentry. This
patch fixes it by catching userspace override MXCSR register reserved bits w/
random values and bails out immediately.
Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Zhichao Huang [Thu, 11 May 2017 12:46:12 +0000 (13:46 +0100)]
KVM: arm: rename pm_fake handler to trap_raz_wi
pm_fake doesn't quite describe what the handler does (ignoring writes
and returning 0 for reads).
As we're about to use it (a lot) in a different context, rename it
with a (admitedly cryptic) name that make sense for all users.
Signed-off-by: Zhichao Huang <zhichao.huang@linaro.org> Reviewed-by: Alex Bennee <alex.bennee@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Hardware debugging in guests is not intercepted currently, it means
that a malicious guest can bring down the entire machine by writing
to the debug registers.
This patch enable trapping of all debug registers, preventing the
guests to access the debug registers. This includes access to the
debug mode(DBGDSCR) in the guest world all the time which could
otherwise mess with the host state. Reads return 0 and writes are
ignored (RAZ_WI).
The result is the guest cannot detect any working hardware based debug
support. As debug exceptions are still routed to the guest normal
debug using software based breakpoints still works.
To support debugging using hardware registers we need to implement a
debug register aware world switch as well as special trapping for
registers that may affect the host state.
Cc: stable@vger.kernel.org Signed-off-by: Zhichao Huang <zhichao.huang@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
In kvm_free_stage2_pgd() we check the stage2 PGD before holding
the lock and proceed to take the lock if it is valid. And we unmap
the page tables, followed by releasing the lock. We reset the PGD
only after dropping this lock, which could cause a race condition
where another thread waiting on or even holding the lock, could
potentially see that the PGD is still valid and proceed to perform
a stage2 operation and later encounter a NULL PGD.
This patch moves the stage2 PGD manipulation under the lock.
Reported-by: Alexander Graf <agraf@suse.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Marc Zyngier [Tue, 2 May 2017 13:30:41 +0000 (14:30 +0100)]
KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2 registers
The GICv3 documentation is extremely confusing, as it talks about
the number of priorities represented by the ICH_APxRn_EL2 registers,
while it should really talk about the number of preemption levels.
This leads to a bug where we may access undefined ICH_APxRn_EL2
registers, since PREbits is allowed to be smaller than PRIbits.
Thankfully, nobody seem to have taken this path so far...
The fix is to use ICH_VTR_EL2.PREbits instead.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Marc Zyngier [Tue, 2 May 2017 13:30:40 +0000 (14:30 +0100)]
KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt
When an interrupt is injected with the HW bit set (indicating that
deactivation should be propagated to the physical distributor),
special care must be taken so that we never mark the corresponding
LR with the Active+Pending state (as the pending state is kept in
the physycal distributor).
Cc: stable@vger.kernel.org Fixes: 73315df4e7de ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Marc Zyngier [Tue, 2 May 2017 13:30:39 +0000 (14:30 +0100)]
KVM: arm/arm64: vgic-v2: Do not use Active+Pending state for a HW interrupt
When an interrupt is injected with the HW bit set (indicating that
deactivation should be propagated to the physical distributor),
special care must be taken so that we never mark the corresponding
LR with the Active+Pending state (as the pending state is kept in
the physycal distributor).
Cc: stable@vger.kernel.org Fixes: 12c7421915e2 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Marc Zyngier [Tue, 2 May 2017 13:30:38 +0000 (14:30 +0100)]
arm: KVM: Do not use stack-protector to compile HYP code
We like living dangerously. Nothing explicitely forbids stack-protector
to be used in the HYP code, while distributions routinely compile their
kernel with it. We're just lucky that no code actually triggers the
instrumentation.
Let's not try our luck for much longer, and disable stack-protector
for code living at HYP.
Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Marc Zyngier [Tue, 2 May 2017 13:30:37 +0000 (14:30 +0100)]
arm64: KVM: Do not use stack-protector to compile EL2 code
We like living dangerously. Nothing explicitely forbids stack-protector
to be used in the EL2 code, while distributions routinely compile their
kernel with it. We're just lucky that no code actually triggers the
instrumentation.
Let's not try our luck for much longer, and disable stack-protector
for code living at EL2.
Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Marc Zyngier [Fri, 12 May 2017 10:04:52 +0000 (11:04 +0100)]
ARM: KVM: Fix tracepoint generation after move to virt/kvm/arm/
Moving most of the shared code to virt/kvm/arm had for consequence
that KVM/ARM doesn't build anymore, because the code that used to
define the tracepoints is now somewhere else.
Fix this by defining CREATE_TRACE_POINTS in coproc.c, and clean-up
trace.h as well.
Fixes: ec8d731a3fc7 ("KVM: arm/arm64: Move shared files to virt/kvm/arm") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Linus Torvalds [Sat, 13 May 2017 17:25:05 +0000 (10:25 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull some more input subsystem updates from Dmitry Torokhov:
"An updated xpad driver with a few more recognized device IDs, and a
new psxpad-spi driver, allowing connecting Playstation 1 and 2 joypads
via SPI bus"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: cros_ec_keyb - remove extraneous 'const'
Input: add support for PlayStation 1/2 joypads connected via SPI
Input: xpad - add USB IDs for Mad Catz Brawlstick and Razer Sabertooth
Input: xpad - sync supported devices with xboxdrv
Input: xpad - sort supported devices by USB ID
Linus Torvalds [Sat, 13 May 2017 17:23:12 +0000 (10:23 -0700)]
Merge tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI/UBIFS updates from Richard Weinberger:
- new config option CONFIG_UBIFS_FS_SECURITY
- minor improvements
- random fixes
* tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs:
ubi: Add debugfs file for tracking PEB state
ubifs: Fix a typo in comment of ioctl2ubifs & ubifs2ioctl
ubifs: Remove unnecessary assignment
ubifs: Fix cut and paste error on sb type comparisons
ubi: fastmap: Fix slab corruption
ubifs: Add CONFIG_UBIFS_FS_SECURITY to disable/enable security labels
ubi: Make mtd parameter readable
ubi: Fix section mismatch
Linus Torvalds [Sat, 13 May 2017 17:20:02 +0000 (10:20 -0700)]
Merge branch 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
"No new stuff, just fixes"
* 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Add missing NR_CPUS include
um: Fix to call read_initrd after init_bootmem
um: Include kbuild.h instead of duplicating its macros
um: Fix PTRACE_POKEUSER on x86_64
um: Set number of CPUs
um: Fix _print_addr()
Linus Torvalds [Sat, 13 May 2017 16:49:35 +0000 (09:49 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"15 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, docs: update memory.stat description with workingset* entries
mm: vmscan: scan until it finds eligible pages
mm, thp: copying user pages must schedule on collapse
dax: fix PMD data corruption when fault races with write
dax: fix data corruption when fault races with write
ext4: return to starting transaction in ext4_dax_huge_fault()
mm: fix data corruption due to stale mmap reads
dax: prevent invalidation of mapped DAX entries
Tigran has moved
mm, vmalloc: fix vmalloc users tracking properly
mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
gcov: support GCC 7.1
mm, vmstat: Remove spurious WARN() during zoneinfo print
time: delete current_fs_time()
hwpoison, memcg: forcibly uncharge LRU pages
With investigation, skipping page of isolate_lru_pages makes reclaim
void because it returns zero nr_taken easily so LRU shrinking is
effectively nothing and just increases priority aggressively. Finally,
OOM happens.
The problem is that get_scan_count determines nr_to_scan with eligible
zones so although priority drops to zero, it couldn't reclaim any pages
if the LRU contains mostly ineligible pages.
Assumes sc->priority is 0 and LRU list is as follows.
N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H
(Ie, small eligible pages are in the head of LRU but others are
almost ineligible pages)
In that case, size becomes 4 so VM want to scan 4 pages but 4 pages from
tail of the LRU are not eligible pages. If get_scan_count counts
skipped pages, it doesn't reclaim any pages remained after scanning 4
pages so it ends up OOM happening.
This patch makes isolate_lru_pages try to scan pages until it encounters
eligible zones's pages.
[akpm@linux-foundation.org: clean up mind-bending `for' statement. Tweak comment text] Fixes: 4c6b86eb9d02 ("Revert "mm, vmscan: account for skipped pages as a partial scan"") Link: http://lkml.kernel.org/r/1494457232-27401-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
grab_mapping_entry()
- we add huge zero page to the radix tree
and map it to page tables
The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.
Fix the problem by locking exception entry before mapping blocks for the
fault. That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).
Fixes: 724fa8d602970 ("dax: Call ->iomap_begin without entry lock during dax fault") Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 12 May 2017 22:46:57 +0000 (15:46 -0700)]
dax: fix data corruption when fault races with write
Currently DAX read fault can race with write(2) in the following way:
CPU1 - write(2) CPU2 - read fault
dax_iomap_pte_fault()
->iomap_begin() - sees hole
dax_iomap_rw()
iomap_apply()
->iomap_begin - allocates blocks
dax_iomap_actor()
invalidate_inode_pages2_range()
- there's nothing to invalidate
grab_mapping_entry()
- we add zero page in the radix tree
and map it to page tables
The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.
Fix the problem by locking exception entry before mapping blocks for the
fault. That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).
Fixes: 724fa8d602970d4e05b4c04731e1b5812eecc65c Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 12 May 2017 22:46:54 +0000 (15:46 -0700)]
ext4: return to starting transaction in ext4_dax_huge_fault()
DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes. To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().
Fixes: 724fa8d602970d4e05b4c04731e1b5812eecc65c Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 12 May 2017 22:46:50 +0000 (15:46 -0700)]
mm: fix data corruption due to stale mmap reads
Currently, we didn't invalidate page tables during invalidate_inode_pages2()
for DAX. That could result in e.g. 2MiB zero page being mapped into
page tables while there were already underlying blocks allocated and
thus data seen through mmap were different from data seen by read(2).
The following sequence reproduces the problem:
- open an mmap over a 2MiB hole
- read from a 2MiB hole, faulting in a 2MiB zero page
- write to the hole with write(3p). The write succeeds but we
incorrectly leave the 2MiB zero page mapping intact.
- via the mmap, read the data that was just written. Since the zero
page mapping is still intact we read back zeroes instead of the new
data.
Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.
Fixes: 6c8310064fa28c552eb202d9ab72be34cbcfdd18 Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Fri, 12 May 2017 22:46:47 +0000 (15:46 -0700)]
dax: prevent invalidation of mapped DAX entries
Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.
This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).
The series passes testing with t_mmap_stale test program from Ross and
also other mmap related tests on DAX filesystem.
This patch (of 4):
dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked. This is done via:
However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criteria which is that the page must not be
mapped. This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().
For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry. This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.
We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().
Fixes: 6c8310064fa2 ("mm: Invalidate DAX radix tree entries only if appropriate") Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> [4.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 12 May 2017 22:46:41 +0000 (15:46 -0700)]
mm, vmalloc: fix vmalloc users tracking properly
Commit ea8a428e4e58 ("mm, vmalloc: properly track vmalloc users") has
pulled asm/pgtable.h include dependency to linux/vmalloc.h and that
turned out to be a bad idea for some architectures. E.g. m68k fails
with
In file included from arch/m68k/include/asm/pgtable_mm.h:145:0,
from arch/m68k/include/asm/pgtable.h:4,
from include/linux/vmalloc.h:9,
from arch/m68k/kernel/module.c:9:
arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page':
>> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function)
#define pgd_offset_k(address) pgd_offset(&init_mm, address)
as spotted by kernel build bot. nios2 fails for other reason
In file included from include/asm-generic/io.h:767:0,
from arch/nios2/include/asm/io.h:61,
from include/linux/io.h:25,
from arch/nios2/include/asm/pgtable.h:18,
from include/linux/mm.h:70,
from include/linux/pid_namespace.h:6,
from include/linux/ptrace.h:9,
from arch/nios2/include/uapi/asm/elf.h:23,
from arch/nios2/include/asm/elf.h:22,
from include/linux/elf.h:4,
from include/linux/module.h:15,
from init/main.c:16:
include/linux/vmalloc.h: In function '__vmalloc_node_flags':
include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'?
which is due to the newly added #include <asm/pgtable.h>, which on nios2
includes <linux/io.h> and thus <asm/io.h> and <asm-generic/io.h> which
again includes <linux/vmalloc.h>.
Tweaking that around just turns out a bigger headache than necessary.
This patch reverts ea8a428e4e58 and reimplements the original fix in a
different way. __vmalloc_node_flags can stay static inline which will
cover vmalloc* functions. We only have one external user
(kvmalloc_node) and we can export __vmalloc_node_flags_caller and
provide the caller directly. This is much simpler and it doesn't really
need any games with header files.
SeongJae Park [Fri, 12 May 2017 22:46:38 +0000 (15:46 -0700)]
mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
One return case of `__collapse_huge_page_swapin()` does not invoke
tracepoint while every other return case does. This commit adds a
tracepoint invocation for the case.
Link: http://lkml.kernel.org/r/20170507101813.30187-1-sj38.park@gmail.com Signed-off-by: SeongJae Park <sj38.park@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reza Arbab [Fri, 12 May 2017 22:46:32 +0000 (15:46 -0700)]
mm, vmstat: Remove spurious WARN() during zoneinfo print
After commit 5b61b7818a88 ("mm, vmstat: print non-populated zones in
zoneinfo"), /proc/zoneinfo will show unpopulated zones.
A memoryless node, having no populated zones at all, was previously
ignored, but will now trigger the WARN() in is_zone_first_populated().
Remove this warning, as its only purpose was to warn of a situation that
has since been enabled.
Aside: The "per-node stats" are still printed under the first populated
zone, but that's not necessarily the first stanza any more. I'm not
sure which criteria is more important with regard to not breaking
parsers, but it looks a little weird to the eye.
Michal Hocko [Fri, 12 May 2017 22:46:26 +0000 (15:46 -0700)]
hwpoison, memcg: forcibly uncharge LRU pages
Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
his particular case he has hit a bad_page("page still charged to
cgroup") when onlining a hwpoison page. While this looks like something
that shouldn't happen in the first place because onlining hwpages and
returning them to the page allocator makes only little sense it shows a
real problem.
hwpoison pages do not get freed usually so we do not uncharge them (at
least not since commit 81c31c36ef54 ("mm: memcontrol: rewrite uncharge
API")). Each charge pins memcg (since 571987223af3 ("mm: memcontrol:
take a css reference for each charged page")) as well and so the
mem_cgroup and the associated state will never go away. Fix this leak
by forcibly uncharging a LRU hwpoisoned page in delete_from_lru_cache().
We also have to tweak uncharge_list because it cannot rely on zero ref
count for these pages.
Linus Torvalds [Fri, 12 May 2017 22:43:10 +0000 (15:43 -0700)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
libnvdimm 4.12 pull request:
- Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
The size regression is fixed by moving all dax helpers into the
dax-core and only specifying "select DAX" for FS_DAX and
dax-capable drivers. He also asked for clarification of the
NR_DEV_DAX config option which, on closer look, does not need to be
a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
for good measure.
- Ben's attention to detail on -stable patch submissions caught a
case where the recent fixes to arch_copy_from_iter_pmem() missed a
condition where we strand dirty data in the cache. This is tagged
for -stable and will also be included in the rework of the pmem api
to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.
- Vishal adds a feature that missed the initial pull due to pending
review feedback. It allows the kernel to clear media errors when
initializing a BTT (atomic sector update driver) instance on a pmem
namespace.
- Ross noticed that the dax_device + dax_operations conversion broke
__dax_zero_page_range(). The nvdimm unit tests fail to check this
path, but xfstests immediately trips over it. No excuse for missing
this before submitting the 4.12 pull request.
These all pass the nvdimm unit tests and an xfstests spot check. The
set has received a build success notification from the kbuild robot"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
filesystem-dax: fix broken __dax_zero_page_range() conversion
libnvdimm, btt: ensure that initializing metadata clears poison
libnvdimm: add an atomic vs process context flag to rw_bytes
x86, pmem: Fix cache flushing for iovec write < 8 bytes
device-dax: kill NR_DEV_DAX
block, dax: move "select DAX" from BLOCK to FS_DAX
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
Linus Torvalds [Fri, 12 May 2017 19:10:38 +0000 (12:10 -0700)]
Merge tag 'sound-fix-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains a one-liner change that has a significant impact:
disabling the build of OSS. It's been unmaintained for long time, and
we'd like to drop the stuff. Finally, as the first step, stop the
build. Let's see whether it works without much complaints.
Other than that, there are two small fixes for HD-audio"
* tag 'sound-fix-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
sound: Disable the build of OSS drivers
ALSA: hda: Fix cpu lockup when stopping the cmd dmas
ALSA: hda - Add mute led support for HP EliteBook 840 G3
Linus Torvalds [Fri, 12 May 2017 19:02:21 +0000 (12:02 -0700)]
Merge tag 'for-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull more power-supply updates from Sebastian Reichel:
"The power-supply subsystem has a few more changes for the v4.12 merge
window:
- New battery driver for AXP20X and AXP22X PMICs
- Improve max17042_battery for usage on x86
- Misc small cleanups & fixes"
* tag 'for-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (34 commits)
power: supply: cpcap-charger: Keep trickle charger bits disabled
power: supply: cpcap-charger: Fix enable for 3.8V charge setting
power: supply: cpcap-charger: Fix charge voltage configuration
power: supply: cpcap-charger: Fix charger name
power: supply: twl4030-charger: make twl4030_bci_property_is_writeable static
power: supply: sbs-battery: Add alert callback
mailmap: add Sebastian Reichel
power: supply: avoid unused twl4030-madc.h
power: supply: sbs-battery: Correct supply status with current draw
power: supply: sbs-battery: Don't ignore the first external power change
power: supply: pda_power: move from timer to delayed_work
power: supply: max17042_battery: Add support for the SCOPE property
power: supply: max17042_battery: Add support for the CHARGE_NOW property
power: supply: max17042_battery: Add support for the CHARGE_FULL_DESIGN property
power: supply: max17042_battery: mAh readings depend on r_sns value
power: supply: max17042_battery: Add support for the VOLT_MIN property
power: supply: max17042_battery: Add support for the TECHNOLOGY attribute
power: supply: max17042_battery: Add external_power_changed callback
power: supply: max17042_battery: Add support for the STATUS property
power: supply: max17042_battery: Add default platform_data fallback data
...
Linus Torvalds [Fri, 12 May 2017 18:58:45 +0000 (11:58 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:
- Fix a problem where orderly_shutdown() is called for multiple times
due to multiple critical overheating events raised in a short period
by platform thermal driver. (Keerthy)
- Introduce a backup thermal shutdown mechanism, which invokes
kernel_power_off()/emergency_restart() directly, after
orderly_shutdown() being issued for certain amount of time(specified
via Kconfig). This is useful in certain conditions that userspace may
be unable to power off the system in a clean manner and leaves the
system in a critical state, like in the middle of driver probing
phase. (Keerthy)
- Introduce a new interface in thermal devfreq_cooling code so that the
driver can provide more precise data regarding actual power to the
thermal governor every time the power budget is calculated. (Lukasz
Luba)
- Introduce BCM 2835 soc thermal driver and northstar thermal driver,
within a new sub-folder. (Rafał Miłecki)
- Small fix on MTK and intel-soc-dts thermal driver. (Dawei Chien,
Brian Bian)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
thermal: core: Add a back up thermal shutdown mechanism
thermal: core: Allow orderly_poweroff to be called only once
Thermal: Intel SoC DTS: Change interrupt request behavior
trace: thermal: add another parameter 'power' to the tracing function
thermal: devfreq_cooling: add new interface for direct power read
thermal: devfreq_cooling: refactor code and add get_voltage function
thermal: mt8173: minor mtk_thermal.c cleanups
thermal: bcm2835: move to the broadcom subdirectory
thermal: broadcom: ns: specify myself as MODULE_AUTHOR
thermal: da9062/61: Thermal junction temperature monitoring driver
Documentation: devicetree: thermal: da9062/61 TJUNC temperature binding
thermal: broadcom: add Northstar thermal driver
dt-bindings: thermal: add support for Broadcom's Northstar thermal
thermal: bcm2835: add thermal driver for bcm2835 SoC
dt-bindings: Add thermal zone to bcm2835-thermal example
thermal: rcar_gen3_thermal: add suspend and resume support
thermal: rcar_gen3_thermal: store device match data in private structure
thermal: rcar_gen3_thermal: enable hardware interrupts for trip points
thermal: rcar_gen3_thermal: record and check number of TSCs found
thermal: rcar_gen3_thermal: check that TSC exists before memory allocation
...
Linus Torvalds [Fri, 12 May 2017 18:48:26 +0000 (11:48 -0700)]
Merge tag 'drm-fixes-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"AMD, nouveau, one i915, and one EDID fix for v4.12-rc1
Some fixes that it would be good to have in rc1. It contains the i915
quiet fix that you reported.
It also has an amdgpu fixes pull, with lots of ongoing work on Vega10
which is new in this kernel and is preliminary support so may have a
fair bit of movement.
Otherwise a few non-Vega10 AMD fixes, one EDID fix and some nouveau
regression fixers"
* tag 'drm-fixes-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux: (144 commits)
drm/i915: Make vblank evade warnings optional
drm/nouveau/therm: remove ineffective workarounds for alarm bugs
drm/nouveau/tmr: avoid processing completed alarms when adding a new one
drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
drm/nouveau/tmr: handle races with hw when updating the next alarm time
drm/nouveau/tmr: ack interrupt before processing alarms
drm/nouveau/core: fix static checker warning
drm/nouveau/fb/ram/gf100-: remove 0x10f200 read
drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes
drm/nouveau/kms/nv50: fix source-rect-only plane updates
drm/nouveau/kms/nv50: remove pointless argument to window atomic_check_acquire()
drm/amd/powerplay: refine pwm1_enable callback functions for CI.
drm/amd/powerplay: refine pwm1_enable callback functions for vi.
drm/amd/powerplay: refine pwm1_enable callback functions for Vega10.
drm/amdgpu: refine amdgpu pwm1_enable sysfs interface.
drm/amdgpu: add amd fan ctrl mode enums.
drm/amd/powerplay: add more smu message on Vega10.
drm/amdgpu: fix dependency issue
drm/amd: fix init order of sched job
drm/amdgpu: add some additional vega10 pci ids
...
Linus Torvalds [Fri, 12 May 2017 18:44:13 +0000 (11:44 -0700)]
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"Things were a lot more calm than previously expected. It's primarily
fixes in various areas, with most of the new functionality centering
around TCMU backend driver work that Xiubo Li has been driving.
Here's the summary on the feature side:
- Make T10-PI verify configurable for emulated (FILEIO + RD) backends
(Dmitry Monakhov)
- Allow target-core/TCMU pass-through to use in-kernel SPC-PR logic
(Bryant Ly + MNC)
- Add TCMU support for growing ring buffer size (Xiubo Li + MNC)
- Add TCMU support for global block data pool (Xiubo Li + MNC)
and on the bug-fix side:
- Fix COMPARE_AND_WRITE non GOOD status handling for READ phase
failures (Gary Guo + nab)
- Fix iscsi-target hang with explicitly changing per NodeACL
CmdSN number depth with concurrent login driven session
reinstatement. (Gary Guo + nab)
- Fix ibmvscsis fabric driver ABORT task handling (Bryant Ly)
- Fix target-core/FILEIO zero length handling (Bart Van Assche)
Also, there was an OOPs introduced with the WRITE_VERIFY changes that
I ended up reverting at the last minute, because as not unusual Bart
and I could not agree on the fix in time for -rc1. Since it's specific
to a conformance test, it's been reverted for now.
There is a separate patch in the queue to address the underlying
control CDB write overflow regression in >= v4.3 separate from the
WRITE_VERIFY revert here, that will be pushed post -rc1"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (30 commits)
Revert "target: Fix VERIFY and WRITE VERIFY command parsing"
IB/srpt: Avoid that aborting a command triggers a kernel warning
IB/srpt: Fix abort handling
target/fileio: Fix zero-length READ and WRITE handling
ibmvscsis: Do not send aborted task response
tcmu: fix module removal due to stuck thread
target: Don't force session reset if queue_depth does not change
iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
target: Fix compare_and_write_callback handling for non GOOD status
tcmu: Recalculate the tcmu_cmd size to save cmd area memories
tcmu: Add global data block pool support
tcmu: Add dynamic growing data area feature support
target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
target: fixup error message in target_tg_pt_gp_alua_access_type_store()
target/user: PGR Support
target: Add WRITE_VERIFY_16
Documentation/target: add an example script to configure an iSCSI target
target: Use kmalloc_array() in transport_kmap_data_sg()
target: Use kmalloc_array() in compare_and_write_callback()
target: Improve size determinations in two functions
...
Linus Torvalds [Fri, 12 May 2017 18:39:59 +0000 (11:39 -0700)]
Merge branch 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Making sure that something like a referral point won't end up as pwd
or root.
The main part is the last commit (fixing mntns_install()); that one
fixes a hard-to-hit race. The fchdir() commit is making fchdir(2) a
bit more robust - it should be impossible to get opened files (even
O_PATH ones) for referral points in the first place, so the existing
checks are OK, but checking the same thing as in chdir(2) is just as
cheap.
The path_init() commit removes a redundant check that shouldn't have
been there in the first place"
* 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make sure that mntns_install() doesn't end up with referral for root
path_init(): don't bother with checking MAY_EXEC for LOOKUP_ROOT
make sure that fchdir() won't accept referral points, etc.
Linus Torvalds [Fri, 12 May 2017 17:45:36 +0000 (10:45 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates/fixes from Ingo Molnar:
"Mostly tooling updates, but also two kernel fixes: a call chain
handling robustness fix and an x86 PMU driver event definition fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/callchain: Force USER_DS when invoking perf_callchain_user()
tools build: Fixup sched_getcpu feature test
perf tests kmod-path: Don't fail if compressed modules aren't supported
perf annotate: Fix AArch64 comment char
perf tools: Fix spelling mistakes
perf/x86: Fix Broadwell-EP DRAM RAPL events
perf config: Refactor a duplicated code for obtaining config file name
perf symbols: Allow user probes on versioned symbols
perf symbols: Accept symbols starting at address 0
tools lib string: Adopt prefixcmp() from perf and subcmd
perf units: Move parse_tag_value() to units.[ch]
perf ui gtk: Move gtk .so name to the only place where it is used
perf tools: Move HAS_BOOL define to where perl headers are used
perf memswap: Split the byteswap memory range wrappers from util.[ch]
perf tools: Move event prototypes from util.h to event.h
perf buildid: Move prototypes from util.h to build-id.h
Linus Torvalds [Fri, 12 May 2017 17:41:45 +0000 (10:41 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull stackprotector fixlet from Ingo Molnar:
"A single fix/enhancement to increase stackprotector canary randomness
on 64-bit kernels with very little cost"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
Linus Torvalds [Fri, 12 May 2017 17:11:50 +0000 (10:11 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- two boot crash fixes
- unwinder fixes
- kexec related kernel direct mappings enhancements/fixes
- more Clang support quirks
- minor cleanups
- Documentation fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Fix a typo in Documentation
x86/build: Don't add -maccumulate-outgoing-args w/o compiler support
x86/boot/32: Fix UP boot on Quark and possibly other platforms
x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
x86/kexec/64: Use gbpages for identity mappings if available
x86/mm: Add support for gbpages to kernel_ident_mapping_init()
x86/boot: Declare error() as noreturn
x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility
x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds()
x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()
x86/microcode/AMD: Remove redundant NULL check on mc
Linus Torvalds [Fri, 12 May 2017 17:09:14 +0000 (10:09 -0700)]
Merge tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"This contains two fixes for booting under Xen introduced during this
merge window and two fixes for older problems, where one is just much
more probable due to another merge window change"
* tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: adjust early dom0 p2m handling to xen hypervisor behavior
x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen
xen/x86: Do not call xen_init_time_ops() until shared_info is initialized
x86/xen: fix xsave capability setting
Linus Torvalds [Fri, 12 May 2017 17:04:09 +0000 (10:04 -0700)]
Merge tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"The change to the Linux page table geometry was delayed for more
testing with 16G pages, and there's the new CPU features stuff which
just needed one more polish before going in. Plus a few changes from
Scott which came in a bit late. And then various fixes, mostly minor.
Summary highlights:
- rework the Linux page table geometry to lower memory usage on
64-bit Book3S (IBM chips) using the Hash MMU.
- support for a new device tree binding for discovering CPU features
on future firmwares.
- Freescale updates from Scott:
"Includes a fix for a powerpc/next mm regression on 64e, a fix for
a kernel hang on 64e when using a debugger inside a relocated
kernel, a qman fix, and misc qe improvements."
Thanks to: Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong,
Nicholas Piggin, Roy Pledge, Scott Wood, Valentin Longchamp"
* tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Support new device tree binding for discovering CPU features
powerpc: Don't print cpu_spec->cpu_name if it's NULL
of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle
powerpc/64s: Fix unnecessary machine check handler relocation branch
powerpc/mm/book3s/64: Rework page table geometry for lower memory usage
powerpc: Fix distclean with Makefile.postlink
powerpc/64e: Don't place the stack beyond TASK_SIZE
powerpc/powernv: Block PCI config access on BCM5718 during EEH recovery
powerpc/8xx: Adding support of IRQ in MPC8xx GPIO
soc/fsl/qbman: Disable IRQs for deferred QBMan work
soc/fsl/qe: add EXPORT_SYMBOL for the 2 qe_tdm functions
soc/fsl/qe: only apply QE_General4 workaround on affected SoCs
soc/fsl/qe: round brg_freq to 1kHz granularity
soc/fsl/qe: get rid of immrbar_virt_to_phys()
net: ethernet: ucc_geth: fix MEM_PART_MURAM mode
powerpc/64e: Fix hang when debugging programs with relocated kernel
Linus Torvalds [Fri, 12 May 2017 16:56:30 +0000 (09:56 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from James Hogan:
"math-emu:
- Add missing clearing of BLTZALL and BGEZALL emulation counters
- Fix BC1EQZ and BC1NEZ condition handling
- Fix BLEZL and BGTZL identification
BPF:
- Add JIT support for SKF_AD_HATYPE
- Use unsigned access for unsigned SKB fields
- Quit clobbering callee saved registers in JIT code
- Fix multiple problems in JIT skb access helpers
Loongson 3:
- Select MIPS_L1_CACHE_SHIFT_6
Octeon:
- Remove vestiges of CONFIG_CAVIUM_OCTEON_2ND_KERNEL
- Remove unused L2C types and macros.
- Remove unused SLI types and macros.
- Fix compile error when USB is not enabled.
- Octeon: Remove unused PCIERCX types and macros.
- Octeon: Clean up platform code.
SNI:
- Remove recursive include of cpu-feature-overrides.h
Sibyte:
- Export symbol periph_rev to sb1250-mac network driver.
- Fix Kconfig warning.
Generic platform:
- Enable Root FS on NFS in generic_defconfig
SMP-MT:
- Use CPU interrupt controller IPI IRQ domain support
UASM:
- Add support for LHU for uasm.
- Remove needless ISA abstraction
mm:
- Add 48-bit VA space and 4-level page tables for 4K pages.
PCI:
- Add controllers before the specified head
irqchip driver for MIPS CPU:
- Replace magic 0x100 with IE_SW0
- Prepare for non-legacy IRQ domains
- Introduce IPI IRQ domain support
MAINTAINERS:
- Update email-id of Rahul Bedarkar
NET:
- sb1250-mac: Add missing MODULE_LICENSE()
CPUFREQ:
- Loongson2: drop set_cpus_allowed_ptr()
Misc:
- Disable Werror when W= is set
- Opt into HAVE_COPY_THREAD_TLS
- Enable GENERIC_CPU_AUTOPROBE
- Use common outgoing-CPU-notification code
- Remove dead define of ST_OFF
- Remove CONFIG_ARCH_HAS_ILOG2_U{32,64}
- Stengthen IPI IRQ domain sanity check
- Remove confusing else statement in __do_page_fault()
- Don't unnecessarily include kmalloc.h into <asm/cache.h>.
- Delete unused definition of SMP_CACHE_SHIFT.
- Delete redundant definition of SMP_CACHE_BYTES"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (39 commits)
MIPS: Sibyte: Fix Kconfig warning.
MIPS: Sibyte: Export symbol periph_rev to sb1250-mac network driver.
NET: sb1250-mac: Add missing MODULE_LICENSE()
MAINTAINERS: Update email-id of Rahul Bedarkar
MIPS: Remove confusing else statement in __do_page_fault()
MIPS: Stengthen IPI IRQ domain sanity check
MIPS: smp-mt: Use CPU interrupt controller IPI IRQ domain support
irqchip: mips-cpu: Introduce IPI IRQ domain support
irqchip: mips-cpu: Prepare for non-legacy IRQ domains
irqchip: mips-cpu: Replace magic 0x100 with IE_SW0
MIPS: Remove CONFIG_ARCH_HAS_ILOG2_U{32,64}
MIPS: generic: Enable Root FS on NFS in generic_defconfig
MIPS: mach-rm: Remove recursive include of cpu-feature-overrides.h
MIPS: Opt into HAVE_COPY_THREAD_TLS
CPUFREQ: Loongson2: drop set_cpus_allowed_ptr()
MIPS: uasm: Remove needless ISA abstraction
MIPS: Remove dead define of ST_OFF
MIPS: Use common outgoing-CPU-notification code
MIPS: math-emu: Fix BC1EQZ and BC1NEZ condition handling
MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters
...
Linus Torvalds [Fri, 12 May 2017 16:53:16 +0000 (09:53 -0700)]
Merge tag 'nios2-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2
Pull nios2 updates from Ley Foon Tan:
"nios2 fixes/enhancements and adding nios2 R2 support"
* tag 'nios2-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: remove custom early console implementation
nios2: use generic strncpy_from_user() and strnlen_user()
nios2: Add CDX support
nios2: Add BMX support
nios2: Add NIOS2_ARCH_REVISION to select between R1 and R2
nios2: implement flush_dcache_mmap_lock/unlock
nios2: enable earlycon support
nios2: constify irq_domain_ops
nios2: remove wrapper header for cmpxchg.h
nios2: add .gitignore entries for auto-generated files
Takashi Iwai [Thu, 11 May 2017 09:14:45 +0000 (11:14 +0200)]
sound: Disable the build of OSS drivers
OSS drivers are left as badly unmaintained, and now we're facing a
problem to clean up the hackish set_fs() usage in their codes. Since
most of drivers have been covered by ALSA, and the others are dead old
and inactive, let's leave them RIP.
This patch is the first step: disable the build of OSS drivers.
We'll eventually drop the whole codes and clean up later.
Note that sound/oss/dmasound is still kept, since it's a completely
different implementation of OSS, and it doesn't suffer from set_fs()
hack. Moreover, the build of ALSA is disabled on M68K by some reason,
thus disabling it shall result in a regression. This one will be
disabled / removed once when we add the support in ALSA side.
Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Paul Mackerras [Thu, 11 May 2017 04:31:59 +0000 (14:31 +1000)]
KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms
Commit 7b9b9c90a0f7 ("KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64
permanently", 2017-03-22) enabled the SPAPR TCE code for all 64-bit
Book 3S kernel configurations in order to simplify the code and
reduce #ifdefs. However, 64-bit Book 3S PPC platforms other than
pseries and powernv don't implement the necessary IOMMU callbacks,
leading to build failures like the following (for a pasemi config):
scripts/kconfig/conf --silentoldconfig Kconfig
warning: (KVM_BOOK3S_64) selects SPAPR_TCE_IOMMU which has unmet direct dependencies (IOMMU_SUPPORT && (PPC_POWERNV || PPC_PSERIES))
...
CC [M] arch/powerpc/kvm/book3s_64_vio.o
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c: In function ‘kvmppc_clear_tce’:
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c:363:2: error: implicit declaration of function ‘iommu_tce_xchg’ [-Werror=implicit-function-declaration]
iommu_tce_xchg(tbl, entry, &hpa, &dir);
^
To fix this, we make the inclusion of the SPAPR TCE support, and the
code that uses it in book3s_vio.c and book3s_vio_hv.c, depend on
the inclusion of support for the pseries and/or powernv platforms.
This means that when running a 'pseries' guest on those platforms,
the guest won't have in-kernel acceleration of the PAPR TCE hypercalls,
but at least now they compile.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The PR KVM implementation of the PAPR HPT hypercalls (H_ENTER etc.)
access an image of the HPT in userspace memory using copy_from_user
and copy_to_user. Recently, the declarations of those functions were
annotated to indicate that the return value must be checked. Since
this code doesn't currently check the return value, this causes
compile warnings like the ones shown below, and since on PPC the
default is to compile arch/powerpc with -Werror, this causes the
build to fail.
To fix this, we check the return values, and if non-zero, fail the
hypercall being processed with a H_FUNCTION error return value.
There is really no good error return value to use since PAPR didn't
envisage the possibility that the hypervisor may not be able to access
the guest's HPT, and H_FUNCTION (function not supported) seems as
good as any.
The typical compile warnings look like this:
CC arch/powerpc/kvm/book3s_pr_papr.o
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c: In function ‘kvmppc_h_pr_enter’:
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c:53:2: error: ignoring return value of ‘copy_from_user’, declared with attribute warn_unused_result [-Werror=unused-result]
copy_from_user(pteg, (void __user *)pteg_addr, sizeof(pteg));
^
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c:74:2: error: ignoring return value of ‘copy_to_user’, declared with attribute warn_unused_result [-Werror=unused-result]
copy_to_user((void __user *)pteg_addr, hpte, HPTE_SIZE);
^
POWER9 running a radix guest will take some hypervisor interrupts
without going to real mode (turning off the MMU). This means that
early hypercall handlers may now be called in virtual mode. Most of
the handlers work just fine in both modes, but there are some that
can crash the host if called in virtual mode, notably the TCE (IOMMU)
hypercalls H_PUT_TCE, H_STUFF_TCE and H_PUT_TCE_INDIRECT. These
already have both a real-mode and a virtual-mode version, so we
arrange for the real-mode version to return H_TOO_HARD for radix
guests, which will result in the virtual-mode version being called.
The other hypercall which is sensitive to the MMU mode is H_RANDOM.
It doesn't have a virtual-mode version, so this adds code to enable
it to be called in either mode.
An alternative solution was considered which would refuse to call any
of the early hypercall handlers when doing a virtual-mode exit from a
radix guest. However, the XICS-on-XIVE code depends on the XICS
hypercalls being handled early even for virtual-mode exits, because
the handlers need to be called before the XIVE vCPU state has been
pulled off the hardware. Therefore that solution would have become
quite invasive and complicated, and was rejected in favour of the
simpler, though less elegant, solution presented here.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Ville Syrjälä [Sun, 7 May 2017 17:12:52 +0000 (20:12 +0300)]
drm/i915: Make vblank evade warnings optional
Add a new Kconfig option to enable/disable the extra warnings
from the vblank evade code. For now we'll keep the warning
about an actually missed vblank always enabled as that can have
an actual user visible impact. But if we miss the deadline
othrwise there's no real need to bother the user with that.
We'll want these warnings enabled during development however
so that we can catch regressions.
Based on the reports it looks like this is still very easy
to hit on SKL, so we have more work ahead of us to optimize
the crtiical section further.
Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reported-by: Jens Axboe <axboe@kernel.dk> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 7986f0499fc6 ("drm/i915: Complain if we take too long under vblank evasion.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 12 May 2017 04:25:22 +0000 (14:25 +1000)]
Merge branch 'linux-4.12' of git://github.com/skeggsb/linux into drm-next
Quite a few patches, but not much code changed:
- Fixes regression from atomic when only the source rect of a plane
changes (ie. xrandr --right-of)
- Fixes another issue where atomic changed behaviour underneath us,
potentially causing laggy cursor position updates
- Fixes for a bunch of races in thermal code, which lead to random
lockups for a lot of users
* 'linux-4.12' of git://github.com/skeggsb/linux:
drm/nouveau/therm: remove ineffective workarounds for alarm bugs
drm/nouveau/tmr: avoid processing completed alarms when adding a new one
drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
drm/nouveau/tmr: handle races with hw when updating the next alarm time
drm/nouveau/tmr: ack interrupt before processing alarms
drm/nouveau/core: fix static checker warning
drm/nouveau/fb/ram/gf100-: remove 0x10f200 read
drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes
drm/nouveau/kms/nv50: fix source-rect-only plane updates
drm/nouveau/kms/nv50: remove pointless argument to window atomic_check_acquire()
Dave Airlie [Fri, 12 May 2017 03:58:29 +0000 (13:58 +1000)]
Merge branch 'drm-next-4.12' of git://people.freedesktop.org/~agd5f/linux into drm-next
Fixes for 4.12. This is a bit bigger than usual since it's 3 weeks
worth of fixes and most of these changes are for vega10 which is
new for 4.12 and still in a fair amount of flux. It looks like
you missed my last pull request, so those patches are included here
as well. Highlights:
- Lots of vega10 fixes
- Fix interruptable wait mixup
- Fan control method fixes
- Misc display fixes for radeon and amdgpu
- Misc bug fixes
* 'drm-next-4.12' of git://people.freedesktop.org/~agd5f/linux: (132 commits)
drm/amd/powerplay: refine pwm1_enable callback functions for CI.
drm/amd/powerplay: refine pwm1_enable callback functions for vi.
drm/amd/powerplay: refine pwm1_enable callback functions for Vega10.
drm/amdgpu: refine amdgpu pwm1_enable sysfs interface.
drm/amdgpu: add amd fan ctrl mode enums.
drm/amd/powerplay: add more smu message on Vega10.
drm/amdgpu: fix dependency issue
drm/amd: fix init order of sched job
drm/amdgpu: add some additional vega10 pci ids
drm/amdgpu/soc15: use atomfirmware for setting bios scratch for reset
drm/amdgpu/atomfirmware: add function to update engine hang status
drm/radeon: only warn once in radeon_ttm_bo_destroy if va list not empty
drm/amdgpu: fix mutex list null pointer reference
drm/amd/powerplay: fix bug sclk/mclk level can't be set on vega10.
drm/amd/powerplay: Setup sw CTF to allow graceful exit when temperature exceeds maximum.
drm/amd/powerplay: delete dead code in powerplay.
drm/amdgpu: Use less generic enum definitions
drm/amdgpu/gfx9: derive tile pipes from golden settings
drm/amdgpu/gfx: drop max_gs_waves_per_vgt
drm/amd/powerplay: disable engine spread spectrum feature on Vega10.
...
Dave Airlie [Fri, 12 May 2017 03:57:20 +0000 (13:57 +1000)]
Merge tag 'drm-misc-next-fixes-2017-05-05' of git://anongit.freedesktop.org/git/drm-misc into drm-next
Core Changes:
- Add quirk for LGD 764 panel to default 10bpc (Mario)
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
* tag 'drm-misc-next-fixes-2017-05-05' of git://anongit.freedesktop.org/git/drm-misc:
drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2
Ben Skeggs [Thu, 11 May 2017 07:13:29 +0000 (17:13 +1000)]
drm/nouveau/tmr: avoid processing completed alarms when adding a new one
The idea here was to avoid having to "manually" program the HW if there's
a new earliest alarm. This was lazy and bad, as it leads to loads of fun
races between inter-related callers (ie. therm).
Turns out, it's not so difficult after all. Go figure ;)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org
This "optimisation" (which was originally meant to skip updating cursor
settings in the core channel on position-only updates) turned out to be
pointless in the final design of the code before it was merged.
Remove it completely, as it breaks other cases.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org [4.10+]
Linus Torvalds [Thu, 11 May 2017 18:29:52 +0000 (11:29 -0700)]
Merge tag 'docs-4.12-2' of git://git.lwn.net/linux
Pull more documentation updates from Jonathan Corbet:
"Connect the newly RST-formatted documentation to the rest; this had to
wait until the input pull was done. There's also a few small fixes
that wandered in"
* tag 'docs-4.12-2' of git://git.lwn.net/linux:
doc: replace FTP URL to kernel.org with HTTPS one
docs: update references to the device io book
Documentation: earlycon: fix Marvell Armada 3700 UART name
docs-rst: add input docs at main index and use kernel-figure
Linus Torvalds [Thu, 11 May 2017 18:01:56 +0000 (11:01 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A smaller collection of fixes that should go into -rc1. This contains:
- A fix from Christoph, fixing a regression with the WRITE_SAME and
partial completions. Caused a BUG() on ppc.
- Fixup for __blk_mq_stop_hw_queues(), it should be static. From
Colin.
- Removal of dmesg error messages on elevator switching, when invoked
from sysfs. From me.
- Fix for blk-stat, using this_cpu_ptr() in a section only protected
by rcu_read_lock(). This breaks when PREEMPT_RCU is enabled. From
me.
- Two fixes for BFQ from Paolo, one fixing a crash and one updating
the documentation.
- An error handling lightnvm memory leak, from Rakesh.
- The previous blk-mq hot unplug lock reversal depends on the CPU
hotplug rework that isn't in mainline yet. This caused a lockdep
splat when people unplugged CPUs with blk-mq devices. From Wanpeng.
- A regression fix for DIF/DIX on blk-mq. From Wen"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: handle partial completions for special payload requests
blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op
blk-stat: don't use this_cpu_ptr() in a preemptable section
elevator: remove redundant warnings on IO scheduler switch
block, bfq: stress that low_latency must be off to get max throughput
block, bfq: use pointer entity->sched_data only if set
nvme: lightnvm: fix memory leak
blk-mq: make __blk_mq_stop_hw_queues static
lightnvm: remove unused rq parameter of nvme_nvm_rqtocmd() to kill warning
block/mq: fix potential deadlock during cpu hotplug
Linus Torvalds [Thu, 11 May 2017 17:44:22 +0000 (10:44 -0700)]
Merge tag 'for-linus-20170510' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
"NAND, from Boris:
- some minor fixes/improvements on existing drivers (fsmc, gpio, ifc,
davinci, brcmnand, omap)
- a huge cleanup/rework of the denali driver accompanied with core
fixes/improvements to simplify the driver code
- a complete rewrite of the atmel driver to support new DT bindings
make future evolution easier
- the addition of per-vendor detection/initialization steps to avoid
extending the nand_ids table with more extended-id entries
SPI NOR, from Cyrille:
- fixes in the hisi, intel and Mediatek SPI controller drivers
- fixes to some SPI flash memories not supporting the Chip Erase
command.
- add support to some new memory parts (Winbond, Macronix, Micron,
ESMT).
- add new driver for the STM32 QSPI controller
And a few fixes for Gemini and Versatile platforms on physmap-of"
* tag 'for-linus-20170510' of git://git.infradead.org/linux-mtd: (100 commits)
MAINTAINERS: Update NAND subsystem git repositories
mtd: nand: gpio: update binding
mtd: nand: add ooblayout for old hamming layout
mtd: oxnas_nand: Allocating more than necessary in probe()
dt-bindings: mtd: Document the STM32 QSPI bindings
mtd: mtk-nor: set controller's address width according to nor flash
mtd: spi-nor: add driver for STM32 quad spi flash controller
mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program
mtd: nand: davinci: add comment on NAND subpage write status on keystone
mtd: nand: omap2: Fix partition creation via cmdline mtdparts
mtd: nand: NULL terminate a of_device_id table
mtd: nand: Fix a couple error codes
mtd: nand: allow drivers to request minimum alignment for passed buffer
mtd: nand: allocate aligned buffers if NAND_OWN_BUFFERS is unset
mtd: nand: denali: allow to override revision number
mtd: nand: denali_dt: use pdev instead of ofdev for platform_device
mtd: nand: denali_dt: remove dma-mask DT property
mtd: nand: denali: support 64bit capable DMA engine
mtd: nand: denali_dt: enable HW_ECC_FIXUP for Altera SOCFPGA variant
mtd: nand: denali: support HW_ECC_FIXUP capability
...
block: handle partial completions for special payload requests
SCSI devices can return short writes on Write Same just like for normal
writes, so we need to handle this case for our special payload requests
as well.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Juergen Gross [Wed, 10 May 2017 04:08:44 +0000 (06:08 +0200)]
xen: adjust early dom0 p2m handling to xen hypervisor behavior
When booted as pv-guest the p2m list presented by the Xen is already
mapped to virtual addresses. In dom0 case the hypervisor might make use
of 2M- or 1G-pages for this mapping. Unfortunately while being properly
aligned in virtual and machine address space, those pages might not be
aligned properly in guest physical address space.
So when trying to obtain the guest physical address of such a page
pud_pfn() and pmd_pfn() must be avoided as those will mask away guest
physical address bits not being zero in this special case.
x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen
When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set
on AMD cpus.
This bug/feature bit is kind of special as it will be used very early
when switching threads. Setting the bit and clearing it a little bit
later leaves a critical window where things can go wrong. This time
window has enlarged a little bit by using setup_clear_cpu_cap() instead
of the hypervisor's set_cpu_features callback. It seems this larger
window now makes it rather easy to hit the problem.
The proper solution is to never set the bit in case of Xen.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>
arm64: Silence first allocation with CONFIG_ARM64_MODULE_PLTS=y
When CONFIG_ARM64_MODULE_PLTS is enabled, the first allocation using the
module space fails, because the module is too big, and then the module
allocation is attempted from vmalloc space. Silence the first allocation
failure in that case by setting __GFP_NOWARN.
ARM: Silence first allocation with CONFIG_ARM_MODULE_PLTS=y
When CONFIG_ARM_MODULE_PLTS is enabled, the first allocation using the
module space fails, because the module is too big, and then the module
allocation is attempted from vmalloc space. Silence the first allocation
failure in that case by setting __GFP_NOWARN.
Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
mm: Silence vmap() allocation failures based on caller gfp_flags
If the caller has set __GFP_NOWARN don't print the following message:
vmap allocation for size 15736832 failed: use vmalloc=<size> to increase
size.
This can happen with the ARM/Linux or ARM64/Linux module loader built
with CONFIG_ARM{,64}_MODULE_PLTS=y which does a first attempt at loading
a large module from module space, then falls back to vmalloc space.
Tobias Klauser [Thu, 11 May 2017 09:40:16 +0000 (11:40 +0200)]
nios2: remove custom early console implementation
As of commits d8f347ba35cf ("nios2: enable earlycon support"), a1439015d9b8 ("serial: altera_jtaguart: add earlycon support") and 1202049a279b ("serial: altera_uart: add earlycon support"), the nios2
architecture and the altera_uart/altera_jtaguart drivers support
earlycon. Thus, the custom early console implementation for nios2 is no
longer necessary to get early boot messages. Remove it and rely fully on
earlycon support.
Author: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Thu Mar 30 10:12:39 2017 -0700
target: Fix VERIFY and WRITE VERIFY command parsing
This patch broke existing behaviour for WRITE_VERIFY because
it dropped the original SCF_SCSI_DATA_CDB assignment for
bytchk = 0 so target_cmd_size_check() no longer rejected
this case, allowing an overflow case to trigger an OOPs
in iscsi-target.
Since the short term and long term fixes are still being
discussed, revert it for now since it's late in the merge
window and try again in v4.13-rc1.
Conflicts:
drivers/target/target_core_sbc.c
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The conversion of __dax_zero_page_range() to 'struct dax_operations'
caused it to frequently fail. The mistake was treating the @size
parameter as a dax mapping length rather than just a length of the
clear_pmem() operation. The dax mapping length is assumed to be hard
coded as PAGE_SIZE.
Without this fix any page unaligned zeroing request will trigger a
-EINVAL return from bdev_dax_pgoff().
Cc: Jan Kara <jack@suse.com> Cc: Christoph Hellwig <hch@lst.de> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Fixes: d3d097939229 ("filesystem-dax: convert to dax_direct_access()") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Vishal Verma [Wed, 10 May 2017 21:01:31 +0000 (15:01 -0600)]
libnvdimm, btt: ensure that initializing metadata clears poison
If we had badblocks/poison in the metadata area of a BTT, recreating the
BTT would not clear the poison in all cases, notably the flog area. This
is because rw_bytes will only clear errors if the request being sent
down is 512B aligned and sized.
Make sure that when writing the map and info blocks, the rw_bytes being
sent are of the correct size/alignment. For the flog, instead of doing
the smaller log_entry writes only, first do a 'wipe' of the entire area
by writing zeroes in large enough chunks so that errors get cleared.
Cc: Andy Rudoff <andy.rudoff@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Vishal Verma [Wed, 10 May 2017 21:01:30 +0000 (15:01 -0600)]
libnvdimm: add an atomic vs process context flag to rw_bytes
nsio_rw_bytes can clear media errors, but this cannot be done while we
are in an atomic context due to locking within ACPI. From the BTT,
->rw_bytes may be called either from atomic or process context depending
on whether the calls happen during initialization or during IO.
During init, we want to ensure error clearing happens, and the flag
marking process context allows nsio_rw_bytes to do that. When called
during IO, we're in atomic context, and error clearing can be skipped.
Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linus Torvalds [Thu, 11 May 2017 03:45:36 +0000 (20:45 -0700)]
Merge tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild UAPI updates from Masahiro Yamada:
"Improvement of headers_install by Nicolas Dichtel.
It has been long since the introduction of uapi directories, but the
de-coupling of exported headers has not been completed. Headers listed
in header-y are exported whether they exist in uapi directories or
not. His work fixes this inconsistency.
All (and only) headers under uapi directories are now exported. The
asm-generic wrappers are still exceptions, but this is a big step
forward"
* tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
arch/include: remove empty Kbuild files
uapi: export all arch specifics directories
uapi: export all headers under uapi directories
smc_diag.h: fix include from userland
btrfs_tree.h: fix include from userland
uapi: includes linux/types.h before exporting files
Makefile.headersinst: remove destination-y option
Makefile.headersinst: cleanup input files
x86: stop exporting msr-index.h to userland
nios2: put setup.h in uapi
h8300: put bitsperlong.h in uapi
Linus Torvalds [Thu, 11 May 2017 03:41:43 +0000 (20:41 -0700)]
Merge tag 'kbuild-misc-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull misc Kbuild updates from Masahiro Yamada:
- clean up builddeb script
- use full path for KBUILD_IMAGE to fix rpm-pkg build
- fix objdiff tool to ignore debug info
* tag 'kbuild-misc-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
builddeb: fix typo
builddeb: Update a few outdated and hardcoded strings
deb-pkg: Remove the KBUILD_IMAGE workaround
unicore32: Use full path in KBUILD_IMAGE definition
sh: Use full path in KBUILD_IMAGE definition
arc: Use full path in KBUILD_IMAGE definition
arm: Use full path in KBUILD_IMAGE definition
arm64: Use full path in KBUILD_IMAGE definition
scripts: objdiff: Ignore debug info when comparing
- improve compiler flag evaluation for better build performance
- fix GCC version-dependent warning
- fix genksyms
* tag 'kbuild-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (23 commits)
kbuild: dtbinst: remove unnecessary __dtbs_install_prep target
ia64: beatify build log for gate.so and gate-syms.o
alpha: make short build log available for division routines
alpha: merge build rules of division routines
alpha: add $(src)/ rather than $(obj)/ to make source file path
Makefile: evaluate LDFLAGS_BUILD_ID only once
objtool: make it visible in make V=1 output
kbuild: clang: add -no-integrated-as to KBUILD_[AC]FLAGS
kbuild: Add support to generate LLVM assembly files
kbuild: Add better clang cross build support
kbuild: drop -Wno-unknown-warning-option from clang options
kbuild: fix asm-offset generation to work with clang
kbuild: consolidate redundant sed script ASM offset generation
frv: Use OFFSET macro in DEF_*REG()
kbuild: avoid conflict between -ffunction-sections and -pg on gcc-4.7
kbuild: Consolidate header generation from ASM offset information
kbuild: use -Oz instead of -Os when using clang
kbuild, LLVMLinux: Add -Werror to cc-option to support clang
Kbuild: make designated_init attribute fatal
kbuild: drop unneeded patterns '.*.orig' and '.*.rej' from distclean
...