Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Prevent multiplication result truncation on 32bit. Introduced with
the early timestamp reworrk.
- Ensure microcode revision storage to be consistent under all
circumstances
- Prevent write tearing of PTEs
- Prevent confusion of user and kernel reegisters when dumping fatal
signals verbosely
- Make an error return value in a failure path of the vector
allocation negative. Returning EINVAL might the caller assume
success and causes further wreckage.
- A trivial kernel doc warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Use WRITE_ONCE() when setting PTEs
x86/apic/vector: Make error return value negative
x86/process: Don't mix user/kernel regs in 64bit __show_regs()
x86/tsc: Prevent result truncation on 32bit
x86: Fix kernel-doc atomic.h warnings
x86/microcode: Update the new microcode revision unconditionally
x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping fixes from Thomas Gleixner:
"Two fixes for timekeeping:
- Revert to the previous kthread based update, which is unfortunately
required due to lock ordering issues. The removal caused boot
failures on old Core2 machines. Add a proper comment why the thread
needs to stay to prevent accidental removal in the future.
- Fix a silly typo in a function declaration"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Revert "Remove kthread"
timekeeping: Fix declaration of read_persistent_wall_and_boot_offset()
Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu hotplug fixes from Thomas Gleixner:
"Two fixes for the hotplug state machine code:
- Move the misplaces smb() in the hotplug thread function to the
proper place, otherwise a half update control struct could be
observed
- Prevent state corruption on error rollback, which causes the state
to advance by one and as a consequence skip it in the bringup
sequence"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Prevent state corruption on error rollback
cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random driver fix from Ted Ts'o:
"Fix things so the choice of whether or not to trust RDRAND to
initialize the CRNG is configurable via the boot option
random.trust_cpu={on,off}"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: make CPU trust a boot parameter
Merge tag 'kbuild-fixes-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- make setlocalversion more robust about -dirty check
- loosen the pkg-config requirement for Kconfig
- change missing depmod to a warning from an error
- warn modules_install when System.map is missing
* tag 'kbuild-fixes-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: modules_install: warn when missing System.map file
kbuild: make missing $DEPMOD a Warning instead of an Error
kconfig: do not require pkg-config on make {menu,n}config
kconfig: remove a spurious self-assignment
scripts/setlocalversion: git: Make -dirty check more robust
Randy Dunlap [Thu, 6 Sep 2018 23:37:24 +0000 (16:37 -0700)]
kbuild: modules_install: warn when missing System.map file
If there is no System.map file for "make modules_install",
scripts/depmod.sh will silently exit with success, having done
nothing. Since this is an unexpected situation, change it to
report a Warning for the missing file. The behavior is not
changed except for the Warning message.
The (previous) silent success and new Warning can be reproduced
by:
$ make mrproper; make defconfig
$ make modules; make modules_install
and since System.map is produced by "make vmlinux", the steps
above omit producing the System.map file.
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a VFP corruption in 32-bit guest
- Add missing cache invalidation for CoW pages
- Two small cleanups
s390:
- Fallout from the hugetlbfs support: pfmf interpretion and locking
- VSIE: fix keywrapping for nested guests
PPC:
- Fix a bug where pages might not get marked dirty, causing guest
memory corruption on migration
- Fix a bug causing reads from guest memory to use the wrong guest
real address for very large HPT guests (>256G of memory), leading
to failures in instruction emulation.
x86:
- Fix out of bound access from malicious pv ipi hypercalls
(introduced in rc1)
- Fix delivery of pending interrupts when entering a nested guest,
preventing arbitrarily late injection
- Sanitize kvm_stat output after destroying a guest
- Fix infinite loop when emulating a nested guest page fault and
improve the surrounding emulation code
- Two minor cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: LAPIC: Fix pv ipis out-of-bounds access
KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2
arm64: KVM: Remove pgd_lock
KVM: Remove obsolete kvm_unmap_hva notifier backend
arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW
KVM: s390: Properly lock mm context allow_gmap_hpage_1m setting
KVM: s390: vsie: copy wrapping keys to right place
KVM: s390: Fix pfmf and conditional skey emulation
tools/kvm_stat: re-animate display of dead guests
tools/kvm_stat: indicate dead guests as such
tools/kvm_stat: handle guest removals more gracefully
tools/kvm_stat: don't reset stats when setting PID filter for debugfs
tools/kvm_stat: fix updates for dead guests
tools/kvm_stat: fix handling of invalid paths in debugfs provider
tools/kvm_stat: fix python3 issues
KVM: x86: Unexport x86_emulate_instruction()
KVM: x86: Rename emulate_instruction() to kvm_emulate_instruction()
KVM: x86: Do not re-{try,execute} after failed emulation in L2
KVM: x86: Default to not allowing emulation retry in kvm_mmu_page_fault
...
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A few more fixes who have trickled in:
- MMC bus width fixup for some Allwinner platforms
- Fix for NULL deref in ti-aemif when no platform data is passed in
- Fix div by 0 in SCMI code
- Add a missing module alias in a new RPi driver"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
memory: ti-aemif: fix a potential NULL-pointer dereference
firmware: arm_scmi: fix divide by zero when sustained_perf_level is zero
hwmon: rpi: add module alias to raspberrypi-hwmon
arm64: allwinner: dts: h6: fix Pine H64 MMC bus width
Nadav Amit [Sun, 2 Sep 2018 18:14:50 +0000 (11:14 -0700)]
x86/mm: Use WRITE_ONCE() when setting PTEs
When page-table entries are set, the compiler might optimize their
assignment by using multiple instructions to set the PTE. This might
turn into a security hazard if the user somehow manages to use the
interim PTE. L1TF does not make our lives easier, making even an interim
non-present PTE a security hazard.
Using WRITE_ONCE() to set PTEs and friends should prevent this potential
security hazard.
I skimmed the differences in the binary with and without this patch. The
differences are (obviously) greater when CONFIG_PARAVIRT=n as more
code optimizations are possible. For better and worse, the impact on the
binary with this patch is pretty small. Skimming the code did not cause
anything to jump out as a security hazard, but it seems that at least
move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
Thomas Gleixner [Sat, 8 Sep 2018 10:07:26 +0000 (12:07 +0200)]
x86/apic/vector: Make error return value negative
activate_managed() returns EINVAL instead of -EINVAL in case of
error. While this is unlikely to happen, the positive return value would
cause further malfunction at the call site.
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- bugfixes for uniphier, i801, and xiic drivers
- ID removal (never produced) for imx
- one MAINTAINER addition
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: xiic: Record xilinx i2c with Zynq fragment
i2c: xiic: Make the start and the byte count write atomic
i2c: i801: fix DNV's SMBCTRL register offset
i2c: imx-lpi2c: Remove mx8dv compatible entry
dt-bindings: imx-lpi2c: Remove mx8dv compatible entry
i2c: uniphier-f: issue STOP only for last message or I2C_M_STOP
i2c: uniphier: issue STOP only for last message or I2C_M_STOP
* tag 'arc-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: don't check for HIGHMEM pages in arch_dma_alloc
ARC: IOC: panic if both IOC and ZONE_HIGHMEM enabled
ARC: dma [IOC] Enable per device io coherency
ARC: dma [IOC]: mark DMA devices connected as dma-coherent
ARC: atomics: unbork atomic_fetch_##op()
arc: remove redundant GCC version checks
ARC: sort Kconfig
ARC: cleanup show_faulting_vma()
ARC: [plat-axs*]: Enable SWAP
ARC: [plat-axs*/plat-hsdk]: Allow U-Boot to pass MAC-address to the kernel
ARC: configs: cleanup
David Howells [Fri, 7 Sep 2018 22:55:17 +0000 (23:55 +0100)]
afs: Fix cell specification to permit an empty address list
Fix the cell specification mechanism to allow cells to be pre-created
without having to specify at least one address (the addresses will be
upcalled for).
This allows the cell information preload service to avoid the need to issue
loads of DNS lookups during boot to get the addresses for each cell (500+
lookups for the 'standard' cell list[*]). The lookups can be done later as
each cell is accessed through the filesystem.
Also remove the print statement that prints a line every time a new cell is
added.
[*] There are 144 cells in the list. Each cell is first looked up for an
SRV record, and if that fails, for an AFSDB record. These get a list
of server names, each of which then has to be looked up to get the
addresses for that server. E.g.:
dig srv _afs3-vlserver._udp.grand.central.org
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'md/4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
- Fix a locking issue for md-cluster (Guoqing)
- Fix a sync crash for raid10 (Ni)
- Fix a reshape bug with raid5 cache enabled (me)
* tag 'md/4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md-cluster: release RESYNC lock after the last resync message
RAID10 BUG_ON in raise_barrier when force is true and conf->barrier is 0
md/raid5-cache: disable reshape completely
Merge tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two rbd patches to complete support for images within namespaces that
went into -rc1 and a use-after-free fix.
The rbd changes have been sitting in a branch for quite a while but
couldn't be included into the -rc1 pull request because of a pending
wire protocol backwards compatibility fixup that only got committed
early this week"
* tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-client:
rbd: support cloning across namespaces
rbd: factor out get_parent_info()
ceph: avoid a use-after-free in ceph_destroy_options()
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"Just one small fix here, preventing a VM_WARN_ON when a !present
PMD/PUD is "freed" as part of a huge ioremap() operation.
The correct behaviour is to skip the free silently in this case, which
is a little weird (the function is a bit of a misnomer), but it
follows the x86 implementation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fix erroneous warnings in page freeing functions
Merge tag 'acpi-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a regression from the 4.18 cycle in the ACPI driver for
Intel SoCs (LPSS) and prevent dmi_check_system() from being called on
non-x86 systems in the ACPI core.
Specifics:
- Fix a power management regression in the ACPI driver for Intel SoCs
(LPSS) introduced by a system-wide suspend/resume fix during the
4.18 cycle (Zhang Rui).
- Prevent dmi_check_system() from being called on non-x86 systems in
the ACPI core (Jean Delvare)"
* tag 'acpi-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / LPSS: Force LPSS quirks on boot
ACPI / bus: Only call dmi_check_system() on X86
Merge tag 'sound-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few small fixes:
- a fix for the recursive work cancellation in a specific HD-audio
operation mode
- a fix for potentially uninitialized memory access via rawmidi
- the register bit access fixes for ASoC HD-audio"
* tag 'sound-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: Fix several mismatch for register mask and value
ALSA: rawmidi: Initialize allocated buffers
ALSA: hda - Fix cancel_work_sync() stall from jackpoll work
Wanpeng Li [Thu, 30 Aug 2018 02:03:30 +0000 (10:03 +0800)]
KVM: LAPIC: Fix pv ipis out-of-bounds access
Dan Carpenter reported that the untrusted data returns from kvm_register_read()
results in the following static checker warning:
arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi()
error: buffer underflow 'map->phys_map' 's32min-s32max'
KVM guest can easily trigger this by executing the following assembly sequence
in Ring0:
As this will cause KVM to execute the following code-path:
vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi()
which will reach out-of-bounds access.
This patch fixes it by adding a check to kvm_pv_send_ipi() against map->max_apic_id,
ignoring destinations that are not present and delivering the rest. We also check
whether or not map->phys_map[min + i] is NULL since the max_apic_id is set to the
max apic id, some phys_map maybe NULL when apic id is sparse, especially kvm
unconditionally set max_apic_id to 255 to reserve enough space for any xAPIC ID.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add second "if (min > map->max_apic_id)" to complete the fix. -Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2
Consider the case L1 had a IRQ/NMI event until it executed
VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
(e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
L0 needs to evaluate if this pending event should cause an exit from
L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
EXTERNAL_INTERRUPT).
Usually this would be handled by L0 requesting a IRQ/NMI window
by setting VMCS accordingly. However, this setting was done on
VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
requesting a KVM_REQ_EVENT.
Note that above scenario exists when L1 KVM is about to enter L2 but
requests an "immediate-exit". As in this case, L1 will
disable-interrupts and then send a self-IPI before entering L2.
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Steven Price [Mon, 13 Aug 2018 16:04:53 +0000 (17:04 +0100)]
arm64: KVM: Remove pgd_lock
The lock has never been used and the page tables are protected by
mmu_lock in struct kvm.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to
deal with. Drop the now obsolete code.
Fixes: 3e712df46335 ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Marc Zyngier [Thu, 23 Aug 2018 10:51:43 +0000 (11:51 +0100)]
arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
If trapping FPSIMD in the context of an AArch32 guest, it is critical
to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
not EL1.
Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
if we're not going to trap FPSIMD, as we then corrupt the existing
VFP state.
Moving the call to __activate_traps_fpsimd32 to the point where we
know for sure that we are going to trap ensures that we don't set that
bit spuriously.
Fixes: 6080b5459fe8 ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") Cc: stable@vger.kernel.org # v4.18 Cc: Dave Martin <dave.martin@arm.com> Reported-by: Alexander Graf <agraf@suse.de> Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Marc Zyngier [Thu, 23 Aug 2018 08:58:27 +0000 (09:58 +0100)]
KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW
When triggering a CoW, we unmap the RO page via an MMU notifier
(invalidate_range_start), and then populate the new PTE using another
one (change_pte). In the meantime, we'll have copied the old page
into the new one.
The problem is that the data for the new page is sitting in the
cache, and should the guest have an uncached mapping to that page
(or its MMU off), following accesses will bypass the cache.
In a way, this is similar to what happens on a translation fault:
We need to clean the page to the PoC before mapping it. So let's just
do that.
This fixes a KVM unit test regression observed on a HiSilicon platform,
and subsequently reproduced on Seattle.
Fixes: 8eb4541cfab2 ("KVM: arm/arm64: Only clean the dcache on translation fault") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Merge tag 'drm-fixes-2018-09-07' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Seems to have been overly quiet this week so I expect next week will
be more stuff, just one pull from Rodrigo with i915 fixes in it.
Quoting Rodrigo:
'The critical fix here on display side is the DP MST regression one.
But this pull also include fixes for DP SST, small VDSC register
fix and GVT's bucked with "BXT fixes, two guest warning fixes,
dmabuf format mod fix and one for recent multiple VM timeout
failure'."
* tag 'drm-fixes-2018-09-07' of git://anongit.freedesktop.org/drm/drm:
drm/i915/dp_mst: Fix enabling pipe clock for all streams
drm/i915/dsc: Fix PPS register definition macros for 2nd VDSC engine
drm/i915: Re-apply "Perform link quality check, unconditionally during long pulse"
drm/i915/gvt: Give new born vGPU higher scheduling chance
drm/i915/gvt: Fix drm_format_mod value for vGPU plane
drm/i915/gvt: move intel_runtime_pm_get out of spin_lock in stop_schedule
drm/i915/gvt: Handle GEN9_WM_CHICKEN3 with F_CMD_ACCESS.
drm/i915/gvt: Make correct handling to vreg BXT_PHY_CTL_FAMILY
drm/i915/gvt: emulate gen9 dbuf ctl register access
Dave Airlie [Fri, 7 Sep 2018 01:06:58 +0000 (11:06 +1000)]
Merge tag 'drm-intel-fixes-2018-09-05' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
The critical fix here on display side is the DP MST regression one.
But this pull also include fixes for DP SST, small VDSC register fix
and GVT's bucked with "BXT fixes, two guest warning fixes, dmabuf
format mod fix and one for recent multiple VM timeout failure."
Merge tag 'mips_fixes_4.19_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fix from Paul Burton:
"A single fix for v4.19-rc3, resolving a problem with our VDSO data
page for systems with dcache aliasing. Those systems could previously
observe stale data, causing clock_gettime() & gettimeofday() to return
incorrect values"
* tag 'mips_fixes_4.19_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: VDSO: Match data page cache colouring when D$ aliases
Merge tag '4.19-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four small SMB3 fixes, three for stable, and one minor debug
clarification"
* tag '4.19-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: connect to servername instead of IP for IPC$ share
smb3: check for and properly advertise directory lease support
smb3: minor debugging clarifications in rfc1001 len processing
SMB3: Backup intent flag missing for directory opens with backupuid mounts
fs/cifs: don't translate SFM_SLASH (U+F026) to backslash
Peter Zijlstra [Wed, 5 Sep 2018 08:41:58 +0000 (10:41 +0200)]
clocksource: Revert "Remove kthread"
I turns out that the silly spawn kthread from worker was actually needed.
clocksource_watchdog_kthread() cannot be called directly from
clocksource_watchdog_work(), because clocksource_select() calls
timekeeping_notify() which uses stop_machine(). One cannot use
stop_machine() from a workqueue() due lock inversions wrt CPU hotplug.
Revert the patch but add a comment that explain why we jump through such
apparently silly hoops.
Merge tag 'for-linus-20180906' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Small collection of fixes that should go into this release. This
contains:
- Small series that fixes a race between blkcg teardown and writeback
(Dennis Zhou)
- Fix disallowing invalid block size settings from the nbd ioctl (me)
- BFQ fix for a use-after-free on last release of a bfqg (Konstantin
Khlebnikov)
- Fix for the "don't warn for flush" fix (Mikulas)"
* tag 'for-linus-20180906' of git://git.kernel.dk/linux-block:
block: bfq: swap puts in bfqg_and_blkg_put
block: don't warn when doing fsync on read-only devices
nbd: don't allow invalid blocksize settings
blkcg: use tryget logic when associating a blkg with a bio
blkcg: delay blkg destruction until after writeback has finished
Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
i2c: xiic: Make the start and the byte count write atomic
Disable interrupts while configuring the transfer and enable them back.
We have below as the programming sequence
1. start and slave address
2. byte count and stop
In some customer platform there was a lot of interrupts between 1 and 2
and after slave address (around 7 clock cyles) if 2 is not executed
then the transaction is nacked.
To fix this case make the 2 writes atomic.
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
[wsa: added a newline for better readability] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
Jia He [Tue, 28 Aug 2018 04:53:26 +0000 (12:53 +0800)]
irqchip/gic-v3-its: Cap lpi_id_bits to reduce memory footprint
Commit 7887d025b5d7 ("irqchip/gic-v3-its: Use full range of LPIs"), removes
the cap for lpi_id_bits, which causes the following warning to trigger on a
QDF2400 server:
In its_alloc_lpi_tables(), lpi_id_bits is 24 in QDF2400. The allocation in
allocate_prop_table() tries therefore to allocate 16M (order 12 if
pagesize=4k), which triggers the warning.
As said by MarcL
Capping lpi_id_bits at 16 (which is what we had before) is plenty,
will save a some memory, and gives some margin before we need to push
it up again.
Bring the upper limit of lpi_id_bits back to prevent
Fixes: 7887d025b5d7 ("irqchip/gic-v3-its: Use full range of LPIs") Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jia He <jia.he@hxt-semitech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Olof Johansson <olof@lixom.net> Cc: Jason Cooper <jason@lakedaemon.net> Cc: linux-arm-kernel@lists.infradead.org Link: https://lkml.kernel.org/r/1535432006-2304-1-git-send-email-jia.he@hxt-semitech.com
Fix trivial use-after-free. This could be last reference to bfqg.
Fixes: ecfaf373c85a ("block, bfq: access and cache blkg data only when safe") Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mark Rutland [Wed, 5 Sep 2018 16:38:57 +0000 (17:38 +0100)]
arm64: fix erroneous warnings in page freeing functions
In pmd_free_pte_page() and pud_free_pmd_page() we try to warn if they
hit a present non-table entry. In both cases we'll warn for non-present
entries, as the VM_WARN_ON() only checks the entry is not a table entry.
This has been observed to result in warnings when booting a v4.19-rc2
kernel under qemu.
Fix this by bailing out earlier for non-present entries.
Fixes: 63b12f86ba9d1548 ("arm64: Implement page table free interfaces") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
firmware: arm_scmi: fix divide by zero when sustained_perf_level is zero
Firmware can provide zero as values for sustained performance level and
corresponding sustained frequency in kHz in order to hide the actual
frequencies and provide only abstract values. It may endup with divide
by zero scenario resulting in kernel panic.
Let's set the multiplication factor to one if either one or both of them
(sustained_perf_level and sustained_freq) are set to zero.
Fixes: ccbe18e5d8bc ("firmware: arm_scmi: add initial support for performance protocol") Reported-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Merge tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor fix from John Johansen:
"A fix for an issue syzbot discovered last week:
- Fix for bad debug check when converting secids to secctx"
* tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix bad debug check in apparmor_secid_to_secctx()
Merge tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This fixes two annoying bugs:
- The first one is a side effect caused by using SRCU for rcuidle
tracepoints. It seems that the perf was depending on the rcuidle
tracepoints to make RCU watch when it wasn't.
The real fix will be to have perf use SRCU instead of depending on
RCU watching, but that can't be done until SRCU is safe to use in
NMI context (Paul's working on that).
- The second bug fix is for a bug that's been periodically making my
tests fail randomly for some time. I haven't had time to track it
down, but finally have. It has to do with stressing NMIs (via perf)
while enabling or disabling ftrace function handling with lockdep
enabled.
If an interrupt happens and just as it returns, it sets lockdep
back to "interrupts enabled" but before it returns an NMI is
triggered, and if this happens while printk_nmi_enter has a
breakpoint attached to it (because ftrace is converting it to or
from nop to call fentry), the breakpoint trap also calls into
lockdep, and since returning from the NMI to a interrupt handler,
interrupts were disabled when the NMI went off, lockdep keeps its
state as interrupts disabled when it returns back from the
interrupt handler where interrupts are enabled.
This causes lockdep_assert_irqs_enabled() to trigger a false
positive"
* tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
printk/tracing: Do not trace printk_nmi_enter()
tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
Merge tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix for improper fsync after hardlink
- fix for a corruption during file deduplication
- use after free fixes
- RCU warning fix
- fix for buffered write to nodatacow file
* tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcu
btrfs: use after free in btrfs_quota_enable
btrfs: btrfs_shrink_device should call commit transaction at the end
btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata
Btrfs: fix data corruption when deduplicating between different files
Btrfs: sync log after logging new name
Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
Call Trace:
do_idle+0x33/0x202
cpu_startup_entry+0x61/0x63
start_secondary+0x18e/0x1ed
startup_32_smp+0x164/0x168
irq event stamp: 18773830
hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
---[ end trace b7c64aa79e17954a ]---
After a bit of debugging, I found what was happening. This would trigger
when performing "perf" with a high NMI interrupt rate, while enabling and
disabling function tracer. Ftrace uses breakpoints to convert the nops at
the start of functions to calls to the function trampolines. The breakpoint
traps disable interrupts and this makes calls into lockdep via the
trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
do_idle {
[interrupts enabled]
<interrupt> [interrupts disabled]
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET
test if pt_regs say return to interrupts enabled [yes]
TRACE_IRQS_ON [lockdep says irqs are on]
<nmi>
nmi_enter() {
printk_nmi_enter() [traced by ftrace]
[ hit ftrace breakpoint ]
<breakpoint exception>
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET [return from breakpoint]
test if pt_regs say interrupts enabled [no]
[iret back to interrupt]
[iret back to code]
tick_nohz_idle_enter() {
lockdep_assert_irqs_enabled() [lockdep say no!]
Although interrupts are indeed enabled, lockdep thinks it is not, and since
we now do asserts via lockdep, it gives a false warning. The issue here is
that printk_nmi_enter() is called before lockdep_off(), which disables
lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
confused.
Cc: stable@vger.kernel.org Fixes: a95813d717b89 ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Ilya Dryomov [Wed, 22 Aug 2018 15:26:10 +0000 (17:26 +0200)]
rbd: support cloning across namespaces
If parent_get class method is not supported by the OSDs, fall back to
the legacy class method and assume that the parent is in the default
(i.e. "") namespace. The "use the child's image namespace" workaround
is no longer needed because creating images within namespaces will
require parent_get aware OSDs.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Ilya Dryomov [Fri, 24 Aug 2018 13:32:43 +0000 (15:32 +0200)]
ceph: avoid a use-after-free in ceph_destroy_options()
syzbot reported a use-after-free in ceph_destroy_options(), called from
ceph_mount(). The problem was that create_fs_client() consumed the opt
pointer on some errors, but not on all of them. Make sure it always
consumes both libceph and ceph options.
Thomas Gleixner [Thu, 6 Sep 2018 13:21:38 +0000 (15:21 +0200)]
cpu/hotplug: Prevent state corruption on error rollback
When a teardown callback fails, the CPU hotplug code brings the CPU back to
the previous state. The previous state becomes the new target state. The
rollback happens in undo_cpu_down() which increments the state
unconditionally even if the state is already the same as the target.
As a consequence the next CPU hotplug operation will start at the wrong
state. This is easily to observe when __cpu_disable() fails.
Prevent the unconditional undo by checking the state vs. target before
incrementing state and fix up the consequently wrong conditional in the
unplug code which handles the failure of the final CPU take down on the
control CPU side.
cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()
The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
load of st->should_run to prevent reordering of the later load/stores
w.r.t. the load of st->should_run.
Jann Horn [Fri, 31 Aug 2018 19:41:51 +0000 (21:41 +0200)]
x86/process: Don't mix user/kernel regs in 64bit __show_regs()
When the kernel.print-fatal-signals sysctl has been enabled, a simple
userspace crash will cause the kernel to write a crash dump that contains,
among other things, the kernel gsbase into dmesg.
As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE
in this case.
This also moves the bitness-specific logic from show_regs() into
process_{32,64}.c.
Chuanhua Lei [Thu, 6 Sep 2018 10:03:23 +0000 (18:03 +0800)]
x86/tsc: Prevent result truncation on 32bit
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then
dividing it by HZ.
Both tsc_khz and the temporary variable holding the multiplication result
are of type unsigned long, so on 32bit the result is truncated to the lower
32bit.
Use u64 as type for the temporary variable and cast tsc_khz to it before
multiplying.
[ tglx: Massaged changelog and removed pointless braces ]
Fixes: b4dc10ba0daa ("x86/tsc: Calibrate tsc only once") Signed-off-by: Chuanhua Lei <chuanhua.lei@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yixin.zhu@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Tatashin <pasha.tatashin@microsoft.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com
Commit 5ad14e651c6a (ACPI / LPSS: Avoid PM quirks on suspend and resume
from hibernation) bypasses lpss quirks for S3 and S4, by setting a flag
for S3/S4 in acpi_lpss_suspend(), and check that flag in
acpi_lpss_resume().
But this overlooks the boot case where acpi_lpss_resume() may get called
without a corresponding acpi_lpss_suspend() having been called.
Thus force setting the flag during boot.
Fixes: 5ad14e651c6a (ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation) Link: https://bugzilla.kernel.org/show_bug.cgi?id=200989 Reported-and-tested-by: William Lieurance <william.lieurance@namikoda.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Cc: 4.15+ <stable@vger.kernel.org> # 4.15+: 5ad14e651c6a (ACPI / LPSS: Avoid ...) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jean Delvare [Tue, 4 Sep 2018 12:55:26 +0000 (14:55 +0200)]
ACPI / bus: Only call dmi_check_system() on X86
Calling dmi_check_system() early only works on X86. Other
architectures initialize the DMI subsystem later so it's not
ready yet when ACPI itself gets initialized.
In the best case it results in a useless call to a function which
will do nothing. But depending on the dmi implementation, it could
also result in warnings. Best is to not call the function when it
can't work and isn't needed.
Additionally, if anyone ever needs to add non-x86 quirks, it would
surprisingly not work, so document the limitation to avoid confusion.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: fe375c4eecd3 (ACPI: fix early DSDT dmi check warnings on ia64) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
block: don't warn when doing fsync on read-only devices
It is possible to call fsync on a read-only handle (for example, fsck.ext2
does it when doing read-only check), and this call results in kernel
warning.
The patch ea6e80c9ca4d ("block: don't warn for flush on read-only device")
attempted to disable the warning, but it is buggy and it doesn't
(op_is_flush tests flags, but bio_op strips off the flags).
Peter Robinson [Fri, 20 Jul 2018 23:02:12 +0000 (00:02 +0100)]
hwmon: rpi: add module alias to raspberrypi-hwmon
The raspberrypi-hwmon driver doesn't automatically load, although it does work
when loaded, by adding the alias it auto loads as expected when built as a
module. Tested on RPi2/RPi3 on 32 bit kernel and RPi3B+ on aarch64 with
Fedora 28 and a patched 4.18 RC kernel.
Fixes: 3c493c885cf ("hwmon: Add support for RPi voltage sensor") Signed-off-by: Peter Robinson <pbrobinson@gmail.com> CC: Stefan Wahren <stefan.wahren@i2se.com> CC: Eric Anholt <eric@anholt.net> Acked-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Merge tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Some GPIO fixes. The ACPI stuff is probably the most annoying for
users that get fixed this time.
- Atomic contexts, cansleep* calls and such fastpath/slopwpath
things.
- Defer ACPI event handler registration to late_initcall() so IRQs do
not fire in our face before other drivers have a chance to register
handlers.
- Race condition if a consumer requests a GPIO after
gpiochip_add_data_with_key() but before of_gpiochip_add()
- Probe errorpath in the dwapb driver"
* tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Fix crash due to registration race
gpio: dwapb: Fix error handling in dwapb_gpio_probe()
gpiolib-acpi: Register GpioInt ACPI event handlers from a late_initcall
gpiolib: acpi: Switch to cansleep version of GPIO library call
gpio: adp5588: Fix sleep-in-atomic-context bug
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of very minor fixes and a couple of reverts to fix a major
problem (the attempt to change the busy count causes a hang when
attempting to change the drive cache type)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: aacraid: fix a signedness bug
Revert "scsi: core: avoid host-wide host_busy counter for scsi_mq"
Revert "scsi: core: fix scsi_host_queue_ready"
scsi: libata: Add missing newline at end of file
scsi: target: iscsi: cxgbit: use pr_debug() instead of pr_info()
scsi: hpsa: limit transfer length to 1MB, not 512kB
scsi: lpfc: Correct MDS diag and nvmet configuration
scsi: lpfc: Default fdmi_on to on
scsi: csiostor: fix incorrect port capabilities
scsi: csiostor: add a check for NULL pointer after kmalloc()
scsi: documentation: add scsi_mod.use_blk_mq to scsi-parameters
scsi: core: Update SCSI_MQ_DEFAULT help text to match default
Merge tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
Pull nds32 updates from Greentime Hu:
"Contained in here are the bug fixes, building error fixes and ftrace
support for nds32"
* tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
nds32: linker script: GCOV kernel may refers data in __exit
nds32: fix build error because of wrong semicolon
nds32: Fix a kernel panic issue because of wrong frame pointer access.
nds32: Only print one page of stack when die to prevent printing too much information.
nds32: Add macro definition for offset of lp register on stack
nds32: Remove the deprecated ABI implementation
nds32/stack: Get real return address by using ftrace_graph_ret_addr
nds32/ftrace: Support dynamic function graph tracer
nds32/ftrace: Support dynamic function tracer
nds32/ftrace: Add RECORD_MCOUNT support
nds32/ftrace: Support static function graph tracer
nds32/ftrace: Support static function tracer
nds32: Extract the checking and getting pointer to a macro
nds32: Clean up the coding style
nds32: Fix get_user/put_user macro expand pointer problem
nds32: Fix empty call trace
nds32: add NULL entry to the end of_device_id array
nds32: fix logic for module
tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
Borislav reported the following splat:
=============================
WARNING: suspicious RCU usage
4.19.0-rc1+ #1 Not tainted
-----------------------------
./include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
#0: 000000004557ee0e (rcu_read_lock){....}, at: perf_event_output_forward+0x0/0x130
This is due to the tracepoints moving to SRCU usage which does not require
RCU to be "watching". But perf uses these tracepoints with RCU and expects
it to be. Hence, we still need to add in the rcu_irq_enter/exit_irqson()
calls for "rcuidle" tracepoints. This is a temporary fix until we have SRCU
working in NMI context, and then perf can be converted to use that instead
of normal RCU.
Link: http://lkml.kernel.org/r/20180904162611.6a120068@gandalf.local.home Cc: x86-ml <x86@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@alien8.de> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Fixes: be52de1fa074c ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Greentime Hu [Tue, 4 Sep 2018 06:25:57 +0000 (14:25 +0800)]
nds32: linker script: GCOV kernel may refers data in __exit
This patch is used to fix nds32 allmodconfig/allyesconfig build error
because GCOV kernel embeds counters in the kernel for each line
and a part of that embed in __exit text. So we need to keep the
EXIT_TEXT and EXIT_DATA if CONFIG_GCOV_KERNEL=y.
Link: https://lkml.org/lkml/2018/9/1/125 Signed-off-by: Greentime Hu <greentime@andestech.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
nilfs2: convert to SPDX license tags
drivers/dax/device.c: convert variable to vm_fault_t type
lib/Kconfig.debug: fix three typos in help text
checkpatch: add __ro_after_init to known $Attribute
mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal
uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
memory_hotplug: fix kernel_panic on offline page processing
checkpatch: add optional static const to blank line declarations test
ipc/shm: properly return EIDRM in shm_lock()
mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
mm/util.c: improve kvfree() kerneldoc
tools/vm/page-types.c: fix "defined but not used" warning
tools/vm/slabinfo.c: fix sign-compare warning
kmemleak: always register debugfs file
mm: respect arch_dup_mmap() return value
mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().
mm: memcontrol: print proper OOM header when no eligible victim left
Dave Jiang [Tue, 4 Sep 2018 22:46:16 +0000 (15:46 -0700)]
mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal
It looks like I missed the PUD path when doing VM_MIXEDMAP removal.
This can be triggered by:
1. Boot with memmap=4G!8G
2. build ndctl with destructive flag on
3. make TESTS=device-dax check
[ +0.000675] kernel BUG at mm/huge_memory.c:824!
Applying the same change that was applied to vmf_insert_pfn_pmd() in the
original patch.
Link: http://lkml.kernel.org/r/153565957352.35524.1005746906902065126.stgit@djiang5-desk3.ch.intel.com Fixes: 45b3307d95e ("dax: remove VM_MIXEDMAP for fsdax and device dax") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 4 Sep 2018 22:46:13 +0000 (15:46 -0700)]
uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
Since this header is in "include/uapi/linux/", apparently people want to
use it in userspace programs -- even in C++ ones. However, the header
uses a C++ reserved keyword ("private"), so change that to "dh_private"
instead to allow the header file to be used in C++ userspace.
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=191051 Link: http://lkml.kernel.org/r/0db6c314-1ef4-9bfa-1baa-7214dd2ee061@infradead.org Fixes: ae5f7ff9f45a ("KEYS: Add KEYCTL_DH_COMPUTE command") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
That VM_BUG_ON was triggered by the page poisoning introduced in
mm/sparse.c with the git commit 4e55b6a92d7a ("mm/memory_hotplug:
optimize memory hotplug").
With the same commit the new 'nid' field has been added to the struct
memory_block in order to store and later on derive the node id for
offline pages (instead of accessing struct page which might be
uninitialized). But one reference to nid in show_valid_zones() function
has been overlooked. Fixed with current commit. Also, nr_pages will
not be used any more after test_pages_in_a_zone() call, do not update
it.
Joe Perches [Tue, 4 Sep 2018 22:46:06 +0000 (15:46 -0700)]
checkpatch: add optional static const to blank line declarations test
Using a static const struct definition as part of a series of
declarations produces a false positive "Missing a blank line after
declarations" for code like:
WARNING: Missing a blank line after declarations
#710: FILE: drivers/gpu/drm/tidss/tidss_scale_coefs.c:137:
+ int inc;
+ static const struct {
When getting rid of the general ipc_lock(), this was missed furthermore,
making the comment around the ipc object validity check bogus. Under
EIDRM conditions, callers will in turn not see the error and continue
with the operation.
mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
When scanning for movable pages, filter out Hugetlb pages if hugepage
migration is not supported. Without this we hit infinte loop in
__offline_pages() where we do
pfn = scan_movable_pages(start_pfn, end_pfn);
if (pfn) { /* We have movable pages */
ret = do_migrate_range(pfn, end_pfn);
goto repeat;
}
Fix this by checking hugepage_migration_supported both in
has_unmovable_pages which is the primary backoff mechanism for page
offlining and for consistency reasons also into scan_movable_pages
because it doesn't make any sense to return a pfn to non-migrateable
huge page.
This issue was revealed by, but not caused by c5b52c6e41f2 ("mm,
memory_hotplug: do not fail offlining too early").
Link: http://lkml.kernel.org/r/20180824063314.21981-1-aneesh.kumar@linux.ibm.com Fixes: c5b52c6e41f2 ("mm, memory_hotplug: do not fail offlining too early") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Haren Myneni <haren@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 4 Sep 2018 22:45:55 +0000 (15:45 -0700)]
mm/util.c: improve kvfree() kerneldoc
Scooped from an email from Matthew.
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
slabinfo.c:854:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (s->object_size < min_objsize)
^
due to the mismatch of signed/unsigned comparison. ->object_size and
->slab_size are never expected to be negative, so let's define them as
unsigned int.
If kmemleak built in to the kernel, but is disabled by default, the
debugfs file is never registered. Because of this, it is not possible
to find out if the kernel is built with kmemleak support by checking for
the presence of this file. To allow this, always register the file.
After this patch, if the file doesn't exist, kmemleak is not available
in the kernel. If writing "scan" or any other value than "clear" to
this file results in EBUSY, then kmemleak is available but is disabled
by default and can be activated via the kernel command line.
Catalin: "that's also consistent with a late disabling of kmemleak when
the debugfs entry sticks around."
Link: http://lkml.kernel.org/r/20180824131220.19176-1-vincent.whitchurch@axis.com Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadav Amit [Tue, 4 Sep 2018 22:45:41 +0000 (15:45 -0700)]
mm: respect arch_dup_mmap() return value
Commit 1536f4900e33 ("include/linux/sched/mm.h: uninline mmdrop_async(),
etc") ignored the return value of arch_dup_mmap(). As a result, on x86,
a failure to duplicate the LDT (e.g. due to memory allocation error)
would leave the duplicated memory mapping in an inconsistent state.
Fix by using the return value, as it was before the change.
Link: http://lkml.kernel.org/r/20180823051229.211856-1-namit@vmware.com Fixes: 1536f4900e33a ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().
Commit 4af644febfdb ("mm, oom: distinguish blockable mode for mmu
notifiers") has added an ability to skip over vmas with blockable mmu
notifiers. This however didn't call tlb_finish_mmu as it should.
As a result inc_tlb_flush_pending has been called without its pairing
dec_tlb_flush_pending and all callers mm_tlb_flush_pending would flush
even though this is not really needed. This alone is not harmful and it
seems there shouldn't be any such callers for oom victims at all but
there is no real reason to skip tlb_finish_mmu on early skip either so
call it.
[mhocko@suse.com: new changelog] Link: http://lkml.kernel.org/r/b752d1d5-81ad-7a35-2394-7870641be51c@i-love.sakura.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Tue, 4 Sep 2018 22:45:34 +0000 (15:45 -0700)]
mm: memcontrol: print proper OOM header when no eligible victim left
When the memcg OOM killer runs out of killable tasks, it currently
prints a WARN with no further OOM context. This has caused some user
confusion.
Warnings indicate a kernel problem. In a reported case, however, the
situation was triggered by a nonsensical memcg configuration (hard limit
set to 0). But without any VM context this wasn't obvious from the
report, and it took some back and forth on the mailing list to identify
what is actually a trivial issue.
Handle this OOM condition like we handle it in the global OOM killer:
dump the full OOM context and tell the user we ran out of tasks.
This way the user can identify misconfigurations easily by themselves
and rectify the problem - without having to go through the hassle of
running into an obscure but unsettling warning, finding the appropriate
kernel mailing list and waiting for a kernel developer to remote-analyze
that the memcg configuration caused this.
If users cannot make sense of why the OOM killer was triggered or why it
failed, they will still report it to the mailing list, we know that from
experience. So in case there is an actual kernel bug causing this,
kernel developers will very likely hear about it.
Link: http://lkml.kernel.org/r/20180821160406.22578-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ARC: don't check for HIGHMEM pages in arch_dma_alloc
__GFP_HIGHMEM flag is cleared by upper layer functions
(in include/linux/dma-mapping.h) so we'll never get a
__GFP_HIGHMEM flag in arch_dma_alloc gfp argument.
That's why alloc_pages will never return highmem page
here.
Get rid of highmem pages handling and cleanup arch_dma_alloc
and arch_dma_free functions.
So far the IOC treatment was global on ARC, being turned on (or off)
for all devices in the system. With this patch, this can now be done
per device using the "dma-coherent" DT property; IOW with this patch
we can use both HW-coherent and regular DMA peripherals simultaneously.
The changes involved are too many so enlisting the summary below:
1. common code calls ARC arch_setup_dma_ops() per device.
2. For coherent dma (IOC) it plugs in generic @dma_direct_ops which
doesn't need any arch specific backend: No need for any explicit
cache flushes or MMU mappings to provide for uncached access
- dma_(map|sync)_single* return early as corresponding dma ops callbacks
are NULL in generic code.
So arch_sync_dma_*() -> dma_cache_*() need not handle the coherent
dma case, hence drop ARC __dma_cache_*_ioc() which were no-op anyways
3. For noncoherent dma (non IOC) generic @dma_noncoherent_ops is used
which in turns calls ARC specific routines
- arch_dma_alloc() no longer checks for @ioc_enable since this is
called only for !IOC case.
1) Must perform TXQ teardown before unregistering interfaces in
mac80211, from Toke Høiland-Jørgensen.
2) Don't allow creating mac80211_hwsim with less than one channel, from
Johannes Berg.
3) Division by zero in cfg80211, fix from Johannes Berg.
4) Fix endian issue in tipc, from Haiqing Bai.
5) BPF sockmap use-after-free fixes from Daniel Borkmann.
6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.
7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.
8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.
9) Fix deadlock in hv_netvsc, from Dexuan Cui.
10) Do not restart timewait timer on RST, from Florian Westphal.
11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.
12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.
13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.
14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.
15) Use-after-free in act_ife, fix from Cong Wang.
16) Missing hold to meta module in act_ife, from Vlad Buslov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
net: phy: sfp: Handle unimplemented hwmon limits and alarms
net: sched: action_ife: take reference to meta module
act_ife: fix a potential use-after-free
net/mlx5: Fix SQ offset in QPs with small RQ
tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
tipc: correct spelling errors for struct tipc_bc_base's comment
bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
bnxt_en: Clean up unused functions.
bnxt_en: Fix firmware signaled resource change logic in open.
sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
sctp: fix invalid reference to the index variable of the iterator
net/ibm/emac: wrong emac_calc_base call was used by typo
net: sched: null actions array pointer before releasing action
vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
r8169: add support for NCube 8168 network card
ip6_tunnel: respect ttl inherit for ip6tnl
mac80211: shorten the IBSS debug messages
mac80211: don't Tx a deauth frame if the AP forbade Tx
mac80211: Fix station bandwidth setting after channel switch
mac80211: fix a race between restart and CSA flows
...
Andrew Lunn [Tue, 4 Sep 2018 02:23:56 +0000 (04:23 +0200)]
net: phy: sfp: Handle unimplemented hwmon limits and alarms
Not all SFPs implement the registers containing sensor limits and
alarms. Luckily, there is a bit indicating if they are implemented or
not. Add checking for this bit, when deciding if the hwmon attributes
should be visible.
Fixes: 3571a0f56659 ("net: phy: sfp: Add HWMON support for module sensors") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: sched: action_ife: take reference to meta module
Recent refactoring of add_metainfo() caused use_all_metadata() to add
metainfo to ife action metalist without taking reference to module. This
causes warning in module_put called from ife action cleanup function.
Implement add_metainfo_and_get_ops() function that returns with reference
to module taken if metainfo was added successfully, and call it from
use_all_metadata(), instead of calling __add_metainfo() directly.
Fixes: 28cc946751c9 ("act_ife: fix a potential deadlock") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 3 Sep 2018 18:08:15 +0000 (11:08 -0700)]
act_ife: fix a potential use-after-free
Immediately after module_put(), user could delete this
module, so e->ops could be already freed before we call
e->ops->release().
Fix this by moving module_put() after ops->release().
Fixes: c8e4759039b0 ("introduce IFE action") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Correct the formula for calculating the RQ page remainder,
which should be in byte granularity. The result will be
non-zero only for RQs smaller than PAGE_SIZE, as an RQ size
is a power of 2.
Divide this by the SQ stride (MLX5_SEND_WQE_BB) to get the
SQ offset in strides granularity.
Fixes: 4d8df7ccad94 ("net/mlx5: Fix QP fragmented buffer allocation") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'kvm-ppc-fixes-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
PPC KVM fixes for 4.19
Two small fixes for KVM on POWER machines; one fixes a bug where pages
might not get marked dirty, causing guest memory corruption on migration,
and the other fixes a bug causing reads from guest memory to use the
wrong guest real address for very large HPT guests (>256G of memory),
leading to failures in instruction emulation.
syzbot reports a divide-by-zero off the NBD_SET_BLKSIZE ioctl.
We need proper validation of the input here. Not just if it's
zero, but also if the value is a power-of-2 and in a valid
range. Add that.
Felipe Balbi [Mon, 3 Sep 2018 08:24:57 +0000 (11:24 +0300)]
i2c: i801: fix DNV's SMBCTRL register offset
DNV's iTCO is slightly different with SMBCTRL sitting at a different
offset when compared to all other devices. Let's fix so that we can
properly use iTCO watchdog.
Fixes: 420cfc05c165 ("i2c: i801: Add support for Intel DNV") Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>