Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Three fixlets for perf:
- add a missing NULL pointer check in the intel BTS driver
- make BTS an exclusive PMU because BTS can only handle one event at
a time
- ensure that exclusive events are limited to one PMU so that several
exclusive events can be scheduled on different PMU instances"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Limit matching exclusive events to one PMU
perf/x86/intel/bts: Make it an exclusive PMU
perf/x86/intel/bts: Make sure debug store is valid
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two smallish fixes:
- use the proper asm constraint in the Super-H atomic_fetch_ops
- a trivial typo fix in the Kconfig help text"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Thomas Gleixner:
"Two fixes for EFI/PAT:
- a 32bit overflow bug in the PAT code which was unearthed by the
large EFI mappings
- prevent a boot hang on large systems when EFI mixed mode is enabled
but not used"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Only map RAM into EFI page tables if in mixed-mode
x86/mm/pat: Prevent hang during boot when mapping pages
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Three fixes for irq core and irq chip drivers:
- Do not set the irq type if type is NONE. Fixes a boot regression
on various SoCs
- Use the proper cpu for setting up the GIC target list. Discovered
by the cpumask debugging code.
- A rather large fix for the MIPS-GIC so per cpu local interrupts
work again. This was discovered late because the code falls back
to slower timers which use normal device interrupts"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Fix local interrupts
irqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning
genirq: Skip chained interrupt trigger setup if type is IRQ_TYPE_NONE
Merge branch 'hughd-fixes' (patches from Hugh Dickins)
Merge VM fixes from High Dickins:
"I get the impression that Andrew is away or busy at the moment, so I'm
going to send you three independent uncontroversial little mm fixes
directly - though none is strictly a 4.8 regression fix.
- shmem: fix tmpfs to handle the huge= option properly from Toshi
Kani is a one-liner to fix a major embarrassment in 4.8's hugepages
on tmpfs feature: although Hillf pointed it out in June, somehow
both Kirill and I repeatedly dropped the ball on this one. You
might wonder if the feature got tested at all with that bug in:
yes, it did, but for wider testing coverage, Kirill and I had each
relied too much on an override which bypasses that condition.
- huge tmpfs: fix Committed_AS leak just a run-of-the-mill accounting
fix in the same feature.
- mm: delete unnecessary and unsafe init_tlb_ubc() is an unrelated
fix to 4.3's TLB flush batching in reclaim: the bug would be rare,
and none of us will be shamed if this one misses 4.8; but it got
such a quick ack from Mel today that I'm inclined to offer it along
with the first two"
* emailed patches from Hugh Dickins <hughd@google.com>:
mm: delete unnecessary and unsafe init_tlb_ubc()
huge tmpfs: fix Committed_AS leak
shmem: fix tmpfs to handle the huge= option properly
init_tlb_ubc() looked unnecessary to me: tlb_ubc is statically
initialized with zeroes in the init_task, and copied from parent to
child while it is quiescent in arch_dup_task_struct(); so I went to
delete it.
But inserted temporary debug WARN_ONs in place of init_tlb_ubc() to
check that it was always empty at that point, and found them firing:
because memcg reclaim can recurse into global reclaim (when allocating
biosets for swapout in my case), and arrive back at the init_tlb_ubc()
in shrink_node_memcg().
Resetting tlb_ubc.flush_required at that point is wrong: if the upper
level needs a deferred TLB flush, but the lower level turns out not to,
we miss a TLB flush. But fortunately, that's the only part of the
protocol that does not nest: with the initialization removed, cpumask
collects bits from upper and lower levels, and flushes TLB when needed.
Fixes: 72b252aed506 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: stable@vger.kernel.org # 4.3+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Under swapping load on huge tmpfs, /proc/meminfo's Committed_AS grows
bigger and bigger: just a cosmetic issue for most users, but disabling
for those who run without overcommit (/proc/sys/vm/overcommit_memory 2).
shmem_uncharge() was forgetting to unaccount __vm_enough_memory's
charge, and shmem_charge() was forgetting it on the filesystem-full
error path.
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Three driver bugfixes: fixing uninitialized memory pointers (eg20t),
pm/clock imbalance (qup), and a wrongly set cached variable (pc954x)"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
i2c: mux: pca954x: retry updating the mux selection on failure
i2c-eg20t: fix race between i2c init and interrupt enable
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Three fixes, two regressions and one that poses a problem in blk-mq
with the new nvmef code"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx
nvme-rdma: only clear queue flags after successful connect
blk-throttle: Extend slice if throttle group is not empty
Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Josef fixed a problem when quotas are enabled with his latest ENOSPC
rework, and Jeff added more checks into the subvol ioctls to avoid
tripping up lookup_one_len"
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: ensure that file descriptor used with subvol ioctls is a dir
Btrfs: handle quota reserve failure properly
Merge tag 'regmap-fix-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"A fix for an issue with double locking that was introduced earlier
this release. I'd missed in review that we were already in a locked
region when trying to drop part of the cache"
* tag 'regmap-fix-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: fix deadlock on _regmap_raw_write() error path
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a regression in RSA that was only half-fixed earlier in the
cycle. It also fixes an older regression that breaks the keyring
subsystem"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: rsa-pkcs1pad - Handle leading zero for decryption
KEYS: Fix skcipher IV clobbering
Merge tag 'tags/nand-fixes-for-4.8-rc8' of git://git.infradead.org/linux-ubifs
Pull MTD fixes from Richard Weinberger:
"NAND Fixes for 4.8-rc8.
This contains fixes for bugs which got introduced in -rc1. Usually
Brian takes NAND patches from Boris, but since Brian is very busy
these days with other stuff and Boris is not yet member of the
kernel.org web of trust I stepped in.
Boris will be in Berlin at ELCE, I'll sign his key and hopefully other
Kernel developers too such that he can issue his own pull requests
soon.
Summary:
- Fix a wrong OOB layout definition in the mxc driver
- Fix incorrect ECC handling in the mtk driver"
* tag 'tags/nand-fixes-for-4.8-rc8' of git://git.infradead.org/linux-ubifs:
mtd: nand: mxc: fix obiwan error in mxc_nand_v[12]_ooblayout_free() functions
mtd: nand: fix chances to create incomplete ECC data when writing
mtd: nand: fix generating over-boundary ECC data when writing
Handle read-only cases when CONFIG_DEBUG_RODATA (4.0) or
CONFIG_DEBUG_SET_MODULE_RONX (3.18) are enabled by using
aarch64_insn_write() instead of probe_kernel_write() as introduced by
commit 2f896d586610 ("arm64: use fixmap for text patching") in 4.0.
Fixes: 11d91a770f1f ("arm64: Add CONFIG_DEBUG_SET_MODULE_RONX support") Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
David Daney [Tue, 20 Sep 2016 18:46:35 +0000 (11:46 -0700)]
arm64: Call numa_store_cpu_info() earlier.
The wq_numa_init() function makes a private CPU to node map by calling
cpu_to_node() early in the boot process, before the non-boot CPUs are
brought online. Since the default implementation of cpu_to_node()
returns zero for CPUs that have never been brought online, the
workqueue system's view is that *all* CPUs are on node zero.
When the unbound workqueue for a non-zero node is created, the
tsk_cpus_allowed() for the worker threads is the empty set because
there are, in the view of the workqueue system, no CPUs on non-zero
nodes. The code in try_to_wake_up() using this empty cpumask ends up
using the cpumask empty set value of NR_CPUS as an index into the
per-CPU area pointer array, and gets garbage as it is one past the end
of the array. This results in:
Fix by moving call to numa_store_cpu_info() for all CPUs into
smp_prepare_cpus(), which happens before wq_numa_init(). Since
smp_store_cpu_info() now contains only a single function call,
simplify by removing the function and out-lining its contents.
Suggested-by: Robert Richter <rric@kernel.org> Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.") Cc: <stable@vger.kernel.org> # 4.7.x- Signed-off-by: David Daney <david.daney@cavium.com> Reviewed-by: Robert Richter <rrichter@cavium.com> Tested-by: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Sudeep Holla [Thu, 25 Aug 2016 11:23:39 +0000 (12:23 +0100)]
i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
If the i2c device is already runtime suspended, if qup_i2c_suspend is
executed during suspend-to-idle or suspend-to-ram it will result in the
following splat:
CPU: 3 PID: 1593 Comm: bash Tainted: G W 4.8.0-rc3 #14
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
PC is at clk_core_unprepare+0x80/0x90
LR is at clk_unprepare+0x28/0x40
pc : [<ffff0000086eecf0>] lr : [<ffff0000086f0c58>] pstate: 60000145
Call trace:
clk_core_unprepare+0x80/0x90
qup_i2c_disable_clocks+0x2c/0x68
qup_i2c_suspend+0x10/0x20
platform_pm_suspend+0x24/0x68
...
This patch fixes the issue by executing qup_i2c_pm_suspend_runtime
conditionally in qup_i2c_suspend.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
perf/core: Limit matching exclusive events to one PMU
An "exclusive" PMU is the one that can only have one event scheduled in
at any given time. There may be more than one of such PMUs in a system,
though, like Intel PT and BTS. It should be allowed to have one event
for either of those inside the same context (there may be other constraints
that may prevent this, but those would be hardware-specific). However,
the exclusivity code is written so that only one event from any of the
"exclusive" PMUs is allowed in a context.
Fix this by making the exclusive event filter explicitly match two events'
PMUs.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160920154811.3255-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Just like intel_pt, intel_bts can only handle one event at a time,
which is the reason we introduced PERF_PMU_CAP_EXCLUSIVE in the first
place. However, at the moment one can have as many intel_bts events
within the same context at the same time as one pleases. Only one of
them, however, will get scheduled and receive the actual trace data.
Fix this by making intel_bts an "exclusive" PMU.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160920154811.3255-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
regmap: fix deadlock on _regmap_raw_write() error path
Commit 815806e39bf6 ("regmap: drop cache if the bus transfer error")
added a call to regcache_drop_region() to error path in
_regmap_raw_write(). However that path runs with regmap lock taken,
and regcache_drop_region() tries to re-take it, causing a deadlock.
Fix that by calling map->cache_ops->drop() directly.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Signed-off-by: Mark Brown <broonie@kernel.org>
When there is no Card which is set to "broken-cd", it's displayed a clock
information continuously. Because it's polling for detecting card.
This patch is fixed this problem.
Since the TFO socket is accepted right off SYN-data, the socket
owner can call getsockopt(TCP_INFO) to collect ongoing SYN-ACK
retransmission or timeout stats (i.e., tcpi_total_retrans,
tcpi_retransmits). Currently those stats are only updated
upon handshake completes. This patch fixes it.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes these under-accounting SNMP rtx stats
LINUX_MIB_TCPFORWARDRETRANS
LINUX_MIB_TCPFASTRETRANS
LINUX_MIB_TCPSLOWSTARTRETRANS
when retransmitting TSO packets
Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 21 Sep 2016 03:33:15 +0000 (23:33 -0400)]
MAINTAINERS: Update b44 maintainer.
Taking over as maintainer since Gary Zambrano is no longer working
for Broadcom.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 21 Sep 2016 01:06:17 +0000 (18:06 -0700)]
net: get rid of an signed integer overflow in ip_idents_reserve()
Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve()
[] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
[] signed integer overflow:
[] -2117905507 + -695755206 cannot be represented in type 'int'
Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
so that the arithmetics can be done using unsigned int.
Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Kamal Heib [Tue, 20 Sep 2016 11:55:31 +0000 (14:55 +0300)]
net/mlx4_core: Fix to clean devlink resources
This patch cleans devlink resources by calling devlink_port_unregister()
to avoid the following issues:
- Kernel panic when triggering reset flow.
- Memory leak due to unfreed resources in mlx4_init_port_info().
Fixes: 09d4d087cd48 ("mlx4: Implement devlink interface") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Mahoney [Wed, 21 Sep 2016 12:31:29 +0000 (08:31 -0400)]
btrfs: ensure that file descriptor used with subvol ioctls is a dir
If the subvol/snapshot create/destroy ioctls are passed a regular file
with execute permissions set, we'll eventually Oops while trying to do
inode->i_op->lookup via lookup_one_len.
This patch ensures that the file descriptor refers to a directory.
Josef Bacik [Thu, 15 Sep 2016 18:57:48 +0000 (14:57 -0400)]
Btrfs: handle quota reserve failure properly
btrfs/022 was spitting a warning for the case that we exceed the quota. If we
fail to make our quota reservation we need to clean up our data space
reservation. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com> Tested-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
i2c-eg20t: fix race between i2c init and interrupt enable
the eg20t driver call request_irq() function before the pch_base_address,
base address of i2c controller's register, is assigned an effective value.
there is one possible scenario that an interrupt which isn't inside eg20t
arrives immediately after request_irq() is executed when i2c controller
shares an interrupt number with others. since the interrupt handler
pch_i2c_handler() has already active as shared action, it will be called
and read its own register to determine if this interrupt is from itself.
At that moment, since base address of i2c registers is not remapped
in kernel space yet,so the INT handler will access an illegal address
and then a error occurs.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
Marek Vasut [Mon, 19 Sep 2016 19:34:01 +0000 (21:34 +0200)]
net: can: ifi: Configure transmitter delay
Configure the transmitter delay register at +0x1c to correctly handle
the CAN FD bitrate switch (BRS). This moves the SSP (secondary sample
point) to a proper offset, so that the TDC mechanism works and won't
generate error frames on the CAN link.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Nicolas Dichtel [Mon, 19 Sep 2016 14:17:57 +0000 (16:17 +0200)]
vti6: fix input path
Since commit 1625f4529957, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).
XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
xfrm6_rcv_spi().
A new function xfrm6_rcv_tnl() that enables to pass a value to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).
CC: Alexey Kodanev <alexey.kodanev@oracle.com> Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When I introduced the lastuse member I made a subtle error because it was
returned as an absolute value but that is meaningless to user-space as it
doesn't allow to see how old exactly an entry is. Let's make it similar to
how the bridge returns such values and make it relative to "now" (jiffies).
This allows us to show the actual age of the entries and is much more
useful (e.g. user-space daemons can age out entries, iproute2 can display
the lastuse properly).
Fixes: 43b9e1274060 ("net: ipmr/ip6mr: add support for keeping an entry age") Reported-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter
We were missing check for 25G and 100G while checking port speed,
which lead to less number of queues getting allocated for 25G & 100G
adapters and leading to low throughput. Adding the missing check for
both NIC and vNIC driver.
Also fixes port advertisement for 25G and 100G in ethtool output.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 20 Sep 2016 19:07:42 +0000 (20:07 +0100)]
fix fault_in_multipages_...() on architectures with no-op access_ok()
Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around. Normally access_ok() done by callers will
prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
such a range and they should not point to any valid objects).
However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().
Since any wraparound means EFAULT there, the fix is trivial - turn
those
while (uaddr <= end)
...
into
if (unlikely(uaddr > end))
return -EFAULT;
do
...
while (uaddr <= end);
Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Cc: stable@vger.kernel.org # v3.5+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fffffc0000f3b1a8 is a module address. Modules may have compiled in
strings which could get copied to userspace. In this instance, it
looks like "." which matches with a size of 1 byte. Extend the
is_vmalloc_addr check to be is_vmalloc_or_module_addr to cover
all possible cases.
Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>
Paul Burton [Tue, 13 Sep 2016 16:53:35 +0000 (17:53 +0100)]
irqchip/mips-gic: Fix local interrupts
Since the device hierarchy domain was added by commit c98c1822ee13
("irqchip/mips-gic: Add device hierarchy domain"), GIC local interrupts
have been broken.
Users attempting to setup a per-cpu local IRQ, for example the GIC timer
clock events code in drivers/clocksource/mips-gic-timer.c, the
setup_percpu_irq function would refuse with -EINVAL because the GIC
irqchip driver never called irq_set_percpu_devid so the
IRQ_PER_CPU_DEVID flag was never set for the IRQ. This happens because
irq_set_percpu_devid was being called from the gic_irq_domain_map
function which is no longer called.
Doing only that runs into further problems because gic_dev_domain_alloc
set the struct irq_chip for all interrupts, local or shared, to
gic_level_irq_controller despite that only being suitable for shared
interrupts. The typical outcome of this is that gic_level_irq_controller
callback functions are called for local interrupts, and then hwirq
number calculations overflow & the driver ends up attempting to access
some invalid register with an address calculated from an invalid hwirq
number. Best case scenario is that this then leads to a bus error. This
is fixed by abstracting the setup of the hwirq & chip to a new function
gic_setup_dev_chip which is used by both the root GIC IRQ domain & the
device domain.
Finally, decoding local interrupts failed because gic_dev_domain_alloc
only called irq_domain_alloc_irqs_parent for shared interrupts. Local
ones were therefore never associated with hwirqs in the root GIC IRQ
domain and the virq in gic_handle_local_int would always be 0. This is
fixed by calling irq_domain_alloc_irqs_parent unconditionally & having
gic_irq_domain_alloc handle both local & shared interrupts, which is
easy due to the aforementioned abstraction of chip setup into
gic_setup_dev_chip.
This fixes use of the MIPS GIC timer for clock events, which has been
broken since c98c1822ee13 ("irqchip/mips-gic: Add device hierarchy
domain") but hadn't been noticed due to a silent fallback to the MIPS
coprocessor 0 count/compare clock events device.
Fixes: c98c1822ee13 ("irqchip/mips-gic: Add device hierarchy domain") Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Qais Yousef <qsyousef@gmail.com> Cc: stable@vger.kernel.org Cc: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/20160913165335.31389-1-paul.burton@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This happens because the code introduced in this commit dereferences the
debug store pointer unconditionally. The debug store is not guaranteed to
be available, so a NULL pointer check as on other places is required.
Fixes: 4d4c47412464 ("perf/x86/intel/bts: Fix BTS PMI detection") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: vince@deater.net Cc: eranian@google.com Link: http://lkml.kernel.org/r/20160920131220.xg5pbdjtznszuyzb@breakpoint.cc Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Matt Fleming [Mon, 19 Sep 2016 12:09:09 +0000 (13:09 +0100)]
x86/efi: Only map RAM into EFI page tables if in mixed-mode
Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
multi-terabyte HP machine results in boot crashes, because the EFI
region mapping functions loop forever while trying to map those
regions describing RAM.
While this patch doesn't fix the underlying hang, there's really no
reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
when mixed-mode is not in use at runtime.
Matt Fleming [Tue, 20 Sep 2016 13:26:21 +0000 (14:26 +0100)]
x86/mm/pat: Prevent hang during boot when mapping pages
There's a mixture of signed 32-bit and unsigned 32-bit and 64-bit data
types used for keeping track of how many pages have been mapped.
This leads to hangs during boot when mapping large numbers of pages
(multiple terabytes, as reported by Waiman) because those values are
interpreted as being negative.
commit 742563777e8d ("x86/mm/pat: Avoid truncation when converting
cpa->numpages to address") fixed one of those bugs, but there is
another lurking in __change_page_attr_set_clr().
Additionally, the return value type for the populate_*() functions can
return negative values when a large number of pages have been mapped,
triggering the error paths even though no error occurred.
Consistently use 64-bit types on 64-bit platforms when counting pages.
Even in the signed case this gives us room for regions 8PiB
(pebibytes) in size whilst still allowing the usual negative value
error checking idiom.
Commit fe56b9e6a8d95 ("qed: Add module with basic common support")
has introduced a stack corruption during probe, where filling a
local struct with data to be sent to management firmware is incorrectly
filled; The data is written outside of the struct and corrupts
the stack.
Changes from v1:
----------------
- Correct the value written [Caught by David Laight]
Fixes: fe56b9e6a8d95 ("qed: Add module with basic common support") Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 18 Sep 2016 19:17:19 +0000 (21:17 +0200)]
MAINTAINERS: Add an entry for the core network DSA code
The core distributed switch architecture code currently does not have
a MAINTAINERS entry, which results in some contributions not landing
in the right peoples inbox.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Bernat [Sun, 18 Sep 2016 15:46:07 +0000 (17:46 +0200)]
net: ipv6: fallback to full lookup if table lookup is unsuitable
Commit 8c14586fc320 ("net: ipv6: Use passed in table for nexthop
lookups") introduced a regression: insertion of an IPv6 route in a table
not containing the appropriate connected route for the gateway but which
contained a non-connected route (like a default gateway) fails while it
was previously working:
$ ip link add eth0 type dummy
$ ip link set up dev eth0
$ ip addr add 2001:db8::1/64 dev eth0
$ ip route add ::/0 via 2001:db8::5 dev eth0 table 20
$ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
RTNETLINK answers: No route to host
$ ip -6 route show table 20
default via 2001:db8::5 dev eth0 metric 1024 pref medium
After this patch, we get:
$ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
$ ip -6 route show table 20
2001:db8:cafe::1 via 2001:db8::6 dev eth0 metric 1024 pref medium
default via 2001:db8::5 dev eth0 metric 1024 pref medium
Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups") Signed-off-by: Vincent Bernat <vincent@bernat.im> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 20 Sep 2016 02:10:25 +0000 (22:10 -0400)]
Merge branch 'mlx5-fixes'
Or Gerlitz says:
====================
mlx5 fixes to 4.8-rc6
This series series has a fix from Roi to memory corruption bug in
the bulk flow counters code and two late and hopefully last fixes
from me to the new eswitch offloads code.
Series done over net commit 37dd348 "bna: fix crash in bnad_get_strings()"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 18 Sep 2016 15:20:29 +0000 (18:20 +0300)]
net/mlx5: E-Switch, Handle mode change failures
E-switch mode changes involve creating HW tables, potentially allocating
netdevices, etc, and things can fail. Add an attempt to rollback to the
existing mode when changing to the new mode fails. Only if rollback fails,
getting proper SRIOV functionality requires module unload or sriov
disablement/enablement.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 18 Sep 2016 15:20:28 +0000 (18:20 +0300)]
net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code
When enablement of the SRIOV e-switch in certain mode (switchdev or legacy)
fails, we must set the mode to none. Otherwise, we'll run into double free
based crashes when further attempting to deal with the e-switch (such
as when disabling sriov or unloading the driver).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Sun, 18 Sep 2016 15:20:27 +0000 (18:20 +0300)]
net/mlx5: Fix flow counter bulk command out mailbox allocation
The FW command output length should be only the length of struct
mlx5_cmd_fc_bulk out field. Failing to do so will cause the memcpy
call which is invoked later in the driver to write over wrong memory
address and corrupt kernel memory which results in random crashes.
This bug was found using the kernel address sanitizer (kasan).
Fixes: a351a1b03bf1 ('net/mlx5: Introduce bulk reading of flow counters') Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
gic_raise_softirq() walks the list of cpus using for_each_cpu(), it calls
gic_compute_target_list() which advances the iterator by the number of
CPUs in the cluster.
If gic_compute_target_list() reaches the last CPU it leaves the iterator
pointing at the last CPU. This means the next time round the for_each_cpu()
loop cpumask_next() will be called with an invalid CPU.
This triggers a warning when built with CONFIG_DEBUG_PER_CPU_MAPS:
[ 3.077738] GICv3: CPU1: found redistributor 1 region 0:0x000000002f120000
[ 3.077943] CPU1: Booted secondary processor [410fd0f0]
[ 3.078542] ------------[ cut here ]------------
[ 3.078746] WARNING: CPU: 1 PID: 0 at ../include/linux/cpumask.h:121 gic_raise_softirq+0x12c/0x170
[ 3.078812] Modules linked in:
[ 3.078869]
[ 3.078930] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc5+ #5188
[ 3.078994] Hardware name: Foundation-v8A (DT)
[ 3.079059] task: ffff80087a1a0080 task.stack: ffff80087a19c000
[ 3.079145] PC is at gic_raise_softirq+0x12c/0x170
[ 3.079226] LR is at gic_raise_softirq+0xa4/0x170
[ 3.079296] pc : [<ffff0000083ead24>] lr : [<ffff0000083eac9c>] pstate: 200001c9
[ 3.081139] Call trace:
[ 3.081202] Exception stack(0xffff80087a19fbe0 to 0xffff80087a19fd10)
Avoid updating the iterator if the next call to cpumask_next() would
cause the for_each_cpu() loop to exit.
There is no change to gic_raise_softirq()'s behaviour, (cpumask_next()s
eventual call to _find_next_bit() will return early as start >= nbits),
this patch just silences the warning.
Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3") Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/1474306155-3303-1-git-send-email-james.morse@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
rapidio/rio_cm: avoid GFP_KERNEL in atomic context
Revert "ocfs2: bump up o2cb network protocol version"
ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
cgroup: duplicate cgroup reference when cloning sockets
mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
ocfs2: fix double unlock in case retry after free truncate log
fanotify: fix list corruption in fanotify_get_response()
fsnotify: add a way to stop queueing events on group shutdown
ocfs2: fix trans extend while free cached blocks
ocfs2: fix trans extend while flush truncate log
ipc/shm: fix crash if CONFIG_SHMEM is not set
mm: fix the page_swap_info() BUG_ON check
autofs: use dentry flags to block walks during expire
MAINTAINERS: update email for VLYNQ bus entry
mm: avoid endless recursion in dump_page()
mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
khugepaged: fix use-after-free in collapse_huge_page()
MAINTAINERS: Maik has moved
ocfs2/dlm: fix race between convert and migration
mem-hotplug: don't clear the only node in new_node_page()
rapidio/rio_cm: avoid GFP_KERNEL in atomic context
As reported by Alexey Khoroshilov (https://lkml.org/lkml/2016/9/9/737):
riocm_send_close() is called from rio_cm_shutdown() under
spin_lock_bh(idr_lock), but riocm_send_close() uses a GFP_KERNEL
allocation.
Fix by taking riocm_send_close() outside of spinlock protected code.
Junxiao Bi [Mon, 19 Sep 2016 21:44:44 +0000 (14:44 -0700)]
Revert "ocfs2: bump up o2cb network protocol version"
This reverts commit 38b52efd218b ("ocfs2: bump up o2cb network protocol
version").
This commit made rolling upgrade fail. When one node is upgraded to new
version with this commit, the remaining nodes will fail to establish
connections to it, then the application like VMs on the remaining nodes
can't be live migrated to the upgraded one. This will cause an outage.
Since negotiate hb timeout behavior didn't change without this commit,
so revert it.
Fixes: 38b52efd218bf ("ocfs2: bump up o2cb network protocol version") Link: http://lkml.kernel.org/r/1471396924-10375-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
If we punch a hole on a reflink such that following conditions are met:
1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
(hole range > MAX_CONTIG_BYTES(1MB)),
we dont COW the first cluster starting at the start offset. But in this
case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out. This will modify the
cluster in place and zero it in the source too.
Fix this by skipping this cluster in such a scenario.
To reproduce:
1. Create a random file of say 10 MB
xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink it
reflink -f 10MBfile reflnktest
3. Punch a hole at starting at cluster boundary with range greater that
1MB. You can also use a range that will put the end offset in another
extent.
fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the first cluster in the source file. (It will be zeroed out).
dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reported-by: Saar Maoz <saar.maoz@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Eric Ren <zren@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 19 Sep 2016 21:44:38 +0000 (14:44 -0700)]
cgroup: duplicate cgroup reference when cloning sockets
When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup. As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 19 Sep 2016 21:44:36 +0000 (14:44 -0700)]
mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
During cgroup2 rollout into production, we started encountering css
refcount underflows and css access crashes in the memory controller.
Splitting the heavily shared css reference counter into logical users
narrowed the imbalance down to the cgroup2 socket memory accounting.
The problem turns out to be the per-cpu charge cache. Cgroup1 had a
separate socket counter, but the new cgroup2 socket accounting goes
through the common charge path that uses a shared per-cpu cache for all
memory that is being tracked. Those caches are safe against scheduling
preemption, but not against interrupts - such as the newly added packet
receive path. When cache draining is interrupted by network RX taking
pages out of the cache, the resuming drain operation will put references
of in-use pages, thus causing the imbalance.
Disable IRQs during all per-cpu charge cache operations.
Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller") Link: http://lkml.kernel.org/r/20160914194846.11153-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joseph Qi [Mon, 19 Sep 2016 21:44:33 +0000 (14:44 -0700)]
ocfs2: fix double unlock in case retry after free truncate log
If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function. But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.
This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.
Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in truncate log") Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Mon, 19 Sep 2016 21:44:30 +0000 (14:44 -0700)]
fanotify: fix list corruption in fanotify_get_response()
fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).
However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock. Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.
Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().
Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events") Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Mon, 19 Sep 2016 21:44:24 +0000 (14:44 -0700)]
ocfs2: fix trans extend while free cached blocks
The root cause of this issue is the same with the one fixed by the last
patch, but this time credits for allocator inode and group descriptor
may not be consumed before trans extend.
Junxiao Bi [Mon, 19 Sep 2016 21:44:21 +0000 (14:44 -0700)]
ocfs2: fix trans extend while flush truncate log
Every time, ocfs2_extend_trans() included a credit for truncate log
inode, but as that inode had been managed by jbd2 running transaction
first time, it will not consume that credit until
jbd2_journal_restart().
Since total credits to extend always included the un-consumed ones,
there will be more and more un-consumed credit, at last
jbd2_journal_restart() will fail due to credit number over the half of
max transction credit.
The following error was caught when unlinking a large file with many
extents:
Commit c01d5b300774 ("shmem: get_unmapped_area align huge page") makes
use of shm_get_unmapped_area() in shm_file_operations() unconditional to
CONFIG_MMU.
As Tony Battersby pointed this can lead NULL-pointer dereference on
machine with CONFIG_MMU=y and CONFIG_SHMEM=n. In this case ipc/shm is
backed by ramfs which doesn't provide f_op->get_unmapped_area for
configurations with MMU.
The solution is to provide dummy f_op->get_unmapped_area for ramfs when
CONFIG_MMU=y, which just call current->mm->get_unmapped_area().
Fixes: c01d5b300774 ("shmem: get_unmapped_area align huge page") Link: http://lkml.kernel.org/r/20160912102704.140442-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Tony Battersby <tonyb@cybernetics.com> Tested-by: Tony Battersby <tonyb@cybernetics.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> [4.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 62c230bc1790 ("mm: add support for a filesystem to activate
swap files and use direct_IO for writing swap pages") replaced the
swap_aops dirty hook from __set_page_dirty_no_writeback() with
swap_set_page_dirty().
For normal cases without these special SWP flags code path falls back to
__set_page_dirty_no_writeback() so the behaviour is expected to be the
same as before.
But swap_set_page_dirty() makes use of the page_swap_info() helper to
get the swap_info_struct to check for the flags like SWP_FILE,
SWP_BLKDEV etc as desired for those features. This helper has
BUG_ON(!PageSwapCache(page)) which is racy and safe only for the
set_page_dirty_lock() path.
For the set_page_dirty() path which is often needed for cases to be
called from irq context, kswapd() can toggle the flag behind the back
while the call is getting executed when system is low on memory and
heavy swapping is ongoing.
This ends up with undesired kernel panic.
This patch just moves the check outside the helper to its users
appropriately to fix kernel panic for the described path. Couple of
users of helpers already take care of SwapCache condition so I skipped
them.
Link: http://lkml.kernel.org/r/1473460718-31013-1-git-send-email-santosh.shilimkar@oracle.com Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joe Perches <joe@perches.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rik van Riel <riel@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Jens Axboe <axboe@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [4.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Mon, 19 Sep 2016 21:44:12 +0000 (14:44 -0700)]
autofs: use dentry flags to block walks during expire
Somewhere along the way the autofs expire operation has changed to hold
a spin lock over expired dentry selection. The autofs indirect mount
expired dentry selection is complicated and quite lengthy so it isn't
appropriate to hold a spin lock over the operation.
Commit 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") added a
might_sleep() to dput() causing a WARN_ONCE() about this usage to be
issued.
But the spin lock doesn't need to be held over this check, the autofs
dentry info. flags are enough to block walks into dentrys during the
expire.
I've left the direct mount expire as it is (for now) because it is much
simpler and quicker than the indirect mount expire and adding spin lock
release and re-aquires would do nothing more than add overhead.
Fixes: 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") Link: http://lkml.kernel.org/r/20160912014017.1773.73060.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Reported-by: Takashi Iwai <tiwai@suse.de> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: Takashi Iwai <tiwai@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dump_page() uses page_mapcount() to get mapcount of the page.
page_mapcount() has VM_BUG_ON_PAGE(PageSlab(page)) as mapcount doesn't
make sense for slab pages and the field in struct page used for other
information.
It leads to recursion if dump_page() called for slub page and DEBUG_VM
is enabled:
mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
Currently, khugepaged does not permit swapin if there are enough young
pages in a THP. The problem is when a THP does not have enough young
pages, khugepaged leaks mapped ptes.
This patch prohibits leaking mapped ptes.
Link: http://lkml.kernel.org/r/1472820276-7831-1-git-send-email-ebru.akagunduz@gmail.com Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
khugepaged: fix use-after-free in collapse_huge_page()
hugepage_vma_revalidate() tries to re-check if we still should try to
collapse small pages into huge one after the re-acquiring mmap_sem.
The problem Dmitry Vyukov reported[1] is that the vma found by
hugepage_vma_revalidate() can be suitable for huge pages, but not the
same vma we had before dropping mmap_sem. And dereferencing original
vma can lead to fun results..
Let's use vma hugepage_vma_revalidate() found instead of assuming it's the
same as what we had before the lock was dropped.
Joseph Qi [Mon, 19 Sep 2016 21:43:55 +0000 (14:43 -0700)]
ocfs2/dlm: fix race between convert and migration
Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
checks if lockres master has changed to identify whether new master has
finished recovery or not. This will introduce a race that right after
old master does umount ( means master will change), a new convert
request comes.
In this case, it will reset lockres state to DLM_RECOVERING and then
retry convert, and then fail with lockres->l_action being set to
OCFS2_AST_INVALID, which will cause inconsistent lock level between
ocfs2 and dlm, and then finally BUG.
Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery. So fix it.
Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zhong [Mon, 19 Sep 2016 21:43:52 +0000 (14:43 -0700)]
mem-hotplug: don't clear the only node in new_node_page()
Commit 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest
neighbor node when mem-offline") introduced new_node_page() for memory
hotplug.
In new_node_page(), the nid is cleared before calling
__alloc_pages_nodemask(). But if it is the only node of the system, and
the first round allocation fails, it will not be able to get memory from
an empty nodemask, and will trigger oom.
The patch checks whether it is the last node on the system, and if it
is, then don't clear the nid in the nodemask.
Fixes: 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest neighbor node when mem-offline") Link: http://lkml.kernel.org/r/1473044391.4250.19.camel@TP420 Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Reported-by: John Allen <jallen@linux.vnet.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
scripts/faddr2line: improve on base path filtering a bit
Due to our compiler include directives, the build pathnames for header
files often end up being of the form "$srcdir/./include/linux/xyz.h",
which ends up having that extra "." path component after the build base
in it.
Teach faddr2line to skip that too, to make code generated in inline
functions in header files match the filename for the regular C files.
Rabin Vincent pointed out that I can't make a stricter regexp match by
using the " at " prefix for the pathname, because that ends up being
locale-dependent. But this does require that the path match be preceded
by a space, to make it a bit more strict (that matters mainly if we
didn't find any base_dir at all, and we only end up with the "./" part
of the match)
blk-throttle: Extend slice if throttle group is not empty
Right now, if slice is expired, we start a new slice. If a bio is
queued, we keep on extending slice by throtle_slice interval (100ms).
This worked well as long as pending timer function got executed with-in
few milli seconds of scheduled time. But looks like with recent changes
in timer subsystem, slack can be much longer depending on the expiry time
of the scheduled timer.
commit 500462a9de65 ("timers: Switch to a non-cascading wheel")
This means, by the time timer function gets executed, it is possible the
delay from scheduled time is more than 100ms. That means current code
will conclude that existing slice has expired and a new one needs to
be started. New slice will be 100ms by default and that will not be
sufficient to meet rate requirement of group given the bio size and
bio will not be dispatched and we will start a new timer function to
wait. And when that timer expires, same process will repeat and we
will wait again and this can easily be an infinite loop.
Solve this issue by starting a new slice only if throttle gropup is
empty. If it is not empty, that means there should be an active slice
going on. Ideally it should not be expired but given the slack, it is
possible that it has expired.
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a potential weakness in IPsec CBC IV generation, as well as
a number of issues that arose out of an OOM crash on ARM with CTR-mode
AES"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/aes-ctr - fix NULL dereference in tail processing
crypto: arm/aes-ctr - fix NULL dereference in tail processing
crypto: skcipher - Fix blkcipher walk OOM crash
crypto: echainiv - Replace chaining with multiplication
Merge tag 'drm-fixes-for-4.8-rc7' of git://people.freedesktop.org/~airlied/linux
Pull exynos and one stable ABI fix from Dave Airlie:
"One important drm 32/64 ABI fix came in so I'll dequeue what I have,
the rest is just exynos runtime pm fixes, but the net removal of code
seems like a win to me.
I'm going to be sporadic this week due to school holidays, so if
anything urgent turns up, Daniel will take care of it"
* tag 'drm-fixes-for-4.8-rc7' of git://people.freedesktop.org/~airlied/linux:
drm: Only use compat ioctl for addfb2 on X86/IA64
Subject: [PATCH, RESEND] drm: exynos: avoid unused function warning
drm/exynos: g2d: fix system and runtime pm integration
drm/exynos: rotator: fix system and runtime pm integration
drm/exynos: gsc: fix system and runtime pm integration
drm/exynos: fimc: fix system and runtime pm integration
exynos-drm: Fix unsupported GEM memory type error message to be clear