Linus Torvalds [Sat, 29 Oct 2016 00:02:58 +0000 (17:02 -0700)]
Merge tag 'arc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
- support IDU intc for UP builds
- support gz, lzma compressed uImage [Daniel Mentz]
- adjust /proc/cpuinfo for non-continuous cpu ids [Noam Camus]
- syscall for userspace cmpxchg assist for configs lacking hardware atomics
- rework of boot log printing mainly for identifying older arc700 cores
- retiring some old code, build toggles
* tag 'arc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: module: print pretty section names
ARC: module: elide loop to save reference to .eh_frame
ARC: mm: retire ARC_DBG_TLB_MISS_COUNT...
ARC: build: retire old toggles
ARC: boot log: refactor cpu name/release printing
ARC: boot log: remove awkward space comma from MMU line
ARC: boot log: don't assume SWAPE instruction support
ARC: boot log: refactor printing abt features not captured in BCRs
ARCv2: boot log: print IOC exists as well as enabled status
ARCv2: IOC: use @ioc_enable not @ioc_exist where intended
ARC: syscall for userspace cmpxchg assist
ARC: fix build warning in elf.h
ARC: Adjust cpuinfo for non-continuous cpu ids
ARC: [build] Support gz, lzma compressed uImage
ARCv2: intc: untangle SMP, MCIP and IDU
Linus Torvalds [Fri, 28 Oct 2016 23:52:28 +0000 (16:52 -0700)]
Merge tag 'powerpc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Convert cmp to cmpd in idle enter sequence (Segher Boessenkool)
- cxl: Fix leaking pid refs in some error paths (Vaibhav Jain)
- Re-fix race condition between going idle and entering guest (Paul Mackerras)
- Fix race condition in setting lock bit in idle/wakeup code (Paul Mackerras)
- radix: Use tlbiel only if we ever ran on the current cpu (Aneesh Kumar K.V)
- relocation, register save fixes for system reset interrupt (Nicholas Piggin)
Fixes for code merged this cycle:
- Fix CONFIG_ALIVEC typo in restore_tm_state() (Valentin Rothberg)
- KVM: PPC: Book3S HV: Fix build error when SMP=n (Michael Ellerman)"
* tag 'powerpc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: relocation, register save fixes for system reset interrupt
powerpc/mm/radix: Use tlbiel only if we ever ran on the current cpu
powerpc/process: Fix CONFIG_ALIVEC typo in restore_tm_state()
powerpc/64: Fix race condition in setting lock bit in idle/wakeup code
powerpc/64: Re-fix race condition between going idle and entering guest
cxl: Fix leaking pid refs in some error paths
powerpc: Convert cmp to cmpd in idle enter sequence
KVM: PPC: Book3S HV: Fix build error when SMP=n
Linus Torvalds [Fri, 28 Oct 2016 23:27:16 +0000 (16:27 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc kernel fixes: a virtualization environment related fix, an uncore
PMU driver removal handling fix, a PowerPC fix and new events for
Knights Landing"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
perf/powerpc: Don't call perf_event_disable() from atomic context
perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
perf/x86/intel/cstate: Add C-state residency events for Knights Landing
Linus Torvalds [Fri, 28 Oct 2016 18:47:45 +0000 (11:47 -0700)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A build fix, a NULL de-reference found by static analysis, a misuse of
the percpu_ref_exit() (tagged for -stable), and notification of failed
attempts to clear media errors.
These patches have received a build success notification from the
0day- kbuild-robot and appeared in next-20161028"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: fix percpu_ref_exit ordering
nvdimm: make CONFIG_NVDIMM_DAX 'bool'
pmem: report error on clear poison failure
libnvdimm, namespace: potential NULL deref on allocation error
Linus Torvalds [Fri, 28 Oct 2016 18:31:06 +0000 (11:31 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Three arm64 fixes for -rc3. They're all pretty straightforward: a
couple of NUMA issues from the Huawei folks and a thinko in
__page_to_voff that seems to be benign, but is certainly better off
fixed.
Summary:
- couple of NUMA fixes
- thinko in __page_to_voff"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: fix __page_to_voff definition
arm64/numa: fix incorrect log for memory-less node
arm64/numa: fix pcpu_cpu_distance() to get correct CPU proximity
Linus Torvalds [Fri, 28 Oct 2016 18:26:01 +0000 (11:26 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Fix four timer locking races: two were noticed by Linus while
reviewing the code while chasing for a corruption bug, and two
from fixing spurious USB timeouts"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Prevent base clock corruption when forwarding
timers: Prevent base clock rewind when forwarding clock
timers: Lock base for same bucket optimization
timers: Plug locking race vs. timer migration
Linus Torvalds [Fri, 28 Oct 2016 17:12:27 +0000 (10:12 -0700)]
Merge branches 'core-urgent-for-linus', 'irq-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool, irq and scheduler fixes from Ingo Molnar:
"One more objtool fixlet for GCC6 code generation patterns, an irq
DocBook fix and an unused variable warning fix in the scheduler"
Vineet Gupta [Tue, 25 Oct 2016 17:43:20 +0000 (10:43 -0700)]
ARC: module: elide loop to save reference to .eh_frame
The loop was really needed in .debug_frame regime where wanted make it
as SH_ALLOC so that apply_relocate_add() would process it. That's not
needed for .eh_frame, so we check this in apply_relocate_add() which
gets called for each section.
Note that we need to save reference to "section name strings" section in
module_frob_arch_sections() since apply_relocate_add() doesn't get that
Vineet Gupta [Thu, 27 Oct 2016 21:33:19 +0000 (14:33 -0700)]
ARC: boot log: refactor cpu name/release printing
The motivation is to identify ARC750 vs. ARC770 (we currently print
generic "ARC700").
A given ARC700 release could be 750 or 770, with same ARCNUM (or family
identifier which is unfortunate). The existing arc_cpu_tbl[] kept a single
concatenated string for core name and release which thus doesn't work
for 750 vs. 770 identification.
So split this into 2 tables, one with core names and other with release.
And while we are at it, get rid of the range checking for family numbers.
We just document the known to exist cores running Linux and ditch
others.
With this in place, we add detection of ARC750 which is
- cores 0x33 and before
- cores 0x34 and later with MMUv2
Vineet Gupta [Fri, 21 Oct 2016 01:08:10 +0000 (18:08 -0700)]
ARC: boot log: don't assume SWAPE instruction support
This came to light when helping a customer with oldish ARC750 core who
were getting instruction errors because of lack of SWAPE but boot log
was incorrectly printing it as being present
Vineet Gupta [Fri, 21 Oct 2016 00:49:15 +0000 (17:49 -0700)]
ARC: boot log: refactor printing abt features not captured in BCRs
On older arc700 cores, some of the features configured were not present
in Build config registers. To print about them at boot, we just use the
Kconfig option i.e. whether linux is built to use them or not.
So yes this seems bogus, but what else can be done. Moreover if linux is
booting with these enabled, then the Kconfig info is a good indicator
anyways.
Over time these "hacks" accumulated in read_arc_build_cfg_regs() as well
as arc_cpu_mumbojumbo(). so refactor and move all of those in a single
place: read_arc_build_cfg_regs(). This causes some code redcution too:
Linus Torvalds [Fri, 28 Oct 2016 17:07:35 +0000 (10:07 -0700)]
Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"My patch fixes the btrfs list_head abuse that we tracked down during
Dave Jones' memory corruption investigation. With both Jens and my
patches in place, I'm no longer able to trigger problems.
Filipe is fixing a difficult old bug between snapshots, balance and
send. Dave is cooking a few more for the next rc, but these are tested
and ready"
* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix races on root_log_ctx lists
btrfs: fix incremental send failure caused by balance
Linus Torvalds [Fri, 28 Oct 2016 17:00:44 +0000 (10:00 -0700)]
Merge tag 'sound-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains the usual stuff -- the fixups and quirks for HD-audio
and USB-audio, in addition to a bad regression fix in ALSA sequencer
timer since 4.8, and a trivial fix for asihpi PCI driver"
* tag 'sound-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add quirk for Syntek STK1160
ALSA: seq: Fix time account regression
ALSA: hda - Fix surround output pins for ASRock B150M mobo
ALSA: hda - Fix headset mic detection problem for two Dell laptops
ALSA: asihpi: fix kernel memory disclosure
ALSA: hda - Adding a new group of pin cfg into ALC295 pin quirk table
ALSA: hda - allow 40 bit DMA mask for NVidia devices
Linus Torvalds [Fri, 28 Oct 2016 16:36:07 +0000 (09:36 -0700)]
Merge tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux
Pull drm x86/pat regression fixes from Dave Airlie:
"This is a standalone pull request for the fix for a regression
introduced in -rc1 by a change to vm_insert_mixed to start using the
PAT range tracking to validate page protections. With this fix in
place, all the VRAM mappings for GPU drivers ended up at UC instead of
WC.
There are probably better ways to fix this long term, but nothing I'd
considered for -fixes that wouldn't need more settling in time. So
I've just created a new arch API that the drivers can reserve all
their VRAM aperture ranges as WC"
* tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux:
drm/drivers: add support for using the arch wc mapping API.
x86/io: add interface to reserve io memtype for a resource range. (v1.1)
Linus Torvalds [Fri, 28 Oct 2016 16:27:58 +0000 (09:27 -0700)]
Merge tag 'dm-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a couple DM raid and DM mirror fixes
- a couple .request_fn request-based DM NULL pointer fixes
- a fix for a DM target reference count leak, on target load error,
that prevented associated DM target kernel module(s) from being
removed
* tag 'dm-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: fix missing dm_put_target_type() in dm_table_add_target()
dm rq: clear kworker_task if kthread_run() returned an error
dm: free io_barrier after blk_cleanup_queue call
dm raid: fix activation of existing raid4/10 devices
dm mirror: use all available legs on multiple failures
dm mirror: fix read error on recovery after default leg failure
dm raid: fix compat_features validation
Linus Torvalds [Fri, 28 Oct 2016 16:23:59 +0000 (09:23 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key fixes from James Morris:
- fix a buffer overflow when displaying /proc/keys [CVE-2016-7042].
- fix broken initialisation in the big_key implementation that can
result in an oops.
- make big_key depend on having a random number generator available in
Kconfig.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security/keys: make BIG_KEYS dependent on stdrng.
KEYS: Sort out big_key initialisation
KEYS: Fix short sprintf buffer in /proc/keys show function
Imre Palik [Fri, 21 Oct 2016 08:18:59 +0000 (01:18 -0700)]
perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
perf doesn't seem to honour the number of fixed counters specified by CPUID
leaf 0xa. It always assumes that Intel CPUs have at least 3 fixed counters.
So if some of the fixed counters are masked out by the hypervisor, it still
tries to check/set them.
This patch makes perf behave nicer when the kernel is running under a
hypervisor that doesn't expose all the counters.
This patch contains some ideas from Matt Wilson.
Signed-off-by: Imre Palik <imrep@amazon.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Kozyrev <alexander.kozyrev@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Artyom Kuanbekov <artyom.kuanbekov@intel.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Wilson <msw@amazon.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1477037939-15605-1-git-send-email-imrep.amz@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
While it looks like the first WARN() is probably valid, the other one is
triggered by disabling event via perf_event_disable() from atomic context.
The event is disabled here in case we were not able to emulate
the instruction that hit the breakpoint. By disabling the event
we unschedule the event and make sure it's not scheduled back.
But we can't call perf_event_disable() from atomic context, instead
we need to use the event's pending_disable irq_work method to disable it.
Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161026094824.GA21397@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>
The reason for the crash is that perf_pmu_unregister() tries to remove
a PMU device which is not added at this point. We add PMU devices
only after pmu_bus is registered, which happens in the
perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag.
The fix is to get the 'pmu_bus_running' flag state at the point
the PMU is taken out of the PMU list and remove the device
later only if it's set.
Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav Petkov [Thu, 27 Oct 2016 12:36:23 +0000 (14:36 +0200)]
x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y
We needed the physical address of the container in order to compute the
offset within the relocated ramdisk. And we did this by doing __pa() on
the virtual address.
However, __pa() does checks whether the physical address is within
PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail
if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address
which *doesn't* have the randomization offset into a function which uses
PAGE_OFFSET which *does* have that offset.
Linus Torvalds [Fri, 28 Oct 2016 02:58:39 +0000 (19:58 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"20 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
cris/arch-v32: cryptocop: print a hex number after a 0x prefix
ipack: print a hex number after a 0x prefix
block: DAC960: print a hex number after a 0x prefix
fs: exofs: print a hex number after a 0x prefix
lib/genalloc.c: start search from start of chunk
mm: memcontrol: do not recurse in direct reclaim
CREDITS: update credit information for Martin Kepplinger
proc: fix NULL dereference when reading /proc/<pid>/auxv
mm: kmemleak: ensure that the task stack is not freed during scanning
lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
latent_entropy: raise CONFIG_FRAME_WARN by default
kconfig.h: remove config_enabled() macro
ipc: account for kmem usage on mqueue and msg
mm/slab: improve performance of gathering slabinfo stats
mm: page_alloc: use KERN_CONT where appropriate
mm/list_lru.c: avoid error-path NULL pointer deref
h8300: fix syscall restarting
kcov: properly check if we are in an interrupt
mm/slab: fix kmemcg cache creation delayed issue
Daniel Mentz [Fri, 28 Oct 2016 00:46:59 +0000 (17:46 -0700)]
lib/genalloc.c: start search from start of chunk
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.
The shortcut
if (size > atomic_read(&chunk->avail))
continue;
makes the loop skip over chunks that do not have enough bytes left to
fulfill the request. There are two situations, though, where an
allocation might still fail:
(1) The available memory is not contiguous, i.e. the request cannot
be fulfilled due to external fragmentation.
(2) A race condition. Another thread runs the same code concurrently
and is quicker to grab the available memory.
In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed. This return value is then assigned to
start_bit. The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched. Today, the code fails to reset start_bit to 0. As a
result, prefixes of subsequent chunks are ignored. Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.
Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com Signed-off-by: Daniel Mentz <danielmentz@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 28 Oct 2016 00:46:56 +0000 (17:46 -0700)]
mm: memcontrol: do not recurse in direct reclaim
On 4.0, we saw a stack corruption from a page fault entering direct
memory cgroup reclaim, calling into btrfs_releasepage(), which then
tried to allocate an extent and recursed back into a kmem charge ad
nauseam:
On later kernels, kmem charging is opt-in rather than opt-out, and that
particular kmem allocation in btrfs_releasepage() is no longer being
charged and won't recurse and overrun the stack anymore.
But it's not impossible for an accounted allocation to happen from the
memcg direct reclaim context, and we needed to reproduce this crash many
times before we even got a useful stack trace out of it.
Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to
avoid recursing into any other form of direct reclaim. Then let
recursive charges from PF_MEMALLOC contexts bypass the cgroup limit.
Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Leon Yu [Fri, 28 Oct 2016 00:46:50 +0000 (17:46 -0700)]
proc: fix NULL dereference when reading /proc/<pid>/auxv
Reading auxv of any kernel thread results in NULL pointer dereferencing
in auxv_read() where mm can be NULL. Fix that by checking for NULL mm
and bailing out early. This is also the original behavior changed by
recent commit c5317167854e ("proc: switch auxv to use of __mem_open()").
# cat /proc/2/auxv
Unable to handle kernel NULL pointer dereference at virtual address 000000a8
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
Hardware name: BCM2709
task: ea3b0b00 task.stack: e99b2000
PC is at auxv_read+0x24/0x4c
LR is at do_readv_writev+0x2fc/0x37c
Process cat (pid: 113, stack limit = 0xe99b2210)
Call chain:
auxv_read
do_readv_writev
vfs_readv
default_file_splice_read
splice_direct_to_actor
do_splice_direct
do_sendfile
SyS_sendfile64
ret_fast_syscall
Fixes: c5317167854e ("proc: switch auxv to use of __mem_open()") Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com Signed-off-by: Leon Yu <chianglungyu@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Janis Danisevskis <jdanis@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Catalin Marinas [Fri, 28 Oct 2016 00:46:47 +0000 (17:46 -0700)]
mm: kmemleak: ensure that the task stack is not freed during scanning
Commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed
during kmemleak_scan() execution, leading to either a NULL pointer fault
(if task->stack is NULL) or kmemleak accessing already freed memory.
This patch uses the new try_get_task_stack() API to ensure that the task
stack is not freed during kmemleak stack scanning.
Dmitry Vyukov [Fri, 28 Oct 2016 00:46:44 +0000 (17:46 -0700)]
lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
second level). Size of each stack is (num_frames + 3) * sizeof(long).
Which gives us ~84K stacks. This capacity was chosen empirically and it
is enough to run kernel normally.
However, when lots of configs are enabled and a fuzzer tries to maximize
code coverage, it easily hits the limit within tens of minutes. I've
tested for long a time with number of top level entries bumped 4x
(4096). And I think I've seen overflow only once. But I don't have all
configs enabled and code coverage has not reached maximum yet. So bump
it 8x to 8192.
Since we have two-level table, memory cost of this is very moderate --
currently the top-level table is 8KB, with this patch it is 64KB, which
is negligible under KASAN.
Here is some approx math.
128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches. Most of
alloc/free call sites are reachable with only one stack. But some
utility functions can have large fanout. Assuming average fanout is 5x,
total number of alloc/free stacks is ~300K.
Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 28 Oct 2016 00:46:41 +0000 (17:46 -0700)]
latent_entropy: raise CONFIG_FRAME_WARN by default
When building with the latent_entropy plugin, set the default
CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
blocks that, when instrumented by the latent_entropy plugin, grow beyond
1024 byte stack size on 32-bit builds.
Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 28 Oct 2016 00:46:38 +0000 (17:46 -0700)]
kconfig.h: remove config_enabled() macro
The use of config_enabled() is ambiguous. For config options,
IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer.
Sometimes config_enabled() has been used for non-config options because
it is useful to check whether the given symbol is defined or not.
I have been tackling on deprecating config_enabled(), and now is the
time to finish this work.
Some new users have appeared for v4.9-rc1, but it is trivial to replace
them:
- arch/x86/mm/kaslr.c
replace config_enabled() with IS_ENABLED() because
CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.
- include/asm-generic/export.h
replace config_enabled() with __is_defined().
Then, config_enabled() can be removed now.
Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
options, and __is_defined() for non-config symbols.
Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Garnier <thgarnie@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aristeu Rozanski [Fri, 28 Oct 2016 00:46:35 +0000 (17:46 -0700)]
ipc: account for kmem usage on mqueue and msg
When kmem accounting switched from account by default to only account if
flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.
The production use case at hand is that mqueues should be customizable
via sysctls in Docker containers in a Kubernetes cluster. This can only
be safely allowed to the users of the cluster (without the risk that
they can cause resource shortage on a node, influencing other users'
containers) if all resources they control are bounded, i.e. accounted
for.
Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Reported-by: Stefan Schimanski <sttts@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Stefan Schimanski <sttts@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/slab: improve performance of gathering slabinfo stats
On large systems, when some slab caches grow to millions of objects (and
many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
seconds. During this time, interrupts are disabled while walking the
slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
and this sometimes causes timeouts in other drivers (for instance,
Infiniband).
This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
total number of allocated slabs per node, per cache. This counter is
updated when a slab is created or destroyed. This enables us to skip
traversing the slabs_full list while gathering slabinfo statistics, and
since slabs_full tends to be the biggest list when the cache is large,
it results in a dramatic performance improvement. Getting slabinfo
statistics now only requires walking the slabs_free and slabs_partial
lists, and those lists are usually much smaller than slabs_full.
We tested this after growing the dentry cache to 70GB, and the
performance improved from 2s to 5ms.
Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821:
After some analysis it seems to be that the problem is in alloc_super().
In case list_lru_init_memcg() fails it goes into destroy_super(), which
calls list_lru_destroy().
And in list_lru_init() we see that in case memcg_init_list_lru() fails,
lru->node is freed, but not set NULL, which then leads list_lru_destroy()
to believe it is initialized and call memcg_destroy_list_lru().
memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus,
which is NULL.
[akpm@linux-foundation.org: add comment] Signed-off-by: Alexander Polakov <apolyakov@beget.ru> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Rutland [Fri, 28 Oct 2016 00:46:24 +0000 (17:46 -0700)]
h8300: fix syscall restarting
Back in commit f56141e3e2d9 ("all arches, signal: move restart_block to
struct task_struct"), all architectures and core code were changed to
use task_struct::restart_block. However, when h8300 support was
subsequently restored in v4.2, it was not updated to account for this,
and maintains thread_info::restart_block, which is not kept in sync.
This patch drops the redundant restart_block from thread_info, and moves
h8300 to the common one in task_struct, ensuring that syscall restarting
always works as expected.
Fixes: f56141e3e2d9 ("all arches, signal: move restart_block to struct task_struct") Link: http://lkml.kernel.org/r/1476714934-11635-1-git-send-email-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Konovalov [Fri, 28 Oct 2016 00:46:21 +0000 (17:46 -0700)]
kcov: properly check if we are in an interrupt
in_interrupt() returns a nonzero value when we are either in an
interrupt or have bh disabled via local_bh_disable(). Since we are
interested in only ignoring coverage from actual interrupts, do a proper
check instead of just calling in_interrupt().
As a result of this change, kcov will start to collect coverage from
within local_bh_disable()/local_bh_enable() sections.
This issue is caused by kmemcg feature that try to create new set of
kmem_caches for each memcg. Recently, kmem_cache creation is slowed by
synchronize_sched() and futher kmem_cache creation is also delayed since
kmem_cache creation is synchronized by a global slab_mutex lock. So,
the number of kworker that try to create kmem_cache increases quietly.
synchronize_sched() is for lockless access to node's shared array but
it's not needed when a new kmem_cache is created. So, this patch rules
out that case.
Fixes: 801faf0db894 ("mm/slab: lockless decision to grow cache") Link: http://lkml.kernel.org/r/1475734855-4837-1-git-send-email-iamjoonsoo.kim@lge.com Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Fri, 28 Oct 2016 00:04:05 +0000 (17:04 -0700)]
device-dax: fix percpu_ref_exit ordering
We need to wait until the percpu_ref is released before exit. Otherwise,
we sometimes lose the race and trigger this new warning that was added
in v4.9 (commit a67823c1ed10 "percpu-refcount: init ->confirm_switch
member properly"):
Cc: <stable@vger.kernel.org> Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linus Torvalds [Thu, 27 Oct 2016 23:23:01 +0000 (16:23 -0700)]
Allow KASAN and HOTPLUG_MEMORY to co-exist when doing build testing
No, KASAN may not be able to co-exist with HOTPLUG_MEMORY at runtime,
but for build testing there is no reason not to allow them together.
This hopefully means better build coverage and fewer embarrasing silly
problems like the one fixed by commit 9db4f36e82c2 ("mm: remove unused
variable in memory hotplug") in the future.
Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 25 Oct 2016 15:52:04 +0000 (17:52 +0200)]
nvdimm: make CONFIG_NVDIMM_DAX 'bool'
A bugfix just tried to address a randconfig build problem and introduced
a variant of the same problem: with CONFIG_LIBNVDIMM=y and
CONFIG_NVDIMM_DAX=m, the nvdimm module now fails to link:
drivers/nvdimm/built-in.o: In function `to_nd_device_type':
bus.c:(.text+0x1b5d): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_notify_driver_action.constprop.2':
region_devs.c:(.text+0x6b6c): undefined reference to `is_nd_dax'
region_devs.c:(.text+0x6b8c): undefined reference to `to_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_probe':
region.c:(.text+0x70f3): undefined reference to `nd_dax_create'
drivers/nvdimm/built-in.o: In function `mode_show':
namespace_devs.c:(.text+0xa196): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa55f): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa56e): undefined reference to `to_nd_dax'
This reverts the earlier fix, making NVDIMM_DAX a 'bool' option again
as it should be (it gets linked into the libnvdimm module). To fix
the original problem, I'm adding a dependency on LIBNVDIMM to
DEV_DAX_PMEM, which ensures we can't have that one built-in if the
rest is a module.
Fixes: 4e65e9381c7a ("/dev/dax: fix Kconfig dependency build breakage") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linus Torvalds [Thu, 27 Oct 2016 22:49:12 +0000 (15:49 -0700)]
mm: remove unused variable in memory hotplug
When I removed the per-zone bitlock hashed waitqueues in commit 9dcb8b685fc3 ("mm: remove per-zone hashtable of bitlock waitqueues"), I
removed all the magic hotplug memory initialization of said waitqueues
too.
But when I actually _tested_ the resulting build, I stupidly assumed
that "allmodconfig" would enable memory hotplug. And it doesn't,
because it enables KASAN instead, which then disables hotplug memory
support.
As a result, my build test of the per-zone waitqueues was totally
broken, and I didn't notice that the compiler warns about the now unused
iterator variable 'i'.
I guess I should be happy that that seems to be the worst breakage from
my clearly horribly failed test coverage.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 27 Oct 2016 22:06:29 +0000 (15:06 -0700)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has some driver bugfixes, module autoload fixes, and driver
enablement on some architectures"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: defer probe if bus recovery GPIOs are not ready
i2c: designware: Avoid aborted transfers with fast reacting I2C slaves
i2c: i801: Fix I2C Block Read on 8-Series/C220 and later
i2c: xgene: Avoid dma_buffer overrun
i2c: digicolor: Fix module autoload
i2c: xlr: Fix module autoload for OF registration
i2c: xlp9xx: Fix module autoload
i2c: jz4780: Fix module autoload
i2c: allow configuration of imx driver for ColdFire architecture
i2c: mark device nodes only in case of successful instantiation
i2c: rk3x: Give the tuning value 0 during rk3x_i2c_v0_calc_timings
i2c: hix5hd2: allow build with ARCH_HISI
Linus Torvalds [Thu, 27 Oct 2016 21:33:08 +0000 (14:33 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal updates from Zhang Rui:
"The latest Thermal Management updates for v4.9-rc3:
- Fix a regression introduced by commit b721ca0d19(thermal/powerclamp: remove cpu whitelist), that
powerclamp driver checks cpu support in a wrong way. From: Eric
Ernst.
- Fix a problem that intel_pch_thermal driver misses passive trip
point when the PCH thermal device has an ACPI companion device
associated. From: Srinivas Pandruvada.
- Add missing support for Haswell PCH thermal sensor. From: Srinivas
Pandruvada"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal/powerclamp: correct cpu support check
thermal: intel_pch_thermal: Enable Haswell PCH
thermal: intel_pch_thermal: Add an ACPI passive trip
Linus Torvalds [Thu, 27 Oct 2016 21:16:30 +0000 (14:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"A few more s390 patches for 4.9:
- a fix for an overflow in the dasd driver reported by UBSAN
- fix a regression and add hotplug memory to the zone movable again
- add ignore defines for the pkey system calls
- fix the ouput of the merged stack tracer
- replace printk with pr_cont in arch/s390 where appropriate
- remove the arch specific return_address function again
- ignore reserved channel paths at boot time
- add a missing hugetlb_bad_size call to the arch backend"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: fix zone calculation in arch_add_memory()
s390/dumpstack: use pr_cont within show_stack and die
s390/dumpstack: get rid of return_address again
s390/disassambler: use pr_cont where appropriate
s390/dumpstack: use pr_cont where appropriate
s390/dumpstack: restore reliable indicator for call traces
s390/mm: use hugetlb_bad_size()
s390/cio: don't register chpids in reserved state
s390: ignore pkey system calls
s390/dasd: avoid undefined behaviour
Linus Torvalds [Thu, 27 Oct 2016 21:12:04 +0000 (14:12 -0700)]
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module maintainership updates from Rusty Russell:
"(Quoting from the MAINTAINERS commit:)
Being a Linux kernel maintainer has been my proudest professional
accomplishment, spanning the last 19 years. But now we have a surfeit
of excellent hackers, and I can hand this over without regret.
I'll still be around as co-maintainer for another cycle, but Jessica
is now the one to convince if you want your patches applied. She
rocks, and is far more timely than me too!"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MAINTAINERS: Begin module maintainer transition
Linus Torvalds [Thu, 27 Oct 2016 19:52:46 +0000 (12:52 -0700)]
Merge tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull oreangefs updates from Mike Marshall:
"A couple of orangefs cleanups sent in by other developers:
- use d_fsdata instead of d_time (Miklos Szeredi)
- use file_inode(file) instead of file->f_path.dentry->d_inode (Amir
Goldstein)"
* tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: don't use d_time
orangefs: user file_inode() where it is due
Linus Torvalds [Thu, 27 Oct 2016 19:34:50 +0000 (12:34 -0700)]
Merge tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs fixes from Dave Chinner:
"This update contains fixes for most of the outstanding regressions
introduced with the 4.9-rc1 XFS merge. There is also a fix for an
iomap bug, too.
This is a quite a bit larger than I'd prefer for a -rc3, but most of
the change comes from cleaning up the new reflink copy on write code;
it's much simpler and easier to understand now. These changes fixed
several bugs in the new code, and it wasn't clear that there was an
easier/simpler way to fix them. The rest of the fixes are the usual
size you'd expect at this stage.
I've left the commits to soak in linux-next for a some extra time
because of the size before asking you to pull, no new problems with
them have been reported so I think it's all OK.
Summary:
- iomap page offset masking fix for page faults
- add IOMAP_REPORT to distinguish between read and fiemap map
requests
- cleanups to new shared data extent code
- fix mount active status on failed log recovery
- fix broken dquots in a buffer calculation
- fix locking order issues and merge xfs_reflink_remap_range and
xfs_file_share_range
- rework unmapping of CoW extents and remove now unused functions
- clean state when CoW is done"
* tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (25 commits)
xfs: clear cowblocks tag when cow fork is emptied
xfs: fix up inode cowblocks tracking tracepoints
fs: Do to trim high file position bits in iomap_page_mkwrite_actor
xfs: remove xfs_bunmapi_cow
xfs: optimize xfs_reflink_end_cow
xfs: optimize xfs_reflink_cancel_cow_blocks
xfs: refactor xfs_bunmapi_cow
xfs: optimize writes to reflink files
xfs: don't bother looking at the refcount tree for reads
xfs: handle "raw" delayed extents xfs_reflink_trim_around_shared
xfs: add xfs_trim_extent
iomap: add IOMAP_REPORT
xfs: merge xfs_reflink_remap_range and xfs_file_share_range
xfs: remove xfs_file_wait_for_io
xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range
xfs: fix the same_inode check in xfs_file_share_range
xfs: remove the same fs check from xfs_file_share_range
libxfs: v3 inodes are only valid on crc-enabled filesystems
libxfs: clean up _calc_dquots_per_chunk
xfs: unset MS_ACTIVE if mount fails
...
Chris Mason [Thu, 27 Oct 2016 17:42:20 +0000 (10:42 -0700)]
btrfs: fix races on root_log_ctx lists
btrfs_remove_all_log_ctxs takes a shortcut where it avoids walking the
list because it knows all of the waiters are patiently waiting for the
commit to finish.
But, there's a small race where btrfs_sync_log can remove itself from
the list if it finds a log commit is already done. Also, it uses
list_del_init() to remove itself from the list, but there's no way to
know if btrfs_remove_all_log_ctxs has already run, so we don't know for
sure if it is safe to call list_del_init().
This gets rid of all the shortcuts for btrfs_remove_all_log_ctxs(), and
just calls it with the proper locking.
This is part two of the corruption fixed by cbd60aa7cd1. I should have
done this in the first place, but convinced myself the optimizations were
safe. A 12 hour run of dbench 2048 will eventually trigger a list debug
WARN_ON for the list_del_init() in btrfs_sync_log().
Fixes: d1433debe7f4346cf9fc0dafc71c3137d2a97bc4 Reported-by: Dave Jones <davej@codemonkey.org.uk>
cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Chris Mason <clm@fb.com>
Linus Torvalds [Thu, 27 Oct 2016 17:08:58 +0000 (10:08 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes: one is a fatal section mismatch (reference to init
after it's discarded) and the other two are iscsi locking fixes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: NCR5380: no longer mark irq probing as __init
scsi: be2iscsi: Replace _bh with _irqsave/irqrestore
scsi: libiscsi: Fix locking in __iscsi_conn_send_pdu
Linus Torvalds [Thu, 27 Oct 2016 17:07:13 +0000 (10:07 -0700)]
Merge branch 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"The AHCI MSI handling change in rc1 was a bit broken and caused disk
probing failures on some machines. These three patches should fix the
issues"
David Howells comments:
"My test machine fell foul of this using a PCIe M.2-attached SSD card.
The patches fix it for me"
* 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: fix the single MSI-X case in ahci_init_one
ahci: fix nvec check
ahci: only try to use multi-MSI mode if there is more than 1 port
Linus Torvalds [Thu, 27 Oct 2016 17:05:31 +0000 (10:05 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes for this series, most notably the fix for the blk-mq
software queue regression in from this merge window.
Apart from that, a fix for an unlikely hang if a queue is flooded with
FUA requests from Ming, and a few small fixes for nbd and badblocks.
Lastly, a rename update for the proc softirq output, since the block
polling code was made generic"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: update hardware and software queues for sleeping alloc
block: flush: fix IO hang in case of flood fua req
nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdown
badblocks: badblocks_set/clear update unacked_exist
softirq: Display IRQ_POLL for irq-poll statistics
Linus Torvalds [Wed, 26 Oct 2016 17:15:30 +0000 (10:15 -0700)]
mm: remove per-zone hashtable of bitlock waitqueues
The per-zone waitqueues exist because of a scalability issue with the
page waitqueues on some NUMA machines, but it turns out that they hurt
normal loads, and now with the vmalloced stacks they also end up
breaking gfs2 that uses a bit_wait on a stack object:
where 'gh' can be a reference to the local variable 'mount_gh' on the
stack of fill_super().
The reason the per-zone hash table breaks for this case is that there is
no "zone" for virtual allocations, and trying to look up the physical
page to get at it will fail (with a BUG_ON()).
It turns out that I actually complained to the mm people about the
per-zone hash table for another reason just a month ago: the zone lookup
also hurts the regular use of "unlock_page()" a lot, because the zone
lookup ends up forcing several unnecessary cache misses and generates
horrible code.
As part of that earlier discussion, we had a much better solution for
the NUMA scalability issue - by just making the page lock have a
separate contention bit, the waitqueue doesn't even have to be looked at
for the normal case.
Peter Zijlstra already has a patch for that, but let's see if anybody
even notices. In the meantime, let's fix the actual gfs2 breakage by
simplifying the bitlock waitqueues and removing the per-zone issue.
Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jens Axboe [Thu, 27 Oct 2016 15:49:19 +0000 (09:49 -0600)]
blk-mq: update hardware and software queues for sleeping alloc
If we end up sleeping due to running out of requests, we should
update the hardware and software queues in the map ctx structure.
Otherwise we could end up having rq->mq_ctx point to the pre-sleep
context, and risk corrupting ctx->rq_list since we'll be
grabbing the wrong lock when inserting the request.
Reported-by: Dave Jones <davej@codemonkey.org.uk> Reported-by: Chris Mason <clm@fb.com> Tested-by: Chris Mason <clm@fb.com> Fixes: 63581af3f31e ("blk-mq: remove non-blocking pass in blk_mq_map_request") Signed-off-by: Jens Axboe <axboe@fb.com>
Nicholas Piggin [Thu, 13 Oct 2016 02:17:14 +0000 (13:17 +1100)]
powerpc/64s: relocation, register save fixes for system reset interrupt
This patch does a couple of things. First of all, powernv immediately
explodes when running a relocated kernel, because the system reset
exception for handling sleeps does not do correct relocated branches.
Secondly, the sleep handling code trashes the condition and cfar
registers, which we would like to preserve for debugging purposes (for
non-sleep case exception).
This patch changes the exception to use the standard format that saves
registers before any tests or branches are made. It adds the test for
idle-wakeup as an "extra" to break out of the normal exception path.
Then it branches to a relocated idle handler that calls the various
idle handling functions.
After this patch, POWER8 CPU simulator now boots powernv kernel that is
running at non-zero.
Fixes: 948cf67c4726 ("powerpc: Add NAP mode support on Power7 in HV mode") Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/mm/radix: Use tlbiel only if we ever ran on the current cpu
Before this patch, we used tlbiel, if we ever ran only on this core.
That was mostly derived from the nohash usage of the same. But is
incorrect, the ISA 3.0 clarifies tlbiel such that:
"All TLB entries that have all of the following properties are made
invalid on the thread executing the tlbiel instruction"
ie. tlbiel only invalidates TLB entries on the current thread. So if the
mm has been used on any other thread (aka. cpu) then we must broadcast
the invalidate.
This bug could lead to invalid TLB entries if a program runs on multiple
threads of a core.
Hence use tlbiel, if we only ever ran on only the current cpu.
powerpc/process: Fix CONFIG_ALIVEC typo in restore_tm_state()
It should be ALTIVEC, not ALIVEC.
Cyril explains: If a thread performs a transaction with altivec and then
gets preempted for whatever reason, this bug may cause the kernel to not
re-enable altivec when that thread runs again. This will result in an
altivec unavailable fault, when that fault happens inside a user
transaction the kernel has no choice but to enable altivec and doom the
transaction.
The result is that transactions using altivec may get aborted more often
than they should.
The difficulty in catching this with a selftest is my deliberate use of
the word may above. Optimisations to avoid FPU/altivec/VSX faults mean
that the kernel will always leave them on for 255 switches. This code
prevents the kernel turning it off if it got to the 256th switch (and
userspace was transactional).
Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Marcel Hasler [Wed, 26 Oct 2016 22:42:27 +0000 (00:42 +0200)]
ALSA: usb-audio: Add quirk for Syntek STK1160
The stk1160 chip needs QUIRK_AUDIO_ALIGN_TRANSFER. This patch resolves
the issue reported on the mailing list
(http://marc.info/?l=linux-sound&m=139223599126215&w=2) and also fixes
bug 180071 (https://bugzilla.kernel.org/show_bug.cgi?id=180071).
... improved objtool's ability to detect GCC switch statement jump
tables for GCC 6. However the check to allow short jumps with the
scanned range of instructions wasn't quite right. The pattern detection
should allow jumps to the indirect jump instruction itself.
This fixes the following warning:
drivers/infiniband/sw/rxe/rxe_comp.o: warning: objtool: rxe_completer()+0x315: sibling call from callable instruction with changed frame pointer
Artem Savkov [Wed, 26 Oct 2016 14:02:09 +0000 (15:02 +0100)]
security/keys: make BIG_KEYS dependent on stdrng.
Since BIG_KEYS can't be compiled as module it requires one of the "stdrng"
providers to be compiled into kernel. Otherwise big_key_crypto_init() fails
on crypto_alloc_rng step and next dereference of big_key_skcipher (e.g. in
big_key_preparse()) results in a NULL pointer dereference.
Fixes: 13100a72f40f5748a04017e0ab3df4cf27c809ef ('Security: Keys: Big keys stored encrypted') Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Stephan Mueller <smueller@chronox.de>
cc: Kirill Marinushkin <k.marinushkin@gmail.com>
cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
David Howells [Wed, 26 Oct 2016 14:02:01 +0000 (15:02 +0100)]
KEYS: Sort out big_key initialisation
big_key has two separate initialisation functions, one that registers the
key type and one that registers the crypto. If the key type fails to
register, there's no problem if the crypto registers successfully because
there's no way to reach the crypto except through the key type.
However, if the key type registers successfully but the crypto does not,
big_key_rng and big_key_blkcipher may end up set to NULL - but the code
neither checks for this nor unregisters the big key key type.
Furthermore, since the key type is registered before the crypto, it is
theoretically possible for the kernel to try adding a big_key before the
crypto is set up, leading to the same effect.
Fix this by merging big_key_crypto_init() and big_key_init() and calling
the resulting function late. If they're going to be encrypted, we
shouldn't be creating big_keys before we have the facilities to do the
encryption available. The key type registration is also moved after the
crypto initialisation.
The fix also includes message printing on failure.
If the big_key type isn't correctly set up, simply doing:
dd if=/dev/zero bs=4096 count=1 | keyctl padd big_key a @s
ought to cause an oops.
Fixes: 13100a72f40f5748a04017e0ab3df4cf27c809ef ('Security: Keys: Big keys stored encrypted') Signed-off-by: David Howells <dhowells@redhat.com>
cc: Peter Hlavaty <zer0mem@yahoo.com>
cc: Kirill Marinushkin <k.marinushkin@gmail.com>
cc: Artem Savkov <asavkov@redhat.com>
cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
David Howells [Wed, 26 Oct 2016 14:01:54 +0000 (15:01 +0100)]
KEYS: Fix short sprintf buffer in /proc/keys show function
This fixes CVE-2016-7042.
Fix a short sprintf buffer in proc_keys_show(). If the gcc stack protector
is turned on, this can cause a panic due to stack corruption.
The problem is that xbuf[] is not big enough to hold a 64-bit timeout
rendered as weeks:
(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
$2 = 30500568904943
That's 14 chars plus NUL, not 11 chars plus NUL.
Expand the buffer to 16 chars.
I think the unpatched code apparently works if the stack-protector is not
enabled because on a 32-bit machine the buffer won't be overflowed and on a
64-bit machine there's a 64-bit aligned pointer at one side and an int that
isn't checked again on the other side.
Reported-by: Ondrej Kozina <okozina@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ondrej Kozina <okozina@redhat.com>
cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
Yisheng Xie [Fri, 21 Oct 2016 08:13:55 +0000 (16:13 +0800)]
arm64/numa: fix pcpu_cpu_distance() to get correct CPU proximity
The pcpu_build_alloc_info() function group CPUs according to their
proximity, by call callback function @cpu_distance_fn from different
ARCHs.
For arm64 the callback of @cpu_distance_fn is
pcpu_cpu_distance(from, to)
-> node_distance(from, to)
The @from and @to for function node_distance() should be nid.
However, pcpu_cpu_distance() in arch/arm64/mm/numa.c just past the
cpu id for @from and @to, and didn't convert to numa node id.
For this incorrect cpu proximity get from ARCH, it may cause each CPU
in one group and make group_cnt out of bound:
setup_per_cpu_areas()
pcpu_embed_first_chunk()
pcpu_build_alloc_info()
in pcpu_build_alloc_info, since cpu_distance_fn will return
REMOTE_DISTANCE if we pass cpu ids (0,1,2...), so
cpu_distance_fn(cpu, tcpu) > LOCAL_DISTANCE will wrongly be ture.
This may results in triggering the BUG_ON(unit != nr_units) later:
Ming Lei [Wed, 26 Oct 2016 08:57:15 +0000 (16:57 +0800)]
block: flush: fix IO hang in case of flood fua req
This patch fixes one issue reported by Kent, which can
be triggered in bcachefs over sata disk. Actually it
is a generic issue in block flush vs. blk-tag.
Cc: Christoph Hellwig <hch@infradead.org> Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Steven Rostedt [Mon, 24 Oct 2016 19:01:48 +0000 (15:01 -0400)]
x86: Fix export for mcount and __fentry__
Commit 784d5699eddc5 ("x86: move exports to actual definitions") removed the
EXPORT_SYMBOL(__fentry__) and EXPORT_SYMBOL(mcount) from x8664_ksyms_64.c,
and added EXPORT_SYMBOL(function_hook) in mcount_64.S instead. The problem
is that function_hook isn't a function at all, but a macro that is defined
as either mcount or __fentry__ depending on the support from gcc.
Originally, I thought this was a macro issue, like what __stringify()
is used for. But the problem is a bit deeper. The Makefile.build has
some magic that does post processing of files to create the CRC
bindings. It does some searches for EXPORT_SYMBOL() and because it
finds a macro name and not the actual functions, this causes
function_hook not to be converted into mcount or __fentry__ and they
are missed.
Instead of adding more magic to Makefile.build, just add
EXPORT_SYMBOL() for mcount and __fentry__ where the ifdef is used.
Since this is assembly and not C, it doesn't require being set after
the function is defined.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Borislav Petkov <bp@alien8.de> Cc: Gabriel C <nix.or.die@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Link: http://lkml.kernel.org/r/20161024150148.4f9d90e4@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit 92ca8d20dee2 ("genirq/msi: Switch to new irq spreading")
introduced new parameter to msi_init_setup and but did not update
docbook comments. Fixes 'make htmldocs' warning.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Cc: bhelgaas@google.com Cc: linux-pci@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Dave Airlie [Mon, 24 Oct 2016 05:37:48 +0000 (15:37 +1000)]
drm/drivers: add support for using the arch wc mapping API.
This fixes a regression in all these drivers since the cache
mode tracking was fixed for mixed mappings. It uses the new
arch API to add the VRAM range to the PAT mapping tracking
tables.
Fixes: 87744ab3832 (mm: fix cache mode tracking in vm_insert_mixed()) Reviewed-by: Christian König <christian.koenig@amd.com>. Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 24 Oct 2016 05:27:59 +0000 (15:27 +1000)]
x86/io: add interface to reserve io memtype for a resource range. (v1.1)
A recent change to the mm code in: 87744ab3832b mm: fix cache mode tracking in vm_insert_mixed()
started enforcing checking the memory type against the registered list for
amixed pfn insertion mappings. It happens that the drm drivers for a number
of gpus relied on this being broken. Currently the driver only inserted
VRAM mappings into the tracking table when they came from the kernel,
and userspace mappings never landed in the table. This led to a regression
where all the mapping end up as UC instead of WC now.
I've considered a number of solutions but since this needs to be fixed
in fixes and not next, and some of the solutions were going to introduce
overhead that hadn't been there before I didn't consider them viable at
this stage. These mainly concerned hooking into the TTM io reserve APIs,
but these API have a bunch of fast paths I didn't want to unwind to add
this to.
The solution I've decided on is to add a new API like the arch_phys_wc
APIs (these would have worked but wc_del didn't take a range), and
use them from the drivers to add a WC compatible mapping to the table
for all VRAM on those GPUs. This means we can then create userspace
mapping that won't get degraded to UC.
v1.1: use CONFIG_X86_PAT + add some comments in io.h
Cc: Toshi Kani <toshi.kani@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: x86@kernel.org Cc: mcgrof@suse.com Cc: Dan Williams <dan.j.williams@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
Rusty Russell [Tue, 25 Oct 2016 23:37:44 +0000 (10:07 +1030)]
MAINTAINERS: Begin module maintainer transition
Being a Linux kernel maintainer has been my proudest professional
accomplishment, spanning the last 19 years. But now we have a surfeit
of excellent hackers, and I can hand this over without regret.
I'll still be around as co-maintainer for another cycle, but Jessica
is now the one to convince if you want your patches applied. She
rocks, and is far more timely than me too!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Jessica Yu <jeyu@redhat.com>
We need to make sure hpriv->irq is set properly if we don't use per-port
vectors, so switch from blindly assigning pdev->irq to using
pci_irq_vector, which handles all interrupt types correctly.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Robert Richter <robert.richter@cavium.com> Tested-by: Robert Richter <robert.richter@cavium.com> Tested-by: David Daney <ddaney.cavm@gmail.com> Fixes: 0b9e2988ab22 ("ahci: use pci_alloc_irq_vectors") Signed-off-by: Tejun Heo <tj@kernel.org>
Thomas Gleixner [Sat, 22 Oct 2016 11:07:37 +0000 (11:07 +0000)]
timers: Prevent base clock corruption when forwarding
When a timer is enqueued we try to forward the timer base clock. This
mechanism has two issues:
1) Forwarding a remote base unlocked
The forwarding function is called from get_target_base() with the current
timer base lock held. But if the new target base is a different base than
the current base (can happen with NOHZ, sigh!) then the forwarding is done
on an unlocked base. This can lead to corruption of base->clk.
Solution is simple: Invoke the forwarding after the target base is locked.
2) Possible corruption due to jiffies advancing
This is similar to the issue in get_net_timer_interrupt() which was fixed
in the previous patch. jiffies can advance between check and assignement
and therefore advancing base->clk beyond the next expiry value.
So we need to read jiffies into a local variable once and do the checks and
assignment with the local copy.
Fixes: a683f390b93f("timers: Forward the wheel clock whenever possible") Reported-by: Ashton Holmes <scoopta@gmail.com> Reported-by: Michael Thayer <michael.thayer@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Michal Necasek <michal.necasek@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: knut.osmundsen@oracle.com Cc: stable@vger.kernel.org Cc: stern@rowland.harvard.edu Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161022110552.253640125@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
jiffies has not advanced since run_timers(), so this assignment effectively
decrements base->clk by one.
base->clk is the index into the timer wheel arrays. So let's assume the
following state after the base->clk increment in run_timers():
jiffies = 0
base->clk = 1
A timer gets enqueued with an expiry delta of 63 ticks (which is the case
with the USB timeout and HZ=250) so the resulting bucket index is:
base->clk + delta = 1 + 63 = 64
The timer goes into the first wheel level. The array size is 64 so it ends
up in bucket 0, which is correct as it takes 63 ticks to advance base->clk
to index into bucket 0 again.
If the cpu goes idle before jiffies advance, then the bug in the forwarding
mechanism sets base->clk back to 0, so the next invocation of run_timers()
at the next tick will index into bucket 0 and therefore expire the timer 62
ticks too early.
Instead of blindly setting base->clk to jiffies we must make the forwarding
conditional on jiffies > base->clk, but we cannot use jiffies for this as
we might run into the following issue:
if (time_after(jiffies, base->clk) {
if (time_after(nextevt, base->clk))
base->clk = jiffies;
jiffies can increment between the check and the assigment far enough to
advance beyond nextevt. So we need to use a stable value for checking.
get_next_timer_interrupt() has the basej argument which is the jiffies
value snapshot taken in the calling code. So we can just that.
Thanks to Ashton for bisecting and providing trace data!
Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible") Reported-by: Ashton Holmes <scoopta@gmail.com> Reported-by: Michael Thayer <michael.thayer@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Michal Necasek <michal.necasek@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: knut.osmundsen@oracle.com Cc: stable@vger.kernel.org Cc: stern@rowland.harvard.edu Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161022110552.175308322@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Mon, 24 Oct 2016 09:55:10 +0000 (11:55 +0200)]
timers: Lock base for same bucket optimization
Linus stumbled over the unlocked modification of the timer expiry value in
mod_timer() which is an optimization for timers which stay in the same
bucket - due to the bucket granularity - despite their expiry time getting
updated.
The optimization itself still makes sense even if we take the lock, because
in case that the bucket stays the same, we avoid the pointless
queue/enqueue dance.
Make the check and the modification of timer->expires protected by the base
lock and shuffle the remaining code around so we can keep the lock held
when we actually have to requeue the timer to a different bucket.
Fixes: f00c0afdfa62 ("timers: Implement optimization for same expiry time in mod_timer()") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org>
Thomas Gleixner [Mon, 24 Oct 2016 09:41:56 +0000 (11:41 +0200)]
timers: Plug locking race vs. timer migration
Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the
timer flags. As a consequence the compiler is allowed to reload the flags
between the initial check for TIMER_MIGRATION and the following timer base
computation and the spin lock of the base.
While this has not been observed (yet), we need to make sure that it never
happens.
Fixes: 0eeda71bc30d ("timer: Replace timer base by a cpu index") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org>
Takashi Iwai [Tue, 25 Oct 2016 13:56:35 +0000 (15:56 +0200)]
ALSA: seq: Fix time account regression
The recent rewrite of the sequencer time accounting using timespec64
in the commit [3915bf294652: ALSA: seq_timer: use monotonic times
internally] introduced a bad regression. Namely, the time reported
back doesn't increase but goes back and forth.
The culprit was obvious: the delta is stored to the result (cur_time =
delta), instead of adding the delta (cur_time += delta)!
Stefan Agner [Tue, 27 Sep 2016 00:18:58 +0000 (17:18 -0700)]
i2c: imx: defer probe if bus recovery GPIOs are not ready
Some SoC might load the GPIO driver after the I2C driver and
using the I2C bus recovery mechanism via GPIOs. In this case
it is crucial to defer probing if the GPIO request functions
do so, otherwise the I2C driver gets loaded without recovery
mechanisms enabled.
Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
Jarkko Nikula [Thu, 29 Sep 2016 13:04:59 +0000 (16:04 +0300)]
i2c: designware: Avoid aborted transfers with fast reacting I2C slaves
I2C DesignWare may abort transfer with arbitration lost if I2C slave pulls
SDA down quickly after falling edge of SCL. Reason for this is unknown but
after trial and error it was found this can be avoided by enabling non-zero
SDA RX hold time for the receiver.
By the specification SDA RX hold time extends incoming SDA low to high
transition by n * ic_clk cycles but only when SCL is high. However it
seems to help avoid above faulty arbitration lost error.
Bits 23:16 in IC_SDA_HOLD register define the SDA RX hold time for the
receiver. Be conservative and enable 1 ic_clk cycle long hold time in
case boot firmware hasn't set it up.
Reported-by: Jukka Laitinen <jukka.laitinen@intel.com> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Tested-by: Jukka Laitinen <jukka.laitinen@intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Jean Delvare [Tue, 11 Oct 2016 11:13:27 +0000 (13:13 +0200)]
i2c: i801: Fix I2C Block Read on 8-Series/C220 and later
Starting with the 8-Series/C220 PCH (Lynx Point), the SMBus
controller includes a SPD EEPROM protection mechanism. Once the SPD
Write Disable bit is set, only reads are allowed to slave addresses
0x50-0x57.
However the legacy implementation of I2C Block Read since the ICH5
looks like a write, and is therefore blocked by the SPD protection
mechanism. This causes the eeprom and at24 drivers to fail.
So assume that I2C Block Read is implemented as an actual read on
these chipsets. I tested it on my Q87 chipset and it seems to work
just fine.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
[wsa: rebased to v4.9-rc2] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Greg Ungerer [Mon, 17 Oct 2016 01:54:05 +0000 (11:54 +1000)]
i2c: allow configuration of imx driver for ColdFire architecture
The i2c controller used by Freescales iMX processors is the same
hardware module used on Freescales ColdFire family of processors.
We can use the existing i2c-imx driver on ColdFire family members.
Modify the configuration to allow it to be selected when compiling
for ColdFire targets.
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Ralf Ramsauer [Mon, 17 Oct 2016 13:59:57 +0000 (15:59 +0200)]
i2c: mark device nodes only in case of successful instantiation
Instantiated I2C device nodes are marked with OF_POPULATE. This was
introduced in 4f001fd30145a6. On unloading, loaded device nodes will of
course be unmarked. The problem are nodes that fail during
initialisation: If a node fails, it won't be unloaded and hence not be
unmarked.
If a I2C driver module is unloaded and reloaded, it will skip nodes that
failed before.
Skip device nodes that are already populated and mark them only in case
of success.
Fixes: 4f001fd30145a6 ("i2c: Mark instantiated device nodes with OF_POPULATE") Signed-off-by: Ralf Ramsauer <ralf@ramses-pyramidenbau.de> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
[wsa: use 14-digit commit sha] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
Arnd Bergmann [Mon, 24 Oct 2016 15:33:18 +0000 (17:33 +0200)]
x86/quirks: Hide maybe-uninitialized warning
gcc -Wmaybe-uninitialized detects that quirk_intel_brickland_xeon_ras_cap
uses uninitialized data when CONFIG_PCI is not set:
arch/x86/kernel/quirks.c: In function ‘quirk_intel_brickland_xeon_ras_cap’:
arch/x86/kernel/quirks.c:641:13: error: ‘capid0’ is used uninitialized in this function [-Werror=uninitialized]
However, the function is also not called in this configuration, so we
can avoid the warning by moving the existing #ifdef to cover it as well.
The NMI hit in the entry code right after setting up the stack pointer
from 'cpu_current_top_of_stack', so the kernel stack was empty. The
'guess' version of __unwind_start() attempted to dereference the "top of
stack" pointer, which is not actually *on* the stack.
Add a check in the guess unwinder to deal with an empty stack. (The
frame pointer unwinder already has such a check.)
Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 7c7900f89770 ("x86/unwind: Add new unwind interface and implementations") Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>