]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
5 years agomm/page_alloc.c: ratelimit allocation failure warnings more aggressively
Johannes Weiner [Wed, 6 Nov 2019 05:16:51 +0000 (21:16 -0800)]
mm/page_alloc.c: ratelimit allocation failure warnings more aggressively

While investigating a bug related to higher atomic allocation failures,
we noticed the failure warnings positively drowning the console, and in
our case trigger lockup warnings because of a serial console too slow to
handle all that output.

But even if we had a faster console, it's unclear what additional
information the current level of repetition provides.

Allocation failures happen for three reasons: The machine is OOM, the VM
is failing to handle reasonable requests, or somebody is making
unreasonable requests (and didn't acknowledge their opportunism with
__GFP_NOWARN).  Having the memory dump, a callstack, and the ratelimit
stats on skipped failure warnings should provide enough information to
let users/admins/developers know whether something is wrong and point
them in the right direction for debugging, bpftracing etc.

Limit allocation failure warnings to one spew every ten seconds.

Link: http://lkml.kernel.org/r/20191028194906.26899-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/khugepaged: fix might_sleep() warn with CONFIG_HIGHPTE=y
Ville Syrjälä [Wed, 6 Nov 2019 05:16:48 +0000 (21:16 -0800)]
mm/khugepaged: fix might_sleep() warn with CONFIG_HIGHPTE=y

I got some khugepaged spew on a 32bit x86:

  BUG: sleeping function called from invalid context at include/linux/mmu_notifier.h:346
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 25, name: khugepaged
  INFO: lockdep is turned off.
  CPU: 1 PID: 25 Comm: khugepaged Not tainted 5.4.0-rc5-elk+ #206
  Hardware name: System manufacturer P5Q-EM/P5Q-EM, BIOS 2203    07/08/2009
  Call Trace:
   dump_stack+0x66/0x8e
   ___might_sleep.cold.96+0x95/0xa6
   __might_sleep+0x2e/0x80
   collapse_huge_page.isra.51+0x5ac/0x1360
   khugepaged+0x9a9/0x20f0
   kthread+0xf5/0x110
   ret_from_fork+0x2e/0x38

Looks like it's due to CONFIG_HIGHPTE=y pte_offset_map()->kmap_atomic()
vs.  mmu_notifier_invalidate_range_start().  Let's do the naive approach
and just reorder the two operations.

Link: http://lkml.kernel.org/r/20191029201513.GG1208@intel.com
Fixes: 8b5f461ebd1003 ("mm/mmu_notifiers: annotate with might_sleep()")
Signed-off-by: Ville Syrjl <ville.syrjala@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo
Michal Hocko [Wed, 6 Nov 2019 05:16:44 +0000 (21:16 -0800)]
mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo

pagetypeinfo_showfree_print is called by zone->lock held in irq mode.
This is not really nice because it blocks both any interrupts on that
cpu and the page allocator.  On large machines this might even trigger
the hard lockup detector.

Considering the pagetypeinfo is a debugging tool we do not really need
exact numbers here.  The primary reason to look at the outuput is to see
how pageblocks are spread among different migratetypes and low number of
pages is much more interesting therefore putting a bound on the number
of pages on the free_list sounds like a reasonable tradeoff.

The new output will simply tell
  [...]
  Node    6, zone   Normal, type      Movable >100000 >100000 >100000 >100000  41019  31560  23996  10054   3229    983    648

instead of
  Node    6, zone   Normal, type      Movable 399568 294127 221558 102119  41019  31560  23996  10054   3229    983    648

The limit has been chosen arbitrary and it is a subject of a future
change should there be a need for that.

While we are at it, also drop the zone lock after each free_list
iteration which will help with the IRQ and page allocator responsiveness
even further as the IRQ lock held time is always bound to those 100k
pages.

[akpm@linux-foundation.org: tweak comment text, per David Hildenbrand]
Link: http://lkml.kernel.org/r/20191025072610.18526-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm, vmstat: hide /proc/pagetypeinfo from normal users
Michal Hocko [Wed, 6 Nov 2019 05:16:40 +0000 (21:16 -0800)]
mm, vmstat: hide /proc/pagetypeinfo from normal users

/proc/pagetypeinfo is a debugging tool to examine internal page
allocator state wrt to fragmentation.  It is not very useful for any
other use so normal users really do not need to read this file.

Waiman Long has noticed that reading this file can have negative side
effects because zone->lock is necessary for gathering data and that a)
interferes with the page allocator and its users and b) can lead to hard
lockups on large machines which have very long free_list.

Reduce both issues by simply not exporting the file to regular users.

Link: http://lkml.kernel.org/r/20191025072610.18526-2-mhocko@kernel.org
Fixes: 0397be336e9e ("Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Waiman Long <longman@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Jann Horn <jannh@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/mmu_notifiers: use the right return code for WARN_ON
Jason Gunthorpe [Wed, 6 Nov 2019 05:16:37 +0000 (21:16 -0800)]
mm/mmu_notifiers: use the right return code for WARN_ON

The return code from the op callback is actually in _ret, while the
WARN_ON was checking ret which causes it to misfire.

Link: http://lkml.kernel.org/r/20191025175502.GA31127@ziepe.ca
Fixes: fafe3afb7a8e ("mm/mmu_notifiers: check if mmu notifier callbacks are allowed to fail")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoocfs2: protect extent tree in ocfs2_prepare_inode_for_write()
Shuning Zhang [Wed, 6 Nov 2019 05:16:34 +0000 (21:16 -0800)]
ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()

When the extent tree is modified, it should be protected by inode
cluster lock and ip_alloc_sem.

The extent tree is accessed and modified in the
ocfs2_prepare_inode_for_write, but isn't protected by ip_alloc_sem.

The following is a case.  The function ocfs2_fiemap is accessing the
extent tree, which is modified at the same time.

  kernel BUG at fs/ocfs2/extent_map.c:475!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: tun ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue [...]
  CPU: 16 PID: 14047 Comm: o2info Not tainted 4.1.12-124.23.1.el6uek.x86_64 #2
  Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42040600 10/19/2018
  task: ffff88019487e200 ti: ffff88003daa4000 task.ti: ffff88003daa4000
  RIP: ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2]
  Call Trace:
    ocfs2_fiemap+0x1e3/0x430 [ocfs2]
    do_vfs_ioctl+0x155/0x510
    SyS_ioctl+0x81/0xa0
    system_call_fastpath+0x18/0xd8
  Code: 18 48 c7 c6 60 7f 65 a0 31 c0 bb e2 ff ff ff 48 8b 4a 40 48 8b 7a 28 48 c7 c2 78 2d 66 a0 e8 38 4f 05 00 e9 28 fe ff ff 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 bb 86 ff ff ff e9 13 fe ff ff 66 0f 1f
  RIP  ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2]
  ---[ end trace c8aa0c8180e869dc ]---
  Kernel panic - not syncing: Fatal exception
  Kernel Offset: disabled

This issue can be reproduced every week in a production environment.

This issue is related to the usage mode.  If others use ocfs2 in this
mode, the kernel will panic frequently.

[akpm@linux-foundation.org: coding style fixes]
[Fix new warning due to unused function by removing said function - Linus ]
Link: http://lkml.kernel.org/r/1568772175-2906-2-git-send-email-sunny.s.zhang@oracle.com
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: thp: handle page cache THP correctly in PageTransCompoundMap
Yang Shi [Wed, 6 Nov 2019 05:16:30 +0000 (21:16 -0800)]
mm: thp: handle page cache THP correctly in PageTransCompoundMap

We have a usecase to use tmpfs as QEMU memory backend and we would like
to take the advantage of THP as well.  But, our test shows the EPT is
not PMD mapped even though the underlying THP are PMD mapped on host.
The number showed by /sys/kernel/debug/kvm/largepage is much less than
the number of PMD mapped shmem pages as the below:

  7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
  Size:            4194304 kB
  [snip]
  AnonHugePages:         0 kB
  ShmemPmdMapped:   579584 kB
  [snip]
  Locked:                0 kB

  cat /sys/kernel/debug/kvm/largepages
  12

And some benchmarks do worse than with anonymous THPs.

By digging into the code we figured out that commit 0b9f3899d79a ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks if
there is a single PTE mapping on the page for anonymous THP when setting
up EPT map.  But the _mapcount < 0 check doesn't work for page cache THP
since every subpage of page cache THP would get _mapcount inc'ed once it
is PMD mapped, so PageTransCompoundMap() always returns false for page
cache THP.  This would prevent KVM from setting up PMD mapped EPT entry.

So we need handle page cache THP correctly.  However, when page cache
THP's PMD gets split, kernel just remove the map instead of setting up
PTE map like what anonymous THP does.  Before KVM calls get_user_pages()
the subpages may get PTE mapped even though it is still a THP since the
page cache THP may be mapped by other processes at the mean time.

Checking its _mapcount and whether the THP has PTE mapped or not.
Although this may report some false negative cases (PTE mapped by other
processes), it looks not trivial to make this accurate.

With this fix /sys/kernel/debug/kvm/largepage would show reasonable
pages are PMD mapped by EPT as the below:

  7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
  Size:            4194304 kB
  [snip]
  AnonHugePages:         0 kB
  ShmemPmdMapped:   557056 kB
  [snip]
  Locked:                0 kB

  cat /sys/kernel/debug/kvm/largepages
  271

And the benchmarks are as same as anonymous THPs.

[yang.shi@linux.alibaba.com: v4]
Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: b93b9b57b171 ("rmap: support file thp")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Gang Deng <gavin.dg@linux.alibaba.com>
Tested-by: Gang Deng <gavin.dg@linux.alibaba.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm, meminit: recalculate pcpu batch and high limits after init completes
Mel Gorman [Wed, 6 Nov 2019 05:16:27 +0000 (21:16 -0800)]
mm, meminit: recalculate pcpu batch and high limits after init completes

Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page
allocator (pcpu) calculates the number of pages allocated/freed in
batches as well as the maximum number of pages allowed on a per-cpu
list.  As zone->managed_pages is not up to date yet, the pcpu
initialisation calculates inappropriately low batch and high values.

This increases zone lock contention quite severely in some cases with
the degree of severity depending on how many CPUs share a local zone and
the size of the zone.  A private report indicated that kernel build
times were excessive with extremely high system CPU usage.  A perf
profile indicated that a large chunk of time was lost on zone->lock
contention.

This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system.  It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.

mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.

kernbench
                                5.4.0-rc3              5.4.0-rc3
                                  vanilla           resetpcpu-v2
Amean     user-256    13249.50 (   0.00%)    16401.31 * -23.79%*
Amean     syst-256    14760.30 (   0.00%)     4448.39 *  69.86%*
Amean     elsp-256      162.42 (   0.00%)      119.13 *  26.65%*
Stddev    user-256       42.97 (   0.00%)       19.15 (  55.43%)
Stddev    syst-256      336.87 (   0.00%)        6.71 (  98.01%)
Stddev    elsp-256        2.46 (   0.00%)        0.39 (  84.03%)

                   5.4.0-rc3    5.4.0-rc3
                     vanilla resetpcpu-v2
Duration User       39766.24     49221.79
Duration System     44298.10     13361.67
Duration Elapsed      519.11       388.87

The patch reduces system CPU usage by 69.86% and total build time by
26.65%.  The variance of system CPU usage is also much reduced.

Before, this was the breakdown of batch and high values over all zones
was:

    256               batch: 1
    256               batch: 63
    512               batch: 7
    256               high:  0
    256               high:  378
    512               high:  42

512 pcpu pagesets had a batch limit of 7 and a high limit of 42.  After
the patch:

    256               batch: 1
    768               batch: 63
    256               high:  0
    768               high:  378

[mgorman@techsingularity.net: fix merge/linkage snafu]
Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink:
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/gup_benchmark: fix MAP_HUGETLB case
John Hubbard [Wed, 6 Nov 2019 05:16:24 +0000 (21:16 -0800)]
mm/gup_benchmark: fix MAP_HUGETLB case

The MAP_HUGETLB ("-H" option) of gup_benchmark fails:

  $ sudo ./gup_benchmark -H
  mmap: Invalid argument

This is because gup_benchmark.c is passing in a file descriptor to
mmap(), but the fd came from opening up the /dev/zero file.  This
confuses the mmap syscall implementation, which thinks that, if the
caller did not specify MAP_ANONYMOUS, then the file must be a huge page
file.  So it attempts to verify that the file really is a huge page
file, as you can see here:

ksys_mmap_pgoff()
{
    if (!(flags & MAP_ANONYMOUS)) {
        retval = -EINVAL;
        if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
            goto out_fput; /* THIS IS WHERE WE END UP */

    else if (flags & MAP_HUGETLB) {
        ...proceed normally, /dev/zero is ok here...

...and of course is_file_hugepages() returns "false" for the /dev/zero
file.

The problem is that the user space program, gup_benchmark.c, really just
wants anonymous memory here.  The simplest way to get that is to pass
MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this
patch does.

Link: http://lkml.kernel.org/r/20191021212435.398153-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: memcontrol: fix NULL-ptr deref in percpu stats flush
Shakeel Butt [Wed, 6 Nov 2019 05:16:21 +0000 (21:16 -0800)]
mm: memcontrol: fix NULL-ptr deref in percpu stats flush

__mem_cgroup_free() can be called on the failure path in
mem_cgroup_alloc().  However memcg_flush_percpu_vmstats() and
memcg_flush_percpu_vmevents() which are called from __mem_cgroup_free()
access the fields of memcg which can potentially be null if called from
failure path from mem_cgroup_alloc().  Indeed syzbot has reported the
following crash:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 30393 Comm: syz-executor.1 Not tainted 5.4.0-rc2+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:memcg_flush_percpu_vmstats+0x4ae/0x930 mm/memcontrol.c:3436
Code: 05 41 89 c0 41 0f b6 04 24 41 38 c7 7c 08 84 c0 0f 85 5d 03 00 00 44 3b 05 33 d5 12 08 0f 83 e2 00 00 00 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 0f 85 91 03 00 00 48 8b 85 10 fe ff ff 48 8b b0 90
RSP: 0018:ffff888095c27980 EFLAGS: 00010206
RAX: 0000000000000012 RBX: ffff888095c27b28 RCX: ffffc90008192000
RDX: 0000000000040000 RSI: ffffffff8340fae7 RDI: 0000000000000007
RBP: ffff888095c27be0 R08: 0000000000000000 R09: ffffed1013f0da33
R10: ffffed1013f0da32 R11: ffff88809f86d197 R12: fffffbfff138b760
R13: dffffc0000000000 R14: 0000000000000090 R15: 0000000000000007
FS:  00007f5027170700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000710158 CR3: 00000000a7b18000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__mem_cgroup_free+0x1a/0x190 mm/memcontrol.c:5021
mem_cgroup_free mm/memcontrol.c:5033 [inline]
mem_cgroup_css_alloc+0x3a1/0x1ae0 mm/memcontrol.c:5160
css_create kernel/cgroup/cgroup.c:5156 [inline]
cgroup_apply_control_enable+0x44d/0xc40 kernel/cgroup/cgroup.c:3119
cgroup_mkdir+0x899/0x11b0 kernel/cgroup/cgroup.c:5401
kernfs_iop_mkdir+0x14d/0x1d0 fs/kernfs/dir.c:1124
vfs_mkdir+0x42e/0x670 fs/namei.c:3807
do_mkdirat+0x234/0x2a0 fs/namei.c:3830
__do_sys_mkdir fs/namei.c:3846 [inline]
__se_sys_mkdir fs/namei.c:3844 [inline]
__x64_sys_mkdir+0x5c/0x80 fs/namei.c:3844
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixing this by moving the flush to mem_cgroup_free as there is no need
to flush anything if we see failure in mem_cgroup_alloc().

Link: http://lkml.kernel.org/r/20191018165231.249872-1-shakeelb@google.com
Fixes: ce6b59abbaa8 ("mm: memcontrol: flush percpu vmevents before releasing memcg")
Fixes: 412ac8b533e6 ("mm: memcontrol: flush percpu vmstats before releasing memcg")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: syzbot+515d5bcfe179cdf049b2@syzkaller.appspotmail.com
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoLinux 5.4-rc6
Linus Torvalds [Sun, 3 Nov 2019 22:07:26 +0000 (14:07 -0800)]
Linux 5.4-rc6

5 years agoMerge tag 'usb-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 3 Nov 2019 16:25:25 +0000 (08:25 -0800)]
Merge tag 'usb-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "The USB sub-maintainers woke up this past week and sent a bunch of
  tiny fixes. Here are a lot of small patches that that resolve a bunch
  of reported issues in the USB core, drivers, serial drivers, gadget
  drivers, and of course, xhci :)

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (31 commits)
  usb: dwc3: gadget: fix race when disabling ep with cancelled xfers
  usb: cdns3: gadget: Fix g_audio use case when connected to Super-Speed host
  usb: cdns3: gadget: reset EP_CLAIMED flag while unloading
  USB: serial: whiteheat: fix line-speed endianness
  USB: serial: whiteheat: fix potential slab corruption
  USB: gadget: Reject endpoints with 0 maxpacket value
  UAS: Revert commit 18615eb8a2ac ("UAS: fix alignment of scatter/gather segments")
  usb-storage: Revert commit 4fd9527839f3 ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
  usbip: Fix free of unallocated memory in vhci tx
  usbip: tools: Fix read_usb_vudc_device() error path handling
  usb: xhci: fix __le32/__le64 accessors in debugfs code
  usb: xhci: fix Immediate Data Transfer endianness
  xhci: Fix use-after-free regression in xhci clear hub TT implementation
  USB: ldusb: fix control-message timeout
  USB: ldusb: use unsigned size format specifiers
  USB: ldusb: fix ring-buffer locking
  USB: Skip endpoints with 0 maxpacket length
  usb: cdns3: gadget: Don't manage pullups
  usb: dwc3: remove the call trace of USBx_GFLADJ
  usb: gadget: configfs: fix concurrent issue between composite APIs
  ...

5 years agoMerge tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 2 Nov 2019 21:34:00 +0000 (14:34 -0700)]
Merge tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fix from Steve French:
 "A small smb3 memleak fix"

* tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
  fix memory leak in large read decrypt offload

5 years agoMerge tag 'hwmon-for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groec...
Linus Torvalds [Sat, 2 Nov 2019 18:28:59 +0000 (11:28 -0700)]
Merge tag 'hwmon-for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Fix read timeout problem in ina3221 driver

 - Fix wrong bitmask in nct7904 driver

* tag 'hwmon-for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (ina3221) Fix read timeout issue
  hwmon: (nct7904) Fix the incorrect value of vsen_mask & tcpu_mask & temp_mode in nct7904_data struct.

5 years agoMerge tag 'pwm/for-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry...
Linus Torvalds [Sat, 2 Nov 2019 18:23:09 +0000 (11:23 -0700)]
Merge tag 'pwm/for-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm fixes from Thierry Reding:
 "It turned out that relying solely on drivers storing all the PWM state
  in hardware was a little premature and causes a number of subtle (and
  some not so subtle) regressions. Revert the offending patch for now"

* tag 'pwm/for-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  Revert "pwm: Let pwm_get_state() return the last implemented state"

5 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 2 Nov 2019 18:15:52 +0000 (11:15 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Nine changes, eight in drivers [ufs, target, lpfc x 2, qla2xxx x 4]
  and one core change in sd that fixes an I/O failure on DIF type 3
  devices"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla2xxx: stop timer in shutdown path
  scsi: sd: define variable dif as unsigned int instead of bool
  scsi: target: cxgbit: Fix cxgbit_fw4_ack()
  scsi: qla2xxx: Fix partial flash write of MBI
  scsi: qla2xxx: Initialized mailbox to prevent driver load failure
  scsi: lpfc: Honor module parameter lpfc_use_adisc
  scsi: ufs-bsg: Wake the device before sending raw upiu commands
  scsi: lpfc: Check queue pointer before use
  scsi: qla2xxx: fixup incorrect usage of host_byte

5 years agoMerge tag 'powerpc-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 2 Nov 2019 18:08:19 +0000 (11:08 -0700)]
Merge tag 'powerpc-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Our recent cleanup of EEH led to an oops on bare metal machines when
  the cxl (CAPI) driver creates virtual devices for an attached FPGA
  accelerator.

  The "secure virtual machine" support we added in v5.4 had a bug if the
  kernel was relocated (moved during boot), in those cases the signature
  of the kernel text wouldn't verify and the Ultravisor would refuse to
  run the VM.

  A recent change to disable interrupts before calling
  arch_cpu_idle_dead() caused a WARN_ON() in our bare metal CPU offline
  code to always trigger.

  The KUAP (SMAP) support we added for 32-bit Book3S had a bug if the
  address range crossed a segment (256MB) boundary which could lead to
  spurious faults.

  Thanks to: Christophe Leroy, Frederic Barrat, Michael Anderson,
  Nicholas Piggin, Sam Bobroff, Thiago Jung Bauermann"

* tag 'powerpc-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Fix CPU idle to be called with IRQs disabled
  powerpc/prom_init: Undo relocation before entering secure mode
  powerpc/powernv/eeh: Fix oops when probing cxl devices
  powerpc/32s: fix allow/prevent_user_access() when crossing segment boundaries.

5 years agoMerge tag 's390-5.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 2 Nov 2019 18:00:26 +0000 (11:00 -0700)]
Merge tag 's390-5.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix cpu idle time accounting

 - Fix stack unwinder case when both pt_regs and sp are specified

 - Fix information leak via cmm timeout proc handler

* tag 's390-5.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/idle: fix cpu idle time calculation
  s390/unwind: fix mixing regs and sp
  s390/cmm: fix information leak in cmm_timeout_handler()

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Sat, 2 Nov 2019 00:48:11 +0000 (17:48 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Fix free/alloc races in batmanadv, from Sven Eckelmann.

 2) Several leaks and other fixes in kTLS support of mlx5 driver, from
    Tariq Toukan.

 3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
    Høiland-Jørgensen.

 4) Add an r8152 device ID, from Kazutoshi Noguchi.

 5) Missing include in ipv6's addrconf.c, from Ben Dooks.

 6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
    easily infer the 32-bit secret otherwise etc.

 7) Several netdevice nesting depth fixes from Taehee Yoo.

 8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
    when doing lockless skb_queue_empty() checks, and accessing
    sk_napi_id/sk_incoming_cpu lockless as well.

 9) Fix jumbo packet handling in RXRPC, from David Howells.

10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.

11) Fix DMA synchronization in gve driver, from Yangchun Fu.

12) Several bpf offload fixes, from Jakub Kicinski.

13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.

14) Fix ping latency during high traffic rates in hisilicon driver, from
    Jiangfent Xiao.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  net: fix installing orphaned programs
  net: cls_bpf: fix NULL deref on offload filter removal
  selftests: bpf: Skip write only files in debugfs
  selftests: net: reuseport_dualstack: fix uninitalized parameter
  r8169: fix wrong PHY ID issue with RTL8168dp
  net: dsa: bcm_sf2: Fix IMP setup for port different than 8
  net: phylink: Fix phylink_dbg() macro
  gve: Fixes DMA synchronization.
  inet: stop leaking jiffies on the wire
  ixgbe: Remove duplicate clear_bit() call
  Documentation: networking: device drivers: Remove stray asterisks
  e1000: fix memory leaks
  i40e: Fix receive buffer starvation for AF_XDP
  igb: Fix constant media auto sense switching when no cable is connected
  net: ethernet: arc: add the missed clk_disable_unprepare
  igb: Enable media autosense for the i350.
  igb/igc: Don't warn on fatal read failures when the device is removed
  tcp: increase tcp_max_syn_backlog max value
  net: increase SOMAXCONN to 4096
  netdevsim: Fix use-after-free during device dismantle
  ...

5 years agoMerge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Sat, 2 Nov 2019 00:37:44 +0000 (17:37 -0700)]
Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "This contains two delegation fixes (with the RCU lock leak fix marked
  for stable), and three patches to fix destroying the the sunrpc back
  channel.

  Stable bugfixes:

   - Fix an RCU lock leak in nfs4_refresh_delegation_stateid()

  Other fixes:

   - The TCP back channel mustn't disappear while requests are
     outstanding

   - The RDMA back channel mustn't disappear while requests are
     outstanding

   - Destroy the back channel when we destroy the host transport

   - Don't allow a cached open with a revoked delegation"

* tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
  NFSv4: Don't allow a cached open with a revoked delegation
  SUNRPC: Destroy the back channel when we destroy the host transport
  SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
  SUNRPC: The TCP back channel mustn't disappear while requests are outstanding

5 years agoMerge tag 'for-linus-20191101' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 2 Nov 2019 00:33:12 +0000 (17:33 -0700)]
Merge tag 'for-linus-20191101' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Two small nvme fixes, one is a fabrics connection fix, the other one
   a cleanup made possible by that fix (Anton, via Keith)

 - Fix requeue handling in umb ubd (Anton)

 - Fix spin_lock_irq() nesting in blk-iocost (Dan)

 - Three small io_uring fixes:
     - Install io_uring fd after done with ctx (me)
     - Clear ->result before every poll issue (me)
     - Fix leak of shadow request on error (Pavel)

* tag 'for-linus-20191101' of git://git.kernel.dk/linux-block:
  iocost: don't nest spin_lock_irq in ioc_weight_write()
  io_uring: ensure we clear io_kiocb->result before each issue
  um-ubd: Entrust re-queue to the upper layers
  nvme-multipath: remove unused groups_only mode in ana log
  nvme-multipath: fix possible io hang after ctrl reconnect
  io_uring: don't touch ctx in setup after ring fd install
  io_uring: Fix leaked shadow_req

5 years agoMerge tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv...
Linus Torvalds [Sat, 2 Nov 2019 00:20:53 +0000 (17:20 -0700)]
Merge tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:
 "One fix for PCIe users:

   - Fix legacy PCI I/O port access emulation

  One set of cleanups:

   - Resolve most of the warnings generated by sparse across arch/riscv.
     No functional changes

  And one MAINTAINERS update:

   - Update Palmer's E-mail address"

* tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  MAINTAINERS: Change to my personal email address
  RISC-V: Add PCIe I/O BAR memory mapping
  riscv: for C functions called only from assembly, mark with __visible
  riscv: fp: add missing __user pointer annotations
  riscv: add missing header file includes
  riscv: mark some code and data as file-static
  riscv: init: merge split string literals in preprocessor directive
  riscv: add prototypes for assembly language functions from head.S

5 years agoMerge branch 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Fri, 1 Nov 2019 22:16:25 +0000 (15:16 -0700)]
Merge branch 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fix from Helge Deller:
 "Fix a parisc kernel crash with ftrace functions when compiled without
  frame pointers"

* 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix frame pointer in ftrace_regs_caller()

5 years agoMerge branch 'fix-BPF-offload-related-bugs'
David S. Miller [Fri, 1 Nov 2019 22:16:01 +0000 (15:16 -0700)]
Merge branch 'fix-BPF-offload-related-bugs'

Jakub Kicinski says:

====================
fix BPF offload related bugs

test_offload.py catches some recently added bugs.

First of a bug in test_offload.py itself after recent changes
to netdevsim is fixed.

Second patch fixes a bug in cls_bpf, and last one addresses
a problem with the recently added XDP installation optimization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fix installing orphaned programs
Jakub Kicinski [Fri, 1 Nov 2019 03:07:00 +0000 (20:07 -0700)]
net: fix installing orphaned programs

When netdevice with offloaded BPF programs is destroyed
the programs are orphaned and removed from the program
IDA - their IDs get released (the programs may remain
accessible via existing open file descriptors and pinned
files). After IDs are released they are set to 0.

This confuses dev_change_xdp_fd() because it compares
the __dev_xdp_query() result where 0 means no program
with prog->aux->id where 0 means orphaned.

dev_change_xdp_fd() would have incorrectly returned success
even though it had not installed the program.

Since drivers already catch this case via bpf_offload_dev_match()
let them handle this case. The error message drivers produce in
this case ("program loaded for a different device") is in fact
correct as the orphaned program must had to be loaded for a
different device.

Fixes: 5cb5cb68b057 ("net: Don't call XDP_SETUP_PROG when nothing is changed")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: cls_bpf: fix NULL deref on offload filter removal
Jakub Kicinski [Fri, 1 Nov 2019 03:06:59 +0000 (20:06 -0700)]
net: cls_bpf: fix NULL deref on offload filter removal

Commit 09a5b0aafd51 ("net: sched: refactor block offloads counter
usage") missed the fact that either new prog or old prog may be
NULL.

Fixes: 09a5b0aafd51 ("net: sched: refactor block offloads counter usage")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: bpf: Skip write only files in debugfs
Jakub Kicinski [Fri, 1 Nov 2019 03:06:58 +0000 (20:06 -0700)]
selftests: bpf: Skip write only files in debugfs

DebugFS for netdevsim now contains some "action trigger" files
which are write only. Don't try to capture the contents of those.

Note that we can't use os.access() because the script requires
root.

Fixes: e095a5a4a3b5 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: net: reuseport_dualstack: fix uninitalized parameter
Wei Wang [Thu, 31 Oct 2019 23:24:36 +0000 (16:24 -0700)]
selftests: net: reuseport_dualstack: fix uninitalized parameter

This test reports EINVAL for getsockopt(SOL_SOCKET, SO_DOMAIN)
occasionally due to the uninitialized length parameter.
Initialize it to fix this, and also use int for "test_family" to comply
with the API standard.

Fixes: 294f16fc4d45 ("soreuseport: test mixed v4/v6 sockets")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Craig Gallek <cgallek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: fix wrong PHY ID issue with RTL8168dp
Heiner Kallweit [Thu, 31 Oct 2019 23:10:21 +0000 (00:10 +0100)]
r8169: fix wrong PHY ID issue with RTL8168dp

As reported in [0] at least one RTL8168dp version has problems
establishing a link. This chip version has an integrated RTL8211b PHY,
however the chip seems to report a wrong PHY ID, resulting in a wrong
PHY driver (for Generic Realtek PHY) being loaded.
Work around this issue by adding a hook to r8168dp_2_mdio_read()
for returning the correct PHY ID.

[0] https://bbs.archlinux.org/viewtopic.php?id=246508

Fixes: c7a4b304c4eb ("r8169: use phy_resume/phy_suspend")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: bcm_sf2: Fix IMP setup for port different than 8
Florian Fainelli [Thu, 31 Oct 2019 22:54:05 +0000 (15:54 -0700)]
net: dsa: bcm_sf2: Fix IMP setup for port different than 8

Since it became possible for the DSA core to use a CPU port different
than 8, our bcm_sf2_imp_setup() function was broken because it assumes
that registers are applicable to port 8. In particular, the port's MAC
is going to stay disabled, so make sure we clear the RX_DIS and TX_DIS
bits if we are not configured for port 8.

Fixes: 875ec329cfcf ("net: dsa: make "label" property optional for dsa2")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Fix phylink_dbg() macro
Florian Fainelli [Thu, 31 Oct 2019 22:42:26 +0000 (15:42 -0700)]
net: phylink: Fix phylink_dbg() macro

The phylink_dbg() macro does not follow dynamic debug or defined(DEBUG)
and as a result, it spams the kernel log since a PR_DEBUG level is
currently used. Fix it to be defined appropriately whether
CONFIG_DYNAMIC_DEBUG or defined(DEBUG) are set.

Fixes: be737870d068 ("net: phylink: Add phylink_{printk, err, warn, info, dbg} macros")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogve: Fixes DMA synchronization.
Yangchun Fu [Fri, 1 Nov 2019 17:09:56 +0000 (10:09 -0700)]
gve: Fixes DMA synchronization.

Synces the DMA buffer properly in order for CPU and device to see
the most up-to-data data.

Signed-off-by: Yangchun Fu <yangchun@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: stop leaking jiffies on the wire
Eric Dumazet [Fri, 1 Nov 2019 17:32:19 +0000 (10:32 -0700)]
inet: stop leaking jiffies on the wire

Historically linux tried to stick to RFC 791, 1122, 2003
for IPv4 ID field generation.

RFC 6864 made clear that no matter how hard we try,
we can not ensure unicity of IP ID within maximum
lifetime for all datagrams with a given source
address/destination address/protocol tuple.

Linux uses a per socket inet generator (inet_id), initialized
at connection startup with a XOR of 'jiffies' and other
fields that appear clear on the wire.

Thiemo Nagel pointed that this strategy is a privacy
concern as this provides 16 bits of entropy to fingerprint
devices.

Let's switch to a random starting point, this is just as
good as far as RFC 6864 is concerned and does not leak
anything critical.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thiemo Nagel <tnagel@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Fri, 1 Nov 2019 21:50:27 +0000 (14:50 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-11-01

This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
documentation.

Lyude Paul fixes an issue where a fatal read error occurs when the
device is unplugged from the machine.  So change the read error into a
warn while the device is still present.

Manfred Rudigier found that the i350 device was not apart of the "Media
Auto Sense" feature, yet the device supports it.  So add the missing
i350 device to the check and fix an issue where the media auto sense
would flip/flop when no cable was connected to the port causing spurious
kernel log messages.

I fixed an issue where the fix to resolve receive buffer starvation was
applied in more than one place in the driver, one being the incorrect
location in the i40e driver.

Wenwen Wang fixes a potential memory leak in e1000 where allocated
memory is not properly cleaned up in one of the error paths.

Jonathan Neuschäfer cleans up the driver documentation to be consistent
and remove the footnote reference, since the footnote no longer exists in
the documentation.

Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
it twice.

v2: Fixed alignment issue in patch 3 of the series based on community
    feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoixgbe: Remove duplicate clear_bit() call
Igor Pylypiv [Fri, 4 Oct 2019 06:53:57 +0000 (23:53 -0700)]
ixgbe: Remove duplicate clear_bit() call

__IXGBE_RX_BUILD_SKB_ENABLED bit is already cleared.

Signed-off-by: Igor Pylypiv <igor.pylypiv@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoDocumentation: networking: device drivers: Remove stray asterisks
Jonathan Neuschäfer [Wed, 2 Oct 2019 15:09:55 +0000 (17:09 +0200)]
Documentation: networking: device drivers: Remove stray asterisks

These asterisks were once references to a line that said:
  "* Other names and brands may be claimed as the property of others."
But now, they serve no purpose; they can only irritate the reader.

Fixes: c3c0ca653b39 ("e1000: update README for e1000")
Fixes: a248654d9f14 ("e100.txt: Cleanup license info in kernel doc")
Fixes: c31d7b80052e ("e1000e.txt: Add e1000e documentation")
Fixes: 258d70d3cbfd ("Documentation: fm10k: Add kernel documentation")
Fixes: cda7ed94ee4e ("igb.txt: Add igb documentation")
Fixes: 3e696ba4e222 ("igbvf.txt: Add igbvf Documentation")
Fixes: 0f3259171b1a ("Documentation/networking/: Update Intel wired LAN driver documentation")
Fixes: e14bc305527e ("ixgbevf.txt: Update ixgbevf documentation")
Fixes: 3fa895fc6197 ("Documentation: i40e: Prepare documentation for RST conversion")
Fixes: 051803291273 ("i40evf: add driver to kernel build system")
Fixes: 5d0aaef7ac3f ("Documentation: ice: Prepare documentation for RST conversion")
Fixes: d56fd88c18e5 ("ionic: Add basic framework for IONIC Network device driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoe1000: fix memory leaks
Wenwen Wang [Mon, 12 Aug 2019 05:59:21 +0000 (00:59 -0500)]
e1000: fix memory leaks

In e1000_set_ringparam(), 'tx_old' and 'rx_old' are not deallocated if
e1000_up() fails, leading to memory leaks. Refactor the code to fix this
issue.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: Fix receive buffer starvation for AF_XDP
Jeff Kirsher [Mon, 7 Oct 2019 22:07:24 +0000 (15:07 -0700)]
i40e: Fix receive buffer starvation for AF_XDP

Magnus's fix to resolve a potential receive buffer starvation for AF_XDP
got applied to both the i40e_xsk_umem_enable/disable() functions, when it
should have only been applied to the "enable".  So clean up the undesired
code in the disable function.

CC: Magnus Karlsson <magnus.karlsson@intel.com>
Fixes: fceee18ac987 ("i40e: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
5 years agoigb: Fix constant media auto sense switching when no cable is connected
Manfred Rudigier [Thu, 15 Aug 2019 20:55:20 +0000 (13:55 -0700)]
igb: Fix constant media auto sense switching when no cable is connected

At least on the i350 there is an annoying behavior that is maybe also
present on 82580 devices, but was probably not noticed yet as MAS is not
widely used.

If no cable is connected on both fiber/copper ports the media auto sense
code will constantly swap between them as part of the watchdog task and
produce many unnecessary kernel log messages.

The swap code responsible for this behavior (switching to fiber) should
not be executed if the current media type is copper and there is no signal
detected on the fiber port. In this case we can safely wait until the
AUTOSENSE_EN bit is cleared.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 1 Nov 2019 18:49:54 +0000 (11:49 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Fix two scheduler topology bugs/oversights on Juno r0 2+4 big.LITTLE
  systems"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/topology: Allow sched_asym_cpucapacity to be disabled
  sched/topology: Don't try to build empty sched domains

5 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 1 Nov 2019 18:40:47 +0000 (11:40 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Misc fixes: an ABI fix for a reserved field, AMD IBS fixes, an Intel
  uncore PMU driver fix and a header typo fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/
  perf/x86/uncore: Fix event group support
  perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
  perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
  perf/core: Start rejecting the syscall with attr.__reserved_2 set

5 years agoMerge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 1 Nov 2019 18:32:50 +0000 (11:32 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:
 "Various fixes all over the map: prevent boot crashes on HyperV,
  classify UEFI randomness as bootloader randomness, fix EFI boot for
  the Raspberry Pi2, fix efi_test permissions, etc"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
  x86, efi: Never relocate kernel below lowest acceptable address
  efi: libstub/arm: Account for firmware reserved memory at the base of RAM
  efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness
  efi/tpm: Return -EINVAL when determining tpm final events log size fails
  efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only

5 years agoMerge tag 'wireless-drivers-2019-11-01' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 1 Nov 2019 17:36:46 +0000 (10:36 -0700)]
Merge tag 'wireless-drivers-2019-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 5.4

Third set of fixes for 5.4. Most of them are for iwlwifi but important
fixes also for rtlwifi and mt76, the overflow fix for rtlwifi being
most important.

iwlwifi

* fix merge damage on earlier patch

* various fixes to device id handling

* fix scan config command handling which caused firmware asserts

rtlwifi

* fix overflow on P2P IE handling

* don't deliver too small frames to mac80211

mt76

* disable PCIE_ASPM

* fix buffer DMA unmap on certain cases
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: arc: add the missed clk_disable_unprepare
Chuhong Yuan [Fri, 1 Nov 2019 12:17:25 +0000 (20:17 +0800)]
net: ethernet: arc: add the missed clk_disable_unprepare

The remove misses to disable and unprepare priv->macclk like what is done
when probe fails.
Add the missed call in remove.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 1 Nov 2019 17:03:46 +0000 (10:03 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "These are almost exclusively related to CPU errata in CPUs from
  Broadcom and Qualcomm where the workarounds were either not being
  enabled when they should have been or enabled when they shouldn't have
  been.

  The only "interesting" fix is ensuring that writeable, shared mappings
  are initially mapped as clean since we inadvertently broke the logic
  back in v4.14 and then noticed the problem via code inspection the
  other day.

  The only critical issue we have outstanding is a sporadic NULL
  dereference in the scheduler, which doesn't appear to be
  arm64-specific and PeterZ is tearing his hair out over it at the
  moment.

  Summary:

   - Enable CPU errata workarounds for Broadcom Brahma-B53

   - Enable CPU errata workarounds for Qualcomm Hydra/Kryo CPUs

   - Fix initial dirty status of writeable, shared mappings"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core
  arm64: Brahma-B53 is SSB and spectre v2 safe
  arm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core
  arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo
  arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003
  arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default

5 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 1 Nov 2019 16:54:38 +0000 (09:54 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "generic:
   - fix memory leak on failure to create VM

  x86:
   - fix MMU corner case with AMD nested paging disabled"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
  kvm: call kvm_arch_destroy_vm if vm creation fails
  kvm: Allocate memslots and buses before calling kvm_arch_init_vm

5 years agoMerge tag 'drm-fixes-2019-11-01' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 1 Nov 2019 16:41:08 +0000 (09:41 -0700)]
Merge tag 'drm-fixes-2019-11-01' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This is the regular drm fixes pull request for 5.4-rc6. It's a bit
  larger than I'd like but then last week was quieter than usual.

  The main fixes are amdgpu, and the two bigger area are navi fixes
  which are the newest GPU range so still getting actively fixed up, but
  also a bunch of clang stack alignment fixes (as amdgpu uses double in
  some places).

  Otherwise it's all fairly run of the mill fixes, i915, panfrost,
  etnaviv, v3d and radeon, along with a core scheduler fix.

  Summary:

  amdgpu:
   - clang alignment fixes
   - Updated golden settings
   - navi: gpuvm, sdma and display fixes
   - Freesync fix
   - Gamma fix for DCN
   - DP dongle detection fix
   - vega10: Fix for undervolting

  radeon:
   - reenable kexec fix for ppc

  scheduler:
   - set an error if hw job failed

  i915:
   - fix PCH reference clock for HSW/BDW
   - TGL display PLL doc fix

  panfrost:
   - warning fix
   - runtime pm fix
   - bad pointer dereference fix

  v3d:
   - memleak fix

  etnaviv:
   - memory corruption fix
   - deadlock fix
   - reintroduce lost debug message"

* tag 'drm-fixes-2019-11-01' of git://anongit.freedesktop.org/drm/drm: (29 commits)
  drm/amdgpu: enable -msse2 for GCC 7.1+ users
  drm/amdgpu: fix stack alignment ABI mismatch for GCC 7.1+
  drm/amdgpu: fix stack alignment ABI mismatch for Clang
  drm/radeon: Fix EEH during kexec
  drm/amdgpu/gmc10: properly set BANK_SELECT and FRAGMENT_SIZE
  drm/amdgpu/powerplay/vega10: allow undervolting in p7
  dc.c:use kzalloc without test
  drm/amd/display: setting the DIG_MODE to the correct value.
  drm/amd/display: Passive DP->HDMI dongle detection fix
  drm/amd/display: add 50us buffer as WA for pstate switch in active
  drm/amd/display: Allow inverted gamma
  drm/amd/display: do not synchronize "drr" displays
  drm/amdgpu: If amdgpu_ib_schedule fails return back the error.
  drm/sched: Set error to s_fence if HW job submission failed.
  drm/amdgpu/gfx10: update gfx golden settings for navi12
  drm/amdgpu/gfx10: update gfx golden settings for navi14
  drm/amdgpu/gfx10: update gfx golden settings
  drm/amd/display: Change Navi14's DWB flag to 1
  drm/amdgpu/sdma5: do not execute 0-sized IBs (v2)
  drm/amdgpu: Fix SDMA hang when performing VKexample test
  ...

5 years agoMerge tag 'pm-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 1 Nov 2019 16:30:48 +0000 (09:30 -0700)]
Merge tag 'pm-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix a recently introduced (mostly theoretical) issue that the requests
  to confine the maximum CPU frequency coming from the platform firmware
  may not be taken into account if multiple CPUs are covered by one
  cpufreq policy on a system with ACPI"

* tag 'pm-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Add QoS requests for all CPUs

5 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Fri, 1 Nov 2019 16:21:48 +0000 (09:21 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "A number of bug fixes and a regression fix:

   - Various issues from static analysis in hfi1, uverbs, hns, and cxgb4

   - Fix for deadlock in a case when the new auto RDMA module loading is
     used

   - Missing _irq notation in a prior -rc patch found by lockdep

   - Fix a locking and lifetime issue in siw

   - Minor functional bug fixes in cxgb4, mlx5, qedr

   - Fix a regression where vlan interfaces no longer worked with RDMA
     CM in some cases"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/hns: Prevent memory leaks of eq->buf_list
  RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case
  RDMA/mlx5: Use irq xarray locking for mkey_table
  IB/core: Avoid deadlock during netlink message handling
  RDMA/nldev: Skip counter if port doesn't match
  RDMA/uverbs: Prevent potential underflow
  IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fields
  RDMA/qedr: Fix reported firmware version
  RDMA/siw: free siw_base_qp in kref release routine
  RDMA/iwcm: move iw_rem_ref() calls out of spinlock
  iw_cxgb4: fix ECN check on the passive accept
  IB/hfi1: Use a common pad buffer for 9B and 16B packets
  IB/hfi1: Avoid excessive retry for TID RDMA READ request
  RDMA/mlx5: Clear old rate limit when closing QP

5 years agoMerge tag 'sound-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 1 Nov 2019 16:18:00 +0000 (09:18 -0700)]
Merge tag 'sound-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A couple of regression fixes and a fix for mutex deadlock at
  hog-unplug, as well as other device-specific fixes:

   - A commit to avoid the spurious unsolicited interrupt on HD-audio
     bus caused a stall at shutdown, so it's reverted now.

   - The recent support of AMD/Nvidia audio component binding caused a
     mutex deadlock; fixed by splitting to another mutex

   - The device hot-unplug and the ALSA timer close combo may lead to
     another mutex deadlock; fixed by moving put_device() calls

   - Usual device-specific small quirks for HD- and USB-audio drivers

   - An old error check fix in FireWire driver"

* tag 'sound-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: timer: Fix mutex deadlock at releasing card
  ALSA: hda - Fix mutex deadlock in HDMI codec driver
  Revert "ALSA: hda: Flush interrupts on disabling"
  ALSA: bebob: Fix prototype of helper function to return negative value
  ALSA: hda/realtek - Fix 2 front mics of codec 0x623
  ALSA: hda/realtek - Add support for ALC623
  ALSA: usb-audio: Add DSD support for Gustard U16/X26 USB Interface

5 years agoNFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
Trond Myklebust [Thu, 31 Oct 2019 22:40:33 +0000 (18:40 -0400)]
NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()

A typo in nfs4_refresh_delegation_stateid() means we're leaking an
RCU lock, and always returning a value of 'false'. As the function
description states, we were always supposed to return 'true' if a
matching delegation was found.

Fixes: 2e4147bd634a ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
5 years agoNFSv4: Don't allow a cached open with a revoked delegation
Trond Myklebust [Thu, 31 Oct 2019 22:40:32 +0000 (18:40 -0400)]
NFSv4: Don't allow a cached open with a revoked delegation

If the delegation is marked as being revoked, we must not use it
for cached opens.

Fixes: 277ef18b07a3 ("NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
5 years agoarm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core
Florian Fainelli [Thu, 31 Oct 2019 21:47:25 +0000 (14:47 -0700)]
arm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core

The Broadcom Brahma-B53 core is susceptible to the issue described by
ARM64_ERRATUM_843419 so this commit enables the workaround to be applied
when executing on that core.

Since there are now multiple entries to match, we must convert the
existing ARM64_ERRATUM_843419 into an erratum list and use
cpucap_multi_entry_cap_matches to match our entries.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
5 years agoarm64: Brahma-B53 is SSB and spectre v2 safe
Florian Fainelli [Thu, 31 Oct 2019 21:47:24 +0000 (14:47 -0700)]
arm64: Brahma-B53 is SSB and spectre v2 safe

Add the Brahma-B53 CPU (all versions) to the whitelists of CPUs for the
SSB and spectre v2 mitigations.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
5 years agoarm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core
Doug Berger [Thu, 31 Oct 2019 21:47:23 +0000 (14:47 -0700)]
arm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core

The Broadcom Brahma-B53 core is susceptible to the issue described by
ARM64_ERRATUM_845719 so this commit enables the workaround to be applied
when executing on that core.

Since there are now multiple entries to match, we must convert the
existing ARM64_ERRATUM_845719 into an erratum list.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
5 years agoMerge tag 'drm-fixes-5.4-2019-10-30' of git://people.freedesktop.org/~agd5f/linux...
Dave Airlie [Fri, 1 Nov 2019 01:27:39 +0000 (11:27 +1000)]
Merge tag 'drm-fixes-5.4-2019-10-30' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

drm-fixes-5.4-2019-10-30:

amdgpu:
- clang fixes
- Updated golden settings
- GPUVM fixes for navi
- Navi sdma fix
- Navi display fixes
- Freesync fix
- Gamma fix for DCN
- DP dongle detection fix
- Fix for undervolting on vega10

radeon:
- enable kexec fix for PPC

scheduler:
- set an error on fence if hw job failed

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191030162339.44366-1-alexander.deucher@amd.com
5 years agoMerge tag 'drm-intel-fixes-2019-10-31' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Fri, 1 Nov 2019 01:13:35 +0000 (11:13 +1000)]
Merge tag 'drm-intel-fixes-2019-10-31' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix PCH reference clock for FDI on HSW/BDW which was causing users blank screen
- Small documentation fix for TGL display PLLs

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191031171209.GA6586@intel.com
5 years agoMerge tag 'drm-misc-fixes-2019-10-30-1' of git://anongit.freedesktop.org/drm/drm...
Dave Airlie [Fri, 1 Nov 2019 01:09:42 +0000 (11:09 +1000)]
Merge tag 'drm-misc-fixes-2019-10-30-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

 - three fixes for panfrost, one to silence a warning, one to fix
   runtime_pm and one to prevent bogus pointer dereferences
 - one fix for a memleak in v3d

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191030182207.evrscl7lnv42u5zu@hendrix
5 years agoMerge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm...
Dave Airlie [Fri, 1 Nov 2019 01:08:24 +0000 (11:08 +1000)]
Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes

One memory corruption fix in the MMUv2 GPU coredump code, a deadlock
fix also in the coredump code and reintroduction of a helpful message,
which got dropped by accident in this cycle.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/b0d640267662e3ce5e0089d0afedc1baba55058d.camel@pengutronix.de
5 years agoigb: Enable media autosense for the i350.
Manfred Rudigier [Thu, 15 Aug 2019 20:55:19 +0000 (13:55 -0700)]
igb: Enable media autosense for the i350.

This patch enables the hardware feature "Media Auto Sense" also on the
i350. It works in the same way as on the 82850 devices. Hardware designs
using dual PHYs (fiber/copper) can enable this feature by setting the MAS
enable bits in the NVM_COMPAT register (0x03) in the EEPROM.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigb/igc: Don't warn on fatal read failures when the device is removed
Lyude Paul [Thu, 22 Aug 2019 18:33:18 +0000 (14:33 -0400)]
igb/igc: Don't warn on fatal read failures when the device is removed

Fatal read errors are worth warning about, unless of course the device
was just unplugged from the machine - something that's a rather normal
occurrence when the igb/igc adapter is located on a Thunderbolt dock. So,
let's only WARN() if there's a fatal read error while the device is
still present.

This fixes the following WARN splat that's been appearing whenever I
unplug my Caldigit TS3 Thunderbolt dock from my laptop:

  igb 0000:09:00.0 enp9s0: PCIe link lost
  ------------[ cut here ]------------
  igb: Failed to read reg 0x18!
  WARNING: CPU: 7 PID: 516 at
  drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32+0x57/0x6a [igb]
  Modules linked in: igb dca thunderbolt fuse vfat fat elan_i2c mei_wdt
  mei_hdcp i915 wmi_bmof intel_wmi_thunderbolt iTCO_wdt
  iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp joydev
  coretemp crct10dif_pclmul crc32_pclmul i2c_algo_bit ghash_clmulni_intel
  intel_cstate drm_kms_helper intel_uncore syscopyarea sysfillrect
  sysimgblt fb_sys_fops intel_rapl_perf intel_xhci_usb_role_switch mei_me
  drm roles idma64 i2c_i801 ucsi_acpi typec_ucsi mei intel_lpss_pci
  processor_thermal_device typec intel_pch_thermal intel_soc_dts_iosf
  intel_lpss int3403_thermal thinkpad_acpi wmi int340x_thermal_zone
  ledtrig_audio int3400_thermal acpi_thermal_rel acpi_pad video
  pcc_cpufreq ip_tables serio_raw nvme nvme_core crc32c_intel uas
  usb_storage e1000e i2c_dev
  CPU: 7 PID: 516 Comm: kworker/u16:3 Not tainted 5.2.0-rc1Lyude-Test+ #14
  Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  RIP: 0010:igb_rd32+0x57/0x6a [igb]
  Code: 87 b8 fc ff ff 48 c7 47 08 00 00 00 00 48 c7 c6 33 42 9b c0 4c 89
  c7 e8 47 45 cd dc 89 ee 48 c7 c7 43 42 9b c0 e8 c1 94 71 dc <0f> 0b eb
  08 8b 00 ff c0 75 b0 eb c8 44 89 e0 5d 41 5c c3 0f 1f 44
  RSP: 0018:ffffba5801cf7c48 EFLAGS: 00010286
  RAX: 0000000000000000 RBX: ffff9e7956608840 RCX: 0000000000000007
  RDX: 0000000000000000 RSI: ffffba5801cf7b24 RDI: ffff9e795e3d6a00
  RBP: 0000000000000018 R08: 000000009dec4a01 R09: ffffffff9e61018f
  R10: 0000000000000000 R11: ffffba5801cf7ae5 R12: 00000000ffffffff
  R13: ffff9e7956608840 R14: ffff9e795a6f10b0 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff9e795e3c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000564317bc4088 CR3: 000000010e00a006 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   igb_release_hw_control+0x1a/0x30 [igb]
   igb_remove+0xc5/0x14b [igb]
   pci_device_remove+0x3b/0x93
   device_release_driver_internal+0xd7/0x17e
   pci_stop_bus_device+0x36/0x75
   pci_stop_bus_device+0x66/0x75
   pci_stop_bus_device+0x66/0x75
   pci_stop_and_remove_bus_device+0xf/0x19
   trim_stale_devices+0xc5/0x13a
   ? __pm_runtime_resume+0x6e/0x7b
   trim_stale_devices+0x103/0x13a
   ? __pm_runtime_resume+0x6e/0x7b
   trim_stale_devices+0x103/0x13a
   acpiphp_check_bridge+0xd8/0xf5
   acpiphp_hotplug_notify+0xf7/0x14b
   ? acpiphp_check_bridge+0xf5/0xf5
   acpi_device_hotplug+0x357/0x3b5
   acpi_hotplug_work_fn+0x1a/0x23
   process_one_work+0x1a7/0x296
   worker_thread+0x1a8/0x24c
   ? process_scheduled_works+0x2c/0x2c
   kthread+0xe9/0xee
   ? kthread_destroy_worker+0x41/0x41
   ret_from_fork+0x35/0x40
  ---[ end trace 252bf10352c63d22 ]---

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 845f243b1467 ("igb/igc: warn when fatal read failure happens")
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agotcp: increase tcp_max_syn_backlog max value
Eric Dumazet [Wed, 30 Oct 2019 17:05:46 +0000 (10:05 -0700)]
tcp: increase tcp_max_syn_backlog max value

tcp_max_syn_backlog default value depends on memory size
and TCP ehash size. Before this patch, the max value
was 2048 [1], which is considered too small nowadays.

Increase it to 4096 to match the recent SOMAXCONN change.

[1] This is with TCP ehash size being capped to 524288 buckets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: increase SOMAXCONN to 4096
Eric Dumazet [Wed, 30 Oct 2019 16:36:20 +0000 (09:36 -0700)]
net: increase SOMAXCONN to 4096

SOMAXCONN is /proc/sys/net/core/somaxconn default value.

It has been defined as 128 more than 20 years ago.

Since it caps the listen() backlog values, the very small value has
caused numerous problems over the years, and many people had
to raise it on their hosts after beeing hit by problems.

Google has been using 1024 for at least 15 years, and we increased
this to 4096 after TCP listener rework has been completed, more than
4 years ago. We got no complain of this change breaking any
legacy application.

Many applications indeed setup a TCP listener with listen(fd, -1);
meaning they let the system select the backlog.

Raising SOMAXCONN lowers chance of the port being unavailable under
even small SYNFLOOD attack, and reduces possibilities of side channel
vulnerabilities.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'pm-cpufreq'
Rafael J. Wysocki [Thu, 31 Oct 2019 20:41:37 +0000 (21:41 +0100)]
Merge branch 'pm-cpufreq'

* pm-cpufreq:
  ACPI: processor: Add QoS requests for all CPUs

5 years agonetdevsim: Fix use-after-free during device dismantle
Ido Schimmel [Thu, 31 Oct 2019 16:20:30 +0000 (18:20 +0200)]
netdevsim: Fix use-after-free during device dismantle

Commit d0e066259b52 ("netdevsim: Add devlink-trap support") added
delayed work to netdevsim that periodically iterates over the registered
netdevsim ports and reports various packet traps via devlink.

While the delayed work takes the 'port_list_lock' mutex to protect
against concurrent addition / deletion of ports, during device creation
/ dismantle ports are added / deleted without this lock, which can
result in a use-after-free [1].

Fix this by making sure that the ports list is always modified under the
lock.

[1]
[   59.205543] ==================================================================
[   59.207748] BUG: KASAN: use-after-free in nsim_dev_trap_report_work+0xa67/0xad0
[   59.210247] Read of size 8 at addr ffff8883cbdd3398 by task kworker/3:1/38
[   59.212584]
[   59.213148] CPU: 3 PID: 38 Comm: kworker/3:1 Not tainted 5.4.0-rc3-custom-16119-ge6abb5f0261e #2013
[   59.215896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[   59.218384] Workqueue: events nsim_dev_trap_report_work
[   59.219428] Call Trace:
[   59.219924]  dump_stack+0xa9/0x10e
[   59.220623]  print_address_description.constprop.4+0x21/0x340
[   59.221976]  ? vprintk_func+0x66/0x240
[   59.222752]  __kasan_report.cold.8+0x78/0x91
[   59.223602]  ? nsim_dev_trap_report_work+0xa67/0xad0
[   59.224603]  kasan_report+0xe/0x20
[   59.225296]  nsim_dev_trap_report_work+0xa67/0xad0
[   59.226435]  ? rcu_read_lock_sched_held+0xaf/0xe0
[   59.227512]  ? trace_event_raw_event_rcu_quiescent_state_report+0x360/0x360
[   59.228851]  process_one_work+0x98f/0x1760
[   59.229684]  ? pwq_dec_nr_in_flight+0x330/0x330
[   59.230656]  worker_thread+0x91/0xc40
[   59.231587]  ? process_one_work+0x1760/0x1760
[   59.232451]  kthread+0x34a/0x410
[   59.233104]  ? __kthread_queue_delayed_work+0x240/0x240
[   59.234141]  ret_from_fork+0x3a/0x50
[   59.234982]
[   59.235371] Allocated by task 187:
[   59.236189]  save_stack+0x19/0x80
[   59.236853]  __kasan_kmalloc.constprop.5+0xc1/0xd0
[   59.237822]  kmem_cache_alloc_trace+0x14c/0x380
[   59.238769]  __nsim_dev_port_add+0xaf/0x5c0
[   59.239627]  nsim_dev_probe+0x4fc/0x1140
[   59.240550]  really_probe+0x264/0xc00
[   59.241418]  driver_probe_device+0x208/0x2e0
[   59.242255]  __device_attach_driver+0x215/0x2d0
[   59.243150]  bus_for_each_drv+0x154/0x1d0
[   59.243944]  __device_attach+0x1ba/0x2b0
[   59.244923]  bus_probe_device+0x1dd/0x290
[   59.245805]  device_add+0xbac/0x1550
[   59.246528]  new_device_store+0x1f4/0x400
[   59.247306]  bus_attr_store+0x7b/0xa0
[   59.248047]  sysfs_kf_write+0x10f/0x170
[   59.248941]  kernfs_fop_write+0x283/0x430
[   59.249843]  __vfs_write+0x81/0x100
[   59.250546]  vfs_write+0x1ce/0x510
[   59.251190]  ksys_write+0x104/0x200
[   59.251873]  do_syscall_64+0xa4/0x4e0
[   59.252642]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   59.253837]
[   59.254203] Freed by task 187:
[   59.254811]  save_stack+0x19/0x80
[   59.255463]  __kasan_slab_free+0x125/0x170
[   59.256265]  kfree+0x100/0x440
[   59.256870]  nsim_dev_remove+0x98/0x100
[   59.257651]  nsim_bus_remove+0x16/0x20
[   59.258382]  device_release_driver_internal+0x20b/0x4d0
[   59.259588]  bus_remove_device+0x2e9/0x5a0
[   59.260551]  device_del+0x410/0xad0
[   59.263777]  device_unregister+0x26/0xc0
[   59.264616]  nsim_bus_dev_del+0x16/0x60
[   59.265381]  del_device_store+0x2d6/0x3c0
[   59.266295]  bus_attr_store+0x7b/0xa0
[   59.267192]  sysfs_kf_write+0x10f/0x170
[   59.267960]  kernfs_fop_write+0x283/0x430
[   59.268800]  __vfs_write+0x81/0x100
[   59.269551]  vfs_write+0x1ce/0x510
[   59.270252]  ksys_write+0x104/0x200
[   59.270910]  do_syscall_64+0xa4/0x4e0
[   59.271680]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   59.272812]
[   59.273211] The buggy address belongs to the object at ffff8883cbdd3200
[   59.273211]  which belongs to the cache kmalloc-512 of size 512
[   59.275838] The buggy address is located 408 bytes inside of
[   59.275838]  512-byte region [ffff8883cbdd3200ffff8883cbdd3400)
[   59.278151] The buggy address belongs to the page:
[   59.279215] page:ffffea000f2f7400 refcount:1 mapcount:0 mapping:ffff8883ecc0ce00 index:0x0 compound_mapcount: 0
[   59.281449] flags: 0x200000000010200(slab|head)
[   59.282356] raw: 0200000000010200 ffffea000f2f3a08 ffffea000f2fd608 ffff8883ecc0ce00
[   59.283949] raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
[   59.285608] page dumped because: kasan: bad access detected
[   59.286981]
[   59.287337] Memory state around the buggy address:
[   59.288310]  ffff8883cbdd3280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   59.289763]  ffff8883cbdd3300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   59.291452] >ffff8883cbdd3380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   59.292945]                             ^
[   59.293815]  ffff8883cbdd3400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   59.295220]  ffff8883cbdd3480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   59.296872] ==================================================================

Fixes: d0e066259b52 ("netdevsim: Add devlink-trap support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: syzbot+9ed8f68ab30761f3678e@syzkaller.appspotmail.com
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Fix handling of last subpacket of jumbo packet
David Howells [Thu, 31 Oct 2019 12:13:46 +0000 (12:13 +0000)]
rxrpc: Fix handling of last subpacket of jumbo packet

When rxrpc_recvmsg_data() sets the return value to 1 because it's drained
all the data for the last packet, it checks the last-packet flag on the
whole packet - but this is wrong, since the last-packet flag is only set on
the final subpacket of the last jumbo packet.  This means that a call that
receives its last packet in a jumbo packet won't complete properly.

Fix this by having rxrpc_locate_data() determine the last-packet state of
the subpacket it's looking at and passing that back to the caller rather
than having the caller look in the packet header.  The caller then needs to
cache this in the rxrpc_call struct as rxrpc_locate_data() isn't then
called again for this packet.

Fixes: a42fc64729e7 ("rxrpc: Rewrite the data and ack handling code")
Fixes: 85aa2117dd22 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mac80211-for-net-2019-10-31' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Thu, 31 Oct 2019 18:43:36 +0000 (11:43 -0700)]
Merge tag 'mac80211-for-net-2019-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two fixes:
 * HT operation is not allowed on channel 14 (Japan only)
 * netlink policy for nexthop attribute was wrong
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agousb: dwc3: gadget: fix race when disabling ep with cancelled xfers
Felipe Balbi [Thu, 31 Oct 2019 09:07:13 +0000 (11:07 +0200)]
usb: dwc3: gadget: fix race when disabling ep with cancelled xfers

When disabling an endpoint which has cancelled requests, we should
make sure to giveback requests that are currently pending in the
cancelled list, otherwise we may fall into a situation where command
completion interrupt fires after endpoint has been disabled, therefore
causing a splat.

Fixes: cf160a5a1490 "usb: dwc3: gadget: remove wait_end_transfer"
Reported-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Link: https://lore.kernel.org/r/20191031090713.1452818-1-felipe.balbi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiocost: don't nest spin_lock_irq in ioc_weight_write()
Dan Carpenter [Thu, 31 Oct 2019 10:53:41 +0000 (13:53 +0300)]
iocost: don't nest spin_lock_irq in ioc_weight_write()

This code causes a static analysis warning:

    block/blk-iocost.c:2113 ioc_weight_write() error: double lock 'irq'

We disable IRQs in blkg_conf_prep() and re-enable them in
blkg_conf_finish().  IRQ disable/enable should not be nested because
that means the IRQs will be enabled at the first unlock instead of the
second one.

Fixes: fe734d3093a8 ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agos390/idle: fix cpu idle time calculation
Heiko Carstens [Mon, 28 Oct 2019 10:03:27 +0000 (11:03 +0100)]
s390/idle: fix cpu idle time calculation

The idle time reported in /proc/stat sometimes incorrectly contains
huge values on s390. This is caused by a bug in arch_cpu_idle_time().

The kernel tries to figure out when a different cpu entered idle by
accessing its per-cpu data structure. There is an ordering problem: if
the remote cpu has an idle_enter value which is not zero, and an
idle_exit value which is zero, it is assumed it is idle since
"now". The "now" timestamp however is taken before the idle_enter
value is read.

Which in turn means that "now" can be smaller than idle_enter of the
remote cpu. Unconditionally subtracting idle_enter from "now" can thus
lead to a negative value (aka large unsigned value).

Fix this by moving the get_tod_clock() invocation out of the
loop. While at it also make the code a bit more readable.

A similar bug also exists for show_idle_time(). Fix this is as well.

Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
5 years agos390/unwind: fix mixing regs and sp
Ilya Leoshkevich [Wed, 2 Oct 2019 11:29:57 +0000 (13:29 +0200)]
s390/unwind: fix mixing regs and sp

unwind_for_each_frame stops after the first frame if regs->gprs[15] <=
sp.

The reason is that in case regs are specified, the first frame should be
regs->psw.addr and the second frame should be sp->gprs[8]. However,
currently the second frame is regs->gprs[15], which confuses
outside_of_stack().

Fix by introducing a flag to distinguish this special case from
unwinding the interrupt handler, for which the current behavior is
appropriate.

Fixes: ef2ec76e0480 ("s390/unwind: introduce stack unwind API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.2+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
5 years agos390/cmm: fix information leak in cmm_timeout_handler()
Yihui ZENG [Fri, 25 Oct 2019 09:31:48 +0000 (12:31 +0300)]
s390/cmm: fix information leak in cmm_timeout_handler()

The problem is that we were putting the NUL terminator too far:

buf[sizeof(buf) - 1] = '\0';

If the user input isn't NUL terminated and they haven't initialized the
whole buffer then it leads to an info leak.  The NUL terminator should
be:

buf[len - 1] = '\0';

Signed-off-by: Yihui Zeng <yzeng56@asu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[heiko.carstens@de.ibm.com: keep semantics of how *lenp and *ppos are handled]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
5 years agoarm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo
Bjorn Andersson [Tue, 29 Oct 2019 23:27:38 +0000 (16:27 -0700)]
arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo

The Kryo cores share errata 1009 with Falkor, so add their model
definitions and enable it for them as well.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[will: Update entry in silicon-errata.rst]
Signed-off-by: Will Deacon <will@kernel.org>
5 years agoKVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
Paolo Bonzini [Sun, 27 Oct 2019 15:23:23 +0000 (16:23 +0100)]
KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active

VMX already does so if the host has SMEP, in order to support the combination of
CR0.WP=1 and CR4.SMEP=1.  However, it is perfectly safe to always do so, and in
fact VMX already ends up running with EFER.NXE=1 on old processors that lack the
"load EFER" controls, because it may help avoiding a slow MSR write.  Removing
all the conditionals simplifies the code.

SVM does not have similar code, but it should since recent AMD processors do
support SMEP.  So this patch also makes the code for the two vendors more similar
while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.

Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 years agokvm: call kvm_arch_destroy_vm if vm creation fails
Jim Mattson [Fri, 25 Oct 2019 11:34:58 +0000 (13:34 +0200)]
kvm: call kvm_arch_destroy_vm if vm creation fails

In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but
then fail later in the function, we need to call kvm_arch_destroy_vm()
so that it can do any necessary cleanup (like freeing memory).

Fixes: b09f7780cbb061 ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Junaid Shahid <junaids@google.com>
[Remove dependency on "kvm: Don't clear reference count on
 kvm_create_vm() error path" which was not committed. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 years agoefi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
Javier Martinez Canillas [Tue, 29 Oct 2019 17:37:55 +0000 (18:37 +0100)]
efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN

The driver exposes EFI runtime services to user-space through an IOCTL
interface, calling the EFI services function pointers directly without
using the efivar API.

Disallow access to the /dev/efi_test character device when the kernel is
locked down to prevent arbitrary user-space to call EFI runtime services.

Also require CAP_SYS_ADMIN to open the chardev to prevent unprivileged
users to call the EFI runtime services, instead of just relying on the
chardev file mode bits for this.

The main user of this driver is the fwts [0] tool that already checks if
the effective user ID is 0 and fails otherwise. So this change shouldn't
cause any regression to this tool.

[0]: https://wiki.ubuntu.com/FirmwareTestSuite/Reference/uefivarinfo

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Matthew Garrett <mjg59@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-7-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 years agox86, efi: Never relocate kernel below lowest acceptable address
Kairui Song [Tue, 29 Oct 2019 17:37:54 +0000 (18:37 +0100)]
x86, efi: Never relocate kernel below lowest acceptable address

Currently, kernel fails to boot on some HyperV VMs when using EFI.
And it's a potential issue on all x86 platforms.

It's caused by broken kernel relocation on EFI systems, when below three
conditions are met:

1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
   by the loader.
2. There isn't enough room to contain the kernel, starting from the
   default load address (eg. something else occupied part the region).
3. In the memmap provided by EFI firmware, there is a memory region
   starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
   kernel.

EFI stub will perform a kernel relocation when condition 1 is met. But
due to condition 2, EFI stub can't relocate kernel to the preferred
address, so it fallback to ask EFI firmware to alloc lowest usable memory
region, got the low region mentioned in condition 3, and relocated
kernel there.

It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This
is the lowest acceptable kernel relocation address.

The first thing goes wrong is in arch/x86/boot/compressed/head_64.S.
Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output
address if kernel is located below it. Then the relocation before
decompression, which move kernel to the end of the decompression buffer,
will overwrite other memory region, as there is no enough memory there.

To fix it, just don't let EFI stub relocate the kernel to any address
lower than lowest acceptable address.

[ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]

Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 years agoefi: libstub/arm: Account for firmware reserved memory at the base of RAM
Ard Biesheuvel [Tue, 29 Oct 2019 17:37:53 +0000 (18:37 +0100)]
efi: libstub/arm: Account for firmware reserved memory at the base of RAM

The EFI stubloader for ARM starts out by allocating a 32 MB window
at the base of RAM, in order to ensure that the decompressor (which
blindly copies the uncompressed kernel into that window) does not
overwrite other allocations that are made while running in the context
of the EFI firmware.

In some cases, (e.g., U-Boot running on the Raspberry Pi 2), this is
causing boot failures because this initial allocation conflicts with
a page of reserved memory at the base of RAM that contains the SMP spin
tables and other pieces of firmware data and which was put there by
the bootloader under the assumption that the TEXT_OFFSET window right
below the kernel is only used partially during early boot, and will be
left alone once the memory reservations are processed and taken into
account.

So let's permit reserved memory regions to exist in the region starting
at the base of RAM, and ending at TEXT_OFFSET - 5 * PAGE_SIZE, which is
the window below the kernel that is not touched by the early boot code.

Tested-by: Guillaume Gardet <Guillaume.Gardet@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Chester Lin <clin@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-5-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 years agoefi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness
Dominik Brodowski [Tue, 29 Oct 2019 17:37:52 +0000 (18:37 +0100)]
efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness

Commit 69788ded7904 ("fdt: add support for rng-seed") introduced
add_bootloader_randomness(), permitting randomness provided by the
bootloader or firmware to be credited as entropy. However, the fact
that the UEFI support code was already wired into the RNG subsystem
via a call to add_device_randomness() was overlooked, and so it was
not converted at the same time.

Note that this UEFI (v2.4 or newer) feature is currently only
implemented for EFI stub booting on ARM, and further note that
CONFIG_RANDOM_TRUST_BOOTLOADER must be enabled, and this should be
done only if there indeed is sufficient trust in the bootloader
_and_ its source of randomness.

[ ardb: update commit log ]

Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-4-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 years agoefi/tpm: Return -EINVAL when determining tpm final events log size fails
Jerry Snitselaar [Tue, 29 Oct 2019 17:37:51 +0000 (18:37 +0100)]
efi/tpm: Return -EINVAL when determining tpm final events log size fails

Currently nothing checks the return value of efi_tpm_eventlog_init(),
but in case that changes in the future make sure an error is
returned when it fails to determine the tpm final events log
size.

Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 45b1359bc44f ("efi/tpm: Only set 'efi_tpm_final_log_size' after ...")
Link: https://lkml.kernel.org/r/20191029173755.27149-3-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 years agoefi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only
Narendra K [Tue, 29 Oct 2019 17:37:50 +0000 (18:37 +0100)]
efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only

For the EFI_RCI2_TABLE Kconfig option, 'make oldconfig' asks the user
for input on platforms where the option may not be applicable. This patch
modifies the Kconfig option to ask the user for input only when CONFIG_X86
or CONFIG_COMPILE_TEST is set to y.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Narendra K <Narendra.K@dell.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-2-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 years agoMerge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Thu, 31 Oct 2019 07:34:09 +0000 (07:34 +0000)]
Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "A few fixes to the dmaengine drivers:

   - fix in sprd driver for link list and potential memory leak

   - tegra transfer failure fix

   - imx size check fix for script_number

   - xilinx fix for 64bit AXIDMA and control reg update

   - qcom bam dma resource leak fix

   - cppi slave transfer fix when idle"

* tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle
  dmaengine: qcom: bam_dma: Fix resource leak
  dmaengine: sprd: Fix the possible memory leak issue
  dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config
  dmaengine: xilinx_dma: Fix 64-bit simple AXIDMA transfer
  dmaengine: imx-sdma: fix size check for sdma script_number
  dmaengine: tegra210-adma: fix transfer failure
  dmaengine: sprd: Fix the link-list pointer register configuration issue

5 years agoMerge branch 'hv_netvsc-fix-error-handling-in-netvsc_attach-set_features'
David S. Miller [Thu, 31 Oct 2019 01:17:36 +0000 (18:17 -0700)]
Merge branch 'hv_netvsc-fix-error-handling-in-netvsc_attach-set_features'

Haiyang Zhang says:

====================
hv_netvsc: fix error handling in netvsc_attach/set_features

The error handling code path in these functions are not correct.
This patch set fixes them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohv_netvsc: Fix error handling in netvsc_attach()
Haiyang Zhang [Wed, 30 Oct 2019 15:32:13 +0000 (15:32 +0000)]
hv_netvsc: Fix error handling in netvsc_attach()

If rndis_filter_open() fails, we need to remove the rndis device created
in earlier steps, before returning an error code. Otherwise, the retry of
netvsc_attach() from its callers will fail and hang.

Fixes: 467e8847a1ac ("hv_netvsc: common detach logic")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohv_netvsc: Fix error handling in netvsc_set_features()
Haiyang Zhang [Wed, 30 Oct 2019 15:32:11 +0000 (15:32 +0000)]
hv_netvsc: Fix error handling in netvsc_set_features()

When an error is returned by rndis_filter_set_offload_params(), we should
still assign the unaffected features to ndev->features. Otherwise, these
features will be missing.

Fixes: 60c5e0ec397f ("hv_netvsc: Add handler for LRO setting change")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: fix panic when attaching to ULD fail
Vishal Kulkarni [Wed, 30 Oct 2019 14:47:57 +0000 (20:17 +0530)]
cxgb4: fix panic when attaching to ULD fail

Release resources when attaching to ULD fail. Otherwise, data
mismatch is seen between LLD and ULD later on, which lead to
kernel panic when accessing resources that should not even
exist in the first place.

Fixes: 670bddc26792 ("cxgb4: Add support for dynamic allocation of resources for ULD")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: annotate lockless accesses to sk->sk_napi_id
Eric Dumazet [Tue, 29 Oct 2019 17:54:44 +0000 (10:54 -0700)]
net: annotate lockless accesses to sk->sk_napi_id

We already annotated most accesses to sk->sk_napi_id

We missed sk_mark_napi_id() and sk_mark_napi_id_once()
which might be called without socket lock held in UDP stack.

KCSAN reported :
BUG: KCSAN: data-race in udpv6_queue_rcv_one_skb / udpv6_queue_rcv_one_skb

write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 0:
 sk_mark_napi_id include/net/busy_poll.h:125 [inline]
 __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
 udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
 udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
 udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
 __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
 udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
 ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
 ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
 dst_input include/net/dst.h:442 [inline]
 ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955
 napi_poll net/core/dev.c:6392 [inline]
 net_rx_action+0x3ae/0xa90 net/core/dev.c:6460

write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 1:
 sk_mark_napi_id include/net/busy_poll.h:125 [inline]
 __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
 udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
 udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
 udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
 __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
 udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
 ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
 ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
 dst_input include/net/dst.h:442 [inline]
 ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 10890 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1b84b46f6912 ("udp: enable busy polling for all sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoALSA: timer: Fix mutex deadlock at releasing card
Takashi Iwai [Wed, 30 Oct 2019 21:42:57 +0000 (22:42 +0100)]
ALSA: timer: Fix mutex deadlock at releasing card

When a card is disconnected while in use, the system waits until all
opened files are closed then releases the card.  This is done via
put_device() of the card device in each device release code.

The recently reported mutex deadlock bug happens in this code path;
snd_timer_close() for the timer device deals with the global
register_mutex and it calls put_device() there.  When this timer
device is the last one, the card gets freed and it eventually calls
snd_timer_free(), which has again the protection with the global
register_mutex -- boom.

Basically put_device() call itself is race-free, so a relative simple
workaround is to move this put_device() call out of the mutex.  For
achieving that, in this patch, snd_timer_close_locked() got a new
argument to store the card device pointer in return, and each caller
invokes put_device() with the returned object after the mutex unlock.

Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
5 years agoio_uring: ensure we clear io_kiocb->result before each issue
Jens Axboe [Wed, 30 Oct 2019 19:53:09 +0000 (13:53 -0600)]
io_uring: ensure we clear io_kiocb->result before each issue

We use io_kiocb->result == -EAGAIN as a way to know if we need to
re-submit a polled request, as -EAGAIN reporting happens out-of-line
for IO submission failures. This field is cleared when we originally
allocate the request, but it isn't reset when we retry the submission
from async context. This can cause issues where we think something
needs a re-issue, but we're really just reading stale data.

Reset ->result whenever we re-prep a request for polled submission.

Cc: stable@vger.kernel.org
Fixes: 9c35176caf98 ("io_uring: add support for sqe links")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoparisc: fix frame pointer in ftrace_regs_caller()
Sven Schnelle [Wed, 30 Oct 2019 08:17:18 +0000 (09:17 +0100)]
parisc: fix frame pointer in ftrace_regs_caller()

The current code in ftrace_regs_caller() doesn't assign
%r3 to contain the address of the current frame. This
is hidden if the kernel is compiled with FRAME_POINTER,
but without it just crashes because it tries to dereference
an arbitrary address. Fix this by always setting %r3 to the
current stack frame.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
5 years agonet: annotate accesses to sk->sk_incoming_cpu
Eric Dumazet [Wed, 30 Oct 2019 20:00:04 +0000 (13:00 -0700)]
net: annotate accesses to sk->sk_incoming_cpu

This socket field can be read and written by concurrent cpus.

Use READ_ONCE() and WRITE_ONCE() annotations to document this,
and avoid some compiler 'optimizations'.

KCSAN reported :

BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv

write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0:
 sk_incoming_cpu_update include/net/sock.h:953 [inline]
 tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955
 napi_poll net/core/dev.c:6392 [inline]
 net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
 do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
 do_softirq kernel/softirq.c:329 [inline]
 __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189

read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1:
 sk_incoming_cpu_update include/net/sock.h:952 [inline]
 tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955
 napi_poll net/core/dev.c:6392 [inline]
 net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: core: Unpublish devlink parameters during reload
Jiri Pirko [Wed, 30 Oct 2019 09:04:22 +0000 (11:04 +0200)]
mlxsw: core: Unpublish devlink parameters during reload

The devlink parameter "acl_region_rehash_interval" is a runtime
parameter whose value is stored in a dynamically allocated memory. While
reloading the driver, this memory is freed and then allocated again. A
use-after-free might happen if during this time frame someone tries to
retrieve its value.

Since commit 070c63f20f6c ("net: devlink: allow to change namespaces
during reload") the use-after-free can be reliably triggered when
reloading the driver into a namespace, as after freeing the memory (via
reload_down() callback) all the parameters are notified.

Fix this by unpublishing and then re-publishing the parameters during
reload.

Fixes: 9b0152a33b05 ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
Fixes: 2265de84911e ("devlink: publish params only after driver init is done")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Optimize execution time for nvm attributes configuration.
Sudarsana Reddy Kalluru [Wed, 30 Oct 2019 08:39:58 +0000 (01:39 -0700)]
qed: Optimize execution time for nvm attributes configuration.

Current implementation for nvm_attr configuration instructs the management
FW to load/unload the nvm-cfg image for each user-provided attribute in
the input file. This consumes lot of cycles even for few tens of
attributes.
This patch updates the implementation to perform load/commit of the config
for every 50 attributes. After loading the nvm-image, MFW expects that
config should be committed in a predefined timer value (5 sec), hence it's
not possible to write large number of attributes in a single load/commit
window. Hence performing the commits in chunks.

Fixes: d1d50190eb79 ("qed: Add driver API for flashing the config attributes.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxlan: fix unexpected failure of vxlan_changelink()
Taehee Yoo [Wed, 30 Oct 2019 08:15:12 +0000 (08:15 +0000)]
vxlan: fix unexpected failure of vxlan_changelink()

After commit e3f889100f51 ("vxlan: add adjacent link to limit depth
level"), vxlan_changelink() could fail because of
netdev_adjacent_change_prepare().
netdev_adjacent_change_prepare() returns -EEXIST when old lower device
and new lower device are same.
(old lower device is "dst->remote_dev" and new lower device is "lowerdev")
So, before calling it, lowerdev should be NULL if these devices are same.

Test command1:
    ip link add dummy0 type dummy
    ip link add vxlan0 type vxlan dev dummy0 dstport 4789 vni 1
    ip link set vxlan0 type vxlan ttl 5
    RTNETLINK answers: File exists

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e3f889100f51 ("vxlan: add adjacent link to limit depth level")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: fix spelling mistake "queuess" -> "queues"
Colin Ian King [Wed, 30 Oct 2019 07:59:22 +0000 (08:59 +0100)]
qed: fix spelling mistake "queuess" -> "queues"

There is a spelling misake in a DP_NOTICE message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoSUNRPC: Destroy the back channel when we destroy the host transport
Trond Myklebust [Thu, 17 Oct 2019 13:02:21 +0000 (09:02 -0400)]
SUNRPC: Destroy the back channel when we destroy the host transport

When we're destroying the host transport mechanism, we should ensure
that we do not leak memory by failing to release any back channel
slots that might still exist.

Reported-by: Neil Brown <neilb@suse.de>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
5 years agoSUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
Trond Myklebust [Thu, 17 Oct 2019 13:02:20 +0000 (09:02 -0400)]
SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding

If there are RDMA back channel requests being processed by the
server threads, then we should hold a reference to the transport
to ensure it doesn't get freed from underneath us.

Reported-by: Neil Brown <neilb@suse.de>
Fixes: f0bd49ebbf4b ("xprtrdma: Handle incoming backward direction RPC calls")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
5 years agoSUNRPC: The TCP back channel mustn't disappear while requests are outstanding
Trond Myklebust [Thu, 17 Oct 2019 13:02:19 +0000 (09:02 -0400)]
SUNRPC: The TCP back channel mustn't disappear while requests are outstanding

If there are TCP back channel requests being processed by the
server threads, then we should hold a reference to the transport
to ensure it doesn't get freed from underneath us.

Reported-by: Neil Brown <neilb@suse.de>
Fixes: 9a4e3969cce7 ("SUNRPC: RPC callbacks may be split across several..")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
5 years agodrm/amdgpu: enable -msse2 for GCC 7.1+ users
Nick Desaulniers [Wed, 16 Oct 2019 23:02:09 +0000 (16:02 -0700)]
drm/amdgpu: enable -msse2 for GCC 7.1+ users

A final attempt at enabling sse2 for GCC users.

Orininally attempted in:
commit 8d73ceda10c9 ("drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines")

Reverted due to "reported instability" in:
commit 4a3058cdd203 ("Revert "drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines"")

Re-added just for Clang in:
commit 4ba6fcf75364 ("drm/amd/display: readd -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines")

The original report didn't have enough information to know if the GPF
was due to misalignment, but I suspect that it was. (The missing
information was the disassembly of the function at the bottom of the
trace, to see if the instruction pointer pointed to an instruction with
16B alignment memory operand requirements.  The stack trace does show
the stack was only 8B but not 16B aligned though, which makes this a
strong possibility).

Now that the stack misalignment issue has been fixed for users of GCC
7.1+, reattempt adding -msse2. This matches Clang.

It will likely never be safe to enable this for pre-GCC 7.1 AND use a
16B aligned stack in these translation units.

This is only a functional change for GCC 7.1+ users, and should be boot
tested.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=109487
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: fix stack alignment ABI mismatch for GCC 7.1+
Nick Desaulniers [Wed, 16 Oct 2019 23:02:08 +0000 (16:02 -0700)]
drm/amdgpu: fix stack alignment ABI mismatch for GCC 7.1+

GCC earlier than 7.1 errors when compiling code that makes use of
`double`s and sets a stack alignment outside of the range of [2^4-2^12]:

$ cat foo.c
double foo(double x, double y) {
  return x + y;
}
$ gcc-4.9 -mpreferred-stack-boundary=3 foo.c
error: -mpreferred-stack-boundary=3 is not between 4 and 12

This is likely why the AMDGPU driver was ever compiled with a different
stack alignment (and thus different ABI) than the rest of the x86
kernel. The kernel uses 8B stack alignment, while the driver was using
16B stack alignment in a few places.

Since GCC 7.1+ doesn't error, fix the ABI mismatch for users of newer
versions of GCC.

There was discussion about whether to mark the driver broken or not for
users of GCC earlier than 7.1, but since the driver currently is
working, don't explicitly break the driver for them here.

Relying on differing stack alignment is unspecified behavior, and
brittle, and may break in the future.

This patch is no functional change for GCC users earlier than 7.1. It's
been compile tested on GCC 4.9 and 8.3 to check the correct flags. It
should be boot tested when built with GCC 7.1+.

-mincoming-stack-boundary= or -mstackrealign may help keep this code
building for pre-GCC 7.1 users.

The version check for GCC is broken into two conditionals, both because
cc-ifversion is currently GCC specific, and it simplifies a subsequent
patch.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>