]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 26 Sep 2020 18:18:37 +0000 (11:18 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three fixes: one in drivers (lpfc) and two for zoned block devices.

  The latter also impinges on the block layer but only to introduce a
  new block API for setting the zone model rather than fiddling with the
  queue directly in the zoned block driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: sd_zbc: Fix ZBC disk initialization
  scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks
  scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported

4 years agoMerge tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 26 Sep 2020 18:13:51 +0000 (11:13 -0700)]
Merge tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two fixes for regressions in this cycle, and one that goes to 5.8
  stable:

   - fix leak of getname() retrieved filename

   - remove plug->nowait assignment, fixing a regression with btrfs

   - fix for async buffered retry"

* tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block:
  io_uring: ensure async buffered read-retry is setup properly
  io_uring: don't unconditionally set plug->nowait = true
  io_uring: ensure open/openat2 name is cleaned on cancelation

4 years agoMerge tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 26 Sep 2020 18:07:36 +0000 (11:07 -0700)]
Merge tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "NVMe pull request from Christoph, and removal of a dead define.

   - fix error during controller probe that cause double free irqs
     (Keith Busch)

   - FC connection establishment fix (James Smart)

   - properly handle completions for invalid tags (Xianting Tian)

   - pass the correct nsid to the command effects and supported log
     (Chaitanya Kulkarni)"

* tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block:
  block: remove unused BLK_QC_T_EAGAIN flag
  nvme-core: don't use NVME_NSID_ALL for command effects and supported log
  nvme-fc: fail new connections to a deleted host or remote port
  nvme-pci: fix NULL req in completion handler
  nvme: return errors for hwmon init

4 years agoMerge tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 26 Sep 2020 18:01:18 +0000 (11:01 -0700)]
Merge tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fix from Vasily Gorbik:
 "Fix truncated ZCRYPT_PERDEV_REQCNT ioctl result. Copy entire reqcnt
  list"

* tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl

4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 26 Sep 2020 17:53:35 +0000 (10:53 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "9 patches.

  Subsystems affected by this patch series: mm (thp, memcg, gup,
  migration, memory-hotplug), lib, and x86"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: don't rely on system state to detect hot-plug operations
  mm: replace memmap_context by meminit_context
  arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
  lib/memregion.c: include memregion.h
  lib/string.c: implement stpcpy
  mm/migrate: correct thp migration stats
  mm/gup: fix gup_fast with dynamic page table folding
  mm: memcontrol: fix missing suffix of workingset_restore
  mm, THP, swap: fix allocating cluster for swapfile by mistake

4 years agomm: validate pmd after splitting
Minchan Kim [Tue, 15 Sep 2020 06:32:15 +0000 (23:32 -0700)]
mm: validate pmd after splitting

syzbot reported the following KASAN splat:

  general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
  CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296
  Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc
  RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006
  Call Trace:
    lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006
    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
    _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
    spin_lock include/linux/spinlock.h:354 [inline]
    madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389
    walk_pmd_range mm/pagewalk.c:89 [inline]
    walk_pud_range mm/pagewalk.c:160 [inline]
    walk_p4d_range mm/pagewalk.c:193 [inline]
    walk_pgd_range mm/pagewalk.c:229 [inline]
    __walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331
    walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427
    madvise_pageout_page_range mm/madvise.c:521 [inline]
    madvise_pageout mm/madvise.c:557 [inline]
    madvise_vma mm/madvise.c:946 [inline]
    do_madvise+0x12d0/0x2090 mm/madvise.c:1145
    __do_sys_madvise mm/madvise.c:1171 [inline]
    __se_sys_madvise mm/madvise.c:1169 [inline]
    __x64_sys_madvise+0x76/0x80 mm/madvise.c:1169
    do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

The backing vma was shmem.

In case of split page of file-backed THP, madvise zaps the pmd instead
of remapping of sub-pages.  So we need to check pmd validity after
split.

Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com
Fixes: 234be0c86ee5 ("mm: introduce MADV_PAGEOUT")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: don't rely on system state to detect hot-plug operations
Laurent Dufour [Sat, 26 Sep 2020 04:19:31 +0000 (21:19 -0700)]
mm: don't rely on system state to detect hot-plug operations

In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation.  Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state.  In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1].  So checking against the system state is not
enough.

The consequence is that on system with interleaved node's ranges like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done.  At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:

  $ ls -l /sys/devices/system/memory/memory21/node*
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -> ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -> ../../node/node2

In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation.  An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.

[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:

  $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
        -m size=$MEM,slots=255,maxmem=4294967296k  \
        -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
        -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
        -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
        -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
        -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
        -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
        -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
        -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \

Fixes: f2af3c38dbfa ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: replace memmap_context by meminit_context
Laurent Dufour [Sat, 26 Sep 2020 04:19:28 +0000 (21:19 -0700)]
mm: replace memmap_context by meminit_context

Patch series "mm: fix memory to node bad links in sysfs", v3.

Sometimes, firmware may expose interleaved memory layout like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

In that case, we can see memory blocks assigned to multiple nodes in
sysfs:

  $ ls -l /sys/devices/system/memory/memory21
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -> ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -> ../../node/node2
  -rw-r--r-- 1 root root 65536 Aug 24 05:27 online
  -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device
  -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index
  drwxr-xr-x 2 root root     0 Aug 24 05:27 power
  -r--r--r-- 1 root root 65536 Aug 24 05:27 removable
  -rw-r--r-- 1 root root 65536 Aug 24 05:27 state
  lrwxrwxrwx 1 root root     0 Aug 24 05:25 subsystem -> ../../../../bus/memory
  -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent
  -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones

The same applies in the node's directory with a memory21 link in both
the node1 and node2's directory.

This is wrong but doesn't prevent the system to run.  However when
later, one of these memory blocks is hot-unplugged and then hot-plugged,
the system is detecting an inconsistency in the sysfs layout and a
BUG_ON() is raised:

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This has been seen on PowerPC LPAR.

The root cause of this issue is that when node's memory is registered,
the range used can overlap another node's range, thus the memory block
is registered to multiple nodes in sysfs.

There are two issues here:

 (a) The sysfs memory and node's layouts are broken due to these
     multiple links

 (b) The link errors in link_mem_sections() should not lead to a system
     panic.

To address (a) register_mem_sect_under_node should not rely on the
system state to detect whether the link operation is triggered by a hot
plug operation or not.  This is addressed by the patches 1 and 2 of this
series.

Issue (b) will be addressed separately.

This patch (of 2):

The memmap_context enum is used to detect whether a memory operation is
due to a hot-add operation or happening at boot time.

Make it general to the hotplug operation and rename it as
meminit_context.

There is no functional change introduced by this patch

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J . Wysocki" <rafael@kernel.org>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com
Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoarch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
Mikulas Patocka [Sat, 26 Sep 2020 04:19:24 +0000 (21:19 -0700)]
arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback

If we copy less than 8 bytes and if the destination crosses a cache
line, __copy_user_flushcache would invalidate only the first cache line.

This patch makes it invalidate the second cache line as well.

Fixes: b8d3b7263dbe11 ("x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.wiilliams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2009161451140.21915@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/memregion.c: include memregion.h
Jason Yan [Sat, 26 Sep 2020 04:19:21 +0000 (21:19 -0700)]
lib/memregion.c: include memregion.h

This addresses the following sparse warning:

  lib/memregion.c:8:5: warning: symbol 'memregion_alloc' was not declared. Should it be static?
  lib/memregion.c:14:6: warning: symbol 'memregion_free' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200921142852.875312-1-yanaijie@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/string.c: implement stpcpy
Nick Desaulniers [Sat, 26 Sep 2020 04:19:18 +0000 (21:19 -0700)]
lib/string.c: implement stpcpy

LLVM implemented a recent "libcall optimization" that lowers calls to
`sprintf(dest, "%s", str)` where the return value is used to
`stpcpy(dest, str) - dest`.

This generally avoids the machinery involved in parsing format strings.
`stpcpy` is just like `strcpy` except it returns the pointer to the new
tail of `dest`.  This optimization was introduced into clang-12.

Implement this so that we don't observe linkage failures due to missing
symbol definitions for `stpcpy`.

Similar to last year's fire drill with: commit 5093e926bb0e
("lib/string.c: implement a basic bcmp")

The kernel is somewhere between a "freestanding" environment (no full
libc) and "hosted" environment (many symbols from libc exist with the
same type, function signature, and semantics).

As Peter Anvin notes, there's not really a great way to inform the
compiler that you're targeting a freestanding environment but would like
to opt-in to some libcall optimizations (see pr/47280 below), rather
than opt-out.

Arvind notes, -fno-builtin-* behaves slightly differently between GCC
and Clang, and Clang is missing many __builtin_* definitions, which I
consider a bug in Clang and am working on fixing.

Masahiro summarizes the subtle distinction between compilers justly:
  To prevent transformation from foo() into bar(), there are two ways in
  Clang to do that; -fno-builtin-foo, and -fno-builtin-bar.  There is
  only one in GCC; -fno-buitin-foo.

(Any difference in that behavior in Clang is likely a bug from a missing
__builtin_* definition.)

Masahiro also notes:
  We want to disable optimization from foo() to bar(),
  but we may still benefit from the optimization from
  foo() into something else. If GCC implements the same transform, we
  would run into a problem because it is not -fno-builtin-bar, but
  -fno-builtin-foo that disables that optimization.

  In this regard, -fno-builtin-foo would be more future-proof than
  -fno-built-bar, but -fno-builtin-foo is still potentially overkill. We
  may want to prevent calls from foo() being optimized into calls to
  bar(), but we still may want other optimization on calls to foo().

It seems that compilers today don't quite provide the fine grain control
over which libcall optimizations pseudo-freestanding environments would
prefer.

Finally, Kees notes that this interface is unsafe, so we should not
encourage its use.  As such, I've removed the declaration from any
header, but it still needs to be exported to avoid linkage errors in
modules.

Reported-by: Sami Tolvanen <samitolvanen@google.com>
Suggested-by: Andy Lavr <andy.lavr@gmail.com>
Suggested-by: Arvind Sankar <nivedita@alum.mit.edu>
Suggested-by: Joe Perches <joe@perches.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200914161643.938408-1-ndesaulniers@google.com
Link: https://bugs.llvm.org/show_bug.cgi?id=47162
Link: https://bugs.llvm.org/show_bug.cgi?id=47280
Link: https://github.com/ClangBuiltLinux/linux/issues/1126
Link: https://man7.org/linux/man-pages/man3/stpcpy.3.html
Link: https://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html
Link: https://reviews.llvm.org/D85963
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/migrate: correct thp migration stats
Zi Yan [Sat, 26 Sep 2020 04:19:14 +0000 (21:19 -0700)]
mm/migrate: correct thp migration stats

PageTransHuge returns true for both thp and hugetlb, so thp stats was
counting both thp and hugetlb migrations.  Exclude hugetlb migration by
setting is_thp variable right.

Clean up thp handling code too when we are there.

Fixes: 7f9aa991e53d ("mm/vmstat: add events for THP migration without split")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lkml.kernel.org/r/20200917210413.1462975-1-zi.yan@sent.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/gup: fix gup_fast with dynamic page table folding
Vasily Gorbik [Sat, 26 Sep 2020 04:19:10 +0000 (21:19 -0700)]
mm/gup: fix gup_fast with dynamic page table folding

Currently to make sure that every page table entry is read just once
gup_fast walks perform READ_ONCE and pass pXd value down to the next
gup_pXd_range function by value e.g.:

  static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                           unsigned int flags, struct page **pages, int *nr)
  ...
          pudp = pud_offset(&p4d, addr);

This function passes a reference on that local value copy to pXd_offset,
and might get the very same pointer in return.  This happens when the
level is folded (on most arches), and that pointer should not be
iterated.

On s390 due to the fact that each task might have different 5,4 or
3-level address translation and hence different levels folded the logic
is more complex and non-iteratable pointer to a local copy leads to
severe problems.

Here is an example of what happens with gup_fast on s390, for a task
with 3-level paging, crossing a 2 GB pud boundary:

  // addr = 0x1007ffff000, end = 0x10080001000
  static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                           unsigned int flags, struct page **pages, int *nr)
  {
        unsigned long next;
        pud_t *pudp;

        // pud_offset returns &p4d itself (a pointer to a value on stack)
        pudp = pud_offset(&p4d, addr);
        do {
                // on second iteratation reading "random" stack value
                pud_t pud = READ_ONCE(*pudp);

                // next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
                next = pud_addr_end(addr, end);
                ...
        } while (pudp++, addr = next, addr != end); // pudp++ iterating over stack

        return 1;
  }

This happens since s390 moved to common gup code with commit
a3201ad56ca0 ("s390/mm: make the pxd_offset functions more robust") and
commit 1c8abd149bd0 ("s390/mm: convert to the generic
get_user_pages_fast code").

s390 tried to mimic static level folding by changing pXd_offset
primitives to always calculate top level page table offset in pgd_offset
and just return the value passed when pXd_offset has to act as folded.

What is crucial for gup_fast and what has been overlooked is that
PxD_SIZE/MASK and thus pXd_addr_end should also change correspondingly.
And the latter is not possible with dynamic folding.

To fix the issue in addition to pXd values pass original pXdp pointers
down to gup_pXd_range functions.  And introduce pXd_offset_lockless
helpers, which take an additional pXd entry value parameter.  This has
already been discussed in

  https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1

Fixes: 1c8abd149bd0 ("s390/mm: convert to the generic get_user_pages_fast code")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: <stable@vger.kernel.org> [5.2+]
Link: https://lkml.kernel.org/r/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: memcontrol: fix missing suffix of workingset_restore
Muchun Song [Sat, 26 Sep 2020 04:19:05 +0000 (21:19 -0700)]
mm: memcontrol: fix missing suffix of workingset_restore

We forget to add the suffix to the workingset_restore string, so fix it.

And also update the documentation of cgroup-v2.rst.

Fixes: 08dbfb3b7740 ("mm/workingset: prepare the workingset detection infrastructure for anon LRU")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/20200916100030.71698-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm, THP, swap: fix allocating cluster for swapfile by mistake
Gao Xiang [Sat, 26 Sep 2020 04:19:01 +0000 (21:19 -0700)]
mm, THP, swap: fix allocating cluster for swapfile by mistake

SWP_FS is used to make swap_{read,write}page() go through the
filesystem, and it's only used for swap files over NFS.  So, !SWP_FS
means non NFS for now, it could be either file backed or device backed.
Something similar goes with legacy SWP_FILE.

So in order to achieve the goal of the original patch, SWP_BLKDEV should
be used instead.

FS corruption can be observed with SSD device + XFS + fragmented
swapfile due to CONFIG_THP_SWAP=y.

I reproduced the issue with the following details:

Environment:

  QEMU + upstream kernel + buildroot + NVMe (2 GB)

Kernel config:

  CONFIG_BLK_DEV_NVME=y
  CONFIG_THP_SWAP=y

Some reproducible steps:

  mkfs.xfs -f /dev/nvme0n1
  mkdir /tmp/mnt
  mount /dev/nvme0n1 /tmp/mnt
  bs="32k"
  sz="1024m"    # doesn't matter too much, I also tried 16m
  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
  xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw

  mkswap /tmp/mnt/sw
  swapon /tmp/mnt/sw

  stress --vm 2 --vm-bytes 600M   # doesn't matter too much as well

Symptoms:
 - FS corruption (e.g. checksum failure)
 - memory corruption at: 0xd2808010
 - segfault

Fixes: 4a62af296daa ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
Fixes: 52139f97334d ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Eric Sandeen <esandeen@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: slab: fix potential double free in ___cache_free
Shakeel Butt [Sat, 26 Sep 2020 14:13:41 +0000 (07:13 -0700)]
mm: slab: fix potential double free in ___cache_free

With the commit faa69fb579fc ("mm: memcg/slab: use a single set of
kmem_caches for all allocations"), it becomes possible to call kfree()
from the slabs_destroy().

The functions cache_flusharray() and do_drain() calls slabs_destroy() on
array_cache of the local CPU without updating the size of the
array_cache.  This enables the kfree() call from the slabs_destroy() to
recursively call cache_flusharray() which can potentially call
free_block() on the same elements of the array_cache of the local CPU
and causing double free and memory corruption.

To fix the issue, simply update the local CPU array_cache cache before
calling slabs_destroy().

Fixes: faa69fb579fc ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ted Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 26 Sep 2020 00:15:19 +0000 (17:15 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm fixes from Paolo Bonzini:
 "Five small fixes.

  The nested migration bug will be fixed with a better API in 5.10 or
  5.11, for now this is a fix that works with existing userspace but
  keeps the current ugly API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Add a dedicated INVD intercept routine
  KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
  KVM: x86: fix MSR_IA32_TSC read for nested migration
  selftests: kvm: Fix assert failure in single-step test
  KVM: x86: VMX: Make smaller physical guest address space support user-configurable

4 years agoMerge tag 'mips_fixes_5.9_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Fri, 25 Sep 2020 22:24:04 +0000 (15:24 -0700)]
Merge tag 'mips_fixes_5.9_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - fixed FP register access on Loongsoon-3

 - added missing 1074 cpu handling

 - fixed Loongson2ef build error

* tag 'mips_fixes_5.9_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: BCM47XX: Remove the needless check with the 1074K
  MIPS: Add the missing 'CPU_1074K' into __get_cpu_type()
  MIPS: Loongson2ef: Disable Loongson MMI instructions
  MIPS: Loongson-3: Fix fp register access if MSA enabled

4 years agoMerge tag 'spi-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Fri, 25 Sep 2020 22:21:54 +0000 (15:21 -0700)]
Merge tag 'spi-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A small collection of driver specific fixes, the fsl-espi and bcm-qspi
  changes in particular have been causing breakage for users"

* tag 'spi-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: bcm-qspi: Fix probe regression on iProc platforms
  spi: fsl-dspi: fix use-after-free in remove path
  spi: fsl-espi: Only process interrupts for expected events
  spi: bcm2835: Make polling_limit_us static
  spi: spi-fsl-dspi: use XSPI mode instead of DMA for DPAA2 SoCs

4 years agoMerge tag 'regulator-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 25 Sep 2020 22:16:01 +0000 (15:16 -0700)]
Merge tag 'regulator-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "A single fix for incorrect specification of some of the register
  fields on axp20x devices which would break voltage setting on affected
  systems"

* tag 'regulator-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: axp20x: fix LDO2/4 description

4 years agoMerge tag 'regmap-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 25 Sep 2020 22:11:24 +0000 (15:11 -0700)]
Merge tag 'regmap-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
 "Two issues here - one is a fix for use after free issues in the case
  where a regmap overrides its name using something dynamically
  generated, the other is that we weren't handling access checks
  non-incrementing I/O on registers within paged register regions
  correctly resulting in spurious errors.

  Both of these are quite rare but serious if they occur"

* tag 'regmap-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: fix page selection for noinc writes
  regmap: fix page selection for noinc reads
  regmap: debugfs: Add back in erroneously removed initialisation of ret
  regmap: debugfs: Fix handling of name string for debugfs init delays

4 years agoio_uring: ensure async buffered read-retry is setup properly
Jens Axboe [Fri, 25 Sep 2020 21:23:43 +0000 (15:23 -0600)]
io_uring: ensure async buffered read-retry is setup properly

A previous commit for fixing up short reads botched the async retry
path, so we ended up going to worker threads more often than we should.
Fix this up, so retries work the way they originally were intended to.

Fixes: 1fa9de55d2c2 ("io_uring: internally retry short reads")
Reported-by: Hao_Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoMerge tag 'nfsd-5.9-2' of git://git.linux-nfs.org/projects/cel/cel-2.6
Linus Torvalds [Fri, 25 Sep 2020 17:46:11 +0000 (10:46 -0700)]
Merge tag 'nfsd-5.9-2' of git://git.linux-nfs.org/projects/cel/cel-2.6

Pull NFS server fix from Chuck Lever:
 "Fix incorrect calculation on platforms that implement
  flush_dcache_page()"

* tag 'nfsd-5.9-2' of git://git.linux-nfs.org/projects/cel/cel-2.6:
  SUNRPC: Fix svc_flush_dcache()

4 years agoMerge tag 'pm-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 25 Sep 2020 17:39:22 +0000 (10:39 -0700)]
Merge tag 'pm-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix more fallout of recent RCU-lockdep changes in CPU idle code
  and two devfreq issues.

  Specifics:

   - Export rcu_idle_{enter,exit} to modules to fix build issues
     introduced by recent RCU-lockdep fixes (Borislav Petkov)

   - Add missing return statement to a stub function in the ACPI
     processor driver to fix a build issue introduced by recent
     RCU-lockdep fixes (Rafael Wysocki)

   - Fix recently introduced suspicious RCU usage warnings in the PSCI
     cpuidle driver and drop stale comments regarding RCU_NONIDLE()
     usage from enter_s2idle_proper() (Ulf Hansson)

   - Fix error code path in the tegra30 devfreq driver (Dan Carpenter)

   - Add missing information to devfreq_summary debugfs (Chanwoo Choi)"

* tag 'pm-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset
  PM / devfreq: tegra30: Disable clock on error in probe
  PM / devfreq: Add timer type to devfreq_summary debugfs
  cpuidle: Drop misleading comments about RCU usage
  cpuidle: psci: Fix suspicious RCU usage
  rcu/tree: Export rcu_idle_{enter,exit} to modules

4 years agoKVM: SVM: Add a dedicated INVD intercept routine
Tom Lendacky [Thu, 24 Sep 2020 18:41:57 +0000 (13:41 -0500)]
KVM: SVM: Add a dedicated INVD intercept routine

The INVD instruction intercept performs emulation. Emulation can't be done
on an SEV guest because the guest memory is encrypted.

Provide a dedicated intercept routine for the INVD intercept. And since
the instruction is emulated as a NOP, just skip it instead.

Fixes: 527390bcc2f8 ("KVM: SVM: Add KVM_SEV_INIT command")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <a0b9a19ffa7fef86a3cc700c7ea01cb2731e04e5.1600972918.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Fri, 25 Sep 2020 16:49:19 +0000 (09:49 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fix from Jason Gunthorpe:
 "One fix for a bug that blktests hits when using rxe: tear down the CQ
  pool before waiting for all references to go away"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/core: Fix ordering of CQ pool destruction

4 years agoMerge tag 'drm-fixes-2020-09-25' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 25 Sep 2020 16:41:57 +0000 (09:41 -0700)]
Merge tag 'drm-fixes-2020-09-25' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Fairly quiet, a couple of i915 fixes, one dma-buf fix, one vc4 and two
  sun4i changes

  dma-buf:
   - Single null pointer deref fix

  i915:
   - Fix selftest reference to stack data out of scope
   - Fix GVT null pointer dereference

  vc4:
   - fill asoc card owner

  sun4i:
   - program secondary CSC correctly"

* tag 'drm-fixes-2020-09-25' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/selftests: Push the fake iommu device from the stack to data
  dmabuf: fix NULL pointer dereference in dma_buf_release()
  drm/i915/gvt: Fix port number for BDW on EDID region setup
  drm/sun4i: mixer: Extend regmap max_register
  drm/sun4i: sun8i-csc: Secondary CSC register correction
  drm/vc4/vc4_hdmi: fill ASoC card owner

4 years agoMerge branch 'pm-cpuidle'
Rafael J. Wysocki [Fri, 25 Sep 2020 16:33:46 +0000 (18:33 +0200)]
Merge branch 'pm-cpuidle'

* pm-cpuidle:
  ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset
  cpuidle: Drop misleading comments about RCU usage
  cpuidle: psci: Fix suspicious RCU usage
  rcu/tree: Export rcu_idle_{enter,exit} to modules

4 years agoio_uring: don't unconditionally set plug->nowait = true
Jens Axboe [Fri, 25 Sep 2020 15:01:53 +0000 (09:01 -0600)]
io_uring: don't unconditionally set plug->nowait = true

This causes all the bios to be submitted with REQ_NOWAIT, which can be
problematic on either btrfs or on file systems that otherwise use a mix
of block devices where only some of them support it.

For now, just remove the setting of plug->nowait = true.

Reported-by: Dan Melnic <dmm@fb.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Fixes: be6d4627241c ("io_uring: re-issue block requests that failed because of resources")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoMerge tag 'devfreq-fixes-for-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Rafael J. Wysocki [Fri, 25 Sep 2020 14:33:19 +0000 (16:33 +0200)]
Merge tag 'devfreq-fixes-for-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux

Pull devfreq updates for 5.9-rc7 from Chanwoo Choi:

"1. Update devfreq core
  - Add missing timer type to devfreq_summary debugfs node.

 2. Fix devfreq device driver
  - Fix the exception handling about clock on tegra30-devfreq.c"

* tag 'devfreq-fixes-for-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
  PM / devfreq: tegra30: Disable clock on error in probe
  PM / devfreq: Add timer type to devfreq_summary debugfs

4 years agoblock: remove unused BLK_QC_T_EAGAIN flag
Jeffle Xu [Fri, 25 Sep 2020 06:00:31 +0000 (14:00 +0800)]
block: remove unused BLK_QC_T_EAGAIN flag

commit 8881f9ecd517 ("block: remove REQ_NOWAIT_INLINE") removed the
REQ_NOWAIT_INLINE related code, but the diff wasn't applied to
blk_types.h somehow.

Then commit a835071977af ("block: remove the REQ_NOWAIT_INLINE flag")
removed the REQ_NOWAIT_INLINE flag while the BLK_QC_T_EAGAIN flag still
remains.

Fixes: 8881f9ecd517 ("block: remove REQ_NOWAIT_INLINE")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoio_uring: ensure open/openat2 name is cleaned on cancelation
Jens Axboe [Thu, 24 Sep 2020 20:55:54 +0000 (14:55 -0600)]
io_uring: ensure open/openat2 name is cleaned on cancelation

If we cancel these requests, we'll leak the memory associated with the
filename. Add them to the table of ops that need cleaning, if
REQ_F_NEED_CLEANUP is set.

Cc: stable@vger.kernel.org
Fixes: 3419807290cc ("io_uring: call statx directly")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoKVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
Sean Christopherson [Wed, 23 Sep 2020 21:53:52 +0000 (14:53 -0700)]
KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE

Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled.
Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are
toggled inadvertantly skipped the MMU context reset due to the mask
of bits that triggers PDPTR loads also being used to trigger MMU context
resets.

Fixes: c186276251e9 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode")
Fixes: 2e38dbb331fb ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode")
Cc: Jim Mattson <jmattson@google.com>
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923215352.17756-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge tag 'drm-misc-fixes-2020-09-24' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Fri, 25 Sep 2020 01:28:36 +0000 (11:28 +1000)]
Merge tag 'drm-misc-fixes-2020-09-24' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v5.9:
- Single null pointer deref fix for dma-buf.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4106c21e-f52c-4c05-6cdb-daa743bb8617@linux.intel.com
4 years agoMerge tag 'drm-intel-fixes-2020-09-24' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Fri, 25 Sep 2020 01:07:01 +0000 (11:07 +1000)]
Merge tag 'drm-intel-fixes-2020-09-24' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.9-rc7:
- Fix selftest reference to stack data out of scope
- Fix GVT null pointer dereference
- Backmerge from Linus' master to fix build

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87zh5fpmha.fsf@intel.com
4 years agoBackMerge commit '873fdb7702e27b8008ca9f2012a485d82347fd4d' into drm-fixes
Dave Airlie [Fri, 25 Sep 2020 01:06:18 +0000 (11:06 +1000)]
BackMerge commit '873fdb7702e27b8008ca9f2012a485d82347fd4d' into drm-fixes

The dax mess had some fallout, and i915 used a later base to fix their CI.

Signed-off-by: Dave Airlie <airlied@redhat.com>
4 years agoMerge tag 'nvme-5.9-2020-09-24' of git://git.infradead.org/nvme into block-5.9
Jens Axboe [Thu, 24 Sep 2020 19:42:40 +0000 (13:42 -0600)]
Merge tag 'nvme-5.9-2020-09-24' of git://git.infradead.org/nvme into block-5.9

Pull NVMe fixes from Christoph:

"nvme fixes for 5.9

  - fix error during controller probe that cause double free irqs
    (Keith Busch)
  - FC connection establishment fix (James Smart)
  - properly handle completions for invalid tags (Xianting Tian)
  - pass the correct nsid to the command effects and supported log
    (Chaitanya Kulkarni)"

* tag 'nvme-5.9-2020-09-24' of git://git.infradead.org/nvme:
  nvme-core: don't use NVME_NSID_ALL for command effects and supported log
  nvme-fc: fail new connections to a deleted host or remote port
  nvme-pci: fix NULL req in completion handler
  nvme: return errors for hwmon init

4 years agoKVM: x86: fix MSR_IA32_TSC read for nested migration
Maxim Levitsky [Mon, 21 Sep 2020 10:38:05 +0000 (13:38 +0300)]
KVM: x86: fix MSR_IA32_TSC read for nested migration

MSR reads/writes should always access the L1 state, since the (nested)
hypervisor should intercept all the msrs it wants to adjust, and these
that it doesn't should be read by the guest as if the host had read it.

However IA32_TSC is an exception. Even when not intercepted, guest still
reads the value + TSC offset.
The write however does not take any TSC offset into account.

This is documented in Intel's SDM and seems also to happen on AMD as well.

This creates a problem when userspace wants to read the IA32_TSC value and then
write it. (e.g for migration)

In this case it reads L2 value but write is interpreted as an L1 value.
To fix this make the userspace initiated reads of IA32_TSC return L1 value
as well.

Huge thanks to Dave Gilbert for helping me understand this very confusing
semantic of MSR writes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200921103805.9102-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge tag 'mmc-v5.9-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Thu, 24 Sep 2020 16:09:47 +0000 (09:09 -0700)]
Merge tag 'mmc-v5.9-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fix from Ulf Hansson:
 "Fix build warning in mmc_spi when CONFIG_HAS_DMA is unset"

* tag 'mmc-v5.9-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: mmc_spi: Fix mmc_spi_dma_alloc() return type for !HAS_DMA

4 years agoMerge tag 'media/v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Thu, 24 Sep 2020 16:05:04 +0000 (09:05 -0700)]
Merge tag 'media/v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - fix a regression at the CEC adapter core

 - two uAPI patches (one revert) for changes in this development cycle

* tag 'media/v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: dt-bindings: media: imx274: Convert to json-schema
  media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag
  media: cec-adap.c: don't use flush_scheduled_work()

4 years agoMerge tag 'sound-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 24 Sep 2020 16:00:05 +0000 (09:00 -0700)]
Merge tag 'sound-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Just a handful small device-specific fixes including a couple of
  reverts"

* tag 'sound-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  Revert "ALSA: usb-audio: Disable Lenovo P620 Rear line-in volume control"
  Revert "ALSA: hda - Fix silent audio output and corrupted input on MSI X570-A PRO"
  ALSA: usb-audio: Add delay quirk for H570e USB headsets
  ALSA: hda/realtek: Enable front panel headset LED on Lenovo ThinkStation P520
  ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged
  ALSA: asihpi: fix iounmap in error handler

4 years agomm: fix misplaced unlock_page in do_wp_page()
Linus Torvalds [Thu, 24 Sep 2020 15:41:32 +0000 (08:41 -0700)]
mm: fix misplaced unlock_page in do_wp_page()

Commit a9e8b43ca540 ("mm: do_wp_page() simplification") reorganized all
the code around the page re-use vs copy, but in the process also moved
the final unlock_page() around to after the wp_page_reuse() call.

That normally doesn't matter - but it means that the unlock_page() is
now done after releasing the page table lock.  Again, not a big deal,
you'd think.

But it turns out that it's very wrong indeed, because once we've
released the page table lock, we've basically lost our only reference to
the page - the page tables - and it could now be free'd at any time.  We
do hold the mmap_sem, so no actual unmap() can happen, but madvise can
come in and a MADV_DONTNEED will zap the page range - and free the page.

So now the page may be free'd just as we're unlocking it, which in turn
will usually trigger a "Bad page state" error in the freeing path.  To
make matters more confusing, by the time the debug code prints out the
page state, the unlock has typically completed and everything looks fine
again.

This all doesn't happen in any normal situations, but it does trigger
with the dirtyc0w_child LTP test.  And it seems to trigger much more
easily (but not expclusively) on s390 than elsewhere, probably because
s390 doesn't do the "batch pages up for freeing after the TLB flush"
that gives the unlock_page() more time to complete and makes the race
harder to hit.

Fixes: a9e8b43ca540 ("mm: do_wp_page() simplification")
Link: https://lore.kernel.org/lkml/a46e9bbef2ed4e17778f5615e818526ef848d791.camel@redhat.com/
Link: https://lore.kernel.org/linux-mm/c41149a8-211e-390b-af1d-d5eee690fecb@linux.alibaba.com/
Reported-by: Qian Cai <cai@redhat.com>
Reported-by: Alex Shi <alex.shi@linux.alibaba.com>
Bisected-and-analyzed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agospi: bcm-qspi: Fix probe regression on iProc platforms
Ray Jui [Thu, 10 Sep 2020 15:25:38 +0000 (08:25 -0700)]
spi: bcm-qspi: Fix probe regression on iProc platforms

iProc chips have QSPI controller that does not have the MSPI_REV
offset. Reading from that offset will cause a bus error. Fix it by
having MSPI_REV query disabled in the generic compatible string.

Fixes: 81fd5939fcd2 ("spi: bcm-qspi: Handle lack of MSPI_REV offset")
Link: https://lore.kernel.org/linux-arm-kernel/20200909211857.4144718-1-f.fainelli@gmail.com/T/#u
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20200910152539.45584-3-ray.jui@broadcom.com
Signed-off-by: Mark Brown <broonie@kernel.org>
4 years agos390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl
Christian Borntraeger [Mon, 21 Sep 2020 10:48:36 +0000 (12:48 +0200)]
s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl

reqcnt is an u32 pointer but we do copy sizeof(reqcnt) which is the
size of the pointer. This means we only copy 8 byte. Let us copy
the full monty.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 9c549e36892a ("s390/zcrypt: Support up to 256 crypto adapters.")
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agoMerge tag 'trace-v5.9-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 23 Sep 2020 21:52:22 +0000 (14:52 -0700)]
Merge tag 'trace-v5.9-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull bootconfig fixes from Steven Rostedt:
 "A couple of fixes for bootconfig.

  Masami discovered two bugs which this fixes and he added tests to
  cover these issues.

   - Fix a bug that breaks bootconfig tree nodes

   - Fix a bug that does not truncate whitespace properly

   - Add tests to cover the above two cases"

* tag 'trace-v5.9-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tools/bootconfig: Add testcase for tailing space
  tools/bootconfig: Add testcases for repeated key with brace
  lib/bootconfig: Fix to remove tailing spaces after value
  lib/bootconfig: Fix a bug of breaking existing tree nodes

4 years agoMerge tag 'for-5.9/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/devic...
Linus Torvalds [Wed, 23 Sep 2020 21:38:21 +0000 (14:38 -0700)]
Merge tag 'for-5.9/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - DM core fix for incorrect double bio splitting. Keep "fixing" this
   because past attempts didn't fully appreciate the liability relative
   to recursive bio splitting. This fix limits DM's bio splitting to a
   single method and does _not_ use blk_queue_split() for normal IO.

 - DM crypt Documentation updates for features added during 5.9 merge.

* tag 'for-5.9/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm crypt: document encrypted keyring key option
  dm crypt: document new no_workqueue flags
  dm: fix comment in dm_process_bio()
  dm: fix bio splitting and its bio completion order for regular IO

4 years agoMerge tag 'for-5.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 23 Sep 2020 21:32:23 +0000 (14:32 -0700)]
Merge tag 'for-5.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "syzkaller started to hit us with reports, here's a fix for one type
  (stack overflow when printing checksums on read error).

  The other patch is a fix for sysfs object, we have a test for that and
  it leads to a crash."

* tag 'for-5.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix put of uninitialized kobject after seed device delete
  btrfs: fix overflow when copying corrupt csums for a message

4 years agonvme-core: don't use NVME_NSID_ALL for command effects and supported log
Chaitanya Kulkarni [Tue, 22 Sep 2020 19:49:38 +0000 (12:49 -0700)]
nvme-core: don't use NVME_NSID_ALL for command effects and supported log

In the function nvme_get_effects_log() it uses NVME_NSID_ALL which has
namespace scope. The command effect log page is controller specific.

Replace NVME_NSID_ALL with 0x00 which specifies the controller scope
instead of namespace scope.

Fixes: 13329f5e2749 ("nvme: check admin passthru command effects")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209287
Reported-by: Huai-Cheng Kuo <hh81478072@gmail.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
4 years agomm: move the copy_one_pte() pte_present check into the caller
Linus Torvalds [Wed, 23 Sep 2020 17:04:16 +0000 (10:04 -0700)]
mm: move the copy_one_pte() pte_present check into the caller

This completes the split of the non-present and present pte cases by
moving the check for the source pte being present into the single
caller, which also means that we clearly separate out the very different
return value case for a non-present pte.

The present pte case currently always succeeds.

This is a pure code re-organization with no semantic change: the intent
is to make it much easier to add a new return case to the present pte
case for when we do early COW at page table copy time.

This was split out from the previous commit simply to make it easy to
visually see that there were no semantic changes from this code
re-organization.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: split out the non-present case from copy_one_pte()
Linus Torvalds [Wed, 23 Sep 2020 16:56:59 +0000 (09:56 -0700)]
mm: split out the non-present case from copy_one_pte()

This is a purely mechanical split of the copy_one_pte() function.  It's
not immediately obvious when looking at the diff because of the
indentation change, but the way to see what is going on in this commit
is to use the "-w" flag to not show pure whitespace changes, and you see
how the first part of copy_one_pte() is simply lifted out into a
separate function.

And since the non-present case is marked unlikely, don't make the new
function be inlined.  Not that gcc really seems to care, since it looks
like it will inline it anyway due to the whole "single callsite for
static function" logic.  In fact, code generation with the function
split is almost identical to before.  But not marking it inline is the
right thing to do.

This is pure prep-work and cleanup for subsequent changes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agospi: fsl-dspi: fix use-after-free in remove path
Sascha Hauer [Wed, 23 Sep 2020 13:10:26 +0000 (15:10 +0200)]
spi: fsl-dspi: fix use-after-free in remove path

spi_unregister_controller() not only unregisters the controller, but
also frees the controller. This will free the driver data with it, so
we must not access it later dspi_remove().

Solve this by allocating the driver data separately from the SPI
controller.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/r/20200923131026.20707-1-s.hauer@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
4 years agoregulator: axp20x: fix LDO2/4 description
Icenowy Zheng [Wed, 23 Sep 2020 00:51:42 +0000 (08:51 +0800)]
regulator: axp20x: fix LDO2/4 description

Currently we wrongly set the mask of value of LDO2/4 both to the mask of
LDO2, and the LDO4 voltage configuration is left untouched. This leads
to conflict when LDO2/4 are both in use.

Fix this issue by setting different vsel_mask to both regulators.

Fixes: 6a624ed1dbd9 ("regulator: axp20x: use defines for masks")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Link: https://lore.kernel.org/r/20200923005142.147135-1-icenowy@aosc.io
Signed-off-by: Mark Brown <broonie@kernel.org>
4 years agoselftests: kvm: Fix assert failure in single-step test
Yang Weijiang [Wed, 26 Aug 2020 01:55:24 +0000 (09:55 +0800)]
selftests: kvm: Fix assert failure in single-step test

This is a follow-up patch to fix an issue left in commit:
cb9f391ea8e8d39ae81adc741bc82f0ce0f4e439
selftests: kvm: Use a shorter encoding to clear RAX

With the change in the commit, we also need to modify "xor" instruction
length from 3 to 2 in array ss_size accordingly to pass below check:

for (i = 0; i < (sizeof(ss_size) / sizeof(ss_size[0])); i++) {
        target_rip += ss_size[i];
        CLEAR_DEBUG();
        debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
        debug.arch.debugreg[7] = 0x00000400;
        APPLY_DEBUG();
        vcpu_run(vm, VCPU_ID);
        TEST_ASSERT(run->exit_reason == KVM_EXIT_DEBUG &&
                    run->debug.arch.exception == DB_VECTOR &&
                    run->debug.arch.pc == target_rip &&
                    run->debug.arch.dr6 == target_dr6,
                    "SINGLE_STEP[%d]: exit %d exception %d rip 0x%llx "
                    "(should be 0x%llx) dr6 0x%llx (should be 0x%llx)",
                    i, run->exit_reason, run->debug.arch.exception,
                    run->debug.arch.pc, target_rip, run->debug.arch.dr6,
                    target_dr6);
}

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20200826015524.13251-1-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: VMX: Make smaller physical guest address space support user-configurable
Mohammed Gamal [Thu, 3 Sep 2020 14:11:22 +0000 (16:11 +0200)]
KVM: x86: VMX: Make smaller physical guest address space support user-configurable

This patch exposes allow_smaller_maxphyaddr to the user as a module parameter.
Since smaller physical address spaces are only supported on VMX, the
parameter is only exposed in the kvm_intel module.

For now disable support by default, and let the user decide if they want
to enable it.

Modifications to VMX page fault and EPT violation handling will depend
on whether that parameter is enabled.

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Message-Id: <20200903141122.72908-1-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMIPS: BCM47XX: Remove the needless check with the 1074K
Wei Li [Wed, 23 Sep 2020 06:53:26 +0000 (14:53 +0800)]
MIPS: BCM47XX: Remove the needless check with the 1074K

As there is no known soc powered by mips 1074K in bcm47xx series,
the check with 1074K is needless. So just remove it.

Link: https://wireless.wiki.kernel.org/en/users/Drivers/b43/soc
Fixes: 525269dcf957 ("MIPS: Add 1074K CPU support explicitly.")
Signed-off-by: Wei Li <liwei391@huawei.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Add the missing 'CPU_1074K' into __get_cpu_type()
Wei Li [Wed, 23 Sep 2020 06:53:12 +0000 (14:53 +0800)]
MIPS: Add the missing 'CPU_1074K' into __get_cpu_type()

Commit 525269dcf957 ("MIPS: Add 1074K CPU support explicitly.") split
1074K from the 74K as an unique CPU type, while it missed to add the
'CPU_1074K' in __get_cpu_type(). So let's add it back.

Fixes: 525269dcf957 ("MIPS: Add 1074K CPU support explicitly.")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson2ef: Disable Loongson MMI instructions
Jiaxun Yang [Wed, 23 Sep 2020 10:33:12 +0000 (18:33 +0800)]
MIPS: Loongson2ef: Disable Loongson MMI instructions

It was missed when I was forking Loongson2ef from Loongson64 but
should be applied to Loongson2ef as march=loongson2f
will also enable Loongson MMI in GCC-9+.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Fixes: e775c6217f38 ("MIPS: Fork loongson2ef from loongson64")
Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset
Rafael J. Wysocki [Wed, 23 Sep 2020 11:50:12 +0000 (13:50 +0200)]
ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset

Fix the lapic_timer_needs_broadcast() stub for
ARCH_APICTIMER_STOPS_ON_C3 unset to actually return
a value.

Fixes: ca0b43305061 ("ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
4 years agodrm/i915/selftests: Push the fake iommu device from the stack to data
Chris Wilson [Wed, 16 Sep 2020 10:50:22 +0000 (11:50 +0100)]
drm/i915/selftests: Push the fake iommu device from the stack to data

Since we store a pointer to the fake iommu device that is allocated on
the stack, as soon as we leave the function it goes out of scope and any
future dereference is undefined behaviour. Just in case we may need to
look at the fake iommu device after initialiation, move the allocation
from the stack into the data.

Fixes: 8421db464818 ("iommu/vt-d: Use dev_iommu_priv_get/set()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200916105022.28316-2-chris@chris-wilson.co.uk
(cherry picked from commit 9f9f4101fc98db56714e71676d5a1e2d27e01f7e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agoPM / devfreq: tegra30: Disable clock on error in probe
Dan Carpenter [Tue, 8 Sep 2020 07:25:57 +0000 (10:25 +0300)]
PM / devfreq: tegra30: Disable clock on error in probe

This error path needs to call clk_disable_unprepare().

Fixes: 290f3facde37 ("PM / devfreq: tegra30: Handle possible round-rate error")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
4 years agoPM / devfreq: Add timer type to devfreq_summary debugfs
Chanwoo Choi [Tue, 8 Sep 2020 10:57:07 +0000 (19:57 +0900)]
PM / devfreq: Add timer type to devfreq_summary debugfs

The commit b81710670d10 ("PM / devfreq: Add support delayed timer for
polling mode") supports the delayed timer but this commit missed
the adding the timer type to devfreq_summary debugfs node.
Add the timer type to devfreq_summary debugfs.

Fixes: b81710670d10 ("PM / devfreq: Add support delayed timer for polling mode")
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
4 years agoMerge tag 'drm-misc-fixes-2020-09-18' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Tue, 22 Sep 2020 23:30:29 +0000 (09:30 +1000)]
Merge tag 'drm-misc-fixes-2020-09-18' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v5.9-rc6:
- Fill asoc card owner in vc4.
- Program secondary CSC correctly in sun4i, and extend
  register mapping to cover secondary CSC registers.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e3ab56cf-3b8e-9b21-f1b6-9a4989a52996@linux.intel.com
4 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Tue, 22 Sep 2020 22:08:41 +0000 (15:08 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "No common topic, just assorted fixes"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fuse: fix the ->direct_IO() treatment of iov_iter
  fs: fix cast in fsparam_u32hex() macro
  vboxsf: Fix the check for the old binary mount-arguments struct

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Tue, 22 Sep 2020 21:43:50 +0000 (14:43 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:

 - fix failure to add bond interfaces to a bridge, the offload-handling
   code was too defensive there and recent refactoring unearthed that.
   Users complained (Ido)

 - fix unnecessarily reflecting ECN bits within TOS values / QoS marking
   in TCP ACK and reset packets (Wei)

 - fix a deadlock with bpf iterator. Hopefully we're in the clear on
   this front now... (Yonghong)

 - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)

 - fix AQL on mt76 devices with FW rate control and add a couple of AQL
   issues in mac80211 code (Felix)

 - fix authentication issue with mwifiex (Maximilian)

 - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)

 - fix exception handling for multipath routes via same device (David
   Ahern)

 - revert back to a BH spin lock flavor for nsid_lock: there are paths
   which do require the BH context protection (Taehee)

 - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)

 - fix ife module load deadlock (Cong)

 - make an adjustment to netlink reply message type for code added in
   this release (the sole change touching uAPI here) (Michal)

 - a number of fixes for small NXP and Microchip switches (Vladimir)

[ Pull request acked by David: "you can expect more of this in the
  future as I try to delegate more things to Jakub" ]

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
  net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
  net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
  net: Update MAINTAINERS for MediaTek switch driver
  net/mlx5e: mlx5e_fec_in_caps() returns a boolean
  net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
  net/mlx5e: kTLS, Fix leak on resync error flow
  net/mlx5e: kTLS, Add missing dma_unmap in RX resync
  net/mlx5e: kTLS, Fix napi sync and possible use-after-free
  net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
  net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
  net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  net/mlx5e: Fix endianness when calculating pedit mask first bit
  net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
  net/mlx5e: CT: Fix freeing ct_label mapping
  net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
  net/mlx5e: Use synchronize_rcu to sync with NAPI
  net/mlx5e: Use RCU to protect rq->xdp_prog
  ...

4 years agoMerge tag 'io_uring-5.9-2020-09-22' of git://git.kernel.dk/linux-block
Linus Torvalds [Tue, 22 Sep 2020 21:36:50 +0000 (14:36 -0700)]
Merge tag 'io_uring-5.9-2020-09-22' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few fixes - most of them regression fixes from this cycle, but also
  a few stable heading fixes, and a build fix for the included demo tool
  since some systems now actually have gettid() available"

* tag 'io_uring-5.9-2020-09-22' of git://git.kernel.dk/linux-block:
  io_uring: fix openat/openat2 unified prep handling
  io_uring: mark statx/files_update/epoll_ctl as non-SQPOLL
  tools/io_uring: fix compile breakage
  io_uring: don't use retry based buffered reads for non-async bdev
  io_uring: don't re-setup vecs/iter in io_resumit_prep() is already there
  io_uring: don't run task work on an exiting task
  io_uring: drop 'ctx' ref on task work cancelation
  io_uring: grab any needed state during defer prep

4 years agoMerge tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-block
Linus Torvalds [Tue, 22 Sep 2020 21:31:38 +0000 (14:31 -0700)]
Merge tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few NVMe fixes, and a dasd write zero fix"

* tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-block:
  nvmet: get transport reference for passthru ctrl
  nvme-core: get/put ctrl and transport module in nvme_dev_open/release()
  nvme-tcp: fix kconfig dependency warning when !CRYPTO
  nvme-pci: disable the write zeros command for Intel 600P/P3100
  s390/dasd: Fix zero write for FBA devices

4 years agocpuidle: Drop misleading comments about RCU usage
Ulf Hansson [Tue, 22 Sep 2020 09:15:50 +0000 (11:15 +0200)]
cpuidle: Drop misleading comments about RCU usage

The commit 851764809860 ("sched,idle,rcu: Push rcu_idle deeper into the
idle path"), moved the calls rcu_idle_enter|exit() into the cpuidle core.

However, it forgot to remove a couple of comments in enter_s2idle_proper()
about why RCU_NONIDLE earlier was needed. So, let's drop them as they have
become a bit misleading.

Fixes: 851764809860 ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
4 years agodm crypt: document encrypted keyring key option
Milan Broz [Thu, 20 Aug 2020 19:20:26 +0000 (21:20 +0200)]
dm crypt: document encrypted keyring key option

Commit 345174b7962e7 ("dm crypt: support using encrypted keys")
introduced support for encrypted keyring type.

Fix documentation in admin guide to mention this type.

Fixes: 345174b7962e7 ("dm crypt: support using encrypted keys")
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
4 years agoMerge tag 'gvt-fixes-2020-09-17' of https://github.com/intel/gvt-linux into drm-intel...
Jani Nikula [Tue, 22 Sep 2020 17:25:11 +0000 (20:25 +0300)]
Merge tag 'gvt-fixes-2020-09-17' of https://github.com/intel/gvt-linux into drm-intel-fixes

gvt-fixes-2020-09-17

- Fix kernel oops for VFIO edid on BDW (Zhenyu)

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200917064208.GF11592@zhen-hp.sh.intel.com
4 years agodm crypt: document new no_workqueue flags
Milan Broz [Thu, 20 Aug 2020 17:45:38 +0000 (19:45 +0200)]
dm crypt: document new no_workqueue flags

Commit 376c416a5ffd ("dm crypt: add flags to optionally bypass kcryptd
workqueues") introduced new dm-crypt 'no_read_workqueue' and
'no_write_workqueue' flags.

Add documentation to admin guide for them.

Fixes: 376c416a5ffd ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
4 years agoMerge tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Tue, 22 Sep 2020 16:08:33 +0000 (09:08 -0700)]
Merge tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Check kprobe is enabled before unregistering from ftrace as it isn't
   registered when disabled.

 - Remove kprobes enabled via command-line that is on init text when
   freed.

 - Add missing RCU synchronization for ftrace trampoline symbols removed
   from kallsyms.

 - Free trampoline on error path if ftrace_startup() fails.

 - Give more space for the longer PID numbers in trace output.

 - Fix a possible double free in the histogram code.

 - A couple of fixes that were discovered by sparse.

* tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  bootconfig: init: make xbc_namebuf static
  kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
  tracing: fix double free
  ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
  tracing: Make the space reserved for the pid wider
  ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
  ftrace: Free the trampoline when ftrace_startup() fails
  kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()

4 years agonvme-fc: fail new connections to a deleted host or remote port
James Smart [Thu, 17 Sep 2020 20:33:22 +0000 (13:33 -0700)]
nvme-fc: fail new connections to a deleted host or remote port

The lldd may have made calls to delete a remote port or local port and
the delete is in progress when the cli then attempts to create a new
controller. Currently, this proceeds without error although it can't be
very successful.

Fix this by validating that both the host port and remote port are
present when a new controller is to be created.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
4 years agonvme-pci: fix NULL req in completion handler
Xianting Tian [Tue, 22 Sep 2020 06:25:17 +0000 (14:25 +0800)]
nvme-pci: fix NULL req in completion handler

Currently, we use nvmeq->q_depth as the upper limit for a valid tag in
nvme_handle_cqe(), it is not correct. Because the available tag number
is recorded in tagset, which is not equal to nvmeq->q_depth.

The nvme driver registers interrupts for queues before initializing the
tagset, because it uses the number of successful request_irq() calls to
configure the tagset parameters. This allows a race condition with the
current tag validity check if the controller happens to produce an
interrupt with a corrupted CQE before the tagset is initialized.

Replace the driver's indirect tag check with the one already provided by
the block layer.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
4 years agonvme: return errors for hwmon init
Keith Busch [Thu, 17 Sep 2020 15:50:25 +0000 (08:50 -0700)]
nvme: return errors for hwmon init

Initializing the nvme hwmon retrieves a log from the controller. If the
controller is broken, we need to return the appropriate error so that
subsequent initialization doesn't attempt to continue.

Reported-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
4 years agoRevert "ALSA: usb-audio: Disable Lenovo P620 Rear line-in volume control"
Kai-Heng Feng [Tue, 15 Sep 2020 10:39:23 +0000 (18:39 +0800)]
Revert "ALSA: usb-audio: Disable Lenovo P620 Rear line-in volume control"

This reverts commit 904639f491fb30d010642017cb9ff997bcd437f2.

According to Realtek, volume FU works for line-in.

I can confirm volume control works after device firmware is updated.

Fixes: 904639f491fb ("ALSA: usb-audio: Disable Lenovo P620 Rear line-in volume control")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20200915103925.12777-1-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
4 years agobtrfs: fix put of uninitialized kobject after seed device delete
Anand Jain [Fri, 4 Sep 2020 17:34:21 +0000 (01:34 +0800)]
btrfs: fix put of uninitialized kobject after seed device delete

The following test case leads to NULL kobject free error:

  mount seed /mnt
  add sprout to /mnt
  umount /mnt
  mount sprout to /mnt
  delete seed

  kobject: '(null)' (00000000dd2b87e4): is not initialized, yet kobject_put() is being called.
  WARNING: CPU: 1 PID: 15784 at lib/kobject.c:736 kobject_put+0x80/0x350
  RIP: 0010:kobject_put+0x80/0x350
  ::
  Call Trace:
  btrfs_sysfs_remove_devices_dir+0x6e/0x160 [btrfs]
  btrfs_rm_device.cold+0xa8/0x298 [btrfs]
  btrfs_ioctl+0x206c/0x22a0 [btrfs]
  ksys_ioctl+0xe2/0x140
  __x64_sys_ioctl+0x1e/0x29
  do_syscall_64+0x96/0x150
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f4047c6288b
  ::

This is because, at the end of the seed device-delete, we try to remove
the seed's devid sysfs entry. But for the seed devices under the sprout
fs, we don't initialize the devid kobject yet. So add a kobject state
check, which takes care of the bug.

Fixes: 524da8c979b1 ("btrfs: sysfs, add devid/dev_state kobject and device attributes")
CC: stable@vger.kernel.org # 5.6+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agoMIPS: Loongson-3: Fix fp register access if MSA enabled
Huacai Chen [Mon, 24 Aug 2020 07:44:03 +0000 (15:44 +0800)]
MIPS: Loongson-3: Fix fp register access if MSA enabled

If MSA is enabled, FPU_REG_WIDTH is 128 rather than 64, then get_fpr64()
/set_fpr64() in the original unaligned instruction emulation code access
the wrong fp registers. This is because the current code doesn't specify
the correct index field, so fix it.

Fixes: ba0de8875d1b7d2e06d818d5 ("MIPS: Loongson-3: Add some unaligned instructions emulation")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Pei Huang <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMerge remote-tracking branch 'origin/master' into drm-intel-fixes
Jani Nikula [Tue, 22 Sep 2020 10:15:19 +0000 (13:15 +0300)]
Merge remote-tracking branch 'origin/master' into drm-intel-fixes

Direct backmerge to get commit fc95624fedd2 ("dax: Fix compilation for
CONFIG_DAX && !CONFIG_FS_DAX") to fix CI build issues.

Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agomedia: dt-bindings: media: imx274: Convert to json-schema
Jacopo Mondi [Sat, 12 Sep 2020 10:30:45 +0000 (12:30 +0200)]
media: dt-bindings: media: imx274: Convert to json-schema

Convert the imx274 bindings document to json-schema and update
the MAINTAINERS file accordingly.

Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
4 years agotools/bootconfig: Add testcase for tailing space
Masami Hiramatsu [Mon, 21 Sep 2020 09:45:11 +0000 (18:45 +0900)]
tools/bootconfig: Add testcase for tailing space

Add testcases for removing/keeping tailing space
in the value.

Link: https://lkml.kernel.org/r/160068151151.1088739.3469541807296024227.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotools/bootconfig: Add testcases for repeated key with brace
Masami Hiramatsu [Mon, 21 Sep 2020 09:45:02 +0000 (18:45 +0900)]
tools/bootconfig: Add testcases for repeated key with brace

Add a testcase for repeated key with brace parsing issue.

Link: https://lkml.kernel.org/r/160068150176.1088739.409481347784771987.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agolib/bootconfig: Fix to remove tailing spaces after value
Masami Hiramatsu [Mon, 21 Sep 2020 09:44:51 +0000 (18:44 +0900)]
lib/bootconfig: Fix to remove tailing spaces after value

Fix to remove tailing spaces after value. If there is a space
after value, the bootconfig failed to remove it because it
applies strim() before replacing the delimiter with null.

For example,

foo = var    # comment

was parsed as below.

foo="var    "

but user will expect

foo="var"

This fixes it by applying strim() after removing the delimiter.

Link: https://lkml.kernel.org/r/160068149134.1088739.8868306567670058853.stgit@devnote2
Fixes: 002e0be7fc1b ("bootconfig: Add Extra Boot Config support")
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agolib/bootconfig: Fix a bug of breaking existing tree nodes
Masami Hiramatsu [Mon, 21 Sep 2020 09:44:42 +0000 (18:44 +0900)]
lib/bootconfig: Fix a bug of breaking existing tree nodes

Fix a bug of breaking existing tree nodes by parsing the second
and subsequent braces. Since the bootconfig parser uses the
node.next field as a flag of current parent node, but this will
break the existing tree if the same key node is specified again
in the bootconfig.

For example, the following bootconfig should be foo.buz and bar.

foo
bar
foo { buz }

However, when parsing the brace "{", it breaks foo->bar link
by marking open-brace node. So the bootconfig unlinks bar
from the bootconfig internal tree.

This introduces a stack outside of the tree and record the
last open-brace on the stack instead of using node.next field.

Link: https://lkml.kernel.org/r/160068148267.1088739.8264704338030168660.stgit@devnote2
Fixes: 002e0be7fc1b ("bootconfig: Add Extra Boot Config support")
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agoMerge branch 'Fix-broken-tc-flower-rules-for-mscc_ocelot-switches'
David S. Miller [Tue, 22 Sep 2020 00:40:53 +0000 (17:40 -0700)]
Merge branch 'Fix-broken-tc-flower-rules-for-mscc_ocelot-switches'

Vladimir Oltean says:

====================
Fix broken tc-flower rules for mscc_ocelot switches

All 3 switch drivers from the Ocelot family have the same bug in the
VCAP IS2 key offsets, which is that some keys are in the incorrect
order.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
Vladimir Oltean [Mon, 21 Sep 2020 22:56:38 +0000 (01:56 +0300)]
net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries

The IS2 IP4_TCP_UDP key offsets do not correspond to the VSC7514
datasheet. Whether they work or not is unknown to me. On VSC9959 and
VSC9953, with the same mistake and same discrepancy from the
documentation, tc-flower src_port and dst_port rules did not work, so I
am assuming the same is true here.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
Vladimir Oltean [Mon, 21 Sep 2020 22:56:37 +0000 (01:56 +0300)]
net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries

Since these were copied from the Felix VCAP IS2 code, and only the
offsets were adjusted, the order of the bit fields is still wrong.
Fix it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
Xiaoliang Yang [Mon, 21 Sep 2020 22:56:36 +0000 (01:56 +0300)]
net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries

Some of the IS2 IP4_TCP_UDP keys are not correct, like L4_DPORT,
L4_SPORT and other L4 keys. This prevents offloaded tc-flower rules from
matching on src_port and dst_port for TCP and UDP packets.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoinet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
Eric Dumazet [Mon, 21 Sep 2020 14:27:20 +0000 (07:27 -0700)]
inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute

User space could send an invalid INET_DIAG_REQ_PROTOCOL attribute
as caught by syzbot.

BUG: KMSAN: uninit-value in inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline]
BUG: KMSAN: uninit-value in __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147
CPU: 0 PID: 8505 Comm: syz-executor174 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219
 inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline]
 __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147
 inet_diag_dump_compat+0x2a5/0x380 net/ipv4/inet_diag.c:1254
 netlink_dump+0xb73/0x1cb0 net/netlink/af_netlink.c:2246
 __netlink_dump_start+0xcf2/0xea0 net/netlink/af_netlink.c:2354
 netlink_dump_start include/linux/netlink.h:246 [inline]
 inet_diag_rcv_msg_compat+0x5da/0x6c0 net/ipv4/inet_diag.c:1288
 sock_diag_rcv_msg+0x24f/0x620 net/core/sock_diag.c:256
 netlink_rcv_skb+0x6d7/0x7e0 net/netlink/af_netlink.c:2470
 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x11c8/0x1490 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x173a/0x1840 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353
 ___sys_sendmsg net/socket.c:2407 [inline]
 __sys_sendmsg+0x6d1/0x820 net/socket.c:2440
 __do_sys_sendmsg net/socket.c:2449 [inline]
 __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x441389
Code: e8 fc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff3b02ce98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441389
RDX: 0000000000000000 RSI: 0000000020001500 RDI: 0000000000000003
RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000402130
R13: 00000000004021c0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline]
 kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:126
 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:80
 slab_alloc_node mm/slub.c:2907 [inline]
 __kmalloc_node_track_caller+0x9aa/0x12f0 mm/slub.c:4511
 __kmalloc_reserve net/core/skbuff.c:142 [inline]
 __alloc_skb+0x35f/0xb30 net/core/skbuff.c:210
 alloc_skb include/linux/skbuff.h:1094 [inline]
 netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
 netlink_sendmsg+0xdb9/0x1840 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353
 ___sys_sendmsg net/socket.c:2407 [inline]
 __sys_sendmsg+0x6d1/0x820 net/socket.c:2440
 __do_sys_sendmsg net/socket.c:2449 [inline]
 __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 85d75fd89e14 ("inet_diag: support for wider protocol numbers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
Vladimir Oltean [Mon, 21 Sep 2020 22:07:09 +0000 (01:07 +0300)]
net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU

When calling the RCU brother of br_vlan_get_pvid(), lockdep warns:

=============================
WARNING: suspicious RCU usage
5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted
-----------------------------
net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage!

Call trace:
 lockdep_rcu_suspicious+0xd4/0xf8
 __br_vlan_get_pvid+0xc0/0x100
 br_vlan_get_pvid_rcu+0x78/0x108

The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group()
which calls rtnl_dereference() instead of rcu_dereference(). In turn,
rtnl_dereference() calls rcu_dereference_protected() which assumes
operation under an RCU write-side critical section, which obviously is
not the case here. So, when the incorrect primitive is used to access
the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may
cause various unexpected problems.

I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot
share the same implementation. So fix the bug by splitting the 2
functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups
under proper locking annotations.

Fixes: 76487dd779cc ("bridge: add br_vlan_get_pvid_rcu()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'mlx5-fixes-2020-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Tue, 22 Sep 2020 00:32:42 +0000 (17:32 -0700)]
Merge tag 'mlx5-fixes-2020-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes-2020-09-18

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

v1->v2:
 Remove missing patch from -stable list.

For -stable v5.1
 ('net/mlx5: Fix FTE cleanup')

For -stable v5.3
 ('net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported')
 ('net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported')

For -stable v5.7
 ('net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready')

For -stable v5.8
 ('net/mlx5e: Use RCU to protect rq->xdp_prog')
 ('net/mlx5e: Fix endianness when calculating pedit mask first bit')
 ('net/mlx5e: Use synchronize_rcu to sync with NAPI')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: Update MAINTAINERS for MediaTek switch driver
Sean Wang [Mon, 21 Sep 2020 23:09:23 +0000 (07:09 +0800)]
net: Update MAINTAINERS for MediaTek switch driver

Update maintainers for MediaTek switch driver with Landen Chao who is
familiar with MediaTek MT753x switch devices and will help maintenance
from the vendor side.

Cc: Steven Liu <steven.liu@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Landen Chao <Landen.Chao@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/mlx5e: mlx5e_fec_in_caps() returns a boolean
Saeed Mahameed [Fri, 11 Sep 2020 18:00:06 +0000 (11:00 -0700)]
net/mlx5e: mlx5e_fec_in_caps() returns a boolean

Returning errno is a bug, fix that.

Also fixes smatch warnings:
drivers/net/ethernet/mellanox/mlx5/core/en/port.c:453
mlx5e_fec_in_caps() warn: signedness bug returning '(-95)'

Fixes: 311f5799562b ("net/mlx5e: Advertise globaly supported FEC modes")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
4 years agonet/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
Saeed Mahameed [Tue, 8 Sep 2020 05:54:43 +0000 (22:54 -0700)]
net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock

The spinlock only needed when accessing the channel's icosq, grab the lock
after the buf allocation in resync_post_get_progress_params() to avoid
kzalloc(GFP_KERNEL) in atomic context.

Fixes: d92bf3a50da0 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Reported-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
4 years agonet/mlx5e: kTLS, Fix leak on resync error flow
Saeed Mahameed [Fri, 11 Sep 2020 04:16:04 +0000 (21:16 -0700)]
net/mlx5e: kTLS, Fix leak on resync error flow

Resync progress params buffer and dma weren't released on error,
Add missing error unwinding for resync_post_get_progress_params().

Fixes: d92bf3a50da0 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
4 years agonet/mlx5e: kTLS, Add missing dma_unmap in RX resync
Saeed Mahameed [Tue, 8 Sep 2020 05:58:50 +0000 (22:58 -0700)]
net/mlx5e: kTLS, Add missing dma_unmap in RX resync

Progress params dma address is never unmapped, unmap it when completion
handling is over.

Fixes: d92bf3a50da0 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
4 years agonet/mlx5e: kTLS, Fix napi sync and possible use-after-free
Tariq Toukan [Mon, 10 Aug 2020 12:59:41 +0000 (15:59 +0300)]
net/mlx5e: kTLS, Fix napi sync and possible use-after-free

Using synchronize_rcu() is sufficient to wait until running NAPI quits.

See similar upstream fix with detailed explanation:
("net/mlx5e: Use synchronize_rcu to sync with NAPI")

This change also fixes a possible use-after-free as the NAPI
might be already released at this stage.

Fixes: d92bf3a50da0 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
Tariq Toukan [Sun, 28 Jun 2020 10:06:06 +0000 (13:06 +0300)]
net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported

The set of TLS TX global SW counters in mlx5e_tls_sw_stats_desc
is updated from all rings by using atomic ops.
This set of stats is used only in the FPGA TLS use case, not in
the Connect-X TLS one, where regular per-ring counters are used.

Do not expose them in the Connect-X use case, as this would cause
counter duplication. For example, tx_tls_drop_no_sync_data would
appear twice in the ethtool stats.

Fixes: 3e985319cc11 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
Alaa Hleihel [Tue, 25 Aug 2020 07:41:50 +0000 (10:41 +0300)]
net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()

The cited commit started to reuse function mlx5e_update_ndo_stats() for
the representors as well.
However, the function is hard-coded to work on mlx5e_nic_stats_grps only.
Due to this issue, the representors statistics were not updated in the
output of "ip -s".

Fix it to work with the correct group by extracting it from the caller's
profile.

Also, while at it and since this function became generic, move it to
en_stats.c and rename it accordingly.

Fixes: 585f22fe4bc6 ("net/mlx5e: Convert rep stats to mlx5e_stats_grp-based infra")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Fix multicast counter not up-to-date in "ip -s"
Ron Diskin [Sun, 10 May 2020 11:39:51 +0000 (14:39 +0300)]
net/mlx5e: Fix multicast counter not up-to-date in "ip -s"

Currently the FW does not generate events for counters other than error
counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip -s
uses) might run in atomic context, while the FW interface is non atomic.
Thus, 'ip' is not allowed to issue FW commands, so it will only display
cached counters in the driver.

Add a SW counter (mcast_packets) in the driver to count rx multicast
packets. The counter also counts broadcast packets, as we consider it a
special case of multicast.
Use the counter value when calling "ip -s"/"ifconfig".

Fixes: 9df41abb3a64 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Fix endianness when calculating pedit mask first bit
Maor Dickman [Wed, 2 Sep 2020 13:49:52 +0000 (16:49 +0300)]
net/mlx5e: Fix endianness when calculating pedit mask first bit

The field mask value is provided in network byte order and has to
be converted to host byte order before calculating pedit mask
first bit.

Fixes: 12d44f2c3b9e ("net/mlx5e: Bit sized fields rewrite support")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>