]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
3 years agoMerge tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 18 Nov 2021 20:41:14 +0000 (12:41 -0800)]
Merge tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Several xes and one old ioctl deprecation. Namely there's fix for
  crashes/warnings with lzo compression that was suspected to be caused
  by first pull merge resolution, but it was a different bug.

  Summary:

   - regression fix for a crash in lzo due to missing boundary checks of
     the page array

   - fix crashes on ARM64 due to missing barriers when synchronizing
     status bits between work queues

   - silence lockdep when reading chunk tree during mount

   - fix false positive warning in integrity checker on devices with
     disabled write caching

   - fix signedness of bitfields in scrub

   - start deprecation of balance v1 ioctl"

* tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: deprecate BTRFS_IOC_BALANCE ioctl
  btrfs: make 1-bit bit-fields of scrub_page unsigned int
  btrfs: check-integrity: fix a warning on write caching disabled disk
  btrfs: silence lockdep when reading chunk tree during mount
  btrfs: fix memory ordering between normal and ordered work functions
  btrfs: fix a out-of-bound access in copy_compressed_data_to_page()

3 years agoMerge tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack...
Linus Torvalds [Thu, 18 Nov 2021 20:31:29 +0000 (12:31 -0800)]
Merge tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull UDF fix from Jan Kara:
 "A fix for a long-standing UDF bug where we were not properly
  validating directory position inside readdir"

* tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fix crash after seekdir

3 years agoMerge tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 18 Nov 2021 20:17:33 +0000 (12:17 -0800)]
Merge tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull setattr idmapping fix from Christian Brauner:
 "This contains a simple fix for setattr. When determining the validity
  of the attributes the ia_{g,u}id fields contain the value that will be
  written to inode->i_{g,u}id. When the {g,u}id attribute of the file
  isn't altered and the caller's fs{g,u}id matches the current {g,u}id
  attribute the attribute change is allowed.

  The value in ia_{g,u}id does already account for idmapped mounts and
  will have taken the relevant idmapping into account. So in order to
  verify that the {g,u}id attribute isn't changed we simple need to
  compare the ia_{g,u}id value against the inode's i_{g,u}id value.

  This only has any meaning for idmapped mounts as idmapping helpers are
  idempotent without them. And for idmapped mounts this really only has
  a meaning when circular idmappings are used, i.e. mappings where e.g.
  id 1000 is mapped to id 1001 and id 1001 is mapped to id 1000. Such
  ciruclar mappings can e.g. be useful when sharing the same home
  directory between multiple users at the same time.

  Before this patch we could end up denying legitimate attribute changes
  and allowing invalid attribute changes when circular mappings are
  used. To even get into this situation the caller must've been
  privileged both to create that mapping and to create that idmapped
  mount.

  This hasn't been seen in the wild anywhere but came up when expanding
  the fstest suite during work on a series of hardening patches. All
  idmapped fstests pass without any regressions and we're adding new
  tests to verify the behavior of circular mappings.

  The new tests can be found at [1]"

Link: https://lore.kernel.org/linux-fsdevel/20211109145713.1868404-2-brauner@kernel.org
* tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: handle circular mappings correctly

3 years agoMerge tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Thu, 18 Nov 2021 20:13:24 +0000 (12:13 -0800)]
Merge tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "parisc bug and warning fixes and wire up futex_waitv.

  Fix some warnings which showed up with allmodconfig builds, a revert
  of a change to the sigreturn trampoline which broke signal handling,
  wire up futex_waitv and add CONFIG_PRINTK_TIME=y to 32bit defconfig"

* tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Enable CONFIG_PRINTK_TIME=y in 32bit defconfig
  Revert "parisc: Reduce sigreturn trampoline to 3 instructions"
  parisc: Wrap assembler related defines inside __ASSEMBLY__
  parisc: Wire up futex_waitv
  parisc: Include stringify.h to avoid build error in crypto/api.c
  parisc/sticon: fix reverse colors

3 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 18 Nov 2021 20:05:22 +0000 (12:05 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Selftest changes:

   - Cleanups for the perf test infrastructure and mapping hugepages

   - Avoid contention on mmap_sem when the guests start to run

   - Add event channel upcall support to xen_shinfo_test

  x86 changes:

   - Fixes for Xen emulation

   - Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache

   - Fixes for migration of 32-bit nested guests on 64-bit hypervisor

   - Compilation fixes

   - More SEV cleanups

  Generic:

   - Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
     and num_online_cpus(). Most architectures were only using one of
     the two"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
  KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
  KVM: x86: Assume a 64-bit hypercall for guests with protected state
  selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
  riscv: kvm: fix non-kernel-doc comment block
  KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
  KVM: SEV: Drop a redundant setting of sev->asid during initialization
  KVM: SEV: WARN if SEV-ES is marked active but SEV is not
  KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
  KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
  KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
  KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
  KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
  KVM: x86/xen: Use sizeof_field() instead of open-coding it
  KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
  KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
  ...

3 years agoMerge tag 'docs-5.16-2' of git://git.lwn.net/linux
Linus Torvalds [Thu, 18 Nov 2021 19:01:06 +0000 (11:01 -0800)]
Merge tag 'docs-5.16-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A handful of documentation fixes for 5.16"

* tag 'docs-5.16-2' of git://git.lwn.net/linux:
  Documentation/process: fix a cross reference
  Documentation: update vcpu-requests.rst reference
  docs: accounting: update delay-accounting.rst reference
  libbpf: update index.rst reference
  docs: filesystems: Fix grammatical error "with" to "which"
  doc/zh_CN: fix a translation error in management-style
  docs: ftrace: fix the wrong path of tracefs
  Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document
  Documentation: arm: marvell: Put Armada XP section between Armada 370 and 375
  Documentation: arm: marvell: Add some links to homepage / product infos
  docs: Update Sphinx requirements

3 years agoMerge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 18 Nov 2021 18:50:45 +0000 (10:50 -0800)]
Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fixes from Petr Mladek:

 - Try to flush backtraces from other CPUs also on the local one. This
   was a regression caused by printk_safe buffers removal.

 - Remove header dependency warning.

* tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: Remove printk.h inclusion in percpu.h
  printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces

3 years agoMerge branch 'rework/printk_safe-removal' into for-linus
Petr Mladek [Thu, 18 Nov 2021 09:03:47 +0000 (10:03 +0100)]
Merge branch 'rework/printk_safe-removal' into for-linus

3 years agoparisc: Enable CONFIG_PRINTK_TIME=y in 32bit defconfig
Helge Deller [Wed, 17 Nov 2021 14:48:45 +0000 (15:48 +0100)]
parisc: Enable CONFIG_PRINTK_TIME=y in 32bit defconfig

Signed-off-by: Helge Deller <deller@gmx.de>
3 years agoRevert "parisc: Reduce sigreturn trampoline to 3 instructions"
Helge Deller [Wed, 17 Nov 2021 10:05:07 +0000 (11:05 +0100)]
Revert "parisc: Reduce sigreturn trampoline to 3 instructions"

This reverts commit 968eac5049a3e0237de319a0cd9492459b38e123.

This patch shows problems with signal handling. Revert it for now.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.15
3 years agoparisc: Wrap assembler related defines inside __ASSEMBLY__
Helge Deller [Tue, 16 Nov 2021 12:12:21 +0000 (13:12 +0100)]
parisc: Wrap assembler related defines inside __ASSEMBLY__

Building allmodconfig shows errors in the gpu/drm/msm snapdragon drivers,
because a COND() define is used there which conflicts with the COND() for
PA-RISC assembly.  Although the snapdragon driver isn't relevant for parisc, it
is nevertheless compiled when CONFIG_COMPILE_TEST is defined.

Move the COND() define and other PA-RISC mnemonics inside the #ifdef
__ASSEMBLY__ part to avoid this conflict.

Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
3 years agoparisc: Wire up futex_waitv
Helge Deller [Tue, 16 Nov 2021 12:11:26 +0000 (13:11 +0100)]
parisc: Wire up futex_waitv

Signed-off-by: Helge Deller <deller@gmx.de>
3 years agoparisc: Include stringify.h to avoid build error in crypto/api.c
Helge Deller [Mon, 15 Nov 2021 16:34:50 +0000 (17:34 +0100)]
parisc: Include stringify.h to avoid build error in crypto/api.c

Include stringify.h to avoid this build error:
 arch/parisc/include/asm/jump_label.h: error: expected ':' before '__stringify'
 arch/parisc/include/asm/jump_label.h: error: label 'l_yes' defined but not used [-Werror=unused-label]

Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
3 years agoKVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:43 +0000 (17:34 +0100)]
KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS

It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:42 +0000 (17:34 +0100)]
KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()

KVM_CAP_NR_VCPUS is a legacy advisory value which on other architectures
return num_online_cpus() caped by KVM_CAP_NR_VCPUS or something else
(ppc and arm64 are special cases). On s390, KVM_CAP_NR_VCPUS returns
the same as KVM_CAP_MAX_VCPUS and this may turn out to be a bad
'advice'. Switch s390 to returning caped num_online_cpus() too.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Message-Id: <20211116163443.88707-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:41 +0000 (17:34 +0100)]
KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS

It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Message-Id: <20211116163443.88707-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:40 +0000 (17:34 +0100)]
KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS

It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:39 +0000 (17:34 +0100)]
KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS

It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:38 +0000 (17:34 +0100)]
KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()

Generally, it doesn't make sense to return the recommended maximum number
of vCPUs which exceeds the maximum possible number of vCPUs.

Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs
depending on whether it is a system-wide ioctl or a per-VM one. Previously,
KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to
keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
which is what gets returned by system-wide KVM_CAP_MAX_VCPUS.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-2-vkuznets@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: Assume a 64-bit hypercall for guests with protected state
Tom Lendacky [Mon, 24 May 2021 17:48:57 +0000 (12:48 -0500)]
KVM: x86: Assume a 64-bit hypercall for guests with protected state

When processing a hypercall for a guest with protected state, currently
SEV-ES guests, the guest CS segment register can't be checked to
determine if the guest is in 64-bit mode. For an SEV-ES guest, it is
expected that communication between the guest and the hypervisor is
performed to shared memory using the GHCB. In order to use the GHCB, the
guest must have been in long mode, otherwise writes by the guest to the
GHCB would be encrypted and not be able to be comprehended by the
hypervisor.

Create a new helper function, is_64_bit_hypercall(), that assumes the
guest is in 64-bit mode when the guest has protected state, and returns
true, otherwise invoking is_64_bit_mode() to determine the mode. Update
the hypercall related routines to use is_64_bit_hypercall() instead of
is_64_bit_mode().

Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurences of calls to
this helper function for a guest running with protected state.

Fixes: 1eef9a70a655 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoselftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
Arnaldo Carvalho de Melo [Tue, 16 Nov 2021 15:03:25 +0000 (12:03 -0300)]
selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore

  $ git status
  nothing to commit, working tree clean
  $
  $ make -C tools/testing/selftests/kvm/ > /dev/null 2>&1
  $ git status

  Untracked files:
    (use "git add <file>..." to include in what will be committed)
   tools/testing/selftests/kvm/x86_64/sev_migrate_tests

  nothing added to commit but untracked files present (use "git add" to track)
  $

Fixes: 7e1022b208735b89 ("selftest: KVM: Add intra host migration tests")
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Message-Id: <YZPIPfvYgRDCZi/w@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoriscv: kvm: fix non-kernel-doc comment block
Randy Dunlap [Sun, 7 Nov 2021 03:47:06 +0000 (20:47 -0700)]
riscv: kvm: fix non-kernel-doc comment block

Don't use "/**" to begin a comment block for a non-kernel-doc comment.

Prevents this docs build warning:

vcpu_sbi.c:3: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Copyright (c) 2019 Western Digital Corporation or its affiliates.

Fixes: 694be2b5a02b ("RISC-V: KVM: Add SBI v0.1 support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Anup Patel <anup.patel@wdc.com>
Cc: kvm@vger.kernel.org
Cc: kvm-riscv@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Message-Id: <20211107034706.30672-1-rdunlap@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoMerge branch 'kvm-5.16-fixes' into kvm-master
Paolo Bonzini [Tue, 16 Nov 2021 13:08:19 +0000 (08:08 -0500)]
Merge branch 'kvm-5.16-fixes' into kvm-master

* Fixes for Xen emulation

* Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache

* Fixes for migration of 32-bit nested guests on 64-bit hypervisor

* Compilation fixes

* More SEV cleanups

3 years agoKVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
Sean Christopherson [Tue, 9 Nov 2021 21:51:01 +0000 (21:51 +0000)]
KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()

Rename cmd_allowed_from_miror() to is_cmd_allowed_from_mirror(), fixing
a typo and making it obvious that the result is a boolean where
false means "not allowed".

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SEV: Drop a redundant setting of sev->asid during initialization
Sean Christopherson [Tue, 9 Nov 2021 21:51:00 +0000 (21:51 +0000)]
KVM: SEV: Drop a redundant setting of sev->asid during initialization

Remove a fully redundant write to sev->asid during SEV/SEV-ES guest
initialization.  The ASID is set a few lines earlier prior to the call to
sev_platform_init(), which doesn't take "sev" as a param, i.e. can't
muck with the ASID barring some truly magical behind-the-scenes code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SEV: WARN if SEV-ES is marked active but SEV is not
Sean Christopherson [Tue, 9 Nov 2021 21:50:59 +0000 (21:50 +0000)]
KVM: SEV: WARN if SEV-ES is marked active but SEV is not

WARN if the VM is tagged as SEV-ES but not SEV.  KVM relies on SEV and
SEV-ES being set atomically, and guards common flows with "is SEV", i.e.
observing SEV-ES without SEV means KVM has a fatal bug.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
Sean Christopherson [Tue, 9 Nov 2021 21:50:58 +0000 (21:50 +0000)]
KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()

Set sev_info.active during SEV/SEV-ES activation before calling any code
that can potentially consume sev_info.es_active, e.g. set "active" and
"es_active" as a pair immediately after the initial sanity checks.  KVM
generally expects that es_active can be true if and only if active is
true, e.g. sev_asid_new() deliberately avoids sev_es_guest() so that it
doesn't get a false negative.  This will allow WARNing in sev_es_guest()
if the VM is tagged as SEV-ES but not SEV.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
Sean Christopherson [Tue, 9 Nov 2021 21:50:56 +0000 (21:50 +0000)]
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs

Reject COPY_ENC_CONTEXT_FROM if the destination VM has created vCPUs.
KVM relies on SEV activation to occur before vCPUs are created, e.g. to
set VMCB flags and intercepts correctly.

Fixes: 90d0c700aaf0 ("KVM: x86: Support KVM VMs sharing SEV context")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Nathan Tempelman <natet@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
David Woodhouse [Mon, 15 Nov 2021 16:50:27 +0000 (16:50 +0000)]
KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache

In commit a4317b8a36c4 ("KVM: x86: Fix recording of guest steal time /
preempted status") I removed the only user of these functions because
it was basically impossible to use them safely.

There are two stages to the GFN->PFN mapping; first through the KVM
memslots to a userspace HVA and then through the page tables to
translate that HVA to an underlying PFN. Invalidations of the former
were being handled correctly, but no attempt was made to use the MMU
notifiers to invalidate the cache when the HVA->GFN mapping changed.

As a prelude to reinventing the gfn_to_pfn_cache with more usable
semantics, rip it out entirely and untangle the implementation of
the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it.

All current users of kvm_vcpu_map() also look broken right now, and
will be dealt with separately. They broadly fall into two classes:

* Those which map, access the data and immediately unmap. This is
  mostly gratuitous and could just as well use the existing user
  HVA, and could probably benefit from a gfn_to_hva_cache as they
  do so.

* Those which keep the mapping around for a longer time, perhaps
  even using the PFN directly from the guest. These will need to
  be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map()
  can be removed too.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-8-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: Use a gfn_to_hva_cache for vmptrld
David Woodhouse [Mon, 15 Nov 2021 16:50:26 +0000 (16:50 +0000)]
KVM: nVMX: Use a gfn_to_hva_cache for vmptrld

And thus another call to kvm_vcpu_map() can die.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-7-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
David Woodhouse [Mon, 15 Nov 2021 16:50:25 +0000 (16:50 +0000)]
KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check

Kill another mostly gratuitous kvm_vcpu_map() which could just use the
userspace HVA for it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-6-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/xen: Use sizeof_field() instead of open-coding it
David Woodhouse [Mon, 15 Nov 2021 16:50:23 +0000 (16:50 +0000)]
KVM: x86/xen: Use sizeof_field() instead of open-coding it

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
David Woodhouse [Mon, 15 Nov 2021 16:50:24 +0000 (16:50 +0000)]
KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12

Using kvm_vcpu_map() for reading from the guest is entirely gratuitous,
when all we do is a single memcpy and unmap it again. Fix it up to use
kvm_read_guest()... but in fact I couldn't bring myself to do that
without also making it use a gfn_to_hva_cache for both that *and* the
copy in the other direction.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
David Woodhouse [Mon, 15 Nov 2021 16:50:21 +0000 (16:50 +0000)]
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO

In commit 2140a4ca7de5 ("KVM: xen: do not use struct gfn_to_hva_cache") we
stopped storing this in-kernel as a GPA, and started storing it as a GFN.
Which means we probably should have stopped calling gpa_to_gfn() on it
when userspace asks for it back.

Cc: stable@vger.kernel.org
Fixes: 2140a4ca7de5 ("KVM: xen: do not use struct gfn_to_hva_cache")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: include EFER.LMA in extended mmu role
Maxim Levitsky [Mon, 15 Nov 2021 13:18:37 +0000 (15:18 +0200)]
KVM: x86/mmu: include EFER.LMA in extended mmu role

Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the
guest root level and is not reflected in kvm_mmu_page_role.level when TDP
is in use.  When simply running the guest, it is impossible for EFER.LMA
and kvm_mmu.root_level to get out of sync, as the guest cannot transition
from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
first bouncing through a different MMU context.  And stuffing guest state
via KVM_SET_SREGS{,2} also ensures a full MMU context reset.

However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
set guest state when migrating the VM while L2 is active, the vCPU state
will reflect L2, not L1.  If L1 is using TDP for L2, then root_mmu will
have been configured using L2's state, despite not being used for L2.  If
L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
be configured for guest PAE paging, but will match the mmu_role for 64-bit
paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.

Alternatively, the root_mmu's role could be invalidated after a successful
KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
tricky, and not even needed if L1 and L2 do have the same role (e.g., they
are both 64-bit guests and run with the same CR4).

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load
Maxim Levitsky [Mon, 15 Nov 2021 13:18:36 +0000 (15:18 +0200)]
KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load

When loading nested state, don't use check vcpu->arch.efer to get the
L1 host's 64-bit vs. 32-bit state and don't check it for consistency
with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU
may be stale when KVM_SET_NESTED_STATE is called---and architecturally
does not exist.  When restoring L2 state in KVM, the CPU is placed in
non-root where nested VMX code has no snapshot of L1 host state: VMX
(conditionally) loads host state fields loaded on VM-exit, but they need
not correspond to the state before entry.  A simple case occurs in KVM
itself, where the host RIP field points to vmx_vmexit rather than the
instruction following vmlaunch/vmresume.

However, for the particular case of L1 being in 32- or 64-bit mode
on entry, the exit controls can be treated instead as the source of
truth regarding the state of L1 on entry, and can be used to check
that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if
vmcs12.VM_EXIT_LOAD_IA32_EFER is set.  The consistency check on CPU
EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only
on VM-Enter.  That's because, again, there's conceptually no "current"
L1 EFER to check on KVM_SET_NESTED_STATE.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: Fix steal time asm constraints
David Woodhouse [Sun, 14 Nov 2021 08:59:02 +0000 (08:59 +0000)]
KVM: Fix steal time asm constraints

In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits
of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use
the [abcd] registers. For which, GCC has the "q" constraint instead of
the less restrictive "r".

Also fix st->preempted, which is an input/output operand rather than an
input.

Fixes: a4317b8a36c4 ("KVM: x86: Fix recording of guest steal time / preempted status")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agocpuid: kvm_find_kvm_cpuid_features() should be declared 'static'
Paul Durrant [Mon, 15 Nov 2021 14:41:31 +0000 (14:41 +0000)]
cpuid: kvm_find_kvm_cpuid_features() should be declared 'static'

The lack a static declaration currently results in:

arch/x86/kvm/cpuid.c:128:26: warning: no previous prototype for function 'kvm_find_kvm_cpuid_features'

when compiling with "W=1".

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 15fdefb7cfb6 ("KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES")
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211115144131.5943-1-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoMerge tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 17 Nov 2021 23:55:07 +0000 (15:55 -0800)]
Merge tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:

 - The current iomap_file_buffered_write behavior of failing the entire
   write when part of the user buffer cannot be faulted in leads to an
   endless loop in gfs2. Work around that in gfs2 for now.

 - Various other bugs all over the place.

* tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Prevent endless loops in gfs2_file_buffered_write
  gfs2: Fix "Introduce flag for glock holder auto-demotion"
  gfs2: Fix length of holes reported at end-of-file
  gfs2: release iopen glock early in evict
  gfs2: Fix atomic bug in gfs2_instantiate
  gfs2: Only dereference i->iov when iter_is_iovec(i)

3 years agoMerge tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Wed, 17 Nov 2021 23:12:50 +0000 (15:12 -0800)]
Merge tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - wire futex_waitv syscall

 - build fixes for lantiq and bcm63xx configs

 - yamon-dt bugfix

* tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mips: lantiq: add support for clk_get_parent()
  mips: bcm63xx: add support for clk_get_parent()
  MIPS: generic/yamon-dt: fix uninitialized variable error
  MIPS: syscalls: Wire up futex_waitv syscall

3 years agoMerge tag 'hyperv-fixes-signed-20211117' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 17 Nov 2021 16:46:15 +0000 (08:46 -0800)]
Merge tag 'hyperv-fixes-signed-20211117' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Fix ring size calculation for balloon driver (Boqun Feng)

 - Fix issues in Hyper-V setup code (Sean Christopherson)

* tag 'hyperv-fixes-signed-20211117' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Move required MSRs check to initial platform probing
  x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
  Drivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size

3 years agoMerge tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Wed, 17 Nov 2021 16:38:00 +0000 (08:38 -0800)]
Merge tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfix from Bruce Fields:
 "This is just one bugfix for a buffer overflow in knfsd's xdr decoding"

* tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux:
  NFSD: Fix exposure in nfsd4_decode_bitmap()

3 years agoDocumentation/process: fix a cross reference
Mauro Carvalho Chehab [Tue, 16 Nov 2021 12:11:23 +0000 (12:11 +0000)]
Documentation/process: fix a cross reference

The cross-reference for the handbooks section works. However, it is
meant to describe the path inside the Kernel's doc where the section
is, but there's an space instead of a dash, plus it lacks the .rst at
the end, which makes:

./scripts/documentation-file-ref-check

to complain.

Fixes: 62458f5f8e6c ("Documentation/process: Add maintainer handbooks section")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agoDocumentation: update vcpu-requests.rst reference
Mauro Carvalho Chehab [Tue, 16 Nov 2021 12:11:22 +0000 (12:11 +0000)]
Documentation: update vcpu-requests.rst reference

Changeset ee7146d3b441 ("Documentation: move Documentation/virtual to Documentation/virt")
renamed: Documentation/virtual/kvm/vcpu-requests.rst
to: Documentation/virt/kvm/vcpu-requests.rst.

Update its cross-reference accordingly.

Fixes: ee7146d3b441 ("Documentation: move Documentation/virtual to Documentation/virt")
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agodocs: accounting: update delay-accounting.rst reference
Mauro Carvalho Chehab [Tue, 16 Nov 2021 12:11:21 +0000 (12:11 +0000)]
docs: accounting: update delay-accounting.rst reference

The file name: accounting/delay-accounting.rst
should be, instead: Documentation/accounting/delay-accounting.rst.

Also, there's no need to use doc:`foo`, as automarkup.py will
automatically handle plain text mentions to Documentation/
files.

So, update its cross-reference accordingly.

Fixes: 8d4df6235e14 ("delayacct: Document task_delayacct sysctl")
Fixes: c9a00e608a6e ("docs: accounting: convert to ReST")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agolibbpf: update index.rst reference
Mauro Carvalho Chehab [Tue, 16 Nov 2021 12:11:20 +0000 (12:11 +0000)]
libbpf: update index.rst reference

Changeset 83a162644735 ("libbpf: Rename libbpf documentation index file")
renamed: Documentation/bpf/libbpf/libbpf.rst
to: Documentation/bpf/libbpf/index.rst.

Update its cross-reference accordingly.

Fixes: 83a162644735 ("libbpf: Rename libbpf documentation index file")
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agoparisc/sticon: fix reverse colors
Sven Schnelle [Sun, 14 Nov 2021 16:08:17 +0000 (17:08 +0100)]
parisc/sticon: fix reverse colors

sticon_build_attr() checked the reverse argument and flipped
background and foreground color, but returned the non-reverse
value afterwards. Fix this and also add two local variables
for foreground and background color to make the code easier
to read.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
3 years agofs: handle circular mappings correctly
Christian Brauner [Tue, 9 Nov 2021 14:57:12 +0000 (15:57 +0100)]
fs: handle circular mappings correctly

When calling setattr_prepare() to determine the validity of the attributes the
ia_{g,u}id fields contain the value that will be written to inode->i_{g,u}id.
When the {g,u}id attribute of the file isn't altered and the caller's fs{g,u}id
matches the current {g,u}id attribute the attribute change is allowed.

The value in ia_{g,u}id does already account for idmapped mounts and will have
taken the relevant idmapping into account. So in order to verify that the
{g,u}id attribute isn't changed we simple need to compare the ia_{g,u}id value
against the inode's i_{g,u}id value.

This only has any meaning for idmapped mounts as idmapping helpers are
idempotent without them. And for idmapped mounts this really only has a meaning
when circular idmappings are used, i.e. mappings where e.g. id 1000 is mapped
to id 1001 and id 1001 is mapped to id 1000. Such ciruclar mappings can e.g. be
useful when sharing the same home directory between multiple users at the same
time.

As an example consider a directory with two files: /source/file1 owned by
{g,u}id 1000 and /source/file2 owned by {g,u}id 1001. Assume we create an
idmapped mount at /target with an idmapping that maps files owned by {g,u}id
1000 to being owned by {g,u}id 1001 and files owned by {g,u}id 1001 to being
owned by {g,u}id 1000. In effect, the idmapped mount at /target switches the
ownership of /source/file1 and source/file2, i.e. /target/file1 will be owned
by {g,u}id 1001 and /target/file2 will be owned by {g,u}id 1000.

This means that a user with fs{g,u}id 1000 must be allowed to setattr
/target/file2 from {g,u}id 1000 to {g,u}id 1000. Similar, a user with fs{g,u}id
1001 must be allowed to setattr /target/file1 from {g,u}id 1001 to {g,u}id
1001. Conversely, a user with fs{g,u}id 1000 must fail to setattr /target/file1
from {g,u}id 1001 to {g,u}id 1000. And a user with fs{g,u}id 1001 must fail to
setattr /target/file2 from {g,u}id 1000 to {g,u}id 1000. Both cases must fail
with EPERM for non-capable callers.

Before this patch we could end up denying legitimate attribute changes and
allowing invalid attribute changes when circular mappings are used. To even get
into this situation the caller must've been privileged both to create that
mapping and to create that idmapped mount.

This hasn't been seen in the wild anywhere but came up when expanding the
testsuite during work on a series of hardening patches. All idmapped fstests
pass without any regressions and we add new tests to verify the behavior of
circular mappings.

Link: https://lore.kernel.org/r/20211109145713.1868404-1-brauner@kernel.org
Fixes: f8ec3ee89f72 ("attr: handle idmapped mounts")
Cc: Seth Forshee <seth.forshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
3 years agoMerge branch 'kvm-selftest' into kvm-master
Paolo Bonzini [Tue, 16 Nov 2021 12:44:13 +0000 (07:44 -0500)]
Merge branch 'kvm-selftest' into kvm-master

- Cleanups for the perf test infrastructure and mapping hugepages

- Avoid contention on mmap_sem when the guests start to run

- Add event channel upcall support to xen_shinfo_test

3 years agobtrfs: deprecate BTRFS_IOC_BALANCE ioctl
Nikolay Borisov [Wed, 10 Nov 2021 11:41:04 +0000 (13:41 +0200)]
btrfs: deprecate BTRFS_IOC_BALANCE ioctl

The v2 balance ioctl has been introduced more than 9 years ago. Users of
the old v1 ioctl should have long been migrated to it. It's time we
deprecate it and eventually remove it.

The only known user is in btrfs-progs that tries v1 as a fallback in
case v2 is not supported. This is not necessary anymore.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 years agobtrfs: make 1-bit bit-fields of scrub_page unsigned int
Colin Ian King [Wed, 10 Nov 2021 19:20:08 +0000 (19:20 +0000)]
btrfs: make 1-bit bit-fields of scrub_page unsigned int

The bitfields have_csum and io_error are currently signed which is not
recommended as the representation is an implementation defined
behaviour. Fix this by making the bit-fields unsigned ints.

Fixes: 207f04feb33f ("btrfs: scrub: remove the anonymous structure from scrub_page")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 years agobtrfs: check-integrity: fix a warning on write caching disabled disk
Wang Yugui [Wed, 27 Oct 2021 22:32:54 +0000 (06:32 +0800)]
btrfs: check-integrity: fix a warning on write caching disabled disk

When a disk has write caching disabled, we skip submission of a bio with
flush and sync requests before writing the superblock, since it's not
needed. However when the integrity checker is enabled, this results in
reports that there are metadata blocks referred by a superblock that
were not properly flushed. So don't skip the bio submission only when
the integrity checker is enabled for the sake of simplicity, since this
is a debug tool and not meant for use in non-debug builds.

fstests/btrfs/220 trigger a check-integrity warning like the following
when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk with WCE=0.

  btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
  ------------[ cut here ]------------
  WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
  CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1
  Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
  RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
  RSP: 0018:ffffb642afb47940 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
  RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00
  RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff
  R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff
  R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003
  FS:  00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0
  Call Trace:
   btrfsic_process_written_block+0x2f7/0x850 [btrfs]
   __btrfsic_submit_bio.part.19+0x310/0x330 [btrfs]
   ? bio_associate_blkg_from_css+0xa4/0x2c0
   btrfsic_submit_bio+0x18/0x30 [btrfs]
   write_dev_supers+0x81/0x2a0 [btrfs]
   ? find_get_pages_range_tag+0x219/0x280
   ? pagevec_lookup_range_tag+0x24/0x30
   ? __filemap_fdatawait_range+0x6d/0xf0
   ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
   ? find_first_extent_bit+0x9b/0x160 [btrfs]
   ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
   write_all_supers+0x1b3/0xa70 [btrfs]
   ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
   btrfs_commit_transaction+0x59d/0xac0 [btrfs]
   close_ctree+0x11d/0x339 [btrfs]
   generic_shutdown_super+0x71/0x110
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0xb8/0x140
   task_work_run+0x6d/0xb0
   exit_to_user_mode_prepare+0x1f0/0x200
   syscall_exit_to_user_mode+0x12/0x30
   do_syscall_64+0x46/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f009f711dfb
  RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50
  RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580
  R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50
  R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff
  ---[ end trace 2c4b82abcef9eec4 ]---
  S-65536(sdb2/65536/1)
   -->
  M-1064960(sdb2/1064960/1)

Reviewed-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 years agobtrfs: silence lockdep when reading chunk tree during mount
Filipe Manana [Thu, 4 Nov 2021 12:43:08 +0000 (12:43 +0000)]
btrfs: silence lockdep when reading chunk tree during mount

Often some test cases like btrfs/161 trigger lockdep splats that complain
about possible unsafe lock scenario due to the fact that during mount,
when reading the chunk tree we end up calling blkdev_get_by_path() while
holding a read lock on a leaf of the chunk tree. That produces a lockdep
splat like the following:

[ 3653.683975] ======================================================
[ 3653.685148] WARNING: possible circular locking dependency detected
[ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted
[ 3653.687239] ------------------------------------------------------
[ 3653.688400] mount/447465 is trying to acquire lock:
[ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.691054]
               but task is already holding lock:
[ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.693978]
               which lock already depends on the new lock.

[ 3653.695510]
               the existing dependency chain (in reverse order) is:
[ 3653.696915]
               -> #3 (btrfs-chunk-00){++++}-{3:3}:
[ 3653.698053]        down_read_nested+0x4b/0x140
[ 3653.698893]        __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.699988]        btrfs_read_lock_root_node+0x31/0x40 [btrfs]
[ 3653.701205]        btrfs_search_slot+0x537/0xc00 [btrfs]
[ 3653.702234]        btrfs_insert_empty_items+0x32/0x70 [btrfs]
[ 3653.703332]        btrfs_init_new_device+0x563/0x15b0 [btrfs]
[ 3653.704439]        btrfs_ioctl+0x2110/0x3530 [btrfs]
[ 3653.705405]        __x64_sys_ioctl+0x83/0xb0
[ 3653.706215]        do_syscall_64+0x3b/0xc0
[ 3653.706990]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.708040]
               -> #2 (sb_internal#2){.+.+}-{0:0}:
[ 3653.708994]        lock_release+0x13d/0x4a0
[ 3653.709533]        up_write+0x18/0x160
[ 3653.710017]        btrfs_sync_file+0x3f3/0x5b0 [btrfs]
[ 3653.710699]        __loop_update_dio+0xbd/0x170 [loop]
[ 3653.711360]        lo_ioctl+0x3b1/0x8a0 [loop]
[ 3653.711929]        block_ioctl+0x48/0x50
[ 3653.712442]        __x64_sys_ioctl+0x83/0xb0
[ 3653.712991]        do_syscall_64+0x3b/0xc0
[ 3653.713519]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.714233]
               -> #1 (&lo->lo_mutex){+.+.}-{3:3}:
[ 3653.715026]        __mutex_lock+0x92/0x900
[ 3653.715648]        lo_open+0x28/0x60 [loop]
[ 3653.716275]        blkdev_get_whole+0x28/0x90
[ 3653.716867]        blkdev_get_by_dev.part.0+0x142/0x320
[ 3653.717537]        blkdev_open+0x5e/0xa0
[ 3653.718043]        do_dentry_open+0x163/0x390
[ 3653.718604]        path_openat+0x3f0/0xa80
[ 3653.719128]        do_filp_open+0xa9/0x150
[ 3653.719652]        do_sys_openat2+0x97/0x160
[ 3653.720197]        __x64_sys_openat+0x54/0x90
[ 3653.720766]        do_syscall_64+0x3b/0xc0
[ 3653.721285]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.721986]
               -> #0 (&disk->open_mutex){+.+.}-{3:3}:
[ 3653.722775]        __lock_acquire+0x130e/0x2210
[ 3653.723348]        lock_acquire+0xd7/0x310
[ 3653.723867]        __mutex_lock+0x92/0x900
[ 3653.724394]        blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.725041]        blkdev_get_by_path+0xb8/0xd0
[ 3653.725614]        btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
[ 3653.726332]        open_fs_devices+0xd7/0x2c0 [btrfs]
[ 3653.726999]        btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
[ 3653.727739]        open_ctree+0xb8e/0x17bf [btrfs]
[ 3653.728384]        btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 3653.729130]        legacy_get_tree+0x30/0x50
[ 3653.729676]        vfs_get_tree+0x28/0xc0
[ 3653.730192]        vfs_kern_mount.part.0+0x71/0xb0
[ 3653.730800]        btrfs_mount+0x11d/0x3a0 [btrfs]
[ 3653.731427]        legacy_get_tree+0x30/0x50
[ 3653.731970]        vfs_get_tree+0x28/0xc0
[ 3653.732486]        path_mount+0x2d4/0xbe0
[ 3653.732997]        __x64_sys_mount+0x103/0x140
[ 3653.733560]        do_syscall_64+0x3b/0xc0
[ 3653.734080]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.734782]
               other info that might help us debug this:

[ 3653.735784] Chain exists of:
                 &disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00

[ 3653.737123]  Possible unsafe locking scenario:

[ 3653.737865]        CPU0                    CPU1
[ 3653.738435]        ----                    ----
[ 3653.739007]   lock(btrfs-chunk-00);
[ 3653.739449]                                lock(sb_internal#2);
[ 3653.740193]                                lock(btrfs-chunk-00);
[ 3653.740955]   lock(&disk->open_mutex);
[ 3653.741431]
                *** DEADLOCK ***

[ 3653.742176] 3 locks held by mount/447465:
[ 3653.742739]  #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
[ 3653.744114]  #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs]
[ 3653.745563]  #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.747066]
               stack backtrace:
[ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1
[ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 3653.750592] Call Trace:
[ 3653.750967]  dump_stack_lvl+0x57/0x72
[ 3653.751526]  check_noncircular+0xf3/0x110
[ 3653.752136]  ? stack_trace_save+0x4b/0x70
[ 3653.752748]  __lock_acquire+0x130e/0x2210
[ 3653.753356]  lock_acquire+0xd7/0x310
[ 3653.753898]  ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.754596]  ? lock_is_held_type+0xe8/0x140
[ 3653.755125]  ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.755729]  ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.756338]  __mutex_lock+0x92/0x900
[ 3653.756794]  ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.757400]  ? do_raw_spin_unlock+0x4b/0xa0
[ 3653.757930]  ? _raw_spin_unlock+0x29/0x40
[ 3653.758437]  ? bd_prepare_to_claim+0x129/0x150
[ 3653.758999]  ? trace_module_get+0x2b/0xd0
[ 3653.759508]  ? try_module_get.part.0+0x50/0x80
[ 3653.760072]  blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.760661]  ? devcgroup_check_permission+0xc1/0x1f0
[ 3653.761288]  blkdev_get_by_path+0xb8/0xd0
[ 3653.761797]  btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
[ 3653.762454]  open_fs_devices+0xd7/0x2c0 [btrfs]
[ 3653.763055]  ? clone_fs_devices+0x8f/0x170 [btrfs]
[ 3653.763689]  btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
[ 3653.764370]  ? kvm_sched_clock_read+0x14/0x40
[ 3653.764922]  open_ctree+0xb8e/0x17bf [btrfs]
[ 3653.765493]  ? super_setup_bdi_name+0x79/0xd0
[ 3653.766043]  btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 3653.766780]  ? rcu_read_lock_sched_held+0x3f/0x80
[ 3653.767488]  ? kfree+0x1f2/0x3c0
[ 3653.767979]  legacy_get_tree+0x30/0x50
[ 3653.768548]  vfs_get_tree+0x28/0xc0
[ 3653.769076]  vfs_kern_mount.part.0+0x71/0xb0
[ 3653.769718]  btrfs_mount+0x11d/0x3a0 [btrfs]
[ 3653.770381]  ? rcu_read_lock_sched_held+0x3f/0x80
[ 3653.771086]  ? kfree+0x1f2/0x3c0
[ 3653.771574]  legacy_get_tree+0x30/0x50
[ 3653.772136]  vfs_get_tree+0x28/0xc0
[ 3653.772673]  path_mount+0x2d4/0xbe0
[ 3653.773201]  __x64_sys_mount+0x103/0x140
[ 3653.773793]  do_syscall_64+0x3b/0xc0
[ 3653.774333]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.775094] RIP: 0033:0x7f648bc45aaa

This happens because through btrfs_read_chunk_tree(), which is called only
during mount, ends up acquiring the mutex open_mutex of a block device
while holding a read lock on a leaf of the chunk tree while other paths
need to acquire other locks before locking extent buffers of the chunk
tree.

Since at mount time when we call btrfs_read_chunk_tree() we know that
we don't have other tasks running in parallel and modifying the chunk
tree, we can simply skip locking of chunk tree extent buffers. So do
that and move the assertion that checks the fs is not yet mounted to the
top block of btrfs_read_chunk_tree(), with a comment before doing it.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 years agobtrfs: fix memory ordering between normal and ordered work functions
Nikolay Borisov [Tue, 2 Nov 2021 12:49:16 +0000 (14:49 +0200)]
btrfs: fix memory ordering between normal and ordered work functions

Ordered work functions aren't guaranteed to be handled by the same thread
which executed the normal work functions. The only way execution between
normal/ordered functions is synchronized is via the WORK_DONE_BIT,
unfortunately the used bitops don't guarantee any ordering whatsoever.

This manifested as seemingly inexplicable crashes on ARM64, where
async_chunk::inode is seen as non-null in async_cow_submit which causes
submit_compressed_extents to be called and crash occurs because
async_chunk::inode suddenly became NULL. The call trace was similar to:

    pc : submit_compressed_extents+0x38/0x3d0
    lr : async_cow_submit+0x50/0xd0
    sp : ffff800015d4bc20

    <registers omitted for brevity>

    Call trace:
     submit_compressed_extents+0x38/0x3d0
     async_cow_submit+0x50/0xd0
     run_ordered_work+0xc8/0x280
     btrfs_work_helper+0x98/0x250
     process_one_work+0x1f0/0x4ac
     worker_thread+0x188/0x504
     kthread+0x110/0x114
     ret_from_fork+0x10/0x18

Fix this by adding respective barrier calls which ensure that all
accesses preceding setting of WORK_DONE_BIT are strictly ordered before
setting the flag. At the same time add a read barrier after reading of
WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads
would be strictly ordered after reading the bit. This in turn ensures
are all accesses before WORK_DONE_BIT are going to be strictly ordered
before any access that can occur in ordered_func.

Reported-by: Chris Murphy <lists@colorremedies.com>
Fixes: fbc547198d8f ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue")
CC: stable@vger.kernel.org # 4.4+
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 years agobtrfs: fix a out-of-bound access in copy_compressed_data_to_page()
Qu Wenruo [Fri, 12 Nov 2021 04:47:30 +0000 (12:47 +0800)]
btrfs: fix a out-of-bound access in copy_compressed_data_to_page()

[BUG]
The following script can cause btrfs to crash:

  $ mount -o compress-force=lzo $DEV /mnt
  $ dd if=/dev/urandom of=/mnt/foo bs=4k count=1
  $ sync

The call trace looks like this:

  general protection fault, probably for non-canonical address 0xe04b37fccce3b000: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 5 PID: 164 Comm: kworker/u20:3 Not tainted 5.15.0-rc7-custom+ #4
  Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
  RIP: 0010:__memcpy+0x12/0x20
  Call Trace:
   lzo_compress_pages+0x236/0x540 [btrfs]
   btrfs_compress_pages+0xaa/0xf0 [btrfs]
   compress_file_range+0x431/0x8e0 [btrfs]
   async_cow_start+0x12/0x30 [btrfs]
   btrfs_work_helper+0xf6/0x3e0 [btrfs]
   process_one_work+0x294/0x5d0
   worker_thread+0x55/0x3c0
   kthread+0x140/0x170
   ret_from_fork+0x22/0x30
  ---[ end trace 63c3c0f131e61982 ]---

[CAUSE]
In lzo_compress_pages(), parameter @out_pages is not only an output
parameter (for the number of compressed pages), but also an input
parameter, as the upper limit of compressed pages we can utilize.

In commit 98d42dc1537d ("btrfs: subpage: make lzo_compress_pages()
compatible"), the refactoring doesn't take @out_pages as an input, thus
completely ignoring the limit.

And for compress-force case, we could hit incompressible data that
compressed size would go beyond the page limit, and cause the above
crash.

[FIX]
Save @out_pages as @max_nr_page, and pass it to lzo_compress_pages(),
and check if we're beyond the limit before accessing the pages.

Note: this also fixes crash on 32bit architectures that was suspected to
be caused by merge of btrfs patches to 5.16-rc1. Reported in
https://lore.kernel.org/all/20211104115001.GU20319@twin.jikos.cz/ .

Reported-by: Omar Sandoval <osandov@fb.com>
Fixes: 98d42dc1537d ("btrfs: subpage: make lzo_compress_pages() compatible")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
3 years agoKVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap()
黄乐 [Mon, 15 Nov 2021 14:08:29 +0000 (14:08 +0000)]
KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap()

In vcpu_load_eoi_exitmap(), currently the eoi_exit_bitmap[4] array is
initialized only when Hyper-V context is available, in other path it is
just passed to kvm_x86_ops.load_eoi_exitmap() directly from on the stack,
which would cause unexpected interrupt delivery/handling issues, e.g. an
*old* linux kernel that relies on PIT to do clock calibration on KVM might
randomly fail to boot.

Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V
context is not available.

Fixes: f4258c991c32 ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context")
Cc: stable@vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Huang Le <huangle1@jd.com>
Message-Id: <62115b277dab49ea97da5633f8522daf@jd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Use perf_test_destroy_vm in memslot_modification_stress_test
David Matlack [Thu, 11 Nov 2021 00:12:57 +0000 (00:12 +0000)]
KVM: selftests: Use perf_test_destroy_vm in memslot_modification_stress_test

Change memslot_modification_stress_test to use perf_test_destroy_vm
instead of manually calling ucall_uninit and kvm_vm_free.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111001257.1446428-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Wait for all vCPU to be created before entering guest mode
David Matlack [Thu, 11 Nov 2021 00:12:56 +0000 (00:12 +0000)]
KVM: selftests: Wait for all vCPU to be created before entering guest mode

Thread creation requires taking the mmap_sem in write mode, which causes
vCPU threads running in guest mode to block while they are populating
memory. Fix this by waiting for all vCPU threads to be created and start
running before entering guest mode on any one vCPU thread.

This substantially improves the "Populate memory time" when using 1GiB
pages since it allows all vCPUs to zero pages in parallel rather than
blocking because a writer is waiting (which is waiting for another vCPU
that is busy zeroing a 1GiB page).

Before:

  $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
  ...
  Populate memory time: 52.811184013s

After:

  $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
  ...
  Populate memory time: 10.204573342s

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111001257.1446428-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Move vCPU thread creation and joining to common helpers
David Matlack [Thu, 11 Nov 2021 00:12:55 +0000 (00:12 +0000)]
KVM: selftests: Move vCPU thread creation and joining to common helpers

Move vCPU thread creation and joining to common helper functions. This
is in preparation for the next commit which ensures that all vCPU
threads are fully created before entering guest mode on any one
vCPU.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111001257.1446428-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Start at iteration 0 instead of -1
David Matlack [Thu, 11 Nov 2021 00:12:54 +0000 (00:12 +0000)]
KVM: selftests: Start at iteration 0 instead of -1

Start at iteration 0 instead of -1 to avoid having to initialize
vcpu_last_completed_iteration when setting up vCPU threads. This
simplifies the next commit where we move vCPU thread initialization
out to a common helper.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111001257.1446428-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Sync perf_test_args to guest during VM creation
Sean Christopherson [Thu, 11 Nov 2021 00:03:10 +0000 (00:03 +0000)]
KVM: selftests: Sync perf_test_args to guest during VM creation

Copy perf_test_args to the guest during VM creation instead of relying on
the caller to do so at their leisure.  Ideally, tests wouldn't even be
able to modify perf_test_args, i.e. they would have no motivation to do
the sync, but enforcing that is arguably a net negative for readability.

No functional change intended.

[Set wr_fract=1 by default and add helper to override it since the new
 access_tracking_perf_test needs to set it dynamically.]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-13-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Fill per-vCPU struct during "perf_test" VM creation
Sean Christopherson [Thu, 11 Nov 2021 00:03:09 +0000 (00:03 +0000)]
KVM: selftests: Fill per-vCPU struct during "perf_test" VM creation

Fill the per-vCPU args when creating the perf_test VM instead of having
the caller do so.  This helps ensure that any adjustments to the number
of pages (and thus vcpu_memory_bytes) are reflected in the per-VM args.
Automatically filling the per-vCPU args will also allow a future patch
to do the sync to the guest during creation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
[Updated access_tracking_perf_test as well.]
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Create VM with adjusted number of guest pages for perf tests
Sean Christopherson [Thu, 11 Nov 2021 00:03:08 +0000 (00:03 +0000)]
KVM: selftests: Create VM with adjusted number of guest pages for perf tests

Use the already computed guest_num_pages when creating the so called
extra VM pages for a perf test, and add a comment explaining why the
pages are allocated as extra pages.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Remove perf_test_args.host_page_size
Sean Christopherson [Thu, 11 Nov 2021 00:03:07 +0000 (00:03 +0000)]
KVM: selftests: Remove perf_test_args.host_page_size

Remove perf_test_args.host_page_size and instead use getpagesize() so
that it's somewhat obvious that, for tests that care about the host page
size, they care about the system page size, not the hardware page size,
e.g. that the logic is unchanged if hugepages are in play.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Move per-VM GPA into perf_test_args
Sean Christopherson [Thu, 11 Nov 2021 00:03:06 +0000 (00:03 +0000)]
KVM: selftests: Move per-VM GPA into perf_test_args

Move the per-VM GPA into perf_test_args instead of storing it as a
separate global variable.  It's not obvious that guest_test_phys_mem
holds a GPA, nor that it's connected/coupled with per_vcpu->gpa.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Use perf util's per-vCPU GPA/pages in demand paging test
Sean Christopherson [Thu, 11 Nov 2021 00:03:05 +0000 (00:03 +0000)]
KVM: selftests: Use perf util's per-vCPU GPA/pages in demand paging test

Grab the per-vCPU GPA and number of pages from perf_util in the demand
paging test instead of duplicating perf_util's calculations.

Note, this may or may not result in a functional change.  It's not clear
that the test's calculations are guaranteed to yield the same value as
perf_util, e.g. if guest_percpu_mem_size != vcpu_args->pages.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Capture per-vCPU GPA in perf_test_vcpu_args
Sean Christopherson [Thu, 11 Nov 2021 00:03:04 +0000 (00:03 +0000)]
KVM: selftests: Capture per-vCPU GPA in perf_test_vcpu_args

Capture the per-vCPU GPA in perf_test_vcpu_args so that tests can get
the GPA without having to calculate the GPA on their own.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Use shorthand local var to access struct perf_tests_args
Sean Christopherson [Thu, 11 Nov 2021 00:03:03 +0000 (00:03 +0000)]
KVM: selftests: Use shorthand local var to access struct perf_tests_args

Use 'pta' as a local pointer to the global perf_tests_args in order to
shorten line lengths and make the code borderline readable.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Require GPA to be aligned when backed by hugepages
Sean Christopherson [Thu, 11 Nov 2021 00:03:02 +0000 (00:03 +0000)]
KVM: selftests: Require GPA to be aligned when backed by hugepages

Assert that the GPA for a memslot backed by a hugepage is aligned to
the hugepage size and fix perf_test_util accordingly.  Lack of GPA
alignment prevents KVM from backing the guest with hugepages, e.g. x86's
write-protection of hugepages when dirty logging is activated is
otherwise not exercised.

Add a comment explaining that guest_page_size is for non-huge pages to
try and avoid confusion about what it actually tracks.

Cc: Ben Gardon <bgardon@google.com>
Cc: Yanan Wang <wangyanan55@huawei.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Used get_backing_src_pagesz() to determine alignment dynamically.]
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Assert mmap HVA is aligned when using HugeTLB
Sean Christopherson [Thu, 11 Nov 2021 00:03:01 +0000 (00:03 +0000)]
KVM: selftests: Assert mmap HVA is aligned when using HugeTLB

Manually padding and aligning the mmap region is only needed when using
THP. When using HugeTLB, mmap will always return an address aligned to
the HugeTLB page size. Add a comment to clarify this and assert the mmap
behavior for HugeTLB.

[Removed requirement that HugeTLB mmaps must be padded per Yanan's
 feedback and added assertion that mmap returns aligned addresses
 when using HugeTLB.]

Cc: Ben Gardon <bgardon@google.com>
Cc: Yanan Wang <wangyanan55@huawei.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Expose align() helpers to tests
Sean Christopherson [Thu, 11 Nov 2021 00:03:00 +0000 (00:03 +0000)]
KVM: selftests: Expose align() helpers to tests

Refactor align() to work with non-pointers and split into separate
helpers for aligning up vs. down. Add align_ptr_up() for use with
pointers. Expose all helpers so that they can be used by tests and/or
other utilities.  The align_down() helper in particular will be used to
ensure gpa alignment for hugepages.

No functional change intended.

[Added sepearate up/down helpers and replaced open-coded alignment
 bit math throughout the KVM selftests.]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Explicitly state indicies for vm_guest_mode_params array
Sean Christopherson [Thu, 11 Nov 2021 00:02:59 +0000 (00:02 +0000)]
KVM: selftests: Explicitly state indicies for vm_guest_mode_params array

Explicitly state the indices when populating vm_guest_mode_params to
make it marginally easier to visualize what's going on.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
[Added indices for new guest modes.]
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Add event channel upcall support to xen_shinfo_test
David Woodhouse [Mon, 15 Nov 2021 16:50:22 +0000 (16:50 +0000)]
KVM: selftests: Add event channel upcall support to xen_shinfo_test

When I first looked at this, there was no support for guest exception
handling in the KVM selftests. In fact it was merged into 5.10 before
the Xen support got merged in 5.11, and I could have used it from the
start.

Hook it up now, to exercise the Xen upcall delivery. I'm about to make
things a bit more interesting by handling the full 2level event channel
stuff in-kernel on top of the basic vector injection that we already
have, and I'll want to build more tests on top.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agomips: lantiq: add support for clk_get_parent()
Randy Dunlap [Mon, 15 Nov 2021 01:20:51 +0000 (17:20 -0800)]
mips: lantiq: add support for clk_get_parent()

Provide a simple implementation of clk_get_parent() in the
lantiq subarch so that callers of it will build without errors.

Fixes this build error:
ERROR: modpost: "clk_get_parent" [drivers/iio/adc/ingenic-adc.ko] undefined!

Fixes: 0d7c54bcdab4 ("MIPS: Lantiq: Add initial support for Lantiq SoCs")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: linux-mips@vger.kernel.org
Cc: John Crispin <john@phrozen.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: linux-iio@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: John Crispin <john@phrozen.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 years agomips: bcm63xx: add support for clk_get_parent()
Randy Dunlap [Mon, 15 Nov 2021 00:42:18 +0000 (16:42 -0800)]
mips: bcm63xx: add support for clk_get_parent()

BCM63XX selects HAVE_LEGACY_CLK but does not provide/support
clk_get_parent(), so add a simple implementation of that
function so that callers of it will build without errors.

Fixes these build errors:

mips-linux-ld: drivers/iio/adc/ingenic-adc.o: in function `jz4770_adc_init_clk_div':
ingenic-adc.c:(.text+0xe4): undefined reference to `clk_get_parent'
mips-linux-ld: drivers/iio/adc/ingenic-adc.o: in function `jz4725b_adc_init_clk_div':
ingenic-adc.c:(.text+0x1b8): undefined reference to `clk_get_parent'

Fixes: 6c22f5913d59 ("MIPS: BCM63xx: Add support for the Broadcom BCM63xx family of SOCs." )
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Artur Rojek <contact@artur-rojek.eu>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: linux-mips@vger.kernel.org
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: linux-iio@vger.kernel.org
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 years agoMIPS: generic/yamon-dt: fix uninitialized variable error
Colin Ian King [Wed, 10 Nov 2021 23:28:24 +0000 (23:28 +0000)]
MIPS: generic/yamon-dt: fix uninitialized variable error

In the case where fw_getenv returns an error when fetching values
for ememsizea and memsize then variable phys_memsize is not assigned
a variable and will be uninitialized on a zero check of phys_memsize.
Fix this by initializing phys_memsize to zero.

Cleans up cppcheck error:
arch/mips/generic/yamon-dt.c:100:7: error: Uninitialized variable: phys_memsize [uninitvar]

Fixes: 281b379d6742 ("MIPS: generic/yamon-dt: Support > 256MB of RAM")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 years agoMIPS: syscalls: Wire up futex_waitv syscall
Wang Haojun [Wed, 3 Nov 2021 02:55:21 +0000 (10:55 +0800)]
MIPS: syscalls: Wire up futex_waitv syscall

Wire up the futex_waitv syscall.

Fix Build warning: #warning syscall futex_waitv not implemented [-Wcpp]

Signed-off-by: Wang Haojun <wanghaojun@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 years agoNFSD: Fix exposure in nfsd4_decode_bitmap()
Chuck Lever [Sun, 14 Nov 2021 20:16:04 +0000 (15:16 -0500)]
NFSD: Fix exposure in nfsd4_decode_bitmap()

rtm@csail.mit.edu reports:
> nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC
> directs it to do so. This can cause nfsd4_decode_state_protect4_a()
> to write client-supplied data beyond the end of
> nfsd4_exchange_id.spo_must_allow[] when called by
> nfsd4_decode_exchange_id().

Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond
@bmlen.

Reported by: rtm@csail.mit.edu
Fixes: a5e5b8070e89 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
3 years agoprintk: Remove printk.h inclusion in percpu.h
Andy Shevchenko [Fri, 12 Nov 2021 14:07:49 +0000 (16:07 +0200)]
printk: Remove printk.h inclusion in percpu.h

After the commit a95813d717b8 ("printk/nmi: generic solution for safe
printk in NMI") the printk.h is not needed anymore in percpu.h.

Moreover `make headerdep` complains (an excerpt)

In file included from linux/printk.h,
                 from linux/dynamic_debug.h:188
                 from linux/printk.h:559 <-- here
                 from linux/percpu.h:9
                 from linux/idr.h:17
include/net/9p/client.h:13: warning: recursive header inclusion

Yeah, it's not a root cause of this, but removing will help to reduce
the noise.

Fixes: a95813d717b8 ("printk/nmi: generic solution for safe printk in NMI")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211112140749.80042-1-andriy.shevchenko@linux.intel.com
3 years agox86/hyperv: Move required MSRs check to initial platform probing
Sean Christopherson [Thu, 4 Nov 2021 18:22:39 +0000 (18:22 +0000)]
x86/hyperv: Move required MSRs check to initial platform probing

Explicitly check for MSR_HYPERCALL and MSR_VP_INDEX support when probing
for running as a Hyper-V guest instead of waiting until hyperv_init() to
detect the bogus configuration.  Add messages to give the admin a heads
up that they are likely running on a broken virtual machine setup.

At best, silently disabling Hyper-V is confusing and difficult to debug,
e.g. the kernel _says_ it's using all these fancy Hyper-V features, but
always falls back to the native versions.  At worst, the half baked setup
will crash/hang the kernel.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20211104182239.1302956-3-seanjc@google.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
3 years agox86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
Sean Christopherson [Thu, 4 Nov 2021 18:22:38 +0000 (18:22 +0000)]
x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails

Check for a valid hv_vp_index array prior to derefencing hv_vp_index when
setting Hyper-V's TSC change callback.  If Hyper-V setup failed in
hyperv_init(), the kernel will still report that it's running under
Hyper-V, but will have silently disabled nearly all functionality.

  BUG: kernel NULL pointer dereference, address: 0000000000000010
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP
  CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc2+ #75
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:set_hv_tscchange_cb+0x15/0xa0
  Code: <8b> 04 82 8b 15 12 17 85 01 48 c1 e0 20 48 0d ee 00 01 00 f6 c6 08
  ...
  Call Trace:
   kvm_arch_init+0x17c/0x280
   kvm_init+0x31/0x330
   vmx_init+0xba/0x13a
   do_one_initcall+0x41/0x1c0
   kernel_init_freeable+0x1f2/0x23b
   kernel_init+0x16/0x120
   ret_from_fork+0x22/0x30

Fixes: 7be53aec08ef ("x86/hyperv: Reenlightenment notifications support")
Cc: stable@vger.kernel.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20211104182239.1302956-2-seanjc@google.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
3 years agoDrivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size
Boqun Feng [Mon, 1 Nov 2021 15:00:26 +0000 (23:00 +0800)]
Drivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size

Baihua reported an error when boot an ARM64 guest with PAGE_SIZE=64k and
BALLOON is enabled:

hv_vmbus: registering driver hv_balloon
hv_vmbus: probe failed for device 1eccfd72-4b41-45ef-b73a-4a6e44c12924 (-22)

The cause of this is that the ringbuffer size for hv_balloon is not
adjusted with VMBUS_RING_SIZE(), which makes the size not large enough
for ringbuffers on guest with PAGE_SIZE=64k. Therefore use
VMBUS_RING_SIZE() to calculate the ringbuffer size. Note that the old
size (20 * 1024) counts a 4k header in the total size, while
VMBUS_RING_SIZE() expects the parameter as the payload size, so use
16 * 1024.

Cc: <stable@vger.kernel.org> # 5.15.x
Reported-by: Baihua Lu <baihua.lu@microsoft.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20211101150026.736124-1-boqun.feng@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
3 years agodocs: filesystems: Fix grammatical error "with" to "which"
Wasin Thonkaew [Wed, 3 Nov 2021 19:35:04 +0000 (19:35 +0000)]
docs: filesystems: Fix grammatical error "with" to "which"

Signed-off-by: Wasin Thonkaew <wasin@wasin.io>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agodoc/zh_CN: fix a translation error in management-style
Alex Shi [Wed, 10 Nov 2021 12:02:13 +0000 (20:02 +0800)]
doc/zh_CN: fix a translation error in management-style

'The name of the game' means the most important part of an activity, so
we should translate it by the meaning instead of the words.

Suggested-by: Xinyong Wang <wang.xy.chn@gmail.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
Reviewed-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agodocs: ftrace: fix the wrong path of tracefs
Zhaoyu Liu [Sat, 13 Nov 2021 13:37:34 +0000 (21:37 +0800)]
docs: ftrace: fix the wrong path of tracefs

Delete "tracing" due to it has been included in /proc/mounts.
Delete "echo nop > $tracefs/tracing/current_tracer", maybe
this command is redundant.

Signed-off-by: Zhaoyu Liu <zackary.liu.pro@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agoDocumentation: arm: marvell: Fix link to armada_1000_pb.pdf document
Pali Rohár [Fri, 8 Oct 2021 16:01:05 +0000 (18:01 +0200)]
Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document

File armada_1000_pb.pdf is not available on Marvell website anymore.
So update link to webarchive where is backup copy.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agoDocumentation: arm: marvell: Put Armada XP section between Armada 370 and 375
Pali Rohár [Fri, 8 Oct 2021 16:01:04 +0000 (18:01 +0200)]
Documentation: arm: marvell: Put Armada XP section between Armada 370 and 375

From evolution and feature point of view Armada XP belongs between Armada
370 and Armada 375 families.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agoDocumentation: arm: marvell: Add some links to homepage / product infos
Pali Rohár [Fri, 8 Oct 2021 16:01:03 +0000 (18:01 +0200)]
Documentation: arm: marvell: Add some links to homepage / product infos

Webarchive contains some useful resources like product info or links to
other documents.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agodocs: Update Sphinx requirements
Akira Yokosawa [Wed, 10 Nov 2021 09:16:48 +0000 (18:16 +0900)]
docs: Update Sphinx requirements

Commit b2eaaa4ad7d7 ("Move our minimum Sphinx version to 1.7") raised
the minimum version to 1.7.

For pdfdocs, sphinx_pre_install says:

    note: If you want pdf, you need at least Sphinx 2.4.4.

, and current requirements.txt installs Sphinx 2.4.4.

Update Sphinx versions mentioned in docs and remove a note on earlier
Sphinx versions.

Update zh_CN and it_IT translations as well.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: Federico Vaga <federico.vaga@vaga.pv.it>
Cc: Alex Shi <alexs@kernel.org>
Reviewed-by: Alex Shi <alexs@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agoMerge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Mon, 15 Nov 2021 03:07:19 +0000 (19:07 -0800)]
Merge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Update to tracing histogram variable string copy

  A fix to only copy the size of the field to the histogram string did
  not take into account that the size can be larger than the storage"

* tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add length protection to histogram string copies

3 years agokbuild: Fix -Wimplicit-fallthrough=5 error for GCC 5.x and 6.x
Gustavo A. R. Silva [Mon, 15 Nov 2021 02:48:44 +0000 (20:48 -0600)]
kbuild: Fix -Wimplicit-fallthrough=5 error for GCC 5.x and 6.x

-Wimplicit-fallthrough=5 was under cc-option because it was only
available in GCC 7.x and newer so the build is now broken for GCC 5.x
and 6.x:

gcc: error: unrecognized command line option '-Wimplicit-fallthrough=5';
did you mean '-Wno-fallthrough'?

Fix this by moving -Wimplicit-fallthrough=5 under cc-option.

Fixes: cd9f1e404be7 ("kconfig: Add support for -Wimplicit-fallthrough")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Co-developed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agotracing: Add length protection to histogram string copies
Steven Rostedt (VMware) [Sun, 14 Nov 2021 18:28:34 +0000 (13:28 -0500)]
tracing: Add length protection to histogram string copies

The string copies to the histogram storage has a max size of 256 bytes
(defined by MAX_FILTER_STR_VAL). Only the string size of the event field
needs to be copied to the event storage, but no more than what is in the
event storage. Although nothing should be bigger than 256 bytes, there's
no protection against overwriting of the storage if one day there is.

Copy no more than the destination size, and enforce it.

Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the
min() comparison of the string sizes of comparable types.

Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/
Link: https://lkml.kernel.org/r/20211114132834.183429a4@rorschach.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: b2e23b4fbc2a ("tracing/histogram: Do not copy the fixed-size char array field over the field size")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
3 years agoLinux 5.16-rc1
Linus Torvalds [Sun, 14 Nov 2021 21:56:52 +0000 (13:56 -0800)]
Linux 5.16-rc1

3 years agokconfig: Add support for -Wimplicit-fallthrough
Gustavo A. R. Silva [Sun, 14 Nov 2021 00:57:25 +0000 (18:57 -0600)]
kconfig: Add support for -Wimplicit-fallthrough

Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang.

The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH,
which is enabled by default.

Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This
bugfix only appears in Clang 14.0.0, so older versions still contain
the bug and -Wimplicit-fallthrough won't be enabled for them, for now.

This concludes a long journey and now we are finally getting rid
of the unintentional fallthrough bug-class in the kernel, entirely. :)

Link: https://github.com/llvm/llvm-project/commit/9ed4a94d6451046a51ef393cd62f00710820a7e8
Link: https://bugs.llvm.org/show_bug.cgi?id=51094
Link: https://github.com/KSPP/linux/issues/115
Link: https://github.com/ClangBuiltLinux/linux/issues/236
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sun, 14 Nov 2021 20:18:22 +0000 (12:18 -0800)]
Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs cleanups from Darrick Wong:
 "The most 'exciting' aspect of this branch is that the xfsprogs
  maintainer and I have worked through the last of the code
  discrepancies between kernel and userspace libxfs such that there are
  no code differences between the two except for #includes.

  IOWs, diff suffices to demonstrate that the userspace tools behave the
  same as the kernel, and kernel-only bits are clearly marked in the
  /kernel/ source code instead of just the userspace source.

  Summary:

   - Clean up open-coded swap() calls.

   - A little bit of #ifdef golf to complete the reunification of the
     kernel and userspace libxfs source code"

* tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: sync xfs_btree_split macros with userspace libxfs
  xfs: #ifdef out perag code for userspace
  xfs: use swap() to make dabtree code cleaner

3 years agoMerge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sun, 14 Nov 2021 19:53:59 +0000 (11:53 -0800)]
Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull more parisc fixes from Helge Deller:
 "Fix a build error in stracktrace.c, fix resolving of addresses to
  function names in backtraces, fix single-stepping in assembly code and
  flush userspace pte's when using set_pte_at()"

* tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc/entry: fix trace test in syscall exit path
  parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
  parisc: Fix implicit declaration of function '__kernel_text_address'
  parisc: Fix backtrace to always include init funtion names

3 years agoMerge tag 'sh-for-5.16' of git://git.libc.org/linux-sh
Linus Torvalds [Sun, 14 Nov 2021 19:37:49 +0000 (11:37 -0800)]
Merge tag 'sh-for-5.16' of git://git.libc.org/linux-sh

Pull arch/sh updates from Rich Felker.

* tag 'sh-for-5.16' of git://git.libc.org/linux-sh:
  sh: pgtable-3level: Fix cast to pointer from integer of different size
  sh: fix READ/WRITE redefinition warnings
  sh: define __BIG_ENDIAN for math-emu
  sh: math-emu: drop unused functions
  sh: fix kconfig unmet dependency warning for FRAME_POINTER
  sh: Cleanup about SPARSE_IRQ
  sh: kdump: add some attribute to function
  maple: fix wrong return value of maple_bus_init().
  sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/
  sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y
  sh: boards: Fix the cacography in irq.c
  sh: check return code of request_irq
  sh: fix trivial misannotations

3 years agoMerge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Sun, 14 Nov 2021 19:30:50 +0000 (11:30 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

 - Fix early_iounmap

 - Drop cc-option fallbacks for architecture selection

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9156/1: drop cc-option fallbacks for architecture selection
  ARM: 9155/1: fix early early_iounmap()

3 years agoMerge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Nov 2021 19:11:51 +0000 (11:11 -0800)]
Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Two fixes due to DT node name changes on Arm, Ltd. boards

 - Treewide rename of Ingenic CGU headers

 - Update ST email addresses

 - Remove Netlogic DT bindings

 - Dropping few more cases of redundant 'maxItems' in schemas

 - Convert toshiba,tc358767 bridge binding to schema

* tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: watchdog: sunxi: fix error in schema
  bindings: media: venus: Drop redundant maxItems for power-domain-names
  dt-bindings: Remove Netlogic bindings
  clk: versatile: clk-icst: Ensure clock names are unique
  of: Support using 'mask' in making device bus id
  dt-bindings: treewide: Update @st.com email address to @foss.st.com
  dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml
  dt-bindings: media: Update maintainers for st,stm32-cec.yaml
  dt-bindings: mfd: timers: Update maintainers for st,stm32-timers
  dt-bindings: timer: Update maintainers for st,stm32-timer
  dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz
  dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml
  dt-bindings: Rename Ingenic CGU headers to ingenic,*.h

3 years agoMerge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Nov 2021 18:43:38 +0000 (10:43 -0800)]
Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for POSIX CPU timers to address a problem where POSIX CPU
  timer delivery stops working for a new child task because
  copy_process() copies state information which is only valid for the
  parent task"

* tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()