]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
2 years agoMerge branch kvm-arm64/misc-5.18 into kvmarm-master/next
Marc Zyngier [Wed, 9 Mar 2022 11:16:48 +0000 (11:16 +0000)]
Merge branch kvm-arm64/misc-5.18 into kvmarm-master/next

* kvm-arm64/misc-5.18:
  : .
  : Misc fixes for KVM/arm64 5.18:
  :
  : - Drop unused kvm parameter to kvm_psci_version()
  :
  : - Implement CONFIG_DEBUG_LIST at EL2
  :
  : - Make CONFIG_ARM64_ERRATUM_2077057 default y
  :
  : - Only do the interrupt dance if we have exited because of an interrupt
  :
  : - Remove traces of 32bit ARM host support from the documentation
  : .
  Documentation: KVM: Update documentation to indicate KVM is arm64-only
  KVM: arm64: Only open the interrupt window on exit due to an interrupt
  KVM: arm64: Enable Cortex-A510 erratum 2077057 by default

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoDocumentation: KVM: Update documentation to indicate KVM is arm64-only
Oliver Upton [Tue, 8 Mar 2022 17:28:57 +0000 (17:28 +0000)]
Documentation: KVM: Update documentation to indicate KVM is arm64-only

KVM support for 32-bit ARM hosts (KVM/arm) has been removed from the
kernel since commit cd6aa718d093 ("arm: Remove 32bit KVM host
support"). There still exists some remnants of the old architecture in
the KVM documentation.

Remove all traces of 32-bit host support from the documentation. Note
that AArch32 guests are still supported.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220308172856.2997250-1-oupton@google.com
2 years agoKVM: arm64: Only open the interrupt window on exit due to an interrupt
Marc Zyngier [Fri, 4 Mar 2022 12:04:49 +0000 (12:04 +0000)]
KVM: arm64: Only open the interrupt window on exit due to an interrupt

Now that we properly account for interrupts taken whilst the guest
was running, it becomes obvious that there is no need to open
this accounting window if we didn't exit because of an interrupt.

This saves a number of system register accesses and other barriers
if we exited for any other reason (such as a trap, for example).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220304135914.1464721-1-maz@kernel.org
2 years agoKVM: arm64: Enable Cortex-A510 erratum 2077057 by default
Mark Brown [Fri, 25 Feb 2022 18:46:58 +0000 (18:46 +0000)]
KVM: arm64: Enable Cortex-A510 erratum 2077057 by default

The recently added configuration option for Cortex A510 erratum 2077057 does
not have a "default y" unlike other errata fixes. This appears to simply be
an oversight since the help text suggests enabling the option if unsure and
there's nothing in the commit log to suggest it is intentional.

Fixes: cf1948d2dbeb5 ("KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata")
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220225184658.172527-1-broonie@kernel.org
2 years agoMerge branch kvm-arm64/psci-1.1 into kvmarm-master/next
Marc Zyngier [Fri, 25 Feb 2022 13:49:48 +0000 (13:49 +0000)]
Merge branch kvm-arm64/psci-1.1 into kvmarm-master/next

* kvm-arm64/psci-1.1:
  : .
  : Limited PSCI-1.1 support from Will Deacon:
  :
  : This small series exposes the PSCI SYSTEM_RESET2 call to guests, which
  : allows the propagation of a "reset_type" and a "cookie" back to the VMM.
  : Although Linux guests only ever pass 0 for the type ("SYSTEM_WARM_RESET"),
  : the vendor-defined range can be used by a bootloader to provide additional
  : information about the reset, such as an error code.
  : .
  KVM: arm64: Remove unneeded semicolons
  KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field
  KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest
  KVM: arm64: Bump guest PSCI version to 1.1

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: Remove unneeded semicolons
Changcheng Deng [Wed, 23 Feb 2022 09:27:50 +0000 (09:27 +0000)]
KVM: arm64: Remove unneeded semicolons

Fix the following coccicheck review:
./arch/arm64/kvm/psci.c: 379: 3-4: Unneeded semicolon

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
[maz: squashed another instance of the same issue in the patch]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220223092750.1934130-1-deng.changcheng@zte.com.cn
Link: https://lore.kernel.org/r/20220225122922.GA19390@willie-the-truck
2 years agoKVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field
Will Deacon [Mon, 21 Feb 2022 15:35:24 +0000 (15:35 +0000)]
KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field

When handling reset and power-off PSCI calls from the guest, we
initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to
re-run the vCPU after issuing the call.

Unfortunately, this also means that the VMM cannot see which PSCI call
was issued and therefore cannot distinguish between PSCI SYSTEM_RESET
and SYSTEM_RESET2 calls, which is necessary in order to determine the
validity of the "reset_type" in X1.

Allocate bit 0 of the previously unused 'flags' field of the
system_event structure so that we can indicate the PSCI call used to
initiate the reset.

Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
2 years agoKVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest
Will Deacon [Mon, 21 Feb 2022 15:35:23 +0000 (15:35 +0000)]
KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest

PSCI v1.1 introduces the optional SYSTEM_RESET2 call, which allows the
caller to provide a vendor-specific "reset type" and "cookie" to request
a particular form of reset or shutdown.

Expose this call to the guest and handle it in the same way as PSCI
SYSTEM_RESET, along with some basic range checking on the type argument.

Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-3-will@kernel.org
2 years agoKVM: arm64: Bump guest PSCI version to 1.1
Will Deacon [Mon, 21 Feb 2022 15:35:22 +0000 (15:35 +0000)]
KVM: arm64: Bump guest PSCI version to 1.1

Expose PSCI version v1.1 to the guest by default. The only difference
for now is that an updated version number is reported by PSCI_VERSION.

Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-2-will@kernel.org
2 years agoMerge branch kvm-arm64/pmu-bl into kvmarm-master/next
Marc Zyngier [Tue, 8 Feb 2022 17:54:41 +0000 (17:54 +0000)]
Merge branch kvm-arm64/pmu-bl into kvmarm-master/next

* kvm-arm64/pmu-bl:
  : .
  : Improve PMU support on heterogeneous systems, courtesy of Alexandru Elisei
  : .
  KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  KVM: arm64: Keep a list of probed PMUs
  KVM: arm64: Keep a per-VM pointer to the default PMU
  perf: Fix wrong name in comment for struct perf_cpu_context
  KVM: arm64: Do not change the PMU event filter after a VCPU has run

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
Alexandru Elisei [Thu, 27 Jan 2022 16:17:59 +0000 (16:17 +0000)]
KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU

Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
device ioctl. If the VCPU is scheduled on a physical CPU which has a
different PMU, the perf events needed to emulate a guest PMU won't be
scheduled in and the guest performance counters will stop counting. Treat
it as an userspace error and refuse to run the VCPU in this situation.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
2 years agoKVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
Alexandru Elisei [Thu, 27 Jan 2022 16:17:58 +0000 (16:17 +0000)]
KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute

When KVM creates an event and there are more than one PMUs present on the
system, perf_init_event() will go through the list of available PMUs and
will choose the first one that can create the event. The order of the PMUs
in this list depends on the probe order, which can change under various
circumstances, for example if the order of the PMU nodes change in the DTB
or if asynchronous driver probing is enabled on the kernel command line
(with the driver_async_probe=armv8-pmu option).

Another consequence of this approach is that on heteregeneous systems all
virtual machines that KVM creates will use the same PMU. This might cause
unexpected behaviour for userspace: when a VCPU is executing on the
physical CPU that uses this default PMU, PMU events in the guest work
correctly; but when the same VCPU executes on another CPU, PMU events in
the guest will suddenly stop counting.

Fortunately, perf core allows user to specify on which PMU to create an
event by using the perf_event_attr->type field, which is used by
perf_init_event() as an index in the radix tree of available PMUs.

Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
attribute to allow userspace to specify the arm_pmu that KVM will use when
creating events for that VCPU. KVM will make no attempt to run the VCPU on
the physical CPUs that share the PMU, leaving it up to userspace to manage
the VCPU threads' affinity accordingly.

To ensure that KVM doesn't expose an asymmetric system to the guest, the
PMU set for one VCPU will be used by all other VCPUs. Once a VCPU has run,
the PMU cannot be changed in order to avoid changing the list of available
events for a VCPU, or to change the semantics of existing events.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-6-alexandru.elisei@arm.com
2 years agoKVM: arm64: Keep a list of probed PMUs
Alexandru Elisei [Thu, 27 Jan 2022 16:17:57 +0000 (16:17 +0000)]
KVM: arm64: Keep a list of probed PMUs

The ARM PMU driver calls kvm_host_pmu_init() after probing to tell KVM that
a hardware PMU is available for guest emulation. Heterogeneous systems can
have more than one PMU present, and the callback gets called multiple
times, once for each of them. Keep track of all the PMUs available to KVM,
as they're going to be needed later.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-5-alexandru.elisei@arm.com
2 years agoKVM: arm64: Keep a per-VM pointer to the default PMU
Marc Zyngier [Thu, 27 Jan 2022 16:17:56 +0000 (16:17 +0000)]
KVM: arm64: Keep a per-VM pointer to the default PMU

As we are about to allow selection of the PMU exposed to a guest, start by
keeping track of the default one instead of only the PMU version.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220127161759.53553-4-alexandru.elisei@arm.com
2 years agoperf: Fix wrong name in comment for struct perf_cpu_context
Alexandru Elisei [Thu, 27 Jan 2022 16:17:55 +0000 (16:17 +0000)]
perf: Fix wrong name in comment for struct perf_cpu_context

Commit eba5f9cdad07 ("performance counters: core code") added the perf
subsystem (then called Performance Counters) to Linux, creating the struct
perf_cpu_context. The comment for the struct referred to it as a "struct
perf_counter_cpu_context".

Commit 4cd1bf376bde ("perf: Do the big rename: Performance Counters ->
Performance Events") changed the comment to refer to a "struct
perf_event_cpu_context", which was still the wrong name for the struct.

Change the comment to say "struct perf_cpu_context".

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-3-alexandru.elisei@arm.com
2 years agoKVM: arm64: Do not change the PMU event filter after a VCPU has run
Marc Zyngier [Thu, 27 Jan 2022 16:17:54 +0000 (16:17 +0000)]
KVM: arm64: Do not change the PMU event filter after a VCPU has run

Userspace can specify which events a guest is allowed to use with the
KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be
identified by a guest from reading the PMCEID{0,1}_EL0 registers.

Changing the PMU event filter after a VCPU has run can cause reads of the
registers performed before the filter is changed to return different values
than reads performed with the new event filter in place. The architecture
defines the two registers as read-only, and this behaviour contradicts
that.

Keep track when the first VCPU has run and deny changes to the PMU event
filter to prevent this from happening.

Signed-off-by: Marc Zyngier <maz@kernel.org>
[ Alexandru E: Added commit message, updated ioctl documentation ]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
2 years agoMerge branch kvm-arm64/misc-5.18 into kvmarm-master/next
Marc Zyngier [Tue, 8 Feb 2022 15:29:28 +0000 (15:29 +0000)]
Merge branch kvm-arm64/misc-5.18 into kvmarm-master/next

* kvm-arm64/misc-5.18:
  : .
  : Misc fixes for KVM/arm64 5.18:
  :
  : - Drop unused kvm parameter to kvm_psci_version()
  :
  : - Implement CONFIG_DEBUG_LIST at EL2
  : .
  KVM: arm64: pkvm: Implement CONFIG_DEBUG_LIST at EL2
  KVM: arm64: Drop unused param from kvm_psci_version()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: pkvm: Implement CONFIG_DEBUG_LIST at EL2
Keir Fraser [Mon, 31 Jan 2022 12:40:53 +0000 (12:40 +0000)]
KVM: arm64: pkvm: Implement CONFIG_DEBUG_LIST at EL2

Currently the check functions are stubbed out at EL2. Implement
versions suitable for the constrained EL2 environment.

Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220131124114.3103337-1-keirf@google.com
2 years agoKVM: arm64: Drop unused param from kvm_psci_version()
Oliver Upton [Tue, 8 Feb 2022 01:27:05 +0000 (01:27 +0000)]
KVM: arm64: Drop unused param from kvm_psci_version()

kvm_psci_version() consumes a pointer to struct kvm in addition to a
vcpu pointer. Drop the kvm pointer as it is unused. While the comment
suggests the explicit kvm pointer was useful for calling from hyp, there
exist no such callsite in hyp.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220208012705.640444-1-oupton@google.com
2 years agoMerge branch kvm-arm64/selftest/vgic-5.18 into kvmarm-master/next
Marc Zyngier [Tue, 8 Feb 2022 15:20:16 +0000 (15:20 +0000)]
Merge branch kvm-arm64/selftest/vgic-5.18 into kvmarm-master/next

* kvm-arm64/selftest/vgic-5.18:
  : .
  : A bunch of selftest fixes, courtesy of Ricardo Koller
  : .
  kvm: selftests: aarch64: use a tighter assert in vgic_poke_irq()
  kvm: selftests: aarch64: fix some vgic related comments
  kvm: selftests: aarch64: fix the failure check in kvm_set_gsi_routing_irqchip_check
  kvm: selftests: aarch64: pass vgic_irq guest args as a pointer
  kvm: selftests: aarch64: fix assert in gicv3_access_reg

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agokvm: selftests: aarch64: use a tighter assert in vgic_poke_irq()
Ricardo Koller [Thu, 27 Jan 2022 03:08:58 +0000 (19:08 -0800)]
kvm: selftests: aarch64: use a tighter assert in vgic_poke_irq()

vgic_poke_irq() checks that the attr argument passed to the vgic device
ioctl is sane. Make this check tighter by moving it to after the last
attr update.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-6-ricarkol@google.com
2 years agokvm: selftests: aarch64: fix some vgic related comments
Ricardo Koller [Thu, 27 Jan 2022 03:08:57 +0000 (19:08 -0800)]
kvm: selftests: aarch64: fix some vgic related comments

Fix the formatting of some comments and the wording of one of them (in
gicv3_access_reg).

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-5-ricarkol@google.com
2 years agokvm: selftests: aarch64: fix the failure check in kvm_set_gsi_routing_irqchip_check
Ricardo Koller [Thu, 27 Jan 2022 03:08:56 +0000 (19:08 -0800)]
kvm: selftests: aarch64: fix the failure check in kvm_set_gsi_routing_irqchip_check

kvm_set_gsi_routing_irqchip_check(expect_failure=true) is used to check
the error code returned by the kernel when trying to setup an invalid
gsi routing table. The ioctl fails if "pin >= KVM_IRQCHIP_NUM_PINS", so
kvm_set_gsi_routing_irqchip_check() should test the error only when
"intid >= KVM_IRQCHIP_NUM_PINS+32". The issue is that the test check is
"intid >= KVM_IRQCHIP_NUM_PINS", so for a case like "intid =
KVM_IRQCHIP_NUM_PINS" the test wrongly assumes that the kernel will
return an error.  Fix this by using the right check.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-4-ricarkol@google.com
2 years agokvm: selftests: aarch64: pass vgic_irq guest args as a pointer
Ricardo Koller [Thu, 27 Jan 2022 03:08:55 +0000 (19:08 -0800)]
kvm: selftests: aarch64: pass vgic_irq guest args as a pointer

The guest in vgic_irq gets its arguments in a struct. This struct used
to fit nicely in a single register so vcpu_args_set() was able to pass
it by value by setting x0 with it. Unfortunately, this args struct grew
after some commits and some guest args became random (specically
kvm_supports_irqfd).

Fix this by passing the guest args as a pointer (after allocating some
guest memory for it).

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-3-ricarkol@google.com
2 years agokvm: selftests: aarch64: fix assert in gicv3_access_reg
Ricardo Koller [Thu, 27 Jan 2022 03:08:54 +0000 (19:08 -0800)]
kvm: selftests: aarch64: fix assert in gicv3_access_reg

The val argument in gicv3_access_reg can have any value when used for a
read, not necessarily 0.  Fix the assert by checking val only for
writes.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-2-ricarkol@google.com
2 years agoMerge branch kvm-arm64/vmid-allocator into kvmarm-master/next
Marc Zyngier [Tue, 8 Feb 2022 14:58:38 +0000 (14:58 +0000)]
Merge branch kvm-arm64/vmid-allocator into kvmarm-master/next

* kvm-arm64/vmid-allocator:
  : .
  : VMID allocation rewrite from Shameerali Kolothum Thodi, paving the
  : way for pinned VMIDs and SVA.
  : .
  KVM: arm64: Make active_vmids invalid on vCPU schedule out
  KVM: arm64: Align the VMID allocation with the arm64 ASID
  KVM: arm64: Make VMID bits accessible outside of allocator
  KVM: arm64: Introduce a new VMID allocator for KVM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: Make active_vmids invalid on vCPU schedule out
Shameer Kolothum [Mon, 22 Nov 2021 12:18:44 +0000 (12:18 +0000)]
KVM: arm64: Make active_vmids invalid on vCPU schedule out

Like ASID allocator, we copy the active_vmids into the
reserved_vmids on a rollover. But it's unlikely that
every CPU will have a vCPU as current task and we may
end up unnecessarily reserving the VMID space.

Hence, set active_vmids to an invalid one when scheduling
out a vCPU.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-5-shameerali.kolothum.thodi@huawei.com
2 years agoKVM: arm64: Align the VMID allocation with the arm64 ASID
Julien Grall [Mon, 22 Nov 2021 12:18:43 +0000 (12:18 +0000)]
KVM: arm64: Align the VMID allocation with the arm64 ASID

At the moment, the VMID algorithm will send an SGI to all the
CPUs to force an exit and then broadcast a full TLB flush and
I-Cache invalidation.

This patch uses the new VMID allocator. The benefits are:
   - Aligns with arm64 ASID algorithm.
   - CPUs are not forced to exit at roll-over. Instead,
     the VMID will be marked reserved and context invalidation
     is broadcasted. This will reduce the IPIs traffic.
   - More flexible to add support for pinned KVM VMIDs in
     the future.
   
With the new algo, the code is now adapted:
    - The call to update_vmid() will be done with preemption
      disabled as the new algo requires to store information
      per-CPU.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-4-shameerali.kolothum.thodi@huawei.com
2 years agoKVM: arm64: Make VMID bits accessible outside of allocator
Shameer Kolothum [Mon, 22 Nov 2021 12:18:42 +0000 (12:18 +0000)]
KVM: arm64: Make VMID bits accessible outside of allocator

Since we already set the kvm_arm_vmid_bits in the VMID allocator
init function, make it accessible outside as well so that it can
be used in the subsequent patch.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-3-shameerali.kolothum.thodi@huawei.com
2 years agoKVM: arm64: Introduce a new VMID allocator for KVM
Shameer Kolothum [Mon, 22 Nov 2021 12:18:41 +0000 (12:18 +0000)]
KVM: arm64: Introduce a new VMID allocator for KVM

A new VMID allocator for arm64 KVM use. This is based on
arm64 ASID allocator algorithm.

One major deviation from the ASID allocator is the way we
flush the context. Unlike ASID allocator, we expect less
frequent rollover in the case of VMIDs. Hence, instead of
marking the CPU as flush_pending and issuing a local context
invalidation on the next context switch, we  broadcast TLB
flush + I-cache invalidation over the inner shareable domain
on rollover.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-2-shameerali.kolothum.thodi@huawei.com
2 years agoMerge branch kvm-arm64/fpsimd-doc into kvmarm-master/next
Marc Zyngier [Tue, 8 Feb 2022 14:44:46 +0000 (14:44 +0000)]
Merge branch kvm-arm64/fpsimd-doc into kvmarm-master/next

* kvm-arm64/fpsimd-doc:
  : .
  : FPSIMD documentation update, courtesy of Mark Brown
  : .
  arm64/fpsimd: Clarify the purpose of using last in fpsimd_save()
  KVM: arm64: Add some more comments in kvm_hyp_handle_fpsimd()
  KVM: arm64: Add comments for context flush and sync callbacks

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoarm64/fpsimd: Clarify the purpose of using last in fpsimd_save()
Mark Brown [Mon, 24 Jan 2022 16:11:15 +0000 (16:11 +0000)]
arm64/fpsimd: Clarify the purpose of using last in fpsimd_save()

When saving the floating point context in fpsimd_save() we always reference
the state using last-> rather than using current->. Looking at the FP code
in isolation the reason for this is not entirely obvious, it's done because
when KVM is running it will bind the guest context and rely on the host
writing out the guest state on context switch away from the guest.

There's a slight trick here in that KVM still uses TIF_FOREIGN_FPSTATE and
TIF_SVE to communicate what needs to be saved, it maintains those flags
and restores them when it is done running the guest so that the normal
restore paths function when we return back to userspace.

Add a comment to explain this to help future readers work out what's going
on a bit faster.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220124161115.115200-1-broonie@kernel.org
2 years agoKVM: arm64: Add some more comments in kvm_hyp_handle_fpsimd()
Mark Brown [Mon, 24 Jan 2022 15:57:20 +0000 (15:57 +0000)]
KVM: arm64: Add some more comments in kvm_hyp_handle_fpsimd()

The handling for FPSIMD/SVE traps is multi stage and involves some trap
manipulation which isn't quite so immediately obvious as might be desired
so add a few more comments.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220124155720.3943374-3-broonie@kernel.org
2 years agoKVM: arm64: Add comments for context flush and sync callbacks
Mark Brown [Mon, 24 Jan 2022 15:57:19 +0000 (15:57 +0000)]
KVM: arm64: Add comments for context flush and sync callbacks

Add a little bit of information on where _ctxflush_fp() and _ctxsync_fp()
are called to help people unfamiliar with the code get up to speed.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220124155720.3943374-2-broonie@kernel.org
2 years agoMerge branch kvm-arm64/mmu-rwlock into kvmarm-master/next
Marc Zyngier [Tue, 8 Feb 2022 14:29:29 +0000 (14:29 +0000)]
Merge branch kvm-arm64/mmu-rwlock into kvmarm-master/next

* kvm-arm64/mmu-rwlock:
  : .
  : MMU locking optimisations from Jing Zhang, allowing permission
  : relaxations to occur in parallel.
  : .
  KVM: selftests: Add vgic initialization for dirty log perf test for ARM
  KVM: arm64: Add fast path to handle permission relaxation during dirty logging
  KVM: arm64: Use read/write spin lock for MMU protection

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: selftests: Add vgic initialization for dirty log perf test for ARM
Jing Zhang [Tue, 18 Jan 2022 01:57:03 +0000 (01:57 +0000)]
KVM: selftests: Add vgic initialization for dirty log perf test for ARM

For ARM64, if no vgic is setup before the dirty log perf test, the
userspace irqchip would be used, which would affect the dirty log perf
test result.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220118015703.3630552-4-jingzhangos@google.com
2 years agoKVM: arm64: Add fast path to handle permission relaxation during dirty logging
Jing Zhang [Tue, 18 Jan 2022 01:57:02 +0000 (01:57 +0000)]
KVM: arm64: Add fast path to handle permission relaxation during dirty logging

To reduce MMU lock contention during dirty logging, all permission
relaxation operations would be performed under read lock.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220118015703.3630552-3-jingzhangos@google.com
2 years agoKVM: arm64: Use read/write spin lock for MMU protection
Jing Zhang [Tue, 18 Jan 2022 01:57:01 +0000 (01:57 +0000)]
KVM: arm64: Use read/write spin lock for MMU protection

Replace MMU spinlock with rwlock and update all instances of the lock
being acquired with a write lock acquisition.
Future commit will add a fast path for permission relaxation during
dirty logging under a read lock.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220118015703.3630552-2-jingzhangos@google.com
2 years agoMerge branch kvm-arm64/oslock into kvmarm-master/next
Marc Zyngier [Tue, 8 Feb 2022 14:26:30 +0000 (14:26 +0000)]
Merge branch kvm-arm64/oslock into kvmarm-master/next

* kvm-arm64/oslock:
  : .
  : Debug OS-Lock emulation courtesy of Oliver Upton. From the cover letter:
  :
  : "KVM does not implement the debug architecture to the letter of the
  : specification. One such issue is the fact that KVM treats the OS Lock as
  : RAZ/WI, rather than emulating its behavior on hardware. This series adds
  : emulation support for the OS Lock to KVM. Emulation is warranted as the
  : OS Lock affects debug exceptions taken from all ELs, and is not limited
  : to only the context of the guest."
  : .
  selftests: KVM: Test OS lock behavior
  selftests: KVM: Add OSLSR_EL1 to the list of blessed regs
  KVM: arm64: Emulate the OS Lock
  KVM: arm64: Allow guest to set the OSLK bit
  KVM: arm64: Stash OSLSR_EL1 in the cpu context
  KVM: arm64: Correctly treat writes to OSLSR_EL1 as undefined

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoselftests: KVM: Test OS lock behavior
Oliver Upton [Thu, 3 Feb 2022 17:41:59 +0000 (17:41 +0000)]
selftests: KVM: Test OS lock behavior

KVM now correctly handles the OS Lock for its guests. When set, KVM
blocks all debug exceptions originating from the guest. Add test cases
to the debug-exceptions test to assert that software breakpoint,
hardware breakpoint, watchpoint, and single-step exceptions are in fact
blocked.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-7-oupton@google.com
2 years agoselftests: KVM: Add OSLSR_EL1 to the list of blessed regs
Oliver Upton [Thu, 3 Feb 2022 17:41:58 +0000 (17:41 +0000)]
selftests: KVM: Add OSLSR_EL1 to the list of blessed regs

OSLSR_EL1 is now part of the visible system register state. Add it to
the get-reg-list selftest to ensure we keep it that way.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-6-oupton@google.com
2 years agoKVM: arm64: Emulate the OS Lock
Oliver Upton [Thu, 3 Feb 2022 17:41:57 +0000 (17:41 +0000)]
KVM: arm64: Emulate the OS Lock

The OS lock blocks all debug exceptions at every EL. To date, KVM has
not implemented the OS lock for its guests, despite the fact that it is
mandatory per the architecture. Simple context switching between the
guest and host is not appropriate, as its effects are not constrained to
the guest context.

Emulate the OS Lock by clearing MDE and SS in MDSCR_EL1, thereby
blocking all but software breakpoint instructions.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-5-oupton@google.com
2 years agoKVM: arm64: Allow guest to set the OSLK bit
Oliver Upton [Thu, 3 Feb 2022 17:41:56 +0000 (17:41 +0000)]
KVM: arm64: Allow guest to set the OSLK bit

Allow writes to OSLAR and forward the OSLK bit to OSLSR. Do nothing with
the value for now.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-4-oupton@google.com
2 years agoKVM: arm64: Stash OSLSR_EL1 in the cpu context
Oliver Upton [Thu, 3 Feb 2022 17:41:55 +0000 (17:41 +0000)]
KVM: arm64: Stash OSLSR_EL1 in the cpu context

An upcoming change to KVM will emulate the OS Lock from the PoV of the
guest. Add OSLSR_EL1 to the cpu context and handle reads using the
stored value. Define some mnemonics for for handling the OSLM field and
use them to make the reset value of OSLSR_EL1 more readable.

Wire up a custom handler for writes from userspace and prevent any of
the invariant bits from changing. Note that the OSLK bit is not
invariant and will be made writable by the aforementioned change.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-3-oupton@google.com
2 years agoKVM: arm64: Correctly treat writes to OSLSR_EL1 as undefined
Oliver Upton [Thu, 3 Feb 2022 17:41:54 +0000 (17:41 +0000)]
KVM: arm64: Correctly treat writes to OSLSR_EL1 as undefined

Writes to OSLSR_EL1 are UNDEFINED and should never trap from EL1 to
EL2, but the kvm trap handler for OSLSR_EL1 handles writes via
ignore_write(). This is confusing to readers of code, but should have
no functional impact.

For clarity, use write_to_read_only() rather than ignore_write(). If a
trap is unexpectedly taken to EL2 in violation of the architecture, this
will WARN_ONCE() and inject an undef into the guest.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
[adopted Mark's changelog suggestion, thanks!]
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-2-oupton@google.com
2 years agoLinux 5.17-rc3
Linus Torvalds [Sun, 6 Feb 2022 20:20:50 +0000 (12:20 -0800)]
Linux 5.17-rc3

2 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 6 Feb 2022 18:34:45 +0000 (10:34 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Various bug fixes for ext4 fast commit and inline data handling.

  Also fix regression introduced as part of moving to the new mount API"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fs/ext4: fix comments mentioning i_mutex
  ext4: fix incorrect type issue during replay_del_range
  jbd2: fix kernel-doc descriptions for jbd2_journal_shrink_{scan,count}()
  ext4: fix potential NULL pointer dereference in ext4_fill_super()
  jbd2: refactor wait logic for transaction updates into a common function
  jbd2: cleanup unused functions declarations from jbd2.h
  ext4: fix error handling in ext4_fc_record_modified_inode()
  ext4: remove redundant max inline_size check in ext4_da_write_inline_data_begin()
  ext4: fix error handling in ext4_restore_inline_data()
  ext4: fast commit may miss file actions
  ext4: fast commit may not fallback for ineligible commit
  ext4: modify the logic of ext4_mb_new_blocks_simple
  ext4: prevent used blocks from being allocated during fast commit replay

2 years agoMerge tag 'perf-tools-fixes-for-v5.17-2022-02-06' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 6 Feb 2022 18:18:23 +0000 (10:18 -0800)]
Merge tag 'perf-tools-fixes-for-v5.17-2022-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix display of grouped aliased events in 'perf stat'.

 - Add missing branch_sample_type to perf_event_attr__fprintf().

 - Apply correct label to user/kernel symbols in branch mode.

 - Fix 'perf ftrace' system_wide tracing, it has to be set before
   creating the maps.

 - Return error if procfs isn't mounted for PID namespaces when
   synthesizing records for pre-existing processes.

 - Set error stream of objdump process for 'perf annotate' TUI, to avoid
   garbling the screen.

 - Add missing arm64 support to perf_mmap__read_self(), the kernel part
   got into 5.17.

 - Check for NULL pointer before dereference writing debug info about a
   sample.

 - Update UAPI copies for asound, perf_event, prctl and kvm headers.

 - Fix a typo in bpf_counter_cgroup.c.

* tag 'perf-tools-fixes-for-v5.17-2022-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf ftrace: system_wide collection is not effective by default
  libperf: Add arm64 support to perf_mmap__read_self()
  tools include UAPI: Sync sound/asound.h copy with the kernel sources
  perf stat: Fix display of grouped aliased events
  perf tools: Apply correct label to user/kernel symbols in branch mode
  perf bpf: Fix a typo in bpf_counter_cgroup.c
  perf synthetic-events: Return error if procfs isn't mounted for PID namespaces
  perf session: Check for NULL pointer before dereference
  perf annotate: Set error stream of objdump process for TUI
  perf tools: Add missing branch_sample_type to perf_event_attr__fprintf()
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  perf beauty: Make the prctl arg regexp more strict to cope with PR_SET_VMA
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  tools include UAPI: Sync sound/asound.h copy with the kernel sources

2 years agoMerge tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Feb 2022 18:11:14 +0000 (10:11 -0800)]
Merge tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Intel/PT: filters could crash the kernel

 - Intel: default disable the PMU for SMM, some new-ish EFI firmware has
   started using CPL3 and the PMU CPL filters don't discriminate against
   SMM, meaning that CPL3 (userspace only) events now also count EFI/SMM
   cycles.

 - Fixup for perf_event_attr::sig_data

* tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/pt: Fix crash with stop filters in single-range mode
  perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
  selftests/perf_events: Test modification of perf_event_attr::sig_data
  perf: Copy perf_event_attr::sig_data on modification
  x86/perf: Default set FREEZE_ON_SMI for all

2 years agoMerge tag 'objtool_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Feb 2022 18:04:43 +0000 (10:04 -0800)]
Merge tag 'objtool_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fix from Borislav Petkov:
 "Fix a potential truncated string warning triggered by gcc12"

* tag 'objtool_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix truncated string warning

2 years agoMerge tag 'irq_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Feb 2022 18:00:40 +0000 (10:00 -0800)]
Merge tag 'irq_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Borislav Petkov:
 "Remove a bogus warning introduced by the recent PCI MSI irq affinity
  overhaul"

* tag 'irq_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  PCI/MSI: Remove bogus warning in pci_irq_get_affinity()

2 years agoMerge tag 'edac_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Feb 2022 17:57:39 +0000 (09:57 -0800)]
Merge tag 'edac_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fixes from Borislav Petkov:
 "Fix altera and xgene EDAC drivers to propagate the correct error code
  from platform_get_irq() so that deferred probing still works"

* tag 'edac_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/xgene: Fix deferred probing
  EDAC/altera: Fix deferred probing

2 years agoperf ftrace: system_wide collection is not effective by default
Changbin Du [Thu, 27 Jan 2022 13:20:10 +0000 (21:20 +0800)]
perf ftrace: system_wide collection is not effective by default

The ftrace.target.system_wide must be set before invoking
evlist__create_maps(), otherwise it has no effect.

Fixes: e23b42448a386589 ("perf ftrace: Add 'latency' subcommand")
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Namhyung Kim <namhyung@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220127132010.4836-1-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agolibperf: Add arm64 support to perf_mmap__read_self()
Rob Herring [Tue, 1 Feb 2022 21:40:56 +0000 (15:40 -0600)]
libperf: Add arm64 support to perf_mmap__read_self()

Add the arm64 variants for read_perf_counter() and read_timestamp().
Unfortunately the counter number is encoded into the instruction, so the
code is a bit verbose to enumerate all possible counters.

Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220201214056.702854-1-robh@kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
2 years agotools include UAPI: Sync sound/asound.h copy with the kernel sources
Arnaldo Carvalho de Melo [Wed, 12 Feb 2020 14:04:23 +0000 (11:04 -0300)]
tools include UAPI: Sync sound/asound.h copy with the kernel sources

Picking the changes from:

  4842659e07f4d632 ("ASoC: hdmi-codec: Fix OOB memory accesses")

Which entails no changes in the tooling side as it doesn't introduce new
SNDRV_PCM_IOCTL_ ioctls.

To silence this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
  diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h

Cc: Dmitry Osipenko <digetx@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/lkml/Yf+6OT+2eMrYDEeX@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf stat: Fix display of grouped aliased events
Ian Rogers [Sat, 5 Feb 2022 01:09:41 +0000 (17:09 -0800)]
perf stat: Fix display of grouped aliased events

An event may have a number of uncore aliases that when added to the
evlist are consecutive.

If there are multiple uncore events in a group then
parse_events__set_leader_for_uncore_aliase will reorder the evlist so
that events on the same PMU are adjacent.

The collect_all_aliases function assumes that aliases are in blocks so
that only the first counter is printed and all others are marked merged.

The reordering for groups breaks the assumption and so all counts are
printed.

This change removes the assumption from collect_all_aliases
that the events are in blocks and instead processes the entire evlist.

Before:

  ```
  $ perf stat -e '{UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE,UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE},duration_time' -a -A -- sleep 1

   Performance counter stats for 'system wide':

  CPU0                  256,866      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 494,413      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      967      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,738      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  285,161      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 429,920      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      955      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,443      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  310,753      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 416,657      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,231      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,573      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  416,067      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 405,966      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,481      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,447      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  312,911      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 408,154      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,086      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,380      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  333,994      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 370,349      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,287      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,335      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  188,107      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 302,423      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      701      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,070      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  307,221      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 383,642      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,036      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,158      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  318,479      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 821,545      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,028      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   2,550      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  227,618      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 372,272      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      903      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,456      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  376,783      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 419,827      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,406      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,453      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  286,583      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 429,956      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      999      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,436      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  313,867      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 370,159      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,114      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,291      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  342,083      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 409,111      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,399      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,684      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  365,828      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 376,037      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,378      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,411      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  382,456      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 621,743      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,232      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,955      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  342,316      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 385,067      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,176      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,268      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  373,588      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 386,163      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,394      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,464      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  381,206      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 546,891      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,266      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,712      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  221,176      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 392,069      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      831      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,456      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  355,401      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 705,595      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,235      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   2,216      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  371,436      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 428,103      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,306      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,442      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  384,352      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 504,200      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,468      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,860      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  228,856      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 287,976      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      832      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,060      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  215,121      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 334,162      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      681      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,026      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  296,179      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 436,083      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,084      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,525      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  262,296      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 416,573      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      986      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,533      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  285,852      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 359,842      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,073      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,326      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  303,379      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 367,222      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,008      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,156      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  273,487      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 425,449      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                      932      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,367      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  297,596      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 414,793      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,140      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,601      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  342,365      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 360,422      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,291      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,342      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  327,196      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 580,858      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,122      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   2,014      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  296,564      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 452,817      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,087      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,694      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  375,002      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 389,393      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,478      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   1,540      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0                  365,213      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36                 594,685      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                    1,401      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                   2,222      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0            1,000,749,060 ns   duration_time

         1.000749060 seconds time elapsed
  ```

After:

  ```
   Performance counter stats for 'system wide':

  CPU0               20,547,434      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU36              45,202,862      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
  CPU0                   82,001      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU36                 159,688      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
  CPU0            1,000,464,828 ns   duration_time

         1.000464828 seconds time elapsed
  ```

Fixes: f3952a3dd8b48d9a ("perf parse-events: Handle uncore event aliases in small groups properly")
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Asaf Yaffe <asaf.yaffe@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220205010941.1065469-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf tools: Apply correct label to user/kernel symbols in branch mode
German Gomez [Wed, 26 Jan 2022 10:59:26 +0000 (10:59 +0000)]
perf tools: Apply correct label to user/kernel symbols in branch mode

In branch mode, the branch symbols were being displayed with incorrect
cpumode labels. So fix this.

For example, before:
  # perf record -b -a -- sleep 1
  # perf report -b

  Overhead  Command  Source Shared Object  Source Symbol               Target Symbol
     0.08%  swapper  [kernel.kallsyms]     [k] rcu_idle_enter          [k] cpuidle_enter_state
 ==> 0.08%  cmd0     [kernel.kallsyms]     [.] psi_group_change        [.] psi_group_change
     0.08%  cmd1     [kernel.kallsyms]     [k] psi_group_change        [k] psi_group_change

After:
  # perf report -b

  Overhead  Command  Source Shared Object  Source Symbol               Target Symbol
     0.08%  swapper  [kernel.kallsyms]     [k] rcu_idle_enter          [k] cpuidle_enter_state
     0.08%  cmd0     [kernel.kallsyms]     [k] psi_group_change        [k] pei_group_change
     0.08%  cmd1     [kernel.kallsyms]     [k] psi_group_change        [k] psi_group_change

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220126105927.3411216-1-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf bpf: Fix a typo in bpf_counter_cgroup.c
Masanari Iida [Sat, 25 Dec 2021 00:55:58 +0000 (09:55 +0900)]
perf bpf: Fix a typo in bpf_counter_cgroup.c

This patch fixes a spelling typo in error message.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211225005558.503935-1-standby24x7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf synthetic-events: Return error if procfs isn't mounted for PID namespaces
Leo Yan [Fri, 24 Dec 2021 12:40:13 +0000 (20:40 +0800)]
perf synthetic-events: Return error if procfs isn't mounted for PID namespaces

For perf recording, it retrieves process info by iterating nodes in proc
fs.  If we run perf in a non-root PID namespace with command:

  # unshare --fork --pid perf record -e cycles -a -- test_program

... in this case, unshare command creates a child PID namespace and
launches perf tool in it, but the issue is the proc fs is not mounted
for the non-root PID namespace, this leads to the perf tool gathering
process info from its parent PID namespace.

We can use below command to observe the process nodes under proc fs:

  # unshare --pid --fork ls /proc
1    137   1968  2128  3    342  48  62   78      crypto   kcore        net       uptime
10   138   2  2142  30   35  49  63   8      devices   keys        pagetypeinfo   version
11   139   20  2143  304  36  50  64   82      device-tree  key-users    partitions     vmallocinfo
12   14    2011  22    305  37  51  65   83      diskstats   kmsg        self       vmstat
128  140   2038  23    307  39  52  656  84      driver   kpagecgroup  slabinfo       zoneinfo
129  15    2074  24    309  4  53  67   9      execdomains  kpagecount   softirqs
13   16    2094  241   31   40  54  68   asound     fb   kpageflags   stat
130  164   2096  242   310  41  55  69   buddyinfo  filesystems  loadavg      swaps
131  17    2098  25    317  42  56  70   bus      fs   locks        sys
132  175   21  26    32   43  57  71   cgroups    interrupts   meminfo      sysrq-trigger
133  179   2102  263   329  44  58  75   cmdline    iomem   misc        sysvipc
134  1875  2103  27    330  45  59  76   config.gz  ioports   modules      thread-self
135  19    2117  29    333  46  6   77   consoles   irq   mounts       timer_list
136  1941  2121  298   34   47  60  773  cpuinfo    kallsyms   mtd        tty

So it shows many existed tasks, since unshared command has not mounted
the proc fs for the new created PID namespace, it still accesses the
proc fs of the root PID namespace.  This leads to two prominent issues:

- Firstly, PID values are mismatched between thread info and samples.
  The gathered thread info are coming from the proc fs of the root PID
  namespace, but samples record its PID from the child PID namespace.

- The second issue is profiled program 'test_program' returns its forked
  PID number from the child PID namespace, perf tool wrongly uses this
  PID number to retrieve the process info via the proc fs of the root
  PID namespace.

To avoid issues, we need to mount proc fs for the child PID namespace
with the option '--mount-proc' when use unshare command:

  # unshare --fork --pid --mount-proc perf record -e cycles -a -- test_program

Conversely, when the proc fs of the root PID namespace is used by child
namespace, perf tool can detect the multiple PID levels and
nsinfo__is_in_root_namespace() returns false, this patch reports error
for this case:

  # unshare --fork --pid perf record -e cycles -a -- test_program
  Couldn't synthesize bpf events.
  Perf runs in non-root PID namespace but it tries to gather process info from its parent PID namespace.
  Please mount the proc file system properly, e.g. add the option '--mount-proc' for unshare command.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20211224124014.2492751-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf session: Check for NULL pointer before dereference
Ameer Hamza [Tue, 25 Jan 2022 12:11:41 +0000 (17:11 +0500)]
perf session: Check for NULL pointer before dereference

Move NULL pointer check before dereferencing the variable.

Addresses-Coverity: 1497622 ("Derereference before null check")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Link: https://lore.kernel.org/r/20220125121141.18347-1-amhamza.mgc@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf annotate: Set error stream of objdump process for TUI
Namhyung Kim [Wed, 2 Feb 2022 07:08:25 +0000 (23:08 -0800)]
perf annotate: Set error stream of objdump process for TUI

The stderr should be set to a pipe when using TUI.  Otherwise it'd
print to stdout and break TUI windows with an error message.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220202070828.143303-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf tools: Add missing branch_sample_type to perf_event_attr__fprintf()
Anshuman Khandual [Wed, 2 Feb 2022 10:57:23 +0000 (16:27 +0530)]
perf tools: Add missing branch_sample_type to perf_event_attr__fprintf()

This updates branch sample type with missing PERF_SAMPLE_BRANCH_TYPE_SAVE.

Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/1643799443-15109-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agotools headers UAPI: Sync linux/kvm.h with the kernel sources
Arnaldo Carvalho de Melo [Sun, 9 May 2021 12:39:02 +0000 (09:39 -0300)]
tools headers UAPI: Sync linux/kvm.h with the kernel sources

To pick the changes in:

  484cd7daedf515bc ("kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.

This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/Yf+4k5Fs5Q3HdSG9@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoMerge remote-tracking branch 'torvalds/master' into perf/urgent
Arnaldo Carvalho de Melo [Sun, 6 Feb 2022 11:28:34 +0000 (08:28 -0300)]
Merge remote-tracking branch 'torvalds/master' into perf/urgent

To check if more kernel API sync is needed and also to see if the perf
build tests continue to pass.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoMerge tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 5 Feb 2022 18:40:17 +0000 (10:40 -0800)]
Merge tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - documentation fixes related to Xen

 - enable x2apic mode when available when running as hardware
   virtualized guest under Xen

 - cleanup and fix a corner case of vcpu enumeration when running a
   paravirtualized Xen guest

* tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/Xen: streamline (and fix) PV CPU enumeration
  xen: update missing ioctl magic numers documentation
  Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF
  xen: xenbus_dev.h: delete incorrect file name
  xen/x2apic: enable x2apic mode when supported for HVM

2 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 5 Feb 2022 17:55:59 +0000 (09:55 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - A couple of fixes when handling an exception while a SError has
     been delivered

   - Workaround for Cortex-A510's single-step erratum

  RISC-V:

   - Make CY, TM, and IR counters accessible in VU mode

   - Fix SBI implementation version

  x86:

   - Report deprecation of x87 features in supported CPUID

   - Preparation for fixing an interrupt delivery race on AMD hardware

   - Sparse fix

  All except POWER and s390:

   - Rework guest entry code to correctly mark noinstr areas and fix
     vtime' accounting (for x86, this was already mostly correct but not
     entirely; for ARM, MIPS and RISC-V it wasn't)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer
  KVM: x86: Report deprecated x87 features in supported CPUID
  KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
  KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
  KVM: arm64: Avoid consuming a stale esr value when SError occur
  RISC-V: KVM: Fix SBI implementation version
  RISC-V: KVM: make CY, TM, and IR counters accessible in VU mode
  kvm/riscv: rework guest entry logic
  kvm/arm64: rework guest entry logic
  kvm/x86: rework guest entry logic
  kvm/mips: rework guest entry logic
  kvm: add guest_state_{enter,exit}_irqoff()
  KVM: x86: Move delivery of non-APICv interrupt into vendor code
  kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h

2 years agoMerge tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 5 Feb 2022 17:21:55 +0000 (09:21 -0800)]
Merge tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "I was auditing operations in XFS that clear file privileges, and
  realized that XFS' fallocate implementation drops suid/sgid but
  doesn't clear file capabilities the same way that file writes and
  reflink do.

  There are VFS helpers that do it correctly, so refactor XFS to use
  them. I also noticed that we weren't flushing the log at the correct
  point in the fallocate operation, so that's fixed too.

  Summary:

   - Fix fallocate so that it drops all file privileges when files are
     modified instead of open-coding that incompletely.

   - Fix fallocate to flush the log if the caller wanted synchronous
     file updates"

* tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: ensure log flush at the end of a synchronous fallocate call
  xfs: move xfs_update_prealloc_flags() to xfs_pnfs.c
  xfs: set prealloc flag in xfs_alloc_file_space()
  xfs: fallocate() should call file_modified()
  xfs: remove XFS_PREALLOC_SYNC
  xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*

2 years agoMerge tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 5 Feb 2022 17:13:51 +0000 (09:13 -0800)]
Merge tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull vfs fixes from Darrick Wong:
 "I was auditing the sync_fs code paths recently and noticed that most
  callers of ->sync_fs ignore its return value (and many implementations
  never return nonzero even if the fs is broken!), which means that
  internal fs errors and corruption are not passed up to userspace
  callers of syncfs(2) or FIFREEZE. Hence fixing the common code and
  XFS, and I'll start working on the ext4/btrfs folks if this is merged.

  Summary:

   - Fix a bug where callers of ->sync_fs (e.g. sync_filesystem and
     syncfs(2)) ignore the return value.

   - Fix a bug where callers of sync_filesystem (e.g. fs freeze) ignore
     the return value.

   - Fix a bug in XFS where xfs_fs_sync_fs never passed back error
     returns"

* tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: return errors in xfs_fs_sync_fs
  quota: make dquot_quota_sync return errors from ->sync_fs
  vfs: make sync_filesystem return errors from ->sync_fs
  vfs: make freeze_super abort when sync_filesystem returns error

2 years agoMerge tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 5 Feb 2022 17:04:43 +0000 (09:04 -0800)]
Merge tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fix from Darrick Wong:
 "A single bugfix for iomap.

  The fix should eliminate occasional complaints about stall warnings
  when a lot of writeback IO completes all at once and we have to then
  go clearing status on a large number of folios.

  Summary:

   - Limit the length of ioend chains in writeback so that we don't trip
     the softlockup watchdog and to limit long tail latency on clearing
     PageWriteback"

* tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs, iomap: limit individual ioend chain lengths in writeback

2 years agoMerge tag 'kvmarm-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Sat, 5 Feb 2022 05:58:25 +0000 (00:58 -0500)]
Merge tag 'kvmarm-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.17, take #2

- A couple of fixes when handling an exception while a SError has been
  delivered

- Workaround for Cortex-A510's single-step[ erratum

2 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Sat, 5 Feb 2022 00:28:11 +0000 (16:28 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Some medium sized bugs in the various drivers. A couple are more
  recent regressions:

   - Fix two panics in hfi1 and two allocation problems

   - Send the IGMP to the correct address in cma

   - Squash a syzkaller bug related to races reading the multicast list

   - Memory leak in siw and cm

   - Fix a corner case spec compliance for HFI/QIB

   - Correct the implementation of fences in siw

   - Error unwind bug in mlx4"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/mlx4: Don't continue event handler after memory allocation failure
  RDMA/siw: Fix broken RDMA Read Fence/Resume logic.
  IB/rdmavt: Validate remote_addr during loopback atomic tests
  IB/cm: Release previously acquired reference counter in the cm_id_priv
  RDMA/siw: Fix refcounting leak in siw_create_qp()
  RDMA/ucma: Protect mc during concurrent multicast leaves
  RDMA/cma: Use correct address when leaving multicast group
  IB/hfi1: Fix tstats alloc and dealloc
  IB/hfi1: Fix AIP early init panic
  IB/hfi1: Fix alloc failure with larger txqueuelen
  IB/hfi1: Fix panic with larger ipoib send_queue_size

2 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 4 Feb 2022 23:27:45 +0000 (15:27 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Seven fixes, six of which are fairly obvious driver fixes.

  The one core change to the device budget depth is to try to ensure
  that if the default depth is large (which can produce quite a sizeable
  bitmap allocation per device), we give back the memory we don't need
  if there's a queue size reduction in slave_configure (which happens to
  a lot of devices)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal
  scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
  scsi: pm8001: Fix use-after-free for aborted TMF sas_task
  scsi: pm8001: Fix warning for undescribed param in process_one_iomb()
  scsi: core: Reallocate device's budget map on queue depth change
  scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
  scsi: pm80xx: Fix double completion for SATA devices

2 years agoMerge tag 'pci-v5.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Fri, 4 Feb 2022 23:22:35 +0000 (15:22 -0800)]
Merge tag 'pci-v5.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci fixes from Bjorn Helgaas:

 - Restructure j721e_pcie_probe() so we don't dereference a NULL pointer
   (Bjorn Helgaas)

 - Add a kirin_pcie_data struct to identify different Kirin variants to
   fix probe failure for controllers with an internal PHY (Bjorn
   Helgaas)

* tag 'pci-v5.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: kirin: Add dev struct for of_device_get_match_data()
  PCI: j721e: Initialize pcie->cdns_pcie before using it

2 years agoPCI: kirin: Add dev struct for of_device_get_match_data()
Bjorn Helgaas [Wed, 2 Feb 2022 15:52:41 +0000 (09:52 -0600)]
PCI: kirin: Add dev struct for of_device_get_match_data()

Bean reported that 652f33e8a03b ("PCI: kirin: Prefer
of_device_get_match_data()") broke kirin_pcie_probe() because it assumed
match data of 0 was a failure when in fact, it meant the match data was
"(void *)PCIE_KIRIN_INTERNAL_PHY".

Therefore, probing of "hisilicon,kirin960-pcie" devices failed with -EINVAL
and an "OF data missing" message.

Add a struct kirin_pcie_data to encode the PHY type.  Then the result of
of_device_get_match_data() should always be a non-NULL pointer to a struct
kirin_pcie_data that contains the PHY type.

Fixes: 652f33e8a03b ("PCI: kirin: Prefer of_device_get_match_data()")
Link: https://lore.kernel.org/r/20220202162659.GA12603@bhelgaas
Link: https://lore.kernel.org/r/20220201215941.1203155-1-huobean@gmail.com
Reported-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2 years agoMerge tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Fri, 4 Feb 2022 20:14:58 +0000 (12:14 -0800)]
Merge tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few fixes and error handling improvements:

   - fix deadlock between quota disable and qgroup rescan worker

   - fix use-after-free after failure to create a snapshot

   - skip warning on unmount after log cleanup failure

   - don't start transaction for scrub if the fs is mounted read-only

   - tree checker verifies item sizes"

* tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: skip reserved bytes warning on unmount after log cleanup failure
  btrfs: fix use of uninitialized variable at rm device ioctl
  btrfs: fix use-after-free after failure to create a snapshot
  btrfs: tree-checker: check item_size for dev_item
  btrfs: tree-checker: check item_size for inode_item
  btrfs: fix deadlock between quota disable and qgroup rescan worker
  btrfs: don't start transaction for scrub if the fs is mounted read-only

2 years agoMerge tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 4 Feb 2022 20:08:49 +0000 (12:08 -0800)]
Merge tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "Two fixes related to fsdax cleanup in this cycle and ztailpacking to
  fix small compressed data inlining. There is also a trivial cleanup to
  rearrange code for better reading.

  Summary:

   - fix fsdax partition offset misbehavior

   - clean up z_erofs_decompressqueue_work() declaration

   - fix up EOF lcluster inlining, especially for small compressed data"

* tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix small compressed files inlining
  erofs: avoid unnecessary z_erofs_decompressqueue_work() declaration
  erofs: fix fsdax partition offset handling

2 years agoMerge tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 4 Feb 2022 20:01:57 +0000 (12:01 -0800)]
Merge tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request
    - fix use-after-free in rdma and tcp controller reset (Sagi Grimberg)
    - fix the state check in nvmf_ctlr_matches_baseopts (Uday Shankar)

 - MD nowait null pointer fix (Song)

 - blk-integrity seed advance fix (Martin)

 - Fix a dio regression in this merge window (Ilya)

* tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block:
  block: bio-integrity: Advance seed correctly for larger interval sizes
  nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
  md: fix NULL pointer deref with nowait but no mddev->queue
  block: fix DIO handling regressions in blkdev_read_iter()
  nvme-rdma: fix possible use-after-free in transport error_recovery work
  nvme-tcp: fix possible use-after-free in transport error_recovery work
  nvme: fix a possible use-after-free in controller reset during load

2 years agoMerge tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Fri, 4 Feb 2022 19:52:37 +0000 (11:52 -0800)]
Merge tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ATA fixes from Damien Le Moal:

 - Sergey volunteered to be a reviewer for the Renesas R-Car SATA driver
   and PATA drivers. Update the MAINTAINERS file accordingly.

 - Regression fix: add a horkage flag to prevent accessing the log
   directory log page with SATADOM-ML 3ME SATA devices as they react
   badly to reading that log page (from Anton).

* tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage
  MAINTAINERS: add myself as Renesas R-Car SATA driver reviewer
  MAINTAINERS: add myself as PATA drivers reviewer

2 years agoMerge tag 'iommu-fixes-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 4 Feb 2022 19:45:16 +0000 (11:45 -0800)]
Merge tag 'iommu-fixes-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Warning fixes and a fix for a potential use-after-free in IOMMU core
   code

 - Another potential memory leak fix for the Intel VT-d driver

 - Fix for an IO polling loop timeout issue in the AMD IOMMU driver

* tag 'iommu-fixes-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
  iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
  iommu: Fix some W=1 warnings
  iommu: Fix potential use-after-free during probe

2 years agoMerge tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 4 Feb 2022 19:38:01 +0000 (11:38 -0800)]
Merge tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator fixes from Jason Donenfeld:
 "For this week, we have:

   - A fix to make more frequent use of hwgenerator randomness, from
     Dominik.

   - More cleanups to the boot initialization sequence, from Dominik.

   - A fix for an old shortcoming with the ZAP ioctl, from me.

   - A workaround for a still unfixed Clang CFI/FullLTO compiler bug,
     from me. On one hand, it's a bummer to commit workarounds for
     experimental compiler features that have bugs. But on the other, I
     think this actually improves the code somewhat, independent of the
     bug. So a win-win"

* tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: only call crng_finalize_init() for primary_crng
  random: access primary_pool directly rather than through pointer
  random: wake up /dev/random writers after zap
  random: continually use hwgenerator randomness
  lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI

2 years agoMerge tag 'acpi-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 4 Feb 2022 19:32:46 +0000 (11:32 -0800)]
Merge tag 'acpi-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Fix compilation in the case when ACPI is selected and CRC32, depended
  on by ACPI after recent changes, is not (Randy Dunlap)"

* tag 'acpi-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: require CRC32 to build

2 years agoMerge tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 4 Feb 2022 19:24:28 +0000 (11:24 -0800)]
Merge tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes.

  The major changes are ASoC core fixes, addressing the DPCM locking
  issue after the recent code changes and the potentially invalid
  register accesses via control API. Also, HD-audio got a core fix for
  Oops at dynamic unbinding.

  The rest are device-specific small fixes, including the usual stuff
  like HD-audio and USB-audio quirks"

* tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
  ALSA: hda: Skip codec shutdown in case the codec is not registered
  ALSA: usb-audio: Correct quirk for VF0770
  ALSA: Replace acpi_bus_get_device()
  Input: wm97xx: Simplify resource management
  ALSA: hda/realtek: Add quirk for ASUS GU603
  ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
  ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
  ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
  ALSA: hda: realtek: Fix race at concurrent COEF updates
  ASoC: ops: Check for negative values before reading them
  ASoC: rt5682: Fix deadlock on resume
  ASoC: hdmi-codec: Fix OOB memory accesses
  ASoC: soc-pcm: Move debugfs removal out of spinlock
  ASoC: soc-pcm: Fix DPCM lockdep warning due to nested stream locks
  ASoC: fsl: Add missing error handling in pcm030_fabric_probe
  ALSA: hda: Fix signedness of sscanf() arguments
  ALSA: usb-audio: initialize variables that could ignore errors
  ALSA: hda: Fix UAF of leds class devs at unbinding
  ASoC: qdsp6: q6apm-dai: only stop graphs that are started
  ASoC: codecs: wcd938x: fix return value of mixer put function
  ...

2 years agoMerge tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 4 Feb 2022 19:13:54 +0000 (11:13 -0800)]
Merge tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular fixes for the week. Daniel has agreed to bring back the fbcon
  hw acceleration under a CONFIG option for the non-drm fbdev users, we
  don't advise turning this on unless you are in the niche that is old
  fbdev drivers, Since it's essentially a revert and shouldn't be high
  impact seemed like a good time to do it now.

  Otherwise, i915 and amdgpu fixes are most of it, along with some minor
  fixes elsewhere.

  fbdev:
   - readd fbcon acceleration

  i915:
   - fix DP monitor via type-c dock
   - fix for engine busyness and read timeout with GuC
   - use ALLOW_FAIL for error capture buffer allocs
   - don't use interruptible lock on error paths
   - smatch fix to reject zero sized overlays.

  amdgpu:
   - mGPU fan boost fix for beige goby
   - S0ix fixes
   - Cyan skillfish hang fix
   - DCN fixes for DCN 3.1
   - DCN fixes for DCN 3.01
   - Apple retina panel fix
   - ttm logic inversion fix

  dma-buf:
   - heaps: fix potential spectre v1 gadget

  kmb:
   - fix potential oob access

  mxsfb:
   - fix NULL ptr deref

  nouveau:
   - fix potential oob access during BIOS decode"

* tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm: (24 commits)
  drm: mxsfb: Fix NULL pointer dereference
  drm/amdgpu: fix logic inversion in check
  drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
  drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
  drm/amd/display: revert "Reset fifo after enable otg"
  drm/amd/display: watermark latencies is not enough on DCN31
  drm/amd/display: Update watermark values for DCN301
  drm/amdgpu: fix a potential GPU hang on cyan skillfish
  drm/amd: Only run s3 or s0ix if system is configured properly
  drm/amd: add support to check whether the system is set to s3
  fbcon: Add option to enable legacy hardware acceleration
  Revert "fbcon: Disable accelerated scrolling"
  Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)"
  drm/i915/pmu: Fix KMD and GuC race on accessing busyness
  dma-buf: heaps: Fix potential spectre v1 gadget
  drm/amd: Warn users about potential s0ix problems
  drm/amd/pm: correct the MGpuFanBoost support for Beige Goby
  drm/nouveau: fix off by one in BIOS boundary checking
  drm/i915/adlp: Fix TypeC PHY-ready status readout
  drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference
  ...

2 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 4 Feb 2022 18:34:19 +0000 (10:34 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "10 patches.

  Subsystems affected by this patch series: ipc, MAINTAINERS, and mm
  (vmscan, debug, pagemap, kmemleak, and selftests)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
  MAINTAINERS: update rppt's email
  mm/kmemleak: avoid scanning potential huge holes
  ipc/sem: do not sleep with a spin lock held
  mm/pgtable: define pte_index so that preprocessor could recognize it
  mm/page_table_check: check entries at pmd levels
  mm/khugepaged: unify collapse pmd clear, flush and free
  mm/page_table_check: use unsigned long for page counters and cleanup
  mm/debug_vm_pgtable: remove pte entry from the page table
  Revert "mm/page_isolation: unset migratetype directly for non Buddy page"

2 years agorandom: only call crng_finalize_init() for primary_crng
Dominik Brodowski [Sun, 30 Jan 2022 21:03:20 +0000 (22:03 +0100)]
random: only call crng_finalize_init() for primary_crng

crng_finalize_init() returns instantly if it is called for another pool
than primary_crng. The test whether crng_finalize_init() is still required
can be moved to the relevant caller in crng_reseed(), and
crng_need_final_init can be reset to false if crng_finalize_init() is
called with workqueues ready. Then, no previous callsite will call
crng_finalize_init() unless it is needed, and we can get rid of the
superfluous function parameter.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2 years agorandom: access primary_pool directly rather than through pointer
Dominik Brodowski [Sun, 30 Jan 2022 21:03:19 +0000 (22:03 +0100)]
random: access primary_pool directly rather than through pointer

Both crng_initialize_primary() and crng_init_try_arch_early() are
only called for the primary_pool. Accessing it directly instead of
through a function parameter simplifies the code.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2 years agorandom: wake up /dev/random writers after zap
Jason A. Donenfeld [Fri, 28 Jan 2022 22:44:03 +0000 (23:44 +0100)]
random: wake up /dev/random writers after zap

When account() is called, and the amount of entropy dips below
random_write_wakeup_bits, we wake up the random writers, so that they
can write some more in. However, the RNDZAPENTCNT/RNDCLEARPOOL ioctl
sets the entropy count to zero -- a potential reduction just like
account() -- but does not unblock writers. This commit adds the missing
logic to that ioctl to unblock waiting writers.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2 years agorandom: continually use hwgenerator randomness
Dominik Brodowski [Tue, 25 Jan 2022 20:14:57 +0000 (21:14 +0100)]
random: continually use hwgenerator randomness

The rngd kernel thread may sleep indefinitely if the entropy count is
kept above random_write_wakeup_bits by other entropy sources. To make
best use of multiple sources of randomness, mix entropy from hardware
RNGs into the pool at least once within CRNG_RESEED_INTERVAL.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2 years agolib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
Jason A. Donenfeld [Wed, 19 Jan 2022 13:35:06 +0000 (14:35 +0100)]
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI

blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".

[    0.000000][    T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[    0.000000][    T0] Hardware name: MT6873 (DT)
[    0.000000][    T0] Call trace:
[    0.000000][    T0]  dump_backtrace+0xfc/0x1dc
[    0.000000][    T0]  dump_stack_lvl+0xa8/0x11c
[    0.000000][    T0]  panic+0x194/0x464
[    0.000000][    T0]  __cfi_check_fail+0x54/0x58
[    0.000000][    T0]  __cfi_slowpath_diag+0x354/0x4b0
[    0.000000][    T0]  blake2s_update+0x14c/0x178
[    0.000000][    T0]  _extract_entropy+0xf4/0x29c
[    0.000000][    T0]  crng_initialize_primary+0x24/0x94
[    0.000000][    T0]  rand_initialize+0x2c/0x6c
[    0.000000][    T0]  start_kernel+0x2f8/0x65c
[    0.000000][    T0]  __primary_switched+0xc4/0x7be4
[    0.000000][    T0] Rebooting in 5 seconds..

Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.

In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.

Fixes: c63117175cce ("lib/crypto: blake2s: include as built-in")
Link: https://github.com/ClangBuiltLinux/linux/issues/1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2 years agoMerge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client
Linus Torvalds [Fri, 4 Feb 2022 17:54:02 +0000 (09:54 -0800)]
Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A patch to make it possible to disable zero copy path in the messenger
  to avoid checksum or authentication tag mismatches and ensuing session
  resets in case the destination buffer isn't guaranteed to be stable"

* tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client:
  libceph: optionally use bounce buffer on recv path in crc mode
  libceph: make recv path in secure mode work the same as send path

2 years agoMerge tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux
Linus Torvalds [Fri, 4 Feb 2022 17:44:42 +0000 (09:44 -0800)]
Merge tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux

Pull 9p fix from Dominique Martinet:
 "Fix 'cannot walk open fid' rule

  The 9p 'walk' operation requires fid arguments to not originate from
  an open or create call and we've missed that for a while as the
  servers regularly running tests with don't enforce the check and no
  active reviewer knew about the rule.

  Both reporters confirmed reverting this patch fixes things for them
  and looking at it further wasn't actually required... Will take more
  time for follow up and enforcing the rule more thoroughly later"

* tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux:
  Revert "fs/9p: search open fids first"

2 years agoMerge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 4 Feb 2022 17:34:37 +0000 (09:34 -0800)]
Merge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "SMB3 client fixes including:

   - multiple fscache related fixes, reenabling ability to read/write to
     cached files for cifs.ko (that was temporarily disabled for cifs.ko
     a few weeks ago due to the recent fscache changes)

   - also includes a new fscache helper function ("query_occupancy")
     used by above

   - fix for multiuser mounts and NTLMSSP auth (workstation name) for
     stable

   - fix locking ordering problem in multichannel code

   - trivial malformed comment fix"

* tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix workstation_name for multiuser mounts
  Invalidate fscache cookie only when inode attributes are changed.
  cifs: Fix the readahead conversion to manage the batch when reading from cache
  cifs: Implement cache I/O by accessing the cache directly
  netfs, cachefiles: Add a method to query presence of data in the cache
  cifs: Transition from ->readpages() to ->readahead()
  cifs: unlock chan_lock before calling cifs_put_tcp_session
  Fix a warning about a malformed kernel doc comment in cifs

2 years agokselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make...
Shuah Khan [Fri, 4 Feb 2022 04:49:45 +0000 (20:49 -0800)]
kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"

With this change, userfaultfd fails to build with undefined reference
swap() error:

  userfaultfd.c: In function `userfaultfd_stress':
  userfaultfd.c:1530:17: warning: implicit declaration of function `swap'; did you mean `swab'? [-Wimplicit-function-declaration]
   1530 |                 swap(area_src, area_dst);
        |                 ^~~~
        |                 swab
  /usr/bin/ld: /tmp/ccDGOAdV.o: in function `userfaultfd_stress':
  userfaultfd.c:(.text+0x549e): undefined reference to `swap'
  /usr/bin/ld: userfaultfd.c:(.text+0x54bc): undefined reference to `swap'
  collect2: error: ld returned 1 exit status

Revert the commit to fix the problem.

Link: https://lkml.kernel.org/r/20220202003340.87195-1-skhan@linuxfoundation.org
Fixes: ccdc379c9e32 ("tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner")
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMAINTAINERS: update rppt's email
Mike Rapoport [Fri, 4 Feb 2022 04:49:41 +0000 (20:49 -0800)]
MAINTAINERS: update rppt's email

Use my @kernel.org address

Link: https://lkml.kernel.org/r/20220203090324.3701774-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm/kmemleak: avoid scanning potential huge holes
Lang Yu [Fri, 4 Feb 2022 04:49:37 +0000 (20:49 -0800)]
mm/kmemleak: avoid scanning potential huge holes

When using devm_request_free_mem_region() and devm_memremap_pages() to
add ZONE_DEVICE memory, if requested free mem region's end pfn were
huge(e.g., 0x400000000), the node_end_pfn() will be also huge (see
move_pfn_range_to_zone()).  Thus it creates a huge hole between
node_start_pfn() and node_end_pfn().

We found on some AMD APUs, amdkfd requested such a free mem region and
created a huge hole.  In such a case, following code snippet was just
doing busy test_bit() looping on the huge hole.

  for (pfn = start_pfn; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_online_page(pfn);
if (!page)
continue;
...
  }

So we got a soft lockup:

  watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
  CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
  RIP: 0010:pfn_to_online_page+0x5/0xd0
  Call Trace:
    ? kmemleak_scan+0x16a/0x440
    kmemleak_write+0x306/0x3a0
    ? common_file_perm+0x72/0x170
    full_proxy_write+0x5c/0x90
    vfs_write+0xb9/0x260
    ksys_write+0x67/0xe0
    __x64_sys_write+0x1a/0x20
    do_syscall_64+0x3b/0xc0
    entry_SYSCALL_64_after_hwframe+0x44/0xae

I did some tests with the patch.

(1) amdgpu module unloaded

before the patch:

  real    0m0.976s
  user    0m0.000s
  sys     0m0.968s

after the patch:

  real    0m0.981s
  user    0m0.000s
  sys     0m0.973s

(2) amdgpu module loaded

before the patch:

  real    0m35.365s
  user    0m0.000s
  sys     0m35.354s

after the patch:

  real    0m1.049s
  user    0m0.000s
  sys     0m1.042s

Link: https://lkml.kernel.org/r/20211108140029.721144-1-lang.yu@amd.com
Signed-off-by: Lang Yu <lang.yu@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoipc/sem: do not sleep with a spin lock held
Minghao Chi [Fri, 4 Feb 2022 04:49:33 +0000 (20:49 -0800)]
ipc/sem: do not sleep with a spin lock held

We can't call kvfree() with a spin lock held, so defer it.

Link: https://lkml.kernel.org/r/20211223031207.556189-1-chi.minghao@zte.com.cn
Fixes: d45f0732e00e ("[PATCH] ipc sem: use kvmalloc for sem_undo allocation")
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Yang Guang <cgel.zte@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm/pgtable: define pte_index so that preprocessor could recognize it
Mike Rapoport [Fri, 4 Feb 2022 04:49:29 +0000 (20:49 -0800)]
mm/pgtable: define pte_index so that preprocessor could recognize it

Since commit c80f5fc1c03c ("mm: consolidate pte_index() and
pte_offset_*() definitions") pte_index is a static inline and there is
no define for it that can be recognized by the preprocessor.  As a
result, vm_insert_pages() uses slower loop over vm_insert_page() instead
of insert_pages() that amortizes the cost of spinlock operations when
inserting multiple pages.

Link: https://lkml.kernel.org/r/20220111145457.20748-1-rppt@kernel.org
Fixes: c80f5fc1c03c ("mm: consolidate pte_index() and pte_offset_*() definitions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Christian Dietrich <stettberger@dokucode.de>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm/page_table_check: check entries at pmd levels
Pasha Tatashin [Fri, 4 Feb 2022 04:49:24 +0000 (20:49 -0800)]
mm/page_table_check: check entries at pmd levels

syzbot detected a case where the page table counters were not properly
updated.

  syzkaller login:  ------------[ cut here ]------------
  kernel BUG at mm/page_table_check.c:162!
  invalid opcode: 0000 [#1] PREEMPT SMP KASAN
  CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4
  RIP: 0010:__page_table_check_zero+0x159/0x1a0
  Call Trace:
   free_pcp_prepare+0x3be/0xaa0
   free_unref_page+0x1c/0x650
   free_compound_page+0xec/0x130
   free_transhuge_page+0x1be/0x260
   __put_compound_page+0x90/0xd0
   release_pages+0x54c/0x1060
   __pagevec_release+0x7c/0x110
   shmem_undo_range+0x85e/0x1250
  ...

The repro involved having a huge page that is split due to uprobe event
temporarily replacing one of the pages in the huge page.  Later the huge
page was combined again, but the counters were off, as the PTE level was
not properly updated.

Make sure that when PMD is cleared and prior to freeing the level the
PTEs are updated.

Link: https://lkml.kernel.org/r/20220131203249.2832273-5-pasha.tatashin@soleen.com
Fixes: e192db2bec49 ("mm: page table check")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm/khugepaged: unify collapse pmd clear, flush and free
Pasha Tatashin [Fri, 4 Feb 2022 04:49:20 +0000 (20:49 -0800)]
mm/khugepaged: unify collapse pmd clear, flush and free

Unify the code that flushes, clears pmd entry, and frees the PTE table
level into a new function collapse_and_free_pmd().

This cleanup is useful as in the next patch we will add another call to
this function to iterate through PTE prior to freeing the level for page
table check.

Link: https://lkml.kernel.org/r/20220131203249.2832273-4-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm/page_table_check: use unsigned long for page counters and cleanup
Pasha Tatashin [Fri, 4 Feb 2022 04:49:15 +0000 (20:49 -0800)]
mm/page_table_check: use unsigned long for page counters and cleanup

For consistency, use "unsigned long" for all page counters.

Also, reduce code duplication by calling __page_table_check_*_clear()
from __page_table_check_*_set() functions.

Link: https://lkml.kernel.org/r/20220131203249.2832273-3-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Wei Xu <weixugc@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>