Helge Deller [Thu, 2 Jun 2022 11:50:44 +0000 (13:50 +0200)]
parisc/stifb: Implement fb_is_primary_device()
Implement fb_is_primary_device() function, so that fbcon detects if this
framebuffer belongs to the default graphics card which was used to start
the system.
Mikulas Patocka [Wed, 1 Jun 2022 17:18:22 +0000 (13:18 -0400)]
parisc: fix a crash with multicore scheduler
With the kernel 5.18, the system will hang on boot if it is compiled with
CONFIG_SCHED_MC. The last printed message is "Brought up 1 node, 1 CPU".
The crash happens in sd_init
tl->mask (which is cpu_coregroup_mask) returns an empty mask. This happens
because cpu_topology[0].core_sibling is empty.
Consequently, sd_span is set to an empty mask
sd_id = cpumask_first(sd_span) sets sd_id == NR_CPUS (because the mask is
empty)
sd->shared = *per_cpu_ptr(sdd->sds, sd_id); sets sd->shared to NULL
because sd_id is out of range
atomic_inc(&sd->shared->ref); crashes without printing anything
We can fix it by calling reset_cpu_topology() from init_cpu_topology() -
this will initialize the sibling masks on CPUs, so that they're not empty.
This patch also removes the variable "dualcores_found", it is useless,
because during boot, init_cpu_topology is called before
store_cpu_topology. Thus, set_sched_topology(parisc_mc_topology) is never
called. We don't need to call it at all because default_topology in
kernel/sched/topology.c contains the same items as parisc_mc_topology.
Note that we should not call store_cpu_topology() from init_per_cpu()
because it is called too early in the kernel initialization process and it
results in the message "Failure to register CPU0 device". Before this
patch, store_cpu_topology() would exit immediatelly because
cpuid_topo->core id was uninitialized and it was 0.
David Howells [Sat, 21 May 2022 07:18:28 +0000 (08:18 +0100)]
afs: Fix afs_getattr() to refetch file status if callback break occurred
If a callback break occurs (change notification), afs_getattr() needs to
issue an FS.FetchStatus RPC operation to update the status of the file
being examined by the stat-family of system calls.
Fix afs_getattr() to do this if AFS_VNODE_CB_PROMISED has been cleared
on a vnode by a callback break. Skip this if AT_STATX_DONT_SYNC is set.
This can be tested by appending to a file on one AFS client and then
using "stat -L" to examine its length on a machine running kafs. This
can also be watched through tracing on the kafs machine. The callback
break is seen:
Linus Torvalds [Sun, 22 May 2022 18:04:38 +0000 (08:04 -1000)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some I2C driver bugfixes for 5.18. Nothing spectacular but worth
fixing"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging
i2c: mt7621: fix missing clk_disable_unprepare() on error in mtk_i2c_probe()
Linus Torvalds [Sun, 22 May 2022 00:14:02 +0000 (14:14 -1000)]
Merge tag 'perf-tools-fixes-for-v5.18-2022-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT events
in 'perf stat'.
- Fix x86's arch__intr_reg_mask() for the hybrid platform.
- Address 'perf bench numa' compiler error on s390.
- Fix check for btf__load_from_kernel_by_id() in libbpf.
- Fix "all PMU test" 'perf test' to skip hv_24x7/hv_gpci tests on
powerpc.
- Fix session topology test to skip the test in guest environment.
- Skip BPF 'perf test' if clang is not present.
- Avoid shell test description infinite loop in 'perf test'.
- Fix Intel LBR callstack entries and nr print message.
* tag 'perf-tools-fixes-for-v5.18-2022-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf session: Fix Intel LBR callstack entries and nr print message
perf test bpf: Skip test if clang is not present
perf test session topology: Fix test to skip the test in guest environment
perf bench numa: Address compiler error on s390
perf test: Avoid shell test description infinite loop
perf regs x86: Fix arch__intr_reg_mask() for the hybrid platform
perf test: Fix "all PMU test" to skip hv_24x7/hv_gpci tests on powerpc
perf stat: Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT events
perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
Linus Torvalds [Sat, 21 May 2022 23:58:43 +0000 (13:58 -1000)]
Merge tag 'input-for-v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A small fixup to ili210x touchscreen driver, and updated maintainer
entry for the device tree binding of Mediatek 6779 keypad:
- fix reset timing of Ilitek touchscreens
- update maintainer entry of DT binding of Mediatek 6779 keypad"
* tag 'input-for-v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ili210x - use one common reset implementation
Input: ili210x - fix reset timing
dt-bindings: input: mediatek,mt6779-keypad: update maintainer
Linus Torvalds [Sat, 21 May 2022 23:31:50 +0000 (13:31 -1000)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two patches, both in drivers.
The iscsi one is fixing the cpumask issue you commented on and the ufs
one is a late arriving fix for conditions that can occur in Host
Performance Booster reads"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix referencing invalid rsp field
scsi: target: Fix incorrect use of cpumask_t
Chengdong Li [Tue, 17 May 2022 01:57:26 +0000 (09:57 +0800)]
perf session: Fix Intel LBR callstack entries and nr print message
When generating callstack information from branch_stack(Intel LBR), the
actual number of callstack entry should be bigger than the number of
branch_stack, for example:
branch_stack records:
B() -> C()
A() -> B()
converted callstack records should be:
C()
B()
A()
though, the number of callstack equals
to the number of branch stack plus 1.
This patch fixes above issue in branch_stack__printf(). For example,
# echo 'scale=2000; 4*a(1)' > cmd
# perf record --call-graph lbr bc -l < cmd
Before applying this patch, `perf script -D` output:
Enabling verbose option provided debug logs which says clang/llvm needs
to be installed. Snippet of verbose logs:
<<>>
42.2: BPF pinning :
--- start ---
test child forked, pid 61423
ERROR: unable to find clang.
Hint: Try to install latest clang/llvm to support BPF.
Check your $PATH
<<logs_here>>
Failed to compile test case: 'Basic BPF llvm compile'
Unable to get BPF object, fix kbuild first
test child finished with -1
---- end ----
BPF filter subtest 2: FAILED!
<<>>
Here subtests, "BPF pinning" and "BPF prologue generation" failed and
logs shows clang/llvm is needed. After installing clang, testcase
passes.
Reason on why subtest failure happens though logs has proper debug
information:
Main function __test__bpf calls test_llvm__fetch_bpf_obj by
passing 4th argument as true ( 4th arguments maps to parameter
"force" in test_llvm__fetch_bpf_obj ). But this will cause
test_llvm__fetch_bpf_obj to skip the check for clang/llvm.
Snippet of code part which checks for clang based on
parameter "force" in test_llvm__fetch_bpf_obj:
<<>>
if (!force && (!llvm_param.user_set_param &&
<<>>
Since force is set to "false", test won't get skipped and fails to
compile test case. The BPF code compilation needs clang, So pass the
fourth argument as "false" and also skip the test if reason for return
is "TEST_SKIP"
Athira Rajeev [Wed, 11 May 2022 11:49:59 +0000 (17:19 +0530)]
perf test session topology: Fix test to skip the test in guest environment
The session topology test fails in powerpc pSeries platform.
Test logs:
<<>>
Session topology : FAILED!
<<>>
This testcases tests cpu topology by checking the core_id and socket_id
stored in perf_env from perf session. The data from perf session is
compared with the cpu topology information from
"/sys/devices/system/cpu/cpuX/topology" like core_id,
physical_package_id.
In case of virtual environment, detail like physical_package_id is
restricted to be exposed. Hence physical_package_id is set to -1. The
testcase fails on such platforms since socket_id can't be fetched from
topology info.
Skip the testcase in powerpc if physical_package_id returns -1.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>--- Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220511114959.84002-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Fri, 20 May 2022 08:11:58 +0000 (10:11 +0200)]
perf bench numa: Address compiler error on s390
The compilation on s390 results in this error:
# make DEBUG=y bench/numa.o
...
bench/numa.c: In function ‘__bench_numa’:
bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated
writing between 1 and 11 bytes into a region of size between
10 and 20 [-Werror=format-truncation=]
1749 | snprintf(tname, sizeof(tname), "process%d:thread%d", p, t);
^~
...
bench/numa.c:1749:64: note: directive argument in the range
[-2147483647, 2147483646]
...
#
The maximum length of the %d replacement is 11 characters because of the
negative sign. Therefore extend the array by two more characters.
Output after:
# make DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o
-rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o
#
Fixes: 2ec394d998b083ca ("perf bench numa: Avoid possible truncation when using snprintf()") Suggested-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 17 May 2022 20:41:44 +0000 (13:41 -0700)]
perf test: Avoid shell test description infinite loop
for_each_shell_test() is already strict in expecting tests to be files
and executable. It is sometimes possible when it iterates over all files
that it finds one that is executable and lacks a newline character. When
this happens the loop never terminates as it doesn't check for EOF.
Add the EOF check to make this loop at least bounded by the file size.
If the description is returned as NULL then also skip the test.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220517204144.645913-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Wed, 18 May 2022 14:51:25 +0000 (07:51 -0700)]
perf regs x86: Fix arch__intr_reg_mask() for the hybrid platform
The X86 specific arch__intr_reg_mask() is to check whether the kernel
and hardware can collect XMM registers. But it doesn't work on some
hybrid platform.
Without the patch on ADL-N:
$ perf record -I?
available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
R11 R12 R13 R14 R15
The config of the test event doesn't contain the PMU information. The
kernel may fail to initialize it on the correct hybrid PMU and return
the wrong non-supported information.
Add the PMU information into the config for the hybrid platform. The
same register set is supported among different hybrid PMUs. Checking
the first available one is good enough.
With the patch on ADL-N:
$ perf record -I?
available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
R11 R12 R13 R14 R15 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9
XMM10 XMM11 XMM12 XMM13 XMM14 XMM15
Fixes: 415197a3373aa2a6 ("perf regs x86: Add X86 specific arch__intr_reg_mask()") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220518145125.1494156-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Athira Rajeev [Fri, 20 May 2022 10:12:36 +0000 (15:42 +0530)]
perf test: Fix "all PMU test" to skip hv_24x7/hv_gpci tests on powerpc
"perf all PMU test" picks the input events from "perf list --raw-dump
pmu" list and runs "perf stat -e" for each of the event in the list. In
case of powerpc, the PowerVM environment supports events from hv_24x7
and hv_gpci PMU which is of example format like below:
The value for "?" needs to be filled in depending on system and
respective event. CPM_ADJUNCT_INST needs have core value and domain
value. hv_gpci event needs partition_id. Similarly, there are other
events for hv_24x7 and hv_gpci having "?" in event format. Hence skip
these events on powerpc platform since values like partition_id, domain
is specific to system and event.
Fixes: b409bf4543c6f814 ("perf test: Workload test of all PMUs") Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Link: https://lore.kernel.org/r/20220520101236.17249-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mika Westerberg [Wed, 27 Apr 2022 10:19:10 +0000 (13:19 +0300)]
i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging
Before sending a MSI the hardware writes information pertinent to the
interrupt cause to a memory location pointed by SMTICL register. This
memory holds three double words where the least significant bit tells
whether the interrupt cause of master/target/error is valid. The driver
does not use this but we need to set it up because otherwise it will
perform DMA write to the default address (0) and this will cause an
IOMMU fault such as below:
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Write] Request device [00:12.0] PASID ffffffff fault addr 0
[fault reason 05] PTE Write access is not set
To prevent this from happening, provide a proper DMA buffer for this
that then gets mapped by the IOMMU accordingly.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: From: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Linus Torvalds [Sat, 21 May 2022 06:34:59 +0000 (20:34 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Correctly expose GICv3 support even if no irqchip is created so
that userspace doesn't observe it changing pointlessly (fixing a
regression with QEMU)
- Don't issue a hypercall to set the id-mapped vectors when protected
mode is enabled (fix for pKVM in combination with CPUs affected by
Spectre-v3a)
x86 (five oneliners, of which the most interesting two are):
- a NULL pointer dereference on INVPCID executed with paging
disabled, but only if KVM is using shadow paging
- an incorrect bsearch comparison function which could truncate the
result and apply PMU event filtering incorrectly. This one comes
with a selftests update too"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID
KVM: x86: hyper-v: fix type of valid_bank_mask
KVM: Free new dirty bitmap if creating a new memslot fails
KVM: eventfd: Fix false positive RCU usage warning
selftests: kvm/x86: Verify the pmu event filter matches the correct event
selftests: kvm/x86: Add the helper function create_pmu_event_filter
kvm: x86/pmu: Fix the compare function used by the pmu event filter
KVM: arm64: Don't hypercall before EL2 init
KVM: arm64: vgic-v3: Consistently populate ID_AA64PFR0_EL1.GIC
KVM: x86/mmu: Update number of zapped pages even if page list is stable
Linus Torvalds [Sat, 21 May 2022 05:07:28 +0000 (19:07 -1000)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Three clk driver fixes to close out the release
- Fix a divider calculation breaking boot on Broadcom bcm2835
- Fix HDMI output on Tanix TX6 mini board by reverting a patch
- Fix clk_set_rate_range() calls on at91 by considering the range
while calculating the divisor"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: at91: generated: consider range when calculating best rate
Revert "clk: sunxi-ng: sun6i-rtc: Add support for H6"
clk: bcm2835: fix bcm2835_clock_choose_div
Linus Torvalds [Sat, 21 May 2022 04:58:37 +0000 (18:58 -1000)]
Merge tag 'drm-fixes-2022-05-21' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Few final fixes for 5.18, one amdgpu, core dp mst leak fix, dma-buf
two fixes, and i915 has a few fixes, one for a regression on older
GM45 chipsets,
dma-buf:
- ioctl userspace use fix
- fix dma-buf sysfs name generation
core:
- dp/mst leak fix
amdgpu:
- suspend/resume regression fix
i915:
- fix for #5806: GPU hangs and display artifacts on Intel GM45
- reject DMC with out-of-spec MMIO
- correctly mark guilty contexts on GuC reset"
* tag 'drm-fixes-2022-05-21' of git://anongit.freedesktop.org/drm/drm:
drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomap
drm/amd: Don't reset dGPUs if the system is going to s2idle
drm/dp/mst: fix a possible memory leak in fetch_monitor_name()
dma-buf: fix use of DMA_BUF_SET_NAME_{A,B} in userspace
i915/guc/reset: Make __guc_reset_context aware of guilty engines
drm/i915/dmc: Add MMIO range restrictions
dma-buf: ensure unique directory name for dmabuf stats
Dave Airlie [Fri, 20 May 2022 20:00:48 +0000 (06:00 +1000)]
Merge tag 'drm-intel-fixes-2022-05-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- fix for #5806: GPU hangs and display artifacts on 5.18-rc3 on Intel GM45
- reject DMC with out-of-spec MMIO (Cc: stable)
- correctly mark guilty contexts on GuC reset.
Peter Zijlstra [Fri, 20 May 2022 18:38:06 +0000 (20:38 +0200)]
perf: Fix sys_perf_event_open() race against self
Norbert reported that it's possible to race sys_perf_event_open() such
that the looser ends up in another context from the group leader,
triggering many WARNs.
The move_group case checks for races against itself, but the
!move_group case doesn't, seemingly relying on the previous
group_leader->ctx == ctx check. However, that check is racy due to not
holding any locks at that time.
Therefore, re-check the result after acquiring locks and bailing
if they no longer match.
Additionally, clarify the not_move_group case from the
move_group-vs-move_group race.
Linus Torvalds [Fri, 20 May 2022 18:26:28 +0000 (08:26 -1000)]
Merge tag 'gpio-fixes-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix bitops logic in gpio-vf610
- return an error if the user tries to use inverted polarity in
gpio-mvebu
* tag 'gpio-fixes-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: mvebu/pwm: Refuse requests with inverted polarity
gpio: gpio-vf610: do not touch other bits when set the target bit
Linus Torvalds [Fri, 20 May 2022 18:15:40 +0000 (08:15 -1000)]
Merge tag 'ceph-for-5.18-rc8' of https://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a nasty use-after-free, marked for stable"
* tag 'ceph-for-5.18-rc8' of https://github.com/ceph/ceph-client:
libceph: fix misleading ceph_osdc_cancel_request() comment
libceph: fix potential use-after-free on linger ping and resends
Linus Torvalds [Fri, 20 May 2022 18:09:00 +0000 (08:09 -1000)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Three arm64 fixes for -rc8/final.
The MTE and stolen time fixes have been doing the rounds for a little
while, but review and testing feedback was ongoing until earlier this
week. The kexec fix showed up on Monday and addresses a failure
observed under Qemu.
Summary:
- Add missing write barrier to publish MTE tags before a pte update
- Fix kexec relocation clobbering its own data structures
- Fix stolen time crash if a timer IRQ fires during CPU hotplug"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mte: Ensure the cleared tags are visible before setting the PTE
arm64: kexec: load from kimage prior to clobbering
arm64: paravirt: Use RCU read locks to guard stolen_time
Paolo Bonzini [Fri, 20 May 2022 17:48:11 +0000 (13:48 -0400)]
KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID
With shadow paging enabled, the INVPCID instruction results in a call
to kvm_mmu_invpcid_gva. If INVPCID is executed with CR0.PG=0, the
invlpg callback is not set and the result is a NULL pointer dereference.
Fix it trivially by checking for mmu->invlpg before every call.
There are other possibilities:
- check for CR0.PG, because KVM (like all Intel processors after P5)
flushes guest TLB on CR0.PG changes so that INVPCID/INVLPG are a
nop with paging disabled
- check for EFER.LMA, because KVM syncs and flushes when switching
MMU contexts outside of 64-bit mode
All of these are tricky, go for the simple solution. This is CVE-2022-1789.
Reported-by: Yongkang Jia <kangel@zju.edu.cn> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yury Norov [Thu, 19 May 2022 17:15:04 +0000 (10:15 -0700)]
KVM: x86: hyper-v: fix type of valid_bank_mask
In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long,
but is used as u64, which is wrong for i386, and has been spotted by
LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with
hweight64()"
KVM: Free new dirty bitmap if creating a new memslot fails
Fix a goof in kvm_prepare_memory_region() where KVM fails to free the
new memslot's dirty bitmap during a CREATE action if
kvm_arch_prepare_memory_region() fails. The logic is supposed to detect
if the bitmap was allocated and thus needs to be freed, versus if the
bitmap was inherited from the old memslot and thus needs to be kept. If
there is no old memslot, then obviously the bitmap can't have been
inherited
The bug was exposed by commit c6e7aaf5d6d5 ("KVM: x86/mmu: Do not create
SPTEs for GFNs that exceed host.MAXPHYADDR"), which made it trivally easy
for syzkaller to trigger failure during kvm_arch_prepare_memory_region(),
but the bug can be hit other ways too, e.g. due to -ENOMEM when
allocating x86's memslot metadata.
Haibo Chen [Wed, 11 May 2022 02:15:04 +0000 (10:15 +0800)]
gpio: gpio-vf610: do not touch other bits when set the target bit
For gpio controller contain register PDDR, when set one target bit,
current logic will clear all other bits, this is wrong. Use operator
'|=' to fix it.
Fixes: e312bc80dfe0f6b5 ("perf evsel: Pass cpu not cpu map index to synthesize") Reported-by: Michael Petlan <mpetlan@redhat.com> Suggested-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Lv Ruyi <lv.ruyi@zte.com.cn> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: netdev@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20220519032005.1273691-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
resampler-list is protected by irq_srcu (see kvm_irqfd_assign), so fix
the false positive by using list_for_each_entry_srcu().
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1652950153-12489-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
Avi Kivity reported a problem where the __weak
btf__load_from_kernel_by_id() in tools/perf/util/bpf-event.c was being
used and it called btf__get_from_id() in tools/lib/bpf/btf.c that in
turn called back to btf__load_from_kernel_by_id(), resulting in an
endless loop.
Fix this by adding a feature test to check if
btf__load_from_kernel_by_id() is available when building perf with
LIBBPF_DYNAMIC=1, and if not then provide the fallback to the old
btf__get_from_id(), that doesn't call back to btf__load_from_kernel_by_id()
since at that time it didn't exist at all.
Tested on Fedora 35 where we have libbpf-devel 0.4.0 with LIBBPF_DYNAMIC
where we don't have btf__load_from_kernel_by_id() and thus its feature
test fail, not defining HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID:
$ cat /tmp/build/perf-urgent/feature/test-libbpf-btf__load_from_kernel_by_id.make.output
test-libbpf-btf__load_from_kernel_by_id.c: In function ‘main’:
test-libbpf-btf__load_from_kernel_by_id.c:6:16: error: implicit declaration of function ‘btf__load_from_kernel_by_id’ [-Werror=implicit-function-declaration]
6 | return btf__load_from_kernel_by_id(20151128, NULL);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
$
$ nm /tmp/build/perf-urgent/perf | grep btf__load_from_kernel_by_id 00000000005ba180 T btf__load_from_kernel_by_id
$
Aaron Lewis [Tue, 17 May 2022 05:12:38 +0000 (05:12 +0000)]
selftests: kvm/x86: Verify the pmu event filter matches the correct event
Add a test to demonstrate that when the guest programs an event select
it is matched correctly in the pmu event filter and not inadvertently
filtered. This could happen on AMD if the high nybble[1] in the event
select gets truncated away only leaving the bottom byte[2] left for
matching.
This is a contrived example used for the convenience of demonstrating
this issue, however, this can be applied to event selects 0x28A (OC
Mode Switch) and 0x08A (L1 BTB Correction), where 0x08A could end up
being denied when the event select was only set up to deny 0x28A.
[1] bits 35:32 in the event select register and bits 11:8 in the event
select.
[2] bits 7:0 in the event select register and bits 7:0 in the event
select.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20220517051238.2566934-3-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aaron Lewis [Tue, 17 May 2022 05:12:37 +0000 (05:12 +0000)]
selftests: kvm/x86: Add the helper function create_pmu_event_filter
Add a helper function that creates a pmu event filter given an event
list. Currently, a pmu event filter can only be created with the same
hard coded event list. Add a way to create one given a different event
list.
Also, rename make_pmu_event_filter to alloc_pmu_event_filter to clarify
it's purpose given the introduction of create_pmu_event_filter.
No functional changes intended.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20220517051238.2566934-2-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aaron Lewis [Tue, 17 May 2022 05:12:36 +0000 (05:12 +0000)]
kvm: x86/pmu: Fix the compare function used by the pmu event filter
When returning from the compare function the u64 is truncated to an
int. This results in a loss of the high nybble[1] in the event select
and its sign if that nybble is in use. Switch from using a result that
can end up being truncated to a result that can only be: 1, 0, -1.
[1] bits 35:32 in the event select register and bits 11:8 in the event
select.
Fixes: 6c9cae9408beb ("KVM: x86/pmu: Use binary search to check filtered events") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220517051238.2566934-1-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 20 May 2022 06:04:17 +0000 (20:04 -1000)]
Merge tag 'v5.18-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression in a recent fix to qcom-rng"
* tag 'v5.18-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: qcom-rng - fix infinite loop on requests not multiple of WORD_SZ
Daejun Park [Thu, 19 May 2022 06:05:29 +0000 (15:05 +0900)]
scsi: ufs: core: Fix referencing invalid rsp field
Fix referencing sense data when it is invalid. When the length of the data
segment is 0, there is no valid information in the rsp field, so
ufshpb_rsp_upiu() is returned without additional operation.
Link: https://lore.kernel.org/r/252651381.41652940482659.JavaMail.epsvc@epcpadp4 Fixes: ca2f60808241 ("scsi: ufs: ufshpb: L2P map management for HPB read") Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linus Torvalds [Thu, 19 May 2022 16:10:09 +0000 (06:10 -1000)]
Merge tag 'for-5.18/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
"We had two big outstanding issues after v5.18-rc6:
a) 32-bit kernels on 64-bit machines (e.g. on a C3700 which is able
to run 32- and 64-bit kernels) failed early in userspace.
b) 64-bit kernels on PA8800/PA8900 CPUs (e.g. in a C8000) showed
random userspace segfaults. We assumed that those problems were
caused by the tmpalias flushes.
Dave did a lot of testing and reorganization of the current flush code
and fixed the 32-bit cache flushing. For PA8800/PA8900 CPUs he
switched the code to flush using the virtual address of user and
kernel pages instead of using tmpalias flushes. The tmpalias flushes
don't seem to work reliable on such CPUs.
We tested the patches on a wide range machines (715/64, B160L, C3000,
C3700, C8000, rp3440) and they have been in for-next without any
conflicts.
Summary:
- Rewrite the cache flush code for PA8800/PA8900 CPUs to flush using
the virtual address of user and kernel pages instead of using
tmpalias flushes. Testing showed, that tmpalias flushes don't work
reliably on PA8800/PA8900 CPUs
- Fix flush code to allow 32-bit kernels to run on 64-bit capable
machines, e.g. a 32-bit kernel on C3700 machines"
* tag 'for-5.18/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix patch code locking and flushing
parisc: Rewrite cache flush code for PA8800/PA8900
parisc: Disable debug code regarding cache flushes in handle_nadtlb_fault()
Linus Torvalds [Thu, 19 May 2022 16:08:29 +0000 (06:08 -1000)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Two further fixes for Spectre-BHB from Ard for Cortex A15 and to use
the wide branch instruction for Thumb2"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2
ARM: 9196/1: spectre-bhb: enable for Cortex-A15
Linus Torvalds [Thu, 19 May 2022 16:02:41 +0000 (06:02 -1000)]
Merge tag 'pinctrl-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Fix an altmode in the Ocelot driver
- Fix the IES control pins in the Mediatek MT8365 driver
- Sunxi (AMLogic) driver:
- Fix the UART2 function pin assignments
- Fix the signal name of the PA2 SPI pin
* tag 'pinctrl-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: f1c100s: Fix signal name comment for PA2 SPI pin
pinctrl: sunxi: fix f1c100s uart2 function
pinctrl: mediatek: mt8365: fix IES control pins
pinctrl: ocelot: Fix for lan966x alt mode
Ulf Hansson [Tue, 17 May 2022 10:10:46 +0000 (12:10 +0200)]
mmc: core: Fix busy polling for MMC_SEND_OP_COND again
It turned out that polling period for MMC_SEND_OP_COND, that currently is
set to 1ms, still isn't sufficient. In particular a Micron eMMC on a
Beaglebone platform, is reported to sometimes fail to initialize.
Additional test, shows that extending the period to 4ms is working fine, so
let's make that change.
Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de> Tested-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de> Fixes: 0eabc475f35f (mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND") Fixes: cda89e402cff ("mmc: core: adjust polling interval for CMD1") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20220517101046.27512-1-ulf.hansson@linaro.org
drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomap
When removing short term pins, I've changed the the batch buffer
pinning for relocation to use __i915_vma_pin, because
i915_gem_object_ggtt_pin_ww was destroying the old vma. This
caused regressions, because the functions are not identical.
Fix the regressions by calling i915_gem_object_ggtt_pin_ww() again
on ggtt-only platforms, but only if the batch can be pinned without
being moved.
Fixes: a675bfc560f0 ("drm/i915: Remove short-term pins from execbuf, v6.") Cc: Matthew Auld <matthew.auld@intel.com> Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5806 Link: https://patchwork.freedesktop.org/patch/msgid/20220511115219.46507-1-maarten.lankhorst@linux.intel.com
(cherry picked from commit 451374eef622fca6f00eeeda89aaccb45a30a149) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Andrew Lunn [Wed, 18 May 2022 00:58:40 +0000 (02:58 +0200)]
net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.
It is possible to stack bridges on top of each other. Consider the
following which makes use of an Ethernet switch:
br1
/ \
/ \
/ \
br0.11 wlan0
|
br0
/ | \
p1 p2 p3
br0 is offloaded to the switch. Above br0 is a vlan interface, for
vlan 11. This vlan interface is then a slave of br1. br1 also has a
wireless interface as a slave. This setup trunks wireless lan traffic
over the copper network inside a VLAN.
A frame received on p1 which is passed up to the bridge has the
skb->offload_fwd_mark flag set to true, indicating that the switch has
dealt with forwarding the frame out ports p2 and p3 as needed. This
flag instructs the software bridge it does not need to pass the frame
back down again. However, the flag is not getting reset when the frame
is passed upwards. As a result br1 sees the flag, wrongly interprets
it, and fails to forward the frame to wlan0.
When passing a frame upwards, clear the flag. This is the Rx
equivalent of br_switchdev_frame_unmark() in br_dev_xmit().
Fixes: d7ca5bc918c9 ("bridge: switchdev: Use an helper to clear forward mark") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220518005840.771575-1-andrew@lunn.ch Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonathan Lemon [Tue, 17 May 2022 21:46:00 +0000 (14:46 -0700)]
ptp: ocp: change sysfs attr group handling
In the detach path, the driver calls sysfs_remove_group() for the
groups it believes has been registered. However, if the group was
never previously registered, then this causes a splat.
Instead, compute the groups that should be registered in advance,
and then call sysfs_create_groups(), which registers them all at once.
Update the error handling appropriately.
Fixes: 2114f6086b69 ("ptp: ocp: Add firmware capability bits for feature gating") Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220517214600.10606-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1) Reduce number of hardware offload retries from flowtable datapath
which might hog system with retries, from Felix Fietkau.
2) Skip neighbour lookup for PPPoE device, fill_forward_path() already
provides this and set on destination address from fill_forward_path for
PPPoE device, also from Felix.
4) When combining PPPoE on top of a VLAN device, set info->outdev to the
PPPoE device so software offload works, from Felix.
5) Fix TCP teardown flowtable state, races with conntrack gc might result
in resetting the state to ESTABLISHED and the time to one day. Joint
work with Oz Shlomo and Sven Auhagen.
6) Call dst_check() from flowtable datapath to check if dst is stale
instead of doing it from garbage collector path.
7) Disable register tracking infrastructure, either user-space or
kernel need to pre-fetch keys inconditionally, otherwise register
tracking assumes data is already available in register that might
not well be there, leading to incorrect reductions.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: disable expression reduction infra
netfilter: flowtable: move dst_check to packet path
netfilter: flowtable: fix TCP flow teardown
netfilter: nft_flow_offload: fix offload with pppoe + vlan
net: fix dev_fill_forward_path with pppoe + bridge
netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices
netfilter: flowtable: fix excessive hw offload attempts after failure
====================
Linus Torvalds [Thu, 19 May 2022 00:21:30 +0000 (14:21 -1000)]
Merge tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two small changes fixing issues from the 5.18 merge window:
- Fix wrong ordering of a tracepoint (Dylan)
- Fix MSG_RING on IOPOLL rings (me)"
* tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block:
io_uring: don't attempt to IOPOLL for MSG_RING requests
io_uring: fix ordering of args in io_uring_queue_async_work
Linus Torvalds [Thu, 19 May 2022 00:19:26 +0000 (14:19 -1000)]
Merge tag 'audit-pr-20220518' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single audit patch to fix a problem where a task's audit_context was
not being properly reset with io_uring"
* tag 'audit-pr-20220518' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts
Linus Torvalds [Thu, 19 May 2022 00:07:43 +0000 (14:07 -1000)]
Merge branch 'arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"The SoC bug fixes have calmed down sufficiently, there is one minor
update for the MAINTAINERS file, and few bug fixes for dts
descriptions:
- Updates to the BananaPi R2-Pro (rk3568) dts to match production
hardware rather than the prototype version.
- Qualcomm sm8250 soundwire gets disabled on some machines to avoid
crashes
- A number of aspeed SoC specific fixes, addressing incorrect pin
cotrol settings, some values in the romed8hm board, and a revert
for an accidental removal of a DT node"
* 'arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: omap: remove me as a maintainer
ARM: dts: aspeed: Add video engine to g6
ARM: dts: aspeed: romed8hm3: Fix GPIOB0 name
ARM: dts: aspeed: romed8hm3: Add lm25066 sense resistor values
ARM: dts: aspeed-g6: fix SPI1/SPI2 quad pin group
ARM: dts: aspeed-g6: add FWQSPI group in pinctrl dtsi
dt-bindings: pinctrl: aspeed-g6: add FWQSPI function/group
pinctrl: pinctrl-aspeed-g6: add FWQSPI function-group
dt-bindings: pinctrl: aspeed-g6: remove FWQSPID group
pinctrl: pinctrl-aspeed-g6: remove FWQSPID group in pinctrl
ARM: dts: aspeed-g6: remove FWQSPID group in pinctrl dtsi
arm64: dts: qcom: sm8250: don't enable rx/tx macro by default
arm64: dts: rockchip: Add gmac1 and change network settings of bpi-r2-pro
arm64: dts: rockchip: Change io-domains of bpi-r2-pro
Linus Torvalds [Thu, 19 May 2022 00:02:25 +0000 (14:02 -1000)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc fixes from Al Viro:
"vhost race fix and a percpu_ref_init-caused cgroup double-free fix.
The latter had manifested as buggered struct mount refcounting - those
are also using percpu data structures, but anything that does percpu
allocations could be hit"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Fix double fget() in vhost_net_set_backend()
percpu_ref_init(): clean ->percpu_count_ref on failure
Marek Vasut [Wed, 18 May 2022 21:30:09 +0000 (14:30 -0700)]
Input: ili210x - use one common reset implementation
Rename ili251x_hardware_reset() to ili210x_hardware_reset(), change its
parameter from struct device * to struct gpio_desc *, and use it as one
single consistent reset implementation all over the driver. Also increase
the minimum reset duration to 12ms, to make sure the reset is really
within the spec.
Marek Vasut [Wed, 18 May 2022 21:28:32 +0000 (14:28 -0700)]
Input: ili210x - fix reset timing
According to Ilitek "231x & ILI251x Programming Guide" Version: 2.30
"2.1. Power Sequence", "T4 Chip Reset and discharge time" is minimum
10ms and "T2 Chip initial time" is maximum 150ms. Adjust the reset
timings such that T4 is 12ms and T2 is 160ms to fit those figures.
This prevents sporadic touch controller start up failures when some
systems with at least ILI251x controller boot, without this patch
the systems sometimes fail to communicate with the touch controller.
drm/amd: Don't reset dGPUs if the system is going to s2idle
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC
reset for handling aborted suspend can't work with s2idle.
This functionality was introduced in commit b890ad00836d61 ("drm/amdgpu:
always reset the asic in suspend (v2)"). A few other commits have
gone on top of the ASIC reset, but this still doesn't work on the A+A
configuration in s2idle.
Avoid doing the reset on dGPUs specifically when using s2idle.
Fixes: b890ad00836d61 ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
cancel_request() never guaranteed that after its return the OSD
client would be completely done with the OSD request. The callback
(if specified) can still be invoked and a ref can still be held.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com>
Ilya Dryomov [Sat, 14 May 2022 10:16:47 +0000 (12:16 +0200)]
libceph: fix potential use-after-free on linger ping and resends
request_reinit() is not only ugly as the comment rightfully suggests,
but also unsafe. Even though it is called with osdc->lock held for
write in all cases, resetting the OSD request refcount can still race
with handle_reply() and result in use-after-free. Taking linger ping
as an example:
# req->r_kref == 2 because handle_reply still holds its ref
down_write(&osdc->lock)
send_linger_ping(lreq)
req = lreq->ping_req # same req
# cancel_linger_request is NOT
# called - handle_reply already
# unregistered
request_reinit(req)
WARN_ON(req->r_kref != 1) # fires
request_init(req)
kref_init(req->r_kref)
# req->r_kref == 1 after kref_init
ceph_osdc_put_request(req)
kref_put(req->r_kref)
# req->r_kref == 0 after kref_put, req is freed
<further req initialization/use> !!!
This happens because send_linger_ping() always (re)uses the same OSD
request for watch ping requests, relying on cancel_linger_request() to
unregister it from the OSD client and rip its messages out from the
messenger. send_linger() does the same for watch/notify registration
and watch reconnect requests. Unfortunately cancel_request() doesn't
guarantee that after it returns the OSD client would be completely done
with the OSD request -- a ref could still be held and the callback (if
specified) could still be invoked too.
The original motivation for request_reinit() was inability to deal with
allocation failures in send_linger() and send_linger_ping(). Switching
to using osdc->req_mempool (currently only used by CephFS) respects that
and allows us to get rid of request_reinit().
Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org>
Al Viro [Mon, 16 May 2022 08:42:13 +0000 (16:42 +0800)]
Fix double fget() in vhost_net_set_backend()
Descriptor table is a shared resource; two fget() on the same descriptor
may return different struct file references. get_tap_ptr_ring() is
called after we'd found (and pinned) the socket we'll be using and it
tries to find the private tun/tap data structures associated with it.
Redoing the lookup by the same file descriptor we'd used to get the
socket is racy - we need to same struct file.
Thanks to Jason for spotting a braino in the original variant of patch -
I'd missed the use of fd == -1 for disabling backend, and in that case
we can end up with sock == NULL and sock != oldsock.
Cc: stable@kernel.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Eli Cohen [Mon, 16 May 2022 08:47:35 +0000 (11:47 +0300)]
vdpa/mlx5: Use consistent RQT size
The current code evaluates RQT size based on the configured number of
virtqueues. This can raise an issue in the following scenario:
Assume MQ was negotiated.
1. mlx5_vdpa_set_map() gets called.
2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
than the configured max VQs.
3. A second set_map gets called, but now a smaller number of VQs is used
to evaluate the size of the RQT.
4. handle_ctrl_mq() is called with a value larger than what the RQT can
hold. This will emit errors and the driver state is compromised.
To fix this, we use a new field in struct mlx5_vdpa_net to hold the
required number of entries in the RQT. This value is evaluated in
mlx5_vdpa_set_driver_features() where we have the negotiated features
all set up.
In addition to that, we take into consideration the max capability of RQT
entries early when the device is added so we don't need to take consider
it when creating the RQT.
Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
max_vas / 2 and make the code clearer.
Fixes: b4cb1a4cf42c ("vdpa/mlx5: Add multiqueue support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Wed, 18 May 2022 15:53:43 +0000 (05:53 -1000)]
Merge tag 'sound-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of last-minute HD- an USB-audio quirks in addition to a
fix for the legacy ISA wavefront driver.
All look small and easy"
* tag 'sound-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Restore Rane SL-1 quirk
ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machine
ALSA: hda/realtek: Add quirk for TongFang devices with pop noise
ALSA: hda/realtek: Add quirk for the Framework Laptop
ALSA: wavefront: Proper check of get_user() error
ALSA: hda/realtek: Add quirk for Dell Latitude 7520
ALSA: hda - fix unused Realtek function when PM is not enabled
ALSA: usb-audio: Don't get sample rate for MCT Trigger 5 USB-to-HDMI
netfilter: nf_tables: disable expression reduction infra
Either userspace or kernelspace need to pre-fetch keys inconditionally
before comparisons for this to work. Otherwise, register tracking data
is misleading and it might result in reducing expressions which are not
yet registers.
First expression is also guaranteed to be evaluated always, however,
certain expressions break before writing data to registers, before
comparing the data, leaving the register in undetermined state.
This patch disables this infrastructure by now.
Fixes: 1cb7f4d021db ("netfilter: nf_tables: do not reduce read-only expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ritaro Takenaka [Tue, 17 May 2022 10:55:30 +0000 (12:55 +0200)]
netfilter: flowtable: move dst_check to packet path
Fixes sporadic IPv6 packet loss when flow offloading is enabled.
IPv6 route GC and flowtable GC are not synchronized.
When dst_cache becomes stale and a packet passes through the flow before
the flowtable GC teardowns it, the packet can be dropped.
So, it is necessary to check dst every time in packet path.
Fixes: 9b435bd7f625 ("netfilter: nf_flowtable: skip device lookup from interface index") Signed-off-by: Ritaro Takenaka <ritarot634@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
1. ct gc may race to undo the timeout adjustment of the packet path, leaving
the conntrack entry in place with the internal offload timeout (one day).
2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
timeout is reached before the flow offload del.
3. tcp ct is always set to ESTABLISHED with a very long timeout
in flow offload teardown/delete even though the state might be already
CLOSED. Also as a remark we cannot assume that the FIN or RST packet
is hitting flow table teardown as the packet might get bumped to the
slow path in nftables.
This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
state transition.
Moreover, teturn the connection's ownership to conntrack upon teardown
by clearing the offload flag and fixing the established timeout value.
The flow table GC thread will asynchonrnously free the flow table and
hardware offload entries.
Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows on
which is also misleading since the flow is back to classic conntrack
path.
If nf_ct_delete() removes the entry from the conntrack table, then it
calls nf_ct_put() which decrements the refcnt. This is not a problem
because the flowtable holds a reference to the conntrack object from
flow_offload_alloc() path which is released via flow_offload_free().
This patch also updates nft_flow_offload to skip packets in SYN_RECV
state. Since we might miss or bump packets to slow path, we do not know
what will happen there while we are still in SYN_RECV, this patch
postpones offload up to the next packet which also aligns to the
existing behaviour in tc-ct.
flow_offload_teardown() does not reset the existing tcp state from
flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bump to slow
path might have already update the state to CLOSE/FIN.
Joint work with Oz and Sven.
Fixes: af46bd490751 ("netfilter: nf_flow_table: teardown flow timeout race") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Joel Stanley [Tue, 17 May 2022 09:22:17 +0000 (18:52 +0930)]
net: ftgmac100: Disable hardware checksum on AST2600
The AST2600 when using the i210 NIC over NC-SI has been observed to
produce incorrect checksum results with specific MTU values. This was
first observed when sending data across a long distance set of networks.
On a local network, the following test was performed using a 1MB file of
random data.
On the receiver run this script:
#!/bin/bash
while [ 1 ]; do
# Zero the stats
nstat -r > /dev/null
nc -l 9899 > test-file
# Check for checksum errors
TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
if [ -z "$TcpInCsumErrors" ]; then
echo No TcpInCsumErrors
else
echo TcpInCsumErrors = $TcpInCsumErrors
fi
done
On an AST2600 system:
# nc <IP of receiver host> 9899 < test-file
The test was repeated with various MTU values:
# ip link set mtu 1410 dev eth0
The observed results:
1500 - good
1434 - bad
1400 - good
1410 - bad
1420 - good
The test was repeated after disabling tx checksumming:
# ethtool -K eth0 tx-checksumming off
And all MTU values tested resulted in transfers without error.
An issue with the driver cannot be ruled out, however there has been no
bug discovered so far.
David has done the work to take the original bug report of slow data
transfer between long distance connections and triaged it down to this
test case.
The vendor suspects this this is a hardware issue when using NC-SI. The
fixes line refers to the patch that introduced AST2600 support.
Reported-by: David Wilder <wilder@us.ibm.com> Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Mitchell [Tue, 17 May 2022 18:01:05 +0000 (11:01 -0700)]
igb: skip phy status check where unavailable
igb_read_phy_reg() will silently return, leaving phy_data untouched, if
hw->ops.read_reg isn't set. Depending on the uninitialized value of
phy_data, this led to the phy status check either succeeding immediately
or looping continuously for 2 seconds before emitting a noisy err-level
timeout. This message went out to the console even though there was no
actual problem.
Instead, first check if there is read_reg function pointer. If not,
proceed without trying to check the phy status register.
Fixes: 4db53b7f37a2 ("igb: When GbE link up, wait for Remote receiver status condition") Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lin Ma [Wed, 18 May 2022 10:53:21 +0000 (18:53 +0800)]
nfc: pn533: Fix buggy cleanup order
When removing the pn533 device (i2c or USB), there is a logic error. The
original code first cancels the worker (flush_delayed_work) and then
destroys the workqueue (destroy_workqueue), leaving the timer the last
one to be deleted (del_timer). This result in a possible race condition
in a multi-core preempt-able kernel. That is, if the cleanup
(pn53x_common_clean) is concurrently run with the timer handler
(pn533_listen_mode_timer), the timer can queue the poll_work to the
already destroyed workqueue, causing use-after-free.
This patch reorder the cleanup: it uses the del_timer_sync to make sure
the handler is finished before the routine will destroy the workqueue.
Note that the timer cannot be activated by the worker again.
if (cur_mod->len == 0 && dev->poll_mod_count > 1)
mod_timer(&dev->listen_timer, ...);
That is, the mod_timer can be called only when pn533_send_poll_frame()
returns no error, which is impossible because the device is detaching
and the lower driver should return ENODEV code.
Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 May 2022 12:05:43 +0000 (13:05 +0100)]
Merge branch 'mptcp-checksums'
Mat Martineau says:
====================
mptcp: Fix checksum byte order on little-endian
These patches address a bug in the byte ordering of MPTCP checksums on
little-endian architectures. The __sum16 type is always big endian, but
was being cast to u16 and then byte-swapped (on little-endian archs)
when reading/writing the checksum field in MPTCP option headers.
MPTCP checksums are off by default, but are enabled if one or both peers
request it in the SYN/SYNACK handshake.
The corrected code is verified to interoperate between big-endian and
little-endian machines.
Patch 1 fixes the checksum byte order, patch 2 partially mitigates
interoperation with peers sending bad checksums by falling back to TCP
instead of resetting the connection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau [Tue, 17 May 2022 18:02:12 +0000 (11:02 -0700)]
mptcp: Do TCP fallback on early DSS checksum failure
RFC 8684 section 3.7 describes several opportunities for a MPTCP
connection to "fall back" to regular TCP early in the connection
process, before it has been confirmed that MPTCP options can be
successfully propagated on all SYN, SYN/ACK, and data packets. If a peer
acknowledges the first received data packet with a regular TCP header
(no MPTCP options), fallback is allowed.
If the recipient of that first data packet finds a MPTCP DSS checksum
error, this provides an opportunity to fail gracefully with a TCP
fallback rather than resetting the connection (as might happen if a
checksum failure were detected later).
This commit modifies the checksum failure code to attempt fallback on
the initial subflow of a MPTCP connection, only if it's a failure in the
first data mapping. In cases where the peer initiates the connection,
requests checksums, is the first to send data, and the peer is sending
incorrect checksums (see
https://github.com/multipath-tcp/mptcp_net-next/issues/275), this allows
the connection to proceed as TCP rather than reset.
Fixes: bb4f28ee8e4e ("mptcp: validate the data checksum") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 17 May 2022 18:02:11 +0000 (11:02 -0700)]
mptcp: fix checksum byte order
The MPTCP code typecasts the checksum value to u16 and
then converts it to big endian while storing the value into
the MPTCP option.
As a result, the wire encoding for little endian host is
wrong, and that causes interoperabilty interoperability
issues with other implementation or host with different endianness.
Address the issue writing in the packet the unmodified __sum16 value.
MPTCP checksum is disabled by default, interoperating with systems
with bad mptcp-level csum encoding should cause fallback to TCP.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/275 Fixes: 625e985bd890 ("mptcp: send out checksum for DSS") Fixes: 0155a35ad263 ("mptcp: receive checksum for DSS") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2
In Thumb2, 'b . + 4' produces a branch instruction that uses a narrow
encoding, and so it does not jump to the following instruction as
expected. So use W(b) instead.
Fixes: 07cd9e9e75d8 ("ARM: fix Thumb2 regression with Spectre BHB") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
The Spectre-BHB mitigations were inadvertently left disabled for
Cortex-A15, due to the fact that cpu_v7_bugs_init() is not called in
that case. So fix that.
Fixes: 404a7a266c1a ("ARM: Spectre-BHB workaround") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel
produces invalid pfkey acquire messages, when these encryption modules are disabled. This
happens because the availability of the algos wasn't checked in all necessary functions.
This patch adds these checks.
Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Jiasheng Jiang [Tue, 17 May 2022 09:42:31 +0000 (17:42 +0800)]
net: af_key: add check for pfkey_broadcast in function pfkey_process
If skb_clone() returns null pointer, pfkey_broadcast() will
return error.
Therefore, it should be better to check the return value of
pfkey_broadcast() and return error if fails.
Al Viro [Wed, 18 May 2022 06:13:40 +0000 (02:13 -0400)]
percpu_ref_init(): clean ->percpu_count_ref on failure
That way percpu_ref_exit() is safe after failing percpu_ref_init().
At least one user (cgroup_create()) had a double-free that way;
there might be other similar bugs. Easier to fix in percpu_ref_init(),
rather than playing whack-a-mole in sloppy users...
Usual symptoms look like a messed refcounting in one of subsystems
that use percpu allocations (might be percpu-refcount, might be
something else). Having refcounts for two different objects share
memory is Not Nice(tm)...
Reported-by: syzbot+5b1e53987f858500ec00@syzkaller.appspotmail.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In case fw sync reset is called in parallel to device removal, device
might stuck in the following deadlock:
CPU 0 CPU 1
----- -----
remove_one
uninit_one (locks intf_state_mutex)
mlx5_sync_reset_now_event()
work in fw_reset->wq.
mlx5_enter_error_state()
mutex_lock (intf_state_mutex)
cleanup_once
fw_reset_cleanup()
destroy_workqueue(fw_reset->wq)
Drain the fw_reset WQ, and make sure no new work is being queued, before
entering uninit_one().
The Drain is done before devlink_unregister() since fw_reset, in some
flows, is using devlink API devlink_remote_reload_actions_performed().
Paul Blakey [Thu, 7 Apr 2022 10:37:32 +0000 (13:37 +0300)]
net/mlx5e: CT: Fix setting flow_source for smfs ct tuples
Cited patch sets flow_source to ANY overriding the provided spec
flow_source, avoiding the optimization done by commit 2f0c2ba8e079
("net/mlx5: CT: Set flow source hint from provided tuple device").
To fix the above, set the dr_rule flow_source from provided flow spec.
Gal Pressman [Wed, 13 Apr 2022 12:50:42 +0000 (15:50 +0300)]
net/mlx5e: Remove HW-GRO from reported features
We got reports of certain HW-GRO flows causing kernel call traces, which
might be related to firmware. To be on the safe side, disable the
feature for now and re-enable it once a driver/firmware fix is found.
net/mlx5e: Properly block HW GRO when XDP is enabled
HW GRO is incompatible and mutually exclusive with XDP and XSK. However,
the needed checks are only made when enabling XDP. If HW GRO is enabled
when XDP is already active, the command will succeed, and XDP will be
skipped in the data path, although still enabled.
This commit fixes the bug by checking the XDP and XSK status in
mlx5e_fix_features and disabling HW GRO if XDP is enabled.
LRO is incompatible and mutually exclusive with XDP. However, the needed
checks are only made when enabling XDP. If LRO is enabled when XDP is
already active, the command will succeed, and XDP will be skipped in the
data path, although still enabled.
This commit fixes the bug by checking the XDP status in
mlx5e_fix_features and disabling LRO if XDP is enabled.
Aya Levin [Mon, 11 Apr 2022 14:29:08 +0000 (17:29 +0300)]
net/mlx5e: Block rx-gro-hw feature in switchdev mode
When the driver is in switchdev mode and rx-gro-hw is set, the RQ needs
special CQE handling. Till then, block setting of rx-gro-hw feature in
switchdev mode, to avoid failure while setting the feature due to
failure while opening the RQ.
net/mlx5e: Wrap mlx5e_trap_napi_poll into rcu_read_lock
The body of mlx5e_napi_poll is wrapped into rcu_read_lock to be able to
read the XDP program pointer using rcu_dereference. However, the trap RQ
NAPI doesn't use rcu_read_lock, because the trap RQ works only in the
non-linear mode, and mlx5e_skb_from_cqe_nonlinear, until recently,
didn't support XDP and didn't call rcu_dereference.
Starting from the cited commit, mlx5e_skb_from_cqe_nonlinear supports
XDP and calls rcu_dereference, but mlx5e_trap_napi_poll doesn't wrap it
into rcu_read_lock. It leads to RCU-lockdep warnings like this:
WARNING: suspicious RCU usage
This commit fixes the issue by adding an rcu_read_lock to
mlx5e_trap_napi_poll, similarly to mlx5e_napi_poll.
Fixes: 08b3f5911600 ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
net/mlx5: DR, Ignore modify TTL on RX if device doesn't support it
When modifying TTL, packet's csum has to be recalculated.
Due to HW issue in ConnectX-5, csum recalculation for modify
TTL on RX is supported through a work-around that is specifically
enabled by configuration.
If the work-around isn't enabled, rather than adding an unsupported
action the modify TTL action on RX should be ignored.
Ignoring modify TTL action might result in zero actions, so in such
cases we will not convert the match STE to modify STE, as it is done
by FW in DMFS.
This patch fixes an issue where modify TTL action was ignored both
on RX and TX instead of only on RX.
Fixes: a143963a8880 ("net/mlx5: DR, Ignore modify TTL if device doesn't support it") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Shay Drory [Wed, 9 Mar 2022 12:45:58 +0000 (14:45 +0200)]
net/mlx5: Initialize flow steering during driver probe
Currently, software objects of flow steering are created and destroyed
during reload flow. In case a device is unloaded, the following error
is printed during grace period:
mlx5_core 0000:00:0b.0: mlx5_fw_fatal_reporter_err_work:690:(pid 95):
Driver is in error state. Unloading
As a solution to fix use-after-free bugs, where we try to access
these objects, when reading the value of flow_steering_mode devlink
param[1], let's split flow steering creation and destruction into two
routines:
* init and cleanup: memory, cache, and pools allocation/free.
* create and destroy: namespaces initialization and cleanup.
While at it, re-order the cleanup function to mirror the init function.
Maor Dickman [Mon, 21 Mar 2022 08:07:44 +0000 (10:07 +0200)]
net/mlx5: DR, Fix missing flow_source when creating multi-destination FW table
In order to support multiple destination FTEs with SW steering
FW table is created with single FTE with multiple actions and
SW steering rule forward to it. When creating this table, flow
source isn't set according to the original FTE.
Fix this by passing the original FTE flow source to the created
FW table.
if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
return -ENOMEM;
... use 'mask' here ...
free_cpumask_var(mask);
Link: https://lore.kernel.org/r/20220516054721.1548-1-mingzhe.zou@easystack.cn Fixes: f7950f2df3c3 ("scsi: target: Add iscsi/cpus_allowed_list in configfs") Reported-by: Test Bot <zgrieee@gmail.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>