Masahiro Yamada [Sat, 20 Aug 2022 16:51:29 +0000 (01:51 +0900)]
powerpc: align syscall table for ppc32
Christophe Leroy reported that commit 5da6c45f99c7 ("kbuild: link
symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS") broke
mpc85xx_defconfig + CONFIG_RELOCATABLE=y.
The compiler emits a bunch of R_PPC_UADDR32, which is not supported by
arch/powerpc/kernel/reloc_32.S.
The reason is there exists an unaligned symbol.
$ powerpc-linux-gnu-nm -n vmlinux
... c0b31258 d spe_aligninfo c0b31298 d __func__.0 c0b312a9 D sys_call_table c0b319b8 d __func__.0
Commit 5da6c45f99c7 is not the root cause. Even before that, I can
reproduce the same issue for mpc85xx_defconfig + CONFIG_RELOCATABLE=y
+ CONFIG_MODVERSIONS=n.
It is just that nobody noticed because when CONFIG_MODVERSIONS is
enabled, a __crc_* symbol inserted before sys_call_table was hiding the
unalignment issue.
Adding alignment to the syscall table for ppc32 fixes the issue.
Pali Rohár [Sat, 20 Aug 2022 11:51:13 +0000 (13:51 +0200)]
powerpc/pci: Enable PCI domains in /proc when PCI bus numbers are not unique
On 32-bit powerpc systems with more PCIe controllers and more PCI
domains, where on more PCI domains are same PCI numbers, when kernel is
compiled with CONFIG_PROC_FS=y and
CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT=y options, kernel prints
"proc_dir_entry 'pci/01' already registered" error message.
This regression started appearing after commit d13e006fcde9 ("powerpc/pci: Add config option for using all 256 PCI
buses") in case in each mPCIe slot is connected PCIe card and therefore
PCI bus 1 is populated in for every PCIe controller / PCI domain.
The reason is that PCI procfs code expects that when PCI bus numbers are
not unique across all PCI domains, function pci_proc_domain() returns
true for domain dependent buses.
Fix this issue by setting PCI_ENABLE_PROC_DOMAINS and
PCI_COMPAT_DOMAIN_0 flags for 32-bit powerpc code when
CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT is enabled. Same approach is
already implemented for 64-bit powerpc code (where PCI bus numbers are
always domain dependent).
Fixes: d13e006fcde9 ("powerpc/pci: Add config option for using all 256 PCI buses") Signed-off-by: Pali Rohár <pali@kernel.org>
[mpe: Trim change log oops message] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220820115113.30581-1-pali@kernel.org
Kajol Jain [Thu, 4 Aug 2022 07:48:52 +0000 (13:18 +0530)]
powerpc/papr_scm: Fix nvdimm event mappings
Commit 47b6f0f87f11 ("powerpc/papr_scm: Add perf interface support")
added performance monitoring support for papr-scm nvdimm devices via
perf interface. Commit also added an array in papr_scm_priv
structure called "nvdimm_events_map", which got filled based on the
result of H_SCM_PERFORMANCE_STATS hcall.
Currently there is an assumption that the order of events in the
stats buffer, returned by the hypervisor is same. And order also
happens to matches with the events specified in nvdimm driver code.
But this assumption is not documented in Power Architecture
Platform Requirements (PAPR) document. Although the order
of events happens to be same on current generation od system, but
it might not be true in future generation systems. Fix the issue, by
adding a static mapping for nvdimm events to corresponding stat-id,
and removing the dynamic map from papr_scm_priv structure. Also
remove the function papr_scm_pmu_check_events from papr_scm.c file,
as we no longer need to copy stat-ids dynamically.
Linus Torvalds [Sun, 21 Aug 2022 22:09:55 +0000 (15:09 -0700)]
Merge tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix"
* tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/loongson-liointc: Fix an error handling path in liointc_init()
irqchip/loongarch: Fix irq_domain_alloc_fwnode() abuse
irqchip/loongson-pch-pic: Move find_pch_pic() into CONFIG_ACPI
irqchip/loongson-eiointc: Fix a build warning
irqchip/loongson-eiointc: Fix irq affinity setting
iommu/hyper-v: Use helper instead of directly accessing affinity
Linus Torvalds [Sun, 21 Aug 2022 22:01:51 +0000 (15:01 -0700)]
Merge tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kprobes fix from Ingo Molnar:
"Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at
such instructions, possibly resulting in incorrect execution (the
wrong branch taken)"
* tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Fix JNG/JNLE emulation
Linus Torvalds [Sun, 21 Aug 2022 21:49:42 +0000 (14:49 -0700)]
Merge tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various fixes for tracing:
- Fix a return value of traceprobe_parse_event_name()
- Fix NULL pointer dereference from failed ftrace enabling
- Fix NULL pointer dereference when asking for registers from eprobes
- Make eprobes consistent with kprobes/uprobes, filters and
histograms"
* tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Have filter accept "common_cpu" to be consistent
tracing/probes: Have kprobes and uprobes use $COMM too
tracing/eprobes: Have event probes be consistent with kprobes and uprobes
tracing/eprobes: Fix reading of string fields
tracing/eprobes: Do not hardcode $comm as a string
tracing/eprobes: Do not allow eprobes to use $stack, or % for regs
ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
tracing/perf: Fix double put of trace event when init fails
tracing: React to error return from traceprobe_parse_event_name()
tracing: Have filter accept "common_cpu" to be consistent
Make filtering consistent with histograms. As "cpu" can be a field of an
event, allow for "common_cpu" to keep it from being confused with the
"cpu" field of the event.
tracing/probes: Have kprobes and uprobes use $COMM too
Both $comm and $COMM can be used to get current->comm in eprobes and the
filtering and histogram logic. Make kprobes and uprobes consistent in this
regard and allow both $comm and $COMM as well. Currently kprobes and
uprobes only handle $comm, which is inconsistent with the other utilities,
and can be confusing to users.
tracing/eprobes: Have event probes be consistent with kprobes and uprobes
Currently, if a symbol "@" is attempted to be used with an event probe
(eprobes), it will cause a NULL pointer dereference crash.
Both kprobes and uprobes can reference data other than the main registers.
Such as immediate address, symbols and the current task name. Have eprobes
do the same thing.
For "comm", if "comm" is used and the event being attached to does not
have the "comm" field, then make it the "$comm" that kprobes has. This is
consistent to the way histograms and filters work.
Link: https://lkml.kernel.org/r/20220820134401.136924220@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 1a327be3f20f ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently when an event probe (eprobe) hooks to a string field, it does
not display it as a string, but instead as a number. This makes the field
rather useless. Handle the different kinds of strings, dynamic, static,
relational/dynamic etc.
Now when a string field is used, the ":string" type can be used to display
it:
tracing/eprobes: Do not hardcode $comm as a string
The variable $comm is hard coded as a string, which is true for both
kprobes and uprobes, but for event probes (eprobes) it is a field name. In
most cases the "comm" field would be a string, but there's no guarantee of
that fact.
Do not assume that comm is a string. Not to mention, it currently forces
comm fields to fault, as string processing for event probes is currently
broken.
Link: https://lkml.kernel.org/r/20220820134400.756152112@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 1a327be3f20f ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
tracing/eprobes: Do not allow eprobes to use $stack, or % for regs
While playing with event probes (eprobes), I tried to see what would
happen if I attempted to retrieve the instruction pointer (%rip) knowing
that event probes do not use pt_regs. The result was:
Move the testing for TPARG_FL_TPOINT which is only used for event probes
to the top of the "$" variable check, as all the other variables are not
used for event probes. Also add a check in the register parsing "%" to
fail if an event probe is used.
Link: https://lkml.kernel.org/r/20220820134400.564426983@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 1a327be3f20f ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Yang Jihong [Thu, 18 Aug 2022 03:26:59 +0000 (11:26 +0800)]
ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
ftrace_startup does not remove ops from ftrace_ops_list when
ftrace_startup_enable fails:
register_ftrace_function
ftrace_startup
__register_ftrace_function
...
add_ftrace_ops(&ftrace_ops_list, ops)
...
...
ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
...
return 0 // ops is in the ftrace_ops_list.
When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
unregister_ftrace_function
ftrace_shutdown
if (unlikely(ftrace_disabled))
return -ENODEV; // return here, __unregister_ftrace_function is not executed,
// as a result, ops is still in the ftrace_ops_list
__unregister_ftrace_function
...
If ops is dynamically allocated, it will be free later, in this case,
is_ftrace_trampoline accesses NULL pointer:
is_ftrace_trampoline
ftrace_ops_trampoline
do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!
tracing/perf: Fix double put of trace event when init fails
If in perf_trace_event_init(), the perf_trace_event_open() fails, then it
will call perf_trace_event_unreg() which will not only unregister the perf
trace event, but will also call the put() function of the tp_event.
The problem here is that the trace_event_try_get_ref() is called by the
caller of perf_trace_event_init() and if perf_trace_event_init() returns a
failure, it will then call trace_event_put(). But since the
perf_trace_event_unreg() already called the trace_event_put() function, it
triggers a WARN_ON().
WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20
If perf_trace_event_reg() does not call the trace_event_try_get_ref() then
the perf_trace_event_unreg() should not be calling trace_event_put(). This
breaks symmetry and causes bugs like these.
Pull out the trace_event_put() from perf_trace_event_unreg() and call it
in the locations that perf_trace_event_unreg() is called. This not only
fixes this bug, but also brings back the proper symmetry of the reg/unreg
vs get/put logic.
Lukas Bulwahn [Thu, 11 Aug 2022 07:17:34 +0000 (09:17 +0200)]
tracing: React to error return from traceprobe_parse_event_name()
The function traceprobe_parse_event_name() may set the first two function
arguments to a non-null value and still return -EINVAL to indicate an
unsuccessful completion of the function. Hence, it is not sufficient to
just check the result of the two function arguments for being not null,
but the return value also needs to be checked.
Commit 2d42a30384c4 ("tracing: Auto generate event name when creating a
group of events") changed the error-return-value checking of the second
traceprobe_parse_event_name() invocation in __trace_eprobe_create() and
removed checking the return value to jump to the error handling case.
Reinstate using the return value in the error-return-value checking.
Link: https://lkml.kernel.org/r/20220811071734.20700-1-lukas.bulwahn@gmail.com Fixes: 2d42a30384c4 ("tracing: Auto generate event name when creating a group of events") Acked-by: Linyu Yuan <quic_linyyuan@quicinc.com> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Sun, 21 Aug 2022 18:18:33 +0000 (11:18 -0700)]
Merge tag 'i2c-for-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A revert to fix a regression introduced this merge window and a fix
for proper error handling in the remove path of the iMX driver"
* tag 'i2c-for-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: Make sure to unregister adapter on remove()
Revert "i2c: scmi: Replace open coded device_get_match_data()"
Linus Torvalds [Sun, 21 Aug 2022 17:21:16 +0000 (10:21 -0700)]
Merge tag '6.0-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
- memory leak fix
- two small cleanups
- trivial strlcpy removal
- update missing entry for cifs headers in MAINTAINERS file
* tag '6.0-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: move from strlcpy with unused retval to strscpy
cifs: Fix memory leak on the deferred close
cifs: remove useless parameter 'is_fsctl' from SMB2_ioctl()
cifs: remove unused server parameter from calc_smb_size()
cifs: missing directory in MAINTAINERS file
Nick Desaulniers [Fri, 19 Aug 2022 19:06:40 +0000 (12:06 -0700)]
asm goto: eradicate CC_HAS_ASM_GOTO
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0.
The minimum supported versions of these tools for the build according to
Documentation/process/changes.rst are 5.1 and 11.0.0 respectively.
Remove the feature detection script, Kconfig option, and clean up some
fallback code that is no longer supported.
The removed script was also testing for a GCC specific bug that was
fixed in the 4.7 release.
Also remove workarounds for bpftrace using clang older than 9.0.0, since
other BPF backend fixes are required at this point.
i2c: imx: Make sure to unregister adapter on remove()
If for whatever reasons pm_runtime_resume_and_get() fails and .remove() is
exited early, the i2c adapter stays around and the irq still calls its
handler, while the driver data and the register mapping go away. So if
later the i2c adapter is accessed or the irq triggers this results in
havoc accessing freed memory and unmapped registers.
So unregister the software resources even if resume failed, and only skip
the hardware access in that case.
Fixes: bbcd03688544 ("i2c: imx: add runtime pm support to improve the performance") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Wolfram Sang [Thu, 18 Aug 2022 20:31:13 +0000 (22:31 +0200)]
Revert "i2c: scmi: Replace open coded device_get_match_data()"
This reverts commit ad157e498025ffc8ba9e2c690c59a687efaeade5. We got a
regression report, so ensure this machine boots again. We will come back
with a better version hopefully.
Linus Torvalds [Sat, 20 Aug 2022 21:55:38 +0000 (14:55 -0700)]
Merge tag 'kbuild-fixes-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix module versioning broken on some architectures
- Make dummy-tools enable CONFIG_PPC_LONG_DOUBLE_128
- Remove -Wformat-zero-length, which has no warning instance
- Fix the order between drivers and libs in modules.order
- Fix false-positive warnings in clang-analyzer
* tag 'kbuild-fixes-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
scripts/clang-tools: Remove DeprecatedOrUnsafeBufferHandling check
kbuild: fix the modules order between drivers and libs
scripts/Makefile.extrawarn: Do not disable clang's -Wformat-zero-length
kbuild: dummy-tools: pretend we understand __LONG_DOUBLE_128__
modpost: fix module versioning when a symbol lacks valid CRC
* tag 'perf-tools-fixes-for-v6.0-2022-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf tools: Support reading PERF_FORMAT_LOST
libperf: Add a test case for read formats
libperf: Handle read format in perf_evsel__read()
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
tools include UAPI: Sync linux/vhost.h with the kernel sources
tools headers kvm s390: Sync headers with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf beauty: Update copy of linux/socket.h with the kernel sources
perf cpumap: Fix alignment for masks in event encoding
perf cpumap: Compute mask size in constant time
perf cpumap: Synthetic events and const/static
perf cpumap: Const map for max()
Linus Torvalds [Sat, 20 Aug 2022 18:29:01 +0000 (11:29 -0700)]
Merge tag 's390-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Fix a KVM crash on z12 and older machines caused by a wrong
assumption that Query AP Configuration Information is always
available.
- Lower severity of excessive Hypervisor filesystem error messages
when booting under KVM.
* tag 's390-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ap: fix crash on older machines based on QCI info missing
s390/hypfs: avoid error message under KVM
Linus Torvalds [Sat, 20 Aug 2022 17:49:02 +0000 (10:49 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A few minor fixes:
- Fix buffer management in SRP to correct a regression with the login
authentication feature from v5.17
- Don't iterate over non-present ports in mlx5
- Fix an error introduced by the foritify work in cxgb4
- Two bug fixes for the recently merged ERDMA driver
- Unbreak RDMA dmabuf support, a regresion from v5.19"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA: Handle the return code from dma_resv_wait_timeout() properly
RDMA/erdma: Correct the max_qp and max_cq capacities of the device
RDMA/erdma: Using the key in FMR WR instead of MR structure
RDMA/cxgb4: fix accept failure due to increased cpl_t5_pass_accept_rpl size
RDMA/mlx5: Use the proper number of ports
IB/iser: Fix login with authentication
This `clang-analyzer` check flags the use of memset(), suggesting a more
secure version of the API, such as memset_s(), which does not exist in
the kernel:
warning: Call to function 'memset' is insecure as it does not provide
security checks introduced in the C11 standard. Replace with analogous
functions that support length arguments or provides boundary checks such
as 'memset_s' in case of C11
[clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
Signed-off-by: Guru Das Srinagesh <quic_gurus@quicinc.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
scripts/Makefile.extrawarn: Do not disable clang's -Wformat-zero-length
There are no instances of this warning in the tree across several
difference architectures and configurations. This was added by
commit 2b470b2266a9 ("kbuild, LLVMLinux: Supress warnings unless W=1-3")
back in 2014, where it might have been necessary, but there are no
instances of it now so stop disabling it to increase warning coverage
for clang.
Jiri Slaby [Wed, 10 Aug 2022 09:26:03 +0000 (11:26 +0200)]
kbuild: dummy-tools: pretend we understand __LONG_DOUBLE_128__
There is a test in powerpc's Kconfig which checks __LONG_DOUBLE_128__
and sets CONFIG_PPC_LONG_DOUBLE_128 if it is understood by the compiler.
We currently don't handle it, so this results in PPC_LONG_DOUBLE_128 not
being in super-config generated by dummy-tools. So take this into
account in the gcc script and preprocess __LONG_DOUBLE_128__ as "1".
Masahiro Yamada [Tue, 9 Aug 2022 14:11:17 +0000 (23:11 +0900)]
modpost: fix module versioning when a symbol lacks valid CRC
Since commit 5da6c45f99c7 ("kbuild: link symbol CRCs at final link,
removing CONFIG_MODULE_REL_CRCS"), module versioning is broken on
some architectures. Loading a module fails with "disagrees about
version of symbol module_layout".
On such architectures (e.g. ARCH=sparc build with sparc64_defconfig),
modpost shows a warning, like follows:
WARNING: modpost: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
Is "_mcount" prototyped in <asm/asm-prototypes.h>?
Previously, it was a harmless warning (CRC check was just skipped),
but now wrong CRCs are used for comparison because invalid CRCs are
just skipped.
When the module subsystem looks up a CRC that comes after, it results
in reading out a wrong address. For example, when __crc__printk is
needed, the module subsystem reads 0xc53b44 instead of 0xc53b40.
All CRC entries must be output for correct index accessing. Invalid
CRCs will be unused, but are needed to keep the one-to-one mapping
between __ksymtab_* and __crc_*.
The best is to fix all modpost warnings, but several warnings are still
remaining on less popular architectures.
Fixes: 5da6c45f99c7 ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS") Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Linus Torvalds [Sat, 20 Aug 2022 17:17:05 +0000 (10:17 -0700)]
Merge tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes that should go into this release:
- Small series of patches for ublk (ZiyangZhang)
- Remove dead function (Yu)
- Fix for running a block queue in case of resource starvation
(Yufen)"
* tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-block:
blk-mq: run queue no matter whether the request is the last request
blk-mq: remove unused function blk_mq_queue_stopped()
ublk_drv: do not add a re-issued request aborted previously to ioucmd's task_work
ublk_drv: update comment for __ublk_fail_req()
ublk_drv: check ubq_daemon_is_dying() in __ublk_rq_task_work()
ublk_drv: update iod->addr for UBLK_IO_NEED_GET_DATA
Linus Torvalds [Sat, 20 Aug 2022 16:49:22 +0000 (09:49 -0700)]
Merge tag 'io_uring-6.0-2022-08-19' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A few fixes for regressions in this cycle:
- Two instances of using the wrong "has async data" helper (Pavel)
- Fixup zero-copy address import (Pavel)
- Bump zero-copy notification slot limit (Pavel)"
* tag 'io_uring-6.0-2022-08-19' of git://git.kernel.dk/linux-block:
io_uring/net: use right helpers for async_data
io_uring/notif: raise limit on notification slots
io_uring/net: improve zc addr import error handling
io_uring/net: use right helpers for async recycle
Linus Torvalds [Sat, 20 Aug 2022 16:43:45 +0000 (09:43 -0700)]
Merge tag 'ata-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:
- Add a missing command name definition for ata_get_cmd_name(), from
me.
- A fix to address a performance regression due to the default
max_sectors queue limit for ATA devices connected to AHCI adapters
being too small, from John.
* tag 'ata-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata: Set __ATA_BASE_SHT max_sectors
ata: libata-eh: Add missing command name
Linus Torvalds [Sat, 20 Aug 2022 16:39:00 +0000 (09:39 -0700)]
Merge tag 'mmc-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- meson-gx: Fix error handling in ->probe()
- mtk-sd: Fix a command problem when using cqe off/disable
- pxamci: Fix error handling in ->probe()
- sdhci-of-dwcmshc: Fix broken support for the BlueField-3 variant
* tag 'mmc-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-dwcmshc: Re-enable support for the BlueField-3 SoC
mmc: meson-gx: Fix an error handling path in meson_mmc_probe()
mmc: mtk-sd: Clear interrupts when cqe off/disable
mmc: pxamci: Fix another error handling path in pxamci_probe()
mmc: pxamci: Fix an error handling path in pxamci_probe()
John Garry [Wed, 17 Aug 2022 15:20:08 +0000 (23:20 +0800)]
ata: libata: Set __ATA_BASE_SHT max_sectors
Commit 7cde501895cc ("ata: libata-scsi: cap ata_device->max_sectors
according to shost->max_sectors") inadvertently capped the max_sectors
value for some SATA disks to a value which is lower than we would want.
For a device which supports LBA48, we would previously have request queue
max_sectors_kb and max_hw_sectors_kb values of 1280 and 32767 respectively.
For AHCI controllers, the value chosen for shost max sectors comes from
the minimum of the SCSI host default max sectors in
SCSI_DEFAULT_MAX_SECTORS (1024) and the shost DMA device mapping limit.
This means that we would now set the max_sectors_kb and max_hw_sectors_kb
values for a disk which supports LBA48 at 512, ignoring DMA mapping limit.
As report by Oliver at [0], this caused a performance regression.
Fix by picking a large enough max sectors value for ATA host controllers
such that we don't needlessly reduce max_sectors_kb for LBA48 disks.
Fixes: 7cde501895cc ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors") Reported-by: Oliver Sang <oliver.sang@intel.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Linus Torvalds [Fri, 19 Aug 2022 20:56:14 +0000 (13:56 -0700)]
Merge tag 'hardening-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Also undef LATENT_ENTROPY_PLUGIN for per-file disabling (Andrew
Donnellan)
- Return EFAULT on copy_from_user() failures in LoadPin (Kees Cook)
* tag 'hardening-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: Undefine LATENT_ENTROPY_PLUGIN when plugin disabled for a file
LoadPin: Return EFAULT on copy_from_user() failures
Linus Torvalds [Fri, 19 Aug 2022 20:40:11 +0000 (13:40 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK
- Tidy-up handling of AArch32 on asymmetric systems
x86:
- Fix 'missing ENDBR' BUG for fastop functions
Generic:
- Some cleanup and static analyzer patches
- More fixes to KVM_CREATE_VM unwind paths"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device()
KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow()
x86/kvm: Fix "missing ENDBR" BUG for fastop functions
x86/kvm: Simplify FOP_SETCC()
x86/ibt, objtool: Add IBT_NOSEAL()
KVM: Rename mmu_notifier_* to mmu_invalidate_*
KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS
KVM: MIPS: remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
KVM: Move coalesced MMIO initialization (back) into kvm_create_vm()
KVM: Unconditionally get a ref to /dev/kvm module when creating a VM
KVM: Properly unwind VM creation if creating debugfs fails
KVM: arm64: Reject 32bit user PSTATE on asymmetric systems
KVM: arm64: Treat PMCR_EL1.LC as RES1 on asymmetric systems
KVM: arm64: Fix compile error due to sign extension
Linus Torvalds [Fri, 19 Aug 2022 20:33:48 +0000 (13:33 -0700)]
Merge tag 'for-6.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few short fixes and a lockdep warning fix (needs moving some code):
- tree-log replay fixes:
- fix error handling when looking up extent refs
- fix warning when setting inode number of links
- relocation fixes:
- reset block group read-only status when relocation fails
- unset control structure if transaction fails when starting
to process a block group
- add lockdep annotations to fix a warning during relocation
where blocks temporarily belong to another tree and can lead
to reversed dependencies
- tree-checker verifies that extent items don't overlap"
* tag 'for-6.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: tree-checker: check for overlapping extent items
btrfs: fix warning during log replay when bumping inode link count
btrfs: fix lost error handling when looking up extended ref on log replay
btrfs: fix lockdep splat with reloc root extent buffers
btrfs: move lockdep class helpers to locking.c
btrfs: unset reloc control if transaction commit fails in prepare_to_relocate()
btrfs: reset RO counter on block group if we fail to relocate
Linus Torvalds [Fri, 19 Aug 2022 20:26:52 +0000 (13:26 -0700)]
Merge tag '5.20-rc2-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
- important sparse file fix
- allocation size fix
- fix incorrect rc on bad share
- share config fix
* tag '5.20-rc2-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: don't remove dos attribute xattr on O_TRUNC open
ksmbd: remove unnecessary generic_fillattr in smb2_open
ksmbd: request update to stale share config
ksmbd: return STATUS_BAD_NETWORK_NAME error status if share is not configured
Namhyung Kim [Fri, 19 Aug 2022 00:36:44 +0000 (17:36 -0700)]
perf tools: Support reading PERF_FORMAT_LOST
The recent kernel added lost count can be read from either read(2) or
ring buffer data with PERF_SAMPLE_READ. As it's a variable length data
we need to access it according to the format info.
But for perf tools use cases, PERF_FORMAT_ID is always set. So we can
only check PERF_FORMAT_LOST bit to determine the data format.
Add sample_read_value_size() and next_sample_read_value() helpers to
make it a bit easier to access. Use them in all places where it reads
the struct sample_read_value.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220819003644.508916-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 19 Aug 2022 00:36:43 +0000 (17:36 -0700)]
libperf: Add a test case for read formats
It checks a various combination of the read format settings and verify
it return the value in a proper position. The test uses task-clock
software events to guarantee it's always active and sets enabled/running
time.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220819003644.508916-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 19 Aug 2022 00:36:42 +0000 (17:36 -0700)]
libperf: Handle read format in perf_evsel__read()
The perf_counts_values should be increased to read the new lost data.
Also adjust values after read according the read format.
This supports PERF_FORMAT_GROUP which has a different data format but
it's only available for leader events. Currently it doesn't have an API
to read sibling (member) events in the group. But users may read the
sibling event directly.
Also reading from mmap would be disabled when the read format has ID or
LOST bit as it's not exposed via mmap.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220819003644.508916-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
To pick the changes in:
ed285b1fb9f6d018 ("KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific") a561a5a8bb708c63 ("treewide: uapi: Replace zero-length arrays with flexible-array members") 419d42c3463b5f61 ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior") efed1b70ad49ab42 ("KVM: x86: PIT: Preserve state of speaker port data bit") 95cd214f5b62ef31 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")
That just rebuilds kvm-stat.c on x86, no change in functionality.
This silences these perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Chenyi Qiang <chenyi.qiang@intel.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Durrant <pdurrant@amazon.com> Link: https://lore.kernel.org/lkml/Yv6OMPKYqYSbUxwZ@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That makes 'perf kvm-stat' aware of this new NOTIFY exit reason, thus
addressing the following perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Tao Xu <tao3.xu@intel.com> Link: http://lore.kernel.org/lkml/Yv6LavXMZ+njijpq@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
To pick up these changes and support them:
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
$ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
$ diff -u before after
--- before 2022-08-18 09:46:12.355958316 -0300
+++ after 2022-08-18 09:46:19.701182822 -0300
@@ -29,6 +29,7 @@
[0x75] = "VDPA_SET_VRING_ENABLE",
[0x77] = "VDPA_SET_CONFIG_CALL",
[0x7C] = "VDPA_SET_GROUP_ASID",
+ [0x7D] = "VDPA_SUSPEND",
};
= {
[0x00] = "GET_FEATURES",
$
For instance, see how those 'cmd' ioctl arguments get translated, now
VDPA_SUSPEND will be as well:
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Yv6Kb4OESuNJuH6X@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools headers kvm s390: Sync headers with the kernel sources
To pick the changes in:
fe36107dd2fc7fc1 ("KVM: s390: resetting the Topology-Change-Report")
None of them trigger any changes in tooling, this time this is just to silence
these perf build warnings:
Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h'
diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:
aaef20ab386999e9 ("RISC-V: KVM: Add extensible CSR emulation framework") fe36107dd2fc7fc1 ("KVM: s390: resetting the Topology-Change-Report") 571f00e73d7789ac ("KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean stats") b9b6f8b52cf9c2f7 ("kvm: stats: tell userspace which values are boolean") 64990bbe901380db ("KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices") a561a5a8bb708c63 ("treewide: uapi: Replace zero-length arrays with flexible-array members") 402e816d8e056b1a ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis") 4510b26371bd127a ("KVM: VMX: Enable Notify VM exit") 95cd214f5b62ef31 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault") 8c0d5a7eb53b8fc0 ("KVM: s390: Add KVM_CAP_S390_PROTECTED_DUMP") 52055a4e11c2a123 ("KVM: s390: Add CPU dump functionality") f5af74b47971fa42 ("KVM: s390: Add configuration dump functionality") 8515705b6c220303 ("KVM: s390: pv: Add query dump information") 9b3b043e43ab93ef ("KVM: s390: pv: Add query interface") 2bf7e642ade6463f ("KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata for SEV-ES") 6f0a26bc688e5433 ("KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl.") cb7f95d34307235e ("KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND") 606d68ae0aa0cde0 ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC") d5073800facedd10 ("KVM: x86/xen: Kernel acceleration for XENVER_version") 56c98198613059df ("KVM: x86/xen: handle PV timers oneshot mode") 262290ec5a70de93 ("KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID") 6f90edd30e11ac53 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") d2a2ac589a304b07 ("KVM: x86/xen: Support direct injection of event channel events")
That just rebuilds perf, as these patches add just an ioctl that is S390
specific and may clash with other arches, so are so far being excluded
in the harvester script:
$ tools/perf/trace/beauty/kvm_ioctl.sh > before
$ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
$ tools/perf/trace/beauty/kvm_ioctl.sh > after
$ diff -u before after
$ grep 390 tools/perf/trace/beauty/kvm_ioctl.sh
egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \
$
This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Anup Patel <anup@brainfault.org> Cc: Ben Gardon <bgardon@google.com> Cc: Chenyi Qiang <chenyi.qiang@intel.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: João Martins <joao.m.martins@oracle.com> Cc: Matthew Rosato <mjrosato@linux.ibm.com> Cc: Oliver Upton <oupton@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Gonda <pgonda@google.com> Cc: Pierre Morel <pmorel@linux.ibm.com> Cc: Tao Xu <tao3.xu@intel.com> Link: https://lore.kernel.org/lkml/YvzuryClcn%2FvA0Gn@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That don't add any new ioctl, so no changes in tooling.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Link: http://lore.kernel.org/lkml/Yvzrp9RFIeEkb5fI@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexandre Chartre <alexandre.chartre@oracle.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Borislav Petkov <bp@suse.de> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Wyes Karny <wyes.karny@amd.com> Link: https://lore.kernel.org/lkml/Yvznmu5oHv0ZDN2w@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
To pick the changes from:
f4e6c9b7d4c69f9b ("fscrypt: Add HCTR2 support for filename encryption")
That don't result in any changes in tooling, just causes this to be
rebuilt:
CC /tmp/build/perf-urgent/trace/beauty/sync_file_range.o
LD /tmp/build/perf-urgent/trace/beauty/perf-in.o
addressing this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Huckleberry <nhuck@google.com> Link: https://lore.kernel.org/lkml/Yvzl8C7O1b+hf9GS@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in:
bef8fbe81a3f897c ("x86/speculation: Add RSB VM Exit protections") 96f870629fb77467 ("tools/power turbostat: dump secondary Turbo-Ratio-Limit") e424e4a2af1e673a ("x86/speculation: Disable RRSBA behavior") 21ef0d4c586f56a8 ("x86/cpu/amd: Add Spectral Chicken") 473ed2146bc6956f ("x86/bugs: Report Intel retbleed vulnerability") 56a759afcfa5827c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") d333500d1b19ee83 ("x86/cpu: Add new VMX feature, Tertiary VM-Execution control") 97304474a8186b26 ("KVM: x86/speculation: Disable Fill buffer clear within guests") 685b22d85855aad7 ("x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug")
Addressing these tools/perf build warnings:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
That makes the beautification scripts to pick some new entries:
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
Using CPUID AuthenticAMD-25-21-0
0x6a0
0x6a8
New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
0x6a0
0x6a8
New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Like Xu <like.xu@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/lkml/YvzbT24m2o5U%2F7+q@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That don't result in any changes in the tables generated from that
header.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Cc: David Ahern <dsahern@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Dylan Yudaken <dylany@fb.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Yajun Deng <yajun.deng@linux.dev> Link: https://lore.kernel.org/lkml/YvzYs+F+Xzq8Hvvp@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 14 Jun 2022 14:33:51 +0000 (07:33 -0700)]
perf cpumap: Fix alignment for masks in event encoding
A mask encoding of a cpu map is laid out as:
u16 nr
u16 long_size
unsigned long mask[];
However, the mask may be 8-byte aligned meaning there is a 4-byte pad
after long_size. This means 32-bit and 64-bit builds see the mask as
being at different offsets. On top of this the structure is in the byte
data[] encoded as:
u16 type
char data[]
This means the mask's struct isn't the required 4 or 8 byte aligned, but
is offset by 2. Consequently the long reads and writes are causing
undefined behavior as the alignment is broken.
Fix the mask struct by creating explicit 32 and 64-bit variants, use a
union to avoid data[] and casts; the struct must be packed so the
layout matches the existing perf.data layout. Taking an address of a
member of a packed struct breaks alignment so pass the packed
perf_record_cpu_map_data to functions, so they can access variables with
the right alignment.
As the 64-bit version has 4 bytes of padding, optimizing writing to only
write the 32-bit version.
Committer notes:
Disable warnings about 'packed' that break the build in some arches like
riscv64, but just around that specific struct.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@intel.com> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kees Kook <keescook@chromium.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220614143353.1559597-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Fri, 19 Aug 2022 16:46:11 +0000 (09:46 -0700)]
Merge tag 'sound-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only significant core change is ASoC DPCM fix for asymmetric
setup; other remaining changes are device-specific fixes, including
the hardening of string manipulations.
One change in platform/x86 is the patch I forgot to apply from a
series for CS35L41 codec"
* tag 'sound-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek: Add quirk for Clevo NS50PU, NS70PU
ALSA: info: Fix llseek return value when using callback
ALSA: hda/cs8409: Support new Dolphin Variants
platform/x86: serial-multi-instantiate: Add CLSA0101 Laptop
ALSA: hda/realtek: Add quirk for Lenovo Yoga7 14IAL7
ALSA: hda: cs35l41: Clarify support for CSC3551 without _DSD Properties
ALSA: hda/realtek: Add quirks for ASUS Zenbooks using CS35L41
ASoC: codec: tlv320aic32x4: fix mono playback via I2S
ASoC: rt5640: Fix the JD voltage dropping issue
ASoC: tas2770: Fix handling of mute/unmute
ASoC: tas2770: Drop conflicting set_bias_level power setting
ASoC: tas2770: Allow mono streams
ASoC: tas2770: Set correct FSYNC polarity
ASoC: Intel: fix sof_es8336 probe
ASoC: DPCM: Don't pick up BE without substream
ASoC: SOF: ipc3-topology: Fix clang -Wformat warning
ASoC: sh: rz-ssi: Improve error handling in rz_ssi_probe() error path
ASoC: SOF: Intel: hda: Fix potential buffer overflow by snprintf()
ASoC: SOF: debug: Fix potential buffer overflow by snprintf()
ASoC: Intel: avs: Fix potential buffer overflow by snprintf()
...
Linus Torvalds [Fri, 19 Aug 2022 16:39:32 +0000 (09:39 -0700)]
Merge tag 'drm-fixes-2022-08-19' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular weekly fixes.
The nouveau patch just enables modesetting on GA103 hw which is like
other ampere cards that are already supported. amdgpu has 2 weeks of
fixes, as Alex was away, so a bit larger than usual, otherwise some
i915 and misc other fixes.
ttm:
- NULL ptr dereference
i915:
- disable pci resize on 32-bit systems
- don't leak the ccs state
- TLB invalidation fixes
nouveau:
- GA103 enablement
- off-by-one fix
amdgpu:
- Revert some DML stack changes
- Rounding fixes in KFD allocations
- atombios vram info table parsing fix
- DCN 3.1.4 fixes
- Clockgating fixes for various new IPs
- SMU 13.0.4 fixes
- DCN 3.1.4 FP fixes
- TMDS fixes for YCbCr420 4k modes
- DCN 3.2.x fixes
- USB 4 fixes
- SMU 13.0 fixes
- SMU driver unload memory leak fixes
- Display orientation fix
- Regression fix for generic fbdev conversion
- SDMA 6.x fixes
- SR-IOV fixes
- IH 6.x fixes
- Use after free fix in bo list handling
- Revert pipe1 support
- XGMI hive reset fix
amdkfd:
- Fix potential crach in kfd_create_indirect_link_prop()
imx:
- warning fix
meson:
- refcounting fix
lvds-codec:
- error check fix
sun4i:
- underflow fix
- dt-binding fix"
* tag 'drm-fixes-2022-08-19' of git://anongit.freedesktop.org/drm/drm: (109 commits)
Revert "drm/amd/amdgpu: add pipe1 hardware support"
drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
drm/amdgpu: Fix interrupt handling on ih_soft ring
drm/amdgpu: Add secure display TA load for Renoir
drm/amd/display: Include scaling factor for SubVP command
drm/amdgpu/vcn: Return void from the stop_dbg_mode
drm/amdgpu: remove useless condition in amdgpu_job_stop_all_jobs_on_sched()
drm/amdgpu: Add decode_iv_ts helper for ih_v6 block
drm/amd/display: add chip revision to DCN32
drm/amd/display: avoid doing vm_init multiple time
drm/amd/display: Use pitch when calculating size to cache in MALL
drm/amd/display: Don't set DSC for phantom pipes
drm/amd/display: Update clock table policy for DCN314
drm/amd/display: Modify header inclusion pattern
drm/amd/display: Fix plug/unplug external monitor will hang while playback MPO video
drm/amd/display: Add debug parameter to retain default clock table
drm/amdgpu: Increase tlb flush timeout for sriov
drm/amd/display: do not compare integers of different widths
drm/amd/display: Add reserved dc_log_type.
drm/amd/display: Fix pixel clock programming
...
* tag 'bitmap-6.0-rc2' of https://github.com/norov/linux:
lib/cpumask: drop always-true preprocessor guard
lib/cpumask: add inline cpumask_next_wrap() for UP
cpumask: align signatures of UP implementations
Wolfram Sang [Thu, 18 Aug 2022 21:01:41 +0000 (23:01 +0200)]
cifs: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Ian Rogers [Tue, 14 Jun 2022 14:33:50 +0000 (07:33 -0700)]
perf cpumap: Compute mask size in constant time
perf_cpu_map__max() computes the cpumap's maximum value, no need to
iterate over all values.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@intel.com> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kees Kook <keescook@chromium.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220614143353.1559597-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 14 Jun 2022 14:33:49 +0000 (07:33 -0700)]
perf cpumap: Synthetic events and const/static
Make the cpumap arguments const to make it clearer they are in rather
than out arguments. Make two functions static and remove external
declarations.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@intel.com> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kees Kook <keescook@chromium.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220614143353.1559597-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 14 Jun 2022 14:33:48 +0000 (07:33 -0700)]
perf cpumap: Const map for max()
Allows max() to be used with 'const struct perf_cpu_maps *'.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@intel.com> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kees Kook <keescook@chromium.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220614143353.1559597-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Aaron Lu [Fri, 19 Aug 2022 02:30:01 +0000 (10:30 +0800)]
x86/mm: Use proper mask when setting PUD mapping
Commit 9af2a1f2490ec("x86/mm: thread pgprot_t through
init_memory_mapping()") mistakenly used __pgprot() which doesn't respect
__default_kernel_pte_mask when setting PUD mapping.
Fix it by only setting the one bit we actually need (PSE) and leaving
the other bits (that have been properly masked) alone.
Fixes: 9af2a1f2490e ("x86/mm: thread pgprot_t through init_memory_mapping()") Signed-off-by: Aaron Lu <aaron.lu@intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li kunyu [Fri, 19 Aug 2022 02:15:35 +0000 (10:15 +0800)]
KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device()
The variable is initialized but it is only used after its assignment.
Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Li kunyu <kunyu@nfschina.com>
Message-Id: <20220819021535.483702-1-kunyu@nfschina.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Li kunyu [Fri, 19 Aug 2022 02:28:04 +0000 (10:28 +0800)]
KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow()
The variable is initialized but it is only used after its assignment.
Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Li kunyu <kunyu@nfschina.com>
Message-Id: <20220819022804.483914-1-kunyu@nfschina.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Josh Poimboeuf [Thu, 18 Aug 2022 15:53:43 +0000 (08:53 -0700)]
x86/kvm: Fix "missing ENDBR" BUG for fastop functions
The following BUG was reported:
traps: Missing ENDBR: andw_ax_dx+0x0/0x10 [kvm]
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/traps.c:253!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<TASK>
asm_exc_control_protection+0x2b/0x30
RIP: 0010:andw_ax_dx+0x0/0x10 [kvm]
Code: c3 cc cc cc cc 0f 1f 44 00 00 66 0f 1f 00 48 19 d0 c3 cc cc cc
cc 0f 1f 40 00 f3 0f 1e fa 20 d0 c3 cc cc cc cc 0f 1f 44 00 00
<66> 0f 1f 00 66 21 d0 c3 cc cc cc cc 0f 1f 40 00 66 0f 1f 00 21
d0
Chao Peng [Tue, 16 Aug 2022 12:53:22 +0000 (20:53 +0800)]
KVM: Rename mmu_notifier_* to mmu_invalidate_*
The motivation of this renaming is to make these variables and related
helper functions less mmu_notifier bound and can also be used for non
mmu_notifier based page invalidation. mmu_invalidate_* was chosen to
better describe the purpose of 'invalidating' a page that those
variables are used for.
- mmu_notifier_seq/range_start/range_end are renamed to
mmu_invalidate_seq/range_start/range_end.
- mmu_notifier_retry{_hva} helper functions are renamed to
mmu_invalidate_retry{_hva}.
- mmu_notifier_count is renamed to mmu_invalidate_in_progress to
avoid confusion with mn_active_invalidate_count.
- While here, also update kvm_inc/dec_notifier_count() to
kvm_mmu_invalidate_begin/end() to match the change for
mmu_notifier_count.
No functional change intended.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Chao Peng [Tue, 16 Aug 2022 12:53:21 +0000 (20:53 +0800)]
KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS
KVM_INTERNAL_MEM_SLOTS better reflects the fact those slots are KVM
internally used (invisible to userspace) and avoids confusion to future
private slots that can have different meaning.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: Move coalesced MMIO initialization (back) into kvm_create_vm()
Invoke kvm_coalesced_mmio_init() from kvm_create_vm() now that allocating
and initializing coalesced MMIO objects is separate from registering any
associated devices. Moving coalesced MMIO cleans up the last oddity
where KVM does VM creation/initialization after kvm_create_vm(), and more
importantly after kvm_arch_post_init_vm() is called and the VM is added
to the global vm_list, i.e. after the VM is fully created as far as KVM
is concerned.
Originally, kvm_coalesced_mmio_init() was called by kvm_create_vm(), but
the original implementation was completely devoid of error handling.
Commit b1c7d43e5b34 ("KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s
error handling" fixed the various bugs, and in doing so rightly moved the
call to after kvm_create_vm() because kvm_coalesced_mmio_init() also
registered the coalesced MMIO device. Commit e7876bba9216 ("KVM: Make
coalesced mmio use a device per zone") cleaned up that mess by having
each zone register a separate device, i.e. moved device registration to
its logical home in kvm_vm_ioctl_register_coalesced_mmio(). As a result,
kvm_coalesced_mmio_init() is now a "pure" initialization helper and can
be safely called from kvm_create_vm().
Opportunstically drop the #ifdef, KVM provides stubs for
kvm_coalesced_mmio_{init,free}() when CONFIG_KVM_MMIO=n (s390).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220816053937.2477106-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: Unconditionally get a ref to /dev/kvm module when creating a VM
Unconditionally get a reference to the /dev/kvm module when creating a VM
instead of using try_get_module(), which will fail if the module is in
the process of being forcefully unloaded. The error handling when
try_get_module() fails doesn't properly unwind all that has been done,
e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
from the global list. Not removing VMs from the global list tends to be
fatal, e.g. leads to use-after-free explosions.
The obvious alternative would be to add proper unwinding, but the
justification for using try_get_module(), "rmmod --wait", is completely
bogus as support for "rmmod --wait", i.e. delete_module() without
O_NONBLOCK, was removed by commit d871a32ab863 ("module: remove rmmod
--wait option.") nearly a decade ago.
It's still possible for try_get_module() to fail due to the module dying
(more like being killed), as the module will be tagged MODULE_STATE_GOING
by "rmmod --force", i.e. delete_module(..., O_TRUNC), but playing nice
with forced unloading is an exercise in futility and gives a falsea sense
of security. Using try_get_module() only prevents acquiring _new_
references, it doesn't magically put the references held by other VMs,
and forced unloading doesn't wait, i.e. "rmmod --force" on KVM is all but
guaranteed to cause spectacular fireworks; the window where KVM will fail
try_get_module() is tiny compared to the window where KVM is building and
running the VM with an elevated module refcount.
Addressing KVM's inability to play nice with "rmmod --force" is firmly
out-of-scope. Forcefully unloading any module taints kernel (for obvious
reasons) _and_ requires the kernel to be built with
CONFIG_MODULE_FORCE_UNLOAD=y, which is off by default and comes with the
amusing disclaimer that it's "mainly for kernel developers and desperate
users". In other words, KVM is free to scoff at bug reports due to using
"rmmod --force" while VMs may be running.
Fixes: 87cf79539c74 ("KVM: Prevent module exit until all VMs are freed") Cc: stable@vger.kernel.org Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220816053937.2477106-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: Properly unwind VM creation if creating debugfs fails
Properly unwind VM creation if kvm_create_vm_debugfs() fails. A recent
change to invoke kvm_create_vm_debug() in kvm_create_vm() was led astray
by buggy try_get_module() handling adding by commit 87cf79539c74 ("KVM:
Prevent module exit until all VMs are freed"). The debugfs error path
effectively inherits the bad error path of try_module_get(), e.g. KVM
leaves the to-be-free VM on vm_list even though KVM appears to do the
right thing by calling module_put() and falling through.
Opportunistically hoist kvm_create_vm_debugfs() above the call to
kvm_arch_post_init_vm() so that the "post-init" arch hook is actually
invoked after the VM is initialized (ignoring kvm_coalesced_mmio_init()
for the moment). x86 is the only non-nop implementation of the post-init
hook, and it doesn't allocate/initialize any objects that are reachable
via debugfs code (spawns a kthread worker for the NX huge page mitigation).
Leave the buggy try_get_module() alone for now, it will be fixed in a
separate commit.
Fixes: 84a858c36633 ("KVM: Actually create debugfs in kvm_create_vm()") Reported-by: syzbot+744e173caec2e1627ee0@syzkaller.appspotmail.com Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20220816053937.2477106-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- net: fix suspicious RCU usage in bpf_sk_reuseport_detach()
Current release - new code bugs:
- mlxsw: ptp: fix a couple of races, static checker warnings and
error handling
Previous releases - regressions:
- netfilter:
- nf_tables: fix possible module reference underflow in error path
- make conntrack helpers deal with BIG TCP (skbs > 64kB)
- nfnetlink: re-enable conntrack expectation events
- net: fix potential refcount leak in ndisc_router_discovery()
Previous releases - always broken:
- sched: cls_route: disallow handle of 0
- neigh: fix possible local DoS due to net iface start/stop loop
- rtnetlink: fix module refcount leak in rtnetlink_rcv_msg
- sched: fix adding qlen to qcpu->backlog in gnet_stats_add_queue_cpu
- virtio_net: fix endian-ness for RSS
- dsa: mv88e6060: prevent crash on an unused port
- fec: fix timer capture timing in `fec_ptp_enable_pps()`
- ocelot: stats: fix races, integer wrapping and reading incorrect
registers (the change of register definitions here accounts for
bulk of the changed LoC in this PR)"
* tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
net: moxa: MAC address reading, generating, validity checking
tcp: handle pure FIN case correctly
tcp: refactor tcp_read_skb() a bit
tcp: fix tcp_cleanup_rbuf() for tcp_read_skb()
tcp: fix sock skb accounting in tcp_read_skb()
igb: Add lock to avoid data race
dt-bindings: Fix incorrect "the the" corrections
net: genl: fix error path memory leak in policy dumping
stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove()
net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_run
net/mlx5e: Allocate flow steering storage during uplink initialization
net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats
net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset
net: mscc: ocelot: make struct ocelot_stat_layout array indexable
net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work
net: mscc: ocelot: turn stats_lock into a spinlock
net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter
net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters
net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters
net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it
...
Linus Torvalds [Fri, 19 Aug 2022 02:24:57 +0000 (19:24 -0700)]
Merge tag 'linux-kselftest-next-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
- fix landlock test build regression
* tag 'linux-kselftest-next-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/landlock: fix broken include of linux/landlock.h
Linus Torvalds [Fri, 19 Aug 2022 02:18:28 +0000 (19:18 -0700)]
Merge tag 'trace-rtla-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull rtla tool fixes from Steven Rostedt:
"Fixes for the Real-Time Linux Analysis tooling:
- Fix tracer name in comments and prints
- Fix setting up symlinks
- Allow extra flags to be set in build
- Consolidate and show all necessary libraries not found in build
error"
* tag 'trace-rtla-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
rtla: Consolidate and show all necessary libraries that failed for building
tools/rtla: Build with EXTRA_{C,LD}FLAGS
tools/rtla: Fix command symlinks
rtla: Fix tracer name
Conor Dooley [Fri, 12 Aug 2022 14:35:32 +0000 (15:35 +0100)]
perf: riscv legacy: fix kerneldoc comment warning
Fix the warning:
drivers/perf/riscv_pmu_legacy.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Sergei Antonov [Thu, 18 Aug 2022 09:23:17 +0000 (12:23 +0300)]
net: moxa: MAC address reading, generating, validity checking
This device does not remember its MAC address, so add a possibility
to get it from the platform. If it fails, generate a random address.
This will provide a MAC address early during boot without user space
being involved.
Also remove extra calls to is_valid_ether_addr().
Made after suggestions by Andrew Lunn:
1) Use eth_hw_addr_random() to assign a random MAC address during probe.
2) Remove is_valid_ether_addr() from moxart_mac_open()
3) Add a call to platform_get_ethdev_address() during probe
4) Remove is_valid_ether_addr() from moxart_set_mac_address(). The core does this
v1 -> v2:
Handle EPROBE_DEFER returned from platform_get_ethdev_address().
Move MAC reading code to the beginning of the probe function.
Signed-off-by: Sergei Antonov <saproj@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> CC: Yang Yingliang <yangyingliang@huawei.com> CC: Pavel Skripkin <paskripkin@gmail.com> CC: Guobin Huang <huangguobin4@huawei.com> CC: Yang Wei <yang.wei9@zte.com.cn> CC: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220818092317.529557-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
tcp: some bug fixes for tcp_read_skb()
This patchset contains 3 bug fixes and 1 minor refactor patch for
tcp_read_skb(). V1 only had the first patch, as Eric prefers to fix all
of them together, I have to group them together.
====================
Cong Wang [Wed, 17 Aug 2022 19:54:45 +0000 (12:54 -0700)]
tcp: handle pure FIN case correctly
When skb->len==0, the recv_actor() returns 0 too, but we also use 0
for error conditions. This patch amends this by propagating the errors
to tcp_read_skb() so that we can distinguish skb->len==0 case from
error cases.
Fixes: 5b1c8052b4a6 ("tcp: Introduce tcp_read_skb()") Reported-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cong Wang [Wed, 17 Aug 2022 19:54:44 +0000 (12:54 -0700)]
tcp: refactor tcp_read_skb() a bit
As tcp_read_skb() only reads one skb at a time, the while loop is
unnecessary, we can turn it into an if. This also simplifies the
code logic.
Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cong Wang [Wed, 17 Aug 2022 19:54:43 +0000 (12:54 -0700)]
tcp: fix tcp_cleanup_rbuf() for tcp_read_skb()
tcp_cleanup_rbuf() retrieves the skb from sk_receive_queue, it
assumes the skb is not yet dequeued. This is no longer true for
tcp_read_skb() case where we dequeue the skb first.
Fix this by introducing a helper __tcp_cleanup_rbuf() which does
not require any skb and calling it in tcp_read_skb().
Fixes: 5b1c8052b4a6 ("tcp: Introduce tcp_read_skb()") Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cong Wang [Wed, 17 Aug 2022 19:54:42 +0000 (12:54 -0700)]
tcp: fix sock skb accounting in tcp_read_skb()
Before commit 94b02d77281e ("net: Introduce a new proto_ops
->read_skb()"), skb was not dequeued from receive queue hence
when we close TCP socket skb can be just flushed synchronously.
After this commit, we have to uncharge skb immediately after being
dequeued, otherwise it is still charged in the original sock. And we
still need to retain skb->sk, as eBPF programs may extract sock
information from skb->sk. Therefore, we have to call
skb_set_owner_sk_safe() here.
Fixes: 94b02d77281e ("net: Introduce a new proto_ops ->read_skb()") Reported-and-tested-by: syzbot+a0e6f8738b58f7654417@syzkaller.appspotmail.com Tested-by: Stanislav Fomichev <sdf@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lin Ma [Wed, 17 Aug 2022 18:49:21 +0000 (11:49 -0700)]
igb: Add lock to avoid data race
The commit 0369858903c6 ("igb: Teardown SR-IOV before
unregister_netdev()") places the unregister_netdev() call after the
igb_disable_sriov() call to avoid functionality issue.
However, it introduces several race conditions when detaching a device.
For example, when .remove() is called, the below interleaving leads to
use-after-free.
To this end, this commit first eliminates the data races from netdev
core by using rtnl_lock (similar to commit b96dc6eb861b ("dpaa2-eth: add
MAC/PHY support through phylink")). And then adds a spinlock to
eliminate races from driver requests. (similar to commit 89d2d79baab9
("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero")
Fixes: 0369858903c6 ("igb: Teardown SR-IOV before unregister_netdev()") Signed-off-by: Lin Ma <linma@zju.edu.cn> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220817184921.735244-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 18 Aug 2022 18:02:11 +0000 (11:02 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-08-17 (ice)
This series contains updates to ice driver only.
Grzegorz prevents modifications to VLAN 0 when setting VLAN promiscuous
as it will already be set. He also ignores -EEXIST error when attempting
to set promiscuous and ensures promiscuous mode is properly cleared from
the hardware when being removed.
Benjamin ignores additional -EEXIST errors when setting promiscuous mode
since the existing mode is the desired mode.
Sylwester fixes VFs to allow sending of tagged traffic when no VLAN filters
exist.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix VF not able to send tagged traffic with no VLAN filters
ice: Ignore error message when setting same promiscuous mode
ice: Fix clearing of promisc mode with bridge over bond
ice: Ignore EEXIST when setting promisc mode
ice: Fix double VLAN error when entering promisc mode
====================
Jakub Kicinski [Tue, 16 Aug 2022 16:19:39 +0000 (09:19 -0700)]
net: genl: fix error path memory leak in policy dumping
If construction of the array of policies fails when recording
non-first policy we need to unwind.
netlink_policy_dump_add_policy() itself also needs fixing as
it currently gives up on error without recording the allocated
pointer in the pstate pointer.
Reported-by: syzbot+dc54d9ba8153b216cae0@syzkaller.appspotmail.com Fixes: 95d4e7fd2965 ("genetlink: properly support per-op policy dumping") Link: https://lore.kernel.org/r/20220816161939.577583-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove()
Commit e57086dbdee4 ("stmmac: intel: Fix clock handling on error and remove
paths") removed this clk_disable_unprepare()
This was partly revert by commit c03cc9b5591f ("net: stmmac: Fix clock
handling on remove path") which removed this clk_disable_unprepare()
because:
"
While unloading the dwmac-intel driver, clk_disable_unprepare() is
being called twice in stmmac_dvr_remove() and
intel_eth_pci_remove(). This causes kernel panic on the second call.
"
However later on, commit 9a84d76486da8 ("net: stmmac: add clocks management
for gmac driver") has updated stmmac_dvr_remove() which do not call
clk_disable_unprepare() anymore.
So this call should now be called from intel_eth_pci_remove().
Jens Wiklander [Thu, 18 Aug 2022 11:08:59 +0000 (13:08 +0200)]
tee: add overflow check in register_shm_helper()
With special lengths supplied by user space, register_shm_helper() has
an integer overflow when calculating the number of pages covered by a
supplied user space memory region.
This causes internal_get_user_pages_fast() a helper function of
pin_user_pages_fast() to do a NULL pointer dereference:
Yufen Yu [Wed, 3 Aug 2022 02:33:55 +0000 (10:33 +0800)]
blk-mq: run queue no matter whether the request is the last request
We do test on a virtio scsi device (/dev/sda) and the default mq
scheduler is 'none'. We found a IO hung as following:
blk_finish_plug
blk_mq_plug_issue_direct
scsi_mq_get_budget
//get budget_token fail and sdev->restarts=1
scsi_end_request
scsi_run_queue_async
//sdev->restart=0 and run queue
blk_mq_request_bypass_insert
//add request to hctx->dispatch list
//continue to dispath plug list
blk_mq_dispatch_plug_list
blk_mq_try_issue_list_directly
//success issue all requests from plug list
After .get_budget fail, scsi_mq_get_budget will increase 'restarts'.
Normally, it will run hw queue when io complete and set 'restarts'
as 0. But if we run queue before adding request to the dispatch list
and blk_mq_dispatch_plug_list also success issue all requests, then
on one will run queue, and the request will be stall in the dispatch
list and cannot complete forever.
It is wrong to use last request of plug list to decide if run queue is
needed since all the remained requests in plug list may be from other
hctxs. To fix the bug, pass run_queue as true always to
blk_mq_request_bypass_insert().
Fix-suggested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Fixes: 4b563c78d628 ("block: attempt direct issue of plug list") Link: https://lore.kernel.org/r/20220803023355.3687360-1-yuyufen@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>