Linus Torvalds [Fri, 19 Nov 2021 19:33:31 +0000 (11:33 -0800)]
Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit-vs-signal handling fixes from Eric Biederman:
"This is a small set of changes where debuggers were no longer able to
intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit
cleanups.
This is essentially the change you suggested with all of i's dotted
and the t's crossed so that ptrace can intercept all of the cases it
has been able to intercept the past, and all of the cases that made it
to exit without giving ptrace a chance still don't give ptrace a
chance"
* 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Replace force_fatal_sig with force_exit_sig when in doubt
signal: Don't always set SA_IMMUTABLE for forced signals
Linus Torvalds [Fri, 19 Nov 2021 19:19:58 +0000 (11:19 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six fixes, five in drivers (ufs, qla2xxx, iscsi) and one core change
to fix a regression in user space device state setting, which is used
by the iscsi daemons to effect device recovery"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix mailbox direction flags in qla2xxx_get_adapter_id()
scsi: ufs: core: Fix another task management completion race
scsi: ufs: core: Fix task management completion timeout race
scsi: core: sysfs: Fix hang when device state is set via sysfs
scsi: iscsi: Unblock session then wake up error handler
scsi: ufs: core: Improve SCSI abort handling
Linus Torvalds [Fri, 19 Nov 2021 19:07:13 +0000 (11:07 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"There are a few big regression items from the merge window suggesting
that people are testing rc1's but not testing the for-next branches:
- Warnings fixes
- Crash in hf1 when creating QPs and setting counters
- Some old mlx4 cards fail to probe due to missing counters
- Syzkaller crash in the new counters code"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
MAINTAINERS: Update for VMware PVRDMA driver
RDMA/nldev: Check stat attribute before accessing it
RDMA/mlx4: Do not fail the registration on port stats
IB/hfi1: Properly allocate rdma counter desc memory
RDMA/core: Set send and receive CQ before forwarding to the driver
RDMA/netlink: Add __maybe_unused to static inline in C file
Linus Torvalds [Fri, 19 Nov 2021 18:50:11 +0000 (10:50 -0800)]
Merge tag 'drm-fixes-2021-11-19' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This week's fixes, pretty quiet, about right for rc2. amdgpu is the
bulk of them but the scheduler ones have been reported in a few places
I think.
Otherwise just some minor i915 fixes and a few other scattered around:
scheduler:
- two refcounting fixes
cma-helper:
- use correct free path for noncoherent
efifb:
- probing fix
amdgpu:
- Better debugging info for SMU msgs
- Better error reporting when adding IP blocks
- Fix UVD powergating regression on CZ
- Clock reporting fix for navi1x
- OLED panel backlight fix
- Fix scaling on VGA/DVI for non-DC display code
- Fix GLFCLK handling for RGP on some APUs
- fix potential memory leak
* tag 'drm-fixes-2021-11-19' of git://anongit.freedesktop.org/drm/drm:
drm/amd/amdgpu: fix potential memleak
drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
drm/amd/pm: add GFXCLK/SCLK clocks level print support for APUs
drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
drm/amd/display: Fix OLED brightness control on eDP
drm/amd/pm: Remove artificial freq level on Navi1x
drm/amd/pm: avoid duplicate powergate/ungate setting
drm/amdgpu: add error print when failing to add IP block(v2)
drm/amd/pm: Enhanced reporting also for a stuck command
drm/i915/guc: fix NULL vs IS_ERR() checking
drm/i915/dsi/xelpd: Fix the bit mask for wakeup GB
Revert "drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping"
fbdev: Prevent probing generic drivers if a FB is already registered
drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY
drm/cma-helper: Release non-coherent memory with dma_free_noncoherent()
drm/nouveau: hdmigv100.c: fix corrupted HDMI Vendor InfoFrame
Peter Zijlstra [Fri, 19 Nov 2021 09:29:47 +0000 (10:29 +0100)]
x86: Pin task-stack in __get_wchan()
When commit 80472ff2479d ("x86: Fix __get_wchan() for !STACKTRACE")
moved from stacktrace to native unwind_*() usage, the
try_get_task_stack() got lost, leading to use-after-free issues for
dying tasks.
signal: Replace force_fatal_sig with force_exit_sig when in doubt
Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.
Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit. This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.
This avoids userspace regressions as older kernels exited with do_exit
which debuggers also can not intercept.
In the future is should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig. That can be done where it matters on
a case-by-case basis with careful analysis.
Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: a2a2dd8f85de ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: 856705c39f8a ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: fc99908855da ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 21d81bf71bfe ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: e98d91d2e629 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: 908486441621 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: eed3b85106a0 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: cac0eb1efcc3 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 05466dc15414 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
signal: Don't always set SA_IMMUTABLE for forced signals
Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be
able to trap synchronous SIGTRAP and SIGSEGV even when the target
process is not configured to handle those signals.
Update force_sig_to_task to support both the case when we can allow
the debugger to intercept and possibly ignore the signal and the case
when it is not safe to let userspace know about the signal until the
process has exited.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> Cc: stable@vger.kernel.org
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: a2a2dd8f85de ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Link: https://lkml.kernel.org/r/877dd5qfw5.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Dave Airlie [Fri, 19 Nov 2021 03:28:54 +0000 (13:28 +1000)]
Merge tag 'drm-misc-fixes-2021-11-18' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A infoframe corruption fix for nouveau, a wrong free function usage fix
for GEM CMA helpers, a Kconfig dependency fix for sun4i, two fixes for
drm/scheduler refcounting and a probing fix for efifb.
Linus Torvalds [Fri, 19 Nov 2021 01:09:05 +0000 (17:09 -0800)]
Merge tag 'zstd-for-linus-5.16-rc1' of git://github.com/terrelln/linux
Pull zstd fixes from Nick Terrell:
"Fix stack usage on parisc & improve code size bloat
This contains three commits:
1. Fixes a minor unused variable warning reported by Kernel test
robot [0].
2. Improves the reported code bloat (-88KB / 374KB) [1] by outlining
some functions that are unlikely to be used in performance
sensitive workloads.
3. Fixes the reported excess stack usage on parisc [2] by removing
-O3 from zstd's compilation flags. -O3 triggered bugs in the
hppa-linux-gnu gcc-8 compiler. -O2 performance is acceptable:
neutral compression, about -1% decompression speed. We also reduce
code bloat (-105KB / 374KB).
After this our code bloat is cut from 374KB to 105KB with gcc-11. If
we wanted to cut the remaining 105KB we'd likely have to trade
signicant performance, so I want to say that this is enough for now.
We should be able to get further gains without sacrificing speed, but
that will take some significant optimization effort, and isn't
suitable for a quick fix. I've opened an upstream issue [3] to track
the code size, and try to avoid future regressions, and improve it in
the long term"
Linus Torvalds [Thu, 18 Nov 2021 22:52:24 +0000 (14:52 -0800)]
Merge tag 'thermal-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix the handling of thermal zones during system resume and
disable building of the int340x thermal driver on 32-bit.
Specifics:
- Prevent the previous high and low thermal zone trip values from
being retained over a system suspend-resume cycle (Manaf
Meethalavalappu Pallikunhi)
- Prevent the int340x thermal driver from being built in 32-bit
kernel configurations, because running it on 32-bit is questionable
(Arnd Bergmann)"
* tag 'thermal-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Reset previous low and high trip during thermal zone init
thermal: int340x: Limit Kconfig to 64-bit
Linus Torvalds [Thu, 18 Nov 2021 22:46:28 +0000 (14:46 -0800)]
Merge tag 'pm-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a system-wide suspend issue in the DTPM framework and
improve the Energy Model documentation.
Specifics:
- Fix system suspend handling in DTPM when it is enabled, but not
actually used (Daniel Lezcano)
- Describe the new cpufreq callback for Energy Model registration and
explain the "advanced" and "simple" EM variants in the EM
documentation (Lukasz Luba)"
* tag 'pm-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Documentation: power: Describe 'advanced' and 'simple' EM models
Documentation: power: Add description about new callback for EM registration
powercap: DTPM: Fix suspend failure and kernel warning
Linus Torvalds [Thu, 18 Nov 2021 22:42:36 +0000 (14:42 -0800)]
Merge tag 'acpi-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert the change attempting to release PM resources blocked by unused
ACPI objects after device enumeration, because it caused boot issues
to appear on multiple systems"
* tag 'acpi-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: scan: Release PM resources blocked by unused objects"
Linus Torvalds [Thu, 18 Nov 2021 22:39:40 +0000 (14:39 -0800)]
Merge tag 'platform-drivers-x86-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Various build- and bug-fixes as well as one hardware-id addition"
* tag 'platform-drivers-x86-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: fix documentation for adaptive keyboard
platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep
platform/x86: thinkpad_acpi: Add support for dual fan control
platform/x86: think-lmi: Abort probe on analyze failure
platform/x86: dell-wmi-descriptor: disable by default
platform/x86: samsung-laptop: Fix typo in a comment
platform/x86: hp_accel: Fix an error handling path in 'lis3lv02d_probe()'
platform/x86: amd-pmc: Make CONFIG_AMD_PMC depend on RTC_CLASS
platform/mellanox: mlxreg-lc: fix error code in mlxreg_lc_create_static_devices()
Linus Torvalds [Thu, 18 Nov 2021 22:35:41 +0000 (14:35 -0800)]
Merge tag 'spi-fix-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small fixes for v5.16, one in the core for an issue with
handling of controller unregistration that was introduced with the
fixes for registering nested SPI controllers and a few more minor
device specific ones"
* tag 'spi-fix-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fix use-after-free of the add_lock mutex
spi: spi-geni-qcom: fix error handling in spi_geni_grab_gpi_chan()
spi: lpspi: Silence error message upon deferred probe
spi: cadence-quadspi: fix write completion support
Nick Terrell [Tue, 16 Nov 2021 23:11:39 +0000 (15:11 -0800)]
lib: zstd: Don't add -O3 to cflags
After the update to zstd-1.4.10 passing -O3 is no longer necessary to
get good performance from zstd. Using the default optimization level -O2
is sufficient to get good performance.
I've measured no significant change to compression speed, and a ~1%
decompression speed loss, which is acceptable.
This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
stacks that are ~3KB. With -O2 these same functions generate stacks in
the < 100B, completely fixing the problem. Function size deltas are
listed below:
Nick Terrell [Tue, 16 Nov 2021 04:33:08 +0000 (20:33 -0800)]
lib: zstd: Don't inline functions in zstd_opt.c
`zstd_opt.c` contains the match finder for the highest compression
levels. These levels are already very slow, and are unlikely to be used
in the kernel. If they are used, they shouldn't be used in latency
sensitive workloads, so slowing them down shouldn't be a big deal.
This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
I've also opened an issue upstream [1] so that we can properly tackle
the code size issue in `zstd_opt.c` for all users, and can hopefully
remove this hack in the next zstd version we import.
Nick Terrell [Tue, 16 Nov 2021 03:08:19 +0000 (19:08 -0800)]
lib: zstd: Fix unused variable warning
The variable `litLengthSum` is only used by an `assert()`, so when
asserts are disabled the compiler doesn't see any usage and warns.
This issue is already fixed upstream by PR #2838 [0]. It was reported
by the Kernel test robot in [1].
Another approach would be to change zstd's disabled `assert()`
definition to use the argument in a disabled branch, instead of
ignoring the argument. I've avoided this approach because there are
some small changes necessary to get zstd to build, and I would
want to thoroughly re-test for performance, since that is slightly
changing the code in every function in zstd. It seems like a
trivial change, but some functions are pretty sensitive to small
changes. However, I think it is a valid approach that I would
like to see upstream take, so I've opened Issue #2868 to attempt
this upstream.
Lastly, I've chosen not to use __maybe_unused because all code
in lib/zstd/ must eventually be upstreamed. Upstream zstd can't
use __maybe_unused because it isn't portable across all compilers.
- nl80211: fix getting radio statistics in survey dump
- e100: fix device suspend/resume
Previous releases - always broken:
- tcp: fix uninitialized access in skb frags array for Rx 0cp
- bpf: fix toctou on read-only map's constant scalar tracking
- bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing
progs
- tipc: only accept encrypted MSG_CRYPTO msgs
- smc: transfer remaining wait queue entries during fallback, fix
missing wake ups
- udp: validate checksum in udp_read_sock() (when sockmap is used)
- sched: act_mirred: drop dst for the direction from egress to
ingress
- virtio_net_hdr_to_skb: count transport header in UFO, prevent
allowing bad skbs into the stack
- nfc: reorder the logic in nfc_{un,}register_device, fix unregister
- ipsec: check return value of ipv6_skip_exthdr
- usb: r8152: add MAC passthrough support for more Lenovo Docks"
* tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
ptp: ocp: Fix a couple NULL vs IS_ERR() checks
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
ipv6: check return value of ipv6_skip_exthdr
e100: fix device suspend/resume
devlink: Don't throw an error if flash notification sent before devlink visible
page_pool: Revert "page_pool: disable dma mapping support..."
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
octeontx2-af: debugfs: don't corrupt user memory
NFC: add NCI_UNREG flag to eliminate the race
NFC: reorder the logic in nfc_{un,}register_device
NFC: reorganize the functions in nci_request
tipc: check for null after calling kmemdup
i40e: Fix display error code in dmesg
i40e: Fix creation of first queue by omitting it if is not power of two
i40e: Fix warning message and call stack during rmmod i40e driver
i40e: Fix ping is lost after configuring ADq on VF
i40e: Fix changing previously set num_queue_pairs for PFs
i40e: Fix NULL ptr dereference on VSI filter sync
i40e: Fix correct max_pkt_size on VF RX queue
...
Linus Torvalds [Thu, 18 Nov 2021 20:41:14 +0000 (12:41 -0800)]
Merge tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Several xes and one old ioctl deprecation. Namely there's fix for
crashes/warnings with lzo compression that was suspected to be caused
by first pull merge resolution, but it was a different bug.
Summary:
- regression fix for a crash in lzo due to missing boundary checks of
the page array
- fix crashes on ARM64 due to missing barriers when synchronizing
status bits between work queues
- silence lockdep when reading chunk tree during mount
- fix false positive warning in integrity checker on devices with
disabled write caching
- fix signedness of bitfields in scrub
- start deprecation of balance v1 ioctl"
* tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: deprecate BTRFS_IOC_BALANCE ioctl
btrfs: make 1-bit bit-fields of scrub_page unsigned int
btrfs: check-integrity: fix a warning on write caching disabled disk
btrfs: silence lockdep when reading chunk tree during mount
btrfs: fix memory ordering between normal and ordered work functions
btrfs: fix a out-of-bound access in copy_compressed_data_to_page()
Linus Torvalds [Thu, 18 Nov 2021 20:17:33 +0000 (12:17 -0800)]
Merge tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull setattr idmapping fix from Christian Brauner:
"This contains a simple fix for setattr. When determining the validity
of the attributes the ia_{g,u}id fields contain the value that will be
written to inode->i_{g,u}id. When the {g,u}id attribute of the file
isn't altered and the caller's fs{g,u}id matches the current {g,u}id
attribute the attribute change is allowed.
The value in ia_{g,u}id does already account for idmapped mounts and
will have taken the relevant idmapping into account. So in order to
verify that the {g,u}id attribute isn't changed we simple need to
compare the ia_{g,u}id value against the inode's i_{g,u}id value.
This only has any meaning for idmapped mounts as idmapping helpers are
idempotent without them. And for idmapped mounts this really only has
a meaning when circular idmappings are used, i.e. mappings where e.g.
id 1000 is mapped to id 1001 and id 1001 is mapped to id 1000. Such
ciruclar mappings can e.g. be useful when sharing the same home
directory between multiple users at the same time.
Before this patch we could end up denying legitimate attribute changes
and allowing invalid attribute changes when circular mappings are
used. To even get into this situation the caller must've been
privileged both to create that mapping and to create that idmapped
mount.
This hasn't been seen in the wild anywhere but came up when expanding
the fstest suite during work on a series of hardening patches. All
idmapped fstests pass without any regressions and we're adding new
tests to verify the behavior of circular mappings.
Linus Torvalds [Thu, 18 Nov 2021 20:13:24 +0000 (12:13 -0800)]
Merge tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"parisc bug and warning fixes and wire up futex_waitv.
Fix some warnings which showed up with allmodconfig builds, a revert
of a change to the sigreturn trampoline which broke signal handling,
wire up futex_waitv and add CONFIG_PRINTK_TIME=y to 32bit defconfig"
* tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Enable CONFIG_PRINTK_TIME=y in 32bit defconfig
Revert "parisc: Reduce sigreturn trampoline to 3 instructions"
parisc: Wrap assembler related defines inside __ASSEMBLY__
parisc: Wire up futex_waitv
parisc: Include stringify.h to avoid build error in crypto/api.c
parisc/sticon: fix reverse colors
Linus Torvalds [Thu, 18 Nov 2021 20:05:22 +0000 (12:05 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Selftest changes:
- Cleanups for the perf test infrastructure and mapping hugepages
- Avoid contention on mmap_sem when the guests start to run
- Add event channel upcall support to xen_shinfo_test
x86 changes:
- Fixes for Xen emulation
- Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
- Fixes for migration of 32-bit nested guests on 64-bit hypervisor
- Compilation fixes
- More SEV cleanups
Generic:
- Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
and num_online_cpus(). Most architectures were only using one of
the two"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
KVM: x86: Assume a 64-bit hypercall for guests with protected state
selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
riscv: kvm: fix non-kernel-doc comment block
KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
KVM: SEV: Drop a redundant setting of sev->asid during initialization
KVM: SEV: WARN if SEV-ES is marked active but SEV is not
KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
KVM: x86/xen: Use sizeof_field() instead of open-coding it
KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
...
Linus Torvalds [Thu, 18 Nov 2021 19:01:06 +0000 (11:01 -0800)]
Merge tag 'docs-5.16-2' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A handful of documentation fixes for 5.16"
* tag 'docs-5.16-2' of git://git.lwn.net/linux:
Documentation/process: fix a cross reference
Documentation: update vcpu-requests.rst reference
docs: accounting: update delay-accounting.rst reference
libbpf: update index.rst reference
docs: filesystems: Fix grammatical error "with" to "which"
doc/zh_CN: fix a translation error in management-style
docs: ftrace: fix the wrong path of tracefs
Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document
Documentation: arm: marvell: Put Armada XP section between Armada 370 and 375
Documentation: arm: marvell: Add some links to homepage / product infos
docs: Update Sphinx requirements
Linus Torvalds [Thu, 18 Nov 2021 18:50:45 +0000 (10:50 -0800)]
Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
- Try to flush backtraces from other CPUs also on the local one. This
was a regression caused by printk_safe buffers removal.
- Remove header dependency warning.
* tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Remove printk.h inclusion in percpu.h
printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
Dan Carpenter [Thu, 18 Nov 2021 11:22:11 +0000 (14:22 +0300)]
ptp: ocp: Fix a couple NULL vs IS_ERR() checks
The ptp_ocp_get_mem() function does not return NULL, it returns error
pointers.
Fixes: a964dcf3f0ae ("ptp: ocp: Expose various resources on the timecard.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Teng Qi [Thu, 18 Nov 2021 07:01:18 +0000 (15:01 +0800)]
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
The definition of macro MOTO_SROM_BUG is:
#define MOTO_SROM_BUG (lp->active == 8 && (get_unaligned_le32(
dev->dev_addr) & 0x00ffffff) == 0x3e0008)
and the if statement
if (MOTO_SROM_BUG) lp->active = 0;
using this macro indicates lp->active could be 8. If lp->active is 8 and
the second comparison of this macro is false. lp->active will remain 8 in:
lp->phy[lp->active].gep = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].rst = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].mc = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].ana = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].fdx = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].ttm = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].mci = *p;
However, the length of array lp->phy is 8, so array overflows can occur.
To fix these possible array overflows, we first check lp->active and then
return -EINVAL if it is greater or equal to ARRAY_SIZE(lp->phy) (i.e. 8).
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Teng Qi <starmiku1207184332@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Nov 2021 11:48:33 +0000 (11:48 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-
queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-11-17
This series contains updates to i40e driver only.
Eryk adds accounting for VLAN header in packet size when VF port VLAN is
configured. He also fixes TC queue distribution when the user has changed
queue counts as well as for configuration of VF ADQ which caused dropped
packets.
Michal adds tracking for when a VSI is being released to prevent null
pointer dereference when managing filters.
Karen ensures PF successfully initiates VF requested reset which could
cause a call trace otherwise.
Jedrzej moves validation of channel queue value earlier to prevent
partial configuration when the value is invalid.
Grzegorz corrects the reported error when adding filter fails.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jordy Zomer [Wed, 17 Nov 2021 19:06:48 +0000 (20:06 +0100)]
ipv6: check return value of ipv6_skip_exthdr
The offset value is used in pointer math on skb->data.
Since ipv6_skip_exthdr may return -1 the pointer to uh and th
may not point to the actual udp and tcp headers and potentially
overwrite other stuff. This is why I think this should be checked.
EDIT: added {}'s, thanks Kees
Signed-off-by: Jordy Zomer <jordy@pwning.systems> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Wed, 17 Nov 2021 20:59:52 +0000 (12:59 -0800)]
e100: fix device suspend/resume
As reported in [1], e100 was no longer working for suspend/resume
cycles. The previous commit mentioned in the fixes appears to have
broken things and this attempts to practice best known methods for
device power management and keep wake-up working while allowing
suspend/resume to work. To do this, I reorder a little bit of code
and fix the resume path to make sure the device is enabled.
Leon Romanovsky [Wed, 17 Nov 2021 14:49:09 +0000 (16:49 +0200)]
devlink: Don't throw an error if flash notification sent before devlink visible
The mlxsw driver calls to various devlink flash routines even before
users can get any access to the devlink instance itself. For example,
mlxsw_core_fw_rev_validate() one of such functions.
It causes to the WARN_ON to trigger warning about devlink not registered.
Fixes: 22bb49dfd49d ("devlink: Notify users when objects are accessible") Reported-by: Danielle Ratson <danieller@nvidia.com> Tested-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Guillaume in [1]:
Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT
in 32-bit systems, which breaks the bootup proceess when a
ethernet driver is using page pool with PP_FLAG_DMA_MAP flag.
As we were hoping we had no active consumers for such system
when we removed the dma mapping support, and LPAE seems like
a common feature for 32 bits system, so revert it.
Teng Qi [Wed, 17 Nov 2021 03:44:53 +0000 (11:44 +0800)]
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
The if statement:
if (port >= DSAF_GE_NUM)
return;
limits the value of port less than DSAF_GE_NUM (i.e., 8).
However, if the value of port is 6 or 7, an array overflow could occur:
port_rst_off = dsaf_dev->mac_cb[port]->port_rst_off;
because the length of dsaf_dev->mac_cb is DSAF_MAX_PORT_NUM (i.e., 6).
To fix this possible array overflow, we first check port and if it is
greater than or equal to DSAF_MAX_PORT_NUM, the function returns.
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Teng Qi <starmiku1207184332@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Helge Deller [Tue, 16 Nov 2021 12:12:21 +0000 (13:12 +0100)]
parisc: Wrap assembler related defines inside __ASSEMBLY__
Building allmodconfig shows errors in the gpu/drm/msm snapdragon drivers,
because a COND() define is used there which conflicts with the COND() for
PA-RISC assembly. Although the snapdragon driver isn't relevant for parisc, it
is nevertheless compiled when CONFIG_COMPILE_TEST is defined.
Move the COND() define and other PA-RISC mnemonics inside the #ifdef
__ASSEMBLY__ part to avoid this conflict.
Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: kernel test robot <lkp@intel.com>
Helge Deller [Mon, 15 Nov 2021 16:34:50 +0000 (17:34 +0100)]
parisc: Include stringify.h to avoid build error in crypto/api.c
Include stringify.h to avoid this build error:
arch/parisc/include/asm/jump_label.h: error: expected ':' before '__stringify'
arch/parisc/include/asm/jump_label.h: error: label 'l_yes' defined but not used [-Werror=unused-label]
Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: kernel test robot <lkp@intel.com>
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:42 +0000 (17:34 +0100)]
KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
KVM_CAP_NR_VCPUS is a legacy advisory value which on other architectures
return num_online_cpus() caped by KVM_CAP_NR_VCPUS or something else
(ppc and arm64 are special cases). On s390, KVM_CAP_NR_VCPUS returns
the same as KVM_CAP_MAX_VCPUS and this may turn out to be a bad
'advice'. Switch s390 to returning caped num_online_cpus() too.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Message-Id: <20211116163443.88707-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:38 +0000 (17:34 +0100)]
KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
Generally, it doesn't make sense to return the recommended maximum number
of vCPUs which exceeds the maximum possible number of vCPUs.
Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs
depending on whether it is a system-wide ioctl or a per-VM one. Previously,
KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to
keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
which is what gets returned by system-wide KVM_CAP_MAX_VCPUS.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-2-vkuznets@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tom Lendacky [Mon, 24 May 2021 17:48:57 +0000 (12:48 -0500)]
KVM: x86: Assume a 64-bit hypercall for guests with protected state
When processing a hypercall for a guest with protected state, currently
SEV-ES guests, the guest CS segment register can't be checked to
determine if the guest is in 64-bit mode. For an SEV-ES guest, it is
expected that communication between the guest and the hypervisor is
performed to shared memory using the GHCB. In order to use the GHCB, the
guest must have been in long mode, otherwise writes by the guest to the
GHCB would be encrypted and not be able to be comprehended by the
hypervisor.
Create a new helper function, is_64_bit_hypercall(), that assumes the
guest is in 64-bit mode when the guest has protected state, and returns
true, otherwise invoking is_64_bit_mode() to determine the mode. Update
the hypercall related routines to use is_64_bit_hypercall() instead of
is_64_bit_mode().
Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurences of calls to
this helper function for a guest running with protected state.
Fixes: ec010ce9a72d ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Randy Dunlap [Sun, 7 Nov 2021 03:47:06 +0000 (20:47 -0700)]
riscv: kvm: fix non-kernel-doc comment block
Don't use "/**" to begin a comment block for a non-kernel-doc comment.
Prevents this docs build warning:
vcpu_sbi.c:3: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Copyright (c) 2019 Western Digital Corporation or its affiliates.
Fixes: 073026e601ac ("RISC-V: KVM: Add SBI v0.1 support") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Atish Patra <atish.patra@wdc.com> Cc: Anup Patel <anup.patel@wdc.com> Cc: kvm@vger.kernel.org Cc: kvm-riscv@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu>
Message-Id: <20211107034706.30672-1-rdunlap@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
Rename cmd_allowed_from_miror() to is_cmd_allowed_from_mirror(), fixing
a typo and making it obvious that the result is a boolean where
false means "not allowed".
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Drop a redundant setting of sev->asid during initialization
Remove a fully redundant write to sev->asid during SEV/SEV-ES guest
initialization. The ASID is set a few lines earlier prior to the call to
sev_platform_init(), which doesn't take "sev" as a param, i.e. can't
muck with the ASID barring some truly magical behind-the-scenes code.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: WARN if SEV-ES is marked active but SEV is not
WARN if the VM is tagged as SEV-ES but not SEV. KVM relies on SEV and
SEV-ES being set atomically, and guards common flows with "is SEV", i.e.
observing SEV-ES without SEV means KVM has a fatal bug.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
Set sev_info.active during SEV/SEV-ES activation before calling any code
that can potentially consume sev_info.es_active, e.g. set "active" and
"es_active" as a pair immediately after the initial sanity checks. KVM
generally expects that es_active can be true if and only if active is
true, e.g. sev_asid_new() deliberately avoids sev_es_guest() so that it
doesn't get a false negative. This will allow WARNing in sev_es_guest()
if the VM is tagged as SEV-ES but not SEV.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
Reject COPY_ENC_CONTEXT_FROM if the destination VM has created vCPUs.
KVM relies on SEV activation to occur before vCPUs are created, e.g. to
set VMCB flags and intercepts correctly.
Fixes: 902f83bac915 ("KVM: x86: Support KVM VMs sharing SEV context") Cc: stable@vger.kernel.org Cc: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Nathan Tempelman <natet@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:27 +0000 (16:50 +0000)]
KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
In commit 81f4987cf59a ("KVM: x86: Fix recording of guest steal time /
preempted status") I removed the only user of these functions because
it was basically impossible to use them safely.
There are two stages to the GFN->PFN mapping; first through the KVM
memslots to a userspace HVA and then through the page tables to
translate that HVA to an underlying PFN. Invalidations of the former
were being handled correctly, but no attempt was made to use the MMU
notifiers to invalidate the cache when the HVA->GFN mapping changed.
As a prelude to reinventing the gfn_to_pfn_cache with more usable
semantics, rip it out entirely and untangle the implementation of
the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it.
All current users of kvm_vcpu_map() also look broken right now, and
will be dealt with separately. They broadly fall into two classes:
* Those which map, access the data and immediately unmap. This is
mostly gratuitous and could just as well use the existing user
HVA, and could probably benefit from a gfn_to_hva_cache as they
do so.
* Those which keep the mapping around for a longer time, perhaps
even using the PFN directly from the guest. These will need to
be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map()
can be removed too.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-8-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:26 +0000 (16:50 +0000)]
KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
And thus another call to kvm_vcpu_map() can die.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-7-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:25 +0000 (16:50 +0000)]
KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
Kill another mostly gratuitous kvm_vcpu_map() which could just use the
userspace HVA for it.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-6-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:23 +0000 (16:50 +0000)]
KVM: x86/xen: Use sizeof_field() instead of open-coding it
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-4-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:24 +0000 (16:50 +0000)]
KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
Using kvm_vcpu_map() for reading from the guest is entirely gratuitous,
when all we do is a single memcpy and unmap it again. Fix it up to use
kvm_read_guest()... but in fact I couldn't bring myself to do that
without also making it use a gfn_to_hva_cache for both that *and* the
copy in the other direction.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-5-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:21 +0000 (16:50 +0000)]
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
In commit 84e3889464c5 ("KVM: xen: do not use struct gfn_to_hva_cache") we
stopped storing this in-kernel as a GPA, and started storing it as a GFN.
Which means we probably should have stopped calling gpa_to_gfn() on it
when userspace asks for it back.
Cc: stable@vger.kernel.org Fixes: 84e3889464c5 ("KVM: xen: do not use struct gfn_to_hva_cache") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maxim Levitsky [Mon, 15 Nov 2021 13:18:37 +0000 (15:18 +0200)]
KVM: x86/mmu: include EFER.LMA in extended mmu role
Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the
guest root level and is not reflected in kvm_mmu_page_role.level when TDP
is in use. When simply running the guest, it is impossible for EFER.LMA
and kvm_mmu.root_level to get out of sync, as the guest cannot transition
from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
first bouncing through a different MMU context. And stuffing guest state
via KVM_SET_SREGS{,2} also ensures a full MMU context reset.
However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
set guest state when migrating the VM while L2 is active, the vCPU state
will reflect L2, not L1. If L1 is using TDP for L2, then root_mmu will
have been configured using L2's state, despite not being used for L2. If
L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
be configured for guest PAE paging, but will match the mmu_role for 64-bit
paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.
Alternatively, the root_mmu's role could be invalidated after a successful
KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
tricky, and not even needed if L1 and L2 do have the same role (e.g., they
are both 64-bit guests and run with the same CR4).
Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maxim Levitsky [Mon, 15 Nov 2021 13:18:36 +0000 (15:18 +0200)]
KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load
When loading nested state, don't use check vcpu->arch.efer to get the
L1 host's 64-bit vs. 32-bit state and don't check it for consistency
with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU
may be stale when KVM_SET_NESTED_STATE is called---and architecturally
does not exist. When restoring L2 state in KVM, the CPU is placed in
non-root where nested VMX code has no snapshot of L1 host state: VMX
(conditionally) loads host state fields loaded on VM-exit, but they need
not correspond to the state before entry. A simple case occurs in KVM
itself, where the host RIP field points to vmx_vmexit rather than the
instruction following vmlaunch/vmresume.
However, for the particular case of L1 being in 32- or 64-bit mode
on entry, the exit controls can be treated instead as the source of
truth regarding the state of L1 on entry, and can be used to check
that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if
vmcs12.VM_EXIT_LOAD_IA32_EFER is set. The consistency check on CPU
EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only
on VM-Enter. That's because, again, there's conceptually no "current"
L1 EFER to check on KVM_SET_NESTED_STATE.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Sun, 14 Nov 2021 08:59:02 +0000 (08:59 +0000)]
KVM: Fix steal time asm constraints
In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits
of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use
the [abcd] registers. For which, GCC has the "q" constraint instead of
the less restrictive "r".
Also fix st->preempted, which is an input/output operand rather than an
input.
Fixes: 81f4987cf59a ("KVM: x86: Fix recording of guest steal time / preempted status") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paul Durrant [Mon, 15 Nov 2021 14:41:31 +0000 (14:41 +0000)]
cpuid: kvm_find_kvm_cpuid_features() should be declared 'static'
The lack a static declaration currently results in:
arch/x86/kvm/cpuid.c:128:26: warning: no previous prototype for function 'kvm_find_kvm_cpuid_features'
when compiling with "W=1".
Reported-by: kernel test robot <lkp@intel.com> Fixes: e77d164a71a2 ("KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES") Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211115144131.5943-1-pdurrant@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dan Carpenter [Wed, 17 Nov 2021 07:34:54 +0000 (10:34 +0300)]
octeontx2-af: debugfs: don't corrupt user memory
The user supplies the "count" value to say how big its read buffer is.
The rvu_dbg_lmtst_map_table_display() function does not take the "count"
into account but instead just copies the whole table, potentially
corrupting the user's data.
Introduce the "ret" variable to store how many bytes we can copy. Also
I changed the type of "off" to size_t to make using min() simpler.
Fixes: 3d217aee2b5a ("octeontx2-af: cn10k: debugfs for dumping LMTST map table") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211117073454.GD5237@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For the above two UAF, the root cause is that the nfc_dev_up can race
between the nci_unregister_device routine. Therefore, this patch
introduce NCI_UNREG flag to easily eliminate the possible race. In
addition, the mutex_lock in nci_close_device can act as a barrier.
Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: ce6044040a91 ("NFC: basic NCI protocol implementation") Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The root cause for this race is concluded below:
1. The rfkill_blocked (USE) in nfc_dev_up is supposed to be placed after
the device_is_registered check.
2. Since the netlink operations are possible just after the device_add
in nfc_register_device, the nfc_dev_up() can happen anywhere during the
rfkill creation process, which leads to data race.
This patch reorder these actions to permit
1. Once device_del is finished, the nfc_dev_up cannot dereference the
rfkill object.
2. The rfkill_register need to be placed after the device_add of nfc_dev
because the parent device need to be created first. So this patch keeps
the order but inject device_lock to prevent the data race.
Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 335cc904fc8b ("NFC: RFKILL support") Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211116152652.19217-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lin Ma [Mon, 15 Nov 2021 14:56:00 +0000 (22:56 +0800)]
NFC: reorganize the functions in nci_request
There is a possible data race as shown below:
thread-A in nci_request() | thread-B in nci_close_device()
| mutex_lock(&ndev->req_lock);
test_bit(NCI_UP, &ndev->flags); |
... | test_and_clear_bit(NCI_UP, &ndev->flags)
mutex_lock(&ndev->req_lock); |
|
This race will allow __nci_request() to be awaked while the device is
getting removed.
Similar to commit ac6f9b554da3 ("bluetooth: eliminate the potential race
condition when removing the HCI controller"). this patch alters the
function sequence in nci_request() to prevent the data races between the
nci_close_device().
Bernard Zhao [Mon, 15 Nov 2021 02:58:50 +0000 (18:58 -0800)]
drm/amd/amdgpu: fix potential memleak
In function amdgpu_get_xgmi_hive, when kobject_init_and_add failed
There is a potential memleak if not call kobject_put.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Bernard Zhao <bernard@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
shaoyunl [Sun, 14 Nov 2021 17:38:18 +0000 (12:38 -0500)]
drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case. When reset been triggered again, driver should avoid to do uninitialization again.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tadeusz Struk [Mon, 15 Nov 2021 16:01:43 +0000 (08:01 -0800)]
tipc: check for null after calling kmemdup
kmemdup can return a null pointer so need to check for it, otherwise
the null key will be dereferenced later in tipc_crypto_key_xmit as
can be seen in the trace [1].
Perry Yuan [Thu, 28 Oct 2021 10:05:42 +0000 (06:05 -0400)]
drm/amd/pm: add GFXCLK/SCLK clocks level print support for APUs
add support that allow the userspace tool like RGP to get the GFX clock
value at runtime, the fix follow the old way to show the min/current/max
clocks level for compatible consideration.
hongao [Thu, 11 Nov 2021 03:32:07 +0000 (11:32 +0800)]
drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
amdgpu_connector_vga_get_modes missed function amdgpu_get_native_mode
which assign amdgpu_encoder->native_mode with *preferred_mode result in
amdgpu_encoder->native_mode.clock always be 0. That will cause
amdgpu_connector_set_property returned early on:
if ((rmx_type != DRM_MODE_SCALE_NONE) &&
(amdgpu_encoder->native_mode.clock == 0))
when we try to set scaling mode Full/Full aspect/Center.
Add the missing function to amdgpu_connector_vga_get_mode can fix this.
It also works on dvi connectors because
amdgpu_connector_dvi_helper_funcs.get_mode use the same method.
Signed-off-by: hongao <hongao@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Roman Li [Wed, 17 Nov 2021 15:05:36 +0000 (10:05 -0500)]
drm/amd/display: Fix OLED brightness control on eDP
[Why]
After commit ("drm/amdgpu/display: add support for multiple backlights")
number of eDPs is defined while registering backlight device.
However the panel's extended caps get updated once before register call.
That leads to regression with extended caps like oled brightness control.
[How]
Update connector ext caps after register_backlight_device
Fixes: f5e81daf125cac ("drm/amdgpu/display: add support for multiple backlights") Link: https://www.reddit.com/r/AMDLaptops/comments/qst0fm/after_updating_to_linux_515_my_brightness/ Signed-off-by: Roman Li <Roman.Li@amd.com> Tested-by: Samuel Čavoj <samuel@cavoj.net> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jasdeep Dhillon <Jasdeep.Dhillon@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Fix misleading display error in dmesg if tc filter return fail.
Only i40e status error code should be converted to string, not linux
error code. Otherwise, we return false information about the error.
Fixes: 29b59b6af9bb ("i40e: Enable cloud filters via tc-flower") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e: Fix creation of first queue by omitting it if is not power of two
Reject TCs creation with proper message if the first queue
assignment is not equal to the power of two.
The first queue number was checked too late in the second queue
iteration, if second queue was configured at all. Now if first queue value
is not a power of two, then trying to create qdisc will be rejected.
Fixes: ee61c34354d5 ("i40e: Add infrastructure for queue channel support") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e: Fix warning message and call stack during rmmod i40e driver
Restore part of reset functionality used when reset is called
from the VF to reset itself. Without this fix warning message
is displayed when VF is being removed via sysfs.
Fix the crash of the VF during reset by ensuring
that the PF receives the reset message successfully.
Refactor code to use one function instead of two.
Fixes: eb89ebf92394 ("i40e: implement virtual device interface") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Wed, 17 Nov 2021 23:55:07 +0000 (15:55 -0800)]
Merge tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
- The current iomap_file_buffered_write behavior of failing the entire
write when part of the user buffer cannot be faulted in leads to an
endless loop in gfs2. Work around that in gfs2 for now.
- Various other bugs all over the place.
* tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Prevent endless loops in gfs2_file_buffered_write
gfs2: Fix "Introduce flag for glock holder auto-demotion"
gfs2: Fix length of holes reported at end-of-file
gfs2: release iopen glock early in evict
gfs2: Fix atomic bug in gfs2_instantiate
gfs2: Only dereference i->iov when iter_is_iovec(i)
Linus Torvalds [Wed, 17 Nov 2021 23:12:50 +0000 (15:12 -0800)]
Merge tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- wire futex_waitv syscall
- build fixes for lantiq and bcm63xx configs
- yamon-dt bugfix
* tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mips: lantiq: add support for clk_get_parent()
mips: bcm63xx: add support for clk_get_parent()
MIPS: generic/yamon-dt: fix uninitialized variable error
MIPS: syscalls: Wire up futex_waitv syscall
Just bail out if the target IP block is already in the desired
powergate/ungate state. This can avoid some duplicate settings
which sometimes may cause unexpected issues.
Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789 Fixes: 1adafff6f551 ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend") Signed-off-by: Evan Quan <evan.quan@amd.com> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Guchun Chen [Thu, 11 Nov 2021 03:15:08 +0000 (11:15 +0800)]
drm/amdgpu: add error print when failing to add IP block(v2)
Driver initialization is driven by IP version from IP
discovery table. So add error print when failing to add
ip block during driver initialization, this will be more
friendly to user to know which IP version is not correct.
[ 40.467361] [drm] host supports REQ_INIT_DATA handshake
[ 40.474076] [drm] add ip block number 0 <nv_common>
[ 40.474090] [drm] add ip block number 1 <gmc_v10_0>
[ 40.474101] [drm] add ip block number 2 <psp>
[ 40.474103] [drm] add ip block number 3 <navi10_ih>
[ 40.474114] [drm] add ip block number 4 <smu>
[ 40.474119] [drm] add ip block number 5 <amdgpu_vkms>
[ 40.474134] [drm] add ip block number 6 <gfx_v10_0>
[ 40.474143] [drm] add ip block number 7 <sdma_v5_2>
[ 40.474147] amdgpu 0000:00:08.0: amdgpu: Fatal error during GPU init
[ 40.474545] amdgpu 0000:00:08.0: amdgpu: amdgpu: finishing device.
v2: use dev_err to multi-GPU system
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jack Wang [Mon, 15 Nov 2021 10:15:19 +0000 (11:15 +0100)]
RDMA/mlx4: Do not fail the registration on port stats
If the FW doesn't support MLX4_DEV_CAP_FLAG2_DIAG_PER_PORT, mlx4 driver
will fail the ib_setup_port_attrs, which is called from
ib_register_device()/enable_device_and_get(), in the end leads to device
not detected[1][2]
To fix it, add a new mlx4_ib_hw_stats_ops1, w/o alloc_hw_port_stats if FW
does not support MLX4_DEV_CAP_FLAG2_DIAG_PER_PORT.
Fixes: 2ab0daa9e03f ("RDMA: Split the alloc_hw_stats() ops to port and device variants") Link: https://lore.kernel.org/r/20211115101519.27210-1-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Linus Torvalds [Wed, 17 Nov 2021 16:46:15 +0000 (08:46 -0800)]
Merge tag 'hyperv-fixes-signed-20211117' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix ring size calculation for balloon driver (Boqun Feng)
- Fix issues in Hyper-V setup code (Sean Christopherson)
* tag 'hyperv-fixes-signed-20211117' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Move required MSRs check to initial platform probing
x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
Drivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size
Revert "ACPI: scan: Release PM resources blocked by unused objects"
Revert commit bd9b4ca4fe5f ("ACPI: scan: Release PM resources blocked
by unused objects"), because it causes boot issues to appear on some
platforms.
Reported-by: Kyle D. Pelton <kyle.d.pelton@intel.com> Reported-by: Saranya Gopal <saranya.gopal@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
i40e: Fix ping is lost after configuring ADq on VF
Properly reconfigure VF VSIs after VF request ADQ.
Created new function to update queue mapping and queue pairs per TC
with AQ update VSI. This sets proper RSS size on NIC.
VFs num_queue_pairs should not be changed during setup of queue maps.
Previously, VF main VSI in ADQ had configured too many queues and had
wrong RSS size, which lead to packets not being consumed and drops in
connectivity.
Fixes: 5fe4e581640b ("i40e: Fix the number of queues available to be mapped for use") Co-developed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e: Fix changing previously set num_queue_pairs for PFs
Currently, the i40e_vsi_setup_queue_map is basing the count of queues in
TCs on a VSI's alloc_queue_pairs member which is not changed throughout
any user's action (for example via ethtool's set_channels callback).
This implies that vsi->tc_config.tc_info[n].qcount value that is given
to the kernel via netdev_set_tc_queue() that notifies about the count of
queues per particular traffic class is constant even if user has changed
the total count of queues.
This in turn caused the kernel warning after setting the queue count to
the lower value than the initial one:
[dmesg]
Number of in use tx queues changed invalidating tc mappings. Priority
traffic classification disabled!
Reason was that vsi->alloc_queue_pairs stayed at 64 value which was used
to set the qcount on TC0 (by default only TC0 exists so all of the
existing queues are assigned to TC0). we update the offset/qcount via
netdev_set_tc_queue() back to the old value but then the
netif_set_real_num_tx_queues() is using the vsi->num_queue_pairs as a
value which got set to 40.
Fix it by using vsi->req_queue_pairs as a queue count that will be
distributed across TCs. Do it only for non-zero values, which implies
that user actually requested the new count of queues.
For VSIs other than main, stay with the vsi->alloc_queue_pairs as we
only allow manipulating the queue count on main VSI.
Fixes: 5fe4e581640b ("i40e: Fix the number of queues available to be mapped for use") Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Co-developed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove the reason of null pointer dereference in sync VSI filters.
Added new I40E_VSI_RELEASING flag to signalize deleting and releasing
of VSI resources to sync this thread with sync filters subtask.
Without this patch it is possible to start update the VSI filter list
after VSI is removed, that's causing a kernel oops.
Fixes: 79a154c5e3e8 ("i40e: main driver core") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com> Reviewed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Reviewed-by: Witold Fijalkowski <witoldx.fijalkowski@intel.com> Reviewed-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Eryk Rybak [Thu, 21 Jan 2021 16:17:22 +0000 (16:17 +0000)]
i40e: Fix correct max_pkt_size on VF RX queue
Setting VLAN port increasing RX queue max_pkt_size
by 4 bytes to take VLAN tag into account.
Trigger the VF reset when setting port VLAN for
VF to renegotiate its capabilities and reinitialize.
Fixes: 0d0ad7185a32 ("i40e: don't hold spinlock while resetting VF") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Łukasz Stelmach [Tue, 16 Nov 2021 21:29:15 +0000 (22:29 +0100)]
net: ax88796c: use bit numbers insetad of bit masks
Change the values of EVENT_* constants from bit masks to bit numbers as
accepted by {clear,set,test}_bit() functions.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>