Paolo Bonzini [Thu, 11 Nov 2021 12:40:26 +0000 (07:40 -0500)]
Merge branch 'kvm-guest-sev-migration' into kvm-master
Add guest api and guest kernel support for SEV live migration.
Introduces a new hypercall to notify the host of changes to the page
encryption status. If the page is encrypted then it must be migrated
through the SEV firmware or a helper VM sharing the key. If page is
not encrypted then it can be migrated normally by userspace. This new
hypercall is invoked using paravirt_ops.
Conflicts: sev_active() replaced by cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT).
Ashish Kalra [Tue, 24 Aug 2021 11:07:45 +0000 (11:07 +0000)]
x86/kvm: Add kexec support for SEV Live Migration.
Reset the host's shared pages list related to kernel
specific page encryption status settings before we load a
new kernel by kexec. We cannot reset the complete
shared pages list here as we need to retain the
UEFI/OVMF firmware specific settings.
The host's shared pages list is maintained for the
guest to keep track of all unencrypted guest memory regions,
therefore we need to explicitly mark all shared pages as
encrypted again before rebooting into the new guest kernel.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Steve Rutherford <srutherford@google.com>
Message-Id: <3e051424ab839ea470f88333273d7a185006754f.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ashish Kalra [Tue, 24 Aug 2021 11:06:40 +0000 (11:06 +0000)]
EFI: Introduce the new AMD Memory Encryption GUID.
Introduce a new AMD Memory Encryption GUID which is currently
used for defining a new UEFI environment variable which indicates
UEFI/OVMF support for the SEV live migration feature. This variable
is setup when UEFI/OVMF detects host/hypervisor support for SEV
live migration and later this variable is read by the kernel using
EFI runtime services to verify if OVMF supports the live migration
feature.
Brijesh Singh [Tue, 24 Aug 2021 11:05:00 +0000 (11:05 +0000)]
mm: x86: Invoke hypercall when page encryption status is changed
Invoke a hypercall when a memory region is changed from encrypted ->
decrypted and vice versa. Hypervisor needs to know the page encryption
status during the guest migration.
Brijesh Singh [Tue, 24 Aug 2021 11:04:35 +0000 (11:04 +0000)]
x86/kvm: Add AMD SEV specific Hypercall3
KVM hypercall framework relies on alternative framework to patch the
VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
apply_alternative() is called then it defaults to VMCALL. The approach
works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
will be able to decode the instruction and do the right things. But
when SEV is active, guest memory is encrypted with guest key and
hypervisor will not be able to decode the instruction bytes.
To highlight the need to provide this interface, capturing the
flow of apply_alternatives() :
setup_arch() call init_hypervisor_platform() which detects
the hypervisor platform the kernel is running under and then the
hypervisor specific initialization code can make early hypercalls.
For example, KVM specific initialization in case of SEV will try
to mark the "__bss_decrypted" section's encryption state via early
page encryption status hypercalls.
Now, apply_alternatives() is called much later when setup_arch()
calls check_bugs(), so we do need some kind of an early,
pre-alternatives hypercall interface. Other cases of pre-alternatives
hypercalls include marking per-cpu GHCB pages as decrypted on SEV-ES
and per-cpu apf_reason, steal_time and kvm_apic_eoi as decrypted for
SEV generally.
Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
will be used by the SEV guest to notify encrypted pages to the hypervisor.
This kvm_sev_hypercall3() function is abstracted and used as follows :
All these early hypercalls are made through early_set_memory_XX() interfaces,
which in turn invoke pv_ops (paravirt_ops).
This early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed()
is a generic interface and can easily have SEV, TDX and any other
future platform specific abstractions added to it.
Currently, pv_ops.mmu.notify_page_enc_status_changed() callback is setup to
invoke kvm_sev_hypercall3() in case of SEV.
Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed()
can be setup to a TDX specific callback.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <6fd25c749205dd0b1eb492c60d41b124760cc6ae.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Thu, 11 Nov 2021 01:05:37 +0000 (17:05 -0800)]
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Only bug fixes and cleanups for ext4 this merge window.
Of note are fixes for the combination of the inline_data and
fast_commit fixes, and more accurately calculating when to schedule
additional lazy inode table init, especially when CONFIG_HZ is 100HZ"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix error code saved on super block during file system abort
ext4: inline data inode fast commit replay fixes
ext4: commit inline data during fast commit
ext4: scope ret locally in ext4_try_to_trim_range()
ext4: remove an unused variable warning with CONFIG_QUOTA=n
ext4: fix boolreturn.cocci warnings in fs/ext4/name.c
ext4: prevent getting empty inode buffer
ext4: move ext4_fill_raw_inode() related functions
ext4: factor out ext4_fill_raw_inode()
ext4: prevent partial update of the extent blocks
ext4: check for inconsistent extents between index and leaf block
ext4: check for out-of-order index extents in ext4_valid_extent_entries()
ext4: convert from atomic_t to refcount_t on ext4_io_end->count
ext4: refresh the ext4_ext_path struct after dropping i_data_sem.
ext4: ensure enough credits in ext4_ext_shift_path_extents
ext4: correct the left/middle/right debug message for binsearch
ext4: fix lazy initialization next schedule time computation in more granular unit
Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks"
Linus Torvalds [Thu, 11 Nov 2021 00:50:57 +0000 (16:50 -0800)]
Merge tag 'for-5.16-deadlock-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Fix for a deadlock when direct/buffered IO is done on a mmaped file
and a fault happens (details in the patch). There's a fstest
generic/647 that triggers the problem and makes testing hard"
* tag 'for-5.16-deadlock-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix deadlock due to page faults during direct IO reads and writes
Linus Torvalds [Thu, 11 Nov 2021 00:45:54 +0000 (16:45 -0800)]
Merge tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping
support for a filehandle format deprecated 20 years ago, and further
xdr-related cleanup from Chuck"
* tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux: (26 commits)
nfsd4: remove obselete comment
nfsd: document server-to-server-copy parameters
NFSD:fix boolreturn.cocci warning
nfsd: update create verifier comment
SUNRPC: Change return value type of .pc_encode
SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
NFSD: Save location of NFSv4 COMPOUND status
SUNRPC: Change return value type of .pc_decode
SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
SUNRPC: De-duplicate .pc_release() call sites
SUNRPC: Simplify the SVC dispatch code path
SUNRPC: Capture value of xdr_buf::page_base
SUNRPC: Add trace event when alloc_pages_bulk() makes no progress
svcrdma: Split svcrmda_wc_{read,write} tracepoints
svcrdma: Split the svcrdma_wc_send() tracepoint
svcrdma: Split the svcrdma_wc_receive() tracepoint
NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
SUNRPC: xdr_stream_subsegment() must handle non-zero page_bases
NFSD: Initialize pointer ni with NULL and not plain integer 0
NFSD: simplify struct nfsfh
...
Linus Torvalds [Thu, 11 Nov 2021 00:32:46 +0000 (16:32 -0800)]
Merge tag 'nfs-for-5.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- NFSv4.1 can always retrieve and cache the ACCESS mode on OPEN
- Optimisations for READDIR and the 'ls -l' style workload
- Further replacements of dprintk() with tracepoints and other
tracing improvements
- Ensure we re-probe NFSv4 server capabilities when the user does a
"mount -o remount"
Bugfixes:
- Fix an Oops in pnfs_mark_request_commit()
- Fix up deadlocks in the commit code
- Fix regressions in NFSv2/v3 attribute revalidation due to the
change_attr_type optimisations
- Fix some dentry verifier races
- Fix some missing dentry verifier settings
- Fix a performance regression in nfs_set_open_stateid_locked()
- SUNRPC was sending multiple SYN calls when re-establishing a TCP
connection.
- Fix multiple NFSv4 issues due to missing sanity checking of server
return values
- Fix a potential Oops when FREE_STATEID races with an unmount
Cleanups:
- Clean up the labelled NFS code
- Remove unused header <linux/pnfs_osd_xdr.h>"
* tag 'nfs-for-5.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (84 commits)
NFSv4: Sanity check the parameters in nfs41_update_target_slotid()
NFS: Remove the nfs4_label argument from decode_getattr_*() functions
NFS: Remove the nfs4_label argument from nfs_setsecurity
NFS: Remove the nfs4_label argument from nfs_fhget()
NFS: Remove the nfs4_label argument from nfs_add_or_obtain()
NFS: Remove the nfs4_label argument from nfs_instantiate()
NFS: Remove the nfs4_label from the nfs_setattrres
NFS: Remove the nfs4_label from the nfs4_getattr_res
NFS: Remove the f_label from the nfs4_opendata and nfs_openres
NFS: Remove the nfs4_label from the nfs4_lookupp_res struct
NFS: Remove the label from the nfs4_lookup_res struct
NFS: Remove the nfs4_label from the nfs4_link_res struct
NFS: Remove the nfs4_label from the nfs4_create_res struct
NFS: Remove the nfs4_label from the nfs_entry struct
NFS: Create a new nfs_alloc_fattr_with_label() function
NFS: Always initialise fattr->label in nfs_fattr_alloc()
NFSv4.2: alloc_file_pseudo() takes an open flag, not an f_mode
NFS: Don't allocate nfs_fattr on the stack in __nfs42_ssc_open()
NFSv4: Remove unnecessary 'minor version' check
NFSv4: Fix potential Oops in decode_op_map()
...
Linus Torvalds [Thu, 11 Nov 2021 00:15:54 +0000 (16:15 -0800)]
Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit cleanups from Eric Biederman:
"While looking at some issues related to the exit path in the kernel I
found several instances where the code is not using the existing
abstractions properly.
This set of changes introduces force_fatal_sig a way of sending a
signal and not allowing it to be caught, and corrects the misuse of
the existing abstractions that I found.
A lot of the misuse of the existing abstractions are silly things such
as doing something after calling a no return function, rolling BUG by
hand, doing more work than necessary to terminate a kernel thread, or
calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).
In the review a deficiency in force_fatal_sig and force_sig_seccomp
where ptrace or sigaction could prevent the delivery of the signal was
found. I have added a change that adds SA_IMMUTABLE to change that
makes it impossible to interrupt the delivery of those signals, and
allows backporting to fix force_sig_seccomp
And Arnd found an issue where a function passed to kthread_run had the
wrong prototype, and after my cleanup was failing to build."
* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
soc: ti: fix wkup_m3_rproc_boot_thread return type
signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
exit/r8188eu: Replace the macro thread_exit with a simple return 0
exit/rtl8712: Replace the macro thread_exit with a simple return 0
exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
exit/syscall_user_dispatch: Send ordinary signals on failure
signal: Implement force_fatal_sig
exit/kthread: Have kernel threads return instead of calling do_exit
signal/s390: Use force_sigsegv in default_trap_handler
signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
signal/sparc: In setup_tsb_params convert open coded BUG into BUG
signal/powerpc: On swapcontext failure force SIGSEGV
signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
signal/sparc32: Remove unreachable do_exit in do_sparc_fault
...
would indicate that the new core scheduling domain encompasses all
tasks in the process group of <pid>. Specifying 0 would only create a
core scheduling domain for the thread identified by <pid> and 2 would
encompass the whole thread-group of <pid>.
Note, the values 0, 1, and 2 correspond to PIDTYPE_PID, PIDTYPE_TGID,
and PIDTYPE_PGID. A first version tried to expose those values
directly to which I objected because:
- PIDTYPE_* is an enum that is kernel internal which we should not
expose to userspace directly.
- PIDTYPE_* indicates what a given struct pid is used for it doesn't
express a scope.
But what the 4th argument of PR_SCHED_CORE prctl() expresses is the
scope of the operation, i.e. the scope of the core scheduling domain
at creation time. So Eugene's patch now simply introduces three new
defines PR_SCHED_CORE_SCOPE_THREAD, PR_SCHED_CORE_SCOPE_THREAD_GROUP,
and PR_SCHED_CORE_SCOPE_PROCESS_GROUP. They simply express what
happens.
This has been on the mailing list for quite a while with all relevant
scheduler folks Cced. I announced multiple times that I'd pick this up
if I don't see or her anyone else doing it. None of this touches
proper scheduler code but only concerns uapi so I think this is fine.
With core scheduling being quite common now for vm managers (e.g.
moving individual vcpu threads into their own core scheduling domain)
and container managers (e.g. moving the init process into its own core
scheduling domain and letting all created children inherit it) having
to rely on raw numbers passed as the 4th argument in prctl() is a bit
annoying and everyone is starting to come up with their own defines"
* tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument
Linus Torvalds [Thu, 11 Nov 2021 00:02:08 +0000 (16:02 -0800)]
Merge tag 'pidfd.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd updates from Christian Brauner:
"Various places in the kernel have picked up pidfds.
The two most recent additions have probably been the ability to use
pidfds in bpf maps and the usage of pidfds in mm-based syscalls such
as process_mrelease() and process_madvise().
The same pattern to turn a pidfd into a struct task exists in two
places. One of those places used PIDTYPE_TGID while the other one used
PIDTYPE_PID even though it is clearly documented in all pidfd-helpers
that pidfds __currently__ only refer to thread-group leaders (subject
to change in the future if need be).
This isn't a bug per se but has the potential to be one if we allow
pidfds to refer to individual threads. If that happens we want to
audit all codepaths that make use of them to ensure they can deal with
pidfds refering to individual threads.
This adds a simple helper to turn a pidfd into a struct task making it
easy to grep for such places. Plus, it gets rid of code-duplication"
* tag 'pidfd.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
mm: use pidfd_get_task()
pid: add pidfd_get_task() helper
Linus Torvalds [Wed, 10 Nov 2021 20:07:22 +0000 (12:07 -0800)]
Merge tag 'thermal-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"These fix two issues in the thermal core and one in the int340x
thermal driver.
Specifics:
- Replace pr_warn() with pr_warn_once() in user_space_bind() to
reduce kernel log noise (Rafael Wysocki).
- Extend the RFIM mailbox interface in the int340x thermal driver to
return 64 bit values to allow all values returned by the hardware
to be handled correctly (Srinivas Pandruvada).
- Fix possible NULL pointer dereferences in the of_thermal_ family of
functions (Subbaraman Narayanamurthy)"
* tag 'thermal-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: Replace pr_warn() with pr_warn_once() in user_space_bind()
thermal: Fix NULL pointer dereferences in of_thermal_ functions
thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses
Linus Torvalds [Wed, 10 Nov 2021 19:59:55 +0000 (11:59 -0800)]
Merge tag 'pm-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These fix three intel_pstate driver regressions, fix locking in the
core code suspending and resuming devices during system PM
transitions, fix the handling of cpuidle drivers based on runtime PM
during system-wide suspend, fix two issues in the operating
performance points (OPP) framework and resource-managed helpers to it.
Specifics:
- Fix two intel_pstate driver regressions related to the HWP
interrupt handling added recently (Srinivas Pandruvada).
- Fix intel_pstate driver regression introduced during the 5.11 cycle
and causing HWP desired performance to be mishandled in some cases
when switching driver modes and during system suspend and shutdown
(Rafael Wysocki).
- Fix system-wide device suspend and resume locking to avoid
deadlocks when device objects are deleted during a system-wide PM
transition (Rafael Wysocki).
- Modify system-wide suspend of devices to prevent cpuidle drivers
based on runtime PM from misbehaving during the "no IRQ" phase of
it (Ulf Hansson).
- Fix return value of _opp_add_static_v2() helper (YueHaibing).
Linus Torvalds [Wed, 10 Nov 2021 19:52:40 +0000 (11:52 -0800)]
Merge tag 'acpi-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These add support for a new ACPI device configuration object called
_DSC, fix some issues including one recent regression, add two new
items to quirk lists and clean up assorted pieces of code.
Specifics:
- Add support for new ACPI device configuration object called _DSC
("Deepest State for Configuration") to allow certain devices to be
probed without changing their power states, document it and make
two drivers use it (Sakari Ailus, Rajmohan Mani).
- Fix device wakeup power reference counting broken recently by
mistake (Rafael Wysocki).
- Drop unused symbol and macros depending on it from acgcc.h (Rafael
Wysocki).
- Add HP ZHAN 66 Pro to the "no EC wakeup" quirk list (Binbin Zhou).
- Add Xiaomi Mi Pad 2 to the backlight quirk list and drop an unused
piece of data from all of the list entries (Hans de Goede).
- Fix register read accesses handling in the Intel PMIC operation
region driver (Hans de Goede).
- Clean up static variables initialization in the EC driver
(wangzhitong)"
* tag 'acpi-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Documentation: ACPI: Fix non-D0 probe _DSC object example
ACPI: Drop ACPI_USE_BUILTIN_STDARG ifdef from acgcc.h
ACPI: PM: Fix device wakeup power reference counting error
ACPI: video: use platform backlight driver on Xiaomi Mi Pad 2
ACPI: video: Drop dmi_system_id.ident settings from video_detect_dmi_table[]
ACPI: PMIC: Fix intel_pmic_regs_handler() read accesses
ACPI: EC: Remove initialization of static variables to false
ACPI: EC: Use ec_no_wakeup on HP ZHAN 66 Pro
at24: Support probing while in non-zero ACPI D state
media: i2c: imx319: Support device probe in non-zero ACPI D state
ACPI: Add a convenience function to tell a device is in D0 state
Documentation: ACPI: Document _DSC object usage for enum power state
i2c: Allow an ACPI driver to manage the device's power state during probe
ACPI: scan: Obtain device's desired enumeration power state
Linus Torvalds [Wed, 10 Nov 2021 19:47:55 +0000 (11:47 -0800)]
Merge tag 'dmaengine-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"A bunch of driver updates, no new driver or controller support this
time though:
- Another pile of idxd updates
- pm routines cleanup for at_xdmac driver
- Correct handling of callback_result for few drivers
- zynqmp_dma driver updates and descriptor management refinement
- Hardware handshaking support for dw-axi-dmac
- Support for remotely powered controllers in Qcom bam dma
- tegra driver updates"
* tag 'dmaengine-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (69 commits)
dmaengine: ti: k3-udma: Set r/tchan or rflow to NULL if request fail
dmaengine: ti: k3-udma: Set bchan to NULL if a channel request fail
dmaengine: stm32-dma: avoid 64-bit division in stm32_dma_get_max_width
dmaengine: fsl-edma: support edma memcpy
dmaengine: idxd: fix resource leak on dmaengine driver disable
dmaengine: idxd: cleanup completion record allocation
dmaengine: zynqmp_dma: Correctly handle descriptor callbacks
dmaengine: xilinx_dma: Correctly handle cyclic descriptor callbacks
dmaengine: altera-msgdma: Correctly handle descriptor callbacks
dmaengine: at_xdmac: fix compilation warning
dmaengine: dw-axi-dmac: Simplify assignment in dma_chan_pause()
dmaengine: qcom: bam_dma: Add "powered remotely" mode
dt-bindings: dmaengine: bam_dma: Add "powered remotely" mode
dmaengine: sa11x0: Mark PM functions as __maybe_unused
dmaengine: switch from 'pci_' to 'dma_' API
dmaengine: ioat: switch from 'pci_' to 'dma_' API
dmaengine: hsu: switch from 'pci_' to 'dma_' API
dmaengine: hisi_dma: switch from 'pci_' to 'dma_' API
dmaengine: dw: switch from 'pci_' to 'dma_' API
dmaengine: dw-edma-pcie: switch from 'pci_' to 'dma_' API
...
Linus Torvalds [Wed, 10 Nov 2021 19:36:43 +0000 (11:36 -0800)]
Merge tag 'tag-chrome-platform-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Benson Leung:
"cros_ec_typec:
- Clean up use of cros_ec_check_features
cros_ec_*:
- Rename and move cros_ec_pd_command to cros_ec_command, and make
changes to cros_ec_typec and cros_ec_proto to use the new common
command, reducing duplication.
sensorhub:
- simplify getting .driver_data in cros_ec_sensors_core and
cros_ec_sensorhub
misc:
- Maintainership change. Enric Balletbo i Serra has moved on from
Collabora, so removing him from chrome/platform maintainers. Thanks
for all of your hard work maintaining this, Enric, and best of luck
to you in your new role!
- Add Prashant Malani as driver maintainer for cros_ec_typec.c and
cros_usbpd_notify. He was already principal contributor of these
drivers"
* tag 'tag-chrome-platform-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_proto: Use ec_command for check_features
platform/chrome: cros_ec_proto: Use EC struct for features
MAINTAINERS: Chrome: Drop Enric Balletbo i Serra
platform/chrome: cros_ec_typec: Use cros_ec_command()
platform/chrome: cros_ec_proto: Add version for ec_command
platform/chrome: cros_ec_proto: Make data pointers void
platform/chrome: cros_usbpd_notify: Move ec_command()
platform/chrome: cros_usbpd_notify: Rename cros_ec_pd_command()
platform/chrome: cros_ec: Fix spelling mistake "responsed" -> "response"
platform/chrome: cros_ec_sensorhub: simplify getting .driver_data
iio: common: cros_ec_sensors: simplify getting .driver_data
platform/chrome: cros-ec-typec: Cleanup use of check_features
platform/chrome: cros_ec_proto: Fix check_features ret val
MAINTAINERS: Add Prashant's maintainership of cros_ec drivers
Linus Torvalds [Wed, 10 Nov 2021 19:29:30 +0000 (11:29 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Fix double-evaluation of 'pte' macro argument when using 52-bit PAs
- Fix signedness of some MTE prctl PR_* constants
- Fix kmemleak memory usage by skipping early pgtable allocations
- Fix printing of CPU feature register strings
- Remove redundant -nostdlib linker flag for vDSO binaries
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
arm64: Track no early_pgtable_alloc() for kmemleak
arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long
arm64: vdso: remove -nostdlib compiler flag
arm64: arm64_ftr_reg->name may not be a human-readable string
Linus Torvalds [Wed, 10 Nov 2021 19:25:37 +0000 (11:25 -0800)]
Merge tag 'arm-fixes-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"This is one set of fixes for the NXP/FSL DPAA2 drivers, addressing a
few minor issues. I received these just after sending out the last
v5.15 fixes, and nothing in here seemed urgent enough for a quick
follow-up"
* tag 'arm-fixes-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
soc: fsl: dpaa2-console: free buffer before returning from dpaa2_console_read
soc: fsl: dpio: use the combined functions to protect critical zone
soc: fsl: dpio: replace smp_processor_id with raw_smp_processor_id
Linus Torvalds [Wed, 10 Nov 2021 17:41:22 +0000 (09:41 -0800)]
Merge tag 'linux-watchdog-5.16-rc1' of git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- f71808e_wdt: convert to watchdog framework
- db8500_wdt: Rename driver (was ux500_wdt.c)
- sunxi: Add compatibles for R329 and D1
- mtk: add disable_wdt_extrst support
- several other small fixes and improvements
* tag 'linux-watchdog-5.16-rc1' of git://www.linux-watchdog.org/linux-watchdog: (30 commits)
watchdog: db8500_wdt: Rename symbols
watchdog: db8500_wdt: Rename driver
watchdog: ux500_wdt: Drop platform data
watchdog: bcm63xx_wdt: fix fallthrough warning
watchdog: iTCO_wdt: No need to stop the timer in probe
watchdog: s3c2410: describe driver in KConfig
watchdog: sp5100_tco: Add support for get_timeleft
watchdog: mtk: add disable_wdt_extrst support
dt-bindings: watchdog: mtk-wdt: add disable_wdt_extrst support
watchdog: rza_wdt: Use semicolons instead of commas
watchdog: mlx-wdt: Use regmap_write_bits()
watchdog: rti-wdt: Make use of the helper function devm_platform_ioremap_resource()
watchdog: iTCO_wdt: Make use of the helper function devm_platform_ioremap_resource()
watchdog: ar7_wdt: Make use of the helper function devm_platform_ioremap_resource_byname()
watchdog: sunxi_wdt: Add support for D1
dt-bindings: watchdog: sunxi: Add compatibles for D1
ar7: fix kernel builds for compiler test
dt-bindings: watchdog: sunxi: Add compatibles for R329
watchdog: meson_gxbb_wdt: add timeout parameter
watchdog: meson_gxbb_wdt: add nowayout parameter
...
Linus Torvalds [Wed, 10 Nov 2021 17:07:26 +0000 (09:07 -0800)]
Merge tag 'rproc-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
"The remoteproc repo is moved to a new path on git.kernel.org, to allow
Mathieu push access to the branches.
Support for the Mediatek MT8195 SCP was added, the related DeviceTree
binding was converted to YAML and MT8192 SCP was documented as well.
Amlogic Meson6, Meson8, Meson8b and Meson8m2 has an ARC core to aid in
resuming the system after suspend, a new remoteproc driver for booting
this core is introduced.
A new driver to support the DSP processor found on NXP i.MX8QM,
i.MX8QXP, i.MX8MP and i.MX8ULP is added.
The Qualcomm modem and TrustZone based remoteproc drivers gains
support for the modem in SC7280 and MSM8996 gains support for a
missing power-domain.
Throughout the Qualcomm drivers, the support for informing the
always-on power coprocessor about the state of each remoteproc is
reworked to avoid complications related to our use of genpd and the
system suspend state.
Lastly a number of small fixes are found throughout the drivers and
framework"
* tag 'rproc-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (39 commits)
remoteproc: Remove vdev_to_rvdev and vdev_to_rproc from remoteproc API
remoteproc: omap_remoteproc: simplify getting .driver_data
remoteproc: qcom_q6v5_mss: Use devm_platform_ioremap_resource_byname() to simplify code
remoteproc: Fix a memory leak in an error handling path in 'rproc_handle_vdev()'
remoteproc: Fix spelling mistake "atleast" -> "at least"
remoteproc: imx_dsp_rproc: mark PM functions as __maybe_unused
remoteproc: imx_dsp_rproc: Correct the comment style of copyright
dt-bindings: dsp: fsl: Update binding document for remote proc driver
remoteproc: imx_dsp_rproc: Add remoteproc driver for DSP on i.MX
remoteproc: imx_rproc: Add IMX_RPROC_SCU_API method
remoteproc: imx_rproc: Move common structure to header file
rpmsg: char: Remove useless include
remoteproc: meson-mx-ao-arc: fix a bit test
remoteproc: mss: q6v5-mss: Add modem support on SC7280
dt-bindings: remoteproc: qcom: Update Q6V5 Modem PIL binding
remoteproc: qcom: pas: Add SC7280 Modem support
dt-bindings: remoteproc: qcom: pas: Add SC7280 MPSS support
remoteproc: qcom: pas: Use the same init resources for MSM8996 and MSM8998
MAINTAINERS: Update remoteproc repo url
dt-bindings: remoteproc: k3-dsp: Cleanup SoC compatible from DT example
...
Linus Torvalds [Wed, 10 Nov 2021 17:05:11 +0000 (09:05 -0800)]
Merge tag 'rpmsg-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull rpmsg updates from Bjorn Andersson:
"For the GLINK implementation this adds support for splitting outgoing
messages that are too large to fit in the fifo, it introduces the use
of "read notifications", to avoid polling in the case where the
outgoing fifo is full and a few bugs are squashed.
The return value of rpmsg_create_ept() for when RPMSG is disabled is
corrected to return a valid error, the Mediatek rpmsg driver is
updated to match the DT binding and a couple of cleanups are done in
the virtio rpmsg driver"
* tag 'rpmsg-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
rpmsg: glink: Send READ_NOTIFY command in FIFO full case
rpmsg: glink: Remove channel decouple from rpdev release
rpmsg: glink: Remove the rpmsg dev in close_ack
rpmsg: glink: Add TX_DATA_CONT command while sending
rpmsg: virtio_rpmsg_bus: use dev_warn_ratelimited for msg with no recipient
rpmsg: virtio: Remove unused including <linux/of_device.h>
rpmsg: Change naming of mediatek rpmsg property
rpmsg: Fix rpmsg_create_ept return when RPMSG config is not defined
rpmsg: glink: Replace strncpy() with strscpy_pad()
* acpi-video:
ACPI: video: use platform backlight driver on Xiaomi Mi Pad 2
ACPI: video: Drop dmi_system_id.ident settings from video_detect_dmi_table[]
Merge new ACPI device configuration object _DSC support for 5.16-rc1.
* acpi-dsc:
Documentation: ACPI: Fix non-D0 probe _DSC object example
at24: Support probing while in non-zero ACPI D state
media: i2c: imx319: Support device probe in non-zero ACPI D state
ACPI: Add a convenience function to tell a device is in D0 state
Documentation: ACPI: Document _DSC object usage for enum power state
i2c: Allow an ACPI driver to manage the device's power state during probe
ACPI: scan: Obtain device's desired enumeration power state
Sakari Ailus [Wed, 10 Nov 2021 12:10:14 +0000 (14:10 +0200)]
Documentation: ACPI: Fix non-D0 probe _DSC object example
The original patch adding the example used _DSC Name when Method was
intended. Fix this.
Also replace spaces used for indentation with tabs in the example.
Fixes: 7aac08678f60 ("Documentation: ACPI: Document _DSC object usage for enum power state") Reported-by: Bingbu Cao <bingbu.cao@intel.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Tue, 9 Nov 2021 19:24:08 +0000 (11:24 -0800)]
Merge tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block
Pull more block driver updates from Jens Axboe:
- Last series adding error handling support for add_disk() in drivers.
After this one, and once the SCSI side has been merged, we can
finally annotate add_disk() as must_check. (Luis)
- bcache fixes (Coly)
- zram fixes (Ming)
- ataflop locking fix (Tetsuo)
- nbd fixes (Ye, Yu)
- MD merge via Song
- Cleanup (Yang)
- sysfs fix (Guoqing)
- Misc fixes (Geert, Wu, luo)
* tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block: (34 commits)
bcache: Revert "bcache: use bvec_virt"
ataflop: Add missing semicolon to return statement
floppy: address add_disk() error handling on probe
ataflop: address add_disk() error handling on probe
block: update __register_blkdev() probe documentation
ataflop: remove ataflop_probe_lock mutex
mtd/ubi/block: add error handling support for add_disk()
block/sunvdc: add error handling support for add_disk()
z2ram: add error handling support for add_disk()
nvdimm/pmem: use add_disk() error handling
nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned
nvdimm/blk: add error handling support for add_disk()
nvdimm/blk: avoid calling del_gendisk() on early failures
nvdimm/btt: add error handling support for add_disk()
nvdimm/btt: use goto error labels on btt_blk_init()
loop: Remove duplicate assignments
drbd: Fix double free problem in drbd_create_device
nvdimm/btt: do not call del_gendisk() if not needed
bcache: fix use-after-free problem in bcache_device_free()
zram: replace fsync_bdev with sync_blockdev
...
Linus Torvalds [Tue, 9 Nov 2021 19:20:07 +0000 (11:20 -0800)]
Merge tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Set of fixes for the batched tag allocation (Ming, me)
- add_disk() error handling fix (Luis)
- Nested queue quiesce fixes (Ming)
- Shared tags init error handling fix (Ye)
- Misc cleanups (Jean, Ming, me)
* tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-block:
nvme: wait until quiesce is done
scsi: make sure that request queue queiesce and unquiesce balanced
scsi: avoid to quiesce sdev->request_queue two times
blk-mq: add one API for waiting until quiesce is done
blk-mq: don't free tags if the tag_set is used by other device in queue initialztion
block: fix device_add_disk() kobject_create_and_add() error handling
block: ensure cached plug request matches the current queue
block: move queue enter logic into blk_mq_submit_bio()
block: make bio_queue_enter() fast-path available inline
block: split request allocation components into helpers
block: have plug stored requests hold references to the queue
blk-mq: update hctx->nr_active in blk_mq_end_request_batch()
blk-mq: add RQF_ELV debug entry
blk-mq: only try to run plug merge if request has same queue with incoming bio
block: move RQF_ELV setting into allocators
dm: don't stop request queue after the dm device is suspended
block: replace always false argument with 'false'
block: assign correct tag before doing prefetch of request
blk-mq: fix redundant check of !e expression
Linus Torvalds [Tue, 9 Nov 2021 19:16:20 +0000 (11:16 -0800)]
Merge tag 'for-5.16/bdev-size-2021-11-09' of git://git.kernel.dk/linux-block
Pull more bdev size updates from Jens Axboe:
"Two followup changes for the bdev-size series from this merge window:
- Add loff_t cast to bdev_nr_bytes() (Christoph)
- Use bdev_nr_bytes() consistently for the block parts at least (me)"
* tag 'for-5.16/bdev-size-2021-11-09' of git://git.kernel.dk/linux-block:
block: use new bdev_nr_bytes() helper for blkdev_{read,write}_iter()
block: add a loff_t cast to bdev_nr_bytes
Linus Torvalds [Tue, 9 Nov 2021 19:11:37 +0000 (11:11 -0800)]
Merge tag 'io_uring-5.16-2021-11-09' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Minor fixes that should go into the 5.16 release:
- Fix max worker setting not working correctly on NUMA (Beld)
- Correctly return current setting for max workers if zeroes are
passed in (Pavel)
- io_queue_sqe_arm_apoll() cleanup, as identified during the initial
merge (Pavel)
- Misc fixes (Nghia, me)"
* tag 'io_uring-5.16-2021-11-09' of git://git.kernel.dk/linux-block:
io_uring: honour zeroes as io-wq worker limits
io_uring: remove dead 'sqe' store
io_uring: remove redundant assignment to ret in io_register_iowq_max_workers()
io-wq: fix max-workers not correctly set on multi-node system
io_uring: clean up io_queue_sqe_arm_apoll
Linus Torvalds [Tue, 9 Nov 2021 19:02:04 +0000 (11:02 -0800)]
Merge tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Add DM core support for emitting audit events through the audit
subsystem. Also enhance both the integrity and crypt targets to emit
events to via dm-audit.
- Various other simple code improvements and cleanups.
* tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: log table creation error code
dm: make workqueue names device-specific
dm writecache: Make use of the helper macro kthread_run()
dm crypt: Make use of the helper macro kthread_run()
dm verity: use bvec_kmap_local in verity_for_bv_block
dm log writes: use memcpy_from_bvec in log_writes_map
dm integrity: use bvec_kmap_local in __journal_read_write
dm integrity: use bvec_kmap_local in integrity_metadata
dm: add add_disk() error handling
dm: Remove redundant flush_workqueue() calls
dm crypt: log aead integrity violations to audit subsystem
dm integrity: log audit events for dm-integrity target
dm: introduce audit event module for device mapper
Linus Torvalds [Tue, 9 Nov 2021 18:56:41 +0000 (10:56 -0800)]
Merge tag 'dma-mapping-5.16' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
"Just a small set of changes this time. The request dma_direct_alloc
cleanups are still under review and haven't made the cut.
Summary:
- convert sparc32 to the generic dma-direct code
- use bitmap_zalloc (Christophe JAILLET)"
* tag 'dma-mapping-5.16' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: use 'bitmap_zalloc()' when applicable
sparc32: use DMA_DIRECT_REMAP
sparc32: remove dma_make_coherent
sparc32: remove the call to dma_make_coherent in arch_dma_free
Linus Torvalds [Tue, 9 Nov 2021 18:51:12 +0000 (10:51 -0800)]
Merge tag 'ovl-update-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
- Fix a regression introduced in the last cycle
- Fix a use-after-free in the AIO path
- Fix a bogus warning reported by syzbot
* tag 'ovl-update-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix filattr copy-up failure
ovl: fix warning in ovl_create_real()
ovl: fix use after free in struct ovl_aio_req
Linus Torvalds [Tue, 9 Nov 2021 18:34:06 +0000 (10:34 -0800)]
Merge tag 'for-linus-5.16-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs fixes from Mike Marshall:
- fix sb refcount leak when allocate sb info failed (Chenyuan Mi)
- fix error return code of orangefs_revalidate_lookup() (Jia-Ju Bai)
- remove redundant initialization of variable ret (Colin Ian King)
* tag 'for-linus-5.16-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Fix sb refcount leak when allocate sb info failed.
fs: orangefs: fix error return code of orangefs_revalidate_lookup()
orangefs: Remove redundant initialization of variable ret
Linus Torvalds [Tue, 9 Nov 2021 18:30:13 +0000 (10:30 -0800)]
Merge tag '9p-for-5.16-rc1' of git://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
"Fixes, netfs read support and checkpatch rewrite:
- fix syzcaller uninitialized value usage after missing error check
- add module autoloading based on transport name
- convert cached reads to use netfs helpers
- adjust readahead based on transport msize
- and many, many checkpatch.pl warning fixes..."
* tag '9p-for-5.16-rc1' of git://github.com/martinetd/linux:
9p: fix a bunch of checkpatch warnings
9p: set readahead and io size according to maxsize
9p p9mode2perm: remove useless strlcpy and check sscanf return code
9p v9fs_parse_options: replace simple_strtoul with kstrtouint
9p: fix file headers
fs/9p: fix indentation and Add missing a blank line after declaration
fs/9p: fix warnings found by checkpatch.pl
9p: fix minor indentation and codestyle
fs/9p: cleanup: opening brace at the beginning of the next line
9p: Convert to using the netfs helper lib to do reads and caching
fscache_cookie_enabled: check cookie is valid before accessing it
net/9p: autoload transport modules
9p/net: fix missing error check in p9_check_errors
Linus Torvalds [Tue, 9 Nov 2021 18:11:53 +0000 (10:11 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
"87 patches.
Subsystems affected by this patch series: mm (pagecache and hugetlb),
procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs,
init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork,
sysvfs, kcov, gdb, resource, selftests, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits)
ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL
ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
selftests/kselftest/runner/run_one(): allow running non-executable files
virtio-mem: disallow mapping virtio-mem memory via /dev/mem
kernel/resource: disallow access to exclusive system RAM regions
kernel/resource: clean up and optimize iomem_is_exclusive()
scripts/gdb: handle split debug for vmlinux
kcov: replace local_irq_save() with a local_lock_t
kcov: avoid enable+disable interrupts if !in_task()
kcov: allocate per-CPU memory on the relevant node
Documentation/kcov: define `ip' in the example
Documentation/kcov: include types.h in the example
sysv: use BUILD_BUG_ON instead of runtime check
kernel/fork.c: unshare(): use swap() to make code cleaner
seq_file: fix passing wrong private data
seq_file: move seq_escape() to a header
signal: remove duplicate include in signal.h
crash_dump: remove duplicate include in crash_dump.h
crash_dump: fix boolreturn.cocci warning
hfs/hfsplus: use WARN_ON for sanity check
...
ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
This commit removes the requirement to be root to modify sem_next_id,
msg_next_id and shm_next_id and checks checkpoint_restore_ns_capable
instead.
Since those files are specific to the IPC namespace, there is no reason
they should require root privileges. This is similar to ns_last_pid,
which also only checks checkpoint_restore_ns_capable.
[akpm@linux-foundation.org: ipc/ipc_sysctl.c needs capability.h for checkpoint_restore_ns_capable()]
Link: https://lkml.kernel.org/r/20210916163717.3179496-1-mclapinski@google.com Signed-off-by: Michal Clapinski <mclapinski@google.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Manfred Spraul <manfred@colorfullife.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When running a test program, 'run_one()' checks if the program has the
execution permission and fails if it doesn't. However, it's easy to
mistakenly lose the permissions, as some common tools like 'diff' don't
support the permission change well[1]. Compared to that, making mistakes
in the test program's path would only rare, as those are explicitly listed
in 'TEST_PROGS'. Therefore, it might make more sense to resolve the
situation on our own and run the program.
For this reason, this commit makes the test program runner function still
print the warning message but to try parsing the interpreter of the
program and to explicitly run it with the interpreter, in this case.
virtio-mem: disallow mapping virtio-mem memory via /dev/mem
We don't want user space to be able to map virtio-mem device memory
directly (e.g., via /dev/mem) in order to have guarantees that in a sane
setup we'll never accidentially access unplugged memory within the
device-managed region of a virtio-mem device, just as required by the
virtio-spec.
As soon as the virtio-mem driver is loaded, the device region is visible
in /proc/iomem via the parent device region. From that point on user
space is aware of the device region and we want to disallow mapping
anything inside that region (where we will dynamically (un)plug memory)
until the driver has been unloaded cleanly and e.g., another driver might
take over.
By creating our parent IORESOURCE_SYSTEM_RAM resource with
IORESOURCE_EXCLUSIVE, we will disallow any /dev/mem access to our device
region until the driver was unloaded cleanly and removed the parent
region. This will work even though only some memory blocks are actually
currently added to Linux and appear as busy in the resource tree.
So access to the region from user space is only possible
a) if we don't load the virtio-mem driver.
b) after unloading the virtio-mem driver cleanly.
Don't build virtio-mem if access to /dev/mem cannot be restricticted -- if
we have CONFIG_DEVMEM=y but CONFIG_STRICT_DEVMEM is not set.
Link: https://lkml.kernel.org/r/20210920142856.17758-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/resource: disallow access to exclusive system RAM regions
virtio-mem dynamically exposes memory inside a device memory region as
system RAM to Linux, coordinating with the hypervisor which parts are
actually "plugged" and consequently usable/accessible.
On the one hand, the virtio-mem driver adds/removes whole memory blocks,
creating/removing busy IORESOURCE_SYSTEM_RAM resources, on the other
hand, it logically (un)plugs memory inside added memory blocks,
dynamically either exposing them to the buddy or hiding them from the
buddy and marking them PG_offline.
In contrast to physical devices, like a DIMM, the virtio-mem driver is
required to actually make use of any of the device-provided memory,
because it performs the handshake with the hypervisor. virtio-mem
memory cannot simply be access via /dev/mem without a driver.
There is no safe way to:
a) Access plugged memory blocks via /dev/mem, as they might contain
unplugged holes or might get silently unplugged by the virtio-mem
driver and consequently turned inaccessible.
b) Access unplugged memory blocks via /dev/mem because the virtio-mem
driver is required to make them actually accessible first.
The virtio-spec states that unplugged memory blocks MUST NOT be written,
and only selected unplugged memory blocks MAY be read. We want to make
sure, this is the case in sane environments -- where the virtio-mem driver
was loaded.
We want to make sure that in a sane environment, nobody "accidentially"
accesses unplugged memory inside the device managed region. For example,
a user might spot a memory region in /proc/iomem and try accessing it via
/dev/mem via gdb or dumping it via something else. By the time the mmap()
happens, the memory might already have been removed by the virtio-mem
driver silently: the mmap() would succeeed and user space might
accidentially access unplugged memory.
So once the driver was loaded and detected the device along the
device-managed region, we just want to disallow any access via /dev/mem to
it.
In an ideal world, we would mark the whole region as busy ("owned by a
driver") and exclude it; however, that would be wrong, as we don't really
have actual system RAM at these ranges added to Linux ("busy system RAM").
Instead, we want to mark such ranges as "not actual busy system RAM but
still soft-reserved and prepared by a driver for future use."
Let's teach iomem_is_exclusive() to reject access to any range with
"IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE", even if not busy and even
if "iomem=relaxed" is set. Introduce EXCLUSIVE_SYSTEM_RAM to make it
easier for applicable drivers to depend on this setting in their Kconfig.
For now, there are no applicable ranges and we'll modify virtio-mem next
to properly set IORESOURCE_EXCLUSIVE on the parent resource container it
creates to contain all actual busy system RAM added via
add_memory_driver_managed().
Link: https://lkml.kernel.org/r/20210920142856.17758-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/resource: clean up and optimize iomem_is_exclusive()
Patch series "virtio-mem: disallow mapping virtio-mem memory via /dev/mem", v5.
Let's add the basic infrastructure to exclude some physical memory regions
marked as "IORESOURCE_SYSTEM_RAM" completely from /dev/mem access, even
though they are not marked IORESOURCE_BUSY and even though "iomem=relaxed"
is set. Resource IORESOURCE_EXCLUSIVE for that purpose instead of adding
new flags to express something similar to "soft-busy" or "not busy yet,
but already prepared by a driver and not to be mapped by user space".
Use it for virtio-mem, to disallow mapping any virtio-mem memory via
/dev/mem to user space after the virtio-mem driver was loaded.
This patch (of 3):
We end up traversing subtrees of ranges we are not interested in; let's
optimize this case, skipping such subtrees, cleaning up the function a
bit.
For example, in the following configuration (/proc/iomem):
We don't have to look at any children of "0009d000-000fffff : Reserved"
if we can just skip these 15 items directly because the parent range is
not of interest.
Link: https://lkml.kernel.org/r/20210920142856.17758-1-david@redhat.com Link: https://lkml.kernel.org/r/20210920142856.17758-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is related to two previous changes. Commit 6b77697dca0a
("scripts/gdb: find vmlinux where it was before") and commit a5e8c18f9bb4
("scripts/gdb: handle split debug").
Although Chrome OS has been using the debug suffix for modules for a
while, it has just recently started using it for vmlinux as well. That
means we've now got to improve the detection of "vmlinux" to also handle
that it might end with ".debug".
Link: https://lkml.kernel.org/r/20211028151120.v2.1.Ie6bd5a232f770acd8c9ffae487a02170bad3e963@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kcov: replace local_irq_save() with a local_lock_t
The kcov code mixes local_irq_save() and spin_lock() in
kcov_remote_{start|end}(). This creates a warning on PREEMPT_RT because
local_irq_save() disables interrupts and spin_lock_t is turned into a
sleeping lock which can not be acquired in a section with disabled
interrupts.
The kcov_remote_lock is used to synchronize the access to the hash-list
kcov_remote_map. The local_irq_save() block protects access to the
per-CPU data kcov_percpu_data.
There is no compelling reason to change the lock type to raw_spin_lock_t
to make it work with local_irq_save(). Changing it would require to
move memory allocation (in kcov_remote_add()) and deallocation outside
of the locked section.
Adding an unlimited amount of entries to the hashlist will increase the
IRQ-off time during lookup. It could be argued that this is debug code
and the latency does not matter. There is however no need to do so and
it would allow to use this facility in an RT enabled build.
Using a local_lock_t instead of local_irq_save() has the befit of adding
a protection scope within the source which makes it obvious what is
protected. On a !PREEMPT_RT && !LOCKDEP build the local_lock_irqsave()
maps directly to local_irq_save() so there is overhead at runtime.
Replace the local_irq_save() section with a local_lock_t.
Link: https://lkml.kernel.org/r/20210923164741.1859522-6-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-6-bigeasy@linutronix.de Reported-by: Clark Williams <williams@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kcov: avoid enable+disable interrupts if !in_task()
kcov_remote_start() may need to allocate memory in the in_task() case
(otherwise per-CPU memory has been pre-allocated) and therefore requires
enabled interrupts.
The interrupts are enabled before checking if the allocation is required
so if no allocation is required then the interrupts are needlessly enabled
and disabled again.
Enable interrupts only if memory allocation is performed.
Link: https://lkml.kernel.org/r/20210923164741.1859522-5-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-5-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Clark Williams <williams@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/kcov: include types.h in the example
Patch series "kcov: PREEMPT_RT fixup + misc", v2.
The last patch in series is follow-up to address the PREEMPT_RT issue
within in kcov reported by Clark [1]. Patches 1-3 are smaller things that
I noticed while staring at it. Patch 4 is small change which makes
replacement in #5 simpler / more obvious.
The first example code has includes at the top, the following two
example share that part. The last example (remote coverage collection)
requires the linux/types.h header file due its __aligned_u64 usage.
Add the linux/types.h to the top most example and a comment that the
header files from above are required as it is done in the second
example.
Pavel Skripkin [Tue, 9 Nov 2021 02:35:25 +0000 (18:35 -0800)]
sysv: use BUILD_BUG_ON instead of runtime check
There were runtime checks about sizes of struct v7_super_block and struct
sysv_inode. If one of these checks fail the kernel will panic. Since
these values are known at compile time let's use BUILD_BUG_ON(), because
it's a standard mechanism for validation checking at build time
Link: https://lkml.kernel.org/r/20210813123020.22971-1-paskripkin@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Muchun Song [Tue, 9 Nov 2021 02:35:19 +0000 (18:35 -0800)]
seq_file: fix passing wrong private data
DEFINE_PROC_SHOW_ATTRIBUTE() is supposed to be used to define a series
of functions and variables to register proc file easily. And the users
can use proc_create_data() to pass their own private data and get it
via seq->private in the callback. Unfortunately, the proc file system
use PDE_DATA() to get private data instead of inode->i_private. So fix
it. Fortunately, there only one user of it which does not pass any
private data, so this bug does not break any in-tree codes.
Link: https://lkml.kernel.org/r/20211029032638.84884-1-songmuchun@bytedance.com Fixes: 72e086d9fe20 ("proc: convert everything to "struct proc_ops"") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Florent Revest <revest@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ye Guojin [Tue, 9 Nov 2021 02:35:10 +0000 (18:35 -0800)]
crash_dump: remove duplicate include in crash_dump.h
In crash_dump.h, header file <linux/pgtable.h> is included twice. This
duplication was introduced in commit f844577d34a6("mm: reorder includes
after introduction of linux/pgtable.h") where the order of the header
files is adjusted, while the old one was not removed.
Clean it up here.
Link: https://lkml.kernel.org/r/20211020090659.1038877-1-ye.guojin@zte.com.cn Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Changcheng Deng <deng.changcheng@zte.com.cn> Cc: Simon Horman <horms@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 9 Nov 2021 02:35:04 +0000 (18:35 -0800)]
hfs/hfsplus: use WARN_ON for sanity check
gcc warns about a couple of instances in which a sanity check exists but
the author wasn't sure how to react to it failing, which makes it look
like a possible bug:
fs/hfsplus/inode.c: In function 'hfsplus_cat_read_inode':
fs/hfsplus/inode.c:503:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
503 | /* panic? */;
| ^
fs/hfsplus/inode.c:524:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
524 | /* panic? */;
| ^
fs/hfsplus/inode.c: In function 'hfsplus_cat_write_inode':
fs/hfsplus/inode.c:582:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
582 | /* panic? */;
| ^
fs/hfsplus/inode.c:608:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
608 | /* panic? */;
| ^
fs/hfs/inode.c: In function 'hfs_write_inode':
fs/hfs/inode.c:464:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
464 | /* panic? */;
| ^
fs/hfs/inode.c:485:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
485 | /* panic? */;
| ^
panic() is probably not the correct choice here, but a WARN_ON
seems appropriate and avoids the compile-time warning.
Link: https://lkml.kernel.org/r/20210927102149.1809384-1-arnd@kernel.org Link: https://lore.kernel.org/all/20210322223249.2632268-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 9 Nov 2021 02:34:45 +0000 (18:34 -0800)]
coda: avoid doing bad things on inode type changes during revalidation
When Coda discovers an inconsistent object, it turns it into a symlink.
However we can't just follow this change in the kernel on an existing file
or directory inode that may still have references.
This patch removes the inconsistent inode from the inode hash and
allocates a new inode for the symlink object.
Link: https://lkml.kernel.org/r/20210908140308.18491-7-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Jing Yangyang <jing.yangyang@zte.com.cn> Cc: Xin Tan <tanxin.ctf@gmail.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 9 Nov 2021 02:34:42 +0000 (18:34 -0800)]
coda: avoid hidden code duplication in rename
We were actually fixing up the directory mtime in both branches after the
negative dentry test, it was just that one branch was only flagging the
directory inodes to refresh their attributes while the other branch used
the optional optimization to set mtime to the current time and not go back
to the Coda client.
Link: https://lkml.kernel.org/r/20210908140308.18491-6-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Jing Yangyang <jing.yangyang@zte.com.cn> Cc: Xin Tan <tanxin.ctf@gmail.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 9 Nov 2021 02:34:39 +0000 (18:34 -0800)]
coda: avoid flagging NULL inodes
Somehow we hit a negative dentry in coda_rename even after checking with
d_really_is_positive. Maybe something raced and turned the new_dentry
negative while we were fixing up directory link counts.
Link: https://lkml.kernel.org/r/20210908140308.18491-5-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Jing Yangyang <jing.yangyang@zte.com.cn> Cc: Xin Tan <tanxin.ctf@gmail.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 9 Nov 2021 02:34:33 +0000 (18:34 -0800)]
coda: check for async upcall request using local state
Originally flagged by Smatch because the code implicitly assumed outSize
is not NULL for non-async upcalls because of a flag that was (not) set in
req->uc_flags.
However req->uc_flags field is in shared state and although the current
code will not allow it to be changed before the async request check the
code is more robust when it tests against the local outSize variable.
Link: https://lkml.kernel.org/r/20210908140308.18491-3-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Jing Yangyang <jing.yangyang@zte.com.cn> Cc: Xin Tan <tanxin.ctf@gmail.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 9 Nov 2021 02:34:30 +0000 (18:34 -0800)]
coda: avoid NULL pointer dereference from a bad inode
Patch series "Coda updates for -next".
The following patch series contains some fixes for the Coda kernel module
I've had sitting around and were tested extensively in a development
version of the Coda kernel module that lives outside of the main kernel.
This patch (of 9):
Avoid accessing coda_inode_info from a dentry with a bad inode.
Andrew Halaney [Tue, 9 Nov 2021 02:34:27 +0000 (18:34 -0800)]
init: make unknown command line param message clearer
The prior message is confusing users, which is the exact opposite of the
goal. If the message is being seen, one of the following situations is
happening:
1. the param is misspelled
2. the param is not valid due to the kernel configuration
3. the param is intended for init but isn't after the '--'
delineator on the command line
To make that more clear to the user, explicitly mention "kernel command
line" and also note that the params are still passed to user space to
avoid causing any alarm over params intended for init.
Link: https://lkml.kernel.org/r/20211013223502.96756-1-ahalaney@redhat.com Fixes: a3957777decc ("init: print out unknown kernel parameters") Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
yangerkun [Tue, 9 Nov 2021 02:34:24 +0000 (18:34 -0800)]
ramfs: fix mount source show for ramfs
ramfs_parse_param does not parse key "source", and will convert
-ENOPARAM to 0. This will skip vfs_parse_fs_param_source in vfs_parse_fs_param, which
lead always "none" mount source for ramfs.
Fix it by parsing "source" in ramfs_parse_param like cgroup1_parse_param
does.
Kefeng Wang [Tue, 9 Nov 2021 02:34:02 +0000 (18:34 -0800)]
sections: provide internal __is_kernel() and __is_kernel_text() helper
An internal __is_kernel() helper which only check the kernel address
ranges, and an internal __is_kernel_text() helper which only check text
section ranges.
Link: https://lkml.kernel.org/r/20210930071143.63410-7-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng Wang [Tue, 9 Nov 2021 02:33:58 +0000 (18:33 -0800)]
x86: mm: rename __is_kernel_text() to is_x86_32_kernel_text()
Commit 8994a854a8c5 ("x86/mm: Rename is_kernel_text to __is_kernel_text"),
add '__' prefix not to get in conflict with existing is_kernel_text() in
<linux/kallsyms.h>.
We will add __is_kernel_text() for the basic kernel text range check in
the next patch, so use private is_x86_32_kernel_text() naming for x86
special check.
Link: https://lkml.kernel.org/r/20210930071143.63410-6-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng Wang [Tue, 9 Nov 2021 02:33:54 +0000 (18:33 -0800)]
sections: move is_kernel_inittext() into sections.h
The is_kernel_inittext() and init_kernel_text() are with same
functionality, let's just keep is_kernel_inittext() and move it into
sections.h, then update all the callers.
Link: https://lkml.kernel.org/r/20210930071143.63410-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng Wang [Tue, 9 Nov 2021 02:33:47 +0000 (18:33 -0800)]
kallsyms: fix address-checks for kernel related range
The is_kernel_inittext/is_kernel_text/is_kernel function should not
include the end address(the labels _einittext, _etext and _end) when check
the address range, the issue exists since Linux v2.6.12.
Link: https://lkml.kernel.org/r/20210930071143.63410-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng Wang [Tue, 9 Nov 2021 02:33:43 +0000 (18:33 -0800)]
kallsyms: remove arch specific text and data check
Patch series "sections: Unify kernel sections range check and use", v4.
There are three head files(kallsyms.h, kernel.h and sections.h) which
include the kernel sections range check, let's make some cleanup and unify
them.
1. cleanup arch specific text/data check and fix address boundary check
in kallsyms.h
2. make all the basic/core kernel range check function into sections.h
3. update all the callers, and use the helper in sections.h to simplify
the code
After this series, we have 5 APIs about kernel sections range check in
sections.h
* is_kernel_rodata() --- already in sections.h
* is_kernel_core_data() --- come from core_kernel_data() in kernel.h
* is_kernel_inittext() --- come from kernel.h and kallsyms.h
* __is_kernel_text() --- add new internal helper
* __is_kernel() --- add new internal helper
Note: For the last two helpers, people should not use directly, consider to
use corresponding function in kallsyms.h.
This patch (of 11):
Remove arch specific text and data check after commit 266577dab10d ("arch:
remove blackfin port"), no need arch-specific text/data check.
Link: https://lkml.kernel.org/r/20210930071143.63410-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20210930071143.63410-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 9 Nov 2021 02:33:37 +0000 (18:33 -0800)]
binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE
Commit 9dd7ec872d62 ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") reverted back to using MAP_FIXED to map ELF LOAD
segments because it was found that the segments in some binaries overlap
and can cause MAP_FIXED_NOREPLACE to fail.
The original intent of MAP_FIXED_NOREPLACE in the ELF loader was to
prevent the silent clobbering of an existing mapping (e.g. stack) by
the ELF image, which could lead to exploitable conditions. Quoting
commit 551923430331 ("fs, elf: drop MAP_FIXED usage from elf_map"),
which originally introduced the use of MAP_FIXED_NOREPLACE in the
loader:
Both load_elf_interp and load_elf_binary rely on elf_map to map
segments [to a specific] address and they use MAP_FIXED to enforce
that. This is however [a] dangerous thing prone to silent data
corruption which can be even exploitable.
...
Let's take CVE-2017-1000253 as an example ... we could end up mapping
[the executable] over the existing stack ... The [stack layout] issue
has been fixed since then ... So we should be safe and any [similar]
attack should be impractical. On the other hand this is just too
subtle [an] assumption ... it can break quite easily and [be] hard to
spot.
...
Address this [weakness] by changing MAP_FIXED to the newly added
MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is
an existing mapping clashing with the requested one [instead of
silently] clobbering it.
Then processing ET_DYN binaries the loader already calculates a total
size for the image when the first segment is mapped, maps the entire
image, and then unmaps the remainder before the remaining segments are
then individually mapped.
To avoid the earlier problems (legitimate overlapping LOAD segments
specified in the ELF), apply the same logic to ET_EXEC binaries as well.
For both ET_EXEC and ET_DYN+INTERP use MAP_FIXED_NOREPLACE for the
initial total size mapping and then use MAP_FIXED to build the final
(possibly legitimately overlapping) mappings. For ET_DYN w/out INTERP,
continue to map at a system-selected address in the mmap region.
Link: https://lkml.kernel.org/r/20210916215947.3993776-1-keescook@chromium.org Link: https://lore.kernel.org/lkml/1595869887-23307-2-git-send-email-anthony.yznaga@oracle.com Co-developed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Chen Jingwen <chenjingwen6@huawei.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrei Vagin <avagin@openvz.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Tue, 9 Nov 2021 02:33:25 +0000 (18:33 -0800)]
mm/scatterlist: replace the !preemptible warning in sg_miter_stop()
sg_miter_stop() checks for disabled preemption before unmapping a page
via kunmap_atomic(). The kernel doc mentions under context that
preemption must be disabled if SG_MITER_ATOMIC is set.
There is no active requirement for the caller to have preemption
disabled before invoking sg_mitter_stop(). The sg_mitter_*()
implementation itself has no such requirement.
In fact, preemption is disabled by kmap_atomic() as part of
sg_miter_next() and remains disabled as long as there is an active
SG_MITER_ATOMIC mapping. This is a consequence of kmap_atomic() and not
a requirement for sg_mitter_*() itself.
The user chooses SG_MITER_ATOMIC because it uses the API in a context
where blocking is not possible or blocking is possible but he chooses a
lower weight mapping which is not available on all CPUs and so it might
need less overhead to setup at a price that now preemption will be
disabled.
The kmap_atomic() implementation on PREEMPT_RT does not disable
preemption. It simply disables CPU migration to ensure that the task
remains on the same CPU while the caller remains preemptible. This in
turn triggers the warning in sg_miter_stop() because preemption is
allowed.
The PREEMPT_RT and !PREEMPT_RT implementation of kmap_atomic() disable
pagefaults as a requirement. It is sufficient to check for this instead
of disabled preemption.
Check for disabled pagefault handler in the SG_MITER_ATOMIC case.
Remove the "preemption disabled" part from the kernel doc as the
sg_milter*() implementation does not care.
[bigeasy@linutronix.de: commit description]
Link: https://lkml.kernel.org/r/20211015211409.cqopacv3pxdwn2ty@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lucas De Marchi [Tue, 9 Nov 2021 02:33:19 +0000 (18:33 -0800)]
include/linux/string_helpers.h: add linux/string.h for strlen()
linux/string_helpers.h uses strlen(), so include the correponding header.
Otherwise we get a compilation error if it's not also included by whoever
included the helper.
Imran Khan [Tue, 9 Nov 2021 02:33:16 +0000 (18:33 -0800)]
lib, stackdepot: add helper to print stack entries into buffer
To print stack entries into a buffer, users of stackdepot, first get a
list of stack entries using stack_depot_fetch and then print this list
into a buffer using stack_trace_snprint. Provide a helper in stackdepot
for this purpose. Also change above mentioned users to use this helper.
[imran.f.khan@oracle.com: fix build error] Link: https://lkml.kernel.org/r/20210915175321.3472770-4-imran.f.khan@oracle.com
[imran.f.khan@oracle.com: export stack_depot_snprint() to modules] Link: https://lkml.kernel.org/r/20210916133535.3592491-4-imran.f.khan@oracle.com Link: https://lkml.kernel.org/r/20210915014806.3206938-4-imran.f.khan@oracle.com Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Jani Nikula <jani.nikula@intel.com> [i915] Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Imran Khan [Tue, 9 Nov 2021 02:33:12 +0000 (18:33 -0800)]
lib, stackdepot: add helper to print stack entries
To print a stack entries, users of stackdepot, first use stack_depot_fetch
to get a list of stack entries and then use stack_trace_print to print
this list. Provide a helper in stackdepot to print stack entries based on
stackdepot handle. Also change above mentioned users to use this helper.
Imran Khan [Tue, 9 Nov 2021 02:33:09 +0000 (18:33 -0800)]
lib, stackdepot: check stackdepot handle before accessing slabs
Patch series "lib, stackdepot: check stackdepot handle before accessing slabs", v2.
PATCH-1: Checks validity of a stackdepot handle before proceeding to
access stackdepot slab/objects.
PATCH-2: Adds a helper in stackdepot, to allow users to print stack
entries just by specifying the stackdepot handle. It also changes such
users to use this new interface.
PATCH-3: Adds a helper in stackdepot, to allow users to print stack
entries into buffers just by specifying the stackdepot handle and
destination buffer. It also changes such users to use this new interface.
This patch (of 3):
stack_depot_save allocates slabs that will be used for storing objects in
future.If this slab allocation fails we may get to a situation where space
allocation for a new stack_record fails, causing stack_depot_save to
return 0 as handle. If user of this handle ends up invoking
stack_depot_fetch with this handle value, current implementation of
stack_depot_fetch will end up using slab from wrong index. To avoid this
check handle value at the beginning.
Lukas Bulwahn [Tue, 9 Nov 2021 02:33:02 +0000 (18:33 -0800)]
MAINTAINERS: rectify entry for INTEL KEEM BAY DRM DRIVER
Commit a1802de40c77 ("drm/kmb: Build files for KeemBay Display driver")
refers to the non-existing file intel,kmb_display.yaml in
Documentation/devicetree/bindings/display/.
Commit 3d60bc24bae3 ("dt-bindings: display: Add support for Intel
KeemBay Display") originating from the same patch series however adds
the file intel,keembay-display.yaml in that directory instead.
So, refer to intel,keembay-display.yaml in the INTEL KEEM BAY DRM DRIVER
section instead.
Link: https://lkml.kernel.org/r/20211026141902.4865-4-lukas.bulwahn@gmail.com Fixes: a1802de40c77 ("drm/kmb: Build files for KeemBay Display driver") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Anitha Chrisanthus <anitha.chrisanthus@intel.com> Cc: Edmund Dea <edmund.j.dea@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Cc: Punit Agrawal <punit1.agrawal@toshiba.co.jp> Cc: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Wilken Gottwalt <wilken.gottwalt@posteo.net> Cc: Yu Chen <chenyu56@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lukas Bulwahn [Tue, 9 Nov 2021 02:32:59 +0000 (18:32 -0800)]
MAINTAINERS: rectify entry for HIKEY960 ONBOARD USB GPIO HUB DRIVER
Commit 9c78691b3824 ("misc: hisi_hikey_usb: Driver to support onboard
USB gpio hub on Hikey960") refers to the non-existing file
Documentation/devicetree/bindings/misc/hisilicon-hikey-usb.yaml, but
this commit's patch series does not add any related devicetree binding
in misc.
So, just drop this file reference in HIKEY960 ONBOARD USB GPIO HUB DRIVER.
Link: https://lkml.kernel.org/r/20211026141902.4865-3-lukas.bulwahn@gmail.com Fixes: 9c78691b3824 ("misc: hisi_hikey_usb: Driver to support onboard USB gpio hub on Hikey960") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Anitha Chrisanthus <anitha.chrisanthus@intel.com> Cc: Edmund Dea <edmund.j.dea@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Cc: Punit Agrawal <punit1.agrawal@toshiba.co.jp> Cc: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Wilken Gottwalt <wilken.gottwalt@posteo.net> Cc: Yu Chen <chenyu56@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>