During error on CLOSE_INSTANCE command, ctx_work_bits was not getting
cleared. During consequent mfc execution NULL pointer dereferencing of
this context led to kernel panic. This patch fixes this issue by making
sure to clear ctx_work_bits always.
On receiving last buffer driver puts MFC to MFCINST_FINISHING state which
in turn skips transferring of frame from SRC to REF queue. This causes
driver to stop MFC encoding and last frame is lost.
This patch guarantees safe handling of frames during MFCINST_FINISHING and
correct clearing of workbit to avoid early stopping of encoding.
The member void *data in the structure devfreq can be overwrite
by governor_userspace. For example:
1. The device driver assigned the devfreq governor to simple_ondemand
by the function devfreq_add_device() and init the devfreq member
void *data to a pointer of a static structure devfreq_simple_ondemand_data
by the function devfreq_add_device().
2. The user changed the devfreq governor to userspace by the command
"echo userspace > /sys/class/devfreq/.../governor".
3. The governor userspace alloced a dynamic memory for the struct
userspace_data and assigend the member void *data of devfreq to
this memory by the function userspace_init().
4. The user changed the devfreq governor back to simple_ondemand
by the command "echo simple_ondemand > /sys/class/devfreq/.../governor".
5. The governor userspace exited and assigned the member void *data
in the structure devfreq to NULL by the function userspace_exit().
6. The governor simple_ondemand fetched the static information of
devfreq_simple_ondemand_data in the function
devfreq_simple_ondemand_func() but the member void *data of devfreq was
assigned to NULL by the function userspace_exit().
7. The information of upthreshold and downdifferential is lost
and the governor simple_ondemand can't work correctly.
The member void *data in the structure devfreq is designed for
a static pointer used in a governor and inited by the function
devfreq_add_device(). This patch add an element named governor_data
in the devfreq structure which can be used by a governor(E.g userspace)
who want to assign a private data to do some private things.
Now, extended attribute value maximum length is 64K. The memory
requested here does not need continuous physical addresses, so it is
appropriate to use kvmalloc to request memory. At the same time, it
can also cope with the situation that the extended attribute will
become longer in the future.
Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221208023233.1231330-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When expanding inode space in ext4_expand_extra_isize_ea() we may need
to allocate external xattr block. If quota is not initialized for the
inode, the block allocation will not be accounted into quota usage. Make
sure the quota is initialized before we try to expand inode space.
Make sure we initialize quotas before possibly expanding inode space
(and thus maybe needing to allocate external xattr block) in
ext4_ioctl_setproject(). This prevents not accounting the necessary
block allocation.
This occurs in 'ext4_xattr_inode_create()'. If 'ext4_mark_inode_dirty()'
fails, dropping i_nlink of the inode is needed. Or will lead to inode leak.
Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221208023233.1231330-5-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 984ea3e1a886 ("ext4: limit block allocations for indirect-block
files to < 2^32") added code to try to allocate xattr block with 32-bit
block number for indirect block based files on the grounds that these
files cannot use larger block numbers. It also added BUG_ON when
allocated block could not fit into 32 bits. This is however bogus
reasoning because xattr block is stored in inode->i_file_acl and
inode->i_file_acl_hi and as such even indirect block based files can
happily use full 48 bits for xattr block number. The proper handling
seems to be there basically since 64-bit block number support was added.
So remove the bogus limitation and BUG_ON.
Cc: Eric Sandeen <sandeen@redhat.com> Fixes: 984ea3e1a886 ("ext4: limit block allocations for indirect-block files to < 2^32") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221121130929.32031-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a block is out of range in ext4_get_branch(), -ENOMEM will be returned
to user-space. Obviously, this error code isn't really useful. This
patch fixes it by making sure the right error code (-EFSCORRUPTED) is
propagated to user-space. EUCLEAN is more informative than ENOMEM.
When a backup superblock is updated in update_backups(), the primary
superblock's offset in the group (that is, sbi->s_sbh->b_blocknr) is used
as the backup superblock's offset in its group. However, when the block
size is 1K and bigalloc is enabled, the two offsets are not equal. This
causes the backup group descriptors to be overwritten by the superblock
in update_backups(). Moreover, if meta_bg is enabled, the file system will
be corrupted because this feature uses backup group descriptors.
To solve this issue, we use a more accurate ext4_group_first_block_no() as
the offset of the backup superblock in its group.
Fixes: faf538cfe89f ("ext4: add support for online resizing with bigalloc") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20221117040341.1380702-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When converting files with inline data to extents, delayed allocations
made on a file system created with both the bigalloc and inline options
can result in invalid extent status cache content, incorrect reserved
cluster counts, kernel memory leaks, and potential kernel panics.
With bigalloc, the code that determines whether a block must be
delayed allocated searches the extent tree to see if that block maps
to a previously allocated cluster. If not, the block is delayed
allocated, and otherwise, it isn't. However, if the inline option is
also used, and if the file containing the block is marked as able to
store data inline, there isn't a valid extent tree associated with
the file. The current code in ext4_clu_mapped() calls
ext4_find_extent() to search the non-existent tree for a previously
allocated cluster anyway, which typically finds nothing, as desired.
However, a side effect of the search can be to cache invalid content
from the non-existent tree (garbage) in the extent status tree,
including bogus entries in the pending reservation tree.
To fix this, avoid searching the extent tree when allocating blocks
for bigalloc + inline files that are being converted from inline to
extent mapped.
As 'ext4_rename' will modify 'old.inode' ctime and mark inode dirty,
which may trigger expand 'extra_isize' and allocate block. If inode
didn't init quota will lead to warning. To solve above issue, init
'old.inode' firstly in 'ext4_rename'.
Reported-by: syzbot+98346927678ac3059c77@syzkaller.appspotmail.com Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221107015335.2524319-1-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the above issue, ioctl invokes the swap_inode_boot_loader function to
swap inode<5> and inode<12>. However, inode<5> contain incorrect imode and
disordered extents, and i_nlink is set to 1. The extents check for inode in
the ext4_iget function can be bypassed bacause 5 is EXT4_BOOT_LOADER_INO.
While links_count is set to 1, the extents are not initialized in
swap_inode_boot_loader. After the ioctl command is executed successfully,
the extents are swapped to inode<12>, in this case, run the `cat` command
to view inode<12>. And Bug_ON is triggered due to the incorrect extents.
When the boot loader inode is not initialized, its imode can be one of the
following:
1) the imode is a bad type, which is marked as bad_inode in ext4_iget and
set to S_IFREG.
2) the imode is good type but not S_IFREG.
3) the imode is S_IFREG.
The BUG_ON may be triggered by bypassing the check in cases 1 and 2.
Therefore, when the boot loader inode is bad_inode or its imode is not
S_IFREG, initialize the inode to avoid triggering the BUG.
Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221026042310.3839669-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When bigalloc is enabled, reserved cluster accounting for delayed
allocation is handled in extent_status.c. With a corrupted file
system, it's possible for this accounting to be incorrect,
dsicovered by Syzbot:
EXT4-fs error (device loop0): ext4_validate_block_bitmap:398: comm rep:
bg 0: block 5: invalid block bitmap
EXT4-fs (loop0): Delayed block allocation failed for inode 18 at logical
offset 0 with max blocks 32 with error 28
EXT4-fs (loop0): This should not happen!! Data will be lost
Before quota is enabled, a check on the preset quota inums in
ext4_super_block is added to prevent wrong quota inodes from being loaded.
In addition, when the quota fails to be enabled, the quota type and quota
inum are printed to facilitate fault locating.
Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221026042310.3839669-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are many places that will get unhappy (and crash) when ext4_iget()
returns a bad inode. However, if iget the boot loader inode, allows a bad
inode to be returned, because the inode may not be initialized. This
mechanism can be used to bypass some checks and cause panic. To solve this
problem, we add a special iget flag EXT4_IGET_BAD. Only with this flag
we'd be returning bad inode from ext4_iget(), otherwise we always return
the error code if the inode is bad inode.(suggested by Jan Kara)
Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221026042310.3839669-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in fs/ext4/ext4.h:591:2
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xa5
dump_stack+0x15/0x1b
ubsan_epilogue+0xe/0x4e
__ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
ext4_init_fs+0x5a/0x277
do_one_initcall+0x76/0x430
kernel_init_freeable+0x3b3/0x422
kernel_init+0x24/0x1e0
ret_from_fork+0x1f/0x30
</TASK>
Fixes: e422d280d80e ("ext4: ensure Inode flags consistency are checked at build time") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://lore.kernel.org/r/20221031055833.3966222-1-cuigaosheng1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In this issue, bg_inode_table_hi is overwritten as an incorrect value.
As a result, `block < end_block` cannot be met in grow_dev_page.
Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages
keeps retrying. As a result, the writeback process is in the D state due
to an infinite loop.
Add a check on inode table block in the __ext4_get_inode_loc function by
referring to ext4_read_inode_bitmap to avoid this infinite loop.
Invalid userspace dma surface copies could potentially overflow
the memcpy from the surface to the snooped image leading to crashes.
To fix it the dimensions of the copybox have to be validated
against the expected size of the snooped cursor.
A typical DP-MST unplug removes a KMS connector. However care must
be taken to properly synchronize with user-space. The expected
sequence of events is the following:
1. The kernel notices that the DP-MST port is gone.
2. The kernel marks the connector as disconnected, then sends a
uevent to make user-space re-scan the connector list.
3. User-space notices the connector goes from connected to disconnected,
disables it.
4. Kernel handles the IOCTL disabling the connector. On success,
the very last reference to the struct drm_connector is dropped and
drm_connector_cleanup() is called.
5. The connector is removed from the list, and a uevent is sent to tell
user-space that the connector disappeared.
The very last step was missing. As a result, user-space thought the
connector still existed and could try to disable it again. Since the
kernel no longer knows about the connector, that would end up with
EINVAL and confused user-space.
Fix this by sending a hotplug uevent from drm_connector_cleanup().
Signed-off-by: Simon Ser <contact@emersion.fr> Cc: stable@vger.kernel.org Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Lyude Paul <lyude@redhat.com> Cc: Jonas Ådahl <jadahl@redhat.com> Tested-by: Jonas Ådahl <jadahl@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221017153150.60675-2-contact@emersion.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When add the 'a *:* rwm' entry to devcgroup A's whitelist, at first A's
exceptions will be cleaned and A's behavior is changed to
DEVCG_DEFAULT_ALLOW. Then parent's exceptions will be copyed to A's
whitelist. If copy failure occurs, just return leaving A to grant
permissions to all devices. And A may grant more permissions than
parent.
Backup A's whitelist and recover original exceptions after copy
failure.
Cc: stable@vger.kernel.org Fixes: 53ea10b1e18f ("device_cgroup: add proper checking when changing default behavior") Signed-off-by: Wang Weiyang <wangweiyang2@huawei.com> Reviewed-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The second (UID) strcmp in acpi_dev_hid_uid_match considers
"0" and "00" different, which can prevent device registration.
Have the AMD IOMMU driver's ivrs_acpihid parsing code remove
any leading zeroes to make the UID strcmp succeed. Now users
can safely specify "AMDxxxxx:00" or "AMDxxxxx:0" and expect
the same behaviour.
When pci_create_attr() fails, pci_remove_resource_files() is called which
will iterate over the res_attr[_wc] arrays and frees every non NULL entry.
To avoid a double free here set the array entry only after it's clear we
successfully initialized it.
pci_device_is_present() previously didn't work for VFs because it reads the
Vendor and Device ID, which are 0xffff for VFs, which looks like they
aren't present. Check the PF instead.
Wei Gong reported that if virtio I/O is in progress when the driver is
unbound or "0" is written to /sys/.../sriov_numvfs, the virtio I/O
operation hangs, which may result in output like this:
This happened because pci_device_is_present(VF) returned "false" in
virtio_pci_remove(), so it called virtio_break_device(). The broken vq
meant that vring_interrupt() skipped the vq.callback() that would have
completed the virtio I/O operation via virtblk_done().
[bhelgaas: commit log, simplify to always use pci_physfn(), add stable tag] Link: https://lore.kernel.org/r/20221026060912.173250-1-mst@redhat.com Reported-by: Wei Gong <gongwei833x@gmail.com> Tested-by: Wei Gong <gongwei833x@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Erase can be zeroed in spi_nor_parse_4bait() or
spi_nor_init_non_uniform_erase_map(). In practice it happened with
mt25qu256a, which supports 4K, 32K, 64K erases with 3b address commands,
but only 4K and 64K erase with 4b address commands.
Fixes: 94fd364f3325 ("mtd: spi-nor: fix erase_type array to indicate current map conf") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211119081412.29732-1-alexander.sverdlin@nokia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When fixing the problem mentioned in PATCH1, we also found
the following problem:
If the IPMI is disconnected and in the sending process, the
uninstallation driver will be stuck for a long time.
The main problem is that uninstalling the driver waits for curr_msg to
be sent or HOSED. After stopping tasklet, the only place to trigger the
timeout mechanism is the circular poll in shutdown_smi.
The poll function delays 10us and calls smi_event_handler(smi_info,10).
Smi_event_handler deducts 10us from kcs->ibf_timeout.
But the poll func is followed by schedule_timeout_uninterruptible(1).
The time consumed here is not counted in kcs->ibf_timeout.
So when 10us is deducted from kcs->ibf_timeout, at least 1 jiffies has
actually passed. The waiting time has increased by more than a
hundredfold.
Now instead of calling poll(). call smi_event_handler() directly and
calculate the elapsed time.
For verification, you can directly use ebpf to check the kcs->
ibf_timeout for each call to kcs_event() when IPMI is disconnected.
Decrement at normal rate before unloading. The decrement rate becomes
very slow after unloading.
- limit bitmap chunk size internal u64 variable to values not overflowing
the u32 bitmap superblock structure variable stored on persistent media
- assign bitmap chunk size internal u64 variable from unsigned values to
avoid possible sign extension artifacts when assigning from a s32 value
The bug has been there since at least kernel 4.0.
Steps to reproduce it:
1: mdadm -C /dev/mdx -l 1 --bitmap=internal --bitmap-chunk=256M -e 1.2
-n2 /dev/rnbd1 /dev/rnbd2
2 resize member device rnbd1 and rnbd2 to 8 TB
3 mdadm --grow /dev/mdx --size=max
The bitmap_chunksize will overflow without patch.
Cc: stable@vger.kernel.org Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Three mount options: "tcpnodelay" and "noautotune" and "noblocksend"
were not displayed when passed in on cifs/smb3 mounts (e.g. displayed
in /proc/mounts e.g.). No change to defaults so these are not
displayed if not specified on mount.
Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since rc was initialised to -ENOMEM in cifs_get_smb_ses(), when an
existing smb session was found, free_xid() would be called and then
print
CIFS: fs/cifs/connect.c: Existing tcp session with server found
CIFS: fs/cifs/connect.c: VFS: in cifs_get_smb_ses as Xid: 44 with uid: 0
CIFS: fs/cifs/connect.c: Existing smb sess found (status=1)
CIFS: fs/cifs/connect.c: VFS: leaving cifs_get_smb_ses (xid = 44) rc = -12
Fix this by initialising rc to 0 and then let free_xid() print this
instead
CIFS: fs/cifs/connect.c: Existing tcp session with server found
CIFS: fs/cifs/connect.c: VFS: in cifs_get_smb_ses as Xid: 14 with uid: 0
CIFS: fs/cifs/connect.c: Existing smb sess found (status=1)
CIFS: fs/cifs/connect.c: VFS: leaving cifs_get_smb_ses (xid = 14) rc = 0
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dvb-core tries to sync the releases of opened files at
dvb_dmxdev_release() with two refcounts: dvbdev->users and
dvr_dvbdev->users. A problem is present in those two syncs: when yet
another dvb_demux_open() is called during those sync waits,
dvb_demux_open() continues to process even if the device is being
closed. This includes the increment of the former refcount, resulting
in the leftover refcount after the sync of the latter refcount at
dvb_dmxdev_release(). It ends up with use-after-free, since the
function believes that all usages were gone and releases the
resources.
This patch addresses the problem by adding the check of dmxdev->exit
flag at dvb_demux_open(), just like dvb_dvr_open() already does. With
the exit flag check, the second call of dvb_demux_open() fails, hence
the further corruption can be avoided.
Also for avoiding the races of the dmxdev->exit flag reference, this
patch serializes the dmxdev->exit set up and the sync waits with the
dmxdev->mutex lock at dvb_dmxdev_release(). Without the mutex lock,
dvb_demux_open() (or dvb_dvr_open()) may run concurrently with
dvb_dmxdev_release(), which allows to skip the exit flag check and
continue the open process that is being closed.
CVE-2022-41218 is assigned to those bugs above.
Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/20220908132754.30532-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In function dvb_register_device() -> dvb_register_media_device() ->
dvb_create_media_entity(), dvb->entity is allocated and initialized. If
the initialization fails, it frees the dvb->entity, and return an error
code. The caller takes the error code and handles the error by calling
dvb_media_device_free(), which unregisters the entity and frees the
field again if it is not NULL. As dvb->entity may not NULLed in
dvb_create_media_entity() when the allocation of dvbdev->pad fails, a
double free may occur. This may also cause an Use After free in
media_device_unregister_entity().
Fix this by storing NULL to dvb->entity when it is freed.
clang-15's ability to elide loops completely became more aggressive when
it can deduce how a variable is being updated in a loop. Counting down
one variable by an increment of another can be replaced by a modulo
operation.
For 64b variables on 32b ARM EABI targets, this can result in the
compiler generating calls to __aeabi_uldivmod, which it does for a do
while loop in float64_rem().
For the kernel, we'd generally prefer that developers not open code 64b
division via binary / operators and instead use the more explicit
helpers from div64.h. On arm-linux-gnuabi targets, failure to do so can
result in linkage failures due to undefined references to
__aeabi_uldivmod().
While developers can avoid open coding divisions on 64b variables, the
compiler doesn't know that the Linux kernel has a partial implementation
of a compiler runtime (--rtlib) to enforce this convention.
It's also undecidable for the compiler whether the code in question
would be faster to execute the loop vs elide it and do the 64b division.
While I actively avoid using the internal -mllvm command line flags, I
think we get better code than using barrier() here, which will force
reloads+spills in the loop for all toolchains.
Link: https://github.com/ClangBuiltLinux/linux/issues/1666 Reported-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The retries in load_ucode_intel_ap() were in place to support systems
with mixed steppings. Mixed steppings are no longer supported and there is
only one microcode image at a time. Any retries will simply reattempt to
apply the same image over and over without making progress.
[ bp: Zap the circumstantial reasoning from the commit message. ]
When generate a synthetic event with many params and then create a trace
action for it [1], kernel panic happened [2].
It is because that in trace_action_create() 'data->n_params' is up to
SYNTH_FIELDS_MAX (current value is 64), and array 'data->var_ref_idx'
keeps indices into array 'hist_data->var_refs' for each synthetic event
param, but the length of 'data->var_ref_idx' is TRACING_MAP_VARS_MAX
(current value is 16), so out-of-bound write happened when 'data->n_params'
more than 16. In this case, 'data->match_data.event' is overwritten and
eventually cause the panic.
To solve the issue, adjust the length of 'data->var_ref_idx' to be
SYNTH_FIELDS_MAX and add sanity checks to avoid out-of-bound write.
[1]
# cd /sys/kernel/tracing/
# echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\
int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\
int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\
int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\
int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\
int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\
int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\
int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\
int v63" >> synthetic_events
# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="bash"' >> \
events/sched/sched_waking/trigger
# echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid)" >> events/sched/sched_switch/trigger
Otherwise the commit that will be aborted will be associated with the
metadata objects that will be torn down. Must write needs_check flag
to metadata with a reset block manager.
Found through code-inspection (and compared against dm-thin.c).
When dm_resume() and dm_destroy() are concurrent, it will
lead to UAF, as follows:
BUG: KASAN: use-after-free in __run_timers+0x173/0x710
Write of size 8 at addr ffff88816d9490f0 by task swapper/0/0
<snip>
Call Trace:
<IRQ>
dump_stack_lvl+0x73/0x9f
print_report.cold+0x132/0xaa2
_raw_spin_lock_irqsave+0xcd/0x160
__run_timers+0x173/0x710
kasan_report+0xad/0x110
__run_timers+0x173/0x710
__asan_store8+0x9c/0x140
__run_timers+0x173/0x710
call_timer_fn+0x310/0x310
pvclock_clocksource_read+0xfa/0x250
kvm_clock_read+0x2c/0x70
kvm_clock_get_cycles+0xd/0x20
ktime_get+0x5c/0x110
lapic_next_event+0x38/0x50
clockevents_program_event+0xf1/0x1e0
run_timer_softirq+0x49/0x90
__do_softirq+0x16e/0x62c
__irq_exit_rcu+0x1fa/0x270
irq_exit_rcu+0x12/0x20
sysvec_apic_timer_interrupt+0x8e/0xc0
One of the concurrency UAF can be shown as below:
use free
do_resume |
__find_device_hash_cell |
dm_get |
atomic_inc(&md->holders) |
| dm_destroy
| __dm_destroy
| if (!dm_suspended_md(md))
| atomic_read(&md->holders)
| msleep(1)
dm_resume |
__dm_resume |
dm_table_resume_targets |
pool_resume |
do_waker #add delay work |
dm_put |
atomic_dec(&md->holders) |
| dm_table_destroy
| pool_dtr
| __pool_dec
| __pool_destroy
| destroy_workqueue
| kfree(pool) # free pool
time out
__do_softirq
run_timer_softirq # pool has already been freed
This can be easily reproduced using:
1. create thin-pool
2. dmsetup suspend pool
3. dmsetup resume pool
4. dmsetup remove_all # Concurrent with 3
The root cause of this UAF bug is that dm_resume() adds timer after
dm_destroy() skips cancelling the timer because of suspend status.
After timeout, it will call run_timer_softirq(), however pool has
already been freed. The concurrency UAF bug will happen.
Therefore, cancelling timer again in __pool_destroy().
Following process may generate a broken btree mixed with fresh and
stale btree nodes, which could get dm thin trapped in an infinite loop
while looking up data block:
Transaction 1: pmd->root = A, A->B->C // One path in btree
pmd->root = X, X->Y->Z // Copy-up
Transaction 2: X,Z is updated on disk, Y write failed.
// Commit failed, dm thin becomes read-only.
process_bio_read_only
dm_thin_find_block
__find_block
dm_btree_lookup(pmd->root)
The pmd->root points to a broken btree, Y may contain stale node
pointing to any block, for example X, which gets dm thin trapped into
a dead loop while looking up Z.
Fix this by setting pmd->root in __open_metadata(), so that dm thin
will use the last transaction's pmd->root if commit failed.
Function metadata_operation_failed() is called when operations failed
on dm pool metadata, dm pool will destroy and recreate metadata. So,
shrinker will be unregistered and registered, which could down write
shrinker_rwsem under pmd_write_lock.
Fix it by allocating dm_block_manager before locking pmd->root_lock
and destroying old dm_block_manager after unlocking pmd->root_lock,
then old dm_block_manager is replaced with new dm_block_manager under
pmd->root_lock. So, shrinker register/unregister could be done without
holding pmd->root_lock.
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216676 Cc: stable@vger.kernel.org #v5.2+ Fixes: 32a50478816e ("dm thin: add read only and fail io modes") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Same ABBA deadlock pattern fixed in commit 4b60f452ec51 ("dm thin: Fix
ABBA deadlock between shrink_slab and dm_pool_abort_metadata") to
DM-cache's metadata.
In 2016 Linus moved install_exec_creds immediately after
setup_new_exec, in binfmt_elf as a cleanup and as part of closing a
potential information leak.
Perform the same cleanup for the other binary formats.
Different binary formats doing the same things the same way makes exec
easier to reason about and easier to maintain.
Greg Ungerer reports:
> I tested the the whole series on non-MMU m68k and non-MMU arm
> (exercising binfmt_flat) and it all tested out with no problems,
> so for the binfmt_flat changes: Tested-by: Greg Ungerer <gerg@linux-m68k.org>
Ref: 7ef7b9e2eed4 ("binfmt_elf: switch to new creds when switching to new mm") Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Stable-dep-of: e7f703ff2507 ("binfmt: Fix error return code in load_elf_fdpic_binary()") Signed-off-by: Sasha Levin <sashal@kernel.org>
In cpufreq_policy_alloc(), it will call uninitialed completion in
cpufreq_sysfs_release() when kobject_init_and_add() fails. And
that will cause a crash such as the following page fault in complete:
The pin configuration (done with generic pin controller helpers and
as expressed by bindings) requires children nodes with either:
1. "pins" property and the actual configuration,
2. another set of nodes with above point.
The qup_i2c12_default pin configuration used second method - with a
"pinmux" child.
Sparse reports that calling add_device_randomness() on `uid` is a
violation of address spaces. And indeed the next usage uses readl()
properly, but that was left out when passing it toadd_device_
randomness(). So instead copy the whole thing to the stack first.
If a file consists of an inline extent followed by a regular or prealloc
extent, then a legitimate attempt to resolve a logical address in the
non-inline region will result in add_all_parents reading the invalid
offset field of the inline extent. If the inline extent item is placed
in the leaf eb s.t. it is the first item, attempting to access the
offset field will not only be meaningless, it will go past the end of
the eb and cause this panic:
After a full run of a make_min_config test, I noticed there were a lot of
CONFIGs still enabled that really should not be. Looking at them, I
noticed they were all defined as "default y". The issue is that the test
simple removes the config and re-runs make oldconfig, which enables it
again because it is set to default 'y'. Instead, explicitly disable the
config with writing "# CONFIG_FOO is not set" to the file to keep it from
being set again.
With this change, one of my box's minconfigs went from 768 configs set,
down to 521 configs set.
Link: https://lkml.kernel.org/r/20221202115936.016fce23@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 6a34f8a6f58b1 ("ktest: Added config_bisect test type") Reviewed-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LLCC driver uses REGMAP_MMIO for accessing the hardware registers. So
select the dependency in Kconfig. Without this, there will be errors
while building the driver with COMPILE_TEST only:
With char becoming unsigned by default, and with `char` alone being
ambiguous and based on architecture, signed chars need to be marked
explicitly as such. Use `s8` and `u8` types here, since that's what
surrounding code does. This fixes:
drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: assigning (-9) to unsigned variable 'tm'
drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: we never enter this loop
Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-media@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: d297b99b09d5 ("net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the kernel drops GSO VLAN tagged packet if it's created with
socket(AF_PACKET, SOCK_RAW, 0) plus virtio_net_hdr.
The reason is AF_PACKET doesn't adjust the skb network header if there is
a VLAN tag. Then after virtio_net_hdr_set_proto() called, the skb->protocol
will be set to ETH_P_IP/IPv6. And in later inet/ipv6_gso_segment() the skb
is dropped as network header position is invalid.
Let's handle VLAN packets by adjusting network header position in
packet_parse_headers(). The adjustment is safe and does not affect the
later xmit as tap device also did that.
In packet_snd(), packet_parse_headers() need to be moved before calling
virtio_net_hdr_set_proto(), so we can set correct skb->protocol and
network header first.
There is no need to update tpacket_snd() as it calls packet_parse_headers()
in tpacket_fill_skb(), which is already before calling virtio_net_hdr_*
functions.
skb->no_fcs setting is also moved upper to make all skb settings together
and keep consistency with function packet_sendmsg_spkt().
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220425014502.985464-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c543a949569b ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In check_acpi_tpm2(), we get the TPM2 table just to make
sure the table is there, not used after the init, so the
acpi_put_table() should be added to release the ACPI memory.
Fixes: 5774c0c63655 ("tpm_tis: Consolidate the platform and acpi probe flow") Cc: stable@vger.kernel.org Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In crb_acpi_add(), we get the TPM2 table to retrieve information
like start method, and then assign them to the priv data, so the
TPM2 table is not used after the init, should be freed, call
acpi_put_table() to fix the memory leak.
Depending on the memory configuration, isolate_freepages_block() may scan
pages out of the target range and causes panic.
Panic can occur on systems with multiple zones in a single pageblock.
The reason it is rare is that it only happens in special
configurations. Depending on how many similar systems there are, it
may be a good idea to fix this problem for older kernels as well.
The problem is that pfn as argument of fast_isolate_around() could be out
of the target range. Therefore we should consider the case where pfn <
start_pfn, and also the case where end_pfn < pfn.
This problem should have been addressd by the commit 061b56896dc8 ("mm,
compaction: make fast_isolate_freepages() stay within zone") but there was
an oversight.
Case1: pfn < start_pfn
<at memory compaction for node Y>
| node X's zone | node Y's zone
+-----------------+------------------------------...
pageblock ^ ^ ^
+-----------+-----------+-----------+-----------+...
^ ^ ^
^ ^ end_pfn
^ start_pfn = cc->zone->zone_start_pfn
pfn
<---------> scanned range by "Scan After"
Case2: end_pfn < pfn
<at memory compaction for node X>
| node X's zone | node Y's zone
+-----------------+------------------------------...
pageblock ^ ^ ^
+-----------+-----------+-----------+-----------+...
^ ^ ^
^ ^ pfn
^ end_pfn
start_pfn
<---------> scanned range by "Scan Before"
It seems that there is no good reason to skip nr_isolated pages just after
given pfn. So let perform simple scan from start to end instead of
dividing the scan into "Before" and "After".
Link: https://lkml.kernel.org/r/20221026112438.236336-1-a.naribayashi@fujitsu.com Fixes: 061b56896dc8 ("mm, compaction: make fast_isolate_freepages() stay within zone"). Signed-off-by: NARIBAYASHI Akira <a.naribayashi@fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's a crash in mempool_free when running the lvm test
shell/lvchange-rebuild-raid.sh.
The reason for the crash is this:
* super_written calls atomic_dec_and_test(&mddev->pending_writes) and
wake_up(&mddev->sb_wait). Then it calls rdev_dec_pending(rdev, mddev)
and bio_put(bio).
* so, the process that waited on sb_wait and that is woken up is racing
with bio_put(bio).
* if the process wins the race, it calls bioset_exit before bio_put(bio)
is executed.
* bio_put(bio) attempts to free a bio into a destroyed bio set - causing
a crash in mempool_free.
We fix this bug by moving bio_put before atomic_dec_and_test.
We also move rdev_dec_pending before atomic_dec_and_test as suggested by
Neil Brown.
The function md_end_flush has a similar bug - we must call bio_put before
we decrement the number of in-progress bios.
The propagate_mnt() function handles mount propagation when creating
mounts and propagates the source mount tree @source_mnt to all
applicable nodes of the destination propagation mount tree headed by
@dest_mnt.
Unfortunately it contains a bug where it fails to terminate at peers of
@source_mnt when looking up copies of the source mount that become
masters for copies of the source mount tree mounted on top of slaves in
the destination propagation tree causing a NULL dereference.
Once the mechanics of the bug are understood it's easy to trigger.
Because of unprivileged user namespaces it is available to unprivileged
users.
While fixing this bug we've gotten confused multiple times due to
unclear terminology or missing concepts. So let's start this with some
clarifications:
* The terms "master" or "peer" denote a shared mount. A shared mount
belongs to a peer group.
* A peer group is a set of shared mounts that propagate to each other.
They are identified by a peer group id. The peer group id is available
in @shared_mnt->mnt_group_id.
Shared mounts within the same peer group have the same peer group id.
The peers in a peer group can be reached via @shared_mnt->mnt_share.
* The terms "slave mount" or "dependent mount" denote a mount that
receives propagation from a peer in a peer group. IOW, shared mounts
may have slave mounts and slave mounts have shared mounts as their
master. Slave mounts of a given peer in a peer group are listed on
that peers slave list available at @shared_mnt->mnt_slave_list.
* The term "master mount" denotes a mount in a peer group. IOW, it
denotes a shared mount or a peer mount in a peer group. The term
"master mount" - or "master" for short - is mostly used when talking
in the context of slave mounts that receive propagation from a master
mount. A master mount of a slave identifies the closest peer group a
slave mount receives propagation from. The master mount of a slave can
be identified via @slave_mount->mnt_master. Different slaves may point
to different masters in the same peer group.
* Multiple peers in a peer group can have non-empty ->mnt_slave_lists.
Non-empty ->mnt_slave_lists of peers don't intersect. Consequently, to
ensure all slave mounts of a peer group are visited the
->mnt_slave_lists of all peers in a peer group have to be walked.
* Slave mounts point to a peer in the closest peer group they receive
propagation from via @slave_mnt->mnt_master (see above). Together with
these peers they form a propagation group (see below). The closest
peer group can thus be identified through the peer group id
@slave_mnt->mnt_master->mnt_group_id of the peer/master that a slave
mount receives propagation from.
* A shared-slave mount is a slave mount to a peer group pg1 while also
a peer in another peer group pg2. IOW, a peer group may receive
propagation from another peer group.
If a peer group pg1 is a slave to another peer group pg2 then all
peers in peer group pg1 point to the same peer in peer group pg2 via
->mnt_master. IOW, all peers in peer group pg1 appear on the same
->mnt_slave_list. IOW, they cannot be slaves to different peer groups.
* A pure slave mount is a slave mount that is a slave to a peer group
but is not a peer in another peer group.
* A propagation group denotes the set of mounts consisting of a single
peer group pg1 and all slave mounts and shared-slave mounts that point
to a peer in that peer group via ->mnt_master. IOW, all slave mounts
such that @slave_mnt->mnt_master->mnt_group_id is equal to
@shared_mnt->mnt_group_id.
The concept of a propagation group makes it easier to talk about a
single propagation level in a propagation tree.
For example, in propagate_mnt() the immediate peers of @dest_mnt and
all slaves of @dest_mnt's peer group form a propagation group propg1.
So a shared-slave mount that is a slave in propg1 and that is a peer
in another peer group pg2 forms another propagation group propg2
together with all slaves that point to that shared-slave mount in
their ->mnt_master.
* A propagation tree refers to all mounts that receive propagation
starting from a specific shared mount.
For example, for propagate_mnt() @dest_mnt is the start of a
propagation tree. The propagation tree ecompasses all mounts that
receive propagation from @dest_mnt's peer group down to the leafs.
With that out of the way let's get to the actual algorithm.
We know that @dest_mnt is guaranteed to be a pure shared mount or a
shared-slave mount. This is guaranteed by a check in
attach_recursive_mnt(). So propagate_mnt() will first propagate the
source mount tree to all peers in @dest_mnt's peer group:
for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
ret = propagate_one(n);
if (ret)
goto out;
}
Notice, that the peer propagation loop of propagate_mnt() doesn't
propagate @dest_mnt itself. @dest_mnt is mounted directly in
attach_recursive_mnt() after we propagated to the destination
propagation tree.
The mount that will be mounted on top of @dest_mnt is @source_mnt. This
copy was created earlier even before we entered attach_recursive_mnt()
and doesn't concern us a lot here.
It's just important to notice that when propagate_mnt() is called
@source_mnt will not yet have been mounted on top of @dest_mnt. Thus,
@source_mnt->mnt_parent will either still point to @source_mnt or - in
the case @source_mnt is moved and thus already attached - still to its
former parent.
For each peer @m in @dest_mnt's peer group propagate_one() will create a
new copy of the source mount tree and mount that copy @child on @m such
that @child->mnt_parent points to @m after propagate_one() returns.
propagate_one() will stash the last destination propagation node @m in
@last_dest and the last copy it created for the source mount tree in
@last_source.
Hence, if we call into propagate_one() again for the next destination
propagation node @m, @last_dest will point to the previous destination
propagation node and @last_source will point to the previous copy of the
source mount tree and mounted on @last_dest.
Each new copy of the source mount tree is created from the previous copy
of the source mount tree. This will become important later.
The peer loop in propagate_mnt() is straightforward. We iterate through
the peers copying and updating @last_source and @last_dest as we go
through them and mount each copy of the source mount tree @child on a
peer @m in @dest_mnt's peer group.
After propagate_mnt() handled the peers in @dest_mnt's peer group
propagate_mnt() will propagate the source mount tree down the
propagation tree that @dest_mnt's peer group propagates to:
for (m = next_group(dest_mnt, dest_mnt); m;
m = next_group(m, dest_mnt)) {
/* everything in that slave group */
n = m;
do {
ret = propagate_one(n);
if (ret)
goto out;
n = next_peer(n);
} while (n != m);
}
The next_group() helper will recursively walk the destination
propagation tree, descending into each propagation group of the
propagation tree.
The important part is that it takes care to propagate the source mount
tree to all peers in the peer group of a propagation group before it
propagates to the slaves to those peers in the propagation group. IOW,
it creates and mounts copies of the source mount tree that become
masters before it creates and mounts copies of the source mount tree
that become slaves to these masters.
It is important to remember that propagating the source mount tree to
each mount @m in the destination propagation tree simply means that we
create and mount new copies @child of the source mount tree on @m such
that @child->mnt_parent points to @m.
Since we know that each node @m in the destination propagation tree
headed by @dest_mnt's peer group will be overmounted with a copy of the
source mount tree and since we know that the propagation properties of
each copy of the source mount tree we create and mount at @m will mostly
mirror the propagation properties of @m. We can use that information to
create and mount the copies of the source mount tree that become masters
before their slaves.
The easy case is always when @m and @last_dest are peers in a peer group
of a given propagation group. In that case we know that we can simply
copy @last_source without having to figure out what the master for the
new copy @child of the source mount tree needs to be as we've done that
in a previous call to propagate_one().
The hard case is when we're dealing with a slave mount or a shared-slave
mount @m in a destination propagation group that we need to create and
mount a copy of the source mount tree on.
For each propagation group in the destination propagation tree we
propagate the source mount tree to we want to make sure that the copies
@child of the source mount tree we create and mount on slaves @m pick an
ealier copy of the source mount tree that we mounted on a master @m of
the destination propagation group as their master. This is a mouthful
but as far as we can tell that's the core of it all.
But, if we keep track of the masters in the destination propagation tree
@m we can use the information to find the correct master for each copy
of the source mount tree we create and mount at the slaves in the
destination propagation tree @m.
Let's walk through the base case as that's still fairly easy to grasp.
If we're dealing with the first slave in the propagation group that
@dest_mnt is in then we don't yet have marked any masters in the
destination propagation tree.
We know the master for the first slave to @dest_mnt's peer group is
simple @dest_mnt. So we expect this algorithm to yield a copy of the
source mount tree that was mounted on a peer in @dest_mnt's peer group
as the master for the copy of the source mount tree we want to mount at
the first slave @m:
for (n = m; ; n = p) {
p = n->mnt_master;
if (p == dest_master || IS_MNT_MARKED(p))
break;
}
For the first slave we walk the destination propagation tree all the way
up to a peer in @dest_mnt's peer group. IOW, the propagation hierarchy
can be walked by walking up the @mnt->mnt_master hierarchy of the
destination propagation tree @m. We will ultimately find a peer in
@dest_mnt's peer group and thus ultimately @dest_mnt->mnt_master.
Btw, here the assumption we listed at the beginning becomes important.
Namely, that peers in a peer group pg1 that are slaves in another peer
group pg2 appear on the same ->mnt_slave_list. IOW, all slaves who are
peers in peer group pg1 point to the same peer in peer group pg2 via
their ->mnt_master. Otherwise the termination condition in the code
above would be wrong and next_group() would be broken too.
So the first iteration sets:
n = m;
p = n->mnt_master;
such that @p now points to a peer or @dest_mnt itself. We walk up one
more level since we don't have any marked mounts. So we end up with:
n = dest_mnt;
p = dest_mnt->mnt_master;
If @dest_mnt's peer group is not slave to another peer group then @p is
now NULL. If @dest_mnt's peer group is a slave to another peer group
then @p now points to @dest_mnt->mnt_master points which is a master
outside the propagation tree we're dealing with.
Now we need to figure out the master for the copy of the source mount
tree we're about to create and mount on the first slave of @dest_mnt's
peer group:
do {
struct mount *parent = last_source->mnt_parent;
if (last_source == first_source)
break;
done = parent->mnt_master == p;
if (done && peers(n, parent))
break;
last_source = last_source->mnt_master;
} while (!done);
We know that @last_source->mnt_parent points to @last_dest and
@last_dest is the last peer in @dest_mnt's peer group we propagated to
in the peer loop in propagate_mnt().
Consequently, @last_source is the last copy we created and mount on that
last peer in @dest_mnt's peer group. So @last_source is the master we
want to pick.
We know that @last_source->mnt_parent->mnt_master points to
@last_dest->mnt_master. We also know that @last_dest->mnt_master is
either NULL or points to a master outside of the destination propagation
tree and so does @p. Hence:
done = parent->mnt_master == p;
is trivially true in the base condition.
We also know that for the first slave mount of @dest_mnt's peer group
that @last_dest either points @dest_mnt itself because it was
initialized to:
last_dest = dest_mnt;
at the beginning of propagate_mnt() or it will point to a peer of
@dest_mnt in its peer group. In both cases it is guaranteed that on the
first iteration @n and @parent are peers (Please note the check for
peers here as that's important.):
if (done && peers(n, parent))
break;
So, as we expected, we select @last_source, which referes to the last
copy of the source mount tree we mounted on the last peer in @dest_mnt's
peer group, as the master of the first slave in @dest_mnt's peer group.
The rest is taken care of by clone_mnt(last_source, ...). We'll skip
over that part otherwise this becomes a blogpost.
At the end of propagate_mnt() we now mark @m->mnt_master as the first
master in the destination propagation tree that is distinct from
@dest_mnt->mnt_master. IOW, we mark @dest_mnt itself as a master.
By marking @dest_mnt or one of it's peers we are able to easily find it
again when we later lookup masters for other copies of the source mount
tree we mount copies of the source mount tree on slaves @m to
@dest_mnt's peer group. This, in turn allows us to find the master we
selected for the copies of the source mount tree we mounted on master in
the destination propagation tree again.
The important part is to realize that the code makes use of the fact
that the last copy of the source mount tree stashed in @last_source was
mounted on top of the previous destination propagation node @last_dest.
What this means is that @last_source allows us to walk the destination
propagation hierarchy the same way each destination propagation node @m
does.
If we take @last_source, which is the copy of @source_mnt we have
mounted on @last_dest in the previous iteration of propagate_one(), then
we know @last_source->mnt_parent points to @last_dest but we also know
that as we walk through the destination propagation tree that
@last_source->mnt_master will point to an earlier copy of the source
mount tree we mounted one an earlier destination propagation node @m.
IOW, @last_source->mnt_parent will be our hook into the destination
propagation tree and each consecutive @last_source->mnt_master will lead
us to an earlier propagation node @m via
@last_source->mnt_master->mnt_parent.
Hence, by walking up @last_source->mnt_master, each of which is mounted
on a node that is a master @m in the destination propagation tree we can
also walk up the destination propagation hierarchy.
So, for each new destination propagation node @m we use the previous
copy of @last_source and the fact it's mounted on the previous
propagation node @last_dest via @last_source->mnt_master->mnt_parent to
determine what the master of the new copy of @last_source needs to be.
The goal is to find the _closest_ master that the new copy of the source
mount tree we are about to create and mount on a slave @m in the
destination propagation tree needs to pick. IOW, we want to find a
suitable master in the propagation group.
As the propagation structure of the source mount propagation tree we
create mirrors the propagation structure of the destination propagation
tree we can find @m's closest master - i.e., a marked master - which is
a peer in the closest peer group that @m receives propagation from. We
store that closest master of @m in @p as before and record the slave to
that master in @n
We then search for this master @p via @last_source by walking up the
master hierarchy starting from the last copy of the source mount tree
stored in @last_source that we created and mounted on the previous
destination propagation node @m.
We will try to find the master by walking @last_source->mnt_master and
by comparing @last_source->mnt_master->mnt_parent->mnt_master to @p. If
we find @p then we can figure out what earlier copy of the source mount
tree needs to be the master for the new copy of the source mount tree
we're about to create and mount at the current destination propagation
node @m.
If @last_source->mnt_master->mnt_parent and @n are peers then we know
that the closest master they receive propagation from is
@last_source->mnt_master->mnt_parent->mnt_master. If not then the
closest immediate peer group that they receive propagation from must be
one level higher up.
This builds on the earlier clarification at the beginning that all peers
in a peer group which are slaves of other peer groups all point to the
same ->mnt_master, i.e., appear on the same ->mnt_slave_list, of the
closest peer group that they receive propagation from.
However, terminating the walk has corner cases.
If the closest marked master for a given destination node @m cannot be
found by walking up the master hierarchy via @last_source->mnt_master
then we need to terminate the walk when we encounter @source_mnt again.
This isn't an arbitrary termination. It simply means that the new copy
of the source mount tree we're about to create has a copy of the source
mount tree we created and mounted on a peer in @dest_mnt's peer group as
its master. IOW, @source_mnt is the peer in the closest peer group that
the new copy of the source mount tree receives propagation from.
We absolutely have to stop @source_mnt because @last_source->mnt_master
either points outside the propagation hierarchy we're dealing with or it
is NULL because @source_mnt isn't a shared-slave.
So continuing the walk past @source_mnt would cause a NULL dereference
via @last_source->mnt_master->mnt_parent. And so we have to stop the
walk when we encounter @source_mnt again.
One scenario where this can happen is when we first handled a series of
slaves of @dest_mnt's peer group and then encounter peers in a new peer
group that is a slave to @dest_mnt's peer group. We handle them and then
we encounter another slave mount to @dest_mnt that is a pure slave to
@dest_mnt's peer group. That pure slave will have a peer in @dest_mnt's
peer group as its master. Consequently, the new copy of the source mount
tree will need to have @source_mnt as it's master. So we walk the
propagation hierarchy all the way up to @source_mnt based on
@last_source->mnt_master.
So terminate on @source_mnt, easy peasy. Except, that the check misses
something that the rest of the algorithm already handles.
If @dest_mnt has peers in it's peer group the peer loop in
propagate_mnt():
for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
ret = propagate_one(n);
if (ret)
goto out;
}
will consecutively update @last_source with each previous copy of the
source mount tree we created and mounted at the previous peer in
@dest_mnt's peer group. So after that loop terminates @last_source will
point to whatever copy of the source mount tree was created and mounted
on the last peer in @dest_mnt's peer group.
Furthermore, if there is even a single additional peer in @dest_mnt's
peer group then @last_source will __not__ point to @source_mnt anymore.
Because, as we mentioned above, @dest_mnt isn't even handled in this
loop but directly in attach_recursive_mnt(). So it can't even accidently
come last in that peer loop.
So the first time we handle a slave mount @m of @dest_mnt's peer group
the copy of the source mount tree we create will make the __last copy of
the source mount tree we created and mounted on the last peer in
@dest_mnt's peer group the master of the new copy of the source mount
tree we create and mount on the first slave of @dest_mnt's peer group__.
But this means that the termination condition that checks for
@source_mnt is wrong. The @source_mnt cannot be found anymore by
propagate_one(). Instead it will find the last copy of the source mount
tree we created and mounted for the last peer of @dest_mnt's peer group
again. And that is a peer of @source_mnt not @source_mnt itself.
IOW, we fail to terminate the loop correctly and ultimately dereference
@last_source->mnt_master->mnt_parent. When @source_mnt's peer group
isn't slave to another peer group then @last_source->mnt_master is NULL
causing the splat below.
For example, assume @dest_mnt is a pure shared mount and has three peers
in its peer group:
After this sequence has been processed @last_source will point to (P3),
the copy generated for the third peer in @dest_mnt's peer group we
handled. So the copy of the source mount tree (P4) we create and mount
on the first slave of @dest_mnt's peer group:
will pick the last copy of the source mount tree (P3) as master, not (S0).
When walking the propagation hierarchy via @last_source's master
hierarchy we encounter (P3) but not (S0), i.e., @source_mnt.
We can fix this in multiple ways:
(1) By setting @last_source to @source_mnt after we processed the peers
in @dest_mnt's peer group right after the peer loop in
propagate_mnt().
(2) By changing the termination condition that relies on finding exactly
@source_mnt to finding a peer of @source_mnt.
(3) By only moving @last_source when we actually venture into a new peer
group or some clever variant thereof.
The first two options are minimally invasive and what we want as a fix.
The third option is more intrusive but something we'd like to explore in
the near future.
This passes all LTP tests and specifically the mount propagation
testsuite part of it. It also holds up against all known reproducers of
this issues.
Final words.
First, this is a clever but __worringly__ underdocumented algorithm.
There isn't a single detailed comment to be found in next_group(),
propagate_one() or anywhere else in that file for that matter. This has
been a giant pain to understand and work through and a bug like this is
insanely difficult to fix without a detailed understanding of what's
happening. Let's not talk about the amount of time that was sunk into
fixing this.
Second, all the cool kids with access to
unshare --mount --user --map-root --propagation=unchanged
are going to have a lot of fun. IOW, triggerable by unprivileged users
while namespace_lock() lock is held.
Correctly calculate available space including the size of the chunk
buffer. This fixes a buffer overflow when multiple MIDI sysex
messages are sent to a PODxt device.
A PODxt device sends 0xb2, 0xc2 or 0xf2 as a status byte for MIDI
messages over USB that should otherwise have a 0xb0, 0xc0 or 0xf0
status byte. This is usually corrected by the driver on other OSes.
There is a wrong case of link() on overlay:
$ mkdir /lower /fuse /merge
$ mount -t fuse /fuse
$ mkdir /fuse/upper /fuse/work
$ mount -t overlay /merge -o lowerdir=/lower,upperdir=/fuse/upper,\
workdir=work
$ touch /merge/file
$ chown bin.bin /merge/file // the file's caller becomes "bin"
$ ln /merge/file /merge/lnkfile
Then we will get an error(EACCES) because fuse daemon checks the link()'s
caller is "bin", it denied this request.
In the changing history of ovl_link(), there are two key commits:
The first is commit ebf87ef2002f ("ovl: fix sgid on directory") which
overrides the cred's fsuid/fsgid using the new inode. The new inode's
owner is initialized by inode_init_owner(), and inode->fsuid is
assigned to the current user. So the override fsuid becomes the
current user. We know link() is actually modifying the directory, so
the caller must have the MAY_WRITE permission on the directory. The
current caller may should have this permission. This is acceptable
to use the caller's fsuid.
The second is commit 9de35335a5b1 ("ovl: share inode for hard link")
which removed the inode creation in ovl_link(). This commit move
inode_init_owner() into ovl_create_object(), so the ovl_link() just
give the old inode to ovl_create_or_link(). Then the override fsuid
becomes the old inode's fsuid, neither the caller nor the overlay's
mounter! So this is incorrect.
Fix this bug by using ovl mounter's fsuid/fsgid to do underlying
fs's link().
I no longer work for Plantronics (aka Poly, aka HP) and do not have
access to the headsets in order to test. However, as noted by Maxim,
the other 32xx models that share the same base code set as the 3220
would need the same quirk. This patch adds the PIDs for the rest of
the Blackwire 32XX product family that require the quirk.
Plantronics Blackwire 3210 Series (047f:c055)
Plantronics Blackwire 3215 Series (047f:c057)
Plantronics Blackwire 3225 Series (047f:c058)
Quote from previous patch by Maxim Mikityanskiy
Plantronics Blackwire 3220 Series (047f:c056) sends HID reports twice
for each volume key press. This patch adds a quirk to hid-plantronics
for this product ID, which will ignore the second volume key press if
it happens within 5 ms from the last one that was handled.
The patch was tested on the mentioned model only, it shouldn't affect
other models, however, this quirk might be needed for them too.
Auto-repeat (when a key is held pressed) is not affected, because the
rate is about 3 times per second, which is far less frequent than once
in 5 ms.
End quote
rtas_os_term() is called during panic. Its behavior depends on a couple
of conditions in the /rtas node of the device tree, the traversal of
which entails locking and local IRQ state changes. If the kernel panics
while devtree_lock is held, rtas_os_term() as currently written could
hang.
Instead of discovering the relevant characteristics at panic time,
cache them in file-static variables at boot. Note the lookup for
"ibm,extended-os-term" is converted to of_property_read_bool() since it
is a boolean property, not an RTAS function token.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Incorporate suggested change from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Since kernel 5.3.4 my laptop (ICH8M controller) does not see Kingston
SV300S37A60G SSD disk connected into a SATA connector on wake from
suspend. The problem was introduced in 9562ffa77461 ("libata/ahci: Drop
PCS quirk for Denverton and beyond"): the quirk is not applied on wake
from suspend as it originally was.
It is worth to mention the commit contained another bug: the quirk is
not applied at all to controllers which require it. The fix commit 19e84f7467da ("libata/ahci: Fix PCS quirk application") landed in 5.3.8.
So testing my patch anywhere between commits 9562ffa77461 and 19e84f7467da is pointless.
Not all disks trigger the problem. For example nothing bad happens with
Western Digital WD5000LPCX HDD.
Test hardware:
- Acer 5920G with ICH8M SATA controller
- sda: some SATA HDD connnected into the DVD drive IDE port with a
SATA-IDE caddy. It is a boot disk
- sdb: Kingston SV300S37A60G SSD connected into the only SATA port
sd 0:0:0:0: [sda] Starting disk
sd 2:0:0:0: [sdb] Starting disk
ata4: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
ata1.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out
ata1: FORCE: cable set to 80c
ata5: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata3.00: disabled
sd 2:0:0:0: rejecting I/O to offline device
ata3.00: detaching (SCSI 2:0:0:0)
sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] Synchronizing SCSI cache
sd 2:0:0:0: [sdb] Synchronize Cache(10) failed: Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] Stopping disk
sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK
Commit 9562ffa77461 dropped ahci_pci_reset_controller() which internally
calls ahci_reset_controller() and applies the PCS quirk if needed after
that. It was called each time a reset was required instead of just
ahci_reset_controller(). This patch puts the function back in place.
Fixes: 9562ffa77461 ("libata/ahci: Drop PCS quirk for Denverton and beyond") Signed-off-by: Adam Vodopjan <grozzly@protonmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When using shadow doorbells, the event index and the doorbell values are
written to host memory. Prior to this patch, the values written would
erroneously be written in host endianness. This causes trouble on
big-endian platforms. Fix this by adding missing endian conversions.
This issue was noticed by Guenter while testing various big-endian
platforms under QEMU[1]. A similar fix required for hw/nvme in QEMU is
up for review as well[2].
This is because in scatterwalk_copychunks(), we attempted to write to
a buffer (@sign) that was allocated in the stack (vmalloc area) by
crypt_message() and thus accessing its remaining 8 (x2) bytes ended up
crossing a page boundary.
To simply fix it, we could just pass @sign kmalloc'd from
crypt_message() and then we're done. Luckily, we don't seem to pass
any other vmalloc'd buffers in smb_rqst::rq_iov...
Instead, let's map the correct pages and offsets from vmalloc buffers
as well in cifs_sg_set_buf() and then avoiding such oopses.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When updating the operating mode as part of regulator enable, the caller
has already locked the regulator tree and drms_uA_update() must not try
to do the same in order not to trigger a deadlock.
The lock inversion is reported by lockdep as:
======================================================
WARNING: possible circular locking dependency detected
6.1.0-next-20221215 #142 Not tainted
------------------------------------------------------
udevd/154 is trying to acquire lock: ffffc11f123d7e50 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x54/0x280
but task is already holding lock: ffff80000e4c36e8 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_enable+0x34/0x80
Prior to commit 5ba50eec5b48 ("iio: adc128s052: add ACPI _HID
AANT1280"), the driver unconditionally used spi_get_device_id() to get
the index into the adc128_config array.
However, with that commit, OF-based boards now incorrectly treat all
supported sensors as if they are an adc128s052, because all the .data
members of the adc128_of_match table are implicitly 0. Our board,
which has an adc122s021, thus exposes 8 channels whereas it really
only has two.
Drop 'mlock' usage by making use of iio_device_claim_direct_mode().
This change actually makes sure we cannot do a single conversion while
buffering is enable. Note there was a potential race in the previous
code since we were only acquiring the lock after checking if the bus is
enabled.
Fixes: 73ddaa6b2f8a ("iio:adc: Add common code for ADI Sigma Delta devices") Signed-off-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Cc: <Stable@vger.kernel.org> #No rush as race is very old. Link: https://lore.kernel.org/r/20220920112821.975359-2-nuno.sa@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c898ba4ee8d8 ("reiserfs: add atomic addition of selinux attributes
during inode creation") defined reiserfs_security_free() to free the name
and value of a security xattr allocated by the active LSM through
security_old_inode_init_security(). However, this function is not called
in the reiserfs code.
Thus, add a call to reiserfs_security_free() whenever
reiserfs_security_init() is called, and initialize value to NULL, to avoid
to call kfree() on an uninitialized pointer.
Finally, remove the kfree() for the xattr name, as it is not allocated
anymore.
Fixes: c898ba4ee8d8 ("reiserfs: add atomic addition of selinux attributes during inode creation") Cc: stable@vger.kernel.org Cc: Jeff Mahoney <jeffm@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Mimi Zohar <zohar@linux.ibm.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some Wacom devices have a special "bootloader" mode that is used for
firmware flashing. When operating in this mode, the device cannot be
used for input, and the HID descriptor is not able to be processed by
the driver. The driver generates an "Unknown device_type" warning and
then returns an error code from wacom_probe(). This is a problem because
userspace still needs to be able to interact with the device via hidraw
to perform the firmware flash.
This commit adds a non-generic device definition for 056a:0094 which
is used when devices are in "bootloader" mode. It marks the devices
with a special BOOTLOADER type that is recognized by wacom_probe() and
wacom_raw_event(). When we see this type we ensure a hidraw device is
created and otherwise keep our hands off so that userspace is in full
control.
Since commit 192e7b4fa42c ("usb: dwc3: Don't switch OTG -> peripheral
if extcon is present"), Dual Role support on Intel Merrifield platform
broke due to rearranging the call to dwc3_get_extcon().
It appears to be caused by ulpi_read_id() masking the timeout on the first
test write. In the past dwc3 probe continued by calling dwc3_core_soft_reset()
followed by dwc3_get_extcon() which happend to return -EPROBE_DEFER.
On deferred probe ulpi_read_id() finally succeeded. Due to above mentioned
rearranging -EPROBE_DEFER is not returned and probe completes without phy.
On Intel Merrifield the timeout on the first test write issue is reproducible
but it is difficult to find the root cause. Using a mainline kernel and
rootfs with buildroot ulpi_read_id() succeeds. As soon as adding
ftrace / bootconfig to find out why, ulpi_read_id() fails and we can't
analyze the flow. Using another rootfs ulpi_read_id() fails even without
adding ftrace. We suspect the issue is some kind of timing / race, but
merely retrying ulpi_read_id() does not resolve the issue.
As we now changed ulpi_read_id() to return -ETIMEDOUT in this case, we
need to handle the error by calling dwc3_core_soft_reset() and request
-EPROBE_DEFER. On deferred probe ulpi_read_id() is retried and succeeds.
HDMI audio is not working on the HP EliteDesk 800 G6 because the pin is
unconnected. This issue can be resolved by using the 'hdajackretask'
tool to override the unconnected pin to force it to connect.
In commit 76d62f24db07 ("pstore: Switch pmsg_lock to an rt_mutex
to avoid priority inversion") I changed a lock to an rt_mutex.
However, its possible that CONFIG_RT_MUTEXES is not enabled,
which then results in a build failure, as the 0day bot detected:
https://lore.kernel.org/linux-mm/202212211244.TwzWZD3H-lkp@intel.com/
Thus this patch changes CONFIG_PSTORE_PMSG to select
CONFIG_RT_MUTEXES, which ensures the build will not fail.
Cc: Wei Wang <wvw@google.com> Cc: Midas Chien<midaschieh@google.com> Cc: Connor O'Brien <connoro@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: kernel test robot <lkp@intel.com> Cc: kernel-team@android.com Fixes: 76d62f24db07 ("pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221221051855.15761-1-jstultz@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>