[ 44.211730] ==================================================================
[ 44.212045] BUG: KASAN: slab-out-of-bounds in memcmp+0x8b/0xb0
[ 44.212045] Read of size 8 at addr ffff88800870f320 by task poc.xfrm/97
[ 44.212045]
[ 44.212045] CPU: 0 PID: 97 Comm: poc.xfrm Not tainted 6.4.0-rc7-00072-gdad9774deaf1-dirty #4
[ 44.212045] Call Trace:
[ 44.212045] <TASK>
[ 44.212045] dump_stack_lvl+0x37/0x50
[ 44.212045] print_report+0xcc/0x620
[ 44.212045] ? __virt_addr_valid+0xf3/0x170
[ 44.212045] ? memcmp+0x8b/0xb0
[ 44.212045] kasan_report+0xb2/0xe0
[ 44.212045] ? memcmp+0x8b/0xb0
[ 44.212045] kasan_check_range+0x39/0x1c0
[ 44.212045] memcmp+0x8b/0xb0
[ 44.212045] xfrm_state_walk+0x21c/0x420
[ 44.212045] ? __pfx_dump_one_state+0x10/0x10
[ 44.212045] xfrm_dump_sa+0x1e2/0x290
[ 44.212045] ? __pfx_xfrm_dump_sa+0x10/0x10
[ 44.212045] ? __kernel_text_address+0xd/0x40
[ 44.212045] ? kasan_unpoison+0x27/0x60
[ 44.212045] ? mutex_lock+0x60/0xe0
[ 44.212045] ? __pfx_mutex_lock+0x10/0x10
[ 44.212045] ? kasan_save_stack+0x22/0x50
[ 44.212045] netlink_dump+0x322/0x6c0
[ 44.212045] ? __pfx_netlink_dump+0x10/0x10
[ 44.212045] ? mutex_unlock+0x7f/0xd0
[ 44.212045] ? __pfx_mutex_unlock+0x10/0x10
[ 44.212045] __netlink_dump_start+0x353/0x430
[ 44.212045] xfrm_user_rcv_msg+0x3a4/0x410
[ 44.212045] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 44.212045] ? __pfx_xfrm_user_rcv_msg+0x10/0x10
[ 44.212045] ? __pfx_xfrm_dump_sa+0x10/0x10
[ 44.212045] ? __pfx_xfrm_dump_sa_done+0x10/0x10
[ 44.212045] ? __stack_depot_save+0x382/0x4e0
[ 44.212045] ? filter_irq_stacks+0x1c/0x70
[ 44.212045] ? kasan_save_stack+0x32/0x50
[ 44.212045] ? kasan_save_stack+0x22/0x50
[ 44.212045] ? kasan_set_track+0x25/0x30
[ 44.212045] ? __kasan_slab_alloc+0x59/0x70
[ 44.212045] ? kmem_cache_alloc_node+0xf7/0x260
[ 44.212045] ? kmalloc_reserve+0xab/0x120
[ 44.212045] ? __alloc_skb+0xcf/0x210
[ 44.212045] ? netlink_sendmsg+0x509/0x700
[ 44.212045] ? sock_sendmsg+0xde/0xe0
[ 44.212045] ? __sys_sendto+0x18d/0x230
[ 44.212045] ? __x64_sys_sendto+0x71/0x90
[ 44.212045] ? do_syscall_64+0x3f/0x90
[ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045] ? netlink_sendmsg+0x509/0x700
[ 44.212045] ? sock_sendmsg+0xde/0xe0
[ 44.212045] ? __sys_sendto+0x18d/0x230
[ 44.212045] ? __x64_sys_sendto+0x71/0x90
[ 44.212045] ? do_syscall_64+0x3f/0x90
[ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045] ? kasan_save_stack+0x22/0x50
[ 44.212045] ? kasan_set_track+0x25/0x30
[ 44.212045] ? kasan_save_free_info+0x2e/0x50
[ 44.212045] ? __kasan_slab_free+0x10a/0x190
[ 44.212045] ? kmem_cache_free+0x9c/0x340
[ 44.212045] ? netlink_recvmsg+0x23c/0x660
[ 44.212045] ? sock_recvmsg+0xeb/0xf0
[ 44.212045] ? __sys_recvfrom+0x13c/0x1f0
[ 44.212045] ? __x64_sys_recvfrom+0x71/0x90
[ 44.212045] ? do_syscall_64+0x3f/0x90
[ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045] ? copyout+0x3e/0x50
[ 44.212045] netlink_rcv_skb+0xd6/0x210
[ 44.212045] ? __pfx_xfrm_user_rcv_msg+0x10/0x10
[ 44.212045] ? __pfx_netlink_rcv_skb+0x10/0x10
[ 44.212045] ? __pfx_sock_has_perm+0x10/0x10
[ 44.212045] ? mutex_lock+0x8d/0xe0
[ 44.212045] ? __pfx_mutex_lock+0x10/0x10
[ 44.212045] xfrm_netlink_rcv+0x44/0x50
[ 44.212045] netlink_unicast+0x36f/0x4c0
[ 44.212045] ? __pfx_netlink_unicast+0x10/0x10
[ 44.212045] ? netlink_recvmsg+0x500/0x660
[ 44.212045] netlink_sendmsg+0x3b7/0x700
[ 44.212045] ? __pfx_netlink_sendmsg+0x10/0x10
[ 44.212045] ? __pfx_netlink_sendmsg+0x10/0x10
[ 44.212045] sock_sendmsg+0xde/0xe0
[ 44.212045] __sys_sendto+0x18d/0x230
[ 44.212045] ? __pfx___sys_sendto+0x10/0x10
[ 44.212045] ? rcu_core+0x44a/0xe10
[ 44.212045] ? __rseq_handle_notify_resume+0x45b/0x740
[ 44.212045] ? _raw_spin_lock_irq+0x81/0xe0
[ 44.212045] ? __pfx___rseq_handle_notify_resume+0x10/0x10
[ 44.212045] ? __pfx_restore_fpregs_from_fpstate+0x10/0x10
[ 44.212045] ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
[ 44.212045] ? __pfx_task_work_run+0x10/0x10
[ 44.212045] __x64_sys_sendto+0x71/0x90
[ 44.212045] do_syscall_64+0x3f/0x90
[ 44.212045] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045] RIP: 0033:0x44b7da
[ 44.212045] RSP: 002b:00007ffdc8838548 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 44.212045] RAX: ffffffffffffffda RBX: 00007ffdc8839978 RCX: 000000000044b7da
[ 44.212045] RDX: 0000000000000038 RSI: 00007ffdc8838770 RDI: 0000000000000003
[ 44.212045] RBP: 00007ffdc88385b0 R08: 00007ffdc883858c R09: 000000000000000c
[ 44.212045] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 44.212045] R13: 00007ffdc8839968 R14: 00000000004c37d0 R15: 0000000000000001
[ 44.212045] </TASK>
[ 44.212045]
[ 44.212045] Allocated by task 97:
[ 44.212045] kasan_save_stack+0x22/0x50
[ 44.212045] kasan_set_track+0x25/0x30
[ 44.212045] __kasan_kmalloc+0x7f/0x90
[ 44.212045] __kmalloc_node_track_caller+0x5b/0x140
[ 44.212045] kmemdup+0x21/0x50
[ 44.212045] xfrm_dump_sa+0x17d/0x290
[ 44.212045] netlink_dump+0x322/0x6c0
[ 44.212045] __netlink_dump_start+0x353/0x430
[ 44.212045] xfrm_user_rcv_msg+0x3a4/0x410
[ 44.212045] netlink_rcv_skb+0xd6/0x210
[ 44.212045] xfrm_netlink_rcv+0x44/0x50
[ 44.212045] netlink_unicast+0x36f/0x4c0
[ 44.212045] netlink_sendmsg+0x3b7/0x700
[ 44.212045] sock_sendmsg+0xde/0xe0
[ 44.212045] __sys_sendto+0x18d/0x230
[ 44.212045] __x64_sys_sendto+0x71/0x90
[ 44.212045] do_syscall_64+0x3f/0x90
[ 44.212045] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045]
[ 44.212045] The buggy address belongs to the object at ffff88800870f300
[ 44.212045] which belongs to the cache kmalloc-64 of size 64
[ 44.212045] The buggy address is located 32 bytes inside of
[ 44.212045] allocated 36-byte region [ffff88800870f300, ffff88800870f324)
[ 44.212045]
[ 44.212045] The buggy address belongs to the physical page:
[ 44.212045] page:00000000e4de16ee refcount:1 mapcount:0 mapping:000000000 ...
[ 44.212045] flags: 0x100000000000200(slab|node=0|zone=1)
[ 44.212045] page_type: 0xffffffff()
[ 44.212045] raw: 0100000000000200ffff888004c41640dead0000000001220000000000000000
[ 44.212045] raw: 0000000000000000000000008020002000000001ffffffff0000000000000000
[ 44.212045] page dumped because: kasan: bad access detected
[ 44.212045]
[ 44.212045] Memory state around the buggy address:
[ 44.212045] ffff88800870f200: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 44.212045] ffff88800870f280: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
[ 44.212045] >ffff88800870f300: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
[ 44.212045] ^
[ 44.212045] ffff88800870f380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 44.212045] ffff88800870f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 44.212045] ==================================================================
By investigating the code, we find the root cause of this OOB is the lack
of checks in xfrm_dump_sa(). The buggy code allows a malicious user to pass
arbitrary value of filter->splen/dplen. Hence, with crafted xfrm states,
the attacker can achieve 8 bytes heap OOB read, which causes info leak.
if (attrs[XFRMA_ADDRESS_FILTER]) {
filter = kmemdup(nla_data(attrs[XFRMA_ADDRESS_FILTER]),
sizeof(*filter), GFP_KERNEL);
if (filter == NULL)
return -ENOMEM;
// NO MORE CHECKS HERE !!!
}
This patch fixes the OOB by adding necessary boundary checks, just like
the code in pfkey_dump() function.
Fixes: d3623099d350 ("ipsec: add support of limited SA dump") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the I2C_FUNC_SMBUS_BLOCK_DATA case, the invalid length byte value
(outside of 1-32) of the SMBus block data response from the Slave device
is not correctly handled by the I2C Designware driver.
In case IC_EMPTYFIFO_HOLD_MASTER_EN==1, which cannot be detected
from the registers, the Master can be disabled only if the STOP bit
is set. Without STOP bit set, the Master remains active, holding the bus
until receiving a block data response length. This hangs the bus and
is unrecoverable.
Avoid this by issuing another dump read to reach the stop condition when
an invalid length byte is received.
Cc: stable@vger.kernel.org Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20230726080001.337353-3-tamnguyenchi@os.amperecomputing.com Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0daede80f870 ("i2c: designware: Convert driver to using regmap API")
changes the logic to validate the whole 32-bit return value of
DW_IC_DATA_CMD register instead of 8-bit LSB without reason.
Later, commit f53f15ba5a85 ("i2c: designware: Get right data length"),
introduced partial fix but not enough because the "tmp > 0" still test
tmp as 32-bit value and is wrong in case the IC_DATA_CMD[11] is set.
Revert the logic to just before commit 0daede80f870
("i2c: designware: Convert driver to using regmap API").
Fixes: f53f15ba5a85 ("i2c: designware: Get right data length") Fixes: 0daede80f870 ("i2c: designware: Convert driver to using regmap API") Cc: stable@vger.kernel.org Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20230726080001.337353-2-tamnguyenchi@os.amperecomputing.com Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pausing and canceling balance can race to interrupt balance lead to BUG_ON
panic in btrfs_cancel_balance. The BUG_ON condition in btrfs_cancel_balance
does not take this race scenario into account.
However, the race condition has no other side effects. We can fix that.
In production we were seeing a variety of WARN_ON()'s in the extent_map
code, specifically in btrfs_drop_extent_map_range() when we have to call
add_extent_mapping() for our second split.
Consider the following extent map layout
PINNED
[0 16K) [32K, 48K)
and then we call btrfs_drop_extent_map_range for [0, 36K), with
skip_pinned == true. The initial loop will have
start = 0
end = 36K
len = 36K
we will find the [0, 16k) extent, but since we are pinned we will skip
it, which has this code
start = em_end;
if (end != (u64)-1)
len = start + len - em_end;
em_end here is 16K, so now the values are
start = 16K
len = 16K + 36K - 16K = 36K
len should instead be 20K. This is a problem when we find the next
extent at [32K, 48K), we need to split this extent to leave [36K, 48k),
however the code for the split looks like this
em_end = 48K
split->start = 16K + 36K // this should be 16K + 20K
split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K
and now we have an invalid extent_map in the tree that potentially
overlaps other entries in the extent map. Even in the non-overlapping
case we will have split->start set improperly, which will cause problems
with any block related calculations.
We don't actually need len in this loop, we can simply use end as our
end point, and only adjust start up when we find a pinned extent we need
to skip.
Adjust the logic to do this, which keeps us from inserting an invalid
extent map.
We only skip_pinned in the relocation case, so this is relatively rare,
except in the case where you are running relocation a lot, which can
happen with auto relocation on.
Fixes: 55ef68990029 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not read the data register to clear the error flags for lpuart32
platforms, the additional read may cause the receive FIFO underflow
since the DMA has already read the data register.
Actually all lpuart32 platforms support write 1 to clear those error
bits, let's use this method to better clear the error flags.
Fixes: 42b68768e51b ("serial: fsl_lpuart: DMA support for 32-bit variant") Cc: stable <stable@kernel.org> Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Link: https://lore.kernel.org/r/20230801022304.24251-1-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 9b9c8195f3f0 ("tty: n_gsm: fix UAF in gsm_cleanup_mux"), the UAF
problem is not completely fixed. There is a race condition in
gsm_cleanup_mux(), which caused this UAF.
The UAF problem is triggered by the following race:
task[5046] task[5054]
----------------------- -----------------------
gsm_cleanup_mux();
dlci = gsm->dlci[0];
mutex_lock(&gsm->mutex);
gsm_cleanup_mux();
dlci = gsm->dlci[0]; //Didn't take the lock
gsm_dlci_release(gsm->dlci[i]);
gsm->dlci[i] = NULL;
mutex_unlock(&gsm->mutex);
mutex_lock(&gsm->mutex);
dlci->dead = true; //UAF
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa max vqp attr to avoid
such bugs.
Fixes: ad69dd0bf26b ("vdpa: Introduce query of device config layout") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa queue index attr to avoid
such bugs.
Fixes: 13b00b135665 ("vdpa: Add support for querying vendor statistics") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernelorg
Message-Id: <20230727175757.73988-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa features attr to avoid
such bugs.
Fixes: 90fea5a800c3 ("vdpa: device feature provisioning") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the
/proc/powerpc/rtas/firmware_update interface to prepare a system
firmware update yields a BUG():
The blocks of the firmware image are copied directly from user memory
to objects allocated from flash_block_cache, so flash_block_cache must
be created using kmem_cache_create_usercopy() to mark it safe for user
access.
Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org>
[mpe: Trim and indent oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tegra processors prior to Tegra186 used APB DMA for I2C requiring
CONFIG_TEGRA20_APB_DMA=y while Tegra186 and later use GPC DMA requiring
CONFIG_TEGRA186_GPC_DMA=y.
The check for if the processor uses APB DMA is inverted and so the wrong
DMA config options are checked.
This means if CONFIG_TEGRA20_APB_DMA=y but CONFIG_TEGRA186_GPC_DMA=n
with a Tegra186 or later processor the driver will incorrectly think DMA is
enabled and attempt to request DMA channels that will never be availible,
leaving the driver in a perpetual EPROBE_DEFER state.
The controller may be shared with other port, for example the firmware.
Handle the interrupt from other sources will cause crash since some
data are not initialized. So only handle the interrupt of the driver's
transfer and discard others.
Fixes: d62fbdb99a85 ("i2c: add support for HiSilicon I2C controller") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20230801124625.63587-1-yangyicong@huawei.com Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
iproc_i2c_rd_reg() and iproc_i2c_wr_reg() are called from both
interrupt context (e.g. bcm_iproc_i2c_isr) and process context
(e.g. bcm_iproc_i2c_suspend). Therefore, interrupts should be
disabled to avoid potential deadlock. To prevent this scenario,
use spin_lock_irqsave().
Fixes: 9a1038728037 ("i2c: iproc: add NIC I2C support") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Acked-by: Ray Jui <ray.jui@broadcom.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With deferred close we can have closes that race with lease breaks,
and so with the current checks for whether to send the lease response,
oplock_response(), this can mean that an unmount (kill_sb) can occur
just before we were checking if the tcon->ses is valid. See below:
To fix this change the ordering of the checks before sending the oplock_response
to first check if the openFileList is empty.
Fixes: da787d5b7498 ("SMB3: Do not send lease break acknowledgment if all file handles have been closed") Suggested-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after
the control virtqueue ASID iotlb has been populated. The control vq
iotlb must not be cleared, since it will not be populated again.
So call the ASID aware destroy function which makes sure that the
right vq resource is destroyed.
Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The mr->initialized flag is shared between the control vq and data vq
part of the mr init/uninit. But if the control vq and data vq get placed
in different ASIDs, it can happen that initializing the control vq will
prevent the data vq mr from being initialized.
This patch consolidates the control and data vq init parts into their
own init functions. The mr->initialized will now be used for the data vq
only. The control vq currently doesn't need a flag.
The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
split into data and control vq functions which are now also ASID aware.
Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The IRQ injection work used spin_lock_irq() to protect the
scheduling of the softirq, but spin_lock_bh() should be
used.
With spin_lock_irq(), we noticed delay of more than 6
seconds between the time a NAPI polling work is scheduled
and the time it is executed.
Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: xieyongji@bytedance.com Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20230705114505.63274-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
vm_dev has a separate lifecycle because it has a 'struct device'
embedded. Thus, having a release callback for it is correct.
Allocating the vm_dev struct with devres totally breaks this protection,
though. Instead of waiting for the vm_dev release callback, the memory
is freed when the platform_device is removed. Resulting in a
use-after-free when finally the callback is to be called.
To easily see the problem, compile the kernel with
CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs.
The fix is easy, don't use devres in this case.
Found during my research about object lifetime problems.
Fixes: 7eb781b1bbb7 ("virtio_mmio: add cleanup for virtio_mmio_probe") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a task creates a new block group and that block group becomes unused
before we finish its creation, at btrfs_create_pending_block_groups(),
then when btrfs_mark_bg_unused() is called against the block group, we
assume that the block group is currently in the list of block groups to
reclaim, and we move it out of the list of new block groups and into the
list of unused block groups. This has two consequences:
1) We move it out of the list of new block groups associated to the
current transaction. So the block group creation is not finished and
if we attempt to delete the bg because it's unused, we will not find
the block group item in the extent tree (or the new block group tree),
its device extent items in the device tree etc, resulting in the
deletion to fail due to the missing items;
2) We don't increment the reference count on the block group when we
move it to the list of unused block groups, because we assumed the
block group was on the list of block groups to reclaim, and in that
case it already has the correct reference count. However the block
group was on the list of new block groups, in which case no extra
reference was taken because it's local to the current task. This
later results in doing an extra reference count decrement when
removing the block group from the unused list, eventually leading the
reference count to 0.
This second case was caught when running generic/297 from fstests, which
produced the following assertion failure and stack trace:
Fix this by adding a runtime flag to the block group to tell that the
block group is still in the list of new block groups, and therefore it
should not be moved to the list of unused block groups, at
btrfs_mark_bg_unused(), until the flag is cleared, when we finish the
creation of the block group at btrfs_create_pending_block_groups().
Fixes: a9f189716cf1 ("btrfs: move out now unused BG from the reclaim list") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In zoned mode the sequential status of zone can be also tracked in the
runtime flags of block group.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused") Signed-off-by: Sasha Levin <sashal@kernel.org>
We already have flags in block group to track various status bits,
convert needs_free_space as well and reduce size of btrfs_block_group.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused") Signed-off-by: Sasha Levin <sashal@kernel.org>
An unused block group is easy to remove to free up space and should be
reclaimed fast. Such block group can often already be a target of the
reclaim process. As we check list_empty(&bg->bg_list), we keep it in the
reclaim list. That block group is never reclaimed until the file system
is filled e.g. up to 75%.
Instead, we can move unused block group to the unused list and delete it
fast.
Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.
This fixes a regression introduced by commit ee7a69aa38d8 ("fbdev:
Disable sysfb device registration when removing conflicting FBs"),
where we remove the sysfb when loading a driver for an unrelated pci
device, resulting in the user losing their efifb console or similar.
Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.
Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
drivers do not use drm_aperture_remove_conflicting_framebuffers(),
so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
this.
For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.
v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
have any issues with making all this fully pci specific.
Generic fbdev drivers use the apertures field in struct fb_info to
control ownership of the framebuffer memory and graphics device. Do
not set the values in hyperv-fb.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221219160516.23436-9-tzimmermann@suse.de
Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq. This occurs because when the vcpu thread is
resumed on its way to block, vcpu_load() will make the vPE resident
again. Once the vcpu actually blocks, we don't request a doorbell
anymore, and the vcpu won't be woken up on interrupt delivery.
Fix it by tracking that we're entering WFI, and key the doorbell
request on that flag. This allows us not to make the vPE resident
when going through a preempt/schedule cycle, meaning we don't lose
any state.
Cc: stable@vger.kernel.org Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Suggested-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why] hdcp are enabled for asics from raven. for old asics
which hdcp are not enabled, hdcp_workqueue are null. some
access to hdcp work queue are not guarded with pointer check.
[How] add hdcp_workqueue pointer check before access workqueue.
[Why]
multiple display hdcp are enabled within event_property_validate,
event_property_update by looping all displays on mst hub. when
one of display on mst hub in unplugged or disabled, hdcp are
disabled for all displays on mst hub within hdcp_reset_display
by looping all displays of mst link. for displays still active,
their encryption status are off. kernel driver will not run hdcp
authentication again. therefore, hdcp are not enabled automatically.
[How]
within is_content_protection_different, check drm_crtc_state changes
of all displays on mst hub, if need, triger hdcp_update_display to
re-run hdcp authentication.
[Why]
connector hdcp properties are lost after display is
unplgged from mst hub. connector is destroyed with
dm_dp_mst_connector_destroy. when display is plugged
back, hdcp is not desired and it wouldnt be enabled.
[How]
save hdcp properties into hdcp_work within
amdgpu_dm_atomic_commit_tail. If the same display is
plugged back with same display index, its hdcp
properties will be retrieved from hdcp_work within
dm_dp_mst_get_modes.
igc_configure_rx_ring() function will be called as part of XDP program
setup. If Rx hardware timestamp is enabled prio to XDP program setup,
this timestamp enablement will be overwritten when buffer size is
written into SRRCTL register.
Thus, this commit read the register value before write to SRRCTL
register. This commit is tested by using xdp_hw_metadata bpf selftest
tool. The tool enables Rx hardware timestamp and then attach XDP program
to igc driver. It will display hardware timestamp of UDP packet with
port number 9092. Below are detail of test steps and results.
Command on DUT:
sudo ./xdp_hw_metadata <interface name>
Command on Link Partner:
echo -n skb | nc -u -q1 <destination IPv4 addr> 9092
Result before this patch:
skb hwtstamp is not found!
Optionally, read PHC to confirm the values obtained are almost the same:
Command:
sudo ./testptp -d /dev/ptp0 -g
Result:
clock time: 1677800973.913598978 or Fri Mar 3 07:49:33 2023
Fixes: fc9df2a0b520 ("igc: Enable RX via AF_XDP zero-copy") Cc: <stable@vger.kernel.org> # 5.14+ Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When ring_buffer_swap_cpu was called during resize process,
the cpu buffer was swapped in the middle, resulting in incorrect state.
Continuing to run in the wrong state will result in oops.
This issue can be easily reproduced using the following two scripts:
/tmp # cat test1.sh
//#! /bin/sh
for i in `seq 0 100000`
do
echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
sleep 0.5
echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
sleep 0.5
done
/tmp # cat test2.sh
//#! /bin/sh
for i in `seq 0 100000`
do
echo irqsoff > /sys/kernel/debug/tracing/current_tracer
sleep 1
echo nop > /sys/kernel/debug/tracing/current_tracer
sleep 1
done
/tmp # ./test1.sh &
/tmp # ./test2.sh &
A typical oops log is as follows, sometimes with other different oops logs.
After analysis, the seq of the error is as follows [1-5]:
int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
int cpu_id)
{
for_each_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
//1. get cpu_buffer, aka cpu_buffer(A)
...
...
schedule_work_on(cpu,
&cpu_buffer->update_pages_work);
//2. 'update_pages_work' is queue on 'cpu', cpu_buffer(A) is passed to
// update_pages_handler, do the update process, set 'update_done' in
// complete(&cpu_buffer->update_done) and to wakeup resize process.
//---->
//3. Just at this moment, ring_buffer_swap_cpu is triggered,
//cpu_buffer(A) be swaped to cpu_buffer(B), the max_buffer.
//ring_buffer_swap_cpu is called as the 'Call trace' below.
/* wait for all the updates to complete */
for_each_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
//4. get cpu_buffer, cpu_buffer(B) is used in the following process,
//the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong.
//for example, cpu_buffer(A)->update_done will leave be set 1, and will
//not 'wait_for_completion' at the next resize round.
if (!cpu_buffer->nr_pages_to_update)
continue;
if (cpu_online(cpu))
wait_for_completion(&cpu_buffer->update_done);
cpu_buffer->nr_pages_to_update = 0;
}
...
}
//5. the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong,
//Continuing to run in the wrong state, then oops occurs.
In function ‘fortify_memcpy_chk’,
inlined from ‘get_conn_info_complete’ at net/bluetooth/mgmt.c:7281:2:
include/linux/fortify-string.h:592:25: error: call to
‘__read_overflow2_field’ declared with attribute warning: detected read
beyond size of field (2nd parameter); maybe use struct_group()?
[-Werror=attribute-warning]
592 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
This is due to the wrong member is used for memcpy(). Use correct one.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As per the generic KASAN code in mm/kasan, disable KCOV with
KCOV_INSTRUMENT := n in the makefile.
This fixes a ppc64 boot hang when KCOV and KASAN are enabled.
kasan_early_init() gets called before a PACA is initialised, but the
KCOV hook expects a valid PACA.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230710044143.146840-1-bgray@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The variable codec->regmap is often protected by the lock
codec->regmap_lock when is accessed. However, it is accessed without
holding the lock when is accessed in snd_hdac_regmap_sync():
if (codec->regmap)
In my opinion, this may be a harmful race, because if codec->regmap is
set to NULL right after the condition is checked, a null-pointer
dereference can occur in the called function regcache_sync():
map->lock(map->lock_arg); --> Line 360 in drivers/base/regmap/regcache.c
To fix this possible null-pointer dereference caused by data race, the
mutex_lock coverage is extended to protect the if statement as well as the
function call to regcache_sync().
[ Note: the lack of the regmap_lock itself is harmless for the current
codec driver implementations, as snd_hdac_regmap_sync() is only for
PM runtime resume that is prohibited during the codec probe.
But the change makes the whole code more consistent, so it's merged
as is -- tiwai ]
These models use NSIWAY amplifiers for internal speaker, but cannot put
sound outside from these amplifiers. So eapd verbs are needed to initialize
the amplifiers. They can be added during boot to get working sound out
of internal speaker.
In a previous commit 2681631c2973 ("fs/ntfs3: Add null pointer check to
attr_load_runs_vcn"), ni can be NULL in attr_load_runs_vcn(), and thus it
should be checked before being used.
However, in the call stack of this commit, mft_ni in mi_read() is
aliased with ni in attr_load_runs_vcn(), and it is also used in
mi_read() at two places:
mi_read()
rw_lock = &mft_ni->file.run_lock -> No check
attr_load_runs_vcn(mft_ni, ...)
ni (namely mft_ni) is checked in the previous commit
attr_load_runs_vcn(..., &mft_ni->file.run) -> No check
Thus, to avoid possible null-pointer dereferences, the related checks
should be added.
These bugs are reported by a static analysis tool implemented by myself,
and they are found by extending a known bug fixed in the previous commit.
Thus, they could be theoretical bugs.
Signed-off-by: Jia-Ju Bai <baijiaju@buaa.edu.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
ni_create_attr_list uses WARN_ON to catch error cases while generating
attribute list, which only prints out stack trace and may not be enough.
This repalces them with more proper error handling flow.
Signed-off-by: Edward Lo <loyuantsung@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
fence Decrements the reference count before exiting.
Avoid Race Vulnerabilities for fence use-after-free.
v2 (chk): actually fix the use after free and not just move it.
Signed-off-by: shanzhulig <shanzhulig@gmail.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Adding the device ID from the Asus Ally gets the bluetooth working
on the device.
Signed-off-by: Matthew Anderson <ruinairas1992@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Intel Barlow Ridge discrete USB4 host router has the same limitation as
the previous generations so make sure the USB3 bandwidth limitation
quirk is applied to Barlow Ridge too.
Signed-off-by: Gil Fine <gil.fine@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Intel Barlow Ridge is the first USB4 v2 controller from Intel. The
controller exposes standard USB4 PCI class ID in typical configurations,
however there is a way to configure it so that it uses a special class
ID to allow using s different driver than the Windows inbox one. For
this reason add the Barlow Ridge PCI ID to the Linux driver too so that
the driver can attach regardless of the class ID.
When nonstatic_release_resource_db() frees all resources associated
with an PCMCIA socket, it forgets to free socket_data too, causing
a memory leak observable with kmemleak:
To fix these possible data races, the lock sdp->sd_tune.gt_spin is
acquired before accessing the fields of gfs2_tune and released after these
accesses.
Further changes by Andreas:
- Don't hold the spin lock over the seq_printf operations.
Reported-by: BassCheck <bass@buaa.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
USB PHY DPDM wakeup bit is enabled by default, when USB wakeup
is not required(/sys/.../wakeup is disabled), this bit should be
disabled, otherwise we will have unexpected wakeup if do USB device
connect/disconnect while system sleep.
This bit can be enabled for both host and device mode.
Signed-off-by: Li Jun <jun.li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Peter Chen <peter.chen@kernel.org>
Message-ID: <20230517081907.3410465-3-xu.yang_2@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
ISOC transfers expect a certain cadence of requests being queued. Not
keeping up with the expected rate of requests results in missed ISOC
transfers (EXDEV). The application layer may or may not produce video
frames to match this expectation, so uvc gadget driver must handle cases
where the application is not queuing up buffers fast enough to fulfill
ISOC requirements.
Currently, uvc gadget driver waits for new video buffer to become available
before queuing up usb requests. With this patch the gadget driver queues up
0 length usb requests whenever there are no video buffers available. The
USB controller's complete callback is used as the limiter for how quickly
the 0 length packets will be queued. Video buffers are still queued as
soon as they become available.
When serial console over USB is enabled, gs_console_connect
queues gs_console_work, where it acquires the spinlock and
queues the usb request, and this request goes to gadget layer.
Now consider a situation where gadget layer prints something
to dmesg, this will eventually call gs_console_write() which
requires cons->lock. And this causes spinlock recursion. Avoid
this by excluding usb_ep_queue from the spinlock.
From the experiments with camera sensors using SGRBG10_1X10/3280x2464 and
SRGGB10_1X10/3280x2464 formats, it becomes clear that on sdm845 and sm8250
VFE outputs the lines padded to a length multiple of 16 bytes. As in the
current driver the value of the bpl_alignment is set to 8 bytes, the frames
captured in formats with the bytes-per-line value being not a multiple of
16 get corrupted.
Set the bpl_alignment of the camss video output device to 16 for sdm845 and
sm8250 to fix that.
Getting below error when using KCSAN to check the driver. Adding lock to
protect parameter num_rdy when getting the value with function:
v4l2_m2m_num_src_bufs_ready/v4l2_m2m_num_dst_bufs_ready.
kworker/u16:3: [name:report&]BUG: KCSAN: data-race in v4l2_m2m_buf_queue
kworker/u16:3: [name:report&]
kworker/u16:3: [name:report&]read-write to 0xffffff8105f35b94 of 1 bytes by task 20865 on cpu 7:
kworker/u16:3: v4l2_m2m_buf_queue+0xd8/0x10c
Returning early from stm32_usart_serial_remove() results in a resource
leak as several cleanup functions are not called. The driver core ignores
the return value and there is no possibility to clean up later.
uart_remove_one_port() only returns non-zero if there is some
inconsistency (i.e. stm32_usart_driver.state[port->line].uart_port == NULL).
This should never happen, and even if it does it's a bad idea to exit
early in the remove callback without cleaning up.
This prepares changing the prototype of struct platform_driver::remove to
return void. See commit 5c5a7680e67b ("platform: Provide a remove callback
that returns no value") for further details about this quest.
Don't collect exiting session in smb2_reconnect_server(), because it
will be released soon.
Note that the exiting session will stay in server->smb_ses_list until
it complete the cifs_free_ipc() and logoff() and then delete itself
from the list.
Signed-off-by: Winston Wen <wentao@uniontech.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The shutdown is called on reboot/shutdown of the machine.
At this point the firmware tracing cannot be used anymore but in case of
IPC3 it is using and keeping a DMA channel active (dtrace).
For Tiger Lake platforms we have a quirk in place to fix rare reboot issues
when a DMA was active before rebooting the system.
If the tracing is enabled this quirk will be always used and a print
appears on the kernel log which might be misleading or not even correct.
Release the fw tracing before executing the shutdown to make sure that this
known DMA user is cleared away.
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20230616100039.378150-4-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why and How]
Add back debug bits enabling RCO for dcn314 as underflow
associated with this change has been resolved
Acked-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Daniel Miess <daniel.miess@amd.com> Reviewed-by: Jun Lei <jun.lei@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
Hardware implements root clock gating by utilizing the DPP DTO registers
with a special case of DTO enabled, phase = 0, modulo = 1. This
conflicts with our policy to always update the DPPDTO for cases where
it's expected to be disabled.
The pipes unexpectedly enter a higher power state than expected because
of this programming flow.
[How]
Guard the upper layers of HWSS against this hardware quirk with
programming the register with an internal state flag in DCCG.
While technically acting as global state for the DCCG, HWSS shouldn't be
expected to understand the hardware quirk for having DTO disabled
causing more power than DTO enabled with this specific setting.
This also prevents sequencing errors from occuring in the future if
we have to program DPP DTO in multiple locations.
Acked-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Jun Lei <jun.lei@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Previously when destroying a QP/RQ, the result of the firmware
destruction function was ignored and upper layers weren't informed
about the failure.
Which in turn could lead to various problems since when upper layer
isn't aware of the failure it continues its operation thinking that the
related QP/RQ was successfully destroyed while it actually wasn't,
which could lead to the below kernel WARN.
Currently, we return the correct firmware destruction status to upper
layers which in case of the RQ would be mlx5_ib_destroy_wq() which
was already capable of handling RQ destruction failure or in case of
a QP to destroy_qp_common(), which now would actually warn upon qp
destruction failure.
[Description]
- Previously we wanted to apply extra 60us of prefetch for min DCFCLK
(200Mhz), but DCFCLK can be calculated to be 201Mhz which underflows
also without the extra prefetch
- Instead, apply the the extra 60us prefetch for any DCFCLK freq <=
300Mhz
Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Reviewed-by: Jun Lei <jun.lei@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alvin Lee <alvin.lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When using cpu to update page tables, vm update fences are unused.
Install stub fence into these fence pointers instead of NULL
to avoid NULL dereference when calling dma_fence_wait() on them.
Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
An Interrupt Remapping Table (IRT) stores interrupt remapping configuration
for each device. In a normal operation, the AMD IOMMU caches the table
to optimize subsequent data accesses. This requires the IOMMU driver to
invalidate IRT whenever it updates the table. The invalidation process
includes issuing an INVALIDATE_INTERRUPT_TABLE command following by
a COMPLETION_WAIT command.
However, there are cases in which the IRT is updated at a high rate.
For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every
vcpu scheduling (i.e. amd_iommu_update_ga()). On system with large
amount of vcpus and VFIO PCI pass-through devices, the invalidation
process could potentially become a performance bottleneck.
Introducing a new kernel boot option:
amd_iommu=irtcachedis
which disables IRTE caching by setting the IRTCachedis bit in each IOMMU
Control register, and bypass the IRT invalidation process.
Adds the USB and Bluetooth IDs for the Logitech G915 TKL keyboard, for device detection
For this device, this provides battery reporting on top of hid-generic
Currently upon a heartbeat failure, we don't know if the failure
is due to firmware hang or due to a bad PCI link. Hence, we
are reading a PCI config space register with a known value (vendor ID)
so we will know which of the two possibilities caused the heartbeat
failure.
If dma_direct_alloc() alloc memory in size of 64MB, the inner function
dma_common_contiguous_remap() will allocate 128KB memory by invoking
the function kmalloc_array(). and the kmalloc_array seems to fail to try to
allocate 128KB mem.
The functionality described in Commit 61bef9e68dca ("ASoC: SOF: Intel: hda: enforce exclusion between HDaudio and SoundWire")
does not seem to be properly implemented with two issues that need to
be corrected.
a) The test used is incorrect when DisplayAudio codecs are not supported.
b) Conversely when only Display Audio codecs can be found, we do want
to start the SoundWire links, if any. That will help add the relevant
topologies and machine descriptors, and identify cases where the
SoundWire information in ACPI needs to be modified with a quirk.
It is considered good practice to call cpu_relax() in busy loops, see
Documentation/process/volatile-considered-harmful.rst. This can not
only lower CPU power consumption or yield to a hyperthreaded twin
processor, but also allows an architecture to mitigate hardware issues
(e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the
architecture-specific cpu_relax() implementation.
In addition, cpu_relax() is also a compiler barrier. It is not
immediately obvious that the @op argument "function" will result in an
actual function call (e.g. in case of inlining).
Where a function call is a C sequence point, this is lost on inlining.
Therefore, with agressive enough optimization it might be possible for
the compiler to hoist the:
(val) = op(args);
"load" out of the loop because it doesn't see the value changing. The
addition of cpu_relax() would inhibit this.
As the iopoll helpers lack calls to cpu_relax(), people are sometimes
reluctant to use them, and may fall back to open-coded polling loops
(including cpu_relax() calls) instead.
Fix this by adding calls to cpu_relax() to the iopoll helpers:
- For the non-atomic case, it is sufficient to call cpu_relax() in
case of a zero sleep-between-reads value, as a call to
usleep_range() is a safe barrier otherwise. However, it doesn't
hurt to add the call regardless, for simplicity, and for similarity
with the atomic case below.
- For the atomic case, cpu_relax() must be called regardless of the
sleep-between-reads value, as there is no guarantee all
architecture-specific implementations of udelay() handle this.
if (!SOF_RT711_JDSRC(sof_sdw_quirk)) is tested in rt711_sdca_add_codec_
device_props(), and we don't add software node to the device if jack
source is not set. We need to do the same test in
sof_sdw_rt711_sdca_exit(), and avoid removing software node if jack
source is not set.
Fix USB-related warnings in prtrvt, prtvt7, prti6q and prtwd2 device trees
by disabling unused usbphynop1 and usbphynop2 USB PHYs and providing proper
configuration for the over-current detection. This fixes the following
warnings with the current kernel:
usb_phy_generic usbphynop1: dummy supplies not allowed for exclusive requests
usb_phy_generic usbphynop2: dummy supplies not allowed for exclusive requests
imx_usb 2184200.usb: No over current polarity defined
By the way, fix over-current detection on usbotg port for prtvt7, prti6q
and prtwd2 boards. Only prtrvt do not have OC on USB OTG port.
We have SOF and generic ACP support enabled for Vangogh platform
on some machines. Since we have same PCI id used for probing,
add check for machine configuration flag to avoid conflict with
newer pci drivers. Such machine flag has been initialized via
dmi match on few Vangogh based machines. If no flag is
specified probe and register older platform device.
R-Car H3 ES1.* was only available to an internal development group and
needed a lot of quirks and workarounds. These become a maintenance
burden now, so our development group decided to remove upstream support
for this SoC and prevent booting it. Public users only have ES2 onwards.
Add check to fix the possible array out of bounds violation by
making speed equal to GEN1_CORE_CLK_FREQ when its value is more
than the size of "pcie_gen_freq" array. This array has size of
four but possible speed (CLS) values are from "0 to 0xF". So,
"speed - 1" values are "-1 to 0xE".
The type of size is unsigned int, if size is 0x40000000, there will
be an integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: hackyzh002 <hackyzh002@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
For smu v13_0_2, if the GPU supports xgmi, refer to
commit f5c7e7797060 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2"),
it will run gpu recover in AMDGPU_RESET_FOR_DEVICE_REMOVE mode when removing,
which makes all devices in hive list have hw reset but no resume except the
basic ip blocks, then other ip blocks will not call .hw_fini according to
ip_block.status.hw.
Since psp_free_shared_bufs just includes some software operations, so move
it to psp_sw_fini.
Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Longlong Yao <Longlong.Yao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In commit 7beecaf7d507 ("net: phy: at803x: improve the WOL feature"), it
seems not correct to use a wol_en bit in a 1588 Control Register which is
only available on AR8031/AR8033(share the same phy_id) to determine if WoL
is enabled. Change it back to use AT803X_INTR_ENABLE_WOL for determining
the WoL status which is applicable on all chips supporting wol. Also update
the at803x_set_wol() function to only update the 1588 register on chips
having it. After this change, disabling wol at probe from commit d7cd5e06c9dd ("net: phy: at803x: disable WOL at probe") is no longer
needed. Change it to just disable the WoL bit in 1588 register for
AR8031/AR8033 to be aligned with AT803X_INTR_ENABLE_WOL in probe.
Fixes: 7beecaf7d507 ("net: phy: at803x: improve the WOL feature") Signed-off-by: Li Yang <leoyang.li@nxp.com> Reviewed-by: Viorel Suman <viorel.suman@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use devm_regulator_get_enable_optional() instead of hand writing it. It
saves some line of code.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: e58f30246c35 ("net: phy: at803x: fix the wol setting functions") Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock
and make them tunable") introduced the net.smc.rmem and net.smc.wmem
sysctls to specify the size of buffers to be used for SMC type
connections. This created a regression for users that specified the
buffer size via setsockopt() as the effective buffer size was now
doubled.
Re-introduce the division by 2 in the SMC buffer create code and level
this out by duplicating the net.smc.[rw]mem values used for initializing
sk_rcvbuf/sk_sndbuf at socket creation time. This gives users of both
methods (setsockopt or sysctl) the effective buffer size that they
expect.
Initialize net.smc.[rw]mem from its own constant of 64kB, respectively.
Internal performance tests show that this value is a good compromise
between throughput/latency and memory consumption. Also, this decouples
it from any tuning that was done to net.ipv4.tcp_[rw]mem[1] before the
module for SMC protocol was loaded. Check that no more than INT_MAX / 2
is assigned to net.smc.[rw]mem, in order to avoid any overflow condition
when that is doubled for use in sk_sndbuf or sk_rcvbuf.
While at it, drop the confusing sk_buf_size variable from
__smc_buf_create and name "compressed" buffer size variables more
consistently.
Background:
Before the commit mentioned above, SMC's buffer allocator in
__smc_buf_create() always used half of the sockets' sk_rcvbuf/sk_sndbuf
value as initial value to search for appropriate buffers. If the search
resorted to using a bigger buffer when all buffers of the specified
size were busy, the duplicate of the used effective buffer size is
stored back to sk_rcvbuf/sk_sndbuf.
When available, buffers of exactly the size that a user had specified as
input to setsockopt() were used, despite setsockopt()'s documentation in
"man 7 socket" talking of a mandatory duplication:
[...]
SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
The kernel doubles this value (to allow space for book‐
keeping overhead) when it is set using setsockopt(2),
and this doubled value is returned by getsockopt(2).
The default value is set by the
/proc/sys/net/core/wmem_default file and the maximum
allowed value is set by the /proc/sys/net/core/wmem_max
file. The minimum (doubled) value for this option is
2048.
[...]
Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Co-developed-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
It's clear that rmbs_lock and sndbufs_lock are aims to protect the
rmbs list or the sndbufs list.
During connection establieshment, smc_buf_get_slot() will always
be invoked, and it only performs read semantics in rmbs list and
sndbufs list.
Based on the above considerations, we replace mutex with rw_semaphore.
Only smc_buf_get_slot() use down_read() to allow smc_buf_get_slot()
run concurrently, other part use down_write() to keep exclusive
semantics.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 833bac7ec392 ("net/smc: Fix setsockopt and sysctl to specify same buffer size again") Signed-off-by: Sasha Levin <sashal@kernel.org>
The test relies on 'nc' being the netcat version from the nmap project.
While this seems to be the case on Fedora, it is not the case on Ubuntu,
resulting in failures such as [1].
Fix by explicitly using the 'ncat' utility from the nmap project and the
skip the test in case it is not installed.
remove temporary files created by 'mirred_egress_to_ingress_tcp' test
in the cleanup() handler. Also, change variable names to avoid clashing
with globals from lib.sh.
We encountered many kernel exceptions of VM_BUG_ON(zspage->isolated ==
0) in dec_zspage_isolation() and BUG_ON(!pages[1]) in zs_unmap_object()
lately. This issue only occurs when migration and reclamation occur at
the same time.
With our memory stress test, we can reproduce this issue several times
a day. We have no idea why no one else encountered this issue. BTW,
we switched to the new kernel version with this defect a few months
ago.
Since fullness and isolated share the same unsigned int, modifications of
them should be protected by the same lock.
[andrew.yang@mediatek.com: move comment] Link: https://lkml.kernel.org/r/20230727062910.6337-1-andrew.yang@mediatek.com Link: https://lkml.kernel.org/r/20230721063705.11455-1-andrew.yang@mediatek.com Fixes: c4549b871102 ("zsmalloc: remove zspage isolation for migration") Signed-off-by: Andrew Yang <andrew.yang@mediatek.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>