We hold pmlmepriv->lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need pmlmepriv->lock in position (2) of thread 2.
As a result, rtw_joinbss_event_prehandle() will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_bh(), which could let timer handler to obtain
the needed lock. What`s more, we change spin_lock_bh() to
spin_lock_irq() in _rtw_join_timeout_handler() in order to
prevent deadlock.
We hold pmlmepriv->lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler
also need pmlmepriv->lock in position (2) of thread 2.
As a result, rtw_surveydone_event_callback() will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_bh(), which could let timer handler to obtain
the needed lock. What`s more, we change spin_lock_bh() in
rtw_scan_timeout_handler() to spin_lock_irq(). Otherwise,
spin_lock_bh() will also cause deadlock() in timer handler.
To be sufficiently out of range for the usercopy test to see the lifetime
mismatch, expand the size of the "bad" buffer, which will let it be
beyond current_stack_pointer regardless of stack growth direction.
Paired with the recent addition of stack depth checking under
CONFIG_HARDENED_USERCOPY=y, this will correctly start tripping again.
Right now the (framework) mlock lock is (ab)used for multiple purposes:
1- protecting concurrent accesses over the odr local cache
2- avoid changing samplig frequency whilst buffer is running
Let's start by handling situation #1 with a local lock.
In r871xu_drv_init(), if r8712_init_drv_sw() fails, then the memory
allocated by r8712_alloc_io_queue() in r8712_usb_dvobj_init() is not
properly released as there is no action will be performed by
r8712_usb_dvobj_deinit().
To properly release it, we should call r8712_free_io_queue() in
r8712_usb_dvobj_deinit().
Besides, in r871xu_dev_remove(), r8712_usb_dvobj_deinit() will be called
by r871x_dev_unload() under condition `padapter->bup` and
r8712_free_io_queue() is called by r8712_free_drv_sw().
However, r8712_usb_dvobj_deinit() does not rely on `padapter->bup` and
calling r8712_free_io_queue() in r8712_free_drv_sw() is negative for
better understading the code.
So I move r8712_usb_dvobj_deinit() into r871xu_dev_remove(), and remove
r8712_free_io_queue() from r8712_free_drv_sw().
kstrdup() is also a memory allocation-related function, it returns NULL
when some memory errors happen. So it is better to check the return
value of it so to catch the memory error in time. Besides, there should
have a kfree() to clear up the allocation if we get a failure later in
this function to prevent memory leak.
The maths at the end of iter_xarray_get_pages() to calculate the actual
size doesn't work under some circumstances, such as when it's been asked to
extract a partial single page. Various terms of the equation cancel out
and you end up with actual == offset. The same issue exists in
iter_xarray_get_pages_alloc().
Fix these to just use min() to select the lesser amount from between the
amount of page content transcribed into the buffer, minus the offset, and
the size limit specified.
This doesn't appear to have caused a problem yet upstream because network
filesystems aren't getting the pages from an xarray iterator, but rather
passing it directly to the socket, which just iterates over it. Cachefiles
*does* do DIO from one to/from ext4/xfs/btrfs/etc. but it always asks for
whole pages to be written or read.
Fixes: 8f7e74412f16 ("iov_iter: Add ITER_XARRAY") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Mike Marshall <hubcap@omnibond.com>
cc: Gao Xiang <xiang@kernel.org>
cc: linux-afs@lists.infradead.org
cc: v9fs-developer@lists.sourceforge.net
cc: devel@lists.orangefs.org
cc: linux-erofs@lists.ozlabs.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Swap around the GRE and VLAN parts in the flow-key offloaded by
the driver to fit in with other tunnel types and the firmware.
Without this change used cases with GRE+VLAN on the outer header
does not get offloaded as the flow-key mismatches what the
firmware expect.
Fixes: 6c27392b059d ("nfp: flower: add support to offload QinQ match") Fixes: 61c382052735 ("nfp: flower-ct: compile match sections of flow_payload") Signed-off-by: Etienne van der Linde <etienne.vanderlinde@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Gcc-12 correctly warned about this code using a non-NULL pointer as a
truth value:
drivers/gpu/drm/imx/ipuv3-crtc.c: In function ‘ipu_crtc_disable_planes’:
drivers/gpu/drm/imx/ipuv3-crtc.c:72:21: error: the comparison will always evaluate as ‘true’ for the address of ‘plane’ will never be NULL [-Werror=address]
72 | if (&ipu_crtc->plane[1] && plane == &ipu_crtc->plane[1]->base)
| ^
due to the extraneous '&' address-of operator.
Philipp Zabel points out that The mistake had no adverse effect since
the following condition doesn't actually dereference the NULL pointer,
but the intent of the code was obviously to check for it, not to take
the address of the member.
In our server, there may be no high order (>= 6) memory since we reserve
lots of HugeTLB pages when booting. Then the system panic. So use
alloc_large_system_hash() to allocate table_perturb.
Fixes: 10c654c9aecf ("tcp: dynamically allocate the perturb table used by source ports") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220607070214.94443-1-songmuchun@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 3e8cac4b967d ("net: dsa: mv88e6xxx: Link in pcs_get_state() if AN
is bypassed") added the ability to link if AN was bypassed, and added
filling of state->an_complete field, but set it to true if AN was
enabled in BMCR, not when AN was reported complete in BMSR.
This was done because for some reason, when I wanted to use BMSR value
to infer an_complete, I was looking at BMSR_ANEGCAPABLE bit (which was
always 1), instead of BMSR_ANEGCOMPLETE bit.
Use BMSR_ANEGCOMPLETE for filling state->an_complete.
Fixes: 3e8cac4b967d ("net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Every iteration of for_each_child_of_node() decrements
the reference count of the previous node.
When break from a for_each_child_of_node() loop,
we need to explicitly call of_node_put() on the child node when
not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 8a1a6e9b5a3d ("Altera TSE: Add main and header file for Altera Ethernet Driver") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220607041144.7553-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
But was eventually revised more thoroughly:
- restrict the check to the only branch where needed, in an
uncommon GRE path that uses header_ops and calls skb_pull.
- test skb_transport_header, which is set along with csum_start
in skb_partial_csum_set in the normal header_ops datapath.
Turns out skbs can arrive in this branch without the transport
header set, e.g., through BPF redirection.
Revise the check back to check csum_start directly, and only if
CHECKSUM_PARTIAL. Do leave the check in the updated location.
Check field regardless of whether TUNNEL_CSUM is configured.
When combining two steering rules into one check
not only do they share the same actions but those
actions are also the same. This resolves an issue where
when creating two different rules with the same match
the actions are overwritten and one of the rules is deleted
a FW syndrome can be seen in dmesg.
mlx5_core 0000:03:00.0: mlx5_cmd_check:819:(pid 2105): DEALLOC_MODIFY_HEADER_CONTEXT(0x941) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x1ab444)
Fixes: 5626b5e413af ("net/mlx5: Add hash table to search FTEs in a flow-group") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The current design does not arm the tracer if traces are available before
the tracer string database is fully loaded, leading to an unfunctional tracer.
This fix will rearm the tracer every time the FW triggers tracer event
regardless of the tracer strings database status.
In some use-cases, mlx5 instances will need to search for their peer
device (the other port on the same HCA). For that, mlx5 device matching
mechanism relied on auxiliary_find_device() to search, and used a bad matching
callback function.
This approach has two issues:
1) next_phys_dev() the matching function, assumed all devices are
of the type mlx5_adev (mlx5 auxiliary device) which is wrong and
could lead to crashes, this worked for a while, since only lately
other drivers started registering auxiliary devices.
2) using the auxiliary class bus (auxiliary_find_device) to search for
mlx5_core_dev devices, who are actually PCIe device instances, is wrong.
This works since mlx5_core always has at least one mlx5_adev instance
hanging around in the aux bus.
As suggested by others we can fix 1. by comparing device names prefixes
if they have the string "mlx5_core" in them, which is not a best practice !
but even with that fixed, still 2. needs fixing, we are trying to
match pcie device peers so we should look in the right bus (pci bus),
hence this fix.
The fix:
1) search the pci bus for mlx5 peer devices, instead of the aux bus
2) to validated devices are the same type "mlx5_core_dev" compare if
they have the same driver, which is bulletproof.
This wouldn't have worked with the aux bus since the various mlx5 aux
device types don't share the same driver, even if they share the same device
wrapper struct (mlx5_adev) "which helped to find the parent device"
Fixes: 71c13243b916 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus") Reported-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reported-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When search for a peer lag device we can filter based on that
device's capabilities.
Downstream patch will be less strict when filtering compatible devices
and remove the limitation where we require exact MLX5_MAX_PORTS and
change it to a range.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because the caller (net/ipv6/seg6.c)
and the callee (net/ipv6/seg6_hmac.c) belong to the same module.
It seems an internal function call in ipv6.ko.
Fixes: 72caafc2ee5b ("ipv6: sr: add core files for SR HMAC support") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because the only in-tree call-site,
net/ipv4/xfrm4_policy.c is never compiled as modular.
(CONFIG_XFRM is boolean)
Fixes: 662068a32854 ("xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because the only in-tree call-site,
drivers/net/phy/phy_device.c is never compiled as modular.
(CONFIG_PHYLIB is boolean)
Fixes: 9df39a9c35c8 ("net: phy: Allow splitting MDIO bus/device support from PHYs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
I found that NFSD's new NFSv3 READDIRPLUS XDR encoder was screwing up
right at the end of the page array. xdr_get_next_encode_buffer() does
not compute the value of xdr->end correctly:
* The check to see if we're on the final available page in xdr->buf
needs to account for the space consumed by @nbytes.
* The new xdr->end value needs to account for the portion of @nbytes
that is to be encoded into the previous buffer.
Fixes: 0f08a2d9e59f ("nfsd4: allow encoding across page boundaries") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
xdpxceiver run on a AF_XDP ZC enabled driver revealed a problem with XSK
Tx batching API. There is a test that checks how invalid Tx descriptors
are handled by AF_XDP. Each valid descriptor is followed by invalid one
on Tx side whereas the Rx side expects only to receive a set of valid
descriptors.
In current xsk_tx_peek_release_desc_batch() function, the amount of
available descriptors is hidden inside xskq_cons_peek_desc_batch(). This
can be problematic in cases where invalid descriptors are present due to
the fact that xskq_cons_peek_desc_batch() returns only a count of valid
descriptors. This means that it is impossible to properly update XSK
ring state when calling xskq_cons_release_n().
To address this issue, pull out the contents of
xskq_cons_peek_desc_batch() so that callers (currently only
xsk_tx_peek_release_desc_batch()) will always be able to update the
state of ring properly, as total count of entries is now available and
use this value as an argument in xskq_cons_release_n(). By
doing so, xskq_cons_peek_desc_batch() can be dropped altogether.
Fixes: 2039e6aa39c9 ("xsk: Introduce batched Tx descriptor interfaces") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Move desc_array from the driver to the pool. The reason behind this is
that we can then reuse this array as a temporary storage for descriptors
in all zero-copy drivers that use the batched interface. This will make
it easier to add batching to more drivers.
i40e is the only driver that has a batched Tx zero-copy
implementation, so no need to touch any other driver.
Every iteration of for_each_available_child_of_node() decrements
the reference count of the previous node.
when breaking early from a for_each_available_child_of_node() loop,
we need to explicitly call of_node_put() on the gphy_fw_np.
Add missing of_node_put() to avoid refcount leak.
Fixes: 1e1742476b63 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220605072335.11257-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
syzbot reported an illegal copy_to_user() attempt
from bpf_prog_get_info_by_fd() [1]
There was no repro yet on this bug, but I think
that commit 60ad8ce18223 ("mm/usercopy: Detect vmalloc overruns")
is exposing a prior bug in bpf arm64.
bpf_prog_get_info_by_fd() looks at prog->jited_len
to determine if the JIT image can be copied out to user space.
My theory is that syzbot managed to get a prog where prog->jited_len
has been set to 43, while prog->bpf_func has ben cleared.
It is not clear why copy_to_user(uinsns, NULL, ulen) is triggering
this particular warning.
I thought find_vma_area(NULL) would not find a vm_struct.
As we do not hold vmap_area_lock spinlock, it might be possible
that the found vm_struct was garbage.
unix_dgram_poll() calls unix_dgram_peer_wake_me() without `other`'s
lock held and check if its receive queue is full. Here we need to
use unix_recvq_full_lockless() instead of unix_recvq_full(), otherwise
KCSAN will report a data-race.
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because none of the in-tree call-sites
(arch/arm/xen/enlighten.c, arch/x86/xen/grant-table.c) is compiled as
modular.
If user requests for NFT_CHAIN_HW_OFFLOAD, then check if either device
provides the .ndo_setup_tc interface or there is an indirect flow block
that has been registered. Otherwise, bail out early from the preparation
phase. Moreover, validate that family == NFPROTO_NETDEV and hook is
NF_NETDEV_INGRESS.
of_find_device_by_node() takes reference, we should use put_device()
to release it when not need anymore.
Add missing put_device() to avoid refcount leak.
Fixes: ea473861584c ("MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Remove inactive bool field in nft_hook object that was introduced in 6a32264e7707 ("netfilter: nf_tables: delete devices from flowtable").
Move stale flowtable hooks to transaction list instead.
Deleting twice the same device does not result in ENOENT.
Fixes: 6a32264e7707 ("netfilter: nf_tables: delete devices from flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Handle empty data-lanes = < >; property, which translates to
dsi_lanes = 0 as invalid.
Fixes: 454c13700cba3 ("drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver") Signed-off-by: Marek Vasut <marex@denx.de> Cc: Jonas Karlman <jonas@kwiboo.se> Cc: Laurent Pinchart <Laurent.pinchart@ideasonboard.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Marek Vasut <marex@denx.de> Cc: Maxime Ripard <maxime@cerno.tech> Cc: Neil Armstrong <narmstrong@baylibre.com> Cc: Robert Foss <robert.foss@linaro.org> Cc: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220518233844.248504-1-marex@denx.de Signed-off-by: Sasha Levin <sashal@kernel.org>
sn65dsi83_parse_dt() takes a reference on 'ctx->host_node' that must be
released in the error handling path of this function and of the probe.
This is only done in the remove function up to now.
In order to garantee migrated data be persisted during checkpoint,
otherwise out-of-order persistency between data and node may cause
data corruption after SPOR.
Configuring for a nommu classic m68k target and enabling the generic rtc
driver (CONFIG_RTC_DRV_GENERIC) will result in the following compile
error:
m68k-linux-ld: arch/m68k/kernel/time.o: in function `rtc_ioctl':
time.c:(.text+0x82): undefined reference to `mach_get_rtc_pll'
m68k-linux-ld: time.c:(.text+0xbc): undefined reference to `mach_set_rtc_pll'
m68k-linux-ld: time.c:(.text+0xf4): undefined reference to `mach_set_rtc_pll'
There are no definitions of "mach_set_rtc_pll" and "mach_get_rtc_pll" in the
nommu code paths. Move these definitions and the associated "mach_hwclk",
so that they are around their use case in time.c. This means they will
always be defined on the builds that require them, and not on those that
cannot use them - such as ColdFire (both with and without MMU enabled).
The pointer to buffer loading kernel binaries is in kernel space for
kexec_fil mode, When copy_from_user copies data from pointer to a block
of memory, it checkes that the pointer is in the user space range, on
RISCV-V that is:
and TASK_SIZE is 0x4000000000 for 64-bits, which now causes
copy_from_user to reject the access of the field 'buf' of struct
kexec_segment that is in range [CONFIG_PAGE_OFFSET - VMALLOC_SIZE,
CONFIG_PAGE_OFFSET), is invalid user space pointer.
This patch fixes this issue by skipping access_ok(), use mempcy() instead.
In pxa3xx_gcu_probe(), the sequence of error lable is wrong, it will
leads some resource leaked, so adjust the sequence to handle the error
correctly, and if pxa3xx_gcu_add_buffer() fails, pxa3xx_gcu_free_buffers()
need be called.
In pxa3xx_gcu_remove(), add missing clk_disable_unpreprare().
This patch fixes a bug where GEN1 VMs doesn't allow resolutions greater
than 64 MB size (eg 7680x4320). Unnecessary PCI check limits Gen1 VRAM
to legacy PCI BAR size only (ie 64MB). Thus any, resolution requesting
greater then 64MB (eg 7680x4320) would fail. MMIO region assigning this
memory shouldn't be limited by PCI bar size.
When doing layoutget as part of the open() compound, we have to be
careful to release the layout locks before we can call any further RPC
calls, such as setattr(). The reason is that those calls could trigger
a recall, which could deadlock.
The arguments desc_size and ZYNQMP_DMA_NUM_DESCS were 32 bit. Though
this overflow condition is not observed but it is a potential problem
in the case of 32-bit multiplication. Hence fix it by changing the
desc_size data type to size_t.
In addition to coverity fix it also reuse ZYNQMP_DMA_DESC_SIZE macro in
dma_alloc_coherent API argument.
When configuring a nommu classic m68k system enabling the uboot parameter
passing support (CONFIG_UBOOT) will produce the following compile error:
m68k-linux-ld: arch/m68k/kernel/uboot.o: in function `process_uboot_commandline':
uboot.c:(.init.text+0x32): undefined reference to `_init_sp'
The logic to support this option is only used on ColdFire based platforms
(in its head.S startup code). So make the selection of this option
depend on building for a ColdFire based platform.
The non-MMU m68k pagetable ZERO_PAGE() macro is being set to the
somewhat non-sensical value of "virt_to_page(0)". The zeroth page
is not in any way guaranteed to be a page full of "0". So the result
is that ZERO_PAGE() will almost certainly contain random values.
We already allocate a real "empty_zero_page" in the mm setup code shared
between MMU m68k and non-MMU m68k. It is just not hooked up to the
ZERO_PAGE() macro for the non-MMU m68k case.
Fix ZERO_PAGE() to use the allocated "empty_zero_page" pointer.
I am not aware of any specific issues caused by the old code.
Timeout as 1 second sets an upper limit on the length
of the transfer executed, but there is no maximum length
of a write or read message set in i2c_adapter_quirks for
this controller.
This upper limit affects devices that require sending
large firmware blobs over I2C.
To remove that limitation, calculate the minimal time
necessary, plus some wiggle room, for every message and
use it instead of the default one second, if more than
one second.
Signed-off-by: Lucas Tanure <tanureal@opensource.cirrus.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Syzbot triggers two WARNs in f2fs_is_valid_blkaddr and
__is_bitmap_valid. For example, in f2fs_is_valid_blkaddr,
if type is DATA_GENERIC_ENHANCE or DATA_GENERIC_ENHANCE_READ,
it invokes WARN_ON if blkaddr is not in the right range.
The call trace is as follows:
It will cause null-ptr-deref when using 'res', if platform_get_resource()
returns NULL, so move using 'res' after devm_ioremap_resource() that
will check it to avoid null-ptr-deref.
And use devm_platform_get_and_ioremap_resource() to simplify code.
To prepare for support asynchronous tracer_init_tracefs initcall,
avoid calling create_trace_option_files before __update_tracer_options.
Otherwise, create_trace_option_files will show warning because
some tracers in trace_types list are already in tr->topts.
For example, hwlat_tracer call register_tracer in late_initcall,
and global_trace.dir is already created in tracing_init_dentry,
hwlat_tracer will be put into tr->topts.
Then if the __update_tracer_options is executed after hwlat_tracer
registered, create_trace_option_files find that hwlat_tracer is
already in tr->topts.
When setting bootparams="trace_event=initcall:initcall_start tp_printk=1" in the
cmdline, the output_printk() was called, and the spin_lock_irqsave() was called in the
atomic and irq disable interrupt context suitation. On the PREEMPT_RT kernel,
these locks are replaced with sleepable rt-spinlock, so the stack calltrace will
be triggered.
Fix it by raw_spin_lock_irqsave when PREEMPT_RT and "trace_event=initcall:initcall_start
tp_printk=1" enabled.
Even though idxd driver has always supported interrupt, it never actually
set the DMA_INTERRUPT cap bit. Rectify this mistake so the interrupt
capability is advertised.
The function percent_rmt_hitm_cmp() wrongly uses local HITMs for
sorting remote HITMs.
Since this function is to sort cache lines for remote HITMs, this patch
changes to use 'rmt_hitm' field for correct sorting.
Fixes: 6acd8fb40c1aecc3 ("perf c2c report: Add hitm/store percent related sort keys") Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220530084253.750190-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Mounting NFS rootfs was timing out when deferred_probe_timeout was
non-zero [1]. This was because ip_auto_config() initcall times out
waiting for the network interfaces to show up when
deferred_probe_timeout was non-zero. While ip_auto_config() calls
wait_for_device_probe() to make sure any currently running deferred
probe work or asynchronous probe finishes, that wasn't sufficient to
account for devices being deferred until deferred_probe_timeout.
Commit a0d279236f01 ("driver core: Ensure wait_for_device_probe() waits
until the deferred_probe_timeout fires") tried to fix that by making
sure wait_for_device_probe() waits for deferred_probe_timeout to expire
before returning.
However, if wait_for_device_probe() is called from the kernel_init()
context:
- Before deferred_probe_initcall() [2], it causes the boot process to
hang due to a deadlock.
- After deferred_probe_initcall() [3], it blocks kernel_init() from
continuing till deferred_probe_timeout expires and beats the point of
deferred_probe_timeout that's trying to wait for userspace to load
modules.
Neither of this is good. So revert the changes to
wait_for_device_probe().
If sd_probe() sees an early error before sdkp->device is initialized,
sd_zbc_release_disk() is called. This causes a NULL pointer dereference
when sd_is_zoned() is called inside that function. Avoid this by removing
the call to sd_zbc_release_disk() in sd_probe() error path.
This change is safe and does not result in zone information memory leakage
because the zone information for a zoned disk is allocated only when
sd_revalidate_disk() is called, at which point sdkp->disk_dev is fully set,
resulting in sd_disk_release() being called when needed to cleanup a disk
zone information using sd_zbc_release_disk().
Link: https://lore.kernel.org/r/20220601062544.905141-2-damien.lemoal@opensource.wdc.com Fixes: fc529d62a928 ("sd: Implement support for ZBC devices") Reported-by: Dongliang Mu <mudongliangabcd@gmail.com> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In AFS, a directory is handled as a file that the client downloads and
parses locally for the purposes of performing lookup and getdents
operations. The in-kernel afs filesystem has a number of functions that
do this.
A directory file is arranged as a series of 2K blocks divided into
32-byte slots, where a directory entry occupies one or more slots, plus
each block starts with one or more metadata blocks.
When parsing a block, if the last slots are occupied by a dirent that
occupies more than a single slot and the file position points at a slot
that's not the initial one, the logic in afs_dir_iterate_block() that
skips over it won't advance the file pointer to the end of it. This
will cause an infinite loop in getdents() as it will keep retrying that
block and failing to advance beyond the final entry.
Fix this by advancing the file pointer if the next entry will be beyond
it when we skip a block.
This was found by the generic/676 xfstest but can also be triggered with
something like:
In the binding example, the regulator mode 4 is shown as a valid mode,
but the driver actually only support mode 0 to 2:
This generates an error in dmesg when copy/pasting the binding example:
[ 0.306080] vbuck1: invalid regulator-allowed-modes element 4
[ 0.307290] vbuck2: invalid regulator-allowed-modes element 4
This commit fixes this error by removing the invalid mode from the
examples.
Fixes: 7965f75f288a ("regulator: document binding for MT6315 regulator") Signed-off-by: Fabien Parent <fparent@baylibre.com> Link: https://lore.kernel.org/r/20220529154613.337559-1-fparent@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 9c2b0ba40099 ("s390: add options to change branch prediction
behaviour for the kernel") introduced .Lsie_exit label - supposedly
to fence off SIE instruction. However, the corresponding address
range length .Lsie_crit_mcck_length was not updated, which led to
BPON code potentionally marked with CIF_MCCK_GUEST flag.
Both .Lsie_exit and .Lsie_crit_mcck_length were removed with commit 140ddaa40243 ("s390: remove critical section cleanup from entry.S"),
but the issue persisted - currently BPOFF and BPENTER macros might
get wrongly considered by the machine check handler as a guest.
Fixes: 9c2b0ba40099 ("s390: add options to change branch prediction behaviour for the kernel") Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We should set the pci driver data in probe instead of the vdpa device
adding callback. Otherwise if no vDPA device is created we will lose
the pointer to the management device.
Fixes: 9e1b75f5c3728 ("vDPA/ifcvf: implement management netlink framework for ifcvf") Tested-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220524055557.1938-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1) A new passive FastOpen TCP socket is created.
This FO socket waits for an ACK coming from client to be a complete
ESTABLISHED one.
2) A socket operation on this socket goes through lock_sock()
release_sock() dance.
3) While the socket is owned by the user in step 2),
a retransmit of the SYN is received and stored in socket backlog.
4) At release_sock() time, the socket backlog is processed while
in process context.
5) A SYNACK packet is cooked in response of the SYN retransmit.
6) -> tcp_rtx_synack() is called in process context.
Before blamed commit, tcp_rtx_synack() was always called from BH handler,
from a timer handler.
Fix this by using TCP_INC_STATS() & NET_INC_STATS()
which do not assume caller is in non preemptible context.
In qdisc_run_end(), the spin_unlock() only has store-release semantic,
which guarantees all earlier memory access are visible before it. But
the subsequent test_bit() has no barrier semantics so may be reordered
ahead of the spin_unlock(). The store-load reordering may cause a packet
stuck problem.
The concurrent operations can be described as below,
CPU 0 | CPU 1
qdisc_run_end() | qdisc_run_begin()
. | .
----> /* may be reorderd here */ | .
| . | .
| spin_unlock() | set_bit()
| . | smp_mb__after_atomic()
---- test_bit() | spin_trylock()
. | .
Consider the following sequence of events:
CPU 0 reorder test_bit() ahead and see MISSED = 0
CPU 1 calls set_bit()
CPU 1 calls spin_trylock() and return fail
CPU 0 executes spin_unlock()
At the end of the sequence, CPU 0 calls spin_unlock() and does nothing
because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one
take it, until the next cpu pushing to the qdisc (if ever ...) will
notice and dequeue it.
This patch fix this by adding one explicit barrier. As spin_unlock() and
test_bit() ordering is a store-load ordering, a full memory barrier
smp_mb() is needed here.
Fixes: d26b22ffd3e9 ("net: sched: fix packet stuck problem for lockless qdisc") Signed-off-by: Guoju Fang <gjfang@linux.alibaba.com> Link: https://lore.kernel.org/r/20220528101628.120193-1-gjfang@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Some features (LRO, HW GRO) conflict with XDP. If there is an attempt to
enable such features while XDP is active, they will be set to `off
[requested on]`. In order to activate these features after XDP is turned
off, the driver needs to call netdev_update_features(). This commit adds
this missing call after XDP state changes.
Fixes: cdfa51004b50 ("net/mlx5e: Properly block LRO when XDP is enabled") Fixes: 2125deee1f60 ("net/mlx5e: Properly block HW GRO when XDP is enabled") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Tuple entries that don't have nat configured for them
which are added to the ct nat table will always create
a new modify header, as we don't check for possible
re-use on them. The same for tuples that have nat configured
for them but are added to ct table.
Fix the above by only avoiding wasteful re-use lookup
for actually natted entries in ct nat table.
The cited commit changed promisc table to be created on demand with the
highest priority in the NIC table replacing the vlan table, this caused
tc NIC tables miss flow to skip the prmoisc table because it use vlan
table as miss table.
OVS offload in NIC mode use promisc by default so any unicast packet
which will be handled by tc NIC tables miss flow will skip the promisc
rule and will be dropped.
Fix this by adding new empty table in new tc level with low priority and
point the nic tc chain miss to it, the new table is managed so it will
point to vlan table if promisc is disabled and to promisc table if enabled.
If an error occurs after a successful pci_request_selected_regions() call,
it should be undone by a corresponding pci_release_selected_regions() call,
as already done in vp_modern_remove().
Fixes: 90c9c82be585 ("virtio-pci: introduce modern device module") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <237109725aad2c3c03d14549f777b1927c84b045.1648977064.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In vdpa_nl_cmd_dev_get_doit(), if the call to genlmsg_reply() fails we
must not call nlmsg_free() since this is done inside genlmsg_reply().
Fix it.
Fixes: 5d73c85b0481 ("vdpa: Enable user to query vdpa device info") Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220518133804.1075129-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Most of bioset_exit() is fine being called twice, as it clears the
various allocations etc when they are freed. The exception is
bio_alloc_cache_destroy(), which does not clear ->cache when it has
freed it.
This isn't necessarily a bug, but can be if buggy users does call the
exit path more then once, or with just a memset() bioset which has
never been initialized. dm appears to be one such user.
tx_channel_offset is calculated in efx_allocate_msix_channels, but it is
also calculated again in efx_set_channels because it was originally done
there, and when efx_allocate_msix_channels was introduced it was
forgotten to be removed from efx_set_channels.
Moreover, the old calculation is wrong when using
efx_separate_tx_channels because now we can have XDP channels after the
TX channels, so n_channels - n_tx_channels doesn't point to the first TX
channel.
Remove the old calculation from efx_set_channels, and add the
initialization of this variable if MSI or legacy interrupts are used,
next to the initialization of the rest of the related variables, where
it was missing.
Fixes: 2e6b0faad9d7 ("sfc: allocate channels for XDP tx queues") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Normally, all channels have RX and TX queues, but this is not true if
modparam efx_separate_tx_channels=1 is used. In that cases, some
channels only have RX queues and others only TX queues (or more
preciselly, they have them allocated, but not initialized).
Fix efx_channel_has_tx_queues to return the correct value for this case
too.
Messages shown at probe time before the fix:
sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0
------------[ cut here ]------------
netdevice: ens6f0np0: failed to initialise TXQ -1
WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc]
[...] stripped
RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc]
[...] stripped
Call Trace:
efx_init_tx_queue+0xaa/0xf0 [sfc]
efx_start_channels+0x49/0x120 [sfc]
efx_start_all+0x1f8/0x430 [sfc]
efx_net_open+0x5a/0xe0 [sfc]
__dev_open+0xd0/0x190
__dev_change_flags+0x1b3/0x220
dev_change_flags+0x21/0x60
[...] stripped
Messages shown at remove time before the fix:
sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues
sfc 0000:03:00.0 ens6f0np0: failed to flush queues
Fixes: b9a23191bc74 ("sfc: fix channel allocation with brute force") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Tested-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
We had various bugs over the years with code
breaking the assumption that tp->snd_cwnd is greater
than zero.
Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added
in commit 880d0d4d2051 ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend
considerable time finding the bug.
Instead of complaining too late, we want to catch where
and when tp->snd_cwnd is set to an illegal value.
"struct smc_cdc_tx_pend **" can not directly convert
to "struct smc_wr_tx_pend_priv *".
Fixes: ec2304289e09 ("net/smc: put slot when connection is killed") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Appartly bcache can copy into bios that do not just contain fresh
pages but can have offsets into the bio_vecs. Restore support for tht
in bio_copy_data_iter.
Fixes: 4d65b6743169 ("block: rewrite bio_copy_data_iter to use bvec_kmap_local and memcpy_to_bvec") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220524143919.1155501-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
I think there is something wrong with BPF_PROBE_MEM in ___bpf_prog_run()
in big-endian machine. Let's make a test and see what will happen if we
want to load a 'u16' with BPF_PROBE_MEM.
Let's make the src value '0x0001', the value of dest register will become
0x0001000000000000, as the value will be loaded to the first 2 byte of
DST with following code:
The code of the test case is simply, it will load sk->sk_family to the
register with BPF_PROBE_MEM and check if it is AF_INET6. With this patch,
now the test case 'bpf_iter' can pass:
$10 bpf_iter:OK
Fixes: cf537e6867cf ("bpf: Add support for BTF pointers to interpreter") Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/20220524021228.533216-1-imagedong@tencent.com Signed-off-by: Sasha Levin <sashal@kernel.org>
3ae12ffb0e4a ("random: remove unused tracepoints") removed all the
tracepoints from drivers/char/random.c, one of which,
random:urandom_read, was used by stacktrace_build_id selftest to trigger
stack trace capture.
Fix breakage by switching to kprobing urandom_read() function.
There at least 6 PEBs reserved on UBI device:
1. EBA_RESERVED_PEBS[1]
2. WL_RESERVED_PEBS[1]
3. UBI_LAYOUT_VOLUME_EBS[2]
4. MIN_FASTMAP_RESERVED_PEBS[2]
When all ubi volumes take all their PEBs, there are 3 (EBA_RESERVED_PEBS +
WL_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS - MIN_FASTMAP_TAKEN_PEBS[1])
free PEBs. Since commit 3f238ace2d6f09e ("ubi: Fix producing anchor PEBs")
and commit d9363547c71f1f1 ("ubi: Select fastmap anchor PEBs considering
wear level rules") applied, there is only 1 (3 - FASTMAP_ANCHOR_PEBS[1] -
FASTMAP_NEXT_ANCHOR_PEBS[1]) free PEB to fill pool and wl_pool, after
filling pool, wl_pool is always empty. So, UBI could be stuck in an
infinite loop:
, which cause high cpu usage of ubi_bgt:
top - 12:10:42 up 5 min, 2 users, load average: 1.76, 0.68, 0.27
Tasks: 123 total, 3 running, 54 sleeping, 0 stopped, 0 zombie
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1589 root 20 0 0 0 0 R 45.0 0.0 0:38.86 ubi_bgt0d
319 root 20 0 0 0 0 I 15.2 0.0 0:15.29 kworker/0:3-eve
371 root 20 0 0 0 0 I 14.9 0.0 0:12.85 kworker/3:3-eve
20 root 20 0 0 0 0 I 11.3 0.0 0:05.33 kworker/1:0-eve
202 root 20 0 0 0 0 I 11.3 0.0 0:04.93 kworker/2:3-eve
In commit d9363547c71f1f1 ("ubi: Select fastmap anchor PEBs considering
wear level rules"), there are three key changes:
1) Choose the fastmap anchor when the most free PEBs are available.
2) Enable anchor move within the anchor area again as it is useful
for distributing wear.
3) Import a candidate fm anchor and check this PEB's erase count during
wear leveling. If the wear leveling limit is exceeded, use the used
anchor area PEB with the lowest erase count to replace it.
The anchor candidate can be removed, we can check fm_anchor PEB's erase
count during wear leveling. Fix it by:
1) Removing 'fm_next_anchor' and check 'fm_anchor' during wear leveling.
2) Preferentially filling one free peb into fm_wl_pool in condition of
ubi->free_count > ubi->beb_rsvd_pebs, then try to reserve enough
free count for fastmap non anchor pebs after the above prerequisites
are met.
Then, there are at least 1 PEB in pool and 1 PEB in wl_pool after calling
ubi_refill_pools() with all erase works done.
With the `-z unique-symbol` linker flag or any similar mechanism,
it is possible to trigger the following:
ERROR: modpost: "param_set_uint.0" [vmlinux] is a static EXPORT_SYMBOL
The reason is that for now the condition from remove_dot():
if (m && (s[n + m] == '.' || s[n + m] == 0))
which was designed to test if it's a dot or a '\0' after the suffix
is never satisfied.
This is due to that `s[n + m]` always points to the last digit of a
numeric suffix, not on the symbol next to it (from a custom debug
print added to modpost):
param_set_uint.0, s[n + m] is '0', s[n + m + 1] is '\0'
So it's off-by-one and was like that since 2014.
Fix this for the sake of any potential upcoming features, but don't
bother stable-backporting, as it's well hidden -- apart from that
LD flag, it can be triggered only with GCC LTO which never landed
upstream.
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.
mv88e6xxx_mdio_register() pass the device node to of_mdiobus_register().
We don't need the device node after it.
Add missing of_node_put() to avoid refcount leak.
Fixes: 3a17ca7229f7 ("net: dsa: mv88e6xxx: Support multiple MDIO busses") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
am65_cpsw_init_cpts() and am65_cpsw_nuss_probe() don't release
the refcount in error case.
Add missing of_node_put() to avoid refcount leak.
Fixes: ed8eb260deff ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support") Fixes: ddd326a112f7 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The "fsp->location" variable comes from user via ethtool_get_rxnfc().
Check that it is valid to prevent an out of bounds read.
Fixes: f03e5ebc0605 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit()
does not provide any ordering guarantee as test_bit() is not an atomic
operation. This, added to the fact that the spin_trylock() call at
the beginning of qdisc_run_begin() does not guarantee acquire
semantics if it does not grab the lock, makes it possible for the
following statement :
if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))
to be executed before an enqueue operation called before
qdisc_run_begin().
In the above scenario, CPU 1 and CPU 2 both try to grab the
qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the
bypass code path, where it emits its skb then calls __qdisc_run().
CPU1 fails, sets MISSED and goes down the traditionnal enqueue() +
dequeue() code path. But when executing qdisc_run_begin() for the
second time, after enqueuing its skbuff, it sees the MISSED bit still
set (by itself) and consequently chooses to exit early without setting
it again nor trying to grab the spinlock again.
Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue
and found it empty, so it returned.
At the end of the sequence, we end up with skb1 enqueued in the
backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set,
and no __netif_schedule() called made. skb1 will now linger in the
qdisc until somebody later performs a full __qdisc_run(). Associated
to the bypass capacity of the qdisc, and the ability of the TCP layer
to avoid resending packets which it knows are still in the qdisc, this
can lead to serious traffic "holes" in a TCP connection.
We fix this by replacing the smp_mb__before_atomic() / test_bit() /
set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin()
by a single test_and_set_bit() call, which is more concise and
enforces the needed memory barriers.
Fixes: c66448a6f72e ("net: sched: add barrier to ensure correct ordering for lockless qdisc") Signed-off-by: Vincent Ray <vray@kalrayinc.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220526001746.2437669-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>