Jakub Sitnicki [Mon, 21 Nov 2022 08:54:26 +0000 (09:54 +0100)]
l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
When holding a reader-writer spin lock we cannot sleep. Calling
setup_udp_tunnel_sock() with write lock held violates this rule, because we
end up calling percpu_down_read(), which might sleep, as syzbot reports
[1]:
Trim the writer-side critical section for sk_callback_lock down to the
minimum, so that it covers only operations on sk_user_data.
Also, when grabbing the sk_callback_lock, we always need to disable BH, as
Eric points out. Failing to do so leads to deadlocks because we acquire
sk_callback_lock in softirq context, which can get stuck waiting on us if:
v2:
- Check and set sk_user_data while holding sk_callback_lock for both
L2TP encapsulation types (IP and UDP) (Tetsuo)
Cc: Tom Parkin <tparkin@katalix.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: 6e77507cc06b ("l2tp: Serialize access to sk_user_data with sk_callback_lock") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com Reported-by: syzbot+50680ced9e98a61f7698@syzkaller.appspotmail.com Reported-by: syzbot+de987172bb74a381879b@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuan Can [Mon, 21 Nov 2022 03:32:26 +0000 (03:32 +0000)]
net: dm9051: Fix missing dev_kfree_skb() in dm9051_loop_rx()
The dm9051_loop_rx() returns without release skb when dm9051_stop_mrcmd()
returns error, free the skb to avoid this leak.
Fixes: b4c6114939b8 ("net: Add dm9051 driver") Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Hai [Sun, 20 Nov 2022 06:24:38 +0000 (14:24 +0800)]
arcnet: fix potential memory leak in com20020_probe()
In com20020_probe(), if com20020_config() fails, dev and info
will not be freed, which will lead to a memory leak.
This patch adds freeing dev and info after com20020_config()
fails to fix this bug.
Compile tested only.
Fixes: 16cd360085fe ("[PATCH] pcmcia: add return value to _config() functions") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 23 Nov 2022 04:20:58 +0000 (20:20 -0800)]
Merge tag 'mlx5-fixes-2022-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2022-11-21
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2022-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix possible race condition in macsec extended packet number update routine
net/mlx5e: Fix MACsec update SecY
net/mlx5e: Fix MACsec SA initialization routine
net/mlx5e: Remove leftovers from old XSK queues enumeration
net/mlx5e: Offload rule only when all encaps are valid
net/mlx5e: Fix missing alignment in size of MTT/KLM entries
net/mlx5: Fix sync reset event handler error flow
net/mlx5: E-Switch, Set correctly vport destination
net/mlx5: Lag, avoid lockdep warnings
net/mlx5: Fix handling of entry refcount when command is not issued to FW
net/mlx5: cmdif, Print info on any firmware cmd failure to tracepoint
net/mlx5: SF: Fix probing active SFs during driver probe phase
net/mlx5: Fix FW tracer timestamp calculation
net/mlx5: Do not query pci info while pci disabled
====================
Yan Cangang [Sun, 20 Nov 2022 05:52:59 +0000 (13:52 +0800)]
net: ethernet: mtk_eth_soc: fix memory leak in error path
In mtk_ppe_init(), when dmam_alloc_coherent() or devm_kzalloc() failed,
the rhashtable ppe->l2_flows isn't destroyed. Fix it.
In mtk_probe(), when mtk_ppe_init() or mtk_eth_offload_init() or
register_netdev() failed, have the same problem. Fix it.
Fixes: 2aa931cdbe4a ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries") Signed-off-by: Yan Cangang <nalanzeyu@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ziyang Xuan [Sun, 20 Nov 2022 03:54:05 +0000 (11:54 +0800)]
net: ethernet: mtk_eth_soc: fix potential memory leak in mtk_rx_alloc()
When fail to dma_map_single() in mtk_rx_alloc(), it returns directly.
But the memory allocated for local variable data is not freed, and
local variabel data has not been attached to ring->data[i] yet, so the
memory allocated for local variable data will not be freed outside
mtk_rx_alloc() too. Thus memory leak would occur in this scenario.
Add skb_free_frag(data) when dma_map_single() failed.
Fixes: 23c4b376a76d ("net: ethernet: mtk_eth_soc: rely on page_pool for single page buffers") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Link: https://lore.kernel.org/r/20221120035405.1464341-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
dccp/tcp: Fix bhash2 issues related to WARN_ON() in inet_csk_get_port().
syzkaller was hitting a WARN_ON() in inet_csk_get_port() in the 4th patch,
which was because we forgot to fix up bhash2 bucket when connect() for a
socket bound to a wildcard address fails in __inet_stream_connect().
There was a similar report [0], but its repro does not fire the WARN_ON() due
to inconsistent error handling.
When connect() for a socket bound to a wildcard address fails, saddr may or
may not be reset depending on where the failure happens. When we fail in
__inet_stream_connect(), sk->sk_prot->disconnect() resets saddr. OTOH, in
(dccp|tcp)_v[46]_connect(), if we fail after inet_hash6?_connect(), we
forget to reset saddr.
We fix this inconsistent error handling in the 1st patch, and then we'll
fix the bhash2 WARN_ON() issue.
Note that there is still an issue in that we reset saddr without checking
if there are conflicting sockets in bhash and bhash2, but this should be
another series.
dccp/tcp: Fixup bhash2 bucket when connect() fails.
If a socket bound to a wildcard address fails to connect(), we
only reset saddr and keep the port. Then, we have to fix up the
bhash2 bucket; otherwise, the bucket has an inconsistent address
in the list.
Also, listen() for such a socket will fire the WARN_ON() in
inet_csk_get_port(). [0]
Note that when a system runs out of memory, we give up fixing the
bucket and unlink sk from bhash and bhash2 by inet_put_port().
Fixes: 7d8f6909a4d0 ("net: Add a bhash2 table hashed by port and address") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When we call connect() for a socket bound to a wildcard address, we update
saddr locklessly. However, it could result in a data race; another thread
iterating over bhash might see a corrupted address.
dccp/tcp: Reset saddr on failure after inet6?_hash_connect().
When connect() is called on a socket bound to the wildcard address,
we change the socket's saddr to a local address. If the socket
fails to connect() to the destination, we have to reset the saddr.
However, when an error occurs after inet_hash6?_connect() in
(dccp|tcp)_v[46]_conect(), we forget to reset saddr and leave
the socket bound to the address.
From the user's point of view, whether saddr is reset or not varies
with errno. Let's fix this inconsistent behaviour.
Note that after this patch, the repro [0] will trigger the WARN_ON()
in inet_csk_get_port() again, but this patch is not buggy and rather
fixes a bug papering over the bhash2's bug for which we need another
fix.
For the record, the repro causes -EADDRNOTAVAIL in inet_hash6_connect()
by this sequence:
This problem is caused by rotten packets, which are received after
polling but before interrupts are enabled again. This can be fixed by
checking for pending work and rescheduling if necessary after interrupts
has been enabled again.
Yang Yingliang [Sat, 19 Nov 2022 07:02:02 +0000 (15:02 +0800)]
bnx2x: fix pci device refcount leak in bnx2x_vf_is_pcie_pending()
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put(). Call pci_dev_put() before returning from
bnx2x_vf_is_pcie_pending() to avoid refcount leak.
Fixes: dfd193b1e8bb ("bnx2x: Prepare device and initialize VF database") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221119070202.1407648-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xin Long [Fri, 18 Nov 2022 21:33:03 +0000 (16:33 -0500)]
net: sched: allow act_ct to be built without NF_NAT
In commit e53f71049a64 ("net/sched: Make NET_ACT_CT depends on NF_NAT"),
it fixed the build failure when NF_NAT is m and NET_ACT_CT is y by
adding depends on NF_NAT for NET_ACT_CT. However, it would also cause
NET_ACT_CT cannot be built without NF_NAT, which is not expected. This
patch fixes it by changing to use "(!NF_NAT || NF_NAT)" as the depend.
Liu Jian [Thu, 17 Nov 2022 12:59:18 +0000 (20:59 +0800)]
net: sparx5: fix error handling in sparx5_port_open()
If phylink_of_phy_connect() fails, the port should be disabled.
If sparx5_serdes_set()/phy_power_on() fails, the port should be
disabled and the phylink should be stopped and disconnected.
Fixes: 2e47ca74b0ea ("net: sparx5: add port module support") Fixes: c68622c37ed0 ("net: sparx5: add hostmode with phylink support") Signed-off-by: Liu Jian <liujian56@huawei.com> Tested-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Reviewed-by: Steen Hegelund <steen.hegelund@microchip.com> Link: https://lore.kernel.org/r/20221117125918.203997-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wang ShaoBo [Fri, 18 Nov 2022 06:24:47 +0000 (14:24 +0800)]
net: wwan: iosm: use ACPI_FREE() but not kfree() in ipc_pcie_read_bios_cfg()
acpi_evaluate_dsm() should be coupled with ACPI_FREE() to free the ACPI
memory, because we need to track the allocation of acpi_object when
ACPI_DBG_TRACK_ALLOCATIONS enabled, so use ACPI_FREE() instead of kfree().
Jacob Keller [Fri, 18 Nov 2022 22:27:29 +0000 (14:27 -0800)]
ice: fix handling of burst Tx timestamps
Commit f5a718e65326 ("ice: Add low latency Tx timestamp read") refactored
PTP timestamping logic to use a threaded IRQ instead of a separate kthread.
This implementation introduced ice_misc_intr_thread_fn and redefined the
ice_ptp_process_ts function interface to return a value of whether or not
the timestamp processing was complete.
ice_misc_intr_thread_fn would take the return value from ice_ptp_process_ts
and convert it into either IRQ_HANDLED if there were no more timestamps to
be processed, or IRQ_WAKE_THREAD if the thread should continue processing.
This is not correct, as the kernel does not re-schedule threaded IRQ
functions automatically. IRQ_WAKE_THREAD can only be used by the main IRQ
function.
This results in the ice_ptp_process_ts function (and in turn the
ice_ptp_tx_tstamp function) from only being called exactly once per
interrupt.
If an application sends a burst of Tx timestamps without waiting for a
response, the interrupt will trigger for the first timestamp. However,
later timestamps may not have arrived yet. This can result in dropped or
discarded timestamps. Worse, on E822 hardware this results in the interrupt
logic getting stuck such that no future interrupts will be triggered. The
result is complete loss of Tx timestamp functionality.
Fix this by modifying the ice_misc_intr_thread_fn to perform its own
polling of the ice_ptp_process_ts function. We sleep for a few microseconds
between attempts to avoid wasting significant CPU time. The value was
chosen to allow time for the Tx timestamps to complete without wasting so
much time that we overrun application wait budgets in the worst case.
The ice_ptp_process_ts function also currently returns false in the event
that the Tx tracker is not initialized. This would result in the threaded
IRQ handler never exiting if it gets started while the tracker is not
initialized.
Fix the function to appropriately return true when the tracker is not
initialized.
Note that this will not reproduce with default ptp4l behavior, as the
program always synchronously waits for a timestamp response before sending
another timestamp request.
Reported-by: Siddaraju DH <siddaraju.dh@intel.com> Fixes: f5a718e65326 ("ice: Add low latency Tx timestamp read") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20221118222729.1565317-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 22 Nov 2022 04:50:12 +0000 (20:50 -0800)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-11-18 (iavf)
Ivan Vecera resolves issues related to reset by adding back call to
netif_tx_stop_all_queues() and adding calls to dev_close() to ensure
device is properly closed during reset.
Stefan Assmann removes waiting for setting of MAC address as this breaks
ARP.
Slawomir adds setting of __IAVF_IN_REMOVE_TASK bit to prevent deadlock
between remove and shutdown.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix race condition between iavf_shutdown and iavf_remove
iavf: remove INITIAL_MAC_SET to allow gARP to work properly
iavf: Do not restart Tx queues after reset task failure
iavf: Fix a crash during reset task
====================
====================
tipc: fix two race issues in tipc_conn_alloc
The race exists beteen tipc_topsrv_accept() and tipc_conn_close(),
one is allocating the con while the other is freeing it and there
is no proper lock protecting it. Therefore, a null-pointer-defer
and a use-after-free may be triggered, see details on each patch.
====================
Xin Long [Fri, 18 Nov 2022 21:45:01 +0000 (16:45 -0500)]
tipc: add an extra conn_get in tipc_conn_alloc
One extra conn_get() is needed in tipc_conn_alloc(), as after
tipc_conn_alloc() is called, tipc_conn_close() may free this
con before deferencing it in tipc_topsrv_accept():
This patch fixes it by holding it in tipc_conn_alloc(), then after
all accessing in tipc_topsrv_accept() releasing it. Note when does
this in tipc_topsrv_kern_subscr(), as tipc_conn_rcv_sub() returns
0 or -1 only, we don't need to check for "> 0".
Fixes: 4c838d76896f ("tipc: introduce new TIPC server infrastructure") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It was caused by !con->sock in tipc_conn_close(). In tipc_topsrv_accept(),
con is allocated in conn_idr then its sock is set:
con = tipc_conn_alloc();
... <----[1]
con->sock = newsock;
If tipc_conn_close() is called in anytime of [1], the null-pointer-def
is triggered by con->sock->sk due to con->sock is not yet set.
This patch fixes it by moving the con->sock setting to tipc_conn_alloc()
under s->idr_lock. So that con->sock can never be NULL when getting the
con from s->conn_idr. It will be also safer to move con->server and flag
CF_CONNECTED setting under s->idr_lock, as they should all be set before
tipc_conn_alloc() is called.
Fixes: 4c838d76896f ("tipc: introduce new TIPC server infrastructure") Reported-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Emeel Hakim [Sun, 30 Oct 2022 09:19:52 +0000 (11:19 +0200)]
net/mlx5e: Fix possible race condition in macsec extended packet number update routine
Currenty extended packet number (EPN) update routine is accessing
macsec object without holding the general macsec lock hence facing
a possible race condition when an EPN update occurs while updating
or deleting the SA.
Fix by holding the general macsec lock before accessing the object.
Fixes: 25a2465945e7 ("net/mlx5e: Support MACsec offload extended packet number (EPN)") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Emeel Hakim [Sun, 30 Oct 2022 09:52:42 +0000 (11:52 +0200)]
net/mlx5e: Fix MACsec update SecY
Currently updating SecY destroys and re-creates RX SA objects,
the re-created RX SA objects are not identical to the destroyed
objects and it disagree on the encryption enabled property which
holds the value false after recreation, this value is not
supported with offload which leads to no traffic after an update.
Fix by recreating an identical objects.
Fixes: e2d59ea8dae3 ("net/mlx5e: Add MACsec offload SecY support") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Emeel Hakim [Sun, 30 Oct 2022 09:43:24 +0000 (11:43 +0200)]
net/mlx5e: Fix MACsec SA initialization routine
Currently as part of MACsec SA initialization routine
extended packet number (EPN) object attribute is always
being set without checking if EPN is actually enabled,
the above could lead to a NULL dereference.
Fix by adding such a check.
Fixes: 25a2465945e7 ("net/mlx5e: Support MACsec offload extended packet number (EPN)") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tariq Toukan [Mon, 14 Nov 2022 09:56:11 +0000 (11:56 +0200)]
net/mlx5e: Remove leftovers from old XSK queues enumeration
Before the cited commit, for N channels, a dedicated set of N queues was
created to support XSK, in indices [N, 2N-1], doubling the number of
queues.
In addition, changing the number of channels was prohibited, as it would
shift the indices.
Remove these two leftovers, as we moved XSK to a new queueing scheme,
starting from index 0.
Fixes: 1e019f1e58de ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Chris Mi [Thu, 17 Nov 2022 05:45:45 +0000 (07:45 +0200)]
net/mlx5e: Offload rule only when all encaps are valid
The cited commit adds a for loop to support multiple encapsulations.
But it only checks if the last encap is valid.
Fix it by setting slow path flag when one of the encap is invalid.
Fixes: 5e6b075108dd ("net/mlx5e: Move flow attr reformat action bit to per dest flags") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Shemesh [Sat, 29 Oct 2022 06:03:48 +0000 (09:03 +0300)]
net/mlx5: Fix sync reset event handler error flow
When sync reset now event handling fails on mlx5_pci_link_toggle() then
no reset was done. However, since mlx5_cmd_fast_teardown_hca() was
already done, the firmware function is closed and the driver is left
without firmware functionality.
Fix it by setting device error state and reopen the firmware resources.
Reopening is done by the thread that was called for devlink reload
fw_activate as it already holds the devlink lock.
Fixes: 58c6894aa3af ("net/mlx5: Add support for devlink reload action fw activate") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Wed, 16 Nov 2022 09:10:15 +0000 (11:10 +0200)]
net/mlx5: E-Switch, Set correctly vport destination
The cited commit moved from using reformat_id integer to packet_reformat
pointer which introduced the possibility to null pointer dereference.
When setting packet reformat flag and pkt_reformat pointer must
exists so checking MLX5_ESW_DEST_ENCAP is not enough, we need
to make sure the pkt_reformat is valid and check for MLX5_ESW_DEST_ENCAP_VALID.
If the dest encap valid flag does not exists then pkt_reformat can be
either invalid address or null.
Also, to make sure we don't try to access invalid pkt_reformat set it to
null when invalidated and invalidate it before calling add flow code as
its logically more correct and to be safe.
Fixes: 80a1e71aaa3f ("net/mlx5: Add flow steering actions to fs_cmd shim layer") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eli Cohen [Mon, 15 Aug 2022 08:25:26 +0000 (11:25 +0300)]
net/mlx5: Lag, avoid lockdep warnings
ldev->lock is used to serialize lag change operations. Since multiport
eswtich functionality was added, we now change the mode dynamically.
However, acquiring ldev->lock is not allowed as it could possibly lead
to a deadlock as reported by the lockdep mechanism.
[ 836.154963] WARNING: possible circular locking dependency detected
[ 836.155850] 5.19.0-rc5_net_56b7df2 #1 Not tainted
[ 836.156549] ------------------------------------------------------
[ 836.157418] handler1/12198 is trying to acquire lock:
[ 836.158178] ffff888187d52b58 (&ldev->lock){+.+.}-{3:3}, at: mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
[ 836.159575]
[ 836.159575] but task is already holding lock:
[ 836.160474] ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
[ 836.161669] which lock already depends on the new lock.
[ 836.162905]
[ 836.162905] the existing dependency chain (in reverse order) is:
[ 836.164008] -> #3 (&block->cb_lock){++++}-{3:3}:
[ 836.164946] down_write+0x25/0x60
[ 836.165548] tcf_block_get_ext+0x1c6/0x5d0
[ 836.166253] ingress_init+0x74/0xa0 [sch_ingress]
[ 836.167028] qdisc_create.constprop.0+0x130/0x5e0
[ 836.167805] tc_modify_qdisc+0x481/0x9f0
[ 836.168490] rtnetlink_rcv_msg+0x16e/0x5a0
[ 836.169189] netlink_rcv_skb+0x4e/0xf0
[ 836.169861] netlink_unicast+0x190/0x250
[ 836.170543] netlink_sendmsg+0x243/0x4b0
[ 836.171226] sock_sendmsg+0x33/0x40
[ 836.171860] ____sys_sendmsg+0x1d1/0x1f0
[ 836.172535] ___sys_sendmsg+0xab/0xf0
[ 836.173183] __sys_sendmsg+0x51/0x90
[ 836.173836] do_syscall_64+0x3d/0x90
[ 836.174471] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 836.175282]
Moshe Shemesh [Thu, 17 Nov 2022 07:07:20 +0000 (09:07 +0200)]
net/mlx5: Fix handling of entry refcount when command is not issued to FW
In case command interface is down, or the command is not allowed, driver
did not increment the entry refcount, but might have decrement as part
of forced completion handling.
Fix that by always increment and decrement the refcount to make it
symmetric for all flows.
Fixes: c9bf91832346 ("net/mlx5: Avoid possible free of command entry while timeout comp handler") Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reported-by: Jack Wang <jinpu.wang@ionos.com> Tested-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Shemesh [Tue, 31 May 2022 06:14:03 +0000 (09:14 +0300)]
net/mlx5: cmdif, Print info on any firmware cmd failure to tracepoint
While moving to new CMD API (quiet API), some pre-existing flows may call the new API
function that in case of error, returns the error instead of printing it as previously done.
For such flows we bring back the print but to tracepoint this time for sys admins to
have the ability to check for errors especially for commands using the new quiet API.
Tracepoint output example:
devlink-1333 [001] ..... 822.746922: mlx5_cmd: ACCESS_REG(0x805) op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22)
Fixes: cc938f3d0a24 ("net/mlx5: cmdif, Add new api for command execution") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Shay Drory [Thu, 4 Aug 2022 09:38:41 +0000 (12:38 +0300)]
net/mlx5: SF: Fix probing active SFs during driver probe phase
When SF devices and SF port representors are located on different
functions, unloading and reloading of SF parent driver doesn't recreate
the existing SF present in the device.
Fix it by querying SFs and probe active SFs during driver probe phase.
Moshe Shemesh [Thu, 20 Oct 2022 09:25:59 +0000 (12:25 +0300)]
net/mlx5: Fix FW tracer timestamp calculation
Fix a bug in calculation of FW tracer timestamp. Decreasing one in the
calculation should effect only bits 52_7 and not effect bits 6_0 of the
timestamp, otherwise bits 6_0 are always set in this calculation.
Roy Novich [Sun, 24 Jul 2022 06:49:07 +0000 (09:49 +0300)]
net/mlx5: Do not query pci info while pci disabled
The driver should not interact with PCI while PCI is disabled. Trying to
do so may result in being unable to get vital signs during PCI reset,
driver gets timed out and fails to recover.
Fixes: 6c21b7068c31 ("net/mlx5: Print more info on pci error handlers") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Dan Carpenter [Fri, 18 Nov 2022 15:07:54 +0000 (18:07 +0300)]
octeontx2-af: cn10k: mcs: Fix copy and paste bug in mcs_bbe_intr_handler()
This code accidentally uses the RX macro twice instead of the RX and TX.
Fixes: cc9411755df0 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 18 Nov 2022 04:21:52 +0000 (20:21 -0800)]
ipv4/fib: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
Zero-length arrays are deprecated[1] and are being replaced with
flexible array members in support of the ongoing efforts to tighten the
FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing
with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3.
Replace zero-length array with flexible-array member in struct key_vector.
This results in no differences in binary output.
[1] https://github.com/KSPP/linux/issues/78
Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Díaz [Fri, 18 Nov 2022 03:44:21 +0000 (21:44 -0600)]
selftests/net: Find nettest in current directory
The `nettest` binary, built from `selftests/net/nettest.c`,
was expected to be found in the path during test execution of
`fcnal-test.sh` and `pmtu.sh`, leading to tests getting
skipped when the binary is not installed in the system, as can
be seen in these logs found in the wild [1]:
# TEST: vti4: PMTU exceptions [SKIP]
[ 350.600250] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready
[ 350.607421] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready
# 'nettest' command not found; skipping tests
# xfrm6udp not supported
# TEST: vti6: PMTU exceptions (ESP-in-UDP) [SKIP]
[ 351.605102] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready
[ 351.612243] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready
# 'nettest' command not found; skipping tests
# xfrm4udp not supported
The `unicast_extensions.sh` tests also rely on `nettest`, but
it runs fine there because it looks for the binary in the
current working directory [2]:
The same mechanism that works for the Unicast extensions tests
is here copied over to the PMTU and functional tests.
====================
The following patchset contains late Netfilter fixes for net:
1) Use READ_ONCE()/WRITE_ONCE() to update ct->mark, from Daniel Xu.
Not reported by syzbot, but I presume KASAN would trigger post
a splat on this. This is a rather old issue, predating git history.
2) Do not set up extensions for set element with end interval flag
set on. This leads to bogusly skipping this elements as expired
when listing the set/map to userspace as well as increasing
memory consumpton when stateful expressions are used. This issue
has been present since 4.18, when timeout support for rbtree set
was added.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lu Wei [Thu, 17 Nov 2022 15:07:22 +0000 (23:07 +0800)]
net: microchip: sparx5: Fix return value in sparx5_tc_setup_qdisc_ets()
Function sparx5_tc_setup_qdisc_ets() always returns negative value
because it return -EOPNOTSUPP in the end. This patch returns the
rersult of sparx5_tc_ets_add() and sparx5_tc_ets_del() directly.
Fixes: 8b09c03e2ba5 ("net: microchip: sparx5: add support for offloading ets qdisc") Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shang XiaoJing [Thu, 17 Nov 2022 11:37:14 +0000 (19:37 +0800)]
nfc: s3fwrn5: Fix potential memory leak in s3fwrn5_nci_send()
s3fwrn5_nci_send() won't free the skb when it failed for the check
before s3fwrn5_write(). As the result, the skb will memleak. Free the
skb when the check failed.
Fixes: 44e3e91fe104 ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Suggested-by: Pavel Machek <pavel@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Shang XiaoJing [Thu, 17 Nov 2022 11:37:13 +0000 (19:37 +0800)]
nfc: nxp-nci: Fix potential memory leak in nxp_nci_send()
nxp_nci_send() won't free the skb when it failed for the check before
write(). As the result, the skb will memleak. Free the skb when the
check failed.
Fixes: 36d6b11a0742 ("NFC: nxp-nci: Add support for NXP NCI chips") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Suggested-by: Pavel Machek <pavel@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Shang XiaoJing [Thu, 17 Nov 2022 11:37:12 +0000 (19:37 +0800)]
nfc: nfcmrvl: Fix potential memory leak in nfcmrvl_i2c_nci_send()
nfcmrvl_i2c_nci_send() will be called by nfcmrvl_nci_send(), and skb
should be freed in nfcmrvl_i2c_nci_send(). However, nfcmrvl_nci_send()
won't free the skb when it failed for the test_bit(). Free the skb when
test_bit() failed.
Fixes: faf2f4e38ac0 ("NFC: nfcmrvl: add i2c driver") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Suggested-by: Pavel Machek <pavel@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Fri, 18 Nov 2022 03:43:53 +0000 (11:43 +0800)]
bonding: fix ICMPv6 header handling when receiving IPv6 messages
Currently, we get icmp6hdr via function icmp6_hdr(), which needs the skb
transport header to be set first. But there is no rule to ask driver set
transport header before netif_receive_skb() and bond_handle_frame(). So
we will not able to get correct icmp6hdr on some drivers.
Fix this by using skb_header_pointer to get the IPv6 and ICMPV6 headers.
Reported-by: Liang Li <liali@redhat.com> Fixes: 5bd50b2a0a38 ("bonding: add new parameter ns_targets") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20221118034353.1285609-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 19 Nov 2022 03:41:24 +0000 (19:41 -0800)]
Merge branch 'nfp-fixes-for-v6-1'
Simon Horman says:
====================
nfp: fixes for v6.1
- Ensure that information displayed by "devlink port show"
reflects the number of lanes available to be split.
- Avoid NULL dereference in ethtool test code.
====================
Jaco Coetzee [Thu, 17 Nov 2022 15:37:44 +0000 (16:37 +0100)]
nfp: add port from netdev validation for EEPROM access
Setting of the port flag `NFP_PORT_CHANGED`, introduced
to ensure the correct reading of EEPROM data, causes a
fatal kernel NULL pointer dereference in cases where
the target netdev type cannot be determined.
Add validation of port struct pointer before attempting
to set the `NFP_PORT_CHANGED` flag. Return that operation
is not supported if the netdev type cannot be determined.
Fixes: 04bc9617883c ("nfp: ethtool: fix the display error of `ethtool -m DEVNAME`") Signed-off-by: Jaco Coetzee <jaco.coetzee@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diana Wang [Thu, 17 Nov 2022 15:37:43 +0000 (16:37 +0100)]
nfp: fill splittable of devlink_port_attrs correctly
The error is reflected in that it shows wrong splittable status of
port when executing "devlink port show".
The reason which leads the error is that the assigned operation of
splittable is just a simple negation operation of split and it does
not consider port lanes quantity. A splittable port should have
several lanes that can be split(lanes quantity > 1).
If without the judgement, it will show wrong message for some
firmware, such as 2x25G, 2x10G.
Fixes: ebd06ce72133 ("devlink: Add a new devlink port split ability attribute and pass to netlink") Signed-off-by: Diana Wang <na.wang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yang Yingliang [Thu, 17 Nov 2022 13:51:48 +0000 (21:51 +0800)]
net: pch_gbe: fix pci device refcount leak while module exiting
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().
In pch_gbe_probe(), pci_get_domain_bus_and_slot() is called,
so in error path in probe() and remove() function, pci_dev_put()
should be called to avoid refcount leak. Compile tested only.
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().
So before returning from rvu_dbg_rvu_pf_cgx_map_display() or
cgx_print_dmac_flt(), pci_dev_put() is called to avoid refcount
leak.
Fixes: 32356a740212 ("octeontx2-af: Debugfs support for DMAC filters") Fixes: 73523dded6f3 ("octeontx2-af: Display CGX, NIX and PF map in debugfs.") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221117124658.162409-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Liu Jian [Thu, 17 Nov 2022 11:13:56 +0000 (19:13 +0800)]
net: ethernet: mtk_eth_soc: fix error handling in mtk_open()
If mtk_start_dma() fails, invoke phylink_disconnect_phy() to perform
cleanup. phylink_disconnect_phy() contains the put_device action. If
phylink_disconnect_phy is not performed, the Kref of netdev will leak.
Fixes: 78450bb0f2fd ("net: ethernet: mediatek: Add basic PHYLINK support") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20221117111356.161547-1-liujian56@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Slawomir Laba [Thu, 3 Nov 2022 13:00:03 +0000 (14:00 +0100)]
iavf: Fix race condition between iavf_shutdown and iavf_remove
Fix a deadlock introduced by commit 2d25b6bf5966 ("iavf: Add waiting so the port is initialized in remove")
due to race condition between iavf_shutdown and iavf_remove, where
iavf_remove stucks forever in while loop since iavf_shutdown already
set __IAVF_REMOVE adapter state.
Fix this by checking if the __IAVF_IN_REMOVE_TASK has already been
set and return if so.
Fixes: 2d25b6bf5966 ("iavf: Add waiting so the port is initialized in remove") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stefan Assmann [Thu, 10 Nov 2022 14:14:44 +0000 (15:14 +0100)]
iavf: remove INITIAL_MAC_SET to allow gARP to work properly
IAVF_FLAG_INITIAL_MAC_SET prevents waiting on iavf_is_mac_set_handled()
the first time the MAC is set. This breaks gratuitous ARP because the
MAC address has not been updated yet when the gARP packet is sent out.
Current behaviour:
$ echo 1 > /sys/class/net/ens4f0/device/sriov_numvfs
iavf 0000:88:02.0: MAC address: ee:04:19:14:ec:ea
$ ip addr add 192.168.1.1/24 dev ens4f0v0
$ ip link set dev ens4f0v0 up
$ echo 1 > /proc/sys/net/ipv4/conf/ens4f0v0/arp_notify
$ ip link set ens4f0v0 addr 00:11:22:33:44:55
07:23:41.676611 ee:04:19:14:ec:ea > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.1, length 28
With IAVF_FLAG_INITIAL_MAC_SET removed:
$ echo 1 > /sys/class/net/ens4f0/device/sriov_numvfs
iavf 0000:88:02.0: MAC address: 3e:8a:16:a2:37:6d
$ ip addr add 192.168.1.1/24 dev ens4f0v0
$ ip link set dev ens4f0v0 up
$ echo 1 > /proc/sys/net/ipv4/conf/ens4f0v0/arp_notify
$ ip link set ens4f0v0 addr 00:11:22:33:44:55
07:28:01.836608 00:11:22:33:44:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.1, length 28
Fixes: 6fb54b1682cb ("iavf: Add waiting for response from PF in set mac") Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Ivan Vecera [Tue, 8 Nov 2022 10:25:02 +0000 (11:25 +0100)]
iavf: Do not restart Tx queues after reset task failure
After commit ba3dd0a75731 ("iavf: Detach device during reset task")
the device is detached during reset task and re-attached at its end.
The problem occurs when reset task fails because Tx queues are
restarted during device re-attach and this leads later to a crash.
To resolve this issue properly close the net device in cause of
failure in reset task to avoid restarting of tx queues at the end.
Also replace the hacky manipulation with IFF_UP flag by device close
that clears properly both IFF_UP and __LINK_STATE_START flags.
In these case iavf_close() does not do anything because the adapter
state is already __IAVF_DOWN.
Reproducer:
1) Run some Tx traffic (e.g. iperf3) over iavf interface
2) Set VF trusted / untrusted in loop
[root@host ~]# cat repro.sh
PF=enp65s0f0
IF=${PF}v0
ip link set up $IF
ip addr add 192.168.0.2/24 dev $IF
sleep 1
Fixes: ba3dd0a75731 ("iavf: Detach device during reset task") Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com> Cc: SlawomirX Laba <slawomirx.laba@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Ivan Vecera [Tue, 8 Nov 2022 09:35:34 +0000 (10:35 +0100)]
iavf: Fix a crash during reset task
Recent commit ba3dd0a75731 ("iavf: Detach device during reset task")
removed netif_tx_stop_all_queues() with an assumption that Tx queues
are already stopped by netif_device_detach() in the beginning of
reset task. This assumption is incorrect because during reset
task a potential link event can start Tx queues again.
Revert this change to fix this issue.
Reproducer:
1. Run some Tx traffic (e.g. iperf3) over iavf interface
2. Switch MTU of this interface in a loop
netfilter: nf_tables: do not set up extensions for end interval
Elements with an end interval flag set on do not store extensions. The
global set definition is currently setting on the timeout and stateful
expression for end interval elements.
This leads to skipping end interval elements from the set->ops->walk()
path as the expired check bogusly reports true.
Moreover, do not set up stateful expressions for elements with end
interval flag set on since this is never used.
Fixes: 480e4ab40f55 ("netfilter: nf_tables: allow to specify stateful expression in set definition") Fixes: de589f0d3172 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Daniel Xu [Wed, 9 Nov 2022 19:39:07 +0000 (12:39 -0700)]
netfilter: conntrack: Fix data-races around ct mark
nf_conn:mark can be read from and written to in parallel. Use
READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted
compiler optimizations.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Wang Hai [Thu, 17 Nov 2022 06:55:27 +0000 (14:55 +0800)]
net: pch_gbe: fix potential memleak in pch_gbe_tx_queue()
In pch_gbe_xmit_frame(), NETDEV_TX_OK will be returned whether
pch_gbe_tx_queue() sends data successfully or not, so pch_gbe_tx_queue()
needs to free skb before returning. But pch_gbe_tx_queue() returns without
freeing skb in case of dma_map_single() fails. Add dev_kfree_skb_any()
to fix it.
Fixes: d818c768e816 ("net: Add Gigabit Ethernet driver of Topcliff PCH") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lin Ma [Wed, 16 Nov 2022 13:02:49 +0000 (21:02 +0800)]
nfc/nci: fix race with opening and closing
Previously we leverage NCI_UNREG and the lock inside nci_close_device to
prevent the race condition between opening a device and closing a
device. However, it still has problem because a failed opening command
will erase the NCI_UNREG flag and allow another opening command to
bypass the status checking.
This fix corrects that by making sure the NCI_UNREG is held.
Reported-by: syzbot+43475bf3cfbd6e41f5b7@syzkaller.appspotmail.com Fixes: 854468422e4d ("NFC: add NCI_UNREG flag to eliminate the race") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 16 Nov 2022 10:06:53 +0000 (12:06 +0200)]
net: dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus
You'd think people know that the internal 100BASE-TX PHY on the SJA1110
responds only to clause 22 MDIO transactions, but they don't :)
When a clause 45 transaction is attempted, sja1105_base_tx_mdio_read()
and sja1105_base_tx_mdio_write() don't expect "reg" to contain bit 30
set (MII_ADDR_C45) and pack this value into the SPI transaction buffer.
But the field in the SPI buffer has a width smaller than 30 bits, so we
see this confusing message from the packing() API rather than a proper
rejection of C45 transactions:
Fixes: 56509c9467b5 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Wed, 16 Nov 2022 14:02:28 +0000 (14:02 +0000)]
rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]
After rxrpc_unbundle_conn() has removed a connection from a bundle, it
checks to see if there are any conns with available channels and, if not,
removes and attempts to destroy the bundle.
Whilst it does check after grabbing client_bundles_lock that there are no
connections attached, this races with rxrpc_look_up_bundle() retrieving the
bundle, but not attaching a connection for the connection to be attached
later.
There is therefore a window in which the bundle can get destroyed before we
manage to attach a new connection to it.
Fix this by adding an "active" counter to struct rxrpc_bundle:
(1) rxrpc_connect_call() obtains an active count by prepping/looking up a
bundle and ditches it before returning.
(2) If, during rxrpc_connect_call(), a connection is added to the bundle,
this obtains an active count, which is held until the connection is
discarded.
(3) rxrpc_deactivate_bundle() is created to drop an active count on a
bundle and destroy it when the active count reaches 0. The active
count is checked inside client_bundles_lock() to prevent a race with
rxrpc_look_up_bundle().
(4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle().
Fixes: f3133506e603 ("rxrpc: Rewrite the client connection manager") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975 Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: zdi-disclosures@trendmicro.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Yufen [Thu, 17 Nov 2022 02:45:03 +0000 (10:45 +0800)]
selftests/net: fix missing xdp_dummy
After commit 0213486b9839 ("selftests/bpf: Store BPF object files with
.bpf.o extension"), we should use xdp_dummy.bpf.o instade of xdp_dummy.o.
In addition, use the BPF_FILE variable to save the BPF object file name,
which can be better identified and modified.
Fixes: 0213486b9839 ("selftests/bpf: Store BPF object files with .bpf.o extension") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Cc: Daniel Müller <deso@posteo.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Wed, 16 Nov 2022 01:19:14 +0000 (17:19 -0800)]
ipvlan: hold lower dev to avoid possible use-after-free
Recently syzkaller discovered the issue of disappearing lower
device (NETDEV_UNREGISTER) while the virtual device (like
macvlan) is still having it as a lower device. So it's just
a matter of time similar discovery will be made for IPvlan
device setup. So fixing it preemptively. Also while at it,
add a refcount tracker.
Fixes: 35b69229bc37 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit cc5a1afd02ef ("neighbour: make proxy_queue.qlen limit
per-device") introduced the length counter qlen in struct neigh_parms.
There are separate neigh_parms instances for IPv4/ARP and IPv6/ND, and
while the family specific qlen is incremented in pneigh_enqueue(), the
mentioned commit decrements always the IPv4/ARP specific qlen,
regardless of the currently processed family, in pneigh_queue_purge()
and neigh_proxy_process().
As a result, with IPv6/ND, the family specific qlen is only incremented
(and never decremented) until it exceeds PROXY_QLEN, and then, according
to the check in pneigh_enqueue(), neighbor solicitations are not
answered anymore. As an example, this is noted when using the
subnet-router anycast address to access a Linux router. After a certain
amount of time (in the observed case, qlen exceeded PROXY_QLEN after two
days), the Linux router stops answering neighbor solicitations for its
subnet-router anycast address and effectively becomes unreachable.
Another result with IPv6/ND is that the IPv4/ARP specific qlen is
decremented more often than incremented. This leads to negative qlen
values, as a signed integer has been used for the length counter qlen,
and potentially to an integer overflow.
Fix this by introducing the helper function neigh_parms_qlen_dec(),
which decrements the family specific qlen. Thereby, make use of the
existing helper function neigh_get_dev_parms_rcu(), whose definition
therefore needs to be placed earlier in neighbour.c. Take the family
member from struct neigh_table to determine the currently processed
family and appropriately call neigh_parms_qlen_dec() from
pneigh_queue_purge() and neigh_proxy_process().
Additionally, use an unsigned integer for the length counter qlen.
Fixes: cc5a1afd02ef ("neighbour: make proxy_queue.qlen limit per-device") Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Tue, 15 Nov 2022 17:34:39 +0000 (19:34 +0200)]
net: liquidio: simplify if expression
Fix the warning reported by kbuild:
cocci warnings: (new ones prefixed by >>)
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:1797:54-56: WARNING !A || A && B is equivalent to !A || B
drivers/net/ethernet/cavium/liquidio/lio_main.c:1827:54-56: WARNING !A || A && B is equivalent to !A || B
Fixes: a9e550fbba5e ("net: liquidio: release resources when liquidio driver open failed") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts [Tue, 15 Nov 2022 22:10:45 +0000 (14:10 -0800)]
selftests: mptcp: run mptcp_sockopt from a new netns
Not running it from a new netns causes issues if some MPTCP settings are
modified, e.g. if MPTCP is disabled from the sysctl knob, if multiple
addresses are available and added to the MPTCP path-manager, etc.
In these cases, the created connection will not behave as expected, e.g.
unable to create an MPTCP socket, more than one subflow is seen, etc.
A new "sandbox" net namespace is now created and used to run
mptcp_sockopt from this controlled environment.
Fixes: 65c567cfaa02 ("selftests: mptcp: add mptcp getsockopt test cases") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 15 Nov 2022 22:10:44 +0000 (14:10 -0800)]
selftests: mptcp: gives slow test-case more time
On slow or busy VM, some test-cases still fail because the
data transfer completes before the endpoint manipulation
actually took effect.
Address the issue by artificially increasing the runtime for
the relevant test-cases.
Fixes: 89b559d8fcab ("selftests: mptcp: signal addresses testcases") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/309 Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Tue, 15 Nov 2022 14:24:00 +0000 (22:24 +0800)]
net: use struct_group to copy ip/ipv6 header addresses
kernel test robot reported warnings when build bonding module with
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/:
from ../drivers/net/bonding/bond_main.c:35:
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is because we try to copy the whole ip/ip6 address to the flow_key,
while we only point the to ip/ip6 saddr. Note that since these are UAPI
headers, __struct_group() is used to avoid the compiler warnings.
Reported-by: kernel test robot <lkp@intel.com> Fixes: 56c7592b4aad ("net: Add full IPv6 addresses to flow_keys") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
An external PHY needs settling time after power up or reset.
In the bind() function an mdio bus is registered. If at this point
the external PHY is still initialising, no valid PHY ID will be
read and on phy_find_first() the bind() function will fail.
If an external PHY is present, wait the maximum time specified
in 802.3 45.2.7.1.1.
Linus Torvalds [Wed, 16 Nov 2022 18:49:06 +0000 (10:49 -0800)]
Merge tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two trivial cleanups, and three simple fixes"
* tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/platform-pci: use define instead of literal number
xen/platform-pci: add missing free_irq() in error path
xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
xen/pcpu: fix possible memory leak in register_pcpu()
x86/xen: Use kstrtobool() instead of strtobool()
Linus Torvalds [Wed, 16 Nov 2022 18:40:00 +0000 (10:40 -0800)]
Merge tag 'pinctrl-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Aere is a hopefully final round of pin control fixes. Nothing special,
driver fixes and we caught a potential NULL pointer exception.
- Fix a potential NULL dereference in the core!
- Fix all pin mux routes in the Rockchop PX30 driver
- Fix the UFS pins in the Qualcomm SC8280XP driver
- Fix bias disabling in the Mediatek driver
- Fix debounce time settings in the Mediatek driver"
* tag 'pinctrl-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: mediatek: Export debounce time tables
pinctrl: mediatek: Fix EINT pins input debounce time configuration
pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map
pinctrl: mediatek: common-v2: Fix bias-disable for PULL_PU_PD_RSEL_TYPE
pinctrl: qcom: sc8280xp: Rectify UFS reset pins
pinctrl: rockchip: list all pins in a possible mux route for PX30
Linus Torvalds [Wed, 16 Nov 2022 18:36:13 +0000 (10:36 -0800)]
Merge tag 'platform-drivers-x86-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- Surface Pro 9 and Surface Laptop 5 kbd, battery, etc support (this
is just a few hw-id additions)
- A couple of other hw-id / DMI-quirk additions
- A few small bug fixes + 1 build fix
* tag 'platform-drivers-x86-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: ideapad-laptop: Add module parameters to match DMI quirk tables
platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga laptops
platform/x86: hp-wmi: Ignore Smart Experience App event
platform/surface: aggregator_registry: Add support for Surface Laptop 5
platform/surface: aggregator_registry: Add support for Surface Pro 9
platform/surface: aggregator: Do not check for repeated unsequenced packets
platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017)
platform/x86: asus-wmi: add missing pci_dev_put() in asus_wmi_set_xusb2pr()
platform/x86/intel: pmc: Don't unconditionally attach Intel PMC when virtualized
platform/x86: thinkpad_acpi: Enable s2idle quirk for 21A1 machine type
platform/x86/amd: pmc: Add new ACPI ID AMDI0009
platform/x86/amd: pmc: Remove more CONFIG_DEBUG_FS checks
Gleb Mazovetskiy [Mon, 14 Nov 2022 22:56:16 +0000 (22:56 +0000)]
tcp: configurable source port perturb table size
On embedded systems with little memory and no relevant
security concerns, it is beneficial to reduce the size
of the table.
Reducing the size from 2^16 to 2^8 saves 255 KiB
of kernel RAM.
Makes the table size configurable as an expert option.
The size was previously increased from 2^8 to 2^16
in commit 144394530265 ("tcp: increase source port perturb table to
2^16").
Signed-off-by: Gleb Mazovetskiy <glex.spb@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Sitnicki [Mon, 14 Nov 2022 19:16:19 +0000 (20:16 +0100)]
l2tp: Serialize access to sk_user_data with sk_callback_lock
sk->sk_user_data has multiple users, which are not compatible with each
other. Writers must synchronize by grabbing the sk->sk_callback_lock.
l2tp currently fails to grab the lock when modifying the underlying tunnel
socket fields. Fix it by adding appropriate locking.
We err on the side of safety and grab the sk_callback_lock also inside the
sk_destruct callback overridden by l2tp, even though there should be no
refs allowing access to the sock at the time when sk_destruct gets called.
v4:
- serialize write to sk_user_data in l2tp sk_destruct
v3:
- switch from sock lock to sk_callback_lock
- document write-protection for sk_user_data
v2:
- update Fixes to point to origin of the bug
- use real names in Reported/Tested-by tags
Cc: Tom Parkin <tparkin@katalix.com> Fixes: 72ecdb28cfd7 ("[L2TP]: PPP over L2TP driver core") Reported-by: Haowei Yan <g1042620637@gmail.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuan Can [Mon, 14 Nov 2022 14:22:25 +0000 (14:22 +0000)]
net: thunderbolt: Fix error handling in tbnet_init()
A problem about insmod thunderbolt-net failed is triggered with following
log given while lsmod does not show thunderbolt_net:
insmod: ERROR: could not insert module thunderbolt-net.ko: File exists
The reason is that tbnet_init() returns tb_register_service_driver()
directly without checking its return value, if tb_register_service_driver()
failed, it returns without removing property directory, resulting the
property directory can never be created later.
Fix by remove property directory when tb_register_service_driver() returns
error.
Fixes: 8231206ef815 ("net: Add support for networking over Thunderbolt cable") Signed-off-by: Yuan Can <yuancan@huawei.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2022 09:10:29 +0000 (09:10 +0000)]
Merge branch 'microchip-fixes'
Shang XiaoJing says:
====================
net: microchip: Fix potential null-ptr-deref due to create_singlethread_workqueue()
There are some functions call create_singlethread_workqueue() without
checking ret value, and the NULL workqueue_struct pointer may causes
null-ptr-deref. Will be fixed by this patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hans de Goede [Tue, 15 Nov 2022 19:34:00 +0000 (20:34 +0100)]
platform/x86: ideapad-laptop: Add module parameters to match DMI quirk tables
Add module parameters to allow setting the hw_rfkill_switch and
set_fn_lock_led feature flags for testing these on laptops which are not
on the DMI-id based allow lists for these 2 flags.
Arnav Rawat [Fri, 11 Nov 2022 14:32:09 +0000 (14:32 +0000)]
platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga laptops
Commit f27f4abba6ae ("platform/x86: ideapad-laptop: Fix Legion 5 Fn lock
LED") uses the WMI event-id for the fn-lock event on some Legion 5 laptops
to manually toggle the fn-lock LED because the EC does not do it itself.
However, the same WMI ID is also sent on some Yoga laptops. Here, setting
the fn-lock state is not valid behavior, and causes the EC to spam
interrupts until the laptop is rebooted.
Add a set_fn_lock_led_list[] DMI-id list and only enable the workaround to
manually set the LED on models on this list.
Maximilian Luz [Tue, 15 Nov 2022 23:14:40 +0000 (00:14 +0100)]
platform/surface: aggregator_registry: Add support for Surface Laptop 5
Add device nodes to enable support for battery and charger status, the
ACPI platform profile, as well as internal HID devices (including
touchpad and keyboard) on the Surface Laptop 5.
Vladimir Oltean [Mon, 14 Nov 2022 14:35:51 +0000 (16:35 +0200)]
net: dsa: don't leak tagger-owned storage on switch driver unbind
In the initial commit bf2637eecc71 ("net: dsa: introduce tagger-owned
storage for private and shared data"), we had a call to
tag_ops->disconnect(dst) issued from dsa_tree_free(), which is called at
tree teardown time.
There were problems with connecting to a switch tree as a whole, so this
got reworked to connecting to individual switches within the tree. In
this process, tag_ops->disconnect(ds) was made to be called only from
switch.c (cross-chip notifiers emitted as a result of dynamic tag proto
changes), but the normal driver teardown code path wasn't replaced with
anything.
Solve this problem by adding a function that does the opposite of
dsa_switch_setup_tag_protocol(), which is called from the equivalent
spot in dsa_switch_teardown(). The positioning here also ensures that we
won't have any use-after-free in tagging protocol (*rcv) ops, since the
teardown sequence is as follows:
dsa_tree_teardown
-> dsa_tree_teardown_master
-> dsa_master_teardown
-> unsets master->dsa_ptr, making no further packets match the
ETH_P_XDSA packet type handler
-> dsa_tree_teardown_ports
-> dsa_port_teardown
-> dsa_slave_destroy
-> unregisters DSA net devices, there is even a synchronize_net()
in unregister_netdevice_many()
-> dsa_tree_teardown_switches
-> dsa_switch_teardown
-> dsa_switch_teardown_tag_protocol
-> finally frees the tagger-owned storage
Fixes: 0ec0bbfc69f8 ("net: dsa: make tagging protocols connect to individual switches from a tree") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221114143551.1906361-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>