xsk: Fix possible crash when multiple sockets are created
Fix a crash that happens if an Rx only socket is created first, then a
second socket is created that is Tx only and bound to the same umem as
the first socket and also the same netdev and queue_id together with the
XDP_SHARED_UMEM flag. In this specific case, the tx_descs array page
pool was not created by the first socket as it was an Rx only socket.
When the second socket is bound it needs this tx_descs array of this
shared page pool as it has a Tx component, but unfortunately it was
never allocated, leading to a crash. Note that this array is only used
for zero-copy drivers using the batched Tx APIs, currently only ice and
i40e.
Detect such case during bind() and allocate this memory region via newly
introduced xp_alloc_tx_descs(). Also, use kvcalloc instead of kcalloc as
for other buffer pool allocations, so that it matches the kvfree() from
xp_destroy().
Fixes: 65e837d98a40 ("i40e: xsk: Move tmp desc array from driver to pool") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220425153745.481322-1-maciej.fijalkowski@intel.com
Adam Zabrocki [Fri, 22 Apr 2022 16:40:27 +0000 (18:40 +0200)]
kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set
The recent kernel change in 7df43f83baa8 ("kprobes: Use rethook for kretprobe
if possible"), introduced a potential NULL pointer dereference bug in the
KRETPROBE mechanism. The official Kprobes documentation defines that "Any or
all handlers can be NULL". Unfortunately, there is a missing return handler
verification to fulfill these requirements and can result in a NULL pointer
dereference bug.
This patch adds such verification in kretprobe_rethook_handler() function.
Fixes: 7df43f83baa8 ("kprobes: Use rethook for kretprobe if possible") Signed-off-by: Adam Zabrocki <pi3@pi3.com.pl> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Anil S. Keshavamurthy <anil.s.keshavamurthy@intel.com> Link: https://lore.kernel.org/bpf/20220422164027.GA7862@pi3.com.pl
bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
xmit_check_hhlen() observes the dst for getting the device hard header
length to make sure a modified packet can fit. When a helper which changes
the dst - such as bpf_skb_set_tunnel_key() - is called as part of the
xmit program the accessed dst is no longer valid.
bpf: Fix release of page_pool in BPF_PROG_RUN in test runner
The live packet mode in BPF_PROG_RUN allocates a page_pool instance for
each test run instance and uses it for the packet data. On setup it creates
the page_pool, and calls xdp_reg_mem_model() to allow pages to be returned
properly from the XDP data path. However, xdp_reg_mem_model() also raises
the reference count of the page_pool itself, so the single
page_pool_destroy() count on teardown was not enough to actually release
the pool. To fix this, add an additional xdp_unreg_mem_model() call on
teardown.
Fixes: c23b8c0ec3f7 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Reported-by: Freysteinn Alfredsson <freysteinn.alfredsson@kau.se> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220409213053.3117305-1-toke@redhat.com
While checking AF_XDP copy mode combined with busy poll, strange
results were observed. rxdrop and txonly scenarios worked fine, but
l2fwd broke immediately.
After a deeper look, it turned out that for l2fwd, Tx side was exiting
early due to xsk_no_wakeup() returning true and in the end
xsk_generic_xmit() was never called. Note that AF_XDP Tx in copy mode
is syscall steered, so the current behavior is broken.
Txonly scenario only worked due to the fact that
sk_mark_napi_id_once_xdp() was never called - since Rx side is not in
the picture for this case and mentioned function is called in
xsk_rcv_check(), sk::sk_napi_id was never set, which in turn meant that
xsk_no_wakeup() was returning false (see the sk->sk_napi_id >=
MIN_NAPI_ID check in there).
To fix this, prefer busy poll in xsk_sendmsg() only when zero copy is
enabled on a given AF_XDP socket. By doing so, busy poll in copy mode
would not exit early on Tx side and eventually xsk_generic_xmit() will
be called.
drivers: net: slip: fix NPD bug in sl_tx_timeout()
When a slip driver is detaching, the slip_close() will act to
cleanup necessary resources and sl->tty is set to NULL in
slip_close(). Meanwhile, the packet we transmit is blocked,
sl_tx_timeout() will be called. Although slip_close() and
sl_tx_timeout() use sl->lock to synchronize, we don`t judge
whether sl->tty equals to NULL in sl_tx_timeout() and the
null pointer dereference bug will happen.
We've added 8 non-merge commits during the last 8 day(s) which contain
a total of 9 files changed, 139 insertions(+), 36 deletions(-).
The main changes are:
1) rethook related fixes, from Jiri and Masami.
2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.
3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
bpf: selftests: Test fentry tracing a struct_ops program
bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
rethook: Fix to use WRITE_ONCE() for rethook:: Handler
selftests/bpf: Fix warning comparing pointer to 0
bpf: Fix sparse warnings in kprobe_multi_resolve_syms
bpftool: Explicit errno handling in skeletons
====================
bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
The previous commit fixed support for dual-stack sockets in
bpf_tcp_check_syncookie. This commit adjusts the selftest to verify the
fixed functionality.
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
bpf_tcp_gen_syncookie looks at the IP version in the IP header and
validates the address family of the socket. It supports IPv4 packets in
AF_INET6 dual-stack sockets.
On the other hand, bpf_tcp_check_syncookie looks only at the address
family of the socket, ignoring the real IP version in headers, and
validates only the packet size. This implementation has some drawbacks:
1. Packets are not validated properly, allowing a BPF program to trick
bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
socket.
2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
up receiving a SYNACK with the cookie, but the following ACK gets
dropped.
This patch fixes these issues by changing the checks in
bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
version from the header is taken into account, and it is validated
properly with address family.
Fixes: a02ecae654cb ("bpf: add helper to check for a valid SYN cookie") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Arthur Fabre <afabre@cloudflare.com> Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
aqc111_rx_fixup() contains several out-of-bounds accesses that can be
triggered by a malicious (or defective) USB device, in particular:
- The metadata array (desc_offset..desc_offset+2*pkt_count) can be out of bounds,
causing OOB reads and (on big-endian systems) OOB endianness flips.
- A packet can overlap the metadata array, causing a later OOB
endianness flip to corrupt data used by a cloned SKB that has already
been handed off into the network stack.
- A packet SKB can be constructed whose tail is far beyond its end,
causing out-of-bounds heap data to be considered part of the SKB's
data.
Found doing variant analysis. Tested it with another driver (ax88179_178a), since
I don't have a aqc111 device to test it, but the code looks very similar.
Signed-off-by: Marcin Kozlowski <marcinguy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qede_build_skb() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.
This results in a kernel panic because the skb to reserve is NULL.
Add a check in case build_skb() failed to allocate and return NULL.
The NULL return is handled correctly in callers to qede_build_skb().
Fixes: c84f1b7b0321b ("qede: Add build_skb() support.") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Move it to the CONFIG_IPV6_PIMSM_V2 scope where its used.
Fixes: 38668f544876 ("net: ip6mr: add support for passing full packet on wrong mif") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 6 Apr 2022 14:03:50 +0000 (15:03 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-04-05
Maciej Fijalkowski says:
We were solving issues around AF_XDP busy poll's not-so-usual scenarios,
such as very big busy poll budgets applied to very small HW rings. This
set carries the things that were found during that work that apply to
net tree.
One thing that was fixed for all in-tree ZC drivers was missing on ice
side all the time - it's about syncing RCU before destroying XDP
resources. Next one fixes the bit that is checked in ice_xsk_wakeup and
third one avoids false setting of DD bits on Tx descriptors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver doesn't support clause 45 register access yet, but doesn't
check if the access is a c45 one either. This leads to spurious register
reads and writes. Add the check.
Fixes: 0096bbe70959 ("net: phy: mscc-miim: Add MDIO driver") Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 6 Apr 2022 12:54:52 +0000 (13:54 +0100)]
Merge branch 'axienet-broken-link'
Andy Chiu says:
====================
Fix broken link on Xilinx's AXI Ethernet in SGMII mode
The Ethernet driver use phy-handle to reference the PCS/PMA PHY. This
could be a problem if one wants to configure an external PHY via phylink,
since it use the same phandle to get the PHY. To fix this, introduce a
dedicated pcs-handle to point to the PCS/PMA PHY and deprecate the use
of pointing it with phy-handle. A similar use case of pcs-handle can be
seen on dpaa2 as well.
--- patch v5 ---
- Re-apply the v4 patch on the net tree.
- Describe the pcs-handle DT binding at ethernet-controller level.
--- patch v6 ---
- Remove "preferrably" to clearify usage of pcs_handle.
--- patch v7 ---
- Rebase the patch on latest net/master
--- patch v8 ---
- Rebase the patch on net-next/master
- Add "reviewed-by" tag in PATCH 3/4: dt-bindings: net: add pcs-handle
attribute
- Remove "fix" tag in last commit message since this is not a critical
bug and will not be back ported to stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Tue, 5 Apr 2022 09:19:29 +0000 (17:19 +0800)]
net: axiemac: use a phandle to reference pcs_phy
In some SGMII use cases where both a fixed link external PHY and the
internal PCS/PMA PHY need to be configured, we should explicitly use a
phandle "pcs-phy" to get the reference to the PCS/PMA PHY. Otherwise, the
driver would use "phy-handle" in the DT as the reference to both the
external and the internal PCS/PMA PHY.
In other cases where the core is connected to a SFP cage, we could still
point phy-handle to the intenal PCS/PMA PHY, and let the driver connect
to the SFP module, if exist, via phylink.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Tue, 5 Apr 2022 09:19:28 +0000 (17:19 +0800)]
dt-bindings: net: add pcs-handle attribute
Document the new pcs-handle attribute to support connecting to an
external PHY. For Xilinx's AXI Ethernet, this is used when the core
operates in SGMII or 1000Base-X modes and links through the internal
PCS/PMA PHY.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Tue, 5 Apr 2022 09:19:27 +0000 (17:19 +0800)]
net: axienet: factor out phy_node in struct axienet_local
the struct member `phy_node` of struct axienet_local is not used by the
driver anymore after initialization. It might be a remnent of old code
and could be removed.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Tue, 5 Apr 2022 09:19:26 +0000 (17:19 +0800)]
net: axienet: setup mdio unconditionally
The call to axienet_mdio_setup should not depend on whether "phy-node"
pressents on the DT. Besides, since `lp->phy_node` is used if PHY is in
SGMII or 100Base-X modes, move it into the if statement. And the next patch
will remove `lp->phy_node` from driver's private structure and do an
of_node_put on it right away after use since it is not used elsewhere.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, xdp tx_queue can get used before initialization.
1. interface up/down
2. ring buffer size change
When CPU cores are lower than maximum number of channels of sfc driver,
it creates new channels only for XDP.
When an interface is up or ring buffer size is changed, all channels
are initialized.
But xdp channels are always initialized later.
So, the below scenario is possible.
Packets are received to rx queue of normal channels and it is acted
XDP_TX and tx_queue of xdp channels get used.
But these tx_queues are not initialized yet.
If so, TX DMA or queue error occurs.
In order to avoid this problem.
1. initializes xdp tx_queues earlier than other rx_queue in
efx_start_channels().
2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers().
Splat looks like:
sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250
sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL)
sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22
(raw=22) arg=789
sfc 0000:08:00.1 enp8s0f1np1: has been disabled
Fixes: 97013b4d3a6f ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)") Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: a5d3a0504782 ("rxrpc: Fix firewall route keepalive") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: linux-afs@lists.infradead.org Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
While parsing user-provided actions, openvswitch module may dynamically
allocate memory and store pointers in the internal copy of the actions.
So this memory has to be freed while destroying the actions.
Currently there are only two such actions: ct() and set(). However,
there are many actions that can hold nested lists of actions and
ovs_nla_free_flow_actions() just jumps over them leaking the memory.
For example, removal of the flow with the following actions will lead
to a leak of the memory allocated by nf_ct_tmpl_alloc():
actions:clone(ct(commit),0)
Non-freed set() action may also leak the 'dst' structure for the
tunnel info including device references.
Under certain conditions with a high rate of flow rotation that may
cause significant memory leak problem (2MB per second in reporter's
case). The problem is also hard to mitigate, because the user doesn't
have direct control over the datapath flows generated by OVS.
Fix that by iterating over all the nested actions and freeing
everything that needs to be freed recursively.
New build time assertion should protect us from this problem if new
actions will be added in the future.
Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
attributes has to be explicitly checked. sample() and clone() actions
are mixing extra attributes into the user-provided action list. That
prevents some code generalization too.
Fixes: 2183b9186077 ("openvswitch: Make tunnel set action attach a metadata dst") Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html Reported-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 5 Apr 2022 00:04:04 +0000 (02:04 +0200)]
net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()
There is often not a MAC address available in an EEPROM accessible by
Linux with Marvell devices. Instead the bootload has the MAC address
and directly programs it into the hardware. So don't consider an error
from of_get_mac_address() has fatal. However, the check was added for
the case where there is a MAC address in an the EEPROM, but the EEPROM
has not probed yet, and -EPROBE_DEFER is returned. In that case the
error should be returned. So make the check specific to this error
code.
Cc: Mauri Sandberg <maukka@ext.kapsi.fi> Reported-by: Thomas Walther <walther-it@gmx.de> Fixes: 73a49517f13c ("net: mv643xx_eth: process retval from of_get_mac_address") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220405000404.3374734-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: openvswitch: don't send internal clone attribute to the userspace.
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for
performance optimization inside the kernel. It's added by the kernel
while parsing user-provided actions and should not be sent during the
flow dump as it's not part of the uAPI.
The issue doesn't cause any significant problems to the ovs-vswitchd
process, because reported actions are not really used in the
application lifecycle and only supposed to be shown to a human via
ovs-dpctl flow dump. However, the action list is still incorrect
and causes the following error if the user wants to look at the
datapath flows:
Currently when XDP rings are created, each descriptor gets its DD bit
set, which turns out to be the wrong approach as it can lead to a
situation where more descriptors get cleaned than it was supposed to,
e.g. when AF_XDP busy poll is run with a large batch size. In this
situation, the driver would request for more buffers than it is able to
handle.
Fix this by not setting the DD bits in ice_xdp_alloc_setup_rings(). They
should be initialized to zero instead.
Fixes: f325588a00e6 ("ice: optimize XDP_TX workloads") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on
vsi->state in ice_xsk_wakeup().
Fixes: a40d09f79c37 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Unfortunately, the ice driver doesn't respect the RCU critical section that
XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to
paths that destroy resources that might be in use.
This was addressed in other AF_XDP ZC enabled drivers, for reference see
for example commit 8ea5bfcc431c ("net/i40e: Fix concurrency issues
between config flow and XSK")
Fixes: 222ff8e45f21 ("ice: Add support for XDP") Fixes: a40d09f79c37 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David Ahern [Mon, 4 Apr 2022 15:09:08 +0000 (09:09 -0600)]
ipv6: Fix stats accounting in ip6_pkt_drop
VRF devices are the loopbacks for VRFs, and a loopback can not be
assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should
be '||' not '&&'.
Fixes: be468cd3950a ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach") Reported-by: Pudak, Filip <Filip.Pudak@windriver.com> Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com> Signed-off-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 5 Apr 2022 10:50:28 +0000 (12:50 +0200)]
Merge branch 'ice-bug-fixes'
Tony Nguyen says:
====================
ice bug fixes
Alice Michael says:
There were a couple of bugs that have been found and
fixed by Anatolii in the ice driver. First he fixed
a bug on ring creation by setting the default value
for the teid. Anatolli also fixed a bug with deleting
queues in ice_vc_dis_qs_msg based on their enablement.
---
v2: Remove empty lines between tags
The following are changes since commit 978971df82840b37a6a72fbbcf1966e3d577b88a:
sfc: Do not free an empty page_ring
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 100GbE
====================
ice: Do not skip not enabled queues in ice_vc_dis_qs_msg
Disable check for queue being enabled in ice_vc_dis_qs_msg, because
there could be a case when queues were created, but were not enabled.
We still need to delete those queues.
Normal workflow for VF looks like:
Enable path:
VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10)
VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6)
VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)
The issue appears only in stress conditions when VF is enabled and
disabled very fast.
Eventually there will be a case, when queues are created by
VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by
VIRTCHNL_OP_ENABLE_QUEUES.
In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES,
because there is a check whether queues are enabled in
ice_vc_dis_qs_msg.
When we bring up the VF again, we will see the "Failed to set LAN Tx queue
context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This
happens because old 16 queues were not deleted and VF requests to create
16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to
find a parent node for first newly requested queue (because all nodes
are allocated to 16 old queues).
Testing Hints:
Just enable and disable VF fast enough, so it would be disabled before
reaching VIRTCHNL_OP_ENABLE_QUEUES.
while true; do
ip link set dev ens785f0v0 up
sleep 0.065 # adjust delay value for you machine
ip link set dev ens785f0v0 down
done
Fixes: 519d5f19ed82 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ice: Set txq_teid to ICE_INVAL_TEID on ring creation
When VF is freshly created, but not brought up, ring->txq_teid
value is by default set to 0.
But 0 is a valid TEID. On some platforms the Root Node of
Tx scheduler has a TEID = 0. This can cause issues as shown below.
The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).
Testing Hints:
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
ip link set dev ens785f0v0 up
ip link set dev ens785f0v0 down
If we have freshly created VF and quickly turn it on and off, so there
would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then
VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error:
[ 639.531454] disable queue 89 failed 14
[ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
[ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5
The reason for the fail is that we are trying to send AQ command to
delete queue 89, which has never been created and receive an "invalid
argument" error from firmware.
As this queue has never been created, it's teid and ring->txq_teid
have default value 0.
ice_dis_vsi_txq has a check against non-existent queues:
node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
if (!node)
continue;
But on some platforms the Root Node of Tx scheduler has a teid = 0.
Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is
pi->root), and we go further to submit an erroneous request to firmware.
Fixes: 8aaf8ebf6e65 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN-
COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks"
counter available to the assoc owner.
These are all control chunks so they should be counted as such.
Add counting of singleton chunks so they are properly accounted for.
Martin Habets [Mon, 4 Apr 2022 10:48:51 +0000 (11:48 +0100)]
sfc: Do not free an empty page_ring
When the page_ring is not used page_ptr_mask is 0.
Do not dereference page_ring[0] in this case.
Fixes: d42cac8fb5e1 ("sfc: reuse pages to avoid DMA mapping/unmapping costs") Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Rix [Sun, 3 Apr 2022 14:02:02 +0000 (10:02 -0400)]
stmmac: dwmac-loongson: change loongson_dwmac_driver from global to static
Smatch reports this issue
dwmac-loongson.c:208:19: warning: symbol
'loongson_dwmac_driver' was not declared.
Should it be static?
loongson_dwmac_driver is only used in dwmac-loongson.c.
File scope variables used only in one file should
be static. Change loongson_dwmac_driver's
storage-class-specifier from global to static.
Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Apr 2022 11:44:50 +0000 (12:44 +0100)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: XDP redirect fixes
This series includes 3 fixes related to the XDP redirect code path in
the driver. The first one adds locking when the number of TX XDP rings
is less than the number of CPUs. The second one adjusts the maximum MTU
that can support XDP with enough tail room in the buffer. The 3rd one
fixes a race condition between TX ring shutdown and the XDP redirect path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ray Jui [Sat, 2 Apr 2022 00:21:12 +0000 (20:21 -0400)]
bnxt_en: Prevent XDP redirect from running when stopping TX queue
Add checks in the XDP redirect callback to prevent XDP from running when
the TX ring is undergoing shutdown.
Also remove redundant checks in the XDP redirect callback to validate the
txr and the flag that indicates the ring supports XDP. The modulo
arithmetic on 'tx_nr_rings_xdp' already guarantees the derived TX
ring is an XDP ring. txr is also guaranteed to be valid after checking
BNXT_STATE_OPEN and within RCU grace period.
Fixes: 6260944814e6 ("bnxt_en: optimized XDP_REDIRECT support") Reviewed-by: Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com> Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Sat, 2 Apr 2022 00:21:11 +0000 (20:21 -0400)]
bnxt_en: reserve space inside receive page for skb_shared_info
Insufficient space was being reserved in the page used for packet
reception, so the interface MTU could be set too large to still have
room for the contents of the packet when doing XDP redirect. This
resulted in the following message when redirecting a packet between
3520 and 3822 bytes with an MTU of 3822:
bnxt_en: Synchronize tx when xdp redirects happen on same ring
If there are more CPUs than the number of TX XDP rings, multiple XDP
redirects can select the same TX ring based on the CPU on which
XDP redirect is called. Add locking when needed and use static
key to decide whether to take the lock.
Fixes: 6260944814e6 ("bnxt_en: optimized XDP_REDIRECT support") Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
To fix a coverity complain, commit 8d32d003a2f9
("qed: Initialize debug string array") removed "sw-platform"
(one of the common global parameters) from the dump as this
was used in the dump with an uninitialized string, however
it did not reduce the number of common global parameters
which caused the incorrect (unable to parse) register dump
this patch fixes it with reducing NUM_COMMON_GLOBAL_PARAMS
bye one.
Cc: stable@vger.kernel.org Cc: Tim Gardner <tim.gardner@canonical.com> Cc: "David S. Miller" <davem@davemloft.net> Fixes: 8d32d003a2f9 ("qed: Initialize debug string array") Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Manish Chopra <manishc@marvell.com> Reviewed-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Apr 2022 11:40:42 +0000 (12:40 +0100)]
Merge branch 'micrel-lan8814-remove-latencies'
Horatiu Vultur says:
====================
net: phy: micrel: Remove latencies support lan8814
Remove the latencies support both from the PHY driver and from the DT.
The IP already has some default latencies values which can be used to get
decent results. It has the following values(defined in ns):
rx-1000mbit: 429
tx-1000mbit: 201
rx-100mbit: 2346
tx-100mbit: 705
v0->v1:
- fix the split of the patches, there was a compiling error between patch 2 and
patch 3.
---
But to get better results the following values needs to be set:
rx-1000mbit: 459
tx-1000mbit: 171
rx-100mbit: 1706
tx-100mbit: 1345
We are proposing to use ethtool to set these latencies, the RFC can be found
here[1]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When the PHY and the MAC are capable of doing timestamping, the PHY has
priority. Therefore the DT option lan8814,ignore-ts was added such that
the PHY will not expose a PHC so then the timestamping was done in the
MAC. This is not the correct approach of doing it, therefore remove
this.
Fixes: 8a86c40facf211 ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Based on the discussions here[1], the PHY driver is the wrong place
to set the latencies, therefore remove them.
[1] https://lkml.org/lkml/2022/3/4/325
Fixes: 8a86c40facf211 ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: micrel: Revert latency support and timestamping check
Revert latency support from binding.
Based on the discussion[1], the DT is the wrong place to have the
lantecies for the PHY.
[1] https://lkml.org/lkml/2022/3/4/325
Fixes: 2834d92d976f65 ("dt-bindings: net: micrel: Configure latency values and timestamping check for LAN8814 phy") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests: net: fix nexthop warning cleanup double ip typo
I made a stupid typo when adding the nexthop route warning selftest and
added both $IP and ip after it (double ip) on the cleanup path. The
error doesn't show up when running the test, but obviously it doesn't
cleanup properly after it.
Fixes: f8abe3532c37 ("selftests: net: add delete nexthop route warning test") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Thu, 31 Mar 2022 18:48:32 +0000 (02:48 +0800)]
net: stmmac: Fix unset max_speed difference between DT and non-DT platforms
In commit f0cd312db817 ("net: stmmac: support max-speed device tree
property"), when DT platforms don't set "max-speed", max_speed is set to
-1; for non-DT platforms, it stays the default 0.
Prior to commit 69052134d2bc ("net: stmmac: Start adding phylink support"),
the check for a valid max_speed setting was to check if it was greater
than zero. This commit got it right, but subsequent patches just checked
for non-zero, which is incorrect for DT platforms.
In commit 2f5e414e6ced ("net: stmmac: convert to phylink_get_linkmodes()")
the conversion switched completely to checking for non-zero value as a
valid value, which caused 1000base-T to stop getting advertised by
default.
Instead of trying to fix all the checks, simply leave max_speed alone if
DT property parsing fails.
Fixes: f0cd312db817 ("net: stmmac: support max-speed device tree property") Fixes: 2f5e414e6ced ("net: stmmac: convert to phylink_get_linkmodes()") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220331184832.16316-1-wens@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first patch fixes a warning that can be triggered by deleting a
nexthop route and specifying a device (more info in its commit msg).
And the second patch adds a selftest for that case.
Chose this way to fix it because we should match when deleting without
nh spec and should fail when deleting a nexthop route with old-style nh
spec because nexthop objects are managed separately, e.g.:
$ ip r show 1.2.3.4/32
1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
$ ip r del 1.2.3.4/32
$ ip r del 1.2.3.4/32 nhid 12
<both should work>
$ ip r del 1.2.3.4/32 dev dummy0
<should fail with ESRCH>
v2: addded more to patch 01's commit message
adjusted the test comment in patch 02
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
selftests: net: add delete nexthop route warning test
Add a test which causes a WARNING on kernels which treat a
nexthop route like a normal route when comparing for deletion and a
device is specified. That is, a route is found but we hit a warning while
matching it. The warning is from fib_info_nh() in include/net/nexthop.h
because we run it on a fib_info with nexthop object. The call chain is:
inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
the fib_info and triggering the warning).
Repro steps:
$ ip nexthop add id 12 via 172.16.1.3 dev veth1
$ ip route add 172.16.101.1/32 nhid 12
$ ip route delete 172.16.101.1/32 dev veth1
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ipv4: fix route with nexthop object delete warning
FRR folks have hit a kernel warning[1] while deleting routes[2] which is
caused by trying to delete a route pointing to a nexthop id without
specifying nhid but matching on an interface. That is, a route is found
but we hit a warning while matching it. The warning is from
fib_info_nh() in include/net/nexthop.h because we run it on a fib_info
with nexthop object. The call chain is:
inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
the fib_info and triggering the warning). The fix is to not do any
matching in that branch if the fi has a nexthop object because those are
managed separately. I.e. we should match when deleting without nh spec and
should fail when deleting a nexthop route with old-style nh spec because
nexthop objects are managed separately, e.g.:
$ ip r show 1.2.3.4/32
1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
$ ip r del 1.2.3.4/32
$ ip r del 1.2.3.4/32 nhid 12
<both should work>
$ ip r del 1.2.3.4/32 dev dummy0
<should fail with ESRCH>
Fixes: af674a45925c ("ipv4: Plumb support for nexthop object in a fib_info") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Fri, 1 Apr 2022 05:42:44 +0000 (22:42 -0700)]
net: micrel: fix KS8851_MLL Kconfig
KS8851_MLL selects MICREL_PHY, which depends on PTP_1588_CLOCK_OPTIONAL,
so make KS8851_MLL also depend on PTP_1588_CLOCK_OPTIONAL since
'select' does not follow any dependency chains.
Fixes kconfig warning and build errors:
WARNING: unmet direct dependencies detected for MICREL_PHY
Depends on [m]: NETDEVICES [=y] && PHYLIB [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
Selected by [y]:
- KS8851_MLL [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICREL [=y] && HAS_IOMEM [=y]
ld: drivers/net/phy/micrel.o: in function `lan8814_ts_info':
micrel.c:(.text+0xb35): undefined reference to `ptp_clock_index'
ld: drivers/net/phy/micrel.o: in function `lan8814_probe':
micrel.c:(.text+0x2586): undefined reference to `ptp_clock_register'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Johnston [Fri, 1 Apr 2022 02:48:44 +0000 (10:48 +0800)]
mctp: Use output netdev to allocate skb headroom
Previously the skb was allocated with headroom MCTP_HEADER_MAXLEN,
but that isn't sufficient if we are using devs that are not MCTP
specific.
This also adds a check that the smctp_halen provided to sendmsg for
extended addressing is the correct size for the netdev.
Fixes: 0ccc6f68a37b ("mctp: Populate socket implementation") Reported-by: Matthew Rinaldi <mjrinal@g.clemson.edu> Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Johnston [Fri, 1 Apr 2022 02:48:43 +0000 (10:48 +0800)]
mctp i2c: correct mctp_i2c_header_create result
header_ops.create should return the length of the header,
instead mctp_i2c_head_create() returned 0.
This didn't cause any problem because the MCTP stack accepted
0 as success.
Fixes: 816f015977b4 ("mctp i2c: MCTP I2C binding driver") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Johnston [Fri, 1 Apr 2022 02:48:42 +0000 (10:48 +0800)]
mctp: Fix check for dev_hard_header() result
dev_hard_header() returns the length of the header, so
we need to test for negative errors rather than non-zero.
Fixes: fcb52e4d03dd ("mctp: Add initial routing framework") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Apr 2022 11:01:38 +0000 (12:01 +0100)]
Merge branch 'ice-fixups'
Tony Nguyen says:
====================
ice-fixups
This series handles a handful of cleanups for the ice
driver. Ivan fixed a problem on the VSI during a release,
fixing a MAC address setting, and a broken IFF_ALLMULTI
handling.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 31 Mar 2022 16:20:08 +0000 (09:20 -0700)]
ice: Fix broken IFF_ALLMULTI handling
Handling of all-multicast flag and associated multicast promiscuous
mode is broken in ice driver. When an user switches allmulticast
flag on or off the driver checks whether any VLANs are configured
over the interface (except default VLAN 0).
If any extra VLANs are registered it enables multicast promiscuous
mode for all these VLANs (including default VLAN 0) using
ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all
multicast packets tagged with known VLAN ID or untagged are received
and multicast packets tagged with unknown VLAN ID ignored.
If no extra VLANs are registered (so only VLAN 0 exists) it enables
multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC
look-up type. In this situation any multicast packets including
tagged ones are received.
The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way:
ice_vsi_sync_fltr() {
...
if (changed_flags & IFF_ALLMULTI) {
if (netdev->flags & IFF_ALLMULTI) {
if (vsi->num_vlans > 1)
ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
else
ice_set_promisc(..., ICE_MCAST_PROMISC_BITS);
} else {
if (vsi->num_vlans > 1)
ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
else
ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS);
}
}
...
}
The code above depends on value vsi->num_vlan that specifies number
of VLANs configured over the interface (including VLAN 0) and
this is problem because that value is modified in NDO callbacks
ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid().
Scenario 1:
1. ip link set ens7f0 allmulticast on
2. ip link add vlan10 link ens7f0 type vlan id 10
3. ip link set ens7f0 allmulticast off
4. ip link set ens7f0 allmulticast on
[1] In this scenario IFF_ALLMULTI is enabled and the driver calls
ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs
multicast promisc rule with non-VLAN look-up type.
[2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2
[3] Command switches IFF_ALLMULTI off and the driver calls
ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this
call is effectively NOP because it looks for multicast promisc
rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such
rules exist. So the all-multicast remains enabled silently
in hardware.
[4] Command tries to switch IFF_ALLMULTI on and the driver calls
ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call
fails (-EEXIST) because non-VLAN multicast promisc rule already
exists.
Scenario 2:
1. ip link add vlan10 link ens7f0 type vlan id 10
2. ip link set ens7f0 allmulticast on
3. ip link add vlan20 link ens7f0 type vlan id 20
4. ip link del vlan10 ; ip link del vlan20
5. ip link set ens7f0 allmulticast off
[1] VLAN with ID 10 is added and vsi->num_vlan==2
[2] Command switches IFF_ALLMULTI on and driver installs multicast
promisc rules with VLAN look-up type for VLAN 0 and 10
[3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast
promisc rules is added for this new VLAN so the interface does
not receive MC packets from VLAN 20
[4] Both VLANs are removed but multicast rule for VLAN 10 remains
installed so interface receives multicast packets from VLAN 10
[5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1
the driver tries to remove multicast promisc rule for VLAN 0
with non-VLAN look-up that does not exist.
All-multicast looks disabled from user point of view but it
is partially enabled in HW (interface receives all multicast
packets either untagged or tagged with VLAN ID 10)
To resolve these issues the patch introduces these changes:
1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and
ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed
and IFF_ALLMULTI is enabled an appropriate multicast promisc
rule for that VLAN ID is added/removed.
2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added
so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up
type for existing multicast promisc rule for VLAN 0 is updated
to ICE_MCAST_VLAN_PROMISC_BITS.
3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed
so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up
type for existing multicast promisc rule for VLAN 0 is updated
to ICE_MCAST_PROMISC_BITS.
4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY
bit protection to avoid races with ice_vsi_sync_fltr() that runs
in ice_service_task() context.
5. Bit ICE_VSI_VLAN_FLTR_CHANGED is use-less and can be removed.
6. Error messages added to ice_fltr_*_vsi_promisc() helper functions
to avoid them in their callers
7. Small improvements to increase readability
Fixes: a3fba1d82808 ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 31 Mar 2022 16:20:07 +0000 (09:20 -0700)]
ice: Fix MAC address setting
Commit 1dc1b067ea1632 ("ice: Remove excess error variables") merged
the usage of 'status' and 'err' variables into single one in
function ice_set_mac_address(). Unfortunately this causes
a regression when call of ice_fltr_add_mac() returns -EEXIST because
this return value does not indicate an error in this case but
value of 'err' remains to be -EEXIST till the end of the function
and is returned to caller.
Prior mentioned commit this does not happen because return value of
ice_fltr_add_mac() was stored to 'status' variable first and
if it was -EEXIST then 'err' remains to be zero.
Fix the problem by reset 'err' to zero when ice_fltr_add_mac()
returns -EEXIST.
Fixes: 1dc1b067ea1632 ("ice: Remove excess error variables") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 31 Mar 2022 16:20:06 +0000 (09:20 -0700)]
ice: Clear default forwarding VSI during VSI release
VSI is set as default forwarding one when promisc mode is set for
PF interface, when PF is switched to switchdev mode or when VF
driver asks to enable allmulticast or promisc mode for the VF
interface (when vf-true-promisc-support priv flag is off).
The third case is buggy because in that case VSI associated with
VF remains as default one after VF removal.
Reproducer:
1. Create VF
echo 1 > sys/class/net/ens7f0/device/sriov_numvfs
2. Enable allmulticast or promisc mode on VF
ip link set ens7f0v0 allmulticast on
ip link set ens7f0v0 promisc on
3. Delete VF
echo 0 > sys/class/net/ens7f0/device/sriov_numvfs
4. Try to enable promisc mode on PF
ip link set ens7f0 promisc on
Although it looks that promisc mode on PF is enabled the opposite
is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC
handling first checks if any other VSI is set as default forwarding
one and if so the function does not do anything. At this point
it is not possible to enable promisc mode on PF without re-probe
device.
To resolve the issue this patch clear default forwarding VSI
during ice_vsi_release() when the VSI to be released is the default
one.
Fixes: 3580aa26254f ("ice: Add VF promiscuous support") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Switch drivers that don't implement ->port_change_mtu() will cause the
DSA master to remain with an MTU of 1500, since we've deleted the other
code path. In turn, this causes a regression for those systems, where
MTU-sized traffic can no longer be terminated.
Revert the change taking into account the fact that rtnl_lock() is now
taken top-level from the callers of dsa_master_setup() and
dsa_master_teardown(). Also add a comment in order for it to be
absolutely clear why it is still needed.
Fixes: 446437365619 ("net: dsa: stop updating master MTU from master.c") Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
skbuff: fix coalescing for page_pool fragment recycling
Fix a use-after-free when using page_pool with page fragments. We
encountered this problem during normal RX in the hns3 driver:
(1) Initially we have three descriptors in the RX queue. The first one
allocates PAGE1 through page_pool, and the other two allocate one
half of PAGE2 each. Page references look like this:
In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
PAGE2, where it should instead have increased the page_pool frag
reference, pp_frag_count. Without coalescing, when releasing both
SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
when releasing SKB1 and SKB2, two references to PAGE2 will be
dropped, resulting in underflow.
(3c) Drop SKB2:
af_packet_rcv(SKB2)
consume_skb(SKB2)
skb_release_data(SKB2) // drops second dataref
page_pool_return_skb_page(PAGE2) // drops one pp_frag_count
SKB1 _____ PAGE1
\____
PAGE2
/
RX_BD3 _________/
(4) Userspace calls recvmsg()
Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
release the SKB3 page as well:
tcp_eat_recv_skb(SKB1)
skb_release_data(SKB1)
page_pool_return_skb_page(PAGE1)
page_pool_return_skb_page(PAGE2) // drops second pp_frag_count
(5) PAGE2 is freed, but the third RX descriptor was still using it!
In our case this causes IOMMU faults, but it would silently corrupt
memory if the IOMMU was disabled.
Change the logic that checks whether pp_recycle SKBs can be coalesced.
We still reject differing pp_recycle between 'from' and 'to' SKBs, but
in order to avoid the situation described above, we also reject
coalescing when both 'from' and 'to' are pp_recycled and 'from' is
cloned.
The new logic allows coalescing a cloned pp_recycle SKB into a page
refcounted one, because in this case the release (4) will drop the right
reference, the one taken by skb_try_coalesce().
Fixes: 1c55a919b1b6 ("page_pool: add frag page recycling support in page pool") Suggested-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eyal Birger [Thu, 31 Mar 2022 07:26:43 +0000 (10:26 +0300)]
vrf: fix packet sniffing for traffic originating from ip tunnels
in commit e8ad1a30028d
("vrf: add mac header for tunneled packets when sniffer is attached")
an Ethernet header was cooked for traffic originating from tunnel devices.
However, the header is added based on whether the mac_header is unset
and ignores cases where the device doesn't expose a mac header to upper
layers, such as in ip tunnels like ipip and gre.
Traffic originating from such devices still appears garbled when capturing
on the vrf device.
Fix by observing whether the original device exposes a header to upper
layers, similar to the logic done in af_packet.
In addition, skb->mac_len needs to be adjusted after adding the Ethernet
header for the skb_push/pull() surrounding dev_queue_xmit_nit() to work
on these packets.
Fixes: e8ad1a30028d ("vrf: add mac header for tunneled packets when sniffer is attached") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ziyang Xuan [Thu, 31 Mar 2022 07:04:28 +0000 (15:04 +0800)]
net/tls: fix slab-out-of-bounds bug in decrypt_internal
The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in
tls_set_sw_offload(). The return value of crypto_aead_ivsize()
for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes
memory space will trigger slab-out-of-bounds bug as following:
==================================================================
BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls]
Read of size 16 at addr ffff888114e84e60 by task tls/10911
Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size
when memcpy() iv value in TLS_1_3_VERSION scenario.
Fixes: 9f6c9f1dbbc3 ("net/tls: Add support of AES128-CCM based ciphers") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Wed, 30 Mar 2022 16:37:03 +0000 (16:37 +0000)]
net: sfc: add missing xdp queue reinitialization
After rx/tx ring buffer size is changed, kernel panic occurs when
it acts XDP_TX or XDP_REDIRECT.
When tx/rx ring buffer size is changed(ethtool -G), sfc driver
reallocates and reinitializes rx and tx queues and their buffer
(tx_queue->buffer).
But it misses reinitializing xdp queues(efx->xdp_tx_queues).
So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized
tx_queue->buffer.
A new function efx_set_xdp_channels() is separated from efx_set_channels()
to handle only xdp queues.
Fixes: 2e6b0faad9d7 ("sfc: allocate channels for XDP tx queues") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 31 Mar 2022 18:23:31 +0000 (11:23 -0700)]
Merge tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking updates from Jakub Kicinski:
"Networking fixes and rethook patches.
Features:
- kprobes: rethook: x86: replace kretprobe trampoline with rethook
Current release - regressions:
- sfc: avoid null-deref on systems without NUMA awareness in the new
queue sizing code
Current release - new code bugs:
- vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices
- eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when
interface is down
Previous releases - always broken:
- openvswitch: correct neighbor discovery target mask field in the
flow dump
- wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak
- rxrpc: fix call timer start racing with call destruction
- rxrpc: fix null-deref when security type is rxrpc_no_security
- can: fix UAF bugs around echo skbs in multiple drivers
Misc:
- docs: move netdev-FAQ to the 'process' section of the
documentation"
* tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
openvswitch: Add recirc_id to recirc warning
rxrpc: fix some null-ptr-deref bugs in server_key.c
rxrpc: Fix call timer start racing with call destruction
net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
net: hns3: fix the concurrency between functions reading debugfs
docs: netdev: move the netdev-FAQ to the process pages
docs: netdev: broaden the new vs old code formatting guidelines
docs: netdev: call out the merge window in tag checking
docs: netdev: add missing back ticks
docs: netdev: make the testing requirement more stringent
docs: netdev: add a question about re-posting frequency
docs: netdev: rephrase the 'should I update patchwork' question
docs: netdev: rephrase the 'Under review' question
docs: netdev: shorten the name and mention msgid for patch status
docs: netdev: note that RFC postings are allowed any time
docs: netdev: turn the net-next closed into a Warning
docs: netdev: move the patch marking section up
docs: netdev: minor reword
docs: netdev: replace references to old archives
...
Eric Dumazet [Wed, 30 Mar 2022 19:46:43 +0000 (12:46 -0700)]
vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
vxlan_vnifilter_dump_dev() assumes it is called only
for vxlan devices. Make sure it is the case.
BUG: KASAN: slab-out-of-bounds in vxlan_vnifilter_dump_dev+0x9a0/0xb40 drivers/net/vxlan/vxlan_vnifilter.c:349
Read of size 4 at addr ffff888060d1ce70 by task syz-executor.3/17662
Jakub Kicinski [Thu, 31 Mar 2022 15:36:17 +0000 (08:36 -0700)]
Merge tag 'linux-can-fixes-for-5.18-20220331' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-03-31
The first patch is by Oliver Hartkopp and fixes MSG_PEEK feature in
the CAN ISOTP protocol (broken in net-next for v5.18 only).
Tom Rix's patch for the mcp251xfd driver fixes the propagation of an
error value in case of an error.
A patch by me for the m_can driver fixes a use-after-free in the xmit
handler for m_can IP cores v3.0.x.
Hangyu Hua contributes 3 patches fixing the same double free in the
error path of the xmit handler in the ems_usb, usb_8dev and mcba_usb
USB CAN driver.
Pavel Skripkin contributes a patch for the mcba_usb driver to properly
check the endpoint type.
The last patch is by me and fixes a mem leak in the gs_usb, which was
introduced in net-next for v5.18.
* tag 'linux-can-fixes-for-5.18-20220331' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: gs_usb: gs_make_candev(): fix memory leak for devices with extended bit timing configuration
can: mcba_usb: properly check endpoint type
can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
can: m_can: m_can_tx_handler(): fix use after free of skb
can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value
can: isotp: restore accidentally removed MSG_PEEK feature
====================
Xiaolong Huang [Wed, 30 Mar 2022 14:22:14 +0000 (15:22 +0100)]
rxrpc: fix some null-ptr-deref bugs in server_key.c
Some function calls are not implemented in rxrpc_no_security, there are
preparse_server_key, free_preparse_server_key and destroy_server_key.
When rxrpc security type is rxrpc_no_security, user can easily trigger a
null-ptr-deref bug via ioctl. So judgment should be added to prevent it
David Howells [Wed, 30 Mar 2022 14:39:16 +0000 (15:39 +0100)]
rxrpc: Fix call timer start racing with call destruction
The rxrpc_call struct has a timer used to handle various timed events
relating to a call. This timer can get started from the packet input
routines that are run in softirq mode with just the RCU read lock held.
Unfortunately, because only the RCU read lock is held - and neither ref or
other lock is taken - the call can start getting destroyed at the same time
a packet comes in addressed to that call. This causes the timer - which
was already stopped - to get restarted. Later, the timer dispatch code may
then oops if the timer got deallocated first.
Fix this by trying to take a ref on the rxrpc_call struct and, if
successful, passing that ref along to the timer. If the timer was already
running, the ref is discarded.
The timer completion routine can then pass the ref along to the call's work
item when it queues it. If the timer or work item where already
queued/running, the extra ref is discarded.
Guangbin Huang [Wed, 30 Mar 2022 13:45:06 +0000 (21:45 +0800)]
net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
When user delete vlan 0, as driver will not delete vlan 0 for hardware in
function hclge_set_vlan_filter_hw(), so vlan 0 in software vlan talbe should
not be deleted.
Fixes: 71e2f6111c2f ("net: hns3: sync VLAN filter entries when kill VLAN ID failed") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yufeng Mo [Wed, 30 Mar 2022 13:45:05 +0000 (21:45 +0800)]
net: hns3: fix the concurrency between functions reading debugfs
Currently, the debugfs mechanism is that all functions share a
global variable to save the pointer for obtaining data. When
different functions concurrently access the same file node,
repeated release exceptions occur. Therefore, the granularity
of the pointer for storing the obtained data is adjusted to be
private for each function.
Fixes: 51b4d1974cac ("net: hns3: refactor the debugfs process") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
docs: update and move the netdev-FAQ
A section of documentation for tree-specific process quirks had
been created a while back. There's only one tree in it, so far,
the tip tree, but the contents seem to answer similar questions
as we answer in the netdev-FAQ. Move the netdev-FAQ.
Take this opportunity to touch up and update a few sections.
v3: remove some confrontational? language from patch 7
v2: remove non-git in patch 3
add patch 5
====================
Jakub Kicinski [Wed, 30 Mar 2022 04:24:54 +0000 (21:24 -0700)]
docs: netdev: move the patch marking section up
We want people to mark their patches with net and net-next in the subject.
Many miss doing that. Move the FAQ section which points that out up, and
place it after the section which enumerates the trees, that seems like
a pretty logical place for it. Since the two sections are together we
can remove a little bit (not too much) of the repetition.
v2: also remove the text for non-git setups, we want people to use git.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
can: gs_usb: gs_make_candev(): fix memory leak for devices with extended bit timing configuration
Some CAN-FD capable devices offer extended bit timing information for
the data bit timing. The information must be read with an USB control
message. The memory for this message is allocated but not free()ed (in
the non error case). This patch adds the missing free.
Pavel Skripkin [Sun, 13 Mar 2022 10:09:03 +0000 (13:09 +0300)]
can: mcba_usb: properly check endpoint type
Syzbot reported warning in usb_submit_urb() which is caused by wrong
endpoint type. We should check that in endpoint is actually present to
prevent this warning.
Found pipes are now saved to struct mcba_priv and code uses them
directly instead of making pipes in place.
Fixes: 3405f1d6e64b ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer") Link: https://lore.kernel.org/all/20220313100903.10868-1-paskripkin@gmail.com Reported-and-tested-by: syzbot+3bc1dce0cc0052d60fde@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Hangyu Hua [Fri, 11 Mar 2022 08:02:08 +0000 (16:02 +0800)]
can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
There is no need to call dev_kfree_skb() when usb_submit_urb() fails
because can_put_echo_skb() deletes original skb and
can_free_echo_skb() deletes the cloned skb.
Hangyu Hua [Fri, 11 Mar 2022 08:06:14 +0000 (16:06 +0800)]
can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
There is no need to call dev_kfree_skb() when usb_submit_urb() fails
because can_put_echo_skb() deletes original skb and
can_free_echo_skb() deletes the cloned skb.
Fixes: ffa5f743af83 ("can: usb_8dev: Add support for USB2CAN interface from 8 devices") Link: https://lore.kernel.org/all/20220311080614.45229-1-hbh25y@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Hangyu Hua [Mon, 28 Feb 2022 08:36:39 +0000 (16:36 +0800)]
can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
There is no need to call dev_kfree_skb() when usb_submit_urb() fails
beacause can_put_echo_skb() deletes the original skb and
can_free_echo_skb() deletes the cloned skb.
Link: https://lore.kernel.org/all/20220228083639.38183-1-hbh25y@gmail.com Fixes: 07edeffafb68 ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface") Cc: stable@vger.kernel.org Cc: Sebastian Haas <haas@ems-wuensche.com> Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
can: m_can: m_can_tx_handler(): fix use after free of skb
can_put_echo_skb() will clone skb then free the skb. Move the
can_put_echo_skb() for the m_can version 3.0.x directly before the
start of the xmit in hardware, similar to the 3.1.x branch.