Corentin Labbe [Fri, 19 Mar 2021 13:44:22 +0000 (13:44 +0000)]
net: stmmac: dwmac-sun8i: Provide TX and RX fifo sizes
MTU cannot be changed on dwmac-sun8i. (ip link set eth0 mtu xxx returning EINVAL)
This is due to tx_fifo_size being 0, since this value is used to compute valid
MTU range.
Like dwmac-sunxi (with commit a6fbf4e03518 ("net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes"))
dwmac-sun8i need to have tx and rx fifo sizes set.
I have used values from datasheets.
After this patch, setting a non-default MTU (like 1000) value works and network is still useable.
Tested-on: sun8i-h3-orangepi-pc
Tested-on: sun8i-r40-bananapi-m2-ultra
Tested-on: sun50i-a64-bananapi-m64
Tested-on: sun50i-h5-nanopi-neo-plus2
Tested-on: sun50i-h6-pine-h64 Fixes: 6f537725a8e ("net-next: stmmac: Add dwmac-sun8i") Reported-by: Belisko Marek <marek.belisko@gmail.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Fri, 19 Mar 2021 07:37:21 +0000 (15:37 +0800)]
r8152: limit the RX buffer size of RTL8153A for USB 2.0
If the USB host controller is EHCI, the throughput is reduced from
300Mb/s to 60Mb/s, when the rx buffer size is modified from 16K to
32K.
According to the EHCI spec, the maximum size of the qTD is 20K.
Therefore, when the driver uses more than 20K buffer, the latency
time of EHCI would be increased. And, it let the RTL8153A get worse
throughput.
However, the driver uses alloc_pages() for rx buffer, so I limit
the rx buffer to 16K rather than 20K.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205923 Fixes: a5ac476703cf ("r8152: separate the rx buffer size") Reported-by: Robert Davies <robdavies1977@gmail.com> Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 19 Mar 2021 03:52:41 +0000 (11:52 +0800)]
sctp: move sk_route_caps check and set into sctp_outq_flush_transports
The sk's sk_route_caps is set in sctp_packet_config, and later it
only needs to change when traversing the transport_list in a loop,
as the dst might be changed in the tx path.
So move sk_route_caps check and set into sctp_outq_flush_transports
from sctp_packet_transmit. This also fixes a dst leak reported by
Chen Yi:
As calling sk_setup_caps() in sctp_packet_transmit may also set the
sk_route_caps for the ctrl sock in a netns. When the netns is being
deleted, the ctrl sock's releasing is later than dst dev's deleting,
which will cause this dev's deleting to hang and dmesg error occurs:
unregister_netdevice: waiting for xxx to become free. Usage count = 1
Reported-by: Chen Yi <yiche@redhat.com> Fixes: fc02dd43bd02 ("sctp: call sk_setup_caps in sctp_packet_transmit instead") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 18 Mar 2021 15:57:49 +0000 (16:57 +0100)]
net: cdc-phonet: fix data-interface release on probe failure
Set the disconnected flag before releasing the data interface in case
netdev registration fails to avoid having the disconnect callback try to
deregister the never registered netdev (and trigger a WARN_ON()).
Fixes: 3791006fe8d8 ("USB host CDC Phonet network interface driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Bohac [Thu, 18 Mar 2021 03:42:53 +0000 (04:42 +0100)]
net: check all name nodes in __dev_alloc_name
__dev_alloc_name(), when supplied with a name containing '%d',
will search for the first available device number to generate a
unique device name.
Since commit 25be6dea04db1cadb8b6172cdcc13962846db444 ("net:
introduce name_node struct to be used in hashlist") network
devices may have alternate names. __dev_alloc_name() does take
these alternate names into account, possibly generating a name
that is already taken and failing with -ENFILE as a result.
This demonstrates the bug:
# rmmod dummy 2>/dev/null
# ip link property add dev lo altname dummy0
# modprobe dummy numdummies=1
modprobe: ERROR: could not insert 'dummy': Too many open files in system
Instead of creating a device named dummy1, modprobe fails.
Fix this by checking all the names in the d->name_node list, not just d->name.
Signed-off-by: Jiri Bohac <jbohac@suse.cz> Fixes: 25be6dea04db ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series of patches fixes various issues related to NPC MCAM entry
management, debugfs, devlink, CGX LMAC mapping, RSS config etc
Change-log:
v2:
Fixed below review comments
- corrected Fixed tag syntax with 12 digits SHA1
and providing space between SHA1 and subject line
- remove code improvement patch
- make commit description more clear
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Thu, 18 Mar 2021 14:15:48 +0000 (19:45 +0530)]
octeontx2-af: fix infinite loop in unmapping NPC counter
unmapping npc counter works in a way by traversing all mcam
entries to find which mcam rule is associated with counter.
But loop cursor variable 'entry' is not incremented before
checking next mcam entry which resulting in infinite loop.
This in turn hogs the kworker thread forever and no other
mbox message is processed by AF driver after that.
Fix this by updating entry value before checking next
mcam entry.
Fixes: edc518a52b20 ("octeontx2-af: Map or unmap NPC MCAM entry and counter") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Thu, 18 Mar 2021 14:15:47 +0000 (19:45 +0530)]
octeontx2-pf: Clear RSS enable flag on interace down
RSS configuration can not be get/set when interface is in down state
as they required mbox communication. RSS enable flag status
is used for set/get configuration. Current code do not clear the
RSS enable flag on interface down which lead to mbox error while
trying to set/get RSS configuration.
Fixes: 1dee7ca9acad ("octeontx2-pf: Receive side scaling support") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Thu, 18 Mar 2021 14:15:46 +0000 (19:45 +0530)]
octeontx2-af: Fix irq free in rvu teardown
Current devlink code try to free already freed irqs as the
irq_allocate flag is not cleared after free leading to kernel
crash while removing rvu driver. The patch fixes the irq free
sequence and clears the irq_allocate flag on free.
Fixes: ea924d0ed956 ("octeontx2-af: Add mailbox IRQ and msg handlers") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CGX receive buffer size is a constant value and
cannot be read from CGX0 block always since
CGX0 may not enabled everytime. Hence return CGX
receive buffer size from first enabled CGX block
instead of CGX0.
Fixes: e2cbd6fcef41 ("octeontx2-af: cn10K: MTU configuration") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The MKEX profile describes what packet fields need to be extracted from
the input packet and how to place those packet fields in the output key
for MCAM matching. The MKEX profile can be in a way where higher layer
packet fields can overwrite lower layer packet fields in output MCAM
Key.
Hence MKEX profile is always ensured that there are no overlaps between
any of the layers. But the commit 978930dabe82
("octeontx2-af: cleanup KPU config data") introduced TX TOS field which
overlaps with DMAC in MCAM key.
This led to AF driver returning error when TX rule is installed with
DMAC as match criteria since DMAC gets overwritten and cannot be
supported. This patch fixes the issue by removing TOS field from MKEX TX
profile.
Fixes: 978930dabe82 ("octeontx2-af: cleanup KPU config data") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netsec: restore phy power state after controller reset
Since commit be515903fa81 ("net: socionext: Stop PHY before resetting
netsec") netsec_netdev_init() power downs phy before resetting the
controller. However, the state is not restored once the reset is
complete. As a result it is not possible to bring up network on a
platform with Broadcom BCM5482 phy.
Fix the issue by restoring phy power state after controller reset is
complete.
Fixes: be515903fa81 ("net: socionext: Stop PHY before resetting netsec") Cc: stable@vger.kernel.org Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5c798a550940 ("ipv6: drop incoming packets having a v4mapped
source address") introduced an input check against v4mapped addresses.
Use of such addresses on the wire is indeed questionable and not
allowed on public Internet. As the commit pointed out
Unfortunately there are applications which use v4mapped addresses,
and breaking them is a clear regression. For example v4mapped
addresses (or any semi-valid addresses, really) may be used
for uni-direction event streams or packet export.
Since the issue which sparked the addition of the check was with
TCP and request_socks in particular push the check down to TCPv6
and DCCP. This restores the ability to receive UDPv6 packets with
v4mapped address as the source.
Keep using the IPSTATS_MIB_INHDRERRORS statistic to minimize the
user-visible changes.
Fixes: 5c798a550940 ("ipv6: drop incoming packets having a v4mapped source address") Reported-by: Sunyi Shao <sunyishao@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 4 day(s) which contain
a total of 14 files changed, 336 insertions(+), 94 deletions(-).
The main changes are:
1) Fix fexit/fmod_ret trampoline for sleepable programs, and also fix a ftrace
splat in modify_ftrace_direct() on address change, from Alexei Starovoitov.
2) Fix two oob speculation possibilities that allows unprivileged to leak mem
via side-channel, from Piotr Krysiuk and Daniel Borkmann.
netfilter: nftables: skip hook overlap logic if flowtable is stale
If the flowtable has been previously removed in this batch, skip the
hook overlap checks. This fixes spurious EEXIST errors when removing and
adding the flowtable in the same batch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
libbpf: Use SOCK_CLOEXEC when opening the netlink socket
Otherwise, there exists a small window between the opening and closing
of the socket fd where it may leak into processes launched by some other
thread.
Fixes: 2e8290ccfe4a ("libbpf: add function to setup XDP") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210317115857.6536-1-memxor@gmail.com
Yinjun Zhang [Wed, 17 Mar 2021 12:42:24 +0000 (13:42 +0100)]
netfilter: flowtable: Make sure GC works periodically in idle system
Currently flowtable's GC work is initialized as deferrable, which
means GC cannot work on time when system is idle. So the hardware
offloaded flow may be deleted for timeout, since its used time is
not timely updated.
Resolve it by initializing the GC work as delayed work instead of
deferrable.
Fixes: c550be63df75 ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
The synchronize_rcu_tasks() will not wait for such tasks to complete.
In such case the trampoline image will be freed and when the task
wakes up the return IP will point to freed memory causing the crash.
Solve this by adding percpu_ref_get/put for the duration of trampoline
and separate trampoline vs its image life times.
The "half page" optimization has to be removed, since
first_half->second_half->first_half transition cannot be guaranteed to
complete in deterministic time. Every trampoline update becomes a new image.
The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
call_rcu_tasks. Together they will wait for the original function and
trampoline asm to complete. The trampoline is patched from nop to jmp to skip
fexit progs. They are freed independently from the trampoline. The image with
fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
will wait for both sleepable and non-sleepable progs to complete.
Fixes: 2cefb9dd93cc ("bpf: Introduce BPF trampoline") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
Wei Wang [Tue, 16 Mar 2021 22:36:47 +0000 (15:36 -0700)]
net: fix race between napi kthread mode and busy poll
Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
determine if the kthread owns this napi and could call napi->poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi->poll()
invokes napi_complete_done() and clears SCHED bit) and tries to poll
on the same napi. napi_disable() could grab the SCHED bit as well.
This patch tries to fix this race by adding a new bit
NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
____napi_schedule() if the threaded mode is enabled, and gets cleared
in napi_complete_done(), and we only poll the napi in kthread if this
bit is set. This helps distinguish the ownership of the napi between
kthread and other scenarios and fixes the race issue.
Fixes: 19a9abdd15a9 ("net: implement threaded-able napi poll loop support") Reported-by: Martin Zaharinov <micron10@gmail.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Wei Wang <weiwan@google.com> Cc: Alexander Duyck <alexanderduyck@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Piotr Krysiuk [Tue, 16 Mar 2021 10:44:42 +0000 (11:44 +0100)]
bpf, selftests: Fix up some test_verifier cases for unprivileged
Fix up test_verifier error messages for the case where the original error
message changed, or for the case where pointer alu errors differ between
privileged and unprivileged tests. Also, add alternative tests for keeping
coverage of the original verifier rejection error message (fp alu), and
newly reject map_ptr += rX where rX == 0 given we now forbid alu on these
types for unprivileged. All test_verifier cases pass after the change. The
test case fixups were kept separate to ease backporting of core changes.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Piotr Krysiuk [Tue, 16 Mar 2021 08:47:02 +0000 (09:47 +0100)]
bpf: Add sanity check for upper ptr_limit
Given we know the max possible value of ptr_limit at the time of retrieving
the latter, add basic assertions, so that the verifier can bail out if
anything looks odd and reject the program. Nothing triggered this so far,
but it also does not hurt to have these.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
David S. Miller [Wed, 17 Mar 2021 19:21:15 +0000 (12:21 -0700)]
Merge tag 'mac80211-for-net-2021-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
First round of fixes for 5.12-rc:
* HE (802.11ax) elements can be extended, handle that
* fix locking in network namespace changes that was
broken due to the RTNL-redux work
* various other small fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
wenxu [Wed, 17 Mar 2021 04:02:43 +0000 (12:02 +0800)]
net/sched: cls_flower: fix only mask bit check in the validate_ct_state
The ct_state validate should not only check the mask bit and also
check mask_bit & key_bit..
For the +new+est case example, The 'new' and 'est' bits should be
set in both state_mask and state flags. Or the -new-est case also
will be reject by kernel.
When Openvswitch with two flows
ct_state=+trk+new,action=commit,forward
ct_state=+trk+est,action=forward
A packet go through the kernel and the contrack state is invalid,
The ct_state will be +trk-inv. Upcall to the ovs-vswitchd, the
finally dp action will be drop with -new-est+trk.
Fixes: f544e47316a3 ("net/sched: cls_flower: Reject invalid ct_state flags rules") Fixes: 62564537eb6c ("net/sched: cls_flower: validate ct_state for invalid and reply flags") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 17 Mar 2021 00:07:47 +0000 (17:07 -0700)]
ionic: linearize tso skb with too many frags
We were linearizing non-TSO skbs that had too many frags, but
we weren't checking number of frags on TSO skbs. This could
lead to a bad page reference when we received a TSO skb with
more frags than the Tx descriptor could support.
v2: use gso_segs rather than yet another division
don't rework the check on the nr_frags
Fixes: 5931eb5d68ae ("ionic: Add Tx and Rx handling") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
Piotr Krysiuk [Tue, 16 Mar 2021 07:26:25 +0000 (08:26 +0100)]
bpf: Simplify alu_limit masking for pointer arithmetic
Instead of having the mov32 with aux->alu_limit - 1 immediate, move this
operation to retrieve_ptr_limit() instead to simplify the logic and to
allow for subsequent sanity boundary checks inside retrieve_ptr_limit().
This avoids in future that at the time of the verifier masking rewrite
we'd run into an underflow which would not sign extend due to the nature
of mov32 instruction.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Piotr Krysiuk [Tue, 16 Mar 2021 07:20:16 +0000 (08:20 +0100)]
bpf: Fix off-by-one for area size in creating mask to left
retrieve_ptr_limit() computes the ptr_limit for registers with stack and
map_value type. ptr_limit is the size of the memory area that is still
valid / in-bounds from the point of the current position and direction
of the operation (add / sub). This size will later be used for masking
the operation such that attempting out-of-bounds access in the speculative
domain is redirected to remain within the bounds of the current map value.
When masking to the right the size is correct, however, when masking to
the left, the size is off-by-one which would lead to an incorrect mask
and thus incorrect arithmetic operation in the non-speculative domain.
Piotr found that if the resulting alu_limit value is zero, then the
BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading
0xffffffff into AX instead of sign-extending to the full 64 bit range,
and as a result, this allows abuse for executing speculatively out-of-
bounds loads against 4GB window of address space and thus extracting the
contents of kernel memory via side-channel.
Fixes: 422fca2ce03f ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Piotr Krysiuk [Tue, 16 Mar 2021 08:47:02 +0000 (09:47 +0100)]
bpf: Prohibit alu ops for pointer types not defining ptr_limit
The purpose of this patch is to streamline error propagation and in particular
to propagate retrieve_ptr_limit() errors for pointer types that are not defining
a ptr_limit such that register-based alu ops against these types can be rejected.
The main rationale is that a gap has been identified by Piotr in the existing
protection against speculatively out-of-bounds loads, for example, in case of
ctx pointers, unprivileged programs can still perform pointer arithmetic. This
can be abused to execute speculatively out-of-bounds loads without restrictions
and thus extract contents of kernel memory.
Fix this by rejecting unprivileged programs that attempt any pointer arithmetic
on unprotected pointer types. The two affected ones are pointer to ctx as well
as pointer to map. Field access to a modified ctx' pointer is rejected at a
later point in time in the verifier, and bf379367a6f9 ("bpf: Permit map_ptr
arithmetic with opcode add and offset 0") only relevant for root-only use cases.
Risk of unprivileged program breakage is considered very low.
Fixes: bf379367a6f9 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") Fixes: 6b913f15182a ("bpf: prevent out-of-bounds speculation") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
The following sequence of commands:
register_ftrace_direct(ip, addr1);
modify_ftrace_direct(ip, addr1, addr2);
unregister_ftrace_direct(ip, addr2);
will cause the kernel to warn:
[ 30.179191] WARNING: CPU: 2 PID: 1961 at kernel/trace/ftrace.c:5223 unregister_ftrace_direct+0x130/0x150
[ 30.180556] CPU: 2 PID: 1961 Comm: test_progs W O 5.12.0-rc2-00378-g86bc10a0a711-dirty #3246
[ 30.182453] RIP: 0010:unregister_ftrace_direct+0x130/0x150
When modify_ftrace_direct() changes the addr from old to new it should update
the addr stored in ftrace_direct_funcs. Otherwise the final
unregister_ftrace_direct() won't find the address and will cause the splat.
Louis Peens [Tue, 16 Mar 2021 18:13:10 +0000 (19:13 +0100)]
nfp: flower: fix pre_tun mask id allocation
pre_tun_rule flows does not follow the usual add-flow path, instead
they are used to update the pre_tun table on the firmware. This means
that if the mask-id gets allocated here the firmware will never see the
"NFP_FL_META_FLAG_MANAGE_MASK" flag for the specific mask id, which
triggers the allocation on the firmware side. This leads to the firmware
mask being corrupted and causing all sorts of strange behaviour.
Fixes: 2d37ba78dc45 ("nfp: flower: offload pre-tunnel rules") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Tue, 16 Mar 2021 18:13:09 +0000 (19:13 +0100)]
nfp: flower: add ipv6 bit to pre_tunnel control message
Differentiate between ipv4 and ipv6 flows when configuring the pre_tunnel
table to prevent them trampling each other in the table.
Fixes: 715782139ab5 ("nfp: flower: update flow merge code to support IPv6 tunnels") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Tue, 16 Mar 2021 18:13:08 +0000 (19:13 +0100)]
nfp: flower: fix unsupported pre_tunnel flows
There are some pre_tunnel flows combinations which are incorrectly being
offloaded without proper support, fix these.
- Matching on MPLS is not supported for pre_tun.
- Match on IPv4/IPv6 layer must be present.
- Destination MAC address must match pre_tun.dev MAC
Fixes: ef3807c69f65 ("nfp: flower: verify pre-tunnel rules") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wenxu [Tue, 16 Mar 2021 08:33:54 +0000 (16:33 +0800)]
net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct
When openvswitch conntrack offload with act_ct action. The first rule
do conntrack in the act_ct in tc subsystem. And miss the next rule in
the tc and fallback to the ovs datapath but miss set post_ct flag
which will lead the ct_state_key with -trk flag.
Fixes: 4c72a1db37dc ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Mar 2021 22:10:43 +0000 (15:10 -0700)]
Merge tag 'linux-can-fixes-for-5.12-20210316' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-03-16
this is a pull request of 11 patches for net/master.
The first patch is by Martin Willi and fixes the deletion of network
name spaces with physical CAN interfaces in them.
The next two patches are by me an fix the ISOTP protocol, to ensure
that unused flags in classical CAN frames are properly initialized to
zero.
Stephane Grosjean contributes a patch for the pcan_usb_fd driver,
which add MODULE_SUPPORTED_DEVICE lines for two supported devices.
Angelo Dureghello's patch for the flexcan driver fixes a potential div
by zero, if the bitrate is not set during driver probe.
Jimmy Assarsson's patch for the kvaser_pciefd disables bus load
reporting in the device, if it was previously enabled by the vendor's
out of tree drier. A patch for the kvaser_usb adds support for a new
device, by adding the appropriate USB product ID.
Tong Zhang contributes two patches for the c_can driver. First a
use-after-free in the c_can_pci driver is fixed, in the second patch
the runtime PM for the c_can_pci is fixed by moving the runtime PM
enable/disable from the core driver to the platform driver.
The last two patches are by Torin Cooper-Bennun for the m_can driver.
First a extraneous msg loss warning is removed then he fixes the
RX-path, which might be blocked by errors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: d44f9d6a6de0 ("selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized.") Signed-off-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 12 Mar 2021 16:36:51 +0000 (11:36 -0500)]
wireless/nl80211: fix wdev_id may be used uninitialized
Build currently fails with -Werror=maybe-uninitialized set:
net/wireless/nl80211.c: In function '__cfg80211_wdev_from_attrs':
net/wireless/nl80211.c:124:44: error: 'wdev_id' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
Easy fix is to just initialize wdev_id to 0, since it's value doesn't
otherwise matter unless have_wdev_id is true.
Fixes: 31793ce79203 ("cfg80211: avoid holding the RTNL when calling the driver") CC: Johannes Berg <johannes@sipsolutions.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jakub Kicinski <kuba@kernel.org> CC: linux-wireless@vger.kernel.org CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20210312163651.1398207-1-jarod@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
mac80211: choose first enabled channel for monitor
Even if the first channel from sband channel list is invalid
or disabled mac80211 ends up choosing it as the default channel
for monitor interfaces, making them not usable.
Fix this by assigning the first available valid or enabled
channel instead.
Johannes Berg [Wed, 10 Mar 2021 20:58:40 +0000 (21:58 +0100)]
nl80211: fix locking for wireless device netns change
We have all the network interfaces marked as netns-local
since the only reasonable thing to do right now is to set
a whole device, including all netdevs, into a different
network namespace. For this reason, we also have our own
way of changing the network namespace.
Unfortunately, the RTNL locking changes broke this, and
it now results in many RTNL assertions. The trivial fix
for those (just hold RTNL for the changes) however leads
to deadlocks in the cfg80211 netdev notifier.
Since we only need the wiphy, and that's still protected
by the RTNL, add a new NL80211_FLAG_NO_WIPHY_MTX flag to
the nl80211 ops and use it to _not_ take the wiphy mutex
but only the RTNL. This way, the notifier does all the
work necessary during unregistration/registration of the
netdevs from the old and in the new namespace.
Daniel Phan [Tue, 9 Mar 2021 20:41:36 +0000 (12:41 -0800)]
mac80211: Check crypto_aead_encrypt for errors
crypto_aead_encrypt returns <0 on error, so if these calls are not checked,
execution may continue with failed encrypts. It also seems that these two
crypto_aead_encrypt calls are the only instances in the codebase that are
not checked for errors.
Brian Norris [Tue, 23 Feb 2021 05:19:26 +0000 (13:19 +0800)]
mac80211: Allow HE operation to be longer than expected.
We observed some Cisco APs sending the following HE Operation IE in
associate response:
ff 0a 24 f4 3f 00 01 fc ff 00 00 00
Its HE operation parameter is 0x003ff4, so the expected total length is
7 which does not match the actual length = 10. This causes association
failing with "HE AP is missing HE Capability/operation."
According to P802.11ax_D4 Table9-94, HE operation is extensible, and
according to 802.11-2016 10.27.8, STA should discard the part beyond
the maximum length and parse the truncated element.
Allow HE operation element to be longer than expected to handle this
case and future extensions.
Fixes: bfb6fca62971 ("mac80211: refactor extended element parsing") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Yen-lin Lai <yenlinlai@chromium.org> Link: https://lore.kernel.org/r/20210223051926.2653301-1-yenlinlai@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Markus Theil [Sat, 13 Feb 2021 13:36:53 +0000 (14:36 +0100)]
mac80211: fix double free in ibss_leave
Clear beacon ie pointer and ie length after free
in order to prevent double free.
==================================================================
BUG: KASAN: double-free or invalid-free \
in ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876
Johannes Berg [Fri, 12 Feb 2021 10:22:14 +0000 (11:22 +0100)]
mac80211: fix rate mask reset
Coverity reported the strange "if (~...)" condition that's
always true. It suggested that ! was intended instead of ~,
but upon further analysis I'm convinced that what really was
intended was a comparison to 0xff/0xffff (in HT/VHT cases
respectively), since this indicates that all of the rates
are enabled.
Change the comparison accordingly.
I'm guessing this never really mattered because a reset to
not having a rate mask is basically equivalent to having a
mask that enables all rates.
can: m_can: m_can_rx_peripheral(): fix RX being blocked by errors
For M_CAN peripherals, m_can_rx_handler() was called with quota = 1,
which caused any error handling to block RX from taking place until
the next time the IRQ handler is called. This had been observed to
cause RX to be blocked indefinitely in some cases.
This is fixed by calling m_can_rx_handler with a sensibly high quota.
Fixes: 15c73136c866 ("can: m_can: Create a m_can platform framework") Link: https://lore.kernel.org/r/20210303144350.4093750-1-torin@maxiluxsystems.com Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning
Message loss from RX FIFO 0 is already handled in
m_can_handle_lost_msg(), with netdev output included.
Removing this warning also improves driver performance under heavy
load, where m_can_do_rx_poll() may be called many times before this
interrupt is cleared, causing this message to be output many
times (thanks Mariusz Madej for this report).
There is a UAF in c_can_pci_remove(). dev is released by
free_c_can_dev() and is used by pci_iounmap(pdev, priv->base) later.
To fix this issue, save the mmio address before releasing dev.
Jimmy Assarsson [Tue, 9 Mar 2021 09:17:23 +0000 (10:17 +0100)]
can: kvaser_pciefd: Always disable bus load reporting
Under certain circumstances, when switching from Kvaser's linuxcan driver
(kvpciefd) to the SocketCAN driver (kvaser_pciefd), the bus load reporting
is not disabled.
This is flooding the kernel log with prints like:
[3485.574677] kvaser_pciefd 0000:02:00.0: Received unexpected packet type 0x00000009
Always put the controller in the expected state, instead of assuming that
bus load reporting is inactive.
Note: If bus load reporting is enabled when the driver is loaded, you will
still get a number of bus load packages (and printouts), before it is
disabled.
can: flexcan: flexcan_chip_freeze(): fix chip freeze for missing bitrate
For cases when flexcan is built-in, bitrate is still not set at
registering. So flexcan_chip_freeze() generates:
[ 1.860000] *** ZERO DIVIDE *** FORMAT=4
[ 1.860000] Current process id is 1
[ 1.860000] BAD KERNEL TRAP: 00000000
[ 1.860000] PC: [<402e70c8>] flexcan_chip_freeze+0x1a/0xa8
To allow chip freeze, using an hardcoded timeout when bitrate is still
not set.
Fixes: 6c15f0256dc9 ("can: flexcan: enable RX FIFO after FRZ/HALT valid") Link: https://lore.kernel.org/r/20210315231510.650593-1-angelo@kernel-space.org Signed-off-by: Angelo Dureghello <angelo@kernel-space.org>
[mkl: use if instead of ? operator] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Since the peak_usb driver also supports the CAN-USB interfaces
"PCAN-USB X6" and "PCAN-Chip USB" from PEAK-System GmbH, this patch adds
their names to the list of explicitly supported devices.
Fixes: 4d074aad1b4a ("can: usb: Add support of PCAN-Chip USB stamp module") Fixes: 79812bfd1764 ("can: peak: Add support for PCAN-USB X6 USB interface") Link: https://lore.kernel.org/r/20210309082128.23125-3-s.grosjean@peak-system.com Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
can: isotp: TX-path: ensure that CAN frame flags are initialized
The previous patch ensures that the TX flags (struct
can_isotp_ll_options::tx_flags) are 0 for classic CAN frames or a user
configured value for CAN-FD frames.
This patch sets the CAN frames flags unconditionally to the ISO-TP TX
flags, so that they are initialized to a proper value. Otherwise when
running "candump -x" on a classical CAN ISO-TP stream shows wrongly
set "B" and "E" flags.
| $ candump any,0:0,#FFFFFFFF -extA
| [...]
| can0 TX B E 713 [8] 2B 0A 0B 0C 0D 0E 0F 00
| can0 TX B E 713 [8] 2C 01 02 03 04 05 06 07
| can0 TX B E 713 [8] 2D 08 09 0A 0B 0C 0D 0E
| can0 TX B E 713 [8] 2E 0F 00 01 02 03 04 05
Martin Willi [Tue, 2 Mar 2021 12:24:23 +0000 (13:24 +0100)]
can: dev: Move device back to init netns on owning netns delete
When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.
CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:
ip netns add foo
ip link set can0 netns foo
ip netns delete foo
WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
[<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
[<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
[<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
[<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
[<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
[<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
[<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
[<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
[<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.
The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.
Davide Caratti [Mon, 15 Mar 2021 10:41:16 +0000 (11:41 +0100)]
mptcp: fix ADD_ADDR HMAC in case port is specified
Currently, Linux computes the HMAC contained in ADD_ADDR sub-option using
the Address Id and the IP Address, and hardcodes a destination port equal
to zero. This is not ok for ADD_ADDR with port: ensure to account for the
endpoint port when computing the HMAC, in compliance with RFC8684 §3.4.1.
Fixes: 1d0d3d44ede7 ("mptcp: add port support for ADD_ADDR suboption writing") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: relookup sock for RST+ACK packets handled by obsolete req sock
Currently tcp_check_req can be called with obsolete req socket for which big
socket have been already created (because of CPU race or early demux
assigning req socket to multiple packets in gro batch).
Commit 2b027671dde9cb6409d9 ("tcp: try to keep packet if SYN_RCV race
is lost") added retry in case when tcp_check_req is called for PSH|ACK packet.
But if client sends RST+ACK immediatly after connection being
established (it is performing healthcheck, for example) retry does not
occur. In that case tcp_check_req tries to close req socket,
leaving big socket active.
Fixes: 2b027671dde ("tcp: try to keep packet if SYN_RCV race is lost") Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru> Reported-by: Oleg Senin <olegsenin@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 8f23ce1cbbee ("tipc: add support for AEAD key setting via netlink") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tuong Lien <tuong.t.lien@dektech.com.au> Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Mon, 15 Mar 2021 04:33:42 +0000 (12:33 +0800)]
net: phylink: Fix phylink_err() function name error in phylink_major_config
if pl->mac_ops->mac_finish() failed, phylink_err should use
"mac_finish" instead of "mac_prepare".
Fixes: 0e77ad11c559b ("net: phylink: re-implement interface configuration with PCS") Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xie He [Sun, 14 Mar 2021 11:21:01 +0000 (04:21 -0700)]
net: hdlc_x25: Prevent racing between "x25_close" and "x25_xmit"/"x25_rx"
"x25_close" is called by "hdlc_close" in "hdlc.c", which is called by
hardware drivers' "ndo_stop" function.
"x25_xmit" is called by "hdlc_start_xmit" in "hdlc.c", which is hardware
drivers' "ndo_start_xmit" function.
"x25_rx" is called by "hdlc_rcv" in "hdlc.c", which receives HDLC frames
from "net/core/dev.c".
"x25_close" races with "x25_xmit" and "x25_rx" because their callers race.
However, we need to ensure that the LAPB APIs called in "x25_xmit" and
"x25_rx" are called before "lapb_unregister" is called in "x25_close".
This patch adds locking to ensure when "x25_xmit" and "x25_rx" are doing
their work, "lapb_unregister" is not yet called in "x25_close".
Reasons for not solving the racing between "x25_close" and "x25_xmit" by
calling "netif_tx_disable" in "x25_close":
1. We still need to solve the racing between "x25_close" and "x25_rx";
2. The design of the HDLC subsystem assumes the HDLC hardware drivers
have full control over the TX queue, and the HDLC protocol drivers (like
this driver) have no control. Controlling the queue here in the protocol
driver may interfere with hardware drivers' control of the queue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 15 Mar 2021 10:31:09 +0000 (11:31 +0100)]
netfilter: ctnetlink: fix dump of the expect mask attribute
Before this change, the mask is never included in the netlink message, so
"conntrack -E expect" always prints 0.0.0.0.
In older kernels the l3num callback struct was passed as argument, based
on tuple->src.l3num. After the l3num indirection got removed, the call
chain is based on m.src.l3num, but this value is 0xffff.
Init l3num to the correct value.
Fixes: d4d47e24afda6 ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Mark Tomlinson [Mon, 8 Mar 2021 01:24:13 +0000 (14:24 +1300)]
netfilter: x_tables: Use correct memory barriers.
When a new table value was assigned, it was followed by a write memory
barrier. This ensured that all writes before this point would complete
before any writes after this point. However, to determine whether the
rules are unused, the sequence counter is read. To ensure that all
writes have been done before these reads, a full memory barrier is
needed, not just a write memory barrier. The same argument applies when
incrementing the counter, before the rules are read.
Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic
reported in d59e3d10acbd (which is still present), while still
maintaining the same speed of replacing tables.
The smb_mb() barriers potentially slow the packet path, however testing
has shown no measurable change in performance on a 4-core MIPS64
platform.
Fixes: a52d52e33bbd ("netfilter: get rid of atomic ops in fast path") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This (and the preceding) patch basically re-implemented the RCU
mechanisms of patch 8abf527013d4. That patch was replaced because of the
performance problems that it created when replacing tables. Now, we have
the same issue: the call to synchronize_rcu() makes replacing tables
slower by as much as an order of magnitude.
Prior to using RCU a script calling "iptables" approx. 200 times was
taking 1.16s. With RCU this increased to 11.59s.
Revert these patches and fix the issue in a different way.
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This (and the following) patch basically re-implemented the RCU
mechanisms of patch 8abf527013d4. That patch was replaced because of the
performance problems that it created when replacing tables. Now, we have
the same issue: the call to synchronize_rcu() makes replacing tables
slower by as much as an order of magnitude.
Revert these patches and fix the issue in a different way.
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Hangbin Liu [Tue, 9 Mar 2021 03:22:14 +0000 (11:22 +0800)]
selftests/bpf: Set gopt opt_class to 0 if get tunnel opt failed
When fixing the bpf test_tunnel.sh geneve failure. I only fixed the IPv4
part but forgot the IPv6 issue. Similar with the IPv4 fixes 4816859dd2cf
("selftests/bpf: No need to drop the packet when there is no geneve opt"),
when there is no tunnel option and bpf_skb_get_tunnel_opt() returns error,
there is no need to drop the packets and break all geneve rx traffic.
Just set opt_class to 0 and keep returning TC_ACT_OK at the end.
Fixes: 4816859dd2cf ("selftests/bpf: No need to drop the packet when there is no geneve opt") Fixes: 3de8ce91aa0e ("selftests/bpf: bpf tunnel test.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: William Tu <u9012063@gmail.com> Link: https://lore.kernel.org/bpf/20210309032214.2112438-1-liuhangbin@gmail.com
flow_dissector: fix byteorder of dissected ICMP ID
flow_dissector_key_icmp::id is of type u16 (CPU byteorder),
ICMP header has its ID field in network byteorder obviously.
Sparse says:
net/core/flow_dissector.c:178:43: warning: restricted __be16 degrades to integer
Convert ID value to CPU byteorder when storing it into
flow_dissector_key_icmp.
Fixes: 68da2600d00e ("flow_dissector: extract more ICMP information") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
Local variable ----addr@____sys_recvmsg created at:
____sys_recvmsg+0x168/0xd50 net/socket.c:2550
____sys_recvmsg+0x168/0xd50 net/socket.c:2550
Bytes 2-3 of 12 are uninitialized
Memory access of size 12 starts at ffff88817c627b40
Data copied to user address 0000000020000140
Fixes: e1acd3e21749 ("net: Add Qualcomm IPC router") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Courtney Cavin <courtney.cavin@sonymobile.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tong Zhang [Sun, 14 Mar 2021 18:08:36 +0000 (14:08 -0400)]
net: arcnet: com20020 fix error handling
There are two issues when handling error case in com20020pci_probe()
1. priv might be not initialized yet when calling com20020pci_remove()
from com20020pci_probe(), since the priv is set at the very last but it
can jump to error handling in the middle and priv remains NULL.
2. memory leak - the net device is allocated in alloc_arcdev but not
properly released if error happens in the middle of the big for loop
Alex Elder [Fri, 12 Mar 2021 15:12:48 +0000 (09:12 -0600)]
net: ipa: terminate message handler arrays
When a QMI handle is initialized, an array of message handler
structures is provided, defining how any received message should
be handled based on its type and message ID. The QMI core code
traverses this array when a message arrives and calls the function
associated with the (type, msg_id) found in the array.
The array is supposed to be terminated with an empty (all zero)
entry though. Without it, an unsupported message will cause
the QMI core code to go past the end of the array.
Fix this bug, by properly terminating the message handler arrays
provided when QMI handles are set up by the IPA driver.
Fixes: 8b29b90afdb07 ("soc: qcom: ipa: AP/modem communications") Reported-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 Mar 2021 02:02:28 +0000 (18:02 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git
/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-03-12
This series contains updates to ice, i40e, ixgbe and igb drivers.
Magnus adjusts the return value for xsk allocation for ice. This fixes
reporting of napi work done and matches the behavior of other Intel NIC
drivers for xsk allocations.
Maciej moves storing of the rx_offset value to after the build_skb flag
is set as this flag affects the offset value for ice, i40e, and ixgbe.
Li RongQing resolves an issue where an Rx buffer can be reused
prematurely with XDP redirect for igb.
====================
Mat Martineau [Sat, 13 Mar 2021 00:43:52 +0000 (16:43 -0800)]
selftests: mptcp: Restore packet capture option in join tests
The join self tests previously used the '-c' command line option to
enable creation of pcap files for the tests that run, but the change to
allow running a subset of the join tests made overlapping use of that
option.
Restore the capture functionality with '-c' and move the syncookie test
option to '-k'.
Fixes: c6f269da83d8 ("selftests: mptcp: add command line arguments for mptcp_join.sh") Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lipnitskiy [Fri, 12 Mar 2021 08:07:03 +0000 (00:07 -0800)]
net: dsa: mt7530: setup core clock even in TRGMII mode
A recent change to MIPS ralink reset logic made it so mt7530 actually
resets the switch on platforms such as mt7621 (where bit 2 is the reset
line for the switch). That exposed an issue where the switch would not
function properly in TRGMII mode after a reset.
Reconfigure core clock in TRGMII mode to fix the issue.
Tested on Ubiquiti ER-X (MT7621) with TRGMII mode enabled.
Fixes: 79c924b824e5 ("MIPS: ralink: manage low reset lines") Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 12 Mar 2021 07:41:12 +0000 (10:41 +0300)]
mptcp: fix bit MPTCP_PUSH_PENDING tests
The MPTCP_PUSH_PENDING define is 6 and these tests should be testing if
BIT(6) is set.
Fixes: ff256403c23a ("mptcp: fix race in release_cb") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 12 Mar 2021 00:52:50 +0000 (16:52 -0800)]
net: phy: broadcom: Fix RGMII delays for BCM50160 and BCM50610M
The PHY driver entry for BCM50160 and BCM50610M calls
bcm54xx_config_init() but does not call bcm54xx_config_clock_delay() in
order to configuration appropriate clock delays on the PHY, fix that.
Fixes: 5be5752c6a31 ("net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delay") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dylan Hung [Fri, 12 Mar 2021 00:34:05 +0000 (11:04 +1030)]
ftgmac100: Restart MAC HW once
The interrupt handler may set the flag to reset the mac in the future,
but that flag is not cleared once the reset has occurred.
Fixes: b2ad46ca4a45 ("ftgmac100: Rework NAPI & interrupts handling") Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Thu, 11 Mar 2021 20:05:18 +0000 (14:05 -0600)]
net: axienet: Fix probe error cleanup
The driver did not always clean up all allocated resources when probe
failed. Fix the probe cleanup path to clean up everything that was
allocated.
Fixes: 66a94da963b3 ("net: axienet: Handle deferred probe on clock properly") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
liuyacan [Thu, 11 Mar 2021 16:32:25 +0000 (00:32 +0800)]
net: correct sk_acceptq_is_full()
The "backlog" argument in listen() specifies
the maximom length of pending connections,
so the accept queue should be considered full
if there are exactly "backlog" elements.
Signed-off-by: liuyacan <yacanliu@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Thu, 21 Jan 2021 21:54:23 +0000 (13:54 -0800)]
igb: avoid premature Rx buffer reuse
Igb needs a similar fix as commit 8c52a2746044b ("i40e: avoid
premature Rx buffer reuse")
The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.
To avoid this, store the page count prior invoking xdp_do_redirect().
Longer explanation:
Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.
t0: Page is allocated, and put on the Rx ring
+---------------
used by NIC ->| upper buffer
(rx_buffer) +---------------
| lower buffer
+---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX
t1: Buffer is received, and passed to the stack (e.g.)
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 1
t2: Buffer is received, and redirected
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
This means that buffer *cannot* be flipped/reused, because the skb is
still using it.
The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
page count == USHRT_MAX - 1
rx_buffer->pagecnt_bias == USHRT_MAX - 2
From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!
To work around this, the page count is stored prior calling
xdp_do_redirect().
Fixes: 15d5613e8254 ("igb: add XDP support") Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ixgbe: move headroom initialization to ixgbe_configure_rx_ring
ixgbe_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on __IXGBE_RX_BUILD_SKB_ENABLED flag.
Currently, the callsite of mentioned function is placed incorrectly
within ixgbe_setup_rx_resources() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
Fix this by moving ixgbe_rx_offset() to ixgbe_configure_rx_ring() after
the flag setting, which happens to be set in ixgbe_set_rx_buffer_len.
Fixes: b253c51cbe6c ("ixgbe: store the result of ixgbe_rx_offset() onto ixgbe_ring") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ice: move headroom initialization to ice_setup_rx_ctx
ice_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on ICE_RX_FLAGS_RING_BUILD_SKB flag as well as XDP prog presence.
Currently, the callsite of mentioned function is placed incorrectly
within ice_setup_rx_ring() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
Fix this by moving ice_rx_offset() to ice_setup_rx_ctx() after the flag
setting.
Fixes: 498dc0b568bc ("ice: store the result of ice_rx_offset() onto ice_ring") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e: move headroom initialization to i40e_configure_rx_ring
i40e_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.
Currently, the callsite of mentioned function is placed incorrectly
within i40e_setup_rx_descriptors() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
For the record, below is the call graph:
i40e_vsi_open
i40e_vsi_setup_rx_resources
i40e_setup_rx_descriptors
i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below
i40e_vsi_configure_rx
i40e_configure_rx_ring
set_ring_build_skb_enabled(ring) <-- set build_skb flag
Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
the flag setting.
Fixes: 5e1b7d2f995d ("i40e: store the result of i40e_rx_offset() onto i40e_ring") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Fri, 5 Feb 2021 09:09:04 +0000 (10:09 +0100)]
ice: fix napi work done reporting in xsk path
Fix the wrong napi work done reporting in the xsk path of the ice
driver. The code in the main Rx processing loop was written to assume
that the buffer allocation code returns true if all allocations where
successful and false if not. In contrast with all other Intel NIC xsk
drivers, the ice_alloc_rx_bufs_zc() has the inverted logic messing up
the work done reporting in the napi loop.
This can be fixed either by inverting the return value from
ice_alloc_rx_bufs_zc() in the function that uses this in an incorrect
way, or by changing the return value of ice_alloc_rx_bufs_zc(). We
chose the latter as it makes all the xsk allocation functions for
Intel NICs behave in the same way. My guess is that it was this
unexpected discrepancy that gave rise to this bug in the first place.
Fixes: f8d31ba87f1a ("ice, xsk: Move Rx allocation out of while-loop") Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David S. Miller [Fri, 12 Mar 2021 02:30:32 +0000 (18:30 -0800)]
Merge branch 'htb-fixes'
Maxim Mikityanskiy says:
====================
Bugfixes for HTB
The HTB offload feature introduced a few bugs in HTB. One affects the
non-offload mode, preventing attaching qdiscs to HTB classes, and the
other affects the error flow, when the netdev doesn't support the
offload, but it was requested. This short series fixes them.
====================
Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_htb: Fix offload cleanup in htb_destroy on htb_init failure
htb_init may fail to do the offload if it's not supported or if a
runtime error happens when allocating direct qdiscs. In those cases
TC_HTB_CREATE command is not sent to the driver, however, htb_destroy
gets called anyway and attempts to send TC_HTB_DESTROY.
It shouldn't happen, because the driver didn't receive TC_HTB_CREATE,
and also because the driver may not support ndo_setup_tc at all, while
q->offload is true, and htb_destroy mistakenly thinks the offload is
supported. Trying to call ndo_setup_tc in the latter case will lead to a
NULL pointer dereference.
This commit fixes the issues with htb_destroy by deferring assignment of
q->offload until after the TC_HTB_CREATE command. The necessary cleanup
of the offload entities is already done in htb_init.
Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com Fixes: c027dd73e5c3 ("sch_htb: Hierarchical QoS hardware offload") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
htb_select_queue assumes it's always the offload mode, and it ends up in
calling ndo_setup_tc without any checks. It may lead to a NULL pointer
dereference if ndo_setup_tc is not implemented, or to an error returned
from the driver, which will prevent attaching qdiscs to HTB classes in
the non-offload mode.
This commit fixes the bug by adding the missing check to
htb_select_queue. In the non-offload mode it will return sch->dev_queue,
mimicking tc_modify_qdisc's behavior for the case where select_queue is
not implemented.
Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com Fixes: c027dd73e5c3 ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>