net/mlx5e: Fix division by 0 in mlx5e_select_queue
mlx5e_select_queue compares num_tc_x_num_ch to real_num_tx_queues to
determine if HTB and/or PTP offloads are active. If they are, it
calculates netdev_pick_tx() % num_tc_x_num_ch to prevent it from
selecting HTB and PTP queues for regular traffic. However, before the
channels are first activated, num_tc_x_num_ch is zero. If
ndo_select_queue gets called at this point, the HTB/PTP check will pass,
and mlx5e_select_queue will attempt to take a modulo by num_tc_x_num_ch,
which equals to zero.
This commit fixes the bug by assigning num_tc_x_num_ch to a non-zero
value before registering the netdev.
Dima Chumak [Thu, 4 Mar 2021 19:28:11 +0000 (21:28 +0200)]
net/mlx5e: Offload tuple rewrite for non-CT flows
Setting connection tracking OVS flows and then setting non-CT flows that
use tuple rewrite action (e.g. mod_tp_dst), causes the latter flows not
being offloaded.
Fix by using a stricter condition in modify_header_match_supported() to
check tuple rewrite support only for flows with CT action. The check is
factored out into standalone modify_tuple_supported() function to aid
readability.
Fixes: 4ae2aaf86eab ("net/mlx5e: CT: Don't offload tuple rewrites for established tuples") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Alaa Hleihel [Wed, 10 Mar 2021 15:01:46 +0000 (17:01 +0200)]
net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
Currently, we support hardware offload only for MPLS over UDP.
However, rules matching on MPLS parameters are now wrongly offloaded
for regular MPLS, without actually taking the parameters into
consideration when doing the offload.
Fix it by rejecting such unsupported rules.
Fixes: dcffbe80a25d ("net/mlx5e: Allow to match on mpls parameters") Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 20 Mar 2021 20:40:08 +0000 (21:40 +0100)]
r8169: fix DMA being used after buffer free if WoL is enabled
IOMMU errors have been reported if WoL is enabled and interface is
brought down. It turned out that the network chip triggers DMA
transfers after the DMA buffers have been freed. For WoL to work we
need to leave rx enabled, therefore simply stop the chip from being
a DMA busmaster.
Fixes: bffe987d4250 ("r8169: add rtl8169_up") Tested-by: Paul Blazejowski <paulb@blazebox.homeip.net> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 Mar 2021 01:58:56 +0000 (18:58 -0700)]
Merge tag 'linux-can-fixes-for-5.12-20210320' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-03-20
this is a pull request of 2 patches for net/master.
The first patch is by Oliver Hartkopp. He fixes the TX-path in the
ISO-TP protocol by properly initializing the outgoing CAN frames.
The second patch is by me and reverts a patch from my previous pull
request which added MODULE_SUPPORTED_DEVICE to the peak_usb driver. In
the mean time in Linus's tree the entirely MODULE_SUPPORTED_DEVICE was
removed. So this reverts the adding of the new MODULE_SUPPORTED_DEVICE
to avoid the merge conflict.
If you prefer to resolve the merge conflict by hand, I'll send a new
pull request without that patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 Mar 2021 01:53:41 +0000 (18:53 -0700)]
Merge branch 'pa-fox-validation'
Alex Elder says:
====================
ipa: fix validation
There is sanity checking code in the IPA driver that's meant to be
enabled only during development. This allows the driver to make
certain assumptions, but not have to verify those assumptions are
true at (operational) runtime. This code is built conditional on
IPA_VALIDATION, set (if desired) inside the IPA makefile.
Unfortunately, this validation code has some errors. First, there
are some mismatched arguments supplied to some dev_err() calls in
ipa_cmd_table_valid() and ipa_cmd_header_valid(), and these are
exposed if validation is enabled. Second, the tag that enables
this conditional code isn't used consistently (it's IPA_VALIDATE
in some spots and IPA_VALIDATION in others).
This series fixes those two problems with the conditional validation
code.
Version 2 removes the two patches that introduced ipa_assert(). It
also modifies the description in the first patch so that it mentions
the changes made to ipa_cmd_table_valid().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Sat, 20 Mar 2021 14:17:28 +0000 (09:17 -0500)]
net: ipa: fix init header command validation
We use ipa_cmd_header_valid() to ensure certain values we will
program into hardware are within range, well in advance of when we
actually program them. This way we avoid having to check for errors
when we actually program the hardware.
Unfortunately the dev_err() call for a bad offset value does not
supply the arguments to match the format specifiers properly.
Fix this.
There was also supposed to be a check to ensure the size to be
programmed fits in the field that holds it. Add this missing check.
Rearrange the way we ensure the header table fits in overall IPA
memory range.
Finally, update ipa_cmd_table_valid() so the format of messages
printed for errors matches what's done in ipa_cmd_header_valid().
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
In commit f36e837328a9 ("module: remove never implemented
MODULE_SUPPORTED_DEVICE") the MODULE_SUPPORTED_DEVICE macro was
removed from the kerne entirely. Shortly before this patch was applied
mainline the commit 5b643d51db18 ("can: peak_usb: add forgotten
supported devices") was added to net/master. As this would result in a
merge conflict, let's revert this patch.
Oliver Hartkopp [Fri, 19 Mar 2021 10:06:19 +0000 (11:06 +0100)]
can: isotp: tx-path: zero initialize outgoing CAN frames
Commit 5838a9d6a2fe ("can: isotp: TX-path: ensure that CAN frame flags are
initialized") ensured the TX flags to be properly set for outgoing CAN
frames.
In fact the root cause of the issue results from a missing initialization
of outgoing CAN frames created by isotp. This is no problem on the CAN bus
as the CAN driver only picks the correctly defined content from the struct
can(fd)_frame. But when the outgoing frames are monitored (e.g. with
candump) we potentially leak some bytes in the unused content of
struct can(fd)_frame.
Fixes: 3f9425168a78 ("can: add ISO 15765-2:2016 transport protocol") Cc: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20210319100619.10858-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Andrii Nakryiko [Fri, 19 Mar 2021 21:14:44 +0000 (14:14 -0700)]
Merge branch 'libbpf: Fix BTF dump of pointer-to-array-of-struct'
Jean-Philippe Brucker says:
====================
Fix an issue with the libbpf BTF dump, see patch 1 for details.
Since [v1] I added the selftest in patch 2, though I couldn't figure out
a way to make it independent from the order in which debug info is
issued by the compiler.
selftests/bpf: Add selftest for pointer-to-array-of-struct BTF dump
Bpftool used to issue forward declarations for a struct used as part of
a pointer to array, which is invalid. Add a test to check that the
struct is fully defined in this case:
After build with clang -> bpftool btf dump c -> clang/gcc:
./def-clang.h:11:23: error: array has incomplete element type 'struct inner'
struct inner (*ptr_to_array)[2];
Member ptr_to_array of struct outer is a pointer to an array of struct
inner. In the DWARF generated by clang, struct outer appears before
struct inner, so when converting BTF of struct outer into C, bpftool
issues a forward declaration to struct inner. With GCC the DWARF info is
reversed so struct inner gets fully defined.
That forward declaration is not sufficient when compilers handle an
array of the struct, even when it's only used through a pointer. Note
that we can trigger the same issue with an intermediate typedef:
Hangbin Liu [Fri, 19 Mar 2021 14:33:14 +0000 (22:33 +0800)]
selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value
The ECN bit defines ECT(1) = 1, ECT(0) = 2. So inner 0x02 + outer 0x01
should be inner ECT(0) + outer ECT(1). Based on the description of
__INET_ECN_decapsulate, the final decapsulate value should be
ECT(1). So fix the test expect value to 0x01.
Before the fix:
TEST: VXLAN: ECN decap: 01/02->0x02 [FAIL]
Expected to capture 10 packets, got 0.
After the fix:
TEST: VXLAN: ECN decap: 01/02->0x01 [ OK ]
Fixes: 6e32157f3984 ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau [Fri, 19 Mar 2021 18:33:02 +0000 (11:33 -0700)]
mptcp: Change mailing list address
The mailing list for MPTCP maintenance has moved to the
kernel.org-supported mptcp@lists.linux.dev address.
Complete, combined archives for both lists are now hosted at
https://lore.kernel.org/mptcp
Cc: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David Brazdil [Fri, 19 Mar 2021 13:05:41 +0000 (13:05 +0000)]
selinux: vsock: Set SID for socket returned by accept()
For AF_VSOCK, accept() currently returns sockets that are unlabelled.
Other socket families derive the child's SID from the SID of the parent
and the SID of the incoming packet. This is typically done as the
connected socket is placed in the queue that accept() removes from.
Reuse the existing 'security_sk_clone' hook to copy the SID from the
parent (server) socket to the child. There is no packet SID in this
case.
Fixes: 031acdb8ab77 ("VSOCK: Introduce VM Sockets") Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Corentin Labbe [Fri, 19 Mar 2021 13:44:22 +0000 (13:44 +0000)]
net: stmmac: dwmac-sun8i: Provide TX and RX fifo sizes
MTU cannot be changed on dwmac-sun8i. (ip link set eth0 mtu xxx returning EINVAL)
This is due to tx_fifo_size being 0, since this value is used to compute valid
MTU range.
Like dwmac-sunxi (with commit a6fbf4e03518 ("net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes"))
dwmac-sun8i need to have tx and rx fifo sizes set.
I have used values from datasheets.
After this patch, setting a non-default MTU (like 1000) value works and network is still useable.
Tested-on: sun8i-h3-orangepi-pc
Tested-on: sun8i-r40-bananapi-m2-ultra
Tested-on: sun50i-a64-bananapi-m64
Tested-on: sun50i-h5-nanopi-neo-plus2
Tested-on: sun50i-h6-pine-h64 Fixes: 6f537725a8e ("net-next: stmmac: Add dwmac-sun8i") Reported-by: Belisko Marek <marek.belisko@gmail.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Fri, 19 Mar 2021 07:37:21 +0000 (15:37 +0800)]
r8152: limit the RX buffer size of RTL8153A for USB 2.0
If the USB host controller is EHCI, the throughput is reduced from
300Mb/s to 60Mb/s, when the rx buffer size is modified from 16K to
32K.
According to the EHCI spec, the maximum size of the qTD is 20K.
Therefore, when the driver uses more than 20K buffer, the latency
time of EHCI would be increased. And, it let the RTL8153A get worse
throughput.
However, the driver uses alloc_pages() for rx buffer, so I limit
the rx buffer to 16K rather than 20K.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205923 Fixes: a5ac476703cf ("r8152: separate the rx buffer size") Reported-by: Robert Davies <robdavies1977@gmail.com> Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 19 Mar 2021 03:52:41 +0000 (11:52 +0800)]
sctp: move sk_route_caps check and set into sctp_outq_flush_transports
The sk's sk_route_caps is set in sctp_packet_config, and later it
only needs to change when traversing the transport_list in a loop,
as the dst might be changed in the tx path.
So move sk_route_caps check and set into sctp_outq_flush_transports
from sctp_packet_transmit. This also fixes a dst leak reported by
Chen Yi:
As calling sk_setup_caps() in sctp_packet_transmit may also set the
sk_route_caps for the ctrl sock in a netns. When the netns is being
deleted, the ctrl sock's releasing is later than dst dev's deleting,
which will cause this dev's deleting to hang and dmesg error occurs:
unregister_netdevice: waiting for xxx to become free. Usage count = 1
Reported-by: Chen Yi <yiche@redhat.com> Fixes: fc02dd43bd02 ("sctp: call sk_setup_caps in sctp_packet_transmit instead") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add a couple of checks to make sure timestamping is on and that the
timestamp value from DMA is valid. This avoids any functional issues
that could come from a misinterpreted time stamp.
One of the functions changed doesn't need a return value added because
there was no value in checking from the calling locations.
While here, fix a couple of reverse christmas tree issues next to
the code being changed.
Fixes: 0fa12b3c5ebf ("igb: Pull timestamp from fragment before adding it to skb") Fixes: 15d5613e8254 ("igb: add XDP support") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tom Seewald [Mon, 22 Feb 2021 04:00:05 +0000 (22:00 -0600)]
igb: Fix duplicate include guard
The include guard "_E1000_HW_H_" is used by two separate header files in
two different drivers (e1000/e1000_hw.h and igb/e1000_hw.h). Using the
same include guard macro in more than one header file may cause
unexpected behavior from the compiler. Fix this by renaming the
duplicate guard in the igb driver.
Fixes: d49a2a41710f ("igb: PCI-Express 82575 Gigabit Ethernet driver") Signed-off-by: Tom Seewald <tseewald@gmail.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tom Seewald [Mon, 22 Feb 2021 04:00:04 +0000 (22:00 -0600)]
e1000e: Fix duplicate include guard
The include guard "_E1000_HW_H_" is used by header files in three
different drivers (e1000/e1000_hw.h, e1000e/hw.h, and igb/e1000_hw.h).
Using the same include guard macro in more than one header file may
cause unexpected behavior from the compiler. Fix the duplicate include
guard in the e1000e driver by renaming it.
Fixes: 3f32e81cc0ea ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)") Signed-off-by: Tom Seewald <tseewald@gmail.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Johan Hovold [Thu, 18 Mar 2021 15:57:49 +0000 (16:57 +0100)]
net: cdc-phonet: fix data-interface release on probe failure
Set the disconnected flag before releasing the data interface in case
netdev registration fails to avoid having the disconnect callback try to
deregister the never registered netdev (and trigger a WARN_ON()).
Fixes: 3791006fe8d8 ("USB host CDC Phonet network interface driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Bohac [Thu, 18 Mar 2021 03:42:53 +0000 (04:42 +0100)]
net: check all name nodes in __dev_alloc_name
__dev_alloc_name(), when supplied with a name containing '%d',
will search for the first available device number to generate a
unique device name.
Since commit 25be6dea04db1cadb8b6172cdcc13962846db444 ("net:
introduce name_node struct to be used in hashlist") network
devices may have alternate names. __dev_alloc_name() does take
these alternate names into account, possibly generating a name
that is already taken and failing with -ENFILE as a result.
This demonstrates the bug:
# rmmod dummy 2>/dev/null
# ip link property add dev lo altname dummy0
# modprobe dummy numdummies=1
modprobe: ERROR: could not insert 'dummy': Too many open files in system
Instead of creating a device named dummy1, modprobe fails.
Fix this by checking all the names in the d->name_node list, not just d->name.
Signed-off-by: Jiri Bohac <jbohac@suse.cz> Fixes: 25be6dea04db ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series of patches fixes various issues related to NPC MCAM entry
management, debugfs, devlink, CGX LMAC mapping, RSS config etc
Change-log:
v2:
Fixed below review comments
- corrected Fixed tag syntax with 12 digits SHA1
and providing space between SHA1 and subject line
- remove code improvement patch
- make commit description more clear
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Thu, 18 Mar 2021 14:15:48 +0000 (19:45 +0530)]
octeontx2-af: fix infinite loop in unmapping NPC counter
unmapping npc counter works in a way by traversing all mcam
entries to find which mcam rule is associated with counter.
But loop cursor variable 'entry' is not incremented before
checking next mcam entry which resulting in infinite loop.
This in turn hogs the kworker thread forever and no other
mbox message is processed by AF driver after that.
Fix this by updating entry value before checking next
mcam entry.
Fixes: edc518a52b20 ("octeontx2-af: Map or unmap NPC MCAM entry and counter") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Thu, 18 Mar 2021 14:15:47 +0000 (19:45 +0530)]
octeontx2-pf: Clear RSS enable flag on interace down
RSS configuration can not be get/set when interface is in down state
as they required mbox communication. RSS enable flag status
is used for set/get configuration. Current code do not clear the
RSS enable flag on interface down which lead to mbox error while
trying to set/get RSS configuration.
Fixes: 1dee7ca9acad ("octeontx2-pf: Receive side scaling support") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Thu, 18 Mar 2021 14:15:46 +0000 (19:45 +0530)]
octeontx2-af: Fix irq free in rvu teardown
Current devlink code try to free already freed irqs as the
irq_allocate flag is not cleared after free leading to kernel
crash while removing rvu driver. The patch fixes the irq free
sequence and clears the irq_allocate flag on free.
Fixes: ea924d0ed956 ("octeontx2-af: Add mailbox IRQ and msg handlers") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CGX receive buffer size is a constant value and
cannot be read from CGX0 block always since
CGX0 may not enabled everytime. Hence return CGX
receive buffer size from first enabled CGX block
instead of CGX0.
Fixes: e2cbd6fcef41 ("octeontx2-af: cn10K: MTU configuration") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The MKEX profile describes what packet fields need to be extracted from
the input packet and how to place those packet fields in the output key
for MCAM matching. The MKEX profile can be in a way where higher layer
packet fields can overwrite lower layer packet fields in output MCAM
Key.
Hence MKEX profile is always ensured that there are no overlaps between
any of the layers. But the commit 978930dabe82
("octeontx2-af: cleanup KPU config data") introduced TX TOS field which
overlaps with DMAC in MCAM key.
This led to AF driver returning error when TX rule is installed with
DMAC as match criteria since DMAC gets overwritten and cannot be
supported. This patch fixes the issue by removing TOS field from MKEX TX
profile.
Fixes: 978930dabe82 ("octeontx2-af: cleanup KPU config data") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netsec: restore phy power state after controller reset
Since commit be515903fa81 ("net: socionext: Stop PHY before resetting
netsec") netsec_netdev_init() power downs phy before resetting the
controller. However, the state is not restored once the reset is
complete. As a result it is not possible to bring up network on a
platform with Broadcom BCM5482 phy.
Fix the issue by restoring phy power state after controller reset is
complete.
Fixes: be515903fa81 ("net: socionext: Stop PHY before resetting netsec") Cc: stable@vger.kernel.org Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5c798a550940 ("ipv6: drop incoming packets having a v4mapped
source address") introduced an input check against v4mapped addresses.
Use of such addresses on the wire is indeed questionable and not
allowed on public Internet. As the commit pointed out
Unfortunately there are applications which use v4mapped addresses,
and breaking them is a clear regression. For example v4mapped
addresses (or any semi-valid addresses, really) may be used
for uni-direction event streams or packet export.
Since the issue which sparked the addition of the check was with
TCP and request_socks in particular push the check down to TCPv6
and DCCP. This restores the ability to receive UDPv6 packets with
v4mapped address as the source.
Keep using the IPSTATS_MIB_INHDRERRORS statistic to minimize the
user-visible changes.
Fixes: 5c798a550940 ("ipv6: drop incoming packets having a v4mapped source address") Reported-by: Sunyi Shao <sunyishao@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftest/bpf: Add a test to check trampoline freeing logic.
Add a selftest for commit f7fede7bce91 ("bpf: Fix fexit trampoline.")
to make sure that attaching fexit prog to a sleeping kernel function
will trigger appropriate trampoline and program destruction.
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 4 day(s) which contain
a total of 14 files changed, 336 insertions(+), 94 deletions(-).
The main changes are:
1) Fix fexit/fmod_ret trampoline for sleepable programs, and also fix a ftrace
splat in modify_ftrace_direct() on address change, from Alexei Starovoitov.
2) Fix two oob speculation possibilities that allows unprivileged to leak mem
via side-channel, from Piotr Krysiuk and Daniel Borkmann.
netfilter: nftables: skip hook overlap logic if flowtable is stale
If the flowtable has been previously removed in this batch, skip the
hook overlap checks. This fixes spurious EEXIST errors when removing and
adding the flowtable in the same batch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
libbpf: Use SOCK_CLOEXEC when opening the netlink socket
Otherwise, there exists a small window between the opening and closing
of the socket fd where it may leak into processes launched by some other
thread.
Fixes: 2e8290ccfe4a ("libbpf: add function to setup XDP") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210317115857.6536-1-memxor@gmail.com
Yinjun Zhang [Wed, 17 Mar 2021 12:42:24 +0000 (13:42 +0100)]
netfilter: flowtable: Make sure GC works periodically in idle system
Currently flowtable's GC work is initialized as deferrable, which
means GC cannot work on time when system is idle. So the hardware
offloaded flow may be deleted for timeout, since its used time is
not timely updated.
Resolve it by initializing the GC work as delayed work instead of
deferrable.
Fixes: c550be63df75 ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
The synchronize_rcu_tasks() will not wait for such tasks to complete.
In such case the trampoline image will be freed and when the task
wakes up the return IP will point to freed memory causing the crash.
Solve this by adding percpu_ref_get/put for the duration of trampoline
and separate trampoline vs its image life times.
The "half page" optimization has to be removed, since
first_half->second_half->first_half transition cannot be guaranteed to
complete in deterministic time. Every trampoline update becomes a new image.
The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
call_rcu_tasks. Together they will wait for the original function and
trampoline asm to complete. The trampoline is patched from nop to jmp to skip
fexit progs. They are freed independently from the trampoline. The image with
fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
will wait for both sleepable and non-sleepable progs to complete.
Fixes: 2cefb9dd93cc ("bpf: Introduce BPF trampoline") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
Wei Wang [Tue, 16 Mar 2021 22:36:47 +0000 (15:36 -0700)]
net: fix race between napi kthread mode and busy poll
Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
determine if the kthread owns this napi and could call napi->poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi->poll()
invokes napi_complete_done() and clears SCHED bit) and tries to poll
on the same napi. napi_disable() could grab the SCHED bit as well.
This patch tries to fix this race by adding a new bit
NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
____napi_schedule() if the threaded mode is enabled, and gets cleared
in napi_complete_done(), and we only poll the napi in kthread if this
bit is set. This helps distinguish the ownership of the napi between
kthread and other scenarios and fixes the race issue.
Fixes: 19a9abdd15a9 ("net: implement threaded-able napi poll loop support") Reported-by: Martin Zaharinov <micron10@gmail.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Wei Wang <weiwan@google.com> Cc: Alexander Duyck <alexanderduyck@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Piotr Krysiuk [Tue, 16 Mar 2021 10:44:42 +0000 (11:44 +0100)]
bpf, selftests: Fix up some test_verifier cases for unprivileged
Fix up test_verifier error messages for the case where the original error
message changed, or for the case where pointer alu errors differ between
privileged and unprivileged tests. Also, add alternative tests for keeping
coverage of the original verifier rejection error message (fp alu), and
newly reject map_ptr += rX where rX == 0 given we now forbid alu on these
types for unprivileged. All test_verifier cases pass after the change. The
test case fixups were kept separate to ease backporting of core changes.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Piotr Krysiuk [Tue, 16 Mar 2021 08:47:02 +0000 (09:47 +0100)]
bpf: Add sanity check for upper ptr_limit
Given we know the max possible value of ptr_limit at the time of retrieving
the latter, add basic assertions, so that the verifier can bail out if
anything looks odd and reject the program. Nothing triggered this so far,
but it also does not hurt to have these.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
David S. Miller [Wed, 17 Mar 2021 19:21:15 +0000 (12:21 -0700)]
Merge tag 'mac80211-for-net-2021-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
First round of fixes for 5.12-rc:
* HE (802.11ax) elements can be extended, handle that
* fix locking in network namespace changes that was
broken due to the RTNL-redux work
* various other small fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
wenxu [Wed, 17 Mar 2021 04:02:43 +0000 (12:02 +0800)]
net/sched: cls_flower: fix only mask bit check in the validate_ct_state
The ct_state validate should not only check the mask bit and also
check mask_bit & key_bit..
For the +new+est case example, The 'new' and 'est' bits should be
set in both state_mask and state flags. Or the -new-est case also
will be reject by kernel.
When Openvswitch with two flows
ct_state=+trk+new,action=commit,forward
ct_state=+trk+est,action=forward
A packet go through the kernel and the contrack state is invalid,
The ct_state will be +trk-inv. Upcall to the ovs-vswitchd, the
finally dp action will be drop with -new-est+trk.
Fixes: f544e47316a3 ("net/sched: cls_flower: Reject invalid ct_state flags rules") Fixes: 62564537eb6c ("net/sched: cls_flower: validate ct_state for invalid and reply flags") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 17 Mar 2021 00:07:47 +0000 (17:07 -0700)]
ionic: linearize tso skb with too many frags
We were linearizing non-TSO skbs that had too many frags, but
we weren't checking number of frags on TSO skbs. This could
lead to a bad page reference when we received a TSO skb with
more frags than the Tx descriptor could support.
v2: use gso_segs rather than yet another division
don't rework the check on the nr_frags
Fixes: 5931eb5d68ae ("ionic: Add Tx and Rx handling") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
Piotr Krysiuk [Tue, 16 Mar 2021 07:26:25 +0000 (08:26 +0100)]
bpf: Simplify alu_limit masking for pointer arithmetic
Instead of having the mov32 with aux->alu_limit - 1 immediate, move this
operation to retrieve_ptr_limit() instead to simplify the logic and to
allow for subsequent sanity boundary checks inside retrieve_ptr_limit().
This avoids in future that at the time of the verifier masking rewrite
we'd run into an underflow which would not sign extend due to the nature
of mov32 instruction.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Piotr Krysiuk [Tue, 16 Mar 2021 07:20:16 +0000 (08:20 +0100)]
bpf: Fix off-by-one for area size in creating mask to left
retrieve_ptr_limit() computes the ptr_limit for registers with stack and
map_value type. ptr_limit is the size of the memory area that is still
valid / in-bounds from the point of the current position and direction
of the operation (add / sub). This size will later be used for masking
the operation such that attempting out-of-bounds access in the speculative
domain is redirected to remain within the bounds of the current map value.
When masking to the right the size is correct, however, when masking to
the left, the size is off-by-one which would lead to an incorrect mask
and thus incorrect arithmetic operation in the non-speculative domain.
Piotr found that if the resulting alu_limit value is zero, then the
BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading
0xffffffff into AX instead of sign-extending to the full 64 bit range,
and as a result, this allows abuse for executing speculatively out-of-
bounds loads against 4GB window of address space and thus extracting the
contents of kernel memory via side-channel.
Fixes: 422fca2ce03f ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Piotr Krysiuk [Tue, 16 Mar 2021 08:47:02 +0000 (09:47 +0100)]
bpf: Prohibit alu ops for pointer types not defining ptr_limit
The purpose of this patch is to streamline error propagation and in particular
to propagate retrieve_ptr_limit() errors for pointer types that are not defining
a ptr_limit such that register-based alu ops against these types can be rejected.
The main rationale is that a gap has been identified by Piotr in the existing
protection against speculatively out-of-bounds loads, for example, in case of
ctx pointers, unprivileged programs can still perform pointer arithmetic. This
can be abused to execute speculatively out-of-bounds loads without restrictions
and thus extract contents of kernel memory.
Fix this by rejecting unprivileged programs that attempt any pointer arithmetic
on unprotected pointer types. The two affected ones are pointer to ctx as well
as pointer to map. Field access to a modified ctx' pointer is rejected at a
later point in time in the verifier, and bf379367a6f9 ("bpf: Permit map_ptr
arithmetic with opcode add and offset 0") only relevant for root-only use cases.
Risk of unprivileged program breakage is considered very low.
Fixes: bf379367a6f9 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") Fixes: 6b913f15182a ("bpf: prevent out-of-bounds speculation") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
The following sequence of commands:
register_ftrace_direct(ip, addr1);
modify_ftrace_direct(ip, addr1, addr2);
unregister_ftrace_direct(ip, addr2);
will cause the kernel to warn:
[ 30.179191] WARNING: CPU: 2 PID: 1961 at kernel/trace/ftrace.c:5223 unregister_ftrace_direct+0x130/0x150
[ 30.180556] CPU: 2 PID: 1961 Comm: test_progs W O 5.12.0-rc2-00378-g86bc10a0a711-dirty #3246
[ 30.182453] RIP: 0010:unregister_ftrace_direct+0x130/0x150
When modify_ftrace_direct() changes the addr from old to new it should update
the addr stored in ftrace_direct_funcs. Otherwise the final
unregister_ftrace_direct() won't find the address and will cause the splat.
Louis Peens [Tue, 16 Mar 2021 18:13:10 +0000 (19:13 +0100)]
nfp: flower: fix pre_tun mask id allocation
pre_tun_rule flows does not follow the usual add-flow path, instead
they are used to update the pre_tun table on the firmware. This means
that if the mask-id gets allocated here the firmware will never see the
"NFP_FL_META_FLAG_MANAGE_MASK" flag for the specific mask id, which
triggers the allocation on the firmware side. This leads to the firmware
mask being corrupted and causing all sorts of strange behaviour.
Fixes: 2d37ba78dc45 ("nfp: flower: offload pre-tunnel rules") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Tue, 16 Mar 2021 18:13:09 +0000 (19:13 +0100)]
nfp: flower: add ipv6 bit to pre_tunnel control message
Differentiate between ipv4 and ipv6 flows when configuring the pre_tunnel
table to prevent them trampling each other in the table.
Fixes: 715782139ab5 ("nfp: flower: update flow merge code to support IPv6 tunnels") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Tue, 16 Mar 2021 18:13:08 +0000 (19:13 +0100)]
nfp: flower: fix unsupported pre_tunnel flows
There are some pre_tunnel flows combinations which are incorrectly being
offloaded without proper support, fix these.
- Matching on MPLS is not supported for pre_tun.
- Match on IPv4/IPv6 layer must be present.
- Destination MAC address must match pre_tun.dev MAC
Fixes: ef3807c69f65 ("nfp: flower: verify pre-tunnel rules") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wenxu [Tue, 16 Mar 2021 08:33:54 +0000 (16:33 +0800)]
net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct
When openvswitch conntrack offload with act_ct action. The first rule
do conntrack in the act_ct in tc subsystem. And miss the next rule in
the tc and fallback to the ovs datapath but miss set post_ct flag
which will lead the ct_state_key with -trk flag.
Fixes: 4c72a1db37dc ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Mar 2021 22:10:43 +0000 (15:10 -0700)]
Merge tag 'linux-can-fixes-for-5.12-20210316' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-03-16
this is a pull request of 11 patches for net/master.
The first patch is by Martin Willi and fixes the deletion of network
name spaces with physical CAN interfaces in them.
The next two patches are by me an fix the ISOTP protocol, to ensure
that unused flags in classical CAN frames are properly initialized to
zero.
Stephane Grosjean contributes a patch for the pcan_usb_fd driver,
which add MODULE_SUPPORTED_DEVICE lines for two supported devices.
Angelo Dureghello's patch for the flexcan driver fixes a potential div
by zero, if the bitrate is not set during driver probe.
Jimmy Assarsson's patch for the kvaser_pciefd disables bus load
reporting in the device, if it was previously enabled by the vendor's
out of tree drier. A patch for the kvaser_usb adds support for a new
device, by adding the appropriate USB product ID.
Tong Zhang contributes two patches for the c_can driver. First a
use-after-free in the c_can_pci driver is fixed, in the second patch
the runtime PM for the c_can_pci is fixed by moving the runtime PM
enable/disable from the core driver to the platform driver.
The last two patches are by Torin Cooper-Bennun for the m_can driver.
First a extraneous msg loss warning is removed then he fixes the
RX-path, which might be blocked by errors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: d44f9d6a6de0 ("selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized.") Signed-off-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 12 Mar 2021 16:36:51 +0000 (11:36 -0500)]
wireless/nl80211: fix wdev_id may be used uninitialized
Build currently fails with -Werror=maybe-uninitialized set:
net/wireless/nl80211.c: In function '__cfg80211_wdev_from_attrs':
net/wireless/nl80211.c:124:44: error: 'wdev_id' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
Easy fix is to just initialize wdev_id to 0, since it's value doesn't
otherwise matter unless have_wdev_id is true.
Fixes: 31793ce79203 ("cfg80211: avoid holding the RTNL when calling the driver") CC: Johannes Berg <johannes@sipsolutions.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jakub Kicinski <kuba@kernel.org> CC: linux-wireless@vger.kernel.org CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20210312163651.1398207-1-jarod@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
mac80211: choose first enabled channel for monitor
Even if the first channel from sband channel list is invalid
or disabled mac80211 ends up choosing it as the default channel
for monitor interfaces, making them not usable.
Fix this by assigning the first available valid or enabled
channel instead.
Johannes Berg [Wed, 10 Mar 2021 20:58:40 +0000 (21:58 +0100)]
nl80211: fix locking for wireless device netns change
We have all the network interfaces marked as netns-local
since the only reasonable thing to do right now is to set
a whole device, including all netdevs, into a different
network namespace. For this reason, we also have our own
way of changing the network namespace.
Unfortunately, the RTNL locking changes broke this, and
it now results in many RTNL assertions. The trivial fix
for those (just hold RTNL for the changes) however leads
to deadlocks in the cfg80211 netdev notifier.
Since we only need the wiphy, and that's still protected
by the RTNL, add a new NL80211_FLAG_NO_WIPHY_MTX flag to
the nl80211 ops and use it to _not_ take the wiphy mutex
but only the RTNL. This way, the notifier does all the
work necessary during unregistration/registration of the
netdevs from the old and in the new namespace.
Daniel Phan [Tue, 9 Mar 2021 20:41:36 +0000 (12:41 -0800)]
mac80211: Check crypto_aead_encrypt for errors
crypto_aead_encrypt returns <0 on error, so if these calls are not checked,
execution may continue with failed encrypts. It also seems that these two
crypto_aead_encrypt calls are the only instances in the codebase that are
not checked for errors.
Brian Norris [Tue, 23 Feb 2021 05:19:26 +0000 (13:19 +0800)]
mac80211: Allow HE operation to be longer than expected.
We observed some Cisco APs sending the following HE Operation IE in
associate response:
ff 0a 24 f4 3f 00 01 fc ff 00 00 00
Its HE operation parameter is 0x003ff4, so the expected total length is
7 which does not match the actual length = 10. This causes association
failing with "HE AP is missing HE Capability/operation."
According to P802.11ax_D4 Table9-94, HE operation is extensible, and
according to 802.11-2016 10.27.8, STA should discard the part beyond
the maximum length and parse the truncated element.
Allow HE operation element to be longer than expected to handle this
case and future extensions.
Fixes: bfb6fca62971 ("mac80211: refactor extended element parsing") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Yen-lin Lai <yenlinlai@chromium.org> Link: https://lore.kernel.org/r/20210223051926.2653301-1-yenlinlai@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Markus Theil [Sat, 13 Feb 2021 13:36:53 +0000 (14:36 +0100)]
mac80211: fix double free in ibss_leave
Clear beacon ie pointer and ie length after free
in order to prevent double free.
==================================================================
BUG: KASAN: double-free or invalid-free \
in ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876
Johannes Berg [Fri, 12 Feb 2021 10:22:14 +0000 (11:22 +0100)]
mac80211: fix rate mask reset
Coverity reported the strange "if (~...)" condition that's
always true. It suggested that ! was intended instead of ~,
but upon further analysis I'm convinced that what really was
intended was a comparison to 0xff/0xffff (in HT/VHT cases
respectively), since this indicates that all of the rates
are enabled.
Change the comparison accordingly.
I'm guessing this never really mattered because a reset to
not having a rate mask is basically equivalent to having a
mask that enables all rates.
can: m_can: m_can_rx_peripheral(): fix RX being blocked by errors
For M_CAN peripherals, m_can_rx_handler() was called with quota = 1,
which caused any error handling to block RX from taking place until
the next time the IRQ handler is called. This had been observed to
cause RX to be blocked indefinitely in some cases.
This is fixed by calling m_can_rx_handler with a sensibly high quota.
Fixes: 15c73136c866 ("can: m_can: Create a m_can platform framework") Link: https://lore.kernel.org/r/20210303144350.4093750-1-torin@maxiluxsystems.com Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning
Message loss from RX FIFO 0 is already handled in
m_can_handle_lost_msg(), with netdev output included.
Removing this warning also improves driver performance under heavy
load, where m_can_do_rx_poll() may be called many times before this
interrupt is cleared, causing this message to be output many
times (thanks Mariusz Madej for this report).
There is a UAF in c_can_pci_remove(). dev is released by
free_c_can_dev() and is used by pci_iounmap(pdev, priv->base) later.
To fix this issue, save the mmio address before releasing dev.
Jimmy Assarsson [Tue, 9 Mar 2021 09:17:23 +0000 (10:17 +0100)]
can: kvaser_pciefd: Always disable bus load reporting
Under certain circumstances, when switching from Kvaser's linuxcan driver
(kvpciefd) to the SocketCAN driver (kvaser_pciefd), the bus load reporting
is not disabled.
This is flooding the kernel log with prints like:
[3485.574677] kvaser_pciefd 0000:02:00.0: Received unexpected packet type 0x00000009
Always put the controller in the expected state, instead of assuming that
bus load reporting is inactive.
Note: If bus load reporting is enabled when the driver is loaded, you will
still get a number of bus load packages (and printouts), before it is
disabled.
can: flexcan: flexcan_chip_freeze(): fix chip freeze for missing bitrate
For cases when flexcan is built-in, bitrate is still not set at
registering. So flexcan_chip_freeze() generates:
[ 1.860000] *** ZERO DIVIDE *** FORMAT=4
[ 1.860000] Current process id is 1
[ 1.860000] BAD KERNEL TRAP: 00000000
[ 1.860000] PC: [<402e70c8>] flexcan_chip_freeze+0x1a/0xa8
To allow chip freeze, using an hardcoded timeout when bitrate is still
not set.
Fixes: 6c15f0256dc9 ("can: flexcan: enable RX FIFO after FRZ/HALT valid") Link: https://lore.kernel.org/r/20210315231510.650593-1-angelo@kernel-space.org Signed-off-by: Angelo Dureghello <angelo@kernel-space.org>
[mkl: use if instead of ? operator] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Since the peak_usb driver also supports the CAN-USB interfaces
"PCAN-USB X6" and "PCAN-Chip USB" from PEAK-System GmbH, this patch adds
their names to the list of explicitly supported devices.
Fixes: 4d074aad1b4a ("can: usb: Add support of PCAN-Chip USB stamp module") Fixes: 79812bfd1764 ("can: peak: Add support for PCAN-USB X6 USB interface") Link: https://lore.kernel.org/r/20210309082128.23125-3-s.grosjean@peak-system.com Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
can: isotp: TX-path: ensure that CAN frame flags are initialized
The previous patch ensures that the TX flags (struct
can_isotp_ll_options::tx_flags) are 0 for classic CAN frames or a user
configured value for CAN-FD frames.
This patch sets the CAN frames flags unconditionally to the ISO-TP TX
flags, so that they are initialized to a proper value. Otherwise when
running "candump -x" on a classical CAN ISO-TP stream shows wrongly
set "B" and "E" flags.
| $ candump any,0:0,#FFFFFFFF -extA
| [...]
| can0 TX B E 713 [8] 2B 0A 0B 0C 0D 0E 0F 00
| can0 TX B E 713 [8] 2C 01 02 03 04 05 06 07
| can0 TX B E 713 [8] 2D 08 09 0A 0B 0C 0D 0E
| can0 TX B E 713 [8] 2E 0F 00 01 02 03 04 05
Martin Willi [Tue, 2 Mar 2021 12:24:23 +0000 (13:24 +0100)]
can: dev: Move device back to init netns on owning netns delete
When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.
CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:
ip netns add foo
ip link set can0 netns foo
ip netns delete foo
WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
[<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
[<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
[<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
[<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
[<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
[<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
[<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
[<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
[<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.
The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.
Davide Caratti [Mon, 15 Mar 2021 10:41:16 +0000 (11:41 +0100)]
mptcp: fix ADD_ADDR HMAC in case port is specified
Currently, Linux computes the HMAC contained in ADD_ADDR sub-option using
the Address Id and the IP Address, and hardcodes a destination port equal
to zero. This is not ok for ADD_ADDR with port: ensure to account for the
endpoint port when computing the HMAC, in compliance with RFC8684 §3.4.1.
Fixes: 1d0d3d44ede7 ("mptcp: add port support for ADD_ADDR suboption writing") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: relookup sock for RST+ACK packets handled by obsolete req sock
Currently tcp_check_req can be called with obsolete req socket for which big
socket have been already created (because of CPU race or early demux
assigning req socket to multiple packets in gro batch).
Commit 2b027671dde9cb6409d9 ("tcp: try to keep packet if SYN_RCV race
is lost") added retry in case when tcp_check_req is called for PSH|ACK packet.
But if client sends RST+ACK immediatly after connection being
established (it is performing healthcheck, for example) retry does not
occur. In that case tcp_check_req tries to close req socket,
leaving big socket active.
Fixes: 2b027671dde ("tcp: try to keep packet if SYN_RCV race is lost") Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru> Reported-by: Oleg Senin <olegsenin@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 8f23ce1cbbee ("tipc: add support for AEAD key setting via netlink") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tuong Lien <tuong.t.lien@dektech.com.au> Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Mon, 15 Mar 2021 04:33:42 +0000 (12:33 +0800)]
net: phylink: Fix phylink_err() function name error in phylink_major_config
if pl->mac_ops->mac_finish() failed, phylink_err should use
"mac_finish" instead of "mac_prepare".
Fixes: 0e77ad11c559b ("net: phylink: re-implement interface configuration with PCS") Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xie He [Sun, 14 Mar 2021 11:21:01 +0000 (04:21 -0700)]
net: hdlc_x25: Prevent racing between "x25_close" and "x25_xmit"/"x25_rx"
"x25_close" is called by "hdlc_close" in "hdlc.c", which is called by
hardware drivers' "ndo_stop" function.
"x25_xmit" is called by "hdlc_start_xmit" in "hdlc.c", which is hardware
drivers' "ndo_start_xmit" function.
"x25_rx" is called by "hdlc_rcv" in "hdlc.c", which receives HDLC frames
from "net/core/dev.c".
"x25_close" races with "x25_xmit" and "x25_rx" because their callers race.
However, we need to ensure that the LAPB APIs called in "x25_xmit" and
"x25_rx" are called before "lapb_unregister" is called in "x25_close".
This patch adds locking to ensure when "x25_xmit" and "x25_rx" are doing
their work, "lapb_unregister" is not yet called in "x25_close".
Reasons for not solving the racing between "x25_close" and "x25_xmit" by
calling "netif_tx_disable" in "x25_close":
1. We still need to solve the racing between "x25_close" and "x25_rx";
2. The design of the HDLC subsystem assumes the HDLC hardware drivers
have full control over the TX queue, and the HDLC protocol drivers (like
this driver) have no control. Controlling the queue here in the protocol
driver may interfere with hardware drivers' control of the queue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 15 Mar 2021 10:31:09 +0000 (11:31 +0100)]
netfilter: ctnetlink: fix dump of the expect mask attribute
Before this change, the mask is never included in the netlink message, so
"conntrack -E expect" always prints 0.0.0.0.
In older kernels the l3num callback struct was passed as argument, based
on tuple->src.l3num. After the l3num indirection got removed, the call
chain is based on m.src.l3num, but this value is 0xffff.
Init l3num to the correct value.
Fixes: d4d47e24afda6 ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Mark Tomlinson [Mon, 8 Mar 2021 01:24:13 +0000 (14:24 +1300)]
netfilter: x_tables: Use correct memory barriers.
When a new table value was assigned, it was followed by a write memory
barrier. This ensured that all writes before this point would complete
before any writes after this point. However, to determine whether the
rules are unused, the sequence counter is read. To ensure that all
writes have been done before these reads, a full memory barrier is
needed, not just a write memory barrier. The same argument applies when
incrementing the counter, before the rules are read.
Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic
reported in d59e3d10acbd (which is still present), while still
maintaining the same speed of replacing tables.
The smb_mb() barriers potentially slow the packet path, however testing
has shown no measurable change in performance on a 4-core MIPS64
platform.
Fixes: a52d52e33bbd ("netfilter: get rid of atomic ops in fast path") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>