git.baikalelectronics.ru Git

net/mlx5e: Fix division by 0 in mlx5e_select_queue

mlx5e_select_queue compares num_tc_x_num_ch to real_num_tx_queues to
determine if HTB and/or PTP offloads are active. If they are, it
calculates netdev_pick_tx() % num_tc_x_num_ch to prevent it from
selecting HTB and PTP queues for regular traffic. However, before the
channels are first activated, num_tc_x_num_ch is zero. If
ndo_select_queue gets called at this point, the HTB/PTP check will pass,
and mlx5e_select_queue will attempt to take a modulo by num_tc_x_num_ch,
which equals to zero.

This commit fixes the bug by assigning num_tc_x_num_ch to a non-zero
value before registering the netdev.

Fixes: 63f9d5b33580 ("net/mlx5e: Support HTB offload")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix error path for ethtool set-priv-flag

Expose error value when failing to comply to command:
$ ethtool --set-priv-flags eth2 rx_cqe_compress [on/off]

Fixes: 7916af0c8b24 ("net/mlx5e: Fail safe cqe compressing/moderation mode setting")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Offload tuple rewrite for non-CT flows

Setting connection tracking OVS flows and then setting non-CT flows that
use tuple rewrite action (e.g. mod_tp_dst), causes the latter flows not
being offloaded.

Fix by using a stricter condition in modify_header_match_supported() to
check tuple rewrite support only for flows with CT action. The check is
factored out into standalone modify_tuple_supported() function to aid
readability.

Fixes: 4ae2aaf86eab ("net/mlx5e: CT: Don't offload tuple rewrites for established tuples")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP

Currently, we support hardware offload only for MPLS over UDP.
However, rules matching on MPLS parameters are now wrongly offloaded
for regular MPLS, without actually taking the parameters into
consideration when doing the offload.
Fix it by rejecting such unsupported rules.

Fixes: dcffbe80a25d ("net/mlx5e: Allow to match on mpls parameters")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add back multicast stats for uplink representor

The multicast counter got removed from uplink representor due to the
cited patch.

Fixes: a66db552d9f6 ("net/mlx5e: Fix multicast counter not up-to-date in "ip -s"")
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

docs: networking: Fix a typo

s/subsytem/subsystem/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: fix DMA being used after buffer free if WoL is enabled

IOMMU errors have been reported if WoL is enabled and interface is
brought down. It turned out that the network chip triggers DMA
transfers after the DMA buffers have been freed. For WoL to work we
need to leave rx enabled, therefore simply stop the chip from being
a DMA busmaster.

Fixes: bffe987d4250 ("r8169: add rtl8169_up")
Tested-by: Paul Blazejowski <paulb@blazebox.homeip.net>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-5.12-20210320' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-03-20

this is a pull request of 2 patches for net/master.

The first patch is by Oliver Hartkopp. He fixes the TX-path in the
ISO-TP protocol by properly initializing the outgoing CAN frames.

The second patch is by me and reverts a patch from my previous pull
request which added MODULE_SUPPORTED_DEVICE to the peak_usb driver. In
the mean time in Linus's tree the entirely MODULE_SUPPORTED_DEVICE was
removed. So this reverts the adding of the new MODULE_SUPPORTED_DEVICE
to avoid the merge conflict.

If you prefer to resolve the merge conflict by hand, I'll send a new
pull request without that patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'pa-fox-validation'

Alex Elder says:

====================
ipa: fix validation

There is sanity checking code in the IPA driver that's meant to be
enabled only during development.  This allows the driver to make
certain assumptions, but not have to verify those assumptions are
true at (operational) runtime.  This code is built conditional on
IPA_VALIDATION, set (if desired) inside the IPA makefile.

Unfortunately, this validation code has some errors.  First, there
are some mismatched arguments supplied to some dev_err() calls in
ipa_cmd_table_valid() and ipa_cmd_header_valid(), and these are
exposed if validation is enabled.  Second, the tag that enables
this conditional code isn't used consistently (it's IPA_VALIDATE
in some spots and IPA_VALIDATION in others).

This series fixes those two problems with the conditional validation
code.

Version 2 removes the two patches that introduced ipa_assert().  It
also modifies the description in the first patch so that it mentions
the changes made to ipa_cmd_table_valid().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: fix init header command validation

We use ipa_cmd_header_valid() to ensure certain values we will
program into hardware are within range, well in advance of when we
actually program them. This way we avoid having to check for errors
when we actually program the hardware.

Unfortunately the dev_err() call for a bad offset value does not
supply the arguments to match the format specifiers properly.
Fix this.

There was also supposed to be a check to ensure the size to be
programmed fits in the field that holds it. Add this missing check.

Rearrange the way we ensure the header table fits in overall IPA
memory range.

Finally, update ipa_cmd_table_valid() so the format of messages
printed for errors matches what's done in ipa_cmd_header_valid().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2021-03-20

The following pull-request contains BPF updates for your *net* tree.

We've added 5 non-merge commits during the last 3 day(s) which contain
a total of 8 files changed, 155 insertions(+), 12 deletions(-).

The main changes are:

1) Use correct nops in fexit trampoline, from Stanislav.

2) Fix BTF dump, from Jean-Philippe.

3) Fix umd memory leak, from Zqiang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

can: peak_usb: Revert "can: peak_usb: add forgotten supported devices"

In commit f36e837328a9 ("module: remove never implemented
MODULE_SUPPORTED_DEVICE") the MODULE_SUPPORTED_DEVICE macro was
removed from the kerne entirely. Shortly before this patch was applied
mainline the commit 5b643d51db18 ("can: peak_usb: add forgotten
supported devices") was added to net/master. As this would result in a
merge conflict, let's revert this patch.

Fixes: 5b643d51db18 ("can: peak_usb: add forgotten supported devices")
Link: https://lore.kernel.org/r/20210320192649.341832-1-mkl@pengutronix.de
Suggested-by: Leon Romanovsky <leon@kernel.org>
Cc: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: tx-path: zero initialize outgoing CAN frames

Commit 5838a9d6a2fe ("can: isotp: TX-path: ensure that CAN frame flags are
initialized") ensured the TX flags to be properly set for outgoing CAN
frames.

In fact the root cause of the issue results from a missing initialization
of outgoing CAN frames created by isotp. This is no problem on the CAN bus
as the CAN driver only picks the correctly defined content from the struct
can(fd)_frame. But when the outgoing frames are monitored (e.g. with
candump) we potentially leak some bytes in the unused content of
struct can(fd)_frame.

Fixes: 3f9425168a78 ("can: add ISO 15765-2:2016 transport protocol")
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20210319100619.10858-1-socketcan@hartkopp.net
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG

__bpf_arch_text_poke does rewrite only for atomic nop5, emit_nops(xxx, 5)
emits non-atomic one which breaks fentry/fexit with k8 atomics:

P6_NOP5 == P6_NOP5_ATOMIC (0f1f440000 == 0f1f440000)
K8_NOP5 != K8_NOP5_ATOMIC (6666906690 != 6666666690)

Can be reproduced by doing "ideal_nops = k8_nops" in "arch_init_ideal_nops()
and running fexit_bpf2bpf selftest.

Fixes: f7fede7bce91 ("bpf: Fix fexit trampoline.")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210320000001.915366-1-sdf@google.com

bpf: Fix umd memory leak in copy_process()

The syzbot reported a memleak as follows:

BUG: memory leak
unreferenced object 0xffff888101b41d00 (size 120):
  comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
  backtrace:
    [<ffffffff8125dc56>] alloc_pid+0x66/0x560
    [<ffffffff81226405>] copy_process+0x1465/0x25e0
    [<ffffffff81227943>] kernel_clone+0xf3/0x670
    [<ffffffff812281a1>] kernel_thread+0x61/0x80
    [<ffffffff81253464>] call_usermodehelper_exec_work
    [<ffffffff81253464>] call_usermodehelper_exec_work+0xc4/0x120
    [<ffffffff812591c9>] process_one_work+0x2c9/0x600
    [<ffffffff81259ab9>] worker_thread+0x59/0x5d0
    [<ffffffff812611c8>] kthread+0x178/0x1b0
    [<ffffffff8100227f>] ret_from_fork+0x1f/0x30

unreferenced object 0xffff888110ef5c00 (size 232):
  comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
  backtrace:
    [<ffffffff8154a0cf>] kmem_cache_zalloc
    [<ffffffff8154a0cf>] __alloc_file+0x1f/0xf0
    [<ffffffff8154a809>] alloc_empty_file+0x69/0x120
    [<ffffffff8154a8f3>] alloc_file+0x33/0x1b0
    [<ffffffff8154ab22>] alloc_file_pseudo+0xb2/0x140
    [<ffffffff81559218>] create_pipe_files+0x138/0x2e0
    [<ffffffff8126c793>] umd_setup+0x33/0x220
    [<ffffffff81253574>] call_usermodehelper_exec_async+0xb4/0x1b0
    [<ffffffff8100227f>] ret_from_fork+0x1f/0x30

After the UMD process exits, the pipe_to_umh/pipe_from_umh and
tgid need to be released.

Fixes: d49be41f7f8f ("bpf: Add kernel module with user mode driver that populates bpffs.")
Reported-by: syzbot+44908bb56d2bfe56b28e@syzkaller.appspotmail.com
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210317030915.2865-1-qiang.zhang@windriver.com

Merge branch 'libbpf: Fix BTF dump of pointer-to-array-of-struct'

Jean-Philippe Brucker says:

====================

Fix an issue with the libbpf BTF dump, see patch 1 for details.

Since [v1] I added the selftest in patch 2, though I couldn't figure out
a way to make it independent from the order in which debug info is
issued by the compiler.

[v1]: https://lore.kernel.org/bpf/20210318122700.396574-1-jean-philippe@linaro.org/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

selftests/bpf: Add selftest for pointer-to-array-of-struct BTF dump

Bpftool used to issue forward declarations for a struct used as part of
a pointer to array, which is invalid. Add a test to check that the
struct is fully defined in this case:

@@ -134,9 +134,9 @@
};
};

-struct struct_in_array {};
+struct struct_in_array;

-struct struct_in_array_typed {};
+struct struct_in_array_typed;

typedef struct struct_in_array_typed struct_in_array_t[2];

@@ -189,3 +189,7 @@
struct struct_with_embedded_stuff _14;
};

+struct struct_in_array {};
+
+struct struct_in_array_typed {};
+
...
#13/1 btf_dump: syntax:FAIL

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210319112554.794552-3-jean-philippe@linaro.org

libbpf: Fix BTF dump of pointer-to-array-of-struct

The vmlinux.h generated from BTF is invalid when building
drivers/phy/ti/phy-gmii-sel.c with clang:

vmlinux.h:61702:27: error: array type has incomplete element type ‘struct reg_field’
61702 |  const struct reg_field (*regfields)[3];
      |                           ^~~~~~~~~

bpftool generates a forward declaration for this struct regfield, which
compilers aren't happy about. Here's a simplified reproducer:

struct inner {
int val;
};
struct outer {
struct inner (*ptr_to_array)[2];
} A;

After build with clang -> bpftool btf dump c -> clang/gcc:
./def-clang.h:11:23: error: array has incomplete element type 'struct inner'
        struct inner (*ptr_to_array)[2];

Member ptr_to_array of struct outer is a pointer to an array of struct
inner. In the DWARF generated by clang, struct outer appears before
struct inner, so when converting BTF of struct outer into C, bpftool
issues a forward declaration to struct inner. With GCC the DWARF info is
reversed so struct inner gets fully defined.

That forward declaration is not sufficient when compilers handle an
array of the struct, even when it's only used through a pointer. Note
that we can trigger the same issue with an intermediate typedef:

struct inner {
        int val;
};
typedef struct inner inner2_t[2];
struct outer {
        inner2_t *ptr_to_array;
} A;

Becomes:

struct inner;
typedef struct inner inner2_t[2];

And causes:

./def-clang.h:10:30: error: array has incomplete element type 'struct inner'
typedef struct inner inner2_t[2];

To fix this, clear through_ptr whenever we encounter an intermediate
array, to make the inner struct part of a strong link and force full
declaration.

Fixes: b653ca9bf78a ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210319112554.794552-2-jean-philippe@linaro.org

selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value

The ECN bit defines ECT(1) = 1, ECT(0) = 2. So inner 0x02 + outer 0x01
should be inner ECT(0) + outer ECT(1). Based on the description of
__INET_ECN_decapsulate, the final decapsulate value should be
ECT(1). So fix the test expect value to 0x01.

Before the fix:
TEST: VXLAN: ECN decap: 01/02->0x02                                 [FAIL]
        Expected to capture 10 packets, got 0.

After the fix:
TEST: VXLAN: ECN decap: 01/02->0x01                                 [ OK ]

Fixes: 6e32157f3984 ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: Change mailing list address

The mailing list for MPTCP maintenance has moved to the
kernel.org-supported mptcp@lists.linux.dev address.

Complete, combined archives for both lists are now hosted at
https://lore.kernel.org/mptcp

Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-03-19

This series contains updates to e1000e and igb drivers.

Tom Seewald fixes duplicate guard issues by including the driver name in
the guard for e1000e and igb.

Jesse adds checks that timestamping is on and valid to avoid possible
issues with a misinterpreted time stamp for igb.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selinux: vsock: Set SID for socket returned by accept()

For AF_VSOCK, accept() currently returns sockets that are unlabelled.
Other socket families derive the child's SID from the SID of the parent
and the SID of the incoming packet. This is typically done as the
connected socket is placed in the queue that accept() removes from.

Reuse the existing 'security_sk_clone' hook to copy the SID from the
parent (server) socket to the child. There is no packet SID in this
case.

Fixes: 031acdb8ab77 ("VSOCK: Introduce VM Sockets")
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-sun8i: Provide TX and RX fifo sizes

MTU cannot be changed on dwmac-sun8i. (ip link set eth0 mtu xxx returning EINVAL)
This is due to tx_fifo_size being 0, since this value is used to compute valid
MTU range.
Like dwmac-sunxi (with commit a6fbf4e03518 ("net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes"))
dwmac-sun8i need to have tx and rx fifo sizes set.
I have used values from datasheets.
After this patch, setting a non-default MTU (like 1000) value works and network is still useable.

Tested-on: sun8i-h3-orangepi-pc
Tested-on: sun8i-r40-bananapi-m2-ultra
Tested-on: sun50i-a64-bananapi-m64
Tested-on: sun50i-h5-nanopi-neo-plus2
Tested-on: sun50i-h6-pine-h64
Fixes: 6f537725a8e ("net-next: stmmac: Add dwmac-sun8i")
Reported-by: Belisko Marek <marek.belisko@gmail.com>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: limit the RX buffer size of RTL8153A for USB 2.0

If the USB host controller is EHCI, the throughput is reduced from
300Mb/s to 60Mb/s, when the rx buffer size is modified from 16K to
32K.

According to the EHCI spec, the maximum size of the qTD is 20K.
Therefore, when the driver uses more than 20K buffer, the latency
time of EHCI would be increased. And, it let the RTL8153A get worse
throughput.

However, the driver uses alloc_pages() for rx buffer, so I limit
the rx buffer to 16K rather than 20K.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205923
Fixes: a5ac476703cf ("r8152: separate the rx buffer size")
Reported-by: Robert Davies <robdavies1977@gmail.com>
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_red: Fix a typo

s/recalcultion/recalculation/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: move sk_route_caps check and set into sctp_outq_flush_transports

The sk's sk_route_caps is set in sctp_packet_config, and later it
only needs to change when traversing the transport_list in a loop,
as the dst might be changed in the tx path.

So move sk_route_caps check and set into sctp_outq_flush_transports
from sctp_packet_transmit. This also fixes a dst leak reported by
Chen Yi:

https://bugzilla.kernel.org/show_bug.cgi?id=212227

As calling sk_setup_caps() in sctp_packet_transmit may also set the
sk_route_caps for the ctrl sock in a netns. When the netns is being
deleted, the ctrl sock's releasing is later than dst dev's deleting,
which will cause this dev's deleting to hang and dmesg error occurs:

unregister_netdevice: waiting for xxx to become free. Usage count = 1

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: fc02dd43bd02 ("sctp: call sk_setup_caps in sctp_packet_transmit instead")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: check timestamp validity

Add a couple of checks to make sure timestamping is on and that the
timestamp value from DMA is valid. This avoids any functional issues
that could come from a misinterpreted time stamp.

One of the functions changed doesn't need a return value added because
there was no value in checking from the calling locations.

While here, fix a couple of reverse christmas tree issues next to
the code being changed.

Fixes: 0fa12b3c5ebf ("igb: Pull timestamp from fragment before adding it to skb")
Fixes: 15d5613e8254 ("igb: add XDP support")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igb: Fix duplicate include guard

The include guard "_E1000_HW_H_" is used by two separate header files in
two different drivers (e1000/e1000_hw.h and igb/e1000_hw.h). Using the
same include guard macro in more than one header file may cause
unexpected behavior from the compiler. Fix this by renaming the
duplicate guard in the igb driver.

Fixes: d49a2a41710f ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e1000e: Fix duplicate include guard

The include guard "_E1000_HW_H_" is used by header files in three
different drivers (e1000/e1000_hw.h, e1000e/hw.h, and igb/e1000_hw.h).
Using the same include guard macro in more than one header file may
cause unexpected behavior from the compiler. Fix the duplicate include
guard in the e1000e driver by renaming it.

Fixes: 3f32e81cc0ea ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)")
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

net: cdc-phonet: fix data-interface release on probe failure

Set the disconnected flag before releasing the data interface in case
netdev registration fails to avoid having the disconnect callback try to
deregister the never registered netdev (and trigger a WARN_ON()).

Fixes: 3791006fe8d8 ("USB host CDC Phonet network interface driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Several patches to testore use of memory barriers instead of RCU to
   ensure consistent access to ruleset, from Mark Tomlinson.

2) Fix dump of expectation via ctnetlink, from Florian Westphal.

3) GRE helper works for IPv6, from Ludovic Senecaux.

4) Set error on unsupported flowtable flags.

5) Use delayed instead of deferrable workqueue in the flowtable,
   from Yinjun Zhang.

6) Fix spurious EEXIST in case of add-after-delete flowtable in
   the same batch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: check all name nodes in __dev_alloc_name

__dev_alloc_name(), when supplied with a name containing '%d',
will search for the first available device number to generate a
unique device name.

Since commit 25be6dea04db1cadb8b6172cdcc13962846db444 ("net:
introduce name_node struct to be used in hashlist") network
devices may have alternate names.  __dev_alloc_name() does take
these alternate names into account, possibly generating a name
that is already taken and failing with -ENFILE as a result.

This demonstrates the bug:

    # rmmod dummy 2>/dev/null
    # ip link property add dev lo altname dummy0
    # modprobe dummy numdummies=1
    modprobe: ERROR: could not insert 'dummy': Too many open files in system

Instead of creating a device named dummy1, modprobe fails.

Fix this by checking all the names in the d->name_node list, not just d->name.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: 25be6dea04db ("net: introduce name_node struct to be used in hashlist")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: Remove reference to CONFIG_MV64X60

Commit 4b4b1325bc1a ("powerpc/embedded6xx: Remove C2K board support")
removed last selector of CONFIG_MV64X60.

As it is not a user selectable config item, all references to it
are stale. Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'octeontx2-fixes'

Hariprasad Kelam says:

====================
octeontx2: miscellaneous fixes

This series of patches fixes various issues related to NPC MCAM entry
management, debugfs, devlink, CGX LMAC mapping, RSS config etc

Change-log:
v2:
Fixed below review comments
- corrected Fixed tag syntax with 12 digits SHA1
and providing space between SHA1 and subject line
- remove code improvement patch
- make commit description more clear
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Fix uninitialized variable warning

Initialize l4_key_offset variable to fix uninitialized
variable compiler warning.

Fixes: 5bfad498401d ("octeontx2-af: Support ESP/AH RSS hashing")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: fix infinite loop in unmapping NPC counter

unmapping npc counter works in a way by traversing all mcam
entries to find which mcam rule is associated with counter.
But loop cursor variable 'entry' is not incremented before
checking next mcam entry which resulting in infinite loop.

This in turn hogs the kworker thread forever and no other
mbox message is processed by AF driver after that.
Fix this by updating entry value before checking next
mcam entry.

Fixes: edc518a52b20 ("octeontx2-af: Map or unmap NPC MCAM entry and counter")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Clear RSS enable flag on interace down

RSS configuration can not be get/set when interface is in down state
as they required mbox communication. RSS enable flag status
is used for set/get configuration. Current code do not clear the
RSS enable flag on interface down which lead to mbox error while
trying to set/get RSS configuration.

Fixes: 1dee7ca9acad ("octeontx2-pf: Receive side scaling support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Fix irq free in rvu teardown

Current devlink code try to free already freed irqs as the
irq_allocate flag is not cleared after free leading to kernel
crash while removing rvu driver. The patch fixes the irq free
sequence and clears the irq_allocate flag on free.

Fixes: ea924d0ed956 ("octeontx2-af: Add mailbox IRQ and msg handlers")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Return correct CGX RX fifo size

CGX receive buffer size is a constant value and
cannot be read from CGX0 block always since
CGX0 may not enabled everytime. Hence return CGX
receive buffer size from first enabled CGX block
instead of CGX0.

Fixes: e2cbd6fcef41 ("octeontx2-af: cn10K: MTU configuration")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Remove TOS field from MKEX TX

The MKEX profile describes what packet fields need to be extracted from
the input packet and how to place those packet fields in the output key
for MCAM matching. The MKEX profile can be in a way where higher layer
packet fields can overwrite lower layer packet fields in output MCAM
Key.
Hence MKEX profile is always ensured that there are no overlaps between
any of the layers. But the commit 978930dabe82
("octeontx2-af: cleanup KPU config data") introduced TX TOS field which
overlaps with DMAC in MCAM key.
This led to AF driver returning error when TX rule is installed with
DMAC as match criteria since DMAC gets overwritten and cannot be
supported. This patch fixes the issue by removing TOS field from MKEX TX
profile.

Fixes: 978930dabe82 ("octeontx2-af: cleanup KPU config data")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Formatting debugfs entry rsrc_alloc.

With the existing rsrc_alloc's format, there is misalignment for the
pcifunc entries whose VF's index is a double digit. This patch fixes
this.

    pcifunc     NPA         NIX0        NIX1        SSO GROUP   SSOWS
    TIM         CPT0        CPT1        REE0        REE1
    PF0:VF0     8           5
    PF0:VF1     9                       3
    PF0:VF10    18          10
    PF0:VF11    19                      8
    PF0:VF12    20          11
    PF0:VF13    21                      9
    PF0:VF14    22          12
    PF0:VF15    23                      10
    PF1         0           0

Fixes: aa130fb943aa ("octeontx2-af: Dump current resource provisioning status")
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Do not modify number of rules

In the ETHTOOL_GRXCLSRLALL ioctl ethtool uses
below structure to read number of rules from the driver.

    struct ethtool_rxnfc {
            __u32                           cmd;
            __u32                           flow_type;
            __u64                           data;
            struct ethtool_rx_flow_spec     fs;
            union {
                    __u32                   rule_cnt;
                    __u32                   rss_context;
            };
            __u32                           rule_locs[0];
    };

Driver must not modify rule_cnt member. But currently driver
modifies it by modifying rss_context. Hence fix it by using a
local variable.

Fixes: 31758486a5fc ("octeontx2-pf: Add RSS multi group support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netsec: restore phy power state after controller reset

Since commit be515903fa81 ("net: socionext: Stop PHY before resetting
netsec") netsec_netdev_init() power downs phy before resetting the
controller. However, the state is not restored once the reset is
complete. As a result it is not possible to bring up network on a
platform with Broadcom BCM5482 phy.

Fix the issue by restoring phy power state after controller reset is
complete.

Fixes: be515903fa81 ("net: socionext: Stop PHY before resetting netsec")
Cc: stable@vger.kernel.org
Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: weaken the v4mapped source check

This reverts commit 5c798a5509407913203c1475b7f45d14e2e230b2.

Commit 5c798a550940 ("ipv6: drop incoming packets having a v4mapped
source address") introduced an input check against v4mapped addresses.
Use of such addresses on the wire is indeed questionable and not
allowed on public Internet. As the commit pointed out

https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02

lists potential issues.

Unfortunately there are applications which use v4mapped addresses,
and breaking them is a clear regression. For example v4mapped
addresses (or any semi-valid addresses, really) may be used
for uni-direction event streams or packet export.

Since the issue which sparked the addition of the check was with
TCP and request_socks in particular push the check down to TCPv6
and DCCP. This restores the ability to receive UDPv6 packets with
v4mapped address as the source.

Keep using the IPSTATS_MIB_INHDRERRORS statistic to minimize the
user-visible changes.

Fixes: 5c798a550940 ("ipv6: drop incoming packets having a v4mapped source address")
Reported-by: Sunyi Shao <sunyishao@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftest/bpf: Add a test to check trampoline freeing logic.

Add a selftest for commit f7fede7bce91 ("bpf: Fix fexit trampoline.")
to make sure that attaching fexit prog to a sleeping kernel function
will trigger appropriate trampoline and program destruction.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210318004523.55908-1-alexei.starovoitov@gmail.com

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2021-03-18

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 4 day(s) which contain
a total of 14 files changed, 336 insertions(+), 94 deletions(-).

The main changes are:

1) Fix fexit/fmod_ret trampoline for sleepable programs, and also fix a ftrace
splat in modify_ftrace_direct() on address change, from Alexei Starovoitov.

2) Fix two oob speculation possibilities that allows unprivileged to leak mem
via side-channel, from Piotr Krysiuk and Daniel Borkmann.

3) Fix libbpf's netlink handling wrt SOCK_CLOEXEC, from Kumar Kartikeya Dwivedi.

4) Fix libbpf's error handling on failure in getting section names, from Namhyung Kim.

5) Fix tunnel collect_md BPF selftest wrt Geneve option handling, from Hangbin Liu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nftables: skip hook overlap logic if flowtable is stale

If the flowtable has been previously removed in this batch, skip the
hook overlap checks. This fixes spurious EEXIST errors when removing and
adding the flowtable in the same batch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libbpf: Use SOCK_CLOEXEC when opening the netlink socket

Otherwise, there exists a small window between the opening and closing
of the socket fd where it may leak into processes launched by some other
thread.

Fixes: 2e8290ccfe4a ("libbpf: add function to setup XDP")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210317115857.6536-1-memxor@gmail.com

libbpf: Fix error path in bpf_object__elf_init()

When it failed to get section names, it should call into
bpf_object__elf_finish() like others.

Fixes: 310b9844216f ("libbpf: Factor out common ELF operations and improve logging")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210317145414.884817-1-namhyung@kernel.org

netfilter: flowtable: Make sure GC works periodically in idle system

Currently flowtable's GC work is initialized as deferrable, which
means GC cannot work on time when system is idle. So the hardware
offloaded flow may be deleted for timeout, since its used time is
not timely updated.

Resolve it by initializing the GC work as delayed work instead of
deferrable.

Fixes: c550be63df75 ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: allow to update flowtable flags

Honor flowtable flags from the control update path. Disallow disabling
to toggle hardware offload support though.

Fixes: 95761362366c ("netfilter: nf_tables: add flowtable offload control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: report EOPNOTSUPP on unsupported flowtable flags

Error was not set accordingly.

Fixes: 95761362366c ("netfilter: nf_tables: add flowtable offload control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: Fix gre tunneling over ipv6

This fix permits gre connections to be tracked within ip6tables rules

Signed-off-by: Ludovic Senecaux <linuxludo@free.fr>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

bpf: Fix fexit trampoline.

The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
The synchronize_rcu_tasks() will not wait for such tasks to complete.
In such case the trampoline image will be freed and when the task
wakes up the return IP will point to freed memory causing the crash.
Solve this by adding percpu_ref_get/put for the duration of trampoline
and separate trampoline vs its image life times.
The "half page" optimization has to be removed, since
first_half->second_half->first_half transition cannot be guaranteed to
complete in deterministic time. Every trampoline update becomes a new image.
The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
call_rcu_tasks. Together they will wait for the original function and
trampoline asm to complete. The trampoline is patched from nop to jmp to skip
fexit progs. They are freed independently from the trampoline. The image with
fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
will wait for both sleepable and non-sleepable progs to complete.

Fixes: 2cefb9dd93cc ("bpf: Introduce BPF trampoline")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU
Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com

net: fix race between napi kthread mode and busy poll

Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
determine if the kthread owns this napi and could call napi->poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi->poll()
invokes napi_complete_done() and clears SCHED bit) and tries to poll
on the same napi. napi_disable() could grab the SCHED bit as well.
This patch tries to fix this race by adding a new bit
NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
____napi_schedule() if the threaded mode is enabled, and gets cleared
in napi_complete_done(), and we only poll the napi in kthread if this
bit is set. This helps distinguish the ownership of the napi between
kthread and other scenarios and fixes the race issue.

Fixes: 19a9abdd15a9 ("net: implement threaded-able napi poll loop support")
Reported-by: Martin Zaharinov <micron10@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, selftests: Fix up some test_verifier cases for unprivileged

Fix up test_verifier error messages for the case where the original error
message changed, or for the case where pointer alu errors differ between
privileged and unprivileged tests. Also, add alternative tests for keeping
coverage of the original verifier rejection error message (fp alu), and
newly reject map_ptr += rX where rX == 0 given we now forbid alu on these
types for unprivileged. All test_verifier cases pass after the change. The
test case fixups were kept separate to ease backporting of core changes.

Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add sanity check for upper ptr_limit

Given we know the max possible value of ptr_limit at the time of retrieving
the latter, add basic assertions, so that the verifier can bail out if
anything looks odd and reject the program. Nothing triggered this so far,
but it also does not hurt to have these.

Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'mac80211-for-net-2021-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
First round of fixes for 5.12-rc:
* HE (802.11ax) elements can be extended, handle that
* fix locking in network namespace changes that was
broken due to the RTNL-redux work
* various other small fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: fix only mask bit check in the validate_ct_state

The ct_state validate should not only check the mask bit and also
check mask_bit & key_bit..
For the +new+est case example, The 'new' and 'est' bits should be
set in both state_mask and state flags. Or the -new-est case also
will be reject by kernel.
When Openvswitch with two flows
ct_state=+trk+new,action=commit,forward
ct_state=+trk+est,action=forward

A packet go through the kernel and the contrack state is invalid,
The ct_state will be +trk-inv. Upcall to the ovs-vswitchd, the
finally dp action will be drop with -new-est+trk.

Fixes: f544e47316a3 ("net/sched: cls_flower: Reject invalid ct_state flags rules")
Fixes: 62564537eb6c ("net/sched: cls_flower: validate ct_state for invalid and reply flags")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: linearize tso skb with too many frags

We were linearizing non-TSO skbs that had too many frags, but
we weren't checking number of frags on TSO skbs. This could
lead to a bad page reference when we received a TSO skb with
more frags than the Tx descriptor could support.

v2: use gso_segs rather than yet another division
don't rework the check on the nr_frags

Fixes: 5931eb5d68ae ("ionic: Add Tx and Rx handling")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Simplify alu_limit masking for pointer arithmetic

Instead of having the mov32 with aux->alu_limit - 1 immediate, move this
operation to retrieve_ptr_limit() instead to simplify the logic and to
allow for subsequent sanity boundary checks inside retrieve_ptr_limit().
This avoids in future that at the time of the verifier masking rewrite
we'd run into an underflow which would not sign extend due to the nature
of mov32 instruction.

Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix off-by-one for area size in creating mask to left

retrieve_ptr_limit() computes the ptr_limit for registers with stack and
map_value type. ptr_limit is the size of the memory area that is still
valid / in-bounds from the point of the current position and direction
of the operation (add / sub). This size will later be used for masking
the operation such that attempting out-of-bounds access in the speculative
domain is redirected to remain within the bounds of the current map value.

When masking to the right the size is correct, however, when masking to
the left, the size is off-by-one which would lead to an incorrect mask
and thus incorrect arithmetic operation in the non-speculative domain.
Piotr found that if the resulting alu_limit value is zero, then the
BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading
0xffffffff into AX instead of sign-extending to the full 64 bit range,
and as a result, this allows abuse for executing speculatively out-of-
bounds loads against 4GB window of address space and thus extracting the
contents of kernel memory via side-channel.

Fixes: 422fca2ce03f ("bpf: prevent out of bounds speculation on pointer arithmetic")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>

bpf: Prohibit alu ops for pointer types not defining ptr_limit

The purpose of this patch is to streamline error propagation and in particular
to propagate retrieve_ptr_limit() errors for pointer types that are not defining
a ptr_limit such that register-based alu ops against these types can be rejected.

The main rationale is that a gap has been identified by Piotr in the existing
protection against speculatively out-of-bounds loads, for example, in case of
ctx pointers, unprivileged programs can still perform pointer arithmetic. This
can be abused to execute speculatively out-of-bounds loads without restrictions
and thus extract contents of kernel memory.

Fix this by rejecting unprivileged programs that attempt any pointer arithmetic
on unprotected pointer types. The two affected ones are pointer to ctx as well
as pointer to map. Field access to a modified ctx' pointer is rejected at a
later point in time in the verifier, and bf379367a6f9 ("bpf: Permit map_ptr
arithmetic with opcode add and offset 0") only relevant for root-only use cases.
Risk of unprivileged program breakage is considered very low.

Fixes: bf379367a6f9 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0")
Fixes: 6b913f15182a ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>

ftrace: Fix modify_ftrace_direct.

The following sequence of commands:
  register_ftrace_direct(ip, addr1);
  modify_ftrace_direct(ip, addr1, addr2);
  unregister_ftrace_direct(ip, addr2);
will cause the kernel to warn:
[   30.179191] WARNING: CPU: 2 PID: 1961 at kernel/trace/ftrace.c:5223 unregister_ftrace_direct+0x130/0x150
[   30.180556] CPU: 2 PID: 1961 Comm: test_progs    W  O      5.12.0-rc2-00378-g86bc10a0a711-dirty #3246
[   30.182453] RIP: 0010:unregister_ftrace_direct+0x130/0x150

When modify_ftrace_direct() changes the addr from old to new it should update
the addr stored in ftrace_direct_funcs. Otherwise the final
unregister_ftrace_direct() won't find the address and will cause the splat.

Fixes: b9c2cbf5ea3d ("ftrace: Add modify_ftrace_direct()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20210316195815.34714-1-alexei.starovoitov@gmail.com

MAINTAINERS: Update Spidernet network driver

Change the Spidernet network driver from supported to
maintained, add the linuxppc-dev ML, and add myself as
a 'maintainer'.

Cc: Ishizaki Kou <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-fixes'

Simon Horman says:

====================
Fixes for nfp pre_tunnel code

Louis Peens says:

The following set of patches fixes up a few bugs in the pre_tun
decap code paths which has been hiding for a while.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: fix pre_tun mask id allocation

pre_tun_rule flows does not follow the usual add-flow path, instead
they are used to update the pre_tun table on the firmware. This means
that if the mask-id gets allocated here the firmware will never see the
"NFP_FL_META_FLAG_MANAGE_MASK" flag for the specific mask id, which
triggers the allocation on the firmware side. This leads to the firmware
mask being corrupted and causing all sorts of strange behaviour.

Fixes: 2d37ba78dc45 ("nfp: flower: offload pre-tunnel rules")
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: add ipv6 bit to pre_tunnel control message

Differentiate between ipv4 and ipv6 flows when configuring the pre_tunnel
table to prevent them trampling each other in the table.

Fixes: 715782139ab5 ("nfp: flower: update flow merge code to support IPv6 tunnels")
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: fix unsupported pre_tunnel flows

There are some pre_tunnel flows combinations which are incorrectly being
offloaded without proper support, fix these.

- Matching on MPLS is not supported for pre_tun.
- Match on IPv4/IPv6 layer must be present.
- Destination MAC address must match pre_tun.dev MAC

Fixes: ef3807c69f65 ("nfp: flower: verify pre-tunnel rules")
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: broadcom: BCM4908_ENET should not default to y, unconditionally

Merely enabling compile-testing should not enable additional code.
To fix this, restrict the automatic enabling of BCM4908_ENET to
ARCH_BCM4908.

Fixes: b2b0261f04e79481 ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct

When openvswitch conntrack offload with act_ct action. The first rule
do conntrack in the act_ct in tc subsystem. And miss the next rule in
the tc and fallback to the ovs datapath but miss set post_ct flag
which will lead the ct_state_key with -trk flag.

Fixes: 4c72a1db37dc ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-5.12-20210316' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-03-16

this is a pull request of 11 patches for net/master.

The first patch is by Martin Willi and fixes the deletion of network
name spaces with physical CAN interfaces in them.

The next two patches are by me an fix the ISOTP protocol, to ensure
that unused flags in classical CAN frames are properly initialized to
zero.

Stephane Grosjean contributes a patch for the pcan_usb_fd driver,
which add MODULE_SUPPORTED_DEVICE lines for two supported devices.

Angelo Dureghello's patch for the flexcan driver fixes a potential div
by zero, if the bitrate is not set during driver probe.

Jimmy Assarsson's patch for the kvaser_pciefd disables bus load
reporting in the device, if it was previously enabled by the vendor's
out of tree drier. A patch for the kvaser_usb adds support for a new
device, by adding the appropriate USB product ID.

Tong Zhang contributes two patches for the c_can driver. First a
use-after-free in the c_can_pci driver is fixed, in the second patch
the runtime PM for the c_can_pci is fixed by moving the runtime PM
enable/disable from the core driver to the platform driver.

The last two patches are by Torin Cooper-Bennun for the m_can driver.
First a extraneous msg loss warning is removed then he fixes the
RX-path, which might be blocked by errors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: ena: Fix ena_start_xmit() function name typo

The ena.rst documentation referred to end_start_xmit() when it should refer
to ena_start_xmit(). Fix the typo.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/net: fix warnings on reuseaddr_ports_exhausted

Fix multiple warnings seen with gcc 10.2.1:
reuseaddr_ports_exhausted.c:32:41: warning: missing braces around initializer [-Wmissing-braces]
   32 | struct reuse_opts unreusable_opts[12] = {
      |                                         ^
   33 |  {0, 0, 0, 0},
      |   {   } {   }

Fixes: d44f9d6a6de0 ("selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized.")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireless/nl80211: fix wdev_id may be used uninitialized

Build currently fails with -Werror=maybe-uninitialized set:

net/wireless/nl80211.c: In function '__cfg80211_wdev_from_attrs':
net/wireless/nl80211.c:124:44: error: 'wdev_id' may be used
uninitialized in this function [-Werror=maybe-uninitialized]

Easy fix is to just initialize wdev_id to 0, since it's value doesn't
otherwise matter unless have_wdev_id is true.

Fixes: 31793ce79203 ("cfg80211: avoid holding the RTNL when calling the driver")
CC: Johannes Berg <johannes@sipsolutions.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jakub Kicinski <kuba@kernel.org>
CC: linux-wireless@vger.kernel.org
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Link: https://lore.kernel.org/r/20210312163651.1398207-1-jarod@redhat.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: choose first enabled channel for monitor

Even if the first channel from sband channel list is invalid
or disabled mac80211 ends up choosing it as the default channel
for monitor interfaces, making them not usable.

Fix this by assigning the first available valid or enabled
channel instead.

Signed-off-by: Karthikeyan Kathirvel <kathirve@codeaurora.org>
Link: https://lore.kernel.org/r/1615440547-7661-1-git-send-email-kathirve@codeaurora.org
[reword commit message, comment, code cleanups]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

nl80211: fix locking for wireless device netns change

We have all the network interfaces marked as netns-local
since the only reasonable thing to do right now is to set
a whole device, including all netdevs, into a different
network namespace. For this reason, we also have our own
way of changing the network namespace.

Unfortunately, the RTNL locking changes broke this, and
it now results in many RTNL assertions. The trivial fix
for those (just hold RTNL for the changes) however leads
to deadlocks in the cfg80211 netdev notifier.

Since we only need the wiphy, and that's still protected
by the RTNL, add a new NL80211_FLAG_NO_WIPHY_MTX flag to
the nl80211 ops and use it to _not_ take the wiphy mutex
but only the RTNL. This way, the notifier does all the
work necessary during unregistration/registration of the
netdevs from the old and in the new namespace.

Reported-by: Sid Hayn <sidhayn@gmail.com>
Fixes: 31793ce79203 ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20210310215839.eadf7c43781b.I5fc6cf6676f800ab8008e03bbea9c3349b02d804@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: Check crypto_aead_encrypt for errors

crypto_aead_encrypt returns <0 on error, so if these calls are not checked,
execution may continue with failed encrypts. It also seems that these two
crypto_aead_encrypt calls are the only instances in the codebase that are
not checked for errors.

Signed-off-by: Daniel Phan <daniel.phan36@gmail.com>
Link: https://lore.kernel.org/r/20210309204137.823268-1-daniel.phan36@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: Allow HE operation to be longer than expected.

We observed some Cisco APs sending the following HE Operation IE in
associate response:

ff 0a 24 f4 3f 00 01 fc ff 00 00 00

Its HE operation parameter is 0x003ff4, so the expected total length is
7 which does not match the actual length = 10. This causes association
failing with "HE AP is missing HE Capability/operation."

According to P802.11ax_D4 Table9-94, HE operation is extensible, and
according to 802.11-2016 10.27.8, STA should discard the part beyond
the maximum length and parse the truncated element.

Allow HE operation element to be longer than expected to handle this
case and future extensions.

Fixes: bfb6fca62971 ("mac80211: refactor extended element parsing")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Yen-lin Lai <yenlinlai@chromium.org>
Link: https://lore.kernel.org/r/20210223051926.2653301-1-yenlinlai@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: minstrel_ht: remove unused variable 'mg'

This probably came in through some refactoring and what is
now a call to minstrel_ht_group_min_rate_offset(), remove
the unused variable.

Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210219105744.f2538a80f6cf.I3d53554c158d5b896ac07ea546bceac67372ec28@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix double free in ibss_leave

Clear beacon ie pointer and ie length after free
in order to prevent double free.

==================================================================
BUG: KASAN: double-free or invalid-free \
in ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876

CPU: 0 PID: 8472 Comm: syz-executor100 Not tainted 5.11.0-rc6-syzkaller #0
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2c6 mm/kasan/report.c:230
kasan_report_invalid_free+0x51/0x80 mm/kasan/report.c:355
____kasan_slab_free+0xcc/0xe0 mm/kasan/common.c:341
kasan_slab_free include/linux/kasan.h:192 [inline]
__cache_free mm/slab.c:3424 [inline]
kfree+0xed/0x270 mm/slab.c:3760
ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876
rdev_leave_ibss net/wireless/rdev-ops.h:545 [inline]
__cfg80211_leave_ibss+0x19a/0x4c0 net/wireless/ibss.c:212
__cfg80211_leave+0x327/0x430 net/wireless/core.c:1172
cfg80211_leave net/wireless/core.c:1221 [inline]
cfg80211_netdev_notifier_call+0x9e8/0x12c0 net/wireless/core.c:1335
notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2040
call_netdevice_notifiers_extack net/core/dev.c:2052 [inline]
call_netdevice_notifiers net/core/dev.c:2066 [inline]
__dev_close_many+0xee/0x2e0 net/core/dev.c:1586
__dev_close net/core/dev.c:1624 [inline]
__dev_change_flags+0x2cb/0x730 net/core/dev.c:8476
dev_change_flags+0x8a/0x160 net/core/dev.c:8549
dev_ifsioc+0x210/0xa70 net/core/dev_ioctl.c:265
dev_ioctl+0x1b1/0xc40 net/core/dev_ioctl.c:511
sock_do_ioctl+0x148/0x2d0 net/socket.c:1060
sock_ioctl+0x477/0x6a0 net/socket.c:1177
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+93976391bf299d425f44@syzkaller.appspotmail.com
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20210213133653.367130-1-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix rate mask reset

Coverity reported the strange "if (~...)" condition that's
always true. It suggested that ! was intended instead of ~,
but upon further analysis I'm convinced that what really was
intended was a comparison to 0xff/0xffff (in HT/VHT cases
respectively), since this indicates that all of the rates
are enabled.

Change the comparison accordingly.

I'm guessing this never really mattered because a reset to
not having a rate mask is basically equivalent to having a
mask that enables all rates.

Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: 2a8e69a47694 ("mac80211: fix and optimize MCS mask handling")
Fixes: a723b79b54ca ("mac80211: add rate mask logic for vht rates")
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210212112213.36b38078f569.I8546a20c80bc1669058eb453e213630b846e107b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

can: m_can: m_can_rx_peripheral(): fix RX being blocked by errors

For M_CAN peripherals, m_can_rx_handler() was called with quota = 1,
which caused any error handling to block RX from taking place until
the next time the IRQ handler is called. This had been observed to
cause RX to be blocked indefinitely in some cases.

This is fixed by calling m_can_rx_handler with a sensibly high quota.

Fixes: 15c73136c866 ("can: m_can: Create a m_can platform framework")
Link: https://lore.kernel.org/r/20210303144350.4093750-1-torin@maxiluxsystems.com
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning

Message loss from RX FIFO 0 is already handled in
m_can_handle_lost_msg(), with netdev output included.

Removing this warning also improves driver performance under heavy
load, where m_can_do_rx_poll() may be called many times before this
interrupt is cleared, causing this message to be output many
times (thanks Mariusz Madej for this report).

Fixes: 60471552e2d6 ("can: m_can: add Bosch M_CAN controller support")
Link: https://lore.kernel.org/r/20210303103151.3760532-1-torin@maxiluxsystems.com
Reported-by: Mariusz Madej <mariusz.madej@xtrack.com>
Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: move runtime PM enable/disable to c_can_platform

Currently doing modprobe c_can_pci will make the kernel complain:

Unbalanced pm_runtime_enable!

this is caused by pm_runtime_enable() called before pm is initialized.

This fix is similar to 115dd8be4aa0, move those pm_enable/disable code
to c_can_platform.

Fixes: f5f8c42b2433 ("can: c_can: Add runtime PM support to Bosch C_CAN/D_CAN controller")
Link: http://lore.kernel.org/r/20210302025542.987600-1-ztong0001@gmail.com
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can_pci: c_can_pci_remove(): fix use-after-free

There is a UAF in c_can_pci_remove(). dev is released by
free_c_can_dev() and is used by pci_iounmap(pdev, priv->base) later.
To fix this issue, save the mmio address before releasing dev.

Fixes: d2cfc30675ba ("c_can_pci: generic module for C_CAN/D_CAN on PCI")
Link: https://lore.kernel.org/r/20210301024512.539039-1-ztong0001@gmail.com
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb: Add support for USBcan Pro 4xHS

Add support for Kvaser USBcan Pro 4xHS.

Link: https://lore.kernel.org/r/20210309091724.31262-2-jimmyassarsson@gmail.com
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: Always disable bus load reporting

Under certain circumstances, when switching from Kvaser's linuxcan driver
(kvpciefd) to the SocketCAN driver (kvaser_pciefd), the bus load reporting
is not disabled.
This is flooding the kernel log with prints like:
[3485.574677] kvaser_pciefd 0000:02:00.0: Received unexpected packet type 0x00000009

Always put the controller in the expected state, instead of assuming that
bus load reporting is inactive.

Note: If bus load reporting is enabled when the driver is loaded, you will
still get a number of bus load packages (and printouts), before it is
disabled.

Fixes: 52f7080e9d84 ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Link: https://lore.kernel.org/r/20210309091724.31262-1-jimmyassarsson@gmail.com
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: flexcan: flexcan_chip_freeze(): fix chip freeze for missing bitrate

For cases when flexcan is built-in, bitrate is still not set at
registering. So flexcan_chip_freeze() generates:

[    1.860000] *** ZERO DIVIDE ***   FORMAT=4
[    1.860000] Current process id is 1
[    1.860000] BAD KERNEL TRAP: 00000000
[    1.860000] PC: [<402e70c8>] flexcan_chip_freeze+0x1a/0xa8

To allow chip freeze, using an hardcoded timeout when bitrate is still
not set.

Fixes: 6c15f0256dc9 ("can: flexcan: enable RX FIFO after FRZ/HALT valid")
Link: https://lore.kernel.org/r/20210315231510.650593-1-angelo@kernel-space.org
Signed-off-by: Angelo Dureghello <angelo@kernel-space.org>
[mkl: use if instead of ? operator]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak_usb: add forgotten supported devices

Since the peak_usb driver also supports the CAN-USB interfaces
"PCAN-USB X6" and "PCAN-Chip USB" from PEAK-System GmbH, this patch adds
their names to the list of explicitly supported devices.

Fixes: 4d074aad1b4a ("can: usb: Add support of PCAN-Chip USB stamp module")
Fixes: 79812bfd1764 ("can: peak: Add support for PCAN-USB X6 USB interface")
Link: https://lore.kernel.org/r/20210309082128.23125-3-s.grosjean@peak-system.com
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: TX-path: ensure that CAN frame flags are initialized

The previous patch ensures that the TX flags (struct
can_isotp_ll_options::tx_flags) are 0 for classic CAN frames or a user
configured value for CAN-FD frames.

This patch sets the CAN frames flags unconditionally to the ISO-TP TX
flags, so that they are initialized to a proper value. Otherwise when
running "candump -x" on a classical CAN ISO-TP stream shows wrongly
set "B" and "E" flags.

| $ candump any,0:0,#FFFFFFFF -extA
| [...]
| can0  TX B E  713   [8]  2B 0A 0B 0C 0D 0E 0F 00
| can0  TX B E  713   [8]  2C 01 02 03 04 05 06 07
| can0  TX B E  713   [8]  2D 08 09 0A 0B 0C 0D 0E
| can0  TX B E  713   [8]  2E 0F 00 01 02 03 04 05

Fixes: 3f9425168a78 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/r/20210218215434.1708249-2-mkl@pengutronix.de
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: isotp_setsockopt(): only allow to set low level TX flags for CAN-FD

CAN-FD frames have struct canfd_frame::flags, while classic CAN frames
don't.

This patch refuses to set TX flags (struct
can_isotp_ll_options::tx_flags) on non CAN-FD isotp sockets.

Fixes: 3f9425168a78 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/r/20210218215434.1708249-2-mkl@pengutronix.de
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: dev: Move device back to init netns on owning netns delete

When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.

CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:

  ip netns add foo
  ip link set can0 netns foo
  ip netns delete foo

WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
[<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
[<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
[<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
[<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
[<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
[<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
[<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
[<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
[<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)

To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.

The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.

Fixes: ffb7ee60ddd8 ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

mptcp: fix ADD_ADDR HMAC in case port is specified

Currently, Linux computes the HMAC contained in ADD_ADDR sub-option using
the Address Id and the IP Address, and hardcodes a destination port equal
to zero. This is not ok for ADD_ADDR with port: ensure to account for the
endpoint port when computing the HMAC, in compliance with RFC8684 §3.4.1.

Fixes: 1d0d3d44ede7 ("mptcp: add port support for ADD_ADDR suboption writing")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: relookup sock for RST+ACK packets handled by obsolete req sock

Currently tcp_check_req can be called with obsolete req socket for which big
socket have been already created (because of CPU race or early demux
assigning req socket to multiple packets in gro batch).

Commit 2b027671dde9cb6409d9 ("tcp: try to keep packet if SYN_RCV race
is lost") added retry in case when tcp_check_req is called for PSH|ACK packet.
But if client sends RST+ACK immediatly after connection being
established (it is performing healthcheck, for example) retry does not
occur. In that case tcp_check_req tries to close req socket,
leaving big socket active.

Fixes: 2b027671dde ("tcp: try to keep packet if SYN_RCV race is lost")
Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
Reported-by: Oleg Senin <olegsenin@yandex-team.ru>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: better validate user input in tipc_nl_retrieve_key()

Before calling tipc_aead_key_size(ptr), we need to ensure
we have enough data to dereference ptr->keylen.

We probably also want to make sure tipc_aead_key_size()
wont overflow with malicious ptr->keylen values.

Syzbot reported:

BUG: KMSAN: uninit-value in __tipc_nl_node_set_key net/tipc/node.c:2971 [inline]
BUG: KMSAN: uninit-value in tipc_nl_node_set_key+0x9bf/0x13b0 net/tipc/node.c:3023
CPU: 0 PID: 21060 Comm: syz-executor.5 Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x21c/0x280 lib/dump_stack.c:120
kmsan_report+0xfb/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
__tipc_nl_node_set_key net/tipc/node.c:2971 [inline]
tipc_nl_node_set_key+0x9bf/0x13b0 net/tipc/node.c:3023
genl_family_rcv_msg_doit net/netlink/genetlink.c:739 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
genl_rcv_msg+0x1319/0x1610 net/netlink/genetlink.c:800
netlink_rcv_skb+0x6fa/0x810 net/netlink/af_netlink.c:2494
genl_rcv+0x63/0x80 net/netlink/genetlink.c:811
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x11d6/0x14a0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x1740/0x1840 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg net/socket.c:672 [inline]
____sys_sendmsg+0xcfc/0x12f0 net/socket.c:2345
___sys_sendmsg net/socket.c:2399 [inline]
__sys_sendmsg+0x714/0x830 net/socket.c:2432
__compat_sys_sendmsg net/compat.c:347 [inline]
__do_compat_sys_sendmsg net/compat.c:354 [inline]
__se_compat_sys_sendmsg+0xa7/0xc0 net/compat.c:351
__ia32_compat_sys_sendmsg+0x4a/0x70 net/compat.c:351
do_syscall_32_irqs_on arch/x86/entry/common.c:79 [inline]
__do_fast_syscall_32+0x102/0x160 arch/x86/entry/common.c:141
do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:166
do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:209
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
RIP: 0023:0xf7f60549
Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:00000000f555a5fc EFLAGS: 00000296 ORIG_RAX: 0000000000000172
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000200
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104
kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76
slab_alloc_node mm/slub.c:2907 [inline]
__kmalloc_node_track_caller+0xa37/0x1430 mm/slub.c:4527
__kmalloc_reserve net/core/skbuff.c:142 [inline]
__alloc_skb+0x2f8/0xb30 net/core/skbuff.c:210
alloc_skb include/linux/skbuff.h:1099 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
netlink_sendmsg+0xdbc/0x1840 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg net/socket.c:672 [inline]
____sys_sendmsg+0xcfc/0x12f0 net/socket.c:2345
___sys_sendmsg net/socket.c:2399 [inline]
__sys_sendmsg+0x714/0x830 net/socket.c:2432
__compat_sys_sendmsg net/compat.c:347 [inline]
__do_compat_sys_sendmsg net/compat.c:354 [inline]
__se_compat_sys_sendmsg+0xa7/0xc0 net/compat.c:351
__ia32_compat_sys_sendmsg+0x4a/0x70 net/compat.c:351
do_syscall_32_irqs_on arch/x86/entry/common.c:79 [inline]
__do_fast_syscall_32+0x102/0x160 arch/x86/entry/common.c:141
do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:166
do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:209
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Fixes: 8f23ce1cbbee ("tipc: add support for AEAD key setting via netlink")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tuong Lien <tuong.t.lien@dektech.com.au>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Ying Xue <ying.xue@windriver.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phylink: Fix phylink_err() function name error in phylink_major_config

if pl->mac_ops->mac_finish() failed, phylink_err should use
"mac_finish" instead of "mac_prepare".

Fixes: 0e77ad11c559b ("net: phylink: re-implement interface configuration with PCS")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hdlc_x25: Prevent racing between "x25_close" and "x25_xmit"/"x25_rx"

"x25_close" is called by "hdlc_close" in "hdlc.c", which is called by
hardware drivers' "ndo_stop" function.
"x25_xmit" is called by "hdlc_start_xmit" in "hdlc.c", which is hardware
drivers' "ndo_start_xmit" function.
"x25_rx" is called by "hdlc_rcv" in "hdlc.c", which receives HDLC frames
from "net/core/dev.c".

"x25_close" races with "x25_xmit" and "x25_rx" because their callers race.

However, we need to ensure that the LAPB APIs called in "x25_xmit" and
"x25_rx" are called before "lapb_unregister" is called in "x25_close".

This patch adds locking to ensure when "x25_xmit" and "x25_rx" are doing
their work, "lapb_unregister" is not yet called in "x25_close".

Reasons for not solving the racing between "x25_close" and "x25_xmit" by
calling "netif_tx_disable" in "x25_close":
1. We still need to solve the racing between "x25_close" and "x25_rx";
2. The design of the HDLC subsystem assumes the HDLC hardware drivers
have full control over the TX queue, and the HDLC protocol drivers (like
this driver) have no control. Controlling the queue here in the protocol
driver may interfere with hardware drivers' control of the queue.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: ctnetlink: fix dump of the expect mask attribute

Before this change, the mask is never included in the netlink message, so
"conntrack -E expect" always prints 0.0.0.0.

In older kernels the l3num callback struct was passed as argument, based
on tuple->src.l3num. After the l3num indirection got removed, the call
chain is based on m.src.l3num, but this value is 0xffff.

Init l3num to the correct value.

Fixes: d4d47e24afda6 ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: x_tables: Use correct memory barriers.

When a new table value was assigned, it was followed by a write memory
barrier. This ensured that all writes before this point would complete
before any writes after this point. However, to determine whether the
rules are unused, the sequence counter is read. To ensure that all
writes have been done before these reads, a full memory barrier is
needed, not just a write memory barrier. The same argument applies when
incrementing the counter, before the rules are read.

Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic
reported in d59e3d10acbd (which is still present), while still
maintaining the same speed of replacing tables.

The smb_mb() barriers potentially slow the packet path, however testing
has shown no measurable change in performance on a 4-core MIPS64
platform.

Fixes: a52d52e33bbd ("netfilter: get rid of atomic ops in fast path")
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>