git.baikalelectronics.ru Git

ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero

It is possible to disable VFs while the PF driver is processing requests
from the VF driver.  This can result in a panic.

BUG: unable to handle kernel paging request at 000000000000106c
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 8 PID: 0 Comm: swapper/8 Kdump: loaded Tainted: G I      --------- -
Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
RIP: 0010:ixgbe_msg_task+0x4c8/0x1690 [ixgbe]
Code: 00 00 48 8d 04 40 48 c1 e0 05 89 7c 24 24 89 fd 48 89 44 24 10 83 ff
01 0f 84 b8 04 00 00 4c 8b 64 24 10 4d 03 a5 48 22 00 00 <41> 80 7c 24 4c
00 0f 84 8a 03 00 00 0f b7 c7 83 f8 08 0f 84 8f 0a
RSP: 0018:ffffb337869f8df8 EFLAGS: 00010002
RAX: 0000000000001020 RBX: 0000000000000000 RCX: 000000000000002b
RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000006
RBP: 0000000000000006 R08: 0000000000000002 R09: 0000000000029780
R10: 00006957d8f42832 R11: 0000000000000000 R12: 0000000000001020
R13: ffff8a00e8978ac0 R14: 000000000000002b R15: ffff8a00e8979c80
FS:  0000000000000000(0000) GS:ffff8a07dfd00000(0000) knlGS:00000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000106c CR3: 0000000063e10004 CR4: 00000000007726e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? ttwu_do_wakeup+0x19/0x140
? try_to_wake_up+0x1cd/0x550
? ixgbevf_update_xcast_mode+0x71/0xc0 [ixgbevf]
ixgbe_msix_other+0x17e/0x310 [ixgbe]
__handle_irq_event_percpu+0x40/0x180
handle_irq_event_percpu+0x30/0x80
handle_irq_event+0x36/0x53
handle_edge_irq+0x82/0x190
handle_irq+0x1c/0x30
do_IRQ+0x49/0xd0
common_interrupt+0xf/0xf

This can be eventually be reproduced with the following script:

while :
do
    echo 63 > /sys/class/net/<devname>/device/sriov_numvfs
    sleep 1
    echo 0 > /sys/class/net/<devname>/device/sriov_numvfs
    sleep 1
done

Add lock when disabling SR-IOV to prevent process VF mailbox communication.

Fixes: 313640a3faa4 ("ixgbe: Fix memory leak when SR-IOV VFs are direct assigned")
Signed-off-by: Piotr Skajewski <piotrx.skajewski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220715214456.2968711-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i40e: Fix erroneous adapter reinitialization during recovery process

Fix an issue when driver incorrectly detects state
of recovery process and erroneously reinitializes interrupts,
which results in a kernel error and call trace message.

The issue was caused by a combination of two factors:
1. Assuming the EMP reset issued after completing
firmware recovery means the whole recovery process is complete.
2. Erroneous reinitialization of interrupt vector after detecting
the above mentioned EMP reset.

Fixes (1) by changing how recovery state change is detected
and (2) by adjusting the conditional expression to ensure using proper
interrupt reinitialization method, depending on the situation.

Fixes: bdbcceb6036b ("i40e: Introduce recovery mode support")
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220715214542.2968762-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: fix off by one check of ARRAY_SIZE

In mtk_wed_tx_ring_setup(.., int idx, ..), idx is used as an index here
struct mtk_wed_ring *ring = &dev->tx_ring[idx];

The bounds of idx are checked here
BUG_ON(idx > ARRAY_SIZE(dev->tx_ring));

If idx is the size of the array, it will pass this check and overflow.
So change the check to >= .

Fixes: 7e7e0fcc79f5 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)")
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20220716214654.1540240-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-lan966x-fix-issues-with-mac-table'

Horatiu Vultur says:

====================
net: lan966x: Fix issues with MAC table

The patch series fixes 2 issues:
- when an entry was forgotten the irq thread was holding a spin lock and then
was talking also rtnl_lock.
- the access to the HW MAC table is indirect, so the access to the HW MAC
table was not synchronized, which means that there could be race conditions.
====================

Link: https://lore.kernel.org/r/20220714194040.231651-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix usage of lan966x->mac_lock when used by FDB

When the SW bridge was trying to add/remove entries to/from HW, the
access to HW was not protected by any lock. In this way, it was
possible to have race conditions.
Fix this by using the lan966x->mac_lock to protect parallel access to HW
for this cases.

Fixes: a30cb7956ca54 ("net: lan966x: More MAC table functionality")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix usage of lan966x->mac_lock inside lan966x_mac_irq_handler

The problem with this spin lock is that it was just protecting the list
of the MAC entries in SW and not also the access to the MAC entries in HW.
Because the access to HW is indirect, then it could happen to have race
conditions.
For example when SW introduced an entry in MAC table and the irq mac is
trying to read something from the MAC.
Update such that also the access to MAC entries in HW is protected by
this lock.

Fixes: 611e23fc3576c ("net: lan966x: add support for interrupts from analyzer")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix usage of lan966x->mac_lock when entry is removed

To remove an entry to the MAC table, it is required first to setup the
entry and then issue a command for the MAC to forget the entry.
So if it happens for two threads to remove simultaneously an entry
in MAC table then it would be a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.

Fixes: 178db1b95c6da ("net: lan966x: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix usage of lan966x->mac_lock when entry is added

To add an entry to the MAC table, it is required first to setup the
entry and then issue a command for the MAC to learn the entry.
So if it happens for two threads to add simultaneously an entry in MAC
table then it would be a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.

Fixes: bf67919ecce7a ("net: lan966x: Add function lan966x_mac_ip_learn()")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix taking rtnl_lock while holding spin_lock

When the HW deletes an entry in MAC table then it generates an
interrupt. The SW will go through it's own list of MAC entries and if it
is not found then it would notify the listeners about this. The problem
is that when the SW will go through it's own list it would take a spin
lock(lan966x->mac_lock) and when it notifies that the entry is deleted.
But to notify the listeners it taking the rtnl_lock which is illegal.

This is fixed by instead of notifying right away that the entry is
deleted, move the entry on a temp list and once, it checks all the
entries then just notify that the entries from temp list are deleted.

Fixes: 611e23fc3576 ("net: lan966x: add support for interrupts from analyzer")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dsa-docs'

Vladimir Oltean says:

====================
Update DSA documentation

These are some updates of dsa.rst, since it hasn't kept up with
development (in some cases, even since 2017). I've added Fixes: tags as
I thought was appropriate.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: mention that VLANs are now refcounted on shared ports

The blamed commit updated the way in which VLANs are handled at the
cross-chip notifier layer and didn't update the documentation to say
that. Fix it.

Fixes: 0993bf3c00ee ("net: dsa: add explicit support for host bridge VLANs")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: delete misinformation about -EOPNOTSUPP for FDB/MDB/VLAN

Returning -EOPNOTSUPP does *NOT* mean anything special.

port_vlan_add() is actually called from 2 code paths, one is
vlan_vid_add() from 8021q module and the other is
br_switchdev_port_vlan_add() from switchdev.

The bridge has a wrapper __vlan_vid_add() which first tries via
switchdev, then if that returns -EOPNOTSUPP, tries again via the VLAN RX
filters in the 8021q module. But DSA doesn't distinguish between one
call path and the other when calling the driver's port_vlan_add(), so if
the driver returns -EOPNOTSUPP to switchdev, it also returns -EOPNOTSUPP
to the 8021q module. And the latter is a hard error.

port_fdb_add() is called from the deferred dsa_owq only, so obviously
its return code isn't propagated anywhere, and cannot be interpreted in
any way.

The return code from port_mdb_add() is propagated to the bridge, but
again, this doesn't do anything special when -EOPNOTSUPP is returned,
but rather, br_switchdev_mdb_notify() returns void.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: re-explain what port_fdb_dump actually does

Switchdev has changed radically from its initial implementation, and the
currently provided definition is incorrect and very confusing.

Rewrite it in light of what it actually does.

Fixes: c5a7177734a1 ("net: dsa: Move FDB dump implementation inside DSA")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: add a section for address databases

The given definition for what VID 0 represents in the current
port_fdb_add and port_mdb_add is blatantly wrong. Delete it and explain
the concepts surrounding DSA's understanding of FDB isolation.

Fixes: a49e06c72a76 ("net: dsa: request drivers to perform FDB isolation")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: delete port_mdb_dump

This was deleted in 2017, stop documenting it.

Fixes: f1fb937416c2 ("net: dsa: Remove redundant MDB dump support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: remove port_vlan_dump

This was deleted in 2017, delete the obsolete documentation.

Fixes: 0a8921c02367 ("net: dsa: Remove support for bypass bridge port attributes/vlan set")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: remove port_bridge_tx_fwd_offload

We've changed the API through which we can offload the bridge TX
forwarding process. Update the documentation in light of the removal of
2 DSA switch ops.

Fixes: c8a01eb9e6af ("net: dsa: add a "tx_fwd_offload" argument to ->port_bridge_join")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: document port_fast_age

The provided information about FDB flushing is not really up to date.
The DSA core automatically calls port_fast_age() when necessary, and
drivers should just implement that rather than hooking it to
port_bridge_leave, port_stp_state_set and others.

Fixes: 865a2629426d ("net: dsa: add port fast ageing")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: document port_setup and port_teardown

These methods were added without being documented, fix that.

Fixes: f4773de37dc0 ("net: dsa: tear down devlink port regions when tearing down the devlink port on error")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: document the teardown method

A teardown method was added to dsa_switch_ops without being documented.
Do so now.

Fixes: a01c5b7bc76b ("net: dsa: Add teardown callback for drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: document change_tag_protocol

Support for changing the tagging protocol was added without this
operation being documented; do so now.

Fixes: 4f80384ace3d ("net: dsa: allow changing the tag protocol via the "tagging" device attribute")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: add more info about the other arguments to get_tag_protocol

Changes were made to the prototype of get_tag_protocol without
describing at a high level what they are about. Update the documentation
to explain that.

Fixes: a78b6b2a4d4d ("net: dsa: Pass a port to get_tag_protocol()")
Fixes: 482c77afe59f ("net: dsa: Get information about stacked DSA protocol")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: rename tag_protocol to get_tag_protocol

Since the blamed commit, the enum was turned into a function pointer and
also renamed. Update the documentation.

Fixes: af53e8a57fb9 ("net: dsa: Allow the DSA driver to indicate the tag protocol")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: document the shutdown behavior

Document the changes that took place in the DSA core in the blamed
commit.

Fixes: db3f4e7cb878 ("net: dsa: be compatible with masters which unregister on shutdown")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: dsa: update probing documentation

Since the blamed commit we don't have register_switch_driver() and
unregister_switch_driver() anymore. Additionally, the expected
dsa_register_switch() and dsa_unregister_switch() calls aren't
documented.

Update the probing section with the details of how things are currently
done.

Fixes: 0820904404b2 ("net: dsa: Remove legacy probing support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-ipv4-sysctl-races-part-3'

Kuniyuki Iwashima says:

====================
sysctl: Fix data-races around ipv4_net_table (Round 3).

This series fixes data-races around 21 knobs after
igmp_link_local_mcast_reports in ipv4_net_table.

These 4 knobs are skipped because they are safe.

  - tcp_congestion_control: Safe with RCU and xchg().
  - tcp_available_congestion_control: Read only.
  - tcp_allowed_congestion_control: Safe with RCU and spinlock().
  - tcp_fastopen_key: Safe with RCU and xchg()

So, round 4 will start with fib_multipath_use_neigh.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_fastopen_blackhole_timeout.

While reading sysctl_tcp_fastopen_blackhole_timeout, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 75282cd4bef8 ("net/tcp_fastopen: Disable active side TFO in certain scenarios")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_fastopen.

While reading sysctl_tcp_fastopen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: b63569dc8a7c ("net-tcp: Fast Open base")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_max_syn_backlog.

While reading sysctl_max_syn_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix a data-race around sysctl_tcp_tw_reuse.

While reading sysctl_tcp_tw_reuse, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix a data-race around sysctl_tcp_notsent_lowat.

While reading sysctl_tcp_notsent_lowat, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 7f900a05a58b ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around some timeout sysctl knobs.

While reading these sysctl knobs, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - tcp_retries1
  - tcp_retries2
  - tcp_orphan_retries
  - tcp_fin_timeout

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_reordering.

While reading sysctl_tcp_reordering, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_migrate_req.

While reading sysctl_tcp_migrate_req, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 7a0099fdba7a ("net: Introduce net.ipv4.tcp_migrate_req.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_syncookies.

While reading sysctl_tcp_syncookies, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_syn(ack)?_retries.

While reading sysctl_tcp_syn(ack)?_retries, they can be changed
concurrently. Thus, we need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around keepalive sysctl knobs.

While reading sysctl_tcp_keepalive_(time|probes|intvl), they can be changed
concurrently. Thus, we need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Fix data-races around sysctl_igmp_qrv.

While reading sysctl_igmp_qrv, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

This test can be packed into a helper, so such changes will be in the
follow-up series after net is merged into net-next.

qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv);

Fixes: 2b909d3ac006 ("ipv4: implement igmp_qrv sysctl to tune igmp robustness variable")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Fix data-races around sysctl_igmp_max_msf.

While reading sysctl_igmp_max_msf, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Fix a data-race around sysctl_igmp_max_memberships.

While reading sysctl_igmp_max_memberships, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Fix data-races around sysctl_igmp_llm_reports.

While reading sysctl_igmp_llm_reports, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

This test can be packed into a helper, so such changes will be in the
follow-up series after net is merged into net-next.

if (ipv4_is_local_multicast(pmc->multiaddr) &&
!READ_ONCE(net->ipv4.sysctl_igmp_llm_reports))

Fixes: b8c6057b2ed5 ("IGMP: Inhibit reports for local multicast groups")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: prestera: acl: use proper mask for port selector

Adjusted as per packet processor documentation.
This allows to properly match 'indev' for clsact rules.

Fixes: 723dac92e706 ("net: prestera: acl: migrate to new vTCAM api")
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: Fix race in TLS device down flow

Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.

In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
  all the way until it flushes the gc work, and returns without freeing
  the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
  work and frees the context's device resources.

Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization.  This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.

Fixes: e31ec02c9978 ("net/tls: Add generic NIC offload infrastructure")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: fix dma queue left shift overflow issue

When queue number is > 4, left shift overflows due to 32 bits
integer variable. Mask calculation is wrong for MTL_RXQ_DMA_MAP1.

If CONFIG_UBSAN is enabled, kernel dumps below warning:
[   10.363842] ==================================================================
[   10.363882] UBSAN: shift-out-of-bounds in /build/linux-intel-iotg-5.15-8e6Tf4/
linux-intel-iotg-5.15-5.15.0/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:224:12
[   10.363929] shift exponent 40 is too large for 32-bit type 'unsigned int'
[   10.363953] CPU: 1 PID: 599 Comm: NetworkManager Not tainted 5.15.0-1003-intel-iotg
[   10.363956] Hardware name: ADLINK Technology Inc. LEC-EL/LEC-EL, BIOS 0.15.11 12/22/2021
[   10.363958] Call Trace:
[   10.363960]  <TASK>
[   10.363963]  dump_stack_lvl+0x4a/0x5f
[   10.363971]  dump_stack+0x10/0x12
[   10.363974]  ubsan_epilogue+0x9/0x45
[   10.363976]  __ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
[   10.363979]  ? wake_up_klogd+0x4a/0x50
[   10.363983]  ? vprintk_emit+0x8f/0x240
[   10.363986]  dwmac4_map_mtl_dma.cold+0x42/0x91 [stmmac]
[   10.364001]  stmmac_mtl_configuration+0x1ce/0x7a0 [stmmac]
[   10.364009]  ? dwmac410_dma_init_channel+0x70/0x70 [stmmac]
[   10.364020]  stmmac_hw_setup.cold+0xf/0xb14 [stmmac]
[   10.364030]  ? page_pool_alloc_pages+0x4d/0x70
[   10.364034]  ? stmmac_clear_tx_descriptors+0x6e/0xe0 [stmmac]
[   10.364042]  stmmac_open+0x39e/0x920 [stmmac]
[   10.364050]  __dev_open+0xf0/0x1a0
[   10.364054]  __dev_change_flags+0x188/0x1f0
[   10.364057]  dev_change_flags+0x26/0x60
[   10.364059]  do_setlink+0x908/0xc40
[   10.364062]  ? do_setlink+0xb10/0xc40
[   10.364064]  ? __nla_validate_parse+0x4c/0x1a0
[   10.364068]  __rtnl_newlink+0x597/0xa10
[   10.364072]  ? __nla_reserve+0x41/0x50
[   10.364074]  ? __kmalloc_node_track_caller+0x1d0/0x4d0
[   10.364079]  ? pskb_expand_head+0x75/0x310
[   10.364082]  ? nla_reserve_64bit+0x21/0x40
[   10.364086]  ? skb_free_head+0x65/0x80
[   10.364089]  ? security_sock_rcv_skb+0x2c/0x50
[   10.364094]  ? __cond_resched+0x19/0x30
[   10.364097]  ? kmem_cache_alloc_trace+0x15a/0x420
[   10.364100]  rtnl_newlink+0x49/0x70

This change fixes MTL_RXQ_DMA_MAP1 mask issue and channel/queue
mapping warning.

Fixes: a3778358def9 ("net: stmmac: mapping mtl rx to dma channel")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216195
Reported-by: Cedric Wassenaar <cedric@bytespeed.nl>
Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: switch to use interrupt for hw crosstimestamping

Using current implementation of polling mode, there is high chances we
will hit into timeout error when running phc2sys. Hence, update the
implementation of hardware crosstimestamping to use the MAC interrupt
service routine instead of polling for TSIS bit in the MAC Timestamp
Interrupt Status register to be set.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/udp: Make early_demux back namespacified.

Commit 1c00118470cf ("ipv4: namespacify ip_early_demux sysctl knob") made
it possible to enable/disable early_demux on a per-netns basis.  Then, we
introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
TCP/UDP in commit bb245b2f9d37 ("net: Add sysctl to toggle early demux for
tcp and udp").  However, the .proc_handler() was wrong and actually
disabled us from changing the behaviour in each netns.

We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
.early_demux() handler is not NULL.  When we toggle (tcp|udp)_early_demux,
the change itself is saved in each netns variable, but the .early_demux()
handler is a global variable, so the handler is switched based on the
init_net's sysctl variable.  Thus, netns (tcp|udp)_early_demux knobs have
nothing to do with the logic.  Whether we CAN execute proto .early_demux()
is always decided by init_net's sysctl knob, and whether we DO it or not is
by each netns ip_early_demux knob.

This patch namespacifies (tcp|udp)_early_demux again.  For now, the users
of the .early_demux() handler are TCP and UDP only, and they are called
directly to avoid retpoline.  So, we can remove the .early_demux() handler
from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
If another proto needs .early_demux(), we can restore it at that time.

Fixes: bb245b2f9d37 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-07-14

This series contains updates to e1000e and igc drivers.

Sasha re-enables GPT clock when exiting s0ix to prevent hardware unit
hang and reverts a workaround for this issue on e1000e.

Lennert Buytenhek restores checks for removed device while accessing
registers to prevent NULL pointer dereferences for igc.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: Reinstate IGC_REMOVED logic and implement it properly
  Revert "e1000e: Fix possible HW unit hang after an s0ix exit"
  e1000e: Enable GPT clock before sending message to CSME
====================

Link: https://lore.kernel.org/r/20220714175857.933537-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: ksz_common: Fix refcount leak bug

In ksz_switch_register(), we should call of_node_put() for the
reference returned by of_get_child_by_name() which has increased
the refcount.

Fixes: 0c1355e3344d ("net: dsa: microchip: really look for phy-mode in port nodes")
Signed-off-by: Liang He <windhl@126.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220714153138.375919-1-windhl@126.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'stmmac-dwmac-mediatec-clock-fix'

Biao Huang says:

====================
stmmac: dwmac-mediatek: fix clock issue

changes in v5:
1. add reivewd-by as Matthias's comments.
2. fix "warning: unused variable 'ret' [-Wunused-variable]" as Jakub's comments

changes in v4:
1. improve commit message and test ko insertion/remove as Matthias's comments.
2. add patch "net: stmmac: fix pm runtime issue in stmmac_dvr_remove()" to
fix vlan filter deletion issue.
3. add patch "net: stmmac: fix unbalanced ptp clock issue in suspend/resume flow"
to fix unbalanced ptp clock issue in suspend/resume flow.

changes in v3:
1. delete mediatek_dwmac_exit() since there is no operation in it,
as Matthias's comments.

changes in v2:
1. clock configuration is still needed in probe,
and invoke mediatek_dwmac_clks_config() instead.
2. update commit message.

v1:
remove duplicated clock configuration in init/exit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: fix unbalanced ptp clock issue in suspend/resume flow

Current stmmac driver will prepare/enable ptp_ref clock in
stmmac_init_tstamp_counter().

The stmmac_pltfr_noirq_suspend will disable it once in suspend flow.

But in resume flow,
stmmac_pltfr_noirq_resume --> stmmac_init_tstamp_counter
stmmac_resume --> stmmac_hw_setup --> stmmac_init_ptp --> stmmac_init_tstamp_counter
ptp_ref clock reference counter increases twice, which leads to unbalance
ptp clock when resume back.

Move ptp_ref clock prepare/enable out of stmmac_init_tstamp_counter to fix it.

Fixes: f0d8487270047 ("net: stmmac: skip only stmmac_ptp_register when resume from suspend")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: fix pm runtime issue in stmmac_dvr_remove()

If netif is running when stmmac_dvr_remove is invoked,
the unregister_netdev will call ndo_stop(stmmac_release) and
vlan_kill_rx_filter(stmmac_vlan_rx_kill_vid).

Currently, stmmac_dvr_remove() will disable pm runtime before
unregister_netdev. When stmmac_vlan_rx_kill_vid is invoked,
pm_runtime_resume_and_get in it returns EACCESS error number,
and reports:

dwmac-mediatek 11021000.ethernet eth0: stmmac_dvr_remove: removing driver
dwmac-mediatek 11021000.ethernet eth0: FPE workqueue stop
dwmac-mediatek 11021000.ethernet eth0: failed to kill vid 0081/0

Move the pm_runtime_disable to the end of stmmac_dvr_remove
to fix this issue.

Fixes: 6ac6d03516e93 ("net: stmmac: properly handle with runtime pm in stmmac_dvr_remove()")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: dwmac-mediatek: fix clock issue

The pm_runtime takes care of the clock handling in current
stmmac drivers, and dwmac-mediatek implement the
mediatek_dwmac_clks_config() as the callback for pm_runtime.

Then, stripping duplicated clocks handling in old init()/exit()
to fix clock issue in suspend/resume test.

As to clocks in probe/remove, vendor need symmetric handling to
ensure clocks balance.

Test pass, including suspend/resume and ko insertion/remove.

Fixes: 63ec7b48df49 ("stmmac: dwmac-mediatek: add platform level clocks management")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sysctl-races-round2'

Kuniyuki Iwashima says:

====================
sysctl: Fix data-races around ipv4_net_table (Round 2).

This series fixes data-races around 15 knobs after ip_default_ttl in
ipv4_net_table.

These two knobs are skipped.
  - ip_local_port_range is safe with its own lock.
  - ip_local_reserved_ports uses proc_do_large_bitmap(), which will need
    an additional lock and can be fixed later.

So, the next round will start with igmp_link_local_mcast_reports.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix a data-race around sysctl_tcp_probe_interval.

While reading sysctl_tcp_probe_interval, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 35dc52cc3fd5 ("ipv4: Create probe timer for tcp PMTU as per RFC4821")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix a data-race around sysctl_tcp_probe_threshold.

While reading sysctl_tcp_probe_threshold, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 030427eb5bce ("ipv4: Use binary search to choose tcp PMTU probe_size")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix a data-race around sysctl_tcp_mtu_probe_floor.

While reading sysctl_tcp_mtu_probe_floor, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 35e07db7d9ad ("tcp: add new tcp_mtu_probe_floor sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_min_snd_mss.

While reading sysctl_tcp_min_snd_mss, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 45150c376fa7 ("tcp: add tcp_min_snd_mss sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_base_mss.

While reading sysctl_tcp_base_mss, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 5eb6c2893ed1 ("[TCP]: MTU probing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_mtu_probing.

While reading sysctl_tcp_mtu_probing, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 5eb6c2893ed1 ("[TCP]: MTU probing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix data-races around sysctl_tcp_l3mdev_accept.

While reading sysctl_tcp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: ddf3fa7765d7 ("net: Allow accepted sockets to be bound to l3mdev domain")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept.

While reading sysctl_tcp_fwmark_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: bbc0055d3fac ("net: support marking accepting TCP sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: Fix a data-race around sysctl_fwmark_reflect.

While reading sysctl_fwmark_reflect, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: df3d62898c38 ("net: add a sysctl to reflect the fwmark on replies")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: Fix a data-race around sysctl_ip_autobind_reuse.

While reading sysctl_ip_autobind_reuse, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: f3f5e4293ace ("tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: Fix data-races around sysctl_ip_nonlocal_bind.

While reading sysctl_ip_nonlocal_bind, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: Fix data-races around sysctl_ip_fwd_update_priority.

While reading sysctl_ip_fwd_update_priority, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: ae5de211f0d1 ("net: ipv4: Control SKB reprioritization after forwarding")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: Fix data-races around sysctl_ip_fwd_use_pmtu.

While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: eafe8df89251 ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: Fix data-races around sysctl_ip_no_pmtu_disc.

While reading sysctl_ip_no_pmtu_disc, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: Fix data-races around sysctl_ip_default_ttl.

While reading sysctl_ip_default_ttl, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current

Clang warns:

  arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection]
  DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
                      ^
  arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here
  extern u64 x86_spec_ctrl_current;
             ^
  1 error generated.

The declaration should be using DECLARE_PER_CPU instead so all
attributes stay in sync.

Cc: stable@vger.kernel.org
Fixes: c8f7dd8a883c ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bpf and wireless.

  Still no major regressions, the release continues to be calm. An
  uptick of fixes this time around due to trivial data race fixes and
  patches flowing down from subtrees.

  There has been a few driver fixes (particularly a few fixes for false
  positives due to f588aa3f3718 which went into -next in May!) that make
  me worry the wide testing is not exactly fully through.

  So "calm" but not "let's just cut the final ASAP" vibes over here.

  Current release - regressions:

   - wifi: rtw88: fix write to const table of channel parameters

  Current release - new code bugs:

   - mac80211: add gfp_t arg to ieeee80211_obss_color_collision_notify

   - mlx5:
      - TC, allow offload from uplink to other PF's VF
      - Lag, decouple FDB selection and shared FDB
      - Lag, correct get the port select mode str

   - bnxt_en: fix and simplify XDP transmit path

   - r8152: fix accessing unset transport header

  Previous releases - regressions:

   - conntrack: fix crash due to confirmed bit load reordering (after
     atomic -> refcount conversion)

   - stmmac: dwc-qos: disable split header for Tegra194

  Previous releases - always broken:

   - mlx5e: ring the TX doorbell on DMA errors

   - bpf: make sure mac_header was set before using it

   - mac80211: do not wake queues on a vif that is being stopped

   - mac80211: fix queue selection for mesh/OCB interfaces

   - ip: fix dflt addr selection for connected nexthop

   - seg6: fix skb checksums for SRH encapsulation/insertion

   - xdp: fix spurious packet loss in generic XDP TX path

   - bunch of sysctl data race fixes

   - nf_log: incorrect offset to network header

  Misc:

   - bpf: add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs"

* tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  nfp: flower: configure tunnel neighbour on cmsg rx
  net/tls: Check for errors in tls_device_init
  MAINTAINERS: Add an additional maintainer to the AMD XGBE driver
  xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
  selftests/net: test nexthop without gw
  ip: fix dflt addr selection for connected nexthop
  net: atlantic: remove aq_nic_deinit() when resume
  net: atlantic: remove deep parameter on suspend/resume functions
  sfc: fix kernel panic when creating VF
  seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
  seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
  seg6: fix skb checksum evaluation in SRH encapsulation/insertion
  sfc: fix use after free when disabling sriov
  net: sunhme: output link status with a single print.
  r8152: fix accessing unset transport header
  net: stmmac: fix leaks in probe
  net: ftgmac100: Hold reference returned by of_get_child_by_name()
  nexthop: Fix data-races around nexthop_compat_mode.
  ipv4: Fix data-races around sysctl_ip_dynaddr.
  tcp: Fix a data-race around sysctl_tcp_ecn_fallback.
  ...

Merge tag '5.19-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Three smb3 client fixes:

   - two multichannel fixes: fix a potential deadlock freeing a channel,
     and fix a race condition on failed creation of a new channel

   - mount failure fix: work around a server bug in some common older
     Samba servers by avoiding padding at the end of the negotiate
     protocol request"

* tag '5.19-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: workaround negprot bug in some Samba servers
  cifs: remove unnecessary locking of chan_lock while freeing session
  cifs: fix race condition with delayed threads

Merge tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
"Notable regression fixes:

   - Enable SETATTR(time_create) to fix regression with Mac OS clients

   - Fix a lockd crasher and broken NLM UNLCK behavior"

* tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  lockd: fix nlm_close_files
  lockd: set fl_owner when unlocking files
  NFSD: Decode NFSv4 birth time attribute

Merge tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity fixes from Mimi Zohar:
"Here are a number of fixes for recently found bugs.

  Only 'ima: fix violation measurement list record' was introduced in
  the current release. The rest address existing bugs"

* tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: Fix potential memory leak in ima_init_crypto()
  ima: force signature verification when CONFIG_KEXEC_SIG is configured
  ima: Fix a potential integer overflow in ima_appraise_measurement
  ima: fix violation measurement list record
  Revert "evm: Fix memleak in init_desc"

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

- quieten the spectre-bhb prints

- mark flattened device tree sections as shareable

- remove some obsolete CPU domain code and help text

- fix thumb unaligned access abort emulation

- fix amba_device_add() refcount underflow

- fix literal placement

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9208/1: entry: add .ltorg directive to keep literals in range
  ARM: 9207/1: amba: fix refcount underflow if amba_device_add() fails
  ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction
  ARM: 9213/1: Print message about disabled Spectre workarounds only once
  ARM: 9212/1: domain: Modify Kconfig help text
  ARM: 9211/1: domain: drop modify_domain()
  ARM: 9210/1: Mark the FDT_FIXED sections as shareable
  ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle

um: Replace to_phys() and to_virt() with less generic function names

The UML function names to_virt() and to_phys() are exposed by UML
headers, and are very generic and may be defined by drivers.  As it
turns out, commit 52dffcc0903f ("pmem: refactor pmem_clear_poison()")
did exactly that.

This results in build errors such as the following when trying to build
um:allmodconfig:

  drivers/nvdimm/pmem.c: In function ‘pmem_dax_zero_page_range’:
  ./arch/um/include/asm/page.h:105:20: error: too few arguments to function ‘to_phys’
    105 | #define __pa(virt) to_phys((void *) (unsigned long) (virt))
        |                    ^~~~~~~

Use less generic function names for the um specific to_phys() and
to_virt() functions to fix the problem and to avoid similar problems in
the future.

Fixes: 52dffcc0903f ("pmem: refactor pmem_clear_poison()")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'sound-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Hopefully the last one for 5.19. This became bigger than wished, but
  all changes are pretty device-specific small fixes, which look less
  worrisome.

  The majority of changes are about various ASoC fixes, while the usual
  HD-audio quirks are included as well"

* tag 'sound-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
  ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
  ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221
  ALSA: hda/realtek: fix mute/micmute LEDs for HP machines
  ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
  ALSA: hda - Add fixup for Dell Latitidue E5430
  ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model
  ALSA: hda/realtek: Fix headset mic for Acer SF313-51
  ASoC: Intel: Skylake: Correct the handling of fmt_config flexible array
  ASoC: Intel: Skylake: Correct the ssp rate discovery in skl_get_ssp_clks()
  ASoC: rt5640: Fix the wrong state of JD1 and JD2
  ASoC: Intel: sof_rt5682: fix out-of-bounds array access
  ASoC: qdsp6: fix potential memory leak in q6apm_get_audioreach_graph()
  ASoC: tas2764: Fix amp gain register offset & default
  ASoC: tas2764: Correct playback volume range
  ASoC: tas2764: Fix and extend FSYNC polarity handling
  ASoC: tas2764: Add post reset delays
  ASoC: dt-bindings: Fix description for msm8916
  ASoC: doc: Capitalize RESET line name
  ASoC: arizona: Update arizona_aif_cfg_changed to use RX_BCLK_RATE
  ASoC: cs47l92: Fix event generation for OUT1 demux
  ...

nfp: flower: configure tunnel neighbour on cmsg rx

nfp_tun_write_neigh() function will configure a tunnel neighbour when
calling nfp_tun_neigh_event_handler() or nfp_flower_cmsg_process_one_rx()
(with no tunnel neighbour type) from firmware.

When configuring IP on physical port as a tunnel endpoint, no operation
will be performed after receiving the cmsg mentioned above.

Therefore, add a progress to configure tunnel neighbour in this case.

v2: Correct format of fixes tag.

Fixes: 5b89efd39f95 ("nfp: flower: rework tunnel neighbour configuration")
Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220714081915.148378-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tls: Check for errors in tls_device_init

Add missing error checks in tls_device_init.

Fixes: e31ec02c9978 ("net/tls: Add generic NIC offload infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: Add an additional maintainer to the AMD XGBE driver

Add Shyam Sundar S K as an additional maintainer to support the AMD XGBE
network device driver.

Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/db367f24089c2bbbcd1cec8e21af49922017a110.1657751501.git.thomas.lendacky@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue

xenvif_rx_next_skb() is expecting the rx queue not being empty, but
in case the loop in xenvif_rx_action() is doing multiple iterations,
the availability of another skb in the rx queue is not being checked.

This can lead to crashes:

[40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback]
[40072.537534] PGD 0 P4D 0
[40072.537644] Oops: 0000 [#1] SMP NOPTI
[40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5
[40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021
[40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000
[40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback]
[40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246
[40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7
[40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8
[40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008
[40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708
[40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0
[40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000
[40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660
[40072.539211] Call Trace:
[40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback]
[40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback]

Fix that by stopping the loop in case the rx queue becomes empty.

Cc: stable@vger.kernel.org
Fixes: fbf81f0589a2 ("xen-netback: process guest rx packets in batches")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220713135322.19616-1-jgross@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

amdgpu: disable powerpc support for the newer display engine

The DRM_AMD_DC_DCN display engine support (Raven, Navi, and newer) has
not been building cleanly on powerpc and causes link errors due to
mixing hard- and soft-float object files:

  powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_resource.o uses soft float
  powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_resource.o
  [..]

and while patches are floating around, it's not exactly obvious what is
going on.

The problem bisects to commit face7150f53c ("powerpc: Book3S 64-bit
outline-only KASAN support") but that is probably more about changing
config variables than the fundamental cause.

Despite the bisection result, a more directly related commit seems to be
478148593acc ("drm/amd/display: move FPU related code from dcn31 to
dml/dcn31 folder").  It's probably a combination of the two.

This has been going on since the merge window, without any final word.
So instead of blindly applying patches that may or may not be the right
thing, let's disable this for now.

As Michael Ellerman says:
"IIUIC this code was never enabled on ppc before, so disabling it seems
  like a reasonable fix to get the build clean"

and once we have more actual feedback (and find any potential users) we
can always re-enable it with the patch that fixes the issues and
back-port as necessary.

Fixes: face7150f53c ("powerpc: Book3S 64-bit outline-only KASAN support")
Fixes: 478148593acc ("drm/amd/display: move FPU related code from dcn31 to dml/dcn31 folder")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/20220606153910.GA1773067@roeck-us.net/
Link: https://lore.kernel.org/all/20220618232737.2036722-1-linux@roeck-us.net/
Link: https://lore.kernel.org/all/20220713050724.GA2471738@roeck-us.net/
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

igc: Reinstate IGC_REMOVED logic and implement it properly

The initially merged version of the igc driver code (via commit
870e223aa06b, "igc: Add support for PF") contained the following
IGC_REMOVED checks in the igc_rd32/wr32() MMIO accessors:

u32 igc_rd32(struct igc_hw *hw, u32 reg)
{
u8 __iomem *hw_addr = READ_ONCE(hw->hw_addr);
u32 value = 0;

if (IGC_REMOVED(hw_addr))
return ~value;

value = readl(&hw_addr[reg]);

/* reads should not return all F's */
if (!(~value) && (!reg || !(~readl(hw_addr))))
hw->hw_addr = NULL;

return value;
}

And:

#define wr32(reg, val) \
do { \
u8 __iomem *hw_addr = READ_ONCE((hw)->hw_addr); \
if (!IGC_REMOVED(hw_addr)) \
writel((val), &hw_addr[(reg)]); \
} while (0)

E.g. igb has similar checks in its MMIO accessors, and has a similar
macro E1000_REMOVED, which is implemented as follows:

#define E1000_REMOVED(h) unlikely(!(h))

These checks serve to detect and take note of an 0xffffffff MMIO read
return from the device, which can be caused by a PCIe link flap or some
other kind of PCI bus error, and to avoid performing MMIO reads and
writes from that point onwards.

However, the IGC_REMOVED macro was not originally implemented:

#ifndef IGC_REMOVED
#define IGC_REMOVED(a) (0)
#endif /* IGC_REMOVED */

This led to the IGC_REMOVED logic to be removed entirely in a
subsequent commit (commit 83e0d364e517, "igc: remove IGC_REMOVED
function"), with the rationale that such checks matter only for
virtualization and that igc does not support virtualization -- but a
PCIe device can become detached even without virtualization being in
use, and without proper checks, a PCIe bus error affecting an igc
adapter will lead to various NULL pointer dereferences, as the first
access after the error will set hw->hw_addr to NULL, and subsequent
accesses will blindly dereference this now-NULL pointer.

This patch reinstates the IGC_REMOVED checks in igc_rd32/wr32(), and
implements IGC_REMOVED the way it is done for igb, by checking for the
unlikely() case of hw_addr being NULL. This change prevents the oopses
seen when a PCIe link flap occurs on an igc adapter.

Fixes: 870e223aa06b ("igc: Add support for PF")
Signed-off-by: Lennert Buytenhek <buytenh@arista.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Revert "e1000e: Fix possible HW unit hang after an s0ix exit"

This reverts commit dc14ff7ab84684b991da6fac778ebecb2fa2ca8a.

Commit dc14ff7ab846 ("e1000e: Fix possible HW unit hang after an s0ix
exit") was a workaround for CSME problem to handle messages comes via H2ME
mailbox. This problem has been fixed by patch "e1000e: Enable the GPT
clock before sending message to the CSME".

Fixes: 7c28b2aaf1e5 ("e1000e: Add handshake with the CSME to support S0ix")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e1000e: Enable GPT clock before sending message to CSME

On corporate (CSME) ADL systems, the Ethernet Controller may stop working
("HW unit hang") after exiting from the s0ix state. The reason is that
CSME misses the message sent by the host. Enabling the dynamic GPT clock
solves this problem. This clock is cleared upon HW initialization.

Fixes: 7c28b2aaf1e5 ("e1000e: Add handshake with the CSME to support S0ix")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

selftests/net: test nexthop without gw

This test implement the scenario described in the commit
"ip: fix dflt addr selection for connected nexthop".
The test configures a nexthop object with an output device only (no gateway
address) and a route that uses this nexthop. The goal is to check if the
kernel selects a valid source address.

Link: https://lore.kernel.org/netdev/20220712095545.10947-1-nicolas.dichtel@6wind.com/
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20220713114853.29406-2-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ip: fix dflt addr selection for connected nexthop

When a nexthop is added, without a gw address, the default scope was set
to 'host'. Thus, when a source address is selected, 127.0.0.1 may be chosen
but rejected when the route is used.

When using a route without a nexthop id, the scope can be configured in the
route, thus the problem doesn't exist.

To explain more deeply: when a user creates a nexthop, it cannot specify
the scope. To create it, the function nh_create_ipv4() calls fib_check_nh()
with scope set to 0. fib_check_nh() calls fib_check_nh_nongw() wich was
setting scope to 'host'. Then, nh_create_ipv4() calls
fib_info_update_nhc_saddr() with scope set to 'host'. The src addr is
chosen before the route is inserted.

When a 'standard' route (ie without a reference to a nexthop) is added,
fib_create_info() calls fib_info_update_nhc_saddr() with the scope set by
the user. iproute2 set the scope to 'link' by default.

Here is a way to reproduce the problem:
ip netns add foo
ip -n foo link set lo up
ip netns add bar
ip -n bar link set lo up
sleep 1

ip -n foo link add name eth0 type dummy
ip -n foo link set eth0 up
ip -n foo address add 192.168.0.1/24 dev eth0

ip -n foo link add name veth0 type veth peer name veth1 netns bar
ip -n foo link set veth0 up
ip -n bar link set veth1 up

ip -n bar address add 192.168.1.1/32 dev veth1
ip -n bar route add default dev veth1

ip -n foo nexthop add id 1 dev veth0
ip -n foo route add 192.168.1.1 nhid 1

Try to get/use the route:
> $ ip -n foo route get 192.168.1.1
> RTNETLINK answers: Invalid argument
> $ ip netns exec foo ping -c1 192.168.1.1
> ping: connect: Invalid argument

Try without nexthop group (iproute2 sets scope to 'link' by dflt):
ip -n foo route del 192.168.1.1
ip -n foo route add 192.168.1.1 dev veth0

Try to get/use the route:
> $ ip -n foo route get 192.168.1.1
> 192.168.1.1 dev veth0 src 192.168.0.1 uid 0
> cache
> $ ip netns exec foo ping -c1 192.168.1.1
> PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
> 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.039 ms
>
> --- 192.168.1.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms

CC: stable@vger.kernel.org
Fixes: 0b2837f003c1 ("nexthop: Add support for IPv4 nexthops")
Reported-by: Edwin Brossette <edwin.brossette@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20220713114853.29406-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ARM: 9208/1: entry: add .ltorg directive to keep literals in range

LKP reports a build issue on Clang, related to a literal load of
__current issued through the ldr_va macro. This turns out to be due to
the fact that group relocations are disabled when CONFIG_COMPILE_TEST=y,
which means that the ldr_va macro resolves to a pair of LDR
instructions, the first one being a literal load issued too far from its
literal pool.

Due to the introduction of a couple of new uses of this macro in commit
6468297033cf769e ("ARM: 9195/1: entry: avoid explicit literal loads"),
the literal pools end up getting rearranged in a way that causes the
literal for __current to go out of range. Let's fix this up by putting a
.ltorg directive in a suitable place in the code.

Link: https://lore.kernel.org/all/202205290805.1vZLAr36-lkp@intel.com/
Fixes: 6468297033cf769e ("ARM: 9195/1: entry: avoid explicit literal loads")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

ARM: 9207/1: amba: fix refcount underflow if amba_device_add() fails

"ARM: 9192/1: amba: fix memory leak in amba_device_try_add()" leads
to a refcount underflow if amba_device_add() fails, which called by
of_amba_device_create(), the of_amba_device_create() already exists
the error handling, so amba_put_device() only need to be added into
amba_deferred_retry().

Fixes: 19051358cbc0 ("ARM: 9192/1: amba: fix memory leak in amba_device_try_add()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

net: atlantic: remove aq_nic_deinit() when resume

aq_nic_deinit() has been called while suspending, so we don't have to call
it again on resume.
Actually, call it again leads to another hang issue when resuming from
S3.

Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992345] Call Trace:
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992346] <TASK>
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992348] aq_nic_deinit+0xb4/0xd0 [atlantic]
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992356] aq_pm_thaw+0x7f/0x100 [atlantic]
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992362] pci_pm_resume+0x5c/0x90
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992366] ? pci_pm_thaw+0x80/0x80
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992368] dpm_run_callback+0x4e/0x120
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992371] device_resume+0xad/0x200
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992373] async_resume+0x1e/0x40
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992374] async_run_entry_fn+0x33/0x120
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992377] process_one_work+0x220/0x3c0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992380] worker_thread+0x4d/0x3f0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992382] ? process_one_work+0x3c0/0x3c0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992384] kthread+0x12a/0x150
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992386] ? set_kthread_struct+0x40/0x40
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992387] ret_from_fork+0x22/0x30
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992391] </TASK>
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992392] ---[ end trace 1ec8c79604ed5e0d ]---
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992394] PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992397] atlantic 0000:02:00.0: PM: failed to resume async: error -110

Fixes: f74dd8aaea2d ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Link: https://lore.kernel.org/r/20220713111224.1535938-2-acelan.kao@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: atlantic: remove deep parameter on suspend/resume functions

Below commit claims that atlantic NIC requires to reset the device on pm
op, and had set the deep to true for all suspend/resume functions.
commit f74dd8aaea2d ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
So, we could remove deep parameter on suspend/resume functions without
any functional change.

Fixes: f74dd8aaea2d ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Link: https://lore.kernel.org/r/20220713111224.1535938-1-acelan.kao@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

sfc: fix kernel panic when creating VF

When creating VFs a kernel panic can happen when calling to
efx_ef10_try_update_nic_stats_vf.

When releasing a DMA coherent buffer, sometimes, I don't know in what
specific circumstances, it has to unmap memory with vunmap. It is
disallowed to do that in IRQ context or with BH disabled. Otherwise, we
hit this line in vunmap, causing the crash:
  BUG_ON(in_interrupt());

This patch reenables BH to release the buffer.

Log messages when the bug is hit:
kernel BUG at mm/vmalloc.c:2727!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G          I      --------- ---  5.14.0-119.el9.x86_64 #1
Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
RIP: 0010:vunmap+0x2e/0x30
...skip...
Call Trace:
  __iommu_dma_free+0x96/0x100
  efx_nic_free_buffer+0x2b/0x40 [sfc]
  efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc]
  efx_ef10_update_stats_vf+0x18/0x40 [sfc]
  efx_start_all+0x15e/0x1d0 [sfc]
  efx_net_open+0x5a/0xe0 [sfc]
  __dev_open+0xe7/0x1a0
  __dev_change_flags+0x1d7/0x240
  dev_change_flags+0x21/0x60
  ...skip...

Fixes: abaa2a47a31d ("sfc: DMA the VF stats only when requested")
Reported-by: Ma Yuying <yuma@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220713092116.21238-1-ihuguet@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'seg6-fix-skb-checksum-for-srh-encapsulation-insertion'

Andrea Mayer says:

====================
seg6: fix skb checksum for SRH encapsulation/insertion

The Linux kernel supports Segment Routing Header (SRH)
encapsulation/insertion operations by providing the capability to: i)
encapsulate a packet in an outer IPv6 header with a specified SRH; ii)
insert a specified SRH directly after the IPv6 header of the packet.
Note that the insertion operation is also referred to as 'injection'.

The two operations are respectively supported by seg6_do_srh_encap() and
seg6_do_srh_inline(), which operate on the skb associated to the packet as
needed (e.g. adding the necessary headers and initializing them, while
taking care to recalculate the skb checksum).

seg6_do_srh_encap() and seg6_do_srh_inline() do not initialize the payload
length of the IPv6 header, which is carried out by the caller functions.
However, this approach causes the corruption of the skb checksum which
needs to be updated only after initialization of headers is completed
(thanks to Paolo Abeni for detecting this issue).

The patchset fixes the skb checksum corruption by moving the IPv6 header
payload length initialization from the callers of seg6_do_srh_encap() and
seg6_do_srh_inline() directly into these functions.

This patchset is organized as follows:
- patch 1/3, seg6: fix skb checksum evaluation in SRH
   encapsulation/insertion;
    (* SRH encapsulation/insertion available since v4.10)

- patch 2/3, seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps
   behaviors;
    (* SRv6 End.B6 and End.B6.Encaps behaviors available since v4.14)

- patch 3/3, seg6: bpf: fix skb checksum in bpf_push_seg6_encap();
    (* bpf IPv6 Segment Routing helpers available since v4.18)

====================

Link: https://lore.kernel.org/r/20220712175837.16267-1-andrea.mayer@uniroma2.it
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

seg6: bpf: fix skb checksum in bpf_push_seg6_encap()

Both helper functions bpf_lwt_seg6_action() and bpf_lwt_push_encap() use
the bpf_push_seg6_encap() to encapsulate the packet in an IPv6 with Segment
Routing Header (SRH) or insert an SRH between the IPv6 header and the
payload.
To achieve this result, such helper functions rely on bpf_push_seg6_encap()
which, in turn, leverages seg6_do_srh_{encap,inline}() to perform the
required operation (i.e. encap/inline).

This patch removes the initialization of the IPv6 header payload length
from bpf_push_seg6_encap(), as it is now handled properly by
seg6_do_srh_{encap,inline}() to prevent corruption of the skb checksum.

Fixes: 58efdeab0536 ("bpf: Add IPv6 Segment Routing helpers")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors

The SRv6 End.B6 and End.B6.Encaps behaviors rely on functions
seg6_do_srh_{encap,inline}() to, respectively: i) encapsulate the
packet within an outer IPv6 header with the specified Segment Routing
Header (SRH); ii) insert the specified SRH directly after the IPv6
header of the packet.

This patch removes the initialization of the IPv6 header payload length
from the input_action_end_b6{_encap}() functions, as it is now handled
properly by seg6_do_srh_{encap,inline}() to avoid corruption of the skb
checksum.

Fixes: 4afe673f2e87 ("ipv6: sr: implement several seg6local actions")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

seg6: fix skb checksum evaluation in SRH encapsulation/insertion

Support for SRH encapsulation and insertion was introduced with
commit 7ab7a75aa79e ("ipv6: sr: add support for SRH encapsulation and
injection with lwtunnels"), through the seg6_do_srh_encap() and
seg6_do_srh_inline() functions, respectively.
The former encapsulates the packet in an outer IPv6 header along with
the SRH, while the latter inserts the SRH between the IPv6 header and
the payload. Then, the headers are initialized/updated according to the
operating mode (i.e., encap/inline).
Finally, the skb checksum is calculated to reflect the changes applied
to the headers.

The IPv6 payload length ('payload_len') is not initialized
within seg6_do_srh_{inline,encap}() but is deferred in seg6_do_srh(), i.e.
the caller of seg6_do_srh_{inline,encap}().
However, this operation invalidates the skb checksum, since the
'payload_len' is updated only after the checksum is evaluated.

To solve this issue, the initialization of the IPv6 payload length is
moved from seg6_do_srh() directly into the seg6_do_srh_{inline,encap}()
functions and before the skb checksum update takes place.

Fixes: 7ab7a75aa79e ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/all/20220705190727.69d532417be7438b15404ee1@uniroma2.it
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-07-12

This series contains updates to ice driver only.

Paul fixes detection of E822 devices for firmware update and changes NVM
read for snapshot creation to be done in chunks as some systems cannot
read the entire NVM in the allotted time.
====================

Link: https://lore.kernel.org/r/20220712164829.7275-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: fix use after free when disabling sriov

Use after free is detected by kfence when disabling sriov. What was read
after being freed was vf->pci_dev: it was freed from pci_disable_sriov
and later read in efx_ef10_sriov_free_vf_vports, called from
efx_ef10_sriov_free_vf_vswitching.

Set the pointer to NULL at release time to not trying to read it later.

Reproducer and dmesg log (note that kfence doesn't detect it every time):
$ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
$ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs

BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]

Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224):
  efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
  efx_ef10_pci_sriov_disable+0x38/0x70 [sfc]
  efx_pci_sriov_configure+0x24/0x40 [sfc]
  sriov_numvfs_store+0xfe/0x140
  kernfs_fop_write_iter+0x11c/0x1b0
  new_sync_write+0x11f/0x1b0
  vfs_write+0x1eb/0x280
  ksys_write+0x5f/0xe0
  do_syscall_64+0x5c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

kfence-#224: 0x00000000edb8ef95-0x00000000671f5ce1, size=2792, cache=kmalloc-4k

allocated by task 6771 on cpu 10 at 3137.860196s:
  pci_alloc_dev+0x21/0x60
  pci_iov_add_virtfn+0x2a2/0x320
  sriov_enable+0x212/0x3e0
  efx_ef10_sriov_configure+0x67/0x80 [sfc]
  efx_pci_sriov_configure+0x24/0x40 [sfc]
  sriov_numvfs_store+0xba/0x140
  kernfs_fop_write_iter+0x11c/0x1b0
  new_sync_write+0x11f/0x1b0
  vfs_write+0x1eb/0x280
  ksys_write+0x5f/0xe0
  do_syscall_64+0x5c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

freed by task 6771 on cpu 12 at 3170.991309s:
  device_release+0x34/0x90
  kobject_cleanup+0x3a/0x130
  pci_iov_remove_virtfn+0xd9/0x120
  sriov_disable+0x30/0xe0
  efx_ef10_pci_sriov_disable+0x57/0x70 [sfc]
  efx_pci_sriov_configure+0x24/0x40 [sfc]
  sriov_numvfs_store+0xfe/0x140
  kernfs_fop_write_iter+0x11c/0x1b0
  new_sync_write+0x11f/0x1b0
  vfs_write+0x1eb/0x280
  ksys_write+0x5f/0xe0
  do_syscall_64+0x5c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: f6bd89c2dd99f ("sfc: create vports for VFs and assign random MAC addresses")
Reported-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220712062642.6915-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

smb3: workaround negprot bug in some Samba servers

Mount can now fail to older Samba servers due to a server
bug handling padding at the end of the last negotiate
context (negotiate contexts typically are rounded up to 8
bytes by adding padding if needed). This server bug can
be avoided by switching the order of negotiate contexts,
placing a negotiate context at the end that does not
require padding (prior to the recent netname context fix
this was the case on the client).

Fixes: cc725151f990 ("smb3: fix empty netname context on secondary channels")
Reported-by: Julian Sikorski <belegdol@gmail.com>
Tested-by: Julian Sikorski <belegdol+github@gmail.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

vf/remap: return the amount of bytes actually deduplicated

When using the FIDEDUPRANGE ioctl, in case of success the requested size
is returned. In some cases this might not be the actual amount of bytes
deduplicated.

This change modifies vfs_dedupe_file_range() to report the actual amount
of bytes deduplicated, instead of the requested amount.

Link: https://lore.kernel.org/linux-fsdevel/5548ef63-62f9-4f46-5793-03165ceccacc@tu-darmstadt.de/
Reported-by: Ansgar Lößer <ansgar.loesser@kom.tu-darmstadt.de>
Reported-by: Max Schlecht <max.schlecht@informatik.hu-berlin.de>
Reported-by: Björn Scheuermann <scheuermann@kom.tu-darmstadt.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Darrick J Wong <djwong@kernel.org>
Signed-off-by: Ansgar Lößer <ansgar.loesser@kom.tu-darmstadt.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
"Fix an old and subtle bug in the migration path.

  css_sets are used to track tasks and migrations are tasks moving from
  a group of css_sets to another group of css_sets. The migration path
  pins all source and destination css_sets in the prep stage.

  Unfortunately, it was overloading the same list_head entry to track
  sources and destinations, which got confused for migrations which are
  partially identity leading to use-after-frees.

  Fixed by using dedicated list_heads for tracking sources and
  destinations"

* tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Use separate src/dst nodes when preloading css_sets for migration