Eric Dumazet [Thu, 30 Jun 2022 15:07:50 +0000 (15:07 +0000)]
net: add skb_[inner_]tcp_all_headers helpers
Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)"
to compute headers length for a TCP packet, but others
use more convoluted (but equivalent) ways.
Add skb_tcp_all_headers() and skb_inner_tcp_all_headers()
helpers to harmonize this a bit.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Clément Léger [Wed, 29 Jun 2022 12:20:03 +0000 (14:20 +0200)]
net: pcs: rzn1-miic: update speed only if interface is changed
As stated by Russel King, miic_config() can be called as a result of
ethtool setting the configuration while the link is already up. Since
the speed is also set in this function, it could potentially modify
the current speed that is set. This will only happen if there is
no PHY present and we aren't using fixed-link mode.
Handle that by storing the current interface mode in the miic_port
structure and update the speed only if the interface mode is going to
be changed.
Bin Chen [Thu, 30 Jun 2022 11:21:55 +0000 (13:21 +0200)]
nfp: support VF rate limit with NFDK
Support VF rate limiting with NFDK by adding ndo_set_vf_rate to the NFDK
ops structure.
NFDK is used to communicate via PCIE to NFP-3800 based NICs
while NFD3 is used for other NICs supported by the NFP driver.
The VF rate limit feature is already supported by the driver for NFD3.
Signed-off-by: Bin Chen <bin.chen@corigine.com> Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alaa Mohamed [Thu, 30 Jun 2022 10:24:49 +0000 (12:24 +0200)]
selftests: net: fib_rule_tests: fix support for running individual tests
parsing and usage of -t got missed in the previous patch.
this patch fixes it
Fixes: 00670779b23e ("selftests: net: fib_rule_tests: add support to select a test to run") Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Jul 2022 12:25:00 +0000 (13:25 +0100)]
Merge branch 'mptcp-mem-scheduling'
Mat Martineau says:
====================
mptcp: Updates for mem scheduling and SK_RECLAIM
In the "net: reduce tcp_memory_allocated inflation" series (merge commit 36a2db252d65), Eric Dumazet noted that "Removal of SK_RECLAIM_CHUNK and
SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up."
Patches 1-3 align MPTCP with the above TCP changes to forward memory
allocation, reclaim, and memory scheduling.
Patch 4 removes the SK_RECLAIM_* macros as Eric requested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:57 +0000 (15:17 -0700)]
net: remove SK_RECLAIM_THRESHOLD and SK_RECLAIM_CHUNK
There are no more users for the mentioned macros, just
drop them.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:56 +0000 (15:17 -0700)]
mptcp: refine memory scheduling
Similar to commit 91efb764f795 ("net: fix sk_wmem_schedule() and
sk_rmem_schedule() errors"), let the MPTCP receive path schedule
exactly the required amount of memory.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:55 +0000 (15:17 -0700)]
mptcp: drop SK_RECLAIM_* macros
After commit 51d8ac882d1b ("net: keep sk->sk_forward_alloc as small as
possible"), the MPTCP protocol is the last SK_RECLAIM_CHUNK and
SK_RECLAIM_THRESHOLD users.
Update the MPTCP reclaim schema to match the core/TCP one and drop the
mentioned macros. This additionally clean the MPTCP code a bit.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:54 +0000 (15:17 -0700)]
mptcp: never fetch fwd memory from the subflow
The memory accounting is broken in such exceptional code
path, and after commit 51d8ac882d1b ("net: keep sk->sk_forward_alloc
as small as possible") we can't find much help there.
Drop the broken code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Jul 2022 10:21:56 +0000 (11:21 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-06-30
This series contains updates to ice driver only.
Martyna adds support for VLAN related TC switchdev filters and reworks
dummy packet implementation of VLANs to enable dynamic header insertion to
allow for more rule types.
Lu Wei utilizes eth_broadcast_addr() helper over an open coded version.
Ziyang Xuan removes unneeded NULL checks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jilin Yuan [Thu, 30 Jun 2022 07:57:51 +0000 (15:57 +0800)]
ethernet/neterion: fix repeated words in comments
Delete the redundant word 'the'.
Delete the redundant word 'a'.
Delete the redundant word 'frame'.
Delete the redundant word 'is'.
Delete the redundant word 'not'.
Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Prevent permanently closed tc-taprio gates from blocking a Felix DSA switch port
Richie Pearn reports that if we install a tc-taprio schedule on a Felix
switch port, and that schedule has at least one gate that never opens
(for example TC0 below):
then packets classified to the permanently closed traffic class will not
be dequeued by the egress port. They will just remain in the queue
system, to consume resources. Frame aging does not trigger either,
because in order for that to happen, the packets need to be eligible for
egress scheduling in the first place, which they aren't. If that port is
allowed to consume the entire shared buffer of the switch (as we
configure things by default using devlink-sb), then eventually, by
sending enough packets, the entire switch will hang.
If we think enough about the problem, we realize that this is only a
special case of a more general issue, and can also be reproduced with
gates that aren't permanently closed, but are not large enough to send
an entire frame. In that sense, a permanently closed gate is simply a
case where all frames are oversized.
The ENETC has logic to reject transmitted packets that would overrun the
time window - see commit aad47a97afd1 ("net: enetc: count the tc-taprio
window drops").
The Felix switch has no such thing on a per-packet basis, but it has a
register replicated per {egress port, TC} which essentially limits the
max MTU. A packet which exceeds the per-port-TC MTU is immediately
discarded and therefore will not hang the port anymore (albeit, sadly,
this only bumps a generic drop hardware counter and we cannot really
infer the reason such as to offer a dedicated counter for these events).
This patch set calculates the max MTU per {port, TC} when the tc-taprio
config, or link speed, or port-global MTU values change. This solves the
larger "gate too small for packet" problem, but also the original issue
with the gate permanently closed that was reported by Richie.
====================
Vladimir Oltean [Tue, 28 Jun 2022 14:52:38 +0000 (17:52 +0300)]
time64.h: consolidate uses of PSEC_PER_NSEC
Time-sensitive networking code needs to work with PTP times expressed in
nanoseconds, and with packet transmission times expressed in
picoseconds, since those would be fractional at higher than gigabit
speed when expressed in nanoseconds.
Convert the existing uses in tc-taprio and the ocelot/felix DSA driver
to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h
as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h
because the vDSO library does not (yet) need/use it.
Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:37 +0000 (17:52 +0300)]
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
Currently, sending a packet into a time gate too small for it (or always
closed) causes the queue system to hold the frame forever. Even worse,
this frame isn't subject to aging either, because for that to happen, it
needs to be scheduled for transmission in the first place. But the frame
will consume buffer memory and frame references while it is forever held
in the queue system.
Before commit f24f9e3d8757 ("net: mscc: ocelot: initialize watermarks to
sane defaults"), this behavior was somewhat subtle, as the switch had a
more intricately tuned default watermark configuration out of reset,
which did not allow any single port and tc to consume the entire switch
buffer space. Nonetheless, the held frames are still there, and they
reduce the total backplane capacity of the switch.
However, after the aforementioned commit, the behavior can be very
clearly seen, since we deliberately allow each {port, tc} to consume the
entire shared buffer of the switch minus the reservations (and we
disable all reservations by default). That is to say, we allow a
permanently closed tc-taprio gate to hang the entire switch.
A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU
per-port-tc registers serve 2 purposes: one is for guard band calculation
(when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to
enable oversized frame dropping (when non-zero).
Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame
dropping is disabled. The goal of the change is to enable it seamlessly.
For that, we need to hook into the MTU change, tc-taprio change, and
port link speed change procedures, since we depend on these variables.
Frames are not dropped on egress due to a queue system oversize
condition, instead that egress port is simply excluded from the mask of
valid destination ports for the packet. If there are no destination
ports at all, the ingress counter that increments is the generic
"drop_tail" in ethtool -S.
The issue exists in various forms since the tc-taprio offload was introduced.
Fixes: edc15c3360cf ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Reported-by: Richie Pearn <richard.pearn@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:36 +0000 (17:52 +0300)]
net: dsa: felix: keep QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF) out of rmw
In vsc9959_tas_clock_adjust(), the INIT_GATE_STATE field is not changed,
only the ENABLE field. Similarly for the disabling of the time-aware
shaper in vsc9959_qos_port_tas_set().
To reflect this, keep the QSYS_TAG_CONFIG_INIT_GATE_STATE_M mask out of
the read-modify-write procedure to make it clearer what is the intention
of the code.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:35 +0000 (17:52 +0300)]
net: dsa: felix: keep reference on entire tc-taprio config
In a future change we will need to remember the entire tc-taprio config
on all ports rather than just the base time, so use the
taprio_offload_get() helper function to replace ocelot_port->base_time
with ocelot_port->taprio.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 1 Jul 2022 02:57:38 +0000 (19:57 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2022-06-30
This series contains updates to misc Intel drivers.
Jesse removes unused macros from e100, e1000, e1000e, i40e, iavf, igc,
ixgb, ixgbe, and ixgbevf drivers.
Jiang Jian removes repeated words from ixgbe, fm10k, igb, and ixgbe
drivers.
Jilin Yuan removes repeated words from e1000, e1000e, fm10k, i40e, iavf,
igb, igbvf, igc, ixgbevf, and ice drivers.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
intel/ice:fix repeated words in comments
intel/ixgbevf:fix repeated words in comments
intel/igc:fix repeated words in comments
intel/igbvf:fix repeated words in comments
intel/igb:fix repeated words in comments
intel/iavf:fix repeated words in comments
intel/i40e:fix repeated words in comments
intel/fm10k:fix repeated words in comments
intel/e1000e:fix repeated words in comments
intel/e1000:fix repeated words in comments
ixgbe: drop unexpected word 'for' in comments
igb: remove unexpected word "the"
fm10k: remove unexpected word "the"
ixgbe: remove unexpected word "the"
intel: remove unused macros
====================
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c 01f703f32b07 ("net: sparx5: mdb add/del handle non-sparx5 devices") 9597552d3cd9 ("net: sparx5: Allow mdb entries to both CPU and ports")
Amir Goldstein [Thu, 30 Jun 2022 19:58:49 +0000 (22:58 +0300)]
vfs: fix copy_file_range() regression in cross-fs copies
A regression has been reported by Nicolas Boichat, found while using the
copy_file_range syscall to copy a tracefs file.
Before commit 6c5c6aabe260 ("vfs: allow copy_file_range to copy across
devices") the kernel would return -EXDEV to userspace when trying to
copy a file across different filesystems. After this commit, the
syscall doesn't fail anymore and instead returns zero (zero bytes
copied), as this file's content is generated on-the-fly and thus reports
a size of zero.
Another regression has been reported by He Zhe - the assertion of
WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
copying from a sysfs file whose read operation may return -EOPNOTSUPP.
Since we do not have test coverage for copy_file_range() between any two
types of filesystems, the best way to avoid these sort of issues in the
future is for the kernel to be more picky about filesystems that are
allowed to do copy_file_range().
This patch restores some cross-filesystem copy restrictions that existed
prior to commit 6c5c6aabe260 ("vfs: allow copy_file_range to copy across
devices"), namely, cross-sb copy is not allowed for filesystems that do
not implement ->copy_file_range().
Filesystems that do implement ->copy_file_range() have full control of
the result - if this method returns an error, the error is returned to
the user. Before this change this was only true for fs that did not
implement the ->remap_file_range() operation (i.e. nfsv3).
Filesystems that do not implement ->copy_file_range() still fall-back to
the generic_copy_file_range() implementation when the copy is within the
same sb. This helps the kernel can maintain a more consistent story
about which filesystems support copy_file_range().
nfsd and ksmbd servers are modified to fall-back to the
generic_copy_file_range() implementation in case vfs_copy_file_range()
fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of
server-side-copy.
fall-back to generic_copy_file_range() is not implemented for the smb
operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct
change of behavior.
Ziyang Xuan [Thu, 16 Jun 2022 04:52:59 +0000 (12:52 +0800)]
ice: Remove unnecessary NULL check before dev_put
Since commit 8a32d00ea2a0 ("netdevice: add the case if dev is NULL"),
dev_put(NULL) is safe, check NULL before dev_put() is not needed.
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Lu Wei [Wed, 18 May 2022 21:27:59 +0000 (14:27 -0700)]
ice: use eth_broadcast_addr() to set broadcast address
Use eth_broadcast_addr() to set broadcast address instead of memset().
Signed-off-by: Lu Wei <luwei32@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ice: switch: dynamically add VLAN headers to dummy packets
Enable the support of creating all kinds of declared dummy packets
with the VLAN tags by inserting VLAN headers (single VLAN and QinQ
cases) if needed.
Decrease the number of declared dummy packets and increase in the
possible packet's combinations for adding switch rules.
This change enables support of creating filters that match both on
VLAN + tunnels properties in switchdev.
Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enable support for adding TC rules with both C-tag and S-tag that can
filter on the inner and outer VLAN in QinQ for basic packets (not
tunneled cases).
Signed-off-by: Wiktor Pilarczyk <wiktor.pilarczyk@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Vladimir Oltean [Wed, 29 Jun 2022 19:33:58 +0000 (22:33 +0300)]
net: phylink: fix NULL pl->pcs dereference during phylink_pcs_poll_start
The current link mode of the phylink instance may not require an
attached PCS. However, phylink_major_config() unconditionally
dereferences this potentially NULL pointer when restarting the link poll
timer, which will panic the kernel.
Fix the problem by checking whether a PCS exists in phylink_pcs_poll_start(),
otherwise do nothing. The code prior to the blamed patch also only
looked at pcs->poll within an "if (pcs)" block.
Fixes: d4fe1494756c ("net: phylink: disable PCS polling over major configuration") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Gerhard Engleder <gerhard@engleder-embedded.com> Tested-by: Michael Walle <michael@walle.cc> # on kontron-kbox-a-230-ls Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com> # on sam9x60ek Link: https://lore.kernel.org/r/20220629193358.4007923-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 29 Jun 2022 18:30:07 +0000 (21:30 +0300)]
net: dsa: felix: fix race between reading PSFP stats and port stats
Both PSFP stats and the port stats read by ocelot_check_stats_work() are
indirectly read through the same mechanism - write to STAT_CFG:STAT_VIEW,
read from SYS:STAT:CNT[n].
It's just that for port stats, we write STAT_VIEW with the index of the
port, and for PSFP stats, we write STAT_VIEW with the filter index.
So if we allow them to run concurrently, ocelot_check_stats_work() may
change the view from vsc9959_psfp_counters_get(), and vice versa.
Jakub Kicinski [Wed, 29 Jun 2022 18:19:10 +0000 (11:19 -0700)]
net: tun: avoid disabling NAPI twice
Eric reports that syzbot made short work out of my speculative
fix. Indeed when queue gets detached its tfile->tun remains,
so we would try to stop NAPI twice with a detach(), close()
sequence.
Alternative fix would be to move tun_napi_disable() to
tun_detach_all() and let the NAPI run after the queue
has been detached.
Fixes: 6760cd5a7546 ("net: tun: stop NAPI when detaching queues") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220629181911.372047-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 30 Jun 2022 17:03:22 +0000 (10:03 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Three minor bug fixes:
- qedr not setting the QP timeout properly toward userspace
- Memory leak on error path in ib_cm
- Divide by 0 in RDMA interrupt moderation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
linux/dim: Fix divide by 0 in RDMA DIM
RDMA/cm: Fix memory leak in ib_cm_insert_listen
RDMA/qedr: Fix reporting QP timeout attribute
Linus Torvalds [Thu, 30 Jun 2022 16:57:18 +0000 (09:57 -0700)]
Merge tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"A fix for recently added fanotify API to have stricter checks and
refuse some invalid flag combinations to make our life easier in the
future"
* tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: refine the validation checks on non-dir inode mask
Linus Torvalds [Thu, 30 Jun 2022 16:45:42 +0000 (09:45 -0700)]
Merge tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression that breaks the ccp driver"
* tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Fix device IRQ counting by using platform_irq_count()
====================
net, neigh: introduce interval_probe_time for periodic probe
This series adds a new option `interval_probe_time_ms` in net, neigh
for periodic probe, and add a limitation to prevent it set to 0
====================
Yuwei Wang [Wed, 29 Jun 2022 08:48:32 +0000 (08:48 +0000)]
net, neigh: introduce interval_probe_time_ms for periodic probe
commit 9c444ee74677 ("net, neigh: Set lower cap for neigh_managed_work rearming")
fixed a case when DELAY_PROBE_TIME is configured to 0, the processing of the
system work queue hog CPU to 100%, and further more we should introduce
a new option used by periodic probe
Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Machata [Wed, 29 Jun 2022 07:02:05 +0000 (10:02 +0300)]
mlxsw: spectrum_router: Fix rollback in tunnel next hop init
In mlxsw_sp_nexthop6_init(), a next hop is always added to the router
linked list, and mlxsw_sp_nexthop_type_init() is invoked afterwards. When
that function results in an error, the next hop will not have been removed
from the linked list. As the error is propagated upwards and the caller
frees the next hop object, the linked list ends up holding an invalid
object.
A similar issue comes up with mlxsw_sp_nexthop4_init(), where rollback
block does exist, however does not include the linked list removal.
Both IPv6 and IPv4 next hops have a similar issue with next-hop counter
rollbacks. As these were introduced in the same patchset as the next hop
linked list, include the cleanup in this patch.
Fixes: 4e4a8cbe8a97 ("mlxsw: spectrum_router: Keep nexthops in a linked list") Fixes: 57b8fd70b9c8 ("mlxsw: spectrum: Add support for setting counters on nexthops") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220629070205.803952-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Duoming Zhou [Wed, 29 Jun 2022 00:26:40 +0000 (08:26 +0800)]
net: rose: fix UAF bugs caused by timer handler
There are UAF bugs in rose_heartbeat_expiry(), rose_timer_expiry()
and rose_idletimer_expiry(). The root cause is that del_timer()
could not stop the timer handler that is running and the refcount
of sock is not managed properly.
This patch adds refcount of sock when we use functions
such as rose_start_heartbeat() and so on to start timer,
and decreases the refcount of sock when timer is finished
or deleted by functions such as rose_stop_heartbeat()
and so on. As a result, the UAF bugs could be mitigated.
Jose Alonso [Tue, 28 Jun 2022 15:13:02 +0000 (12:13 -0300)]
net: usb: ax88179_178a: Fix packet receiving
This patch corrects packet receiving in ax88179_rx_fixup.
- problem observed:
ifconfig shows allways a lot of 'RX Errors' while packets
are received normally.
This occurs because ax88179_rx_fixup does not recognise properly
the usb urb received.
The packets are normally processed and at the end, the code exits
with 'return 0', generating RX Errors.
(pkt_cnt==-2 and ptk_hdr over field rx_hdr trying to identify
another packet there)
The dump shows that pkt_cnt is the number of entrys in the
per-packet metadata. It is "2 * packet count".
Each packet have two entrys. The first have a valid
value (pkt_len and AX_RXHDR_*) and the second have a
dummy-header 0x80000000 (pkt_len=0 with AX_RXHDR_DROP_ERR).
Why exists dummy-header for each packet?!?
My guess is that this was done probably to align the
entry for each packet to 64-bits and maintain compatibility
with old firmware.
There is also a padding (0x00000000) before the rx_hdr to
align the end of rx_hdr to 64-bit.
Note that packets have a alignment of 64-bits (8-bytes).
This patch assumes that the dummy-header and the last
padding are optional. So it preserves semantics and
recognises the same valid packets as the current code.
This patch was made using only the dumpfile information and
tested with only one device:
0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
Fixes: 49d60052a7fe ("net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup") Fixes: ce13f848757a ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Signed-off-by: Jose Alonso <joalonsof@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/d6970bb04bf67598af4d316eaeb1792040b18cfd.camel@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yang Yingliang [Tue, 28 Jun 2022 13:12:59 +0000 (21:12 +0800)]
net: pcs-rzn1-miic: fix return value check in miic_probe()
On failure, devm_platform_ioremap_resource() returns a ERR_PTR() value
and not NULL. Fix return value checking by using IS_ERR() and return
PTR_ERR() as error value.
Fixes: d2909d765ef7 ("net: pcs: add Renesas MII converter driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Clément Léger <clement.leger@bootlin.com> Link: https://lore.kernel.org/r/20220628131259.3109124-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevhen Orlov [Wed, 29 Jun 2022 01:29:14 +0000 (04:29 +0300)]
net: bonding: fix use-after-free after 802.3ad slave unbind
commit badbe055b92f ("bonding: fix 802.3ad aggregator reselection"),
resolve case, when there is several aggregation groups in the same bond.
bond_3ad_unbind_slave will invalidate (clear) aggregator when
__agg_active_ports return zero. So, ad_clear_agg can be executed even, when
num_of_ports!=0. Than bond_3ad_unbind_slave can be executed again for,
previously cleared aggregator. NOTE: at this time bond_3ad_unbind_slave
will not update slave ports list, because lag_ports==NULL. So, here we
got slave ports, pointing to freed aggregator memory.
Fix with checking actual number of ports in group (as was before
commit badbe055b92f ("bonding: fix 802.3ad aggregator reselection") ),
before ad_clear_agg().
Colin Ian King [Tue, 28 Jun 2022 14:54:06 +0000 (15:54 +0100)]
ipv6: remove redundant store to value after addition
There is no need to store the result of the addition back to variable count
after the addition. The store is redundant, replace += with just +
Cleans up clang scan build warning:
warning: Although the value stored to 'count' is used in the enclosing
expression, the value is never actually read from 'count'
Oleksij Rempel [Tue, 28 Jun 2022 11:43:49 +0000 (13:43 +0200)]
net: phy: ax88772a: fix lost pause advertisement configuration
In case of asix_ax88772a_link_change_notify() workaround, we run soft
reset which will automatically clear MII_ADVERTISE configuration. The
PHYlib framework do not know about changed configuration state of the
PHY, so we need use phy_init_hw() to reinit PHY configuration.
Fixes: 877cfd219ba7 ("net: usb/phy: asix: add support for ax88772A/C PHYs") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220628114349.3929928-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Wunner [Tue, 28 Jun 2022 10:15:08 +0000 (12:15 +0200)]
net: phy: Don't trigger state machine while in suspend
Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(),
but subsequent interrupts may retrigger it:
They may have been left enabled to facilitate wakeup and are not
quiesced until the ->suspend_noirq() phase. Unwanted interrupts may
hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(),
as well as between dpm_resume_noirq() and mdio_bus_phy_resume().
Retriggering the phy_state_machine() through an interrupt is not only
undesirable for the reason given in mdio_bus_phy_suspend() (freezing it
midway with phydev->lock held), but also because the PHY may be
inaccessible after it's suspended: Accesses to USB-attached PHYs are
blocked once usb_suspend_both() clears the can_submit flag and PHYs on
PCI network cards may become inaccessible upon suspend as well.
Amend phy_interrupt() to avoid triggering the state machine if the PHY
is suspended. Signal wakeup instead if the attached net_device or its
parent has been configured as a wakeup source. (Those conditions are
identical to mdio_bus_phy_may_suspend().) Postpone handling of the
interrupt until the PHY has resumed.
Before stopping the phy_state_machine() in mdio_bus_phy_suspend(),
wait for a concurrent phy_interrupt() to run to completion. That is
necessary because phy_interrupt() may have checked the PHY's suspend
status before the system sleep transition commenced and it may thus
retrigger the state machine after it was stopped.
Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(),
wait for a concurrent phy_interrupt() to complete to ensure that
interrupts which it postponed are properly rerun.
The issue was exposed by commit 3e5d3f0549b3 ("usbnet: smsc95xx: Forward
PHY interrupts to PHY driver to avoid polling"), but has existed since
forever.
Vladimir Oltean [Tue, 28 Jun 2022 10:08:31 +0000 (13:08 +0300)]
net: switchdev: add reminder near struct switchdev_notifier_fdb_info
br_switchdev_fdb_notify() creates an on-stack FDB info variable, and
initializes it member by member. As such, newly added fields which are
not initialized by br_switchdev_fdb_notify() will contain junk bytes
from the stack.
Other uses of struct switchdev_notifier_fdb_info have a struct
initializer which should put zeroes in the uninitialized fields.
Add a reminder above the structure for future developers. Recently
discussed during review.
Oliver Neukum [Tue, 28 Jun 2022 09:35:17 +0000 (11:35 +0200)]
usbnet: fix memory allocation in helpers
usbnet provides some helper functions that are also used in
the context of reset() operations. During a reset the other
drivers on a device are unable to operate. As that can be block
drivers, a driver for another interface cannot use paging
in its memory allocations without risking a deadlock.
Use GFP_NOIO in the helpers.
Oleksij Rempel [Tue, 28 Jun 2022 08:51:55 +0000 (10:51 +0200)]
net: dsa: microchip: count pause packets together will all other packets
This switch is calculating tx/rx_bytes for all packets including pause.
So, include rx/tx_pause counter to rx/tx_packets to make tx/rx_bytes fit
to rx/tx_packets.
Link: https://lore.kernel.org/all/20220624220317.ckhx6z7cmzegvoqi@skbuf/ Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Coleman Dietsch [Tue, 28 Jun 2022 17:47:44 +0000 (12:47 -0500)]
selftests net: fix kselftest net fatal error
The incorrect path is causing the following error when trying to run net
kselftests:
In file included from bpf/nat6to4.c:43:
../../../lib/bpf/bpf_helpers.h:11:10: fatal error: 'bpf_helper_defs.h' file not found
^~~~~~~~~~~~~~~~~~~
1 error generated.
1) Restore set counter when one of the CPU loses race to add elements
to sets.
2) After NF_STOLEN, skb might be there no more, update nftables trace
infra to avoid access to skb in this case. From Florian Westphal.
3) nftables bridge might register a prerouting hook with zero priority,
br_netfilter incorrectly skips it. Also from Florian.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: br_netfilter: do not skip all hooks with 0 priority
netfilter: nf_tables: avoid skb access on nf_stolen
netfilter: nft_dynset: restore set element counter when failing to update
====================
Linus Torvalds [Wed, 29 Jun 2022 16:20:40 +0000 (09:20 -0700)]
Merge tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
- seek null check (don't use f_seek op directly and blindly)
- offset validation in FSCTL_SET_ZERO_DATA
- fallocate fix (relates e.g. to xfstests generic/091 and 263)
- two cleanup fixes
- fix socket settings on some arch
* tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: use vfs_llseek instead of dereferencing NULL
ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA
ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA
ksmbd: remove duplicate flag set in smb2_write
ksmbd: smbd: Remove useless license text when SPDX-License-Identifier is already used
ksmbd: use SOCK_NONBLOCK type for kernel_accept()
Michael Walle [Mon, 27 Jun 2022 17:06:43 +0000 (19:06 +0200)]
NFC: nxp-nci: don't print header length mismatch on i2c error
Don't print a misleading header length mismatch error if the i2c call
returns an error. Instead just return the error code without any error
message.
Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Mon, 27 Jun 2022 17:06:42 +0000 (19:06 +0200)]
NFC: nxp-nci: Don't issue a zero length i2c_master_read()
There are packets which doesn't have a payload. In that case, the second
i2c_master_read() will have a zero length. But because the NFC
controller doesn't have any data left, it will NACK the I2C read and
-ENXIO will be returned. In case there is no payload, just skip the
second i2c master read.
Fixes: 276f3e999832 ("NFC: nxp-nci_i2c: Add I2C support to NXP NCI driver") Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 28 Jun 2022 10:37:44 +0000 (12:37 +0200)]
selftests: forwarding: ethtool_extended_state: Convert to busywait
Currently, this script sets up the test scenario, which is supposed to end
in an inability of the system to negotiate a link. It then waits for a bit,
and verifies that the system can diagnose why the link was not established.
The wait time for the scenario where different link speeds are forced on
the two ends of a loopback cable, was set to 4 seconds, which exactly
covered it. As of a recent mlxsw firmware update, this time gets longer,
and this test starts failing.
The time that selftests currently wait for links to be established is
currently $WAIT_TIMEOUT, or 20 seconds. It seems reasonable that if this is
the time necessary to establish and bring up a link, it should also be
enough to determine that a link cannot be established and why.
Therefore in this patch, convert the sleeps to busywaits, so that if a
failure is established sooner (as is expected), the test runs quicker. And
use $WAIT_TIMEOUT as the time to wait.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>