Jianbo Liu [Fri, 18 Jun 2021 06:47:15 +0000 (06:47 +0000)]
net/mlx5e: Add post meter table for flow metering
Flow meter object monitors the packets rate for the flows it is
attached to, and color packets with GREEN or RED. The post meter table
is used to check the color. Packet is dropped if it's RED, or
forwarded to post_act table if GREEN.
Packet color will be set to 8 LSB of the register C5, so they are
reserved for metering, which are previously used for matching fte id.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Mon, 7 Jun 2021 01:40:16 +0000 (01:40 +0000)]
net/mlx5e: Get or put meter by the index of tc police action
Add functions to create and destroy flow meter aso object.
This object only supports the range allocation. 64 objects are
allocated at a time, and there are two meters in each object.
Usually only one meter is allocated for a flow, so bitmap is used
to manage these 128 meters.
TC police action is mapped to hardware meter. As the index is unique
for each police action, add APIs to allocate or free hardware meter by
the index. If the meter is already created, increment its refcnt,
otherwise create new one. If police action has different parameters,
update hardware meter accordingly.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Mon, 7 Jun 2021 03:56:05 +0000 (03:56 +0000)]
net/mlx5e: Add support to modify hardware flow meter parameters
The policing rate and burst from user are converted to flow meter
parameters in hardware. These parameters are set or modified by
ACCESS_ASO WQE, add function to support it.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Fri, 29 Apr 2022 07:46:47 +0000 (07:46 +0000)]
net/mlx5: Implement interfaces to control ASO SQ and CQ
Add interfaces to use ASO object control channel. The channel consists
of a control SQ and CQ to which user can post ACCESS_ASO work requests
to modify ASO objects. The functions to get wqe from SQ, fill wqe,
post the request, and poll the completion of the work, are provided.
Jianbo Liu [Sat, 30 Apr 2022 14:31:28 +0000 (14:31 +0000)]
net/mlx5: Add support to create SQ and CQ for ASO
Add a separate API to create SQ and CQ for advanced steering
operations (ASO).
Since the mlx5_en API to create these resources is strongly coupled
with netdev channels and datapath elements, this API provides an
alternative for creating send queues that are used for ASO.
Currently the API allows creating channels with 2 wqbbs only - meaning
the support will be for a single ACCESS_ASO wqe with data at a time.
Chris Mi [Mon, 30 May 2022 03:07:57 +0000 (06:07 +0300)]
net/mlx5: E-switch, Remove dependency between sriov and eswitch mode
Currently, there are three eswitch modes, none, legacy and switchdev.
None is the default mode. Remove redundant none mode as eswitch mode
should always be either legacy mode or switchdev mode.
With this patch, there are two behavior changes:
1. Legacy becomes the default mode. When querying eswitch mode using
devlink, a valid mode is always returned.
2. When disabling sriov, the eswitch mode will not change, only vfs
are unloaded.
Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Chris Mi [Thu, 5 May 2022 06:23:39 +0000 (09:23 +0300)]
net/mlx5: E-switch, Introduce flag to indicate if fdb table is created
Introduce flag to indicate if fdb table is created as a pre-step
to prepare for removing dependency between sriov and eswitch mode
in the downstream patches.
Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Chris Mi [Thu, 10 Feb 2022 07:22:04 +0000 (09:22 +0200)]
net/mlx5: E-switch, Introduce flag to indicate if vport acl namespace is created
Eswitch vport acl namespace is needed when loading vfs. There is
no need to free and reallocate it when switching eswitch mode.
Introduce flag to indicate if it is created or not. When needed,
create it. Only free it when the driver is unloaded or in bare
metal mode.
Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Before commit 17ed6fa56a03 ("net/mlx5: Lag, use lag lock") there
used to be a matching mlx5_esw_lock() function and the lock and
unlock functions were symmetric. But now we take the lock
unconditionally and must unlock unconditionally as well.
As near as I can tell this is dead code and can just be deleted.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
David S. Miller [Sat, 2 Jul 2022 15:34:05 +0000 (16:34 +0100)]
Merge branch 'lan937x-dsa-driver'
Arun Ramadoss says:
====================
net: dsa: microchip: DSA Driver support for LAN937x
LAN937x is a Multi-Port 100BASE-T1 Ethernet Physical Layer switch
compliant with the IEEE 802.3bw-2015 specification. The device provides
100 Mbit/s transmit and receive capability over a single Unshielded
Twisted Pair (UTP) cable. LAN937x is successive revision of KSZ series
switch.
This series of patches provide the DSA driver support for
Microchip LAN937X switch through MII/RMII interface. The RGMII interface
support will be added in the follow up series. LAN937x uses the most of
functionality of KSZ9477.
The LAN937x switch series family consists of following SKUs:
LAN9370:
- 4 T1 Phys
- 1 RGMII port
LAN9371:
- 3 T1 Phys & 1 TX Phy
- 2 RGMII ports
LAN9372:
- 5 T1 Phys & 1 TX Phy
- 2 RGMII ports
LAN9373:
- 5 T1 Phys
- 2 RGMII
- 1 SGMII port
LAN9374:
- 6 T1 Phys
- 2 RGMII ports
Changes in v15:
- fixed compilation issue.
- Updated the phylink_mac_link_up to check only for 10/100/1000 speed.
Changes in v14:
- Updated the patch series to latest ksz code refactoring.
- RGMII register configuration is removed from the series. It will be added in
the follow up patch series.
Changes in v13:
- Fixed the compilation issue in patch 5 and 6
Changes in v12:
- Removed the reduntant spi indirect enable in lan937x_init
- Used the ksz_port_stp_state_set function
- Apply rgmii internal delay only if it is rgmii port
- Set the bit for 100baseTx in phylink_get_caps
- Moved the ethtool related API from patch 5 to 7
- Moved lan_alu_entry struct in lan937x_dev.h from patch 5 to 9
- Moved lan_vlan_entry in lan937x_dev.h from patch 5 to 10
- Used the ksz_get_stats64 function for get_stats64 hook
- Splitted the patch 5. one for port configuration, spi driver, phy read &
write and mtu configuration.
- Updated the indentation in ethernet-controller.yaml
- lan937x.yaml: Removed the blank lines, updated the ethernet handle to macb0.
Added the rgmii internal delay only for the ports.
Changes in v11:
- Tagged as RFC to get the feedback for the subpatches 1/10, 5/10 and 6/10
Changes in v10:
- dsa.yaml: dropped moving mdio properties to dsa.yaml as per the feedback
https://patchwork.kernel.org/project/netdevbpf/patch/20220318085540.281721-3-prasanna.vengateshan@microchip.com/#24787466
- microchip,lan937x.yaml: Naming convention changes in the example
- lan937x_main.c: Moving configurations from lan937x_reset_switch() to setup()
- lan937x_main.c: helper function has been introduced for
lan937x_internal_phy_read & write
- lan937x_dev.h: lan_alu_struct struct data type changes
- lan937x_main.c: lan937x_get_stats64 make non blocking
- lan937x_main.c: modified lan937x_port_mirror_add to include extack
Changes in v9:
- lan937x_main.c: of_node_put() correction in lan937x_parse_dt_rgmii_delay
- lan937x_dev.c: removed the interface checks from lan937x_apply_rgmii_delay.
- changes in ethernet-controller.yaml and dsa.yaml
Changes in v8:
- lan937x_dev.c: fixed lan937x_r_mib_pkt warning in the sub patches
- lan937x_main.c: phylink_autoneg_inband() check removed in
lan937x_phylink_mac_link_up()
- lan937x_main.c: made legacy_pre_march2020 = false as this is non-legacy driver
and indentation correction in lan937x_phylink_mac_link_up()
- removed unnecessary parenthesis in lan937x_get_strings()
Changes in v7:
- microchip,lan937x.yaml: *-internal-delay-ps enum values & commit messages
corrections
- lan937x_main.c: removed phylink_validate() and added phylink_get_caps()
- lan937x_main.c: added support for ethtool standard stats (get_eth_*_stats
and get_stats64)
- lan937x_main.c: removed unnecessary PVID read from lan937x_port_vlan_del()
- integrated the changes of ksz9477 multi bridging support to lan937x dev and
tested both multi bridging and STP
- lan937x_port_vlan_del - dummy pvid read removed
Changes in v6:
- microchip_t1.c: There was new merge done in the net-next tree for
microchip_1.c after the v5 submission. Hence rebased it for v6.
Changes in v5:
- microchip,lan937x.yaml: Added mdio properties detail
- microchip,lan937x.yaml: *-internal-delay-ps added under port node
- lan937x_dev.c: changed devm_mdiobus_alloc from of_mdiobus_register as suggested
by Vladimir
- lan937x_dev.c: added dev_info for rgmii internal delay & error message to user
in case of out of range values
- lan937x_dev.c: return -EOPNOTSUPP for C45 regnum values for
lan937x_sw_mdio_read & write operations
- return from function with out storing in a variable
- lan937x_main.c: Added vlan_enable info in vlan_filtering API
- lan937x_main.c: lan937x_port_vlan_del: removed unintended PVID write
Changes in v4:
- tag_ksz.c: cpu_to_be16 to put_unaligned_be16
- correct spacing in comments
- tag_ksz.c: NETIF_F_HW_CSUM fix is integrated
- lan937x_dev.c: mdio_np is removed from global and handled locally
- lan937x_dev.c: unused functions removed lan937x_cfg32 & lan937x_port_cfg32
- lan937x_dev.c: lan937x_is_internal_100BTX_phy_port function name changes
- lan937x_dev.c: RGMII internal delay handling for MAC. Delay values are
retrieved from DTS and updated
- lan937x_dev.c: corrected mutex operations for few dev variables
- microchip,lan937x.yaml: introduced rx-internal-delay-ps &
tx-internal-delay-ps for RGMII internal delay
- lan937x_dev.c: Unnecessary mutex_lock has been removed
- lan937x_main.c: PHY_INTERFACE_MODE_NA handling for lan937x_phylink_validate
- lan937x_main.c: PORT_MIRROR_SNIFFER check in right place
- lan937x_main.c: memset is used instead of writing 0's individually in
lan937x_port_fdb_add function
- lan937x_main.c: Removed \n from NL_SET_ERR_MSG_MOD calls
Changes in v3:
- Removed settings of cnt_ptr to zero and the memset()
added a cleanup patch which moves this into ksz_init_mib_timer().
- Used ret everywhere instead of rc
- microchip,lan937x.yaml: Remove mdio compatible
- microchip_t1.c: Renaming standard phy registers
- tag_ksz.c: LAN937X_TAIL_TAG_OVERRIDE renaming
LAN937X_TAIL_TAG_BLOCKING_OVERRIDE
- tag_ksz.c: Changed Ingress and Egress naming convention based on
Host
- tag_ksz.c: converted to skb_mac_header(skb) from
(is_link_local_ether_addr(hdr->h_dest))
- lan937x_dev.c: Removed BCAST Storm protection settings since we
have Tc commands for them
- lan937x_dev.c: Flow control setting in lan937x_port_setup function
- lan937x_dev.c: RGMII internal delay added only for cpu port,
- lan937x_dev.c: of_get_compatible_child(node,
"microchip,lan937x-mdio") to of_get_child_by_name(node, "mdio");
- lan937x_dev.c:lan937x_get_interface API: returned
PHY_INTERFACE_MODE_INTERNAL instead of PHY_INTERFACE_MODE_NA
- lan937x_main.c: Removed compat interface implementation in
lan937x_config_cpu_port() API & dev_info corrected as well
- lan937x_main.c: deleted ds->configure_vlan_while_not_filtering
= true
- lan937x_main.c: Added explanation for lan937x_setup lines
- lan937x_main.c: FR_MAX_SIZE correction in lan937x_get_max_mtu API
- lan937x_main.c: removed lan937x_port_bridge_flags dummy functions
- lan937x_spi.c - mdiobus_unregister to be added to spi_remove
function
- lan937x_main.c: phy link layer changes
- lan937x_main.c: port mirroring: sniff port selection limiting to
one port
- lan937x_main.c: Changed to global vlan filtering
- lan937x_main.c: vlan_table array to structure
- lan937x_main.c -Use extack instead of reporting errors to Console
- lan937x_main.c - Remove cpu_port addition in vlan_add api
- lan937x_main.c - removed pvid resetting
Changes in v2:
- return check for register read/writes
- dt compatible compatible check is added against chip id value
- lan937x_internal_t1_tx_phy_write() is renamed to
lan937x_internal_phy_write()
- lan937x_is_internal_tx_phy_port is renamed to
lan937x_is_internal_100BTX_phy_port as it is 100Base-Tx phy
- Return value for lan937x_internal_phy_write() is -EOPNOTSUPP
in case of failures
- Return value for lan937x_internal_phy_read() is 0xffff
for non existent phy
- cpu_port checking is removed from lan937x_port_stp_state_set()
- lan937x_phy_link_validate: 100baseT_Full to 100baseT1_Full
- T1 Phy driver is moved to drivers/net/phy/microchip_t1.c
- Tx phy driver support will be added later
- Legacy switch checkings in dts file are removed.
- tag_ksz.c: Re-used ksz9477_rcv for lan937x_rcv
- tag_ksz.c: Xmit() & rcv() Comments are corrected w.r.to host
- net/dsa/Kconfig: Family skew numbers altered in ascending order
- microchip,lan937x.yaml: eth is replaced with ethernet
- microchip,lan937x.yaml: spi1 is replaced with spi
- microchip,lan937x.yaml: cpu labelling is removed
- microchip,lan937x.yaml: port@x value will match the reg value now
====================
net: dsa: microchip: lan937x: add phylink_get_caps support
The internal phy of the LAN937x are capable of 100Mbps Full duplex. The
xMII port of switch is capable of 10Mbps Full & Half Duplex, 100Mbps
Full & Half Duplex and 1000Mbps Half duplex. xMII port also supports Tx
and Rx Flow control.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: add DSA support for microchip LAN937x
Basic DSA driver support for lan937x and the device will be
configured through SPI interface.
It adds the lan937x_dev_ops in ksz_common.c file and tries to reuse the
functionality of ksz9477 series switch.
drivers/net/dsa/microchip/ path is already part of MAINTAINERS &
the new files come under this path. Hence no update needed to the
MAINTAINERS
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: generic access to ksz9477 static and reserved table
The ksz9477 and lan937x has few difference in the static and reserved
table register 0x041C. For the ksz9477 if the bit 0 is 1 - read
operation and 0 - write operation. But for lan937x bit 1:0 used for
selecting the read/write operation, 01 - write and 10 - read.
To use ksz9477 mdb add/del and enable_stp_addr for the lan937x, masks &
shifts are introduced for ksz9477 & lan937x in ksz_common.c. Then
updated the function with masks & shifts based on the switch instead of
hard coding it.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: tag_ksz: add tag handling for Microchip LAN937x
The Microchip LAN937X switches have a tagging protocol which is
very similar to KSZ tagging. So that the implementation is added to
tag_ksz.c and reused common APIs
Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: dsa: dt bindings for microchip lan937x
Documentation in .yaml format and updates to the MAINTAINERS
Also 'make dt_binding_check' is passed.
RGMII internal delay values for the mac is retrieved from
rx-internal-delay-ps & tx-internal-delay-ps as per the feedback from
v3 patch series.
https://lore.kernel.org/netdev/20210802121550.gqgbipqdvp5x76ii@skbuf/
It supports only the delay value of 0ns and 2ns.
Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: Updated micrel,led-mode for LAN8814 PHY
Enable led-mode configuration for LAN8814 phy
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Jun 2022 15:07:50 +0000 (15:07 +0000)]
net: add skb_[inner_]tcp_all_headers helpers
Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)"
to compute headers length for a TCP packet, but others
use more convoluted (but equivalent) ways.
Add skb_tcp_all_headers() and skb_inner_tcp_all_headers()
helpers to harmonize this a bit.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: pcs: rzn1-miic: update speed only if interface is changed
As stated by Russel King, miic_config() can be called as a result of
ethtool setting the configuration while the link is already up. Since
the speed is also set in this function, it could potentially modify
the current speed that is set. This will only happen if there is
no PHY present and we aren't using fixed-link mode.
Handle that by storing the current interface mode in the miic_port
structure and update the speed only if the interface mode is going to
be changed.
Bin Chen [Thu, 30 Jun 2022 11:21:55 +0000 (13:21 +0200)]
nfp: support VF rate limit with NFDK
Support VF rate limiting with NFDK by adding ndo_set_vf_rate to the NFDK
ops structure.
NFDK is used to communicate via PCIE to NFP-3800 based NICs
while NFD3 is used for other NICs supported by the NFP driver.
The VF rate limit feature is already supported by the driver for NFD3.
Signed-off-by: Bin Chen <bin.chen@corigine.com> Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alaa Mohamed [Thu, 30 Jun 2022 10:24:49 +0000 (12:24 +0200)]
selftests: net: fib_rule_tests: fix support for running individual tests
parsing and usage of -t got missed in the previous patch.
this patch fixes it
Fixes: cde029b185d4 ("selftests: net: fib_rule_tests: add support to select a test to run") Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Jul 2022 12:25:00 +0000 (13:25 +0100)]
Merge branch 'mptcp-mem-scheduling'
Mat Martineau says:
====================
mptcp: Updates for mem scheduling and SK_RECLAIM
In the "net: reduce tcp_memory_allocated inflation" series (merge commit 6b686a8df09b), Eric Dumazet noted that "Removal of SK_RECLAIM_CHUNK and
SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up."
Patches 1-3 align MPTCP with the above TCP changes to forward memory
allocation, reclaim, and memory scheduling.
Patch 4 removes the SK_RECLAIM_* macros as Eric requested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:57 +0000 (15:17 -0700)]
net: remove SK_RECLAIM_THRESHOLD and SK_RECLAIM_CHUNK
There are no more users for the mentioned macros, just
drop them.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:56 +0000 (15:17 -0700)]
mptcp: refine memory scheduling
Similar to commit 25f6b78d3bb7 ("net: fix sk_wmem_schedule() and
sk_rmem_schedule() errors"), let the MPTCP receive path schedule
exactly the required amount of memory.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:55 +0000 (15:17 -0700)]
mptcp: drop SK_RECLAIM_* macros
After commit d80e4ca0f6c1 ("net: keep sk->sk_forward_alloc as small as
possible"), the MPTCP protocol is the last SK_RECLAIM_CHUNK and
SK_RECLAIM_THRESHOLD users.
Update the MPTCP reclaim schema to match the core/TCP one and drop the
mentioned macros. This additionally clean the MPTCP code a bit.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:54 +0000 (15:17 -0700)]
mptcp: never fetch fwd memory from the subflow
The memory accounting is broken in such exceptional code
path, and after commit d80e4ca0f6c1 ("net: keep sk->sk_forward_alloc
as small as possible") we can't find much help there.
Drop the broken code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Jul 2022 10:21:56 +0000 (11:21 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-06-30
This series contains updates to ice driver only.
Martyna adds support for VLAN related TC switchdev filters and reworks
dummy packet implementation of VLANs to enable dynamic header insertion to
allow for more rule types.
Lu Wei utilizes eth_broadcast_addr() helper over an open coded version.
Ziyang Xuan removes unneeded NULL checks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jilin Yuan [Thu, 30 Jun 2022 07:57:51 +0000 (15:57 +0800)]
ethernet/neterion: fix repeated words in comments
Delete the redundant word 'the'.
Delete the redundant word 'a'.
Delete the redundant word 'frame'.
Delete the redundant word 'is'.
Delete the redundant word 'not'.
Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Prevent permanently closed tc-taprio gates from blocking a Felix DSA switch port
Richie Pearn reports that if we install a tc-taprio schedule on a Felix
switch port, and that schedule has at least one gate that never opens
(for example TC0 below):
then packets classified to the permanently closed traffic class will not
be dequeued by the egress port. They will just remain in the queue
system, to consume resources. Frame aging does not trigger either,
because in order for that to happen, the packets need to be eligible for
egress scheduling in the first place, which they aren't. If that port is
allowed to consume the entire shared buffer of the switch (as we
configure things by default using devlink-sb), then eventually, by
sending enough packets, the entire switch will hang.
If we think enough about the problem, we realize that this is only a
special case of a more general issue, and can also be reproduced with
gates that aren't permanently closed, but are not large enough to send
an entire frame. In that sense, a permanently closed gate is simply a
case where all frames are oversized.
The ENETC has logic to reject transmitted packets that would overrun the
time window - see commit 2971eac43d36 ("net: enetc: count the tc-taprio
window drops").
The Felix switch has no such thing on a per-packet basis, but it has a
register replicated per {egress port, TC} which essentially limits the
max MTU. A packet which exceeds the per-port-TC MTU is immediately
discarded and therefore will not hang the port anymore (albeit, sadly,
this only bumps a generic drop hardware counter and we cannot really
infer the reason such as to offer a dedicated counter for these events).
This patch set calculates the max MTU per {port, TC} when the tc-taprio
config, or link speed, or port-global MTU values change. This solves the
larger "gate too small for packet" problem, but also the original issue
with the gate permanently closed that was reported by Richie.
====================
Vladimir Oltean [Tue, 28 Jun 2022 14:52:38 +0000 (17:52 +0300)]
time64.h: consolidate uses of PSEC_PER_NSEC
Time-sensitive networking code needs to work with PTP times expressed in
nanoseconds, and with packet transmission times expressed in
picoseconds, since those would be fractional at higher than gigabit
speed when expressed in nanoseconds.
Convert the existing uses in tc-taprio and the ocelot/felix DSA driver
to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h
as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h
because the vDSO library does not (yet) need/use it.
Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:37 +0000 (17:52 +0300)]
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
Currently, sending a packet into a time gate too small for it (or always
closed) causes the queue system to hold the frame forever. Even worse,
this frame isn't subject to aging either, because for that to happen, it
needs to be scheduled for transmission in the first place. But the frame
will consume buffer memory and frame references while it is forever held
in the queue system.
Before commit 2935aaaf0324 ("net: mscc: ocelot: initialize watermarks to
sane defaults"), this behavior was somewhat subtle, as the switch had a
more intricately tuned default watermark configuration out of reset,
which did not allow any single port and tc to consume the entire switch
buffer space. Nonetheless, the held frames are still there, and they
reduce the total backplane capacity of the switch.
However, after the aforementioned commit, the behavior can be very
clearly seen, since we deliberately allow each {port, tc} to consume the
entire shared buffer of the switch minus the reservations (and we
disable all reservations by default). That is to say, we allow a
permanently closed tc-taprio gate to hang the entire switch.
A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU
per-port-tc registers serve 2 purposes: one is for guard band calculation
(when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to
enable oversized frame dropping (when non-zero).
Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame
dropping is disabled. The goal of the change is to enable it seamlessly.
For that, we need to hook into the MTU change, tc-taprio change, and
port link speed change procedures, since we depend on these variables.
Frames are not dropped on egress due to a queue system oversize
condition, instead that egress port is simply excluded from the mask of
valid destination ports for the packet. If there are no destination
ports at all, the ingress counter that increments is the generic
"drop_tail" in ethtool -S.
The issue exists in various forms since the tc-taprio offload was introduced.
Fixes: 66fc7ae56755 ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Reported-by: Richie Pearn <richard.pearn@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:36 +0000 (17:52 +0300)]
net: dsa: felix: keep QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF) out of rmw
In vsc9959_tas_clock_adjust(), the INIT_GATE_STATE field is not changed,
only the ENABLE field. Similarly for the disabling of the time-aware
shaper in vsc9959_qos_port_tas_set().
To reflect this, keep the QSYS_TAG_CONFIG_INIT_GATE_STATE_M mask out of
the read-modify-write procedure to make it clearer what is the intention
of the code.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:35 +0000 (17:52 +0300)]
net: dsa: felix: keep reference on entire tc-taprio config
In a future change we will need to remember the entire tc-taprio config
on all ports rather than just the base time, so use the
taprio_offload_get() helper function to replace ocelot_port->base_time
with ocelot_port->taprio.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 1 Jul 2022 02:57:38 +0000 (19:57 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2022-06-30
This series contains updates to misc Intel drivers.
Jesse removes unused macros from e100, e1000, e1000e, i40e, iavf, igc,
ixgb, ixgbe, and ixgbevf drivers.
Jiang Jian removes repeated words from ixgbe, fm10k, igb, and ixgbe
drivers.
Jilin Yuan removes repeated words from e1000, e1000e, fm10k, i40e, iavf,
igb, igbvf, igc, ixgbevf, and ice drivers.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
intel/ice:fix repeated words in comments
intel/ixgbevf:fix repeated words in comments
intel/igc:fix repeated words in comments
intel/igbvf:fix repeated words in comments
intel/igb:fix repeated words in comments
intel/iavf:fix repeated words in comments
intel/i40e:fix repeated words in comments
intel/fm10k:fix repeated words in comments
intel/e1000e:fix repeated words in comments
intel/e1000:fix repeated words in comments
ixgbe: drop unexpected word 'for' in comments
igb: remove unexpected word "the"
fm10k: remove unexpected word "the"
ixgbe: remove unexpected word "the"
intel: remove unused macros
====================
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c 313afdc58aad ("net: sparx5: mdb add/del handle non-sparx5 devices") 949ffda011d7 ("net: sparx5: Allow mdb entries to both CPU and ports")
Amir Goldstein [Thu, 30 Jun 2022 19:58:49 +0000 (22:58 +0300)]
vfs: fix copy_file_range() regression in cross-fs copies
A regression has been reported by Nicolas Boichat, found while using the
copy_file_range syscall to copy a tracefs file.
Before commit e22c8b4f1a9b ("vfs: allow copy_file_range to copy across
devices") the kernel would return -EXDEV to userspace when trying to
copy a file across different filesystems. After this commit, the
syscall doesn't fail anymore and instead returns zero (zero bytes
copied), as this file's content is generated on-the-fly and thus reports
a size of zero.
Another regression has been reported by He Zhe - the assertion of
WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
copying from a sysfs file whose read operation may return -EOPNOTSUPP.
Since we do not have test coverage for copy_file_range() between any two
types of filesystems, the best way to avoid these sort of issues in the
future is for the kernel to be more picky about filesystems that are
allowed to do copy_file_range().
This patch restores some cross-filesystem copy restrictions that existed
prior to commit e22c8b4f1a9b ("vfs: allow copy_file_range to copy across
devices"), namely, cross-sb copy is not allowed for filesystems that do
not implement ->copy_file_range().
Filesystems that do implement ->copy_file_range() have full control of
the result - if this method returns an error, the error is returned to
the user. Before this change this was only true for fs that did not
implement the ->remap_file_range() operation (i.e. nfsv3).
Filesystems that do not implement ->copy_file_range() still fall-back to
the generic_copy_file_range() implementation when the copy is within the
same sb. This helps the kernel can maintain a more consistent story
about which filesystems support copy_file_range().
nfsd and ksmbd servers are modified to fall-back to the
generic_copy_file_range() implementation in case vfs_copy_file_range()
fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of
server-side-copy.
fall-back to generic_copy_file_range() is not implemented for the smb
operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct
change of behavior.
Ziyang Xuan [Thu, 16 Jun 2022 04:52:59 +0000 (12:52 +0800)]
ice: Remove unnecessary NULL check before dev_put
Since commit 1d6b50d416e7 ("netdevice: add the case if dev is NULL"),
dev_put(NULL) is safe, check NULL before dev_put() is not needed.
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Lu Wei [Wed, 18 May 2022 21:27:59 +0000 (14:27 -0700)]
ice: use eth_broadcast_addr() to set broadcast address
Use eth_broadcast_addr() to set broadcast address instead of memset().
Signed-off-by: Lu Wei <luwei32@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ice: switch: dynamically add VLAN headers to dummy packets
Enable the support of creating all kinds of declared dummy packets
with the VLAN tags by inserting VLAN headers (single VLAN and QinQ
cases) if needed.
Decrease the number of declared dummy packets and increase in the
possible packet's combinations for adding switch rules.
This change enables support of creating filters that match both on
VLAN + tunnels properties in switchdev.
Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enable support for adding TC rules with both C-tag and S-tag that can
filter on the inner and outer VLAN in QinQ for basic packets (not
tunneled cases).
Signed-off-by: Wiktor Pilarczyk <wiktor.pilarczyk@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Vladimir Oltean [Wed, 29 Jun 2022 19:33:58 +0000 (22:33 +0300)]
net: phylink: fix NULL pl->pcs dereference during phylink_pcs_poll_start
The current link mode of the phylink instance may not require an
attached PCS. However, phylink_major_config() unconditionally
dereferences this potentially NULL pointer when restarting the link poll
timer, which will panic the kernel.
Fix the problem by checking whether a PCS exists in phylink_pcs_poll_start(),
otherwise do nothing. The code prior to the blamed patch also only
looked at pcs->poll within an "if (pcs)" block.
Fixes: 2fa95c82b4f1 ("net: phylink: disable PCS polling over major configuration") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Gerhard Engleder <gerhard@engleder-embedded.com> Tested-by: Michael Walle <michael@walle.cc> # on kontron-kbox-a-230-ls Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com> # on sam9x60ek Link: https://lore.kernel.org/r/20220629193358.4007923-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 29 Jun 2022 18:30:07 +0000 (21:30 +0300)]
net: dsa: felix: fix race between reading PSFP stats and port stats
Both PSFP stats and the port stats read by ocelot_check_stats_work() are
indirectly read through the same mechanism - write to STAT_CFG:STAT_VIEW,
read from SYS:STAT:CNT[n].
It's just that for port stats, we write STAT_VIEW with the index of the
port, and for PSFP stats, we write STAT_VIEW with the filter index.
So if we allow them to run concurrently, ocelot_check_stats_work() may
change the view from vsc9959_psfp_counters_get(), and vice versa.
Jakub Kicinski [Wed, 29 Jun 2022 18:19:10 +0000 (11:19 -0700)]
net: tun: avoid disabling NAPI twice
Eric reports that syzbot made short work out of my speculative
fix. Indeed when queue gets detached its tfile->tun remains,
so we would try to stop NAPI twice with a detach(), close()
sequence.
Alternative fix would be to move tun_napi_disable() to
tun_detach_all() and let the NAPI run after the queue
has been detached.
Fixes: 683dbac65349 ("net: tun: stop NAPI when detaching queues") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220629181911.372047-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 30 Jun 2022 17:03:22 +0000 (10:03 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Three minor bug fixes:
- qedr not setting the QP timeout properly toward userspace
- Memory leak on error path in ib_cm
- Divide by 0 in RDMA interrupt moderation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
linux/dim: Fix divide by 0 in RDMA DIM
RDMA/cm: Fix memory leak in ib_cm_insert_listen
RDMA/qedr: Fix reporting QP timeout attribute
Linus Torvalds [Thu, 30 Jun 2022 16:57:18 +0000 (09:57 -0700)]
Merge tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"A fix for recently added fanotify API to have stricter checks and
refuse some invalid flag combinations to make our life easier in the
future"
* tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: refine the validation checks on non-dir inode mask
Linus Torvalds [Thu, 30 Jun 2022 16:45:42 +0000 (09:45 -0700)]
Merge tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression that breaks the ccp driver"
* tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Fix device IRQ counting by using platform_irq_count()