Commit 2bdbf5a23dc6 ("smsc95xx: add phylib support") amended
smsc95xx_resume() to call phy_init_hw(). That function waits for the
device to runtime resume even though it is placed in the runtime resume
path, causing a deadlock.
The problem is that phy_init_hw() calls down to smsc95xx_mdiobus_read(),
which never uses the _nopm variant of usbnet_read_cmd().
Commit 0fe16173902d ("usbnet: smsc95xx: add reset_resume function with
reset operation") causes a similar deadlock on resume if the device was
already runtime suspended when entering system sleep.
That's because the commit introduced smsc95xx_reset_resume(), which
calls down to smsc95xx_reset(), which neglects to use _nopm accessors.
Fix by auto-detecting whether a device access is performed by the
suspend/resume task_struct and using the _nopm variant if so. This works
because the PM core guarantees that suspend/resume callbacks are run in
task context.
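The detection described above can be sketched outside the kernel as
follows. This is only a model in Python, not the driver code: the Device
class, its pm_task and log fields, and read_reg() are hypothetical names,
and a Python thread stands in for the task_struct that runs the
suspend/resume callback.

```python
import threading


class Device:
    """Toy stand-in for the driver's private data (hypothetical names)."""

    def __init__(self):
        self.pm_task = None  # thread currently running a suspend/resume callback
        self.log = []        # records which accessor each read used

    def read_reg(self, reg):
        # If the caller is the thread running the PM callback, waiting for a
        # runtime resume would deadlock, so pick the accessor that skips it.
        if self.pm_task is threading.current_thread():
            self.log.append(("nopm", reg))
        else:
            self.log.append(("pm", reg))

    def resume(self):
        # Remember who runs the callback for the duration of the callback.
        self.pm_task = threading.current_thread()
        try:
            self.read_reg(0x14)  # an access made from within the resume path
        finally:
            self.pm_task = None
```

Accesses made from any other thread while resume() is running still take
the normal path, because the comparison is against the calling thread.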
Fixes: 0fe16173902d ("usbnet: smsc95xx: add reset_resume function with reset operation")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.16+
Cc: Andre Edich <andre.edich@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kurt Kanzenbach [Fri, 1 Jul 2022 17:56:06 +0000 (19:56 +0200)]
net: phy: broadcom: Add support for BCM53128 internal PHYs
Add support for BCM53128 internal PHYs. These support interrupts as well as
statistics. Therefore, enable the Broadcom PHY driver for them.
Tested on BCM53128 switch using the mainline b53 DSA driver.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Describe the switch interrupts (dlr, switch, prp, hub, pattern) which
are connected to the GIC.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mlxsw: Unified bridge conversion - part 6/6
This is the sixth and final part of the conversion of mlxsw to the
unified bridge model. It transitions the last bits of functionality that
were under firmware's responsibility in the legacy model to the driver.
The last patches flip the driver to the unified bridge model and clean
up code that was used to make the conversion easier to review.
Patchset overview:
Patch #1 sets the egress VID for known unicast packets. For multicast
packets, the egress VID is configured using the MPE table. See commit
5f2d4d7933cb ("mlxsw: spectrum_fid: Configure egress VID classification
for multicast").
Patch #2 configures the VNI to FID classification that is used during
decapsulation.
Patch #3 configures ingress router interface (RIF) in FID classification
records, so that when a packet reaches the router block, its ingress RIF
is known. Care is taken to configure this in all the different flows
(e.g., RIF set on a FID, {Port, VID} joins a FID that already has a RIF
etc.).
Patch #4 configures the egress VID for routed packets. For such packets,
the egress VID is not set by the MPE table or by an FDB record at the
egress bridge, but instead by a dedicated table that maps {Egress RIF,
Egress port} to a VID.
Patch #5 removes VID configuration from RIF creation as in the unified
bridge model firmware no longer needs it.
Patch #6 sets the egress FID to use in RIF configuration so that the
device knows which FID to use to bridge the packet after routing.
Patches #7-#9 add a new 802.1Q family and associated VLAN RIFs. In the
unified bridge model, we no longer need to emulate 802.1Q FIDs using
802.1D FIDs as VNI can be associated with both.
Patches #10-#11 finally flip the driver to the unified bridge model.
Patches #12-#13 clean up code that was used to make the conversion
easier to review.
Amit Cohen [Mon, 4 Jul 2022 06:11:39 +0000 (09:11 +0300)]
mlxsw: spectrum_fid: Remove '_ub_' indication from structures and defines
Some structures and defines were added with '_ub_' indication, as there
were equivalent objects for the legacy model.
Now that the legacy model is no longer used, remove the '_ub_'
indication.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:38 +0000 (09:11 +0300)]
mlxsw: spectrum_fid: Remove flood_index() from FID operation structure
The flood_index() function is not needed anymore, as in the unified
bridge model the flood index is calculated using 'mid_base' and
'fid_offset'.
Remove this function.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:37 +0000 (09:11 +0300)]
mlxsw: Enable unified bridge model
After all the preparations for unified bridge model, finally flip mlxsw
driver to use the new model.
Change config profile, set 'ubridge' to true and remove the configurations
that are relevant only for the legacy model. Set 'flood_mode' to
'controlled' as the current mode is not supported with unified bridge
model.
Remove all the code which is dedicated to the legacy model. Remove
'struct mlxsw_sp.ubridge' variable which was temporarily added to separate
configurations between the models.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:36 +0000 (09:11 +0300)]
mlxsw: Add ubridge to config profile
The unified bridge model is enabled via the CONFIG_PROFILE command
during driver initialization. Add the definition of the relevant fields
to the command's payload in preparation for unified bridge enablement.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:35 +0000 (09:11 +0300)]
mlxsw: Add support for 802.1Q FID family
Using the legacy bridge model, there is no VID classification at egress
for 802.1Q FIDs, which means that the VID is maintained.
This behavior causes the limitation that 802.1Q FIDs cannot work with VXLAN.
This limitation stems from the fact that a decapsulated VXLAN packet should
not contain a VLAN tag. If such a packet was to egress from a local port
using a 802.1Q FID, it would "maintain" its VLAN on egress, which is no
VLAN at all.
Currently 802.1Q FIDs are emulated in mlxsw driver using 802.1D FIDs. Using
unified bridge model, there is a FID->VID mapping, so it is possible to
stop emulating 802.1Q FIDs.
The main changes are:
1. Use 'SFGC.bridge_type' = 0, to separate between 802.1Q FIDs and
802.1D FIDs.
2. Use VLAN RIF instead of the emulated one (VLAN_EMU which is emulated
using FID RIF).
3. Create VID->FID mapping when the FID is created. Then when a new port
is mapped to the FID, if it is not in virtual mode, no new mapping is
needed. Save the new port in 'port_vid_list', to be able to update a
RIF in all {Port, VID}->FID mappings in case the port is put in
virtual mode later.
4. Add a dedicated operation function per FID family to update RIF for
VID->FID mappings. For 802.1d and rFID families, just return. For
802.1q family, handle the global mapping which is created for new 802.1q
FID.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to make the change easier to review, four new temporary FID
families will be added (e.g., MLXSW_SP_FID_TYPE_8021D_UB) and will not
be registered with the FID core until mlxsw is flipped to use the unified
bridge model.
Add .1d, rfid and dummy FID families for unified bridge. The next patch
will add the .1q family separately, as it requires more changes.
The following changes are required:
1. Add 'smpe_index_valid' field to 'struct mlxsw_sp_fid_family' and set
SFMR.smpe accordingly. SMPE index is reserved for rFIDs, as their
flooding is handled by firmware, and always reserved in Spectrum-1,
as it is configured as part of PGT table.
2. Add 'ubridge' field to 'struct mlxsw_sp_fid_family'. This field will
be removed later, use it in mlxsw_sp_fid_family_{register,unregister}()
to skip the registration / unregistration of the new families when the
legacy model is used.
3. Indexes - the start and end indexes of each FID family will need to be
changed according to the above diagram.
4. Add flood tables for unified bridge model, use 'fid_offset' as table
type, as in the new model the access to flood tables will be using
'fid_offset' calculation.
5. FID family operation changes:
a. rFID is supposed to be created using SFMR, as it is not created by
firmware using unified bridge model.
b. port_vid_map() should perform SVFA for rFID, as the mapping is not
created by firmware using unified bridge model.
c. flood_index() is not aligned to the new model, as this function will
be removed later.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:33 +0000 (09:11 +0300)]
mlxsw: Add support for VLAN RIFs
Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of
'VLAN' type, whereas RIFs constructed on top of VLAN-unaware bridges are of
'FID' type.
Currently 802.1Q FIDs are emulated using 802.1D FIDs, therefore VLAN RIFs
are emulated using FID RIFs. As part of converting the driver to use
unified bridge model, 802.1Q FIDs and VLAN RIFs will be used.
The egress FID is required for VLAN RIFs in Spectrum-2 and above, but not
in Spectrum-1, as in Spectrum-1 the mapping for VLAN RIFs is VID->FID,
while in other ASICs it is FID->FID. The reason for the change is that it
is more scalable to reuse the FID->FID entry than creating multiple
{Port, VID}->FID entries for the router port. Use the existing operation
structure to separate the configuration between different ASICs.
Add support for VLAN RIFs; most of the configuration is the same as for FID
RIFs.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:32 +0000 (09:11 +0300)]
mlxsw: Configure egress FID classification after routing
After routing, a packet needs to perform an L2 lookup using the DMAC it got
from the routing and a FID. In unified bridge model, the egress FID
configuration needs to be performed by software.
It is configured by RITR for both sub-port RIFs and FID RIFs. Currently
FID RIFs already configure eFID. Add eFID configuration for sub-port RIFs.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:31 +0000 (09:11 +0300)]
mlxsw: spectrum_router: Do not configure VID for sub-port RIFs
The field 'vid' in RITR is reserved when unified bridge model is used
and the RIF's type is sub-port RIF. Instead, ingress VID is configured via
SVFA and egress VID is configured via REIV.
Set 'vid' to zero in RITR register for sub-port RIF when unified bridge
model is used.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:30 +0000 (09:11 +0300)]
mlxsw: spectrum_fid: Configure layer 3 egress VID classification
After routing, the device always consults a table that determines the
packet's egress VID based on {egress RIF, egress local port}. In the
unified bridge model, it is up to software to maintain this table via REIV
register.
The table needs to be updated in the following flows:
1. When a RIF is set on a FID, need to iterate over the FID's {Port, VID}
list and issue REIV write to map the {RIF, Port} to the given VID.
2. When a {Port, VID} is mapped to a FID and the FID already has a RIF,
need to issue REIV write with a single record to map the {RIF, Port}
to the given VID.
REIV register supports a simultaneous update of 256 ports, so use this
capability for the first flow.
Handle the two above mentioned flows.
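The first flow can be sketched as follows. This is a model in Python, not
the driver code: reiv_writes() is a hypothetical helper, and grouping the
ports into pages of 256 by `port // 256` is an assumption about how a
single REIV write covers 256 ports.

```python
REIV_PORTS_PER_WRITE = 256  # ports covered by one REIV write, per the text


def reiv_writes(rif, port_vid_list):
    """Group a FID's {Port, VID} list into one write per 256-port page.

    Returns a list of (rif, page, {port offset within page: vid}) tuples.
    """
    pages = {}
    for port, vid in port_vid_list:
        page = port // REIV_PORTS_PER_WRITE
        offset = port % REIV_PORTS_PER_WRITE
        pages.setdefault(page, {})[offset] = vid
    return [(rif, page, records) for page, records in sorted(pages.items())]
```

The second flow corresponds to the same call with a single {Port, VID}
pair, which yields one write carrying one record.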
Add mlxsw_sp_fid_evid_map() function to handle egress VID classification
for both unicast and multicast. Layer 2 multicast configuration is already
done in the driver, just move it to the new function.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After classification, the FID is known, but also all the attributes of
the FID, such as the router interface (RIF) via which a packet that
needs to be routed will ingress the router block.
In the legacy model, when a RIF was created / destroyed, it was
firmware's responsibility to update it in the previously mentioned FID
classification records. In the unified bridge model, this responsibility
moved to software.
The third classification requires iterating over the FID's {Port, VID}
list and issue SVFA write with the correct mapping table according to the
port's mode (virtual or not). We never map multiple VLANs to the same FID
using VID->FID mapping, so such a mapping needs to be performed once.
When a new FID classification entry is configured and the FID already has
a RIF, set the RIF as part of SVFA configuration.
The reverse needs to be done when clearing a RIF from a FID. Currently,
clearing is done by issuing mlxsw_sp_fid_rif_set() with a NULL RIF pointer.
Instead, introduce mlxsw_sp_fid_rif_unset().
Note that mlxsw_sp_fid_rif_set() is called after the RIF is fully
operational, so it conforms to the internal requirement regarding
SVFA.irif_v: "Must not be set for a non-enabled RIF".
Do not set the ingress RIF for rFIDs, as the {Port, VID}->rFID entry is
configured by firmware when legacy model is used. A later patch will
handle this configuration for rFIDs and unified bridge model.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:28 +0000 (09:11 +0300)]
mlxsw: spectrum_fid: Configure VNI to FID classification
In the new model, SFMR no longer configures both VNI->FID and FID->VNI
classifications, but only the latter. The former needs to be configured via
SVFA.
Add SVFA configuration as part of vni_set() and vni_clear().
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 4 Jul 2022 06:11:27 +0000 (09:11 +0300)]
mlxsw: Configure egress VID for unicast FDB entries
Using unified bridge model, firmware no longer configures the egress VID
"under the hood" and moves this responsibility to software.
For layer 2, this means that software needs to determine the egress VID
for both unicast (i.e., FDB) and multicast (i.e., MDB and flooding) flows.
Unicast FDB records and unicast LAG FDB records have new fields - "set_vid"
and "vid", set them. For records which point to router port, do not set
these fields.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 3 Jul 2022 11:02:20 +0000 (12:02 +0100)]
Merge tag 'mlx5-updates-2022-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
mlx5-updates-2022-06-29
Chris Mi Says:
==============
Remove dependency between sriov and eswitch mode
Currently, there are three eswitch modes, none, legacy and
switchdev. None is the default mode. And when disabling sriov,
current eswitch mode will be changed to none. This patchset
removes eswitch mode none and also removes dependency between
sriov and eswitch mode. With this patchset, there are two
behavior changes:
Original behavior
-----------------
- When driver is loaded without sriov enabled, none is the default
mode. But actually eswitch mode should be either legacy or
switchdev, so devlink will return unsupported when showing
eswitch mode.
- When disabling sriov in either legacy or switchdev mode, eswitch
mode will be changed to none.
New behavior
------------
- When driver is loaded, legacy will be the default mode.
- When disabling sriov in either legacy or switchdev mode, eswitch
mode will not be changed.
Jianbo Liu Says:
================
Add support offloading police action
This patchset supports offloading police action by flow meter ASO
object in hardware.
The first part is to add interfaces to create and destroy flow meter
ASO object, and modify meter parameters by ACCESS_ASO WQE. As multiple
objects are created at a time, and two meters are in one object,
bitmaps are used to manage these meters in one creation.
Then the police action can be mapped to a meter by the action index.
After mlx5e tc action refactoring was merged and post_act table was
added, a simple tc flow with one police action is broken down into two
rules in hardware. One rule with the original match in the original
table, which performs a metadata rewrite and does metering, then jumps
to post_meter table. The second rule is placed in the post_act table
with all the actions left.
The rules in post_meter table match on the meter outcome. If the
outcome is GREEN, we merely jump back to the post_act table for
further processing. Otherwise, the outcome is RED, and we drop the
packet.
The last part is to support flow meter ASO object in sw steering.
Signed-off-by: David S. Miller <davem@davemloft.net>
Dario Binacchi says:
====================
This series originated as a result of CAN communication tests for an
application using the USBtin adapter (https://www.fischl.de/usbtin/).
The tests showed some errors but for the driver everything was ok.
Also, being the first time I used the slcan driver, I was amazed that
it was not possible to configure the bitrate via the ip tool.
For these two reasons, I started looking at the driver code and realized
that it didn't use the CAN network device driver interface.
Starting from these assumptions, I tried to:
- Use the CAN network device driver interface.
- Set the bitrate via the ip tool.
- Send the open/close command to the adapter from the driver.
- Add ethtool support to reset the adapter errors.
- Extend the protocol to forward the adapter CAN communication
errors and the CAN state changes to the netdev upper layers.
Except for the protocol extension patches (i. e. forward the adapter CAN
communication errors and the CAN state changes to the netdev upper
layers), the whole series has been tested under QEMU with Linux 4.19.208
using the USBtin adapter.
Testing the extension protocol patches requires updating the adapter
firmware. Before modifying the firmware I think it makes sense to know if
these extensions can be considered useful.
Before applying the series I used these commands:
slcan_attach -f -s6 -o /dev/ttyACM0
slcand ttyACM0 can0
ip link set can0 up
After applying the series I am using these commands:
slcan_attach /dev/ttyACM0
slcand ttyACM0 can0
ip link set dev can0 down
ip link set can0 type can bitrate 500000
ethtool --set-priv-flags can0 err-rst-on-open on
ip link set dev can0 up
Now there is a clearer separation between serial line and CAN,
but above all, it is possible to use the ip and ethtool commands
as it happens for any CAN device driver. The changes are backward
compatible, you can continue to use the slcand and slcan_attach
command options.
Changes in v5:
- Update the commit message.
- Restore the use of rtnl_lock() and rtnl_unlock().
Changes in v4:
- Move the patch in front of the patch "[v3,04/13] can: slcan: use CAN network device driver API".
- Add the CAN_BITRATE_UNSET (0) and CAN_BITRATE_UNKNOWN (-1U) macros.
- Simplify the bitrate check to dump it.
- Update the commit description.
- Update the commit description.
- Use the CAN_BITRATE_UNKNOWN macro.
- Use kfree_skb() instead of can_put_echo_skb() in the slc_xmit().
- Remove the `if (slcan_devs)' check in the slc_dealloc().
- Replace `sl->tty == NULL' with `!sl->tty'.
- Use CAN_BITRATE_UNSET (0) and CAN_BITRATE_UNKNOWN (-1U) macros.
- Don't reset the bitrate in ndo_stop() if it has been configured.
- Squashed to the patch [v3,09/13] can: slcan: send the close command to the adapter.
- Use the CAN_BITRATE_UNKNOWN macro.
- Add description of slc_bump_err() function.
- Remove check for the 'e' character at the beginning of the function.
It was already checked by the caller function.
- Protect decoding against the case the len value is longer than the
received data.
- Some small changes to make the decoding more readable.
- Increment all the error counters at the end of the function.
- Add description of slc_bump_state() function.
- Remove check for the 's' character at the beginning of the function.
It was already checked by the caller function.
- Protect decoding against the case the frame len is longer than the
received data (add SLC_STATE_FRAME_LEN macro).
- Set cf to NULL in case of alloc_can_err_skb() failure.
- Some small changes to make the decoding more readable.
- Use the character 'b' instead of 'f' for bus-off state.
Changes in v3:
- Increment the error counter in case of decoding failure.
- Replace (-1) with (-1U) in the commit description.
- Update the commit description.
- Remove the slc_do_set_bittiming().
- Set the bitrate in the ndo_open().
- Replace -1UL with -1U in setting a fake value for the bitrate.
- Drop the patch "can: slcan: simplify the device de-allocation".
- Add the patch "can: netlink: dump bitrate 0 if can_priv::bittiming.bitrate is -1U".
Changes in v2:
- Put the data into the allocated skb directly instead of first
filling the "cf" on the stack and then doing a memcpy().
- Move CAN_SLCAN Kconfig option inside CAN_DEV scope.
- Improve the commit message.
- Use the CAN framework support for setting fixed bit rates.
- Improve the commit message.
- Protect decoding against the case the len value is longer than the
received data.
- Continue error handling even if no skb can be allocated.
- Continue error handling even if no skb can be allocated.
====================
Dario Binacchi [Tue, 28 Jun 2022 16:31:31 +0000 (18:31 +0200)]
can: slcan: set bitrate by CAN device driver API
Allow setting the bitrate via the ip tool, as is done for other
CAN device drivers. It is still possible to set the bitrate via the
slcand or slcan_attach utilities. When the ip tool is used, the
driver sends the serial command to the adapter.
Dario Binacchi [Tue, 28 Jun 2022 16:31:30 +0000 (18:31 +0200)]
can: slcan: allow to send commands to the adapter
This is a preparation patch for the upcoming support to change the
bitrate via ip tool, reset the adapter error states via the ethtool API
and, more generally, send commands to the adapter.
Since the close command (i. e. "C\r") will be sent in the ndo_stop()
where netif_running() returns false, a new flag bit (i. e. SLF_XCMD) for
serial transmission has to be added.
Dario Binacchi [Tue, 28 Jun 2022 16:31:29 +0000 (18:31 +0200)]
can: slcan: use CAN network device driver API
As suggested by commit [1], now the driver uses the functions and the
data structures provided by the CAN network device driver interface.
Currently the driver doesn't implement a way to set bitrate for SLCAN
based devices via ip tool, so you'll have to do this by slcand or
slcan_attach invocation through the -sX parameter:
where -s6 will set the adapter's bitrate to 500 Kbit/s and -s8 to
1 Mbit/s.
See the table below for further CAN bitrates:
- s0 -> 10 Kbit/s
- s1 -> 20 Kbit/s
- s2 -> 50 Kbit/s
- s3 -> 100 Kbit/s
- s4 -> 125 Kbit/s
- s5 -> 250 Kbit/s
- s6 -> 500 Kbit/s
- s7 -> 800 Kbit/s
- s8 -> 1000 Kbit/s
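The table above can be written as a simple lookup. This is an
illustrative Python sketch, not code from the driver; SLCAN_BITRATES and
speed_code_for() are hypothetical names.

```python
# Bit rates selected by the -sX option, in bit/s, taken from the table above.
# The list position is the X of -sX.
SLCAN_BITRATES = [10_000, 20_000, 50_000, 100_000, 125_000,
                  250_000, 500_000, 800_000, 1_000_000]


def speed_code_for(bitrate):
    """Return the X of -sX for a bit rate, or None if the table has no entry."""
    try:
        return SLCAN_BITRATES.index(bitrate)
    except ValueError:
        return None
```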
In doing so, the driver's can_priv::bittiming.bitrate is not set. Since
open_candev() checks that the bitrate has been set (i.e., that it is
non-zero), the bitrate is set to a fake value (-1U) before open_candev()
is called.
Using the rtnl_lock()/rtnl_unlock() functions has become a bit more
tricky as the register_candev() function indirectly calls rtnl_lock()
via register_netdev(). To avoid a deadlock it is therefore necessary to
call rtnl_unlock() before calling register_candev(). The same goes for
the unregister_candev() function.
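The locking problem can be reproduced in a small model. This is a Python
sketch, not kernel code: a non-re-entrant threading.Lock stands in for the
RTNL mutex, and all function names are hypothetical. A non-blocking
acquire is used so that the sketch reports the problem instead of hanging.

```python
import threading

rtnl = threading.Lock()  # stand-in for the RTNL mutex; not re-entrant


def register_candev_model():
    # Models registration taking the lock internally.
    if not rtnl.acquire(blocking=False):
        return "would deadlock"
    try:
        return "registered"
    finally:
        rtnl.release()


def open_holding_lock():
    # Registers while the caller still holds the lock.
    with rtnl:
        return register_candev_model()


def open_dropping_lock():
    rtnl.acquire()
    # ... work that needs the lock would go here ...
    rtnl.release()  # drop the lock before registering
    return register_candev_model()
```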
[1] commit 3d562a33af4ba ("can: CAN Network device driver and Netlink interface")
Dario Binacchi [Tue, 28 Jun 2022 16:31:28 +0000 (18:31 +0200)]
can: netlink: dump bitrate 0 if can_priv::bittiming.bitrate is -1U
Upcoming changes to the slcan driver will require specifying a bitrate
of -1 to prevent open_candev() from failing while at the same time
making clear that it is a fake value. In this case the command
`ip --details -s -s link show' would print 4294967295 as the bitrate
value. This patch changes the dumped value to 0.
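The intended mapping is small enough to show directly. This is a Python
sketch of the behavior described above, not the netlink code itself;
bitrate_to_dump() is a hypothetical helper, and the two constants mirror
the CAN_BITRATE_UNSET (0) and CAN_BITRATE_UNKNOWN (-1U) macros mentioned
in the series changelog.

```python
CAN_BITRATE_UNSET = 0
CAN_BITRATE_UNKNOWN = 0xFFFFFFFF  # -1U as a 32-bit unsigned value


def bitrate_to_dump(bitrate):
    """Report the fake 'unknown' bitrate as 0; pass real values through."""
    return CAN_BITRATE_UNSET if bitrate == CAN_BITRATE_UNKNOWN else bitrate
```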
Jianbo Liu [Tue, 22 Jun 2021 08:25:38 +0000 (08:25 +0000)]
net/mlx5e: TC, Support offloading police action
Add parsing support by implementing struct mlx5e_tc_act for police
action.
A TC rule with police actions is broken down into several rules in
different tables. The first rule keeps the original match in the original
flow table; it sets fte_id, does metering, and jumps to the post_meter
table. If there are more police actions, one more rule is created for
each of them. Finally, a last rule is created at the end.
In post_meter table, there are two pre-defined rules, one is to drop
packet if its packet color is RED, the other is to jump back to
post_act table. As fte_id is updated before jumping, the rule for next
meter is matched to do another round of metering (if there are
multiple meters in the flow rule). Otherwise, the last fte_id is matched
and the original actions are performed.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Fri, 18 Jun 2021 06:47:15 +0000 (06:47 +0000)]
net/mlx5e: Add post meter table for flow metering
A flow meter object monitors the packet rate of the flows it is
attached to and colors packets GREEN or RED. The post meter table
is used to check the color. A packet is dropped if it is RED, or
forwarded to the post_act table if it is GREEN.
The packet color is written to the 8 LSBs of register C5. These bits
were previously used for matching the fte id and are now reserved for
metering.
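The post meter decision can be sketched as follows. This is a Python
model, not the steering rules themselves: post_meter_action() is a
hypothetical helper, and the numeric values chosen for RED and GREEN are
placeholders, since the commit message does not give the encoding.

```python
COLOR_MASK = 0xFF  # the 8 least significant bits of register C5

# Placeholder encodings; the commit message does not state the real values.
RED = 0
GREEN = 2


def post_meter_action(reg_c5):
    """Return what the post meter table does with a packet."""
    color = reg_c5 & COLOR_MASK  # upper bits of C5 are ignored here
    return "goto post_act" if color == GREEN else "drop"
```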
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Mon, 7 Jun 2021 01:40:16 +0000 (01:40 +0000)]
net/mlx5e: Get or put meter by the index of tc police action
Add functions to create and destroy flow meter ASO object.
This object only supports the range allocation. 64 objects are
allocated at a time, and there are two meters in each object.
Usually only one meter is allocated for a flow, so bitmap is used
to manage these 128 meters.
TC police action is mapped to hardware meter. As the index is unique
for each police action, add APIs to allocate or free hardware meter by
the index. If the meter is already created, increment its refcnt,
otherwise create new one. If police action has different parameters,
update hardware meter accordingly.
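The bookkeeping described above can be modeled as follows. This is a
Python sketch under stated assumptions, not the mlx5e code: MeterPool and
its methods are hypothetical names, a single range of 64 objects is
modeled, and mapping bitmap bit n to object n // 2, slot n % 2 is an
assumption about the layout.

```python
METERS_PER_OBJ = 2
OBJS_PER_ALLOC = 64
METERS_PER_ALLOC = METERS_PER_OBJ * OBJS_PER_ALLOC  # 128 meters per range


class MeterPool:
    """One allocated range of flow meter objects (hypothetical model)."""

    def __init__(self, base_obj_id):
        self.base_obj_id = base_obj_id
        self.bitmap = 0     # bit n set means meter n is in use
        self.by_index = {}  # police action index -> [bit, refcnt, params]

    def _location(self, bit):
        # Assumed layout: two consecutive bits share one object.
        return (self.base_obj_id + bit // METERS_PER_OBJ, bit % METERS_PER_OBJ)

    def get(self, index, params):
        entry = self.by_index.get(index)
        if entry is not None:
            entry[1] += 1      # meter already exists: take another reference
            entry[2] = params  # parameters may differ: update them
            return self._location(entry[0])
        for bit in range(METERS_PER_ALLOC):
            if not self.bitmap & (1 << bit):
                self.bitmap |= 1 << bit
                self.by_index[index] = [bit, 1, params]
                return self._location(bit)
        raise RuntimeError("no free meter in this range")

    def put(self, index):
        entry = self.by_index[index]
        entry[1] -= 1
        if entry[1] == 0:      # last user gone: release the meter
            self.bitmap &= ~(1 << entry[0])
            del self.by_index[index]
```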
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Mon, 7 Jun 2021 03:56:05 +0000 (03:56 +0000)]
net/mlx5e: Add support to modify hardware flow meter parameters
The policing rate and burst from user are converted to flow meter
parameters in hardware. These parameters are set or modified by
ACCESS_ASO WQE, add function to support it.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Fri, 29 Apr 2022 07:46:47 +0000 (07:46 +0000)]
net/mlx5: Implement interfaces to control ASO SQ and CQ
Add interfaces to use ASO object control channel. The channel consists
of a control SQ and CQ to which user can post ACCESS_ASO work requests
to modify ASO objects. The functions to get wqe from SQ, fill wqe,
post the request, and poll the completion of the work, are provided.
Jianbo Liu [Sat, 30 Apr 2022 14:31:28 +0000 (14:31 +0000)]
net/mlx5: Add support to create SQ and CQ for ASO
Add a separate API to create SQ and CQ for advanced steering
operations (ASO).
Since the mlx5_en API to create these resources is strongly coupled
with netdev channels and datapath elements, this API provides an
alternative for creating send queues that are used for ASO.
Currently the API allows creating channels with 2 wqbbs only - meaning
the support will be for a single ACCESS_ASO wqe with data at a time.
Chris Mi [Mon, 30 May 2022 03:07:57 +0000 (06:07 +0300)]
net/mlx5: E-switch, Remove dependency between sriov and eswitch mode
Currently, there are three eswitch modes, none, legacy and switchdev.
None is the default mode. Remove redundant none mode as eswitch mode
should always be either legacy mode or switchdev mode.
With this patch, there are two behavior changes:
1. Legacy becomes the default mode. When querying eswitch mode using
devlink, a valid mode is always returned.
2. When disabling sriov, the eswitch mode will not change, only vfs
are unloaded.
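The two behavior changes can be captured in a tiny state model. This is a
Python sketch for illustration only; the Eswitch class and its methods
are hypothetical and do not correspond to driver functions.

```python
class Eswitch:
    """Minimal model of the eswitch mode after this patch."""

    VALID_MODES = ("legacy", "switchdev")

    def __init__(self):
        self.mode = "legacy"  # legacy is the default; there is no 'none' mode
        self.num_vfs = 0

    def set_mode(self, mode):
        if mode not in self.VALID_MODES:
            raise ValueError("unsupported eswitch mode: " + mode)
        self.mode = mode

    def enable_sriov(self, num_vfs):
        self.num_vfs = num_vfs

    def disable_sriov(self):
        self.num_vfs = 0  # only the VFs are unloaded; the mode is kept
```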
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Chris Mi [Thu, 5 May 2022 06:23:39 +0000 (09:23 +0300)]
net/mlx5: E-switch, Introduce flag to indicate if fdb table is created
Introduce flag to indicate if fdb table is created as a pre-step
to prepare for removing dependency between sriov and eswitch mode
in the downstream patches.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Chris Mi [Thu, 10 Feb 2022 07:22:04 +0000 (09:22 +0200)]
net/mlx5: E-switch, Introduce flag to indicate if vport acl namespace is created
Eswitch vport acl namespace is needed when loading vfs. There is
no need to free and reallocate it when switching eswitch mode.
Introduce flag to indicate if it is created or not. When needed,
create it. Only free it when the driver is unloaded or in bare
metal mode.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Before commit c6a85b7cf2dc ("net/mlx5: Lag, use lag lock") there
used to be a matching mlx5_esw_lock() function and the lock and
unlock functions were symmetric. But now we take the lock
unconditionally and must unlock unconditionally as well.
As near as I can tell this is dead code and can just be deleted.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
David S. Miller [Sat, 2 Jul 2022 15:34:05 +0000 (16:34 +0100)]
Merge branch 'lan937x-dsa-driver'
Arun Ramadoss says:
====================
net: dsa: microchip: DSA Driver support for LAN937x
LAN937x is a Multi-Port 100BASE-T1 Ethernet Physical Layer switch
compliant with the IEEE 802.3bw-2015 specification. The device provides
100 Mbit/s transmit and receive capability over a single Unshielded
Twisted Pair (UTP) cable. LAN937x is a successive revision of the KSZ
series of switches.
This series of patches provides DSA driver support for the
Microchip LAN937X switch through the MII/RMII interface. RGMII interface
support will be added in a follow-up series. LAN937x reuses most of the
functionality of KSZ9477.
The LAN937x switch family consists of the following SKUs:
LAN9370:
- 4 T1 Phys
- 1 RGMII port
LAN9371:
- 3 T1 Phys & 1 TX Phy
- 2 RGMII ports
LAN9372:
- 5 T1 Phys & 1 TX Phy
- 2 RGMII ports
LAN9373:
- 5 T1 Phys
- 2 RGMII
- 1 SGMII port
LAN9374:
- 6 T1 Phys
- 2 RGMII ports
Changes in v15:
- fixed compilation issue.
- Updated the phylink_mac_link_up to check only for 10/100/1000 speed.
Changes in v14:
- Updated the patch series to latest ksz code refactoring.
- RGMII register configuration is removed from the series. It will be added in
the follow up patch series.
Changes in v13:
- Fixed the compilation issue in patch 5 and 6
Changes in v12:
- Removed the redundant spi indirect enable in lan937x_init
- Used the ksz_port_stp_state_set function
- Apply rgmii internal delay only if it is rgmii port
- Set the bit for 100baseTx in phylink_get_caps
- Moved the ethtool related API from patch 5 to 7
- Moved lan_alu_entry struct in lan937x_dev.h from patch 5 to 9
- Moved lan_vlan_entry in lan937x_dev.h from patch 5 to 10
- Used the ksz_get_stats64 function for get_stats64 hook
- Split patch 5 into separate patches for port configuration, spi driver,
phy read & write and mtu configuration.
- Updated the indentation in ethernet-controller.yaml
- lan937x.yaml: Removed the blank lines, updated the ethernet handle to macb0.
Added the rgmii internal delay only for the ports.
Changes in v11:
- Tagged as RFC to get the feedback for the subpatches 1/10, 5/10 and 6/10
Changes in v10:
- dsa.yaml: dropped moving mdio properties to dsa.yaml as per the feedback
https://patchwork.kernel.org/project/netdevbpf/patch/20220318085540.281721-3-prasanna.vengateshan@microchip.com/#24787466
- microchip,lan937x.yaml: Naming convention changes in the example
- lan937x_main.c: Moving configurations from lan937x_reset_switch() to setup()
- lan937x_main.c: helper function has been introduced for
lan937x_internal_phy_read & write
- lan937x_dev.h: lan_alu_struct struct data type changes
- lan937x_main.c: lan937x_get_stats64 make non blocking
- lan937x_main.c: modified lan937x_port_mirror_add to include extack
Changes in v9:
- lan937x_main.c: of_node_put() correction in lan937x_parse_dt_rgmii_delay
- lan937x_dev.c: removed the interface checks from lan937x_apply_rgmii_delay.
- changes in ethernet-controller.yaml and dsa.yaml
Changes in v8:
- lan937x_dev.c: fixed lan937x_r_mib_pkt warning in the sub patches
- lan937x_main.c: phylink_autoneg_inband() check removed in
lan937x_phylink_mac_link_up()
- lan937x_main.c: made legacy_pre_march2020 = false as this is non-legacy driver
and indentation correction in lan937x_phylink_mac_link_up()
- removed unnecessary parenthesis in lan937x_get_strings()
Changes in v7:
- microchip,lan937x.yaml: *-internal-delay-ps enum values & commit messages
corrections
- lan937x_main.c: removed phylink_validate() and added phylink_get_caps()
- lan937x_main.c: added support for ethtool standard stats (get_eth_*_stats
and get_stats64)
- lan937x_main.c: removed unnecessary PVID read from lan937x_port_vlan_del()
- integrated the changes of ksz9477 multi bridging support to lan937x dev and
tested both multi bridging and STP
- lan937x_port_vlan_del - dummy pvid read removed
Changes in v6:
- microchip_t1.c: A new merge was done in the net-next tree for
microchip_t1.c after the v5 submission. Hence rebased it for v6.
Changes in v5:
- microchip,lan937x.yaml: Added mdio properties detail
- microchip,lan937x.yaml: *-internal-delay-ps added under port node
- lan937x_dev.c: changed devm_mdiobus_alloc from of_mdiobus_register as suggested
by Vladimir
- lan937x_dev.c: added dev_info for rgmii internal delay & error message to user
in case of out of range values
- lan937x_dev.c: return -EOPNOTSUPP for C45 regnum values for
lan937x_sw_mdio_read & write operations
- return from function without storing in a variable
- lan937x_main.c: Added vlan_enable info in vlan_filtering API
- lan937x_main.c: lan937x_port_vlan_del: removed unintended PVID write
Changes in v4:
- tag_ksz.c: cpu_to_be16 to put_unaligned_be16
- correct spacing in comments
- tag_ksz.c: NETIF_F_HW_CSUM fix is integrated
- lan937x_dev.c: mdio_np is removed from global and handled locally
- lan937x_dev.c: unused functions removed lan937x_cfg32 & lan937x_port_cfg32
- lan937x_dev.c: lan937x_is_internal_100BTX_phy_port function name changes
- lan937x_dev.c: RGMII internal delay handling for MAC. Delay values are
retrieved from DTS and updated
- lan937x_dev.c: corrected mutex operations for few dev variables
- microchip,lan937x.yaml: introduced rx-internal-delay-ps &
tx-internal-delay-ps for RGMII internal delay
- lan937x_dev.c: Unnecessary mutex_lock has been removed
- lan937x_main.c: PHY_INTERFACE_MODE_NA handling for lan937x_phylink_validate
- lan937x_main.c: PORT_MIRROR_SNIFFER check in right place
- lan937x_main.c: memset is used instead of writing 0's individually in
lan937x_port_fdb_add function
- lan937x_main.c: Removed \n from NL_SET_ERR_MSG_MOD calls
Changes in v3:
- Removed settings of cnt_ptr to zero and the memset()
added a cleanup patch which moves this into ksz_init_mib_timer().
- Used ret everywhere instead of rc
- microchip,lan937x.yaml: Remove mdio compatible
- microchip_t1.c: Renaming standard phy registers
- tag_ksz.c: LAN937X_TAIL_TAG_OVERRIDE renaming
LAN937X_TAIL_TAG_BLOCKING_OVERRIDE
- tag_ksz.c: Changed Ingress and Egress naming convention based on
Host
- tag_ksz.c: converted to skb_mac_header(skb) from
(is_link_local_ether_addr(hdr->h_dest))
- lan937x_dev.c: Removed BCAST Storm protection settings since we
have Tc commands for them
- lan937x_dev.c: Flow control setting in lan937x_port_setup function
- lan937x_dev.c: RGMII internal delay added only for cpu port,
- lan937x_dev.c: of_get_compatible_child(node,
"microchip,lan937x-mdio") to of_get_child_by_name(node, "mdio");
- lan937x_dev.c:lan937x_get_interface API: returned
PHY_INTERFACE_MODE_INTERNAL instead of PHY_INTERFACE_MODE_NA
- lan937x_main.c: Removed compat interface implementation in
lan937x_config_cpu_port() API & dev_info corrected as well
- lan937x_main.c: deleted ds->configure_vlan_while_not_filtering
= true
- lan937x_main.c: Added explanation for lan937x_setup lines
- lan937x_main.c: FR_MAX_SIZE correction in lan937x_get_max_mtu API
- lan937x_main.c: removed lan937x_port_bridge_flags dummy functions
- lan937x_spi.c - mdiobus_unregister to be added to spi_remove
function
- lan937x_main.c: phy link layer changes
- lan937x_main.c: port mirroring: sniff port selection limiting to
one port
- lan937x_main.c: Changed to global vlan filtering
- lan937x_main.c: vlan_table array to structure
- lan937x_main.c -Use extack instead of reporting errors to Console
- lan937x_main.c - Remove cpu_port addition in vlan_add api
- lan937x_main.c - removed pvid resetting
Changes in v2:
- return check for register read/writes
- dt compatible check is added against chip id value
- lan937x_internal_t1_tx_phy_write() is renamed to
lan937x_internal_phy_write()
- lan937x_is_internal_tx_phy_port is renamed to
lan937x_is_internal_100BTX_phy_port as it is 100Base-Tx phy
- Return value for lan937x_internal_phy_write() is -EOPNOTSUPP
in case of failures
- Return value for lan937x_internal_phy_read() is 0xffff
for non existent phy
- cpu_port checking is removed from lan937x_port_stp_state_set()
- lan937x_phy_link_validate: 100baseT_Full to 100baseT1_Full
- T1 Phy driver is moved to drivers/net/phy/microchip_t1.c
- Tx phy driver support will be added later
- Legacy switch checkings in dts file are removed.
- tag_ksz.c: Re-used ksz9477_rcv for lan937x_rcv
- tag_ksz.c: Xmit() & rcv() Comments are corrected w.r.to host
- net/dsa/Kconfig: Family SKU numbers altered in ascending order
- microchip,lan937x.yaml: eth is replaced with ethernet
- microchip,lan937x.yaml: spi1 is replaced with spi
- microchip,lan937x.yaml: cpu labelling is removed
- microchip,lan937x.yaml: port@x value will match the reg value now
====================
net: dsa: microchip: lan937x: add phylink_get_caps support
The internal PHYs of the LAN937x are capable of 100Mbps Full duplex. The
xMII port of the switch is capable of 10Mbps Full & Half Duplex, 100Mbps
Full & Half Duplex and 1000Mbps Full duplex. The xMII port also supports Tx
and Rx Flow control.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: add DSA support for microchip LAN937x
Add basic DSA driver support for lan937x. The device is
configured through the SPI interface.
It adds the lan937x_dev_ops in ksz_common.c file and tries to reuse the
functionality of ksz9477 series switch.
drivers/net/dsa/microchip/ path is already part of MAINTAINERS &
the new files come under this path. Hence no update to MAINTAINERS
is needed.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: generic access to ksz9477 static and reserved table
The ksz9477 and lan937x differ slightly in the static and reserved
table register 0x041C. On the ksz9477, bit 0 selects the operation:
1 - read and 0 - write. On the lan937x, bits 1:0 select the
operation: 01 - write and 10 - read.
To use the ksz9477 mdb add/del and enable_stp_addr for the lan937x, masks &
shifts are introduced for ksz9477 & lan937x in ksz_common.c. The
functions are then updated to use the masks & shifts of the given switch
instead of hard-coded values.
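The per-switch description of the read/write select field could look
roughly like the sketch below. The struct and function names are
hypothetical and only illustrate the idea; the field values follow the
bit encodings given in this commit message.

```c
#include <stdint.h>

/* Illustrative per-chip description of the read/write select field in
 * the static and reserved table register. All names are hypothetical.
 */
struct demo_tbl_ctrl {
	uint32_t rw_mask;	/* bits occupied by the select field */
	uint32_t rw_shift;	/* position of the field's lowest bit */
	uint32_t read_val;	/* field value that selects a read */
	uint32_t write_val;	/* field value that selects a write */
};

/* ksz9477: bit 0, 1 = read, 0 = write */
static const struct demo_tbl_ctrl demo_ksz9477 = {
	.rw_mask = 0x1, .rw_shift = 0, .read_val = 0x1, .write_val = 0x0,
};

/* lan937x: bits 1:0, 01 = write, 10 = read */
static const struct demo_tbl_ctrl demo_lan937x = {
	.rw_mask = 0x3, .rw_shift = 0, .read_val = 0x2, .write_val = 0x1,
};

/* Clear the select field, then set it for the requested operation.
 * All other register bits are preserved.
 */
static uint32_t demo_tbl_set_op(const struct demo_tbl_ctrl *c,
				uint32_t reg, int is_read)
{
	uint32_t val = is_read ? c->read_val : c->write_val;

	reg &= ~c->rw_mask;
	reg |= (val << c->rw_shift) & c->rw_mask;
	return reg;
}
```

With this shape, common code only calls demo_tbl_set_op() with the
descriptor of the detected switch and never tests a chip-specific bit.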
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: tag_ksz: add tag handling for Microchip LAN937x
The Microchip LAN937X switches have a tagging protocol which is
very similar to KSZ tagging. Therefore the implementation is added to
tag_ksz.c and reuses the common APIs.
Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: dsa: dt bindings for microchip lan937x
Add documentation in .yaml format and update MAINTAINERS.
'make dt_binding_check' passes.
RGMII internal delay values for the MAC are retrieved from
rx-internal-delay-ps & tx-internal-delay-ps as per the feedback on the
v3 patch series.
https://lore.kernel.org/netdev/20210802121550.gqgbipqdvp5x76ii@skbuf/
Only delay values of 0ns and 2ns are supported.
Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: Updated micrel,led-mode for LAN8814 PHY
Enable led-mode configuration for LAN8814 phy
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Jun 2022 15:07:50 +0000 (15:07 +0000)]
net: add skb_[inner_]tcp_all_headers helpers
Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)"
to compute headers length for a TCP packet, but others
use more convoluted (but equivalent) ways.
Add skb_tcp_all_headers() and skb_inner_tcp_all_headers()
helpers to harmonize this a bit.
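For illustration, the arithmetic behind skb_tcp_all_headers() can be
mocked in userspace as below. struct demo_skb and the demo_* functions
are simplified stand-ins invented for this sketch; they are not the
kernel's struct sk_buff or its accessors.

```c
#include <stdint.h>

/* Userspace mock holding only the fields needed for the arithmetic. */
struct demo_skb {
	unsigned int transport_offset;	/* bytes from data start to TCP header */
	uint8_t tcp_doff;		/* TCP data offset, in 32-bit words */
};

/* Stand-in for skb_transport_offset() */
static unsigned int demo_transport_offset(const struct demo_skb *skb)
{
	return skb->transport_offset;
}

/* Stand-in for tcp_hdrlen(): the data offset field counts 4-byte words */
static unsigned int demo_tcp_hdrlen(const struct demo_skb *skb)
{
	return skb->tcp_doff * 4u;
}

/* Length of all headers up to and including the TCP header */
static unsigned int demo_tcp_all_headers(const struct demo_skb *skb)
{
	return demo_transport_offset(skb) + demo_tcp_hdrlen(skb);
}
```

For an Ethernet + IPv4 frame without IP options the transport offset is
14 + 20 = 34, so a TCP header without options (doff = 5) gives 54 bytes.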
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Clément Léger [Wed, 29 Jun 2022 12:20:03 +0000 (14:20 +0200)]
net: pcs: rzn1-miic: update speed only if interface is changed
As stated by Russell King, miic_config() can be called as a result of
ethtool setting the configuration while the link is already up. Since
the speed is also set in this function, it could potentially modify
the current speed that is set. This will only happen if there is
no PHY present and we aren't using fixed-link mode.
Handle that by storing the current interface mode in the miic_port
structure and update the speed only if the interface mode is going to
be changed.
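A minimal sketch of that approach, using made-up types rather than the
driver's real miic_port structure or phy_interface_t:

```c
/* Hypothetical interface modes, for this example only. */
enum demo_iface {
	DEMO_IFACE_NA,
	DEMO_IFACE_MII,
	DEMO_IFACE_RMII,
	DEMO_IFACE_RGMII,
};

struct demo_miic_port {
	enum demo_iface interface;	/* last interface mode applied */
	int speed;			/* currently programmed speed, Mbit/s */
};

/* Apply a new interface mode. The speed is reset to the mode's default
 * only when the mode really changes, so a repeated call with the same
 * mode does not disturb a speed negotiated while the link is up.
 */
static void demo_miic_config(struct demo_miic_port *port,
			     enum demo_iface interface, int default_speed)
{
	if (port->interface == interface)
		return;

	port->interface = interface;
	port->speed = default_speed;
}
```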
Bin Chen [Thu, 30 Jun 2022 11:21:55 +0000 (13:21 +0200)]
nfp: support VF rate limit with NFDK
Support VF rate limiting with NFDK by adding ndo_set_vf_rate to the NFDK
ops structure.
NFDK is used to communicate via PCIE to NFP-3800 based NICs
while NFD3 is used for other NICs supported by the NFP driver.
The VF rate limit feature is already supported by the driver for NFD3.
Signed-off-by: Bin Chen <bin.chen@corigine.com> Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alaa Mohamed [Thu, 30 Jun 2022 10:24:49 +0000 (12:24 +0200)]
selftests: net: fib_rule_tests: fix support for running individual tests
Parsing and usage of -t got missed in the previous patch.
This patch fixes it.
Fixes: 00670779b23e ("selftests: net: fib_rule_tests: add support to select a test to run") Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Jul 2022 12:25:00 +0000 (13:25 +0100)]
Merge branch 'mptcp-mem-scheduling'
Mat Martineau says:
====================
mptcp: Updates for mem scheduling and SK_RECLAIM
In the "net: reduce tcp_memory_allocated inflation" series (merge commit 36a2db252d65), Eric Dumazet noted that "Removal of SK_RECLAIM_CHUNK and
SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up."
Patches 1-3 align MPTCP with the above TCP changes to forward memory
allocation, reclaim, and memory scheduling.
Patch 4 removes the SK_RECLAIM_* macros as Eric requested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:57 +0000 (15:17 -0700)]
net: remove SK_RECLAIM_THRESHOLD and SK_RECLAIM_CHUNK
There are no more users for the mentioned macros, just
drop them.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:56 +0000 (15:17 -0700)]
mptcp: refine memory scheduling
Similar to commit 91efb764f795 ("net: fix sk_wmem_schedule() and
sk_rmem_schedule() errors"), let the MPTCP receive path schedule
exactly the required amount of memory.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:55 +0000 (15:17 -0700)]
mptcp: drop SK_RECLAIM_* macros
After commit 51d8ac882d1b ("net: keep sk->sk_forward_alloc as small as
possible"), the MPTCP protocol is the last user of SK_RECLAIM_CHUNK and
SK_RECLAIM_THRESHOLD.
Update the MPTCP reclaim schema to match the core/TCP one and drop the
mentioned macros. This additionally cleans up the MPTCP code a bit.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Jun 2022 22:17:54 +0000 (15:17 -0700)]
mptcp: never fetch fwd memory from the subflow
The memory accounting is broken in such exceptional code
path, and after commit 51d8ac882d1b ("net: keep sk->sk_forward_alloc
as small as possible") we can't find much help there.
Drop the broken code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Jul 2022 10:21:56 +0000 (11:21 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-06-30
This series contains updates to ice driver only.
Martyna adds support for VLAN related TC switchdev filters and reworks
dummy packet implementation of VLANs to enable dynamic header insertion to
allow for more rule types.
Lu Wei utilizes eth_broadcast_addr() helper over an open coded version.
Ziyang Xuan removes unneeded NULL checks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jilin Yuan [Thu, 30 Jun 2022 07:57:51 +0000 (15:57 +0800)]
ethernet/neterion: fix repeated words in comments
Delete the redundant word 'the'.
Delete the redundant word 'a'.
Delete the redundant word 'frame'.
Delete the redundant word 'is'.
Delete the redundant word 'not'.
Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Prevent permanently closed tc-taprio gates from blocking a Felix DSA switch port
Richie Pearn reports that if we install a tc-taprio schedule on a Felix
switch port, and that schedule has at least one gate that never opens
(for example TC0 below):
then packets classified to the permanently closed traffic class will not
be dequeued by the egress port. They will just remain in the queue
system, to consume resources. Frame aging does not trigger either,
because in order for that to happen, the packets need to be eligible for
egress scheduling in the first place, which they aren't. If that port is
allowed to consume the entire shared buffer of the switch (as we
configure things by default using devlink-sb), then eventually, by
sending enough packets, the entire switch will hang.
If we think enough about the problem, we realize that this is only a
special case of a more general issue, and can also be reproduced with
gates that aren't permanently closed, but are not large enough to send
an entire frame. In that sense, a permanently closed gate is simply a
case where all frames are oversized.
The ENETC has logic to reject transmitted packets that would overrun the
time window - see commit aad47a97afd1 ("net: enetc: count the tc-taprio
window drops").
The Felix switch has no such thing on a per-packet basis, but it has a
register replicated per {egress port, TC} which essentially limits the
max MTU. A packet which exceeds the per-port-TC MTU is immediately
discarded and therefore will not hang the port anymore (albeit, sadly,
this only bumps a generic drop hardware counter and we cannot really
infer the reason such as to offer a dedicated counter for these events).
This patch set calculates the max MTU per {port, TC} when the tc-taprio
config, or link speed, or port-global MTU values change. This solves the
larger "gate too small for packet" problem, but also the original issue
with the gate permanently closed that was reported by Richie.
====================
Vladimir Oltean [Tue, 28 Jun 2022 14:52:38 +0000 (17:52 +0300)]
time64.h: consolidate uses of PSEC_PER_NSEC
Time-sensitive networking code needs to work with PTP times expressed in
nanoseconds, and with packet transmission times expressed in
picoseconds, since those would be fractional at higher than gigabit
speed when expressed in nanoseconds.
Convert the existing uses in tc-taprio and the ocelot/felix DSA driver
to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h
as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h
because the vDSO library does not (yet) need/use it.
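To illustrate why picoseconds are needed, the sketch below computes the
wire time of a single octet. The DEMO_* macros and function names are
local to this example (the factor of 1000 ps per ns is the only value
taken from the text above).

```c
#include <stdint.h>

#define DEMO_PSEC_PER_NSEC	1000LL
#define DEMO_USEC_PER_SEC	1000000LL

/* Convert a PTP-style time in nanoseconds to picoseconds. */
static int64_t demo_ns_to_ps(int64_t ns)
{
	return ns * DEMO_PSEC_PER_NSEC;
}

/* Time on the wire for one octet, in picoseconds.
 * 8 bits / (speed_mbps * 10^6 bit/s) = 8 * 10^6 / speed_mbps ps.
 */
static int64_t demo_picos_per_byte(int64_t speed_mbps)
{
	return (DEMO_USEC_PER_SEC * 8) / speed_mbps;
}
```

At 1000 Mbit/s an octet takes 8000 ps, which is a whole 8 ns. At
2500 Mbit/s it takes 3200 ps (3.2 ns) and at 10000 Mbit/s 800 ps
(0.8 ns), neither of which can be represented in integer nanoseconds.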
Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:37 +0000 (17:52 +0300)]
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
Currently, sending a packet into a time gate too small for it (or always
closed) causes the queue system to hold the frame forever. Even worse,
this frame isn't subject to aging either, because for that to happen, it
needs to be scheduled for transmission in the first place. But the frame
will consume buffer memory and frame references while it is forever held
in the queue system.
Before commit f24f9e3d8757 ("net: mscc: ocelot: initialize watermarks to
sane defaults"), this behavior was somewhat subtle, as the switch had a
more intricately tuned default watermark configuration out of reset,
which did not allow any single port and tc to consume the entire switch
buffer space. Nonetheless, the held frames are still there, and they
reduce the total backplane capacity of the switch.
However, after the aforementioned commit, the behavior can be very
clearly seen, since we deliberately allow each {port, tc} to consume the
entire shared buffer of the switch minus the reservations (and we
disable all reservations by default). That is to say, we allow a
permanently closed tc-taprio gate to hang the entire switch.
A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU
per-port-tc registers serve 2 purposes: one is for guard band calculation
(when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to
enable oversized frame dropping (when non-zero).
Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame
dropping is disabled. The goal of the change is to enable it seamlessly.
For that, we need to hook into the MTU change, tc-taprio change, and
port link speed change procedures, since we depend on these variables.
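How a per-{port, tc} limit can be derived from those three inputs is
sketched below. This is a simplified illustration and not the driver's
code: the function name, the 20 octet L1 overhead constant and the exact
rounding are assumptions made for the example.

```c
#include <stdint.h>

#define DEMO_PSEC_PER_NSEC	1000ULL
#define DEMO_L1_OVERHEAD	20u	/* assumed: preamble + SFD + inter-packet gap */

/* Return the value to program as the per-tc max SDU.
 * 0 means "oversized frame dropping disabled".
 */
static uint32_t demo_max_sdu(uint64_t gate_len_ns, uint32_t speed_mbps,
			     uint32_t port_max_frame)
{
	uint64_t picos_per_byte = 8000000ULL / speed_mbps;
	uint64_t gate_ps = gate_len_ns * DEMO_PSEC_PER_NSEC;
	uint64_t needed_ps = (uint64_t)(port_max_frame + DEMO_L1_OVERHEAD) *
			     picos_per_byte;
	uint64_t max_sdu;

	/* A full-size frame fits in the gate: nothing needs to be dropped */
	if (gate_ps >= needed_ps)
		return 0;

	/* Largest number of octets that can be sent while the gate is open */
	max_sdu = gate_ps / picos_per_byte;

	if (max_sdu > DEMO_L1_OVERHEAD)
		max_sdu -= DEMO_L1_OVERHEAD;
	else
		max_sdu = 1;	/* closed or tiny gate: every frame is oversized */

	return (uint32_t)max_sdu;
}
```

For example, at 1000 Mbit/s with a 1518 octet port limit, a 20000 ns
gate needs no restriction, a 4000 ns gate is limited to 480 octets, and
a permanently closed gate (0 ns) gets the minimal non-zero limit so that
all frames are dropped instead of being held.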
Frames are not dropped on egress due to a queue system oversize
condition, instead that egress port is simply excluded from the mask of
valid destination ports for the packet. If there are no destination
ports at all, the ingress counter that increments is the generic
"drop_tail" in ethtool -S.
The issue exists in various forms since the tc-taprio offload was introduced.
Fixes: edc15c3360cf ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Reported-by: Richie Pearn <richard.pearn@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:36 +0000 (17:52 +0300)]
net: dsa: felix: keep QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF) out of rmw
In vsc9959_tas_clock_adjust(), the INIT_GATE_STATE field is not changed,
only the ENABLE field. Similarly for the disabling of the time-aware
shaper in vsc9959_qos_port_tas_set().
To reflect this, keep the QSYS_TAG_CONFIG_INIT_GATE_STATE_M mask out of
the read-modify-write procedure to make it clearer what is the intention
of the code.
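The effect of the mask in a read-modify-write can be shown with a small
userspace sketch. The bit positions and DEMO_* names below are assumed
for the example and are not taken from the register documentation.

```c
#include <stdint.h>

/* Hypothetical field layout, chosen only for this example. */
#define DEMO_TAG_CONFIG_ENABLE			(1u << 0)
#define DEMO_TAG_CONFIG_INIT_GATE_STATE_M	(0xFFu << 8)

/* Generic read-modify-write: only bits set in @mask are taken from @val,
 * all other bits keep the value they had in @old.
 */
static uint32_t demo_rmw(uint32_t old, uint32_t val, uint32_t mask)
{
	return (old & ~mask) | (val & mask);
}
```

With a mask of DEMO_TAG_CONFIG_ENABLE alone, whatever the caller passes
in the gate state bits of val is ignored and the existing gate state is
preserved, so the call site documents that only ENABLE is being changed.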
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 28 Jun 2022 14:52:35 +0000 (17:52 +0300)]
net: dsa: felix: keep reference on entire tc-taprio config
In a future change we will need to remember the entire tc-taprio config
on all ports rather than just the base time, so use the
taprio_offload_get() helper function to replace ocelot_port->base_time
with ocelot_port->taprio.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>