git.baikalelectronics.ru Git - kernel.git/log
3 years ago net: annotate accesses to queue->trans_start
Eric Dumazet [Wed, 17 Nov 2021 03:29:22 +0000 (19:29 -0800)]
net: annotate accesses to queue->trans_start

In following patches, dev_watchdog() will no longer stop all queues.
It will read queue->trans_start locklessly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
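
Note: a minimal user-space sketch of the lockless read this commit prepares for. It assumes C11 relaxed atomics as a stand-in for kernel-style annotated accesses; the struct and function names are hypothetical and are not taken from the patch.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a TX queue; not the kernel's struct. */
struct txq_sketch {
    _Atomic unsigned long trans_start; /* time of last transmit, in ticks */
};

/* Writer side: the transmit path records when it last sent a packet. */
static void txq_record_xmit(struct txq_sketch *q, unsigned long now)
{
    atomic_store_explicit(&q->trans_start, now, memory_order_relaxed);
}

/* Reader side: a watchdog checks for a stall without taking the queue lock. */
static int txq_timed_out(struct txq_sketch *q, unsigned long now,
                         unsigned long timeout)
{
    unsigned long start = atomic_load_explicit(&q->trans_start,
                                               memory_order_relaxed);
    return now - start > timeout;
}
```

The point of the annotation is that reader and writer may run concurrently, so both sides mark the access instead of relying on a lock.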
3 years ago net: use an atomic_long_t for queue->trans_timeout
Eric Dumazet [Wed, 17 Nov 2021 03:29:21 +0000 (19:29 -0800)]
net: use an atomic_long_t for queue->trans_timeout

tx_timeout_show() assumed dev_watchdog() would stop all
the queues, to fetch queue->trans_timeout under protection
of the queue->_xmit_lock.

As we want to no longer disrupt transmits, we use an
atomic_long_t instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: david decotigny <david.decotigny@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
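
Note: a rough user-space sketch of the idea in this commit, assuming C11 atomic_long in place of the kernel's atomic_long_t. The struct and helper names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical per-queue statistics block. */
struct txq_stats_sketch {
    atomic_long trans_timeout; /* number of TX timeouts seen on this queue */
};

/* Watchdog side: count a timeout without stopping the queue. */
static void txq_count_timeout(struct txq_stats_sketch *q)
{
    atomic_fetch_add(&q->trans_timeout, 1);
}

/* Reporting side: read the counter without taking any queue lock. */
static long txq_timeout_show(struct txq_stats_sketch *q)
{
    return atomic_load(&q->trans_timeout);
}
```

Because the counter is atomic, the reporting path no longer depends on the watchdog having stopped the queues.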
3 years ago Merge tag 'for-net-next-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 17 Nov 2021 14:52:44 +0000 (14:52 +0000)]
Merge tag 'for-net-next-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - Add support for AOSP Bluetooth Quality Report
 - Enables AOSP extension for Mediatek Chip (MT7921 & MT7922)
 - Rework of HCI command execution serialization
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: ethernet: ti: cpsw: Enable PHY timestamping
Kurt Kanzenbach [Tue, 16 Nov 2021 08:03:25 +0000 (09:03 +0100)]
net: ethernet: ti: cpsw: Enable PHY timestamping

If the used PHYs also support hardware timestamping, all configuration requests
should be forwarded to the PHYs instead of being processed by the MAC driver
itself.

This enables PHY timestamping in combination with the cpsw driver.

Tested with an am335x based board with two DP83640 PHYs connected to the cpsw
switch.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Documentation: networking: net_failover: Fix documentation
Vasudev Kamath [Tue, 16 Nov 2021 07:21:48 +0000 (12:51 +0530)]
Documentation: networking: net_failover: Fix documentation

Update net_failover documentation with missing and incomplete
details to get a proper working setup.

Signed-off-by: Vasudev Kamath <vasudev@copyninja.info>
Reviewed-by: Krishna Kumar <krikku@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'ocelot_net-phylink'
David S. Miller [Wed, 17 Nov 2021 11:25:45 +0000 (11:25 +0000)]
Merge branch 'ocelot_net-phylink'

Russell King says:

====================
net: ocelot_net: phylink validate implementation updates

This series converts ocelot_net to fill in the supported_interfaces
member of phylink_config, cleans up the validate() implementation,
and then converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: ocelot_net: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:09:41 +0000 (10:09 +0000)]
net: ocelot_net: use phylink_generic_validate()

ocelot_net has no special behaviour in its validation implementation, so
can be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: ocelot_net: remove interface checks in macb_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:09:36 +0000 (10:09 +0000)]
net: ocelot_net: remove interface checks in macb_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
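
Note: the phylink conversions in this log all rest on the same idea: the MAC driver publishes a bitmap of supported interface modes, and the core checks a requested mode against it before calling the driver. A minimal sketch under stated assumptions follows; the enum values, struct, and helpers are hypothetical and much smaller than the kernel's phy_interface_t and phylink_config.

```c
#include <stdbool.h>
#include <limits.h>

/* Hypothetical interface mode IDs; the real list is much longer. */
enum iface_sketch { IFACE_SGMII, IFACE_QSGMII, IFACE_1000BASEX, IFACE_MAX };

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((IFACE_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct mac_config_sketch {
    unsigned long supported_interfaces[BITMAP_WORDS];
};

/* Driver side: declare one supported mode at setup time. */
static void mac_support(struct mac_config_sketch *c, enum iface_sketch m)
{
    c->supported_interfaces[m / BITS_PER_WORD] |= 1UL << (m % BITS_PER_WORD);
}

/* Core side: reject unsupported modes before the driver callback runs. */
static bool core_interface_ok(const struct mac_config_sketch *c,
                              enum iface_sketch m)
{
    return c->supported_interfaces[m / BITS_PER_WORD] &
           (1UL << (m % BITS_PER_WORD));
}
```

With the check done centrally, a driver's validation function no longer needs its own interface-mode switch.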
3 years ago net: ocelot_net: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 10:09:31 +0000 (10:09 +0000)]
net: ocelot_net: populate supported_interfaces member

Populate the phy interface mode bitmap for the MSCC Ocelot driver with
the interface modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'mtk_eth_soc-phylink'
David S. Miller [Wed, 17 Nov 2021 11:23:39 +0000 (11:23 +0000)]
Merge branch 'mtk_eth_soc-phylink'

Russell King says:

====================
net: mtk_eth_soc: phylink validate implementation updates

This series converts mtk_eth_soc to fill in the supported_interfaces
member of phylink_config, cleans up the validate() implementation, and
then converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: mtk_eth_soc: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:06:58 +0000 (10:06 +0000)]
net: mtk_eth_soc: use phylink_generic_validate()

mtk_eth_soc has no special behaviour in its validation implementation,
so can be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: mtk_eth_soc: drop use of phylink_helper_basex_speed()
Russell King (Oracle) [Tue, 16 Nov 2021 10:06:53 +0000 (10:06 +0000)]
net: mtk_eth_soc: drop use of phylink_helper_basex_speed()

Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function, and we can also get rid of our hack to indicate
both 1000base-X and 2500base-X if the comphy is present to make that
work. Remove this hack and use of phylink_helper_basex_speed().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: mtk_eth_soc: remove interface checks in mtk_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:06:48 +0000 (10:06 +0000)]
net: mtk_eth_soc: remove interface checks in mtk_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: mtk_eth_soc: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 10:06:43 +0000 (10:06 +0000)]
net: mtk_eth_soc: populate supported_interfaces member

Populate the phy interface mode bitmap for the Mediatek driver with
interface modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'sparx5-phylink'
David S. Miller [Wed, 17 Nov 2021 11:21:42 +0000 (11:21 +0000)]
Merge branch 'sparx5-phylink'

Russell King says:

====================
net: sparx5: phylink validate implementation updates

This series converts sparx5 to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: sparx5: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:02:11 +0000 (10:02 +0000)]
net: sparx5: use phylink_generic_validate()

Sparx5 has no special behaviour in its validation implementation, so can
be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: sparx5: clean up sparx5_phylink_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 10:02:06 +0000 (10:02 +0000)]
net: sparx5: clean up sparx5_phylink_validate()

sparx5_phylink_validate() no longer needs to check for
PHY_INTERFACE_MODE_NA as phylink will walk the supported interface
types to discover the link mode capabilities. Neither is it necessary
to check the device capabilities as we will not be called for
unsupported interface modes. Remove these checks.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: sparx5: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 10:02:01 +0000 (10:02 +0000)]
net: sparx5: populate supported_interfaces member

Populate the phy_interface_t bitmap for the Microchip Sparx5 driver
with interface modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'enetc-phylink'
David S. Miller [Wed, 17 Nov 2021 11:19:28 +0000 (11:19 +0000)]
Merge branch 'enetc-phylink'

Russell King says:

====================
net: enetc: phylink validate implementation updates

This series converts enetc to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: enetc: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 09:59:08 +0000 (09:59 +0000)]
net: enetc: use phylink_generic_validate()

enetc has no special behaviour in its validation implementation, so can
be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: enetc: remove interface checks in enetc_pl_mac_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 09:59:03 +0000 (09:59 +0000)]
net: enetc: remove interface checks in enetc_pl_mac_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: enetc: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 09:58:58 +0000 (09:58 +0000)]
net: enetc: populate supported_interfaces member

Populate the phy_interface_t bitmap for the Freescale enetc driver with
interface modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'xilinx-phylink'
David S. Miller [Wed, 17 Nov 2021 11:17:44 +0000 (11:17 +0000)]
Merge branch 'xilinx-phylink'

Russell King says:

====================
net: xilinx: phylink validate implementation updates

This series converts axienet to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: axienet: use phylink_generic_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 09:55:32 +0000 (09:55 +0000)]
net: axienet: use phylink_generic_validate()

axienet has no special behaviour in its validation implementation, so
can be switched to phylink_generic_validate().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: axienet: remove interface checks in axienet_validate()
Russell King (Oracle) [Tue, 16 Nov 2021 09:55:27 +0000 (09:55 +0000)]
net: axienet: remove interface checks in axienet_validate()

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: axienet: populate supported_interfaces member
Russell King (Oracle) [Tue, 16 Nov 2021 09:55:22 +0000 (09:55 +0000)]
net: axienet: populate supported_interfaces member

Populate the phy_interface_t bitmap for the Xilinx axienet driver with
interface modes supported by the MAC.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge tag 'mlx5-updates-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 17 Nov 2021 11:03:43 +0000 (11:03 +0000)]
Merge tag 'mlx5-updates-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-11-16

Updates for mlx5 driver:

1) Support ethtool cq mode
2) Static allocation of mod header object for the common case
3) TC support for when local and remote VTEPs are in the same host
4) Create E-Switch QoS objects on demand to save on resources
5) Minor code improvements
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net/mlx5: E-switch, Create QoS on demand
Dmytro Linkin [Tue, 21 Sep 2021 16:08:38 +0000 (19:08 +0300)]
net/mlx5: E-switch, Create QoS on demand

Don't create eswitch QoS (root TSAR) on switch mode change. Create it on
first child TSAR object creation - vport or rate group. Keep track of
root TSAR references and release the root TSAR on last object deletion.
No need to check whether QoS is enabled when installing a tc matchall filter.
Remove related helper function due to no users of it.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
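
Note: the create-on-first-use, release-on-last-use pattern described in this commit can be sketched with a plain reference count. This is an illustration only; the struct and helper names are hypothetical, and the flag stands in for actually creating or destroying the root TSAR.

```c
/* Hypothetical state for an on-demand root scheduling object. */
struct qos_sketch {
    int refcnt;        /* number of child objects currently alive */
    int root_created;  /* 1 while the root object exists */
};

/* Called when a child (vport or rate group) is created. */
static void qos_child_add(struct qos_sketch *q)
{
    if (q->refcnt++ == 0)
        q->root_created = 1;   /* first child: create the root */
}

/* Called when a child is deleted. */
static void qos_child_del(struct qos_sketch *q)
{
    if (q->refcnt > 0 && --q->refcnt == 0)
        q->root_created = 0;   /* last child gone: release the root */
}
```

Nothing is allocated on a mode change; resources exist only while at least one child needs them.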
3 years ago net/mlx5: E-switch, Enable vport QoS on demand
Dmytro Linkin [Tue, 21 Sep 2021 15:45:42 +0000 (18:45 +0300)]
net/mlx5: E-switch, Enable vport QoS on demand

Vports' QoS is not commonly used but consumes SW/HW resources, which
becomes an issue on BlueField SoC systems.
Don't enable QoS on vports by default on eswitch mode change and enable
when it's going to be used by one of the top level users:
- configuring TC matchall filter with police action;
- setting rate with legacy NDO API;
- calling devlink ops->rate_leaf_*() callbacks.

Disable vport QoS on vport cleanup.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5: E-switch, move offloads mode callbacks to offloads file
Parav Pandit [Thu, 21 Oct 2021 15:21:30 +0000 (18:21 +0300)]
net/mlx5: E-switch, move offloads mode callbacks to offloads file

eswitch.c is mainly for common code between legacy and offloads mode.
MAC address get and set via devlink is applicable only in offloads mode.

Hence, move it to eswitch_offloads.c file.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5: E-switch, Reuse mlx5_eswitch_set_vport_mac
Parav Pandit [Thu, 21 Oct 2021 15:17:52 +0000 (18:17 +0300)]
net/mlx5: E-switch, Reuse mlx5_eswitch_set_vport_mac

mlx5_eswitch_set_vport_mac() routine already does necessary checks which
are duplicated in implementation of
mlx5_devlink_port_function_hw_addr_set().

Hence, reuse mlx5_eswitch_set_vport_mac() and cut down the code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5: E-switch, Remove vport enabled check
Parav Pandit [Wed, 20 Oct 2021 04:56:01 +0000 (07:56 +0300)]
net/mlx5: E-switch, Remove vport enabled check

An eswitch vport of the devlink port is always enabled before a
devlink port is registered. And an eswitch vport is always disabled
after a devlink port is unregistered.
Hence avoid the vport enabled check in the devlink callback routine.
Such check is only applicable in the legacy SR-IOV callbacks.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Sunil Sudhakar Rani <sunrani@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5e: Specify out ifindex when looking up decap route
Chris Mi [Tue, 26 Oct 2021 09:08:24 +0000 (17:08 +0800)]
net/mlx5e: Specify out ifindex when looking up decap route

There is a use case in which the local and remote VTEPs are in the same
host. Currently, the out ifindex is not specified when looking up the
decap route for offloads. So in this case, a local route is returned
and the route dev is lo.

Actual tunnel interface can be created with a parameter "dev" [1],
which specifies the physical device to use for tunnel endpoint
communication. Pass this parameter to the driver when looking up the decap
route for offloads, so that a unicast route is returned.

[1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5e: TC, Move comment about mod header flag to correct place
Roi Dayan [Wed, 10 Nov 2021 14:19:41 +0000 (16:19 +0200)]
net/mlx5e: TC, Move comment about mod header flag to correct place

Move the comment to the place where the driver actually
removes the flag, not the check for whether pedit actions exist.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5e: TC, Move kfree() calls after destroying all resources
Roi Dayan [Mon, 1 Nov 2021 16:13:02 +0000 (18:13 +0200)]
net/mlx5e: TC, Move kfree() calls after destroying all resources

When deleting fdb/nic flow rules, first release all resources
and then make the kfree() calls, instead of scattering them around
the function.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5e: TC, Destroy nic flow counter if exists
Roi Dayan [Mon, 1 Nov 2021 16:02:00 +0000 (18:02 +0200)]
net/mlx5e: TC, Destroy nic flow counter if exists

Counter is only added if counter flag exists.
So check that the counter flag exists before deleting the counter.
This is the same as in add/del fdb flow.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5: TC, using swap() instead of tmp variable
Yihao Han [Wed, 3 Nov 2021 06:21:09 +0000 (23:21 -0700)]
net/mlx5: TC, using swap() instead of tmp variable

Use swap() instead of a tmp variable to swap values.

Signed-off-by: Yihao Han <hanyihao@vivo.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5: CT: Allow static allocation of mod headers
Paul Blakey [Wed, 25 Aug 2021 13:46:41 +0000 (16:46 +0300)]
net/mlx5: CT: Allow static allocation of mod headers

As each CT rule uses at least 4 modify header actions, each rule
causes at least 3 reallocations by the mod header actions api.

Allow initial static allocation of the mod acts array, and use it for
CT rules. If the static allocation is exceeded go back to dynamic
allocation.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
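
Note: a small sketch of the "static first, dynamic on overflow" array this commit describes. It is not the driver's code; the inline capacity of 4, the int element type, and all names are assumptions chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define STATIC_ACTS 4  /* hypothetical inline capacity */

struct acts_sketch {
    int inline_buf[STATIC_ACTS]; /* storage used in the common case */
    int *acts;                   /* points at inline_buf or a heap buffer */
    size_t num;
    size_t cap;
    int on_heap;
};

static void acts_init(struct acts_sketch *a)
{
    a->acts = a->inline_buf;
    a->num = 0;
    a->cap = STATIC_ACTS;
    a->on_heap = 0;
}

/* Append one action; switch to heap storage only when the inline buffer is full. */
static int acts_push(struct acts_sketch *a, int act)
{
    if (a->num == a->cap) {
        size_t new_cap = a->cap * 2;
        int *buf;

        if (a->on_heap) {
            buf = realloc(a->acts, new_cap * sizeof(*buf));
            if (!buf)
                return -1;
        } else {
            buf = malloc(new_cap * sizeof(*buf));
            if (!buf)
                return -1;
            memcpy(buf, a->inline_buf, a->num * sizeof(*buf));
        }
        a->acts = buf;
        a->cap = new_cap;
        a->on_heap = 1;
    }
    a->acts[a->num++] = act;
    return 0;
}

static void acts_free(struct acts_sketch *a)
{
    if (a->on_heap)
        free(a->acts);
    acts_init(a);
}
```

A rule that needs no more than the inline capacity never touches the allocator, which is the saving the commit message is after.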
3 years ago net/mlx5e: Refactor mod header management API
Paul Blakey [Mon, 5 Jul 2021 08:31:47 +0000 (11:31 +0300)]
net/mlx5e: Refactor mod header management API

For all mod hdr related functions to reside in a single self contained
component (mod_hdr.c), refactor alloc() and add get_id() so that user
won't rely on internal implementation, and move both to mod_hdr
component.

Rename the prefix to mlx5e_mod_hdr_*, matching the other mod hdr functions.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years ago net/mlx5: Avoid printing health buffer when firmware is unavailable
Aya Levin [Tue, 9 Nov 2021 13:44:58 +0000 (15:44 +0200)]
net/mlx5: Avoid printing health buffer when firmware is unavailable

Use the firmware version field as an indication of the health buffer's sanity.
When firmware version is 0xFFFFFFFF, deduce that firmware is unavailable
and avoid printing the health buffer to dmesg as it doesn't provide
debug info.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
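
Note: the check described here amounts to treating an all-ones firmware version as "device unreachable". A tiny sketch, with a hypothetical macro and function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones value from the commit message; the macro name is hypothetical. */
#define FW_VERSION_UNAVAILABLE 0xFFFFFFFFu

/* Returns true when the health buffer is worth printing. */
static bool health_buffer_is_sane(uint32_t fw_ver)
{
    return fw_ver != FW_VERSION_UNAVAILABLE;
}
```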
3 years ago net/mlx5: Fix format-security build warnings
Saeed Mahameed [Wed, 3 Nov 2021 21:01:05 +0000 (14:01 -0700)]
net/mlx5: Fix format-security build warnings

Treat the string as an argument to avoid this.

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:482:5:
error: format string is not a string literal (potentially insecure)
                         name);
                         ^~~~
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:2079:4:
error: format string is not a string literal (potentially insecure)
                        ptp_ch_stats_desc[i].format);
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
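
Note: the fix pattern for this class of warning is to pass the variable string as an argument to a literal "%s" format. A short sketch using standard snprintf(); the helper name is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Copy a name into dst without letting its contents act as a format string. */
static int copy_name(char *dst, size_t len, const char *name)
{
    /* Risky form: snprintf(dst, len, name); a '%' in name would be parsed. */
    return snprintf(dst, len, "%s", name);
}
```

With the literal format, a name such as "irq%d" is copied verbatim instead of being interpreted.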
3 years ago net/mlx5e: Support ethtool cq mode
Saeed Mahameed [Wed, 15 Sep 2021 06:26:17 +0000 (23:26 -0700)]
net/mlx5e: Support ethtool cq mode

Add support for ethtool coalesce cq mode set and get.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
3 years ago net: document SMII and correct phylink's new validation mechanism
Russell King (Oracle) [Mon, 15 Nov 2021 17:11:17 +0000 (17:11 +0000)]
net: document SMII and correct phylink's new validation mechanism

SMII has not been documented in the kernel, but information on this PHY
interface mode has been recently found. Document it, and correct the
recently introduced phylink handling for this interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1mmfVl-0075nP-14@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago Merge branch 'r8169-disable-detection-of-further-chip-versions-that-didn-t-make-it...
Jakub Kicinski [Wed, 17 Nov 2021 03:10:34 +0000 (19:10 -0800)]
Merge branch 'r8169-disable-detection-of-further-chip-versions-that-didn-t-make-it-to-the-mass-market'

Heiner Kallweit says:

====================
r8169: disable detection of further chip versions that didn't make it to the mass market

There's no sign of life from further chip versions. Seems they didn't
make it to the mass market. Let's disable detection and if nobody
complains remove support a few kernel versions later.
====================

Link: https://lore.kernel.org/r/7708d13a-4a2b-090d-fadf-ecdd0fff5d2e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago r8169: disable detection of chip version 41
Heiner Kallweit [Mon, 15 Nov 2021 20:52:35 +0000 (21:52 +0100)]
r8169: disable detection of chip version 41

It seems this chip version never made it to the wild. Therefore
disable detection and if nobody complains remove support completely
later.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago r8169: disable detection of chip version 45
Heiner Kallweit [Mon, 15 Nov 2021 20:51:52 +0000 (21:51 +0100)]
r8169: disable detection of chip version 45

It seems this chip version never made it to the wild. Therefore
disable detection and if nobody complains remove support completely
later.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago r8169: disable detection of chip versions 49 and 50
Heiner Kallweit [Mon, 15 Nov 2021 20:51:14 +0000 (21:51 +0100)]
r8169: disable detection of chip versions 49 and 50

It seems these chip versions never made it to the wild. Therefore
disable detection and if nobody complains remove support completely
later.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago r8169: enable ASPM L1/L1.1 from RTL8168h
Heiner Kallweit [Mon, 15 Nov 2021 20:17:56 +0000 (21:17 +0100)]
r8169: enable ASPM L1/L1.1 from RTL8168h

With newer chip versions ASPM-related issues seem to occur only if
L1.2 is enabled. I have a test system with RTL8168h that gives a
number of rx_missed errors when running iperf and L1.2 is enabled.
With L1.2 disabled (and L1 + L1.1 active) everything is fine.
See also [0]. Can't test this, but L1 + L1.1 being active should be
sufficient to reach higher package power saving states.

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942830

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/36feb8c4-a0b6-422a-899c-e61f2e869dfe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago Merge branch 'net-better-packing-of-global-vars'
Jakub Kicinski [Wed, 17 Nov 2021 03:07:57 +0000 (19:07 -0800)]
Merge branch 'net-better-packing-of-global-vars'

Eric Dumazet says:

====================
net: better packing of global vars

First two patches avoid holes in data section,
and last patch makes sure some siphash keys are contained
in a single cache line.
====================

Link: https://lore.kernel.org/r/20211115172303.3732746-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago net: align static siphash keys
Eric Dumazet [Mon, 15 Nov 2021 17:23:03 +0000 (09:23 -0800)]
net: align static siphash keys

siphash keys use 16 bytes.

Define siphash_aligned_key_t macro so that we can make sure they
are not crossing a cache line boundary.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
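
Note: why alignment solves this can be shown in a few lines. A 16-byte object aligned to 16 bytes cannot straddle a cache line whose size is a multiple of 16 (64 bytes is assumed below). The type, macro, and helper names are hypothetical and not the kernel's definitions.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte key type, standing in for a siphash key. */
typedef struct { uint64_t key[2]; } key16_sketch;

/* Hypothetical macro in the spirit of the commit: a key with 16-byte alignment. */
#define ALIGNED_KEY alignas(16) key16_sketch

static ALIGNED_KEY demo_key;

/* Returns non-zero if [p, p + size) spans two cache lines of the given size. */
static int crosses_cache_line(const void *p, size_t size, size_t line)
{
    uintptr_t a = (uintptr_t)p;

    return a / line != (a + size - 1) / line;
}
```

An unaligned 16-byte key starting at offset 56 of a 64-byte line would span two lines; the aligned one never does.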
3 years ago net: use .data.once section in netdev_level_once()
Eric Dumazet [Mon, 15 Nov 2021 17:23:02 +0000 (09:23 -0800)]
net: use .data.once section in netdev_level_once()

Same rationale as the prior patch: using the dedicated
section avoids holes and packs all these bool values.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years ago once: use __section(".data.once")
Eric Dumazet [Mon, 15 Nov 2021 17:23:01 +0000 (09:23 -0800)]
once: use __section(".data.once")

.data.once contains nicely packed bool variables.
It is used already by DO_ONCE_LITE().

Using it also in DO_ONCE() removes holes in .data section.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
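
Note: a user-space sketch of placing a one-shot flag in a dedicated section. It assumes GCC or Clang on an ELF target, where __attribute__((section(...))) is available; the macro and function names are hypothetical, and unlike the kernel's helpers this version is not safe against concurrent callers.

```c
#include <stdbool.h>

/* Hypothetical macro: declare a bool flag placed in the .data.once section. */
#define ONCE_FLAG(name) \
    static bool name __attribute__((section(".data.once")))

static int run_count;

/* Runs its body only on the first call. */
static void do_once_sketch(void)
{
    ONCE_FLAG(done);

    if (done)
        return;
    done = true;
    run_count++;
}
```

Grouping such one-byte flags in one section keeps them adjacent, so they do not leave padding holes between larger objects in .data.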
3 years ago Bluetooth: btusb: enable Mediatek to support AOSP extension
mark-yw.chen [Thu, 4 Nov 2021 18:26:05 +0000 (02:26 +0800)]
Bluetooth: btusb: enable Mediatek to support AOSP extension

This patch enables AOSP extension for Mediatek Chip (MT7921 & MT7922).

Signed-off-by: mark-yw.chen <mark-yw.chen@mediatek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years ago Bluetooth: Attempt to clear HCI_LE_ADV on adv set terminated error event
Archie Pusaka [Thu, 11 Nov 2021 05:20:54 +0000 (13:20 +0800)]
Bluetooth: Attempt to clear HCI_LE_ADV on adv set terminated error event

We should clear the flag if the adv instance removed due to receiving
this error status is the last one we have.

Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years ago Bluetooth: Ignore HCI_ERROR_CANCELLED_BY_HOST on adv set terminated event
Archie Pusaka [Thu, 11 Nov 2021 05:20:53 +0000 (13:20 +0800)]
Bluetooth: Ignore HCI_ERROR_CANCELLED_BY_HOST on adv set terminated event

This event is received when the controller stops advertising,
specifically for these three reasons:
(a) Connection is successfully created (success).
(b) Timeout is reached (error).
(c) Number of advertising events is reached (error).
(*) This event is NOT generated when the host stops the advertisement.
Refer to the BT spec ver 5.3 vol 4 part E sec 7.7.65.18. Note that the
section was revised from BT spec ver 5.0 vol 2 part E sec 7.7.65.18
which was ambiguous about (*).

Some chips (e.g. RTL8822CE) send this event when the host stops the
advertisement with status = HCI_ERROR_CANCELLED_BY_HOST (due to (*)
above). This is treated as an error and the advertisement will be
removed and userspace will be informed via MGMT event.

On suspend, we are supposed to temporarily disable advertisements,
and continue advertising on resume. However, due to the behavior
above, the advertisements are removed instead.

This patch returns early if HCI_ERROR_CANCELLED_BY_HOST is received.

Btmon snippet of the unexpected behavior:
@ MGMT Command: Remove Advertising (0x003f) plen 1
        Instance: 1
< HCI Command: LE Set Extended Advertising Enable (0x08|0x0039) plen 6
        Extended advertising: Disabled (0x00)
        Number of sets: 1 (0x01)
        Entry 0
          Handle: 0x01
          Duration: 0 ms (0x00)
          Max ext adv events: 0
> HCI Event: LE Meta Event (0x3e) plen 6
      LE Advertising Set Terminated (0x12)
        Status: Operation Cancelled by Host (0x44)
        Handle: 1
        Connection handle: 0
        Number of completed extended advertising events: 5
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Advertising Enable (0x08|0x0039) ncmd 2
        Status: Success (0x00)

Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
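
Note: the early return this commit adds can be sketched as a small decision function. The 0x44 status value is the one shown in the btmon trace above; the macro and function names are hypothetical, and the real handler does more than this.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUCCESS           0x00
#define STATUS_CANCELLED_BY_HOST 0x44  /* value shown in the btmon trace */

/* Returns true if the advertising instance should be removed as an error. */
static bool adv_terminated_should_remove(uint8_t status)
{
    /* Host-initiated stop: return early and keep the instance for resume. */
    if (status == STATUS_CANCELLED_BY_HOST)
        return false;

    /* Any other non-zero status (timeout, event limit) takes the error path. */
    return status != STATUS_SUCCESS;
}
```

Keeping the instance on a host-initiated stop is what allows advertising to continue after resume.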
3 years ago Bluetooth: hci_request: Remove bg_scan_update work
Luiz Augusto von Dentz [Fri, 12 Nov 2021 00:48:44 +0000 (16:48 -0800)]
Bluetooth: hci_request: Remove bg_scan_update work

This work is no longer necessary since all the code using it has been
converted to use hci_passive_scan/hci_passive_scan_sync.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years ago Bluetooth: hci_sync: Convert MGMT_OP_SET_CONNECTABLE to use cmd_sync
Luiz Augusto von Dentz [Fri, 12 Nov 2021 00:48:43 +0000 (16:48 -0800)]
Bluetooth: hci_sync: Convert MGMT_OP_SET_CONNECTABLE to use cmd_sync

This makes MGMT_OP_SET_CONNECTABLE use hci_cmd_sync_queue instead of
using a dedicated connectable_update work.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years ago Bluetooth: hci_sync: Convert MGMT_OP_SET_DISCOVERABLE to use cmd_sync
Luiz Augusto von Dentz [Fri, 12 Nov 2021 00:48:42 +0000 (16:48 -0800)]
Bluetooth: hci_sync: Convert MGMT_OP_SET_DISCOVERABLE to use cmd_sync

This makes MGMT_OP_SET_DISCOVERABLE use hci_cmd_sync_queue instead of
using a dedicated discoverable_update work.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years ago Bluetooth: btmrvl_main: repair a non-kernel-doc comment
Randy Dunlap [Mon, 15 Nov 2021 03:05:17 +0000 (19:05 -0800)]
Bluetooth: btmrvl_main: repair a non-kernel-doc comment

Do not use "/**" to begin a non-kernel-doc comment.
Fixes this build warning:

drivers/bluetooth/btmrvl_main.c:2: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Cc: linux-bluetooth@vger.kernel.org
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoMerge branch 'inuse-cleanups'
David S. Miller [Tue, 16 Nov 2021 13:20:45 +0000 (13:20 +0000)]
Merge branch 'inuse-cleanups'

Eric Dumazet says:

====================
net: prot_inuse and sock_inuse cleanups

Small series cleaning and optimizing sock_prot_inuse_add()
and sock_inuse_add().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: drop nopreempt requirement on sock_prot_inuse_add()
Eric Dumazet [Mon, 15 Nov 2021 17:11:50 +0000 (09:11 -0800)]
net: drop nopreempt requirement on sock_prot_inuse_add()

This is distracting really, let's make this simpler,
because many callers had to take care of this
by themselves, even if on x86 this adds more
code than really needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: merge net->core.prot_inuse and net->core.sock_inuse
Eric Dumazet [Mon, 15 Nov 2021 17:11:49 +0000 (09:11 -0800)]
net: merge net->core.prot_inuse and net->core.sock_inuse

net->core.sock_inuse is a per cpu variable (int),
while net->core.prot_inuse is another per cpu variable
of 64 integers.

The per cpu allocator tends to place them in very different places.

Grouping them together makes sense, since it makes
updates potentially faster, if hitting the same
cache line.
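
A minimal userspace sketch of the grouping idea, not the kernel code. The
names inuse_counters, NR_CPUS_SKETCH, NR_PROTOS and the *_sketch helpers
are hypothetical, and a plain array indexed by CPU number stands in for
the per cpu allocator.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4    /* hypothetical stand-in for the real CPU count */
#define NR_PROTOS      64   /* the "64 integers" mentioned above */

/* One block per CPU: the socket total sits next to the per-protocol slots. */
struct inuse_counters {
	int all;               /* formerly a separate per cpu int */
	int val[NR_PROTOS];    /* formerly a separate per cpu array */
};

struct inuse_counters counters[NR_CPUS_SKETCH];

/* Update the per-protocol counter of one CPU. */
void prot_inuse_add_sketch(int cpu, int proto, int delta)
{
	counters[cpu].val[proto] += delta;
}

/* Update the socket total of one CPU. */
void sock_inuse_add_sketch(int cpu, int delta)
{
	counters[cpu].all += delta;
}

/* Readers sum the totals of all CPUs. */
int sock_inuse_get_sketch(void)
{
	int sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += counters[cpu].all;
	return sum;
}
```

With both counters in one structure, an update of the total and an update
of a low-numbered protocol slot on the same CPU can land in the same
cache line.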

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: make sock_inuse_add() available
Eric Dumazet [Mon, 15 Nov 2021 17:11:48 +0000 (09:11 -0800)]
net: make sock_inuse_add() available

MPTCP hard codes it, let us instead provide this helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: inline sock_prot_inuse_add()
Eric Dumazet [Mon, 15 Nov 2021 17:11:47 +0000 (09:11 -0800)]
net: inline sock_prot_inuse_add()

sock_prot_inuse_add() is very small, we can inline it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'gro-out-of-core-files'
David S. Miller [Tue, 16 Nov 2021 13:16:54 +0000 (13:16 +0000)]
Merge branch 'gro-out-of-core-files'

Eric Dumazet says:

====================
gro: get out of core files

Move GRO related content into net/core/gro.c
and include/net/gro.h.

This reduces GRO scope to where it is really needed,
and shrinks too big files (include/linux/netdevice.h
and net/core/dev.c)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: gro: populate net/core/gro.c
Eric Dumazet [Mon, 15 Nov 2021 17:05:54 +0000 (09:05 -0800)]
net: gro: populate net/core/gro.c

Move gro code and data from net/core/dev.c to net/core/gro.c
to ease maintenance.

gro_normal_list() and gro_normal_one() are inlined
because they are called from both files.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: gro: move skb_gro_receive into net/core/gro.c
Eric Dumazet [Mon, 15 Nov 2021 17:05:53 +0000 (09:05 -0800)]
net: gro: move skb_gro_receive into net/core/gro.c

net/core/gro.c will contain all core gro functions,
to shrink net/core/skbuff.c and net/core/dev.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: gro: move skb_gro_receive_list to udp_offload.c
Eric Dumazet [Mon, 15 Nov 2021 17:05:52 +0000 (09:05 -0800)]
net: gro: move skb_gro_receive_list to udp_offload.c

This helper is used once, no need to keep it in fat net/core/skbuff.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: move gro definitions to include/net/gro.h
Eric Dumazet [Mon, 15 Nov 2021 17:05:51 +0000 (09:05 -0800)]
net: move gro definitions to include/net/gro.h

include/linux/netdevice.h became too big, move gro stuff
into include/net/gro.h

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'tcp-optimizations'
David S. Miller [Tue, 16 Nov 2021 13:10:35 +0000 (13:10 +0000)]
Merge branch 'tcp-optimizations'

Eric Dumazet says:

====================
tcp: optimizations for linux-5.17

Mostly small improvements in this series.

The notable change is in "defer skb freeing after
socket lock is released" in recvmsg() (and RX zerocopy)

The idea is to leave skb freeing to the BH handler
whenever possible, or at least perform the freeing
outside of the socket lock section, for much improved
performance. This idea can probably be extended
to other protocols.

 Tests on a 100Gbit NIC
 Max throughput for one TCP_STREAM flow, over 10 runs.

 MTU : 1500  (1428 bytes of TCP payload per MSS)
 Before: 55 Gbit
 After:  66 Gbit

 MTU : 4096+ (4096 bytes of TCP payload, plus TCP/IPv6 headers)
 Before: 82 Gbit
 After:  95 Gbit
====================

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: move early demux fields close to sk_refcnt
Eric Dumazet [Mon, 15 Nov 2021 19:02:49 +0000 (11:02 -0800)]
net: move early demux fields close to sk_refcnt

sk_rx_dst/sk_rx_dst_ifindex/sk_rx_dst_cookie are read in early demux,
and currently span two cache lines.

Moving them close to sk_refcnt makes more sense, as only one cache
line is needed.

New layout for this hot cache line is :

struct sock {
	struct sock_common         __sk_common;          /*     0  0x88 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	struct dst_entry *         sk_rx_dst;            /*  0x88   0x8 */
	int                        sk_rx_dst_ifindex;    /*  0x90   0x4 */
	u32                        sk_rx_dst_cookie;     /*  0x94   0x4 */
	socket_lock_t              sk_lock;              /*  0x98  0x20 */
	atomic_t                   sk_drops;             /*  0xb8   0x4 */
	int                        sk_rcvlowat;          /*  0xbc   0x4 */
	/* --- cacheline 3 boundary (192 bytes) --- */
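
The offsets in the listing above can be checked with a little arithmetic.
This is only a sketch: it assumes 64-byte cache lines (as the listing's
boundaries imply), and the helper names are hypothetical.

```c
#include <stddef.h>

#define CACHELINE_BYTES 64   /* assumption: 64-byte cache lines */

/* Index of the cache line that holds the byte at 'offset'. */
size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/* Nonzero if 'size' bytes starting at 'offset' stay within one cache line. */
int fits_one_cacheline(size_t offset, size_t size)
{
	return cacheline_of(offset) == cacheline_of(offset + size - 1);
}
```

Calling fits_one_cacheline(0x88, 0x8 + 0x4 + 0x4) covers the three early
demux fields at once.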

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: do not call tcp_cleanup_rbuf() if we have a backlog
Eric Dumazet [Mon, 15 Nov 2021 19:02:48 +0000 (11:02 -0800)]
tcp: do not call tcp_cleanup_rbuf() if we have a backlog

Under pressure, tcp recvmsg() has logic to process the socket backlog,
but calls tcp_cleanup_rbuf() right before.

Avoiding sending ACK right before processing new segments makes
a lot of sense, as this decreases the number of ACK packets,
with no impact on effective ACK clocking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: check local var (timeo) before socket fields in one test
Eric Dumazet [Mon, 15 Nov 2021 19:02:47 +0000 (11:02 -0800)]
tcp: check local var (timeo) before socket fields in one test

Testing timeo before sk_err/sk_state/sk_shutdown makes more sense.

Modern applications use non-blocking IO, while a socket is terminated
only once during its life time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: defer skb freeing after socket lock is released
Eric Dumazet [Mon, 15 Nov 2021 19:02:46 +0000 (11:02 -0800)]
tcp: defer skb freeing after socket lock is released

tcp recvmsg() (or rx zerocopy) spends a fair amount of time
freeing skbs after their payload has been consumed.

A typical ~64KB GRO packet has to release ~45 page
references, eventually going to page allocator
for each of them.

Currently, this freeing is performed while socket lock
is held, meaning that there is a high chance that
BH handler has to queue incoming packets to tcp socket backlog.

This can cause additional latencies, because the user
thread has to process the backlog at release_sock() time,
and while doing so, additional frames can be added
by BH handler.

This patch adds logic to defer these frees until after the socket
lock is released, or to perform them directly from BH handler if possible.

Being able to free these skbs from BH handler helps a lot,
because this avoids the usual alloc/free asymmetry,
when BH handler and user thread do not run on same cpu or
NUMA node.

One cpu can now be fully utilized for the kernel->user copy,
and another cpu is handling BH processing and skb/page
allocs/frees (assuming RFS is not forcing use of a single CPU)
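
The deferral pattern can be sketched in userspace as follows. This is not
the kernel implementation: struct buf, struct sock_sketch and all helper
names are hypothetical, and the "lock" is just a flag so that the sketch
can record whether each free happened with the lock held.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for an skb and a socket; not kernel types. */
struct buf {
	struct buf *next;
};

struct sock_sketch {
	int locked;              /* 1 while the "socket lock" is held */
	struct buf *defer_list;  /* consumed buffers waiting to be freed */
	int freed_under_lock;    /* frees performed with the lock held */
	int freed_unlocked;      /* frees performed after release */
};

/* Free one buffer and record whether the lock was held at that time. */
void free_buf(struct sock_sketch *sk, struct buf *b)
{
	if (sk->locked)
		sk->freed_under_lock++;
	else
		sk->freed_unlocked++;
	free(b);
}

/* Under the lock: only link the buffer into the list, which is cheap. */
void eat_buf_deferred(struct sock_sketch *sk, struct buf *b)
{
	b->next = sk->defer_list;
	sk->defer_list = b;
}

/* Detach the list, drop the lock, then do the expensive frees. */
void release_and_free(struct sock_sketch *sk)
{
	struct buf *b = sk->defer_list;

	sk->defer_list = NULL;
	sk->locked = 0;
	while (b) {
		struct buf *next = b->next;

		free_buf(sk, b);
		b = next;
	}
}
```

The lock hold time then covers only pointer updates, which is the
property the message above relies on.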

Tested:
 100Gbit NIC
 Max throughput for one TCP_STREAM flow, over 10 runs

MTU : 1500
Before: 55 Gbit
After:  66 Gbit

MTU : 4096+(headers)
Before: 82 Gbit
After:  95 Gbit

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: avoid indirect calls to sock_rfree
Eric Dumazet [Mon, 15 Nov 2021 19:02:45 +0000 (11:02 -0800)]
tcp: avoid indirect calls to sock_rfree

TCP uses sk_eat_skb() when skbs can be removed from receive queue.
However, the call to skb_orphan() from __kfree_skb() incurs
an indirect call to sock_rfree(), which is more expensive than
a direct call, especially for CONFIG_RETPOLINE=y.

Add tcp_eat_recv_skb() function to make the call before
__kfree_skb().
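
A minimal sketch of the general trick, with hypothetical names throughout
(CALL_LIKELY, rfree_sketch, other_destructor, run_destructor are not
kernel symbols): compare the function pointer against the expected
target and, on a match, call the target by name so the compiler can emit
a direct call.

```c
/* Hypothetical destructor type; the int counter only makes calls visible. */
typedef void (*destructor_t)(int *counter);

void rfree_sketch(int *counter)
{
	(*counter)++;
}

void other_destructor(int *counter)
{
	(*counter) += 100;
}

/*
 * If the pointer equals the expected function, call that function by
 * name (a direct call); otherwise fall back to the indirect call.
 */
#define CALL_LIKELY(fn, expected, arg) \
	((fn) == (expected) ? expected(arg) : (fn)(arg))

void run_destructor(destructor_t fn, int *counter)
{
	CALL_LIKELY(fn, rfree_sketch, counter);
}
```

The comparison costs one well-predicted branch, which is usually cheaper
than a retpoline-protected indirect call.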

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: tp->urg_data is unlikely to be set
Eric Dumazet [Mon, 15 Nov 2021 19:02:44 +0000 (11:02 -0800)]
tcp: tp->urg_data is unlikely to be set

Use some unlikely() hints in the fast path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: annotate races around tp->urg_data
Eric Dumazet [Mon, 15 Nov 2021 19:02:43 +0000 (11:02 -0800)]
tcp: annotate races around tp->urg_data

tcp_poll() and tcp_ioctl() are reading tp->urg_data without socket lock
owned.

Also, it is faster to first check tp->urg_data in tcp_poll(),
then tp->urg_seq == tp->copied_seq, because tp->urg_seq is
located in a different/cold cache line.
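
A userspace sketch of both points, under stated assumptions: the macros
below are hypothetical stand-ins for the kernel's annotation helpers
(they rely on the GNU __typeof__ extension), and struct tp_sketch is an
illustrative subset, not the real struct tcp_sock.

```c
#include <stdint.h>

/* Hypothetical stand-ins: force a single volatile access to the object. */
#define READ_ONCE_SKETCH(x)      (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

struct tp_sketch {
	uint16_t urg_data;    /* hot field, usually zero, tested first */
	uint32_t urg_seq;     /* assumed to live in a colder cache line */
	uint32_t copied_seq;
};

/*
 * Lockless reader: the cheap test comes first, so the colder fields are
 * only loaded when urg_data is set.
 */
int urgent_check_sketch(const struct tp_sketch *tp)
{
	return READ_ONCE_SKETCH(tp->urg_data) &&
	       READ_ONCE_SKETCH(tp->urg_seq) == READ_ONCE_SKETCH(tp->copied_seq);
}
```

Because && short-circuits, the common case (urg_data == 0) never touches
urg_seq.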

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: annotate data-races on tp->segs_in and tp->data_segs_in
Eric Dumazet [Mon, 15 Nov 2021 19:02:42 +0000 (11:02 -0800)]
tcp: annotate data-races on tp->segs_in and tp->data_segs_in

tcp_segs_in() can be called from BH while the socket spinlock
is held but the socket is owned by the user, who may be reading
these fields from tcp_get_info() at the same time.

Found by code inspection, no need to backport this patch
to older kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: add RETPOLINE mitigation to sk_backlog_rcv
Eric Dumazet [Mon, 15 Nov 2021 19:02:41 +0000 (11:02 -0800)]
tcp: add RETPOLINE mitigation to sk_backlog_rcv

Use INDIRECT_CALL_INET() to avoid an indirect call
when/if CONFIG_RETPOLINE=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: small optimization in tcp recvmsg()
Eric Dumazet [Mon, 15 Nov 2021 19:02:40 +0000 (11:02 -0800)]
tcp: small optimization in tcp recvmsg()

When reading large chunks of data, incoming packets might
be added to the backlog from BH.

tcp recvmsg() detects the backlog queue is not empty, and uses
a release_sock()/lock_sock() pair to process this backlog.

We now have __sk_flush_backlog() to perform this
a bit faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: cache align tcp_memory_allocated, tcp_sockets_allocated
Eric Dumazet [Mon, 15 Nov 2021 19:02:39 +0000 (11:02 -0800)]
net: cache align tcp_memory_allocated, tcp_sockets_allocated

tcp_memory_allocated and tcp_sockets_allocated often share
a common cache line, source of false sharing.
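
A C11 sketch of the remedy, assuming 64-byte cache lines. The struct and
variable names are hypothetical, and the kernel uses its own alignment
annotations rather than alignas.

```c
#include <stdalign.h>
#include <stdint.h>

#define CACHELINE_BYTES 64   /* assumption: typical cache line size */

/* Each counter starts its own cache line, so writers do not collide. */
struct aligned_counter {
	alignas(CACHELINE_BYTES) long value;
};

struct aligned_counter memory_allocated_sketch;
struct aligned_counter sockets_allocated_sketch;

/* Cache line index of an address. */
uintptr_t line_of(const void *p)
{
	return (uintptr_t)p / CACHELINE_BYTES;
}
```

The cost is memory: each counter now occupies a full cache line instead
of a few bytes.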

Also take care of udp_memory_allocated and mptcp_sockets_allocated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: forward_alloc_get depends on CONFIG_MPTCP
Eric Dumazet [Mon, 15 Nov 2021 19:02:38 +0000 (11:02 -0800)]
net: forward_alloc_get depends on CONFIG_MPTCP

(struct proto)->sk_forward_alloc is currently only used by MPTCP.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: shrink struct sock by 8 bytes
Eric Dumazet [Mon, 15 Nov 2021 19:02:37 +0000 (11:02 -0800)]
net: shrink struct sock by 8 bytes

Move sk_bind_phc next to sk_peer_lock to fill a hole.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: shrink struct ipcm6_cookie
Eric Dumazet [Mon, 15 Nov 2021 19:02:36 +0000 (11:02 -0800)]
ipv6: shrink struct ipcm6_cookie

gso_size can be moved after tclass, to use an existing hole.
(8 bytes saved on 64bit arches)
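
How moving one member can save 8 bytes is easiest to see on a toy
structure. The two structs below are illustrative only and do not
reproduce the real struct ipcm6_cookie layout; the exact sizes depend on
the ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Small members separated by a pointer: padding after each of them. */
struct with_hole {
	uint8_t  tclass;     /* 1 byte, then padding up to the pointer */
	void    *opt;
	uint16_t gso_size;   /* 2 bytes, then tail padding */
};

/* Same members, with gso_size moved into the padding after tclass. */
struct hole_filled {
	uint8_t  tclass;
	uint16_t gso_size;
	void    *opt;
};

/* Bytes saved by the reordering on the current ABI. */
size_t bytes_saved_sketch(void)
{
	return sizeof(struct with_hole) - sizeof(struct hole_filled);
}
```

On a typical LP64 ABI the first layout pads twice to pointer alignment,
while the second pads once.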

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: remove sk_route_nocaps
Eric Dumazet [Mon, 15 Nov 2021 19:02:35 +0000 (11:02 -0800)]
net: remove sk_route_nocaps

Instead of using a full netdev_features_t, we can use a single bit,
as sk_route_nocaps is only used to remove NETIF_F_GSO_MASK from
sk->sk_route_cap.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: remove sk_route_forced_caps
Eric Dumazet [Mon, 15 Nov 2021 19:02:34 +0000 (11:02 -0800)]
net: remove sk_route_forced_caps

We were only using one bit, and we can replace it by sk_is_tcp()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: use sk_is_tcp() in more places
Eric Dumazet [Mon, 15 Nov 2021 19:02:33 +0000 (11:02 -0800)]
net: use sk_is_tcp() in more places

Move sk_is_tcp() to include/net/sock.h and use it where we can.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: small optimization in tcp_v6_send_check()
Eric Dumazet [Mon, 15 Nov 2021 19:02:32 +0000 (11:02 -0800)]
tcp: small optimization in tcp_v6_send_check()

For TCP flows, inet6_sk(sk)->saddr has the same value
as sk->sk_v6_rcv_saddr.

Using sk->sk_v6_rcv_saddr increases data locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: remove dead code in __tcp_v6_send_check()
Eric Dumazet [Mon, 15 Nov 2021 19:02:31 +0000 (11:02 -0800)]
tcp: remove dead code in __tcp_v6_send_check()

For some reason, I forgot to change __tcp_v6_send_check() at
the same time I removed (ip_summed == CHECKSUM_PARTIAL) check
in __tcp_v4_send_check()

Fixes: 72334e1f93f2 ("tcp: remove dead code after CHECKSUM_PARTIAL adoption")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: minor optimization in tcp_add_backlog()
Eric Dumazet [Mon, 15 Nov 2021 19:02:30 +0000 (11:02 -0800)]
tcp: minor optimization in tcp_add_backlog()

If packet is going to be coalesced, sk_sndbuf/sk_rcvbuf values
are not used. Defer their access to the point we need them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoBluetooth: Don't initialize msft/aosp when using user channel
Jesse Melhuish [Mon, 15 Nov 2021 22:00:52 +0000 (22:00 +0000)]
Bluetooth: Don't initialize msft/aosp when using user channel

A race condition is triggered when usermode control is given to
userspace before the kernel's MSFT query responds, resulting in an
unexpected response to userspace's reset command.

Issue can be observed in btmon:

< HCI Command: Vendor (0x3f|0x001e) plen 2                    #3 [hci0]
        05 01                                            ..
@ USER Open: bt_stack_manage (privileged) version 2.22  {0x0002} [hci0]
< HCI Command: Reset (0x03|0x0003) plen 0                     #4 [hci0]
> HCI Event: Command Complete (0x0e) plen 5                   #5 [hci0]
      Vendor (0x3f|0x001e) ncmd 1
        Status: Command Disallowed (0x0c)
        05                                               .
> HCI Event: Command Complete (0x0e) plen 4                   #6 [hci0]
      Reset (0x03|0x0003) ncmd 2
        Status: Success (0x00)

Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org>
Signed-off-by: Jesse Melhuish <melhuishj@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoBluetooth: fix uninitialized variables notify_evt
Jackie Liu [Tue, 16 Nov 2021 01:17:17 +0000 (09:17 +0800)]
Bluetooth: fix uninitialized variables notify_evt

Coverity Scan report:

[...]
*** CID 1493985:  Uninitialized variables  (UNINIT)
/net/bluetooth/hci_event.c: 4535 in hci_sync_conn_complete_evt()
4529
4530      /* Notify only in case of SCO over HCI transport data path which
4531       * is zero and non-zero value shall be non-HCI transport data path
4532       */
4533      if (conn->codec.data_path == 0) {
4534      if (hdev->notify)
>>>     CID 1493985:  Uninitialized variables  (UNINIT)
>>>     Using uninitialized value "notify_evt" when calling "*hdev->notify".
4535      hdev->notify(hdev, notify_evt);
4536      }
4537
4538      hci_connect_cfm(conn, ev->status);
4539      if (ev->status)
4540      hci_conn_del(conn);
[...]

Although only btusb uses air_mode, and it only handles HCI_NOTIFY_ENABLE_SCO_CVSD
and HCI_NOTIFY_ENABLE_SCO_TRANSP, there is still a very small chance that
ev->air_mode is neither 0x2 nor 0x3. If notify_evt were simply initialized to
HCI_NOTIFY_ENABLE_SCO_CVSD or HCI_NOTIFY_ENABLE_SCO_TRANSP, the notification
sent in that case might not be correct.

Let us directly use the required function instead of re-initializing it,
so as to restore the original logic and make the code more correct.

Addresses-Coverity: ("Uninitialized variables")
Fixes: aaba6aa6dcc9 ("Bluetooth: Allow usb to auto-suspend when SCO use non-HCI transport")
Suggested-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoBluetooth: stop proccessing malicious adv data
Pavel Skripkin [Mon, 1 Nov 2021 07:12:12 +0000 (10:12 +0300)]
Bluetooth: stop proccessing malicious adv data

Syzbot reported a slab-out-of-bounds read in hci_le_adv_report_evt(). The
problem was a missing validation check.

We should check that the data is not malicious and that we can read the next
data block. If we do not check ptr validity, the code can read way beyond
skb->end, which can cause problems, of course.
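
The kind of check described above can be sketched on a generic
length-prefixed buffer. This is an illustration only: count_records and
the one-byte-length record format are hypothetical and do not mirror the
actual advertising report layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count length-prefixed records in [data, data + len).
 * Each record is one length byte followed by that many payload bytes.
 * Returns -1 as soon as a record would extend past the end of the buffer.
 */
int count_records(const uint8_t *data, size_t len)
{
	const uint8_t *ptr = data;
	const uint8_t *end = data + len;
	int count = 0;

	while (ptr < end) {
		size_t rec_len = *ptr;

		/* Validate before touching the payload or advancing. */
		if ((size_t)(end - ptr) < 1 + rec_len)
			return -1;
		ptr += 1 + rec_len;
		count++;
	}
	return count;
}
```

The comparison is written in terms of the bytes remaining, so ptr is
never advanced past end.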

Fixes: d54e041cd6dd ("Bluetooth: hci_le_adv_report_evt code refactoring")
Reported-and-tested-by: syzbot+e3fcb9c4f3c2a931dc40@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoBluetooth: hci_h4: Fix padding calculation error within h4_recv_buf()
Zijun Hu [Tue, 16 Nov 2021 08:51:38 +0000 (16:51 +0800)]
Bluetooth: hci_h4: Fix padding calculation error within h4_recv_buf()

It is erroneous to calculate the padding by subtracting the length of the
type indication from skb->len: this causes data analysis errors for
alignments greater than 1. Fix it by adding the length of the type
indication to skb->len instead.
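
A sketch of the arithmetic, under the assumption that the type indication
is a single byte that precedes the packet and is not counted in the
packet length. padding_sketch and TYPE_INDICATOR_LEN are hypothetical
names, not the driver's code.

```c
#include <stddef.h>

#define TYPE_INDICATOR_LEN 1   /* assumption: one packet-type byte */

/*
 * Padding bytes needed so that type indication plus packet ends on an
 * alignment boundary. An alignment of 0 is treated as "no alignment".
 */
size_t padding_sketch(size_t pkt_len, size_t alignment)
{
	if (alignment == 0)
		alignment = 1;
	return (alignment - (pkt_len + TYPE_INDICATOR_LEN) % alignment) % alignment;
}
```

For a 7-byte packet and an alignment of 4, the frame is already 8 bytes
long, so no padding is expected; subtracting the indicator length instead
would compute (7 - 1) % 4 = 2 and ask for 2 padding bytes.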

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agonet: macb: Fix several edge cases in validate
Sean Anderson [Fri, 12 Nov 2021 19:04:00 +0000 (14:04 -0500)]
net: macb: Fix several edge cases in validate

There were several cases where validate() would return bogus supported
modes with unusual combinations of interfaces and capabilities. For
example, if state->interface was 10GBASER and the macb had HIGH_SPEED
and PCS but not GIGABIT MODE, then 10/100 modes would be set anyway. In
another case, SGMII could be enabled even if the mac was not a GEM
(despite this being checked for later on in mac_config()). These
inconsistencies make it difficult to refactor this function cleanly.

There is still the open question of what exactly the requirements for
SGMII and 10GBASER are, and what SGMII actually supports. If someone
from Cadence (or anyone else with access to the GEM/MACB datasheet)
could comment on this, it would be greatly appreciated. In particular,
what is supported by Cadence vs. vendor extension/limitation?

To address this, the current logic is split into three parts. First, we
determine what we support, then we eliminate unsupported interfaces, and
finally we set the appropriate link modes. There is still some cruft
related to NA, but this can be removed in a future patch.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Parshuram Thombare <pthombar@cadence.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20211112190400.1937855-1-sean.anderson@seco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Jakub Kicinski [Mon, 15 Nov 2021 16:49:20 +0000 (08:49 -0800)]
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2021-11-15

We've added 72 non-merge commits during the last 13 day(s) which contain
a total of 171 files changed, 2728 insertions(+), 1143 deletions(-).

The main changes are:

1) Add btf_type_tag attributes to bring kernel annotations like __user/__rcu to
   BTF such that BPF verifier will be able to detect misuse, from Yonghong Song.

2) Big batch of libbpf improvements including various fixes, future proofing APIs,
   and adding a unified, OPTS-based bpf_prog_load() low-level API, from Andrii Nakryiko.

3) Add ingress_ifindex to BPF_SK_LOOKUP program type for selectively applying the
   programmable socket lookup logic to packets from a given netdev, from Mark Pashmfouroush.

4) Remove the 128M upper JIT limit for BPF programs on arm64 and add selftest to
   ensure exception handling still works, from Russell King and Alan Maguire.

5) Add a new bpf_find_vma() helper for tracing to map an address to the backing
   file such as shared library, from Song Liu.

6) Batch of various misc fixes to bpftool, fixing a memory leak in BPF program dump,
   updating documentation and bash-completion among others, from Quentin Monnet.

7) Deprecate libbpf bpf_program__get_prog_info_linear() API and migrate its users as
   the API is heavily tailored around perf and is non-generic, from Dave Marchevsky.

8) Enable libbpf's strict mode by default in bpftool and add a --legacy option as an
   opt-out for more relaxed BPF program requirements, from Stanislav Fomichev.

9) Fix bpftool to use libbpf_get_error() to check for errors, from Hengqi Chen.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
  bpftool: Use libbpf_get_error() to check error
  bpftool: Fix mixed indentation in documentation
  bpftool: Update the lists of names for maps and prog-attach types
  bpftool: Fix indent in option lists in the documentation
  bpftool: Remove inclusion of utilities.mak from Makefiles
  bpftool: Fix memory leak in prog_dump()
  selftests/bpf: Fix a tautological-constant-out-of-range-compare compiler warning
  selftests/bpf: Fix an unused-but-set-variable compiler warning
  bpf: Introduce btf_tracing_ids
  bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs
  bpftool: Enable libbpf's strict mode by default
  docs/bpf: Update documentation for BTF_KIND_TYPE_TAG support
  selftests/bpf: Clarify llvm dependency with btf_tag selftest
  selftests/bpf: Add a C test for btf_type_tag
  selftests/bpf: Rename progs/tag.c to progs/btf_decl_tag.c
  selftests/bpf: Test BTF_KIND_DECL_TAG for deduplication
  selftests/bpf: Add BTF_KIND_TYPE_TAG unit tests
  selftests/bpf: Test libbpf API function btf__add_type_tag()
  bpftool: Support BTF_KIND_TYPE_TAG
  libbpf: Support BTF_KIND_TYPE_TAG
  ...
====================

Link: https://lore.kernel.org/r/20211115162008.25916-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoRevert "Merge branch 'mctp-i2c-driver'"
Jakub Kicinski [Mon, 15 Nov 2021 15:49:46 +0000 (07:49 -0800)]
Revert "Merge branch 'mctp-i2c-driver'"

This reverts commit 6fa390a81f44a84575e558232afcbd2ceb1c2391, reversing
changes made to 47d95f289c07773b12e946055cac8942b31cb704.

Wolfram Sang says:

Please revert. Besides the driver in net, it modifies the I2C core
code. This has not been acked by the I2C maintainer (in this case me).
So, please don't pull this in via the net tree. The question raised here
(extending SMBus calls to 255 byte) is complicated because we need ABI
backwards compatibility.

Link: https://lore.kernel.org/all/YZJ9H4eM%2FM7OXVN0@shikoro/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'generic-phylink-validation'
David S. Miller [Mon, 15 Nov 2021 14:31:00 +0000 (14:31 +0000)]
Merge branch 'generic-phylink-validation'

Russell King says:

====================
introduce generic phylink validation

The various validate method implementations we have in phylink users
have been quite repetitive but also prone to bugs. These patches
introduce a generic implementation which relies solely on the
supported_interfaces bitmap introduced during last cycle, and in the
first patch, a bit array of MAC capabilities.

MAC drivers are free to continue to do their own thing if they have
special requirements - such as mvneta and mvpp2 which do not support
1000base-X without AN enabled. Most implementations currently in the
kernel can be converted to call phylink_generic_validate() directly
from the phylink MAC operations structure once they fill in the
supported_interfaces and mac_capabilities members of phylink_config.

This series introduces the generic implementation, and converts mvneta
and mvpp2 to use it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mvpp2: use phylink_generic_validate()
Russell King (Oracle) [Mon, 15 Nov 2021 10:00:37 +0000 (10:00 +0000)]
net: mvpp2: use phylink_generic_validate()

Convert mvpp2 to use phylink_generic_validate() for the bulk of its
validate() implementation. This network adapter has a restriction
that for 802.3z links, autonegotiation must be enabled.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mvneta: use phylink_generic_validate()
Russell King (Oracle) [Mon, 15 Nov 2021 10:00:32 +0000 (10:00 +0000)]
net: mvneta: use phylink_generic_validate()

Convert mvneta to use phylink_generic_validate() for the bulk of its
validate() implementation. This network adapter has a restriction
that for 802.3z links, autonegotiation must be enabled.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>