René van Dorst [Mon, 2 Sep 2019 13:02:26 +0000 (15:02 +0200)]
net: dsa: mt7530: Add support for port 5
Adding support for port 5.
Port 5 can muxed/interface to:
- internal 5th GMAC of the switch; can be used as 2nd CPU port or as
extra port with an external phy for a 6th ethernet port.
- internal PHY of port 0 or 4; Used in most applications so that port 0
or 4 is the WAN port and interfaces with the 2nd GMAC of the SOC.
Signed-off-by: René van Dorst <opensource@vdorst.com> Tested-by: Frank Wunderlich <frank-w@public-files.de> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
René van Dorst [Mon, 2 Sep 2019 13:02:25 +0000 (15:02 +0200)]
dt-bindings: net: dsa: mt7530: Add support for port 5
MT7530 port 5 has many modes/configurations.
Update the documentation how to use port 5.
Signed-off-by: René van Dorst <opensource@vdorst.com> Cc: devicetree@vger.kernel.org Cc: Rob Herring <robh@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
René van Dorst [Mon, 2 Sep 2019 13:02:24 +0000 (15:02 +0200)]
net: dsa: mt7530: Convert to PHYLINK API
Convert mt7530 to PHYLINK API
Signed-off-by: René van Dorst <opensource@vdorst.com> Tested-by: Frank Wunderlich <frank-w@public-files.de> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Mon, 2 Sep 2019 11:52:28 +0000 (19:52 +0800)]
r8152: modify rtl8152_set_speed function
First, for AUTONEG_DISABLE, we only need to modify MII_BMCR.
Second, add advertising parameter for rtl8152_set_speed(). Add
RTL_ADVERTISED_xxx for advertising parameter of rtl8152_set_speed().
Then, the advertising settings from ethtool could be saved.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dpaa2-eth: Poll Tx pending frames counter on if down
Starting with firmware version MC10.18.0, a new counter for in flight
Tx frames is offered. Use it when bringing down the interface to
determine when all pending Tx frames have been processed by hardware
instead of sleeping a fixed amount of time.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Sep 2019 04:51:25 +0000 (21:51 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-09-03
This series contains updates to ice driver only.
Anirudh adds the ability for the driver to handle EMP resets correctly
by adding the logic to the existing ice_reset_subtask().
Jeb fixes up the logic to properly free up the resources for a switch
rule whether or not it was successful in the removal.
Brett fixes up the reporting of ITR values to let the user know odd ITR
values are not allowed. Fixes the driver to only disable VLAN pruning
on VLAN deletion when the VLAN being deleted is the last VLAN on the VF
VSI.
Chinh updates the driver to determine the TSA value from the priority
value when in CEE mode.
Bruce aligns the driver with the hardware specification by ensuring that
a PF reset is done as part of the unload logic. Also update the driver
unloading field, based on the latest hardware specification, which
allows us to remove an unnecessary endian conversion. Moves #defines
based on their need in the code.
Jesse adds the current state of auto-negotiation in the link up message.
In addition, adds additional information to inform the user of an issue
with the topology/configuration of the link.
Usha updates the driver to allow the maximum TCs that the firmware
supports, rather than hard coding to a set value.
Dave updates the DCB initialization flow to handle the case of an actual
error during DCB init. Updated the driver to report the current stats,
even when the netdev is down, which aligns with our other drivers.
Mitch fixes the VF reset code flows to ensure that it properly calls
ice_dis_vsi_txq() to notify the firmware that the VF is being reset.
Michal fixes the driver so the DCB is not enabled when the SW LLDP is
activated, which was causing a communication issue with other NICs. The
problem lies in that DCB was being enabled without checking the number
of TCs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Abstract:
--------
Mellanox ConnetX devices supports packet matching, packet modification and
redirection. These functionalities are also referred to as flow-steering.
To configure a steering rule, the rule is written to the device owned
memory, this memory is accessed and cached by the device when processing
a packet.
Steering rules are constructed from multiple steering entries (STE).
Rules are configured using the Firmware command interface. The Firmware
processes the given driver command and translates them to STEs, then
writes them to the device memory in the current steering tables.
This process is slow due to the architecture of the command interface and
the processing complexity of each rule.
The highlight of this patchset is to cut the middle man (The firmware) and
do steering rules programming into device directly from the driver, with
no firmware intervention whatsoever.
Motivation:
-----------
Software (driver managed) steering allows for high rule insertion rates
compared to the FW steering described above, this is achieved by using
internal RDMA writes to the device owned memory instead of the slow
command interface to program steering rules.
Software (driver managed) steering, doesn't depend on new FW
for new steering functionality, new implementations can be done in the
driver skipping the FW layer.
Performance:
------------
The insertion rate on a single core using the new approach allows
programming ~300K rules per sec. (Done via direct raw test to the new mlx5
sw steering layer, without any kernel layer involved).
Test: TC L2 rules
33K/s with Software steering (this patchset).
5K/s with FW and current driver.
This will improve OVS based solution performance.
Architecture and implementation details:
----------------------------------------
Software steering will be dynamically selected via devlink device
parameter. Example:
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
mlx5 software steering module a.k.a (DR - Direct Rule) is implemented
and contained in mlx5/core/steering directory and controlled by
MLX5_SW_STEERING kconfig flag.
mlx5 core steering layer (fs_core) already provides a shim layer for
implementing different steering mechanisms, software steering will
leverage that as seen at the end of this series.
When Software Steering for a specific steering domain
(NIC/RDMA/Vport/ESwitch, etc ..) is supported, it will cause rules
targeting this domain to be created using SW steering instead of FW.
The implementation includes:
Domain - The steering domain is the object that all other object resides
in. It holds the memory allocator, send engine, locks and other shared
data needed by lower objects such as table, matcher, rule, action.
Each domain can contain multiple tables. Domain is equivalent to
namespaces e.g (NIC/RDMA/Vport/ESwitch, etc ..) as implemented
currently in mlx5_core fs_core (flow steering core).
Table - Table objects are used for holding multiple matchers, each table
has a level used to prevent processing loops. Packets are being
directed to this table once it is set as the root table, this is done
by fs_core using a FW command. A packet is being processed inside the
table matcher by matcher until a successful hit, otherwise the packet
will perform the default action.
Matcher - Matchers objects are used to specify the fields mask for
matching when processing a packet. A matcher belongs to a table, each
matcher can hold multiple rules, each rule with different matching
values corresponding to the matcher mask. Each matcher has a priority
used for rule processing order inside the table.
Action - Action objects are created to specify different steering actions
such as count, reformat (encapsulate, decapsulate, ...), modify
header, forward to table and many other actions. When creating a rule
a sequence of actions can be provided to be executed on a successful
match.
Rule - Rule objects are used to specify a specific match on packets as
well as the actions that should be executed. A rule belongs to a
matcher.
STE - This layer is used to hold the specific STE format for the device
and to convert the requested rule to STEs. Each rule is constructed of
an STE chain, Multiple rules construct a steering graph. Each node in
the graph is a hash table containing multiple STEs. The index of each
STE in the hash table is being calculated using a CRC32 hash function.
Memory pool - Used for managing and caching device owned memory for rule
insertion. The memory is being allocated using DM (device memory) API.
Communication with device - layer for standard RDMA operation using RC QP
to configure the device steering.
Command utility - This module holds all of the FW commands that are
required for SW steering to function.
Patch planning and files:
-------------------------
1) First patch, adds the support to Add flow steering actions to fs_cmd
shim layer.
2) Next 12 patch will add a file per each Software steering
functionality/module as described above. (See patches with title: DR, *)
3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
build with the new files
4) Next two patches will add the support for software steering in mlx5
steering shim layer
net/mlx5: Add API to set the namespace steering mode
net/mlx5: Add direct rule fs_cmd implementation
5) Last two patches will add the new devlink parameter to select mlx5
steering mode, will be valid only for switchdev mode for now.
Two modes are supported:
1. DMFS - Device managed flow steering
2. SMFS - Software/Driver managed flow steering.
In the DMFS mode, the HW steering entities are created through the
FW. In the SMFS mode this entities are created though the driver
directly.
The driver will use the devlink steering mode only if the steering
domain supports it, for now SMFS will manages only the switchdev
eswitch steering domain.
User command examples:
- Set SMFS flow steering mode::
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
- Read device flow steering mode::
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brett Creeley [Thu, 8 Aug 2019 14:39:30 +0000 (07:39 -0700)]
ice: Only disable VLAN pruning for the VF when all VLANs are removed
Currently if the VF adds a VLAN, VLAN pruning will be enabled for that VSI.
Also, when a VLAN gets deleted it will disable VLAN pruning even if other
VLAN(s) exists for the VF. Fix this by only disabling VLAN pruning on the
VF VSI when removing the last VF (i.e. vf->num_vlan == 0).
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove code that enables DCB in initialization when SW LLDP is
activated. DCB flag is set or reset before in ice_init_pf_dcb
based on number of TCs. So there is not need to overwrite it.
Setting DCB without checking number of TCs can cause communication
problems with other cards. Host card sends packet with VLAN priority
tag, but client card doesn't strip this tag and ping doesn't work.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dave Ertman [Thu, 8 Aug 2019 14:39:28 +0000 (07:39 -0700)]
ice: Report stats when VSI is down
There is currently a check in get_ndo_stats that
returns before updating stats if the VSI is down
or there are no Tx or Rx queues. This causes the
netdev to report zero stats with the netdev is down.
Remove the check so that the behavior of reporting
stats is the same as it was in IXGBE.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 8 Aug 2019 14:39:26 +0000 (07:39 -0700)]
ice: Always notify FW of VF reset
The call to ice_dis_vsi_txq() acts as the notification to the firmware
that the VF is being reset. Because of this, we need to make this call
every time we reset, regardless of whatever else we do to stop the Tx
queues.
Without this change, VF resets would fail to complete on interfaces that
were up and running.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dave Ertman [Thu, 8 Aug 2019 14:39:25 +0000 (07:39 -0700)]
ice: Correctly handle return values for init DCB
In the init path for DCB, the call to ice_init_dcb()
can return a non-zero value for either an actual
error, or due to the FW lldp engine being stopped.
We are currently treating all non-zero values only as
an indication that the FW LLDP engine is stopped.
Check for an actual error in the DCB init flow.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Usha Ketineni [Thu, 8 Aug 2019 14:39:24 +0000 (07:39 -0700)]
ice: Limit Max TCs on devices with more than 4 ports
This patch limits the max TCs set by the driver to the value provided by
the firmware as per the capabilities of the device. Otherwise, hard coding
to 8 TC max would fail the device configurations with more than 4 ports.
Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Fri, 2 Aug 2019 08:25:33 +0000 (01:25 -0700)]
ice: Cleanup defines in ice_type.h
Conventionally, if the #defines/other are not needed by other header
files being included, #includes are done first followed by #defines
and other stuff. Move the #defines before the #includes to follow this
convention.
Suggested by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 2 Aug 2019 08:25:30 +0000 (01:25 -0700)]
ice: update driver unloading field for Queue Shutdown AQ command
According to recent specification versions, the field in the Queue Shutdown
AdminQ command consisting of the "driver unloading" indication is not a 4
byte field (it is byte.bit 16.0). Change it to a byte and remove the
unnecessary endian conversion.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 2 Aug 2019 08:25:29 +0000 (01:25 -0700)]
ice: add needed PFR during driver unload
According to the specification, a PF Reset must be done as part of the
driver unload flow.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Chinh T Cao [Fri, 2 Aug 2019 08:25:28 +0000 (01:25 -0700)]
ice: Deduce TSA value from the priority value in the CEE mode
In CEE mode, the TSA information can be derived from the reported
priority value.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Brett Creeley [Fri, 2 Aug 2019 08:25:27 +0000 (01:25 -0700)]
ice: Report what the user set for coalesce [tx|rx]-usecs
Currently if the user sets an odd value for [tx|rx]-usecs we align the
value because the hardware only understands ITR values in multiples of
2. This seems misleading because we are essentially telling the user
that the ITR value is odd, when in fact we have changed it internally.
Fix this by reporting that setting odd ITR values is not allowed.
Also, while making changes to ice_set_rc_coalesce() I noticed a bit of
code/error duplication. Make the necessary changes to remove the
duplication.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeb Cramer [Fri, 2 Aug 2019 08:25:26 +0000 (01:25 -0700)]
ice: Fix resource leak in ice_remove_rule_internal()
We don't free s_rule if ice_aq_sw_rules() returns a non-zero status. If
it returned a zero status, s_rule would be freed right after, so this
implies it should be freed within the scope of the function regardless.
Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add new parameter (flow_steering_mode) to control the flow steering
mode of the driver.
Two modes are supported:
1. DMFS - Device managed flow steering
2. SMFS - Software/Driver managed flow steering.
In the DMFS mode, the HW steering entities are created through the
FW. In the SMFS mode this entities are created though the driver
directly.
The driver will use the devlink steering mode only if the steering
domain supports it, for now SMFS will manages only the switchdev eswitch
steering domain.
User command examples:
- Set SMFS flow steering mode::
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
- Read device flow steering mode::
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Maor Gottlieb [Sun, 18 Aug 2019 16:18:11 +0000 (19:18 +0300)]
net/mlx5: Add support to use SMFS in switchdev mode
In case that flow steering mode of the driver is SMFS (Software Managed
Flow Steering), then use the DR (SW steering) API to create the steering
objects.
In addition, add a call to the set peer namespace when switchdev gets
devcom pair event. It is required to support VF LAG in SMFS.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Maor Gottlieb [Sun, 18 Aug 2019 16:15:22 +0000 (19:15 +0300)]
net/mlx5: Add API to set the namespace steering mode
Add API to set the flow steering root namesapce mode.
Setting new mode should be called before any steering operation
is executed on the namespace.
This API is going to be used by steering users such switchdev.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Maor Gottlieb [Tue, 20 Aug 2019 07:06:48 +0000 (10:06 +0300)]
net/mlx5: Add direct rule fs_cmd implementation
Add support to create flow steering objects
via direct rule API (SW steering).
New layer is added - fs_dr, this layer translates the command that
fs_core sends to the FW into direct rule API. In case that direct
rule is not supported in some feature then -EOPNOTSUPP is
returned.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
SW steering is capable of doing many steering functionalities
but there are still some functionalities which are not exposed
to upper layers and therefore performed by the FW.
This is the support for recalculating checksum using a hairpin QP.
The recalculation is required after a modify TTL action which skips
the needed CS calculation in HW.
Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alex Vesker [Mon, 19 Aug 2019 10:28:35 +0000 (13:28 +0300)]
net/mlx5: DR, Expose steering rule functionality
Rules are the actual objects that tie matchers, header values and
actions. Each rule belongs to a matcher, which can hold multiple rules
sharing the same mask. Each rule is a specific set of values and
actions.
When a packet reaches a matcher it is being matched against the
matcher`s rules. In case of a match over a rule its actions will be
executed. Each rule object contains a set of STEs, where each STE is a
definition of match values and actions defined by the rule.
This file handles the rule operations and processing.
Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On rule creation a set of actions can be provided, the actions describe
what to do with the packet in case of a match. It is possible to provide
a set of actions which will be done by order.
Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Matcher defines which packets fields are matched when a packet arrives.
Matcher is a part of a table and can contain one or more rules. Where
rule defines specific values of the matcher's mask definition.
Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alex Vesker [Mon, 19 Aug 2019 08:31:25 +0000 (11:31 +0300)]
net/mlx5: DR, Expose steering table functionality
Tables are objects which are used for storing matchers, each table
belongs to a domain and defined by the domain type. When a packet
reaches the table it is being processed by each of its matchers until a
successful match. Tables can hold multiple matchers ordered by matcher
priority. Each table has a level.
Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Domain is the frame for all of the dr (direct rule) objects.
There are different domain types which also affect the object under that
domain. Each domain can hold multiple tables which can hold multiple
matchers and so on, this means that all of the dr (direct rule) objects
exist under a specific domain. The domain object also holds the
resources needed for other objects such as memory management and
communication with the device.
Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alex Vesker [Mon, 19 Aug 2019 11:14:55 +0000 (14:14 +0300)]
net/mlx5: DR, Add Steering entry (STE) utilities
Steering Entry (STE) object is the basic building block of the steering
map. There are several types of STEs. Each rule can be constructed of
multiple STEs. Each STE dictates which fields of the packet's header are
being matched as well as the information about the next step in map (hit
and miss pointers). The hardware gets a packet and tries to match it
against the STEs, going to either the hit pointer or the miss pointer.
This file handles the STE operations.
Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alex Vesker [Tue, 20 Aug 2019 06:39:34 +0000 (09:39 +0300)]
net/mlx5: DR, Expose an internal API to issue RDMA operations
Inserting or deleting a rule is done by RDMA read/write operation to SW
ICM device memory. This file provides the support for executing these
operations. It includes allocating the needed resources and providing an
API for writing steering entries to the memory.
Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alex Vesker [Mon, 19 Aug 2019 10:46:40 +0000 (13:46 +0300)]
net/mlx5: DR, ICM pool memory allocator
ICM device memory is used for writing steering rules (STEs) to the NIC.
An ICM memory pool allocator was implemented to manage the required
memory. The pool consists of buckets, a bucket per chunk size.
Once a bucket is empty we will cut a row of memory from the latest
allocated MR, if the MR size is not sufficient we will allocate a new MR.
HW design requires that chunks memory address should be aligned to the
chunk size, this is the reason for managing the MR with row size that
insures memory alignment.
Current design is greedy in memory but provides quick allocation times
in steady state.
Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alex Vesker [Tue, 20 Aug 2019 08:57:28 +0000 (11:57 +0300)]
net/mlx5: DR, Add the internal direct rule types definitions
Add the internal header file that contains various types
definition that will be used in coming patches as well as
the internal functions decelerations.
Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Maor Gottlieb [Thu, 15 Aug 2019 10:54:17 +0000 (13:54 +0300)]
net/mlx5: Add flow steering actions to fs_cmd shim layer
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Mon, 2 Sep 2019 19:07:46 +0000 (12:07 -0700)]
Merge branch 'mvpp2-per-cpu-buffers'
Matteo Croce says:
====================
mvpp2: per-cpu buffers
This patchset workarounds an PP2 HW limitation which prevents to use
per-cpu rx buffers.
The first patch is just a refactor to prepare for the second one.
The second one allocates percpu buffers if the following conditions are met:
- CPU number is less or equal 4
- no port is using jumbo frames
If the following conditions are not met at load time, of jumbo frame is enabled
later on, the shared allocation is reverted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Every mvpp2 unit can use up to 8 buffers mapped by the BM (the HW buffer
manager). The HW will place the frames in the buffer pool depending on the
frame size: short (< 128 bytes), long (< 1664) or jumbo (up to 9856).
As any unit can have up to 4 ports, the driver allocates only 2 pools,
one for small and one long frames, and share them between ports.
When the first port MTU is set higher than 1664 bytes, a third pool is
allocated for jumbo frames.
This shared allocation makes impossible to use percpu allocators,
and creates contention between HW queues.
If possible, i.e. if the number of possible CPU are less than 8 and jumbo
frames are not used, switch to a new scheme: allocate 8 per-cpu pools for
short and long frames and bind every pool to an RXQ.
When the first port MTU is set higher than 1664 bytes, the allocation
scheme is reverted to the old behaviour (3 shared pools), and when all
ports MTU are lowered, the per-cpu buffers are allocated again.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor mvpp2_bm_pool_create(), mvpp2_bm_pool_destroy() and
mvpp2_bm_pools_init() so that they accept a struct device instead
of a struct platform_device, as they just need platform_device->dev.
Removing such dependency makes the BM code more reusable in context
where we don't have a pointer to the platform_device.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 31 Aug 2019 12:46:19 +0000 (15:46 +0300)]
net: dsa: Fix off-by-one number of calls to devlink_port_unregister
When a function such as dsa_slave_create fails, currently the following
stack trace can be seen:
[ 2.038342] sja1105 spi0.1: Probed switch chip: SJA1105T
[ 2.054556] sja1105 spi0.1: Reset switch and programmed static config
[ 2.063837] sja1105 spi0.1: Enabled switch tagging
[ 2.068706] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy
[ 2.076371] ------------[ cut here ]------------
[ 2.080973] WARNING: CPU: 1 PID: 21 at net/core/devlink.c:6184 devlink_free+0x1b4/0x1c0
[ 2.088954] Modules linked in:
[ 2.092005] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.3.0-rc6-01360-g41b52e38d2b6-dirty #1746
[ 2.100912] Hardware name: Freescale LS1021A
[ 2.105162] Workqueue: events deferred_probe_work_func
[ 2.110287] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
[ 2.117992] [<c030d8cc>] (show_stack) from [<c10b08d8>] (dump_stack+0xb4/0xc8)
[ 2.125180] [<c10b08d8>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
[ 2.132018] [<c0349d04>] (__warn) from [<c0349e34>] (warn_slowpath_null+0x40/0x48)
[ 2.139549] [<c0349e34>] (warn_slowpath_null) from [<c0f19d74>] (devlink_free+0x1b4/0x1c0)
[ 2.147772] [<c0f19d74>] (devlink_free) from [<c1064fc0>] (dsa_switch_teardown+0x60/0x6c)
[ 2.155907] [<c1064fc0>] (dsa_switch_teardown) from [<c1065950>] (dsa_register_switch+0x8e4/0xaa8)
[ 2.164821] [<c1065950>] (dsa_register_switch) from [<c0ba7fe4>] (sja1105_probe+0x21c/0x2ec)
[ 2.173216] [<c0ba7fe4>] (sja1105_probe) from [<c0b35948>] (spi_drv_probe+0x80/0xa4)
[ 2.180920] [<c0b35948>] (spi_drv_probe) from [<c0a4c1cc>] (really_probe+0x108/0x400)
[ 2.188711] [<c0a4c1cc>] (really_probe) from [<c0a4c694>] (driver_probe_device+0x78/0x1bc)
[ 2.196933] [<c0a4c694>] (driver_probe_device) from [<c0a4a3dc>] (bus_for_each_drv+0x58/0xb8)
[ 2.205414] [<c0a4a3dc>] (bus_for_each_drv) from [<c0a4c024>] (__device_attach+0xd0/0x168)
[ 2.213637] [<c0a4c024>] (__device_attach) from [<c0a4b1d0>] (bus_probe_device+0x84/0x8c)
[ 2.221772] [<c0a4b1d0>] (bus_probe_device) from [<c0a4b72c>] (deferred_probe_work_func+0x84/0xc4)
[ 2.230686] [<c0a4b72c>] (deferred_probe_work_func) from [<c03650a4>] (process_one_work+0x218/0x510)
[ 2.239772] [<c03650a4>] (process_one_work) from [<c03660d8>] (worker_thread+0x2a8/0x5c0)
[ 2.247908] [<c03660d8>] (worker_thread) from [<c036b348>] (kthread+0x148/0x150)
[ 2.255265] [<c036b348>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
[ 2.262444] Exception stack(0xea965fb0 to 0xea965ff8)
[ 2.267466] 5fa0: 00000000000000000000000000000000
[ 2.275598] 5fc0: 0000000000000000000000000000000000000000000000000000000000000000
[ 2.283729] 5fe0: 000000000000000000000000000000000000001300000000
[ 2.290333] ---[ end trace ca5d506728a0581a ]---
devlink_free is complaining right here:
WARN_ON(!list_empty(&devlink->port_list));
This happens because devlink_port_unregister is no longer done right
away in dsa_port_setup when a DSA_PORT_TYPE_USER has failed.
Vivien said about this change that:
Also no need to call devlink_port_unregister from within dsa_port_setup
as this step is inconditionally handled by dsa_port_teardown on error.
which is not really true. The devlink_port_unregister function _is_
being called unconditionally from within dsa_port_setup, but not for
this port that just failed, just for the previous ones which were set
up.
ports_teardown:
for (i = 0; i < port; i++)
dsa_port_teardown(&ds->ports[i]);
Initially I was tempted to fix this by extending the "for" loop to also
cover the port that failed during setup. But this could have potentially
unforeseen consequences unrelated to devlink_port or even other types of
ports than user ports, which I can't really test for. For example, if
for some reason devlink_port_register itself would fail, then
unconditionally unregistering it in dsa_port_teardown would not be a
smart idea. The list might go on.
So just make dsa_port_setup undo the setup it had done upon failure, and
let the for loop undo the work of setting up the previous ports, which
are guaranteed to be brought up to a consistent state.
Fixes: b02266e6ab6d ("net: dsa: use a single switch statement for port setup") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'char-misc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for reported issues for
5.3-rc7
Also included in here is the documentation for how we are handling
hardware issues under embargo that everyone has finally agreed on, as
well as a MAINTAINERS update for the suckers who agreed to handle the
LICENSES/ files.
All of these have been in linux-next last week with no reported
issues"
* tag 'char-misc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
fsi: scom: Don't abort operations for minor errors
vmw_balloon: Fix offline page marking with compaction
VMCI: Release resource if the work is already queued
Documentation/process: Embargoed hardware security issues
lkdtm/bugs: fix build error in lkdtm_EXHAUST_STACK
mei: me: add Tiger Lake point LP device ID
intel_th: pci: Add Tiger Lake support
intel_th: pci: Add support for another Lewisburg PCH
stm class: Fix a double free of stm_source_device
MAINTAINERS: add entry for LICENSES and SPDX stuff
fpga: altera-ps-spi: Fix getting of optional confd gpio
Merge tag 'usb-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes that have been in linux-next this past
week for 5.3-rc7
They fix the usual xhci, syzbot reports, and other small issues that
have come up last week.
All have been in linux-next with no reported issues"
* tag 'usb-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: cdc-wdm: fix race between write and disconnect due to flag abuse
usb: host: xhci: rcar: Fix typo in compatible string matching
usb: host: xhci-tegra: Set DMA mask correctly
USB: storage: ums-realtek: Whitelist auto-delink support
USB: storage: ums-realtek: Update module parameter description for auto_delink_en
usb: host: ohci: fix a race condition between shutdown and irq
usb: hcd: use managed device resources
typec: tcpm: fix a typo in the comparison of pdo_max_voltage
usb-storage: Add new JMS567 revision to unusual_devs
usb: chipidea: udc: don't do hardware access if gadget has stopped
usbtmc: more sanity checking for packet size
usb: udc: lpc32xx: silence fall-through warning
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Merge mlx5-next patches needed for upcoming mlx5 software steering.
1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets
Maor Gottlieb [Thu, 29 Aug 2019 23:42:34 +0000 (23:42 +0000)]
net/mlx5: Avoid disabling RoCE when uninitialized
Move the check if RoCE steering is initialized to the
disable RoCE function, it will ensure that we disable
RoCE only if we succeeded in enabling it before.
Fixes: f9ec3586a73a ("net/mlx5: Eswitch, enable RoCE loopback traffic") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ariel Levkovich [Thu, 29 Aug 2019 23:42:30 +0000 (23:42 +0000)]
net/mlx5: Move device memory management to mlx5_core
Move the device memory allocation and deallocation commands
SW ICM memory to mlx5_core to expose this API for all
mlx5_core users.
This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.
In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.
1) Fix some length checks during OGM processing in batman-adv, from
Sven Eckelmann.
2) Fix regression that caused netfilter conntrack sysctls to not be
per-netns any more. From Florian Westphal.
3) Use after free in netpoll, from Feng Sun.
4) Guard destruction of pfifo_fast per-cpu qdisc stats with
qdisc_is_percpu_stats(), from Davide Caratti. Similar bug is fixed
in pfifo_fast_enqueue().
5) Fix memory leak in mld_del_delrec(), from Eric Dumazet.
6) Handle neigh events on internal ports correctly in nfp, from John
Hurley.
7) Clear SKB timestamp in NF flow table code so that it does not
confuse fq scheduler. From Florian Westphal.
8) taprio destroy can crash if it is invoked in a failure path of
taprio_init(), because the list head isn't setup properly yet and
the list del is unconditional. Perform the list add earlier to
address this. From Vladimir Oltean.
9) Make sure to reapply vlan filters on device up, in aquantia driver.
From Dmitry Bogdanov.
10) sgiseeq driver releases DMA memory using free_page() instead of
dma_free_attrs(). From Christophe JAILLET.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
net: seeq: Fix the function used to release some memory in an error handling path
enetc: Add missing call to 'pci_free_irq_vectors()' in probe and remove functions
net: bcmgenet: use ethtool_op_get_ts_info()
tc-testing: don't hardcode 'ip' in nsPlugin.py
net: dsa: microchip: add KSZ8563 compatibility string
dt-bindings: net: dsa: document additional Microchip KSZ8563 switch
net: aquantia: fix out of memory condition on rx side
net: aquantia: linkstate irq should be oneshot
net: aquantia: reapply vlan filters on up
net: aquantia: fix limit of vlan filters
net: aquantia: fix removal of vlan 0
net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte
taprio: Fix kernel panic in taprio_destroy
net: dsa: microchip: fill regmap_config name
rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
net: stmmac: dwmac-rk: Don't fail if phy regulator is absent
amd-xgbe: Fix error path in xgbe_mod_init()
netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder
mac80211: Correctly set noencrypt for PAE frames
...
Following Marek's work on the abstraction of the SERDES lanes mapping, this
series trades the .serdes_irq_setup and .serdes_irq_free callbacks for new
.serdes_irq_mapping, .serdes_irq_enable and .serdes_irq_status operations.
This has the benefit to limit the various SERDES implementations to simple
register accesses only; centralize the IRQ handling and mutex locking logic;
as well as reducing boilerplate in the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The .serdes_irq_setup are all following the same steps: get the SERDES
lane, get the IRQ mapping, request the IRQ, then enable it. So do
the .serdes_irq_free implementations: get the SERDES lane, disable
the IRQ, then free it.
This patch removes these operations in favor of generic functions.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Sat, 31 Aug 2019 20:18:33 +0000 (16:18 -0400)]
net: dsa: mv88e6xxx: pass lane to .serdes_power
Now the first step of all .serdes_power implementations is getting
the lane mapping. Since we have an operation for that, call it in
the wrapper and pass the lane down to the .serdes_power operation.
This also allows to avoid querying the SERDES lane twice in
mv88e6xxx_port_set_cmode.
At the same time provide mv88e6xxx_serdes_power_{up,down} helpers
and prefer up/down instead of on/off as in the documentation.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Even though 88E6352 has no dedicated lane for SERDES interfaces, it
uses a similar code as the other .serdes_get_lane implementations to
check the port's CMODE and ensure that SERDES operations are doable.
For consistency, implement mv88e6352_serdes_get_lane for the 88E6352
and similar switches which simply returns an unused 0xff lane address.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Sat, 31 Aug 2019 20:18:30 +0000 (16:18 -0400)]
net: dsa: mv88e6xxx: simplify .serdes_get_lane
Because the mapping between a SERDES interface and its lane is static,
we don't need to stick with negative error codes actually and we can
simply return 0 if there is no lane, just like the IRQ mapping.
This way we can keep a simple and intuitive API using unsigned lane
numbers while simplifying the implementations with single return
statements. Last but not least, fix the reverse chrismas tree in
mv88e6390x_serdes_get_lane.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Sat, 31 Aug 2019 20:18:28 +0000 (16:18 -0400)]
net: dsa: mv88e6xxx: fix SERDES IRQ mapping
The current mv88e6xxx SERDES code checks for negative error code from
irq_find_mapping, while this function returns an unsigned integer. This
patch removes this dead code and simply returns 0 is no IRQ is found.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Sat, 31 Aug 2019 20:18:27 +0000 (16:18 -0400)]
net: dsa: mv88e6xxx: check errors in mv88e6352_serdes_irq_link
The mv88e6352_serdes_irq_link helper is not checking for any error that
may occur during hardware accesses. Worst, the "up" boolean is set from
the potentially unused "status" variable, if read operations failed.
As done in mv88e6390_serdes_irq_link_sgmii, return right away and do
not call dsa_port_phylink_mac_change if an error occurred.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Sat, 31 Aug 2019 12:29:11 +0000 (12:29 +0000)]
net: hns3: remove set but not used variable 'qos'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c: In function 'hclge_restore_vlan_table':
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:8016:18: warning:
variable 'qos' set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 597f84f062ce ("net: hns3: reduce the parameters of some functions") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Sat, 31 Aug 2019 07:29:49 +0000 (08:29 +0100)]
net: hns3: remove redundant assignment to pointer reg_info
Pointer reg_info is being initialized with a value that is never read and
is being re-assigned a little later on. The assignment is redundant
and hence can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: seeq: Fix the function used to release some memory in an error handling path
In commit 47330dd6483f ("sgiseeq: replace use of dma_cache_wback_inv"),
a call to 'get_zeroed_page()' has been turned into a call to
'dma_alloc_coherent()'. Only the remove function has been updated to turn
the corresponding 'free_page()' into 'dma_free_attrs()'.
The error hndling path of the probe function has not been updated.
Fix it now.
Rename the corresponding label to something more in line.
Fixes: 47330dd6483f ("sgiseeq: replace use of dma_cache_wback_inv") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Sun, 1 Sep 2019 15:52:05 +0000 (16:52 +0100)]
netlabel: remove redundant assignment to pointer iter
Pointer iter is being initialized with a value that is never read and
is being re-assigned a little later on. The assignment is redundant
and hence can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
RTL8125 uses a different register for VLAN offloading config,
therefore don't set bit RxVlan.
Fixes: 80025be3880f ("r8169: add support for RTL8125") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Fix the bogus detection of 32bit user mode for uretprobes which
caused corruption of the user return address resulting in
application crashes. In the uprobes handler in_ia32_syscall() is
obviously always returning false on a 64bit kernel. Use
user_64bit_mode() instead which works correctly.
- Prevent large page splitting when ftrace flips RW/RO on the kernel
text which caused iTLB performance issues. Ftrace wants to be
converted to text_poke() which avoids the problem, but for now
allow large page preservation in the static protections check when
the change request spawns a full large page.
- Prevent arch_dynirq_lower_bound() from returning 0 when the IOAPIC
is configured via device tree. In the device tree case the GSI 1:1
mapping is meaningless therefore the lower bound which protects the
GSI range on ACPI machines is irrelevant. Return the lower bound
which the core hands to the function instead of blindly returning 0
which causes the core to allocate the invalid virtual interupt
number 0 which in turn prevents all drivers from allocating and
requesting an interrupt.
- Remove the bogus initialization of LDR and DFR in the 32bit bigsmp
APIC driver. That uses physical destination mode where LDR/DFR are
ignored, but the initialization and the missing clear of LDR caused
the APIC to be left in a inconsistent state on kexec/reboot.
- Clear LDR when clearing the APIC registers so the APIC is in a well
defined state.
- Initialize variables proper in the find_trampoline_placement()
code.
- Silence GCC( build warning for the real mode part of the build"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text
x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning
x86/boot/compressed/64: Fix missing initialization in find_trampoline_placement()
x86/apic: Include the LDR when clearing out APIC registers
x86/apic: Do not initialize LDR and DFR for bigsmp
uprobes/x86: Fix detection of 32-bit user mode
x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for perf x86 hardware implementations:
- Restrict the period on Nehalem machines to prevent perf from
hogging the CPU
- Prevent the AMD IBS driver from overwriting the hardwre controlled
and pre-seeded reserved bits (0-6) in the count register which
caused a sample bias for dispatched micro-ops"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
perf/x86/intel: Restrict period on Nehalem
Ryan M. Collins [Fri, 30 Aug 2019 18:49:55 +0000 (14:49 -0400)]
net: bcmgenet: use ethtool_op_get_ts_info()
This change enables the use of SW timestamping on the Raspberry Pi 4.
bcmgenet's transmit function bcmgenet_xmit() implements software
timestamping. However the SOF_TIMESTAMPING_TX_SOFTWARE capability was
missing and only SOF_TIMESTAMPING_RX_SOFTWARE was announced. By using
ethtool_ops bcmgenet_ethtool_ops() as get_ts_info(), the
SOF_TIMESTAMPING_TX_SOFTWARE capability is announced.
Similar to commit b75da18c58e6 ("smsc95xx: use ethtool_op_get_ts_info()")
Signed-off-by: Ryan M. Collins <rmc032@bucknell.edu> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Doug Berger <opendmb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Fri, 30 Aug 2019 16:51:47 +0000 (18:51 +0200)]
tc-testing: don't hardcode 'ip' in nsPlugin.py
the following tdc test fails on Fedora:
# ./tdc.py -e 2638
-- ns/SubPlugin.__init__
Test 2638: Add matchall and try to get it
-----> prepare stage *** Could not execute: "$TC qdisc add dev $DEV1 clsact"
-----> prepare stage *** Error message: "/bin/sh: ip: command not found"
returncode 127; expected [0]
-----> prepare stage *** Aborting test run.
Let nsPlugin.py use the 'IP' variable introduced with commit b8f14a9bc815
("tc-tests: added path to ip command in tdc"), so that the path to 'ip' is
correctly resolved to the value we have in tdc_config.py.
# ./tdc.py -e 2638
-- ns/SubPlugin.__init__
Test 2638: Add matchall and try to get it
All test results:
1..1
ok 1 2638 - Add matchall and try to get it
Fixes: f893bfa56fda ("tc-testing: Restore original behaviour for namespaces in tdc") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 Sep 2019 06:46:13 +0000 (23:46 -0700)]
Merge branch 'Minor-cleanup-in-devlink'
Parav Pandit says:
====================
Minor cleanup in devlink
Two minor cleanup in devlink.
Patch-1 Explicitly defines devlink port index as unsigned int
Patch-2 Uses switch-case to handle different port flavours attributes
====================
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Parav Pandit [Fri, 30 Aug 2019 10:39:44 +0000 (05:39 -0500)]
devlink: Make port index data type as unsigned int
Devlink port index attribute is returned to users as u32 through
netlink response.
Change index data type from 'unsigned' to 'unsigned int' to avoid
below checkpatch.pl warning.
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
81: FILE: include/net/devlink.h:81:
+ unsigned index;
Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 Sep 2019 06:44:28 +0000 (23:44 -0700)]
Merge branch 'net-tls-add-socket-diag'
Davide Caratti says:
====================
net: tls: add socket diag
The current kernel does not provide any diagnostic tool, except
getsockopt(TCP_ULP), to know more about TCP sockets that have an upper
layer protocol (ULP) on top of them. This series extends the set of
information exported by INET_DIAG_INFO, to include data that are
specific to the ULP (and that might be meaningful for debug/testing
purposes).
patch 1/3 ensures that the control plane reads/updates ULP specific data
using RCU.
patch 2/3 extends INET_DIAG_INFO and allows knowing the ULP name for
each TCP socket that has done setsockopt(TCP_ULP) successfully.
patch 3/3 extends kTLS to let programs like 'ss' know the protocol
version and the cipher in use.
Changes since v2:
- remove unneeded #ifdef and fix reverse christmas tree in
tls_get_info(), thanks to Jakub Kicinski
Changes since v1:
- don't worry about grace period when accessing ulp_ops, thanks to
Jakub Kicinski and Eric Dumazet
- use rcu_dereference() to access ULP data in tls get_info(), and
test against NULL value, thanks to Jakub Kicinski
- move RCU protected section inside tls get_info(), thanks to Jakub
Kicinski
Changes since RFC:
- some coding style fixes, thanks to Jakub Kicinski
- add X_UNSPEC as lowest value of uAPI enums, thanks to Jakub Kicinski
- fix assignment of struct nlattr *start, thanks to Jakub Kicinski
- let tls dump RXCONF and TXCONF, suggested by Jakub Kicinski
- don't dump anything if TLS version or cipher are 0 (but still return a
constant size in get_aux_size()), thanks to Boris Pismenny
- constify first argument of get_info() and get_size()
- use RCU to access access ulp_ops, like it's done for ca_ops
- add patch 1/3, from Jakub Kicinski
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Fri, 30 Aug 2019 10:25:49 +0000 (12:25 +0200)]
net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag
When an application configures kernel TLS on top of a TCP socket, it's
now possible for inet_diag_handler() to collect information regarding the
protocol version, the cipher type and TX / RX configuration, in case
INET_DIAG_INFO is requested.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Fri, 30 Aug 2019 10:25:48 +0000 (12:25 +0200)]
tcp: ulp: add functions to dump ulp-specific information
currently, only getsockopt(TCP_ULP) can be invoked to know if a ULP is on
top of a TCP socket. Extend idiag_get_aux() and idiag_get_aux_size(),
introduced by commit 4648c0e7f552 ("inet_diag: allow protocols to provide
additional data"), to report the ULP name and other information that can
be made available by the ULP through optional functions.
Users having CAP_NET_ADMIN privileges will then be able to retrieve this
information through inet_diag_handler, if they specify INET_DIAG_INFO in
the request.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Bogdanov [Fri, 30 Aug 2019 12:08:38 +0000 (12:08 +0000)]
net: aquantia: fix out of memory condition on rx side
On embedded environments with hard memory limits it is a normal although
rare case when skb can't be allocated on rx part under high traffic.
In such OOM cases napi_complete_done() was not called.
So the napi object became in an invalid state like it is "scheduled".
Kernel do not re-schedules the poll of that napi object.
Consequently, kernel can not remove that object the system hangs on
`ifconfig down` waiting for a poll.
We are fixing this by gracefully closing napi poll routine with correct
invocation of napi_complete_done.
This was reproduced with artificially failing the allocation of skb to
simulate an "out of memory" error case and check that traffic does
not get stuck.
Fixes: 7d44f9f7bd40 ("net: ethernet: aquantia: Vector operations") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 30 Aug 2019 12:08:36 +0000 (12:08 +0000)]
net: aquantia: linkstate irq should be oneshot
Declaring threaded irq handler should also indicate the irq is
oneshot. It is oneshot indeed, because HW implements irq automasking
on trigger.
Not declaring this causes some kernel configurations to fail
on interface up, because request_threaded_irq returned an err code.
The issue was originally hidden on normal x86_64 configuration with
latest kernel, because depending on interrupt controller, irq driver
added ONESHOT flag on its own.
Issue was observed on older kernels (4.14) where no such logic exists.
Fixes: 335d69472b53 ("net: aquantia: link status irq handling") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Reported-by: Michael Symolkin <Michael.Symolkin@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Bogdanov [Fri, 30 Aug 2019 12:08:35 +0000 (12:08 +0000)]
net: aquantia: reapply vlan filters on up
In case of device reconfiguration the driver may reset the device invisible
for other modules, vlan module in particular. So vlans will not be
removed&created and vlan filters will not be configured in the device.
The patch reapplies the vlan filters at device start.
Fixes: 7f33a1ab29d28 ("net: aquantia: add support of rx-vlan-filter offload") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Bogdanov [Fri, 30 Aug 2019 12:08:33 +0000 (12:08 +0000)]
net: aquantia: fix limit of vlan filters
Fix a limit condition of vlans on the interface before setting vlan
promiscuous mode
Fixes: 470119e92b79d ("net: aquantia: fix vlans not working over bridged network") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Bogdanov [Fri, 30 Aug 2019 12:08:30 +0000 (12:08 +0000)]
net: aquantia: fix removal of vlan 0
Due to absence of checking against the rx flow rule when vlan 0 is being
removed, the other rule could be removed instead of the rule with vlan 0
Fixes: 7f33a1ab29d28 ("net: aquantia: add support of rx-vlan-filter offload") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 30 Aug 2019 01:07:23 +0000 (04:07 +0300)]
net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
The discussion to be made is absolutely the same as in the case of
previous patch ("taprio: Set default link speed to 10 Mbps in
taprio_set_picos_per_byte"). Nothing is lost when setting a default.
Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Fixes: 51bdeb9ace94 ("net/sched: cbs: fix port_rate miscalculation") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>