Jakub Kicinski [Wed, 19 May 2021 17:18:25 +0000 (10:18 -0700)]
mlx5: count all link events
mlx5 devices were observed generating MLX5_PORT_CHANGE_SUBTYPE_ACTIVE
events without an intervening MLX5_PORT_CHANGE_SUBTYPE_DOWN. This
breaks link flap detection based on Linux carrier state transition
count as netif_carrier_on() does nothing if carrier is already on.
Make sure we count such events.
netif_carrier_event() increments the counters and fires the linkwatch
events. The latter is not necessary for the use case but seems like
the right thing to do.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
David S. Miller [Wed, 2 Jun 2021 21:08:37 +0000 (14:08 -0700)]
Merge branch 'devlink-rate-objects'
Dmytro Linkin says:
====================
devlink: rate objects API
Resending without RFC.
Currently kernel provides a way to change tx rate of single VF in
switchdev mode via tc-police action. When lots of VFs are configured
management of theirs rates becomes non-trivial task and some grouping
mechanism is required. Implementing such grouping in tc-police will bring
flow related limitations and unwanted complications, like:
- tc-police is a policer and there is a user request for a traffic
shaper, so shared tc-police action is not suitable;
- flows requires net device to be placed on, means "groups" wouldn't
have net device instance itself. Taking into the account previous
point was reviewed a sollution, when representor have a policer and
the driver use a shaper if qdisc contains group of VFs - such approach
ugly, compilated and misleading;
- TC is ingress only, while configuring "other" side of the wire looks
more like a "real" picture where shaping is outside of the steering
world, similar to "ip link" command;
According to that devlink is the most appropriate place.
This series introduces devlink API for managing tx rate of single devlink
port or of a group by invoking callbacks (see below) of corresponding
driver. Also devlink port or a group can be added to the parent group,
where driver responsible to handle rates of a group elements. To achieve
all of that new rate object is added. It can be one of the two types:
- leaf - represents a single devlink port; created/destroyed by the
driver and bound to the devlink port. As example, some driver may
create leaf rate object for every devlink port associated with VF.
Since leaf have 1to1 mapping to it's devlink port, in user space it is
referred as pci/<bus_addr>/<port_index>;
- node - represents a group of rate objects; created/deleted by request
from the userspace; initially empty (no rate objects added). In
userspace it is referred as pci/<bus_addr>/<node_name>, where node name
can be any, except decimal number, to avoid collisions with leafs.
devlink_ops extended with following callbacks:
- rate_{leaf|node}_tx_{share|max}_set
- rate_node_{new|del}
- rate_{leaf|node}_parent_set
KAPI provides:
- creation/destruction of the leaf rate object associated with devlink
port
- destruction of rate nodes to allow a vendor driver to free allocated
resources on driver removal or due to the other reasons when nodes
destruction required
UAPI provides:
- dumping all or single rate objects
- setting tx_{share|max} of rate object of any type
- creating/deleting node rate object
- setting/unsetting parent of any rate object
Added devlink rate object support for netdevsim driver
Issues/open questions:
- Does user need DEVLINK_CMD_RATE_DEL_ALL_CHILD command to clean all
children of particular parent node? For example:
$ devlink port function rate flush netdevsim/netdevsim10/group
- priv pointer passed to the callbacks is a source of bugs; in leaf case
driver can embed rate object into internal structure and use
container_of() on it; in node case it cannot be done since nodes are
created from userspace
v1->v2:
- fixed kernel-doc for devlink_rate_leaf_{create|destroy}()
- s/func/function/ for all devlink port command occurences
v2->v3:
- devlink:
- added devlink_rate_nodes_destroy() function
- netdevsim:
- added call of devlink_rate_nodes_destroy() function
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:29 +0000 (15:17 +0300)]
netdevsim: Allow setting parent node of rate objects
Implement new devlink ops that allow setting rate node as a parent for
devlink port (leaf) or another devlink node through devlink API.
Expose parent names to netdevsim debugfs in read only mode.
Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:28 +0000 (15:17 +0300)]
devlink: Allow setting parent node of rate objects
Refactor DEVLINK_CMD_RATE_{GET|SET} command handlers to support setting
a node as a parent for another rate object (leaf or node) by means of
new attribute DEVLINK_ATTR_RATE_PARENT_NODE_NAME. Extend devlink ops
with new callbacks rate_{leaf|node}_parent_set() to set node as a parent
for rate object to allow supporting drivers to implement rate grouping
through devlink. Driver implementations are allowed to support leafs
or node children only. Invoking callback with NULL as parent should be
threated by the driver as unset parent action.
Extend rate object struct with reference counter to disallow deleting a
node with any child pointing to it. User should unset parent for the
child explicitly.
Example:
$ devlink port function rate add netdevsim/netdevsim10/group1
$ devlink port function rate add netdevsim/netdevsim10/group2
$ devlink port function rate set netdevsim/netdevsim10/group1 parent group2
$ devlink port function rate show netdevsim/netdevsim10/group1
netdevsim/netdevsim10/group1: type node parent group2
$ devlink port function rate set netdevsim/netdevsim10/group1 noparent
Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:26 +0000 (15:17 +0300)]
netdevsim: Implement support for devlink rate nodes
Implement new devlink ops that allow creation, deletion and setting of
shared/max tx rate of devlink rate nodes through devlink API.
Expose rate node and it's tx rates to netdevsim debugfs.
Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:25 +0000 (15:17 +0300)]
devlink: Introduce rate nodes
Implement support for DEVLINK_CMD_RATE_{NEW|DEL} commands that are used
to create and delete devlink rate nodes. Add new attribute
DEVLINK_ATTR_RATE_NODE_NAME that specify node name string. The node name
is an alphanumeric identifier. No valid node name can be a devlink port
index, eg. decimal number. Extend devlink ops with new callbacks
rate_node_{new|del}() and rate_node_tx_{share|max}_set() to allow
supporting drivers to implement ports rate grouping and setting tx rate
of rate nodes through devlink.
Expose devlink_rate_nodes_destroy() function to allow vendor driver do
proper cleanup of internally allocated resources for the nodes if the
driver goes down or due to any other reasons which requires nodes to be
destroyed.
Disallow moving device from switchdev to legacy mode if any node exists
on that device. User must explicitly delete nodes before switching mode.
Example:
$ devlink port function rate add netdevsim/netdevsim10/group1
$ devlink port function rate set netdevsim/netdevsim10/group1 \
tx_share 10mbit tx_max 100mbit
Add + set command can be combined:
$ devlink port function rate add netdevsim/netdevsim10/group1 \
tx_share 10mbit tx_max 100mbit
$ devlink port function rate show netdevsim/netdevsim10/group1
netdevsim/netdevsim10/group1: type node tx_share 10mbit tx_max 100mbit
$ devlink port function rate del netdevsim/netdevsim10/group1
Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:22 +0000 (15:17 +0300)]
devlink: Allow setting tx rate for devlink rate leaf objects
Implement support for DEVLINK_CMD_RATE_SET command with new attributes
DEVLINK_ATTR_RATE_TX_{SHARE|MAX} that are used to set devlink rate
shared/max tx rate values. Extend devlink ops with new callbacks
rate_leaf_tx_{share|max}_set() to allow supporting drivers to implement
rate control through devlink.
New attributes are optional. Driver implementations are allowed to
support either or both of them.
Shared rate example:
$ devlink port function rate set netdevsim/netdevsim10/0 tx_share 10mbit
$ devlink port function rate show netdevsim/netdevsim10/0
netdevsim/netdevsim10/0: type leaf tx_share 10mbit
Max rate example:
$ devlink port function rate set netdevsim/netdevsim10/0 tx_max 100mbit
$ devlink port function rate show netdevsim/netdevsim10/0
netdevsim/netdevsim10/0: type leaf tx_max 100mbit
Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:19 +0000 (15:17 +0300)]
devlink: Introduce rate object
Allow registering rate object for devlink ports with dedicated
devlink_rate_leaf_{create|destroy}() API. Implement new netlink
DEVLINK_CMD_RATE_GET command that is used to retrieve rate object info.
Add new DEVLINK_CMD_RATE_{NEW|DEL} commands that are used for
notifications when creating/deleting leaf rate object.
Rate API is intended to be used for rate limiting of individual
devlink ports (leafs) and their aggregates (nodes).
Example:
$ devlink port show
pci/0000:03:00.0/0
pci/0000:03:00.0/1
$ devlink port function rate show
pci/0000:03:00.0/0: type leaf
pci/0000:03:00.0/1: type leaf
Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:18 +0000 (15:17 +0300)]
netdevsim: Implement legacy/switchdev mode for VFs
Implement callbacks to set/get eswitch mode value. Add helpers to check
current mode.
Instantiate VFs' net devices and devlink ports on switchdev enabling and
remove them on legacy enabling. Changing number of VFs while in
switchdev mode triggers VFs creation/deletion.
Also disable NDO API callback to set VF rate, since it's legacy API.
Switchdev API to set VF rate will be implemented in one of the next
patches.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:17 +0000 (15:17 +0300)]
netdevsim: Implement VFs
Allow creation of netdevsim ports for VFs along with allocations of
corresponding net devices and devlink ports.
Add enums and helpers to distinguish PFs' ports from VFs' ports.
Ports creation/deletion debugfs API intended to be used with physical
ports only.
VFs instantiation will be done in one of the next patches.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:16 +0000 (15:17 +0300)]
netdevsim: Implement port types and indexing
Define type of ports, which netdevsim driver currently operates with as
PF. Define new port type - VF, which will be implemented in following
patches. Add helper functions to distinguish them. Add helper function
to get VF index from port index.
Add port indexing logic where PFs' indexes starts from 0, VFs' - from
NSIM_DEV_VF_PORT_INDEX_BASE.
All ports uses same index pool, which means that PF port may be created
with index from VFs' indexes range.
Maximum number of VFs, which the driver can allocate, is limited by
UINT_MAX - BASE.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:15 +0000 (15:17 +0300)]
netdevsim: Disable VFs on nsim_dev_reload_destroy() call
Move VFs disabling from device release() to nsim_dev_reload_destroy() to
make VFs disabling and ports removal simultaneous.
This is a requirement for VFs ports implemented in next patches.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Wed, 2 Jun 2021 12:17:14 +0000 (15:17 +0300)]
netdevsim: Add max_vfs to bus_dev
Currently there is no limit to the number of VFs netdevsim can enable.
In a real systems this value exist and used by the driver.
Fore example, some features might need to consider this value when
allocating memory.
Expose max_vfs variable to debugfs as configurable resource. If are VFs
configured (num_vfs != 0) then changing of max_vfs not allowed.
Co-developed-by: Yuval Avnery <yuvalav@nvidia.com> Signed-off-by: Yuval Avnery <yuvalav@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 2 Jun 2021 21:04:42 +0000 (14:04 -0700)]
Merge branch 'nfp-ct-offload'
Simon Horman says:
====================
Introduce conntrack offloading to the nfp driver
Louis Peens says:
This is the first in a series of patches to offload conntrack
to the nfp. The approach followed is to flatten out three
different flow rules into a single offloaded flow. The three
different flows are:
1) The rule sending the packet to conntrack (pre_ct)
2) The rule matching on +trk+est after a packet has been through
conntrack. (post_ct)
3) The rule received via callback from the netfilter (nft)
In order to offload a flow we need a combination of all three flows, but
they could be added/deleted at different times and in different order.
To solve this we save potential offloadable CT flows in the driver,
and every time we receive a callback we check against these saved flows
for valid merges. Once we have a valid combination of all three flows
this will be offloaded to the NFP. This is demonstrated in the diagram
below.
+-------------+ +----------+
| pre_ct flow +--------+ | nft flow |
+-------------+ v +------+---+
+----------+ |
| tc_merge +--------+ |
+----------+ v v
+--------------+ ^ +-------------+
| post_ct flow +-------+ +---+nft_tc merge |
+--------------+ | +-------------+
|
|
|
v
Offload to nfp
This series is only up to the point of the pre_ct and post_ct
merges into the tc_merge. Follow up series will continue
to add the nft flows and merging of these flows with the result
of the pre_ct and post_ct merged flows.
Changes since v2:
- nfp: flower-ct: add zone table entry when handling pre/post_ct flows
Fixed another docstring. Should finally have the patch check
environment properly configured now to avoid more of these.
- nfp: flower-ct: add tc merge functionality
Fixed warning found by "kernel test robot <lkp@intel.com>"
Added code comment explaining chain_index comparison
Changes since v1:
- nfp: flower-ct: add ct zone table
Fixed unused variable compile warning
Fixed missing colon in struct description
====================
Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Wed, 2 Jun 2021 11:59:52 +0000 (13:59 +0200)]
nfp: flower-ct: add tc merge functionality
Add merging of pre/post_ct flow rules into the tc_merge table.
Pre_ct flows needs to be merge with post_ct flows and vice versa.
This needs to be done for all flows in the same zone table, as well
as with the wc_zone_table, which is for flows masking out ct_zone
info.
Cleanup is happening when all the tables are cleared up and prints
a warning traceback as this is not expected in the final version.
At this point we are not actually returning success for the offload,
so we do not get any delete requests for flows, so we can't delete
them that way yet. This means that cleanup happens in what would
usually be an exception path.
Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Wed, 2 Jun 2021 11:59:51 +0000 (13:59 +0200)]
nfp: flower-ct: add tc_merge_tb
Add the table required to store the merge result of pre_ct and post_ct
flows. This is just the initial setup and teardown of the table,
the implementation will be in follow-up patches.
Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Wed, 2 Jun 2021 11:59:50 +0000 (13:59 +0200)]
nfp: flower-ct: add a table to map flow cookies to ct flows
Add a hashtable which contains entries to map flow cookies to ct
flow entries. Currently the entries are added and not used, but
follow-up patches will use this for stats updates and flow deletes.
Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Wed, 2 Jun 2021 11:59:49 +0000 (13:59 +0200)]
nfp: flower-ct: add nfp_fl_ct_flow_entries
This commit starts adding the structures and lists that will
be used in follow up commits to enable offloading of conntrack.
Some stub functions are also introduced as placeholders by
this commit.
Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Wed, 2 Jun 2021 11:59:48 +0000 (13:59 +0200)]
nfp: flower-ct: add zone table entry when handling pre/post_ct flows
Start populating the pre/post_ct handler functions. Add a zone entry
to the zone table, based on the zone information from the flow. In
the case of a post_ct flow which has a wildcarded match on the zone
create a special entry.
Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Wed, 2 Jun 2021 11:59:47 +0000 (13:59 +0200)]
nfp: flower-ct: add ct zone table
Add initial zone table to nfp_flower_priv. This table will be used
to store all the information required to offload conntrack.
Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Wed, 2 Jun 2021 11:59:46 +0000 (13:59 +0200)]
nfp: flower-ct: add pre and post ct checks
Add checks to see if a flow is a conntrack flow we can potentially
handle. Just stub out the handling the different conntrack flows.
Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Wed, 2 Jun 2021 11:59:45 +0000 (13:59 +0200)]
nfp: flower: move non-zero chain check
This is in preparation for conntrack offload support which makes
used of different chains. Add explicit checks for conntrack and
non-zero chains in the add_offload path.
Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patch 0 documents the MAPv4/v5.
Patch 1 introduces the MAPv5 and the Inline checksum offload for RX/Ingress.
Patch 2 introduces the MAPv5 and the Inline checksum offload for TX/Egress.
A new checksum header format is used as part of MAPv5.For RX checksum offload,
the checksum is verified by the HW and the validity is marked in the checksum
header of MAPv5. For TX, the required metadata is filled up so hardware can
compute the checksum.
v1->v2:
- Fixed the compilation errors, warnings reported by kernel test robot.
- Checksum header definition is expanded to support big, little endian
formats as mentioned by Jakub.
v2->v3:
- Fixed compilation errors reported by kernel bot for big endian flavor.
v3->v4:
- Made changes to use masks instead of C bit-fields as suggested by Jakub/Alex.
v4->v5:
- Corrected checkpatch errors and warnings reported by patchwork.
v5->v6:
- Corrected the bug identified by Alex and incorporated all his comments.
v6->v7:
- Removed duplicate inclusion of linux/bitfield.h in rmnet_map_data.c
v7->v8:
- Have addressed comments given by JAkub on v7 patches.
- As suggested by Jakub, skb_cow_head() is used instead of expanding
the head directly. This is now done in rmnet_map_egress_handler().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: rmnet: Support for ingress MAPv5 checksum offload
Adding support for processing of MAPv5 downlink packets.
It involves parsing the Mapv5 packet and checking the csum header
to know whether the hardware has validated the checksum and is
valid or not.
Based on the checksum valid bit the corresponding stats are
incremented and skb->ip_summed is marked either CHECKSUM_UNNECESSARY
or left as CHEKSUM_NONE to let network stack revalidate the checksum
and update the respective snmp stats.
Current MAPV1 header has been modified, the reserved field in the
Mapv1 header is now used for next header indication.
Signed-off-by: Sharath Chandra Vurukala <sharathv@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 2 Jun 2021 00:07:56 +0000 (17:07 -0700)]
Merge branch 'iwl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux
Tony Nguyen says:
====================
iwl-next Intel Wired LAN Driver Updates 2021-06-01
This pull request is targeting net-next and rdma-next branches.
These patches have been reviewed by netdev and rdma mailing lists[1].
This series adds RDMA support to the ice driver for E810 devices and
converts the i40e driver to use the auxiliary bus infrastructure
for X722 devices. The PCI netdev drivers register auxiliary RDMA devices
that will bind to auxiliary drivers registered by the new irdma module.
[1] https://lore.kernel.org/netdev/20210520143809.819-1-shiraz.saleem@intel.com/
---
v3:
- ice_aq_add_rdma_qsets(), ice_cfg_vsi_rdma(), ice_[ena|dis]_vsi_rdma_qset(),
and ice_cfg_rdma_fltr() no longer return ice_status
- Remove null check from ice_aq_add_rdma_qsets()
Changes since linked review (v6):
- Removed unnecessary checks in i40e_client_device_register() and
i40e_client_device_unregister()
- Simplified the i40e_client_device_register() API
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Yongjun [Tue, 1 Jun 2021 14:16:35 +0000 (22:16 +0800)]
vrf: Fix a typo
possibile ==> possible
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 1 Jun 2021 14:02:38 +0000 (22:02 +0800)]
igb: Fix -Wunused-const-variable warning
If CONFIG_IGB_HWMON is n, gcc warns:
drivers/net/ethernet/intel/igb/e1000_82575.c:2765:17:
warning: ‘e1000_emc_therm_limit’ defined but not used [-Wunused-const-variable=]
static const u8 e1000_emc_therm_limit[4] = {
^~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/igb/e1000_82575.c:2759:17:
warning: ‘e1000_emc_temp_data’ defined but not used [-Wunused-const-variable=]
static const u8 e1000_emc_temp_data[4] = {
^~~~~~~~~~~~~~~~~~~
Move it into #ifdef block to fix this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 1 Jun 2021 14:01:48 +0000 (22:01 +0800)]
cxgb4: Fix -Wunused-const-variable warning
If CONFIG_PCI_IOV is n, make W=1 warns:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:3909:33:
warning: ‘cxgb4_mgmt_ethtool_ops’ defined but not used [-Wunused-const-variable=]
static const struct ethtool_ops cxgb4_mgmt_ethtool_ops = {
^~~~~~~~~~~~~~~~~~~~~~
Move it into #ifdef block to fix this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/hamradio/bpqether.c:437:36:
warning: ‘bpq_seqops’ defined but not used [-Wunused-const-variable=]
static const struct seq_operations bpq_seqops = {
^~~~~~~~~~
Use #ifdef macro to gurad this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wong Vee Khee [Tue, 1 Jun 2021 13:52:35 +0000 (21:52 +0800)]
net: stmmac: enable platform specific safety features
On Intel platforms, not all safety features are enabled on the hardware.
The current implementation enable all safety features by default. This
will cause mass error and warning printouts after the module is loaded.
Introduce platform specific safety features flag to enable or disable
each safety features.
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nigel Christian [Tue, 1 Jun 2021 13:35:33 +0000 (09:35 -0400)]
NFC: microread: Remove redundant assignment to variable err
In the case MICROREAD_CB_TYPE_READER_ALL clang reports a dead code
warning. The error code assigned to variable err is already passed
to async_cb(). The assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Nigel Christian <nigel.l.christian@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Tue, 1 Jun 2021 13:23:22 +0000 (21:23 +0800)]
net: hdlc: add braces {} to all arms of the statement
Braces {} should be used on all arms of this statement.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Tue, 1 Jun 2021 13:23:21 +0000 (21:23 +0800)]
net: hdlc: move out assignment in if condition
Should not use assignment in if condition.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Tue, 1 Jun 2021 13:23:20 +0000 (21:23 +0800)]
net: hdlc: replace comparison to NULL with "!param"
According to the chackpatch.pl, comparison to NULL could
be written "!param".
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Tue, 1 Jun 2021 13:23:19 +0000 (21:23 +0800)]
net: hdlc: fix an code style issue about EXPORT_SYMBOL(foo)
According to the chackpatch.pl,
EXPORT_SYMBOL(foo); should immediately follow its function/variable.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Tue, 1 Jun 2021 13:23:18 +0000 (21:23 +0800)]
net: hdlc: fix an code style issue about "foo* bar"
Fix the checkpatch error as "foo* bar" and should be "foo *bar".
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Tue, 1 Jun 2021 13:23:17 +0000 (21:23 +0800)]
net: hdlc: add blank line after declarations
This patch fixes the checkpatch error about missing a blank line
after declarations.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Tue, 1 Jun 2021 13:23:16 +0000 (21:23 +0800)]
net: hdlc: remove redundant blank lines
This patch removes some redundant blank lines.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 1 Jun 2021 23:54:42 +0000 (16:54 -0700)]
Merge branch 'act_vlan-allow-modify-zero'
Boris Sukholitko says:
====================
net/sched: act_vlan: Fix modify to allow 0
Currently vlan modification action checks existence of vlan priority by
comparing it to 0. Therefore it is impossible to modify existing vlan
tag to have priority 0.
For example, the following tc command will change the vlan id but will
not affect vlan priority:
tc filter add dev eth1 ingress matchall action vlan modify id 300 \
priority 0 pipe mirred egress redirect dev eth2
The incoming packet on eth1:
ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4
will be changed to:
ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4
although the user has intended to have p == 0.
The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params
and rely on it when deciding to set the priority.
The same flag is used to avoid dumping unset vlan priority.
Change Log:
v3 -> v4:
- revert tcf_vlan_get_fill_size change: total size calculation may race vs dump
v2 -> v3:
- Push assumes that the priority is being set
- tcf_vlan_get_fill_size accounts for priority existence
v1 -> v2:
- Do not dump unset priority and fix tests accordingly
- Test for priority 0 modification
====================
Reviewed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently vlan modification action checks existence of vlan priority by
comparing it to 0. Therefore it is impossible to modify existing vlan
tag to have priority 0.
For example, the following tc command will change the vlan id but will
not affect vlan priority:
tc filter add dev eth1 ingress matchall action vlan modify id 300 \
priority 0 pipe mirred egress redirect dev eth2
The incoming packet on eth1:
ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4
will be changed to:
ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4
although the user has intended to have p == 0.
The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params
and rely on it when deciding to set the priority.
Fixes: 067c2a099ad01579ecb1 (net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action) Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Li [Tue, 1 Jun 2021 09:49:50 +0000 (17:49 +0800)]
NFC: nci: Remove redundant assignment to len
Variable 'len' is set to conn_info->max_pkt_payload_len but this
value is never read as it is overwritten with a new value later on,
hence it is a redundant assignment and can be removed.
Clean up the following clang-analyzer warning:
net/nfc/nci/hci.c:164:3: warning: Value stored to 'len' is never read
[clang-analyzer-deadcode.DeadStores]
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 31 May 2021 16:17:07 +0000 (19:17 +0300)]
net: enetc: catch negative return code from enetc_pf_to_port()
After the refactoring introduced in commit cf73ed9d7458 ("net: enetc:
create a common enetc_pf_to_port helper"), enetc_pf_to_port was coded up
to return -1 in case the passed PCIe device does not have a recognized
BDF.
Make sure the -1 value is checked by the callers, to appease static
checkers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netpoll: don't require irqs disabled in rt kernels
write_msg(netconsole.c:836) calls netpoll_send_udp after a call to
spin_lock_irqsave, which normally disables interrupts; but in PREEMPT_RT
this call just locks an rt_mutex without disabling irqs. In this case,
netpoll_send_udp is called with interrupts enabled.
Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vadym Kochan [Mon, 31 May 2021 14:32:43 +0000 (17:32 +0300)]
net: marvell: prestera: disable events interrupt while handling
There are change in firmware which requires that receiver will
disable event interrupts before handling them and enable them
after finish with handling. Events still may come into the queue
but without receiver interruption.
Signed-off-by: Vadym Kochan <vkochan@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Mon, 31 May 2021 12:48:59 +0000 (20:48 +0800)]
net: neterion: fix doc warnings in s2io.c
Add description for may_sleep to fix the W=1 warnings:
drivers/net/ethernet/neterion/s2io.c:1110: warning: Function parameter or member 'may_sleep' not described in 'init_tti'
drivers/net/ethernet/neterion/s2io.c:3335: warning: Function parameter or member 'may_sleep' not described in 'wait_for_cmd_complete'
drivers/net/ethernet/neterion/s2io.c:4881: warning: Function parameter or member 'may_sleep' not described in 's2io_set_multicast'
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sat, 29 May 2021 16:53:25 +0000 (18:53 +0200)]
netfilter: fix clang-12 fmt string warnings
nf_conntrack_h323_main.c:198:6: warning: format specifies type 'unsigned short' but
xt_AUDIT.c:121:9: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Sat, 29 May 2021 16:50:45 +0000 (18:50 +0200)]
netfilter: nft_set_pipapo_avx2: fix up description warnings
W=1:
net/netfilter/nft_set_pipapo_avx2.c:159: warning: Excess function parameter 'len' description in 'nft_pipapo_avx2_refill'
net/netfilter/nft_set_pipapo_avx2.c:1124: warning: Function parameter or member 'key' not described in 'nft_pipapo_avx2_lookup'
net/netfilter/nft_set_pipapo_avx2.c:1124: warning: Excess function parameter 'elem' description in 'nft_pipapo_avx2_lookup'
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jian Shen [Mon, 31 May 2021 02:38:45 +0000 (10:38 +0800)]
net: hns3: add debugfs support for vlan configuration
Add debugfs support for vlan configuraion. create a single file
"vlan_config" for it, and query it by command "cat vlan_config",
return the result to userspace.
The new display style is below:
$ cat vlan_config
I_PORT_VLAN_FILTER: on
E_PORT_VLAN_FILTER: off
FUNC_ID I_VF_VLAN_FILTER E_VF_VLAN_FILTER PORT_VLAN_FILTER_BYPASS
pf off on off
vf0 off on off
FUNC_ID PVID ACCEPT_TAG1 ACCEPT_TAG2 ACCEPT_UNTAG1 ACCEPT_UNTAG2
pf 0 on on on on
vf0 0 on on on on
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Mon, 31 May 2021 02:38:44 +0000 (10:38 +0800)]
net: hns3: add support for VF modify VLAN filter state
Previously, there is hardware limitation for VF to modify
the VLAN filter state, and the VLAN filter state is default
enabled. Now the limitation has been removed in some device,
so add capability flag to check whether the device supports
modify VLAN filter state. If flag on, user will be able to
modify the VLAN filter state by ethtool -K.
VF needs to send mailbox to request the PF to modify the VLAN
filter state for it.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Mon, 31 May 2021 02:38:43 +0000 (10:38 +0800)]
net: hns3: add query basic info support for VF
There are some features of VF depend on PF, so it's necessary
for VF to know whether PF supports. For compatibility, modify
the mailbox HCLGE_MBX_GET_TCINFO, extend its function, use to
get the basic information of PF, including mailbox api version
and PF capabilities.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Mon, 31 May 2021 02:38:42 +0000 (10:38 +0800)]
net: hns3: add support for modify VLAN filter state
Previously, with hardware limitation, the port VLAN filter are
effective for both PF and its VFs simultaneously, so a single
function is not able to enable/disable separately, and the VLAN
filter state is default enabled. Now some device supports each
function to bypass port VLAN filter, then each function can
switch VLAN filter separately. Add capability flag to check
whether the device supports modify VLAN filter state. If flag
on, user will be able to modify the VLAN filter state by ethtool
-K.
Furtherly, the default VLAN filter state is also changed
according to whether non-zero VLAN used. Then the device can
receive packet with any VLAN tag if only VLAN 0 used.
The function hclge_need_enable_vport_vlan_filter() is used to
help implement above changes. And the VLAN filter handle for
promisc mode can also be simplified by this function.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Mon, 31 May 2021 02:38:41 +0000 (10:38 +0800)]
net: hns3: refine function hclge_set_vf_vlan_cfg()
The struct hclge_vf_vlan_cfg is firstly designed for setting
VLAN filter tag. And it's reused for enable RX VLAN offload
later. It's strange to use member "is_kill" to indicate "enable".
So redefine the struct hclge_vf_vlan_cfg to adapt it.
For there are already 3 subcodes being used in function
hclge_set_vf_vlan_cfg(), use "switch-case" style for each
branch, rather than "if-else". Also simplify the assignment for
each branch to make it more clearly.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Mon, 31 May 2021 02:38:40 +0000 (10:38 +0800)]
net: hns3: remove unnecessary updating port based VLAN
For the PF have called hclge_update_port_base_vlan_cfg() already
before notify VF, it's unnecessary to update port based VLAN again
when received mailbox request from VF.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Part 2 of SJA1105 DSA driver preparation for new switch introduction (SJA1110)
This series is a continuation of:
https://patchwork.kernel.org/project/netdevbpf/cover/20210524131421.1030789-1-olteanv@gmail.com/
even though it isn't the first time these patches are submitted (they
were part of the group previously called "Add NXP SJA1110 support to the
sja1105 DSA driver"):
https://patchwork.kernel.org/project/netdevbpf/cover/20210526135535.2515123-1-vladimir.oltean@nxp.com/
but I broke that up again since these patches are already reviewed, for
the most part. There are no changes compared to v2 and v1.
This series of patches contains:
- an adaptation of the driver to the new "ethernet-ports" OF node name
- an adaptation of the driver to support more than 1 SGMII port
- a generalization of the supported phy_interface_t values per port
- an adaptation to encode SPEED_10, SPEED_100, SPEED_1000 into the
hardware registers differently depending on switch revision
- a consolidation of the PHY interface type used for RGMII and another
one for the API exposed for sja1105_dynamic_config_read()
====================
Vladimir Oltean [Sun, 30 May 2021 22:59:39 +0000 (01:59 +0300)]
net: dsa: sja1105: some table entries are always present when read dynamically
The SJA1105 has a static configuration comprised of a number of tables
with entries. Some of these can be read and modified at runtime as well,
through the dynamic configuration interface.
As a careful reader can notice from the comments in this file, the
software interface for accessing a table entry through the dynamic
reconfiguration is a bit of a no man's land, and varies wildly across
switch generations and even from one kind of table to another.
I have tried my best to come up with a software representation of a
'common denominator' SPI command to access a table entry through the
dynamic configuration interface:
struct sja1105_dyn_cmd {
bool search;
u64 valid; /* must be set to 1 */
u64 rdwrset; /* 0 to read, 1 to write */
u64 errors;
u64 valident; /* 0 if entry is invalid, 1 if valid */
u64 index;
};
Relevant to this patch is the VALIDENT bit, which for READ commands is
populated by the switch and lets us know if we're looking at junk or at
a real table entry.
In SJA1105, the dynamic reconfiguration interface for management routes
has notably not implemented the VALIDENT bit, leading to a workaround to
ignore this field in sja1105_dynamic_config_read(), as it will be set to
zero, but the data is valid nonetheless.
In SJA1110, this pattern has sadly been abused to death, and while there
are many more tables which can be read back over the dynamic config
interface compared to SJA1105, their handling isn't in any way more
uniform. Generally speaking, if there is a single possible entry in a
given table, and loading that table in the static config is mandatory as
per the documentation, then the VALIDENT bit is deemed as redundant and
more than likely not implemented.
So it is time to make the workaround more official, and add a bit to the
flags implemented by dynamic config tables. It will be used by more
tables when SJA1110 support arrives.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sun, 30 May 2021 22:59:38 +0000 (01:59 +0300)]
net: dsa: sja1105: always keep RGMII ports in the MAC role
In SJA1105, the xMII Mode Parameters Table field called PHY_MAC denotes
the 'role' of the port, be it a PHY or a MAC. This makes a difference in
the MII and RMII protocols, but RGMII is symmetric, so either PHY or MAC
settings result in the same hardware behavior.
The SJA1110 is different, and the RGMII ports only work when configured
in MAC mode, so keep the port roles in MAC mode unconditionally.
Why we had an RGMII port in the PHY role in the first place was because
we wanted to have a way in the driver to denote whether RGMII delays
should be applied based on the phy-mode property or not. This is already
done in sja1105_parse_rgmii_delays() based on an intermediary
struct sja1105_dt_port (which contains the port role). So it is a
logical fallacy to use the hardware configuration as a scratchpad for
driver data, it isn't necessary.
We can also remove the gating condition for applying RGMII delays only
for ports in the PHY role. The .setup_rgmii_delay() method looks at
the priv->rgmii_rx_delay[port] and priv->rgmii_tx_delay[port] properties
which are already populated properly (in the case of a port in the MAC
role they are false). Removing this condition generates a few more SPI
writes for these ports (clearing the RGMII delays) which are perhaps
useless for SJA1105P/Q/R/S, where we know that the delays are disabled
by default. But for SJA1110, the firmware on the embedded microcontroller
might have done something funny, so it's always a good idea to clear the
RGMII delays if that's what Linux expects.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sun, 30 May 2021 22:59:37 +0000 (01:59 +0300)]
net: dsa: sja1105: add a translation table for port speeds
In order to support the new speed of 2500Mbps, the SJA1110 has achieved
the great performance of changing the encoding in the MAC Configuration
Table for the port speeds of 10, 100, 1000 compared to SJA1105.
Because this is a common driver, we need a layer of indirection in order
to program the hardware with the right values irrespective of switch
generation.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>