No need to place it in struct net, nfnetlink is a module and usage
doesn't occur in fastpath.
Also remove rcu usage:
Not a single reader of net->nfnl uses rcu accessors.
When exit_batch callbacks are executed the net namespace is already dead
so no calls to these functions are possible anymore (else we'd get NULL
deref crash too).
If the module is removed, then modules that call any of those functions
have been removed too so no calls to nfnl functions are possible either.
The nfnl and nfl_stash pointers in struct net are no longer used, they
will be removed in a followup patch to minimize changes to struct net
(causes rebuild for entire network stack).
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nftables: remove documentation on static functions
Since 4f16d25c68ec ("netfilter: nftables: add nft_parse_register_load()
and use it") and 345023b0db31 ("netfilter: nftables: add
nft_parse_register_store() and use it"), the following functions are not
exported symbols anymore:
Dan Carpenter [Fri, 2 Apr 2021 11:45:44 +0000 (14:45 +0300)]
netfilter: nftables: fix a warning message in nf_tables_commit_audit_collect()
The first argument of a WARN_ONCE() is a condition. This WARN_ONCE()
will only print the table name, and is potentially problematic if the
table name has a %s in it.
Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nftables: add helper function to set the base sequence number
This patch adds a helper function to calculate the base sequence number
field that is stored in the nfnetlink header. Use the helper function
whenever possible.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The spinlock nf_tables_destroy_list_lock is initialized statically.
It is unnecessary to initialize by spin_lock_init().
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: flowtable: dst_check() from garbage collector path
Move dst_check() to the garbage collector path. Stale routes trigger the
flow entry teardown state which makes affected flows go back to the
classic forwarding path to re-evaluate flow offloading.
IPv6 requires the dst cookie to work, store it in the flow_tuple,
otherwise dst_check() always fails.
Fixes: e5075c0badaa ("netfilter: flowtable: call dst_check() to fall back to classic forwarding") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 25 Mar 2021 17:25:12 +0000 (18:25 +0100)]
netfilter: nft_log: perform module load from nf_tables
modprobe calls from the nf_logger_find_get() API causes deadlock in very
special cases because they occur with the nf_tables transaction mutex held.
In the specific case of nf_log, deadlock is via:
A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \
-> pernet_ops rwsem -> wait for C
B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A
C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B
Earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load
the backend module during a transaction.
For nft_log we would have to add a softdep for both nfnetlink_log or
nf_log_syslog, since we do not know in advance which of the two backends
are going to be configured.
This defers the modprobe op until after the transaction mutex is released.
Tested-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With the exception of nfnetlink_log (packet is sent to userspace for
dissection/logging), all of them log to the kernel ringbuffer.
This is the first part of a series to merge all modules except
nfnetlink_log into a single module: nf_log_syslog.
This allows to reduce code. After the series, only two log modules remain:
nfnetlink_log and nf_log_syslog. The latter provides the same
functionality as the old per-af log modules.
This renames nf_log_ipv4 to nf_log_syslog.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yang Yingliang [Tue, 30 Mar 2021 12:55:39 +0000 (20:55 +0800)]
net: mhi: remove pointless conditional before kfree_skb()
It already has null pointer check in kfree_skb(),
remove pointless pointer check before kfree_skb().
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cross time-stamping mechanism used in certain instance of Intel mGbE
may run at different clock frequency in comparison to the clock
frequency used by processor, so we introduce cross T/S frequency
adjustment to ensure TSC calculation is correct when processor got the
cross time-stamps.
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Mar 2021 20:29:39 +0000 (13:29 -0700)]
Merge branch 'rfc8335-probe'
Andreas Roeseler says:
====================
add support for RFC 8335 PROBE
The popular utility ping has several severe limitations, such as the
inability to query specific interfaces on a node and requiring
bidirectional connectivity between the probing and probed interfaces.
RFC 8335 attempts to solve these limitations by creating the new utility
PROBE which is a specialized ICMP message that makes use of the ICMP
Extension Structure outlined in RFC 4884.
This patchset adds definitions for the ICMP Extended Echo Request and
Reply (PROBE) types for both IPV4 and IPV6, adds a sysctl to enable
responses to PROBE messages, expands the list of supported ICMP messages
to accommodate PROBE types, adds ipv6_dev_find into ipv6_stubs, and adds
functionality to respond to PROBE requests.
Changes:
v1 -> v2:
- Add AFI definitions
- Switch to functions such as dev_get_by_name and ip_dev_find to lookup
net devices
v2 -> v3:
Suggested by Willem de Bruijn <willemdebruijn.kernel@gmail.com>
- Add verification of incoming messages before looking up netdev
- Add prefix for PROBE specific defined variables
- Use proc_dointvec_minmax with zero and one for sysctl
- Create struct icmp_ext_echo_iio for parsing incoming packets Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
- Include net/addrconf.h library for ipv6_dev_find
v3 -> v4:
- Use in_addr instead of __be32 for storing IPV4 addresses
- Use IFNAMSIZ to statically allocate space for name in
icmp_ext_echo_iio
Suggested by Willem de Bruijn <willemdebruijn.kernel@gmail.com>
- Use skb_header_pointer to verify fields in incoming message
- Add check to ensure that extobj_hdr.length is valid
- Check to ensure object payload is padded with ASCII NULL characters
when probing by name, as specified by RFC 8335
- Statically allocate buff using IFNAMSIZ
- Add rcu blocking around ipv6_dev_find
- Use __in_dev_get_rcu to access IPV4 addresses of identified
net_device
- Remove check for ICMPV6 PROBE types
v4 -> v5:
- Statically allocate buff to size IFNAMSIZ on declaration
- Remove goto probe in favor of single branch
- Remove strict check for incoming PROBE request padding to nearest
32-bit boundary Reported-by: kernel test robot <lkp@intel.com>
v5 -> v6:
- Add documentation for icmp_echo_enable_probe sysctl
- Remove RCU locking around ipv6_dev_find()
- Assign iio based on ctype
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Roeseler [Tue, 30 Mar 2021 01:45:51 +0000 (18:45 -0700)]
icmp: add response to RFC 8335 PROBE messages
Modify the icmp_rcv function to check PROBE messages and call icmp_echo
if a PROBE request is detected.
Modify the existing icmp_echo function to respond ot both ping and PROBE
requests.
This was tested using a custom modification to the iputils package and
wireshark. It supports IPV4 probing by name, ifindex, and probing by
both IPV4 and IPV6 addresses. It currently does not support responding
to probes off the proxy node (see RFC 8335 Section 2).
The modification to the iputils package is still in development and can
be found here: https://github.com/Juniper-Clinic-2020/iputils.git. It
supports full sending functionality of PROBE requests, but currently
does not parse the response messages, which is why Wireshark is required
to verify the sent and recieved PROBE messages. The modification adds
the ``-e'' flag to the command which allows the user to specify the
interface identifier to query the probed host. An example usage would be
<./ping -4 -e 1 [destination]> to send a PROBE request of ifindex 1 to the
destination node.
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Roeseler [Tue, 30 Mar 2021 01:45:36 +0000 (18:45 -0700)]
net: add support for sending RFC 8335 PROBE messages
Modify the ping_supported function to support PROBE message types. This
allows tools such as the ping command in the iputils package to be
modified to send PROBE requests through the existing framework for
sending ping requests.
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Roeseler [Tue, 30 Mar 2021 01:45:29 +0000 (18:45 -0700)]
net: add sysctl for enabling RFC 8335 PROBE messages
Section 8 of RFC 8335 specifies potential security concerns of
responding to PROBE requests, and states that nodes that support PROBE
functionality MUST be able to enable/disable responses and that
responses MUST be disabled by default
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andre Edich [Mon, 29 Mar 2021 09:45:36 +0000 (11:45 +0200)]
net: phy: lan87xx: fix access to wrong register of LAN87xx
The function lan87xx_config_aneg_ext was introduced to configure
LAN95xxA but as well writes to undocumented register of LAN87xx.
This fix prevents that access.
The function lan87xx_config_aneg_ext gets more suitable for the new
behavior name.
Reported-by: Måns Rullgård <mans@mansr.com> Fixes: 05b35e7eb9a1 ("smsc95xx: add phylib support") Signed-off-by: Andre Edich <andre.edich@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
this is a pull request of 39 patches for net-next/master.
The first two patches update the MAINTAINERS file. One is by me and
removes Dan Murphy from the from m_can and tcan4x5x. The other one is
by Pankaj Sharma and updates the maintainership of the m-can mmio
driver.
The next three patches are by me and update the CAN echo skb handling.
Vincent Mailhol provides 5 patches where Transmitter Delay
Compensation is added CAN bittiming calculation is cleaned up.
The next patch is by me and adds a missing HAS_IOMEM to the grcan
driver.
Michal Simek's patch for the xilinx driver add dev_err_probe()
support.
Arnd Bergmann's patch for the ucan driver fixes a compiler warning.
Stephane Grosjean provides 3 patches for the peak USB drivers, which
add ethtool set_phys_id and CAN one-shot mode.
Xulin Sun's patch removes a not needed return check in the m-can
driver. Torin Cooper-Bennun provides 3 patches for the m-can driver
that add rx-offload support to ensure that skbs are sent from softirq
context. Wan Jiabing's patch for the tcan4x5x driver removes a
duplicate include.
The next 6 patches are by me and target the mcp251xfd driver. They add
devcoredump support, simplify the UINC handling, and add HW timestamp
support.
The remaining 12 patches target the c_can driver. The first 6 are by
me and do generic checkpatch related cleanup work. Dario Binacchi's
patches bring some cleanups and increase the number of usable message
objects from 16 to 64.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Mar 2021 19:59:25 +0000 (12:59 -0700)]
Merge tag 'mlx5-updates-2021-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-03-29
Coexistence of CQE compression and HW PTP time-stamp:
From Aya this series improves mlx5 netdev driver to allow
both mlx5 CQE compression (RX descriptor compression, that saves on PCI
transaction) and HW time-stamp PTP to co-exists.
Prior to this series both features were mutually exclusive due to the
nature of CQE compression which reduces the size of RX descriptor for
the price of trimming some data, such as the time-stamp.
In order to allow CQE compression when PTP time stamping is enabled,
We enable it on the regular performance critical RX queues which will
service all the data path traffic that is not PTP.
PTP traffic will be re-directed to dedicated RX queues on which we will
not enable CQE compression and thus keep the time-stamp intact.
Having both features is critical for systems with low PCI BW, e.g.
Multi-Host.
The series will be adding:
1) Infrastructure to create a dedicated RX queue to service the PTP traffic
2) Flow steering plumbing to capture PTP traffic both UDP packets with
destination port 319 and L2 packets with ethertype 0x88F7
3) Steer PTP traffic to the dedicated RX queue.
4) The feature will be enabled when PTP is being configured via the
already existing PTP IOCTL when CQE compression is active, otherwise
no change to the driver flow.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dario Binacchi [Tue, 2 Mar 2021 21:54:35 +0000 (22:54 +0100)]
can: c_can: add support to 64 message objects
D_CAN controller supports 16, 32, 64 or 128 message objects, comparing
to 32 on C_CAN. AM335x/AM437x Sitara processors and DRA7 SOC all
instantiate a D_CAN controller with 64 message objects, as described
in the "DCAN features" subsection of the CAN chapter of their
technical reference manuals.
The driver policy has been kept unchanged, and as in the previous
version, the first half of the message objects is used for reception
and the second for transmission.
The I/O load is increased only in the case of 64 message objects,
keeping it unchanged in the case of 32. Two 32-bit read accesses are
in fact required, which however remained at 16-bit for configurations
with 32 message objects.
Dario Binacchi [Tue, 2 Mar 2021 21:54:34 +0000 (22:54 +0100)]
can: c_can: prepare to up the message objects number
As pointed by commit c0a9f4d396c9 ("can: c_can: Reduce register
access") the "driver casts the 16 message objects in stone, which is
completely braindead as contemporary hardware has up to 128 message
objects".
The patch prepares the module to extend the number of message objects
beyond the 32 currently managed. This was achieved by transforming the
constants used to manage RX/TX messages into variables without
changing the driver policy.
Dario Binacchi [Tue, 2 Mar 2021 21:54:32 +0000 (22:54 +0100)]
can: c_can: add a comment about IF_RX interface's use
After reading the commit 640916db2bf7 ("can: c_can: Make it SMP safe")
it may sound strange to see the IF_RX interface used by the
can_inval_tx_object function. A comment was added to avoid any
misunderstanding.
can: mcp251xfd: add HW timestamp to RX, TX and error CAN frames
This patch uses the previously added mcp251xfd_skb_set_timestamp()
function to convert the timestamp done by the CAN controller into a
proper skb hw timestamp.
This patch add the HW timestamping infrastructure. The mcp251xfd has a
free running timer of 32 bit width, running at max 40MHz, which wraps
around every 107 seconds. The current timestamp is latched into RX and
TEF objects automatically be the CAN controller.
This patch sets up a cyclecounter, timecounter and delayed worker
infrastructure (which runs every 45 seconds) to convert the timer into
a proper 64 bit based ns timestamp.
| 1f652bb6bae7 can: mcp25xxfd: rx-path: reduce number of SPI core requests to set UINC bit
| 68c0c1c7f966 can: mcp251xfd: tef-path: reduce number of SPI core requests to set UINC bit
the setting of the UINC bit in the TEF and RX FIFO was batched into a
single SPI message consisting of several transfers. All transfers but
the last need to have the cs_change set to 1.
In the original patches the array of prepared transfers is send from
the beginning with the length depending on the number of read TEF/RX
objects. The cs_change of the last transfer is temporarily set to
0 during send.
This patch removes the modification of cs_change by preparing the last
transfer with cs_change to 0 and all other to 1. When sending the SPI
message the driver now starts with an offset into the array, so that
it always ends on the last entry in the array, which has the cs_change
set to 0.
For easier debugging this patch adds dev coredump support to the
driver. A dev coredump is generated in case the chip fails to start or
an error in the interrupt handler is detected.
The dev coredump consists of all chip registers and chip memory, as
well as the driver's internal state of the TEF-, RX- and TX-FIFOs, it
can be analyzed with the mcp251xfd-dump tool of the can-utils:
can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context
For peripheral devices, m_can sent skbs directly from a threaded irq
instead of from a softirq context, breaking the tcan4x5x peripheral
driver completely. This patch transitions the driver to use the
rx-offload helper for peripherals, ensuring the skbs are sent from the
correct context, with h/w timestamping to ensure correct ordering.
can: m_can: m_can_chip_config(): enable and configure internal timestamps
This is a prerequisite for transitioning the m_can driver to rx-offload,
which works best with TX and RX timestamps.
The timestamps provided by M_CAN are 16-bit, timed according to the
nominal bit timing, and may be prescaled by a multiplier up to 16. We
choose the highest prescalar so that the timestamp wraps every 2^20 bit
times, or 209 ms at a bus speed of 5 Mbit/s. Timestamps will have a
precision of 16 bit times.
If the CAN net device has been successfully allocated, its private
data structure is impossible to be empty, remove this redundant error
return judgment.
This patch adds "ONE-SHOT" mode support to the following CAN-USB
PEAK-System GmbH interfaces:
- PCAN-USB X6
- PCAN-USB FD
- PCAN-USB Pro FD
- PCAN-Chip USB
- PCAN-USB Pro
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
[mkl: split into two patches] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Michal Simek [Thu, 4 Feb 2021 12:42:48 +0000 (13:42 +0100)]
can: xilinx_can: Simplify code by using dev_err_probe()
Use already prepared dev_err_probe() introduced by commit a787e5400a1c
("driver core: add device probe log helper").
It simplifies EPROBE_DEFER handling.
Also unify message format for similar error cases.
can: grcan: add missing Kconfig dependency to HAS_IOMEM
On ARCHs without IOMEM support the grcan driver fails to link due to
missing iomem functionality. This patch adds the missing Kconfig
dependency to HAS_IOMEM.
Vincent Mailhol [Sat, 6 Mar 2021 05:40:40 +0000 (14:40 +0900)]
can: bittiming: add CAN_KBPS, CAN_MBPS and CAN_MHZ macros
Add three macro to simplify the readability of big bit timing numbers:
- CAN_KBPS: kilobits per second (one thousand)
- CAN_MBPS: megabits per second (one million)
- CAN_MHZ: megahertz per second (one million)
Vincent Mailhol [Wed, 24 Feb 2021 00:20:08 +0000 (09:20 +0900)]
can: bittiming: add calculation for CAN FD Transmitter Delay Compensation (TDC)
The logic for the tdco calculation is to just reuse the normal sample
point: tdco = sp. Because the sample point is expressed in tenth of
percent and the tdco is expressed in time quanta, a conversion is
needed.
At the end,
ssp = tdcv + tdco
= tdcv + sp.
Another popular method is to set tdco to the middle of the bit:
tdc->tdco = can_bit_time(dbt) / 2
During benchmark tests, we could not find a clear advantages for one
of the two methods.
The tdco calculation is triggered each time the data_bittiming is
changed so that users relying on automated calculation can use the
netlink interface the exact same way without need of new parameters.
For example, a command such as:
ip link set canX type can bitrate 500000 dbitrate 4000000 fd on
would trigger the calculation.
The user using CONFIG_CAN_CALC_BITTIMING who does not want automated
calculation needs to manually set tdco to zero.
For example with:
ip link set canX type can tdco 0 bitrate 500000 dbitrate 4000000 fd on
(if the tdco parameter is provided in a previous command, it will be
overwritten).
If tdcv is set to zero (default), it is automatically calculated by
the transiver for each frame. As such, there is no code in the kernel
to calculate it.
tdcf has no automated calculation functions because we could not
figure out a formula for this parameter.
Vincent Mailhol [Wed, 24 Feb 2021 00:20:06 +0000 (09:20 +0900)]
can: netlink: move '=' operators back to previous line (checkpatch fix)
Fix the warning triggered by having an '=' at the beginning of the
line by moving it back to the previous line. Also replace all
indentations with a single space so that future entries can be more
easily added.
Extract of ./scripts/checkpatch.pl -f drivers/net/can/dev/netlink.c:
CHECK: Assignment operator '=' should be on the previous line
+ [IFLA_CAN_BITTIMING_CONST]
+ = { .len = sizeof(struct can_bittiming_const) },
CHECK: Assignment operator '=' should be on the previous line
+ [IFLA_CAN_DATA_BITTIMING]
+ = { .len = sizeof(struct can_bittiming) },
CHECK: Assignment operator '=' should be on the previous line
+ [IFLA_CAN_DATA_BITTIMING_CONST]
+ = { .len = sizeof(struct can_bittiming_const) },
Vincent Mailhol [Wed, 24 Feb 2021 00:20:04 +0000 (09:20 +0900)]
can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)
At high bit rates, the propagation delay from the TX pin to the RX pin
of the transceiver causes measurement errors: the sample point on the
RX pin might occur on the previous bit.
This issue is addressed in ISO 11898-1 section 11.3.3 "Transmitter
delay compensation" (TDC).
This patch adds two new structures: can_tdc and can_tdc_const in order
to implement this TDC.
The structures are then added to can_priv.
A controller supports TDC if an only if can_priv::tdc_const is not
NULL.
TDC is active if and only if:
- fd flag is on
- can_priv::tdc.tdco is not zero.
It is the driver responsibility to check those two conditions are met.
No new controller modes are introduced (i.e. no CAN_CTRL_MODE_TDC) in
order not to be redundant with above logic.
The names of the parameters are chosen to match existing CAN
controllers specification. References:
- Bosch C_CAN FD8:
https://www.bosch-semiconductors.com/media/ip_modules/pdf_2/c_can_fd8/users_manual_c_can_fd8_r210_1.pdf
- Microchip CAN FD Controller Module:
http://ww1.microchip.com/downloads/en/DeviceDoc/MCP251XXFD-CAN-FD-Controller-Module-Family-Reference-Manual-20005678B.pdf
- SAM E701/S70/V70/V71 Family:
https://www.mouser.com/datasheet/2/268/60001527A-1284321.pdf
can: dev: can_free_echo_skb(): extend to return can frame length
In order to implement byte queue limits (bql) in CAN drivers, the
length of the CAN frame needs to be passed into the networking stack
even if the transmission failed for some reason.
To avoid to calculate this length twice, extend can_free_echo_skb() to
return that value. Convert all users of this function, too.
This patch is the natural extension of commit:
| 9420e1d495e2 ("can: dev: can_get_echo_skb(): extend to return can
| frame length")
So far the creation of the TX echo skb was optional and can be
controlled by the local sender of a CAN frame.
It turns out that the TX echo CAN skb can be piggybacked to carry
information in the driver from the TX- to the TX-complete handler.
Several drivers already use the return value of
can_get_echo_skb() (which is the length of the data field in the CAN
frame) for their number of transferred bytes statistics. The
statistics are not working if CAN echo skbs are disabled.
Another use case is to calculate and set the CAN frame length on the
wire, which is needed for BQL support in both the TX and TX-completion
handler.
For now in can_put_echo_skb(), which is called from the TX handler,
the skb carrying the CAN frame is discarded if no TX echo is
requested, leading to the above illustrated problems.
This patch changes the can_put_echo_skb() function, so that the echo
skb is always generated. If the sender requests no echo, the echo skb
is consumed in __can_get_echo_skb() without being passed into the RX
handler of the networking stack, but the CAN data length and CAN frame
length information is properly returned.
Aya Levin [Wed, 13 Jan 2021 07:54:22 +0000 (09:54 +0200)]
net/mlx5e: Update ethtool setting of CQE compression
Remove restriction blocking configuration of CQE compression when PTP rx
filter is set. Instead turn on indication for RX PTP, and try to reopen
the channels.
Aya Levin [Wed, 20 Jan 2021 14:59:27 +0000 (16:59 +0200)]
net/mlx5e: Allow coexistence of CQE compression and HW TS PTP
Update setting HW time-stamp to allow coexistence with CQE compression.
Turn on RX PTP indication and try to reopen the channels. On success,
coexistence with CQE compression is enabled. Otherwise, fall-back to
turning off CQE compression.
Aya Levin [Tue, 16 Feb 2021 10:32:48 +0000 (12:32 +0200)]
net/mlx5e: Add PTP Flow Steering support
When opening PTP channel with MLX5E_PTP_STATE_RX set, add the
corresponding flow steering rules. Capture UDP packets with destination
port 319 and L2 packets with ethertype 0x88F7 and steer them into the RQ
of the PTP channel.
Add API that manages the flow steering rules to be used in the following
patches via safe_reopen_channels mechanism.
Aya Levin [Sun, 17 Jan 2021 06:58:04 +0000 (08:58 +0200)]
net/mlx5e: Introduce Flow Steering ANY API
Add a new FS API which captures the ANY traffic from the traffic
classifier into a dedicated FS table. The table consists of a group
matching the ethertype and a must-be-last group which contains a default
rule redirecting the unmatched packets back to the RSS logic.
Aya Levin [Thu, 14 Jan 2021 15:26:35 +0000 (17:26 +0200)]
net/mlx5e: Introduce Flow Steering UDP API
Add a new FS API which captures the UDP traffic from the traffic
classifier into a dedicated FS table. This API handles both UDP over
IPv4 and IPv6 in the same manner. The tables (one for UDPv4 and another
for UDPv6) consist of a group matching the UDP destination port and a
must-be-last group which contains a default rule redirecting the
unmatched packets back to the RSS logic.
Aya Levin [Thu, 21 Jan 2021 07:32:52 +0000 (09:32 +0200)]
net/mlx5e: Cleanup Flow Steering level
Flow Steering levels are used to determine the order between the tables.
As of today, each one of these tables follows the TTC table, and hijacks
its traffic, and cannot be combined together for now. Putting them in
the same layer better reflects the situation.
Aya Levin [Thu, 25 Feb 2021 17:55:20 +0000 (19:55 +0200)]
net:mlx5e: Add PTP-TIR and PTP-RQT
Add PTP-TIR and initiate its RQT to allow PTP-RQ to integrate into the
safe-reopen flow on configuration change. Add rx_ptp_support flag on a
profile and turn it on for ETH driver. With this flag set, create a
redirect-RQT for PTP-RQ.
Aya Levin [Sun, 7 Mar 2021 13:55:04 +0000 (15:55 +0200)]
net/mlx5e: Add PTP-RX statistics
Like PTP-TX, once the PTP-RX is opened, corresponding statistics appear.
Add indication that PTP-RX was ever opened: rx_ptp_opened. If any of the
PTP RX or TX were opened, display the PTP channel's statistics.
Aya Levin [Sun, 7 Mar 2021 13:47:37 +0000 (15:47 +0200)]
net/mlx5e: Add RQ to PTP channel
Enhance PTP channel to allow PTP without disabling CQE compression. Add
RQ, TIR and PTP_RX_STATE to PTP channel. When this bit is set, PTP
channel manages its RQ, and PTP traffic is directed to the PTP-RQ which
is not affected by compression.
Aya Levin [Mon, 11 Jan 2021 14:45:21 +0000 (16:45 +0200)]
net/mlx5e: Add states to PTP channel
Add PTP TX state to PTP channel, which indicates the corresponding SQ is
available. Further patches in the set extend PTP channel to include RQ.
The PTP channel state will be used for separation and coexistence of RX
and TX PTP. Enhance conditions to verify the TX PTP state is set.
Haiyang Zhang [Mon, 29 Mar 2021 23:21:35 +0000 (16:21 -0700)]
hv_netvsc: Add error handling while switching data path
Add error handling in case of failure to send switching data path message
to the host.
Reported-by: Shachar Raindel <shacharr@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 29 Mar 2021 17:40:49 +0000 (10:40 -0700)]
tcp: fix tcp_min_tso_segs sysctl
tcp_min_tso_segs is now stored in u8, so max value is 255.
255 limit is enforced by proc_dou8vec_minmax().
We can therefore remove the gso_max_segs variable.
Fixes: 47996b489bdc ("tcp: convert elligible sysctls to u8") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 29 Mar 2021 19:25:22 +0000 (12:25 -0700)]
sit: proper dev_{hold|put} in ndo_[un]init methods
After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger
a warning [1]
Issue here is that:
- all dev_put() should be paired with a corresponding prior dev_hold().
- A driver doing a dev_put() in its ndo_uninit() MUST also
do a dev_hold() in its ndo_init(), only when ndo_init()
is returning 0.
Otherwise, register_netdevice() would call ndo_uninit()
in its error path and release a refcount too soon.
Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 29 Mar 2021 18:39:51 +0000 (11:39 -0700)]
ip6_gre: proper dev_{hold|put} in ndo_[un]init methods
After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger
a warning [1]
Issue here is that:
- all dev_put() should be paired with a corresponding dev_hold(),
and vice versa.
- A driver doing a dev_put() in its ndo_uninit() MUST also
do a dev_hold() in its ndo_init(), only when ndo_init()
is returning 0.
Otherwise, register_netdevice() would call ndo_uninit()
in its error path and release a refcount too soon.
ip6_gre for example (among others problematic drivers)
has to use dev_hold() in ip6gre_tunnel_init_common()
instead of from ip6gre_newlink_common(), covering
both ip6gre_tunnel_init() and ip6gre_tap_init()/
Note that ip6gre_tunnel_init_common() is not called from
ip6erspan_tap_init() thus we also need to add a dev_hold() there,
as ip6erspan_tunnel_uninit() does call dev_put()
Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Mar 2021 23:27:54 +0000 (16:27 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2021-03-29
This series contains updates to igc driver only.
Andre Guedes says:
Add XDP support for the igc driver. The approach implemented by this
series follows the same approach implemented in other Intel drivers as
much as possible for the sake of consistency.
The series is organized in two parts. In the first part, i.e. patches
from 1 to 4, igc_main.c and igc_ptp.c code is refactored in preparation
for landing the XDP support, which is introduced in the second part
(patches from 5 to 8).
As far as code organization is concerned, XDP-related helpers are
defined in a new file, igc_xdp.c, and are called by igc_main.c.
The features added by this series have been tested with the samples
provided in samples/bpf/: xdp1, xdp2, xdp_redirect_cpu, and
xdp_redirect_map.
Upcoming series will add support of UMEM and zero-copy features from
AF_XDP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Loic Poulain [Mon, 29 Mar 2021 15:39:32 +0000 (17:39 +0200)]
net: mhi: Allow decoupled MTU/MRU
MBIM protocol makes the mhi network interface asymmetric, ingress data
received from MHI is MBIM protocol, possibly containing multiple
aggregated IP packets, while egress data received from network stack is
IP protocol.
This changes allows a 'protocol' to specify its own MRU, that when
specified is used to allocate MHI RX buffers (skb).
For MBIM, Set the default MTU to 1500, which is the usual network MTU
for WWAN IP packets, and MRU to 3.5K (for allocation efficiency),
allowing skb to fit in an usual 4K page (including padding,
skb_shared_info, ...).
Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Loic Poulain [Mon, 29 Mar 2021 15:39:31 +0000 (17:39 +0200)]
net: mhi: Add support for non-linear MBIM skb processing
Currently, if skb is non-linear, due to MHI skb chaining, it is
linearized in MBIM RX handler prior MBIM decoding, causing extra
allocation and copy that can be as large as the maximum MBIM frame
size (32K).
This change introduces MBIM decoding for non-linear skb, allowing to
process 'large' non-linear MBIM packets without skb linearization.
The IP packets are simply extracted from the MBIM frame using the
skb_copy_bits helper.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 29 Mar 2021 11:23:54 +0000 (12:23 +0100)]
ieee802154: hwsim: remove redundant initialization of variable res
The variable res is being initialized with a value that is
never read and it is being updated later with a new value.
The initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 29 Mar 2021 15:57:31 +0000 (17:57 +0200)]
Documentation: net: Document resilient next-hop groups
Add a document describing the principles behind resilient next-hop groups,
and some notes about how to configure and offload them.
Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Mon, 29 Mar 2021 12:44:27 +0000 (20:44 +0800)]
net: mdio: Correct function name mdio45_links_ok() in comment
Fix the following make W=1 kernel build warning:
drivers/net/mdio.c:95: warning: expecting prototype for mdio_link_ok(). Prototype was for mdio45_links_ok() instead
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Mon, 29 Mar 2021 12:42:57 +0000 (20:42 +0800)]
net: bonding: Correct function name bond_change_active_slave() in comment
Fix the following make W=1 kernel build warning:
drivers/net/bonding/bond_main.c:982: warning: expecting prototype for change_active_interface(). Prototype was for bond_change_active_slave() instead
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Mon, 29 Mar 2021 12:40:46 +0000 (20:40 +0800)]
net: phy: Correct function name mdiobus_register_board_info() in comment
Fix the following make W=1 kernel build warning:
drivers/net/phy/mdio-boardinfo.c:63: warning: expecting prototype for mdio_register_board_info(). Prototype was for mdiobus_register_board_info() instead
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Mar 2021 20:37:26 +0000 (13:37 -0700)]
Merge branch 'mlxsw-sampling-fixes'
Ido Schimmel says:
====================
mlxsw: Two sampling fixes
This patchset fixes two bugs in recent sampling submissions.
The first fix, in patch #3, prevents matchall rules with sample action
to be added in front of flower rules on egress. Patches #1-#2 are
preparations meant at avoiding similar bugs in the future. Patch #4 is a
selftest.
The second fix, in patch #5, prevents sampling from being enabled on a
port if already enabled. Patch #6 is a selftest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>