git.baikalelectronics.ru Git - kernel.git/log
4 years ago Merge branch 'dpaa2-eth-send-a-scatter-gather-FD-instead-of-realloc-ing'
David S. Miller [Tue, 30 Jun 2020 00:42:48 +0000 (17:42 -0700)]
Merge branch 'dpaa2-eth-send-a-scatter-gather-FD-instead-of-realloc-ing'

Ioana Ciornei says:

====================
dpaa2-eth: send a scatter-gather FD instead of realloc-ing

This patch set changes the behaviour when the Tx path is confronted
with an SKB with insufficient headroom for our hardware necessities (SW
annotation area). In the first patch, instead of realloc-ing the SKB we
now send an S/G frame descriptor, while the second one adds a new
software-held counter to account for these types of frames.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago dpaa2-eth: add software counter for Tx frames converted to S/G
Ioana Ciornei [Mon, 29 Jun 2020 18:47:12 +0000 (21:47 +0300)]
dpaa2-eth: add software counter for Tx frames converted to S/G

With the previous commit, in case of insufficient SKB headroom on the Tx
path, instead of realloc-ing the SKB we now send an S/G frame descriptor.
Export the number of occurrences of this case as a per-CPU counter (in
debugfs) and as a total in the ethtool statistics: 'tx converted sg
frames'.
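
The relationship between the per-CPU view and the single total can be sketched in a small userspace model. The struct and function names here are hypothetical and are not taken from the driver; the sketch only shows that the total is the sum of the per-CPU slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU statistics slot; each CPU increments only its own. */
struct tx_stats {
    uint64_t tx_converted_sg_frames;
};

/* Sum the per-CPU slots into the single total that a statistics dump
 * would report. */
static uint64_t total_converted_sg(const struct tx_stats *percpu, size_t ncpus)
{
    uint64_t total = 0;

    for (size_t i = 0; i < ncpus; i++)
        total += percpu[i].tx_converted_sg_frames;
    return total;
}
```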

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago dpaa2-eth: send a scatter-gather FD instead of realloc-ing
Ioana Ciornei [Mon, 29 Jun 2020 18:47:11 +0000 (21:47 +0300)]
dpaa2-eth: send a scatter-gather FD instead of realloc-ing

Instead of realloc-ing the skb on the Tx path when the provided headroom
is smaller than the HW requirements, create a Scatter/Gather frame
descriptor with only one entry.

Remove the '[drv] tx realloc frames' counter exposed previously through
ethtool since it is no longer used.
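
The choice described above can be sketched as a small userspace model. The `NEEDED_HEADROOM` value and the enum names are assumptions made for illustration, not values from the driver.

```c
#include <stddef.h>

/* Hypothetical headroom the hardware needs for its SW annotation area. */
#define NEEDED_HEADROOM 64u

enum fd_format {
    FD_SINGLE,  /* one contiguous buffer, annotation in the headroom */
    FD_SG       /* scatter/gather table with a single entry */
};

/* Choose the frame-descriptor format for a linear buffer. */
static enum fd_format choose_fd_format(size_t headroom)
{
    /* Enough room: the annotation fits in front of the data. */
    if (headroom >= NEEDED_HEADROOM)
        return FD_SINGLE;

    /* Too little room: describe the buffer through a one-entry S/G
     * table instead of reallocating it. */
    return FD_SG;
}
```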

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago Merge branch 'sfc-prerequisites-for-EF100-driver-part-1'
David S. Miller [Tue, 30 Jun 2020 00:37:49 +0000 (17:37 -0700)]
Merge branch 'sfc-prerequisites-for-EF100-driver-part-1'

Edward Cree says:

====================
sfc: prerequisites for EF100 driver, part 1

This continues the work started by Alex Maftei <amaftei@solarflare.com>
 in the series "sfc: code refactoring", "sfc: more code refactoring",
 "sfc: even more code refactoring" and "sfc: refactor mcdi filtering
 code", to prepare for a new driver which will share much of the code
 to support the new EF100 family of Solarflare/Xilinx NICs.
After this series, there will be approximately two more of these
 'prerequisites' series, followed by the sfc_ef100 driver itself.

v2: fix reverse xmas tree in patch 5.  (Left the cases in patches 7,
 9 and 14 alone as those are all in pure movement of existing code.)
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: extend common GRO interface to support CHECKSUM_COMPLETE
Edward Cree [Mon, 29 Jun 2020 13:36:56 +0000 (14:36 +0100)]
sfc: extend common GRO interface to support CHECKSUM_COMPLETE

EF100 will use CHECKSUM_COMPLETE, but will also make use of
 efx_rx_packet_gro(), thus needs to be able to pass the checksum value
 into that function.
Drivers for older NICs pass in a csum of 0 to get the old semantics (use
 the RX flags for CHECKSUM_UNNECESSARY marking).
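
A minimal sketch of the semantics described above, assuming a nonzero csum selects the CHECKSUM_COMPLETE path and a zero csum falls back to the flag-based marking. The enum, struct, and function names are local stand-ins, not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's ip_summed values. */
enum csum_mode { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE };

struct rx_result {
    enum csum_mode mode;
    uint32_t csum;
};

/* csum != 0: the caller supplies a full checksum (new semantics).
 * csum == 0: old semantics, decided by the RX "checksum validated" flag. */
static struct rx_result classify_rx_csum(uint32_t csum, bool hw_validated)
{
    struct rx_result r = { CSUM_NONE, 0 };

    if (csum) {
        r.mode = CSUM_COMPLETE;
        r.csum = csum;
    } else if (hw_validated) {
        r.mode = CSUM_UNNECESSARY;
    }
    return r;
}
```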

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: commonise ARFS handling
Edward Cree [Mon, 29 Jun 2020 13:36:33 +0000 (14:36 +0100)]
sfc: commonise ARFS handling

EF100 will use the same approach to ARFS as EF10.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: commonise drain event handling
Edward Cree [Mon, 29 Jun 2020 13:39:32 +0000 (14:39 +0100)]
sfc: commonise drain event handling

Avoids a call from generic MCDI code into ef10.c.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: commonise PCI error handlers
Edward Cree [Mon, 29 Jun 2020 13:35:41 +0000 (14:35 +0100)]
sfc: commonise PCI error handlers

EF100 will use the same mechanisms for PCI error recovery.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: track which BAR is mapped
Edward Cree [Mon, 29 Jun 2020 13:35:33 +0000 (14:35 +0100)]
sfc: track which BAR is mapped

EF100 needs to map multiple BARs (sequentially, not concurrently) in
 order to read the Function Control Window during probe.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: commonise FC advertising
Edward Cree [Mon, 29 Jun 2020 13:35:25 +0000 (14:35 +0100)]
sfc: commonise FC advertising

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: commonise other ethtool bits
Edward Cree [Mon, 29 Jun 2020 13:35:15 +0000 (14:35 +0100)]
sfc: commonise other ethtool bits

A few more ethtool handlers which EF100 will share.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: commonise ethtool NFC and RXFH/RSS functions
Edward Cree [Mon, 29 Jun 2020 13:35:05 +0000 (14:35 +0100)]
sfc: commonise ethtool NFC and RXFH/RSS functions

EF100 will share EF10's model of filtering, hashing and spreading.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: commonise ethtool link handling functions
Edward Cree [Mon, 29 Jun 2020 13:34:50 +0000 (14:34 +0100)]
sfc: commonise ethtool link handling functions

Link speeds, FEC, and autonegotiation are all things EF100 will share.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: split up nic.h
Edward Cree [Mon, 29 Jun 2020 13:34:39 +0000 (14:34 +0100)]
sfc: split up nic.h

The new nic_common.h contains the inlines for NIC-type function dispatch,
 declarations for NIC-generic functions in nic.c, and other similar
 NIC-generic functionality.  Retained in nic.h are NIC-specific declarations
 such as the siena and ef10 nic_data structs and various farch functions.

The EF100 driver will thus include nic_common.h but not nic.h.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: refactor EF10 stats handling
Edward Cree [Mon, 29 Jun 2020 13:34:20 +0000 (14:34 +0100)]
sfc: refactor EF10 stats handling

Separate the generation-count handling from the format conversion, to
 make it easier to re-use both for EF100.
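
The separation can be sketched as two independent steps. This is a single-threaded model under stated assumptions: the buffer layout, the counter order, and all names are hypothetical, and real code reading a buffer updated by hardware would also need memory barriers around the copy.

```c
#include <stdint.h>
#include <string.h>

#define N_STATS 3

/* Hypothetical stats buffer: the writer bumps gen_start before an update
 * and gen_end after it, so the two match only when the data is stable. */
struct stats_buf {
    uint64_t gen_start;
    uint64_t stat[N_STATS];
    uint64_t gen_end;
};

/* Hypothetical driver-side view of the same counters. */
struct nic_stats {
    uint64_t rx_packets;
    uint64_t tx_packets;
    uint64_t rx_drops;
};

/* Step 1: generation-count handling. Returns 0 with a consistent copy in
 * out[], or -1 if the writer was in the middle of an update. */
static int stats_snapshot(const struct stats_buf *buf, uint64_t out[N_STATS])
{
    uint64_t end = buf->gen_end;

    memcpy(out, buf->stat, sizeof(buf->stat));
    return buf->gen_start == end ? 0 : -1;
}

/* Step 2: format conversion. Knows nothing about generation counts. */
static void stats_convert(const uint64_t raw[N_STATS], struct nic_stats *out)
{
    out->tx_packets = raw[0];
    out->rx_packets = raw[1];
    out->rx_drops = raw[2];
}
```

Keeping the two steps apart means a different NIC family could reuse the snapshot logic with its own conversion, or the other way round.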

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: don't try to create more channels than we can have VIs
Edward Cree [Mon, 29 Jun 2020 13:33:44 +0000 (14:33 +0100)]
sfc: don't try to create more channels than we can have VIs

Calculate efx->max_vis at probe time, and check against it in
 efx_allocate_msix_channels() when considering whether to create XDP TX
 channels.
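
A minimal sketch of such a check, assuming that XDP TX channels are dropped entirely when they would exceed the limit. The function name and the all-or-nothing policy are assumptions for illustration, not the driver's code.

```c
/* Return how many XDP TX channels may be created on top of the normal
 * channels without exceeding max_vis; in this sketch, either all of the
 * requested ones or none. */
static unsigned int xdp_tx_channels_allowed(unsigned int n_channels,
                                            unsigned int n_xdp_tx,
                                            unsigned int max_vis)
{
    if (n_channels + n_xdp_tx > max_vis)
        return 0;
    return n_xdp_tx;
}
```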

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: extend bitfield macros up to POPULATE_DWORD_13
Edward Cree [Mon, 29 Jun 2020 13:33:03 +0000 (14:33 +0100)]
sfc: extend bitfield macros up to POPULATE_DWORD_13

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: determine flag word automatically in efx_has_cap()
Edward Cree [Mon, 29 Jun 2020 13:32:46 +0000 (14:32 +0100)]
sfc: determine flag word automatically in efx_has_cap()

Now that we have an _OFST definition for each individual flag bit,
 callers of efx_has_cap() don't need to specify which flag word it's
 in; we can just use the flag name directly in MCDI_CAPABILITY_OFST.
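
The idea can be sketched with token pasting: once every flag has both a bit number and the offset of its containing word, a macro can derive both from the flag name alone. The `CAP_*` definitions, `check_cap()`, and `has_cap()` below are hypothetical stand-ins, not the generated MCDI definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generated definitions: each flag has a bit number (_LBN)
 * and the byte offset of the 32-bit flag word that contains it (_OFST). */
#define CAP_RX_TIMESTAMP_OFST 0
#define CAP_RX_TIMESTAMP_LBN  3
#define CAP_TX_VLAN_OFST      4
#define CAP_TX_VLAN_LBN       1

/* Test one bit of the 32-bit word found at byte offset ofst. */
static int check_cap(const uint8_t *caps, size_t ofst, unsigned int lbn)
{
    uint32_t word;

    memcpy(&word, caps + ofst, sizeof(word));
    return (word >> lbn) & 1u;
}

/* The caller names only the flag; word offset and bit are pasted in. */
#define has_cap(caps, flag) \
    check_cap((caps), CAP_##flag##_OFST, CAP_##flag##_LBN)
```

With this shape a call site reads `has_cap(caps, TX_VLAN)` and cannot pair a flag with the wrong word.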

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago sfc: update MCDI protocol headers
Edward Cree [Mon, 29 Jun 2020 13:32:31 +0000 (14:32 +0100)]
sfc: update MCDI protocol headers

The script used to generate these now includes _OFST definitions for
 flags, to identify the containing flag word.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net:qos: police action offloading parameter 'burst' change to the original value
Po Liu [Mon, 29 Jun 2020 06:54:16 +0000 (14:54 +0800)]
net:qos: police action offloading parameter 'burst' change to the original value

Since 'tcfp_burst' is stored scaled by the TICK factor, drivers always need
to convert it back to the original value. This patch moves that generic
calculation out of the drivers, so that 'burst' is recovered to its
original value before it is offloaded to the device driver.
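
A worked sketch of such a conversion, under the assumption that the stored value is the time, in nanoseconds, needed to send the burst at the configured rate. That interpretation, the function name, and the parameter names are assumptions for illustration; the message above does not spell out the formula.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Recover a burst size in bytes from a burst expressed as transmission
 * time (ns) at the given rate (bytes per second). */
static uint32_t burst_bytes(uint64_t burst_ns, uint64_t rate_bytes_ps)
{
    return (uint32_t)(burst_ns * rate_bytes_ps / NSEC_PER_SEC);
}
```

For example, 1 ms at 125,000,000 bytes/s (1 Gbit/s) corresponds to 125,000 bytes.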

Signed-off-by: Po Liu <po.liu@nxp.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago Merge branch 'MPTCP-improve-fallback-to-TCP'
David S. Miller [Tue, 30 Jun 2020 00:29:38 +0000 (17:29 -0700)]
Merge branch 'MPTCP-improve-fallback-to-TCP'

Davide Caratti says:

====================
MPTCP: improve fallback to TCP

There are situations where MPTCP sockets should fall back to regular TCP.
This series reworks the fallback code to pursue the following goals:

1) clean up the non-fallback code, removing most of 'if (<fallback>)' in
   the data path
2) improve performance for non-fallback sockets, avoiding locks in poll()

Further work will also leverage these changes to achieve:

a) more consistent behavior of getsockopt()/setsockopt() on passive sockets
   after fallback
b) support for "infinite maps" as per RFC8684, section 3.7

the series is made of the following items:

- patch 1 lets sendmsg() / recvmsg() / poll() use the main socket also
  after fallback
- patch 2 fixes 'simultaneous connect' scenario after fallback. The
  problem was present also before the rework, but the fix is much easier
  to implement after patch 1
- patches 3, 4 and 5 are clean-ups for code that is no longer needed after
  the fallback rework
- patch 6 fixes a race condition between close() and poll(). The problem
  was theoretically present before the rework, but it became almost
  systematic after patch 1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago mptcp: close poll() races
Paolo Abeni [Mon, 29 Jun 2020 20:26:25 +0000 (22:26 +0200)]
mptcp: close poll() races

mptcp_poll() always returns POLLOUT for unblocking
connect(); ensure that the socket is in a suitable
state.
The MPTCP_DATA_READY bit is never cleared on accept:
ensure we don't leave mptcp_accept() with an empty
accept queue and such bit set.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago mptcp: __mptcp_tcp_fallback() returns a struct sock
Paolo Abeni [Mon, 29 Jun 2020 20:26:24 +0000 (22:26 +0200)]
mptcp: __mptcp_tcp_fallback() returns a struct sock

Currently __mptcp_tcp_fallback() always returns NULL
on incoming connections, because MPTCP does not create
the additional socket for the first subflow.
Since the previous commit, no __mptcp_tcp_fallback()
caller needs a struct socket, so let __mptcp_tcp_fallback()
return the first subflow sock and cope correctly even with
incoming connections.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago mptcp: create first subflow at msk creation time
Paolo Abeni [Mon, 29 Jun 2020 20:26:23 +0000 (22:26 +0200)]
mptcp: create first subflow at msk creation time

This cleans the code a bit and makes the behavior more consistent.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago mptcp: check for plain TCP sock at accept time
Paolo Abeni [Mon, 29 Jun 2020 20:26:22 +0000 (22:26 +0200)]
mptcp: check for plain TCP sock at accept time

This cleans up the code a bit and avoids corrupted states
on weird syscall sequences (accept(), connect()).

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago mptcp: fallback in case of simultaneous connect
Davide Caratti [Mon, 29 Jun 2020 20:26:21 +0000 (22:26 +0200)]
mptcp: fallback in case of simultaneous connect

When an MPTCP client tries to connect to itself, tcp_finish_connect() is
never reached. Because of this, depending on the socket current state,
multiple faulty behaviours can be observed:

1) a WARN_ON() in subflow_data_ready() is hit
 WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
 [...]
 CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
 [...]
 RIP: 0010:subflow_data_ready+0x18b/0x230
 [...]
 Call Trace:
  tcp_data_queue+0xd2f/0x4250
  tcp_rcv_state_process+0xb1c/0x49d3
  tcp_v4_do_rcv+0x2bc/0x790
  __release_sock+0x153/0x2d0
  release_sock+0x4f/0x170
  mptcp_shutdown+0x167/0x4e0
  __sys_shutdown+0xe6/0x180
  __x64_sys_shutdown+0x50/0x70
  do_syscall_64+0x9a/0x370
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

2) client is stuck forever in mptcp_sendmsg() because the socket is not
   TCP_ESTABLISHED

 crash> bt 4847
 PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
  #0 [ffff8881376ff680] __schedule at ffffffff97248da4
  #1 [ffff8881376ff778] schedule at ffffffff9724a34f
  #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
  #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
  #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
  #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
  #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
  #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
  #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
  #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
 #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
 #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
 #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
     RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
     RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
     RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
     RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
     R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
     R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake
   didn't complete.

 $ tcpdump -tnnr bad.pcap
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0

Force a fallback to TCP in these cases, and adjust the main socket
state to avoid hanging in mptcp_sendmsg().

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35
Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: mptcp: improve fallback to TCP
Davide Caratti [Mon, 29 Jun 2020 20:26:20 +0000 (22:26 +0200)]
net: mptcp: improve fallback to TCP

Keep using MPTCP sockets and use a "dummy mapping" in case of fallback
to regular TCP. When fallback is triggered, skip addition of the MPTCP
option on send.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: phy: marvell10g: support XFI rate matching mode
Baruch Siach [Sun, 28 Jun 2020 07:04:51 +0000 (10:04 +0300)]
net: phy: marvell10g: support XFI rate matching mode

When the hardware MACTYPE configuration pins are set to "XFI with Rate
Matching", the PHY interface operates at a fixed 10Gbps speed. The
MAC buffers packets in both directions to match various wire speeds.

Read the MAC Type field in the Port Control register, and set the MAC
interface speed accordingly.
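
A minimal sketch of that selection, assuming a 3-bit MAC Type field. The mask, the rate-matching field value, the enum names, and the fallback mapping from wire speed to interface mode are all hypothetical and are not taken from the PHY datasheet or the driver.

```c
/* Hypothetical MAC-side interface modes. */
enum mac_if { IF_SGMII, IF_2500BASEX, IF_10GBASER };

#define MACTYPE_MASK       0x7u  /* hypothetical field mask */
#define MACTYPE_RATE_MATCH 0x6u  /* hypothetical "XFI with Rate Matching" */

/* Pick the MAC-side interface from the Port Control register value and
 * the negotiated wire speed. */
static enum mac_if mac_interface(unsigned int port_ctrl, int wire_speed_mbps)
{
    /* Rate matching: the MAC side stays at 10G whatever the wire speed. */
    if ((port_ctrl & MACTYPE_MASK) == MACTYPE_RATE_MATCH)
        return IF_10GBASER;

    /* Otherwise the MAC interface follows the wire speed (assumed map). */
    if (wire_speed_mbps == 10000)
        return IF_10GBASER;
    if (wire_speed_mbps == 2500)
        return IF_2500BASEX;
    return IF_SGMII;
}
```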

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago Merge tag 'mlx5-tls-2020-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Tue, 30 Jun 2020 00:18:40 +0000 (17:18 -0700)]
Merge tag 'mlx5-tls-2020-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-tls-2020-06-26

1) Improve hardware layouts and structure for kTLS support

2) Generalize ICOSQ (Internal Channel Operations Send Queue)
Due to the asynchronous nature of adding new kTLS flows and handling
HW asynchronous kTLS resync requests, the XSK ICOSQ was extended to
support generic async operations, such as kTLS add flow and resync, in
addition to the existing XSK usages.

3) kTLS hardware flow steering and classification:
The driver already has the means to classify TCP ipv4/6 flows and send them
to the corresponding RSS HW engine. As reflected in patches 3 through 5,
the series adds a steering layer that hooks into the driver's TCP
classifiers and matches on well-known kTLS connections. On a match,
traffic is redirected to the kTLS decryption engine; otherwise
traffic continues flowing normally to the TCP RSS engine.

3) kTLS add flow RX HW offload support
New offload contexts post their static/progress params WQEs
(Work Queue Element) to communicate the newly added kTLS contexts
over the per-channel async ICOSQ.

The Channel/RQ is selected according to the socket's rxq index.

A new TLS-RX workqueue is used to allow asynchronous addition of
steering rules, out of the NAPI context.
It will also be used in a downstream patch in the resync procedure.

Feature is OFF by default. Can be turned on by:
$ ethtool -K <if> tls-hw-rx-offload on

4) Added mlx5 kTLS sw stats and new counters are documented in
Documentation/networking/tls-offload.rst
rx_tls_ctx - number of TLS RX HW offload contexts added to device for
decryption.

rx_tls_ooo - number of RX packets which were part of a TLS stream
but did not arrive in the expected order and triggered the resync
procedure.

rx_tls_del - number of TLS RX HW offload contexts deleted from device
(connection has finished).

rx_tls_err - number of RX packets which were part of a TLS stream
 but were not decrypted due to unexpected error in the state machine.

5) Asynchronous RX resync

a. The NIC driver indicates that it would like to resync on some TLS
record within the received packet (P), but the driver does not
know (yet) which of the TLS records within the packet.
At this stage, the NIC driver will query the device to find the exact
TCP sequence for resync (tcpsn), however, the driver does not wait
for the device to provide the response.

b. Eventually, the device responds, and the driver provides the tcpsn
within the resync packet to KTLS. Now, KTLS can check the tcpsn against
any processed TLS records within packet P, and also against any record
that is processed in the future within packet P.

The asynchronous resync path simplifies the device driver, as it can
save bits on the packet completion (32-bit TCP sequence), and pass this
information on an asynchronous command instead.

Performance:
    CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off
    NIC: ConnectX-6 Dx 100GbE dual port

    Goodput (app-layer throughput) comparison:
    +---------------+-------+-------+---------+
    | # connections |   1   |   4   |    8    |
    +---------------+-------+-------+---------+
    | SW (Gbps)     |  7.26 | 24.70 |   50.30 |
    +---------------+-------+-------+---------+
    | HW (Gbps)     | 18.50 | 64.30 |   92.90 |
    +---------------+-------+-------+---------+
    | Speedup       | 2.55x | 2.56x | 1.85x * |
    +---------------+-------+-------+---------+

    * After linerate is reached, diff is observed in CPU util
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago Merge branch 'TC-Introduce-qevents'
David S. Miller [Tue, 30 Jun 2020 00:08:28 +0000 (17:08 -0700)]
Merge branch 'TC-Introduce-qevents'

Petr Machata says:

====================
TC: Introduce qevents

The Spectrum hardware allows execution of one of several actions as a
result of queue management decisions: tail-dropping, early-dropping,
marking a packet, or passing a configured latency threshold or buffer
size. Such packets can be mirrored, trapped, or sampled.

Modeling the action to be taken as simply a TC action is very attractive,
but it is not obvious where to put these actions. At least with ECN marking
one could imagine a tree of qdiscs and classifiers that effectively
accomplishes this task, albeit in an impractically complex manner. But
there is just no way to match on dropped-ness of a packet, let alone
dropped-ness due to a particular reason.

To allow configuring user-defined actions as a result of inner workings of
a qdisc, this patch set introduces a concept of qevents. Those are attach
points for TC blocks, where filters can be put that are executed as the
packet hits well-defined points in the qdisc algorithms. The attached
blocks can be shared, in a manner similar to clsact ingress and egress
blocks, arbitrary classifiers with arbitrary actions can be put on them,
etc.

For example:

red limit 500K avpkt 1K qevent early_drop block 10
matchall action mirred egress mirror dev eth1

The central patch #2 introduces several helpers to allow easy and uniform
addition of qevents to qdiscs: initialization, destruction, qevent block
number change validation, and qevent handling, i.e. dispatch of the filters
attached to the block bound to a qevent.

Patch #1 adds root_lock argument to qdisc enqueue op. The problem this is
tackling is that if a qevent filter pushes packets to the same qdisc tree
that holds the qevent in the first place, attempt to take qdisc root lock
for the second time will lead to a deadlock. To solve the issue, qevent
handler needs to unlock and relock the root lock around the filter
processing. Passing root_lock around makes it possible to get the lock
where it is needed, and visibly so, such that it is obvious the lock will
be used when invoking a qevent.

The following two patches, #3 and #4, then add two qevents to the RED
qdisc: "early_drop" qevent fires when a packet is early-dropped; "mark"
qevent, when it is ECN-marked.

Patch #5 contains a selftest. I have mentioned this test when pushing the
RED ECN nodrop mode and said that "I have no confidence in its portability
to [...] different configurations". That still holds. The backlog and
packet size are tuned to make the test deterministic. But it is better than
nothing, and on the boxes that I ran it on it does work and shows that
qevents work the way they are supposed to, and that their addition has not
broken the other tested features.

This patch set does not deal with offloading. The idea there is that a
driver will be able to figure out that a given block is used in qevent
context by looking at binder type. A future patch-set will add a qdisc
pointer to struct flow_block_offload, which a driver will be able to
consult to glean the TC or other relevant attributes.

Changes from RFC to v1:
- Move a "q = qdisc_priv(sch)" from patch #3 to patch #4
- Fix deadlock caused by mirroring packet back to the same qdisc tree.
- Rename "tail" qevent to "tail_drop".
- Adapt to the new 100-column standard.
- Add a selftest
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago selftests: forwarding: Add a RED test for SW datapath
Petr Machata [Fri, 26 Jun 2020 22:45:29 +0000 (01:45 +0300)]
selftests: forwarding: Add a RED test for SW datapath

This test is inspired by the mlxsw RED selftest. It is much simpler to set
up (also because there is no point in testing PRIO / RED encapsulation). It
tests bare RED, ECN and ECN+nodrop modes of operation. On top of that it
tests RED early_drop and mark qevents.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: sched: sch_red: Add qevents "early_drop" and "mark"
Petr Machata [Fri, 26 Jun 2020 22:45:28 +0000 (01:45 +0300)]
net: sched: sch_red: Add qevents "early_drop" and "mark"

In order to allow acting on dropped and/or ECN-marked packets, add two new
qevents to the RED qdisc: "early_drop" and "mark". Filters attached at
"early_drop" block are executed as packets are early-dropped, those
attached at the "mark" block are executed as packets are ECN-marked.

Two new attributes are introduced: TCA_RED_EARLY_DROP_BLOCK with the block
index for the "early_drop" qevent, and TCA_RED_MARK_BLOCK for the "mark"
qevent. Absence of these attributes signifies "don't care": no block is
allocated in that case, or the existing blocks are left intact in case of
the change callback.

For purposes of offloading, blocks attached to these qevents appear with
newly-introduced binder types, FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP and
FLOW_BLOCK_BINDER_TYPE_RED_MARK.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: sched: sch_red: Split init and change callbacks
Petr Machata [Fri, 26 Jun 2020 22:45:27 +0000 (01:45 +0300)]
net: sched: sch_red: Split init and change callbacks

In the following patches, RED will get two qevents. The implementation will
be clearer if the callback for change is not a pure subset of the callback
for init. Split the two and promote attribute parsing to the callbacks
themselves from the common code, because it will be handy there.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: sched: Introduce helpers for qevent blocks
Petr Machata [Fri, 26 Jun 2020 22:45:26 +0000 (01:45 +0300)]
net: sched: Introduce helpers for qevent blocks

Qevents are attach points for TC blocks, where filters can be put that are
executed when "interesting events" take place in a qdisc. The data to keep
and the functions to invoke to maintain a qevent will be largely the same
between qevents. Therefore introduce sched-wide helpers for qevent
management.

Currently, similarly to ingress and egress blocks of the clsact pseudo-qdisc,
block attachment cannot be changed after the qdisc is created. To that
end, add a helper tcf_qevent_validate_change(), which verifies whether
block index attribute is not attached, or if it is, whether its value
matches the current one (i.e. there is no material change).
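
The validation rule in the paragraph above can be sketched as follows. This is a simplified model with a hypothetical signature: a NULL pointer stands for an absent attribute, whereas the real helper works on netlink attributes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* new_index == NULL models "block index attribute not present".
 * Returns 0 when there is no material change, -EINVAL otherwise. */
static int qevent_validate_change(uint32_t current_index,
                                  const uint32_t *new_index)
{
    if (!new_index)
        return 0;           /* attribute absent: "don't care" */
    if (*new_index != current_index)
        return -EINVAL;     /* changing the block is not supported */
    return 0;               /* same block: nothing to do */
}
```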

The function tcf_qevent_handle() should be invoked when qdisc hits the
"interesting event" corresponding to a block. This function releases root
lock for the duration of executing the attached filters, to allow packets
generated through user actions (notably mirred) to be reinserted to the
same qdisc tree.
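
Why the lock has to be dropped can be seen in a small single-threaded model. The lock here is a plain flag whose acquire asserts that it is not already held, standing in for a non-recursive lock that would deadlock; all names are hypothetical.

```c
#include <assert.h>

/* Model of a non-recursive lock: acquiring it twice would deadlock,
 * which the model turns into an assertion failure. */
struct root_lock {
    int held;
};

static void lock_acquire(struct root_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void lock_release(struct root_lock *l)
{
    assert(l->held);
    l->held = 0;
}

struct qdisc_model {
    struct root_lock lock;
    int backlog;
};

/* A filter action that, like mirred, enqueues a packet into the same
 * qdisc tree and therefore takes the root lock itself. */
static void mirror_to_same_tree(struct qdisc_model *q)
{
    lock_acquire(&q->lock);
    q->backlog++;
    lock_release(&q->lock);
}

/* Called with q->lock held; returns with it held again. */
static void qevent_handle(struct qdisc_model *q)
{
    lock_release(&q->lock);     /* let the filters re-enter the tree */
    mirror_to_same_tree(q);
    lock_acquire(&q->lock);     /* restore the caller's locking state */
}
```

Without the release at the top of `qevent_handle()`, the nested `lock_acquire()` would trip its assertion, which is the model's equivalent of the deadlock.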

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: sched: Pass root lock to Qdisc_ops.enqueue
Petr Machata [Fri, 26 Jun 2020 22:45:25 +0000 (01:45 +0300)]
net: sched: Pass root lock to Qdisc_ops.enqueue

A following patch introduces qevents, points in qdisc algorithm where
packet can be processed by user-defined filters. Should this processing
lead to a situation where a new packet is to be enqueued on the same port,
holding the root lock would lead to deadlocks. To solve the issue, qevent
handler needs to unlock and relock the root lock when necessary.

To that end, add the root lock argument to the qdisc op enqueue, and
propagate throughout.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago Merge branch 'net-ethernet-ti-am65-cpsw-update-and-enable-sr2-0-soc'
David S. Miller [Tue, 30 Jun 2020 00:06:19 +0000 (17:06 -0700)]
Merge branch 'net-ethernet-ti-am65-cpsw-update-and-enable-sr2-0-soc'

Grygorii Strashko says:

====================
net: ethernet: ti: am65-cpsw: update and enable sr2.0 soc

This series contains a set of improvements for the TI AM654x/J721E CPSW2G driver and
adds support for the TI AM654x SR2.0 SoC.

Patch 1: adds VLAN restoration after "if down/up"
Patches 2-5: improvements
Patch 6: adds support for the TI AM654x SR2.0 SoC, which allows disabling the errata i2027 W/A.
By default, errata i2027 W/A (TX csum offload disabled) is enabled on AM654x SoC
for backward compatibility, unless SR2.0 SoC is identified using SOC BUS framework.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: ethernet: ti: am65-cpsw-nuss: enable am65x sr2.0 support
Grygorii Strashko [Fri, 26 Jun 2020 18:17:09 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-nuss: enable am65x sr2.0 support

The AM65x SR2.0 MCU CPSW has fixed errata i2027 "CPSW: CPSW Does Not
Support CPPI Receive Checksum (Host to Ethernet) Offload Feature". This
errata is also fixed for the J721E SoC.

Use SOC bus data for K3 SoC identification and apply i2027 errata w/a only
for the AM65x SR1.0 SoC.
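
The resulting policy can be sketched as a simple predicate. The family and revision strings below are assumptions about what the SoC identification reports, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Apply the i2027 workaround (TX csum offload disabled) only on the
 * AM65x SR1.0 SoC. Strings are assumed, not taken from the driver. */
static bool needs_i2027_workaround(const char *family, const char *revision)
{
    return strcmp(family, "AM65X") == 0 &&
           strcmp(revision, "SR1.0") == 0;
}
```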

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: ethernet: ti: am65-cpsw-ethtool: configured critical setting only when no runnin...
Grygorii Strashko [Fri, 26 Jun 2020 18:17:08 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-ethtool: configured critical setting only when no running netdevs

Ensure that critical settings can only be configured when there are no
running netdevs, i.e. all ports are down.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: ethernet: ti: am65-cpsw-ethtool: skip hw cfg when change p0-rx-ptype-rrobin
Grygorii Strashko [Fri, 26 Jun 2020 18:17:07 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-ethtool: skip hw cfg when change p0-rx-ptype-rrobin

Skip HW configuration when p0-rx-ptype-rrobin is changed, as it will be done
by .ndev_open().

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago net: ethernet: ti: am65-cpsw-nuss: fix ports mac sl initialization
Grygorii Strashko [Fri, 26 Jun 2020 18:17:06 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-nuss: fix ports mac sl initialization

The MAC SL has to be initialized for each port otherwise
am65_cpsw_nuss_slave_disable_unused() will crash for disabled ports.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: ti: am65-cpsw: move to pf_p0_rx_ptype_rrobin init in probe
Grygorii Strashko [Fri, 26 Jun 2020 18:17:05 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw: move to pf_p0_rx_ptype_rrobin init in probe

The pf_p0_rx_ptype_rrobin is a global parameter, so move its initialization
into probe.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: ti: am65-cpsw-nuss: restore vlan configuration while down/up
Grygorii Strashko [Fri, 26 Jun 2020 18:17:04 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-nuss: restore vlan configuration while down/up

The vlan configuration is not restored after interface down/up sequence.

Steps to check:
 # ip link add link eth0 name eth0.100 type vlan id 100
 # ifconfig eth0 down
 # ifconfig eth0 up

This patch fixes it, restoring vlan ALE entries on .ndo_open().

Fixes: e7364a21077b ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoliquidio: use list_empty_careful in lio_list_delete_head
Geliang Tang [Sun, 28 Jun 2020 10:14:13 +0000 (18:14 +0800)]
liquidio: use list_empty_careful in lio_list_delete_head

Use list_empty_careful() instead of open-coding.
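
As a rough illustration of the helper being adopted, the userspace sketch
below mimics the kind of check list_empty_careful() performs: the list is
treated as empty only if both next and prev point back at the head. The
demo_list type and demo_* functions are hypothetical stand-ins, and the
memory-ordering primitives of the kernel helper are left out.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct demo_list {
	struct demo_list *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void demo_list_init(struct demo_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head. */
static void demo_list_add(struct demo_list *item, struct demo_list *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Empty only if next AND prev both refer to the head itself. */
static bool demo_list_empty_careful(const struct demo_list *head)
{
	const struct demo_list *next = head->next;

	return next == head && next == head->prev;
}
```

Checking both pointers, rather than next alone, is what distinguishes the
careful variant from a plain emptiness test.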

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosctp: use list_is_singular in sctp_list_single_entry
Geliang Tang [Sun, 28 Jun 2020 09:32:25 +0000 (17:32 +0800)]
sctp: use list_is_singular in sctp_list_single_entry

Use list_is_singular() instead of open-coding.
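
For reference, a minimal userspace sketch of the test that
list_is_singular() expresses: the list is non-empty and its first and last
entries are the same node. The demo_node type and demo_* helpers are
hypothetical stand-ins for the kernel list API.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct demo_node {
	struct demo_node *next, *prev;
};

static void demo_init(struct demo_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Append item at the tail of the list. */
static void demo_add_tail(struct demo_node *item, struct demo_node *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* True when the list holds exactly one entry. */
static bool demo_is_singular(const struct demo_node *head)
{
	return head->next != head && head->next == head->prev;
}
```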

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years ago8390: Fix coding-style issues
Armin Wolf [Sat, 27 Jun 2020 22:07:47 +0000 (00:07 +0200)]
8390: Fix coding-style issues

Fix some coding-style issues, including one which
made the function pointers in the struct ei_device
hard to understand.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: remove EXPORT_SYMBOL from ocelot_net.c
Vladimir Oltean [Sat, 27 Jun 2020 12:03:06 +0000 (15:03 +0300)]
net: mscc: ocelot: remove EXPORT_SYMBOL from ocelot_net.c

Now that all net_device operations are bundled together inside
mscc_ocelot.ko and no longer part of the common library, there's no
reason to export these symbols.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'r8169-make-RTL8401-a-separate-chip-version'
David S. Miller [Mon, 29 Jun 2020 03:56:38 +0000 (20:56 -0700)]
Merge branch 'r8169-make-RTL8401-a-separate-chip-version'

Heiner Kallweit says:

====================
r8169: make RTL8401 a separate chip version

So far RTL8401 was treated like an RTL8101e, meaning we relied on the BIOS
to configure MAC and PHY properly. Make RTL8401 a separate chip version
and copy MAC / PHY config from the r8101 vendor driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: sync support for RTL8401 with vendor driver
Heiner Kallweit [Sun, 28 Jun 2020 21:17:07 +0000 (23:17 +0200)]
r8169: sync support for RTL8401 with vendor driver

So far RTL8401 was treated like an RTL8101e, meaning we relied on the BIOS
to configure MAC and PHY properly. Make RTL8401 a separate chip version
and copy MAC / PHY config from the r8101 vendor driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: merge handling of RTL8101e and RTL8100e
Heiner Kallweit [Sun, 28 Jun 2020 21:15:45 +0000 (23:15 +0200)]
r8169: merge handling of RTL8101e and RTL8100e

Chip versions 13, 14, 15 are treated the same by the driver, therefore
let's merge them.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'netdev_tx_t'
David S. Miller [Mon, 29 Jun 2020 03:52:53 +0000 (20:52 -0700)]
Merge branch 'netdev_tx_t'

Luc Van Oostenryck says:

====================
net: always use netdev_tx_t for xmit()'s return type

The ndo_start_xmit() methods should return a 'netdev_tx_t', not
an int, and so should return NETDEV_TX_OK, not 0.
The patches in the series fix most of the remaining drivers and
subsystems (those included in allyesconfig on x86).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocxgb4vf: fix t4vf_eth_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:37 +0000 (21:53 +0200)]
cxgb4vf: fix t4vf_eth_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agol2tp: fix l2tp_eth_dev_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:36 +0000 (21:53 +0200)]
l2tp: fix l2tp_eth_dev_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/hsr: fix hsr_dev_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:35 +0000 (21:53 +0200)]
net/hsr: fix hsr_dev_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agousbnet: ipheth: fix ipheth_tx()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:34 +0000 (21:53 +0200)]
usbnet: ipheth: fix ipheth_tx()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: plip: fix plip_tx_packet()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:33 +0000 (21:53 +0200)]
net: plip: fix plip_tx_packet()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dwc-xlgmac: fix xlgmac_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:32 +0000 (21:53 +0200)]
net: dwc-xlgmac: fix xlgmac_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: pch_gbe: fix pch_gbe_xmit_frame()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:31 +0000 (21:53 +0200)]
net: pch_gbe: fix pch_gbe_xmit_frame()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: nfp: fix nfp_net_tx()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:30 +0000 (21:53 +0200)]
net: nfp: fix nfp_net_tx()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: nb8800: fix nb8800_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:29 +0000 (21:53 +0200)]
net: nb8800: fix nb8800_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: arc_emac: fix arc_emac_tx()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:28 +0000 (21:53 +0200)]
net: arc_emac: fix arc_emac_tx()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: aquantia: fix aq_ndev_start_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:27 +0000 (21:53 +0200)]
net: aquantia: fix aq_ndev_start_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocaif: fix cfv_netdev_tx()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:26 +0000 (21:53 +0200)]
caif: fix cfv_netdev_tx()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocaif: fix cfspi_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:25 +0000 (21:53 +0200)]
caif: fix cfspi_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too and
returning NETDEV_TX_OK instead of 0 accordingly.
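
A minimal userspace sketch of the pattern described above. The enum is a
stand-in modelled on the kernel's netdev_tx_t; demo_xmit() and its
queue_full parameter are hypothetical and only serve to show the enum
return type and the named constants being used in place of an int and 0.

```c
/* Userspace stand-in modelled on the kernel's enum netdev_tx. */
enum netdev_tx {
	NETDEV_TX_OK   = 0x00,	/* driver took care of the packet */
	NETDEV_TX_BUSY = 0x10,	/* driver tx path was busy */
};
typedef enum netdev_tx netdev_tx_t;

/*
 * Hypothetical xmit routine: declared with the enum type rather than
 * 'int', and returning the named constants rather than bare numbers.
 */
static netdev_tx_t demo_xmit(int queue_full)
{
	if (queue_full)
		return NETDEV_TX_BUSY;

	return NETDEV_TX_OK;
}
```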

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocaif: fix caif_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:24 +0000 (21:53 +0200)]
caif: fix caif_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocaif,hsi: fix cfhsi_xmit()'s return type
Luc Van Oostenryck [Sun, 28 Jun 2020 19:53:23 +0000 (21:53 +0200)]
caif,hsi: fix cfhsi_xmit()'s return type

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too and
returning NETDEV_TX_OK instead of 0 accordingly.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobareudp: Added attribute to enable & disable rx metadata collection
Martin [Sun, 28 Jun 2020 17:48:23 +0000 (23:18 +0530)]
bareudp: Added attribute to enable & disable rx metadata collection

Metadata need not be collected in receive if the packet from bareudp
device is not targeted to openvswitch.

Signed-off-by: Martin <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'hinic-add-some-ethtool-ops-support'
David S. Miller [Mon, 29 Jun 2020 03:40:58 +0000 (20:40 -0700)]
Merge branch 'hinic-add-some-ethtool-ops-support'

Luo bin says:

====================
hinic: add some ethtool ops support

patch #1: support to set and get pause params with
          "ethtool -A/a" cmd
patch #2: support to set and get irq coalesce params with
          "ethtool -C/c" cmd
patch #3: support to do self test with "ethtool -t" cmd
patch #4: support to identify physical device with "ethtool -p" cmd
patch #5: support to get eeprom information with "ethtool -m" cmd
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agohinic: add support to get eeprom information
Luo bin [Sun, 28 Jun 2020 12:36:24 +0000 (20:36 +0800)]
hinic: add support to get eeprom information

add support to get eeprom information from the plug-in module
with ethtool -m cmd.

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agohinic: add support to identify physical device
Luo bin [Sun, 28 Jun 2020 12:36:23 +0000 (20:36 +0800)]
hinic: add support to identify physical device

add support to identify physical device by flashing an LED
attached to it with ethtool -p cmd.

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agohinic: add self test support
Luo bin [Sun, 28 Jun 2020 12:36:22 +0000 (20:36 +0800)]
hinic: add self test support

add support to execute internal and external loopback test with
ethtool -t cmd.

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agohinic: add support to set and get irq coalesce
Luo bin [Sun, 28 Jun 2020 12:36:21 +0000 (20:36 +0800)]
hinic: add support to set and get irq coalesce

add support to set TX/RX irq coalesce params with ethtool -C and
get these params with ethtool -c.

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agohinic: add support to set and get pause params
Luo bin [Sun, 28 Jun 2020 12:36:20 +0000 (20:36 +0800)]
hinic: add support to set and get pause params

add support to set pause params with ethtool -A and get pause
params with ethtool -a. Also remove set_link_ksettings ops for VF
and enable pause by default.

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'tcp-improve-delivered-counts-in-SCM_TSTAMP_ACK'
David S. Miller [Sun, 28 Jun 2020 00:41:27 +0000 (17:41 -0700)]
Merge branch 'tcp-improve-delivered-counts-in-SCM_TSTAMP_ACK'

Yousuk Seung says:

====================
tcp: improve delivered counts in SCM_TSTAMP_ACK

Currently delivered and delivered_ce in OPT_STATS of SCM_TSTAMP_ACK do
not fully reflect the current ack being timestamped. Also they are not
in sync, as the delivered count includes packets being sacked and some of
those cumulatively acked, but delivered_ce includes none.

This patch series updates tp->delivered and tp->delivered_ce together to
keep them in sync. It also moves generating SCM_TSTAMP_ACK to later in
tcp_clean_rtx_queue() to reflect packets being cumulatively acked up
until the current skb for sack-enabled connections.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: update delivered_ce with delivered
Yousuk Seung [Sat, 27 Jun 2020 04:05:35 +0000 (21:05 -0700)]
tcp: update delivered_ce with delivered

Currently tp->delivered is updated in various places in tcp_ack() but
tp->delivered_ce is updated once at the end. As a result two counts in
OPT_STATS of SCM_TSTAMP_ACK timestamps generated in tcp_ack() may not be
in sync. This patch updates both counts at the same time in tcp_ack().
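
One way to keep two such counters in step is to bump them through a single
helper, sketched below in userspace C. The demo_tcp_counts struct and
demo_count_delivered() are hypothetical illustrations, not the code of this
patch; ece_ack stands for the ECE state of the ack being processed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters kept in the TCP socket. */
struct demo_tcp_counts {
	uint32_t delivered;	/* packets delivered so far */
	uint32_t delivered_ce;	/* of those, delivered with ECE set */
};

/*
 * Update both counters in one place so that a timestamp generated
 * right after the call sees a consistent pair.
 */
static void demo_count_delivered(struct demo_tcp_counts *tp,
				 uint32_t delivered, bool ece_ack)
{
	tp->delivered += delivered;
	if (ece_ack)
		tp->delivered_ce += delivered;
}
```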

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: count sacked packets in tcp_sacktag_state
Yousuk Seung [Sat, 27 Jun 2020 04:05:34 +0000 (21:05 -0700)]
tcp: count sacked packets in tcp_sacktag_state

Add sack_delivered to tcp_sacktag_state and count the number of sacked
and dsacked packets. This is a pure refactor for future patches to improve
tracking delivered counts.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: add ece_ack flag to reno sack functions
Yousuk Seung [Sat, 27 Jun 2020 04:05:33 +0000 (21:05 -0700)]
tcp: add ece_ack flag to reno sack functions

Pass a boolean flag that tells the ECE state of the current ack to reno
sack functions. This is a pure refactor for future patches to improve
tracking delivered counts.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: stamp SCM_TSTAMP_ACK later in tcp_clean_rtx_queue()
Yousuk Seung [Sat, 27 Jun 2020 04:05:32 +0000 (21:05 -0700)]
tcp: stamp SCM_TSTAMP_ACK later in tcp_clean_rtx_queue()

Currently tp->delivered is updated with sacked packets but not
cumulatively acked when SCM_TSTAMP_ACK is timestamped. This patch moves
a tcp_ack_tstamp() call in tcp_clean_rtx_queue() to later in the loop so
that when a skb is fully acked OPT_STATS of SCM_TSTAMP_ACK will include
the current skb in the delivered count. When not fully acked
tcp_ack_tstamp() is a no-op and there is no change in behavior.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/mlx5e: kTLS, Improve rx handler function call
Tariq Toukan [Mon, 15 Jun 2020 10:02:49 +0000 (13:02 +0300)]
net/mlx5e: kTLS, Improve rx handler function call

Prior to this patch, the mlx5e tls rx handler was called unconditionally on
all rx frames, and the decision whether a frame is a valid tls record
was made inside that function.  A function call can be expensive, especially
at the regular rx packet rate.  To avoid this, check the tls validity before
jumping into the tls rx handler.
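
The idea can be sketched in userspace C as a cheap inline predicate that
guards the handler call. All names here (demo_rx_frame, demo_frame_is_tls,
demo_tls_rx_handler, demo_rx) are hypothetical and do not reflect the
driver's actual structures.

```c
#include <stdbool.h>

/* Hypothetical received frame; 'handled' counts handler invocations. */
struct demo_rx_frame {
	bool is_tls_record;
	int handled;
};

/* Cheap check that can be inlined into the hot rx path. */
static inline bool demo_frame_is_tls(const struct demo_rx_frame *f)
{
	return f->is_tls_record;
}

/* Comparatively expensive handler, only worth calling for tls frames. */
static void demo_tls_rx_handler(struct demo_rx_frame *f)
{
	f->handled++;
}

/* rx path: test first, so non-tls frames never pay for the call. */
static void demo_rx(struct demo_rx_frame *f)
{
	if (demo_frame_is_tls(f))
		demo_tls_rx_handler(f);
}
```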

While at it, split between kTLS device offload rx handler and FPGA tls rx
handler using a similar method.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
4 years agonet/mlx5e: kTLS, Cleanup redundant capability check
Tariq Toukan [Mon, 22 Jun 2020 15:32:36 +0000 (18:32 +0300)]
net/mlx5e: kTLS, Cleanup redundant capability check

All callers of mlx5e_ktls_build_netdev() check capability
before the call.
Remove the repeated check in the function.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Increase Async ICO SQ size
Tariq Toukan [Thu, 18 Jun 2020 09:45:59 +0000 (12:45 +0300)]
net/mlx5e: Increase Async ICO SQ size

Resync communication with HW for kTLS RX is done via the
async ICOSQs.
kTLS RX resync requests might come in bursts. To improve the
success chances for such bursts, use a larger ICOSQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: kTLS, Add kTLS RX stats
Tariq Toukan [Mon, 15 Jun 2020 12:25:23 +0000 (15:25 +0300)]
net/mlx5e: kTLS, Add kTLS RX stats

Add global and per-channel ethtool SW stats for the device
offload.
Document the new counters in tls-offload.rst.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: kTLS, Add kTLS RX resync support
Tariq Toukan [Tue, 16 Jun 2020 12:15:06 +0000 (15:15 +0300)]
net/mlx5e: kTLS, Add kTLS RX resync support

Implement the RX resync procedure, using the TLS async resync API.

The HW offload of TLS decryption in RX side might get out-of-sync
due to out-of-order reception of packets.
This requires SW intervention to update the HW context and get it
back in-sync.

Performance:
CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off
NIC: ConnectX-6 Dx 100GbE dual port

Goodput (app-layer throughput) comparison:
+---------------+-------+-------+---------+
| # connections |   1   |   4   |    8    |
+---------------+-------+-------+---------+
| SW (Gbps)     |  7.26 | 24.70 |   50.30 |
+---------------+-------+-------+---------+
| HW (Gbps)     | 18.50 | 64.30 |   92.90 |
+---------------+-------+-------+---------+
| Speedup       | 2.55x | 2.56x | 1.85x * |
+---------------+-------+-------+---------+

* After linerate is reached, diff is observed in CPU util.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/tls: Add asynchronous resync
Boris Pismenny [Mon, 8 Jun 2020 16:11:38 +0000 (19:11 +0300)]
net/tls: Add asynchronous resync

This patch adds support for asynchronous resynchronization in tls_device.
Async resync follows two distinct stages:

1. The NIC driver indicates that it would like to resync on some TLS
record within the received packet (P), but the driver does not
know (yet) which of the TLS records within the packet.
At this stage, the NIC driver will query the device to find the exact
TCP sequence for resync (tcpsn), however, the driver does not wait
for the device to provide the response.

2. Eventually, the device responds, and the driver provides the tcpsn
within the resync packet to KTLS. Now, KTLS can check the tcpsn against
any processed TLS records within packet P, and also against any record
that is processed in the future within packet P.

The asynchronous resync path simplifies the device driver, as it can
save bits on the packet completion (32-bit TCP sequence), and pass this
information on an asynchronous command instead.
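
The two stages can be sketched in userspace C as follows. This is a
simplified illustration under stated assumptions, not the tls_device code:
the demo_resync struct, the fixed-size log and all function names are
hypothetical, TCP sequence wraparound is ignored, and only records logged
before the device response arrives are checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOG_MAX 8	/* hypothetical cap on remembered records */

struct demo_resync {
	bool pending;			/* stage 1 started, no reply yet */
	uint32_t log[DEMO_LOG_MAX];	/* start seq of processed records */
	size_t nlog;
};

/* Stage 1: the driver asks for a resync and does not wait for the device. */
static void demo_resync_request_start(struct demo_resync *r)
{
	r->pending = true;
	r->nlog = 0;
}

/* While a request is pending, remember where each TLS record starts. */
static void demo_resync_log_record(struct demo_resync *r, uint32_t record_seq)
{
	if (r->pending && r->nlog < DEMO_LOG_MAX)
		r->log[r->nlog++] = record_seq;
}

/* Stage 2: the device replied with tcpsn; compare it with the log. */
static bool demo_resync_response(struct demo_resync *r, uint32_t tcpsn)
{
	bool match = false;
	size_t i;

	for (i = 0; i < r->nlog; i++)
		if (r->log[i] == tcpsn)
			match = true;

	r->pending = false;
	return match;
}
```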

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agoRevert "net/tls: Add force_resync for driver resync"
Boris Pismenny [Mon, 8 Jun 2020 09:42:52 +0000 (12:42 +0300)]
Revert "net/tls: Add force_resync for driver resync"

This reverts commit 30302fb53b5752f95c1d00f4a54dc81184660404.
Revert the force resync API.
Not in use. To be replaced by a better async resync API downstream.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: kTLS, Add kTLS RX HW offload support
Tariq Toukan [Thu, 28 May 2020 07:13:00 +0000 (10:13 +0300)]
net/mlx5e: kTLS, Add kTLS RX HW offload support

Implement driver support for the kTLS RX HW offload feature.
Resync support is added in a downstream patch.

New offload contexts post their static/progress params WQEs
over the per-channel async ICOSQ, protected under a spin-lock.
The Channel/RQ is selected according to the socket's rxq index.

Feature is OFF by default. Can be turned on by:
$ ethtool -K <if> tls-hw-rx-offload on

A new TLS-RX workqueue is used to allow asynchronous addition of
steering rules, out of the NAPI context.
It will be also used in a downstream patch in the resync procedure.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: kTLS, Use kernel API to extract private offload context
Tariq Toukan [Thu, 28 May 2020 07:04:03 +0000 (10:04 +0300)]
net/mlx5e: kTLS, Use kernel API to extract private offload context

Modify the implementation of the private kTLS TX HW offload context
getter and setter, so it uses the kernel API functions, instead of
a local shadow structure.
A single BUILD_BUG_ON check is sufficient, remove the duplicate.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: kTLS, Improve TLS feature modularity
Tariq Toukan [Tue, 26 May 2020 10:58:09 +0000 (13:58 +0300)]
net/mlx5e: kTLS, Improve TLS feature modularity

Better separate the code into c/h files, so that kTLS internals
are exposed to the corresponding non-accel flow as follows:
- Necessary datapath functions are exposed via ktls_txrx.h.
- Necessary caps and configuration functions are exposed via ktls.h,
  which became very small.

In addition, kTLS internal code sharing is done via ktls_utils.h,
which is not exposed to any non-accel file.

Add explicit WQE structures for the TLS static and progress
params, breaking the union of the static with UMR, and the progress
with PSV.

Generalize the API as a preparation for TLS RX offload support.

Move kTLS TX-specific code to the proper file.
Remove the inline tag for functions in C files, let the compiler decide.
Use kzalloc/kfree for the priv_tx context.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
4 years agonet/mlx5e: Accel, Expose flow steering API for rules add/del
Tariq Toukan [Tue, 16 Jun 2020 10:29:07 +0000 (13:29 +0300)]
net/mlx5e: Accel, Expose flow steering API for rules add/del

Given a socket, the function extracts the TCP/IP{4,6} ntuple
and adds a rule to steering.
Another function gets the rule and deletes it.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
4 years agonet/mlx5e: Receive flow steering framework for accelerated TCP flows
Boris Pismenny [Sun, 14 Apr 2019 13:35:24 +0000 (16:35 +0300)]
net/mlx5e: Receive flow steering framework for accelerated TCP flows

The framework allows creating flow tables to steer incoming traffic of
TCP sockets to the acceleration TIRs.
This is used in downstream patches for TLS, and will be used in the
future for other offloads.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: API to manipulate TTC rules destinations
Saeed Mahameed [Thu, 2 Apr 2020 09:02:33 +0000 (02:02 -0700)]
net/mlx5e: API to manipulate TTC rules destinations

Store the default destinations of the on-load generated TTC
(Traffic Type Classifier) rules in the ttc rules table.

Introduce TTC API functions to manipulate/restore and get the TTC rule
destination and use these API functions in arfs implementation.

This will allow a better decoupling between TTC implementation and its
users.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
4 years agonet/mlx5e: Refactor build channel params
Tariq Toukan [Sat, 13 Jun 2020 19:53:32 +0000 (22:53 +0300)]
net/mlx5e: Refactor build channel params

Take the CQ params into their respective RQ/SQ params.
Split the params build of the different ICOSQs (sync and async),
as they require different init values.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Turn XSK ICOSQ into a general asynchronous one
Tariq Toukan [Tue, 26 Nov 2019 14:23:23 +0000 (16:23 +0200)]
net/mlx5e: Turn XSK ICOSQ into a general asynchronous one

There is an upcoming demand (in downstream patches) for
an ICOSQ to be populated out of the NAPI context, asynchronously.

There is already an existing one serving XSK-related use case.
In this patch, promote this ICOSQ to serve as general async ICOSQ,
to be used for XSK and non-XSK flows.

As part of this, the reg_umr bit of the SQ context is now set
(if capable), as the general async ICOSQ should support possible
posts of UMR WQEs.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Saeed Mahameed [Sat, 27 Jun 2020 21:00:04 +0000 (14:00 -0700)]
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: kTLS, Improve TLS params layout structures
  net/mlx5: Avoid eswitch header inclusion in fs core layer
  net/mlx5: Avoid RDMA file inclusion in core driver
  net/mlx5: Add support in query QP, CQ and MKEY segments
  net/mlx5: Export resource dump interface

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: kTLS, Improve TLS params layout structures
Tariq Toukan [Fri, 26 Jun 2020 05:59:43 +0000 (22:59 -0700)]
net/mlx5: kTLS, Improve TLS params layout structures

Add explicit WQE segment structures for the TLS static and progress
params.
According to the HW spec, TISN is not part of the progress params context,
take it out of it.
Rename the control segment tisn field as it could hold either a TIS or
a TIR number.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: Avoid eswitch header inclusion in fs core layer
Parav Pandit [Fri, 26 Jun 2020 05:59:42 +0000 (22:59 -0700)]
net/mlx5: Avoid eswitch header inclusion in fs core layer

Flow steering core layer is independent of the eswitch layer.
Hence avoid fs_core dependency on eswitch.

Fixes: fc6043b9c888 ("net/mlx5: Split FDB fast path prio to multiple namespaces")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: Avoid RDMA file inclusion in core driver
Parav Pandit [Fri, 26 Jun 2020 05:59:41 +0000 (22:59 -0700)]
net/mlx5: Avoid RDMA file inclusion in core driver

mlx5 cq.h does not depend on RDMA verbs.
Remove RDMA verbs file inclusion.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agoMerge branch 'net-atlantic-various-non-functional-changes'
David S. Miller [Fri, 26 Jun 2020 23:32:51 +0000 (16:32 -0700)]
Merge branch 'net-atlantic-various-non-functional-changes'

Igor Russkikh says:

====================
net: atlantic: various non-functional changes

This patchset contains several non-functional changes that were made in
the out-of-tree driver over time.
Mostly typos, checkpatch findings and comment fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: put ptp code under IS_REACHABLE check
Igor Russkikh [Fri, 26 Jun 2020 18:40:38 +0000 (21:40 +0300)]
net: atlantic: put ptp code under IS_REACHABLE check

A1 requires additional processing for both egress and ingress to support
PTP.
And it makes sense to get rid of this processing altogether (via ifdef),
if PTP clock is disabled globally.

This patch puts the PTP code under the corresponding IS_REACHABLE check.
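
A minimal sketch of this kind of compile-time gating, in userspace C. The
DEMO_PTP_ENABLED switch is a hypothetical stand-in for the kernel's
IS_REACHABLE() test on the relevant config symbol, and the demo_* names are
illustrative only.

```c
/* Hypothetical build-time switch; 0 means PTP support is compiled out. */
#ifndef DEMO_PTP_ENABLED
#define DEMO_PTP_ENABLED 0
#endif

struct demo_pkt {
	int ptp_processed;
};

#if DEMO_PTP_ENABLED
/* Real processing, built only when PTP support is available. */
static inline void demo_ptp_process(struct demo_pkt *pkt)
{
	pkt->ptp_processed = 1;
}
#else
/* Empty stub: callers stay unchanged and the call compiles to nothing. */
static inline void demo_ptp_process(struct demo_pkt *pkt)
{
	(void)pkt;
}
#endif

/* The datapath calls the helper unconditionally in both configurations. */
static void demo_rx_path(struct demo_pkt *pkt)
{
	demo_ptp_process(pkt);
}
```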

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: add alignment checks in hw_atl2_utils_fw.c
Mark Starovoytov [Fri, 26 Jun 2020 18:40:37 +0000 (21:40 +0300)]
net: atlantic: add alignment checks in hw_atl2_utils_fw.c

This patch adds alignment checks in all the helper macros in
hw_atl2_utils_fw.c.
These alignment checks are compile-time, so runtime is not affected.

All these helper macros assume the length to be aligned (multiple of 4).
If it's not aligned, then there might be issues, e.g. stack corruption.
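
A compile-time check of this kind can be sketched as below. The sketch uses
C11 _Static_assert as a stand-in for whatever mechanism the driver uses;
the DEMO_* macros, the demo_fw_request struct and demo_second_word() are
hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Refuse to compile when the object size is not a multiple of 4 bytes. */
#define DEMO_ASSERT_DWORD_ALIGNED(obj) \
	_Static_assert(sizeof(obj) % sizeof(uint32_t) == 0, \
		       "size must be a multiple of 4 bytes")

/* Hypothetical request layout made of whole 32-bit words. */
struct demo_fw_request {
	uint32_t opcode;
	uint32_t length;
};

/* Copy an object as whole dwords; the size check costs nothing at runtime. */
#define DEMO_COPY_DWORDS(dst, src) do { \
	DEMO_ASSERT_DWORD_ALIGNED(src); \
	memcpy((dst), &(src), sizeof(src)); \
} while (0)

/* Copy a request into a dword buffer and return its second word. */
static uint32_t demo_second_word(void)
{
	struct demo_fw_request req = { .opcode = 1, .length = 2 };
	uint32_t words[sizeof(req) / sizeof(uint32_t)];

	DEMO_COPY_DWORDS(words, req);
	return words[1];
}
```

Passing an object whose size is, say, 6 bytes to DEMO_COPY_DWORDS() would
fail at build time instead of truncating or overrunning at runtime.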

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: missing space in a comment in aq_nic.h
Dmitry Bezrukov [Fri, 26 Jun 2020 18:40:36 +0000 (21:40 +0300)]
net: atlantic: missing space in a comment in aq_nic.h

This patch adds a missing space in the comment in aq_nic.h.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>